11 Probability and statistics: preliminaries to random vibrations

11.1 Introduction

This chapter covers some fundamental aspects of probability theory and serves the purpose of providing the necessary tools for the treatment of random vibrations, which will be discussed in the next chapter. Probably, most readers already have some familiarity with the subject, because probability theory and statistics, directly or indirectly, pervade almost all aspects of human activity and, in particular, many branches of all scientific disciplines. Nonetheless, in the philosophy of this text (as in the preceding chapters and in the appendices), the idea is to introduce and discuss some basic concepts with the intention of following a continuous line of reasoning from simple to more complex topics, in the hope of giving the reader a useful source of reference for a clear understanding of this text in the first place, but of other more specialized books as well.

11.2 The concept of probability

In everyday conversation, probability is a loosely defined term employed to indicate the measure of one's belief in the occurrence of a future event when this event may or may not occur. Moreover, we use this word by indirectly making some common assumptions: (1) probabilities near 1 (100%) indicate that the event is extremely likely to occur, (2) probabilities near zero indicate that the event is very unlikely to occur and (3) probabilities near 0.5 (50%) indicate a 'fair chance', i.e. that the event is just as likely to occur as not.

If we try to be more specific, we can consider the way in which we assign probabilities to events and note that, historically, three main approaches have developed through the centuries. We can call them the personal approach, the relative frequency approach and the classical approach. The personal approach reflects a personal opinion and, as such, is always applicable because anyone can have a personal opinion about anything; however, it is not very fruitful for our purposes. The relative frequency approach is more objective and pertains to cases in which an 'experiment' can be repeated many times and the results observed; P[A], the probability of occurrence of event A, is then given as

P[A] = \frac{n_A}{n}    (11.1)

where n_A is the number of times that event A occurred and n is the total number of times that the experiment was run. This approach is surely useful in itself but, obviously, cannot deal with a one-shot situation and, in any case, is a definition of an a posteriori probability (i.e. we must perform the experiment to determine P[A]). The idea behind this definition is that the ratio on the r.h.s. of eq (11.1) is almost constant for sufficiently large values of n. Finally, the classical approach can be used when it can be reasonably assumed that the possible outcomes of the experiment are equally likely; then

P[A] = \frac{n(A)}{n(S)}    (11.2)

where n(A) is the number of ways in which outcome A can occur and n(S) is the number of ways in which the experiment can proceed. Note that in this case we do not really need to perform the experiment, because eq (11.2) defines an a priori probability. A typical example is the tossing of a fair coin; without any experiment we can say that n(S) = 2 (head or tail) and the probability of, say, a head is 1/2. Pictorially (and also for historical reasons), we may view eq (11.2) as the 'gambler's definition' of probability.

However, consider the following simple and classical 'meeting problem': two people decide to meet at a given place anytime between noon and 1 p.m. The one who arrives first is obliged to wait 20 minutes and then leave. If their arrival times are independent, what is the probability that they actually meet? The answer is 5/9 (as the reader is invited to verify), but the point is that this problem cannot be tackled with the definitions of probability given above.
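As a quick aside not found in the original text, the relative frequency idea of eq (11.1) can be used to check the 5/9 answer numerically. The short Python sketch below simulates the meeting problem with independent, uniform arrival times; the variable names and the choice of one million trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Arrival times in minutes after noon, independent and uniform on [0, 60]
x = rng.uniform(0, 60, n)
y = rng.uniform(0, 60, n)

# The two people meet if the first to arrive waits no more than 20 minutes
meet = np.abs(x - y) <= 20

print(meet.mean())   # relative frequency estimate, close to 5/9 = 0.555...
```

The estimate approaches 5/9 as the number of trials grows, in the spirit of eq (11.1).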
We will not pursue the subject here, but it is evident that the definitions above cannot deal with a large number of problems of great interest. As a matter of fact, a detailed analysis of both definitions (11.1) and (11.2), because of their intrinsic limitations, logical flaws and lack of stringency, shows that they are inadequate to form a solid basis for a more rigorous mathematical theory of probability. Also the von Mises definition, which extends the relative frequency approach by writing

P[A] = \lim_{n \to \infty} \frac{n_A}{n}    (11.3)

suffers serious limitations and runs into insurmountable logical difficulties.

The solution to these difficulties was given by the axiomatic theory of probability introduced by Kolmogorov. Before introducing this theory, however, it is worth considering some basic ideas which may be useful as guidelines for Kolmogorov's abstract formulation. Let us consider eq (11.2): we note that, in order to determine what is 'probable', we must first determine what is 'possible'; this means that we have to make a list of possibilities for the experiment. Some common definitions are as follows: a possible outcome of our experiment is called an event, and we can distinguish between simple events, which can happen only in one way, and compound events, which can happen in more than one distinct way. In the rolling of a die, for example, a simple event is the observation of a 6, whereas a compound event is the observation of an even number (2, 4 or 6). In other words, simple events cannot be decomposed and are also called sample points. The set of all possible sample points is called a sample space.

Now, adopting the notation of elementary set theory, we view the sample space as a set W whose elements E_j are the sample points. If the sample space is discrete, i.e. contains a finite or countable number of sample points, any compound event A is a subset of W and can be viewed as a collection of two or more sample points, i.e. as the 'union' of two or more sample points. In the die-rolling experiment above, for example, we can write A = E_2 \cup E_4 \cup E_6, where we call A the event 'observation of an even number', E_2 the sample point 'observation of a 2' and so on. In this case it is evident that A \subset W and, since E_2, E_4 and E_6 are mutually exclusive,

P[A] = P[E_2] + P[E_4] + P[E_6]    (11.4a)

The natural extension of eq (11.4a) is

P[E_1 \cup E_2 \cup \cdots \cup E_k] = \sum_{j=1}^{k} P[E_j]    (11.4b)

Moreover, if we denote by \bar{A} the complement of set A (i.e. \bar{A} = W - A), we have also

P[\bar{A}] = 1 - P[A]    (11.4c)

and if we consider two events, say B and C, which are not mutually exclusive, then

P[B \cup C] = P[B] + P[C] - P[B \cap C]    (11.4d)

where the intersection symbol \cap is well known from set theory and P[B \cap C] is often called the compound probability, i.e. the probability that events B and C occur simultaneously. (Note that one often also finds the symbols B + C for B \cup C and BC for B \cap C.) Again in the rolling of a fair die, the reader can easily check eq (11.4d) on any pair of events which are not mutually exclusive. For three non-mutually exclusive sets it is not difficult to extend eq (11.4d) to

P[A \cup B \cup C] = P[A] + P[B] + P[C] - P[A \cap B] - P[A \cap C] - P[B \cap C] + P[A \cap B \cap C]    (11.4e)

as the reader is invited to verify.
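The relations above lend themselves to a direct numerical check. The following sketch enumerates the sample space of a fair die and verifies the addition rule (11.4d); the two events B and C are an arbitrary choice made here for illustration and are not taken from the original example.

```python
from fractions import Fraction

W = {1, 2, 3, 4, 5, 6}                  # sample space of a fair die
P = lambda A: Fraction(len(A), len(W))  # classical definition, eq (11.2)

B = {2, 4, 6}   # 'even number' (illustrative choice)
C = {1, 2, 3}   # 'number not greater than 3' (illustrative choice)

lhs = P(B | C)
rhs = P(B) + P(C) - P(B & C)            # eq (11.4d)
print(lhs, rhs, lhs == rhs)             # 5/6 5/6 True
```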
Incidentally, it is evident that the method we are following requires counting; for example, the counting of sample points and/or a complete itemization of equiprobable sets of sample points. For large sample spaces this may not be an easy task. Fortunately, aid comes from combinatorial analysis, from which we know that the number of permutations (arrangements of objects in a definite order) of n distinct objects taken r at a time is given by

P_{n,r} = \frac{n!}{(n - r)!}    (11.5)

while the number of combinations (arrangements of objects without regard to order) of n distinct objects taken r at a time is

C_{n,r} = \frac{n!}{r!(n - r)!}    (11.6)

For example, if n = 3 (objects a, b and c) and r = 2, the fact that the number of combinations is less than the number of permutations is evident if one thinks that in a permutation the arrangement {a, b} is considered different from the arrangement {b, a}, whereas in a combination they count as one single arrangement. These tools simplify the counting considerably. For example, suppose that a big company has hired 15 new engineers for the same job in different plants; if a particular plant has four vacancies, in how many ways can it fill these positions? The answer is now straightforward and is given by C_{15,4} = 1365. Moreover, note also that the calculation of factorials can often be made easier by using Stirling's formula, i.e.

n! \cong \sqrt{2\pi n}\; n^n e^{-n}

which results in errors smaller than 1% already for n of the order of 10.
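As a small numerical complement (again, not part of the original text), the sketch below evaluates eqs (11.5) and (11.6) with Python's standard library, reproduces the hiring example and shows how quickly the relative error of Stirling's formula decreases.

```python
import math

def permutations(n, r):
    return math.factorial(n) // math.factorial(n - r)                        # eq (11.5)

def combinations(n, r):
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))  # eq (11.6)

print(permutations(3, 2), combinations(3, 2))   # 6 3
print(combinations(15, 4))                      # 1365, the hiring example

# Relative error of Stirling's formula n! ~ sqrt(2*pi*n) * n**n * exp(-n)
for n in (5, 10, 20):
    stirling = math.sqrt(2 * math.pi * n) * n**n * math.exp(-n)
    print(n, abs(stirling - math.factorial(n)) / math.factorial(n))
```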
Returning now to our main discussion, we can make a final comment before introducing the axiomatic theory of probability: the fact that two events B and C are mutually exclusive is formalized in the language of sets as B \cap C = \varnothing, where \varnothing is the empty set. So, we need to include this event in the sample space and require that P[\varnothing] = 0. By so doing, we obtain the expected result that eq (11.4d) reduces to the sum P[B] + P[C] whenever events B and C are mutually exclusive. In probability terminology, \varnothing is called the 'impossible event'.

11.2.1 Probability: axiomatic formulation and some fundamental results

We define a probability space as a triplet (W, S, P) where:

W is a set whose elements are called elementary events;
S is a σ-algebra of subsets of W, which are called events;
P is a probability function, i.e. a real-valued function with domain S such that: (a) 0 \le P[A] \le 1 for every A \in S; (b) P[W] = 1; and (c) P[\cup_j A_j] = \sum_j P[A_j] when the A_j are mutually disjoint events, i.e. when A_i \cap A_j = \varnothing for i \ne j.

For completeness, we recall here the definition of σ-algebra: a collection S of subsets of a given set W is a σ-algebra if (i) W \in S; (ii) if A \in S then \bar{A} \in S; and (iii) if A_j \in S for every index j = 1, 2, 3, \ldots, then \cup_j A_j \in S.

Two observations can be made immediately. First, although it may not seem obvious, the axiomatic definition includes as particular cases both the classical and the relative frequency definitions of probability without suffering their limitations; second, this definition does not tell us what value of probability to assign to a given event. This is in no way a limitation of the definition but simply means that we will have to model our experiment in some way in order to obtain values for the probability of events. In fact, many problems of interest deal with sets of identical events which are not equally likely (for example, the rolling of a biased die).

Let us now introduce two other definitions of practical importance: conditional probability and the independence of events. Intuitively, we can argue that the probability of an event can vary depending upon the occurrence or nonoccurrence of one or more related events: in fact, it is different to ask in the die-rolling experiment 'What is the probability of a 6?' or 'What is the probability of a 6 given that an even number has fallen?' The answer to the first question is 1/6, while the answer to the second question is 1/3. This is the concept of conditional probability, i.e. the probability of an event A given that an event B has already occurred. The symbol for conditional probability is P[A|B] and its definition is

P[A|B] = \frac{P[A \cap B]}{P[B]}    (11.7)

provided that P[B] \ne 0. It is not difficult to see that, for a given probability space, P[\,\cdot\,|B] satisfies the three axioms above and is a probability function in its own right. Equation (11.7) yields immediately the multiplication rule for probabilities, i.e.

P[A \cap B] = P[A|B]\, P[B]    (11.8a)

which can be generalized to a number of events as follows:

P[A_1 \cap A_2 \cap \cdots \cap A_n] = P[A_1]\, P[A_2|A_1]\, P[A_3|A_1 \cap A_2] \cdots P[A_n|A_1 \cap A_2 \cap \cdots \cap A_{n-1}]    (11.8b)

If the occurrence of event B has no effect on the probability assigned to an event A, then A and B are said to be independent and we can express this fact in terms of conditional probability as

P[A|B] = P[A]    (11.9a)

or, equivalently,

P[B|A] = P[B]    (11.9b)

Clearly, two mutually exclusive events are not independent because, from eq (11.7), we have P[A|B] = 0 when A \cap B = \varnothing. Also, if A and B are two independent events, we get from eq (11.7)

P[A \cap B] = P[A]\, P[B]    (11.10a)

which is referred to as the multiplication theorem for independent events. (Note that some authors give eq (11.10a) as the definition of independent events.) For n mutually (or collectively) independent events eq (11.8b) yields

P[A_1 \cap A_2 \cap \cdots \cap A_n] = P[A_1]\, P[A_2] \cdots P[A_n]    (11.10b)

A word of caution is necessary at this point: three (or more) random events can be independent in pairs without being mutually independent. This is illustrated by the example that follows.

Example 11.1 Consider a lottery with eight numbers (1–8) and let E_1, E_2, \ldots, E_8, respectively, be the simple events of extraction of 1, extraction of 2, etc. Let, for instance, A = E_1 \cup E_2 \cup E_3 \cup E_4, B = E_1 \cup E_2 \cup E_5 \cup E_6 and C = E_1 \cup E_2 \cup E_7 \cup E_8. Now, P[A] = P[B] = P[C] = 1/2 and P[A \cap B] = P[A \cap C] = P[B \cap C] = P[E_1 \cup E_2] = 1/4. It is then easy to verify that P[A \cap B] = P[A]P[B], P[A \cap C] = P[A]P[C] and P[B \cap C] = P[B]P[C], which means that the events are pairwise independent. However, P[A \cap B \cap C] = 1/4 \ne P[A]P[B]P[C] = 1/8, meaning that the three events are not mutually, or collectively, independent.
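A numerical version of Example 11.1 is easy to set up. The three compound events used below match the illustrative choice made above and exhibit the pairwise-but-not-mutual independence just described.

```python
from fractions import Fraction
from itertools import combinations

W = set(range(1, 9))                    # eight equally likely lottery numbers
P = lambda S: Fraction(len(S), len(W))

A = {1, 2, 3, 4}
B = {1, 2, 5, 6}
C = {1, 2, 7, 8}

for X, Y in combinations((A, B, C), 2):
    print(P(X & Y) == P(X) * P(Y))          # True, True, True: pairwise independent

print(P(A & B & C) == P(A) * P(B) * P(C))   # False: not mutually independent
```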
Another important result is known as the total probability formula. Let A_1, A_2, \ldots, A_n be n mutually exclusive events such that A_1 \cup A_2 \cup \cdots \cup A_n = W, where W is the sample space. Then a generic event B can be expressed as

B = B \cap W = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)    (11.11)

where the n events B \cap A_j are mutually exclusive. Owing to the third axiom of probability, this implies P[B] = \sum_j P[B \cap A_j], so that, by using the multiplication theorem, we get the total probability formula

P[B] = \sum_{j=1}^{n} P[B|A_j]\, P[A_j]    (11.12)

which remains true for a countably infinite number of events A_j. With the same assumptions as above on the events A_j, let us now consider a particular event A_k; the definition of conditional probability yields

P[A_k|B] = \frac{P[A_k \cap B]}{P[B]} = \frac{P[A_k \cap B]}{\sum_{j=1}^{n} P[B|A_j]\, P[A_j]}    (11.13)

where eq (11.12) has been taken into account. Also, by virtue of eq (11.8a) we can write P[A_k \cap B] = P[B|A_k]\, P[A_k], so that substituting in eq (11.13) we get

P[A_k|B] = \frac{P[B|A_k]\, P[A_k]}{\sum_{j=1}^{n} P[B|A_j]\, P[A_j]}    (11.14)

which is known as Bayes' formula and deserves some comments. First, the formula is true if P[B] \ne 0. Second, eq (11.14) is particularly useful for experiments consisting of stages. Typically, the A_j are events defined in terms of a first stage (or, otherwise, the P[A_j] are known for some reason), while B is an event defined in terms of the whole experiment, including a second stage; asking for P[A_k|B] is then, in a sense, 'backward': we ask for the probability of an event defined at the first stage conditioned by what happens in a later stage. In Bayes' formula this probability is given in terms of the 'natural' conditioning, i.e. conditioning on what happens at the first stage of the experiment. This is why the P[A_j] are called the a priori (or prior) probabilities, whereas P[A_k|B] is called the a posteriori (posterior, or inverse) probability. The advantage of this approach is the ability to modify the original predictions by incorporating new data. Obviously, the initial hypotheses play an important role in this case; if the initial assumptions are based on an insufficient knowledge of the mechanism of the process, the prior probabilities are no better than reasonable guesses.

Example 11.2 Among voters in a certain area, 40% support party 1 and 60% support party 2. Additional research indicates that a certain election issue is favoured by 30% of the supporters of party 1 and by 70% of the supporters of party 2. One person chosen at random from that area, when asked, says that he/she favours the issue in question. What is the probability that he/she is a supporter of party 2? Now, let

• A_1 be the event that a person supports party 1, so that P[A_1] = 0.4;
• A_2 be the event that a person supports party 2, so that P[A_2] = 0.6;
• B be the event that a person chosen at random in the area favours the issue in question.

Prior knowledge (the results of the research) indicates that P[B|A_1] = 0.3 and P[B|A_2] = 0.7. The problem asks for the a posteriori probability P[A_2|B], i.e. the probability that the person who was asked supports party 2 given the fact that he/she favours that specific election issue. From Bayes' formula we get

P[A_2|B] = \frac{P[B|A_2]\, P[A_2]}{P[B|A_1]\, P[A_1] + P[B|A_2]\, P[A_2]} = \frac{0.7 \times 0.6}{0.3 \times 0.4 + 0.7 \times 0.6} \cong 0.78

Then, obviously, we can also infer that P[A_1|B] = 1 - P[A_2|B] \cong 0.22.
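For completeness, here is a minimal sketch of Example 11.2 in Python; it simply applies the total probability formula (11.12) and Bayes' formula (11.14) to the numbers given above.

```python
priors = {"party 1": 0.4, "party 2": 0.6}        # P[A_j]
likelihoods = {"party 1": 0.3, "party 2": 0.7}   # P[B|A_j]

# Total probability formula, eq (11.12)
p_B = sum(likelihoods[a] * priors[a] for a in priors)

# Bayes' formula, eq (11.14)
posteriors = {a: likelihoods[a] * priors[a] / p_B for a in priors}
print(posteriors)   # {'party 1': 0.222..., 'party 2': 0.777...}
```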
11.3 Random variables, probability distribution functions and probability density functions

Events of major interest in science and engineering are those identified by numbers. Moreover, since we assume that the reader is already familiar with the term 'variable', we can state that a random variable is a real variable whose observed values are determined by chance, or by a number of causes beyond our control which defy any attempt at a deterministic description. In this regard, it is important to note that the engineer's and applied scientist's approach is not so much to ask whether a certain quantity is a random variable or not (which is often debatable), but to ask whether that quantity can be modelled as a random variable and whether this approach leads to meaningful results.

In mathematical terms, let x be any real number; then a random variable on the probability space (W, S, P) is a function X: W \to \mathbb{R} (\mathbb{R} is the set of real numbers) such that the sets B_x = \{w : X(w) \le x\} are events, i.e. B_x \in S. In words, let X be a real-valued function defined on W; given a real number x, we call B_x the set of all elementary events w for which X(w) \le x. If, for every x, the sets B_x belong to the σ-algebra S, then X is a (one-dimensional) random variable. The above definition may seem a bit intricate at first glance, but a little thought will show that it provides us precisely with what we need. In fact, we can now assign a definite meaning to the expression P[B_x], i.e. the probability that the random variable X corresponding to a given experiment will assume a value less than or equal to x. It is then straightforward, for a given random variable X, to define the function

F_X(x) = P[B_x] = P[X \le x]    (11.15)

which is called the cumulative distribution function (cdf, or simply the distribution function) of the random variable X. From the definition, the following properties can be easily proved:

F_X(-\infty) = 0, \qquad F_X(+\infty) = 1, \qquad F_X(x_1) \le F_X(x_2)    (11.16)

where x_1 and x_2 are any two real numbers such that x_1 \le x_2. In other words, distribution functions are monotonically non-decreasing functions which start at zero for x \to -\infty and increase to unity for x \to +\infty. It should be noted that every random variable defines its distribution function uniquely, but a given distribution function corresponds to an arbitrary number of different random variables. Moreover, the probabilistic properties of a random variable can be completely characterized by its distribution function.

Among all possible random variables, an important distinction can be made between discrete and continuous random variables. The term discrete means that the random variable can assume only a finite or countably infinite number of distinct possible values. Then a complete description can be obtained by knowing the probabilities p_k = P[X = x_k] for k = 1, 2, 3, \ldots and by defining the distribution function as

F_X(x) = \sum_k p_k\, \theta(x - x_k)    (11.17)

where we use the symbol θ for the Heaviside function (which we have already encountered, e.g. in Chapter 5), i.e.

\theta(x - x_k) = \begin{cases} 0, & x < x_k \\ 1, & x \ge x_k \end{cases}    (11.18)

The distribution function of a discrete random variable is defined over the entire real line and is a 'step' function, with a number of jumps or discontinuities occurring at the points x_k. A typical and simple example is provided by the die-rolling experiment where X is the numerical value observed in the rolling of the die. In this case x_1 = 1, x_2 = 2, etc. and p_k = 1/6 for every k = 1, 2, \ldots, 6; then F_X(x) = 0 for x < 1, F_X(x) = k/6 for k \le x < k + 1 (k = 1, 2, \ldots, 5) and F_X(x) = 1 for x \ge 6.

A continuous random variable, on the other hand, can assume any value in some interval of the real line. For a large and important class of random variables there exists a non-negative function p_X(x) which satisfies the relationship

F_X(x) = \int_{-\infty}^{x} p_X(\eta)\, d\eta    (11.19)

so that p_X(x) = dF_X(x)/dx wherever the derivative exists; p_X(x) is called the probability density function (pdf) and η is a dummy variable of integration.

A pdf always has a Fourier transform because, owing to the normalization condition, the integral

\phi_X(\omega) = E[e^{i\omega X}] = \int_{-\infty}^{+\infty} e^{i\omega x}\, p_X(x)\, dx    (11.46a)

verifies the Dirichlet condition; \phi_X(\omega) is the characteristic function of X. Also,

p_X(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-i\omega x}\, \phi_X(\omega)\, d\omega    (11.46b)

A principal use of the characteristic function has to do with its moment-generating property. If we differentiate eq (11.46a) with respect to ω we obtain

\frac{d\phi_X(\omega)}{d\omega} = i \int_{-\infty}^{+\infty} x\, e^{i\omega x}\, p_X(x)\, dx

then, letting ω = 0 in the above expression, we get

\left.\frac{d\phi_X(\omega)}{d\omega}\right|_{\omega=0} = i\, E[X]    (11.47)

Continuing this process and differentiating m times, if the mth moment of X is finite, we have

\left.\frac{d^m \phi_X(\omega)}{d\omega^m}\right|_{\omega=0} = i^m E[X^m]    (11.48)

meaning that if we know the characteristic function we can find the moments of the random variable in question by simply differentiating that function and then evaluating the derivatives at ω = 0. Of course, if we start from the pdf we still have to perform the integration of eq (11.46a), but if more than one moment is needed this is one integration only, rather than one integration for each moment to be calculated. Thus, if all the moments of X exist, we can expand the function \phi_X(\omega) in a Taylor series about the origin to get

\phi_X(\omega) = \sum_{m=0}^{\infty} \frac{(i\omega)^m}{m!}\, E[X^m]    (11.49)

For example, for a Gaussian distributed random variable X we can once again make use of the standardized random variable (X - \mu)/\sigma; carrying out the integration of eq (11.46a), one finally obtains

\phi_X(\omega) = \exp\!\left(i\mu\omega - \frac{\sigma^2 \omega^2}{2}\right)    (11.50)

From eq (11.50) it is easy to determine that d\phi_X/d\omega|_{\omega=0} = i\mu, so that, as expected (eq (11.47)), E[X] = \mu. It is left to the reader to verify that d^2\phi_X/d\omega^2|_{\omega=0} = -(\mu^2 + \sigma^2); then, by virtue of eqs (11.48) and (11.37), we get \sigma_X^2 = E[X^2] - \mu^2 = \sigma^2, which is the same result as eq (11.43). It must be noted that for a Gaussian distribution all moments are functions of the two parameters µ and σ only, meaning that the normal distribution is completely characterized by its mean and variance. Finally, it may be worth mentioning that the so-called log-characteristic function, defined as the natural logarithm of \phi_X(\omega), is also convenient in some circumstances.
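Assuming the Gaussian characteristic function of eq (11.50), the moment-generating property (11.48) can be checked symbolically; the short SymPy sketch below recovers the first two moments and the variance.

```python
import sympy as sp

w = sp.Symbol('omega', real=True)
mu = sp.Symbol('mu', real=True)
sigma = sp.Symbol('sigma', positive=True)

# Gaussian characteristic function, eq (11.50)
phi = sp.exp(sp.I * mu * w - sigma**2 * w**2 / 2)

# Moment-generating property, eq (11.48): E[X^m] = i**(-m) * d^m(phi)/dw^m at w = 0
m1 = sp.simplify(sp.diff(phi, w, 1).subs(w, 0) / sp.I)      # -> mu
m2 = sp.simplify(sp.diff(phi, w, 2).subs(w, 0) / sp.I**2)   # -> mu**2 + sigma**2

print(m1, m2, sp.simplify(m2 - m1**2))   # mu, mu**2 + sigma**2, sigma**2
```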
11.5 More than one random variable

All the concepts introduced in the previous sections can be extended to the case of two or more random variables. Consider a probability space (W, S, P) and let X_1, X_2, \ldots, X_n be n random variables according to the definition of Section 11.3. Then we can consider n real numbers x_j and introduce the joint cumulative distribution function as

F_{X_1 \cdots X_n}(x_1, \ldots, x_n) = P[X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n]    (11.51)

If and whenever convenient, both the X_j and the x_j can be written as column vectors, i.e. X = [X_1, X_2, \ldots, X_n]^T and x = [x_1, x_2, \ldots, x_n]^T, so that the joint distribution function is written simply F_X(x). In words, eq (11.51) means that the joint cdf expresses the probability that all the inequalities X_j \le x_j take place simultaneously.

If now, for simplicity, we consider the case of two random variables X and Y (the 'bivariate' case), it is not difficult to see that the following properties hold:

F_{XY}(-\infty, y) = F_{XY}(x, -\infty) = 0, \qquad F_{XY}(+\infty, +\infty) = 1    (11.52)

If there exists a function p_{XY}(x, y) such that for every x and y

F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} p_{XY}(\xi, \eta)\, d\eta\, d\xi    (11.53)

this function is called the joint probability density function of X and Y. The joint pdf can be obtained from F_{XY}(x, y) by differentiation, i.e.

p_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y}    (11.54)

The one-dimensional functions

F_X(x) = F_{XY}(x, +\infty), \qquad F_Y(y) = F_{XY}(+\infty, y)    (11.55)

are called the marginal distributions of the random variables X and Y, respectively. Also, we have the following properties for the corresponding densities:

p_X(x) = \int_{-\infty}^{+\infty} p_{XY}(x, y)\, dy, \qquad p_Y(y) = \int_{-\infty}^{+\infty} p_{XY}(x, y)\, dx    (11.56)

and the one-dimensional functions p_X(x) and p_Y(y) are called marginal density functions: p_X(x)\,dx is the probability that x < X \le x + dx while Y can assume any value within its range of definition; similarly, p_Y(y)\,dy is the probability that y < Y \le y + dy while X can assume any value between -\infty and +\infty. These concepts can be extended to the case of n random variables.

In Section 11.2.1 we introduced the concept of conditional probability. Following the definition given by eq (11.7), we can define the conditional cdf F_X(x|y), i.e. the distribution function of X given that Y = y,

F_X(x|y) = P[X \le x \mid Y = y]    (11.57)

and similarly for F_Y(y|x). In terms of probability density functions, the conditional pdf of X given that Y = y can be expressed as

p_X(x|y) = \frac{p_{XY}(x, y)}{p_Y(y)}    (11.58)

provided that p_Y(y) \ne 0. From eq (11.58) it follows that

p_{XY}(x, y) = p_X(x|y)\, p_Y(y)    (11.59a)

where p_Y(y) is the marginal pdf of Y. Similarly,

p_{XY}(x, y) = p_Y(y|x)\, p_X(x)    (11.59b)

so that

p_{XY}(x, y)\, dx\, dy = [p_X(x|y)\, dx]\,[p_Y(y)\, dy] = [p_Y(y|x)\, dy]\,[p_X(x)\, dx]    (11.60)

which is the multiplication rule for infinitesimal probabilities, i.e. the counterpart of eq (11.8a). The key idea in this case is that a conditional pdf is truly a probability density function, meaning that, for example, we can calculate the expected value of X given that Y = y from the expression

E[X|y] = \int_{-\infty}^{+\infty} x\, p_X(x|y)\, dx    (11.61)

In this regard we may note that E[X|y] is a function of y, i.e. different conditional expected values are obtained for different values of y. If now we let Y range over all its possible values, we obtain a function of the random variable Y (i.e. E[X|Y]) and we can calculate its expected value as (taking eqs (11.35b), (11.61) and (11.60) into account)

E\{E[X|Y]\} = \int_{-\infty}^{+\infty} E[X|y]\, p_Y(y)\, dy = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\, p_X(x|y)\, p_Y(y)\, dx\, dy = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\, p_{XY}(x, y)\, dx\, dy

which expresses the interesting result

E[X] = E\{E[X|Y]\}    (11.62)

Similarly, E[Y] = E\{E[Y|X]\}. These formulas often provide a more efficient way of calculating the expected values E[X] or E[Y].

Proceeding in our discussion, we can now consider the important concept of independence. In terms of random variables, independence has to do with the fact that knowledge of, say, X gives no information whatsoever on Y, and vice versa. This is expressed mathematically by the fact that the joint distribution function can be written as the product of the individual marginal distribution functions, i.e. the random variables X and Y are independent if and only if

F_{XY}(x, y) = F_X(x)\, F_Y(y)    (11.63)

or, equivalently,

p_{XY}(x, y) = p_X(x)\, p_Y(y)    (11.64)

If we now consider the descriptors of two or more random variables, we can define the joint moments of X and Y by the expression

E[X^j Y^k] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^j y^k\, p_{XY}(x, y)\, dx\, dy    (11.65)

or the central moments

E[(X - \mu_X)^j (Y - \mu_Y)^k]    (11.66)

where \mu_X = E[X] and \mu_Y = E[Y]. Particularly important is the second-order central moment, which is called the covariance of X and Y (Cov(X, Y), K_{XY} or Γ_{XY} are all widely adopted symbols), i.e.

K_{XY} = E[(X - \mu_X)(Y - \mu_Y)]    (11.67a)

which is often expressed in nondimensional form by introducing the correlation coefficient ρ_{XY}:

\rho_{XY} = \frac{K_{XY}}{\sigma_X \sigma_Y}    (11.67b)

For two independent variables eq (11.64) holds; this means E[XY] = E[X]E[Y] and K_{XY} = 0, so that if the two standard deviations σ_X and σ_Y are not equal to zero we have

\rho_{XY} = 0    (11.68)

Equation (11.68) expresses the fact that the two random variables are uncorrelated. It must be noted that two independent variables are uncorrelated, but the reverse is not necessarily true: if eq (11.68) or K_{XY} = 0 holds, it does not necessarily mean that X and Y are independent. However, this statement is true for normally (Gaussian) distributed random variables. The correlation coefficient satisfies the inequalities -1 \le \rho_{XY} \le 1 and is a measure of how closely the two random variables are linearly related; in the two extreme cases \rho_{XY} = 1 or \rho_{XY} = -1 there is a perfect linear relationship between X and Y.

In the case of n random variables the matrix notation proves to be convenient: one can form the n×n matrix of products X X^T and introduce the covariance and correlation matrices K and ρ. The latter, for example, is given by

\boldsymbol{\rho} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{bmatrix}    (11.69)

For n mutually independent random variables ρ = I, where I is the n×n identity matrix.

Among others, an example worth mentioning is the joint Gaussian pdf of two random variables X and Y. This is written

p_{XY}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\!\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x - \mu_X)^2}{\sigma_X^2} - \frac{2\rho(x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2}\right]\right\}    (11.70)

where ρ is the correlation coefficient ρ_{XY}. The two-dimensional pdf (11.70) is often encountered in engineering practice; when the correlation coefficient is equal to zero it reduces to the product of two one-dimensional Gaussian pdfs, meaning that, as has already been mentioned, in the Gaussian case noncorrelation implies independence.
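A brief Monte Carlo check, with arbitrarily chosen parameters, shows that samples drawn from the joint Gaussian pdf (11.70) reproduce the prescribed correlation coefficient and the covariance K_XY = ρ σ_X σ_Y implied by eq (11.67b).

```python
import numpy as np

rng = np.random.default_rng(3)
mu_x, mu_y = 1.0, -2.0
sig_x, sig_y, rho = 2.0, 0.5, 0.6

# Covariance matrix built from K_XY = rho * sig_x * sig_y
K = np.array([[sig_x**2,            rho * sig_x * sig_y],
              [rho * sig_x * sig_y, sig_y**2           ]])

xy = rng.multivariate_normal([mu_x, mu_y], K, size=1_000_000)

print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])   # close to 0.6, the prescribed rho
print(np.cov(xy[:, 0], xy[:, 1])[0, 1])        # close to rho*sig_x*sig_y = 0.6
```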
11.6 Some useful results: Chebyshev's inequality and the central limit theorem

Before considering two important aspects of probability theory, namely Chebyshev's inequality and the central limit theorem, we give some results that can often be useful in practical problems. Let X be a random variable with pdf p_X(x). Since a deterministic relationship of the type f(X), where f is a reasonable function, defines another random variable Y = f(X), we ask for its pdf. The simplest case is when f is a monotonically increasing function. Then, given a value y, we have Y \le y whenever X \le x, where y = f(x). Moreover, the function f^{-1} exists, is single valued and x = f^{-1}(y). Hence

F_Y(y) = P[Y \le y] = P[X \le f^{-1}(y)] = F_X(f^{-1}(y))    (11.71a)

and p_Y(y) can be obtained by differentiation, i.e.

p_Y(y) = p_X(f^{-1}(y))\, \frac{d f^{-1}(y)}{dy} = p_X(x)\, \frac{dx}{dy}    (11.72a)

If f is a monotonically decreasing function, eq (11.71a) becomes

F_Y(y) = P[X \ge f^{-1}(y)] = 1 - F_X(f^{-1}(y))    (11.71b)

and, differentiating,

p_Y(y) = -p_X(f^{-1}(y))\, \frac{d f^{-1}(y)}{dy}    (11.72b)

Then, noting that df/dx is positive if f is monotonically increasing and negative when f is monotonically decreasing, we can combine eqs (11.72a) and (11.72b) into the single equation

p_Y(y) = p_X(x) \left|\frac{dx}{dy}\right|, \qquad x = f^{-1}(y)    (11.73)

As a simple example, the reader is invited to choose a pdf p_X(x), apply eq (11.73) to a monotone transformation Y = f(X), sketch a graph of p_X(x) and p_Y(y) and note that, in general, the two curves are markedly different. If f is not monotone, it can often be divided into monotone parts; the considerations above are then applied to each part and the sum taken.
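Equation (11.73) can be illustrated with a small Monte Carlo experiment. In the sketch below (an illustrative choice, not an example from the original text), X is uniform on (0, 1) and Y = √X is a monotonically increasing transformation, so eq (11.73) predicts p_Y(y) = 2y on (0, 1); a histogram estimate of the density agrees with this to within sampling noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 1_000_000)   # X uniform on (0, 1), so p_X(x) = 1
y = np.sqrt(x)                         # y = f(x) = sqrt(x), monotonically increasing

# eq (11.73): with x = f^{-1}(y) = y**2, p_Y(y) = p_X(y**2) * |d(y**2)/dy| = 2*y
hist, edges = np.histogram(y, bins=50, range=(0.0, 1.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - 2.0 * centres)))   # small: Monte Carlo noise only
```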
The case of two or more random variables can also be considered. Suppose that we have a random n×1 vector Y which is a function of the basic random vector X, i.e. Y = f(X), this symbol meaning that Y_1 = f_1(X_1, \ldots, X_n), Y_2 = f_2(X_1, \ldots, X_n), etc. Suppose further that we know the joint pdf p_X(x) and we ask for the joint pdf p_Y(y). Then, if the inverse f^{-1} exists, we can obtain a result that resembles eq (11.73), i.e.

p_Y(\mathbf{y}) = p_X(\mathbf{f}^{-1}(\mathbf{y}))\, |\det \mathbf{J}|    (11.74)

where J is the Jacobian matrix

\mathbf{J} = \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(y_1, y_2, \ldots, y_n)} = \begin{bmatrix} \partial x_1/\partial y_1 & \cdots & \partial x_1/\partial y_n \\ \vdots & \ddots & \vdots \\ \partial x_n/\partial y_1 & \cdots & \partial x_n/\partial y_n \end{bmatrix}    (11.75)

Given two random variables X_1 and X_2 and their joint pdf, a problem of interest is to determine the pdf of their sum, i.e. of the random variable Y_1 = X_1 + X_2. Now, if we introduce the auxiliary variable Y_2 = X_2, we can adopt the vector notation that led to eq (11.74) and write the known pdf as p_X(x_1, x_2). In this case we have

x_1 = y_1 - y_2, \qquad x_2 = y_2    (11.76)

and |\det \mathbf{J}| = 1. By noting that p_Y(y_1, y_2) = p_X(y_1 - y_2, y_2), we can obtain the joint pdf p_Y(y) from eq (11.74) and then arrive at the desired result by calculating it as the marginal pdf of the random variable Y_1, that is

p_{Y_1}(y_1) = \int_{-\infty}^{+\infty} p_X(y_1 - y_2, y_2)\, dy_2    (11.77a)

or, since the definition of the auxiliary variable is arbitrary, we can set Y_2 = X_1 and obtain the equivalent expression

p_{Y_1}(y_1) = \int_{-\infty}^{+\infty} p_X(y_2, y_1 - y_2)\, dy_2    (11.77b)

If the two original variables are independent, then p_X(x_1, x_2) = p_{X_1}(x_1)\, p_{X_2}(x_2) and we get

p_{Y_1}(y_1) = \int_{-\infty}^{+\infty} p_{X_1}(y_1 - y_2)\, p_{X_2}(y_2)\, dy_2    (11.78)

which we recognize as the convolution integral of the two functions p_{X_1} and p_{X_2} (e.g. eq (5.24)). So, when the two original random variables are independent, we can recall the properties of Fourier transforms and infer from eq (11.78) that the characteristic function of the sum random variable Y_1 is given by the product of the two individual characteristic functions of X_1 and X_2, i.e. \phi_{Y_1}(\omega) = \phi_{X_1}(\omega)\, \phi_{X_2}(\omega), where there is no 2π factor because no such factor appears in the definition of the characteristic function (eq (11.46a)). In this regard it is worth mentioning, and it is not difficult to prove, that if the two random variables X_1 and X_2 are individually normally distributed, then their sum is also normally distributed. Furthermore, the reverse statement is also true when the two variables are independent: if the pdf p_Y(y) is Gaussian and the two random variables are independent, then X_1 and X_2 are individually normally distributed.

If, on the other hand, we look for the pdf of the product of the two variables X_1 and X_2, we can set Y_1 = X_1 X_2 and take X_1 itself as the auxiliary variable. Then x_2 = y_1/x_1, |\det \mathbf{J}| = 1/|x_1| and we can obtain the desired result by integrating eq (11.74) in dx_1, that is

p_{Y_1}(y_1) = \int_{-\infty}^{+\infty} p_X\!\left(x_1, \frac{y_1}{x_1}\right) \frac{dx_1}{|x_1|}    (11.79a)

or, equivalently,

p_{Y_1}(y_1) = \int_{-\infty}^{+\infty} p_X\!\left(\frac{y_1}{x_2}, x_2\right) \frac{dx_2}{|x_2|}    (11.79b)

Finally, we can consider the ratio of the two original random variables X_1 and X_2. In this case it is convenient to set Y_1 = X_1/X_2 and Y_2 = X_2. Then x_1 = y_1 y_2, x_2 = y_2, |\det \mathbf{J}| = |y_2| and from eq (11.74) we get

p_{Y_1}(y_1) = \int_{-\infty}^{+\infty} p_X(y_1 y_2, y_2)\, |y_2|\, dy_2    (11.80)

If we now turn to expected values, it is a common problem to consider a random variable Y which is a linear combination of n random variables X_1, X_2, \ldots, X_n, i.e.

Y = \sum_{j=1}^{n} a_j X_j    (11.81)

where the a_j are real coefficients. The expected value E[Y] is easily obtained as

E[Y] = \sum_{j=1}^{n} a_j E[X_j]    (11.82)

while the variance can be calculated as follows:

\sigma_Y^2 = E\big[(Y - E[Y])^2\big] = \sum_{j=1}^{n}\sum_{k=1}^{n} a_j a_k K_{X_j X_k} = \sum_{j=1}^{n} a_j^2 \sigma_{X_j}^2 + \sum_{j \ne k} a_j a_k K_{X_j X_k}    (11.83a)

meaning that, if the variables are pairwise uncorrelated,

\sigma_Y^2 = \sum_{j=1}^{n} a_j^2 \sigma_{X_j}^2    (11.83b)

Obviously, eq (11.83b) holds also for the stronger condition of mutually independent X_j.
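The mean and variance formulas (11.82) and (11.83b) are easily verified by simulation. The distributions and coefficients below are arbitrary choices; the only requirement is that the three variables be independent (and hence pairwise uncorrelated).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Three independent random variables with known means and variances
x1 = rng.normal(1.0, 2.0, n)       # mean 1,   variance 4
x2 = rng.uniform(0.0, 1.0, n)      # mean 0.5, variance 1/12
x3 = rng.exponential(3.0, n)       # mean 3,   variance 9

a1, a2, a3 = 2.0, -1.0, 0.5
y = a1 * x1 + a2 * x2 + a3 * x3    # eq (11.81)

mean_th = a1 * 1.0 + a2 * 0.5 + a3 * 3.0                      # eq (11.82): 3.0
var_th = a1**2 * 4.0 + a2**2 * (1.0 / 12.0) + a3**2 * 9.0     # eq (11.83b): 18.33...
print(y.mean(), mean_th)
print(y.var(), var_th)
```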
Chebyshev's inequality

In practical circumstances we often have to deal with random variables whose distribution function is not known. Although we lack important information, in these cases it would nevertheless be desirable to evaluate, at least approximately, the probability that the variable in question assumes a value in a given numerical range. An important result in this regard is given by Chebyshev's inequality, which can be stated as follows: let X be a random variable with a finite variance \sigma_X^2; then for any positive constant c

P[|X - \mu_X| \ge c] \le \frac{\sigma_X^2}{c^2}    (11.84a)

Two remarks can be made immediately: first of all, it is important to note that eq (11.84a) is valid for any probability distribution and, second, there is no need to separately require that the mean value \mu_X be finite, because it is not difficult to show that for a random variable with finite second-order moment, i.e. E[X^2] < \infty, the first moment is also finite. If now we take the constant c in the form c = a\sigma_X (where a is a positive constant) and rearrange terms, eq (11.84a) can also be expressed as

P[|X - \mu_X| \ge a\sigma_X] \le \frac{1}{a^2}    (11.84b)

or

P[|X - \mu_X| < a\sigma_X] \ge 1 - \frac{1}{a^2}    (11.84c)

A typical application of Chebyshev's inequality is illustrated in the following simple example.

Example 11.7 Suppose that the steel rods from a given industrial process have a mean diameter of 20 mm and a standard deviation of 0.2 mm, and suppose further that these are the only available data about the process in question. For the future, the management decides that the steel-rod production is considered satisfactory if at least 80% of the rods produced have diameters in the range 19.5–20.5 mm. Does the production process need to be changed? Our random variable X is the rod diameter and the question is whether P[19.5 \le X \le 20.5] \ge 0.8. In this case we have a\sigma_X = 0.5 mm, so that a = 2.5, and Chebyshev's inequality in the form of eq (11.84b) or (11.84c) leads to

P[19.5 \le X \le 20.5] \ge 1 - \frac{1}{2.5^2} = 0.84

so that, according to the management's standards, the process can be considered satisfactory and does not need to be changed.

In general, it must be noted that the results of Chebyshev's inequality are very conservative, in the sense that the actual probability that X lies in the range \mu_X \pm a\sigma_X usually exceeds the lower bound 1 - 1/a^2 by a significant amount. For example, if it were known that our random variable follows a Gaussian probability distribution, we would have P[19.5 \le X \le 20.5] \cong 0.988, a value noticeably higher than the bound of 0.84 obtained above.
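As a closing numerical illustration of how conservative the bound is, the sketch below compares the Chebyshev lower bound of Example 11.7 with the exact probability under the Gaussian assumption (computed with the error function).

```python
import math

mu, sigma = 20.0, 0.2        # rod diameter data of Example 11.7 (mm)
lo, hi = 19.5, 20.5
a = (hi - mu) / sigma        # a = 2.5

cheb_bound = 1.0 - 1.0 / a**2                 # eq (11.84c): 0.84
gauss_prob = math.erf(a / math.sqrt(2.0))     # P[|X - mu| <= a*sigma] for a Gaussian
print(cheb_bound, gauss_prob)                 # 0.84  0.9876...
```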