Random Variables and Discrete Probability Distributions
• A probability (assigned to each outcome in the experiment)
• We now add the concept of a random variable and a probability distribution.
• A discrete random variable is one that can take on a countable number of values.
• A continuous random variable is one that can take on an uncountable number of values.
• Example: The experiment is flipping a coin 10 times, and X = number of heads observed in the experiment. This is a discrete random variable, since X can only take on the values {0, 1, 2, …, 10}, a set that is finite and therefore countable.
• Example: Suppose the experiment is measuring the time to complete a task, and let X = total time taken. This is a continuous random variable: since time is continuous, the range of values that X can take is a continuum, and therefore uncountable.
• To help understand the distinction between countable and uncountable sets, keep in mind:
a) The set of all integers is countable.
b) The set of all real numbers is uncountable (a continuum).
• A probability distribution describes the values that a random variable can take, along with the probability associated with each of those values.
• If X is a discrete random variable, its probability distribution gives the probability that X takes on each one of its possible values.
• We use upper case to denote a random variable, and we use lower case to denote a particular value that this random variable can take.
• We represent the probability that the random variable ‘X’ will equal ‘x’ as P(X = x) or, more simply, P(x).
• As a result of the conditions required of a probability (non-negative, they must add to 1), the probability distribution P of a discrete random variable must satisfy: 0 ≤ P(x) ≤ 1 for every x, and the values P(x) summed over all x must equal 1.
• We can develop the probability distribution using a probability tree.
[Figure: probability tree in which every branch is labeled P(S) = .2 or P(S^C) = .8]
Population/Probability Distribution …
• The discrete probability distribution describes a population.
• Since we have populations, we can describe them by computing various parameters.
• Two of the population parameters we studied previously are the population mean and the population variance.
Random Variable
• Our general definition of the population mean is $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
• If we know that the random variable X is discrete, we can re-express µ in terms of the probability distribution of X. We have: $E(X) = \mu = \sum_{\text{all } x} x\,P(x)$
• This parameter is also called the expected value of X.
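• The population variance of a discrete random variable can be expressed similarly. It is the weighted average of the squared deviations from the mean: $V(X) = \sigma^2 = \sum_{\text{all } x} (x - \mu)^2 P(x)$
• As before, there is a “short-cut” formulation: $\sigma^2 = \sum_{\text{all } x} x^2 P(x) - \mu^2$
• The standard deviation is the same as before: $\sigma = \sqrt{\sigma^2}$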
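The mean, variance, short-cut variance, and standard deviation of a discrete random variable can be sketched in a few lines of Python. The distribution below (number of heads in two fair coin flips) is a hypothetical example chosen only for illustration, and the function names are not from the original slides.

```python
import math

# Hypothetical distribution for illustration only:
# X = number of heads in two fair coin flips.
dist = {0: 0.25, 1: 0.50, 2: 0.25}

def mean(dist):
    """Expected value: sum of x * P(x) over all x."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Weighted average of squared deviations from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Short-cut form: sum of x^2 * P(x), minus the squared mean."""
    return sum(x * x * p for x, p in dist.items()) - mean(dist) ** 2

mu = mean(dist)
var = variance(dist)
sd = math.sqrt(var)
print(mu, var, variance_shortcut(dist), sd)
```

Both variance functions should agree up to floating-point rounding, which is a quick way to check a hand calculation.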
E(X) = 1·P(1) + 2·P(2) + … + 7·P(7)
• There are certain properties of the expected value that are useful to know. Let ‘c’ be a constant. Then:
(1) E(c) = c
In words: The expected value of a constant (c) is just the value of the constant.
(2) E(X + c) = E(X) + c, and (3) E(cX) = cE(X)
In words: We can “pull” a constant out of the expected value expression (either as part of a sum with a random variable X or as a coefficient of the random variable X).
• Example 7.4: Monthly sales have a mean of $25,000 and a standard deviation of $4,000.
• Profits are calculated by multiplying sales by 30% and subtracting fixed costs of $6,000.
Find the mean monthly profit.
1) Describe the problem statement in algebraic terms:
sales have a mean of $25,000: E(Sales) = 25,000
profits are calculated by…: Profit = .30(Sales) – 6,000
Find the mean monthly profit.
E(Profit) = E[.30(Sales) – 6,000]
= E[.30(Sales)] – 6,000 [by rule #2]
= .30E(Sales) – 6,000 [by rule #3]
= .30(25,000) – 6,000 = 1,500
Thus, the mean monthly profit is $1,500.
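The calculation above can be checked with a short Python sketch. The helper name `mean_of_linear` is illustrative only; it simply combines rules (2) and (3) into E(cX + d) = cE(X) + d.

```python
def mean_of_linear(c, mean_x, d):
    """E(cX + d) = c*E(X) + d, combining rules (2) and (3)."""
    return c * mean_x + d

mean_sales = 25_000
# Profit = .30(Sales) - 6,000
mean_profit = mean_of_linear(0.30, mean_sales, -6_000)
print(mean_profit)
```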
3. V(cX) = c²V(X)
– In words: The variance of a random variable multiplied by a constant coefficient is the coefficient squared times the variance of the random variable.
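As a sketch of how this rule applies to the sales figures of Example 7.4 (standard deviation of sales $4,000, Profit = .30(Sales) – 6,000), the snippet below computes the variance and standard deviation of profit. It assumes the standard companion law V(X + c) = V(X), which is not shown in this section, so the fixed cost drops out; the function name is illustrative.

```python
import math

def variance_of_linear(c, var_x):
    """V(cX + d) = c**2 * V(X); the additive constant d is assumed
    not to affect the variance (law V(X + c) = V(X))."""
    return c ** 2 * var_x

sd_sales = 4_000
var_profit = variance_of_linear(0.30, sd_sales ** 2)
sd_profit = math.sqrt(var_profit)
print(var_profit, sd_profit)
```

Under these assumptions the standard deviation of profit is simply .30 times the standard deviation of sales.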
• Example 7.4 (continued): Monthly sales have a mean of $25,000 and a standard deviation of $4,000. Profits are calculated by multiplying sales by 30% and subtracting fixed costs of $6,000.
profits are calculated by…: Profit = .30(Sales) – 6,000
2) The variance of profit is V(Profit) = V[.30(Sales) – 6,000]
• Example 7.4 (summary): Monthly sales have a mean of $25,000 and a standard deviation of $4,000. Profits are calculated by multiplying sales by 30% and subtracting fixed costs of $6,000.
• As before, we can calculate the marginal probabilities by summing across rows and down columns to determine the probabilities of X and Y individually:
• The (population) coefficient of correlation is calculated in the same way as described earlier…
• Example 7.6: Compute the covariance and the
P(X+Y=2) = P(0,2) + P(1,1) + P(2,0)
• This is:
Pr(2 ≤ X+Y ≤ 3) = 0.19 + 0.05 = 0.24
Two Random Variables
• Previously, we stated laws for expected values and variances involving a random variable X and a constant ‘c’.
• We also have laws involving the sum of two random variables:
1. E(X + Y) = E(X) + E(Y)
2. V(X + Y) = V(X) + V(Y) + 2COV(X, Y)
• If X and Y are independent, COV(X, Y) = 0 and thus (2) becomes:
V(X + Y) = V(X) + V(Y)
marginal distributions of X and Y before. We have:
• We previously obtained E(X+Y) and V(X+Y) by deriving the distribution of X+Y. But we can instead use the laws for sums of random variables:
E(X + Y) = E(X) + E(Y) = .7 + .5 = 1.2
V(X + Y) = V(X) + V(Y) + 2COV(X, Y)
= .41 + .45 + 2(-.15) = .56
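The two laws can be sketched as small Python helpers and applied to the values quoted above (E(X) = .7, E(Y) = .5, V(X) = .41, V(Y) = .45, COV(X, Y) = -.15). The function names are illustrative, not part of the original material.

```python
def sum_mean(mean_x, mean_y):
    """E(X + Y) = E(X) + E(Y)."""
    return mean_x + mean_y

def sum_variance(var_x, var_y, cov_xy=0.0):
    """V(X + Y) = V(X) + V(Y) + 2*COV(X, Y).
    With independent X and Y, cov_xy is 0 and the last term vanishes."""
    return var_x + var_y + 2 * cov_xy

print(round(sum_mean(0.7, 0.5), 2))
print(round(sum_variance(0.41, 0.45, -0.15), 2))
```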
Combinations of Two Random Variables
• Let ‘c’ and ‘d’ be two constants. We can generalize the laws of expectation and variance from the sum X+Y to
a) The expected values of the two stocks are E(R1) = .08 and E(R2) = .15. The weights are w1 = .25 and w2 = .75.
Thus,
E(Rp) = w1E(R1) + w2E(R2)
= .25(.08) + .75(.15)
= .1325 (an expected portfolio return of 13.25%)
The standard deviations are σ1 = .12 and σ2 = .22. Thus,
V(Rp) = w1²σ1² + w2²σ2² + 2w1w2ρσ1σ2
= (.25²)(.12²) + (.75²)(.22²) + 2(.25)(.75)ρ(.12)(.22)
= .0281 + .0099ρ
When ρ = 1: V(Rp) = .0281 + .0099(1) = .0380
When ρ = .5: V(Rp) = .0281 + .0099(.5) = .0331
When ρ = 0: V(Rp) = .0281 + .0099(0) = .0281
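The three scenarios can be reproduced with a short Python sketch of the two-stock variance formula. The function name and argument names are illustrative; the weights and standard deviations are the ones given in the example.

```python
def portfolio_variance(w1, w2, sd1, sd2, rho):
    """V(Rp) = w1^2*sd1^2 + w2^2*sd2^2 + 2*w1*w2*rho*sd1*sd2."""
    return (w1 ** 2 * sd1 ** 2
            + w2 ** 2 * sd2 ** 2
            + 2 * w1 * w2 * rho * sd1 * sd2)

for rho in (1.0, 0.5, 0.0):
    v = portfolio_variance(0.25, 0.75, 0.12, 0.22, rho)
    print(rho, round(v, 4))
```

The variance falls as the correlation falls, which is the point of the comparison across the three values of ρ.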
• Next, note that the statement of the problem did not give us the covariance between R1 and R2 directly…
• However, it gave us the standard deviations of R1 and R2, and it asked us to solve the
• Recall that: ρ = COV(R1, R2) / (σ1σ2)
• and, therefore: COV(R1, R2) = ρσ1σ2
• Therefore, in the three correlation scenarios to be considered, we have:
• We can extend the formulas that describe the mean and variance of the returns of a portfolio of two stocks to a portfolio of k stocks:
$V(R_p) = \sum_{i=1}^{k} w_i^2 \sigma_i^2 + 2 \sum_{i=1}^{k} \sum_{j=i+1}^{k} w_i w_j \, COV(R_i, R_j)$
• When k is greater than 2 the calculations can be tedious and time-consuming.
• For example, when k = 3, we need to know the values of the three weights, three expected values, three variances, and three covariances.
• When k = 4, there are four expected values, four variances and six covariances. [The number of covariances required in general is k(k-1)/2.]
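Because the bookkeeping grows quickly with k, a small program helps. The sketch below counts the required covariances and evaluates the k-stock variance from a covariance matrix; the function names and the matrix layout (variances on the diagonal, covariances off the diagonal) are assumptions made for illustration. The sample input reuses the two-stock figures from the earlier example with ρ = .5.

```python
def num_covariances(k):
    """Number of distinct covariances among k stocks: k(k-1)/2."""
    return k * (k - 1) // 2

def portfolio_variance(weights, cov):
    """Variance of a k-stock portfolio.
    cov is assumed to be a symmetric k x k matrix (list of lists)
    with variances on the diagonal."""
    k = len(weights)
    total = sum(weights[i] ** 2 * cov[i][i] for i in range(k))
    for i in range(k):
        for j in range(i + 1, k):
            total += 2 * weights[i] * weights[j] * cov[i][j]
    return total

# Two-stock check: sd1 = .12, sd2 = .22, rho = .5, so COV = rho*sd1*sd2.
cov_12 = 0.5 * 0.12 * 0.22
cov = [[0.12 ** 2, cov_12],
       [cov_12, 0.22 ** 2]]
print(num_covariances(3), num_covariances(4))
print(round(portfolio_variance([0.25, 0.75], cov), 4))
```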
• “Success” and “Failure” are just labels for the outcomes of a binomial experiment; there is no value judgment implied.
4) The trials are independent (i.e. the outcome of heads on the first flip will have no impact on subsequent coin flips).
all conditions were met.
• The binomial random variable counts the number of successes in n trials of the binomial experiment. It can take on the values 0, 1, 2, …, n. Thus, it is a discrete random variable.
• To calculate the probability associated with each value of X, we use combinatorics: $P(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$ for x = 0, 1, 2, …, n, where p is the probability of success on a single trial.
• Thus, we have that the probability of any outcome that yields ‘x’ successes in ‘n’ trials is: $p^x (1-p)^{n-x}$
• In addition, there are a total of $\frac{n!}{x!(n-x)!}$ such outcomes.
• Thus, adding up the probabilities of all such outcomes, we obtain the binomial probability formula: $P(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$
• Example: A quiz consists of 10 multiple-choice questions. Each question has five possible answers, only one of which is correct.
• Suppose a student plans to guess the answer to each question.
• What is the probability that the student gets no answers correct?
• What is the probability that the student gets two answers correct?
• Thus, we have a binomial experiment where n = 10 and P(success) = .20
• What is the probability that the student gets no answers correct? This is P(X = 0): $P(0) = \frac{10!}{0!\,10!} (.20)^0 (.80)^{10} = .1074$
The student has about an 11% chance of getting no answers correct using the guessing strategy.
• What is the probability that the student gets two answers correct? That is, P(X = 2): $P(2) = \frac{10!}{2!\,8!} (.20)^2 (.80)^8 = .3020$
The student has about a 30% chance of getting exactly two answers correct using the guessing strategy.
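The two quiz probabilities can be checked with a minimal Python sketch of the binomial probability formula. The function name `binomial_pmf` is illustrative; `math.comb` supplies the number of ways to choose x successes out of n trials.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials
    and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Quiz example: n = 10 questions, P(success) = .20 per guess.
print(round(binomial_pmf(0, 10, 0.20), 4))
print(round(binomial_pmf(2, 10, 0.20), 4))
```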
• Thus far, we have been using the binomial probability distribution to find probabilities for individual values
• We already know P(0) = .1074 and P(2) = .3020. Using the binomial formula to calculate the others:
P(1) = .2684, P(3) = .2013, and P(4) = .0881
• We have P(X ≤ 4) = .1074 + .2684 + … + .0881 = .9672
• Thus, it is about 97% probable that the student will fail the test by relying on luck and guessing at the answers…
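A cumulative probability such as P(X ≤ 4) is just a sum of individual binomial probabilities, which the following Python sketch makes explicit. The function names are illustrative only.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(k, n, p):
    """P(X <= k): sum of P(X = x) for x = 0, 1, ..., k."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))

# Quiz example: probability of at most 4 correct answers out of 10.
for x in range(5):
    print(x, round(binomial_pmf(x, 10, 0.20), 4))
print(round(binomial_cdf(4, 10, 0.20), 4))
```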
• Calculating binomial probabilities by hand is tedious and error-prone. There is an easier way. Refer to Table 1 in
• We can compute these probabilities from cumulative probabilities. We explain how next…
• If X is discrete, we can obtain P(X = k) from P(X ≤ k) and P(X ≤ k–1) by:
P(X = k) = P(X ≤ k) – P(X ≤ k–1)
• Likewise, for probabilities given as P(X ≥ k), we have:
P(X ≥ k) = 1 – P(X ≤ k–1)
• Finally, we can compute Pr(k1 ≤ X ≤ k2) as:
Pr(k1 ≤ X ≤ k2) = P(X ≤ k2) – P(X ≤ k1–1)
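These identities, including the range form Pr(k1 ≤ X ≤ k2) = P(X ≤ k2) – P(X ≤ k1–1), can be sketched in Python around any cumulative function. Here the cumulative function is the binomial one for the quiz example (n = 10, p = .20), standing in for a printed table; all function names are illustrative.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial random variable; 0 when k < 0."""
    if k < 0:
        return 0.0
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(k + 1))

def prob_equal(k, cdf):
    """P(X = k) = P(X <= k) - P(X <= k-1)."""
    return cdf(k) - cdf(k - 1)

def prob_at_least(k, cdf):
    """P(X >= k) = 1 - P(X <= k-1)."""
    return 1 - cdf(k - 1)

def prob_between(k1, k2, cdf):
    """P(k1 <= X <= k2) = P(X <= k2) - P(X <= k1-1)."""
    return cdf(k2) - cdf(k1 - 1)

def quiz_cdf(k):
    """Cumulative probabilities for the quiz example."""
    return binomial_cdf(k, 10, 0.20)

print(round(prob_equal(2, quiz_cdf), 4))
print(round(prob_at_least(3, quiz_cdf), 4))
print(round(prob_between(1, 4, quiz_cdf), 4))
```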
• Example: Problem 7.93. The leading brand of dishwasher detergent has a 30% market share. A sample of 25 dishwasher detergent customers was taken. What is the probability that 10 or fewer customers chose the leading brand?
• This is an example of a binomial random variable:
X = number of customers who bought the leading brand of dishwasher detergent
• The underlying experiment consists of:
n = 25 trials
p = Prob(“Success”) = 0.30
• The problem asks for P(X ≤ 10). Using Table 1 in the Appendix, we have P(X ≤ 10) = 0.9022
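As an alternative to the printed table, the cumulative probability can be computed directly. The sketch below is a minimal Python version; the function name is illustrative.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial random variable."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(k + 1))

# Problem 7.93: n = 25 customers, p = .30 market share.
print(round(binomial_cdf(10, 25, 0.30), 4))
```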
• Example: Problem 7.97. It is believed that 10% of all voters in the United States consider themselves “Independent”. A survey asked 25 people to identify themselves as Democrat, Republican or Independent.
• Once again, this is an example of a binomial random variable:
X = number of Independent voters in the survey
• The underlying experiment consists of:
n = 25 trials
p = Prob(“Success”) = 0.10
• The problem asks:
a) Pr(X = 0)
b) Pr(X ≤ 4)
c) Pr(X ≥ 3)
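The three requested probabilities can be sketched in Python as follows. Part (c) is rewritten as 1 – Pr(X ≤ 2), since cumulative tables and functions give probabilities of the form Pr(X ≤ k). The function name is illustrative.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial random variable; 0 when k < 0."""
    if k < 0:
        return 0.0
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(k + 1))

n, p = 25, 0.10
prob_a = binomial_cdf(0, n, p)        # Pr(X = 0) = Pr(X <= 0)
prob_b = binomial_cdf(4, n, p)        # Pr(X <= 4)
prob_c = 1 - binomial_cdf(2, n, p)    # Pr(X >= 3) = 1 - Pr(X <= 2)
print(round(prob_a, 4), round(prob_b, 4), round(prob_c, 4))
```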
random variable X lies within ‘k’ standard deviations of its mean is at least: 1 – 1/k²
• Since we want this to be at least 75%, we first need to find the ‘k’ such that 1 – 1/k² = .75
• This yields k = 2.
• Next, recall that in the example we have n = 25 and p = 0.30
• Therefore, using the expectation and variance formulas for binomial random variables, we have:
E[X] = n∙p = 7.5 and V(X) = n∙p∙(1 – p) = 5.25
• Therefore, an interval that will include, with at least 75% probability, the actual number of customers who will
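The interval can be computed with a short Python sketch of Chebysheff's bound, solving 1 – 1/k² = target probability for k and then taking the mean plus or minus k standard deviations. The function names are illustrative, and the variance uses the standard binomial formula n∙p∙(1 – p).

```python
import math

def chebyshev_k(prob):
    """Smallest k with 1 - 1/k**2 >= prob, i.e. k = sqrt(1 / (1 - prob))."""
    return math.sqrt(1 / (1 - prob))

def chebyshev_interval(mean, sd, prob):
    """Interval mean +/- k*sd that holds with probability at least prob."""
    k = chebyshev_k(prob)
    return mean - k * sd, mean + k * sd

n, p = 25, 0.30
mean = n * p                       # 7.5
sd = math.sqrt(n * p * (1 - p))    # square root of 5.25
low, high = chebyshev_interval(mean, sd, 0.75)
print(round(low, 2), round(high, 2))
```

Chebysheff's bound makes no use of the binomial shape, so the resulting interval is typically wider than one read from the exact distribution.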
• Named for Simeon Poisson, the Poisson distribution is a discrete probability distribution and refers to the number of events (a.k.a. successes) within a specific time period or region of space.
stretch of highway. (The interval is defined by both time, 1 day, and space, the particular stretch of highway.)
• Difference between binomial and Poisson random variables:
• A binomial random variable is the number of successes in a given number of trials, whereas a Poisson random variable is the number of successes in an interval of time or in a specific region of space.
time period
• Example 7.12: A statistics instructor has observed that the number of typographical errors in new editions of textbooks varies considerably from
• How to proceed?
• First, note that we are now talking about an interval of 400 pages.
• For a 400-page book, what is the probability that there are five or fewer typos?
P(X ≤ 5) = P(0) + P(1) + … + P(5)
• This is rather tedious to solve manually. A better alternative is to refer to Table 2 in Appendix B…
k = 5, µ = 6, and P(X ≤ k) = .446
“There is about a 45% chance that there are 5 or fewer typos.”
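The same cumulative probability can be computed directly instead of read from a table. The sketch below uses the standard Poisson probability formula P(x) = e^(–µ) µ^x / x!, which is not written out in this section; the function names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

def poisson_cdf(k, mu):
    """P(X <= k): sum of P(X = x) for x = 0, 1, ..., k."""
    return sum(poisson_pmf(x, mu) for x in range(k + 1))

# 400-page book with mu = 6 typos on average.
print(round(poisson_cdf(5, 6), 3))
```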
• Characterize a range of values that will include the actual number of typos found in a 400-page book with probability at least 90%.
• Again, we can use Chebysheff’s Theorem. Since we want this to be at least 90%, we first need to find the ‘k’ such that 1 – 1/k² = .90
• This yields k = √10 ≈ 3.16.
• Next, recall that, if X is a Poisson random variable, then E(X) = V(X) = µ
• If X is the number of typos in 400 pages, then µ = 6, so E(X) = V(X) = 6
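A minimal Python sketch of this Chebysheff calculation for the 90% target follows. It assumes µ = 6 typos per 400 pages (the value used with Table 2 earlier) and the Poisson property that the variance equals the mean; the variable names are illustrative. Because a count cannot be negative or fractional, the interval is then trimmed to whole numbers starting at 0.

```python
import math

mu = 6.0                  # mean number of typos in 400 pages
sd = math.sqrt(mu)        # Poisson: variance equals the mean
target = 0.90

# Solve 1 - 1/k**2 = target for k.
k = math.sqrt(1 / (1 - target))

low = mu - k * sd
high = mu + k * sd

# Restrict to feasible counts: non-negative integers inside the interval.
count_low = max(0, math.ceil(low))
count_high = math.floor(high)
print(round(k, 2), round(low, 2), round(high, 2), count_low, count_high)
```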
5-day period.
• This is Pr(X ≥ 3). Again, to use Table 2, we need to express this in terms of a probability of the type Pr(X ≤ k). Note that: Pr(X ≥ 3) = 1 – Pr(X ≤ 2)