Applied Econometrics
Lecture 1: Normal Distribution
For many random variables, the probability distribution is a specific bell-shaped curve, called the
normal curve, or Gaussian curve. This is the most common and useful distribution in statistics.
1) Standard normal distribution
The standard normal distribution has the following probability density function:
$$Y = P(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}$$
Features of the curve are:
1) As z moves away from 0 in either direction, $z^{2}$ increases in the negative exponent. Therefore,
P(z) decreases, approaching 0 symmetrically in both tails.
2) The mean, which is zero (μ = 0), is the balancing point, or center of symmetry.
3) The standard deviation is one (σ = 1).
Example 1.1: If z has a standard normal distribution, find P(−2 < z < 2).¹
Solution: P(−2 < z < 2) = 1 − P(z < −2) − P(z > 2) = 1 − 2(0.023) = 0.954
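The table lookup in Example 1.1 can be checked numerically. The sketch below is only an illustration: it assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library in place of the printed table. The variable names are not from the lecture.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

# P(-2 < z < 2) = 1 - P(z < -2) - P(z > 2).
lower_tail = z.cdf(-2.0)         # P(z < -2), about 0.023
upper_tail = 1.0 - z.cdf(2.0)    # P(z > 2), equal to the lower tail by symmetry
p_inside = 1.0 - lower_tail - upper_tail

print(round(lower_tail, 3), round(p_inside, 3))
```

The two tails come out equal, which is the symmetry used in the solution when the single tabulated value 0.023 is doubled.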
2) General normal distribution
The general normal distribution has the following probability density function:
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$
The quantity Y, which is the height of the curve at any point along the scale of X, is known as the
probability density of that particular value of the variable quantity, X.
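As a rough illustration of the density formula, the following sketch evaluates the height Y of the curve at a point and compares it with the standard library's own density. The function name `normal_density` and the numerical values (μ = 1000, σ = 200) are assumptions chosen for the example.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Height Y of the normal curve at x, for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma                      # standardized distance from the mean
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Assumed illustrative values: mu = 1000, sigma = 200.
y_at_mean = normal_density(1000.0, 1000.0, 200.0)
y_library = NormalDist(1000.0, 200.0).pdf(1000.0)

# With mu = 0 and sigma = 1 the formula reduces to the standard normal density.
y_standard = normal_density(0.0, 0.0, 1.0)    # 1 / sqrt(2 * pi)

print(y_at_mean, y_library, y_standard)
```

The curve is highest at the mean, and its height there is 1/(σ√(2π)), so a larger σ gives a lower and wider curve.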
Example 2.1: The local authorities in a certain city install 2,000 electric lamps in the streets of the
city. If these lamps have an average life of 1,000 burning hours, with a standard deviation of 200
hours, how many of the lamps might be expected to fail in the first 700 burning hours?
¹ If z is continuous, P(z ≥ c) = P(z > c). In other words, ≥ and > can be used interchangeably for any
continuous random variable.
Written by Nguyen Hoang Bao May 17, 2004
Solution: In this case, we want the probability corresponding to the area under the probability
curve below the standardized value t = (700 − 1,000)/200 = −1.5. Because the curve is symmetric, the
area below −1.5 equals the area above 1.5, so we ignore the sign and enter the table at 1.5 to find that
the probability of a life shorter than 700 hours is P = 0.067. Hence the expected number of failures
is 2,000 × 0.067 = 134.
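The calculation in Example 2.1 can be reproduced as follows. This is a sketch that assumes Python 3.8 or later and replaces the table lookup with `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

lamps = 2000
life = NormalDist(mu=1000.0, sigma=200.0)    # lamp life in burning hours

t = (700.0 - 1000.0) / 200.0                 # standardized value, -1.5
p_fail = NormalDist().cdf(t)                 # P(life < 700), about 0.067
p_fail_direct = life.cdf(700.0)              # same probability without standardizing by hand
expected_failures = lamps * p_fail

print(round(p_fail, 3), round(expected_failures))
```

Standardizing by hand and asking the fitted distribution directly give the same probability, which is the point of converting X to t before entering the table.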
Example 2.2: What number of lamps may be expected to fail between 900 and 1,300 burning hours?
Solution:
• Lamps failing before 900 hours: the corresponding value is t = (900 − 1,000)/200 = −0.5. Entering
the table with this value of t, we find the probability of failure before 900 hours: P = 0.309.
• Lamps failing after 1,300 hours: the corresponding value is t = (1,300 − 1,000)/200 = 1.5. Entering
the table with this value of t, we find the probability of failure after 1,300 hours: P = 0.067.
• Hence the probability of failure outside the limits 900 to 1,300 hours is 0.309 + 0.067 = 0.376,
and the number of lamps we may expect to fail outside these limits is 2,000 × 0.376 = 752. We were
asked for the number likely to fail inside the limits stated, which is 2,000 − 752 = 1,248.
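The three steps above can be sketched in code. The example assumes Python 3.8 or later and uses `statistics.NormalDist` instead of the table. The names are illustrative.

```python
from statistics import NormalDist

lamps = 2000
life = NormalDist(mu=1000.0, sigma=200.0)    # lamp life in burning hours

p_below_900 = life.cdf(900.0)                # about 0.309
p_above_1300 = 1.0 - life.cdf(1300.0)        # about 0.067
p_inside = 1.0 - p_below_900 - p_above_1300  # probability of failing between the limits

# Equivalent direct form: F(1300) - F(900).
p_direct = life.cdf(1300.0) - life.cdf(900.0)

print(round(p_inside, 3), round(lamps * p_inside))
```

With unrounded probabilities the expected count is about 1,249 rather than 1,248. The small difference comes from rounding the table values to three decimals.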
Example 2.3: After what period of burning hours would you expect that 10% of the lamps would
have failed?
Solution: We want the value of t corresponding to a lower-tail probability P = 0.1. Looking along
the table, we find that when t = 1.25 the tail probability is P = 0.106, which is near enough for our
purpose of prediction. Hence we may take it that 10% of the lamps will have failed by 1.25 standard
deviations below the mean. Since one standard deviation is 200 hours, it follows that 10% of the
lamps will fail before 1,000 − 1.25 × 200 = 1,000 − 250 = 750 hours.
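Example 2.3 reads the table in reverse, from a probability back to a value of t. A minimal sketch of the same step, assuming Python 3.8 or later and using `NormalDist.inv_cdf` in place of the table search:

```python
from statistics import NormalDist

life = NormalDist(mu=1000.0, sigma=200.0)    # lamp life in burning hours

t_10 = NormalDist().inv_cdf(0.10)            # standardized value with 10% of the area below it
hours_10 = life.inv_cdf(0.10)                # same point on the hours scale: 1000 + t_10 * 200

print(round(t_10, 2), round(hours_10))
```

The exact value of t is about −1.28, giving roughly 744 hours. The table value t = 1.25 used in the solution gives 750 hours, which is close enough for prediction.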
3) Moment-based characteristics of a distribution
First moment
Mean > Median: the distribution is skewed to the right
Mean ≅ Median ≅ Mode: the distribution is approximately symmetric
Mean < Median: the distribution is skewed to the left
Second moment
The spread of a distribution is measured by its standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}}{n-1}}$$
Third moment
Coefficient of skewness: $a_3 = \frac{1}{n s^{3}} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^{3}$
• The cubic power preserves the sign of an expression but inflates the larger deviations
proportionally much more than the smaller deviations. If the distribution is symmetrical, the
negative and positive cubic powers cancel each other out.
• The cubic power of the standard deviation in the denominator is used to standardize the
measure and so remove the dimension (i.e., it will not depend on the units in which the variable
is measured).
• If $a_3 > 0$, the distribution is skewed to the right (meaning its long tail is to the right) and the
mean is greater than the median.
If $a_3 ≅ 0$, the distribution is approximately symmetric (as a normal distribution is) and the mean
is approximately equal to the median.
If $a_3 < 0$, the distribution is skewed to the left (meaning its long tail is to the left) and the mean
is smaller than the median.
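The skewness coefficient can be computed directly from its definition. The sketch below is an illustration only. The function name `skewness` and both data sets are made up for the example, and s is taken to be the standard deviation with n − 1 in the denominator, as in the second-moment formula.

```python
import math
from statistics import median

def skewness(xs):
    """a3 = (1 / (n * s**3)) * sum((x - mean)**3), with s computed using n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum((x - mean) ** 3 for x in xs) / (n * s ** 3)

symmetric = [1, 2, 3, 4, 5]            # made-up data, symmetric about 3
right_skewed = [1, 1, 2, 2, 3, 10]     # made-up data with a long right tail

for data in (symmetric, right_skewed):
    mean = sum(data) / len(data)
    print(round(skewness(data), 3), mean, median(data))
```

For the symmetric data the cubed deviations cancel and the coefficient is zero. For the right-skewed data the single large value dominates the sum, the coefficient is positive, and the mean exceeds the median.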
Fourth moment
Coefficient of kurtosis: $a_4 = \frac{1}{n s^{4}} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^{4}$
• Fourth powers make each sign positive but inflate larger deviations even more than cubic
powers or squares would do.
• The presence of heavy tails, therefore, will tend to inflate the numerator proportionally more
than the denominator. The fatter the tails, the higher the kurtosis.
• The fourth power of the standard deviation in the denominator standardizes the measure and
renders it dimensionless.
• If $a_4 > 3$, the distribution has heavier tails than a normal distribution.
If $a_4 < 3$, the distribution has thinner tails than a normal distribution; an extreme case is the
rectangular distribution, which has a body but no tails.
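To see the benchmark value of 3 in practice, the following sketch computes the kurtosis coefficient for simulated normal and rectangular (uniform) samples. The function name `kurtosis`, the seed, and the sample size are assumptions made for the illustration.

```python
import math
import random

def kurtosis(xs):
    """a4 = (1 / (n * s**4)) * sum((x - mean)**4), with s computed using n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum((x - mean) ** 4 for x in xs) / (n * s ** 4)

rng = random.Random(0)   # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
uniform_sample = [rng.uniform(0.0, 1.0) for _ in range(20000)]

print(round(kurtosis(normal_sample), 2), round(kurtosis(uniform_sample), 2))
```

The normal sample should give a value close to 3. The rectangular sample should give a value close to 1.8, the theoretical kurtosis of a uniform distribution, reflecting the absence of tails.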
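Table 3.1: Moment-based characteristics of a distribution
| Moment | Measure | Population | Sample | Value for X ∼ N(0,1) |
| --- | --- | --- | --- | --- |
| First moment | Center | $E(X) = \mu$ | $\bar{X} = \frac{1}{n}\sum X_i$ | 0 |
| Second moment | Spread | $E(X-\mu)^{2} = \sigma^{2}$ | $S^{2} = \frac{1}{n-1}\sum (X_i - \bar{X})^{2}$ | 1 |
| Third moment | Skewness | $\frac{1}{\sigma^{3}} E(X-\mu)^{3}$ | $a_3 = \frac{1}{n s^{3}}\sum (X_i - \bar{X})^{3}$ | 0 |
| Fourth moment | Kurtosis | $\frac{1}{\sigma^{4}} E(X-\mu)^{4}$ | $a_4 = \frac{1}{n s^{4}}\sum (X_i - \bar{X})^{4}$ | 3 |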
4) The skewness-kurtosis (Jarque-Bera) test for normality
The null hypothesis of normality, $H_0$, is as follows:
$H_0$: $\alpha_3 = 0$ and $\alpha_4 = 3$
against
$H_1$: $\alpha_3 \neq 0$ or $\alpha_4 \neq 3$, or both.
The relevant test statistic is BJ, which under $H_0$ follows a chi-square distribution with two degrees
of freedom:
$$BJ = \frac{n}{6}\, a_3^{2} + \frac{n}{24}\,(a_4 - 3)^{2}$$
If BJ > 5.99 (the 5% critical value), the hypothesis of normality is formally rejected.
If BJ ≤ 5.99, we cannot reject normality; this does not prove that the data are normal.
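The test can be sketched in a few lines. The code below is an illustration under stated assumptions, not a reference implementation: the function names, the seed, and the simulated samples are invented for the example, s is computed with n − 1, and the 5% critical value 5.99 is the one quoted in the text. The upper-tail probability uses the fact that, for two degrees of freedom, the chi-square tail area above x is exp(−x/2).

```python
import math
import random

def skew_kurt(xs):
    """Return (a3, a4) from the sample formulas, with s computed using n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    a3 = sum((x - mean) ** 3 for x in xs) / (n * s ** 3)
    a4 = sum((x - mean) ** 4 for x in xs) / (n * s ** 4)
    return a3, a4

def bj_statistic(xs):
    """BJ = (n / 6) * a3**2 + (n / 24) * (a4 - 3)**2."""
    n = len(xs)
    a3, a4 = skew_kurt(xs)
    return n / 6.0 * a3 ** 2 + n / 24.0 * (a4 - 3.0) ** 2

def chi2_2df_upper_tail(x):
    """P(chi-square with 2 degrees of freedom > x), which equals exp(-x / 2)."""
    return math.exp(-x / 2.0)

rng = random.Random(1)   # fixed seed so the run is repeatable
samples = {
    "normal draws": [rng.gauss(10.0, 2.0) for _ in range(500)],
    "exponential draws": [rng.expovariate(1.0) for _ in range(500)],
}
for name, xs in samples.items():
    bj = bj_statistic(xs)
    verdict = "reject normality" if bj > 5.99 else "cannot reject normality"
    print(f"{name}: BJ = {bj:.2f}, p = {chi2_2df_upper_tail(bj):.4f}, {verdict}")
```

The strongly skewed exponential sample produces a very large BJ and is rejected. A sample that really is normal will still be rejected about 5% of the time at this critical value, which is what the significance level means.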
5) Transformations towards normality
If the data are unimodal but skewed, a data transformation is called for to correct for the skewness in
the data. To do this we rely on the ladder of power transformations, which enables us to correct for
differences in both the direction of skewness (positive or negative) and its strength. Often, but not always,
a transformation renders the transformed data symmetric, and, hopefully, also more normal in shape.
If so, the classical model of inference about the population mean using the sample mean as estimator
can again be used. Table 5.1 illustrates the hierarchy of these power transformations and their impact
on the skewness in the data.
Table 5.1: Ladder of Power to Reduce Skewness
| Power p | Transformation | Effect on skewness |
| --- | --- | --- |
| 3 | $X^{3}$ | Reduces extreme negative skewness |
| 2 | $X^{2}$ | Reduces negative skewness |
| 1 | $X$ | Leaves data unchanged |
| 0 | $\ln X$ | Reduces positive skewness |
| −1 | $1/X$ | Reduces extreme positive skewness |
The power used in a transformation need not be an integer; fractional powers can be used as well. The
choice of an appropriate transformation often involves a trade-off between one which is ideal for the
purposes of data analysis and one which performs reasonably well on this count but also has the
advantage that it lends itself to a more straightforward interpretation (in substantive terms) of the
results.
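The effect of moving down the ladder can be illustrated on simulated data. The sketch below is only an example: the helper names `skewness` and `ladder`, the seed, and the lognormal test data are assumptions, chosen because lognormal draws are strictly positive and strongly skewed to the right.

```python
import math
import random

def skewness(xs):
    """a3 = (1 / (n * s**3)) * sum((x - mean)**3), with s computed using n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum((x - mean) ** 3 for x in xs) / (n * s ** 3)

def ladder(x, p):
    """Power transformation for x > 0: x**p when p != 0, ln(x) when p == 0."""
    return math.log(x) if p == 0 else x ** p

rng = random.Random(2)   # fixed seed so the run is repeatable
data = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]   # made-up, positively skewed

results = {}
for p in (1, 0.5, 0):
    results[p] = skewness([ladder(x, p) for x in data])
    print(p, round(results[p], 2))
```

Skewness should fall as p moves from 1 to 0.5 to 0, with the logarithm bringing these particular data close to symmetry. Going further down the ladder than the data require can overcorrect and produce skewness in the opposite direction.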
Workshop 1: Normal Distribution
1) Phil and Kim Bell do not know whether to buy a house now or wait a year, in which case a price
increase may put a house beyond their reach. Their best guess is that, if they wait a year, the
price increase will be approximately normal, with a mean of 8% and, reflecting the uncertainty
of the market, a standard deviation of 10%.
1.1) If the price increase exceeds 25% they feel they will be unable to afford a house. What is
the chance of this?
1.2) On the other hand, if the price drops, they will have won their gamble handsomely. What is
the chance of this?
2) Using the data file SOCECON (with the world socioeconomic data for 1990) on the diskette,
make histograms and compute the mean, median and mode for each of the following variables:
GNP (gross national product) per capita
HDI (human development index)
FERT (fertility rate)
LEXPM and LEXPF (male and female life expectancy)
POPGRWTH (population growth rate)
In each case, discuss the different averages in the light of the shape of the empirical distribution.
Would you say that any of the distributions is reasonably symmetrical and bell-shaped?
3) Collect the macroeconomic indicators Y (GDP), I (investment), C (consumption), X
(exports) and M (imports) at constant prices from the World Development Indicators 2003 for 200
countries in the world, then:
3.1) Make histograms and compute the mean, median and mode for each of the above variables
3.2) Calculate the coefficients of skewness and kurtosis
3.3) Use the Jarque-Bera test for normality on each variable
3.4) Transform each variable towards normality
4) Collect data on life expectancy (LE) and GDP per capita (Y) for 200 countries (WDI 2003), then:
4.1) Plot the histogram (frequency graph) for each of your two samples (life expectancy and
income per capita)
4.2) Calculate the mean, mode, and median for each of your two samples
4.3) Calculate the skewness and kurtosis for each of your two samples
4.4) Use the Jarque-Bera test for normality on each of your two samples
4.5) In each case find the most appropriate transformation so that the data are approximately
normal
4.6) Calculate the regression coefficients from regressing LE on Y using the following functional
forms
$LE = a_0 + a_1 Y$
$\ln(LE) = b_0 + b_1 Y$
$LE = c_0 + c_1 \ln Y$
$\ln(LE) = d_0 + d_1 \ln Y$
and compare their coefficients of determination
4.7) Which of the models you have estimated best fits the data? Discuss your results
4.8) Can a direction of causality be established?