And, the equation of the line of regression of y on x is
$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}) \qquad \text{(ii)}$$
Let $m_1$ and $m_2$ be the slopes of (i) and (ii) respectively.
Then, $m_1 = \dfrac{\sigma_y}{r\,\sigma_x}$ and $m_2 = \dfrac{r\,\sigma_y}{\sigma_x}$.
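Therefore,
$$\tan\theta = \frac{m_1 - m_2}{1 + m_1 m_2} = \frac{\dfrac{\sigma_y}{r\,\sigma_x} - \dfrac{r\,\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\,\sigma_y}{\sigma_x^2 + \sigma_y^2}$$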
Proved
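The result can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, the sample values of r, σx and σy are arbitrary, and the acute angle is taken by using |r|. It compares the closed form with the angle obtained directly from the slopes m1 and m2.

```python
import math

def angle_from_formula(r, sigma_x, sigma_y):
    """Acute angle (radians) between the regression lines, from the closed form."""
    if r == 0:
        # With r = 0 the lines are parallel to the axes, hence perpendicular.
        return math.pi / 2
    tan_theta = (1 - r**2) / abs(r) * (sigma_x * sigma_y) / (sigma_x**2 + sigma_y**2)
    return math.atan(tan_theta)

def angle_from_slopes(r, sigma_x, sigma_y):
    """Same angle computed from the slopes m1 and m2 (requires r != 0)."""
    m1 = sigma_y / (r * sigma_x)   # slope of the line of x on y
    m2 = r * sigma_y / sigma_x     # slope of the line of y on x
    return math.atan(abs((m1 - m2) / (1 + m1 * m2)))

theta_formula = angle_from_formula(0.8, 14, 20)
theta_slopes = angle_from_slopes(0.8, 14, 20)
print(math.degrees(theta_formula), math.degrees(theta_slopes))
```

When r = ±1 the formula gives θ = 0, so the two lines coincide; when r = 0 they are at right angles.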
Example 9 The lines of regression of x on y and y on x are respectively x = 19.13 – 0.87y and y = 11.64 – 0.50x. Find:
(a) The mean of x-series;
(b) The mean of y-series;
(c) The correlation coefficient between x and y.
Sol. Let the mean of the x-series be $\bar{x}$ and that of the y-series be $\bar{y}$.
Since the lines of regression pass through $(\bar{x}, \bar{y})$, we have:
$\bar{x} = 19.13 - 0.87\bar{y}$ or $\bar{x} + 0.87\bar{y} = 19.13$ (1)
and $\bar{y} = 11.64 - 0.50\bar{x}$ or $0.50\bar{x} + \bar{y} = 11.64$ (2)
On solving (1) and (2), we get
$\bar{x} = 15.94$ and $\bar{y} = 3.67$
Therefore, mean of x-series = 15.94
and mean of y-series = 3.67.
Now, the line of regression of y on x is:
$y = 11.64 - 0.50x$ ∴ $b_{yx} = -0.50$
Also, the line of regression of x on y is:
$x = 19.13 - 0.87y$ ∴ $b_{xy} = -0.87$
∴ $r = -\sqrt{b_{yx}\,b_{xy}} = -\sqrt{(-0.50)(-0.87)} = -\sqrt{0.435} = -0.66$
Clearly, r is taken as negative, since each of $b_{yx}$ and $b_{xy}$ is negative.
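The arithmetic of Example 9 can be reproduced with a few lines of code. This is a minimal sketch; the function name and argument layout are assumptions made for illustration. It writes the lines as x = a1 + b_xy·y and y = a2 + b_yx·x, finds their intersection (the two means), and takes r as the square root of the product of the regression coefficients with their common sign.

```python
import math

def means_and_r(a1, b_xy, a2, b_yx):
    """Lines: x = a1 + b_xy*y (x on y) and y = a2 + b_yx*x (y on x)."""
    product = b_xy * b_yx
    if product < 0 or product > 1:
        raise ValueError("coefficients are not a valid pair of regression coefficients")
    if product == 1:
        raise ValueError("lines coincide; means are not determined")
    # Substitute the second line into the first and solve for x.
    x_bar = (a1 + b_xy * a2) / (1 - product)
    y_bar = a2 + b_yx * x_bar
    # r carries the common sign of b_xy and b_yx.
    r = math.copysign(math.sqrt(product), b_xy)
    return x_bar, y_bar, r

x_bar, y_bar, r = means_and_r(19.13, -0.87, 11.64, -0.50)
print(round(x_bar, 3), round(y_bar, 3), round(r, 3))
```

The printed values agree with the worked solution to within rounding in the second decimal place.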
Example 10 Out of the following two regression lines, find the line of regression of x on y :
2x + 3y = 7 and 5x + 4y = 9.
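Sol. Let 2x + 3y = 7 be the regression line of x on y.
Then, 5x + 4y = 9 is the regression line of y on x.
Therefore 2x + 3y = 7 and 5x + 4y = 9
⇒ $x = -\frac{3}{2}y + \frac{7}{2}$ and $y = -\frac{5}{4}x + \frac{9}{4}$
⇒ $b_{xy} = -\frac{3}{2}$ and $b_{yx} = -\frac{5}{4}$
⇒ $r = -\sqrt{b_{xy}\,b_{yx}} = -\sqrt{\left(-\frac{3}{2}\right)\left(-\frac{5}{4}\right)}$ [∵ r, $b_{xy}$, $b_{yx}$ have the same sign]
$= -\sqrt{\frac{15}{8}} < -1$, which is impossible.
Therefore our choice of regression line is incorrect.
Hence, the regression line of x on y is 5x + 4y = 9. Ans.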
Example 11 Find the correlation coefficient between x and y , when the lines of regression are: 2x – 9y + 6 = 0 and x – 2y + 1 = 0.
Sol. Let the line of regression of x on y be 2x – 9y + 6 = 0.
Then, the line of regression of y on x is x – 2y + 1 = 0.
Therefore 2x – 9y + 6 = 0 and x – 2y + 1 = 0
⇒ $x = \frac{9}{2}y - 3$ and $y = \frac{1}{2}x + \frac{1}{2}$
⇒ $b_{xy} = \frac{9}{2}$ and $b_{yx} = \frac{1}{2}$
⇒ $r = \sqrt{b_{xy}\,b_{yx}} = \sqrt{\frac{9}{2}\times\frac{1}{2}} = \frac{3}{2} > 1$,
which is impossible.
So, our choice of regression line is incorrect.
Therefore, the regression line of x on y is x – 2y + 1 = 0
and the regression line of y on x is 2x – 9y + 6 = 0
⇒ $x = 2y - 1$ and $y = \frac{2}{9}x + \frac{2}{3}$
⇒ $b_{xy} = 2$ and $b_{yx} = \frac{2}{9}$
⇒ $r = \sqrt{b_{xy}\,b_{yx}} = \sqrt{2\times\frac{2}{9}} = \frac{2}{3}$
Hence, the correlation coefficient between x and y is $\frac{2}{3}$. Ans.
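The trial-and-error used in Examples 10 and 11 can be sketched as a small routine. The function name and the (a, b, c) encoding of a line a·x + b·y + c = 0 are assumptions for illustration. It tries both assignments and keeps the one for which the product of the regression coefficients lies between 0 and 1, since r² cannot exceed 1.

```python
import math

def classify(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y + c = 0.

    Returns (x_on_y, y_on_x, r) for the assignment with 0 <= b_xy * b_yx <= 1.
    """
    for x_on_y, y_on_x in ((line1, line2), (line2, line1)):
        b_xy = -x_on_y[1] / x_on_y[0]   # solve the candidate x-on-y line for x
        b_yx = -y_on_x[0] / y_on_x[1]   # solve the candidate y-on-x line for y
        product = b_xy * b_yx
        if 0 <= product <= 1:
            # r has the same sign as the regression coefficients.
            r = math.copysign(math.sqrt(product), b_xy)
            return x_on_y, y_on_x, r
    raise ValueError("the lines cannot be a pair of regression lines")

# Example 11: 2x - 9y + 6 = 0 and x - 2y + 1 = 0
x_on_y, y_on_x, r = classify((2, -9, 6), (1, -2, 1))
print(x_on_y, y_on_x, r)
```

The two candidate products are reciprocals of each other, so at most one assignment can pass the test unless both products equal 1.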
Example 12 The equations of two lines of regression are: 3x + 12y = 19 and 3y + 9x = 46. Find
(i) the mean of x-series
(ii) the mean of y-series
(iii) regression coefficients $b_{xy}$ and $b_{yx}$
(iv) correlation coefficient between x and y.
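Sol. Let the mean of x-series be $\bar{x}$ and that of y-series be $\bar{y}$. Then, each of the given lines passes through $(\bar{x}, \bar{y})$.
Therefore $3\bar{x} + 12\bar{y} = 19$ (1) and $3\bar{y} + 9\bar{x} = 46$ (2)
On solving (1) and (2), we get $\bar{x} = 5$ and $\bar{y} = \frac{1}{3}$.
Therefore mean of x-series is 5 and mean of y-series is $\frac{1}{3}$.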
Now, let the line of regression of x on y be 3x + 12y = 19.
Then, the line of regression of y on x is 3y + 9x = 46.
Therefore 3x + 12y = 19 and 3y + 9x = 46
⇒ $x = -4y + \frac{19}{3}$ and $y = -3x + \frac{46}{3}$ ⇒ $b_{xy} = -4$ and $b_{yx} = -3$
⇒ $r = -\sqrt{(-4)(-3)} = -2\sqrt{3} < -1$, which is impossible.
∴ Our choice of regression line is incorrect.
Consequently, the regression line of x on y is 3y + 9x = 46
and the regression line of y on x is 3x + 12y = 19.
Therefore 3y + 9x = 46 and 3x + 12y = 19
⇒ $x = -\frac{1}{3}y + \frac{46}{9}$ and $y = -\frac{1}{4}x + \frac{19}{12}$
⇒ $b_{xy} = -\frac{1}{3}$ and $b_{yx} = -\frac{1}{4}$
⇒ $r = -\sqrt{\left(-\frac{1}{3}\right)\left(-\frac{1}{4}\right)} = -\frac{1}{2\sqrt{3}} = -0.289$
(Because r, $b_{xy}$ and $b_{yx}$ have the same sign)
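The fractions in Example 12 are easy to get wrong by hand, so an exact check is useful. The sketch below uses Python's `fractions.Fraction`; the helper name `intersection` and the (a, b, c) encoding of a line a·x + b·y = c are assumptions for illustration. It solves the two lines for the means by Cramer's rule and then evaluates r from the accepted assignment.

```python
from fractions import Fraction
import math

def intersection(line1, line2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly (Cramer's rule)."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("lines are parallel")
    x = Fraction(c1 * b2 - c2 * b1) / det
    y = Fraction(a1 * c2 - a2 * c1) / det
    return x, y

# 3x + 12y = 19 and 9x + 3y = 46
x_bar, y_bar = intersection((3, 12, 19), (9, 3, 46))

# Accepted assignment: 9x + 3y = 46 is x on y, 3x + 12y = 19 is y on x.
b_xy = Fraction(-3, 9)    # x = -(3/9) y + 46/9
b_yx = Fraction(-3, 12)   # y = -(3/12) x + 19/12
r = -math.sqrt(float(b_xy * b_yx))   # negative, like both coefficients

print(x_bar, y_bar, b_xy, b_yx, round(r, 3))
```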
Example 13 You are given the following data:
                        x      y
mean                   18    100
standard deviation     14     20
Correlation coefficient between x and y is 0.8. Find the two regression lines.
Estimate the value of y, when x is 70.
Estimate the value of x, when y is 90.
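Sol. Given that $\bar{x} = 18$, $\bar{y} = 100$, $\sigma_x = 14$, $\sigma_y = 20$ and r = 0.8.
Therefore the line of regression of y on x is:
$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$
or $y - 100 = 0.8 \times \frac{20}{14}\,(x - 18)$
or $y = 1.14x + 79.43$
When x = 70, we have: $y = 100 + 0.8 \times \frac{20}{14} \times (70 - 18) = 159.43$
And, the line of regression of x on y is:
$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$
or $x - 18 = 0.8 \times \frac{14}{20}\,(y - 100)$
or $x = 0.56y - 38$
When y = 90, we have $x = 0.56 \times 90 - 38 = 12.4$. Ans.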
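As a check on Example 13, the two regression lines can be built directly from the means, standard deviations and r. The sketch below is illustrative only; the function name and its return convention (two prediction functions) are assumptions. It evaluates the lines without rounding the slope or intercept first, so its output may differ slightly from values worked with rounded coefficients.

```python
def regression_lines(x_bar, y_bar, sigma_x, sigma_y, r):
    """Return (y_on_x, x_on_y) prediction functions from summary statistics."""
    b_yx = r * sigma_y / sigma_x   # slope of the line of y on x
    b_xy = r * sigma_x / sigma_y   # slope of the line of x on y

    def y_on_x(x):
        return y_bar + b_yx * (x - x_bar)

    def x_on_y(y):
        return x_bar + b_xy * (y - y_bar)

    return y_on_x, x_on_y

y_on_x, x_on_y = regression_lines(18, 100, 14, 20, 0.8)
print(round(y_on_x(70), 2))   # estimate of y when x = 70
print(round(x_on_y(90), 2))   # estimate of x when y = 90
```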
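To Find $b_{yx}$ and $b_{xy}$ Using Assumed Mean: Let the assumed means of x-series and y-series be A and B respectively. Then, taking $dx_i = (x_i - A)$ and $dy_i = (y_i - B)$, we have
$$b_{yx} = \frac{\sum dx_i\,dy_i - \dfrac{\left(\sum dx_i\right)\left(\sum dy_i\right)}{n}}{\sum dx_i^2 - \dfrac{\left(\sum dx_i\right)^2}{n}}$$
And,
$$b_{xy} = \frac{\sum dx_i\,dy_i - \dfrac{\left(\sum dx_i\right)\left(\sum dy_i\right)}{n}}{\sum dy_i^2 - \dfrac{\left(\sum dy_i\right)^2}{n}}$$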
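The assumed-mean formulas can be sketched in code as follows. The function name and the small data set are illustrative assumptions, not taken from the text. The point of the sketch is that the result does not depend on the choice of A and B, which is why any convenient assumed mean may be used.

```python
import math

def regression_coefficients(xs, ys, A=0.0, B=0.0):
    """Return (b_yx, b_xy) using deviations from assumed means A and B."""
    n = len(xs)
    dx = [x - A for x in xs]
    dy = [y - B for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))
    sum_dx2 = sum(u * u for u in dx)
    sum_dy2 = sum(v * v for v in dy)
    numerator = sum_dxdy - sum_dx * sum_dy / n
    b_yx = numerator / (sum_dx2 - sum_dx**2 / n)
    b_xy = numerator / (sum_dy2 - sum_dy**2 / n)
    return b_yx, b_xy

# Illustrative data (an assumption for this sketch).
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]

print(regression_coefficients(xs, ys, A=3, B=5))
print(regression_coefficients(xs, ys, A=0, B=0))   # same values, different A and B
```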
Example 14 Find the regression coefficients and hence the equations of the two lines of regression from the following data:
Age of husband (x) 25 22 28 26 35 20 22 40 20 18
Hence estimate
(i) The age of wife, when the age of husband is 30.
(ii) The age of husband, when the age of wife is 19.
Sol. We have
$$\bar{x} = \frac{\sum x_i}{n} = \frac{256}{10} = 25.6 \quad\text{and}\quad \bar{y} = \frac{\sum y_i}{n} = \frac{172}{10} = 17.2$$
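Let the assumed means of x-series and y-series be 26 and 17 respectively. Then, tabulating $dx_i = (x_i - 26)$, $dy_i = (y_i - 17)$, $dx_i^2$, $dy_i^2$ and $dx_i\,dy_i$, we obtain the totals
$\sum dx_i = -4$, $\sum dy_i = 2$, $\sum dx_i^2 = 450$, $\sum dy_i^2 = 78$ and $\sum dx_i\,dy_i = 172$.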
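Therefore,
$$b_{yx} = \frac{\sum dx_i\,dy_i - \dfrac{\left(\sum dx_i\right)\left(\sum dy_i\right)}{n}}{\sum dx_i^2 - \dfrac{\left(\sum dx_i\right)^2}{n}} = \frac{172 - \dfrac{(-4)(2)}{10}}{450 - \dfrac{(-4)^2}{10}} = \frac{172 + 0.8}{450 - 1.6} = \frac{172.8}{448.4} = 0.385$$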
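$$b_{xy} = \frac{\sum dx_i\,dy_i - \dfrac{\left(\sum dx_i\right)\left(\sum dy_i\right)}{n}}{\sum dy_i^2 - \dfrac{\left(\sum dy_i\right)^2}{n}} = \frac{172 - \dfrac{(-4)(2)}{10}}{78 - \dfrac{(2)^2}{10}} = \frac{172 + 0.8}{78 - 0.4} = \frac{172.8}{77.6} = 2.23$$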
Therefore the equation of the line of regression of y on x is:
$(y - \bar{y}) = b_{yx}\,(x - \bar{x})$ or $(y - 17.2) = (0.385)(x - 25.6)$
Now, when x = 30, we get
$y - 17.2 = (0.385)(30 - 25.6)$ or y = 19 (approximately)
∴ When the age of husband is 30 years, the estimated age of wife is 19 years.
Again, the equation of the line of regression of x on y is:
$(x - \bar{x}) = b_{xy}\,(y - \bar{y})$ or $(x - 25.6) = (2.23)(y - 17.2)$. Thus, when y = 19, we get x = 30 (approximately).
So, when the age of wife is 19 years, the estimated age of husband is 30 years. Ans.
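The figures in Example 14 can be re-derived from the column totals alone. The sketch below starts from the sums quoted in the solution (Σdx = −4, Σdy = 2, Σdx² = 450, Σdy² = 78, Σdx·dy = 172, n = 10); the variable names are illustrative assumptions.

```python
n = 10
sum_dx, sum_dy = -4, 2
sum_dx2, sum_dy2, sum_dxdy = 450, 78, 172
x_bar, y_bar = 25.6, 17.2

numerator = sum_dxdy - sum_dx * sum_dy / n       # 172 + 0.8
b_yx = numerator / (sum_dx2 - sum_dx**2 / n)     # divided by 450 - 1.6
b_xy = numerator / (sum_dy2 - sum_dy**2 / n)     # divided by 78 - 0.4

wife_at_30 = y_bar + b_yx * (30 - x_bar)         # line of y on x at x = 30
husband_at_19 = x_bar + b_xy * (19 - y_bar)      # line of x on y at y = 19

print(round(b_yx, 3), round(b_xy, 2))
print(round(wife_at_30), round(husband_at_19))
```

Both estimates are rounded to whole years, matching the "approximately" in the worked solution.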
The root-mean-square deviation of the predicted values from the observed values is known as the standard error of prediction (or standard error of estimate). It is given by
$$E_{yx} = \sqrt{\frac{\sum (y - y_p)^2}{n}}$$
where y is the actual value and $y_p$ the predicted value.
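Theorem: Prove that:
(1) $E_{yx} = \sigma_y\sqrt{1 - r^2}$, (2) $E_{xy} = \sigma_x\sqrt{1 - r^2}$
Proof: (1) The equation of the line of regression of y on x is
$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$
∴ $y_p = \bar{y} + r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$
So,
$$E_{yx} = \left[\frac{1}{n}\sum (y - y_p)^2\right]^{1/2} = \left[\frac{1}{n}\sum\left\{(y - \bar{y}) - r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})\right\}^2\right]^{1/2}$$
$$= \left[\frac{1}{n}\sum (y - \bar{y})^2 + r^2\,\frac{\sigma_y^2}{\sigma_x^2}\cdot\frac{1}{n}\sum (x - \bar{x})^2 - 2r\,\frac{\sigma_y}{\sigma_x}\cdot\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})\right]^{1/2}$$
$$= \left[\sigma_y^2 + r^2\,\frac{\sigma_y^2}{\sigma_x^2}\,\sigma_x^2 - 2r\,\frac{\sigma_y}{\sigma_x}\cdot r\,\sigma_x\sigma_y\right]^{1/2}$$
$$= \left(\sigma_y^2 - r^2\sigma_y^2\right)^{1/2} = \sigma_y\left(1 - r^2\right)^{1/2}$$
(2) Similarly, (2) may be proved.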
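A numerical check of part (1) is sketched below. It computes the standard error directly from the residuals of the fitted line and compares it with σy√(1 − r²). The function name and the sample data are illustrative assumptions; standard deviations use divisor n, as in the theorem.

```python
import math

def standard_error_two_ways(xs, ys):
    """Return (direct, formula) values of E_yx for the line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x, sigma_y = math.sqrt(var_x), math.sqrt(var_y)
    r = cov_xy / (sigma_x * sigma_y)
    b_yx = r * sigma_y / sigma_x
    # Direct definition: root-mean-square residual about the regression line.
    residual_ss = sum((y - (y_bar + b_yx * (x - x_bar))) ** 2 for x, y in zip(xs, ys))
    direct = math.sqrt(residual_ss / n)
    # Closed form from the theorem.
    formula = sigma_y * math.sqrt(1 - r**2)
    return direct, formula

# Illustrative data (an assumption for this sketch).
direct, formula = standard_error_two_ways([1, 2, 3, 4, 5], [2, 5, 3, 8, 7])
print(direct, formula)
```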
Example 15 For the data given below, find the standard error of estimate of y on x.
Sol. We leave it to the reader to find the line of regression of y on x.
This is: y = 1.3x + 1.1. So, $y_p = 1.3x + 1.1$.
Now form the table for given data:
$$\sum (y - y_p)^2 = \sum (y - 1.3x - 1.1)^2 = 9.10$$
$$E_{yx} = \sqrt{\frac{\sum (y - y_p)^2}{n}} = \sqrt{\frac{9.10}{5}} = \sqrt{1.82} = 1.349$$
There are a number of situations where the dependent variable is a function of two or more independent variables, either linear or non-linear. Here, we shall discuss an approach to fit the experimental data where the variable under consideration is a linear function of two independent variables.
Let us consider a two-variable linear function given by
$$y = a + bx + cz$$
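The sum of the squares of the errors is given by
$$S = \sum_{i=1}^{n} \left(y_i - a - bx_i - cz_i\right)^2$$
Differentiating S partially w.r.t. a, b, c, we get
$$\frac{\partial S}{\partial a} = 0 \;\Rightarrow\; 2\sum_{i=1}^{n} \left(y_i - a - bx_i - cz_i\right)(-1) = 0$$
$$\frac{\partial S}{\partial b} = 0 \;\Rightarrow\; 2\sum_{i=1}^{n} \left(y_i - a - bx_i - cz_i\right)(-x_i) = 0$$
and
$$\frac{\partial S}{\partial c} = 0 \;\Rightarrow\; 2\sum_{i=1}^{n} \left(y_i - a - bx_i - cz_i\right)(-z_i) = 0$$
which, on simplification and omitting the suffix i, yields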
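$\sum y = ma + b\sum x + c\sum z$
$\sum xy = a\sum x + b\sum x^2 + c\sum xz$
$\sum yz = a\sum z + b\sum xz + c\sum z^2$
where m is the number of data points (denoted n above). Solving the above three equations, we get the values of a, b, and c. Consequently, we get the linear function y = a + bx + cz, called the regression plane.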
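The three normal equations can be assembled and solved mechanically. The following is a minimal sketch in plain Python: the function name, the Gauss-Jordan solver and the sample data are all illustrative assumptions. The data are generated from the plane y = 1 + 2x + 3z so the fitted constants can be checked against known values.

```python
import math

def fit_plane(xs, zs, ys):
    """Least-squares fit of y = a + b*x + c*z via the normal equations."""
    m = len(ys)
    sx, sz, sy = sum(xs), sum(zs), sum(ys)
    sxx = sum(x * x for x in xs)
    szz = sum(z * z for z in zs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    sxy = sum(x * y for x, y in zip(xs, ys))
    syz = sum(y * z for y, z in zip(ys, zs))
    # Augmented matrix of the normal equations in the unknowns a, b, c.
    M = [[m, sx, sz, sy],
         [sx, sxx, sxz, sxy],
         [sz, sxz, szz, syz]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (x and z are collinear?)")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(3):
            if row != col:
                factor = M[row][col] / M[col][col]
                M[row] = [p - factor * q for p, q in zip(M[row], M[col])]
    return tuple(M[i][3] / M[i][i] for i in range(3))

# Illustrative data lying exactly on y = 1 + 2x + 3z (an assumption for this sketch).
xs = [0, 1, 2, 3, 1]
zs = [0, 1, 0, 2, 3]
ys = [1 + 2 * x + 3 * z for x, z in zip(xs, zs)]

a, b, c = fit_plane(xs, zs, ys)
print(a, b, c)
```

If one independent variable is an exact linear function of the other, the normal equations are singular and a, b, c are not uniquely determined; the sketch raises an error in that case.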
Example 16 Obtain a regression plane by using multiple linear regression to fit the data given below :
z : 12 18 24 30 (U.P.T.U. 2002)
Sol. Let y = a + bx + cz be the required regression plane, where a, b, c are the constants to be determined by the following equations:
$\sum y = ma + b\sum x + c\sum z$
$\sum xy = a\sum x + b\sum x^2 + c\sum xz$ (1)
and $\sum yz = a\sum z + b\sum zx + c\sum z^2$
From table, equation (1) can be written as
84 = 4a + 10b + 6c
240 = 10a + 30b + 20c
Solving, we get a = 10, b = 2, c = 4
Hence the required regression plane is
y = 10 + 2x + 4z. Ans.
PROBLEM SET 9.2
1 Find the equation of the lines of regression on the basis of the data:
:
:
[Ans. y = 3.75 − 0.25x, x = 3.75 − 0.25y]
2 Find the regression coefficient b yx for the data:
55,
x=
[Ans.1.24]
3 The following data regarding the heights ( )y and weights ( )x of 100 college students are given:
15000,
x=
[Ans. y = 0.1x + 53]
4 Find the coefficient of correlation when two regression equations are:
= −0.2 +4.2
5 Find the standard error of estimate of y on x for the data given below:
:
:
[Ans. $E_{yx} = 0.564$]
6 If two regression coefficients are 0.8 and 0.2, what would be the value of coefficient of
7 x and y are two random variables with the same standard deviation and correlation coefficient r. Show that the coefficient of correlation between x and x + y is $\sqrt{\dfrac{1 + r}{2}}$.
8 Show that the geometric mean of the coefficients of regression is the coefficient of correlation
CHAPTER 10
Time Series and Forecasting
Business executives, economists, and government officials are often faced with problems that require forecasts, such as future sales, future revenue and expenditures, and the total business activity for the next decade. Time series analysis is a statistical method which helps the businessman to understand the past behaviour of economic variables based on a collection of observations taken at different time intervals. Having recognized the behaviour or movements of a time series, the businessman tries to forecast the future of economic variables on the assumption that the time series of such an economic variable will continue to behave in the same fashion as it had in the past. Thus analyzing information for the previous time periods is the subject of time series analysis.
Thus the statistical data which are collected, observed or recorded at successive intervals of time, or arranged chronologically, are said to form a time series.
“A time series is a set of observations taken at specified times, usually (but not always) at equal intervals.” Thus a set of data depending on time, which may be year, quarter, month, week, day etc., is called a time series.
Examples:
1 The annual production of Rice in India over the last 15 years
2 The daily closing price of a share in the Calcutta Stock Exchange
3 The monthly sales of an Iron Industry for the last 6 months
4 Hourly temperature recorded by the meteorological office in a city
Mathematically, a time series is defined by the values $y_1, y_2, y_3, \ldots$ of a variable y (closing price of a share, temperature etc.) at times $t_1, t_2, t_3, \ldots$ Thus y is a function of t and is given by
y = f(t)
A time series involving a variable y is represented pictorially by constructing a graph of y versus t.