7 Fit a parabola $y = a + bx + cx^2$ to the following data:
x y
[Ans. $y = 0.34 - 0.78x + 0.99x^2$]
8 Determine the constants a and b by the method of least squares such that $y = ae^{bx}$ fits the following data:
x y
[Ans. $y = 1.49989\,e^{0.50001x}$]
9 Fit a least squares geometric curve $y = ax^b$ to the following data:
x y
[Ans. $y = 0.5012\,x^{1.9977}$]
10 A person runs the same race track for five consecutive days and is timed as follows:
Day (x) Time (y)
Make a least squares fit to the above data using a function $a + \frac{b}{x} + \frac{c}{x^2}$.
[Ans. $y = 13.0065 + \frac{6.7512}{x} - \frac{4.4738}{x^2}$]
11 Use the method of least squares to fit the curve $y = \frac{c_0}{x} + c_1\sqrt{x}$ to the following table of values:
x y
[Ans. $y = \frac{1.97327}{x} + 3.28182\sqrt{x}$]
12 Use the method of least squares to fit a parabola $y = a + bx + cx^2$ to the following data:
(x, y): (−1, 2), (0, 0), (0, 1), (1, 2) [Ans. $y = \frac{1}{2} + \frac{3}{2}x^2$]
13 The pressure of a gas corresponding to various volumes V is measured, given by the following data:
V (cm^3)
p (kg cm^-2)
Fit the data to the equation $pV^{\gamma} = c$.
[Ans. $pV^{0.28997} = 167.78765$]
We know that in a functional relation between two variables, if we know the value of one variable, then the corresponding value of the other variable can be determined exactly.
In a statistical relationship between two variables, however, when the value of one variable is known, we can only estimate the corresponding value of the other variable.
Regression analysis is the method used for estimating the unknown values of one variable corresponding to the known values of another variable.
9.3.1 Dependent and Independent Variables
Suppose there is a relation between two variables. The variable whose values are known is called the independent variable, while the other one is called the dependent variable.
9.3.2 Line of Regression
Let $\{(x_i, y_i) : 1 \le i \le n\}$ be a bivariate distribution. If we plot the corresponding values
of x and y, taking the values of x along the x-axis and the values of y along the y-axis, we obtain a
collection of dots, called the scatter diagram.
If the scatter diagram indicates some relationship between x and y, then the dots of the
scatter diagram will be concentrated around a line, called the line of regression or the line of best fit.
9.3.3 Regression Line of y on x
If we have to predict the values of y from given values of x, then the line of regression has an
equation of the form y = a + bx. This is called the regression line of y on x.
9.3.4 Regression Line of x on y
If we have to predict the values of x from given values of y, then the line of regression has an equation of the form x = a + by. This is called the regression line of x on y.
9.3.5 To obtain the Equation of Line of Regression of y on x
Suppose that the line approximating the set of points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ has the equation:
$y = a + bx$ (1)
Then $y_i = a + bx_i$ and $x_i y_i = ax_i + bx_i^2$ for each $i = 1, 2, \ldots, n$. Summing over i, we get
$\sum y_i = na + b\sum x_i$ (2)
$\sum x_i y_i = a\sum x_i + b\sum x_i^2$ (3)
Equations (2) and (3) are the normal equations for this line.
Solving (2) and (3) for a and b and putting these values in (1), we obtain the required equation of the line of regression of y on x.
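The solution of the two normal equations can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `regression_y_on_x` and the sample points are assumptions made here, not part of the text, and the elimination formula used is the usual closed-form solution of the 2 × 2 system.

```python
def regression_y_on_x(xs, ys):
    """Solve the normal equations for a and b in y = a + b*x."""
    n = len(xs)
    sx = sum(xs)                              # sum of x_i
    sy = sum(ys)                              # sum of y_i
    sxx = sum(x * x for x in xs)              # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x_i * y_i
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x
a, b = regression_y_on_x([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Swapping the roles of the two lists gives the coefficients of the regression line of x on y in the same way.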
9.3.6 To obtain the Equation of Line of Regression of x on y
Suppose that the line approximating the set of points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ has the equation:
$x = a + by$ (1)
Then $x_i = a + by_i$ and $x_i y_i = ay_i + by_i^2$ for each $i = 1, 2, \ldots, n$. Summing over i, we get
$\sum x_i = na + b\sum y_i$ (2)
$\sum x_i y_i = a\sum y_i + b\sum y_i^2$ (3)
Equations (2) and (3) are the normal equations for this line.
Solving (2) and (3) for a and b and putting these values in (1), we obtain the required equation of the line of regression of x on y.
Example 1 Find the line of regression of y on x for the following data:
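Sol. Here n = 7. Now form the table of $x_i$, $y_i$, $x_i^2$ and $x_i y_i$, whose column totals are:
$\sum x_i = 47$, $\sum y_i = 60$, $\sum x_i^2 = 355$, $\sum x_i y_i = 416$
Let the line of regression be $y = a + bx$ (1)
Then $y_i = a + bx_i$ and $x_i y_i = ax_i + bx_i^2$ for each i.
Therefore the normal equations are:
$\sum y_i = na + b\sum x_i$ (2)
$\sum x_i y_i = a\sum x_i + b\sum x_i^2$ (3)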
Putting the values from the table in (2) and (3), we get
60 = 7a + 47b
416 = 47a + 355b
Solving these equations, we get a = 19/3 ≈ 6.333 and b = 1/3 ≈ 0.333.
Putting these values in (1), the required equation is y = 6.333 + 0.333x. Ans.
Example 2 Find the line of regression of x on y for the following data:
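Sol. Here n = 5. Now form the table of $x_i$, $y_i$, $y_i^2$ and $x_i y_i$, whose column totals are:
$\sum x_i = 30$, $\sum y_i = 40$, $\sum y_i^2 = 340$, $\sum x_i y_i = 214$
Let the line of regression be $x = a + by$ (1)
Then $x_i = a + by_i$ and $x_i y_i = ay_i + by_i^2$ for each i. Therefore the normal equations are:
$\sum x_i = na + b\sum y_i$ (2)
$\sum x_i y_i = a\sum y_i + b\sum y_i^2$ (3)
Putting the values from the table in (2) and (3), we get
30 = 5a + 40b ⇒ a + 8b = 6
214 = 40a + 340b ⇒ 20a + 170b = 107
On solving these equations we get a = 16.4 and b = −1.3.
Therefore the required equation is x = 16.4 − 1.3y. Ans.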
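When only the column totals are available, as in Example 2, the same pair of normal equations can be solved directly from the sums. The following is a minimal sketch; the helper name `solve_normal_equations` and its parameter names are assumptions introduced for illustration, with u standing for the variable whose values are known and v for the variable being predicted.

```python
def solve_normal_equations(n, sum_u, sum_v, sum_uu, sum_uv):
    """Solve  sum_v = n*a + b*sum_u  and  sum_uv = a*sum_u + b*sum_uu."""
    det = n * sum_uu - sum_u ** 2
    if det == 0:
        raise ValueError("normal equations are singular")
    b = (n * sum_uv - sum_u * sum_v) / det
    a = (sum_v - b * sum_u) / n
    return a, b


# Example 2 (x on y): u = y, v = x, using the totals from the solution
a, b = solve_normal_equations(n=5, sum_u=40, sum_v=30, sum_uu=340, sum_uv=214)
print(round(a, 4), round(b, 4))  # 16.4 -1.3
```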
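Example 3 Prove that the arithmetic mean of the coefficients of regression is greater than the coefficient
of correlation.
Sol. The coefficients of regression are $r\frac{\sigma_y}{\sigma_x}$ and $r\frac{\sigma_x}{\sigma_y}$.
We have to prove that A.M. > r, i.e.
$\frac{1}{2}\left(r\frac{\sigma_y}{\sigma_x} + r\frac{\sigma_x}{\sigma_y}\right) > r$
or, dividing by r (taken to be positive),
$\frac{1}{2}\left(\frac{\sigma_y}{\sigma_x} + \frac{\sigma_x}{\sigma_y}\right) > 1$
or $\frac{\sigma_y}{\sigma_x} + \frac{\sigma_x}{\sigma_y} - 2 > 0$
or $\frac{1}{\sigma_x\sigma_y}\left[\sigma_x^2 + \sigma_y^2 - 2\sigma_x\sigma_y\right] > 0$
or $\frac{(\sigma_x - \sigma_y)^2}{\sigma_x\sigma_y} > 0$, which is true whenever $\sigma_x \ne \sigma_y$ (equality holds when $\sigma_x = \sigma_y$). Proved.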
Example 4 Find the regression line of y on x for the following data:
Estimate the value of y, when x=10.
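Sol. Let y = a + bx be the line of regression of y on x. Therefore the normal equations are:
$\sum y_i = na + b\sum x_i$ ⇒ 40 = 8a + 56b (1)
$\sum x_i y_i = a\sum x_i + b\sum x_i^2$ ⇒ 364 = 56a + 524b (2)
On solving (1) and (2) we get $a = \frac{6}{11}$ and $b = \frac{7}{11}$. The equation of the required line is
$y = \frac{6}{11} + \frac{7}{11}x$ or $7x - 11y + 6 = 0$.
If x = 10, $y = \frac{6}{11} + \frac{7}{11}(10) = \frac{76}{11} = 6\frac{10}{11}$. Ans.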
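The fractions in Example 4 can be checked with exact rational arithmetic. This sketch uses Python's standard `fractions` module and only the totals that appear in the normal equations of the solution; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Totals read off the normal equations of Example 4
n, sx, sy, sxx, sxy = 8, 56, 40, 524, 364

det = n * sxx - sx * sx                # 8*524 - 56^2
b = Fraction(n * sxy - sx * sy, det)   # slope
a = (sy - b * sx) / n                  # intercept
y_at_10 = a + b * 10                   # estimate of y when x = 10

print(a, b, y_at_10)  # 6/11 7/11 76/11
```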
Example 5 In a study between the amount of rainfall and the quantity of air pollution removed, the following data were collected.
Daily rainfall x (0.01 cm):    4.3   4.5   5.9   5.6   6.1   5.2   3.8   2.1
Pollution removed y (mg/m^3):  12.6  12.1  11.6  11.8  11.4  11.8  13.2  14.1
Find the regression line of y on x.
Sol. Let y = a + bx be the equation of the line of regression of y on x.
∴ Normal equations are:
$\sum y_i = na + b\sum x_i$ ⇒ 98.6 = 8a + 37.5b
$\sum x_i y_i = a\sum x_i + b\sum x_i^2$ ⇒ 453.82 = 37.5a + 188.01b
After solving these normal equations we get a = 15.53 and b = −0.684.
The equation of the line of regression is y = 15.53 − 0.684x. Ans.
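Since Example 5 lists the raw observations, the totals in its normal equations and the resulting coefficients can be recomputed directly. The sketch below is one way to do that in plain Python; the variable names are illustrative, and printing is left unrounded enough to compare against the rounded coefficients quoted in the solution.

```python
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1]         # x, in 0.01 cm
removed = [12.6, 12.1, 11.6, 11.8, 11.4, 11.8, 13.2, 14.1]  # y, in mg/m^3

n = len(rainfall)
sx = sum(rainfall)
sy = sum(removed)
sxx = sum(x * x for x in rainfall)
sxy = sum(x * y for x, y in zip(rainfall, removed))

# Solve  sy = n*a + b*sx  and  sxy = a*sx + b*sxx  by elimination.
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n

print(f"sums: {sx:.2f} {sy:.2f} {sxx:.2f} {sxy:.2f}")
print(f"a = {a:.3f}, b = {b:.3f}")
```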
9.3.7 Another Form of Equations of Lines of Regression
Theorem 1: Show that the equation of the line of regression of y on x is given by
$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$,
where $\bar{x}$ and $\bar{y}$ are the means of the x-series and y-series respectively; r is the coefficient of correlation between x and y; $\sigma_x$ and $\sigma_y$ are the standard deviations of the x-series and the y-series respectively.
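Proof: Suppose that the line approximating the set of points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
has the equation
$y = a + bx$ (1)
Then $y_i = a + bx_i$ and $x_i y_i = ax_i + bx_i^2$ for each $i = 1, 2, \ldots, n$. Summing over i,
$\sum y_i = na + b\sum x_i$ (2)
$\sum x_i y_i = a\sum x_i + b\sum x_i^2$ (3)
From (2), we have $\frac{\sum y_i}{n} = a + b\frac{\sum x_i}{n}$, i.e. $\bar{y} = a + b\bar{x}$ (4)
Thus, it follows that $(\bar{x}, \bar{y})$ lies on the line (1).
Shifting the origin to $(\bar{x}, \bar{y})$, (2) becomes
$\sum (y_i - \bar{y}) = na + b\sum (x_i - \bar{x})$ or a = 0,
since $\sum (x_i - \bar{x}) = \sum (y_i - \bar{y}) = 0$. With a = 0 in the shifted coordinates, the line (1) becomes
$y - \bar{y} = b(x - \bar{x})$ (5)
Shifting the origin to $(\bar{x}, \bar{y})$ and taking a = 0, (3) becomes
$\sum (x_i - \bar{x})(y_i - \bar{y}) = b\sum (x_i - \bar{x})^2$ (6)
From (6), writing $dx_i = x_i - \bar{x}$ and $dy_i = y_i - \bar{y}$, we have
$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum dx_i\,dy_i}{\sum dx_i^2} = \frac{\sum dx_i\,dy_i}{n\sigma_x^2} = r\frac{\sigma_y}{\sigma_x}$, since $r = \frac{\sum dx_i\,dy_i}{n\sigma_x\sigma_y}$.
Putting this value of b in (5), the required equation of the line of regression of y on x is
$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$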
Coefficient of Regression of y on x: The real number $r\frac{\sigma_y}{\sigma_x}$ is called the coefficient
of regression of y on x and is denoted by $b_{yx}$. Thus $b_{yx} = r\frac{\sigma_y}{\sigma_x}$.
Theorem 2: The equation of the line of regression of x on y is given by
$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$
Proof: Proceed as in Theorem 1.
Coefficient of Regression of x on y: The real number $r\frac{\sigma_x}{\sigma_y}$ is called the coefficient
of regression of x on y and is denoted by $b_{xy}$. Thus $b_{xy} = r\frac{\sigma_x}{\sigma_y}$.
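Theorem 3: Prove that:
(i) $b_{yx} = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}$
(ii) $b_{xy} = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}}$
Proof: (i) By definition, we have
$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{r\,\sigma_x\sigma_y}{\sigma_x^2} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2} = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}$
Similarly, (ii) can be proved.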
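Example 6 Find the regression coefficient $b_{yx}$ between x and y for the following data: $\sum x = 24$, $\sum y = 44$, $\sum xy = 306$, $\sum x^2 = 164$, $\sum y^2 = 574$ and n = 4.
Sol. The given data may be written as $\sum x_i = 24$, $\sum y_i = 44$, $\sum x_i y_i = 306$, $\sum x_i^2 = 164$, $\sum y_i^2 = 574$ and n = 4.
∴ $b_{yx} = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \dfrac{306 - \frac{24 \times 44}{4}}{164 - \frac{(24)^2}{4}} = \frac{306 - 264}{164 - 144} = \frac{42}{20} = 2.1$ Ans.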
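The formula of Theorem 3 (i) translates directly into a small function of the raw sums. The sketch below is illustrative only: the name `b_yx_from_sums` is an assumption, and n = 4 is the value used in the worked solution of Example 6.

```python
def b_yx_from_sums(n, sum_x, sum_y, sum_xy, sum_xx):
    """Regression coefficient of y on x from raw sums, as in Theorem 3 (i)."""
    return (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)


# Totals of Example 6
print(round(b_yx_from_sums(4, 24, 44, 306, 164), 4))  # 2.1
```

Passing the y-totals in place of the x-totals (and vice versa) gives $b_{xy}$ as in part (ii).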
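Example 7 Find the regression coefficient $b_{xy}$ between x and y for the following data:
$\sum x = 30$, $\sum y = 42$, $\sum xy = 199$, $\sum x^2 = 184$, $\sum y^2 = 318$ and n = 6.
Sol. The given data may be written as $\sum x_i = 30$, $\sum y_i = 42$, $\sum x_i y_i = 199$, $\sum x_i^2 = 184$, $\sum y_i^2 = 318$ and n = 6.
∴ $b_{xy} = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}} = \dfrac{199 - \frac{30 \times 42}{6}}{318 - \frac{(42)^2}{6}} = \frac{199 - 210}{318 - 294} = -\frac{11}{24} \approx -0.4583$ Ans.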
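Example 8 For the following observations (x, y), find the regression coefficients $b_{yx}$ and $b_{xy}$ and hence find the correlation coefficient between x and y: (1, 2), (2, 4), (3, 8), (4, 7), (5, 10), (6, 5), (7, 14), (8, 16), (9, 2), (10, 20).
Sol. Here n = 10. We may prepare the table given below:
x_i     y_i     x_i^2   y_i^2   x_i y_i
1       2       1       4       2
2       4       4       16      8
3       8       9       64      24
4       7       16      49      28
5       10      25      100     50
6       5       36      25      30
7       14      49      196     98
8       16      64      256     128
9       2       81      4       18
10      20      100     400     200
Total:  $\sum x_i = 55$, $\sum y_i = 88$, $\sum x_i^2 = 385$, $\sum y_i^2 = 1114$, $\sum x_i y_i = 586$
∴ $b_{yx} = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \dfrac{586 - \frac{55 \times 88}{10}}{385 - \frac{(55)^2}{10}} = \frac{102}{82.5} \approx 1.24$
And
$b_{xy} = \dfrac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}} = \dfrac{586 - \frac{55 \times 88}{10}}{1114 - \frac{(88)^2}{10}} = \frac{102}{339.6} \approx 0.30$
Now, $b_{yx} \cdot b_{xy} = r\frac{\sigma_y}{\sigma_x} \cdot r\frac{\sigma_x}{\sigma_y} = r^2$, where r is the coefficient of correlation.
∴ $r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{\frac{102}{82.5} \times \frac{102}{339.6}} \approx 0.609$
Thus, $b_{yx} = 1.24$, $b_{xy} = 0.30$ and r = 0.609. Ans.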
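The whole of Example 8 can be reproduced from the ten observations. The following sketch computes the column totals, both regression coefficients and the correlation coefficient; the variable names are illustrative, and the sign of r is taken from the common sign of the two coefficients.

```python
from math import copysign, sqrt

points = [(1, 2), (2, 4), (3, 8), (4, 7), (5, 10),
          (6, 5), (7, 14), (8, 16), (9, 2), (10, 20)]

n = len(points)
sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)
sxx = sum(x * x for x, _ in points)
syy = sum(y * y for _, y in points)
sxy = sum(x * y for x, y in points)

num = sxy - sx * sy / n              # shared numerator of both coefficients
b_yx = num / (sxx - sx ** 2 / n)     # regression coefficient of y on x
b_xy = num / (syy - sy ** 2 / n)     # regression coefficient of x on y
r = copysign(sqrt(b_yx * b_xy), b_yx)

print(round(b_yx, 2), round(b_xy, 2), round(r, 3))  # 1.24 0.3 0.609
```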
9.3.8 Some Properties of Regression Coefficients
Let the regression coefficient of y on x be $b_{yx}$, the regression coefficient of x on y be $b_{xy}$, and
the correlation coefficient between x and y be r. Then we have the following results.
Theorem 1: Prove that $r = \sqrt{b_{yx} \cdot b_{xy}}$
Proof: We have $b_{yx} = r\frac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\frac{\sigma_x}{\sigma_y}$. Therefore, $b_{yx}\,b_{xy} = r^2$ or $r = \sqrt{b_{yx} \cdot b_{xy}}$.
Remark: Clearly, the correlation coefficient is the geometric mean of the two regression coefficients.
Theorem 2: Prove that r, $b_{yx}$ and $b_{xy}$ are of the same sign.
Proof: We know that $b_{yx} = r\frac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\frac{\sigma_x}{\sigma_y}$. Since $\sigma_x$ and $\sigma_y$ are both positive, it
follows from the two equations given above that $b_{yx}$ and $b_{xy}$ have the same sign as r.
Hence r, $b_{yx}$ and $b_{xy}$ are always of the same sign.
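Theorem 3: Prove that the arithmetic mean of the regression coefficients is greater than the correlation coefficient.
Proof: Taking r > 0, the required result is true
if $\frac{1}{2}(b_{yx} + b_{xy}) > r$, i.e., if $\frac{1}{2}\left(r\frac{\sigma_y}{\sigma_x} + r\frac{\sigma_x}{\sigma_y}\right) > r$
i.e., if $\frac{\sigma_y}{\sigma_x} + \frac{\sigma_x}{\sigma_y} > 2$
i.e., if $\sigma_y^2 + \sigma_x^2 > 2\sigma_x\sigma_y$
i.e., if $\sigma_y^2 + \sigma_x^2 - 2\sigma_x\sigma_y > 0$
i.e., if $(\sigma_y - \sigma_x)^2 > 0$, which is true whenever $\sigma_x \ne \sigma_y$ (equality holds when $\sigma_x = \sigma_y$).
Hence the required result is true. Proved.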
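The three properties above can be checked numerically on a small data set. The sketch below uses hypothetical data with positive correlation (chosen here for illustration, not taken from the text) and computes r, $\sigma_x$ and $\sigma_y$ from their definitions before forming the regression coefficients.

```python
from math import isclose, sqrt

# Hypothetical positively correlated data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sigma_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
sigma_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

r = cov / (sigma_x * sigma_y)
b_yx = r * sigma_y / sigma_x
b_xy = r * sigma_x / sigma_y

print(isclose(b_yx * b_xy, r * r))   # Theorem 1: product equals r^2
print(b_yx > 0, b_xy > 0, r > 0)     # Theorem 2: all of the same sign
print((b_yx + b_xy) / 2 >= r)        # Theorem 3, for r > 0
```

For this data the coefficients are 0.6 and 1.0, so their arithmetic mean 0.8 exceeds r, which is the square root of 0.6.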
Theorem 4: Let θ be the angle between the regression line of y on x and the regression
line of x on y. Then, prove that $\tan\theta = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$
Proof: The equation of the line of regression of x on y is
y