VIETNAM NATIONAL UNIVERSITY – HO CHI MINH CITY
INTERNATIONAL UNIVERSITY
HOMEWORK 5 – STATISTICS
HYPOTHESIS TESTING, REGRESSION
AND ANALYSIS OF VARIANCE
Ho Chi Minh City, Vietnam
CHAPTER 8 - HYPOTHESIS TESTING
Question 38
To learn about the feeding habits of bats, 22 bats were tagged and tracked by radio. Of these 22 bats, 12 were female and 10 were male. The distances flown (in meters) between feedings were noted for each of the 22 bats, and the following summary statistics were obtained.
Test the hypothesis that the mean distance flown between feedings is the same for the populations of both male and female bats. Use the 5 percent level of significance.
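A minimal sketch of how such a test could be run from summary statistics alone. The summary table is not reproduced here, so the means and standard deviations below are hypothetical placeholders, and the pooled (equal-variance) t-test is an assumption about the intended method.

```python
from scipy import stats

# Hypothetical summary statistics (placeholders, NOT the values from the question).
n_f, mean_f, sd_f = 12, 205.0, 100.0  # female bats: size, mean distance (m), sample sd
n_m, mean_m, sd_m = 10, 135.0, 95.0   # male bats: size, mean distance (m), sample sd

# Pooled two-sample t-test of H0: the two population means are equal,
# assuming equal population variances.
t_stat, p_value = stats.ttest_ind_from_stats(
    mean_f, sd_f, n_f, mean_m, sd_m, n_m, equal_var=True
)

alpha = 0.05
reject = p_value < alpha
print('The test statistic is', t_stat)
print('The p-value is', p_value)
print('Reject H0 at the 5 percent level:', reject)
```

Replacing the placeholder values with the summary statistics from the question gives the test statistic and p-value for the bat data.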
The mean weight of newborns is the same in both countries.
The p-value of this test:
a 5 percent level of significance
b 1 percent level of significance
Solution
Let X be the random variable that counts the number of calls that involve life-threatening emergencies.
We have
The size of testing sample: n = 200
The p-value of this test:
a 5 percent level of significance
b 1 percent level of significance
So we reject the null hypothesis on both levels of significance
Question 58
A standard drug is known to be effective in 75 percent of the cases in which it is used to treat a certain infection. A new drug has been developed and has been found to be effective in 42 cases out of 50. Based on this, would you accept, at the 5 percent level of significance, the hypothesis that the two drugs are of equal effectiveness? What is the p-value?
Solution
We have
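One way to carry out the test in Question 58 is sketched below. It uses the numbers given in the question (n = 50, 42 successes, p0 = 0.75) and the normal approximation to the binomial without continuity correction; that choice is an assumption, and an exact binomial test would give a slightly different p-value.

```python
from math import sqrt
from scipy.stats import norm

n, successes, p0 = 50, 42, 0.75  # values stated in the question

# H0: p = 0.75 against H1: p != 0.75, using the normal approximation
z = (successes - n * p0) / sqrt(n * p0 * (1 - p0))
p_value = 2 * norm.sf(abs(z))  # two-sided p-value

reject_5 = p_value < 0.05
print('The test statistic is', z)
print('The p-value is', p_value)
print('Reject H0 at the 5 percent level:', reject_5)
```

Under this approximation the p-value is about 0.14, so the hypothesis of equal effectiveness would not be rejected at the 5 percent level.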
CHAPTER 9: REGRESSION
Question 1
The following data relate x, the moisture of a wet mix of a certain product, to Y, the density of the finished product.
a Draw a scatter diagram
The estimated regression line : y =
Question 3
The corrosion of a certain metallic substance has been studied in dry oxygen at 500 degrees centigrade. In this experiment, the gain in weight after various periods of exposure was used as a measure of the amount of oxygen that had reacted with the sample. Here are the data:
a Plot a scatter diagram
b Fit a linear relation
c Predict the percent weight gain when the metal is exposed for 3.2 hours
The slope is 0.011742857142857138 and the y-intercept is 0.007185714285714305
The estimated regression line is y = 0.007186 + 0.011743x
c Predict the percent weight gain when the metal is exposed for 3.2 hours
(x=3.2)
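The prediction can be checked with a few lines of Python. The slope and y-intercept are copied from the output reported above; the variable names are illustrative.

```python
# Coefficients reported in the output above
b1 = 0.011742857142857138  # slope
b0 = 0.007185714285714305  # y-intercept

x_new = 3.2  # exposure time in hours
y_hat = b0 + b1 * x_new
print('The predicted percent weight gain is', y_hat)
```

This gives a predicted weight gain of about 0.0448.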
Question 4
The following data indicate the relationship between x, the specific gravity of a wood sample, and Y, its maximum crushing strength in compression parallel to the grain.
a Plot a scatter diagram Does a linear relationship seem reasonable?
b Estimate the regression coefficients
c Predict the maximum crushing strength of a wood sample whose specific gravity is 0.43
A linear relationship seems unreasonable
b Estimate the regression coefficients
The estimated regression line is
A and B are the estimators of regression parameters
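As a sketch of how the estimators A and B can be computed, the block below applies the least squares formulas B = Sxy/Sxx and A = y_bar - B*x_bar. The data are hypothetical placeholders, not the wood-sample data from the question.

```python
import numpy as np

# Hypothetical data (placeholders): specific gravity x and crushing strength y
x = np.array([0.40, 0.42, 0.44, 0.46, 0.48])
y = np.array([1900.0, 2200.0, 2400.0, 2700.0, 2900.0])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# Least squares estimators of the regression parameters
Sxx = np.sum(x * x) - n * x_bar ** 2
Sxy = np.sum(x * y) - n * x_bar * y_bar
B = Sxy / Sxx          # estimator of the slope
A = y_bar - B * x_bar  # estimator of the intercept

# Predicted response at x = 0.43
y_hat = A + B * 0.43
print('The A is', A)
print('The B is', B)
print('The prediction at x=0.43 is', y_hat)
```

With the real data substituted for x and y, the same lines give the estimates asked for in part b and the prediction in part c.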
c Predict the maximum crushing strength of a wood sample whose specific gravity is 0.43 (x=0.43)
Question 5
The following data indicate the gain in reading speed versus the number of weeks in the program of 10 students in a speed-reading program.
a Plot a scatter diagram to see if a linear relationship is indicated
b Find the least squares estimates of the regression coefficients
c Estimate the expected gain of a student who plans to take the program for 7 weeks
Output: The slope is 11.796897038081807 and the y-intercept is 2.638928067700968
The estimated regression line is y = 2.6389 + 11.7969x
b Find the least squares estimates of the regression coefficients
c Estimate the expected gain of a student who plans to take the program for 7 weeks (x=7)
Question 6
Infrared spectroscopy is often used to determine the natural rubber content of mixtures of natural and synthetic rubber. For mixtures of known percentages, the infrared spectroscopy gave the following readings:
print('The slope is', b1, 'and the y-intercept is', b0)
plt.show()
Output: The slope is 0.007025714285714285 and the y-intercept is 0.7497142857142856
The estimated regression line is y = 0.7497 + 0.007026x
The estimated percentage of natural rubber when a new mixture gives an infrared spectroscopy reading of 1.15 is obtained by setting y = 1.15 in the regression line and solving for x.
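Since x is the percentage of natural rubber and y is the reading, a reading of 1.15 corresponds to solving the fitted line for x. A short sketch, using the slope and y-intercept from the output above (variable names are illustrative):

```python
# Coefficients reported in the output above
b1 = 0.007025714285714285  # slope
b0 = 0.7497142857142856    # y-intercept

y_reading = 1.15  # observed infrared spectroscopy reading

# Invert y = b0 + b1*x to estimate the percentage x
x_hat = (y_reading - b0) / b1
print('The estimated percentage of natural rubber is', x_hat)
```

This gives an estimated natural rubber content of roughly 57 percent.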
Question 10
Verify that
Solution
Question 11
The following table relates the number of sunspots that appeared each year from 1970 to 1983 to the number of auto accident deaths during that year. Test the hypothesis that the number of auto deaths is not affected by the number of sunspots. (The sunspot data are from Jastrow and Thompson, Fundamentals and Frontiers of Astronomy, and the auto death data are from General Statistics of the U.S. 1985.)
print('The mean of x is', x_bar)
print('The mean of y is', y_bar)
# The sum of xy, x^2, y^2
xy_Sum = sum(xy)
xx_Sum = sum(xx)
yy_Sum = sum(yy)
print('The sum of xy is', xy_Sum)
print('The sum of x^2 is', xx_Sum)
print('The sum of y^2 is', yy_Sum)
# Calculate Sxx, Syy, Sxy, SSR, B
Sx = sum(xx) - n * x_bar * x_bar
Sy = sum(yy) - n * pow(y_bar, 2)
Sxy = sum(xy) - n * x_bar * y_bar
SSr = (Sx * Sy - Sxy * Sxy) / Sx
B = Sxy / Sx
print('The Sxx is', Sx)
print('The Syy is', Sy)
print('The Sxy is', Sxy)
print('The SSR is', SSr)
print('The B is', B)
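For reference, the quantities computed in the code above correspond to the following standard least squares formulas (Sx, Sy and SSr in the code stand for S_xx, S_YY and SS_R). The last line is the usual t statistic for testing β = 0; it is stated here as a reminder and is not taken from the original solution.

```latex
\begin{align*}
S_{xx} &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2, &
S_{YY} &= \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2, &
S_{xY} &= \sum_{i=1}^{n} x_i Y_i - n\bar{x}\bar{Y}, \\
B &= \frac{S_{xY}}{S_{xx}}, &
SS_R &= \frac{S_{xx} S_{YY} - S_{xY}^2}{S_{xx}}, \\
TS &= \sqrt{\frac{(n-2)\,S_{xx}}{SS_R}}\;|B| ,
\end{align*}
```

Under H0: β = 0, the ratio sqrt((n-2) S_xx / SS_R) B has a t distribution with n-2 degrees of freedom, so H0 is rejected at level α when TS exceeds the upper α/2 critical value of that distribution.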
a Do the above data establish the hypothesis that a lawyer’s salary is related to his height? Use the 5 percent level of significance
print('The mean of x is', x_bar)
print('The mean of y is', y_bar)
# The sum of xy, x^2, y^2
xy_Sum = sum(xy)
xx_Sum = sum(xx)
yy_Sum = sum(yy)
print('The sum of xy is', xy_Sum)
print('The sum of x^2 is', xx_Sum)
print('The sum of y^2 is', yy_Sum)
# Calculate Sxx, Syy, Sxy, SSR, B
Sx = sum(xx) - n * x_bar * x_bar
Sy = sum(yy) - n * pow(y_bar, 2)
Sxy = sum(xy) - n * x_bar * y_bar
SSr = (Sx * Sy - Sxy * Sxy) / Sx
B = Sxy / Sx
print('The Sxx is', Sx)
print('The Syy is', Sy)
print('The Sxy is', Sxy)
print('The SSR is', SSr)
print('The B is', B)
a Draw a scatter diagram relating cigarette use and death rates from lung cancer
b Estimate the regression parameters α and β
c Test at the .05 level of significance the hypothesis that cigarette consumption does not affect the death rate from lung cancer
d What is the p-value of the test in part (c)?
print('The mean of x is', x_bar)
print('The mean of y is', y_bar)
# The sum of xy, x^2, y^2
xy_Sum = sum(xy)
xx_Sum = sum(xx)
yy_Sum = sum(yy)
print('The sum of xy is', xy_Sum)
print('The sum of x^2 is', xx_Sum)
print('The sum of y^2 is', yy_Sum)
# Calculate Sxx, Syy, Sxy, SSR, A, B
Sx = sum(xx) - n * x_bar * x_bar
Sy = sum(yy) - n * pow(y_bar, 2)
Sxy = sum(xy) - n * x_bar * y_bar
SSr = (Sx * Sy - Sxy * Sxy) / Sx
B = Sxy / Sx
A = y_bar - B * x_bar
print('The Sxx is', Sx)
print('The Syy is', Sy)
print('The Sxy is', Sxy)
print('The SSR is', SSr)
print('The A is', A)
print('The B is', B)
# Test hypothesis
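The hypothesis test itself is missing from the listing above. A minimal, self-contained sketch of how the test of β = 0 and its p-value could be computed follows. The data are hypothetical placeholders (not the cigarette data from the question), and the t statistic used is the standard one for simple linear regression.

```python
import numpy as np
from scipy import stats

# Hypothetical data (placeholders): x = cigarettes per person, y = death rate
x = np.array([3400, 1890, 2350, 2560, 2120, 2860, 3100, 1740], dtype=float)
y = np.array([24.0, 15.0, 18.0, 20.0, 16.0, 22.0, 23.0, 14.0])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()
Sxx = np.sum(x * x) - n * x_bar ** 2
Syy = np.sum(y * y) - n * y_bar ** 2
Sxy = np.sum(x * y) - n * x_bar * y_bar
B = Sxy / Sxx
SSR = (Sxx * Syy - Sxy ** 2) / Sxx

# Test H0: beta = 0 with the t statistic on n - 2 degrees of freedom
TS = np.sqrt((n - 2) * Sxx / SSR) * abs(B)
p_value = 2 * stats.t.sf(TS, n - 2)  # two-sided p-value

print('The test statistic is TS=', TS)
print('The p-value is', p_value)
print('Reject H0 at the .05 level:', p_value < 0.05)
```

The same p-value is returned by scipy.stats.linregress(x, y).pvalue, which can serve as a cross-check.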
b Estimate the regression parameters α and β.
A and B are the estimators of regression parameters
CHAPTER 10 – ANALYSIS OF VARIANCE
Question 1
A purification process for a chemical involves passing it, in solution, through a resin on which impurities are adsorbed. A chemical engineer wishing to test the efficiency of 3 different resins took a chemical solution and broke it into 15 batches. She tested each resin 5 times and then measured the concentration of impurities after passing through the resins. Her data were as follows:
Test the hypothesis that there is no difference in the efficiency of the resins
import numpy as np
import scipy.stats as stats
from scipy.stats import norm, t, chi2, f
print('The mean of ResinI is', x1_bar)
print('The mean of ResinII is', x2_bar)
print('The mean of ResinIII is', x3_bar)
# Numerator: n times the sample variance of the three sample means
x_bar = (x1_bar, x2_bar, x3_bar)
S2_bar = np.var(x_bar, ddof=1)
Numerator = n * S2_bar
print('The variance of three sample means is', S2_bar)
print('The numerator is Numerator=', Numerator)
# Denominator: the average of the three sample variances
Var1 = np.var(ResinI, ddof=1)
Var2 = np.var(ResinII, ddof=1)
Var3 = np.var(ResinIII, ddof=1)
print('The variance of ResinI is', Var1)
print('The variance of ResinII is', Var2)
print('The variance of ResinIII is', Var3)
Denominator = (Var1 + Var2 + Var3) / 3
TS = Numerator / Denominator
# Critical value of the F distribution at the 5 percent level
dfn, dfd = 3 - 1, 3 * (n - 1)
F_critical = stats.f.ppf(0.95, dfn, dfd)
print('The denominator is Denominator=', Denominator)
print('The test statistic is TS=', TS)
print('The F_critical is F_critical=', F_critical)
Output:
The mean of ResinI is 0.029000000000000005
The mean of ResinII is 0.027600000000000003
The mean of ResinIII is 0.030000000000000006
The variance of three sample means is 1.453333333333337e-06
The numerator is Numerator= 7.266666666666685e-06
The variance of ResinI is 0.00021749999999999997
The variance of ResinII is 0.0001123
The variance of ResinIII is 0.00011750000000000004
The denominator is Denominator= 0.0001491
The test statistic is TS= 0.04873686563827421
The F_critical is F_critical= 3.8852938346523933
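The procedure used above can be sketched end to end as follows. The measurements are hypothetical placeholders rather than the resin data, and scipy.stats.f_oneway is used only as a cross-check of the hand-computed statistic.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements (placeholders): 3 groups of 5 observations each
groups = [
    np.array([4.0, 5.0, 6.0, 5.0, 5.0]),
    np.array([6.0, 7.0, 5.0, 6.0, 6.0]),
    np.array([5.0, 4.0, 6.0, 5.0, 5.0]),
]
m = len(groups)     # number of groups
n = len(groups[0])  # observations per group (equal sizes assumed)

# Numerator: n times the sample variance of the group means
means = [g.mean() for g in groups]
numerator = n * np.var(means, ddof=1)
# Denominator: the average of the within-group sample variances
denominator = np.mean([np.var(g, ddof=1) for g in groups])
TS = numerator / denominator

# Reject H0 (equal means) when TS exceeds the F critical value
F_critical = stats.f.ppf(0.95, m - 1, m * (n - 1))
reject = TS > F_critical

# Cross-check against SciPy's one-way ANOVA
F_check = stats.f_oneway(*groups).statistic
print('TS =', TS, 'F_critical =', F_critical, 'reject H0:', reject)
```

For the resin output above, TS = 0.0487 is well below F_critical = 3.885, so by the same rule the hypothesis of equal resin efficiency would not be rejected at the 5 percent level.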
Do the ovens appear to operate at the same temperature? Test at the 5 percent level of significance. What is the p-value?
import numpy as np
import scipy.stats as stats
from scipy.stats import norm, t, chi2, f
print('The mean of TemI is', x1_bar)
print('The mean of TemII is', x2_bar)
print('The mean of TemIII is', x3_bar)
# Numerator: n times the sample variance of the three sample means
x_bar = (x1_bar, x2_bar, x3_bar)
S2_bar = np.var(x_bar, ddof=1)
Numerator = n * S2_bar
print('The variance of three sample means is', S2_bar)
print('The numerator is Numerator=', Numerator)
# Denominator: the average of the three sample variances
Var1 = np.var(TemI, ddof=1)
Var2 = np.var(TemII, ddof=1)
Var3 = np.var(TemIII, ddof=1)
print('The variance of TemI is', Var1)
print('The variance of TemII is', Var2)
print('The variance of TemIII is', Var3)
Denominator = (Var1 + Var2 + Var3) / 3
TS = Numerator / Denominator
# Critical value of the F distribution at the 5 percent level
dfn, dfd = 3 - 1, 3 * (n - 1)
F_critical = stats.f.ppf(0.95, dfn, dfd)
print('The denominator is Denominator=', Denominator)
print('The test statistic is TS=', TS)
print('The F_critical is F_critical=', F_critical)
Output:
The mean of TemI is 493.41999999999996
The mean of TemII is 482.64
The mean of TemIII is 494.71999999999997
The variance of three sample means is 43.97079999999984
The numerator is Numerator= 219.8539999999992
The variance of TemI is 12.61199999999996
The variance of TemII is 18.463000000000054
The variance of TemIII is 33.56200000000014
The denominator is Denominator= 21.54566666666672
The test statistic is TS= 10.20409363058304
The F_critical is F_critical= 3.8852938346523933
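The question also asks for the p-value, which the output above does not show. A short sketch, using the reported test statistic and assuming 2 and 12 degrees of freedom (3 ovens with 5 readings each, which is consistent with the reported critical value):

```python
from scipy import stats

TS = 10.20409363058304  # test statistic from the output above
dfn, dfd = 2, 12        # assumed degrees of freedom: 3 - 1 and 3 * (5 - 1)

# p-value: probability that an F(dfn, dfd) variable exceeds TS
p_value = stats.f.sf(TS, dfn, dfd)
print('The p-value is', p_value)
print('Reject H0 at the 5 percent level:', p_value < 0.05)
```

The p-value is about 0.0026, and TS = 10.204 exceeds F_critical = 3.885, so the hypothesis that the ovens operate at the same temperature is rejected at the 5 percent level.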
Do the data indicate that the procedures yield equivalent results?
Solution
Input:
import numpy as np
import pandas as pd
import scipy.stats as stats
from scipy.stats import norm, t, chi2, f
print('The mean of MethodI is', x1_bar)
print('The mean of MethodII is', x2_bar)
print('The mean of MethodIII is', x3_bar)
print('The mean of MethodIV is', x4_bar)
# Numerator: n times the sample variance of the four sample means
x_bar = (x1_bar, x2_bar, x3_bar, x4_bar)
S2_bar = np.var(x_bar, ddof=1)
Numerator = n * S2_bar
print('The variance of four sample means is', S2_bar)
print('The numerator is Numerator=', Numerator)
# Denominator: the average of the four sample variances
Var1 = np.var(MethodI, ddof=1)
Var2 = np.var(MethodII, ddof=1)
Var3 = np.var(MethodIII, ddof=1)
Var4 = np.var(MethodIV, ddof=1)
print('The variance of MethodI is', Var1)
print('The variance of MethodII is', Var2)
print('The variance of MethodIII is', Var3)
print('The variance of MethodIV is', Var4)
Denominator = (Var1 + Var2 + Var3 + Var4) / 4
TS = Numerator / Denominator
# Critical value of the F distribution at the 5 percent level
dfn, dfd = 4 - 1, 4 * (n - 1)
F_critical = stats.f.ppf(0.95, dfn, dfd)
print('The denominator is Denominator=', Denominator)
print('The test statistic is TS=', TS)
print('The F_critical is F_critical=', F_critical)
Output:
The mean of MethodI is 78.41000000000001
The mean of MethodII is 80.75500000000001
The mean of MethodIII is 76.50999999999999
The mean of MethodIV is 84.32000000000001
The variance of four sample means is 11.313539583333371
The numerator is Numerator= 45.254158333333486
The variance of MethodI is 2.6694666666666724
The variance of MethodII is 1.6527000000000047
The variance of MethodIII is 13.316666666666634
The variance of MethodIV is 6.581333333333327
The denominator is Denominator= 6.055041666666659
The test statistic is TS= 7.473798005794104
The F_critical is F_critical= 3.490294819497605
Test, at the 5 percent level of significance, the hypothesis that the two diets have equal effect.
import numpy as np
import scipy.stats as stats
from scipy.stats import norm, t, chi2, f
print('The mean of WeightI is', x1_bar)
print('The mean of WeightII is', x2_bar)
x_bar = (x1_bar, x2_bar)
S2_bar = np.var(x_bar, ddof=1)
Numerator = n * S2_bar
print('The variance of two sample means is', S2_bar)
print('The numerator is Numerator=', Numerator)
Var1 = np.var(WeightI, ddof=1)
Var2 = np.var(WeightII, ddof=1)
print('The variance of WeightI is', Var1)
print('The variance of WeightII is', Var2)
print('The denominator is Denominator=', Denominator)
print('The test statistic is TS=', TS)
print('The F_critical is F_critical=', F_critical)
Output:
The mean of WeightI is 17.5
The mean of WeightII is 17.869999999999997
The variance of two sample means is 0.06844999999999905
The numerator is Numerator= 0.6844999999999906
The variance of WeightI is 65.04000000000002
The variance of WeightII is 38.81344444444445
The denominator is Denominator= 58.585055555555556
The test statistic is TS= 0.011683867046108487
The F_critical is F_critical= 4.413873419170566
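With only two groups, the one-way ANOVA F test is equivalent to the pooled two-sample t-test, since the F statistic equals the square of the t statistic. The sketch below illustrates this on hypothetical weight gains (placeholders, not the diet data from the question).

```python
import numpy as np
from scipy import stats

# Hypothetical weight gains (placeholders) for two diets, 10 animals each
diet1 = np.array([14.0, 22.0, 9.0, 30.0, 17.0, 12.0, 25.0, 8.0, 19.0, 21.0])
diet2 = np.array([16.0, 24.0, 11.0, 27.0, 18.0, 10.0, 23.0, 13.0, 20.0, 15.0])

# One-way ANOVA with two groups
F_stat, p_anova = stats.f_oneway(diet1, diet2)
# Pooled (equal-variance) two-sample t-test
t_stat, p_ttest = stats.ttest_ind(diet1, diet2)

print('F statistic:', F_stat, 't statistic squared:', t_stat ** 2)
print('p-values:', p_anova, p_ttest)
```

Both tests return the same p-value, so either can be used for the two-diet comparison. For the output above, TS = 0.0117 is far below F_critical = 4.414, so the hypothesis of equal effect is not rejected at the 5 percent level.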
Test the hypothesis that the polymer performs equally well at all three temperatures. Use the
(a) 5 percent level of significance
(b) 1 percent level of significance
import numpy as np
import scipy.stats as stats
from scipy.stats import norm, t, chi2, f
print('The mean of low temperature is', x1_bar)
print('The mean of medium temperature is', x2_bar)
print('The mean of high temperature is', x3_bar)
# Numerator: n times the sample variance of the three sample means
x_bar = (x1_bar, x2_bar, x3_bar)
S2_bar = np.var(x_bar, ddof=1)
Numerator = n * S2_bar
print('The variance of three sample means is', S2_bar)
print('The numerator is Numerator=', Numerator)
# Denominator: the average of the three sample variances
Var1 = np.var(low, ddof=1)
Var2 = np.var(medium, ddof=1)
Var3 = np.var(high, ddof=1)
print('The variance of low temperature is', Var1)
print('The variance of medium temperature is', Var2)
print('The variance of high temperature is', Var3)
Denominator = (Var1 + Var2 + Var3) / 3
TS = Numerator / Denominator
# With alpha=0.05, the F-critical value is
dfn, dfd = 3 - 1, 3 * (n - 1)
F_critical = stats.f.ppf(0.95, dfn, dfd)
print('The denominator is Denominator=', Denominator)
print('The test statistic is TS=', TS)
print('The F_critical 0.05 is F_critical=', F_critical)
# With alpha=0.01, the F-critical value is
F_critical1 = stats.f.ppf(0.99, dfn, dfd)
print('The F_critical 0.01 is F_critical=', F_critical1)
Output:
The mean of low temperature is 36.57142857142857
The mean of medium temperature is 36.57142857142857
The mean of high temperature is 39.857142857142854
The variance of three sample means is 3.598639455782311
The numerator is Numerator= 25.190476190476176
The variance of low temperature is 23.61904761904762
The variance of medium temperature is 11.285714285714285
The variance of high temperature is 21.809523809523807
The test statistic is TS= 1.3324937027707804
The F_critical 0.05 is F_critical= 3.554557145661787
The F_critical 0.01 is F_critical= 6.012904834800529
Test
(a) 5 percent level of significance
Since TS = 1.3325 < F_critical = 3.5546, we cannot reject the null hypothesis.
(b) 1 percent level of significance
Since TS = 1.3325 < F_critical = 6.0129, we cannot reject the null hypothesis.
Question 19
A study has been made on pyrethrum flowers to determine the content of pyrethrin, a chemical used in insecticides. Four methods of extracting the chemical are used, and samples are obtained from flowers stored under three conditions: fresh flowers, flowers stored for 1 year, and flowers stored for 1 year but treated. It is assumed that there is no interaction present. The data are in Table 10.4.
Suggest a model for the preceding information, and use the data to estimate its parameters.
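Since no interaction is assumed, a natural candidate is the additive two-factor model X_ij = mu + alpha_i + beta_j + e_ij, where alpha_i is the effect of storage condition i, beta_j is the effect of extraction method j, and the effects sum to zero. The sketch below estimates these parameters from row and column means. Table 10.4 is not reproduced here, so the array holds hypothetical placeholder values, and the row/column layout is an assumption.

```python
import numpy as np

# Hypothetical pyrethrin contents (placeholders, NOT Table 10.4).
# Rows: 3 storage conditions; columns: 4 extraction methods.
X = np.array([
    [1.2, 1.0, 0.9, 1.1],
    [1.4, 1.3, 1.2, 1.3],
    [1.5, 1.4, 1.3, 1.4],
])

# Estimators for the additive model X_ij = mu + alpha_i + beta_j + e_ij
mu_hat = X.mean()                    # grand mean
alpha_hat = X.mean(axis=1) - mu_hat  # row (storage condition) effects
beta_hat = X.mean(axis=0) - mu_hat   # column (extraction method) effects

# Fitted values and residuals under the additive model
fitted = mu_hat + alpha_hat[:, None] + beta_hat[None, :]
residuals = X - fitted

print('The estimate of mu is', mu_hat)
print('The estimates of alpha are', alpha_hat)
print('The estimates of beta are', beta_hat)
```

By construction the estimated row effects and column effects each sum to zero, which matches the usual identifiability constraints of the model.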