
statistics hypothesis testing regression and analysis of variance


DOCUMENT INFORMATION

Basic information

Title: Hypothesis Testing, Regression and Analysis of Variance
Author: Trần Châu Thạnh An
Supervisor: Dr. Nguyen Minh Quan, Mr. Nguyen Minh Quan
University: Vietnam National University – Ho Chi Minh City
Major: Statistics
Type: Homework
Year of publication: 2023
City: Ho Chi Minh City
Pages: 28
File size: 560.22 KB

Content



VIETNAM NATIONAL UNIVERSITY – HO CHI MINH CITY

INTERNATIONAL UNIVERSITY

HOMEWORK 5 – STATISTICS

HYPOTHESIS TESTING, REGRESSION

AND ANALYSIS OF VARIANCE


Ho Chi Minh City, Vietnam


CHAPTER 8 - HYPOTHESIS TESTING

Question 38

To learn about the feeding habits of bats, 22 bats were tagged and tracked by radio. Of these 22 bats, 12 were female and 10 were male. The distances flown (in meters) between feedings were noted for each of the 22 bats, and the following summary statistics were obtained.

Test the hypothesis that the mean distance flown between feedings is the same for the populations of both male and female bats. Use the 5 percent level of significance.
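The summary statistics are not reproduced in this copy, so the following is only a sketch of how such a test could be set up. It assumes the two populations share a common unknown variance (pooled two-sample t test). The sample sizes 12 and 10 come from the question; the means and standard deviations are hypothetical placeholders, and the helper name `pooled_t_statistic` is introduced here for illustration.

```python
import math

def pooled_t_statistic(mean_x, sd_x, n, mean_y, sd_y, m):
    """Two-sample t statistic assuming equal (unknown) population variances."""
    # Pooled variance estimate with n + m - 2 degrees of freedom
    sp2 = ((n - 1) * sd_x**2 + (m - 1) * sd_y**2) / (n + m - 2)
    return (mean_x - mean_y) / math.sqrt(sp2 * (1 / n + 1 / m))

# Sample sizes come from the question; the means and standard deviations
# below are HYPOTHETICAL placeholders, not the textbook's summary statistics.
n_female, n_male = 12, 10
ts = pooled_t_statistic(200.0, 90.0, n_female, 240.0, 100.0, n_male)

T_CRITICAL = 2.086  # t_{0.025, 20} from a standard t table
reject = abs(ts) > T_CRITICAL
print('TS =', ts, '| reject H0:', reject)
```

With the real summary statistics substituted, the null hypothesis of equal means is rejected at the 5 percent level only when |TS| exceeds the critical value for 20 degrees of freedom.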


The mean weights of newborns are the same in both countries.

The p-value of this test:

a. 5 percent level of significance

b. 1 percent level of significance

Solution

Let X be the random variable that counts the number of calls that involve life-threatening emergencies.

We have

The size of testing sample: n = 200

The p-value of this test:

a. 5 percent level of significance

b. 1 percent level of significance

So we reject the null hypothesis at both levels of significance.

Question 58

A standard drug is known to be effective in 75 percent of the cases in which it is used to treat a certain infection. A new drug has been developed and has been found to be effective in 42 cases out of 50. Based on this, would you accept, at the 5 percent level of significance, the hypothesis that the two drugs are of equal effectiveness? What is the p-value?

Solution

We have
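The worked solution is cut off in this copy. As a hedged sketch, the test of H0: p = 0.75 against a two-sided alternative can be carried out with the normal approximation to the binomial (no continuity correction, which is an assumption here; an exact binomial calculation would give a slightly different p-value). The numbers 0.75, 50 and 42 come from the question.

```python
import math
from statistics import NormalDist

p0 = 0.75      # effectiveness of the standard drug (from the question)
n = 50         # number of cases treated with the new drug
x = 42         # number of cases in which the new drug was effective

# Normal approximation to the binomial under H0: p = p0
z = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided test

print('z =', z)
print('p-value =', p_value)
print('Reject H0 at the 5 percent level:', p_value < 0.05)
```

Under this approximation the p-value is well above 0.05, so the hypothesis of equal effectiveness would not be rejected at the 5 percent level.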


CHAPTER 9: REGRESSION

Question 1

The following data relate x, the moisture of a wet mix of a certain product, to Y, the density of the finished product.

a. Draw a scatter diagram.

The estimated regression line : y =


Question 3

The corrosion of a certain metallic substance has been studied in dry oxygen at 500 degrees centigrade. In this experiment, the gain in weight after various periods of exposure was used as a measure of the amount of oxygen that had reacted with the sample. Here are the data:

a. Plot a scatter diagram.

b. Fit a linear relation.

c. Predict the percent weight gain when the metal is exposed for 3.2 hours.

The slope is 0.011742857142857138 and the y-intercept is 0.007185714285714305

The estimated regression line is y = 0.00719 + 0.01174x

c. Predict the percent weight gain when the metal is exposed for 3.2 hours (x = 3.2).
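The numerical prediction is missing from this copy. A minimal sketch that evaluates the fitted line at x = 3.2, using only the slope and y-intercept reported in the output above (the helper name `predict` is introduced here for illustration):

```python
# Coefficients as reported in the output above
slope = 0.011742857142857138
intercept = 0.007185714285714305

def predict(x):
    """Evaluate the fitted regression line at x."""
    return intercept + slope * x

y_hat = predict(3.2)
print('Predicted weight gain at x = 3.2:', y_hat)
```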


Question 4

The following data indicate the relationship between x, the specific gravity of a wood sample, and Y, its maximum crushing strength in compression parallel to the grain.

a. Plot a scatter diagram. Does a linear relationship seem reasonable?

b. Estimate the regression coefficients.

c. Predict the maximum crushing strength of a wood sample whose specific gravity is 0.43.

A linear relationship seems unreasonable

b Estimate the regression coefficients

The estimated regression line is

A and B are the estimators of regression parameters
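For reference, the least squares estimators can be written as follows. This is the standard form and matches the quantities computed as `Sxy/Sx` and `y_bar-B*x_bar` in the code for the later questions; the subscript notation is chosen here for readability.

```latex
B = \frac{S_{xY}}{S_{xx}}, \qquad A = \bar{Y} - B\,\bar{x},
\qquad
S_{xY} = \sum_{i=1}^{n} x_i Y_i - n\,\bar{x}\,\bar{Y}, \qquad
S_{xx} = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2 .
```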


c Predict the maximum crushing strength of a wood sample whose specific gravity is 0.43 (x=0.43)

Question 5

The following data indicate the gain in reading speed versus the number of weeks in the program of 10 students in a speed-reading program.

a. Plot a scatter diagram to see if a linear relationship is indicated.

b. Find the least squares estimates of the regression coefficients.

c. Estimate the expected gain of a student who plans to take the program for 7 weeks.

Output: The slope is 11.796897038081807 and the y-intercept is 2.638928067700968

The estimated regression line is y = 2.639 + 11.797x

b Find the least squares estimates of the regression coefficients


c Estimate the expected gain of a student who plans to take the program for 7 weeks (x=7)

Question 6

Infrared spectroscopy is often used to determine the natural rubber content of mixtures of natural and synthetic rubber. For mixtures of known percentages, the infrared spectroscopy gave the following readings:

print('The slope is', b1, 'and the y-intercept is', b0)

plt.show()

Output: The slope is 0.007025714285714285 and the y-intercept is 0.7497142857142856

The estimated regression line is y = 0.7497 + 0.007026x

The estimated percentage of natural rubber when a new mixture gives an infrared spectroscopy reading of 1.15 (y = 1.15):
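The final number is missing from this copy. The sketch below assumes, as the "y = 1.15" above suggests, that x is the percentage of natural rubber and y is the spectroscopy reading, so the estimate is obtained by solving the fitted line for x. Only the reported slope and y-intercept are used.

```python
# Coefficients as reported in the output above
slope = 0.007025714285714285
intercept = 0.7497142857142856

reading = 1.15   # new infrared spectroscopy reading (y)

# Invert y = intercept + slope * x to recover the percentage x
percentage = (reading - intercept) / slope
print('Estimated percentage of natural rubber:', percentage)
```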

Question 10

Verify that

Solution

Question 11

The following table relates the number of sunspots that appeared each year from 1970 to 1983 to the number of auto accident deaths during that year. Test the hypothesis that the number of auto deaths is not affected by the number of sunspots. (The sunspot data are from Jastrow and Thompson, Fundamentals and Frontiers of Astronomy, and the auto death data are from General Statistics of the U.S. 1985.)


print ( 'The mean of x is' ,x_bar)

print ( 'The mean of y is' , y_bar)

# The sums of xy, x^2, y^2

xy_Sum= sum (xy)

xx_Sum= sum (xx)

yy_Sum= sum (yy)

print ( 'The sum of xy is' , xy_Sum)

print ( 'The sum of x^2 is' , xx_Sum)

print ( 'The sum of y^2 is' , yy_Sum)

# Calculate Sxx, Syy, Sxy, SSR, B

Sx= sum (xx)-n*x_bar*x_bar

Sy= sum (yy) - n* pow (y_bar, 2 )

Sxy= sum (xy)-n*x_bar*y_bar

SSr = (Sx*Sy-Sxy*Sxy)/Sx

B=Sxy/Sx

print ( 'The Sxx is' , Sx)

print ( 'The Syy is' , Sy)

print ( 'The Sxy is' , Sxy)

print ( 'The SSR is' , SSr)

print ( 'The B is' , B)
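The data table and the final test step are not included in this copy. The sketch below runs the same quantities end to end on hypothetical data and then forms the usual statistic for H0: β = 0, which under H0 follows a t distribution with n − 2 degrees of freedom. The five data points are placeholders, not the sunspot table, and the critical value is taken from a standard t table.

```python
import math

# HYPOTHETICAL data for illustration only (not the sunspot table)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

Sxx = sum(xi * xi for xi in x) - n * x_bar**2
Syy = sum(yi * yi for yi in y) - n * y_bar**2
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

B = Sxy / Sxx                          # slope estimator
SSR = (Sxx * Syy - Sxy**2) / Sxx       # sum of squared residuals

# Under H0: beta = 0, TS follows a t distribution with n - 2 degrees of freedom
TS = math.sqrt((n - 2) * Sxx / SSR) * abs(B)

T_CRITICAL = 3.182   # t_{0.025, 3} from a standard t table (n - 2 = 3 here)
print('B =', B, 'SSR =', SSR, 'TS =', TS)
print('Reject H0 at the 5 percent level:', TS > T_CRITICAL)
```

For the 14 years of sunspot data the same comparison would use n − 2 = 12 degrees of freedom.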


a. Do the above data establish the hypothesis that a lawyer’s salary is related to his height? Use the 5 percent level of significance.

print ( 'The mean of x is' ,x_bar)

print ( 'The mean of y is' , y_bar)

# The sums of xy, x^2, y^2

xy_Sum= sum (xy)

xx_Sum= sum (xx)

yy_Sum= sum (yy)

print ( 'The sum of xy is' , xy_Sum)

print ( 'The sum of x^2 is' , xx_Sum)

print ( 'The sum of y^2 is' , yy_Sum)

# Calculate Sxx, Syy, Sxy, SSR, B

Sx= sum (xx)-n*x_bar*x_bar

Sy= sum (yy) - n* pow (y_bar, 2 )

Sxy= sum (xy)-n*x_bar*y_bar

SSr = (Sx*Sy-Sxy*Sxy)/Sx

B=Sxy/Sx

print ( 'The Sxx is' , Sx)

print ( 'The Syy is' , Sy)

print ( 'The Sxy is' , Sxy)

print ( 'The SSR is' , SSr)

print ( 'The B is' , B)


a. Draw a scatter diagram relating cigarette use and death rates from lung cancer.

b. Estimate the regression parameters α and β.

c. Test at the .05 level of significance the hypothesis that cigarette consumption does not affect the death rate from lung cancer.

d. What is the p-value of the test in part (c)?

print ( 'The mean of x is' ,x_bar)

print ( 'The mean of y is' , y_bar)

# The sums of xy, x^2, y^2

xy_Sum= sum (xy)

xx_Sum= sum (xx)

yy_Sum= sum (yy)

print ( 'The sum of xy is' , xy_Sum)

print ( 'The sum of x^2 is' , xx_Sum)

print ( 'The sum of y^2 is' , yy_Sum)

# Calculate Sxx, Syy, Sxy, SSR, A, B

Sx= sum (xx)-n*x_bar*x_bar

Sy= sum (yy) - n* pow (y_bar, 2 )

Sxy= sum (xy)-n*x_bar*y_bar

SSr = (Sx*Sy-Sxy*Sxy)/Sx

B=Sxy/Sx

A=y_bar-B*x_bar

print ( 'The Sxx is' , Sx)

print ( 'The Syy is' , Sy)

print ( 'The Sxy is' , Sxy)

print ( 'The SSR is' , SSr)

print ( 'The A is' , A)

print ( 'The B is' , B)

# Test hypothesis


b Estimate the regression parameters α and β.

A and B are the estimators of regression parameters


CHAPTER 10 – ANALYSIS OF VARIANCE

Question 1

A purification process for a chemical involves passing it, in solution, through a resin on which impurities are adsorbed. A chemical engineer wishing to test the efficiency of 3 different resins took a chemical solution and broke it into 15 batches. She tested each resin 5 times and then measured the concentration of impurities after passing through the resins. Her data were as follows:

Test the hypothesis that there is no difference in the efficiency of the resins.

import numpy as np

import scipy.stats as stats

from scipy.stats import norm, t, chi2, f

print ( 'The mean of ResinI is' ,x1_bar)

print ( 'The mean of ResinII is' ,x2_bar)

print ( 'The mean of ResinIII is' ,x3_bar)

x_bar=(x1_bar, x2_bar, x3_bar)

S2_bar=np.var(x_bar,ddof= 1 )

Numerator=n*S2_bar

print ( 'The variance of three sample means is' , S2_bar)

print ( 'The numerator is Numerator=' , Numerator)

Var1=np.var(ResinI,ddof= 1 )

Var2=np.var(ResinII,ddof= 1 )

Var3=np.var(ResinIII,ddof= 1 )

print ( 'The variance of ResinI is' ,Var1)

print ( 'The variance of ResinII is' ,Var2)

print ( 'The variance of ResinIII is' ,Var3)

print ( 'The denominator is Denominator=' , Denominator)

print ( 'The test statistic is TS=' , TS)

print ( 'The F_critical is F_critical=' , F_critical)

Output:

The mean of ResinI is 0.029000000000000005

The mean of ResinII is 0.027600000000000003

The mean of ResinIII is 0.030000000000000006

The variance of three sample means is 1.453333333333337e-06

The numerator is Numerator= 7.266666666666685e-06

The variance of ResinI is 0.00021749999999999997

The variance of ResinII is 0.0001123

The variance of ResinIII is 0.00011750000000000004

The denominator is Denominator= 0.0001491

The test statistic is TS= 0.04873686563827421

The F_critical is F_critical= 3.8852938346523933
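The lines that define `Denominator`, `TS` and `F_critical` are not visible in this copy. The sketch below is one way the printed numbers fit together, assuming equal sample sizes: the numerator is n times the sample variance of the group means, and the denominator is the average of the within-group sample variances. It uses only the values printed above and the standard library.

```python
from statistics import mean, variance

n = 5   # observations per resin (from the question)
means = [0.029000000000000005, 0.027600000000000003, 0.030000000000000006]
variances = [0.00021749999999999997, 0.0001123, 0.00011750000000000004]

numerator = n * variance(means)      # between-samples estimate of sigma^2
denominator = mean(variances)        # within-samples estimate of sigma^2
TS = numerator / denominator

F_CRITICAL = 3.8852938346523933      # F_{0.05; 2, 12}, as printed above
print('TS =', TS)
print('Reject H0 at the 5 percent level:', TS > F_CRITICAL)
```

Since TS is far below the critical value, the hypothesis that the three resins are equally efficient is not rejected at the 5 percent level.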


Do the ovens appear to operate at the same temperature? Test at the 5 percent level of significance. What is the p-value?

import numpy as np

import scipy.stats as stats

from scipy.stats import norm, t, chi2, f

print ( 'The mean of TemI is' ,x1_bar)

print ( 'The mean of TemII is' ,x2_bar)

print ( 'The mean of TemIII is' ,x3_bar)

x_bar=(x1_bar, x2_bar, x3_bar)

S2_bar=np.var(x_bar,ddof= 1 )

Numerator=n*S2_bar

print ( 'The variance of three sample means is' , S2_bar)

print ( 'The numerator is Numerator=' , Numerator)

Var1=np.var(TemI,ddof= 1 )

Var2=np.var(TemII,ddof= 1 )

Var3=np.var(TemIII,ddof= 1 )

print ( 'The variance of TemI is' ,Var1)

print ( 'The variance of TemII is' ,Var2)

print ( 'The variance of TemIII is' ,Var3)


print ( 'The denominator is Denominator=' , Denominator)

print ( 'The test statistic is TS=' , TS)

print ( 'The F_critical is F_critical=' , F_critical)

The mean of TemI is 493.41999999999996

The mean of TemII is 482.64

The mean of TemIII is 494.71999999999997

The variance of three sample means is 43.97079999999984

The numerator is Numerator= 219.8539999999992

The variance of TemI is 12.61199999999996

The variance of TemII is 18.463000000000054

The variance of TemIII is 33.56200000000014

The denominator is Denominator= 21.54566666666672

The test statistic is TS= 10.20409363058304

The F_critical is F_critical= 3.8852938346523933
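The question also asks for the p-value, which does not appear in this copy. When the numerator has 2 degrees of freedom, the upper tail of the F distribution has the closed form P(F > x) = (1 + 2x/dfd)^(−dfd/2), so the p-value can be sketched without any statistics library. The degrees of freedom (2, 12) are inferred from the printed critical value and from Numerator being 5 times the variance of the means; the helper name `f_upper_tail_dfn2` is introduced here for illustration.

```python
def f_upper_tail_dfn2(x, dfd):
    """P(F > x) for an F distribution with 2 and dfd degrees of freedom."""
    return (1 + 2 * x / dfd) ** (-dfd / 2)

TS = 10.20409363058304            # test statistic printed above
F_CRITICAL = 3.8852938346523933   # critical value printed above
dfd = 12                          # 3 ovens x (5 - 1) readings (inferred)

p_value = f_upper_tail_dfn2(TS, dfd)
print('p-value =', p_value)

# Sanity check: the tail area beyond the 5 percent critical value is 0.05
print('Tail beyond F_critical =', f_upper_tail_dfn2(F_CRITICAL, dfd))
```

The test statistic exceeds the critical value and the p-value is well below 0.05, so the hypothesis that the ovens operate at the same temperature is rejected at the 5 percent level.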

Do the data indicate that the procedures yield equivalent results?

Solution


Input:

import numpy as np

import pandas as pd

import scipy.stats as stats

from scipy.stats import norm, t, chi2, f

print ( 'The mean of MethodI is' ,x1_bar)

print ( 'The mean of MethodII is' ,x2_bar)

print ( 'The mean of MethodIII is' ,x3_bar)

print ( 'The mean of MethodIV is' ,x4_bar)

x_bar=(x1_bar, x2_bar, x3_bar, x4_bar)

S2_bar=np.var(x_bar,ddof= 1 )

Numerator=n*S2_bar

print ( 'The variance of four sample means is' , S2_bar)

print ( 'The numerator is Numerator=' , Numerator)

Var1=np.var(MethodI,ddof= 1 )

Var2=np.var(MethodII,ddof= 1 )

Var3=np.var(MethodIII,ddof= 1 )

Var4=np.var(MethodIV,ddof= 1 )

print ( 'The variance of MethodI is' ,Var1)

print ( 'The variance of MethodII is' ,Var2)

print ( 'The variance of MethodIII is' ,Var3)

print ( 'The variance of MethodIV is' ,Var4)

print ( 'The denominator is Denominator=' , Denominator)

print ( 'The test statistic is TS=' , TS)


print ( 'The F_critical is F_critical=' , F_critical)

The mean of MethodI is 78.41000000000001

The mean of MethodII is 80.75500000000001

The mean of MethodIII is 76.50999999999999

The mean of MethodIV is 84.32000000000001

The variance of four sample means is 11.313539583333371

The numerator is Numerator= 45.254158333333486

The variance of MethodI is 2.6694666666666724

The variance of MethodII is 1.6527000000000047

The variance of MethodIII is 13.316666666666634

The variance of MethodIV is 6.581333333333327

The denominator is Denominator= 6.055041666666659

The test statistic is TS= 7.473798005794104

The F_critical is F_critical= 3.490294819497605

Test, at the 5 percent level of significance, the hypothesis that the two diets have equal effect.


import numpy as np

import scipy.stats as stats

from scipy.stats import norm, t, chi2, f

print ( 'The mean of WeightI is' ,x1_bar)

print ( 'The mean of WeightII is' ,x2_bar)

x_bar=(x1_bar, x2_bar)

S2_bar=np.var(x_bar,ddof= 1 )

Numerator=n*S2_bar

print ( 'The variance of two sample means is' , S2_bar)

print ( 'The numerator is Numerator=' , Numerator)

Var1=np.var(WeightI,ddof= 1 )

Var2=np.var(WeightII,ddof= 1 )

print ( 'The variance of WeightI is' ,Var1)

print ( 'The variance of WeightII is' ,Var2)

print ( 'The denominator is Denominator=' , Denominator)

print ( 'The test statistic is TS=' , TS)

print ( 'The F_critical is F_critical=' , F_critical)


The mean of WeightI is 17.5

The mean of WeightII is 17.869999999999997

The variance of two sample means is 0.06844999999999905

The numerator is Numerator= 0.6844999999999906

The variance of WeightI is 65.04000000000002

The variance of WeightII is 38.81344444444445

The denominator is Denominator= 58.585055555555556

The test statistic is TS= 0.011683867046108487

The F_critical is F_critical= 4.413873419170566

Test the hypothesis that the polymer performs equally well at all three temperatures. Use the

(a) 5 percent level of significance

(b) 1 percent level of significance

import numpy as np

import scipy.stats as stats

from scipy.stats import norm, t, chi2, f

print ( 'The mean of low temperature is' ,x1_bar)

print ( 'The mean of medium temperature is' ,x2_bar)

print ( 'The mean of high temperature is' ,x3_bar)

x_bar=(x1_bar, x2_bar, x3_bar)

S2_bar=np.var(x_bar,ddof= 1 )

Numerator=n*S2_bar

print ( 'The variance of three sample means is' , S2_bar)

print ( 'The numerator is Numerator=' , Numerator)

Var1=np.var(low,ddof= 1 )

Var2=np.var(medium,ddof= 1 )

Var3=np.var(high,ddof= 1 )

print ( 'The variance of low temperature is' ,Var1)

print ( 'The variance of medium temperature is' ,Var2)

print ( 'The variance of high temperature is' ,Var3)

print ( 'The denominator is Denominator=' , Denominator)

print ( 'The test statistic is TS=' , TS)

print ( 'The F_critical 0.05 is F_critical=' , F_critical)

# With alpha=0.01, the F-critical value is

F_critical1=stats.f.ppf( 0.99 , dfn, dfd)

print ( 'The F_critical 0.01 is F_critical=' , F_critical1)

Output:

The mean of low temperature is 36.57142857142857

The mean of medium temperature is 36.57142857142857

The mean of high temperature is 39.857142857142854

The variance of three sample means is 3.598639455782311

The numerator is Numerator= 25.190476190476176

The variance of low temperature is 23.61904761904762

The variance of medium temperature is 11.285714285714285

The variance of high temperature is 21.809523809523807


The test statistic is TS= 1.3324937027707804

The F_critical 0.05 is F_critical= 3.554557145661787

The F_critical 0.01 is F_critical= 6.012904834800529

Test

(a) 5 percent level of significance: TS = 1.3325 < F_critical = 3.5546

Thus, we cannot reject the null hypothesis.

(b) 1 percent level of significance: TS = 1.3325 < F_critical = 6.0129

Thus, we cannot reject the null hypothesis.

Question 19

A study has been made on pyrethrum flowers to determine the content of pyrethrin, a chemical used in insecticides. Four methods of extracting the chemical are used, and samples are obtained from flowers stored under three conditions: fresh flowers, flowers stored for 1 year, and flowers stored for 1 year but treated. It is assumed that there is no interaction present. The data are in Table 10.4.

Suggest a model for the preceding information, and use the data to estimate its parameters.
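Table 10.4 and the solution are not part of this copy. With no interaction, a natural candidate is the additive two-factor model X_ij = μ + α_i + β_j + e_ij, with the α_i (storage conditions) and β_j (extraction methods) each summing to zero. Under that assumption the parameters are estimated by the grand mean and by the deviations of the row and column means from it. The table in the sketch is a hypothetical placeholder, not the textbook data.

```python
from statistics import mean

# HYPOTHETICAL 3 x 4 table (rows: storage conditions, columns: extraction
# methods); these numbers are placeholders, not Table 10.4.
X = [
    [1.35, 1.13, 1.06, 0.98],
    [1.40, 1.23, 1.26, 1.22],
    [1.49, 1.46, 1.40, 1.35],
]

grand_mean = mean(v for row in X for v in row)        # estimate of mu
row_means = [mean(row) for row in X]
col_means = [mean(col) for col in zip(*X)]

alpha_hat = [r - grand_mean for r in row_means]       # storage-condition effects
beta_hat = [c - grand_mean for c in col_means]        # extraction-method effects

print('mu_hat =', grand_mean)
print('alpha_hat =', alpha_hat)
print('beta_hat =', beta_hat)
```

By construction the estimated row effects and the estimated column effects each sum to zero, matching the constraints of the model.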

Posted: 24/07/2024, 16:05