Report Assignment: Probability and Statistics (One-Way ANOVA)

VIET NAM NATIONAL UNIVERSITY HCMC
UNIVERSITY OF TECHNOLOGY
DEPARTMENT OF CHEMICAL ENGINEERING

Report Assignment
PROBABILITY AND STATISTICS

Lecturer: PhD Nguyễn Tiến Dũng
CC02 – Group 09

Team members

| No | Name | Student ID | Sign |
|---|---|---|---|
| 1 | Nguyễn Gia Phát | 2152228 | |
| 2 | Nguyễn Trọng Nguyên | 2152197 | |
| 3 | Lê Nguyễn Phú Anh | 2152383 | |
| 4 | Nguyễn Quốc Hưng | 2153411 | |
| 5 | Đỗ Tấn Kiệt | 1852490 | |

Ho Chi Minh City, Sunday, 4 December 2022

TABLE OF CONTENTS

I Topic
II Theoretical basis
   2.1 One-way ANOVA
   2.2 Two-way ANOVA
   2.3 Prediction model - Multiple Linear Regression
III Data processing
   1. Data import
   Checking statistics values
   Data visualization
   Building a linear regression model
   Make forecasts for the compressive strength of concrete
REFERENCES
Contribution of team members
Points

I Topic

Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. The file “concrete.csv” contains information about how the compressive strength of concrete is affected by several variables. The data set was taken from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

The data set contains 1030 instances of the compressive strength of concrete and its attributes. Main variables in the dataset:

• Cement – quantitative – kg in a m³ mixture – Input Variable
• Blast Furnace Slag – quantitative – kg in a m³ mixture – Input Variable
• Fly Ash – quantitative – kg in a m³ mixture – Input Variable
• Water – quantitative – kg in a m³ mixture – Input Variable
• Superplasticizer – quantitative – kg in a m³ mixture – Input Variable
• Coarse Aggregate – quantitative – kg in a m³ mixture – Input Variable
• Fine Aggregate – quantitative – kg in a m³ mixture – Input Variable
• Age – quantitative – day (1–365) – Input Variable
• Concrete compressive strength – quantitative – MPa – Output Variable

The purpose of our team is to test whether the linear regression model
between the concrete compressive strength and the input variables really exists; if it does, we make a forecast based on the data in the file “concrete.csv” and use ANOVA to analyze the influence of each variable.

II Theoretical basis

2.1 One-way ANOVA

One-way ANOVA is a hypothesis test that uses variances to test the equality of three or more population means simultaneously.

For example: In one laboratory, a team studied whether changes in CO2 concentration affected the germination rate of soybean seeds by gradually increasing the CO2 concentration and recording the height of the bean sprouts after a given number of days.

• Statistical problem: Comparing the mean heights between the CO2 concentration groups.

Assumptions for using one-way ANOVA:

• The populations are normally distributed. To test normality, we use the normal probability plot of the residuals (mentioned in Prediction model).
• The samples are random and independent.
• The populations have equal variances.

An observed dataset can be generalized as in the table below:

| Treatment | Observations | Totals | Averages |
|---|---|---|---|
| 1 | $y_{11}, y_{12}, \ldots, y_{1n}$ | $y_{1\cdot}$ | $\bar{y}_{1\cdot}$ |
| 2 | $y_{21}, y_{22}, \ldots, y_{2n}$ | $y_{2\cdot}$ | $\bar{y}_{2\cdot}$ |
| … | … | … | … |
| a | $y_{a1}, y_{a2}, \ldots, y_{an}$ | $y_{a\cdot}$ | $\bar{y}_{a\cdot}$ |
| | | $y_{\cdot\cdot}$ | $\bar{y}_{\cdot\cdot}$ |

where $y_{i\cdot} = \sum_{j=1}^{n} y_{ij}$, $\bar{y}_{i\cdot} = y_{i\cdot}/n$, $y_{\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}$ and $\bar{y}_{\cdot\cdot} = y_{\cdot\cdot}/(an)$.

Model considered:

$$Y_{ij} = \mu + \tau_i + \epsilon_{ij} \quad (i = 1, 2, \ldots, a;\; j = 1, 2, \ldots, n)$$

• Where: $\mu$ is the overall mean, $\tau_i$ is the $i$th treatment effect, and $\epsilon_{ij}$ is the random error component.

Null and alternative hypotheses:

$$\begin{cases} H_0: \tau_1 = \tau_2 = \ldots = \tau_a = 0 \\ H_1: \tau_i \neq 0 \text{ for at least one } i \end{cases}$$

| Source | Sum of squares (SS) | Degrees of freedom (df) | Mean square (MS) |
|---|---|---|---|
| Treatment | $SS_{\text{treatment}} = n\sum_{i=1}^{a}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$ | $a-1$ | $MS_{\text{treatment}} = SS_{\text{treatment}}/(a-1)$ |
| Error | $SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i\cdot})^2$ | $a(n-1)$ | $MS_E = SS_E/[a(n-1)]$ |
| Total | $SS_T = SS_{\text{treatment}} + SS_E$ | $an-1$ | |

Test statistic:

$$F_0 = \frac{MS_{\text{treatment}}}{MS_E} = \frac{SS_{\text{treatment}}/(a-1)}{SS_E/[a(n-1)]}$$

• $F_0$ has a Fisher distribution with $a-1$ and $a(n-1)$ degrees of freedom: $F_0 \sim F_{a-1,\,a(n-1)}$.
• Given $\alpha$, $H_0$ is rejected if $f_0 > f_{a-1,\,a(n-1),\,\alpha}$.

2.2 Two-way ANOVA

Two-way ANOVA is a statistical technique that
is used to examine the effect of two factors on a continuous dependent variable. It also studies how the interrelationship between the two independent variables influences the values of the dependent one.

For example: In an arithmetic test, several male and female students of different ages participated, and the exam results were recorded. In this case, two-way ANOVA could be used to determine whether gender and age affected the scores.

• Statistical problem: Comparing the mean scores across genders and ages.

Assumptions for using two-way ANOVA are similar to those for one-way ANOVA (section 2.1).

The dataset for two-way ANOVA can be generalized as follows:

| Factor 2 \ Factor 1 | 1 | 2 | … | K |
|---|---|---|---|---|
| 1 | $X_{11}$ | $X_{21}$ | … | $X_{K1}$ |
| 2 | $X_{12}$ | $X_{22}$ | … | $X_{K2}$ |
| … | … | … | … | … |
| H | $X_{1H}$ | $X_{2H}$ | … | $X_{KH}$ |

The mean values:

• Mean of each column: $\bar{X}_{i\cdot} = \frac{1}{H}\sum_{j=1}^{H} X_{ij}$, $i = 1, 2, \ldots, K$
• Mean of each row: $\bar{X}_{\cdot j} = \frac{1}{K}\sum_{i=1}^{K} X_{ij}$, $j = 1, 2, \ldots, H$
• Total mean: $\bar{X} = \frac{1}{KH}\sum_{i=1}^{K}\sum_{j=1}^{H} X_{ij} = \frac{1}{K}\sum_{i=1}^{K}\bar{X}_{i\cdot} = \frac{1}{H}\sum_{j=1}^{H}\bar{X}_{\cdot j}$

Variance analysis factors:

| Source | Sum of squares | Degrees of freedom | Mean square | F-ratio |
|---|---|---|---|---|
| Group i | $SS_K = H\sum_{i=1}^{K}(\bar{X}_{i\cdot} - \bar{X})^2$ | $K-1$ | $MS_K = SS_K/(K-1)$ | $F_1 = MS_K/MS_E$ |
| Group j | $SS_H = K\sum_{j=1}^{H}(\bar{X}_{\cdot j} - \bar{X})^2$ | $H-1$ | $MS_H = SS_H/(H-1)$ | $F_2 = MS_H/MS_E$ |
| Error | $SS_E = SS_T - SS_K - SS_H$ | $(H-1)(K-1)$ | $MS_E = SS_E/[(H-1)(K-1)]$ | |
| Total | $SS_T = \sum_{i=1}^{K}\sum_{j=1}^{H}(X_{ij} - \bar{X})^2$ | $KH-1$ | | |

| Factor | $H_0$ | $H_1$ | Given $\alpha$, reject $H_0$ if |
|---|---|---|---|
| Factor 1 | No difference in the means of groups i | At least one difference in the means of groups i | $f_1 > f_{K-1,\,(K-1)(H-1),\,\alpha}$ |
| Factor 2 | No difference in the means of groups j | At least one difference in the means of groups j | $f_2 > f_{H-1,\,(K-1)(H-1),\,\alpha}$ |

2.3 Prediction model - Multiple Linear Regression

Regression analysis is the collection of statistical tools used to model and explore relationships between variables that are related in a non-deterministic manner. Multiple linear regression is a key technique for studying the linear dependence of one dependent variable on a group of independent variables. The general formula for multiple linear regression can be expressed as:

$$Y = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k + \epsilon$$

• $\beta_0, \beta_1, \ldots, \beta_k$ are the regression coefficients. Each slope parameter $\beta_j$ ($j \geq 1$) represents the change in the mean response, $E(y)$, per unit increase in the associated predictor variable when all the other predictors are held constant.
• $\epsilon$ is called the random error and follows $N(0, \sigma^2)$.

Assumptions of the multiple linear regression model:

• A linear relationship between the dependent and independent variables (this can be checked with a scatter diagram). Note that in some cases the independent variables are not in a compatible format or do not show a linear relationship; a data transformation can then be used to make them fit better.
• The independent variables are not highly correlated with each other.
• The variance of the residuals is constant.
• Independence of observations.
• Multivariate normality (this holds when the residuals are normally distributed).

Predicted values and residuals:

• A predicted value is calculated as $\hat{y}_i = b_0 + b_1 x_1 + \ldots + b_k x_k$, where the $b$ values come from statistical software and the $x$ values are specified by us.
• A residual (error) term is calculated as $e_i = y_i - \hat{y}_i$, the difference between an actual and a predicted value of $y$.

We see that in the original form it is very difficult to see the linearity (specifically, the covariance) between the two variables cement and csMPa, whereas after converting to $\log(x+1)$ form the linearity between the two variables is much easier to see, although still a bit uncertain. We will check the next variables.

Draw scatter plots to display how the csMPa variable is distributed in relation to the other variables, both before and after the transformation to $\log(x+1)$ form.

Figure 11: R code and results when plotting the scatter plot showing the distribution of the csMPa variable according to slag before and after the transformation to $\log(x+1)$ form

Figure 11: R code and results when plotting the scatter plot showing the distribution of the csMPa variable according to flyash before and after the transformation to $\log(x+1)$ form
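
The effect of the log(x+1) transformation described above can also be illustrated numerically. The sketch below is not the report's R code: it is a small Python example on hypothetical toy data (not taken from “concrete.csv”), and it uses the Pearson correlation coefficient as a stand-in for the visual inspection of scatter plots that the report relies on. The helper names `pearson_r` and `log1p_transform` are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log1p_transform(values):
    """Apply the log(x + 1) transformation element-wise."""
    return [math.log1p(v) for v in values]


# Hypothetical toy data (NOT from concrete.csv): y + 1 = (x + 1)^2, so the
# relationship is curved on the raw scale but exactly linear after log(x + 1).
x = [0.0, 1.0, 3.0, 7.0, 15.0, 31.0, 63.0]
y = [(v + 1.0) ** 2 - 1.0 for v in x]

r_raw = pearson_r(x, y)
r_log = pearson_r(log1p_transform(x), log1p_transform(y))

print(f"correlation on the raw scale:     {r_raw:.4f}")
print(f"correlation after log(x+1) scale: {r_log:.4f}")
```

On real data the improvement is usually less clear-cut than in this constructed example, which is why the scatter plots before and after the transformation are still worth inspecting.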

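
The one-way ANOVA test statistic from section 2.1 can be sketched in a few lines of code. This is a minimal Python illustration under stated assumptions, not the report's own R analysis: the function name `one_way_anova` and the sprout-height numbers are hypothetical, and the design is assumed to be balanced (a treatments with n observations each), matching the formulas in section 2.1.

```python
def one_way_anova(groups):
    """Return (SS_treatment, SS_error, F0) for a balanced one-way layout.

    `groups` is a list of a lists, each holding n observations.
    """
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design expected: every group needs n observations")

    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(sum(g) for g in groups) / (a * n)

    # SS_treatment = n * sum_i (ybar_i. - ybar..)^2
    ss_treatment = n * sum((m - grand_mean) ** 2 for m in group_means)
    # SS_E = sum_i sum_j (y_ij - ybar_i.)^2
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    ms_treatment = ss_treatment / (a - 1)
    ms_error = ss_error / (a * (n - 1))
    return ss_treatment, ss_error, ms_treatment / ms_error


# Hypothetical sprout heights for three CO2 concentration groups (made-up data).
heights = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [8.0, 9.0, 10.0],
]

ss_treatment, ss_error, f0 = one_way_anova(heights)
print(f"SS_treatment = {ss_treatment:.2f}, SS_E = {ss_error:.2f}, F0 = {f0:.2f}")
```

Here a = 3 and n = 3, so F0 is compared with the critical value of the F distribution with 2 and 6 degrees of freedom (about 5.14 at α = 0.05 in standard tables); H0 is rejected when F0 exceeds it.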
Posted: 02/03/2023, 22:26
