Exercises on statistics in business, MBA e


Business Statistics Exercise

Topic: To study the rate of deaths from heart-related diseases, a research group at an American university collected data for the U.S. states on the number of deaths and several related socio-economic variables. The data are given in the table below.

State   Number of deaths   Age 65   Income   Rate of color   Region
AL      307.1              13.0     23.471   26.0            1
AK       90.9               5.7     30.064    3.5            1
AZ      226.0              13.0     25.578    3.1            1
AR      325.9              14.0     22.257   15.7            1
CA      217.0              10.6     32.275    6.7            1
CO      158.3               9.7     32.949    3.8            1
CT      278.1              13.8     40.640    9.1            1
DE      266.9              13.0     31.255   19.2            1
FL      340.4              17.6     28.145   14.6            1
GA      225.9               9.6     27.940   28.7            1
HI      203.3              13.3     28.221    1.8            1
ID      202.3              11.3     24.180    0.4            1
IL      275.3              12.1     32.259   15.1            1
IN      280.4              12.4     27.011    8.4            1
IA      303.2              14.9     26.723    2.1            1
KS      262.8              13.3     27.816    5.7            1
KY      305.4              12.5     24.294    7.3            1
LA      274.6              11.6     23.334   32.5            1
ME      272.8              14.4     25.623    0.5            1
MD      233.6              11.3     33.872   27.9            1
MA      257.0              13.5     37.992    5.4            1
MI      280.8              12.3     29.612   14.2            1
MN      199.6              12.1     32.101    3.5            1
MS      337.2              12.1     20.993   36.3            1
MO      328.7              13.5     27.445   11.2            1
MT      232.1              13.4     22.569    0.3            1
NE      269.9              13.6     27.829    4.0            2
NV      233.9              11.0     30.529    6.8            2
NH      229.0              12.0     33.332    0.7            2
NJ      288.5              13.2     36.983   13.6            2
NM      198.4              11.7     22.203    1.9            2
NY      324.2              12.9     34.547   15.9            2
NC      250.8              12.0     27.194   21.6            2
ND      289.3              14.7     25.068    0.6            2
OH      294.9              13.3     28.400   11.5            2
OK      335.4              13.2     23.517    7.6            2
OR      219.0              12.8     28.350    1.6            2
PA      347.7              15.6     29.539   10.0            2
RI      303.6              14.5     29.685    4.5            2
SC      256.9              12.1     24.321   29.5            2
SD      276.1              14.3     26.115    0.6            2
TN      296.9              12.4     26.239   16.4            2
TX      216.6               9.9     27.871   11.5            2
UT      130.8               8.5     23.907    0.8            2
VT      226.0              12.7     26.901    0.5            2
VA      223.0              11.2     31.162   19.6            2
WA      200.0              11.2     31.528    3.2            2
WV      377.5              15.3     21.915    3.2            2
WI      263.3              13.1     28.232    5.7            2
WY      210.4              11.7     27.230    0.8            2

In which:
- Number of deaths: the number of deaths related to cardiovascular disease per 100,000 population
- Age 65: percentage of the population aged 65 years and older
- Income: per capita income, in thousands of dollars
- Rate of color: percentage of the population who are people of color
- Region: the states are divided into two study areas, Region 1 and Region 2

Please use the above data to answer the following questions:

1. Use appropriate descriptive statistics to comment on the variables in the data.
2. Use appropriate graphs and correlation coefficients to comment on the relationship between the number of deaths from cardiovascular disease and each of the remaining variables. From this, identify which of the remaining variables could affect the dependent variable if a linear regression model is set up with the number of deaths as the dependent variable (no need to distinguish the regions).
3. Estimate the confidence interval for the average number of deaths for the states in Region 1 and in Region 2.
4. Compare the average number of deaths for the states in Region 1 and Region 2 (by hypothesis testing). Is the result similar when average income is compared?
5. Estimate a linear regression model in which the dependent variable is the number of deaths and the independent variables are the remaining variables (region excluded):
a. Explain the meaning of the regression coefficients and of the R² coefficient.
b. Use appropriate tests to find out which independent variables affect the dependent variable and which do not. From this, what can be said about the factors that affect the rate of deaths from cardiovascular disease? Are there other factors that could affect this mortality rate?
c. Use the F-test to check whether the model is significant.
If so, explain the meaning of the results obtained.
d. Predict the death rate in a state with the following values of the independent variables: 15% of the population aged 65 or older, average income of USD 25,000, and 4% people of color. Explain the result.

Answer:

To answer the questions, our group used the MegaStat software to analyze the data and then used the software output in the answers below.

1. Use appropriate descriptive statistics to comment on the variables in the data

1.1 The number of deaths related to cardiovascular disease

Running MegaStat / Descriptive Statistics on the number-of-deaths column gives the following table:

Descriptive statistics: Number of deaths
count                           50
mean                            258.954
sample variance                 3,191.835
sample standard deviation       56.496
minimum                         90.9
maximum                         377.5
range                           286.6
skewness                        -0.482
kurtosis                        0.681
coefficient of variation (CV)   21.82%
1st quartile                    223.725
median                          265.100
3rd quartile                    296.400
interquartile range             72.675
mode                            226.000

Comment: From the table above we find:
- The number of states studied is 50.
- The average number of deaths related to cardiovascular disease in a U.S. state is about 259 per 100,000 population. The median is 265.1, so half of the states have fewer than 265.1 deaths per 100,000 and half have more. The mean is close to the median, which shows that the sample distribution is fairly symmetrical.
- The sample standard deviation is 56.496, which measures the spread of the distribution around the mean.
- Some states have the same number of deaths; the most common value (the mode) is 226 per 100,000 population.
- The lowest number of deaths in a state is 90.9 per 100,000 population and the highest is 377.5 per 100,000 population, so the range is 286.6.

Frequency distribution of the number of deaths related to cardiovascular disease in a U.S. state

Running MegaStat / Frequency Distribution / Quantitative on the number-of-deaths column gives the following table:

Frequency Distribution - Quantitative: Number of deaths
                                                      cumulative
lower     upper   midpoint   width   frequency   percent   frequency   percent
 50   <   100      75         50       1           2.0        1           2.0
100   <   150     125         50       1           2.0        2           4.0
150   <   200     175         50       3           6.0        5          10.0
200   <   250     225         50      15          30.0       20          40.0
250   <   300     275         50      18          36.0       38          76.0
300   <   350     325         50      11          22.0       49          98.0
350   <   400     375         50       1           2.0       50         100.0
                                      50         100.0

Based on the frequency distribution table, in most U.S. states the number of deaths related to cardiovascular disease lies between 200 and 350 per 100,000 population (88% of the states).

(Chart: frequency distribution of the number of deaths)

The frequency distribution chart of the number of deaths is fairly balanced and concentrated in the middle. However, the skewness of the distribution is -0.482

[...] α)

4.2 Comparing the average income of the people of the states in Region 1 and Region 2

Proceeding in the same way as in Section 4.1, we have the following results:

Hypothesis Test: Independent Groups (t-test, pooled variance)

Income      Group 1     Group 2
mean        28.40842    29.27488
std dev      4.82352     7.48108
n           26          24

df                                48
difference (Group 1 - Group 2)    -0.866452
pooled variance                   38.935179
pooled std dev                    6.239806
standard error of difference      1.766297
hypothesized difference           0
t                                 -0.49
p-value (two-tailed)              0.6260

Based on the above results, at a significance level of α = 5% the average incomes of the states in Region 1 and Region 2 are not significantly different (p-value > α, so there is no basis to reject the hypothesis H0).

5. Estimate a linear regression model in which the dependent variable is the number of deaths and the independent variables are the remaining variables

Regression Analysis
R²            0.774      n          50
Adjusted R²   0.759      k          3
R             0.880      Dep. Var.  Number of deaths
Std. Error    27.731

ANOVA table
Source        SS             df   MS           F       p-value
Regression    121,024.7645    3   40,341.588   52.46   6.92E-15
Residual       35,375.1597   46      769.0252
Total         156,399.9242   49

Regression output                                                   confidence interval
variables        coefficients   std. error   t (df=46)   p-value    95% lower   95% upper
Intercept        -60.1955       32.6430      -1.844      0.0716     -125.9025    5.5114
Age 65            24.5202        2.0904      11.730      2.01E-15     20.3124   28.7280
Income            -0.3757        0.6430      -0.584      0.5619       -1.6700    0.9186
Rate of color      2.2768        0.4171       5.459      1.86E-06      1.4373    3.1163

5.1 Explain the meaning of the regression coefficients and of the R² coefficient

Based on the table above, the estimated model is:

(Number of deaths) = -60.2 + 24.5 × Age 65 - 0.4 × Income + 2.3 × Rate of color

The meaning of the regression coefficients:
+ 24.5: if income and the rate of color are held constant and the percentage of the population aged 65 and older increases by 1 percentage point, the number of deaths related to cardiovascular disease increases by 24.5 per 100,000 population.
+ (-0.4): if the percentage of the population aged 65 and older and the rate of color are held constant and average income increases by USD 1,000, the number of deaths related to cardiovascular disease decreases by 0.4 per 100,000 population.
+ 2.3: if the percentage of the population aged 65 and older and income are held constant and the percentage of people of color increases by 1 percentage point, the number of deaths related to cardiovascular disease increases by 2.3 per 100,000 population.
- The meaning of R² = 0.774: with three independent variables (the percentage of the population aged 65 and older, average income, and the percentage of people of color), the model explains 77.4% of the variation in the number of deaths related to cardiovascular disease.

5.2 Use appropriate tests to find out which independent variables affect the dependent variable and which do not.
From this, what can be said about the factors that affect the rate of deaths from cardiovascular disease? Are there other factors that could affect this mortality rate?

Based on the regression output above, to test which independent variables affect the dependent variable and which do not, we set up the following three pairs of hypotheses:
+ Pair 1: H0: β1 = 0 (the percentage of the population aged 65 and older does not affect the number of deaths); H1: β1 ≠ 0 (the percentage of the population aged 65 and older affects the number of deaths)
+ Pair 2: H0: β2 = 0 (average income does not affect the number of deaths); H1: β2 ≠ 0 (average income affects the number of deaths)
+ Pair 3: H0: β3 = 0 (the percentage of people of color does not affect the number of deaths); H1: β3 ≠ 0 (the percentage of people of color affects the number of deaths)

To test these pairs of hypotheses, we look at the p-values in the regression output:
+ Pair 1: p-value = 2.01×10^-15 < α = 0.05 -> reject H0 -> the percentage of the population aged 65 and older affects the number of deaths from cardiovascular disease.
+ Pair 2: p-value = 0.56 > α = 0.05 -> do not reject H0 -> there is no evidence that average income affects the number of deaths from cardiovascular disease.
+ Pair 3: p-value = 1.86×10^-6 < α = 0.05 -> reject H0 -> the percentage of people of color affects the number of deaths from cardiovascular disease.

Conclusion: Of the three independent variables in this study, only two, the percentage of the population aged 65 and older and the percentage of people of color, affect the number of deaths from cardiovascular disease. There may also be other factors affecting the number of deaths from cardiovascular disease that need further research, such as the number of doctors per 100,000 population, the ratio of males to females per 100,000 population, and so on.

5.3 Use the F-test to check whether the model is significant. If so, explain the meaning of the results obtained

Based on the test results in 5.2, we build a linear regression model of the relationship between the number of deaths from cardiovascular disease (dependent variable) and the percentage of the population aged 65 and older and the percentage of people of color (independent variables).

Regression Analysis
R²            0.772      n          50
Adjusted R²   0.762      k          2
R             0.879      Dep. Var.  Number of deaths
Std. Error    27.536

ANOVA table
Source        SS             df   MS            F       p-value
Regression    120,762.185     2   60,381.0927   79.63   8.04E-16
Residual       35,637.7388   47      758.2498
Total         156,399.924    49

Regression output                                                   confidence interval
variables        coefficients   std. error   t (df=47)   p-value    95% lower   95% upper
Intercept        -70.7553       26.993       -2.621      0.0118     -125.0585   -16.4522
Age 65            24.4814        2.0747      11.800      1.18E-15     20.3077    28.6551
Rate of color      2.2987        0.4125       5.573      1.18E-06      1.4689     3.1285

The estimated model is:

(Number of deaths) = -70.8 + 24.5 × Age 65 + 2.3 × Rate of color

To test whether the model is significant, we use the following pair of hypotheses:
H0: β1 = β2 = 0 (neither the percentage of the population aged 65 and older nor the rate of color affects the number of deaths)
H1: at least one coefficient β ≠ 0 (at least one of the two variables affects the number of deaths)

Based on the ANOVA table and the F-test above, we have:
- The meaning of R² = 0.772: with two independent variables (the percentage of the population aged 65 and older and the percentage of people of color), the model explains 77.2% of the variation in the number of deaths related to cardiovascular disease.
- Both slope coefficients are positive (β > 0), so the dependent variable moves in the same direction as the independent variables.
- The p-value of the F-test is 8.04×10^-16 < α = 0.05 -> reject H0 -> at least one of the two variables, the percentage of the population aged 65 and older or the rate of color, affects the number of deaths, so the model is significant.

Conclusion: To reduce the rate of deaths from cardiovascular disease, the government needs to invest more in and pay more attention to the health system and to health care for the elderly (over 65 years).

5.4 Predict the death rate in a state with the following values of the independent variables: 15% of the population aged 65 or older, average income of USD 25,000, and 4% people of color. Explain the result

Predicted values for: Number of deaths
                                          95% Confidence Interval   95% Prediction Interval
Age 65   Rate of color   Predicted        lower       upper         lower       upper         Leverage
15       4               305.6603         292.1914    319.1291      248.6504    362.6701      0.059

Income is not used here because it was dropped from the model in 5.3. If a state has 15% of its population aged 65 and older and 4% people of color, the predicted number of deaths from cardiovascular disease is about 306 per 100,000 population. With 95% confidence, the average for such states lies between 292 and 319 per 100,000 population, and the value for a single state lies between 249 and 363 per 100,000 population.
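The prediction in 5.4 and the F statistic in 5.3 can be re-derived by hand from the reported MegaStat output. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the critical value 2.0117 (97.5th percentile of Student's t with 47 degrees of freedom) is typed in by hand rather than taken from the output, and the interval formulas are the standard ones based on the residual standard error and the leverage, which is presumably what the software applies.

```python
import math

# Values copied from the MegaStat output for the two-variable model (Section 5.3).
INTERCEPT = -70.7553
B_AGE65 = 24.4814      # coefficient of "Age 65" (percent of population aged 65+)
B_COLOR = 2.2987       # coefficient of the rate of color (percent people of color)
STD_ERROR = 27.536     # residual standard error of the regression
LEVERAGE = 0.059       # leverage reported for the new observation
# Assumption: 97.5th percentile of Student's t with 47 df, entered by hand
# (this number does not appear in the source output).
T_CRIT = 2.0117


def predict_deaths(age65_pct, color_pct):
    """Point prediction of deaths per 100,000 from the fitted equation."""
    return INTERCEPT + B_AGE65 * age65_pct + B_COLOR * color_pct


def interval(y_hat, for_single_state):
    """95% interval around y_hat.

    for_single_state=False -> confidence interval for the mean response.
    for_single_state=True  -> prediction interval for one new state.
    """
    extra = 1.0 if for_single_state else 0.0
    half_width = T_CRIT * STD_ERROR * math.sqrt(extra + LEVERAGE)
    return y_hat - half_width, y_hat + half_width


def r2_and_f(ss_regression, ss_total, df_regression, df_residual):
    """R-squared and F statistic recomputed from the ANOVA sums of squares."""
    ss_residual = ss_total - ss_regression
    r2 = ss_regression / ss_total
    f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r2, f_stat


if __name__ == "__main__":
    y_hat = predict_deaths(15, 4)
    print("predicted deaths per 100,000:", round(y_hat, 2))
    print("95% confidence interval:", interval(y_hat, for_single_state=False))
    print("95% prediction interval:", interval(y_hat, for_single_state=True))
    print("R2 and F:", r2_and_f(120762.185, 156399.924, 2, 47))
```

Because the leverage and coefficients are rounded in the printed output, the recomputed bounds agree with the reported ones only to about one or two decimal places.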

Date posted: 09/11/2018, 14:52
