RMIT International University Vietnam
ASSIGNMENT COVER PAGE

Subject code: ECON1193
Subject name: Business Statistics
Class time: Thursday 11:30
Location and campus: RMIT Vietnam – SGS
Title of Assignment: Team Assignment Report 3A
Student name - Student number:
- Ho Trong Dat - S3804678
- Do Hoai Viet - S3750310
- Phan Minh Dang Khoa - S3818139
- Tu Huu Phuc - S3812120
Lecturer: Greeni Maheshwari
Group number:
Assignment due date: 22nd May 2020
Date of submission: 24th May 2020
Number of pages: 12

Name | Student ID | Contribution % | Signature
Khoa Phan | S3818139 | 100% | Khoa
Viet Do | S3750310 | 100% | Viet
Phuc Tu | S3812120 | 100% | Phuc
Dat Ho | S3804678 | 85% | Dat
Part contributed: Part (Find dataset), Part (All), Part (All), Part (Half), Part (2 questions), Part (Find dataset), Part (All), Assignment 3B (Powerpoint + Edit), Part (Find datasets + content), Part (All), Part (2 questions), Assignment 3B (Question 1, + Presentation), Part (Find dataset), Part (Half), Assignment 3B (Question + Presentation)

PART 1: DATA COLLECTION:
In the data-collection process, we consulted reliable sources such as the WHO and the World Bank and collected a wide range of secondary data for the majority of countries in two regions, Asia and Europe & European Union, covering six variables:
- Number of COVID-19 deaths (between January 22 and April 23, 2020) (Our World in Data 2020)
- Average temperature (in Celsius), calculated from data from 1991 to 2016 (World Bank Group 2020)
- Average rainfall (in mm), calculated from data from 1991 to 2016 (World Bank Group 2020)
- Population (in 1,000s), using data from 2018 (The World Bank 2019)
- Hospital beds (per 10,000 people), using the latest available data (WHO 2020)
- Medical doctors (per 10,000 people), using the latest available data (WHO 2020)
However, because of national issues, mostly relating to the sovereignty recognition of a few countries, data are still missing for those nations. To solve this problem, we implemented the data-cleansing method, which adjusts and
rejects the missing or poor-quality data, hence enhancing the reliability of the final test results (Gschwandtner et al 2014), especially when building a regression model as in this research. As a result of this cleansing process, we have new, well-qualified datasets without any missing data, which ensures more reliable output for the final regression model:
- Asia: 32 countries (cleaning countries: Hong Kong, Macao, Taiwan)
- Europe & European Union: 42 countries (cleaning 11 countries: Faroe Islands, Gibraltar, Guernsey and Alderney, Jersey, Kosovo, Liechtenstein, Vatican City, Svalbard and Jan Mayen Islands, San Marino, The Isle of Man, Moldova)

PART 2: DESCRIPTIVE MEASURE:
From the collected and cleaned data on COVID-19 deaths in the first part, we are able to analyze the descriptive measures of the two regions. Generally, the number of COVID-19 deaths in the European region is higher than in Asia, but the relative variation in deaths between countries is greater in Asia than in Europe & European Union.

a. Measure of Central Tendency:
Measures | Mean (Cases of death) | Mode (Cases of death) | Median (Cases of death)
Asia | 215.37 | |
Europe & European Union | 2329.08 | | 79.5
Table 1: Measures of Central Tendency of COVID-19 deaths in Asia and Europe & European Union
Except for the mode, which cannot be used for assessment because of the variability of the data among countries with deaths, the two other statistics can both represent Central Tendency. In our evaluation, despite the impact of outliers (7 in Asia, and some in Europe & European Union), the mean still seems to be the better statistic for assessing Central Tendency, because the median is more strongly distorted by the unusual distribution, especially as 12 Asian countries have no deaths from COVID-19 (over one-third of the data in the set). With this selection, the number of deaths in Europe and European Union countries is considerably
greater than that in Asia (2329.08 deaths vs 215.37 deaths). In other words, the average number of COVID-19 deaths in Asian countries is about 10 times lower than in Europe & European Union countries.

b. Measure of Variation:
Measures | Asia | Europe & European Union
IQR (Cases of death) | 71.5 | 491.5
Range (Cases of death) | 4,619 | 25,085
Variance ((Cases of death)^2) | 617,357.01 | 37,610,339.91
Standard Deviation (Cases of death) | 785.72 | 6,132.73
Coefficient of Variation (%) | 364.82 | 263.31
Table 2: Measures of Variation of COVID-19 deaths in Asia and Europe & European Union
Statistically, IQR and Range are not ideal statistics for reflecting variation because they do not take every value in the distribution into account. Although Standard Deviation is usually used to represent variation because it is based on all data in the set, it is not suitable in this case: it is an absolute measure, and the means of Asia and Europe & European Union are vastly different (about 10 times apart). As a consequence, the Coefficient of Variation (standard deviation divided by the mean) is the best choice, since this relative measure allows an accurate comparison no matter how different the means are. With this choice, we conclude that the variability of deaths between Asian countries is much greater than between European nations (364.82% vs 263.31%). Specifically, deaths in Asian nations are dispersed further around their average than those in Europe & European Union.

c. Measure of Shape:
Graph 1: Box-and-Whisker plot of Asia and Europe & European Union
Even though the box-and-whisker plot and the mean-and-median comparison generally indicate the same skewness, the graphical approach is still better for analysis, as it not only shows the skewness in detail but also reveals the distribution of data across the four quarters, which gives viewers a deeper understanding of the features of different sets. For
example, in this case, although both regions show a right-skewed distribution, the box for Europe & European Union is much longer, which indicates a wider spread of the middle 50% of the data in this region than in Asian nations. As a result of this choice, we infer that both regions are right-skewed, which means that more than 50% of Asian countries have COVID-19 deaths below the mean of 215.37 cases, while fewer than 50% of countries in Europe and European Union have deaths above the mean of 2329.08 cases.

PART 3: MULTIPLE REGRESSION:
As mentioned in Part 1, after the collecting and cleansing steps, we have two datasets, covering 32 Asian and 41 European countries, for building regression models. With these models, we are able to estimate the change in the number of COVID-19 deaths when the tested predictors change. Specifically, our purpose in the Multiple Regression part is to find out:
- Whether the independent variables (average temperature, average rainfall, population, hospital beds and medical doctors) have significant influences on the dependent variable (COVID-19 deaths)
- How those independent variables affect the dependent variable (Negative/Positive, Strong/Weak)
To fulfil those purposes, the backward elimination procedure is used to remove all insignificant variables. The reason for using this method is that the variability of the error is affected by the number of predictors, which can be explained by the mutual interactions between independent variables that make the regression model inaccurate (Hayes & Cai 2007). Consequently, by eliminating variables one by one, backward elimination can reduce those interactions, which improves the accuracy of the final regression model. After applying this method, our team eliminated the insignificant predictors and reached final models that contain only variables that are significant at the 5% level of significance in the two regions:

Asia:
a. Regression output:
b. Equation:
COVID19 deaths (y-hat) = -19.551 + 0.002(Population)
- In which units are:
Estimated COVID19 deaths (cases)
Population (1000s)
c. Regression coefficients:
b1 = 0.002 indicates that the estimated number of deaths increases by 0.002 cases for every increase of 1000 people in population.
b0 = -19.551 shows that when the population is zero, the estimated number of deaths due to COVID19 is -19.551 cases. However, this interpretation makes no sense in this case, because deaths cannot be negative and there can be no deaths when there are no people in a country.
* As a consequence of this equation, we infer that:
- There is a significant influence of population on the COVID-19 deaths in each country (p-value = 0.000 < 0.05 = level of significance)
- There is a positive (0.002 is positive) relation between COVID-19 deaths and population
d. Coefficient of determination:
R square = 0.631 indicates that about 63.1% of the variation in COVID19 deaths can be explained by variation in the population of a country; the remaining 36.9% of the variation is influenced by other factors.

Europe & European Union:
a. Regression output:
b. Equation:
COVID19 deaths (y-hat) = -11669.702 + 0.142(Population) + 90.87(Average rainfall) + 579.428(Average temperature)
- In which units are:
Estimated COVID19 deaths (cases)
Average Rainfall (mm)
Average Temperature (Celsius)
Population (1000s)
c. Regression coefficients:
b1 = 0.142 shows that COVID19 deaths will increase, on average, by 0.142 deaths for every increase of 1000 people in population, holding average rainfall and average temperature constant.
b2 = 90.87 shows that COVID19 deaths will increase, on average, by 90.87 deaths for every 1 mm increase in average rainfall, holding average temperature and population constant.
b3 = 579.428 shows that COVID19 deaths will increase, on average, by 579.428 deaths for every 1 degree Celsius increase in average temperature, holding average rainfall and population constant.
b0 = -11669.702 shows that when average rainfall, average temperature and population are all zero, the estimated number of deaths due to COVID19 is -11669.702. However, a country cannot have zero population and the number of deaths cannot be negative; hence there is no meaningful interpretation for this intercept.
* From the equation, we infer that:
- There are significant influences of population, average rainfall and average temperature on the COVID-19 deaths in each country (p-value (population) = 0.000 < 0.05; p-value (average rainfall) = 0.028 < 0.05; p-value (average temperature) = 0.005 < 0.05)
- There are positive (0.142, 90.87 and 579.428 are positive) relations between COVID-19 deaths and population, average rainfall and average temperature
d. Coefficient of determination:
R square = 0.417 indicates that about 41.7% of the variation in COVID19 deaths can be explained by variation in the average rainfall, the average temperature and the population of a country; the remaining 58.3% of the variation is influenced by other factors.

PART 4: TEAM REGRESSION CONCLUSION:
Do both the models have the same significant independent variable/s?
Based on the final regression models, the significant variables clearly differ between the two regions. By applying the backward elimination method (see the models and hypothesis tests in the appendix), we eliminated four insignificant variables in Asia and two insignificant variables in Europe & European Union. Consequently, the final model for Asia has only one significant variable, population, while the final model for Europe & European Union has three significant independent variables: average temperature, average rainfall and population. Thus, the two models have different significant variables.

In terms of scientific evidence, population appears in both models, showing a close positive relation between population and the number of deaths, which can be explained by many intermediate elements, especially the number of cases. Specifically, a crowded population encourages the spread of infectious diseases as the number of pathogens rises (Dobson & Carper 1996). As Donaldson and his colleagues (2009) showed, a more crowded area is likely to have a higher number of infectious cases and hence possibly more deaths, if the death rate is the same internationally. Another explanation is that a larger population may result in less individual care and overwhelmed health services. Considering Wuhan three months ago as a typical example, all hospitals there were overcrowded and the number of deaths accelerated exponentially (Li et al 2020). So, most of the scientific evidence supports the role of population in our final regression models.

Regarding the remaining variables, the European model implies a positive relationship between the number of deaths and average temperature. However, it is widely acknowledged that the viability of coronavirus is lower at higher temperatures (Chan et al 2011). In other words, this finding suggests a negative relation between average temperature and the number of COVID-19 deaths, since higher temperature discourages the development
of this virus. Similarly, in the same study, Chan and his colleagues (2011) reported a negative relationship between the stability of coronavirus and humidity. Their results therefore also contradict the positive relationship between the number of deaths and average rainfall found in our final model. Therefore, the positive relations of these two variables with deaths are not supported by scientific evidence.

Which region is more impacted due to this pandemic?
Based on the equations of our final regression models, we conclude that COVID-19 has more impact on Europe & European Union than on Asia by checking the slopes, which summarize the change in deaths resulting from a change in each variable. For illustration, for the ‘population’ variable, the b1 value in Asia is 0.002, which is extremely small compared with the slope of 0.142 in Europe & European Union, about 70 times larger. As a result of this large difference, although the population of Asia is greater than that of Europe & European Union (The World Bank 2019), the European nations are more impacted by population because of their much larger slope (1). In addition, Asia is not significantly influenced by average rainfall and average temperature, as these two variables do not appear in its equation, whereas they strongly affect the number of deaths in European countries (b2 = 90.87, b3 = 579.428). Once again, Europe & European Union is more impacted by average rainfall and average temperature (2). From (1) and (2), we infer that the pandemic has more influence on Europe & European Union than on Asia. This finding is also supported by the descriptive measures, where the mean number of deaths in Europe is about 10 times higher than in Asia (Central Tendency).

* Non-technical conclusion:
To sum up, from the regression output, we infer that the number of COVID-19 deaths in European nations is affected by average temperature, average rainfall and population, while mortality
cases in Asia are influenced only by population. Moreover, from the regression equations and descriptive measures, we conclude that European countries are more impacted by the pandemic than Asian countries.

PART 5: TIME SERIES:
In this part, we collect data on COVID-19 deaths in Asia and Europe & European Union between February 15, 2020 and April 30, 2020. Based on this dataset, we build trend models and choose the best one for predicting the number of COVID-19 deaths in the future:

Asia:
After using the hypothesis tests (see more in Appendix), we infer that the Quadratic (QUA) trend model does not exist and only two significant models exist in Asia, with the regression outputs and formulas below:
a. Regression output:
- Linear (LIN) trend model:
- Exponential (EXP) trend model:
b. Formula:
Model | Formula
LIN | Y-hat = 2.425 + 5.706T
EXP (in linear format) | Log(Y-hat) = 1.761 + 0.012T
EXP (in non-linear format) | Y-hat = 57.677 × 1.028^T
Table: Formula of significant models in Asia
Based on the regression output, we are able to compare the R-square values to choose the best model for predicting the number of COVID-19 deaths in Asia. Specifically, the R-square of the Exponential trend model is 67.3%, which is higher than that of the other significant trend model (38.6%). Thus, we strongly recommend the exponential (EXP) trend model for estimating further deaths in Asia, because it has the smallest error among the significant models. We therefore also choose this model for forecasting the number of deaths due to COVID-19 in Asia on May 29, May 30 and May 31, as in the table below:
Model | Formula | May 29 | May 30 | May 31
EXP | Y-hat = 57.677 × 1.028^T | ≈ 1048 | ≈ 1077 | ≈ 1107
Table: Predicted deaths on May 29, May 30, May 31 in Asia

Europe & European Union:
After using the hypothesis tests (see more in Appendix), we infer that the Quadratic (QUA) trend model does not exist and only two significant models exist in Europe & European Union, with the regression outputs and formulas below:
a. Regression output:
- Linear (LIN) trend model:
- Exponential (EXP) trend model:
b. Formula:
Model | Formula
LIN | Y-hat = -730.218 + 64.340T

For further discussion, it is notable that recent research gave inaccurate estimates of worldwide COVID-19 deaths (Appolonia & Barranco 2020). This discrepancy comes from the complicated situation, especially the distinctive policies in each nation (Dowd et al 2020). For example, after social distancing policies, which prevented the spread of coronavirus, had been imposed, deaths fell suddenly and the earlier estimates became incorrect. However, the positive result of this policy made governments complacent and led them to relax their pandemic policies, which once again created an ideal environment for coronavirus to spread, so the resulting acceleration in deaths made the calculations inexact again. For this reason, our predictions may also prove incorrect in the future, as other professional forecasts have. Additionally, given the dependence of COVID-19 deaths on government intervention, we strongly recommend that governments maintain social distancing policies to prevent deaths from rising again.

Regarding the variables, further investigation is needed to find reliable significant factors, like population, that truly affect the number of deaths, which would improve the accuracy of predictions about COVID-19 deaths. For example, the number of people over 65 in the population structure and the numbers of males and females are notable variables that affect COVID-19 deaths. In particular, according to researchers (Sharon 2020), the coronavirus is known as an unequal-opportunity killer, which means that the older people are, the more likely they are to die if they catch it. By way of explanation, being elderly, having a weaker immune system and worse overall health, or possibly already having another chronic illness, leads to a high risk of mortality from the disease. On the other hand, specific
data from China CDC showed that 106 men had the disease for every 100 women. Furthermore, the WHO mission (2020) reported that 51% of cases were male, while a study in Wuhan found that about 58% of patients were male. Besides, an update written by researchers in JAMA revealed a slight predominance of male deaths in this pandemic. As a consequence of those figures, men have a higher probability of mortality than women because of the higher number of cases. Therefore, the numbers of male and female deaths from COVID-19 should be part of the discussion. To sum up, with the various data sources available on the Internet, further research should build regression models as in our research to obtain better estimates of COVID-19 deaths.

Reference:
Appolonia, A & Barranco, V 2020, ‘Why COVID-19 predictions will always be wrong’, Business Insider, April 30, viewed 22 May 2020.
Chan, KH, Peiris, JSM, Lam, SY, Poon, LLM, Yuen, KY & Seto, WH 2011, ‘The Effects of Temperature and Relative Humidity on the Viability of the SARS Coronavirus’, Advances in Virology, vol 2011, pp 1-7.
Donaldson, LJ, Rutter, PD, Ellis, BM, Greaves, FE, Mytton, OT, Pebody, RG & Yardley, IE 2009, ‘Mortality from pandemic A/H1N1 2009 influenza in England: public health surveillance study’, BMJ, vol 339.
Dowd, JB, Andriano, L, Brazel, DM, Rotondi, V, Block, P, Ding, X, Liu, Y & Mills, MC 2020, ‘Demographic science aids in understanding the spread and fatality rates of COVID-19’, PNAS, vol 117, no 18, pp 9696-9698.
Hayes, AF & Cai, L 2007, ‘Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation’, Behavior Research Methods, vol 39, no 4, pp 709-722.
Li, QH, Ma, YH, Wang, N, Hu, Y & Liu, ZZ 2020, ‘New Coronavirus-Infected Pneumonia Engulfs Wuhan’, Asian Toxicology Research, vol 2, no 1, pp 1-7.
Our World in Data 2020, Total confirmed COVID-19 deaths, dataset, Our World in Data, viewed 21 May 2020.
Murray, CJL 2020, ‘Forecasting
COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months’, IHME COVID-19 health service utilization forecasting team, pp 1-26.
Sharon B 2020, ‘Who is getting sick, and how sick? A breakdown of coronavirus risk by demographic factors’, Health, March, viewed 20 May 2020.
Gschwandtner, T, Aigner, W, Miksch, S, Gärtner, J, Kriglstein, S, Pohl, M & Suchy, N 2014, ‘TimeCleanser: a visual analytics approach for data cleansing of time-oriented data’, i-KNOW '14: Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business, no 18, pp 1-8.
World Health Organization 2019, Coronavirus disease (COVID 19) advice for the public: Myth busters, World Health Organization, viewed 21 May 2020.
World Health Organization 2020, Hospital beds, World Health Organization, database, viewed 21 May 2020.
World Bank Group 2020, Asia and Europe and European Union rainfall, Climate change knowledge portal dataset, World Bank Group, viewed 21 May 2020.
World Bank Group 2020, Asia and Europe and European Union temperature, Climate change knowledge portal dataset, World Bank Group, viewed 21 May 2020.
World Bank Group 2020, All countries population, World Bank Group dataset, World Bank Group, viewed 21 May 2020.
World Health Organization 2020, Density of medical doctors (total number per 10000 population, latest available year), Global Health Observatory (GHO) dataset, World Health Organization, viewed 21 May 2020.
Worldometer 2020, China Coronavirus cases – Deaths, Worldometer dataset, Worldometer, viewed 23 May 2020, <https://www.worldometers.info/coronavirus/country/china/>

Appendix:
Multiple Regression:
a. Asia:
Based on the given data set, we are able to build the regression model of Asia with five independent variables:
- First model:
Figure: Summary output for Asia
- Hypothesis test for first model (based on the figure):
Variable | H0 | H1 | P-value | Decision | Conclusion (with 95% confidence)
Average temperature | B1 = 0 (no linear relationship) | B1 ≠ 0 (linear relationship) | 0.099 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and average temperature
Average rainfall | B2 = 0 (no linear relationship) | B2 ≠ 0 (linear relationship) | 0.188 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and average rainfall
Population | B3 = 0 (no linear relationship) | B3 ≠ 0 (linear relationship) | 0.000 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and population
Hospital beds | B4 = 0 (no linear relationship) | B4 ≠ 0 (linear relationship) | 0.897 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and hospital beds
Medical doctors | B5 = 0 (no linear relationship) | B5 ≠ 0 (linear relationship) | 0.826 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and medical doctors
From the first model, we apply backward elimination by removing the insignificant variable that has the highest p-value. In this case, we eliminate hospital beds and obtain a new dataset for building the second regression model:
- Second model:
Figure: Summary output for Asia (excluding hospital beds)
- Hypothesis test for second model (based on the figure):
Variable | H0 | H1 | P-value | Decision | Conclusion (with 95% confidence)
Average temperature | B1 = 0 (no linear relationship) | B1 ≠ 0 (linear relationship) | 0.068 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and average temperature
Average rainfall | B2 = 0 (no linear relationship) | B2 ≠ 0 (linear relationship) | 0.17 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and average rainfall
Population | B3 = 0 (no linear relationship) | B3 ≠ 0 (linear relationship) | 0.000 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and population
Medical doctors | B4 = 0 (no linear relationship) | B4 ≠ 0 (linear relationship) | 0.842 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and numbers of medical doctors
From the second model, we apply backward elimination by removing the insignificant variable that has the highest p-value. In this case, we eliminate medical doctors and obtain a new dataset for building the third regression model:
- Third model:
Figure: Summary output for Asia (excluding medical doctors)
- Hypothesis test for third model (based on the figure):
Variable | H0 | H1 | P-value | Decision | Conclusion (with 95% confidence)
Average temperature | B1 = 0 (no linear relationship) | B1 ≠ 0 (linear relationship) | 0.052 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and average temperature
Average rainfall | B2 = 0 (no linear relationship) | B2 ≠ 0 (linear relationship) | 0.165 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and average rainfall
Population | B3 = 0 (no linear relationship) | B3 ≠ 0 (linear relationship) | 0.000 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and population
From the third model, we apply backward elimination by removing the insignificant variable that has the highest p-value. In this case, we eliminate average rainfall and obtain a new dataset for building the fourth regression model:
- Fourth model:
Figure: Summary output for Asia (excluding average rainfall)
- Hypothesis test for fourth model (based on the figure):
Variable | H0 | H1 | P-value | Decision | Conclusion (with 95% confidence)
Average temperature | B1 = 0 (no linear relationship) | B1 ≠ 0 (linear relationship) | 0.164 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and average temperature
Population | B2 = 0 (no linear relationship) | B2 ≠ 0 (linear relationship) | 0.000 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and population
From the fourth model, we apply backward elimination by removing the insignificant variable that has the highest p-value. In this case, we eliminate average temperature and obtain a new dataset for building the fifth regression model:
- Final model:
Figure: Summary output for Asia (excluding average temperature)
- Hypothesis test for fifth model (based on the figure):
Variable | H0 | H1 | P-value | Decision | Conclusion (with 95% confidence)
Population | B1 = 0 (no linear relationship) | B1 ≠ 0 (linear relationship) | 0.000 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and population
From the fifth model, we can see that population is the only significant variable, so this is the final model.

b. Europe:
Based on the given data set, we are able to build the regression model of Europe & European Union with five independent variables:
- First model:
Figure: Summary output for Europe & European Union
- Hypothesis test for first model (based on the figure):
Variable | H0 | H1 | P-value | Decision | Conclusion (with 95% confidence)
Average temperature | B1 = 0 (no linear relationship) | B1 ≠ 0 (linear relationship) | 0.053 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and average temperature
Average rainfall | B2 = 0 (no linear relationship) | B2 ≠ 0 (linear relationship) | 0.005 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and average rainfall
Population | B3 = 0 (no linear relationship) | B3 ≠ 0 (linear relationship) | 0.000 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and population
Hospital beds | B4 = 0 (no linear relationship) | B4 ≠ 0 (linear relationship) | 0.053 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and hospital beds
Medical doctors | B5 = 0 (no linear relationship) | B5 ≠ 0 (linear relationship) | 0.089 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and medical doctors
From the first model, we apply backward elimination by removing the insignificant variable that has the highest p-value. In this case, we eliminate medical doctors and obtain a new dataset for building the second regression model:
- Second model:
Figure: Summary output for Europe & European Union (excluding medical doctors)
- Hypothesis test for second model (based on the figure):
Variable | H0 | H1 | P-value | Decision | Conclusion (with 95% confidence)
Average rainfall | B1 = 0 (no linear relationship) | B1 ≠ 0 (linear relationship) | 0.049 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and average rainfall
Average temperature | B2 = 0 (no linear relationship) | B2 ≠ 0 (linear relationship) | 0.010 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and average temperature
Population | B3 = 0 (no linear relationship) | B3 ≠ 0 (linear relationship) | 0.000 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and population
Hospital beds | B4 = 0 (no linear relationship) | B4 ≠ 0 (linear relationship) | 0.23 > 0.05 | P-value is greater than level of significance, do not reject H0 | No linear relationship between deaths and hospital beds
From the second model, we apply backward elimination by removing the insignificant variable that has the highest p-value. In this case, we eliminate hospital beds and obtain a new dataset for building the third regression model:
- Final model:
Figure: Summary output for Europe & European Union (excluding hospital beds)
- Hypothesis test for third model (based on the figure):
Variable | H0 | H1 | P-value | Decision | Conclusion (with 95% confidence)
Average rainfall | B1 = 0 (no linear relationship) | B1 ≠ 0 (linear relationship) | 0.028 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and average rainfall
Average temperature | B2 = 0 (no linear relationship) | B2 ≠ 0 (linear relationship) | 0.005 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and average temperature
Population | B3 = 0 (no linear relationship) | B3 ≠ 0 (linear relationship) | 0.000 < 0.05 | P-value is smaller than level of significance, reject H0 | Linear relationship between deaths and population
From the third model, we can see that population, average rainfall and average temperature are significant variables, so this is the final model.

Time Series:
a. Asia:
- Hypothesis testing for Asia region:
Model | H0 | H1 | P-value | Decision
Linear Trend Model | B1 = 0 (there is no relationship) | B1 ≠ 0 (there is a linear relationship) | P-value < α (2.1472E-09 < 0.05) |
Quadratic Trend Model | | | P-value > α (> 0.05) | Do not reject H0. Hence, the model does not exist
- Formula:
Linear trend model: Y-hat = 2.425 + 5.706T
Quadratic trend model: Y-hat = 60.578 + 1.232T + 0.058(T^2)
Exponential trend model: Log(Y-hat) = 1.761 + 0.012T, or Y-hat = 57.677 × 1.028^T
- Output:
Linear trend model:
Quadratic trend model:
Exponential trend model:

b. Europe & European Union:
- Hypothesis test for Europe & European Union:
Model | H0 | H1 | P-value | Decision
Linear Trend Model | B1 = 0 (there is no relationship) | B1 ≠ 0 (there is a linear relationship) | P-value < α (8.65823E-13 < 0.05) | Reject H0
Quadratic Trend Model | B2 = 0 (there is no relationship) | B2 ≠ 0 (there is a quadratic relationship) | P-value > α (> 0.05) |
Exponential Trend Model | B1 = 0 (there is no relationship) | B1 ≠ 0 (there is an exponential relationship) | P-value < α (7.76896E-22