MINISTRY OF EDUCATION AND TRAINING
NATIONAL ECONOMICS UNIVERSITY
***
GROUP MID-TERM EXAM
Course: Mathematical Statistics
TOPIC: Hypothesis testing
Lecturer: Trần Thị Bích
Class: Finance Economics – FE64
Group:
Members:
Mai Thái Sơn – 11225626
Phan Mạnh Hùng – 11222590
Phạm Ngọc Hải – 11222029
Trần Mạnh Hào – 11222180
Phạm Văn Lâm – 11223240
Hoàng Phú Khánh – 11223030

CONTENTS
PART I: SUMMARIZE THE ARTICLE
The article
The issue of interest
The technique
Additional references of Hypothesis testing
Viewpoint
PART II: DATA ANALYSIS TO A SPECIFIC ORGANIZATIONAL PROBLEM
Overall view: Insurance claims and Dataset
The purpose
Descriptive Statistics for Variables
Data analysis

PART I: SUMMARIZE THE ARTICLE

The article
Stress management through regulation of blood pressure among college students
Source: https://content.iospress.com/articles/work/wor2308

The issue of interest
Stress is a pervasive condition that affects individuals on multiple levels, disrupting their sleep, work, and overall quality of life. Extensive research has identified job-related and academic factors as major contributors to stress. However, there is a lack of studies on immediate remedies or first-aid measures to alleviate it. This article introduces the Deep Breathing Technique (DBT) and explores its potential as a means of stress management by regulating blood pressure among Indian engineering college students. It compares DBT with the Ordinary Breathing Technique (OBT) to assess their effectiveness.
The primary objective of the article is to investigate whether deep breathing can effectively control blood pressure and thereby reduce stress levels. By examining the impact of the two breathing techniques, the article aims to provide recommendations based on its findings.

The technique
This article uses hypothesis testing to analyze data on stress management. In data science and statistics,
hypothesis testing is an important step because it verifies an assumption about a statistical parameter. It is the act of testing a hypothesis, or supposition, about a population parameter, and analysts use it to decide whether that hypothesis is plausible given the data.
To perform a hypothesis test, we first establish two hypotheses: the null hypothesis and the alternative hypothesis. We then collect data, perform an appropriate statistical test, and, based on the result, decide whether or not to reject the null hypothesis.
As organizational managers, we see this as a useful tool for making reliable decisions from sample data: it lets us act on the evidence while keeping the risk of wrongly rejecting a true null hypothesis at a level we choose in advance (the significance level α).

Additional references of Hypothesis testing
a. The first source: Hypothesis and Hypothesis Testing in the Clinical Trial
https://www.psychiatrist.com/wp-content/uploads/2021/02/13947_hypothesishypothesis-testing-clinical-trial.pdf
b. The second source: Testing a Hypothesis—Plant Growth
https://fathom.concord.org/resources/tutorials/testing-a-hypothesis-plant-growth/

Viewpoint
To better understand the Deep Breathing Technique (DBT) and its applications, such as controlling blood pressure and stress levels, we need to examine the technique carefully so that we can draw the most suitable conclusions.
For the research, a total of 123 students were selected. Students were first screened with a questionnaire on academic stress, and those who reported high mental stress during the interview were chosen for the main drills. The data set was divided into two groups, a control group and an experimental group. In the control group, the
first readings were recorded as "before the drill" readings for Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP), and the second readings were recorded as "after Ordinary Breathing Technique (OBT)".

Table: Hypotheses to be tested

Using the t test formula, the mean and standard deviation of the differences are calculated and shown in Table. As listed in Table (control group), the average DBP is 87.75 before the OBT drill and 87.27 after it, with variances of 7.54 and 9.34 respectively. The t test gives P-values of 0.089 and 0.274, both greater than α = 0.01. Therefore, neither H01 nor H02 is rejected at the 1% level of significance.

Table: t test (Control Group)

In Table (experimental group), the DBP is 87.27 before the DBT drill and 79.89 after it, which is significantly lower and close to the desired level of 80. As is evident, the P-value is less than α (P-value = 0.000, i.e. below 0.001, < α = 0.01), so H03 is rejected, and we can say that DBT has a significant positive effect on DBP, lowering it toward the desired level. Likewise, the SBP was 128.82 before the DBT drill and 121.03 after it, again indicating a significant positive effect, as the desired level is 120. The P-value of 0.000 (lower than α = 0.01) in the t test confirms this (H04 is rejected).

Table: t test (Experimental Group)

Table shows the status of the hypotheses for the control and experimental groups after testing and statistical analysis.

Table: Hypothesis status

Based on the results of the hypothesis tests, we conclude that the Deep Breathing Technique has a significant effect in lowering the blood pressure (both DBP and SBP) of the stressed students studied. It is recommended that people use this technique to improve their health.

PART II: DATA ANALYSIS TO A SPECIFIC ORGANIZATIONAL PROBLEM

Overall view: Insurance claims and Dataset
Leveraging customer information is of paramount importance for most businesses, and most firms
place great emphasis on it. In the case of an insurance company, consumer characteristics such as those listed below can be critical to business decisions. We therefore use data on 1,338 customers (676 male and 662 female), referred to below as policy holders or insurance holders.
Dataset source: https://www.kaggle.com/code/yogidsba/insurance-claims-eda-hypothesis-testing

The purpose
The objective of this analysis is to find the profile of the customers who benefit the company most, by answering two questions:
- Do insurance holders with no children pay lower charges than the average, at the 90% confidence level?
- Are the medical charges of people who smoke greater than those of people who don't, at the 90% confidence level?
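Both questions are one-tailed t-tests at α = 0.10. As a minimal sketch of how the first one is decided (a left-tailed one-sample t-test, run in SPSS in the Data analysis section with test value 13270), the code below computes the t statistic from a list of charges and makes the decision from t alone. It is an illustration, not the SPSS procedure: the function names one_sample_t and left_tailed_decision are hypothetical, and the p-value and critical value use the standard normal as an approximation to the t distribution, which is close when the degrees of freedom are large (573 in Question 1).

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # stdev divides by n - 1
    return (mean(sample) - mu0) / standard_error, n - 1


def left_tailed_decision(t, alpha=0.10):
    """Decide H0: mu = mu0 against Ha: mu < mu0 at significance level alpha.

    Uses the standard normal in place of the t distribution (an assumption
    that is reasonable only when the degrees of freedom are large).
    """
    z = NormalDist()
    p_one_tailed = z.cdf(t)       # probability of a value at least this low
    critical = z.inv_cdf(alpha)   # about -1.2816 for alpha = 0.10
    return p_one_tailed, critical, t < critical


# Figures reported for Question 1: t = -1.801 with 573 degrees of freedom.
p, critical, reject = left_tailed_decision(-1.801, alpha=0.10)
print(f"one-tailed p = {p:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
# prints: one-tailed p = 0.036, critical value = -1.282, reject H0: True
```

With the raw charges of the 574 policy holders without children in a list, one_sample_t(charges, 13270) would give the statistic that is then passed to left_tailed_decision.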
DATASET:
- Age: an integer giving the age of the primary beneficiary (excluding those above 64 years, since they are generally covered by the government).
- Sex: the policy holder's gender, either male or female.
- BMI: the body mass index, which gives a sense of how over- or under-weight a person is relative to their height. BMI equals weight (in kilograms) divided by height (in meters) squared. An ideal BMI is within the range of 18.5 to 24.9.
- Children: an integer giving the number of children/dependents covered by the insurance plan.
- Smoker: yes or no, depending on whether the insured regularly smokes tobacco.
- Region: the beneficiary's place of residence in the U.S., divided into four geographic regions: northeast, southeast, southwest, or northwest.
- Charges: individual medical costs billed to health insurance.
Above are all the variables in the data we collected; however, to answer the questions of this test, we only use the "Children", "Smoker" and "Charges" variables in our calculations.

Descriptive Statistics for Variables
As you can see from the table above, the dataset consists of 1338 samples, all valid with no missing information.

Data analysis
a. Question 1
- State the hypotheses
Null hypothesis: the average insurance charge of people who don't have children is $13270.
H0: µ = 13270
Alternative hypothesis: the average insurance charge of people who don't have children is less than $13270.
Ha: µ < 13270
- Compute the test statistic
The test statistic measures, in standardized units, how far our sample average lies from the hypothesized mean µ0 = 13270:
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
where x̄ is the sample mean, s is the sample standard deviation and n is the sample size. We then used the One-Sample T Test in SPSS to compare the means, entering 13270 in the "Test Value" box as the null-hypothesis value for the insurance charge of people who don't have children. The outcomes:
- Make decisions
As we can see in the table, the average insurance charge of people who don't have children in this sample is 12365.975. We can use a
decision rule based on the rejection region, the p-value from the appropriate distribution (here the standard normal, which closely approximates the t distribution with 573 degrees of freedom), or the confidence interval approach.
• Rejection region
We want to be 90% certain, which means accepting a 10% chance of rejecting H0 when it is true (α = 0.10). Given our alternative hypothesis, the rejection region is the lowest 10% of the distribution. The test statistic is t = -1.801 with 573 degrees of freedom.
The t value is negative, so significance can only be found in the negative (left) tail, which matches our one-tailed alternative. The critical value for the lowest 10% of the standard normal is -1.282 (about -1.283 for the t distribution with 573 degrees of freedom). Our test statistic, -1.801, lies below this value (it even lies below -1.645, the 5% critical value), so we reject the null hypothesis in favour of the alternative hypothesis.
• P-value
SPSS reports Sig. (2-tailed) = 0.072. Because our test is left-tailed and t is negative, the one-tailed p-value is half of this, 0.072 / 2 = 0.036, which is smaller than α = 0.1. Therefore, we reject the null hypothesis and conclude that the average insurance charge of people who don't have children is less than $13270.
• Confidence interval (CI) approach
SPSS reports the 90% confidence interval of the difference µ − 13270 as -1730.820 < µ − 13270 < -77.231, that is, 11539.180 < µ < 13192.769. Because 13270 does not lie in this interval (equivalently, the interval for the difference does not contain 0), we reject the null hypothesis and conclude that the average insurance charge of people who don't have children is less than $13270.
Conclusion: insurance holders with no children pay lower charges than average, at the 90% confidence level.

b. Question 2
- State the hypotheses
Null hypothesis: smokers have the same average medical charges as non-smokers.
$H_0: \mu_1 = \mu_2$
Alternative hypothesis: smokers have greater average medical charges than non-smokers.
$H_a: \mu_1 > \mu_2$
where µ1 is the mean charge of smokers and µ2 is the mean charge of non-smokers.
- Compute the test statistic: we run an Independent-Samples T Test in SPSS.
- Make decisions
The table shows that the first group, smokers, has a sample of 274 with average medical charges of $32050.232, while the group of non-smokers has a sample of 1064 with average medical charges of $8434.268. On observation, non-smokers pay far less than smokers; however, a difference in sample means is not proof on its own, so we test it formally. Before comparing the means, we check whether the two groups have equal variances, since this determines which row of the SPSS output to use (SPSS reports Levene's test for this alongside the t-test).
• Tests
for equal variances:
Null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$
Alternative hypothesis: $H_a: \sigma_1^2 \neq \sigma_2^2$
The Sig. value, which is the p-value of this test, is reported as .000, smaller than the 1% significance level. Therefore, we reject the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ and conclude that $\sigma_1^2 \neq \sigma_2^2$. This means that, when testing for equal means in the next step, we use the row "Equal variances not assumed" (Welch's t-test).
• Test for equal means:
Sig. (2-tailed) is also reported as 0.000, smaller than the 1% significance level. Since our alternative is one-tailed and the smokers' sample mean is the larger one, the one-tailed p-value (half the two-tailed value) is also below 0.01. We therefore reject the null hypothesis and conclude that smokers pay higher average medical charges than non-smokers at the 99% confidence level, and hence also at the 90% level asked in the question.
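The two SPSS steps for Question 2 (Levene's test for equal variances, then the "Equal variances not assumed" row, which is Welch's t-test) can be sketched in plain Python as follows. This is a minimal sketch, assuming the smokers' and non-smokers' charges are available as lists; the function names levene_w and welch_t_test are hypothetical. Levene's statistic is computed from absolute deviations around each group mean, but its p-value is left out because it needs an F-distribution CDF, and the Welch p-value uses a normal approximation that is only reasonable when the degrees of freedom are large. For exact p-values, scipy.stats.levene (with center='mean') and scipy.stats.ttest_ind (with equal_var=False) are the usual library routes.

```python
import math
from statistics import NormalDist, mean, variance


def levene_w(*groups):
    """Levene's W statistic, using absolute deviations from each group mean.

    Under H0 (equal variances) W follows an F(k - 1, N - k) distribution.
    """
    k = len(groups)
    deviations = []
    for g in groups:
        g_mean = mean(g)
        deviations.append([abs(x - g_mean) for x in g])
    dev_means = [mean(d) for d in deviations]
    n_total = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, dev_means))
    within = sum(sum((x - m) ** 2 for x in d) for d, m in zip(deviations, dev_means))
    return (n_total - k) / (k - 1) * between / within


def welch_t_test(group1, group2):
    """Welch's t-test (equal variances not assumed) for Ha: mu1 > mu2."""
    n1, n2 = len(group1), len(group2)
    a = variance(group1) / n1   # squared standard error of mean 1
    b = variance(group2) / n2   # squared standard error of mean 2
    t = (mean(group1) - mean(group2)) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    # Right-tail probability from the standard normal (large-df approximation).
    p_one_tailed = 1 - NormalDist().cdf(t)
    return t, df, p_one_tailed


# Made-up toy numbers, only to show the calls (not taken from the dataset).
group_a = [10, 12, 14, 16, 18]
group_b = [1, 2, 3, 4, 5]
print(round(levene_w(group_a, group_b), 3))  # 2.057
t, df, _ = welch_t_test(group_a, group_b)
print(round(t, 3), round(df, 3))  # 6.957 5.882
```

In the report's setting, welch_t_test(smokers, non_smokers) would correspond to the "Equal variances not assumed" row, with degrees of freedom in the hundreds, where the normal approximation for the one-tailed p-value is much closer than in this small toy example.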