NATIONAL ECONOMICS UNIVERSITY
CENTER FOR ADVANCED EDUCATIONAL PROGRAMS

STATISTICS: HYPOTHESIS TESTING

STUDENTS
Hồ Phương Anh 11219354
Bùi Phúc Thanh 11215281
Dương Đức Mạnh 11219362
Trần Hà My 11214076

Ha Noi, May 2023

Table of Contents
PART 1: INTRODUCTION
  What does hypothesis testing mean?
  What does this method solve?
  Application
PART 2: ARTICLE SUMMARY
  Article name
  Case
  Purpose
  Technique
  Conclusion
  Other articles
PART 3: DATA ANALYSIS
  Context
  Data Analysis
  Recommendation
PART 4: CONCLUSION
REFERENCES

List of Figures
Figure 1: Systolic blood pressures
Figure 2: Confidence intervals associated with differing degrees of "confidence" using the same data as in Figure 1
Figure 3: The effect on the confidence interval of sample sizes of up to 500 subjects in each group

List of Tables
Table 1: Research Design Draft
Table 2: Test Results Normality
Table 3: Test of Homogeneity of Variance
Table 4: Hypothesis Test Results
Table 5: Results of the first hypothesis test
Table 6: Results of the second hypothesis test
Table 7: Results of the third hypothesis test

PART 1: INTRODUCTION

What does hypothesis testing mean?
Researchers can evaluate the plausibility of a hypothesis using the technique known as hypothesis testing. It entails determining whether an assumption about a particular population parameter is supported by the sample data. Variance, standard deviation, and median are examples of such population parameters. Typically, the null hypothesis is formulated first, followed by tests that lead the researcher either to reject it or to fail to reject it. To assess the relationship between two or more variables, the researcher uses test statistics. Researchers also employ hypothesis testing to draw conclusions about quantities such as the coefficient of variation and to establish the statistical significance of a regression relationship or a correlation coefficient.

What does this method solve?
Testing hypotheses is essential. Its most significant advantage is that it lets you evaluate how well the data support a claim or assumption before you act on it. Hypothesis testing is also a principled, widely accepted way to decide whether an effect "is or is not" present. Other advantages are:
- Hypothesis testing offers a solid framework for deciding how to use data for your target audience.
- It allows the researcher to extrapolate from the sample to a wider population.
- It lets the researcher assess whether the sample results are statistically significant.
- It is one of the most important techniques for gauging the accuracy and dependability of findings in any systematic inquiry.
- It links findings to the underlying theory and to particular research questions.

STEPS OF HYPOTHESIS TESTING
- Step 1: State the hypotheses
- Step 2: Set the criteria for a decision
- Step 3: Compute the test statistic
- Step 4: Make a decision

Step 1: State the hypotheses. We start by presuming that the assertion being tested is true; the null hypothesis states this. The decision rests on whether this supposition is likely to be accurate. The null hypothesis (H0), sometimes called the null, is a statement about a population parameter, such as the population mean, that is assumed to be true. We then examine how likely the value stated in the null hypothesis is to be accurate. An alternative hypothesis (Ha) is a statement that the actual value of the population parameter is less than, greater than, or not equal to the value stated in the null hypothesis.

Step 2: Set the criteria for a decision. To set the criteria, we state the level of significance for the test. The level of significance, or significance level, is the criterion used to decide whether to retain or reject the value
claimed by a null hypothesis. The criterion is based on the probability of obtaining the sample statistic if the value stated in the null hypothesis were true. In behavioral research the level of significance is typically set at 5%. We reject the value stated in the null hypothesis when a sample mean at least as extreme as the one observed would occur less than 5% of the time if the null hypothesis were true.

Step 3: Compute the test statistic. The test statistic is a mathematical formula that lets researchers determine how likely the sample results are if the null hypothesis is true. Its value is used to make the decision about the null hypothesis (explained in Step 4).

Step 4: Make a decision. We use the value of the test statistic to decide whether to reject the null hypothesis. The decision is based on the p value: the probability of obtaining a sample outcome at least as extreme as the one observed, given that the value stated in the null hypothesis is true. As a probability, the p value is never negative and ranges from 0 to 1. In Step 2 we set the level of significance, typically 5% in behavioral research; to reach a conclusion we compare the p value with that criterion. A researcher then has two options:
- Reject the null hypothesis when the p value is less than 5% (p < .05).
- Retain the null hypothesis when the p value is greater than 5% (p > .05).

Application

Medical research
Hypothesis testing is also useful in medical research. Suppose a pharmaceutical company wishes to assess the efficacy of a new medicine it has developed to treat a certain illness. A group of patients with the disease can be enrolled in a clinical trial, with
half of the patients receiving the new medication and the other half receiving a placebo. The company can then test the following hypotheses:
- Null Hypothesis (H0): The new drug has no effect on the disease.
- Alternative Hypothesis (Ha): The new drug is effective in treating the disease.
If the p-value of the test is less than the chosen significance level (e.g. α = 0.05), the company can reject the null hypothesis and conclude that the new drug is effective in treating the disease.

Finance
Hypothesis testing is also used in finance. Consider an investor who believes that stocks in a certain industry provide better returns than the general market. To test this, the investor can gather data on the returns of the market as a whole and of stocks in that industry over a specific period, then test the following hypotheses:
- Null Hypothesis (H0): The average return of stocks in the industry is equal to the market average.
- Alternative Hypothesis (Ha): The average return of stocks in the industry is higher than the market average.
If the p-value of the test is less than the chosen significance level (e.g. α = 0.05), the investor can reject the null hypothesis and conclude that stocks in that industry yielded higher returns than the market average over that period.

Marketing
Marketing can also benefit from hypothesis testing. For instance, a marketing team might believe that a particular ad campaign will increase product sales. The following hypotheses can be tested:
- Null Hypothesis (H0): The advertisement has no effect on sales.
- Alternative Hypothesis (Ha): The advertisement leads to increased sales.
If the p-value of the test is less than the chosen significance level (e.g. α = 0.05), the marketing team can reject the null hypothesis and conclude that the advertisement leads to increased sales.

Real estate
Hypothesis testing can be used in the real estate sector to determine whether a particular property attribute affects its value. For instance, a real estate agent might believe that a swimming pool raises a property's value. The following hypotheses can be tested:
- Null Hypothesis (H0): The presence of a swimming pool has no effect on the value of the property.
- Alternative Hypothesis (Ha): The presence of a swimming pool increases the value of the property.
If the p-value of the test is less than the chosen significance level (e.g. α = 0.05), the real estate agent can reject the null hypothesis and conclude that having a swimming pool does increase the value of a property.

PART 2: ARTICLE SUMMARY

Article name
Comparison of Student Learning Outcomes Through Video Learning Media with Powerpoint, by Illa Mudasih and Waspodo Tjipto Subroto

Case
The article discusses the importance of education for improving quality of life and how technology has influenced education in the era of globalization. It focuses on the implementation of the 2013 Curriculum in Indonesia and on the use of learning media such as video and PowerPoint in teaching factory overhead material to class XII AK students.

Purpose
The article emphasizes the importance of education, of innovative and creative teaching and learning processes, and of learning media for improving student learning outcomes. The study aims to determine the difference in learning outcomes between video learning media and PowerPoint in teaching factory overhead material to class XII AK students.

Technique
Step 1: Collect data and method
This study is a True Experimental study. Experimental research looks for the effect of a particular treatment under controlled conditions; in other words, it looks for the influence of the variable that receives the treatment. The
design used in this study was the pretest-posttest control group design. The sample was randomly drawn from the Class XII AK classes of SMK Wachid Hasyim Surabaya (XII AK 1, XII AK 2, and one further XII AK class). The sample consisted of:
- A Class XII AK class with 36 female students and a male student, as an experimental class
- A Class XII AK class with 37 students, all female, as an experimental class

Table 1: Research Design Draft
Description:
- O1 = Pre-test given to students before teaching and learning activities using video media
- O2 = Pre-test given to students before teaching and learning activities using PowerPoint media
- O3 = Post-test given to students after teaching and learning activities using video media
- O4 = Post-test given to students after teaching and learning activities using PowerPoint media

Step 2: Data analysis
The article describes several data analysis methods: a normality test, a homogeneity test, and a hypothesis test. It stresses that the normality and homogeneity tests must be carried out before the hypothesis test.
- The Chi-Square test is used for the normality test.
- The homogeneity test determines whether the sample variances are homogeneous.
- The t-test is used in hypothesis testing to check for differences in student learning outcomes between the control class and the experimental class.
The SPSS 16 program is used to perform these tests.

The study begins with a pretest in the experimental class and the control class to determine the students' initial ability in factory overhead material. The pretest results were then tested for normality and homogeneity as prerequisites for the hypothesis test.

Table 2: Test Results Normality
After the normality of the sample was tested with SPSS, the chi-square test gave a significance of 0.388 for the video pretest scores, while the PowerPoint pretest scores have a
significance of 0.059. Both sets of scores can therefore be treated as normally distributed, because both significance values are greater than 0.05. The homogeneity test was conducted to find out whether the two samples (experimental class and control class) used in the study had homogeneous variances. The homogeneity test results for the pretest data are as follows:

Table 3: Test of Homogeneity of Variance
The homogeneity test, run in SPSS with the Levene statistic, gave a significance level (sig) of 0.436 for the pretest scores of the two classes. Because this is greater than 0.05, the samples can be treated as having equal (homogeneous) variances. Having established that both classes are homogeneous, the researchers assigned the sample classes:
- Class XII AK as experimental class (video media), average pretest score 46.8
- Class XII AK as experimental class (PowerPoint media), average pretest score 45

The pretest and posttest results show that the experimental class using video media had an average pretest score of 46.8 before treatment and an average posttest score of 85 after treatment. The experimental class using PowerPoint media had an average pretest score of 45 before treatment and an average posttest score of 79 after treatment. Hypothesis testing is then applied to the posttest scores of the two classes to determine whether student learning outcomes differ. The hypotheses are:
H0 = There is no difference in student learning outcomes between classes taught with video learning media and classes taught with PowerPoint learning media.
Ha = There are differences in student learning outcomes between classes taught with video learning media and classes taught with PowerPoint learning media.
The average posttest score of the experimental class (PowerPoint media) was 79, while the average posttest score of the experimental
class (video media) was 85. The posttest results thus differ: the experimental class (video media) has a higher average posttest score than the experimental class (PowerPoint media). A difference in averages alone, however, does not show that the difference is significant, so an Independent Samples T-Test is conducted. The hypothesis test is as follows:

- Publication: November 14, 2017
- Link: https://doi.org/10.1371/journal.pone.0188107
- Overview: This study aimed to predict the trends of the Korea Composite Stock Price Index 200 (KOSPI 200) prices using nonparametric machine learning models: artificial neural network (ANN), support vector machines with polynomial kernels (SVMpoly), and support vector machines with radial basis function kernels (SVMRBF). In addition to its other research methods, the study tested hypotheses about some controversial issues in price prediction. In this report we cover only the application of hypothesis testing in that study.

Hypotheses tested
- The prediction performances of each method (ANN, SVMpoly, SVMRBF) are similar.
- Google Trends provides better prediction performances compared with the prediction without this web search frequency.
- The predictive performance of the whole market index is different from those of the ensemble approaches with major companies in the index.

Method: p-value

Processes and Conclusion
- The prediction performances of each method (ANN, SVMpoly, SVMRBF) are similar.
  => Hypothesis and conclusion: Table 5: Results of the first hypothesis test
  => Interpretation: Machine-learning methods with general procedures do not perform well in predicting the trends of market index prices.
- Google Trends provides better prediction performances compared with the prediction without
this web search frequency.
  => Hypothesis and conclusion: Table 6: Results of the second hypothesis test
  => Interpretation: Google Trends can be ineffective in predicting the index prices.
- The predictive performance of the whole market index is different from those of the ensemble approaches with major companies in the index.
  => Hypothesis and conclusion: Table 7: Results of the third hypothesis test
  => Interpretation: The ensemble methodology's effect on the directionality of the market index is unremarkable.

a. Confidence intervals rather than P values: estimation rather than hypothesis testing

Case
- Authors: Martin J Gardner, Douglas G Altman
- Publication: 15 March 1986
- Link: https://www.bmj.com/content/bmj/292/6522/746.full.pdf
- Overall: Statistical hypothesis testing is important in scientific research, particularly in medicine. One medical application is the comparison of blood pressure between groups, for example to study the connection between diabetes and cardiovascular disease. The article discusses the confidence interval (CI) method as a way of applying this technique in scientific research. There are three common approaches to testing a hypothesis: rejection region, p-value, and confidence interval (CI). The p-value has dominated bachelor-level scientific literature, although it has certain limitations. The article demonstrates how to use the CI method in research through a statistical example involving blood pressure.

Purpose
This study is not just an application of hypothesis testing; it uses hypothesis-testing concepts to demonstrate statistically how the confidence interval depends on the sample size when other factors are held fixed.

Method
In a study comparing samples of 100 diabetic and 100 non-diabetic men of a certain age, a difference of 6.0 mm Hg was found between their mean systolic blood pressures, and the standard error of this difference between sample means was 2.5 mm Hg, comparable
to the difference between means in the Framingham study. The 95% confidence interval for the population difference between means is from 1.1 to 10.9 mm Hg and is shown in Figure 1 together with the original data.

Figure 1: Systolic blood pressures
The figure shows the systolic blood pressures in 100 diabetics and 100 non-diabetics, with mean levels of 146 and 140 mm Hg respectively. The difference between the sample means of 6 mm Hg is shown to the right together with the 95% confidence interval from 1.1 to 10.9 mm Hg.

The authors calculated the confidence intervals as follows.

For a single sample, the confidence interval is given by:
$$\bar{x} - t_{1-\alpha/2} \cdot SE \quad \text{to} \quad \bar{x} + t_{1-\alpha/2} \cdot SE, \qquad SE = \frac{s}{\sqrt{n}}$$
where $t_{1-\alpha/2}$ is the appropriate value from the t distribution with $n - 1$ degrees of freedom associated with a confidence of $100(1-\alpha)\%$.

For two samples, they first compute a pooled standard deviation. Suppose that $\bar{x}_1$ and $\bar{x}_2$ are the two sample means. The authors apply the formula:
$$s = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$
in which:
- $s$: pooled standard deviation
- $s_1$, $s_2$: sample standard deviations
- $n_1$, $n_2$: sample sizes
The standard error of the difference is then:
$$SE_{\text{diff}} = s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
and the confidence interval is:
$$(\bar{x}_1 - \bar{x}_2) - t_{1-\alpha/2} \cdot SE_{\text{diff}} \quad \text{to} \quad (\bar{x}_1 - \bar{x}_2) + t_{1-\alpha/2} \cdot SE_{\text{diff}}$$
where $t_{1-\alpha/2}$ is taken from the t distribution with $n_1 + n_2 - 2$ degrees of freedom.

Application in context
Applying the single-sample method, they obtain the figure below:

Figure 2: Confidence intervals associated with differing degrees of "confidence" using the same data as in Figure 1

Next, they used the two-sample method to find the confidence interval. The available figures are $\bar{x}_1 - \bar{x}_2 = 6.0$ and $\alpha = 5\%$. The pooled standard deviation is computed first, and from it the standard error of the difference between the sample means, 2.5 mm Hg. For the 95% CI, the appropriate value of the t distribution with 198 (100 + 100 - 2) degrees of freedom is 1.97, so the 95% CI is:
$$6.0 - 1.97 \times 2.5 \quad \text{to} \quad 6.0 + 1.97 \times 2.5$$
that is, from 1.1 to 10.9 mm Hg, as shown in Figure 1.

Suppose now that the samples had been of only 50 men each but that the means and standard deviations had been the same. With the same calculation, the 95% CI is from -1.0 to 13.0 mm Hg. With such a
hypothesis, the authors repeated the calculation with increasing sample sizes and found the relationship between sample size and confidence interval shown in the figure below:

Figure 3: The effect on the confidence interval of sample sizes of up to 500 subjects in each group

Conclusion
Overall, the confidence interval becomes narrower as the sample size increases. Assuming the means and standard deviations are the same as in the example, Figure 3 shows the resulting 99%, 95%, and 90% confidence intervals for the difference in mean blood pressures as the sample size of each group increases up to 500. As the sample size continues to increase, however, each further increase narrows the confidence interval by less.

PART 3: DATA ANALYSIS

Context
In today's highly competitive market, businesses are always looking for ways to gain an edge over their competitors. Height data of the kind analyzed here is valuable to a wide range of businesses, including clothing companies, health and wellness brands, and retailers, who can use it to create more targeted marketing campaigns and tailor their products to the needs of specific customer segments. We want to test a hypothesis and answer the research question below by analyzing this data with statistical tools such as SPSS. Specifically, we examine whether male and female students at NEU have significantly different average heights. The question matters for a range of industries because it can help companies understand the characteristics of their target market and develop products that meet its particular needs. Through our analysis, we hope to provide meaningful information that helps businesses make better decisions and increase their competitive advantage. By leveraging the power of data, we can create new avenues for growth and assist businesses in thriving in a market that is rapidly evolving and
getting more complex.

Research Question: Is the average height of men significantly greater than that of women among NEU university students?

Data Set: The sample data set records the age, sex, and height of NEU university students:
- Age: an integer giving the age of the student enrolled at NEU
- Sex: the gender of the student, either male or female
- Height: the height of the student in centimeters
These are all the variables in the data we collected; to answer the research question, we use only the "Height" data in the calculation.

Descriptive Statistics for Variables:

Descriptive Statistics
          N     Minimum   Maximum   Mean     Std. Deviation
Age       50    19        21        19.64    1.5
Height    50    152       186       165.02   8.06

Frequencies (Dataset)
Statistics
              Age   Sex   Height
N  Valid      50    50    50
   Missing    0     0     0

The 50 people in the dataset are divided into two samples, male and female, of 25 people each. The statistics for each sample are then calculated.
Symbols:
- S1: standard deviation of the sample of male heights
- S2: standard deviation of the sample of female heights

Data Analysis

Step 1: Calculate the standard deviation of male and female height
Use the sample standard deviation formula:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
We have: $S_1 = 4.79$, $S_2 = 3.74$

Step 2: Determine whether the variances of the two populations are equal, using the F distribution at the 95% confidence level
$H_0: \sigma_1^2 / \sigma_2^2 = 1$
$H_a: \sigma_1^2 / \sigma_2^2 \neq 1$
Test statistic: $F = S_1^2 / S_2^2 = 1.64$
Critical values at 95% confidence, with $df_1 = n_1 - 1 = 24$ and $df_2 = n_2 - 1 = 24$. The rejection region is:
$F > F_{0.025;\,24;\,24} = 2.27$
$F < F_{0.975;\,24;\,24} = 0.44$
Since 1.64 is neither greater than 2.27 nor less than 0.44, there is not enough statistical evidence to reject H0. At the 95% confidence level we can treat the population variances as equal.

Step 3: State the hypotheses
Null hypothesis: The average height of male students is equal to the average height of female students.
$H_0: \mu_1 = \mu_2$
Alternative hypothesis: The average height of male students is higher than the
average height of female students.
$H_a: \mu_1 > \mu_2$

Step 4: Calculate the test statistic
($H_0: \mu_1 = \mu_2$; $H_a: \mu_1 > \mu_2$; $df = n_1 + n_2 - 2 = 48$)
Apply the following formula:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
with:
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = 18.47$$
which gives $t = 0.49$.

Step 5: Find the critical value
We can see that $t = 0.49 < t_{0.05;\,48} = 1.68$. Therefore, there is not enough statistical evidence to reject H0: at the 5% significance level we cannot conclude that the average height of male students is greater than that of female students.

Recommendation
Based on the data collected from NEU university students, there is no significant difference between the average heights of male and female students. Although this analysis is specific to NEU students, it provides useful insight into the characteristics of this particular group. Businesses that cater to NEU students may therefore benefit from tailoring their products and marketing campaigns to appeal to both male and female students, rather than focusing on one gender exclusively. It may also be worthwhile for these businesses to promote products designed to fit a variety of body types and sizes, as this can increase their appeal to a wider range of customers. Overall, the insights gained from this analysis can help businesses gain a competitive advantage and improve their bottom line in today's rapidly evolving market.

PART 4: CONCLUSION
Hypothesis testing is a powerful tool that is widely used across a range of fields, including medical research, real estate, finance, and marketing. In essence, hypothesis testing involves determining whether a specific event has occurred, whether a particular treatment is effective, whether groups differ from one another, or whether one variable can predict another. Hypothesis testing enables researchers to identify statistically significant relationships between variables and to judge whether these relationships are due to chance or represent a genuine relationship
between the variables. One of the most significant advantages of hypothesis testing is that it allows people in various professions to reach conclusions that are both well founded and cost-effective. By applying the principles of hypothesis testing, they can make informed decisions and develop policy recommendations based on sound statistical principles. However, the results of hypothesis testing are subject to certain limitations and assumptions and should be interpreted with caution.

REFERENCES
[1] Predictability of machine learning techniques to forecast the trends of market index prices: Hypothesis testing for the Korean stock markets - Sujin Pyo, Jaewook Lee, Mincheol Cha, Huisu Jang - Updated on April 27th, 2022 [online]. Available at: https://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0188107&fbclid=IwAR1VlAyUTSbnSAzskrfZpVrB3YaNEj8TfoP972IbWa8X94AHFxGQWU3n_4k
[2] Confidence intervals rather than P values: estimation rather than hypothesis testing - Martin J Gardner, Douglas G Altman - 15 March 1986 [online]. Available at: https://www.bmj.com/content/bmj/292/6522/746.full.pdf
[3] Comparison of Student Learning Outcomes Through Video Learning Media with Powerpoint - Illa Mudasih, Waspodo Tjipto Subroto - Received 15.10.2018, received in revised form 03.01.2019, accepted and available online 01.04.2019
[4] How to Calculate Standard Deviation (Guide) | Calculator & Examples - Pritha Bhandari - Published on September 17, 2020, revised on January 20, 2023. Available at: https://www.scribbr.com/statistics/standard-deviation/?fbclid=IwAR22s3dkaVVKW8fQUrR-CIAl66vCv5bvrPpfdl2CztO-f1iupa71u9NVYg
[5] The influence of height on academic outcomes - Devon Gorry - February 2017, Pages 1-8. Available at: https://www.sciencedirect.com/science/article/abs/pii/S0272775715301448?fbclid=IwAR1VlAyUTSbnSAzskrfZpVrB3YaNEj8TfoP972IbWa8X94AHFxGQWU3n_4k
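
The confidence-interval calculation summarized from the Gardner and Altman article can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' own procedure: the function names are hypothetical, the inputs (difference 6.0 mm Hg, standard error 2.5 mm Hg, t value 1.97 for 198 degrees of freedom) are the figures quoted in the text, and the t value 1.984 for 98 degrees of freedom is an assumption taken from a standard t table because the text does not state it.

```python
import math


def se_of_difference(pooled_sd: float, n1: int, n2: int) -> float:
    """Standard error of the difference between two sample means."""
    return pooled_sd * math.sqrt(1 / n1 + 1 / n2)


def confidence_interval(diff: float, se: float, t_value: float) -> tuple:
    """Interval diff - t*SE to diff + t*SE for the difference between means."""
    margin = t_value * se
    return diff - margin, diff + margin


# Figures quoted in the text for 100 men per group.
DIFF = 6.0      # difference between sample means, mm Hg
SE_100 = 2.5    # standard error of that difference, mm Hg
T_198 = 1.97    # t value for 198 degrees of freedom (from the text)

ci_100 = confidence_interval(DIFF, SE_100, T_198)

# For 50 men per group with unchanged means and standard deviations,
# recover the pooled SD implied by SE_100 and recompute the standard error.
implied_sd = SE_100 / math.sqrt(1 / 100 + 1 / 100)
se_50 = se_of_difference(implied_sd, 50, 50)
T_98 = 1.984    # assumption: standard t-table value for 98 degrees of freedom

ci_50 = confidence_interval(DIFF, se_50, T_98)

print("n = 100 per group:", [round(v, 1) for v in ci_100])
print("n = 50 per group: ", [round(v, 1) for v in ci_50])
```

Halving the group size widens the interval enough to include zero, which is the sample-size effect the article illustrates.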

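
The variance-ratio check (Step 2) and the pooled t test (Steps 4 and 5) of the data analysis part can be sketched in Python. This is a hedged sketch rather than the SPSS procedure used in the report: the function names are hypothetical, the inputs are the summary values reported in the text (S1 = 4.79, S2 = 3.74, 25 students per group), and the critical values 2.27, 0.44, and 1.68 are copied from the text rather than computed.

```python
import math


def variance_ratio(s1: float, s2: float) -> float:
    """F statistic for comparing two sample variances: F = s1^2 / s2^2."""
    return s1 ** 2 / s2 ** 2


def pooled_variance(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled variance Sp^2, weighting each variance by its degrees of freedom."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def pooled_t(mean_diff: float, sp2: float, n1: int, n2: int) -> float:
    """t statistic for H0: mu1 = mu2, given the difference between sample means."""
    return mean_diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def reject_h0(t_stat: float, t_critical: float) -> bool:
    """Upper-tail decision rule: reject H0 when t exceeds the critical value."""
    return t_stat > t_critical


# Summary values reported in the text.
S1, S2 = 4.79, 3.74
N1, N2 = 25, 25

# Critical values copied from the text (not computed here).
F_UPPER, F_LOWER = 2.27, 0.44
T_CRITICAL = 1.68

f_stat = variance_ratio(S1, S2)
# Two-sided F test: H0 (equal variances) is retained inside the interval.
variances_look_equal = F_LOWER <= f_stat <= F_UPPER

sp2 = pooled_variance(S1, N1, S2, N2)
# Standard error of the difference between the two sample means.
se_diff = math.sqrt(sp2 * (1 / N1 + 1 / N2))

print(f"F = {f_stat:.2f}, equal variances plausible: {variances_look_equal}")
print(f"Sp^2 = {sp2:.2f}, SE of difference = {se_diff:.2f}")
print(f"Reject H0 at t = 0.49: {reject_h0(0.49, T_CRITICAL)}")
```

The two group means are not listed in the text, so `pooled_t` takes the difference between sample means as a parameter instead of hard-coding it.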
Posted: 23/10/2023, 06:27
