... correlation coefficient plot and the probability plot are useful tools for determining a good distributional model for the data.

Software
The skewness and kurtosis coefficients are available in most general-purpose statistical software programs, including Dataplot.

1.3.5.11. Measures of Skewness and Kurtosis http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm (4 of 4) [5/1/2006 9:57:21 AM]

Sample Output
Dataplot generated the following autocorrelation output using the LEW.DAT data set:

  THE LAG-ONE AUTOCORRELATION COEFFICIENT OF THE 200 OBSERVATIONS = -0.3073048E+00
  THE COMPUTED VALUE OF THE CONSTANT A = -0.30730480E+00

  lag   autocorrelation
   0.        1.00
   1.       -0.31
   2.       -0.74
   3.        0.77
   4.        0.21
   5.       -0.90
   6.        0.38
   7.        0.63
   8.       -0.77
   9.       -0.12
  10.        0.82
  11.       -0.40
  12.       -0.55
  13.        0.73
  14.        0.07
  15.       -0.76
  16.        0.40
  17.        0.48
  18.       -0.70
  19.       -0.03
  20.        0.70
  21.       -0.41
  22.       -0.43
  23.        0.67
  24.        0.00
  25.       -0.66
  26.        0.42
  27.        0.39
  28.       -0.65
  29.        0.03
  30.        0.63
  31.       -0.42
  32.       -0.36
  33.        0.64
  34.       -0.05
  35.       -0.60
  36.        0.43
  37.        0.32
  38.       -0.64
  39.        0.08
  40.        0.58
  41.       -0.45
  42.       -0.28
  43.        0.62
  44.       -0.10
  45.       -0.55
  46.        0.45
  47.        0.25
  48.       -0.61
  49.        0.14

1.3.5.12. Autocorrelation http://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm (2 of 4) [5/1/2006 9:57:45 AM]

Questions
The autocorrelation function can be used to answer the following questions:
1. Was this sample data set generated from a random process?
2. Would a non-linear or time series model be a more appropriate model for these data than a simple constant plus error model?

Importance
Randomness is one of the key assumptions in determining if a univariate statistical process is in control. If the assumptions of constant location and scale, randomness, and fixed distribution are reasonable, then the univariate process can be modeled as:

  $Y_i = A_0 + E_i$

where $E_i$ is an error term. If the randomness assumption is not valid, then a different model needs to be used.
This will typically be either a time series model or a non-linear model (with time as the independent variable).

Related Techniques
Autocorrelation Plot
Run Sequence Plot
Lag Plot
Runs Test

Case Study
The heat flow meter data demonstrate the use of autocorrelation in determining if the data are from a random process. The beam deflection data demonstrate the use of autocorrelation in developing a non-linear sinusoidal model.

Software
The autocorrelation capability is available in most general-purpose statistical software programs, including Dataplot.

... run of length r is r consecutive heads or r consecutive tails. To use the Dataplot RUNS command, you could code a sequence of the N = 10 coin tosses HHHHTTHTHH as

  1 2 3 4 3 2 3 2 3 4

that is, heads are coded as an increasing value and tails as a decreasing value.

Another alternative is to code values above the median as positive and values below the median as negative. There are other formulations as well, and all of them can be converted to the Dataplot formulation. Just remember that each ultimately reduces to 2 choices. To use the Dataplot runs test, simply code one choice as an increasing value and the other as a decreasing value, as in the heads/tails example above. If you are using other statistical software, check the conventions used by that program.
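The coding rule above, and the run counts that the runs test summarizes, can be sketched in Python. This is a minimal illustration under stated assumptions, not Dataplot's implementation: the function names are hypothetical, a zero difference is assumed to end a run without being counted, and only expected counts are computed (not their standard deviations). The expected-count expressions are the standard ones for runs up (or down) in a random sequence of n observations; with n = 200 they reproduce the EXP(STAT) values in the LEW.DAT runs output (41.7083, 18.2167 and 5.2125 for length exactly 1, 2 and 3; 66.5000 and 24.7917 for length 1 or more and 2 or more).

```python
from math import factorial
from statistics import median


def code_two_choices(is_first, start=1):
    """Code a two-valued sequence as the text describes for Dataplot RUNS:
    the first choice steps the value up by 1, the other steps it down by 1.
    The first element only sets the (arbitrary) starting value."""
    values = [start]
    for flag in is_first[1:]:
        values.append(values[-1] + (1 if flag else -1))
    return values


def run_lengths(values):
    """Return (runs_up, runs_down): lengths of maximal stretches of positive
    and of negative consecutive differences. Zero differences end the current
    run and are not counted (an assumption of this sketch)."""
    up, down = [], []
    sign_now, length = 0, 0
    for a, b in zip(values, values[1:]):
        sign = (b > a) - (b < a)
        if sign != 0 and sign == sign_now:
            length += 1
            continue
        if sign_now > 0:
            up.append(length)
        elif sign_now < 0:
            down.append(length)
        sign_now, length = sign, (1 if sign != 0 else 0)
    if sign_now > 0:
        up.append(length)
    elif sign_now < 0:
        down.append(length)
    return up, down


def expected_runs_exactly(n, i):
    """Expected number of runs up (or down) of length exactly i
    in a random sequence of n observations."""
    return (n * (i**2 + 3 * i + 1) - (i**3 + 3 * i**2 - i - 4)) / factorial(i + 3)


def expected_runs_at_least(n, i):
    """Expected number of runs up (or down) of length i or more."""
    return (n * (i + 1) - (i**2 + i - 1)) / factorial(i + 2)


if __name__ == "__main__":
    coded = code_two_choices([t == "H" for t in "HHHHTTHTHH"])
    print(coded)               # [1, 2, 3, 4, 3, 2, 3, 2, 3, 4]
    print(run_lengths(coded))  # ([3, 1, 2], [2, 1])

    # Median coding on hypothetical data; values equal to the median are dropped
    data = [5.1, 4.8, 5.6, 5.0, 4.7]
    m = median(data)
    print(code_two_choices([x > m for x in data if x != m]))  # [1, 0, 1, 0]

    # Z for LEW.DAT runs up of length exactly 1, using SD(STAT) from the output
    z = (18.0 - expected_runs_exactly(200, 1)) / 6.4900
    print(round(z, 2))         # -3.65
```

In the coin example the first toss only fixes the starting value, so the runs found are the runs of heads (up) and tails (down) among tosses 2 through 10. Each Z in the output is (STAT - EXP(STAT)) / SD(STAT), which the last line reproduces for one row.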
Sample Output
Dataplot generated the following runs test output using the LEW.DAT data set:

  RUNS UP

  STATISTIC = NUMBER OF RUNS UP OF LENGTH EXACTLY I
     I     STAT   EXP(STAT)   SD(STAT)       Z
     1     18.0     41.7083     6.4900   -3.65
     2     40.0     18.2167     3.3444    6.51
     3      2.0      5.2125     2.0355   -1.58
     4      0.0      1.1302     1.0286   -1.10
     5      0.0      0.1986     0.4424   -0.45
     6      0.0      0.0294     0.1714   -0.17
     7      0.0      0.0038     0.0615   -0.06
     8      0.0      0.0004     0.0207   -0.02
     9      0.0      0.0000     0.0066   -0.01
    10      0.0      0.0000     0.0020    0.00

  STATISTIC = NUMBER OF RUNS UP OF LENGTH I OR MORE
     I     STAT   EXP(STAT)   SD(STAT)       Z
     1     60.0     66.5000     4.1972   -1.55
     2     42.0     24.7917     2.8083    6.13
     3      2.0      6.5750     2.1639   -2.11
     4      0.0      1.3625     1.1186   -1.22
     5      0.0      0.2323     0.4777   -0.49
     6      0.0      0.0337     0.1833   -0.18
     7      0.0      0.0043     0.0652   -0.07
     8      0.0      0.0005     0.0218   -0.02
     9      0.0      0.0000     0.0069   -0.01
    10      0.0      0.0000     0.0021    0.00

1.3.5.13. Runs Test for Detecting Non-randomness http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm (2 of 5) [5/1/2006 9:57:45 AM]

  RUNS DOWN

  STATISTIC = NUMBER OF RUNS DOWN OF LENGTH EXACTLY I
     I     STAT   EXP(STAT)   SD(STAT)       Z
     1     25.0     41.7083     6.4900   -2.57
     2     35.0     18.2167     3.3444    5.02
     3      0.0      5.2125     2.0355   -2.56
     4      0.0      1.1302     1.0286   -1.10
     5      0.0      0.1986     0.4424   -0.45
     6      0.0      0.0294     0.1714   -0.17
     7      0.0      0.0038     0.0615   -0.06
     8      0.0      0.0004     0.0207   -0.02
     9      0.0      0.0000     0.0066   -0.01
    10      0.0      0.0000     0.0020    0.00

  STATISTIC = NUMBER OF RUNS DOWN OF LENGTH I OR MORE
     I     STAT   EXP(STAT)   SD(STAT)       Z
     1     60.0     66.5000     4.1972   -1.55
     2     35.0     24.7917     2.8083    3.63
     3      0.0      6.5750     2.1639   -3.04
     4      0.0      1.3625     1.1186   -1.22
     5      0.0      0.2323     0.4777   -0.49
     6      0.0      0.0337     0.1833   -0.18
     7      0.0      0.0043     0.0652   -0.07
     8      0.0      0.0005     0.0218   -0.02
     9      0.0      0.0000     0.0069   -0.01
    10      0.0      0.0000     0.0021    0.00

  RUNS TOTAL = RUNS UP + RUNS DOWN

  STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH EXACTLY I
     I     STAT   EXP(STAT)   SD(STAT)       Z
     1     43.0     83.4167     9.1783   -4.40
     2     75.0     36.4333     4.7298    8.15
     3      2.0     10.4250     2.8786   -2.93
     4      0.0      2.2603     1.4547   -1.55
     5      0.0      0.3973     0.6257   -0.63
     6      0.0      0.0589     0.2424   -0.24
     7      0.0      0.0076     0.0869   -0.09
     8      0.0      0.0009     0.0293   -0.03
     9      0.0      0.0001     0.0093   -0.01
    10      0.0      0.0000     0.0028    0.00

  STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH I OR MORE
     I     STAT   EXP(STAT)   SD(STAT)       Z
     1    120.0    133.0000     5.9358   -2.19
     2     77.0     49.5833     3.9716    6.90
     3      2.0     13.1500     3.0602   -3.64
     4      0.0      2.7250     1.5820   -1.72
     5      0.0      0.4647     0.6756   -0.69
     6      0.0      0.0674     0.2592   -0.26
     7      0.0      0.0085     0.0923   -0.09
     8      0.0      0.0010     0.0309   -0.03
     9      0.0      0.0001     0.0098   -0.01
    10      0.0      0.0000     0.0030    0.00

  LENGTH OF THE LONGEST RUN UP         =   3
  LENGTH OF THE LONGEST RUN DOWN       =   2
  LENGTH OF THE LONGEST RUN UP OR DOWN =   3

  NUMBER OF POSITIVE DIFFERENCES = 104
  NUMBER OF NEGATIVE DIFFERENCES =  95
  NUMBER OF ZERO DIFFERENCES     =   0

Interpretation of Sample Output
Scanning the last column labeled "Z", we note that most of the z-scores for run lengths 1, 2, and 3 have an absolute value greater than 1.96, the critical value for a two-sided test at the 5% significance level. This is strong evidence that these data are in fact not random.

Output from other statistical software may look somewhat different from the above output.

Question
The runs test can be used to answer the following question:
  ● Were these sample data generated from a random process?

Importance
Randomness is one of the key assumptions in determining if a univariate statistical process is in control. If the assumptions of constant location and scale, randomness, and fixed distribution are reasonable, then the univariate process can be modeled as:

  $Y_i = A_0 + E_i$

where $E_i$ is an error term.
If the randomness assumption is not valid, then a different model needs to be used. This will typically be either a time series model or a non-linear model (with time as the independent variable).

Related Techniques
Autocorrelation
Run Sequence Plot
Lag Plot

Case Study
Heat flow meter data

Software
Most general-purpose statistical software programs, including Dataplot, support a runs test.

Significance Level: α

Critical Region: The critical values for the Anderson-Darling test depend on the specific distribution being tested. Tabulated values and formulas have been published (Stephens, 1974, 1976, 1977, 1979) for a few specific distributions (normal, lognormal, exponential, Weibull, logistic, extreme value type 1). The test is one-sided: the hypothesis that the distribution is of a specific form is rejected if the test statistic, A, is greater than the critical value.

Note that for a given distribution, the Anderson-Darling statistic may be multiplied by a constant (which usually depends on the sample size, n). These constants are given in the various papers by Stephens. In the sample output below, this is the "adjusted Anderson-Darling" statistic, and it is what should be compared against the critical values. Also be aware that different constants (and therefore different critical values) have been published, so check which constant was used for a given set of critical values (the needed constant is typically given with the critical values).

Sample Output
Dataplot generated the following output for the Anderson-Darling test. 1,000 random numbers were generated for a normal, double exponential, Cauchy, and lognormal distribution. In all four cases, the Anderson-Darling test was applied to test for a normal distribution.
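For the normal case, the Anderson-Darling statistic and its adjusted form can be sketched in Python. The sketch assumes the usual definition $A^2 = -N - S$ with $S = \sum_{i=1}^{N} \frac{2i-1}{N}\left[\ln F(Y_i) + \ln\left(1 - F(Y_{N+1-i})\right)\right]$ over the ordered data, where F is the normal CDF using the sample mean and standard deviation (n - 1 divisor). The adjustment factor $1 + 4/n - 25/n^2$ is an assumption: it is one published choice, and for n = 1000 it equals 1.003975, which matches the ratio of adjusted to unadjusted statistics in the Dataplot output (for example, 5.826050 × 1.003975 ≈ 5.849208). The critical values are copied from that output; the function and constant names are hypothetical.

```python
import math
import sys
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(data):
    """Return (A2, adjusted A2) for a test of normality with the mean and
    standard deviation estimated from the sample."""
    y = sorted(data)
    n = len(y)
    mu, sigma = fmean(y), stdev(y)  # stdev uses the n - 1 divisor
    floor = sys.float_info.min      # keeps log() finite in extreme tails

    def log_cdf(v):  # ln F(v), via erfc for accuracy in the lower tail
        z = (v - mu) / sigma
        return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), floor))

    def log_sf(v):   # ln(1 - F(v)), computed directly instead of 1 - F(v)
        z = (v - mu) / sigma
        return math.log(max(0.5 * math.erfc(z / math.sqrt(2.0)), floor))

    # S = sum over i of (2i - 1)/N * [ln F(Y_i) + ln(1 - F(Y_{N+1-i}))]
    s = sum((2 * i - 1) / n * (log_cdf(y[i - 1]) + log_sf(y[n - i]))
            for i in range(1, n + 1))
    a2 = -n - s
    # Assumed small-sample adjustment (see lead-in)
    adjusted = a2 * (1.0 + 4.0 / n - 25.0 / n ** 2)
    return a2, adjusted


# Critical values for the normal case, copied from the Dataplot output,
# keyed by significance level alpha
CRITICAL_VALUES = {0.10: 0.656, 0.05: 0.787, 0.025: 0.918, 0.01: 1.092}


if __name__ == "__main__":
    n = 100
    # Hypothetical "ideal" normal data: evenly spaced normal quantiles
    normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Hypothetical skewed data: evenly spaced exponential quantiles
    skewed = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    for name, sample in [("normal-like", normal_like), ("skewed", skewed)]:
        a2, adj = anderson_darling_normal(sample)
        reject = adj > CRITICAL_VALUES[0.05]  # one-sided: reject if too large
        print(f"{name}: A2={a2:.4f} adjusted={adj:.4f} reject at 5%: {reject}")
```

Computing both tails with `erfc` avoids evaluating `1 - F` as `1 - 1.0 == 0` for heavy-tailed samples such as the Cauchy data, where a naive version would fail on `log(0)`.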
When the data were generated using a normal distribution, the test statistic was small and the hypothesis was accepted. When the data were generated using the double exponential, Cauchy, and lognormal distributions, the statistics were significant, and the hypothesis of an underlying normal distribution was rejected at significance levels of 0.10, 0.05, and 0.01.

The normal random numbers were stored in the variable Y1, the double exponential random numbers in Y2, the Cauchy random numbers in Y3, and the lognormal random numbers in Y4.

  ***************************************
  **  anderson darling normal test y1  **
  ***************************************

  ANDERSON-DARLING 1-SAMPLE TEST
  THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

  1. STATISTICS:
        NUMBER OF OBSERVATIONS                =     1000
        MEAN                                  =     0.4359940E-02

1.3.5.14. Anderson-Darling Test http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm (2 of 5) [5/1/2006 9:57:46 AM]

  ...
        STANDARD DEVIATION                    =     1.001816

        ANDERSON-DARLING TEST STATISTIC VALUE =     0.2565918
        ADJUSTED TEST STATISTIC VALUE         =     0.2576117

  2. CRITICAL VALUES:
        90   % POINT    =     0.6560000
        95   % POINT    =     0.7870000
        97.5 % POINT    =     0.9180000
        99   % POINT    =     1.092000

  3. CONCLUSION (AT THE 5% LEVEL):
        THE DATA DO COME FROM A NORMAL DISTRIBUTION.

  ...

  ANDERSON-DARLING 1-SAMPLE TEST
  THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

  1. STATISTICS:
        NUMBER OF OBSERVATIONS                =     1000
        MEAN                                  =     0.2034888E-01
        STANDARD DEVIATION                    =     1.321627

        ANDERSON-DARLING TEST STATISTIC VALUE =     5.826050
        ADJUSTED TEST STATISTIC VALUE         =     5.849208

  2. CRITICAL VALUES:
        90   % POINT    =     0.6560000
        95   % POINT    =     0.7870000
        97.5 % POINT    =     0.9180000
        99   % POINT    =     1.092000

  3. CONCLUSION (AT THE ...

  ...

  ANDERSON-DARLING 1-SAMPLE TEST
  THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

  1. STATISTICS:
        NUMBER OF OBSERVATIONS                =     1000
        MEAN                                  =     1.503854
        STANDARD DEVIATION                    =     35.13059

        ANDERSON-DARLING TEST STATISTIC VALUE =     287.6429
        ADJUSTED TEST STATISTIC VALUE         =     288.7863

  2. CRITICAL VALUES:
        ...
        95   % POINT    =     0.7870000
        97.5 % POINT    =     0.9180000
        99   % POINT    =     1.092000

  3. CONCLUSION (AT THE 5% LEVEL):
        THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION.

  ***************************************
  **  anderson darling normal test y4  **
  ***************************************

  ANDERSON-DARLING 1-SAMPLE TEST
  THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

  1. STATISTICS:
        NUMBER OF OBSERVATIONS                =     1000
        MEAN                                  =     1.518372
        STANDARD DEVIATION                    =     1.719969

        ANDERSON-DARLING ...

... for 90% confidence and compare the critical value 1.062 to the Anderson-Darling test statistic (for the normal data) 0.256. Since the test statistic is less than the critical value, we do not reject the null hypothesis at the α = 0.10 level.

As we would hope, the Anderson-Darling ...

... the opposite of what is used in some texts and software programs. In particular, Dataplot uses the opposite convention.

1.3.5.15. Chi-Square Goodness-of-Fit Test http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm (2 of 6) [5/1/2006 9:57:46 AM]

Sample Output
Dataplot generated the following output for the chi-square test where 1,000 random numbers were generated for the normal, double exponential, ...

  ...                                  =     1000
  ...                                  =       24
  ...                                  =        0

  TEST:
  CHI-SQUARED TEST STATISTIC           =     17.52155
  DEGREES OF FREEDOM                   =       23
  CHI-SQUARED CDF VALUE                =     0.217101

  ALPHA LEVEL      CUTOFF       CONCLUSION
     10%          32.00690      ACCEPT H0
      5%          35.17246      ACCEPT H0
      1%          41.63840      ACCEPT H0

  CELL NUMBER, BIN MIDPOINT, OBSERVED FREQUENCY, AND EXPECTED FREQUENCY
  WRITTEN TO FILE DPST1F.DAT

  *************************************************
  **  normal chi-square goodness of fit test y2  **
  *************************************************

  ...                                  =     1000
  ...                                  =       26
  ...                                  =        0

  TEST:
  CHI-SQUARED TEST STATISTIC           =     2030.784
  DEGREES OF FREEDOM                   =       25
  CHI-SQUARED CDF VALUE                =     1.000000

  ALPHA LEVEL      CUTOFF       CONCLUSION
     10%          34.38158      REJECT H0
      5%          37.65248      REJECT H0
      1%          44.31411      REJECT H0

  CELL NUMBER, BIN MIDPOINT, OBSERVED FREQUENCY, AND EXPECTED FREQUENCY
  WRITTEN TO FILE DPST1F.DAT

  *************************************************
  **  normal chi-square goodness of fit test ...

  ...                                  =     1000
  ...                                  =       25
  ...                                  =        0

  TEST:
  CHI-SQUARED TEST STATISTIC           =     103165.4
  DEGREES OF FREEDOM                   =       24
  CHI-SQUARED CDF VALUE                =     1.000000

  ALPHA LEVEL      CUTOFF       CONCLUSION
     10%          33.19624      REJECT H0
      5%          36.41503      REJECT H0
      1%          42.97982      REJECT H0

  CELL NUMBER, BIN MIDPOINT, OBSERVED FREQUENCY, AND EXPECTED FREQUENCY
  WRITTEN TO FILE DPST1F.DAT

  ...                                  =     1000
  ...                                  =       10
  ...                                  =        0

  TEST:
  CHI-SQUARED TEST STATISTIC           =     1162098
  DEGREES OF FREEDOM                   =        9
  CHI-SQUARED CDF VALUE                =     1.000000

  ALPHA LEVEL      CUTOFF       CONCLUSION
     10%          14.68366      REJECT H0
      5%          16.91898      REJECT H0
      1%          21.66600      REJECT H0

  CELL NUMBER, BIN MIDPOINT, OBSERVED FREQUENCY, AND EXPECTED FREQUENCY
  WRITTEN TO FILE DPST1F.DAT

As we would hope, the chi-square test does not reject the normality hypothesis for the normal ...
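For the chi-square test, the statistic $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, where $O_i$ and $E_i$ are the observed and expected counts in cell i, can be sketched in Python. This is a minimal illustration, not Dataplot's procedure. Assumptions: the cells are fixed intervals supplied by the caller (Dataplot chooses its own bins), every expected count is positive, and the degrees of freedom follow the common convention of (number of cells) - 1 - (number of estimated parameters). Because the Python standard library has no chi-square percent point function, the cutoff uses the Wilson-Hilferty approximation, a swapped-in technique; for 23 degrees of freedom at α = 0.05 it gives about 35.16, close to the 35.17246 cutoff in the Dataplot output. Here α is the upper-tail area, so H0 is rejected when the statistic exceeds the cutoff. The function names are hypothetical.

```python
import bisect
import math
from statistics import NormalDist


def chi_square_gof(data, edges, cdf, n_estimated_params=0):
    """Return (chi-square statistic, degrees of freedom).

    Cells are (-inf, edges[0]], (edges[0], edges[1]], ..., (edges[-1], inf);
    `edges` must be sorted. Expected count = N * (cdf(upper) - cdf(lower)).
    """
    n = len(data)
    bounds = [-math.inf] + list(edges) + [math.inf]
    observed = [0] * (len(bounds) - 1)
    for x in data:
        observed[bisect.bisect_left(edges, x)] += 1
    expected = [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(bounds, bounds[1:])]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - n_estimated_params
    return stat, dof


def chi_square_cutoff(dof, alpha):
    """Approximate upper-tail critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    h = 2.0 / (9.0 * dof)
    return dof * (1.0 - h + z * math.sqrt(h)) ** 3


if __name__ == "__main__":
    # Hypothetical data: 1,000 evenly spaced N(0, 1) quantiles, tested
    # against a fully specified N(0, 1) model (no estimated parameters)
    n = 1000
    data = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    edges = [-2.0 + 0.25 * k for k in range(17)]  # -2.0, -1.75, ..., 2.0
    stat, dof = chi_square_gof(data, edges, NormalDist().cdf)
    cutoff = chi_square_cutoff(dof, 0.05)
    verdict = "REJECT H0" if stat > cutoff else "ACCEPT H0"
    print(f"chi-square={stat:.3f} dof={dof} cutoff={cutoff:.3f} {verdict}")
```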