data in the first and last thirds was collected in winter while the more stable middle third was collected in the summer. The seasonal effect was determined to be caused by the amount of humidity affecting the measurement equipment. In this case, the solution was to modify the test equipment to be less sensitive to environmental factors.

Simple graphical techniques can be quite effective in revealing unexpected results in the data. When this occurs, it is important to investigate whether the unexpected result is due to problems in the experiment and data collection, or is in fact indicative of an unexpected underlying structure in the data. This determination cannot be made on the basis of statistics alone. The role of the graphical and statistical analysis is to detect problems or unexpected results in the data. Resolving the issues requires the knowledge of the scientist or engineer.

Individual Plots
Although it is generally unnecessary, the plots can be generated individually to give more detail. Since the lag plot indicates significant non-randomness, we omit the distributional plots.

Run Sequence Plot

1.4.2.7.2. Graphical Output and Interpretation
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4272.htm (3 of 4) [5/1/2006 9:58:56 AM]

Lag Plot

      *   TUK 5 PPCC    =   0.7334843E+00   *
      *   CAUCHY PPCC   =   0.3347875E+00   *
      ***************************************

The autocorrelation coefficient of 0.972 is evidence of significant non-randomness.

Location
One way to quantify a change in location over time is to fit a straight line to the data set using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If there is no significant drift in the location, the slope parameter estimate should be zero.
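The straight-line fit against the index variable can be sketched in plain Python. This is a minimal illustration, not Dataplot's implementation: the function name `fit_line_against_index` is hypothetical, and the demonstration series is synthetic (a small linear drift plus Gaussian noise), not the resistor data.

```python
import math
import random


def fit_line_against_index(y):
    """Ordinary least squares fit of y against X = 1, 2, ..., N.

    Returns (intercept, slope, slope_std_err, slope_t_value).
    Assumes at least three observations.
    """
    n = len(y)
    xs = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard deviation uses N - 2 degrees of freedom,
    # because two parameters (intercept and slope) are estimated.
    sse = sum((v - (intercept + slope * x)) ** 2 for x, v in zip(xs, y))
    resid_sd = math.sqrt(sse / (n - 2))
    slope_se = resid_sd / math.sqrt(sxx)
    if slope_se > 0:
        t_value = slope / slope_se
    else:
        # Perfect fit: the t value is unbounded unless the slope is exactly zero.
        t_value = math.inf if slope != 0 else 0.0
    return intercept, slope, slope_se, t_value


if __name__ == "__main__":
    # Synthetic series (an assumption for illustration only).
    random.seed(1)
    y = [28.0 + 0.0002 * i + random.gauss(0, 0.02) for i in range(1, 1001)]
    a0, a1, se, t = fit_line_against_index(y)
    print(f"A0 = {a0:.4f}  A1 = {a1:.6e}  SD(A1) = {se:.3e}  t = {t:.1f}")
```

A t value far from zero, as in the demonstration, indicates that the slope differs significantly from zero even when its numerical value looks small.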
For this data set, Dataplot generates the following output:

      LEAST SQUARES MULTILINEAR FIT
      SAMPLE SIZE N       =     1000
      NUMBER OF VARIABLES =        1
      NO REPLICATION CASE

              PARAMETER ESTIMATES      (APPROX. ST. DEV.)    T VALUE
       1  A0                27.9114        (0.1209E-02)      0.2309E+05
       2  A1       X        0.209670E-03   (0.2092E-05)      100.2

      RESIDUAL    STANDARD DEVIATION =   0.1909796E-01
      RESIDUAL    DEGREES OF FREEDOM =   998

      COEF AND SD(COEF) WRITTEN OUT TO FILE DPST1F.DAT
      SD(PRED),95LOWER,95UPPER,99LOWER,99UPPER
                        WRITTEN OUT TO FILE DPST2F.DAT
      REGRESSION DIAGNOSTICS WRITTEN OUT TO FILE DPST3F.DAT
      PARAMETER VARIANCE-COVARIANCE MATRIX AND
      INVERSE OF X-TRANSPOSE X MATRIX
                        WRITTEN OUT TO FILE DPST4F.DAT

The slope parameter, A1, has a t value of 100, which is statistically significant. The value of the slope parameter estimate is 0.00021. Although this number is nearly zero, we need to take into account that the original scale of the data is from about 27.8 to 28.2. Over the 1000 observations, a slope of 0.00021 per observation accumulates to a shift of about 0.21, roughly half of that range. In this case, we conclude that there is a drift in location.

1.4.2.7.3. Quantitative Output and Interpretation
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4273.htm (2 of 7) [5/1/2006 9:58:57 AM]

Variation
One simple way to detect a change in variation is with a Bartlett test after dividing the data set into several equal-sized intervals. However, the Bartlett test is not robust for non-normality. Since the normality assumption is questionable for these data, we use the alternative Levene test. In particular, we use the Levene test based on the median rather than the mean. The choice of the number of intervals is somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following output for the Levene test.

      LEVENE F-TEST FOR SHIFT IN VARIATION
      (ASSUMPTION: NORMALITY)

      1. STATISTICS
            NUMBER OF OBSERVATIONS    =     1000
            NUMBER OF GROUPS          =        4
            LEVENE F TEST STATISTIC   =    140.8509

         FOR LEVENE TEST STATISTIC
            0          % POINT    =   0.0000000E+00
            50         % POINT    =   0.7891988
            75         % POINT    =   1.371589
            90         % POINT    =   2.089303
            95         % POINT    =   2.613852
            99         % POINT    =   3.801369
            99.9       % POINT    =   5.463994

            100.0000   % Point:       140.8509

      3. CONCLUSION (AT THE 5% LEVEL):
            THERE IS A SHIFT IN VARIATION.
            THUS: NOT HOMOGENEOUS WITH RESPECT TO VARIATION.

In this case, since the Levene test statistic value of 140.9 is greater than the 5% significance level critical value of 2.6, we conclude that there is significant evidence of nonconstant variation.

Randomness
There are many ways in which data can be non-random. However, most common forms of non-randomness can be detected with a few simple tests. The lag plot in the 4-plot in the previous section is a simple graphical technique.

Another check is an autocorrelation plot that shows the autocorrelations for various lags. Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside this band indicate statistically significant values (lag 0 is always 1). Dataplot generated the following autocorrelation plot.

The lag 1 autocorrelation, which is generally the one of greatest interest, is 0.97. The critical values at the 5% significance level are -0.062 and 0.062. This indicates that the lag 1 autocorrelation is statistically significant, so there is strong evidence of non-randomness.

A common test for randomness is the runs test.
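Both randomness checks can be sketched in a few lines of plain Python before reading the Dataplot runs output. This is an illustrative sketch, not Dataplot's code: the function names are hypothetical, the band uses the usual large-sample approximation z/sqrt(N), and skipping zero differences when counting runs is an assumption (the source does not say how Dataplot treats ties). The expected value (2N - 1)/3 and standard deviation sqrt((16N - 29)/90) for the total number of runs are the standard large-sample results for runs up and down.

```python
import math


def lag_autocorrelation(y, lag=1):
    """Sample autocorrelation of the series y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    num = sum((y[i] - mean) * (y[i + lag] - mean) for i in range(n - lag))
    return num / denom


def autocorrelation_band(n, z=1.96):
    """Approximate two-sided critical value for a random series of length n."""
    return z / math.sqrt(n)


def count_runs(y):
    """Return (runs_up, runs_down); zero differences are skipped (assumption)."""
    signs = [1 if b > a else -1 for a, b in zip(y, y[1:]) if b != a]
    runs_up = runs_down = 0
    previous = 0
    for s in signs:
        if s != previous:  # a change of direction starts a new run
            if s > 0:
                runs_up += 1
            else:
                runs_down += 1
        previous = s
    return runs_up, runs_down


def total_runs_z(total_runs, n):
    """Return (z, expected, sd) for the total number of runs in n observations."""
    expected = (2 * n - 1) / 3
    sd = math.sqrt((16 * n - 29) / 90)
    return (total_runs - expected) / sd, expected, sd


if __name__ == "__main__":
    # Critical value for N = 1000, to compare with the +/-0.062 quoted in the text.
    print(round(autocorrelation_band(1000), 3))
    # Total runs of length 1 or more reported by Dataplot for this data set: 629.
    z, expected, sd = total_runs_z(629, 1000)
    print(round(expected, 4), round(sd, 4), round(z, 2))
```

The band for N = 1000 can be compared with the critical values quoted for the autocorrelation plot, and `total_runs_z(629, 1000)` with the first row of the Dataplot table for runs total of length I or more.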
      RUNS UP

      STATISTIC = NUMBER OF RUNS UP
      OF LENGTH EXACTLY I

       I      STAT     EXP(STAT)   SD(STAT)       Z
       1     178.0     208.3750    14.5453      -2.09
       2      90.0      91.5500     7.5002      -0.21
       3      29.0      26.3236     4.5727       0.59
       4      16.0       5.7333     2.3164       4.43
       5       2.0       1.0121     0.9987       0.99
       6       0.0       0.1507     0.3877      -0.39
       7       0.0       0.0194     0.1394      -0.14
       8       0.0       0.0022     0.0470      -0.05
       9       0.0       0.0002     0.0150      -0.02
      10       0.0       0.0000     0.0046       0.00

      STATISTIC = NUMBER OF RUNS UP
      OF LENGTH I OR MORE

       I      STAT     EXP(STAT)   SD(STAT)       Z
       1     315.0     333.1667     9.4195      -1.93
       2     137.0     124.7917     6.2892       1.94
       3      47.0      33.2417     4.8619       2.83
       4      18.0       6.9181     2.5200       4.40
       5       2.0       1.1847     1.0787       0.76
       6       0.0       0.1726     0.4148      -0.42
       7       0.0       0.0219     0.1479      -0.15
       8       0.0       0.0025     0.0496      -0.05
       9       0.0       0.0002     0.0158      -0.02
      10       0.0       0.0000     0.0048       0.00

      RUNS DOWN

      STATISTIC = NUMBER OF RUNS DOWN
      OF LENGTH EXACTLY I

       I      STAT     EXP(STAT)   SD(STAT)       Z
       1     195.0     208.3750    14.5453      -0.92
       2      81.0      91.5500     7.5002      -1.41
       3      32.0      26.3236     4.5727       1.24
       4       4.0       5.7333     2.3164      -0.75
       5       1.0       1.0121     0.9987      -0.01
       6       1.0       0.1507     0.3877       2.19
       7       0.0       0.0194     0.1394      -0.14
       8       0.0       0.0022     0.0470      -0.05
       9       0.0       0.0002     0.0150      -0.02
      10       0.0       0.0000     0.0046       0.00

      STATISTIC = NUMBER OF RUNS DOWN
      OF LENGTH I OR MORE

       I      STAT     EXP(STAT)   SD(STAT)       Z
       1     314.0     333.1667     9.4195      -2.03
       2     119.0     124.7917     6.2892      -0.92
       3      38.0      33.2417     4.8619       0.98
       4       6.0       6.9181     2.5200      -0.36
       5       2.0       1.1847     1.0787       0.76
       6       1.0       0.1726     0.4148       1.99
       7       0.0       0.0219     0.1479      -0.15
       8       0.0       0.0025     0.0496      -0.05
       9       0.0       0.0002     0.0158      -0.02
      10       0.0       0.0000     0.0048       0.00

      RUNS TOTAL = RUNS UP + RUNS DOWN

      STATISTIC = NUMBER OF RUNS TOTAL
      OF LENGTH EXACTLY I

       I      STAT     EXP(STAT)   SD(STAT)       Z
       1     373.0     416.7500    20.5701      -2.13
       2     171.0     183.1000    10.6068      -1.14
       3      61.0      52.6472     6.4668       1.29
       4      20.0      11.4667     3.2759       2.60
       5       3.0       2.0243     1.4123       0.69
       6       1.0       0.3014     0.5483       1.27
       7       0.0       0.0389     0.1971      -0.20
       8       0.0       0.0044     0.0665      -0.07
       9       0.0       0.0005     0.0212      -0.02
      10       0.0       0.0000     0.0065      -0.01

      STATISTIC = NUMBER OF RUNS TOTAL
      OF LENGTH I OR MORE

       I      STAT     EXP(STAT)   SD(STAT)       Z
       1     629.0     666.3333    13.3212      -2.80
       2     256.0     249.5833     8.8942       0.72
       3      85.0      66.4833     6.8758       2.69
       4      24.0      13.8361     3.5639       2.85
       5       4.0       2.3694     1.5256       1.07
       6       1.0       0.3452     0.5866       1.12
       7       0.0       0.0438     0.2092      -0.21
       8       0.0       0.0049     0.0701      -0.07
       9       0.0       0.0005     0.0223      -0.02
      10       0.0       0.0000     0.0067      -0.01

      LENGTH OF THE LONGEST RUN UP         =     5
      LENGTH OF THE LONGEST RUN DOWN       =     6
      LENGTH OF THE LONGEST RUN UP OR DOWN =     6

      NUMBER OF POSITIVE DIFFERENCES =   505
      NUMBER OF NEGATIVE DIFFERENCES =   469
      NUMBER OF ZERO     DIFFERENCES =    25

Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically significant at the 5% level. Because a number of the values exceed the 1.96 cut-off in absolute value, we conclude that the data are not random. However, in this case the evidence from the runs test is not nearly as strong as it is from the autocorrelation plot.

Distributional Analysis
Since we rejected the randomness assumption, the distributional tests are not meaningful. Therefore, these quantitative tests are omitted. Since Grubbs' test for outliers also assumes the approximate normality of the data, we omit Grubbs' test as well.

Univariate Report
It is sometimes useful and convenient to summarize the above results in a report.

      Analysis for resistor case study

      1: Sample Size                           = 1000

      2: Location
         Mean                                  = 28.01635
         Standard Deviation of Mean            = 0.002008
         95% Confidence Interval for Mean      = (28.0124,28.02029)
         Drift with respect to location?       = NO

      3: Variation
         Standard Deviation                    = 0.063495
         95% Confidence Interval for SD        = (0.060829,0.066407)
         Change in variation?
         (based on Levene's test on quarters
         of the data)                          = YES

      4: Randomness
         Autocorrelation                       = 0.972158
         Data Are Random?
         (as measured by autocorrelation)      = NO

      5: Distribution
         Distributional test omitted due to
         non-randomness of the data

      6: Statistical Control
         (i.e., no drift in location or scale,
         data are random, distribution is
         fixed)
         Data Set is in Statistical Control?   = NO

      7: Outliers?
         (Grubbs' test omitted due to
         non-randomness of the data)

1.4.2.7.4. Work This Example Yourself
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4274.htm (2 of 3) [5/1/2006 9:58:57 AM]

1. Invoke Dataplot and read data.
   1. Read in the data.
      Result: You have read 1 column of numbers into Dataplot, variable Y.

2. 4-plot of the data.
   1. 4-plot of Y.
      Result: Based on the 4-plot, there are shifts in location and variation and the data are not random.

3. Generate the individual plots.
   1. Generate a run sequence plot.
      Result: The run sequence plot indicates that there are shifts of location and variation.
   2. Generate a lag plot.
      Result: The lag plot shows a strong linear pattern, which indicates significant non-randomness.

4. Generate summary statistics, quantitative analysis, and print a univariate report.
   1. Generate a table of summary statistics.
      Result: The summary statistics table displays 25+ statistics.
   2. Generate the sample mean, a confidence interval for the population mean, and compute a linear fit to detect drift in location.
      Result: The mean is 28.0163 and a 95% confidence interval is (28.0124,28.02029). The linear fit indicates drift in location since the slope parameter estimate is statistically significant.
   3. Generate the sample standard deviation, a confidence interval for the population standard deviation, and detect drift in variation by dividing the data into quarters and computing Levene's test for equal standard deviations.
      Result: The standard deviation is 0.0635 with a 95% confidence interval of (0.060829,0.066407). Levene's test indicates significant change in variation.
   4. Check for randomness by generating an autocorrelation plot and a runs test.
      Result: The lag 1 autocorrelation is 0.97. From the autocorrelation plot, this is outside the 95% confidence interval bands, indicating significant non-randomness.
   5. Print a univariate report (this assumes steps 2 through 5 have already been run).
      Result: The results are summarized in a convenient report.

[...]

1.4.2.8.1. Background and Data
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4281.htm (1 of 5) [5/1/2006 9:58:58 AM]

...

      SKIP 25
      READ ZARR13.DAT Y

Resulting Data
The following are the data used for this case study. The values are listed in order, reading across each row.

      9.206343  9.299992  9.277895  9.305795  9.275351
      9.288729  9.287239  9.260973  9.303111  9.275674
      9.272561  9.288454  9.255672  9.252141  9.297670
      9.266534  9.256689  9.277542  9.248205  9.252107
      9.276345  9.278694  9.267144  9.246132  9.238479
      9.269058  9.248239  9.257439  9.268481  9.288454
      9.258452  9.286130  9.251479  9.257405  9.268343
      9.291302  9.219460  9.270386  9.218808  9.241185
      9.269989  9.226585  9.258556  9.286184  9.320067
      9.327973  9.262963  9.248181  9.238644  9.225073
      9.220878  9.271318  9.252072  9.281186  9.270624
      9.294771  9.301821  9.278849  9.236680  9.233988
      9.244687  9.221601  9.207325  9.258776  9.275708
      9.268955  9.257269  9.264979  9.295500  9.292883
      9.264188  9.280731  9.267336  9.300566  9.253089
      9.261376  9.238409  9.225073  9.235526  9.239510
      9.264487  9.244242  9.277542  9.310506  9.261594
      9.259791  9.253089  9.245735  9.284058  9.251122
      9.275385  9.254619  9.279526  9.275065  9.261952
      9.275351  9.252433  9.230263  9.255150  9.268780
      9.290389  9.274161  9.255707  9.261663  9.250455
      9.261952  9.264041  9.264509  9.242114  9.239674
      9.221553  9.241935  9.215265  9.285930  9.271559
      9.266046  9.285299  9.268989  9.267987  9.246166
      9.231304  9.240768  9.260506  9.274355  9.292376
      9.271170  9.267018  9.308838  9.264153  9.278822
      9.255244  9.229221  9.253158  9.256292  9.262602
      9.219793  9.258452  9.267987  9.267987  9.248903
      9.235153  9.242933  9.253453  9.262671  9.242536
      9.260803  9.259825  9.253123  9.240803  9.238712
      9.263676  9.243002  9.246826  9.252107  9.261663
      9.247311  9.306055  9.237646  9.248937  9.256689
      9.265777  9.299047  9.244814  9.287205  9.300566
      9.256621  9.271318  9.275154  9.281834  9.253158
      9.269024  9.282077  9.277507  9.284910  9.239840
      9.268344  9.247778  9.225039  9.230750  9.270024
      9.265095  9.284308  9.280697  9.263032  9.291851
      9.252072  9.244031  9.283269  9.196848  9.231372
      9.232963  9.234956  9.216746  9.274107  9.273776

...

1.4.2.8.2. Graphical Output and Interpretation
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4282.htm (2 of 4) [5/1/2006 9:58:58 AM]

Run Sequence Plot

Lag Plot

Histogram (with overlaid Normal PDF)

Normal Probability Plot

...

1.4.2.8.3. Quantitative Output and Interpretation
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4283.htm (2 of 8) [5/1/2006 9:58:59 AM]

      ... SQUARES MULTILINEAR FIT
      SAMPLE SIZE N       =      195
      NUMBER OF VARIABLES =        1
      NO REPLICATION CASE

              PARAMETER ESTIMATES      (APPROX. ST. DEV.)    T VALUE
       1  A0                9.26699        (0.3253E-02)      2849
       2  A1       X       -0.564115E-04   (0.2878E-04)      -1.960

      RESIDUAL    STANDARD DEVIATION =   0.2262372E-01
      RESIDUAL    DEGREES OF FREEDOM =   193

The slope parameter, A1, has a t value of -1.96 which is (barely) statistically significant since it is essentially equal to...

...

      ...                          =    3.147338
      CUTOFF: 95% PERCENT POINT    =    7.814727
      CUTOFF: 99% PERCENT POINT    =    11.34487

      CHI-SQUARE CDF VALUE         =    0.630538

         NULL              NULL HYPOTHESIS        NULL HYPOTHESIS
      HYPOTHESIS         ACCEPTANCE INTERVAL        CONCLUSION
      ALL SIGMA EQUAL       (0.000,0.950)             ACCEPT

In this case, since the Bartlett ...

...

      STAT     EXP(STAT)   SD(STAT)       Z
       ...      40.6667     6.4079      -0.88
       ...      17.7583     3.3021      -2.96
       ...       5.0806     2.0096       3.44
       3.0       1.1014     1.0154       1.87
       0.0       0.1936     0.4367      -0.44
       0.0       0.0287     0.1692      -0.17
       0.0       0.0037     0.0607      -0.06
       0.0       0.0004     0.0204      -0.02

...