1.4.2.3.2. Test Underlying Assumptions http://www.itl.nist.gov/div898/handbook/eda/section4/eda4232.htm (3 of 7) [5/1/2006 9:58:36 AM]

Spectral Plot

Another useful plot for non-random data is the spectral plot. This spectral plot shows a single dominant low frequency peak.

Quantitative Output

Although the 4-plot above clearly shows the violation of the assumptions, we supplement the graphical output with some quantitative measures.

Summary Statistics

As a first step in the analysis, a table of summary statistics is computed from the data. The following table, generated by Dataplot, shows a typical set of statistics.

                                  SUMMARY

                     NUMBER OF OBSERVATIONS =      500

    ***********************************************************************
    *        LOCATION MEASURES         *       DISPERSION MEASURES        *
    ***********************************************************************
    *  MIDRANGE     =   0.2888407E+01  *  RANGE        =   0.9053595E+01  *
    *  MEAN         =   0.3216681E+01  *  STAND. DEV.  =   0.2078675E+01  *
    *  MIDMEAN      =   0.4791331E+01  *  AV. AB. DEV. =   0.1660585E+01  *
    *  MEDIAN       =   0.3612030E+01  *  MINIMUM      =  -0.1638390E+01  *
    *               =                  *  LOWER QUART. =   0.1747245E+01  *
    *               =                  *  LOWER HINGE  =   0.1741042E+01  *
    *               =                  *  UPPER HINGE  =   0.4682273E+01  *
    *               =                  *  UPPER QUART. =   0.4681717E+01  *
    *               =                  *  MAXIMUM      =   0.7415205E+01  *
    ***********************************************************************
    *       RANDOMNESS MEASURES        *     DISTRIBUTIONAL MEASURES      *
    ***********************************************************************
    *  AUTOCO COEF  =   0.9868608E+00  *  ST. 3RD MOM. =  -0.4448926E+00  *
    *               =   0.0000000E+00  *  ST. 4TH MOM. =   0.2397789E+01  *
    *               =   0.0000000E+00  *  ST. WILK-SHA =  -0.1279870E+02  *
    *               =                  *  UNIFORM PPCC =   0.9765666E+00  *
    *               =                  *  NORMAL PPCC  =   0.9811183E+00  *
    *               =                  *  TUK 5 PPCC   =   0.7754489E+00  *
    *               =                  *  CAUCHY PPCC  =   0.4165502E+00  *
    ***********************************************************************

The value of the autocorrelation statistic, 0.987, is evidence of a very strong autocorrelation.
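As a rough illustration of how a lag 1 autocorrelation coefficient of this kind can be computed, the following Python sketch applies the usual sample formula to a simulated series. It is not the Dataplot implementation. The function name lag1_autocorrelation, the seed, and the choice of uniform steps on (-0.5, 0.5) are assumptions made only for this example.

```python
import random
from itertools import accumulate


def lag1_autocorrelation(y):
    """Lag 1 sample autocorrelation.

    r1 = sum((y[i] - m) * (y[i+1] - m)) / sum((y[i] - m)**2), with m the mean.
    """
    n = len(y)
    m = sum(y) / n
    num = sum((y[i] - m) * (y[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in y)
    return num / den


# Assumed setup: 500 independent uniform steps and their cumulative sum.
rng = random.Random(0)
steps = [rng.uniform(-0.5, 0.5) for _ in range(500)]
walk = list(accumulate(steps))

r_steps = lag1_autocorrelation(steps)  # independent data
r_walk = lag1_autocorrelation(walk)    # cumulative sum of the same data
print(f"lag 1 autocorrelation, steps: {r_steps:.3f}")
print(f"lag 1 autocorrelation, walk:  {r_walk:.3f}")
```

For independent steps the coefficient should be near zero, while for their cumulative sum it should be close to 1, which is the pattern reported in the summary table.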
Location

One way to quantify a change in location over time is to fit a straight line to the data set using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If there is no significant drift in the location, the slope parameter should be zero. For this data set, Dataplot generates the following output:

    LEAST SQUARES MULTILINEAR FIT
    SAMPLE SIZE N       =      500
    NUMBER OF VARIABLES =        1
    NO REPLICATION CASE

            PARAMETER ESTIMATES           (APPROX. ST. DEV.)    T VALUE
     1  A0                   1.83351       (0.1721    )          10.65
     2  A1       X           0.552164E-02  (0.5953E-03)          9.275

    RESIDUAL    STANDARD DEVIATION =         1.921416
    RESIDUAL    DEGREES OF FREEDOM =         498

    COEF AND SD(COEF) WRITTEN OUT TO FILE DPST1F.DAT
    SD(PRED),95LOWER,95UPPER,99LOWER,99UPPER
                      WRITTEN OUT TO FILE DPST2F.DAT
    REGRESSION DIAGNOSTICS WRITTEN OUT TO FILE DPST3F.DAT
    PARAMETER VARIANCE-COVARIANCE MATRIX AND
    INVERSE OF X-TRANSPOSE X MATRIX
                      WRITTEN OUT TO FILE DPST4F.DAT

The slope parameter, A1, has a t value of 9.3, which is statistically significant. The slope therefore cannot be considered zero, and we conclude that the location is not constant.

Variation

One simple way to detect a change in variation is with a Bartlett test after dividing the data set into several equal-sized intervals. However, the Bartlett test is not robust to non-normality. Since we know this data set is not approximated well by the normal distribution, we use the alternative Levene test. In particular, we use the Levene test based on the median rather than the mean. The choice of the number of intervals is somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following output for the Levene test.

    LEVENE F-TEST FOR SHIFT IN VARIATION
    (ASSUMPTION: NORMALITY)

    1. STATISTICS
          NUMBER OF OBSERVATIONS    =      500
          NUMBER OF GROUPS          =        4
          LEVENE F TEST STATISTIC   =    10.45940

       FOR LEVENE TEST STATISTIC
          0          % POINT    =   0.0000000E+00
          50         % POINT    =   0.7897459
          75         % POINT    =   1.373753
          90         % POINT    =   2.094885
          95         % POINT    =   2.622929
          99         % POINT    =   3.821479
          99.9       % POINT    =   5.506884

          99.99989   % Point:       10.45940

    3. CONCLUSION (AT THE 5% LEVEL):
          THERE IS A SHIFT IN VARIATION.
          THUS: NOT HOMOGENEOUS WITH RESPECT TO VARIATION.

In this case, the Levene test indicates that the standard deviations are significantly different in the 4 intervals, since the test statistic of 10.46 is greater than the 95% critical value of 2.62. Therefore we conclude that the scale is not constant.

Randomness

Although the lag 1 autocorrelation coefficient above clearly shows the non-randomness, we show the output from a runs test as well.

    RUNS UP

    STATISTIC = NUMBER OF RUNS UP OF LENGTH EXACTLY I

     I      STAT    EXP(STAT)    SD(STAT)         Z
     1      63.0     104.2083     10.2792     -4.01
     2      34.0      45.7167      5.2996     -2.21
     3      17.0      13.1292      3.2297      1.20
     4       4.0       2.8563      1.6351      0.70
     5       1.0       0.5037      0.7045      0.70
     6       5.0       0.0749      0.2733     18.02
     7       1.0       0.0097      0.0982     10.08
     8       1.0       0.0011      0.0331     30.15
     9       0.0       0.0001      0.0106     -0.01
    10       1.0       0.0000      0.0032    311.40

    STATISTIC = NUMBER OF RUNS UP OF LENGTH I OR MORE

     I      STAT    EXP(STAT)    SD(STAT)         Z
     1     127.0     166.5000      6.6546     -5.94
     2      64.0      62.2917      4.4454      0.38
     3      30.0      16.5750      3.4338      3.91
     4      13.0       3.4458      1.7786      5.37
     5       9.0       0.5895      0.7609     11.05
     6       8.0       0.0858      0.2924     27.06
     7       3.0       0.0109      0.1042     28.67
     8       2.0       0.0012      0.0349     57.21
     9       1.0       0.0001      0.0111     90.14
    10       1.0       0.0000      0.0034    298.08

    RUNS DOWN

    STATISTIC = NUMBER OF RUNS DOWN OF LENGTH EXACTLY I

     I      STAT    EXP(STAT)    SD(STAT)         Z
     1      69.0     104.2083     10.2792     -3.43
     2      32.0      45.7167      5.2996     -2.59
     3      11.0      13.1292      3.2297     -0.66
     4       6.0       2.8563      1.6351      1.92
     5       5.0       0.5037      0.7045      6.38
     6       2.0       0.0749      0.2733      7.04
     7       2.0       0.0097      0.0982     20.26
     8       0.0       0.0011      0.0331     -0.03
     9       0.0       0.0001      0.0106     -0.01
    10       0.0       0.0000      0.0032      0.00

    STATISTIC = NUMBER OF RUNS DOWN OF LENGTH I OR MORE

     I      STAT    EXP(STAT)    SD(STAT)         Z
     1     127.0     166.5000      6.6546     -5.94
     2      58.0      62.2917      4.4454     -0.97
     3      26.0      16.5750      3.4338      2.74
     4      15.0       3.4458      1.7786      6.50
     5       9.0       0.5895      0.7609     11.05
     6       4.0       0.0858      0.2924     13.38
     7       2.0       0.0109      0.1042     19.08
     8       0.0       0.0012      0.0349     -0.03
     9       0.0       0.0001      0.0111     -0.01
    10       0.0       0.0000      0.0034      0.00

    RUNS TOTAL = RUNS UP + RUNS DOWN

    STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH EXACTLY I

     I      STAT    EXP(STAT)    SD(STAT)         Z
     1     132.0     208.4167     14.5370     -5.26
     2      66.0      91.4333      7.4947     -3.39
     3      28.0      26.2583      4.5674      0.38
     4      10.0       5.7127      2.3123      1.85
     5       6.0       1.0074      0.9963      5.01
     6       7.0       0.1498      0.3866     17.72
     7       3.0       0.0193      0.1389     21.46
     8       1.0       0.0022      0.0468     21.30
     9       0.0       0.0002      0.0150     -0.01
    10       1.0       0.0000      0.0045    220.19

    STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH I OR MORE

     I      STAT    EXP(STAT)    SD(STAT)         Z
     1     254.0     333.0000      9.4110     -8.39
     2     122.0     124.5833      6.2868     -0.41
     3      56.0      33.1500      4.8561      4.71
     4      28.0       6.8917      2.5154      8.39
     5      18.0       1.1790      1.0761     15.63
     6      12.0       0.1716      0.4136     28.60
     7       5.0       0.0217      0.1474     33.77
     8       2.0       0.0024      0.0494     40.43
     9       1.0       0.0002      0.0157     63.73
    10       1.0       0.0000      0.0047    210.77

    LENGTH OF THE LONGEST RUN UP         =    10
    LENGTH OF THE LONGEST RUN DOWN       =     7
    LENGTH OF THE LONGEST RUN UP OR DOWN =    10

    NUMBER OF POSITIVE DIFFERENCES =   258
    NUMBER OF NEGATIVE DIFFERENCES =   241
    NUMBER OF ZERO     DIFFERENCES =     0

Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically significant at the 5% level. Numerous values in this column fall far outside the interval from -1.96 to 1.96, so we conclude that the data are not random.

Distributional Assumptions

Since the quantitative tests show that the assumptions of randomness and constant location and scale are not met, the distributional measures will not be meaningful. Therefore these quantitative tests are omitted.

1.4.2.3.3. Develop A Better Model http://www.itl.nist.gov/div898/handbook/eda/section4/eda4233.htm (2 of 2) [5/1/2006 9:58:36 AM]

1.4.2.3.4. Validate New Model http://www.itl.nist.gov/div898/handbook/eda/section4/eda4234.htm (2 of 4) [5/1/2006 9:58:40 AM]

4-Plot of Residuals

Interpretation

The assumptions are addressed by the graphics shown above:

1. The run sequence plot (upper left) indicates no significant shifts in location or scale over time.
2. The lag plot (upper right) exhibits a random appearance.
3. The histogram shows a relatively flat appearance. This indicates that a uniform probability distribution may be an appropriate model for the error component (or residuals).
4. The normal probability plot clearly shows that the normal distribution is not an appropriate model for the error component.

A uniform probability plot can be used to further test the suggestion that a uniform distribution might be a good model for the error component.

Uniform Probability Plot of Residuals

Since the uniform probability plot is nearly linear, this verifies that a uniform distribution is a good model for the error component.

Conclusions

Since the residuals from our model satisfy the underlying assumptions, we conclude that

    Y_i = A0 + A1*Y_(i-1) + E_i

with the fitted values of A0 and A1, and where the E_i follow a uniform distribution, is a good model for this data set. We could simplify this model to

    Y_i = Y_(i-1) + E_i

This has the advantage of simplicity (the current point is simply the previous point plus a uniformly distributed error term).

Using Scientific and Engineering Knowledge

In this case, the above model makes sense based on our definition of the random walk. That is, a random walk is the cumulative sum of uniformly distributed data points. It makes sense that modeling the current point as the previous point plus a uniformly distributed error term is about as good as we can do. Although this case is a bit artificial in that we knew how the data were constructed, it is common and desirable to use scientific and engineering knowledge of the process that generated the data in formulating and testing models for the data. Quite often, several competing models will produce nearly equivalent mathematical results. In this case, selecting the model that best approximates the scientific understanding of the process is a reasonable choice.
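The idea that the current point is the previous point plus a uniformly distributed error can be checked on simulated data. The Python sketch below is only an illustration under stated assumptions: it builds a random walk from uniform errors on (-0.5, 0.5) (an assumed interval), then regresses each point on its predecessor by ordinary least squares. The helper fit_line and the seed are hypothetical choices for this example, not part of the Dataplot analysis.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a0 + a1*x; returns (a0, a1, residuals)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = my - a1 * mx
    resid = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    return a0, a1, resid


# Assumed construction: cumulative sum of 500 uniform errors on (-0.5, 0.5).
rng = random.Random(12345)
errors = [rng.uniform(-0.5, 0.5) for _ in range(500)]
walk = []
level = 0.0
for e in errors:
    level += e
    walk.append(level)

# Regress the current point on the previous point.
a0, a1, resid = fit_line(walk[:-1], walk[1:])
print(f"A0 = {a0:.4f}, A1 = {a1:.4f}")
print(f"residuals range from {min(resid):.3f} to {max(resid):.3f}")
```

For a series built this way, the fitted slope should be close to 1 and the intercept close to 0, which is what motivates the simplified form of the model.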
Time Series Model

This model is an example of a time series model. More extensive discussion of time series is given in the Process Monitoring chapter.

[...]

1.4.2.4.1. Background and Data http://www.itl.nist.gov/div898/handbook/eda/section4/eda4241.htm (2 of 4) [5/1/2006 9:58:48 AM]

... 2900 2899 2902 2899 2898 2899 2899 2899 2900 2899
2899 2897 2901 2898 2898 2899 2900 2900 2900 2898
2899 2900 2901 2897 2900 2899 2897 2898 2898 2898
2900 2897 2897 2899 2897 2900 2899 2900 2899 2898
2897 2898 2902 2899 2898 2898 2899 2898 2899 2901
2898 2900 2899 2900 2898 2900 2899 2898 2898 2897
2898 2900 2900 2898 2898 2899 2898 2901 2898 2896
2896 2899 2898 2898 2898 2897 2897 2898 2898 2897
2898...

2897 2897 2901 2899 2898 2899 2899 2899 2899 2900
2900 2900 2898 2900 2900 2899 2899 2899 2897 2897
2899 2900 2900 2900 2899 2899 2900 2900 2898 2898
2899 2898 2898 2899 2896 2899 2898 2898 2897 2898
2900 2899 2899 2901 2902 2899 2899 2901 2902 2899
2898 2899 2899 2898 2901 2901 2900 2901 2901 2899
2900 2899 2898 2898 2897 2899 2899 2898 2899 2898
2897 2899 2898 2898 2898 2899 2898 2901 2899 2899
2900...

2899 2898 2899 2898 2900 2898 2897 2897 2897 2898
2898 2898 2897 2897 2898 2898 2899 2901 2899 2898
2898 2897 2898 2896 2898 2897 2898 2898 2898 2896
2897 2896 2898 2896 2898 2899 2898 2899 2899 2901
2901 2899 2900 2898 2899 2901 2896 2897 2900 2899
2899 2899 2899 2899 2898 2897 2897 2898 2896 2898
2896 2899 2899 2899 2898 2900 2899 2897 2899 2895
2897 2900 2898 2898 2896 2897 2896 2897 2897 2896
2897...

2897 2899 2898 2898 2897 2899 2899 2897 2898 2899
2899 2899 2899 2900 2899 2898 2900 2900 2898 2899
2898 2899 2899 2897 2899 2897 2899 2898 2900 2898
2898 2899 2899 2899 2898 2897 2898 2897 2896 2896
2897 2897 2898 2899 2900 2897 2897 2898 2901 2900
2900 2898 2897 2898 2897 2900 2900 2899 2899 2900
2899 2898 2900 2899 2898 2900 2898 2898 2897 2898
2898 2898 2897 2898 2900 2898 2900 2898 2898 2897
2898...

2896 2898 2900 2899 2899 2900 2901 2900 2899 2900
2899 2898 2897 2898 2901 2899 2899 2899 2897 2898
2899 2898 2896 2898 2898 2899 2899 2899 2898 2901
2899 2898 2898 2898 2898 2897 2898 2899 2896 2898
2897 2898 2896 2896 2898 2897 2897 2897 2900 2899
2898 2899 2900 2898 2901 2900 2898 2897 2897 2900
2900 2897 2898 2897 2897 2899 2899 2895 2898 2898
2896 2898 2898 2900 2898 2897 2899 2901 2898 2898
2898...

2899 2899 2900 2901 2897 2899 2900 2900 2899 2900
2899 2899 2900 2900 2899 2899 2900 2899 2899 2899
2900 2898 2898 2897 2899 2899 2898 2896 2896 2896
2899 2898 2901 2900 2900 2901 2898 2900 2901 2898
2900 2899 2898 2898 2899 2900 2901 2900 2900 2897
2898 2899 2901 2898 2899 2899 2897 2898 2897 2899
2897 2900 2898 2899 2897 2898 2898 2898 2898 2901
2899 2899 2901 2899 2901 2898 2899 2899 2900...

2898 2895 2896 2898 2900 2896 2896 2896 2897 2895
2896 2898 2899 2898 2900 2898 2900 2898 2901 2900
2900 2899 2898 2898 2898 2897 2898 2899 2898 2898
2897 2897 2898 2899 2899 2898 2897 2898 2898 2898
2896 2896 2897 2898 2899 2899 2897 2901 2900 2899
2898 2899 2898 2899 2897 2899...

2898 2897 2898 2897 2895 2898 2899 2898 2898 2897
2896 2900 2898 2897 2899 2900 2898 2900 2900 2899
2898 2900 2899 2899 2900 2901 2902 2899 2898 2900
2898 2899 2899 2898 2897 2899 2897 2899 2899 2899
2898 2898 2899 2901 2900 2900 2900 2900 2899 2901
2899 2899 2897 2899 2901 2900...

2898 2898 2899 2899 2899 2899 2898 2899 2898 2899
2898 2897 2899 2901 2898 2899 2899 2899 2899 2897
2898 2897 2898 2897 2898 2898 2898 2900 2900 2898
2898 2900 2900 2898 2898 2897 2897 2897 2899 2896
2897 2898 2897 2898 2898 2897 2900 2899 2900 2900
2901 2901 2898 2900 2899 2900...

    SKIP 25
    SET READ FORMAT 5F5.0
    SERIAL READ SOULEN.DAT Y
    SET READ FORMAT

Resulting Data

The following are the data used for this case study.

2899 2901 2898 2897 2900 2898 2899 2899 2899 2899
2901 2899 2898 2898 2899 2898 2899 2897 2898 2899
2902 2899 2900 2900 2899 2898 2898 2901 2898 2897
2900 2899 2898 2899 2899 2899 2899 2900 2900 2900
2900 2900 2899 2900 2899 2899 2900 2899 2900 2899
2899 2896 2898 2898 .

2900 2899 2898 2899 2900 2899 2900 2899 2899 2899
2899 2899 2898 2899 2899 2900 2902 2899 2900 2900
2901 2899 2901 2899 2899 2902 2898 2898 2898 2898
2899 2899 2900 2900 2900 2898 2899 2899.

2898 2901 2899 2900 2899 2898 2900 2900 2899 2898
2897 2900 2898 2898 2897 2899 2898 2900 2899 2898
2899 2897 2900 2898 2902 2897 2898 2899 2899 2899
2898 2897 2898 2897 2898 2899 2900 2900 .

2898 2898 2898 2900 2898 2897 2899 2897 2899 2899
2900 2897 2900 2900 2899 2898 2898 2899 2899 2899
2899 2899 2898 2899 2899 2899 2902 2899 2900 2898
2899 2899 2899 2899 2899 2899 2900 2899
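The Dataplot commands for this case study skip 25 header lines and then read the values serially in fixed-width fields (five fields of width five per line, following the 5F5.0 format). A comparable read can be sketched in Python as below. This is an illustration under assumptions, not a translation of Dataplot: the helper name read_serial_fixed_width is hypothetical, and the sample text is a synthetic stand-in with an assumed right-justified layout rather than the actual contents of SOULEN.DAT.

```python
def read_serial_fixed_width(text, skip=25, fields_per_line=5, width=5):
    """Return every numeric field, in reading order, after skipping header lines."""
    values = []
    for line in text.splitlines()[skip:]:
        for k in range(fields_per_line):
            field = line[k * width:(k + 1) * width].strip()
            if field:  # short or blank lines simply contribute fewer values
                values.append(float(field))
    return values


# Synthetic stand-in for the file: 25 header lines, then right-justified
# width-5 fields (an assumed layout, not a copy of SOULEN.DAT).
header = [f"header line {i + 1}" for i in range(25)]
rows = [[2899, 2901, 2898, 2897, 2900], [2898, 2899]]
body = ["".join(f"{v:5d}" for v in row) for row in rows]
sample_text = "\n".join(header + body)

y = read_serial_fixed_width(sample_text)
print(len(y), y[:3])
```

Slicing by column position rather than splitting on whitespace keeps the read correct even if adjacent fields were to touch, which is the main reason for following the fixed-width format here.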