1.4.2.8.3 Quantitative Output and Interpretation

  I     STAT   EXP(STAT)   SD(STAT)       Z
  9      0.0      0.0000     0.0065   -0.01
 10      0.0      0.0000     0.0020    0.00

STATISTIC = NUMBER OF RUNS UP OF LENGTH I OR MORE

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1     58.0     64.8333     4.1439   -1.65
  2     23.0     24.1667     2.7729   -0.42
  3     15.0      6.4083     2.1363    4.02
  4      3.0      1.3278     1.1043    1.51
  5      0.0      0.2264     0.4716   -0.48
  6      0.0      0.0328     0.1809   -0.18
  7      0.0      0.0041     0.0644   -0.06
  8      0.0      0.0005     0.0215   -0.02
  9      0.0      0.0000     0.0068   -0.01
 10      0.0      0.0000     0.0021    0.00

RUNS DOWN

STATISTIC = NUMBER OF RUNS DOWN OF LENGTH EXACTLY I

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1     33.0     40.6667     6.4079   -1.20
  2     18.0     17.7583     3.3021    0.07
  3      3.0      5.0806     2.0096   -1.04
  4      3.0      1.1014     1.0154    1.87
  5      1.0      0.1936     0.4367    1.85
  6      0.0      0.0287     0.1692   -0.17
  7      0.0      0.0037     0.0607   -0.06
  8      0.0      0.0004     0.0204   -0.02
  9      0.0      0.0000     0.0065   -0.01
 10      0.0      0.0000     0.0020    0.00

STATISTIC = NUMBER OF RUNS DOWN OF LENGTH I OR MORE

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1     58.0     64.8333     4.1439   -1.65
  2     25.0     24.1667     2.7729    0.30
  3      7.0      6.4083     2.1363    0.28
  4      4.0      1.3278     1.1043    2.42
  5      1.0      0.2264     0.4716    1.64
  6      0.0      0.0328     0.1809   -0.18
  7      0.0      0.0041     0.0644   -0.06
  8      0.0      0.0005     0.0215   -0.02
  9      0.0      0.0000     0.0068   -0.01
 10      0.0      0.0000     0.0021    0.00

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4283.htm (4 of 8) [5/1/2006 9:58:59 AM]

RUNS TOTAL = RUNS UP + RUNS DOWN

STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH EXACTLY I

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1     68.0     81.3333     9.0621   -1.47
  2     26.0     35.5167     4.6698   -2.04
  3     15.0     10.1611     2.8420    1.70
  4      6.0      2.2028     1.4360    2.64
  5      1.0      0.3871     0.6176    0.99
  6      0.0      0.0574     0.2392   -0.24
  7      0.0      0.0074     0.0858   -0.09
  8      0.0      0.0008     0.0289   -0.03
  9      0.0      0.0001     0.0092   -0.01
 10      0.0      0.0000     0.0028    0.00

STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH I OR MORE

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1    116.0    129.6667     5.8604   -2.33
  2     48.0     48.3333     3.9215   -0.09
  3     22.0     12.8167     3.0213    3.04
  4      7.0      2.6556     1.5617    2.78
  5      1.0      0.4528     0.6669    0.82
  6      0.0      0.0657     0.2559   -0.26
  7      0.0      0.0083     0.0911   -0.09
  8      0.0      0.0009     0.0305   -0.03
  9      0.0      0.0001     0.0097   -0.01
 10      0.0      0.0000     0.0029    0.00

LENGTH OF THE LONGEST RUN UP         = 4
LENGTH OF THE LONGEST RUN DOWN       = 5
LENGTH OF THE LONGEST RUN UP OR DOWN = 5

NUMBER OF POSITIVE DIFFERENCES = 98
NUMBER OF NEGATIVE DIFFERENCES = 95
NUMBER OF ZERO DIFFERENCES     = 1

Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically significant at the 5% level. The runs test does indicate some non-randomness.

Although the autocorrelation plot and the runs test indicate some mild non-randomness, the violation of the randomness assumption is not serious enough to warrant developing a more sophisticated model. It is common in practice that some of the assumptions are mildly violated, and it is a judgement call as to whether or not the violations are serious enough to warrant developing a more sophisticated model for the data.

Distributional Analysis

Probability plots are a graphical test for assessing whether a particular distribution provides an adequate fit to a data set.

A quantitative enhancement to the probability plot is the correlation coefficient of the points on the probability plot. For this data set the correlation coefficient is 0.996. Since this is greater than the critical value of 0.987 (this is a tabulated value), the normality assumption is not rejected.

Chi-square and Kolmogorov-Smirnov goodness-of-fit tests are alternative methods for assessing distributional adequacy. The Wilk-Shapiro and Anderson-Darling tests can be used to test for normality. Dataplot generates the following output for the Anderson-Darling normality test.

ANDERSON-DARLING 1-SAMPLE TEST
THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

STATISTICS:
   NUMBER OF OBSERVATIONS                = 195
   MEAN                                  = 9.261460
   STANDARD DEVIATION                    = 0.2278881E-01

   ANDERSON-DARLING TEST STATISTIC VALUE = 0.1264954
   ADJUSTED TEST STATISTIC VALUE         = 0.1290070

CRITICAL VALUES:
   90   % POINT = 0.6560000
   95   % POINT = 0.7870000
   97.5 % POINT = 0.9180000
   99   % POINT = 1.092000

CONCLUSION (AT THE 5% LEVEL): THE DATA DO COME FROM A NORMAL
DISTRIBUTION

The Anderson-Darling test also does not reject the normality assumption because the test statistic, 0.129, is less than the critical value at the 5% significance level of 0.787 (the 95 % point in the output above).

Outlier Analysis

A test for outliers is the Grubbs' test. Dataplot generated the following output for Grubbs' test.

GRUBBS TEST FOR OUTLIERS
(ASSUMPTION: NORMALITY)

STATISTICS:
   NUMBER OF OBSERVATIONS = 195
   MINIMUM                = 9.196848
   MEAN                   = 9.261460
   MAXIMUM                = 9.327973
   STANDARD DEVIATION     = 0.2278881E-01

   GRUBBS TEST STATISTIC  = 2.918673

PERCENT POINTS OF THE REFERENCE DISTRIBUTION
FOR GRUBBS TEST STATISTIC
   0    % POINT = 0.000000
   50   % POINT = 2.984294
   75   % POINT = 3.181226
   90   % POINT = 3.424672
   95   % POINT = 3.597898
   97.5 % POINT = 3.763061
   99   % POINT = 3.970215
   100  % POINT = 13.89263

CONCLUSION (AT THE 5% LEVEL):
   THERE ARE NO OUTLIERS

For this data set, Grubbs' test does not detect any outliers at the 25%, 10%, 5%, and 1% significance levels.

Model

Since the underlying assumptions were validated both graphically and analytically, with a mild violation of the randomness assumption, we conclude that a reasonable model for the data is:

   Y_i = C + E_i

We can express the uncertainty for C, here estimated by 9.26146, as the 95% confidence interval (9.258242,9.264679).

Univariate Report

It is sometimes useful and convenient to summarize the above results in a report. The report for the heat flow meter data follows.

Analysis for heat flow meter data

1: Sample Size                              = 195

2: Location
   Mean                                     = 9.26146
   Standard Deviation of Mean               = 0.001632
   95% Confidence Interval for Mean         = (9.258242,9.264679)
   Drift with respect to location?          = NO

3: Variation
   Standard Deviation                       = 0.022789
   95% Confidence Interval for SD           = (0.02073,0.025307)
   Drift with respect to variation?
   (based on Bartlett's test on quarters
   of the data)                             = NO

4: Randomness
   Autocorrelation                          = 0.280579
   Data are Random?
   (as measured by autocorrelation)         = NO

5: Distribution
   Normal PPCC                              = 0.998965
   Data are Normal?
   (as measured by Normal PPCC)             = YES

6: Statistical Control
   (i.e., no drift in location or scale,
   data are random, distribution is
   fixed, here we are testing only for
   fixed normal)
   Data Set is in Statistical Control?      = YES

7: Outliers?
   (as determined by Grubbs' test)          = NO

1.4.2.8.4 Work This Example Yourself

Generate a normal probability plot.
    The normal probability plot verifies that the normal distribution is a reasonable distribution for these data.

Generate summary statistics, quantitative analysis, and print a univariate report.

Generate a table of summary statistics.
    The summary statistics table displays 25+ statistics.

Generate the mean, a confidence interval for the mean, and compute a linear fit to detect drift in location.
    The mean is 9.261 and a 95% confidence interval is (9.258,9.265). The linear fit indicates no drift in location since the slope parameter estimate is essentially zero.

Generate the standard deviation, a confidence interval for the standard deviation, and detect drift in variation by dividing the data into quarters and computing Bartlett's test for equal standard deviations.
    The standard deviation is 0.023 with a 95% confidence interval of (0.0207,0.0253). Bartlett's test indicates no significant change in variation.

Check for randomness by generating an autocorrelation plot and a runs test.
    The lag 1 autocorrelation is 0.28. From the autocorrelation plot, this is statistically significant at the 95% level.

Check for normality by computing the normal probability plot correlation coefficient.
    The normal probability plot correlation coefficient is 0.999. At the 5% level, we cannot reject the normality assumption.

Check
for outliers using Grubbs' test.
    Grubbs' test detects no outliers at the 5% level.

Print a univariate report (this assumes the preceding steps have already been run).
    The results are summarized in a convenient report.

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4284.htm (2 of 2) [5/1/2006 9:58:59 AM]

1.4.2.9 Airplane Polished Window Strength

1.4.2.9.1 Background and Data

Generation

This data set was provided by Ed Fuller of the NIST Ceramics Division in December, 1993. It contains polished window strength data that was used with two other sets of data (constant stress-rate data and strength of indented glass data). A paper by Fuller, et al. describes the use of all three data sets to predict lifetime and confidence intervals for a glass airplane window. A paper by Pepi describes the all-glass airplane window design.

For this case study, we restrict ourselves to the problem of finding a good distributional model of the polished window strength data.

Purpose of Analysis

The goal of this case study is to find a good distributional model for the polished window strength data. Once a good distributional model has been determined, various percent points for the polished window strength will be computed.

Since the data were used in a study to predict failure times, this case study is a form of reliability analysis. The Assessing Product Reliability chapter contains a more complete discussion of reliability methods. This case study is meant to complement that chapter by showing the use of graphical techniques in one aspect of reliability modeling.

Data in reliability analysis do not typically follow a normal distribution; non-parametric methods (techniques that do not rely on a specific distribution) are frequently recommended for developing confidence intervals for failure data. One problem with this approach is that sample sizes are often small due to the expense involved in collecting the data,
and non-parametric methods do not work well for small sample sizes. For this reason, a parametric method based on a specific distributional model of the data is preferred if the data can be shown to follow a specific distribution. Parametric models typically have greater efficiency at the cost of more specific assumptions about the data, but it is important to verify that the distributional assumption is indeed valid. If the distributional assumption is not justified, then the conclusions drawn from the model may not be valid.

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4291.htm (1 of 2) [5/1/2006 9:58:59 AM]

This file can be read by Dataplot with the following commands:

    SKIP 25
    READ FULLER2.DAT Y

Resulting Data

The following are the data used for this case study. The data are in ksi (= 1,000 psi).

    18.830  20.800  21.657  23.030  23.230  24.050  24.321  25.500
    25.520  25.800  26.690  26.770  26.780  27.050  27.670  29.900
    31.110  33.200  33.730  33.760  33.890  34.760  35.750  35.910
    36.980  37.080  37.090  39.580  44.045  45.290  45.381

1.4.2.9.2 Graphical Output and Interpretation

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4292.htm (2 of 7) [5/1/2006 9:59:00 AM]

The normal probability plot has a correlation coefficient of 0.980. We can use this number as a reference baseline when comparing the performance of other distributional fits.

Other Potential Distributions

There is a large number of distributions that would be distributional model candidates for the data. However, we will restrict ourselves to consideration of the following distributional models because these have proven to be useful in reliability studies.

    Normal distribution
    Exponential distribution
    Weibull distribution
    Lognormal distribution
    Gamma distribution
    Power normal distribution
    Fatigue life distribution

Approach

There are two basic
questions that need to be addressed.

    1. Does a given distributional model provide an adequate fit to the data?
    2. Of the candidate distributional models, is there one distribution that fits the data better than the other candidate distributional models?

The use of probability plots and probability plot correlation coefficient (PPCC) plots provides answers to both of these questions.

If the distribution does not have a shape parameter, we simply generate a probability plot. If we fit a straight line to the points on the probability plot, the intercept and slope of that line provide estimates of the location and scale parameters, respectively.

Our criterion for the "best fit" distribution is the one with the most linear probability plot. The correlation coefficient of the fitted line of the points on the probability plot, referred to as the PPCC value, provides a measure of the linearity of the probability plot, and thus a measure of how well the distribution fits the data. The PPCC values for multiple distributions can be compared to address the second question above.

If the distribution does have a shape parameter, then we are actually addressing a family of distributions rather than a single distribution. We first need to find the optimal value of the shape parameter. The PPCC plot can be used to determine the optimal parameter. We will use the PPCC plots in two stages. The first stage will be over a broad range of parameter values, while the second stage will be in the neighborhood of the largest values. Although we could go further than two stages, for practical purposes two stages are sufficient. After determining an optimal value for the shape parameter, we use the probability plot as above to obtain estimates of the location and scale parameters and to determine the PPCC value. This PPCC value can be compared to the PPCC values obtained from other distributional models.

Analyses for Specific Distributions

We analyzed the data using the approach described above for the following
distributional models:

    Normal distribution - from the 4-plot above, the PPCC value was 0.980.
    Exponential distribution - the exponential distribution is a special case of the Weibull with shape parameter equal to 1. If the Weibull analysis yields a shape parameter close to 1, then we would consider using the simpler exponential model.
    Weibull distribution
    Lognormal distribution
    Gamma distribution
    Power normal distribution
    Power lognormal distribution

Summary of Results

The results are summarized below.

    Normal Distribution
        Max PPCC             = 0.980
        Estimate of location = 30.81
        Estimate of scale    = 7.38

    Weibull Distribution
        Max PPCC             = 0.988
        Estimate of shape    = 2.13
        Estimate of location = 15.9
        Estimate of scale    = 16.92

    Lognormal Distribution
        Max PPCC             = 0.986
        Estimate of shape    = 0.18
        Estimate of location = -9.96
        Estimate of scale    = 40.17

    Gamma Distribution
        Max PPCC             = 0.987
        Estimate of shape    = 11.8
        Estimate of location = 5.19
        Estimate of scale    = 2.17

    Power Normal Distribution
        Max PPCC             = 0.988
        Estimate of shape    = 0.05
        Estimate of location = 19.0
        Estimate of scale    = 2.4

    Fatigue Life Distribution
        Max PPCC             = 0.987
        Estimate of shape    = 0.18
        Estimate of location = -11.0
        Estimate of scale    = 41.3

These results indicate that several of these distributions provide an adequate distributional model for the data. We choose the 3-parameter Weibull distribution as the most appropriate model because it provides the best balance between simplicity and best fit.

Percent Point Estimates

The final step in this analysis is to compute percent point estimates for the 1%, 2.5%, 5%, 95%, 97.5%, and 99% percent points. A percent point estimate is an estimate of the strength at which a given percentage of units will be weaker. For
example, the 5% point is the strength at which we estimate that 5% of the units will be weaker.

To calculate these values, we use the Weibull percent point function with the appropriate estimates of the shape, location, and scale parameters. The Weibull percent point function can be computed in many general purpose statistical software programs, including Dataplot.

Dataplot generated the following estimates for the percent points:

Estimated percent points using Weibull Distribution

    PERCENT POINT    POLISHED WINDOW STRENGTH
        0.01                  17.86
        0.02                  18.92
        0.05                  20.10
        0.95                  44.21
        0.97                  47.11
        0.99                  50.53

Quantitative Measures of Goodness of Fit

Although it is generally unnecessary, we can include quantitative measures of distributional goodness-of-fit. Three of the commonly used measures are:

    Chi-square goodness-of-fit
    Kolmogorov-Smirnov goodness-of-fit
    Anderson-Darling goodness-of-fit

In this case, the sample size of 31 precludes the use of the chi-square test since the chi-square approximation is not valid for small sample sizes. Specifically, the smallest expected frequency should be at least 5. Although we could combine classes, we will instead use one of the other tests. The Kolmogorov-Smirnov test requires a fully specified distribution. Since we need to use the data to estimate the shape, location, and scale parameters, we do not use this test here. The Anderson-Darling test is a refinement of the Kolmogorov-Smirnov test. We run this test for the normal, lognormal, and Weibull distributions.

Normal Anderson-Darling Output

ANDERSON-DARLING 1-SAMPLE TEST
THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

STATISTICS:
   NUMBER OF OBSERVATIONS                = 31
   MEAN                                  = 30.81142
   STANDARD DEVIATION                    = 7.253381

   ANDERSON-DARLING TEST STATISTIC VALUE = 0.5321903
   ADJUSTED TEST STATISTIC VALUE         = 0.5870153

CRITICAL VALUES:
   90   % POINT = 0.6160000
   95   % POINT = 0.7350000
   97.5 % POINT = 0.8610000
   99   % POINT = 1.021000

CONCLUSION (AT THE 5% LEVEL):
   THE DATA DO COME FROM A NORMAL DISTRIBUTION

Lognormal Anderson-Darling Output

ANDERSON-DARLING 1-SAMPLE TEST
THAT THE DATA CAME FROM A LOGNORMAL DISTRIBUTION

STATISTICS:
   NUMBER OF OBSERVATIONS                = 31
   MEAN OF LOG OF DATA                   = 3.401242
   STANDARD DEVIATION OF LOG OF DATA     = 0.2349026

   ANDERSON-DARLING TEST STATISTIC VALUE = 0.3888340
   ADJUSTED TEST STATISTIC VALUE         = 0.4288908

CRITICAL VALUES:
   90   % POINT = 0.6160000
   95   % POINT = 0.7350000
   97.5 % POINT = 0.8610000
   99   % POINT = 1.021000

CONCLUSION (AT THE 5% LEVEL):
   THE DATA DO COME FROM A LOGNORMAL DISTRIBUTION

Weibull Anderson-Darling Output

ANDERSON-DARLING 1-SAMPLE TEST
THAT THE DATA CAME FROM A WEIBULL DISTRIBUTION

STATISTICS:
   NUMBER OF OBSERVATIONS                = 31
   MEAN                                  = 14.91142
   STANDARD DEVIATION                    = 7.253381
   SHAPE PARAMETER                       = 2.237495
   SCALE PARAMETER                       = 16.87868

   ANDERSON-DARLING TEST STATISTIC VALUE = 0.3623638
   ADJUSTED TEST STATISTIC VALUE         = 0.3753803

CRITICAL VALUES:
   90   % POINT = 0.6370000
   95   % POINT = 0.7570000
   97.5 % POINT = 0.8770000
   99   % POINT = 1.038000

CONCLUSION (AT THE 5% LEVEL):
   THE DATA DO COME FROM A WEIBULL DISTRIBUTION

Note that for the Weibull distribution, the Anderson-Darling test is actually testing the 2-parameter Weibull distribution (based on maximum likelihood estimates), not the 3-parameter Weibull distribution. To give a more accurate comparison, we subtract the location parameter (15.9) as estimated by the PPCC plot/probability plot technique before applying the Anderson-Darling test.

Conclusions

The Anderson-Darling test passes all three of these distributions.

Note that the value of the Anderson-Darling test statistic is the smallest for the Weibull distribution, with the value for the lognormal distribution just slightly larger. The test statistic for the normal distribution is noticeably higher than for the
Weibull or lognormal. This provides additional confirmation that either the Weibull or lognormal distribution fits this data better than the normal distribution, with the Weibull providing a slightly better fit than the lognormal.

1.4.2.9.3 Weibull Analysis

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4293.htm (2 of 3) [5/1/2006 9:59:00 AM]

Alternative Plots

The Weibull plot and the Weibull hazard plot are alternative graphical analysis procedures to the PPCC plots and probability plots. These two procedures, especially the Weibull plot, are very commonly employed. Notwithstanding that, the disadvantage of these two procedures is that they both assume that the location parameter (i.e., the lower bound) is zero and that we are fitting a 2-parameter Weibull instead of a 3-parameter Weibull. The advantage is that there is an extensive literature on these methods and they have been designed to work with either censored or uncensored data.

Weibull Plot

This Weibull plot shows the following.

    The Weibull plot is approximately linear, indicating that the 2-parameter Weibull provides an adequate fit to the data.
    The estimate of the shape parameter is 5.28 and the estimate of the scale parameter is 33.32.

Weibull Hazard Plot

The construction and interpretation of the Weibull hazard plot is discussed in the Assessing Product Reliability chapter.

1.4.2.9.4 Lognormal Analysis

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4294.htm (2 of 2) [5/1/2006 9:59:01 AM]

1.4.2.9.5 Gamma Analysis

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4295.htm (2 of 2) [5/1/2006 9:59:01 AM]
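As a cross-check on the Weibull percent point estimates reported in 1.4.2.9.2, the 3-parameter Weibull percent point function can be evaluated directly as G(p) = location + scale * (-ln(1 - p))^(1/shape). The following is a minimal Python sketch under stated assumptions, not Dataplot's own implementation: the function name weibull_ppf and the variable names are hypothetical, and the inputs are the rounded PPCC-based estimates from the summary of results (shape 2.13, location 15.9, scale 16.92), so the last digit can differ slightly from the Dataplot table.

```python
import math


def weibull_ppf(p, shape, loc, scale):
    """Percent point function (inverse CDF) of the 3-parameter Weibull.

    Hypothetical helper for illustration; p must lie in [0, 1).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return loc + scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Rounded estimates quoted in the summary of results (assumed inputs).
SHAPE, LOC, SCALE = 2.13, 15.9, 16.92

# Percent points named in the text: 1%, 2.5%, 5%, 95%, 97.5%, 99%.
percent_points = {
    p: weibull_ppf(p, SHAPE, LOC, SCALE)
    for p in (0.01, 0.025, 0.05, 0.95, 0.975, 0.99)
}

for p, strength in percent_points.items():
    print(f"{p:5.3f}  {strength:6.2f} ksi")
```

With these rounded inputs the computed strengths should agree with the Dataplot table to within a few hundredths of a ksi; the rows printed there as 0.02 and 0.97 correspond to the 2.5% and 97.5% points named in the text.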