
Engineering Statistics Handbook Episode 1 Part 11


Alternative Measures of Location

A few of the more common alternative location measures are:

1. Mid-Mean - computes a mean using the data between the 25th and 75th percentiles.
2. Trimmed Mean - similar to the mid-mean except different percentile values are used. A common choice is to trim 5% of the points in both the lower and upper tails, i.e., calculate the mean for data between the 5th and 95th percentiles.
3. Winsorized Mean - similar to the trimmed mean. However, instead of trimming the points, they are set to the lowest (or highest) value. For example, all data below the 5th percentile are set equal to the value of the 5th percentile and all data greater than the 95th percentile are set equal to the 95th percentile.
4. Mid-range = (smallest + largest)/2.

The first three alternative location estimators defined above have the advantage of the median in the sense that they are not unduly affected by extremes in the tails. However, they generate estimates that are closer to the mean for data that are normal (or nearly so).

The mid-range, since it is based on the two most extreme points, is not robust. Its use is typically restricted to situations in which the behavior at the extreme points is relevant.

Case Study
The uniform random numbers case study compares the performance of several different location estimators for a particular non-normal distribution.

Software
Most general purpose statistical software programs, including Dataplot, can compute at least some of the measures of location discussed above.

1.3.5.1. Measures of Location http://www.itl.nist.gov/div898/handbook/eda/section3/eda351.htm (5 of 5) [5/1/2006 9:57:12 AM]

This simply means that noisy data, i.e., data with a large standard deviation, are going to generate wider intervals than data with a smaller standard deviation.
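The four alternative location measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not the handbook's or Dataplot's implementation: the function names are hypothetical, and trimming is done by dropping a whole number of points from each tail (rounded down), which is only one of several ways to turn the percentile description into code.

```python
from statistics import mean


def trimmed_mean(data, proportion=0.05):
    """Mean after dropping `proportion` of the points from each tail.

    Assumption: the number of points dropped per tail is rounded down.
    """
    xs = sorted(data)
    k = int(proportion * len(xs))
    return mean(xs[k:len(xs) - k])


def mid_mean(data):
    """Mean of the data between the 25th and 75th percentiles."""
    return trimmed_mean(data, 0.25)


def winsorized_mean(data, proportion=0.05):
    """Like the trimmed mean, but tail points are replaced, not dropped."""
    xs = sorted(data)
    k = int(proportion * len(xs))
    core = xs[k:len(xs) - k]
    # Each tail point is set to the nearest retained value.
    return mean([core[0]] * k + core + [core[-1]] * k)


def mid_range(data):
    """(smallest + largest) / 2."""
    return (min(data) + max(data)) / 2


# Illustrative data: 19 well-behaved points plus one extreme value.
sample = list(range(1, 20)) + [1000]

print("mean           ", mean(sample))
print("mid-mean       ", mid_mean(sample))
print("trimmed mean   ", trimmed_mean(sample))
print("winsorized mean", winsorized_mean(sample))
print("mid-range      ", mid_range(sample))
```

On this made-up sample the first three estimators ignore the extreme value, while the ordinary mean and especially the mid-range are pulled toward it, which is the robustness contrast the text describes.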
Definition: Hypothesis Test

To test whether the population mean has a specific value, $\mu_0$, against the two-sided alternative that it does not have the value $\mu_0$, the confidence interval is converted to hypothesis-test form. The test is a one-sample t-test, and it is defined as:

H0: $\mu = \mu_0$
Ha: $\mu \neq \mu_0$
Test Statistic: $T = (\bar{Y} - \mu_0)/(s/\sqrt{N})$, where $\bar{Y}$, $N$, and $s$ are defined as above (the sample mean, the sample size, and the sample standard deviation).
Significance Level: $\alpha$. The most commonly used value for $\alpha$ is 0.05.
Critical Region: Reject the null hypothesis that the mean is a specified value, $\mu_0$, if $T < -t_{1-\alpha/2,\,N-1}$ or $T > t_{1-\alpha/2,\,N-1}$.

Sample Output for Confidence Interval

Dataplot generated the following output for a confidence interval from the ZARR13.DAT data set:

    CONFIDENCE LIMITS FOR MEAN
    (2-SIDED)

    NUMBER OF OBSERVATIONS     = 195
    MEAN                       = 9.261460
    STANDARD DEVIATION         = 0.2278881E-01
    STANDARD DEVIATION OF MEAN = 0.1631940E-02

    CONFIDENCE   T        T X SD(MEAN)    LOWER      UPPER
    VALUE (%)    VALUE                    LIMIT      LIMIT
    50.000       0.676    0.110279E-02    9.26036    9.26256
    75.000       1.154    0.188294E-02    9.25958    9.26334
    90.000       1.653    0.269718E-02    9.25876    9.26416
    95.000       1.972    0.321862E-02    9.25824    9.26468
    99.000       2.601    0.424534E-02    9.25721    9.26571
    99.900       3.341    0.545297E-02    9.25601    9.26691
    99.990       3.973    0.648365E-02    9.25498    9.26794
    99.999       4.536    0.740309E-02    9.25406    9.26886

1.3.5.2. Confidence Limits for the Mean http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm (2 of 4) [5/1/2006 9:57:13 AM]

Interpretation of the Sample Output

The first few lines print the sample statistics used in calculating the confidence interval. The table shows the confidence interval for several different significance levels. The first column lists the confidence level (which is $1 - \alpha$ expressed as a percent), the second column lists the t-value (i.e., $t_{1-\alpha/2,\,N-1}$), the third column lists the t-value times the standard error (the standard error is $s/\sqrt{N}$), the fourth column lists the lower confidence limit, and the fifth column lists the upper confidence limit.
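The columns described above can be checked with a short calculation from the printed summary statistics. The sketch below is illustrative only: the function names are hypothetical, the t-values are copied from the table rather than computed (so no statistics library is needed), and the hypothesized mean of 5 is the value used in the document's one-sample t-test example.

```python
import math

# Summary statistics printed in the Dataplot output for ZARR13.DAT.
N = 195
MEAN = 9.261460
SD = 0.2278881e-01

# t-values copied from the second column of the table (not recomputed here).
T_VALUES = {50.0: 0.676, 75.0: 1.154, 90.0: 1.653, 95.0: 1.972, 99.0: 2.601}


def standard_error(sd, n):
    """Standard deviation of the mean, s / sqrt(N)."""
    return sd / math.sqrt(n)


def confidence_limits(mean, sd, n, t_value):
    """Lower and upper limits: mean -/+ t * s / sqrt(N)."""
    half_width = t_value * standard_error(sd, n)
    return mean - half_width, mean + half_width


def one_sample_t(mean, sd, n, mu0):
    """Test statistic T = (mean - mu0) / (s / sqrt(N))."""
    return (mean - mu0) / standard_error(sd, n)


se = standard_error(SD, N)
lower95, upper95 = confidence_limits(MEAN, SD, N, T_VALUES[95.0])
t_stat = one_sample_t(MEAN, SD, N, mu0=5.0)

print(f"standard error = {se:.7e}")
print(f"95% limits     = ({lower95:.5f}, {upper95:.5f})")
print(f"T for mu0 = 5  = {t_stat:.3f}")
```

Because the half-width is the t-value times $s/\sqrt{N}$, doubling the standard deviation doubles the width of every interval in the table, while quadrupling the sample size halves it.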
For example, for a 95% confidence interval, we go to the row identified by 95.000 in the first column and extract an interval of (9.25824, 9.26468) from the last two columns.

Output from other statistical software may look somewhat different from the above output.

Sample Output for t Test

Dataplot generated the following output for a one-sample t-test from the ZARR13.DAT data set:

    T TEST
    (1-SAMPLE)
    MU0 = 5.000000
    NULL HYPOTHESIS UNDER TEST MEAN MU = 5.000000

    SAMPLE:
    NUMBER OF OBSERVATIONS     = 195
    MEAN                       = 9.261460
    STANDARD DEVIATION         = 0.2278881E-01
    STANDARD DEVIATION OF MEAN = 0.1631940E-02

    TEST:
    MEAN-MU0                   = 4.261460
    T TEST STATISTIC VALUE     = 2611.284
    DEGREES OF FREEDOM         = 194.0000
    T TEST STATISTIC CDF VALUE = 1.000000

    ALTERNATIVE-       ALTERNATIVE-            ALTERNATIVE-
    HYPOTHESIS         HYPOTHESIS              HYPOTHESIS
                       ACCEPTANCE INTERVAL     CONCLUSION
    MU <> 5.000000     (0,0.025) (0.975,1)     ACCEPT
    MU <  5.000000     (0,0.05)                REJECT
    MU >  5.000000     (0.95,1)                ACCEPT

Interpretation of Sample Output

We are testing the hypothesis that the population mean is 5. The output is divided into three sections.

1. The first section prints the sample statistics used in the computation of the t-test.
2. The second section prints the t-test statistic value, the degrees of freedom, and the cumulative distribution function (cdf) value of the t-test statistic. The t-test statistic cdf value is an alternative way of expressing the critical value. This cdf value is compared to the acceptance intervals printed in section three. For an upper one-tailed test, the alternative hypothesis acceptance interval is $(1 - \alpha, 1)$, the alternative hypothesis acceptance interval for a lower one-tailed test is $(0, \alpha)$, and the alternative hypothesis acceptance interval for a two-tailed test is $(1 - \alpha/2, 1)$ or $(0, \alpha/2)$. Note that accepting the alternative hypothesis is equivalent to rejecting the null hypothesis.
3. The third section prints the conclusions for a 95% test since this is the most common case. Results are given in terms of the alternative hypothesis for the two-tailed test and for the one-tailed test in both directions. The alternative hypothesis acceptance interval column is stated in terms of the cdf value printed in section two. The last column specifies whether the alternative hypothesis is accepted or rejected. For a different significance level, the appropriate conclusion can be drawn from the t-test statistic cdf value printed in section two. For example, for a significance level of 0.10, the corresponding alternative hypothesis acceptance intervals are (0,0.05) and (0.95,1) for the two-tailed test, (0,0.10) for the lower one-tailed test, and (0.90,1) for the upper one-tailed test.

Output from other statistical software may look somewhat different from the above output.

Questions
Confidence limits for the mean can be used to answer the following questions:

1. What is a reasonable estimate for the mean?
2. How much variability is there in the estimate of the mean?
3. Does a given target value fall within the confidence limits?

Related Techniques
Two-Sample T-Test
Confidence intervals for other location estimators such as the median or mid-mean tend to be mathematically difficult or intractable. For these cases, confidence intervals can be obtained using the bootstrap.

Case Study
Heat flow meter data.

Software
Confidence limits for the mean and one-sample t-tests are available in just about all general purpose statistical software programs, including Dataplot.

Test Statistic: $T = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_1^2/N_1 + s_2^2/N_2}}$, where $N_1$ and $N_2$ are the sample sizes, $\bar{Y}_1$ and $\bar{Y}_2$ are the sample means, and $s_1^2$ and $s_2^2$ are the sample variances.
If equal variances are assumed, then the formula reduces to: $T = \dfrac{\bar{Y}_1 - \bar{Y}_2}{s_p\sqrt{1/N_1 + 1/N_2}}$, where $s_p^2 = \dfrac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}$.
Significance Level: $\alpha$.
Critical Region: Reject the null hypothesis that the two means are equal if $T < -t_{1-\alpha/2,\,\nu}$ or $T > t_{1-\alpha/2,\,\nu}$, where $t_{1-\alpha/2,\,\nu}$ is the critical value of the t distribution with $\nu$ degrees of freedom, where $\nu = \dfrac{(s_1^2/N_1 + s_2^2/N_2)^2}{(s_1^2/N_1)^2/(N_1 - 1) + (s_2^2/N_2)^2/(N_2 - 1)}$. If equal variances are assumed, then $\nu = N_1 + N_2 - 2$.

Sample Output

Dataplot generated the following output for the t test from the AUTO83B.DAT data set:

    T TEST
    (2-SAMPLE)
    NULL HYPOTHESIS UNDER TEST POPULATION MEANS MU1 = MU2

    SAMPLE 1:
    NUMBER OF OBSERVATIONS      = 249
    MEAN                        = 20.14458
    STANDARD DEVIATION          = 6.414700
    STANDARD DEVIATION OF MEAN  = 0.4065151

    SAMPLE 2:
    NUMBER OF OBSERVATIONS      = 79
    MEAN                        = 30.48101
    STANDARD DEVIATION          = 6.107710
    STANDARD DEVIATION OF MEAN  = 0.6871710

    IF ASSUME SIGMA1 = SIGMA2:
    POOLED STANDARD DEVIATION   = 6.342600
    DIFFERENCE (DEL) IN MEANS   = -10.33643
    STANDARD DEVIATION OF DEL   = 0.8190135
    T TEST STATISTIC VALUE      = -12.62059
    DEGREES OF FREEDOM          = 326.0000
    T TEST STATISTIC CDF VALUE  = 0.000000

    IF NOT ASSUME SIGMA1 = SIGMA2:
    STANDARD DEVIATION SAMPLE 1 = 6.414700
    STANDARD DEVIATION SAMPLE 2 = 6.107710
    BARTLETT CDF VALUE          = 0.402799
    DIFFERENCE (DEL) IN MEANS   = -10.33643
    STANDARD DEVIATION OF DEL   = 0.7984100
    T TEST STATISTIC VALUE      = -12.94627
    EQUIVALENT DEG. OF FREEDOM  = 136.8750
    T TEST STATISTIC CDF VALUE  = 0.000000

    ALTERNATIVE-       ALTERNATIVE-            ALTERNATIVE-
    HYPOTHESIS         HYPOTHESIS              HYPOTHESIS
                       ACCEPTANCE INTERVAL     CONCLUSION
    MU1 <> MU2         (0,0.025) (0.975,1)     ACCEPT
    MU1 <  MU2         (0,0.05)                ACCEPT
    MU1 >  MU2         (0.95,1)                REJECT

1.3.5.3. Two-Sample t-Test for Equal Means http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm (2 of 4) [5/1/2006 9:57:14 AM]

Interpretation of Sample Output

We are testing the hypothesis that the population mean is equal for the two samples. The output is divided into five sections.

1. The first section prints the sample statistics for sample one used in the computation of the t-test.
2. The second section prints the sample statistics for sample two used in the computation of the t-test.
3. The third section prints the pooled standard deviation, the difference in the means, the t-test statistic value, the degrees of freedom, and the cumulative distribution function (cdf) value of the t-test statistic under the assumption that the standard deviations are equal. The t-test statistic cdf value is an alternative way of expressing the critical value. This cdf value is compared to the acceptance intervals printed in section five. For an upper one-tailed test, the acceptance interval is $(0, 1 - \alpha)$, the acceptance interval for a two-tailed test is $(\alpha/2, 1 - \alpha/2)$, and the acceptance interval for a lower one-tailed test is $(\alpha, 1)$.
4. The fourth section prints the two sample standard deviations, the Bartlett cdf value, the difference in the means, the t-test statistic value, the equivalent degrees of freedom, and the cumulative distribution function (cdf) value of the t-test statistic under the assumption that the standard deviations are not equal. The t-test statistic cdf value is an alternative way of expressing the critical value. This cdf value is compared to the acceptance intervals printed in section five. For an upper one-tailed test, the alternative hypothesis acceptance interval is $(1 - \alpha, 1)$, the alternative hypothesis acceptance interval for a lower one-tailed test is $(0, \alpha)$, and the alternative hypothesis acceptance interval for a two-tailed test is $(1 - \alpha/2, 1)$ or $(0, \alpha/2)$. Note that accepting the alternative hypothesis is equivalent to rejecting the null hypothesis.
5. The fifth section prints the conclusions for a 95% test under the assumption that the standard deviations are not equal since a 95% test is the most common case. Results are given in terms of the alternative hypothesis for the two-tailed test and for the one-tailed test in both directions. The alternative hypothesis acceptance interval column is stated in terms of the cdf value printed in section four.
The last column specifies whether the alternative hypothesis is accepted or rejected. For a different significance level, the appropriate conclusion can be drawn from the t-test statistic cdf value printed in section four. For example, for a significance level of 0.10, the corresponding alternative hypothesis acceptance intervals are (0,0.05) and (0.95,1) for the two-tailed test, (0,0.10) for the lower one-tailed test, and (0.90,1) for the upper one-tailed test.

Output from other statistical software may look somewhat different from the above output.

Questions
Two-sample t-tests can be used to answer the following questions:

1. Is process 1 equivalent to process 2?
2. Is the new process better than the current process?
3. Is the new process better than the current process by at least some pre-determined threshold amount?

Related Techniques
Confidence Limits for the Mean
Analysis of Variance

Case Study
Ceramic strength data.

Software
Two-sample t-tests are available in just about all general purpose statistical software programs, including Dataplot.

The data values that follow are listed one row per line, two values per row. The value -999 appears in the second column on rows where that column has no further observations.

    18   19
    14   32
    14   34
    14   26
    14   30
    12   22
    13   22
    13   33
    18   39
    22   36
    19   28
    18   27
    23   21
    26   24
    25   30
    20   34
    21   32
    13   38
    14   37
    15   30
    14   31
    17   37
    11   32
    13   47
    12   41
    13   45
    15   34
    13   33
    13   24
    14   32
    22   39
    28   35
    13   32
    14   37
    13   38
    14   34
    15   34
    12   32
    13   33
    13   32
    14   25
    13   24
    12   37
    13   31
    18   36
    16   36

1.3.5.3.1. Data Used for Two-Sample t-Test http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm (2 of 6) [5/1/2006 9:57:14 AM]

    18   34
    18   38
    23   32
    11   38
    12   32
    13   -999
    12   -999
    18   -999
    21   -999
    19   -999
    21   -999
    15   -999
    16   -999
    15   -999
    11   -999
    20   -999
    21   -999
    19   -999
    15   -999
    26   -999
    25   -999
    16   -999
    16   -999
    18   -999
    16   -999
    13   -999
    14   -999
    14   -999
    14   -999
    28   -999
    19   -999
    18   -999
    15   -999
    15   -999
    16   -999
    15   -999
    16   -999
    14   -999
    17   -999
    16   -999
    15   -999
    18   -999
    21   -999
    20   -999
    13   -999
    23   -999
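The two-sample output quoted earlier can be reproduced from its own summary statistics using the test-statistic and degrees-of-freedom formulas given in the definition. The sketch below is a hedged illustration: the function names and the `kind` labels are hypothetical, and the decision helper simply encodes the alternative-hypothesis acceptance intervals described in the interpretation, with the cdf value taken from the printed output rather than computed.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Equal-variance case: returns (t, degrees of freedom, pooled sd)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df, sp


def unequal_variance_t(m1, s1, n1, m2, s2, n2):
    """Unequal-variance case: returns (t, equivalent degrees of freedom)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def alternative_accepted(cdf, kind, alpha=0.05):
    """Compare the t-statistic cdf value with the acceptance interval."""
    if kind == "two-sided":   # (0, alpha/2) or (1 - alpha/2, 1)
        return cdf < alpha / 2 or cdf > 1 - alpha / 2
    if kind == "less":        # (0, alpha)
        return cdf < alpha
    if kind == "greater":     # (1 - alpha, 1)
        return cdf > 1 - alpha
    raise ValueError(f"unknown alternative: {kind}")


# (mean, standard deviation, number of observations) from the printed output.
SAMPLE_1 = (20.14458, 6.414700, 249)
SAMPLE_2 = (30.48101, 6.107710, 79)

t_pooled, df_pooled, sp = pooled_t(*SAMPLE_1, *SAMPLE_2)
t_unequal, df_unequal = unequal_variance_t(*SAMPLE_1, *SAMPLE_2)

print(f"pooled:  sd = {sp:.4f}, t = {t_pooled:.4f}, df = {df_pooled}")
print(f"unequal: t = {t_unequal:.4f}, df = {df_unequal:.3f}")
for kind in ("two-sided", "less", "greater"):
    verdict = "ACCEPT" if alternative_accepted(0.0, kind) else "REJECT"
    print(f"{kind:10s} {verdict}")
```

With the printed cdf value of 0.000000, the helper accepts the two-sided and "less than" alternatives and rejects the "greater than" alternative, matching the conclusion column of the output.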
Data Used for Two-Sample t-Test (continued)

The remaining pages survive only as preview fragments; [...] marks text elided in the source. Each fragment of the data listing gives a block of first-column values, which is followed in the source by a run of -999 entries.

    19 20 19 21 20 25 21 19 21 21 19 18 19 18 18 18 30 31 23 24 22 20 22
    20 21 17 18 17 18 17 16 19 19 36 27 23 24 34 35 28 29 27 34 32 28 26
    [...]
    20 23 18 19 25 26 18 16 16 15 22 22 24 23 29 25 20 18 19 18 27 13 17
    13 13 13 30 26 18 17 16 15 18 21 19 19 16 16 16 16 25 26 31 34 36 20
    [...]
    24 19 28 24 27 27 26 24 30 39 35 34 30 22 27 20 18 28 27 34 31 29 27
    24 23 38 36 25 38 26 22 36 27 27 32 28 31

1.3.5.4 One-Factor ANOVA http://www.itl.nist.gov/div898/handbook/eda/section3/eda354.htm (2 of 4) [5/1/2006 9:57:15 AM]

    [...]
    LEVEL-ID       NI     MEAN       EFFECT      SD(EFFECT)
    FACTOR 1
     1.00000       10     0.99800     0.00036    0.00178
     2.00000       10     0.99910     0.00146    0.00178
     3.00000       10     0.99540    -0.00224    0.00178
     4.00000       10     0.99820     0.00056    0.00178
     5.00000       10     0.99190    -0.00574    0.00178
     6.00000       10     0.99880     0.00116    0.00178
     7.00000       10     1.00150     [...]      0.00178
     8.00000       10     1.00040     0.00276    0.00178
     9.00000       10     0.99830     0.00066    0.00178
    10.00000       10     0.99480    -0.00284    0.00178

    MODEL                       RESIDUAL STANDARD DEVIATION
    CONSTANT ONLY               0.0062789079
    CONSTANT & FACTOR 1 ONLY    0.0059385784

Interpretation of Sample Output

The output is divided into three sections.

1. The first section prints the number of observations (100), the number of factors (10), [...]

1.3.5.5 Multi-factor Analysis of Variance http://www.itl.nist.gov/div898/handbook/eda/section3/eda355.htm (2 of 5) [5/1/2006 9:57:16 AM]

    [...]                              = 480
    [...]                              = 4
    NUMBER OF LEVELS FOR FACTOR 1      = 2
    NUMBER OF LEVELS FOR FACTOR 2      = 2
    NUMBER OF LEVELS FOR FACTOR 3      = 2
    NUMBER OF LEVELS FOR FACTOR 4      = 2
    BALANCED CASE
    RESIDUAL STANDARD DEVIATION        = 0.63057727814E+02
    RESIDUAL DEGREES OF FREEDOM        = 475
    REPLICATION CASE
    REPLICATION STANDARD DEVIATION     = 0.61890106201E+02
    REPLICATION DEGREES OF FREEDOM     = 464
    NUMBER OF DISTINCT CELLS           = 16

    *****************
    * ANOVA [...]
    [...]
    FACTOR 1       1        26672.726562      26672.726562      6.7080     99.011% **
    FACTOR 2       1        11524.053711      11524.053711      2.8982     91.067%
    FACTOR 3       1        14380.633789      14380.633789      3.6166     94.219%
    FACTOR 4       1       727143.125000     727143.125000    182.8703    100.000% **
    RESIDUAL     475      1888731.500000       3976.276855

    RESIDUAL STANDARD DEVIATION    = 63.05772781
    RESIDUAL DEGREES OF FREEDOM    = 475
    REPLICATION STANDARD DEVIATION = 61.89010620
    REPLICATION DEGREES OF FREEDOM = 464
    LACK OF FIT F RATIO = 2.6447 = THE 99.7269% POINT OF THE
    F DISTRIBUTION WITH 11 AND 464 DEGREES OF FREEDOM

    ****************
    *  ESTIMATION  *
    ****************
    [...]
    FACTOR 1     1.00000     240     657.53168      7.45428     2.87818
                -1.00000     240     642.62286     -7.45453     2.87818
    FACTOR 2     1.00000     240     645.17755     -4.89984     2.87818
                -1.00000     240     654.97723      4.89984     2.87818
    FACTOR 3     1.00000     240     655.55084      5.47345     2.87818
                -1.00000     240     644.60376     -5.47363     2.87818
    FACTOR 4    -1.00000     240     688.99890     38.92151     2.87818
                -2.00000     240     611.15594    -38.92145     2.87818

    MODEL                       RESIDUAL STANDARD DEVIATION
    [...] FACTOR 1 ONLY         74.3419036865
    CONSTANT & FACTOR 2 ONLY    74.5548019409
    CONSTANT & FACTOR 3 ONLY    74.5147094727
    CONSTANT & FACTOR 4 ONLY    63.7284545898
    CONSTANT & ALL 4 FACTORS    63.0577278137

Interpretation of Sample Output

The output is divided into three sections.

1. The [...]
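The multi-factor ANOVA rows quoted above are internally consistent with the usual ANOVA arithmetic, which the following sketch checks. It rests on an assumption, because the column headings are not present in the excerpt: the numeric columns are read as degrees of freedom, sum of squares, mean square, and F statistic, with mean square = sum of squares / degrees of freedom and F = factor mean square / residual mean square. The names below are hypothetical.

```python
import math

# (degrees of freedom, sum of squares) as quoted in the ANOVA rows.
# Assumption: this is what the second and third columns represent.
FACTORS = {
    "FACTOR 1": (1, 26672.726562),
    "FACTOR 2": (1, 11524.053711),
    "FACTOR 3": (1, 14380.633789),
    "FACTOR 4": (1, 727143.125000),
}
RESIDUAL = (475, 1888731.500000)


def mean_square(df, sum_of_squares):
    """Mean square = sum of squares / degrees of freedom."""
    return sum_of_squares / df


def f_ratios(factors, residual):
    """F statistic for each factor: its mean square over the residual's."""
    ms_residual = mean_square(*residual)
    return {
        name: mean_square(df, ss) / ms_residual
        for name, (df, ss) in factors.items()
    }


ms_residual = mean_square(*RESIDUAL)
residual_sd = math.sqrt(ms_residual)
ratios = f_ratios(FACTORS, RESIDUAL)

print(f"residual mean square = {ms_residual:.6f}")
print(f"residual sd          = {residual_sd:.6f}")
for name, f_value in ratios.items():
    print(f"{name}: F = {f_value:.4f}")
```

Under that reading, the computed values agree with the printed mean square for the residual, the printed residual standard deviation, and the four printed F statistics, which supports the repaired digits in the rows.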

Date posted: 06/08/2014, 11:20