9 Detection of Change in Distribution

9.1 INTRODUCTION

Frequency analysis (see Chapter 5) is a univariate method of identifying a likely population from which a sample was drawn. If the sample data fall near the fitted line that is used as the best estimate of the population, then it is generally safe to use the line to make predictions. However, "nearness to the line" is a subjective assessment, not a systematic statistical test of how well the data correspond to the line. Individuals who hold different standards for what constitutes sufficiently good agreement may disagree on whether to use the fitted line to make predictions. After all, lines for other distributions may provide a degree of fit that appears to be just as good. To eliminate this element of subjectivity from the decision process, it is useful to have a systematic test for assessing the extent to which a set of sample data agrees with some assumed population. Vogel (1986) provided a correlation coefficient test for normal, lognormal, and Gumbel distributions.

The goal of this chapter is to present and apply statistical analyses that can be used to test for the distribution of a random variable. For example, if a frequency analysis suggested that the data could have been sampled from a lognormal distribution, one of the one-sample tests presented in this chapter could be used to decide the statistical likelihood that this distribution characterizes the underlying population. If the test suggests that the sample is unlikely to have been drawn from the assumed probability distribution, then justification for testing another distribution should be sought.

One characteristic that distinguishes the statistical tests from one another is the number of samples for which a test is appropriate. Some tests are used to compare a sample to an assumed population; these are referred to as one-sample tests.
A second group, known as two-sample tests, is appropriate for deciding whether the two distributions from which two samples were drawn are the same. Other tests, referred to as k-sample tests, are appropriate for comparing samples from more than two distributions.

9.2 CHI-SQUARE GOODNESS-OF-FIT TEST

The chi-square goodness-of-fit test is used to test for a significant difference between the distribution suggested by a data sample and a selected probability distribution. It is the most widely used one-sample analysis for testing a population distribution. Many statistical tests, such as the t-test for a mean, assume that the data have been drawn from a normal population, so it may be necessary to use a statistical test, such as the chi-square test, to check the validity of the assumption for a given sample of data. The chi-square test can also be used as part of the verification phase of modeling to verify the population assumed when making a frequency analysis.

L1600_Frame_C09.fm Page 209 Tuesday, September 24, 2002 3:26 PM © 2003 by CRC Press LLC

9.2.1 PROCEDURE

Data analysts are often interested in identifying the density function of a random variable so that the population can be used to make probability statements about the likelihood of occurrence of certain values of the random variable. Very often, a histogram plot of the data suggests a likely candidate for the population density function. For example, a frequency histogram with a long right tail might suggest that the data were sampled from a lognormal population. The chi-square test for goodness of fit can then be used to test whether the distribution of a random variable suggested by the histogram shape can be represented by a selected theoretical probability density function (PDF). To demonstrate the quantitative evaluation, the chi-square test will be used to evaluate hypotheses about the distribution of the number of storm events in 1 year, which is a discrete random variable.
Step 1: Formulate hypotheses. The first step is to formulate both the null (H_0) and the alternative (H_A) hypotheses that reflect the theoretical probability density function (PDF; continuous random variables) or probability mass function (PMF; discrete random variables). Because a function is not completely defined without the specification of its parameters, the statement of the hypotheses must also include specific values for the parameters of the function. For example, if the population is hypothesized to be normal, then µ and σ must be specified; if the hypotheses deal with the uniform distribution, values for the location α and scale β parameters must be specified. Estimates of the parameters may be obtained either empirically or from external conditions. If estimates of the parameters are obtained from the data set used in testing the hypotheses, the degrees of freedom must be modified to reflect this.

General statements of the hypotheses for the chi-square goodness-of-fit test of a continuous random variable are:

$$H_0: X \sim \text{PDF (stated values of parameters)} \tag{9.1a}$$

$$H_A: X \not\sim \text{PDF (stated values of parameters)} \tag{9.1b}$$

If the random variable is a discrete variable, then the PMF replaces the PDF. The following null and alternative hypotheses are typical:

H_0: The number of rainfall events that exceed 1 cm in any year at a particular location can be characterized by a uniform density function with a location parameter of zero and a scale parameter of 40.

H_A: The uniform population U(0, 40) is not appropriate for this random variable.

Mathematically, these hypotheses are

$$H_0: f(n) = U(\alpha = 0, \beta = 40) \tag{9.2a}$$

$$H_A: f(n) \neq U(\alpha = 0, \beta = 40) \tag{9.2b}$$

Note specifically that the null hypothesis is a statement of equality and the alternative hypothesis is an inequality. Both hypotheses are expressed in terms of population parameters, not sample statistics.
Rejection of the null hypothesis would not necessarily imply that the random variable is not uniformly distributed. Rejection may result because the assumed distribution is incorrect, because one or more of the assumed parameters (in this case 0 and 40) is incorrect, or both. The chi-square goodness-of-fit test is always a one-tailed test because the structure of the hypotheses is unidirectional; that is, the random variable is either distributed as specified in the null hypothesis or it is not.

Step 2: Select the appropriate model. To test the hypotheses formulated in step 1, the chi-square test compares the observed frequencies of values in the sample with the frequencies expected with the PDF of the population specified in the hypotheses. The observed data are typically used to form a histogram that shows the observed frequencies in a series of k cells. The cell bounds are often selected such that the cell width for each cell is the same; however, unequal cell widths could be selected to ensure a more even distribution of the observed and expected frequencies. Having selected the cell bounds and counted the observed frequency O_i for each cell i, the expected frequency E_i for each cell can be computed using the PDF of the population specified in the null hypothesis of step 1. To compute the expected frequencies, the expected probability for each cell is determined for the assumed population and multiplied by the sample size n. The expected probability for cell i, p_i, is the area under the PDF between the cell bounds for that cell. The sum of the expected frequencies must equal the total sample size n. The frequencies can be summarized in a cell structure format, such as Figure 9.1a.
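As a minimal sketch of the expected-frequency calculation described in step 2, the Python fragment below obtains each cell probability by differencing the cumulative distribution function of the hypothesized population and then multiplies by the sample size. The function names `uniform_cdf` and `expected_frequencies` are illustrative and do not come from the text; the U(0, 40) population, the four cells of width 10, and the sample size of 80 are the values used for the rainfall illustration in this section.

```python
def uniform_cdf(x, alpha=0.0, beta=40.0):
    """CDF of the uniform distribution U(alpha, beta) (hypothetical helper)."""
    if x <= alpha:
        return 0.0
    if x >= beta:
        return 1.0
    return (x - alpha) / (beta - alpha)


def expected_frequencies(bounds, n, cdf):
    """Return E_i = n * p_i for the cells defined by consecutive bounds.

    p_i is the probability of the hypothesized population between the
    lower and upper bound of cell i, found by differencing the CDF.
    """
    probs = [cdf(upper) - cdf(lower) for lower, upper in zip(bounds, bounds[1:])]
    return [n * p for p in probs]


# Rainfall illustration: U(0, 40), four cells of width 10, n = 80.
cell_bounds = [0, 10, 20, 30, 40]
expected = expected_frequencies(cell_bounds, 80, uniform_cdf)
print(expected)  # [20.0, 20.0, 20.0, 20.0]
```

Passing the CDF as an argument keeps the cell arithmetic independent of the hypothesized distribution, so the same helper could be reused with a normal or lognormal CDF.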
The test statistic, which is a random variable, is a function of the observed and expected frequencies, which are also random variables:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{9.3}$$

where χ² is the computed value of a random variable having a chi-square distribution with ν degrees of freedom; O_i and E_i are the observed and expected frequencies in cell i, respectively; and k is the number of discrete categories (cells) into which the data are separated. The random variable χ² has a sampling distribution that can be approximated by the chi-square distribution with k − j degrees of freedom, where j is the number of quantities that are obtained from the sample of data for use in calculating the expected frequencies. Specifically, since the total number of observations n is used to compute the expected frequencies, 1 degree of freedom is lost. If the mean and standard deviation of the sample are needed to compute the expected frequencies, then two additional degrees of freedom are subtracted (i.e., ν = k − 3). However, if the mean and standard deviation are obtained from past experience or other data sources, then the degrees of freedom for the test statistic remain ν = k − 1. It is important to note that the degrees of freedom do not directly depend on the sample size n; rather, they depend on the number of cells.

Step 3: Select the level of significance. If the decision is not considered critical, a level of significance of 5% may be considered appropriate by convention. A more rational selection of the level of significance will be discussed later. For the test of the hypotheses of Equation 9.2, a value of 5% is used for illustration purposes.

Step 4: Compute estimate of test statistic. The value of the test statistic of Equation 9.3 is obtained from the cell frequencies of Figure 9.1b.
The range of the random variable was separated into four equal intervals of ten. Thus, the expected probability for each cell is 0.25 (because the random variable is assumed to have a uniform distribution and the width of the cells is the same). For a sample size of 80, the expected frequency for each of the four cells is 20 (i.e., the expected probability times the total number of observations). Assume that the observed frequencies of 18, 19, 25, and 18 are determined from the sample, which yields the cell structure shown in Figure 9.1b. Using Equation 9.3, the computed statistic χ² equals 1.70. Because the total frequency of 80 was separated into four cells for computing the expected frequencies, the number of degrees of freedom is given by ν = k − 1, or 4 − 1 = 3.

(a)
Cell bound                  −∞   ...
Cell number i               1                  2                  3                  ...   k
Observed frequency (O_i)    O_1                O_2                O_3                ...   O_k
Expected frequency (E_i)    E_1                E_2                E_3                ...   E_k
(O_i − E_i)²/E_i            (O_1 − E_1)²/E_1   (O_2 − E_2)²/E_2   (O_3 − E_3)²/E_3   ...   (O_k − E_k)²/E_k

(b)
Cell bound                  0      10     20     30     40
Cell number i                  1      2      3      4
Observed frequency (O_i)       18     19     25     18
Expected frequency (E_i)       20     20     20     20
(O_i − E_i)²/E_i               0.20   0.05   1.25   0.20

FIGURE 9.1 Cell structure for chi-square goodness-of-fit test: (a) general structure; and (b) structure for the number of rainfall events.

Step 5: Define the region of rejection. According to the underlying theorem of step 2, the test statistic has a chi-square distribution with 3 degrees of freedom. For this distribution and a level of significance of 5%, the critical value of the test statistic is 7.81 (Table A.3). Thus, the region of rejection consists of all values of the test statistic greater than 7.81. Note again that for this test the region of rejection is always in the upper tail of the chi-square distribution.

Step 6: Select the appropriate hypothesis.
The decision rule is that the null hypothesis is rejected if the chi-square value computed in step 4 is larger than the critical value of step 5. Because the computed value of the test statistic (1.70) is less than the critical value (7.81), it is not within the region of rejection; thus, there is no statistical basis for rejecting the null hypothesis. One may then conclude that the uniform distribution with location and scale parameters of 0 and 40, respectively, may be used to represent the distribution of the number of rainfall events. Note that other distributions could be tested and found to be statistically acceptable, which suggests that the selection of the distribution to test should not be an arbitrary decision.

In summary, the chi-square test for goodness of fit provides the means for comparing the observed frequency distribution of a random variable with a population distribution based on a theoretical PDF or PMF.

An additional point concerning the use of the chi-square test should be noted. The effectiveness of the test is diminished if the expected frequency in any cell is less than 5. When this condition occurs, both the expected and observed frequencies of that cell should be combined with the values of an adjacent cell; the value of k should be reduced to reflect the number of cells used in computing the test statistic. It is important to note that this rule is based on expected frequencies, not observed frequencies. To illustrate this rule of thumb, consider a case with seven cells in which cells 3 and 7 have expected frequencies less than 5 and should, therefore, be combined with adjacent cells. The frequencies of cell 7 can be combined with the frequencies of cell 6. Cell 3 could be combined with either cell 2 or cell 4.
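Returning to the rainfall illustration of steps 4 through 6, the arithmetic can be checked with a short sketch. The function name `chi_square_statistic` is an illustrative choice, and the critical value is simply the 7.81 quoted from Table A.3 rather than something computed here; the observed and expected frequencies are those of Figure 9.1b.

```python
def chi_square_statistic(observed, expected):
    """Equation 9.3: sum over cells of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 19, 25, 18]   # observed frequencies, Figure 9.1b
expected = [20, 20, 20, 20]   # expected frequencies under U(0, 40), n = 80

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1   # only n was taken from the sample
critical_value = 7.81                    # 5% level, 3 df, as quoted from Table A.3

reject_null = statistic > critical_value  # the rejection region is the upper tail
print(f"chi-square = {statistic:.2f}, df = {degrees_of_freedom}, reject H0: {reject_null}")
# chi-square = 1.70, df = 3, reject H0: False
```

Hard-coding the tabulated critical value keeps the sketch free of third-party dependencies; a statistics library could supply it instead if one is available.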
Unless physical reasons exist for selecting which of the adjacent cells to use, it is probably best to combine the cell with the adjacent cell that has the lowest expected frequency count. Based on this, cells 3 and 4 would be combined. The observed and expected frequencies for the seven cells are as follows:

Cell   1   2   3   4   5   6   7
O_i    3   9   7   5   9   4   6
E_i    6   8   4   6   10  7   2

The revised cell configuration follows:

Cell   1   2   3   4   5
O_i    3   9   12  9   10
E_i    6   8   10  10  9

The value of k is now 5, which is the value to use in computing the degrees of freedom, rather than the original cell count of 7. Even though the observed frequency in cell 1 is less than 5, that cell is not combined; only expected frequencies are used to decide which cells need to be combined.

9.2.2 CHI-SQUARE TEST FOR A NORMAL DISTRIBUTION

The normal distribution is widely used because many data sets have been shown to have a bell-shaped distribution and because many statistical tests assume the data are normally distributed. For this reason, the test procedure is illustrated for data assumed to follow a normal population distribution.

Example 9.1

To illustrate the use of the chi-square test with the normal distribution, a sample of 84 discharges is used. The histogram of the data is shown in Figure 9.2. The sample mean and standard deviation of the random variable were 10,100 and 780, respectively. A null hypothesis is proposed that the random variable is normally distributed with a mean and standard deviation of 10,100 and 780, respectively. Note that the sample moments are being used to define the population parameters in the statement of hypotheses; this will need to be considered in the computation of the degrees of freedom.

FIGURE 9.2 Histogram of discharge rate (Q, cfs).

Table 9.1 gives the cell bounds used to form the observed and expected frequency cells (see column 2). The cell bounds are used to compute standardized
variates z_i for the bounds of each interval (column 3), the probability that the variate z is less than z_i (column 4), the expected probabilities for each interval (column 5), the expected and observed frequencies (columns 6 and 7), and the cell values of the chi-square statistic of Equation 9.3 (column 8). The test statistic has a computed value of 10.209. Note that because the expected frequency for the seventh interval was less than 5, both the observed and expected frequencies were combined with those of the sixth cell. Three degrees of freedom are used for the test. With a total of six cells, 1 degree of freedom was lost for n, while two were lost for the mean and standard deviation, which were obtained from the sample of 84 observations. (If past evidence had indicated a mean of 10,000 and a standard deviation of 1000, and these statistics were used in Table 9.1 for computing the expected probabilities, then 5 degrees of freedom would be used.)

TABLE 9.1 Computations for Example 9.1

Cell i   Cell bound   z_i     P(z < z_i)   Expected probability   Expected frequency   Observed frequency   (O_i − E_i)²/E_i
1        9000         −1.41   0.0793       0.0793                 6.66                 12                   4.282
2        9500         −0.77   0.2206       0.1413                 11.87                4                    5.218
3        10,000       −0.13   0.4483       0.2277                 19.13                20                   0.040
4        10,500       0.52    0.6985       0.2502                 21.02                24                   0.422
5        11,000       1.15    0.8749       0.1764                 14.82                13                   0.224
6        11,500       1.79    0.9633       0.0884                 7.42                 11*                  0.024*
7        ∞                    1.0000       0.0367                 3.08
Total                                      1.0000                 84                   84                   10.209

z_i = (cell bound − 10,100)/780; for example, z_1 = (9000 − 10,100)/780 = −1.41.
* Cells 6 and 7 are combined: expected frequency 7.42 + 3.08 = 10.50, observed frequency 11, and (11 − 10.50)²/10.50 = 0.024.
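The pooling rule and the computations of Example 9.1 can be sketched together as follows. This is one possible reading of the rule of thumb, not the author's procedure: `pool_small_cells` is a hypothetical helper that repeatedly merges the first cell with an expected frequency below 5 into whichever adjacent cell has the lower expected frequency. The observed counts 7 and 4 for the last two cells are assumed from the bar labels of the Figure 9.2 histogram (they sum to the 11 shown in Table 9.1). Because the sketch evaluates the normal CDF exactly instead of looking up z values rounded to two decimals, its statistic differs slightly from the tabulated 10.209.

```python
from statistics import NormalDist


def pool_small_cells(observed, expected, minimum=5.0):
    """Merge each cell with E_i < minimum into its adjacent cell that has
    the lowest expected frequency, repeating until no such cell remains."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and min(exp) < minimum:
        i = next(idx for idx, e in enumerate(exp) if e < minimum)
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(exp)]
        j = min(neighbors, key=lambda idx: exp[idx])
        lo, hi = sorted((i, j))
        obs[lo:hi + 1] = [obs[lo] + obs[hi]]   # replace the two cells by their sum
        exp[lo:hi + 1] = [exp[lo] + exp[hi]]
    return obs, exp


# Example 9.1: hypothesized N(10,100, 780) with sample moments as parameters.
population = NormalDist(mu=10_100, sigma=780)
upper_bounds = [9000, 9500, 10_000, 10_500, 11_000, 11_500]
observed = [12, 4, 20, 24, 13, 7, 4]   # last two counts assumed from Figure 9.2
n = sum(observed)                       # 84 discharges

cum = [0.0] + [population.cdf(b) for b in upper_bounds] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]

obs, exp = pool_small_cells(observed, expected)
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(exp) - 1 - 2   # 1 lost for n, 2 for the mean and standard deviation
print(len(exp), df, round(chi2, 2))
```

Applied to the seven-cell illustration of Section 9.2.1, `pool_small_cells([3, 9, 7, 5, 9, 4, 6], [6, 8, 4, 6, 10, 7, 2])` reproduces the revised five-cell configuration.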
For a level of significance of 5% and 3 degrees of freedom, the critical chi-square value is 7.815. The null hypothesis is, therefore, rejected because the computed value is greater than the critical value. One may conclude that discharges on this watershed are not normally distributed with µ = 10,100 and σ = 780. The rejection of the null hypothesis may be due to one or more of the following: (1) the assumption of a normal distribution is incorrect, (2) µ ≠ 10,100, or (3) σ ≠ 780.

Alternative Cell Configurations

Cell boundaries are often established by the way the data were collected. If a data set is collected without specific bounds, then the cell bounds for the chi-square test cells can be established at any set of values. The decision should not be arbitrary, especially with small sample sizes, since the location of the bounds can influence the decision. For small and moderate sample sizes, multiple analyses with different cell bounds should be made to examine the sensitivity of the decision to the placement of the cell bounds.

While any cell bounds can be specified, consider the following two alternatives: equal intervals and equal probabilities. For equal-interval cell separation, the cell bounds are separated by an equal cell width. For example, test scores could be separated with an interval of ten: 100–90, 90–80, 80–70, and so on. Alternatively, the cell bounds could be set such that 25% of the underlying PDF was in each cell. For the standard normal distribution N(0, 1) with four equal-probability cells, the upper bounds of the cells would have z values of −0.6745, 0.0, 0.6745, and ∞. The advantage of the equal-probability cell alternative is that the probability can be set to ensure that the expected frequencies are at least 5. For example, for a sample size of 20, 4 is the largest number of equal-probability cells that will ensure expected frequencies of at least 5. If more than four cells are used, then at least 1 cell will have an E_i of less than 5.
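The equal-probability alternative lends itself to a brief sketch. The helper names below are hypothetical; the calculation assumes only that every cell carries probability 1/k, so that E_i = n/k, and it uses the inverse normal CDF from the Python standard library to locate the bounds.

```python
from statistics import NormalDist


def max_equal_probability_cells(n, minimum_expected=5):
    """Largest k for which every equal-probability cell has E_i = n/k >= minimum."""
    return n // minimum_expected


def equal_probability_z_bounds(k):
    """Upper z bounds of the first k - 1 equal-probability cells of N(0, 1);
    the upper bound of the last cell is +infinity."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(i / k) for i in range(1, k)]


k = max_equal_probability_cells(20)      # sample size of 20 from the text
bounds = equal_probability_z_bounds(k)
print(k, [round(z, 4) for z in bounds])
```

For the sample size of 20 this gives four cells, and the finite bounds agree to four decimals with the z values of −0.6745, 0.0, and 0.6745 quoted above.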
Comparison of Cell Configuration Alternatives

The two cell-configuration alternatives can be used with any distribution. This will be illustrated using the normal distribution.

Example 9.2

Consider the total lengths of storm-drain pipe used on 70 projects (see Table B.6). The pipe-length values have a mean of 3096 ft and a standard deviation of 1907 ft. The 70 lengths are allocated to eight cells using an interval of 1000 ft (see Table 9.2 and Figure 9.3a). The following hypotheses will be tested:

$$H_0: \text{Pipe length} \sim N(\mu = 3096, \sigma = 1907) \tag{9.4a}$$

$$H_A: \text{Pipe length} \not\sim N(3096, 1907) \tag{9.4b}$$

Note that the sample statistics are used to define the hypotheses and will, therefore, be used to compute the expected frequencies. Thus, 2 degrees of freedom will be subtracted because of their use. To compute the expected probabilities, the standard normal deviates z that correspond to the upper bounds X_u of each cell are computed (see column 4 of Table 9.2) using the following transformation:

$$z = \frac{X_u - \bar{X}}{S_x} = \frac{X_u - 3096}{1907} \tag{9.5}$$

The corresponding cumulative probabilities are computed from the cumulative standard normal curve (Table A.1) and are given in column 5. The probabilities associated with each cell (column 6) are taken as the differences of the cumulative probabilities of column 5. The expected frequencies (E_i) equal the product of the sample size 70 and the probability p_i (see column 7). Since the expected frequencies in the last two cells are less than 5, the last three cells are combined, which yields six cells; combining only the last two cells would give an expected frequency of 3.045 + 1.428 = 4.473, which is still less than 5. The cell values of the chi-square statistic of Equation 9.3 are given in column 8, with a sum of 22.769.
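To make the column-by-column procedure concrete, the arithmetic for the first cell and for the combined last three cells is worked below. The numbers are the Table 9.2 entries; the subscript "6-8" for the combined cell is notation introduced here for illustration only.

```latex
\begin{align*}
% Cell 1: upper bound X_u = 1000 ft, O_1 = 3, n = 70
z_1 &= \frac{1000 - 3096}{1907} = -1.099, &
p_1 &= P(z < -1.099) = 0.1358, &
E_1 &= n p_1 = 70(0.1358) = 9.506 \\
\frac{(O_1 - E_1)^2}{E_1} &= \frac{(3 - 9.506)^2}{9.506} = 4.453 \\
% Combined cells 6, 7, and 8
E_{6\text{-}8} &= 6.664 + 3.045 + 1.428 = 11.137, &
O_{6\text{-}8} &= 2 + 3 + 5 = 10 \\
\frac{(O_{6\text{-}8} - E_{6\text{-}8})^2}{E_{6\text{-}8}} &= \frac{(10 - 11.137)^2}{11.137} = 0.116
\end{align*}
```

The remaining four cells follow the same pattern as cell 1, and the six contributions sum to the 22.769 reported for the test statistic.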
For six cells with 3 degrees of freedom lost, the critical test statistic for a 5% level of significance is 7.815. Thus, the computed value is greater than the critical value, so the null hypothesis can be rejected. The null hypothesis would be rejected even at a 0.5% level of significance (χ²_{0.005} = 12.84). Therefore, the distribution specified in the null hypothesis is unlikely to characterize the underlying population.

TABLE 9.2 Chi-Square Test of Pipe Length Data Using Equal Interval Cells

Cell    Length range (ft)   Observed frequency, O_i   z_i      ∑p_i     p_i      E_i = np_i   (O_i − E_i)²/E_i
1       0–1000              3                         −1.099   0.1358   0.1358   9.506        4.453
2       1000–2000           20                        −0.574   0.2829   0.1471   10.297       9.143
3       2000–3000           22                        −0.050   0.4801   0.1972   13.804       4.866
4       3000–4000           8                         0.474    0.6822   0.2021   14.147       2.671
5       4000–5000           7                         0.998    0.8409   0.1587   11.109       1.520
6       5000–6000           2                         1.523    0.9361   0.0952   6.664        0.116*
7       6000–7000           3                         2.047    0.9796   0.0435   3.045
8       7000–∞              5                         ∞        1.0000   0.0204   1.428
Total                       70                                                   70.000       22.769

z_i = (X_u − 3096)/1907, where X_u is the upper bound of the cell.
* Cells 6, 7, and 8 are combined: expected frequency 6.664 + 3.045 + 1.428 = 11.137 and observed frequency 10.

FIGURE 9.3 Frequency histogram of pipe lengths (L, ft × 10³) using (a) equal interval and (b) equal probability cells.

For a chi-square analysis using the equal-probability alternative, the range is divided into eight cells, each with a probability of 1/8 (see Figure 9.3b). The cumulative probabilities are given in column 1 of Table 9.3. The z_i values (column 2) that correspond to the cumulative probabilities are obtained from the standard normal table (Table A.1). The pipe length corresponding to each z_i value is computed by (see column 3):

$$X_u = \mu + z_i \sigma = 3096 + 1907 z_i \tag{9.6}$$

These upper bounds are used to count the observed frequencies (column 4) from the 70 pipe lengths. The expected frequency (E_i) is np = 70(1/8) = 8.75. Therefore, the computed chi-square statistic is 18.914. Since eight cells were used and 3 degrees of freedom [...]
TABLE 9.3 Chi-Square Test of Pipe Length Data Using Equal Probability Cells

∑p      z_i      X_u (ft)   O_i   E_i    (O_i − E_i)²/E_i
0.125   −1.150   903        3     70/8   3.779
0.250   −0.675   1809       17    70/8   7.779
0.375   −0.319   2488       12    70/8   1.207
0.500   0.000    3096       13    70/8   2.064
0.625   0.319    3704       6     70/8   0.864
0.750   0.675    4383       5     70/8   1.607
0.875   1.150    5289       5     70/8   1.607
1.000   ∞        ∞          9     70/8   0.007
Total                                    18.914
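The equal-probability analysis of the pipe-length data can be sketched in a few lines. The raw lengths of Table B.6 are not reproduced here, so the sketch takes the observed counts directly from Table 9.3; the variable names are illustrative. The upper bounds come from the exact inverse normal CDF, so they may differ by about a foot from the tabulated values, which were computed with z values rounded to three decimals.

```python
from statistics import NormalDist

mean, std_dev = 3096.0, 1907.0            # sample moments of the 70 pipe lengths (ft)
observed = [3, 17, 12, 13, 6, 5, 5, 9]    # column O_i of Table 9.3
k, n = len(observed), sum(observed)

# Equation 9.6: X_u = mean + z_i * std_dev for cumulative probabilities i/k.
standard_normal = NormalDist()
upper_bounds = [mean + standard_normal.inv_cdf(i / k) * std_dev for i in range(1, k)]

expected = n / k                          # 70/8 = 8.75 in every cell
chi2 = sum((o - expected) ** 2 / expected for o in observed)
print([round(x) for x in upper_bounds], round(chi2, 3))
```

Because every cell has the same expected frequency, no cells need to be combined, and the statistic rounds to the 18.914 shown in the table.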