Figure 37.8 is the histogram of 1000 sample variances, each calculated from three observations drawn from a normal distribution with σ² = 25. The average of the simulated sample variances was 25.3, with 30 values above 100 and 190 values of five or less. This is the range of variation in s² for samples of size n = 3.

A formal comparison of the equality of two sample variances uses the F statistic. Comparing two sample variances, each estimated with three degrees of freedom, would use the upper 5% value of F3,3 = 9.28. If the ratio of the larger to the smaller of the two variances is less than this F value, the two variances would be considered equal. For F3,3 = 9.28, this would include variances from 25/9.28 = 2.7 to 25(9.28) = 232. This shows that the variance of repeat observations in a calibration experiment will be quite variable due to random experimental error. If triplicate observations in a calibration experiment did have true constant variance σ² = 25, replicates at one concentration level could have s² = 3, and at another level (not necessarily a higher concentration) the variance could be s² = 200. Therefore, our interest is not in "unchanging" variance, but rather in the pattern of change over the range of x or y. If the change from one level of y to another is random, the variances are probably just reflecting random sampling error. If the variance increases in proportion to one of the variables, weighted least squares should be used.

Making the slopes in Figure 37.7 integer values was justified by saying that the variance is estimated with low precision when there are only three replicates. Box (personal communication) has shown that the percent error in the variance is approximately % error = 100/√(2ν), where ν is the degrees of freedom. From this, about 200 observations of y would be needed to estimate the variance with an error of 5%.

Comments

Nonconstant variance may occur in a variety of situations. It is common in calibration data because they cover a wide range of concentration, and also because certain measurement errors tend to be multiplicative instead of additive. Using unweighted least squares when there is nonconstant variance will distort all calculated t statistics, confidence intervals, and prediction intervals. It will lead to wrong decisions about the form of the calibration model and about which parameters should be included in the model, and it will give biased estimates of analyte concentrations.

The appropriate weights can be determined from the data if replicate measurements have been made at some settings of x. These should be true replicates and not merely multiple measurements on the same standard solution. If there is no replication, one may falsely assume that the variance is constant when it is not. If you suspect nonconstant variance, based on prior experience or knowledge about an instrument, apply reasonable weights. Any reasonable weighting is likely to be better than none.

FIGURE 37.8 Distribution of 1000 simulated sample variances, each calculated using three observations drawn at random from a normal distribution with σ² = 25. The average of the 1000 simulated values is 25.3, with 30 variances above 100 and 190 variances of five or less.
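The simulation behind Figure 37.8 is easy to reproduce. The sketch below is a minimal version (Python with NumPy is an assumption; the text does not tie the simulation to any particular software). It draws 1000 triplicate samples from a normal distribution with σ² = 25 and counts how many sample variances fall above 100 or at five or below; the exact counts depend on the random seed, so they will be near, not identical to, the 30 and 190 quoted above.

```python
import numpy as np

rng = np.random.default_rng(1)     # arbitrary seed, for repeatability only
sigma2 = 25.0                      # true variance of the parent normal distribution
n_sets, n_obs = 1000, 3            # 1000 simulated experiments of 3 observations each

obs = rng.normal(0.0, np.sqrt(sigma2), size=(n_sets, n_obs))
s2 = obs.var(axis=1, ddof=1)       # sample variances, each with n - 1 degrees of freedom

print("mean of simulated s^2 :", round(s2.mean(), 1))   # close to 25
print("variances above 100   :", int(np.sum(s2 > 100)))
print("variances of 5 or less:", int(np.sum(s2 <= 5)))
```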
One reason analysts often make many measurements at low concentrations is to use the calibration data to calculate the limit of detection for the measurement process. If this is to be done, proper weighting is critical (Zorn et al., 1997, 1999).

References

Currie, L. A. (1984). "Chemometrics and Analytical Chemistry," in Chemometrics: Mathematics and Statistics in Chemistry, NATO ASI Series C, 138, 115–146.
Danzer, K. and L. A. Currie (1998). "Guidelines for Calibration in Analytical Chemistry," Pure Appl. Chem., 70, 993–1014.
Draper, N. R. and H. Smith (1998). Applied Regression Analysis, 3rd ed., New York, John Wiley.
Gibbons, R. D. (1994). Statistical Methods for Groundwater Monitoring, New York, John Wiley.
Otto, M. (1999). Chemometrics, Weinheim, Germany, Wiley-VCH.
Zorn, M. E., R. D. Gibbons, and W. C. Sonzogni (1997). "Weighted Least Squares Approach to Calculating Limits of Detection and Quantification by Modeling Variability as a Function of Concentration," Anal. Chem., 69(15), 3069–3075.
Zorn, M. E., R. D. Gibbons, and W. C. Sonzogni (1999). "Evaluation of Approximate Methods for Calculating the Limit of Detection and Limit of Quantitation," Envir. Sci. & Tech., 33(13), 2291–2295.

Exercises

37.1 ICP Calibration. Fit the ICP calibration data for iron (Fe) below using weights that are inversely proportional to the square of the peak intensity (I). (One way of setting up such a weighted fit is sketched after these exercises.)

Standard Fe Conc. (mg/L)    0        50         100        200
Peak Intensity (I)          0.029    109.752    217.758    415.347

37.2 Nitrate Calibration I. For the case study nitrate data (Table 37.1), plot the residuals obtained by fitting a cubic calibration curve using unweighted regression.

37.3 Nitrate Calibration II. For the case study nitrate data (Table 37.1), compare the results of fitting the calibration curve using weights 1/x² with those obtained using 1/s² and 1/y².

37.4 Chloride Calibration. The following table gives triplicate calibration peaks for HPLC measurement of chloride. Determine appropriate weights and fit the calibration curve. Plot the residuals to check the adequacy of the calibration model.

Chloride (mg/L)    Peak 1     Peak 2     Peak 3
0.2                1112       895        1109
0.5                1892       1806       1796
0.7                3242       3162       3191
1.0                4519       4583       4483
2.0                9168       9159       9146
3.5                15,915     16,042     15,935
5.0                23,485     23,335     23,293
10.0               49,166     50,135     49,439
17.5               92,682     93,288     92,407
25.0               137,021    140,137    139,938
50.0               318,984    321,468    319,527
75.0               505,542    509,773    511,877
100.0              700,231    696,155    699,516
Source: Greg Zelinka, Madison Metropolitan Sewerage District.

37.5 BOD Parameter Estimation. The data below are duplicate measurements of the BOD of fresh bovine manure. Use weighted nonlinear least squares to estimate the parameters in the model η = θ1(1 − exp(−θ2 t)).

Day           1        3        5        7        10       15
BOD (mg/L)    11,320   20,730   28,000   32,000   35,200   33,000
              11,720   22,320   29,600   33,600   32,000   36,600
Source: Marske, D. M. and L. B. Polkowski (1972). J. WPCF, 44, 1987–1992.
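The weighted fit asked for in Exercise 37.1 can be set up from the closed-form weighted least squares formulas for a straight line. The sketch below is one possible implementation, not the book's solution (Python with NumPy assumed; the variable names are mine, and the straight-line form is an assumption since the exercise does not dictate the model).

```python
import numpy as np

# Exercise 37.1 data: iron standards and ICP peak intensities
conc = np.array([0.0, 50.0, 100.0, 200.0])            # Fe concentration, mg/L
peak = np.array([0.029, 109.752, 217.758, 415.347])   # peak intensity I
w = 1.0 / peak**2                                      # weights proportional to 1/I^2

# Weighted least squares for I = b0 + b1*Fe, i.e. minimize sum(w_i*(y_i - b0 - b1*x_i)^2)
xw = np.average(conc, weights=w)                       # weighted mean of x
yw = np.average(peak, weights=w)                       # weighted mean of y
b1 = np.sum(w * (conc - xw) * (peak - yw)) / np.sum(w * (conc - xw)**2)
b0 = yw - b1 * xw
print(f"weighted fit:   I = {b0:.4f} + {b1:.4f} * Fe")

# For comparison, the ordinary (unweighted) least squares slope
print(f"unweighted slope: {np.polyfit(conc, peak, 1)[0]:.4f}")
```

Weighting by 1/I² makes the fit honor the small peak at the blank standard far more than the large peaks, which is the point of weighting calibration data.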
38 Empirical Model Building by Linear Regression

KEY WORDS all possible regressions, analysis of variance, coefficient of determination, confidence interval, diagnostic checking, empirical models, F test, least squares, linear regression, overfitting, parsimonious model, polynomial, regression sum of squares, residual plot, residual sum of squares, sedimentation, solids removal, standard error, t statistic, total sum of squares.

Empirical models are widely used in engineering. Sometimes the model is a straight line; sometimes a mathematical French curve — a smooth interpolating function — is needed. Regression provides the means for selecting the complexity of the French curve that can be supported by the available data.

Regression begins with the specification of a model to be fitted. One goal is to find a parsimonious model — an adequate model with the fewest possible terms. Sometimes the proposed model turns out to be too simple and we need to augment it with additional terms. The much more common case, however, is to start with more terms than are needed or justified. This is called overfitting. Overfitting is harmful because the prediction error of the model is proportional to the number of parameters in the model.

A fitted model is always checked for inadequacies. The statistical output of regression programs is somewhat helpful in doing this, but a more satisfying and useful approach is to make diagnostic plots of the residuals. As a minimum, the residuals should be plotted against the predicted values of the fitted model. Plots of residuals against the independent variables are also useful. This chapter illustrates how this diagnosis is used to decide whether terms should be added or dropped to improve a model. If a tentative model is modified, it is refitted and rechecked. The model builder thus works iteratively toward the simplest adequate model.

A Model of Sedimentation

Sedimentation removes solid particles from a liquid by allowing them to settle under quiescent conditions. An ideal sedimentation process can be created in the laboratory in the form of a batch column. The column is filled with the suspension (turbid river water, industrial wastewater, or sewage) and samples are taken over time from sampling ports located at several depths along the column. The measure of sedimentation efficiency will be solids concentration (or fraction of solids removed), which will be measured as a function of time and depth.

The data come from a quiescent batch settling test. At the beginning of the test, the concentration is uniform over the depth of the test settling column. The mass of solids in the column initially is M = C0ZA, where C0 is the initial concentration (g/m³), Z is the water depth in the settling column (m), and A is the cross-sectional area of the column (m²). This is shown in the left-hand panel of Figure 38.1. After settling has progressed for time t, the concentration near the bottom of the column has increased relative to the concentration at the top, giving a solids concentration profile that is a function of depth at any time t. The mass of solids remaining above depth z is M = A ∫ C(z, t) dz. The total mass of solids in the column is still M = C0ZA. This is shown in the right-hand panel of Figure 38.1.
FIGURE 38.1 Solids concentration as a function of depth and settling time. The initial condition (t = 0), with uniform concentration C0 and mass of solids M = C0ZA, is shown on the left. The solids profile at settling time t, with mass M = A ∫ C(z, t) dz remaining above depth z, is shown on the right.

The fraction of solids removed in a settling tank at any depth z, for a detention time t, is estimated as:

R(z, t) = [AZC0 − A ∫0 to z C(z, t) dz] / (AZC0) = 1 − (1/(ZC0)) ∫0 to z C(z, t) dz

This integral could be calculated graphically (Camp, 1946), or an approximating polynomial can be derived for the concentration curve and the fraction of solids removed (R) can be calculated algebraically. Suppose, for example, that:

C(z, t) = 167 − 2.74t + 11.9z − 0.08zt + 0.014t²

is a satisfactory empirical model and we want to use this model to predict the removal that will be achieved with a 60-min detention time, for a depth of 8 ft and an initial concentration of 500 mg/L. The solids concentration profile as a function of depth at t = 60 min is:

C(z, 60) = 167 − 2.74(60) + 11.9z − 0.08z(60) + 0.014(60)² = 53.0 + 7.1z

This is integrated over depth (Z = 8 ft) to give the fraction of solids that are expected to be removed:

R(z = 8, t = 60) = 1 − (1/(8(500))) ∫z=0 to z=8 (53.0 + 7.1z) dz = 1 − (1/(8(500)))[53(8) + 3.55(8²)] = 0.84

The model building problem is to determine the form of the polynomial function and to estimate the coefficients of the terms in the function.
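Because the removal calculation is just a definite integral of the fitted polynomial, it is easy to check numerically. A minimal sketch is shown below (Python with NumPy assumed; the function and variable names are illustrative and not from the original text).

```python
import numpy as np

# Empirical settling model from the example above
def conc(z, t):
    return 167 - 2.74*t + 11.9*z - 0.08*z*t + 0.014*t**2

Z, C0, t = 8.0, 500.0, 60.0          # depth (ft), initial conc. (mg/L), detention time (min)

z = np.linspace(0.0, Z, 1001)
c = conc(z, t)
integral = np.sum((c[:-1] + c[1:]) / 2.0 * np.diff(z))   # trapezoidal rule over 0..Z

R = 1.0 - integral / (Z * C0)        # R = 1 - (1/(Z*C0)) * integral of C(z, t)
print(f"fraction removed R = {R:.2f}")   # about 0.84, matching the hand calculation
```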
Method: Linear Regression

Suppose the correct model for the process is η = f(β, x) and the observations are yi = f(β, x) + ei, where the ei are random errors. There may be several parameters (β) and several independent variables (x). According to the least squares criterion, the best estimates of the β's minimize the sum of the squared residuals:

minimize S(β) = Σ(yi − ηi)²

where the summation is over all observations. The minimum sum of squares is called the residual sum of squares, RSS. The residual mean square (RMS) is the residual sum of squares divided by its degrees of freedom: RMS = RSS/(n − p), where n = number of observations and p = number of parameters estimated.

Case Study: Solution

A column settling test was done on a suspension with an initial concentration of 560 mg/L. Samples were taken at depths of 2, 4, and 6 ft (measured from the water surface) at times 20, 40, 60, and 120 min; the data are in Table 38.1. The simplest possible model is:

C(z, t) = β0 + β1t

The most complicated model that might be needed is a full quadratic function of time and depth:

C(z, t) = β0 + β1t + β2t² + β3z + β4z² + β5zt

We can start the model building process with either of these and add or drop terms as needed.

TABLE 38.1 Data from a Laboratory Settling Column Test
              Suspended Solids Concentration at Time t (min)
Depth (ft)      20      40      60      120
2              135      90      75       48
4              170     110      90       53
6              180     126      96       60

Fitting the simplest possible model involving time and depth gives:

ŷ = 132.3 + 7.12z − 0.97t

which has R² = 0.844 and residual mean square = 355.82. R², the coefficient of determination, is the percentage of the total variation in the data that is accounted for by fitting the model (Chapter 39). Figure 38.2a shows the diagnostic residual plots for this model. The residuals plotted against the predicted values are not random. This suggests an inadequacy in the model, but it does not tell us how the model might be improved. The pattern of the residuals plotted against time (Figure 38.2b) suggests that adding a t² term may be helpful. This was done to obtain:

ŷ = 186.0 + 7.12z − 3.06t + 0.0143t²

which has R² = 0.97 and residual mean square = 81.5. A diagnostic plot of the residuals (Figure 38.3) reveals no inadequacies. Similar plots of residuals against the independent variables also support the model. This model is adequate to describe the data.

FIGURE 38.2 (a) Residuals plotted against the predicted suspended solids concentrations are not random. (b) Residuals plotted against settling time suggest that a quadratic term is needed in the model.

FIGURE 38.3 Plot of residuals against the predicted values of the regression model ŷ = 185.97 + 7.125z − 3.057t + 0.014t².

The most complicated model, which has six parameters, is:

ŷ = 152 + 20.9z − 2.74t − 1.13z² + 0.0143t² − 0.080zt

The model contains quadratic terms for time and depth and the interaction of depth and time (zt). The analysis of variance for this model is given in Table 38.2. This information is produced by computer programs that do linear regression. For now we do not need to know how to calculate it, but we should understand how it is interpreted. Across the top, SS is sum of squares and df is the degrees of freedom associated with a sum of squares quantity. MS is mean square, where MS = SS/df.

TABLE 38.2 Analysis of Variance for the Six-Parameter Settling Linear Model
Due to                 df      SS         MS = SS/df
Regression (RegSS)      5     20255.5     4051.1
Residuals (RSS)         6       308.8       51.5
Total (Total SS)       11     20564.2

The sum of squares due to regression is the regression sum of squares (RegSS): RegSS = 20,255.5. The sum of squares due to residuals is the residual sum of squares (RSS): RSS = 308.8. The total sum of squares, or Total SS, is:

Total SS = RegSS + RSS

Also:

Total SS = Σ(yi − ȳ)²

The residual sum of squares (RSS) is the minimum sum of squares that results from estimating the parameters by least squares. It is the variation that is not explained by fitting the model. If the model is correct, the RSS is the variation in the data due to random measurement error. For this model, RSS = 308.8. The residual mean square is the RSS divided by the degrees of freedom of the residual sum of squares. For RSS, the degrees of freedom are df = n − p, where n is the number of observations and p is the number of parameters in the fitted model. Thus, RMS = RSS/(n − p). The residual sum of squares (RSS = 308.8) and the residual mean square (RMS = 308.8/6 = 51.5) are the key statistics in comparing this model with simpler models.

The regression sum of squares (RegSS) shows how much of the total variation (i.e., how much of the Total SS) has been explained by the fitted equation. For this model, RegSS = 20,255.5. The coefficient of determination, commonly denoted as R², is the regression sum of squares expressed as a fraction of the total sum of squares.
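The entries of Table 38.2 can be reproduced directly from the data in Table 38.1 by fitting the six-parameter model and accumulating the sums of squares. The following sketch shows one way to do it (Python with NumPy assumed; the text itself does not specify software, and the variable names are mine).

```python
import numpy as np

# Table 38.1: suspended solids (mg/L) at depth z (ft) and time t (min)
z = np.repeat([2.0, 4.0, 6.0], 4)
t = np.tile([20.0, 40.0, 60.0, 120.0], 3)
y = np.array([135, 90, 75, 48, 170, 110, 90, 53, 180, 126, 96, 60], dtype=float)

# Six-parameter model: C = b0 + b1*z + b2*t + b3*z^2 + b4*t^2 + b5*z*t
X = np.column_stack([np.ones_like(z), z, t, z**2, t**2, z*t])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

yhat = X @ b
rss = np.sum((y - yhat)**2)                # residual sum of squares
total_ss = np.sum((y - y.mean())**2)       # total (corrected) sum of squares
reg_ss = total_ss - rss                    # regression sum of squares
n, p = len(y), X.shape[1]

print("coefficients:", np.round(b, 4))     # roughly 152, 20.9, -2.74, -1.13, 0.0143, -0.080
print("RegSS =", round(reg_ss, 1), " RSS =", round(rss, 1))
print("RMS   =", round(rss / (n - p), 1), " R^2 =", round(reg_ss / total_ss, 3))
```

The printed values should agree with Table 38.2 and with the coefficients of Model A in Table 38.3, apart from rounding.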
For the complete six-parameter model (Model A in Table 38.3), R² = 20256/20564 = 0.985, so it can be said that this model accounts for 98.5% of the total variation in the data. It is natural to be fascinated by high R² values, and this tempts us to think that the goal of model building is to make R² as high as possible. Obviously, this can be done by putting more high-order terms into a model, but it should be equally obvious that this does not necessarily improve the predictions that will be made using the model. Increasing R² is the wrong goal. Instead of worrying about R² values, we should seek the simplest adequate model.

Selecting the "Best" Regression Model

The "best" model is the one that adequately describes the data with the fewest parameters. Table 38.3 summarizes the parameter estimates, the coefficient of determination R², and the regression sum of squares for all eight possible linear models. The total sum of squares, of course, is the same in all eight cases because it depends on the data and not on the form of the model. Standard errors [SE] and t ratios (in parentheses) are given for the complete model, Model A.

One approach is to examine the t ratio for each parameter. Roughly speaking, if a parameter's t ratio is less than about 2.5, the true value of the parameter could be zero and that term could be dropped from the equation. Another approach is to examine the confidence intervals of the estimated parameters. If this interval includes zero, the variable associated with the parameter can be dropped from the model. For example, in Model A, the coefficient of z² is b3 = −1.13, with standard error 1.1 and 95% confidence interval [−3.88 to +1.62]. This confidence interval includes zero, indicating that the true value of b3 may well be zero, and therefore the term z² can be tentatively dropped from the model. Fitting the simplified model (without z²) gives Model B in Table 38.3.

The standard error [SE] is the number in brackets. The half-width of the 95% confidence interval is a multiple of the standard error of the estimated value. The multiplier is a t statistic that depends on the selected level of confidence and the degrees of freedom; it is not the same value as the t ratio given in Table 38.3. Roughly speaking, if the degrees of freedom are large (n − p ≥ 20), the half-width of a 95% confidence interval is about 2SE. If the degrees of freedom are small (n − p < 10), the multiplier will be in the range of 2.3SE to 3.0SE.

TABLE 38.3 Summary of All Possible Regressions for the Settling Test Model

                     Coefficient of the Term                                          Decrease
Model      b0      b1 z     b2 t     b3 z²    b4 t²     b5 tz     R²      RegSS      in RegSS
A          152     20.9     −2.74    −1.13    0.014     −0.08     0.985   20256
(t ratio)          (2.3)    (8.3)    (1.0)    (7.0)     (2.4)
[SE]               [9.1]    [0.33]   [1.1]    [0.002]   [0.03]
B          167     11.9     −2.74             0.014     −0.08     0.982   20202      54
C          171     16.1     −3.06    −1.13    0.014               0.971   19966      289
D          186      7.1     −3.06             0.014               0.968   19912      343
E           98     20.9     −0.65    −1.13              −0.08     0.864   17705      2550
F          113     11.9     −0.65                       −0.08     0.858   17651      2605
G          117     16.1     −0.97    −1.13                        0.849   17416      2840
H          132      7.1     −0.97                                 0.844   17362      2894

Note: ( ) indicates t ratios of the estimated parameters; [ ] indicates standard errors of the estimated parameters.
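The standard errors, t ratios, and confidence intervals quoted for Model A come from the least squares covariance matrix s²(X'X)⁻¹. A sketch of the calculation follows (Python with NumPy assumed; the 95% multiplier is taken as t(0.975, 6 df) ≈ 2.447 from a t table, so the interval computed for b3 will be close to, though not exactly, the [−3.88, +1.62] quoted above).

```python
import numpy as np

# Settling data (Table 38.1) and the full quadratic design matrix (Model A)
z = np.repeat([2.0, 4.0, 6.0], 4)
t = np.tile([20.0, 40.0, 60.0, 120.0], 3)
y = np.array([135, 90, 75, 48, 170, 110, 90, 53, 180, 126, 96, 60], dtype=float)
X = np.column_stack([np.ones_like(z), z, t, z**2, t**2, z*t])

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
dof = len(y) - X.shape[1]                 # 12 observations - 6 parameters = 6
s2 = np.sum(resid**2) / dof               # residual mean square (about 51.5)

cov = s2 * np.linalg.inv(X.T @ X)         # covariance matrix of the estimates
se = np.sqrt(np.diag(cov))
t_ratio = b / se

t_crit = 2.447                            # t(0.975, 6 df) from a t table
labels = ["b0", "b1 z", "b2 t", "b3 z^2", "b4 t^2", "b5 zt"]
for name, bi, sei, ti in zip(labels, b, se, t_ratio):
    print(f"{name:7s} b = {bi:8.3f}  SE = {sei:7.3f}  t = {ti:6.2f}  "
          f"95% CI [{bi - t_crit*sei:8.3f}, {bi + t_crit*sei:8.3f}]")
```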
After modifying a model by adding, or in this case dropping, a term, an additional test should be made to compare the regression sums of squares of the two models. Details of this test are given in texts on regression analysis (Draper and Smith, 1998) and in Chapter 40. Here, the test is illustrated by example.

The regression sum of squares for the complete model (Model A) is 20,256. Dropping the z² term to get Model B reduced the regression sum of squares by only 54. We need to consider whether a reduction of 54 in the regression sum of squares is a statistically significant difference. The reduction in the regression sum of squares due to dropping z² can be thought of as a variance associated with the z² term. If this variance is small compared to the variance of the pure experimental error, the term z² contributes no real information and it should be dropped from the model. In contrast, if the variance associated with the z² term is large relative to the pure error variance, the term should remain in the model.

There were no repeated measurements in this experiment, so an independent estimate of the pure error variance cannot be computed. The best that can be done under the circumstances is to use the residual mean square of the complete model as an estimate of the pure error variance. The residual mean square for the complete model (Model A) is 51.5. This is compared with the difference in regression sums of squares of the two models; the difference between Models A and B is 54. The ratio of the variance due to z² to the pure error variance is F = 54/51.5 = 1.05. This value is compared against the upper 5% point of the F distribution with 1 and 6 degrees of freedom: 1 degree of freedom for the numerator (the one parameter that was dropped from the model) and 6 for the denominator (the residual mean square). From Table C in the appendix, F1,6 = 5.99. Because 1.05 < 5.99, we conclude that removing the z² term does not produce a significant reduction in the regression sum of squares. Therefore, the z² term is not needed in the model.

The test used above is valid for comparing any of the models that have one less parameter than Model A. To compare Models A and E, notice that omitting t² decreases the regression sum of squares by 20256 − 17705 = 2551. The F statistic is 2551/51.5 = 49.5. Because 49.5 >> 5.99 (the upper 95% point of the F distribution with 1 and 6 degrees of freedom), this change is significant and t² needs to be included in the model.

The test is modified slightly to compare Models A and D because Model D has two fewer terms than Model A. The decrease of 343 in the regression sum of squares results from dropping two terms (z² and zt). The F statistic is now computed using 343/2 in the numerator and 51.5 in the denominator: F = (343/2)/51.5 = 3.33. The upper 95% point of the appropriate reference distribution is F2,6 = 5.14, which has 2 degrees of freedom for the numerator and 6 degrees of freedom for the denominator. Because the F for the model is less than the reference F (3.33 < 5.14), the terms z² and zt are not needed. Model D is as good as Model A.

Model D is the simplest adequate model:

Model D: ŷ = 186 + 7.12z − 3.06t + 0.0143t²

This is the same model that was obtained by starting with the simplest possible model and adding terms to make up for inadequacies.

Comments

The model building process uses regression to estimate the parameters, followed by diagnosis to decide whether the model should be modified by adding or dropping terms. The goal is not to maximize R², because this puts unneeded high-order terms into the polynomial model.
The best model should have the fewest possible parameters because this will minimize the prediction error of the model. One approach to finding the simplest adequate model is to start with a simple tentative model and use diagnostic checks, such as residuals plots, for guidance. The alternate approach is to start by overfitting the data with a highly parameterized model and to then find appropriate simplifications. Each time a Model D y ˆ 186 7.12t 3.06z– 0.143t 2 ++= L1592_frame_C38 Page 342 Tuesday, December 18, 2001 3:21 PM © 2002 By CRC Press LLC term is added or deleted from the model, a check is made on whether the difference in the regression sum of squares of the two models is large enough to justify modification of the model. References Berthouex, P. M. and D. K. Stevens (1982). “Computer Analysis of Settling Data,” J. Envr. Engr. Div., ASCE, 108, 1065–1069. Camp, T. R. (1946). “Sedimentation and Design of Settling Tanks,” Trans. Am. Soc. Civil Engr., 3, 895–936. Draper, N. R. and H. Smith (1998). Applied Regression Analysis, 3rd ed., New York, John Wiley. Exercises 38.1 Settling Test. Find a polynomial model that describes the following data. The initial suspended solids concentration was 560 mg/L. There are duplicate measurements at each time and depth. 38.2 Solid Waste Fuel Value. Exercise 3.5 includes a table that relates solid waste composition to the fuel value. The fuel value was calculated from the Dulong model, which uses elemental composition instead of the percentages of paper, food, metal, and plastic. Develop a model to relate the percentages of paper, food, metals, glass, and plastic to the Dulong estimates of fuel value. One proposed model is E(Btu/lb) = 23 Food + 82.8 Paper + 160 Plastic. Compare your model to this. 38.3 Final Clarifier. An activated sludge final clarifier was operated at various levels of overflow rate (OFR) to evaluate the effect of overflow rate (OFR), feed rate, hydraulic detention time, and feed slurry concentration on effluent total suspended solids (TSS) and underflow solids concentration. The temperature was always in the range of 18.5 to 21°C. Runs 11–12, 13–14, and 15–16 are duplicates, so the pure experimental error can be estimated. (a) Construct a polynomial model to predict effluent TSS. (b) Construct a polynomial model to predict underflow solids concentration. (c) Are underflow solids and effluent TSS related? Susp. Solids Conc. at Time t (min) Depth (ft) 20 40 60 120 2 135 90 75 48 140 100 66 40 4 170 110 90 53 165 117 88 46 6 180 126 96 60 187 121 90 63 Feed Detention Feed Effluent OFR Rate Time Slurry Underflow TSS Run (m/d) (m/d) (h) (kg/m 3 ) (kg/m 3 ) (mg/L) 1 11.1 30.0 2.4 6.32 11.36 3.5 2 11.1 30.0 1.2 6.05 10.04 4.4 3 11.1 23.3 2.4 7.05 13.44 3.9 4 11.1 23.3 1.2 6.72 13.06 4.8 5 16.7 30.0 2.4 5.58 12.88 3.8 6 16.7 30.0 1.2 5.59 13.11 5.2 7 16.7 23.3 2.4 6.20 19.04 4.0 8 16.7 23.3 1.2 6.35 21.39 4.5 9 13.3 33.3 1.8 5.67 9.63 5.4 L1592_frame_C38 Page 343 Tuesday, December 18, 2001 3:21 PM [...]... 8 9 10 11 12 13 14 15 16 17 18 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 5 5 5 12 12 12 12 12 8 8 8 8 8 4 4 4 4 4 12 12 12 32.6 32.6 24.5 16.4 8. 2 57.0 40 .8 24.5 8. 2 8. 2 57.0 40 .8 40 .8 24.5 8. 2 32.6 24.5 24.5 48 60 55 36 45 64 55 30 16 45 37 21 14 4 11 20 28 12 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 5 5 5 5 5 5 5 5 2 2 2 2 2 2 2 2 2 2 12 12 8 8 8 4 4 4 12 12 12 8 8 8 8 4 4 4 16.4 8. 2 40 .8 24.5 8. 2... x y x y 10.0 8. 0 13.0 9.0 11.0 14.0 6.0 4.0 12.0 7.0 5.0 8. 04 6.95 7. 58 8 .81 8. 33 9.96 7.24 4.26 10 .84 4 .82 5. 
68 10.0 8. 0 13.0 9.0 11.0 14.0 6.0 4.0 12.0 7.0 5.0 9.14 8. 14 8. 74 8. 77 9.26 8. 10 6.13 3.10 9.13 7.26 4.74 10.0 8. 0 13.0 9.0 11.0 14.0 6.0 4.0 12.0 7.0 5.0 7.46 6.77 12.74 7.11 7 .81 8. 84 6. 08 5.39 8. 15 6.42 5.73 8. 0 8. 0 8. 0 8. 0 8. 0 8. 0 8. 0 19.0 8. 0 8. 0 8. 0 6. 58 5.76 7.71 8. 84 8. 47 7.04 5.25... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 140 140 120 190 120 110 110 100 100 120 120 100 80 100 0 0 0 0 0 0 0 0 0 0 0 0 0 5.96 6. 08 5.93 5.99 6.01 5.97 5 .88 6.06 6.06 6.03 6.02 6.17 6.31 6.27 6.42 6. 28 6.43 6.33 6.43 6.37 6.09 6.32 6.37 6.73 6 .89 6 .87 6.30 6.52 6.39 6 .87 6 .85 5 .82 5.94 5.73 5.91 5 .87 5 .80 5 .80 5. 78 5. 78 5.73 5.63 5.79 6.02 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0... 2 2 2 2 2 2 2 12 12 8 8 8 4 4 4 12 12 12 8 8 8 8 4 4 4 16.4 8. 2 40 .8 24.5 8. 2 40 .8 24.5 8. 2 16.4 16.4 8. 2 40 .8 40 .8 24.5 8. 2 40 .8 24.5 8. 2 18 15 47 41 57 39 41 43 19 36 23 26 15 17 14 39 43 48 Source: Cashion B S and T M Keinath, J WPCF, 55, 1331–13 38 © 2002 By CRC Press LLC L1592_frame_C39 Page 345 Tuesday, December 18, 2001 3:22 PM 39 2 The Coefficient of Determination, R coefficient of determination,... variable Z3WA Z3 = 1 for storms 1 and 2, and 0 for storm 3 The fitted model is: Model C: (t-ratios) © 2002 By CRC Press LLC pH = 5 .82 + 1.11Z1 + 1.38Z2 − 0.0057Z3WA (8. 43) (12.19) (6. 68) L1592_frame_C40 Page 360 Tuesday, December 18, 2001 3:24 PM TABLE 40.2 Alternate Models for pH at Cosby Creek Model 2 Reg SS A pH = 5.77 − 0.00008WA + 0.998Z1 + 1.65Z2 − 0.005Z1WA − 0.008Z2WA B pH = 5 .82 + 0.95Z1 + 1.60Z2... 61.2 9 .8 8.9 44.9 6.3 20.3 7.5 1.2 159 .8 44.4 57.4 25.9 37.9 55.0 151.7 116.2 129.9 19.4 7.7 36.7 17 .8 8.5 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 159 .8 44.4 57.4 25.9 37.9 55.0 151.7 116.2 129.9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19.4 7.7 36.7 17 .8 8.5 142.3 140 .8 62.7 32.5 82 .3 58. 6 15.5 2.5 1527.3 697.5 429.9 215.2 331.6 185 .7... model? x y1 y2 2 5 8 12 15 18 20 0.0 4.0 5.1 8. 1 9.2 11.3 11.7 1.7 2.0 4.1 8. 9 8. 3 9.5 10.7 y3 2.0 4.5 5 .8 8.4 8. 8 10.9 10.4 39.4 Range of Data Fit a straight-line calibration model to the first 10 observations in the Exercise 36.3 data set, that is for COD between 60 and 195 mg/L Then fit the straight line to the full 2 data set (COD from 60 to 675 mg/L) Interpret the change in R for the two cases ©... SS A pH = 5.77 − 0.00008WA + 0.998Z1 + 1.65Z2 − 0.005Z1WA − 0.008Z2WA B pH = 5 .82 + 0.95Z1 + 1.60Z2 − 0.005Z1WA − 0.008Z2WA C pH = 5 .82 + 1.11Z1 + 1.38Z2 − 0.0057Z3WA Res SS R 4.2 78 4.2 78 4.229 0.662 0.662 0.712 0 .86 6 0 .86 6 0 .85 6 This simplification of the model can be checked in a more formal way by comparing regression sums of squares of the simplified model with the more complicated one The regression...L1592_frame_C 38 Page 344 Tuesday, December 18, 2001 3:21 PM 10 11 12 13 14 15 16 13.3 13.3 13.3 13.3 13.3 13.3 13.3 20.0 26.7 26.7 26.7 26.7 26.7 26.7 1 .8 3.0 3.0 0.6 0.6 1 .8 1 .8 7.43 6.06 6.14 6.36 5.40 6. 18 6.26 20.55 12.20 12.56 11.94 10.57 11 .80 12.12 3.0 3.7 3.6 6.9 6.9 5.0 4.0 Source: Adapted from Deitz J D and T M Keinath, J WPCF, 56, 344–350 (Original values have been rounded.) 38. 4 Final Clarification... 0 0 0 0 0 0 0 0 Diesel fuel #2 1 2 3 4 5 6 7 8 3.62 4.29 4.21 4.46 4.41 4.61 5. 38 4.64 −3.05 −3.72 −3.62 −3. 98 −4.03 −4.50 −4.49 −5.19 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 −3.05 −3.72 −3.62 −3. 
98 −4.03 −4.50 −4.49 −5.19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Diesel fuel #3 1 2 3 4 5 6 7 8 3.71 4.44 4.36 4. 68 4.52 4. 78 5.36 5.61 −3.05 −3.72 −3.62 −3. 98 −4.03 −4.50 −4.49 −5.19 0 0 0 0 0 0 0 0 1 1 . 24.5 41 8 8 8 24.5 30 26 5 4 8. 2 43 9 8 8 8. 2 16 27 2 12 16.4 19 10 8 8 8. 2 45 28 2 12 16.4 36 11 8 4 57.0 37 29 2 12 8. 2 23 12 8 4 40 .8 21 30 2 8 40 .8 26 13 8 4 40 .8 14 31 2 8 40 .8 15 14 8 4 24.5. 10.0 8. 04 10.0 9.14 10.0 7.46 8. 0 6. 58 8.0 6.95 8. 0 8. 14 8. 0 6.77 8. 0 5.76 13.0 7. 58 13.0 8. 74 13.0 12.74 8. 0 7.71 9.0 8. 81 9.0 8. 77 9.0 7.11 8. 0 8. 84 11.0 8. 33 11.0 9.26 11.0 7 .81 8. 0 8. 47 14.0. (mg/L) 1 8 12 32.6 48 19 5 12 16.4 18 2 8 12 32.6 60 20 5 12 8. 2 15 3 8 12 24.5 55 21 5 8 40 .8 47 4 8 12 16.4 36 22 5 8 24.5 41 5 8 12 8. 2 45 23 5 8 8.2 57 6 8 8 57.0 64 24 5 4 40 .8 39 7 8 8 40 .8 55