Statistics for Environmental Engineers - Part 5

© 2002 By CRC Press LLC

40 Regression Analysis with Categorical Variables

KEY WORDS acid rain, pH, categorical variable, F test, indicator variable, least squares, linear model, regression, dummy variable, qualitative variables, regression sum of squares, t-ratio, weak acidity.

Qualitative variables can be used as explanatory variables in regression models. A typical case would be when several sets of data are similar except that each set was measured by a different chemist (or different instrument or laboratory), or each set comes from a different location, or each set was measured on a different day. The qualitative variables (chemist, location, or day) typically take on discrete values (e.g., chemist Smith or chemist Jones). For convenience, they are usually represented numerically by a combination of zeros and ones to signify an observation's membership in a category; hence the name categorical variables.

One task in the analysis of such data is to determine whether the same model structure and parameter values hold for each data set. One way to do this would be to fit the proposed model to each individual data set and then try to assess the similarities and differences in the goodness of fit. Another way would be to fit the proposed model to all the data as though they were one data set instead of several, assuming that each data set has the same pattern, and then to look for inadequacies in the fitted model.

Neither of these approaches is as attractive as using categorical variables to create a collective data set that can be fitted to a single model while retaining the distinction between the individual data sets. This technique allows the model structure and the model parameters to be evaluated using statistical methods like those discussed in the previous chapter.
Case Study: Acidification of a Stream During Storms

Cosby Creek, in the southern Appalachian Mountains, was monitored during three storms to study how pH and other measures of acidification were affected by the rainfall in that region. Samples were taken every 30 min and 19 characteristics of the stream water chemistry were measured (Meinert et al., 1982). Weak acidity (WA) and pH will be examined in this case study.

Figure 40.1 shows 17 observations for storm 1, 14 for storm 2, and 13 for storm 3, giving a total of 44 observations. If the data are analyzed without distinguishing between storms, one might consider models of the form $\mathrm{pH} = \beta_0 + \beta_1 \mathrm{WA} + \beta_2 \mathrm{WA}^2$ or $\mathrm{pH} = \theta_3 + (\theta_1 - \theta_3)\exp(-\theta_2 \mathrm{WA})$. Each storm might be described by $\mathrm{pH} = \beta_0 + \beta_1 \mathrm{WA}$, but storm 3 does not have the same slope and intercept as storms 1 and 2, and storms 1 and 2 might be different as well. This can be checked by using categorical variables to estimate a different slope and intercept for each storm.

Method: Regression with Categorical Variables

Suppose that a model needs to include an effect due to the category (storm event, farm plot, treatment, truckload, operator, laboratory, etc.) from which the data came. This effect is included in the model in the form of categorical variables (also called dummy or indicator variables). In general, m − 1 categorical variables are needed to specify m categories.

Begin by considering data from a single category. The quantitative predictor variable is $x_1$, which can predict the dependent variable $y_1$ using the linear model:

$$y_{1i} = \beta_0 + \beta_1 x_{1i} + e_i$$

where $\beta_0$ and $\beta_1$ are parameters to be estimated by least squares.
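As a small illustration of the m − 1 rule stated above, the sketch below codes three categories with two zero/one columns. The function name `dummy_code`, the sample labels, and the choice of the last category as the all-zero reference are assumptions made for this example only.

```python
def dummy_code(labels, reference):
    """Return m - 1 zero/one columns for m categories, omitting `reference`."""
    # Keep the non-reference categories in order of first appearance.
    levels = [lev for lev in dict.fromkeys(labels) if lev != reference]
    return {lev: [1 if lab == lev else 0 for lab in labels] for lev in levels}


# Hypothetical category labels for six observations.
storms = [1, 1, 2, 3, 2, 3]
Z = dummy_code(storms, reference=3)
# Z[1] flags category 1 and Z[2] flags category 2;
# category 3 is identified by zeros in both columns.
```

Any category can serve as the reference; the fitted lines are the same, and only the interpretation of the individual parameters changes.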
If there are data from two categories (e.g., data produced at two different laboratories), one approach would be to model the two sets of data separately as:

$$y_{1i} = \alpha_0 + \alpha_1 x_{1i} + e_i$$

and

$$y_{2i} = \beta_0 + \beta_1 x_{2i} + e_i$$

and then to compare the estimated intercepts ($\alpha_0$ and $\beta_0$) and the estimated slopes ($\alpha_1$ and $\beta_1$) using confidence intervals or t-tests.

A second, and often better, method is to simultaneously fit a single augmented model to all the data. To construct this model, define a categorical variable Z as follows:

Z = 0 if the data are in the first category
Z = 1 if the data are in the second category

The augmented model is:

$$y_i = \alpha_0 + \alpha_1 x_i + Z(\beta_0 + \beta_1 x_i) + e_i$$

With some rearrangement:

$$y_i = \alpha_0 + \beta_0 Z + \alpha_1 x_i + \beta_1 Z x_i + e_i$$

In this last form the regression is done as though there are three independent variables, x, Z, and Zx. The vectors of Z and Zx have to be created from the categorical variables defined above. The four parameters $\alpha_0$, $\beta_0$, $\alpha_1$, and $\beta_1$ are estimated by linear regression.

A model for each category can be obtained by substituting the defined values. For the first category, Z = 0 and:

$$y_i = \alpha_0 + \alpha_1 x_i + e_i$$

For the second category, Z = 1 and:

$$y_i = (\alpha_0 + \beta_0) + (\alpha_1 + \beta_1) x_i + e_i$$

The regression might estimate either $\beta_0$ or $\beta_1$ as zero, or both as zero. If $\beta_0 = 0$, the two lines have the same intercept. If $\beta_1 = 0$, the two lines have the same slope. If both $\beta_1$ and $\beta_0$ equal zero, a single straight line fits all the data. Figure 40.2 shows the four possible outcomes. Figure 40.3 shows the particular case where the slopes are equal and the intercepts are different. If simplification seems indicated, a simplified version is fitted to the data. We show later how the full model and simplified model are compared to check whether the simplification is justified.

FIGURE 40.1 The relation of pH and weak acidity data of Cosby Creek after three storms. (Axes: pH versus Weak Acidity, µg/L.)

FIGURE 40.2 Four possible models to fit a straight line to data in two categories.
Slopes different, intercepts different: $y_i = (\alpha_0 + \beta_0) + (\alpha_1 + \beta_1) x_i + e_i$
Slopes different, intercepts equal: $y_i = \alpha_0 + (\alpha_1 + \beta_1) x_i + e_i$
Slopes equal, intercepts different: $y_i = (\alpha_0 + \beta_0) + \alpha_1 x_i + e_i$
Slopes equal, intercepts equal: $y_i = \alpha_0 + \alpha_1 x_i + e_i$

FIGURE 40.3 Model with two categories having different intercepts but equal slopes. (Complete model: $y = (\alpha_0 + \beta_0) + (\alpha_1 + \beta_1) x + e$. Category 1: $y = \alpha_0 + \alpha_1 x + e$. Category 2: $y = (\alpha_0 + \beta_0) + \alpha_1 x + e$. Slope = $\alpha_1$ for both lines.)

To deal with three categories, two categorical variables are defined:

Category 1: $Z_1 = 1$ and $Z_2 = 0$
Category 2: $Z_1 = 0$ and $Z_2 = 1$

This implies $Z_1 = 0$ and $Z_2 = 0$ for category 3. The model is:

$$y_i = (\alpha_0 + \alpha_1 x_i) + Z_1(\beta_0 + \beta_1 x_i) + Z_2(\gamma_0 + \gamma_1 x_i) + e_i$$

The parameters with subscript 0 estimate the intercepts and those with subscript 1 estimate the slopes. This can be rearranged to give:

$$y_i = \alpha_0 + \beta_0 Z_1 + \gamma_0 Z_2 + \alpha_1 x_i + \beta_1 Z_1 x_i + \gamma_1 Z_2 x_i + e_i$$

The six parameters are estimated by fitting the original independent variable $x_i$ plus the four created variables $Z_1$, $Z_2$, $Z_1 x_i$, and $Z_2 x_i$. Any of the parameters might be estimated as zero by the regression analysis. A couple of examples explain how the simpler models can be identified.

In the simplest possible case, the regression would estimate $\beta_0 = 0$, $\gamma_0 = 0$, $\beta_1 = 0$, and $\gamma_1 = 0$, and the same slope ($\alpha_1$) and intercept ($\alpha_0$) would apply to all three categories. The fitted simplified model is:

$$y_i = \alpha_0 + \alpha_1 x_i + e_i$$
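The rearranged three-category model can be fitted with any ordinary least squares routine once the created columns are assembled. The following is a minimal sketch using NumPy, not the authors' procedure. The helper name `design_matrix`, the coding of categories as 1, 2, 3, and the noise-free data are all assumptions chosen so that the known parameters are recovered.

```python
import numpy as np


def design_matrix(x, category):
    """Columns [1, Z1, Z2, x, Z1*x, Z2*x] for categories coded 1, 2, 3."""
    x = np.asarray(x, dtype=float)
    category = np.asarray(category)
    z1 = (category == 1).astype(float)  # Z1 = 1 for category 1, else 0
    z2 = (category == 2).astype(float)  # Z2 = 1 for category 2, else 0
    return np.column_stack([np.ones_like(x), z1, z2, x, z1 * x, z2 * x])


# Hypothetical noise-free data: three straight lines with known parameters.
x = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
category = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
# Order: alpha0, beta0, gamma0, alpha1, beta1, gamma1
true_params = np.array([2.0, 1.0, -0.5, 0.3, 0.2, 0.0])

X = design_matrix(x, category)
y = X @ true_params

# Least squares estimates of the six parameters.
estimates, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With real data the estimates carry sampling error, so a parameter is judged to be zero from its t-ratio or confidence interval rather than from its raw value.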
If the intercepts are different for the three categories but the slopes are the same, the regression would estimate $\beta_1 = 0$ and $\gamma_1 = 0$ and the model becomes:

$$y_i = (\alpha_0 + \beta_0 Z_1 + \gamma_0 Z_2) + \alpha_1 x_i + e_i$$

For category 1:

$$y_i = (\alpha_0 + \beta_0 Z_1) + \alpha_1 x_i + e_i$$

For category 2:

$$y_i = (\alpha_0 + \gamma_0 Z_2) + \alpha_1 x_i + e_i$$

For category 3:

$$y_i = \alpha_0 + \alpha_1 x_i + e_i$$

Case Study: Solution

The model under consideration allows a different slope and intercept for each storm. Two dummy variables are needed:

$Z_1 = 1$ for storm 1 and zero otherwise
$Z_2 = 1$ for storm 2 and zero otherwise

The model is:

$$\mathrm{pH} = \alpha_0 + \alpha_1 \mathrm{WA} + Z_1(\beta_0 + \beta_1 \mathrm{WA}) + Z_2(\gamma_0 + \gamma_1 \mathrm{WA})$$

where the α's, β's, and γ's are estimated by regression. The model can be rewritten as:

$$\mathrm{pH} = \alpha_0 + \beta_0 Z_1 + \gamma_0 Z_2 + \alpha_1 \mathrm{WA} + \beta_1 Z_1 \mathrm{WA} + \gamma_1 Z_2 \mathrm{WA}$$

The dummy variables are incorporated into the model by creating the new variables $Z_1\mathrm{WA}$ and $Z_2\mathrm{WA}$. Table 40.1 shows how this is done. Fitting the full six-parameter model gives:

Model A:

$$\mathrm{pH} = 5.77 - 0.00008\,\mathrm{WA} + 0.998 Z_1 + 1.65 Z_2 - 0.005 Z_1\mathrm{WA} - 0.008 Z_2\mathrm{WA}$$

t-ratios: WA, 0.11; Z1, 2.14; Z2, 3.51; Z1WA, 3.63; Z2WA, 4.90

which is also shown as Model A in Table 40.2 (top row). The numerical coefficients are the least squares estimates of the parameters, and the t-ratios are listed for the corresponding terms. Terms with t < 2 are candidates for elimination from the model because they are unlikely to be statistically significant at about the 95% confidence level. The term WA appears insignificant. Dropping this term and refitting the simplified model gives Model B, in which all coefficients are significant:

Model B:

$$\mathrm{pH} = 5.82 + 0.95 Z_1 + 1.60 Z_2 - 0.005 Z_1\mathrm{WA} - 0.008 Z_2\mathrm{WA}$$

Term    Coefficient   t-ratio   95% confidence interval
Z1       0.95         6.01      0.63 to 1.27
Z2       1.60         9.47      1.26 to 1.94
Z1WA    −0.005        4.35      −0.007 to −0.002
Z2WA    −0.008        5.54      −0.01 to −0.005

The regression sum of squares, listed in Table 40.2, is the same for Model A and for Model B (Reg SS = 4.278). Dropping the WA term caused no decrease in the regression sum of squares. Model B is equivalent to Model A. Is any further simplification possible?
Notice that the 95% confidence intervals overlap for the terms $-0.005 Z_1\mathrm{WA}$ and $-0.008 Z_2\mathrm{WA}$. Therefore, the coefficients of these two terms might be the same. To check this, fit Model C, which has the same slope but different intercepts for storms 1 and 2. This is done by combining columns $Z_1\mathrm{WA}$ and $Z_2\mathrm{WA}$ to form the two columns on the right-hand side of Table 40.1. Call this new variable $Z_3\mathrm{WA}$. $Z_3 = 1$ for storms 1 and 2, and 0 for storm 3. The fitted model is:

Model C:

$$\mathrm{pH} = 5.82 + 1.11 Z_1 + 1.38 Z_2 - 0.0057 Z_3\mathrm{WA}$$

t-ratios: Z1, 8.43; Z2, 12.19; Z3WA, 6.68

TABLE 40.1 Weak Acidity (WA), pH, and Categorical Variables for Three Storms

Storm   WA   Z1   Z2   Z1WA   Z2WA   pH     Z3   Z3WA
1       190  1    0    190    0      5.96   1    190
1       110  1    0    110    0      6.08   1    110
1       150  1    0    150    0      5.93   1    150
1       170  1    0    170    0      5.99   1    170
1       170  1    0    170    0      6.01   1    170
1       170  1    0    170    0      5.97   1    170
1       200  1    0    200    0      5.88   1    200
1       140  1    0    140    0      6.06   1    140
1       140  1    0    140    0      6.06   1    140
1       160  1    0    160    0      6.03   1    160
1       140  1    0    140    0      6.02   1    140
1       110  1    0    110    0      6.17   1    110
1       110  1    0    110    0      6.31   1    110
1       120  1    0    120    0      6.27   1    120
1       110  1    0    110    0      6.42   1    110
1       110  1    0    110    0      6.28   1    110
1       110  1    0    110    0      6.43   1    110
2       140  0    1    0      140    6.33   1    140
2       140  0    1    0      140    6.43   1    140
2       120  0    1    0      120    6.37   1    120
2       190  0    1    0      190    6.09   1    190
2       120  0    1    0      120    6.32   1    120
2       110  0    1    0      110    6.37   1    110
2       110  0    1    0      110    6.73   1    110
2       100  0    1    0      100    6.89   1    100
2       100  0    1    0      100    6.87   1    100
2       120  0    1    0      120    6.30   1    120
2       120  0    1    0      120    6.52   1    120
2       100  0    1    0      100    6.39   1    100
2       80   0    1    0      80     6.87   1    80
2       100  0    1    0      100    6.85   1    100
3       580  0    0    0      0      5.82   0    0
3       640  0    0    0      0      5.94   0    0
3       500  0    0    0      0      5.73   0    0
3       530  0    0    0      0      5.91   0    0
3       670  0    0    0      0      5.87   0    0
3       670  0    0    0      0      5.80   0    0
3       640  0    0    0      0      5.80   0    0
3       640  0    0    0      0      5.78   0    0
3       560  0    0    0      0      5.78   0    0
3       590  0    0    0      0      5.73   0    0
3       640  0    0    0      0      5.63   0    0
3       590  0    0    0      0      5.79   0    0
3       600  0    0    0      0      6.02   0    0

Note: The two right-hand columns are used to fit the simplified model.
Source: Meinert, D. L., S. A. Miller, R. J. Ruane, and H. Olem (1982). "A Review of Water Quality Data in Acid Sensitive Watersheds in the Tennessee Valley," Rep. No. TVA.ONR/WR-82/10, TVA, Chattanooga, TN.

This simplification of the model can be checked in a more formal way by comparing the regression sum of squares of the simplified model with that of the more complicated one. The regression sum of squares is a measure of how well the model fits the data. Dropping an important term will cause the regression sum of squares to decrease by a noteworthy amount, whereas dropping an unimportant term will change the regression sum of squares very little. An example shows how we decide whether a change is "noteworthy" (i.e., statistically significant).

If two models are equivalent, the difference of their regression sums of squares will be small, within an allowance for variation due to random experimental error. The variance due to experimental error can be estimated by the mean residual sum of squares of the full model (Model A). The variance due to the deleted terms is estimated by the difference between the regression sums of squares of Model A and Model C, with an adjustment for their respective degrees of freedom. The variance due to the deleted terms is compared with the variance due to experimental error by computing the F statistic, as follows:

$$F = \frac{(\text{Reg SS}_A - \text{Reg SS}_C)/(\text{Reg df}_A - \text{Reg df}_C)}{\text{Res SS}_A/\text{Res df}_A}$$

where
Reg SS = regression sum of squares
Reg df = degrees of freedom associated with the regression sum of squares
Res SS = residual sum of squares
Res df = degrees of freedom associated with the residual sum of squares

Model A has five degrees of freedom associated with the regression sum of squares (Reg df = 5), one for each of the six parameters in the model minus one for computing the mean.
Model C has three degrees of freedom. Thus:

$$F = \frac{(4.278 - 4.229)/(5 - 3)}{0.66/38} = \frac{0.0245}{0.017} = 1.44$$

For a test of significance at the 95% confidence level, this value of F is compared with the upper 5% point of the F distribution with the appropriate degrees of freedom (5 − 3 = 2 in the numerator and 38 in the denominator): $F_{2,38,0.05} = 3.25$. The computed value (F = 1.44) is smaller than the critical value $F_{2,38,0.05} = 3.25$, which confirms that omitting WA from the model and forcing storms 1 and 2 to have the same slope has not significantly worsened the fit of the model. In short, Model C describes the data as well as Model A or Model B. Because it is simpler, it is preferred.

Models for the individual storms are derived by substituting the values of $Z_1$, $Z_2$, and $Z_3$ into Model C:

Storm 1   Z1 = 1, Z2 = 0, Z3 = 1   pH = 6.93 − 0.0057WA
Storm 2   Z1 = 0, Z2 = 1, Z3 = 1   pH = 7.20 − 0.0057WA
Storm 3   Z1 = 0, Z2 = 0, Z3 = 0   pH = 5.82

The model indicates a different intercept for each storm, a common slope for storms 1 and 2, and a slope of zero for storm 3, as shown by Figure 40.4. In storm 3, the variation in pH was random about a mean of 5.82. For storms 1 and 2, increased WA was associated with a lowering of the pH. It is not difficult to imagine conditions that would lead to two different storms having the same slope but different intercepts.

TABLE 40.2 Alternate Models for pH at Cosby Creek

Model   Equation                                                               Reg SS   Res SS   R²
A       pH = 5.77 − 0.00008WA + 0.998Z1 + 1.65Z2 − 0.005Z1WA − 0.008Z2WA       4.278    0.662    0.866
B       pH = 5.82 + 0.95Z1 + 1.60Z2 − 0.005Z1WA − 0.008Z2WA                    4.278    0.662    0.866
C       pH = 5.82 + 1.11Z1 + 1.38Z2 − 0.0057Z3WA                               4.229    0.712    0.856
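The F comparison of Model A with Model C can be recomputed from the sums of squares listed in Table 40.2. The sketch below is illustrative only; the function name `partial_f` and its argument names are assumptions. Carrying the denominator at full precision (0.662/38) gives a ratio of about 1.41 rather than the 1.44 obtained with the rounded value 0.017, and the conclusion is unchanged.

```python
def partial_f(reg_ss_full, reg_ss_reduced, reg_df_full, reg_df_reduced,
              res_ss_full, res_df_full):
    """F ratio for the terms dropped in going from the full to the reduced model."""
    # Mean square attributable to the deleted terms.
    extra_ms = (reg_ss_full - reg_ss_reduced) / (reg_df_full - reg_df_reduced)
    # Mean square for experimental error, estimated from the full model.
    error_ms = res_ss_full / res_df_full
    return extra_ms / error_ms


# Table 40.2 values: Model A is the full model, Model C the reduced model.
f_ratio = partial_f(4.278, 4.229, 5, 3, 0.662, 38)

F_CRITICAL = 3.25  # upper 5% point of F with 2 and 38 df, as quoted in the text
simplification_ok = f_ratio < F_CRITICAL
```

A ratio below the critical value means the deleted terms explain no more variation than would be expected from experimental error alone.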
It is more difficult to understand how the same stream could respond so differently to storm 3, which had a range of WA that was much higher than either storm 1 or 2, a lower pH, and no change of pH over the observed range of WA. Perhaps high WA depresses the pH and also buffers the stream against extreme changes in pH. But why was the WA so much different during storm 3? The data alone, and the statistical analysis, do not answer this question. They do, however, serve the investigator by raising the question.

FIGURE 40.4 Stream acidification data fitted to Model C (Table 40.2). Storms 1 and 2 have the same slope. (Axes: pH versus Weak Acidity, mg/L. Fitted lines: pH = 6.93 − 0.0057 WA; pH = 7.20 − 0.0057 WA; pH = 5.82.)

Comments

The variables considered in regression equations usually take numerical values over a continuous range, but occasionally it is advantageous to introduce a factor that has two or more discrete levels, or categories. For example, data may arise from three storms, or three operators. In such a case, we cannot set up a continuous measurement scale for the variable storm or operator. We must create categorical variables (dummy variables) that account for the possible different effects of separate storms or operators. The levels assigned to the categorical variables are unrelated to any physical level that might exist in the factors themselves.

Regression with categorical variables was used to model the disappearance of PCBs from soil (Berthouex and Gan, 1991; Gan and Berthouex, 1994). Draper and Smith (1998) provide several examples on creating efficient patterns for assigning categorical variables. Piegorsch and Bailer (1997) show examples for nonlinear models.

References

Berthouex, P. M. and D. R. Gan (1991). "Fate of PCBs in Soil Treated with Contaminated Municipal Sludge," J. Envir. Engr. Div., ASCE, 116(1), 1–18.
Daniel, C. and F. S. Wood (1980). Fitting Equations to Data: Computer Analysis of Multifactor Data, 2nd ed., New York, John Wiley.
Draper, N. R. and H. Smith (1998). Applied Regression Analysis, 3rd ed., New York, John Wiley.
Gan, D. R. and P. M. Berthouex (1994). "Disappearance and Crop Uptake of PCBs from Sludge-Amended Farmland," Water Envir. Res., 66, 54–69.
Meinert, D. L., S. A. Miller, R. J. Ruane, and H. Olem (1982). "A Review of Water Quality Data in Acid Sensitive Watersheds in the Tennessee Valley," Rep. No. TVA.ONR/WR-82/10, TVA, Chattanooga, TN.
Piegorsch, W. W. and A. J. Bailer (1997). Statistics for Environmental Biology and Toxicology, London, Chapman & Hall.

Exercises

40.1 PCB Degradation in Soil. PCB-contaminated sewage sludge was applied to test plots at three different loading rates (kg/ha) at the beginning of a 5-yr experimental program. Test plots of farmland where corn was grown were sampled to assess the rate of disappearance of PCB from soil. Duplicate plots were used for each treatment. Soil PCB concentration (mg/kg) was measured each year in the fall after the corn crop was picked and in the spring before planting. The data are below. Estimate the rate coefficients of disappearance (k) using the model $\mathrm{PCB}_t = \mathrm{PCB}_0 \exp(-kt)$. Are the rates the same for the three treatment conditions?

40.2 1,1,1-Trichloroethane Biodegradation. Estimates of biodegradation rate ($k_b$) of 1,1,1-trichloroethane were made under three conditions of activated sludge treatment. The model is $y_i = b x_i + e_i$, where the slope b is the estimate of $k_b$. Two dummy variables are needed to represent the three treatment conditions, and these are arranged in the table below. Does the value of $k_b$ depend on the activated sludge treatment condition?
Data for Exercise 40.1 (duplicate plots for each treatment)

Time   Treatment 1      Treatment 2      Treatment 3
0      1.14   0.61      2.66   2.50      0.44   0.44
5      0.63   0.81      2.69   2.96      0.25   0.31
12     0.43   0.54      1.14   1.51      0.18   0.22
17     0.35   0.51      1.00   0.48      0.15   0.12
24     0.35   0.34      0.93   1.16      0.11   0.09
29     0.32   0.30      0.73   0.96      0.08   0.10
36     0.23   0.20      0.47   0.46      0.07   0.06
41     0.20   0.16      0.57   0.36      0.03   0.04
48     0.12   0.09      0.40   0.22      0.03   0.03
53     0.11   0.08      0.32   0.31      0.02   0.03

Data for Exercise 40.2

x (×10⁻⁶)   Z1   Z2   Z1·x    Z2·x   y (×10⁻³)
61.2        0    0    0       0      142.3
9.8         0    0    0       0      140.8
8.9         0    0    0       0      62.7
44.9        0    0    0       0      32.5
6.3         0    0    0       0      82.3
20.3        0    0    0       0      58.6
7.5         0    0    0       0      15.5
1.2         0    0    0       0      2.5
159.8       1    0    159.8   0      1527.3
44.4        1    0    44.4    0      697.5
57.4        1    0    57.4    0      429.9
25.9        1    0    25.9    0      215.2
37.9        1    0    37.9    0      331.6
55.0        1    0    55.0    0      185.7
151.7       1    0    151.7   0      1169.2
116.2       1    0    116.2   0      842.8
129.9       1    0    129.9   0      712.9
19.4        0    1    0       19.4   49.3
7.7         0    1    0       7.7    21.6
36.7        0    1    0       36.7   53.3
17.8        0    1    0       17.8   59.4
8.5         0    1    0       8.5    112.3

40.3 Diesel Fuel. Four diesel fuels were tested to estimate the partition coefficient $K_{dw}$ of eight organic compounds as a function of their solubility in water (S). The compounds are (1) naphthalene, (2) 1-methyl-naphthalene, (3) 2-methyl-naphthalene, (4) acenaphthene, (5) fluorene, (6) phenanthrene, (7) anthracene, and (8) fluoranthene. The table is set up to do linear regression with dummy variables to differentiate between diesel fuels. Does the partitioning relation vary from one diesel fuel to another?

Compound   y = log(Kdw)   x = log(S)   Z1   Z2   Z3   Z1 log(S)   Z2 log(S)   Z3 log(S)
Diesel fuel #1
1          3.67           −3.05        0    0    0    0           0           0
2          4.47           −3.72        0    0    0    0           0           0
3          4.31           −3.62        0    0    0    0           0           0
4          4.35           −3.98        0    0    0    0           0           0
5          4.45           −4.03        0    0    0    0           0           0
6          4.6            −4.50        0    0    0    0           0           0
7          5.15           −4.49        0    0    0    0           0           0
8          5.32           −5.19        0    0    0    0           0           0
Diesel fuel #2
1          3.62           −3.05        1    0    0    −3.05       0           0
2          4.29           −3.72        1    0    0    −3.72       0           0
3          4.21           −3.62        1    0    0    −3.62       0           0
4          4.46           −3.98        1    0    0    −3.98       0           0
5          4.41           −4.03        1    0    0    −4.03       0           0
6          4.61           −4.50        1    0    0    −4.50       0           0
7          5.38           −4.49        1    0    0    −4.49       0           0
8          4.64           −5.19        1    0    0    −5.19       0           0
Diesel fuel #3
1          3.71           −3.05        0    1    0    0           −3.05       0
2          4.44           −3.72        0    1    0    0           −3.72       0
3          4.36           −3.62        0    1    0    0           −3.62       0
4          4.68           −3.98        0    1    0    0           −3.98       0
5          4.52           −4.03        0    1    0    0           −4.03       0
6          4.78           −4.50        0    1    0    0           −4.50       0
7          5.36           −4.49        0    1    0    0           −4.49       0
8          5.61           −5.19        0    1    0    0           −5.19       0
Diesel fuel #4
1          3.71           −3.05        0    0    1    0           0           −3.05
2          4.49           −3.72        0    0    1    0           0           −3.72
3          4.33           −3.62        0    0    1    0           0           −3.62
4          4.62           −3.98        0    0    1    0           0           −3.98
5          4.55           −4.03        0    0    1    0           0           −4.03
6          4.78           −4.50        0    0    1    0           0           −4.50
7          5.20           −4.49        0    0    1    0           0           −4.49
8          5.60           −5.19        0    0    1    0           0           −5.19

Source: Lee, L. S. et al. (1992). Envir. Sci. Tech., 26, 2104–2110.

40.4 Threshold Concentration. The data below can be described by a hockey-stick pattern. Below some threshold value ($\tau$) the response is a constant plateau value ($\eta = \gamma_0$). Above the threshold, the response is linear, $\eta = \gamma_0 + \beta_1(x - \tau)$. These can be combined into a continuous segmented model using a dummy variable z such that z = 1 when $x > \tau$ and z = 0 when $x \le \tau$. The dummy variable formulation is $\eta = \gamma_0 + \beta_1(x - \tau)z$. This gives $\eta = \gamma_0$ for $x \le \tau$ and $\eta = \gamma_0 + \beta_1(x - \tau) = \gamma_0 + \beta_1 x - \beta_1 \tau$ for $x \ge \tau$. Estimate the plateau value $\gamma_0$, the post-threshold slope $\beta_1$, and the unknown threshold dose $\tau$.

x   2.5    22     60     90     105    144    178    210    233    256    300    400
y   16.6   15.3   16.9   16.1   17.1   16.9   18.6   19.3   25.8   28.4   35.5   45.3

40.5 Coagulation. Modify the hockey-stick model of Exercise 40.4 so it describes the intersection of two straight lines with nonzero slopes. Fit the model to the coagulation data (dissolved organic carbon, DOC) given below to estimate the slopes of the straight-line segments and the chemical dose (alum) at the intersection.

Alum Dose (mg/L)   DOC (mg/L)   Alum Dose (mg/L)   DOC (mg/L)
0                  6.7          35                 3.3
5                  6.4          40                 3.3
10                 6.0          49                 3.1
15                 5.2          58                 2.8
20                 4.7          68                 2.7
25                 4.1          78                 2.6
30                 3.9          87                 2.6

Source: White, M. W. et al. (1997). J. AWWA, 89(5).
