
Statistics for Environmental Engineers, Second Edition (Part 10)


For our two-variable example, the estimate of variance based on the Taylor series expansion shown earlier is:

Var(k) = (∆k/∆X₁)² Var(X₁) + (∆k/∆X₂)² Var(X₂)

We will estimate the sensitivity coefficients θ₁ = ∆k/∆X₁ and θ₂ = ∆k/∆X₂ by evaluating k at distances ∆X₁ and ∆X₂ from the center point. Assume that the center of the region of interest is located at X₁⁰ = 200, X₂⁰ = 20, and that k₀ = 0.90 at this point. Further assume that Var(X₁) = 100 and Var(X₂) = 1. A reasonable choice of ∆X₁ and ∆X₂ is from one to three standard deviations of the error in X₁ and X₂. We will use ∆X₁ = 2σ_X₁ = 20 and ∆X₂ = 2σ_X₂ = 2. Suppose that k = 1.00 at [X₁⁰ + ∆X₁ = 200 + 20, X₂⁰ = 20] and k = 0.70 at [X₁⁰ = 200, X₂⁰ + ∆X₂ = 20 + 2]. The sensitivity coefficients are:

∆k/∆X₁ = (1.00 − 0.90)/20 = 0.0050
∆k/∆X₂ = (0.70 − 0.90)/2 = −0.10

These sensitivity coefficients can be used to estimate the expected variance of k:

σ_k² = (0.0050)²(100) + (−0.10)²(1) = 0.0025 + 0.010 = 0.0125

and σ_k = 0.11. An approximate 95% confidence interval would be k = 0.90 ± 2(0.11) = 0.90 ± 0.22, or 0.68 < k < 1.12.

Unfortunately, at these specified experimental settings, the precision of the estimate of k depends almost entirely upon X₂; 80% of the variance in k is contributed by X₂. This may be surprising because X₂ has the smallest variance, but it is such failures of our intuition that merit this kind of analysis. If the precision of k must be improved, the options are (1) try to center the experiment in another region where variation in X₂ will be suppressed, (2) improve the precision with which X₂ is measured, or (3) make replicate measures of X₂ to average out the random variation.

Propagation of Uncertainty in Models

The examples in this chapter have been about the propagation of measurement error, but the same methods can be used to investigate the propagation of uncertainty in design parameters. Uncertainty is expressed as the variance of a distribution that defines the uncertainty of the design parameter. If only the range of parameter values is known, the designer should use a uniform distribution. If the designer can express a "most likely" value within the range of the uncertain parameter, a triangular distribution can be used. If the distribution is symmetric about the expected value, the normal distribution might be used. The variance of the distribution that defines the uncertainty in the design parameter is used in the propagation of error equations (Berthouex and Polkowski, 1970). The simulation methods used in Chapter 51 can also be used to investigate the effect of uncertainty in design inputs on design outputs and decisions. They are especially useful when real variability in inputs exists and the variability in output needs to be investigated (Beck, 1987; Brown, 1987).

Comments

It is a serious disappointment to learn after an experiment that the variance of computed values is too large. Avoid disappointment by investigating this before running the experiment. Make an analysis of how measurement errors are transmitted into calculated values. This can be done when the model is a simple equation, or when the model is complicated and must be solved by numerical approximation.
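The arithmetic above is easy to script, which makes the pre-experiment check recommended in the Comments painless. A minimal sketch using the numbers quoted in the worked example (in practice the three evaluations of k would come from your own model):

```python
import math

# Values quoted in the two-variable example above.
k0 = 0.90                      # k at the center point (X1 = 200, X2 = 20)
k_at_dx1 = 1.00                # k at (X1 + 20, X2)
k_at_dx2 = 0.70                # k at (X1, X2 + 2)
dx1, dx2 = 20.0, 2.0           # steps of two standard deviations
var_x1, var_x2 = 100.0, 1.0

# Finite-difference estimates of the sensitivity coefficients.
theta1 = (k_at_dx1 - k0) / dx1          # 0.0050
theta2 = (k_at_dx2 - k0) / dx2          # -0.10

# First-order (Taylor series) propagation of error.
var_k = theta1**2 * var_x1 + theta2**2 * var_x2   # 0.0125
sd_k = math.sqrt(var_k)                           # about 0.11

print(f"Var(k) = {var_k:.4f}, sigma_k = {sd_k:.2f}")
print(f"approximate 95% CI: {k0 - 2*sd_k:.2f} to {k0 + 2*sd_k:.2f}")
print(f"share of Var(k) contributed by X2: {theta2**2 * var_x2 / var_k:.0%}")
```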
References

Beck, M. B. (1987). "Water Quality Modeling: A Review of the Analysis of Uncertainty," Water Resour. Res., 23(5), 1393–1441.
Berthouex, P. M. and L. B. Polkowski (1970). "Optimum Waste Treatment Plant Design under Uncertainty," J. Water Poll. Control Fed., 42(9), 1589–1613.
Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, New York, Wiley Interscience.
Brown, L. C. (1987). "Uncertainty Analysis in Water Quality Modeling Using QUAL2E," in Systems Analysis in Water Quality Management (Advances in Water Pollution Control Series), M. B. Beck, Ed., Pergamon Press, pp. 309–319.
Mandel, J. (1964). The Statistical Analysis of Experimental Data, New York, Interscience Publishers.

Exercises

49.1 Exponential Model. The model for a treatment process is y = 100 exp(−kt). You wish to estimate k with sufficient precision that the value of y is known within ±5 units. The expected value of k is 0.2. How precisely does k need to be known for t = 5? For t = 15? (Hint: It may help if you draw y as a function of k for several values of t.)

49.2 Mixed Reactor. The model for a first-order kinetic reaction in a completely mixed reactor is y = x/(1 + kV/Q). (a) Use a Taylor series linear approximation and evaluate the variance of y for k = 0.5, V = 10, Q = 1, and x = 100, assuming the standard deviation of each variable is 10% of its value (i.e., σ_k = 0.1(0.5) = 0.05). (b) Evaluate the variance of k for V = 10, Q = 1, x = 100, and y = 20, assuming the standard deviation of each variable is 10% of its value. Which variable contributes most to the variance of k?

49.3 Simulation of an Exponential Model. For the exponential model y = 100 exp(−kt), simulate the distribution of y for k with mean 0.2 and standard deviation 0.02 for t = 5 and for t = 15.

49.4 DO Model. The Streeter-Phelps equation used to model dissolved oxygen in streams is:

D = [k₁L_a/(k₂ − k₁)][exp(−k₁t) − exp(−k₂t)] + D_a exp(−k₂t)

where D is the dissolved oxygen deficit (mg/L), L_a is the initial BOD concentration (mg/L), D_a is the initial dissolved oxygen deficit (mg/L), and k₁ and k₂ are the bio-oxidation and reaeration coefficients (1/day). For the following conditions, estimate the dissolved oxygen deficit and its standard deviation at travel times (t) of 1.5 and 3.0 days.

Parameter     Average   Std. Deviation
L_a (mg/L)    15        1.0
D_a (mg/L)    0.47      0.05
k₁ (day⁻¹)    0.52      0.1
k₂ (day⁻¹)    1.5       0.2

49.5 Chloroform Risk Assessment. When drinking water is chlorinated, chloroform (a trihalomethane) is inadvertently created in concentrations of approximately 30 to 70 µg/L. The model for estimating the maximum lifetime risk of cancer, for an adult, associated with the chloroform in the drinking water is:

Risk = (PF × C × IR × ED × AF)/(BW × LT)

Use the given values to estimate the mean and standard deviation of the lifetime cancer risk. Under these conditions, which variable is the largest contributor to the variance of the cancer risk?

Variable                             Mean    Std. Deviation
Chloroform concentration, C (µg/L)   50      15
Intake rate, IR (L/day)              2       0.4
Exposure duration, ED (yr)           70      12
Absorption factor, AF                0.8     0.05
Body weight, BW (kg)                 70      20
Lifetime, LT (yr)                    70      12
Potency factor, PF (kg-day/µg)       0.004   0.0015
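Because the risk equation in Exercise 49.5 is a pure product and quotient, its first-order variance propagation takes an especially simple form: the relative variances of the factors add, whether a factor appears in the numerator or the denominator. The sketch below shows one way the setup might look; it is an illustration of the method, not the book's worked solution.

```python
import math

# Means and standard deviations from the table in Exercise 49.5.
params = {                   # name: (mean, std. deviation)
    "C":  (50.0, 15.0),      # chloroform concentration, ug/L
    "IR": (2.0, 0.4),        # intake rate, L/day
    "ED": (70.0, 12.0),      # exposure duration, yr
    "AF": (0.8, 0.05),       # absorption factor
    "BW": (70.0, 20.0),      # body weight, kg
    "LT": (70.0, 12.0),      # lifetime, yr
    "PF": (0.004, 0.0015),   # potency factor, kg-day/ug
}
mu = {name: m for name, (m, sd) in params.items()}

# Risk evaluated at the mean values (about 4.6e-3).
risk = mu["PF"] * mu["C"] * mu["IR"] * mu["ED"] * mu["AF"] / (mu["BW"] * mu["LT"])

# First-order propagation for a multiplicative model: relative variances add.
rel_var = {name: (sd / m) ** 2 for name, (m, sd) in params.items()}
sd_risk = risk * math.sqrt(sum(rel_var.values()))

print(f"risk = {risk:.2e}, sd = {sd_risk:.2e}")
print("largest contributor:", max(rel_var, key=rel_var.get))
```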
50 Using Simulation to Study Statistical Problems

KEY WORDS bootstrap, lognormal distribution, Monte Carlo simulation, percentile estimation, random normal variate, random uniform variate, resampling, simulation, synthetic sampling, t-test.

Sometimes it is difficult to analytically determine the properties of a statistic. This might happen because an unfamiliar statistic has been created by a regulatory agency. One might demonstrate the properties or sensitivity of a statistical procedure by carrying through the proposed procedure on a large number of synthetic data sets that are similar to the real data. This is known as Monte Carlo simulation, or simply simulation.

A slightly different kind of simulation is bootstrapping. The bootstrap is an elegant idea. Because sampling distributions for statistics are based on repeated samples with replacement (resamples), we can use the computer to simulate repeated sampling. The statistic of interest is calculated for each resample to construct a simulated distribution that approximates the true sampling distribution of the statistic. The approximation improves as the number of simulated estimates increases.

Monte Carlo Simulation

Monte Carlo simulation is a way of experimenting with a computer to study complex situations. The method consists of sampling to create many data sets that are analyzed to learn how a statistical method performs. Suppose that the model of a system is y = f(x). It is easy to discover how variability in x translates into variability in y by putting different values of x into the model and calculating the corresponding values of y. The values for x can be defined by a probability density function. This process is repeated through many trials (1000 to 10,000) until the distribution of y values becomes clear.

It is easy to compute uniform and normal random variates directly. The values generated by good commercial software are actually pseudorandom because they are derived from a mathematical formula, but they have statistical properties that cannot be distinguished from those of true random numbers. We will assume such a random number generating program is available. To obtain a random value Y_U(α, β) from a uniform distribution over the interval (α, β) from a random uniform variate R_U over the interval (0, 1), this transformation is applied:

Y_U(α, β) = α + (β − α)R_U(0, 1)

In a similar fashion, a normally distributed random value Y_N(η, σ) that has mean η and standard deviation σ is derived from a standard normal random variate R_N(0, 1) as follows:

Y_N(η, σ) = η + σR_N(0, 1)

Lognormally distributed random variates can be simulated from random normal variates using:

Y_LN(α, β) = exp(η + σR_N(0, 1))

Here, the logarithm of Y_LN is normally distributed with mean η and standard deviation σ. The mean (α) and standard deviation (β) of the lognormal variable Y_LN are:

α = exp(η + 0.5σ²)

and

β = exp(η + 0.5σ²) √(exp(σ²) − 1)

You may not need to make the manipulations described above. Most statistics software programs (e.g., MINITAB, Systat, Statview) will generate standard uniform, normal, t, F, chi-square, Beta, Gamma, Bernoulli, binomial, Poisson, logistic, Weibull, and other distributions. Microsoft EXCEL will generate random numbers from uniform, normal, Bernoulli, binomial, and Poisson distributions. Equations for generating random values for the exponential, Gamma, chi-square, lognormal, Beta, Weibull, Poisson, and binomial distributions from the standard uniform and normal variates are given in Hahn and Shapiro (1967). Another useful source is Press et al. (1992).
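In a language with a random number library, the three transformations look like this (numpy assumed; the values of α, β, η, and σ are illustrative, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(1)        # pseudorandom generator, as discussed above
r_u = rng.random(10_000)              # R_U(0, 1)
r_n = rng.standard_normal(10_000)     # R_N(0, 1)

alpha, beta = 5.0, 15.0               # illustrative interval ends
eta, sigma = 2.0, 1.0                 # illustrative mean and sd on the log scale

y_uniform = alpha + (beta - alpha) * r_u    # Y_U(alpha, beta)
y_normal = eta + sigma * r_n                # Y_N(eta, sigma)
y_lognormal = np.exp(eta + sigma * r_n)     # Y_LN: ln(Y) ~ N(eta, sigma)

# Check the sample moments against the lognormal formulas given above:
mean_ln = np.exp(eta + 0.5 * sigma**2)                 # about 12.18
sd_ln = mean_ln * np.sqrt(np.exp(sigma**2) - 1.0)      # about 15.97
print(y_lognormal.mean(), mean_ln)
print(y_lognormal.std(), sd_ln)
```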
Case Study: Properties of a Computed Statistic

A new regulation on chronic toxicity requires enforcement decisions to be made on the basis of 4-day averages. Suppose that preliminary sampling indicates that the daily observations x are lognormally distributed with a geometric mean of 7.4 mg/L, mean η_x = 12.2, and standard deviation σ_x = 16.0. If y = ln(x), this corresponds to a normal distribution with η_y = 2 and σ_y² = 1. Averages of four observations from this system should be more nearly normal than the parent lognormal population, but we want to check on how closely normality is approached. We do this empirically by constructing a distribution of simulated averages. The steps are:

1. Generate four random, independent, normally distributed numbers having η = 2 and σ = 1.
2. Transform the normal variates into lognormal variates x = exp(y).
3. Average the four values to estimate the 4-day average x̄₄.
4. Repeat steps 1 to 3 one thousand times, or until the distribution of x̄₄ is sufficiently clear.
5. Plot a histogram of the average values.

Figure 50.1(a) shows the frequency distribution of the 4000 observations actually drawn in order to compute the 1000 simulated 4-day averages represented by the frequency distribution of Figure 50.1(b). Although 1000 observations sounds like a large number, the frequency distributions are still not smooth, but the essential information has emerged from the simulation. The distribution of 4-day averages is skewed, although not as strongly as the parent lognormal distribution. The median, average, and standard deviation of the 4000 lognormal values are 7.5, 12.3, and 16.1. The average of the 1000 4-day averages is 12.3; the standard deviation of the 4-day averages is 11.0; 90% of the 4-day averages are in the range of 5.0 to 26.5; and 50% are in the range of 7.2 to 15.4.

FIGURE 50.1 Left-hand panel: frequency distribution of 4000 daily observations that are random, independent, and have a lognormal distribution x = exp(y), where y is normally distributed with η = 2 and σ = 1 (14 values exceed 100). Right-hand panel: frequency distribution of 1000 4-day averages, each computed from four random values sampled from the lognormal distribution.
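A sketch of steps 1 to 5 (numpy assumed; the seed is arbitrary, so the summary statistics will differ slightly from the ones quoted above):

```python
import numpy as np

rng = np.random.default_rng(7)

y = rng.normal(loc=2.0, scale=1.0, size=(1000, 4))   # step 1: normal, eta = 2, sigma = 1
x = np.exp(y)                                        # step 2: lognormal daily values
x4 = x.mean(axis=1)                                  # steps 3 and 4: 1000 four-day averages

# Compare with the summary statistics quoted in the text.
print(np.median(x), x.mean(), x.std())      # median, mean, sd of the 4000 daily values
print(x4.mean(), x4.std())                  # mean and sd of the 4-day averages
print(np.percentile(x4, [5, 25, 75, 95]))   # 90% and 50% ranges of the averages
# step 5: plot a histogram, e.g., matplotlib.pyplot.hist(x4, bins=40)
```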
Case Study: Percentile Estimation

A state regulation requires the 99th percentile of measurements on a particular chemical to be less than 18 µg/L. Suppose that the true underlying distribution of the chemical concentration is lognormal as shown in the top panel of Figure 50.2. The true 99th percentile is 13.2 µg/L, which is well below the standard value of 18.0. If we make 100 random observations of the concentration, how often will the 99th percentile "violate" the 18-µg/L limit? Will the number of violations depend on whether the 99th percentile is estimated parametrically or nonparametrically? (These two estimation methods are explained in Chapter 8.) These questions can be answered by simulation, as follows.

1. Generate a set of n = 100 observations from the "true" lognormal distribution.
2. Use these 100 observations to estimate the 99th percentile parametrically and nonparametrically.
3. Repeat steps 1 and 2 many times to generate an empirical distribution of 99th percentile values.

Figure 50.2 shows the empirical distribution of 100 estimates of the 99th percentile made using a nonparametric method, each estimate being obtained from 100 values drawn at random from the lognormal distribution. The bottom panel of Figure 50.2 shows the distribution of 100 estimates made with the parametric method. One hundred estimates gives a rough, but informative, empirical distribution. Simulating one thousand estimates would give a smoother distribution, but it would still show that the parametric estimates are less variable than the nonparametric estimates and that they are distributed more symmetrically about the true 99th percentile value of p_0.99 = 13.2. The parametric method is better because it uses the information that the data are from a lognormal distribution, whereas the nonparametric method assumes no prior knowledge of the distribution (Berthouex and Hau, 1991).

Although the true 99th percentile of 13.2 µg/L is well below the 18 µg/L limit, both estimation methods show at least 5% violations due merely to random errors in sampling the distribution, and this is with a large sample size of n = 100. For a smaller sample size, the percentage of trials giving a violation will increase. The nonparametric estimation gives more and larger violations.

FIGURE 50.2 Distribution of 100 nonparametric estimates and 100 parametric estimates of the 99th percentile, each computed using a sample of n = 100 from the lognormal distribution shown in the top panel.
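The percentile experiment is just as quick to script. The parent lognormal distribution is defined in the book's Figure 50.2, which is not reproduced here, so the η and σ below are assumed values chosen only so that the true 99th percentile comes out near 13.2 µg/L:

```python
import numpy as np

rng = np.random.default_rng(42)

eta, sigma = 1.0, 0.68       # assumed: exp(eta + 2.326*sigma) is about 13.2
z99 = 2.326                  # 99th percentile of the standard normal
limit, n, trials = 18.0, 100, 1000

viol_param = viol_nonpar = 0
for _ in range(trials):
    x = np.exp(rng.normal(eta, sigma, size=n))           # step 1
    y = np.log(x)
    p_param = np.exp(y.mean() + z99 * y.std(ddof=1))     # parametric estimate
    p_nonpar = np.percentile(x, 99)                      # nonparametric estimate
    viol_param += p_param > limit
    viol_nonpar += p_nonpar > limit

print(f"true 99th percentile: {np.exp(eta + z99 * sigma):.1f}")
print(f"violations in {trials} trials: parametric {viol_param}, nonparametric {viol_nonpar}")
```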
Bootstrap Sampling

The bootstrap method is random resampling, with replacement, to create new sets of data (Metcalf, 1997; Draper and Smith, 1998). Suppose that we wish to determine confidence intervals for the parameters in a model by the bootstrap method. Fitting the model to a data set of size n will produce a set of n residuals. Assuming the model is an adequate description of the data, the residuals are random errors. We can imagine that in a repeat experiment the residual of the original eighth observation might happen to become the residual for the third new observation, the original third residual might become the new sixth residual, etc. This suggests how n residuals drawn at random from the original set can be assigned to the original observations to create a set of new data. Obviously this requires that the original data be a random sample so that residuals are independent of each other. The resampling is done with replacement, which means that the original eighth residual can be used more than once in the bootstrap sample of new data. The bootstrap resampling is done many times, the statistics of interest are estimated from each set of new data, and the empirical reference distributions of the statistics are compiled.

The number of resamples might depend on the number of observations in the pool that will be sampled. One recommendation is to resample B = n[ln(n)]² times, but it is common to round this up to 100, 500, or 1000 (Piegorsch and Bailer, 1997). The resampling is accomplished by randomly selecting the mth observation using a uniformly distributed random number between 1 and n:

m_i = round[nR_U(0, 1) + 0.501]

where R_U(0, 1) is uniformly distributed between 0 and 1. The resampling continues with replacement until n observations are selected. This is the bootstrap sample.

The bootstrap method will be applied to estimating confidence intervals for the parameters of the model y = β₀ + β₁x that were obtained by fitting the data in Table 50.1. Of course, there is no need to bootstrap this problem because the confidence intervals are known exactly, but using a familiar example makes it easy to follow and check the calculations. The fitted model is ŷ = 49.14 + 0.358x.

TABLE 50.1 Data and Residuals Associated with the Model ŷ = 49.14 + 0.358x

Observation    x     y      ŷ        Residual
 1             23    63.7    57.37     6.33
 2             25    63.5    58.09     5.41
 3             40    53.8    63.46    −9.66
 4             48    55.7    66.32   −10.62
 5             64    85.5    72.04    13.46
 6             94    84.6    82.78     1.82
 7            118    84.9    91.37    −6.47
 8            125    82.8    93.88   −11.08
 9            168   123.2   109.27    13.93
10            195   115.8   118.93    −3.13

The bootstrap procedure is to resample, with replacement, the 10 residuals given in Table 50.1. Table 50.2 shows five sets of 10 random numbers that were used to generate the resampled residuals and new y values listed in Table 50.3.

TABLE 50.2 Random Numbers from 1 to 10 that were Generated to Resample the Residuals in Table 50.1

Resample   Random Numbers (from 1 to 10)
1          9   6   8   9   1   2   2   8   2   7
2          1   9   8   7   3   2   5   2   2  10
3         10   4   8   7   6   6   8   2   7   1
4          3  10   8   5   7   2   7   9   6   3
5          7   3  10   4   4   1   7   6   5   9

TABLE 50.3 New Residuals and Data Generated by Resampling, with Replacement, Using the Random Numbers in Table 50.2 and the Residuals in Table 50.1

Resample 1
Random No.    9      6      8      9      1      2      2      8      2      7
Residual     13.93   1.82 −11.08  13.93   6.33   5.41   5.41 −11.08   5.41  −6.47
New y        77.63  65.32  42.72  69.63  91.83  90.01  90.31  71.72 128.61 109.33

Resample 2
Random No.    1      9      8      7      3      2      5      2      2     10
Residual      6.33  13.93 −11.08  −6.47  −9.66   5.41  13.46   5.41   5.41  −3.13
New y        70.03  77.43  42.72  49.23  75.84  90.01  98.36  88.21 128.61 112.67

Resample 3
Random No.   10      4      8      7      6      6      8      2      7      1
Residual     −3.13 −10.62 −11.08  −6.47   1.82   1.82 −11.08   5.41  −6.47   6.33
New y        60.57  52.88  42.72  49.23  87.32  86.42  73.82  88.21 116.73 122.13

Resample 4
Random No.    3     10      8      5      7      2      7      9      6      3
Residual     −9.66  −3.13 −11.08  13.46  −6.47   5.41  −6.47  13.93   1.82  −9.66
New y        54.04  60.37  42.72  69.16  79.03  90.01  78.43  96.73 125.02 106.14

Resample 5
Random No.    7      3     10      4      4      1      7      6      5      9
Residual     −6.47  −9.66  −3.13 −10.62 −10.62   6.33  −6.47   1.82  13.46  13.93
New y        57.23  53.84  50.67  45.08  74.88  90.93  78.43  84.62 136.66 129.73

The model was fitted to each set of new data to obtain the five pairs of parameter estimates shown in Table 50.4, along with the parameters from the original fitting.

TABLE 50.4 Parameter Estimates for the Original Data and for Five Sets of New Data Generated by Resampling the Residuals in Table 50.1

Data Set     b₀      b₁
Original     49.14   0.358
Resample 1   56.22   0.306
Resample 2   49.92   0.371
Resample 3   41.06   0.410
Resample 4   46.68   0.372
Resample 5   36.03   0.491

If this process were repeated a large number of times (i.e., 100 or more), the distributions of the intercept and slope would become apparent and the confidence intervals could be inferred from these distributions. Even with this very small sample, Figure 50.3 shows that the elliptical joint confidence region is starting to emerge.

FIGURE 50.3 Emerging joint confidence region (slope b₁ plotted against intercept b₀) based on the original data plus five new sets generated by resampling, with replacement.
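Carrying the resampling on to hundreds of bootstrap samples is a few lines of code. A sketch using the data of Table 50.1 (numpy assumed; following Table 50.3, each resampled residual is added back to the observed y value):

```python
import numpy as np

rng = np.random.default_rng(3)

# Data from Table 50.1.
x = np.array([23, 25, 40, 48, 64, 94, 118, 125, 168, 195], dtype=float)
y = np.array([63.7, 63.5, 53.8, 55.7, 85.5, 84.6, 84.9, 82.8, 123.2, 115.8])

b1, b0 = np.polyfit(x, y, 1)          # slope ~0.358, intercept ~49.14
resid = y - (b0 + b1 * x)

estimates = []
for _ in range(1000):
    idx = rng.integers(0, len(y), size=len(y))   # resample residuals, with replacement
    y_new = y + resid[idx]                       # same convention as Table 50.3
    s1, s0 = np.polyfit(x, y_new, 1)
    estimates.append((s0, s1))

b0_boot, b1_boot = np.array(estimates).T
print("intercept 95% interval:", np.percentile(b0_boot, [2.5, 97.5]))
print("slope 95% interval:    ", np.percentile(b1_boot, [2.5, 97.5]))
```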
Comments

Another use of simulation is to test the consequences of violating the assumptions on which a statistical procedure rests. A good example is provided by Box et al. (1978), who used simulation to study how nonnormality and serial correlation affect the performance of the t-test. The effect of nonnormality was not very serious. In a case where 5% of tests should have been significant, 4.3% were significant for normally distributed data, 6.0% for a rectangular parent distribution, and 5.9% for a skewed parent distribution. The effect of modest serial correlation in the data was much greater than these differences due to nonnormality. A positive autocorrelation of r = 0.4 inflated the percentage of tests found significant from the correct level of 5% to 10.5% for the normal distribution, 12.5% for a rectangular distribution, and 11.4% for a skewed distribution. They also showed that randomization would negate the autocorrelation and give percentages of significant results at the expected level of about 5%. Nonnormality, which often causes concern, turns out to be relatively unimportant, while serial correlation, which is too seldom considered, can be ruinous.

The bootstrap method is a special form of simulation that is based on resampling with replacement. It can be used to investigate the properties of any statistic that may have unusual properties or one for which a convenient analytical solution does not exist.

Simulation is familiar to most engineers as a design tool. Use it to explore and discover unknown properties of unfamiliar statistics and to check the performance of statistical methods that might be applied to data with nonideal properties. Sometimes we find that our worries are misplaced or unfounded.
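An experiment in the spirit of the Box et al. study is easy to repeat. The sketch below checks the false-positive rate of a one-sample t-test when the data are serially correlated (scipy assumed; the sample size, trial count, and AR(1) error structure are illustrative choices, not the ones Box et al. used):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ar1(n, phi, rng):
    """Serially correlated series with true mean 0 and lag-1 correlation phi."""
    z = np.empty(n)
    z[0] = rng.standard_normal() / np.sqrt(1.0 - phi**2)   # stationary start
    for t in range(1, n):
        z[t] = phi * z[t - 1] + rng.standard_normal()
    return z

n, trials = 10, 2000
for phi in (0.0, 0.4):
    hits = 0
    for _ in range(trials):
        _, p = stats.ttest_1samp(ar1(n, phi, rng), 0.0)   # true mean IS 0
        hits += p < 0.05                                  # so ~5% should reject
    print(f"phi = {phi}: {100 * hits / trials:.1f}% significant")
```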
References

Berthouex, P. M. and I. Hau (1991). "Difficulties in Using Water Quality Standards Based on Extreme Percentiles," Res. J. Water Pollution Control Fed., 63(5), 873–879.
Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, New York, Wiley Interscience.
Draper, N. R. and H. Smith (1998). Applied Regression Analysis, 3rd ed., New York, John Wiley.
Hahn, G. J. and S. S. Shapiro (1967). Statistical Methods for Engineers, New York, John Wiley.
Metcalf, A. V. (1997). Statistics in Civil Engineering, London, Arnold.
Piegorsch, W. W. and A. J. Bailer (1997). Statistics for Environmental Biology and Toxicology, New York, Chapman & Hall.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1992). Numerical Recipes in FORTRAN: The Art of Scientific Computing, 2nd ed., Cambridge, England, Cambridge University Press.

Exercises

50.1 Limit of Detection. The Method Limit of Detection is calculated using MDL = 3.143s, where s is the standard deviation of measurements on seven identical aliquots. Use simulation to study how much the MDL can vary due to random variation in the replicate measurements if the true standard deviation is σ = 0.4.

50.2 Nonconstant Variance. Chapter 37 on weighted least squares discussed a calibration problem where there were three replicate observations at several concentration levels. By how much can the variance of triplicate observations vary before one would decide that there is nonconstant variance? Answer this by simulating 500 sets of random triplicate observations, calculating the variance of each set, and plotting the histogram of estimated variances.

50.3 Uniform Distribution. Data from a process are discovered to have a uniform distribution with mean 10 and range 2. Future samples from this process will be of size n = 10. By simulation, determine the reference distribution for the standard deviation, the standard error of the mean, and the 95% confidence interval of the mean for samples of size n = 10.

50.4 Regression. Extend the example in Table 50.3 and add five to ten more points to Figure 50.3.

50.5 Bootstrap Confidence Intervals. Fit the exponential model y = θ₁ exp(−θ₂x) to the data below and use the bootstrap method to determine the approximate joint confidence region of the parameter estimates.

x    1     4     8    10    11
y  179   104    51    35    30

Optional: Add two observations (x = 15, y = 14 and x = 18, y = 8) to the data and repeat the bootstrap experiment to see how the shape of the confidence region is changed by having data at larger values of x.

50.6 Legal Statistics. Find an unfamiliar or unusual statistic in a state or U.S. environmental regulation and discover its properties by simulation.

50.7 99th Percentile Distribution. A quality measure for an industrial discharge (kg/day of TSS) has a lognormal distribution with mean 3000 and standard deviation 2000. Use simulation to construct a reference distribution of the 99th percentile value of the TSS load. From this distribution, estimate an upper 90% confidence limit for the 99th percentile.

[...]

TABLE 53.1 Forecasts and Forecast Errors for the AR(1) Process Shown in Figure 53.1

Time             50     51     52     53     54     55
Predicted        –      10.4   10.3   10.2   10.1   10.1
Observed         10.6   10.6   10.0   10.3   11.7   11.8
Forecast error   –      0.18   −0.3   −0.1   1.6    1.7

The predictions and forecast errors shown in Table 53.1 were made as follows. The observed value at t = 50 is Z₅₀ = 10.6. The one-step-ahead forecast is:

Ẑ_t(1) = 10.0 + 0.72(10.6 − 10.0) = 10.43

This has been rounded to 10.4 in Table 53.1. For lead time ℓ = 2:

Ẑ_t(2) = 10.0 + 0.72(10.43 − 10.0) = 10.3

or the equivalent:

Ẑ_t(2) = 10.0 + 0.72²(10.6 − 10.0) = 10.3

For lead time ℓ = 5:

Ẑ_t(5) = 10.0 + 0.72⁵(10.6 − 10.0) = 10.1

As ℓ increases, the forecasts converge exponentially to Ẑ_t(ℓ) ≈ η = 10. A statement of the forecast precision …
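For reference, the AR(1) forecast rule used in the walkthrough above, Ẑ_t(ℓ) = η + φ^ℓ(Z_t − η), in code form (η = 10 and φ = 0.72 are the values quoted above):

```python
def ar1_forecast(z_t, lead, eta=10.0, phi=0.72):
    """Minimum-MSE forecast of an AR(1) process, lead steps ahead of Z_t."""
    return eta + phi**lead * (z_t - eta)

for lead in range(1, 6):
    print(lead, round(ar1_forecast(10.6, lead), 2))
# 1 -> 10.43, 2 -> 10.31, ..., 5 -> 10.12: converging toward eta = 10
```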
For θ = 0.6:

z̃_t = 0.4(z_t + 0.6z_{t−1} + 0.6²z_{t−2} + 0.6³z_{t−3} + …)

The one-step-ahead forecast for the EWMA is the EWMA itself: ẑ_{t+1} = z̃_t. This is also the forecast for several days ahead; the forecast from origin t for any distance ahead is a straight horizontal line. To update the forecast as new observations become available, use the forecast updating formula: ẑ_{t+1} … the variance of the forecast error converges to the variance of the process about the mean value. The forecasts for nonstationary processes (i.e., IMA processes) do not converge to a mean value because there is no long-term mean for a nonstationary process. The forecast from origin t for the useful IMA(0,1,1) model (EWMA forecasts) is a horizontal line projected from the forecast origin. The forecast variance …

[…] this for τ = 3.

52.2 Input–Output. For the discrete transfer function model y_t = 0.2x_t + 0.8y_{t−1}, calculate the output y_t for the inputs given below.

Time    1   2   3   4   5   6   7   8   9  10
x(t)   10  10  10  15  10  10  10  10  10  10
Time   11  12  13  14  15  16  17  18  19  20
x(t)   10  10  20  20  20  20  20  20  20  20
Time   21  22  23  24  25  26  27  28  29  30
x(t)   20  50  50  50  50  50  50  10  10  10
Time   31  32  33  34  35  36  37  38  39  40
x(t)   10  15  15  10  10  20  20  20  25  20
Time   41  42  43  44  45  46  47  48  49  50
x(t)   10  10  10  10  20  30  25  10  10  10

52.3 Model Fitting. Fit a model of the form y_t = θ₁x_t + θ₂y_{t−1} + a_t to the data below. Evaluate the fit by (a) plotting the predicted values over the data, (b) plotting the residuals …

x(t)      7.3   7.3  11.4   8.5  10.0  12.1   8.3  10.9  12.1   7.0   7.5   9.0   8.4   6.4  …
y(t−1)   10.1  12.0  11.6  12.9  16.7  12.7  11.7   9.4  10.8   8.5  13.4  11.2  12.6  10.1   8.2  …
y(t)     12.0  11.6  12.9  16.7  12.7  11.7   9.4  10.8   8.5  13.4  11.2  12.6  10.1   8.2   8.1  …

Note: y(t) is the dependent variable and x(t) and y(t − 1) are the independent (predictor) variables.

FIGURE 52.2 Data and fitted model ŷ_t = 0.19x_t + 0.81y_{t−1}.

53.1 AR(1) Forecasting. Assume the current value of Z_t is 165 from a process that has the AR(1) model z_t = 0.4z_{t−1} + a_t and mean 162. (a) Make one-step-ahead forecasts for the 10 observations:

t     121  122  123  124  125  126  127  128  129  130
z_t   2.1  2.8  1.5  1.2  0.4  2.7  1.3  −2.1  0.4  0.9

(b) Calculate the 50 and 95% confidence intervals for the forecasts in part (a). (c) Make forecasts from origin t = 130 for days …

53.4 AR(1) Process. Simulate an AR(1) time series with φ = 0.7 and σ_a² = 1.00 for n = 100 observations. Fit the simulated data to estimate φ and σ_a² for the actual series. Then, using the estimated values, forecast the values for t = 101 to t = 105. Calculate the forecast errors and the approximate 95% confidence intervals of the forecasts.
