CHAPTER 5 Environmental Monitoring

5.1 Introduction

The increasing worldwide concern about threats to the natural environment both on a local and a global scale has led to the introduction of many monitoring schemes that are intended to provide an early warning of violations of quality control systems, to detect the effects of major events such as accidental oil spills or the illegal disposal of wastes, and to study long-term trends or cycles in key environmental variables. Examples of some of the national monitoring schemes that are now operating are the United States Environmental Protection Agency's Environmental Monitoring and Assessment Program (EMAP) based on 12,600 hexagons each with an area of 40 square kilometres, the United Kingdom Environmental Change Network (ECN) based on nine sites, and the Swedish Environmental Monitoring Program based on 20 sites. In all three of these schemes a large number of variables are recorded on a regular basis to describe physical aspects of the land, water and atmosphere, and the abundance of many species of animals and plants. Around the world numerous smaller scale monitoring schemes are also operated for particular purposes, such as to ensure that the quality of drinking water is adequate.

Monitoring schemes to detect unexpected changes and trends are essentially repeated surveys. The sampling methods described in Chapter 2 are therefore immediately relevant. In particular, if the mean value of a variable for the sample units in a geographical area is of interest, then the population of units should be randomly sampled so that the accuracy of estimates can be assessed in the usual way. Modifications of simple random sampling such as stratified sampling may well be useful to improve efficiency.
The requirements of environmental monitoring schemes have led to an interest in special types of sampling designs that include aspects of random sampling, good spatial cover, and the gradual replacement of sampling sites over time (Skalski, 1990; Stevens and Olsen, 1991; Overton et al., 1991; Urquhart et al., 1993; Conquest and Ralph, 1998). Designs that are optimum in some sense have also been developed (Fedorov and Mueller, 1989; Caselton et al., 1992).

© 2001 by Chapman & Hall/CRC

Although monitoring schemes sometimes require fairly complicated designs, as a general rule it is a good idea to keep designs as simple as possible so that they are easily understood by administrators and the public. Simple designs also make it easier to use the data for purposes that were not foreseen in the first place, which is something that will often occur. As noted by Overton and Stehman (1995, 1996), complex sample structures create potentially serious difficulties that do not exist with simple random sampling.

5.2 Purposely Chosen Monitoring Sites

For practical reasons the sites for long-term monitoring programs are often not randomly chosen. For example, Cormack (1994) notes that the nine sites for the United Kingdom ECN were chosen on the basis of having: (a) a good geographical distribution covering a wide range of environmental conditions and the principal natural and managed ecosystems; (b) some guarantee of long-term physical and financial security; (c) a known history of consistent management; (d) reliable and accessible records of past data, preferably for ten or more years; and (e) sufficient size to allow the opportunity for further experiments and observations.

In this scheme it is assumed that the initial status of sites can be allowed for by only considering time changes. These changes can then be related to differences between the sites in terms of measured meteorological variables and known geographical differences.
5.3 Two Special Monitoring Designs

Skalski (1990) suggested a rotating panel design with augmentation for long-term monitoring. This takes the form shown in Table 5.1 if there are eight sites that are visited every year and four sets of ten sites that are rotated. Site set 7, for example, consists of ten sites that are visited in years 4 to 7 of the study. The number of sites in different sets is arbitrary. Preferably, the sites will be randomly chosen from an appropriate population of sites. This design has some appealing properties: the sites that are always measured can be used to detect long-term trends, but the rotation of blocks of ten sites ensures that the study is not too dependent on an initial choice of sites that may be unusual in some respects.

Table 5.1 Skalski's (1990) rotating panel design with augmentation. Every year 48 sites are visited. Of these, 8 are always the same and the other 40 sites are in four blocks of size ten, such that each block of ten remains in the sample for four years after the initial start-up period

Site   Number                         Year
set    of sites     1   2   3   4   5   6   7   8   9  10  11  12
0      8            x   x   x   x   x   x   x   x   x   x   x   x
1      10           x
2      10           x   x
3      10           x   x   x
4      10           x   x   x   x
5      10               x   x   x   x
6      10                   x   x   x   x
7      10                       x   x   x   x
...    ...
14     10                                                   x   x
15     10                                                       x

The serially alternating design with augmentation that is used for EMAP is of the form shown in Table 5.2. It differs because sites are not rotated out of the study. Rather, there are eight sites that are measured every year and another 160 sites in blocks of 40, where each block of 40 is measured every four years. The number of sites in the different sets can be chosen freely in a design of this form. Sites should be randomly selected from an appropriate population.

Urquhart et al. (1993) compared the efficiency of the designs in Tables 5.1 and 5.2 when there are a total of 48 sites, of which the number visited every year (i.e., in set 0) ranged from 0 to 48.
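The visiting patterns of the two designs follow simple rules, which can be useful when checking a proposed schedule. The following Python sketch is an illustration only: the function names and the 12-year horizon are assumptions made here, and the rules are read off the descriptions above (set 0 every year; ten-site sets that stay in the sample for four years under the rotating panel design; 40-site blocks that return every four years under the serially alternating design).

```python
N_YEARS = 12  # assumed study length, matching the 12 years shown in the tables


def rotating_panel_sets(year, years_in_sample=4):
    """Site sets visited in `year` under the rotating panel design.

    Set 0 is visited every year. Rotating set s is visited in years
    s - years_in_sample + 1 to s (truncated to the study period), so the
    rotating sets present in year t are t, t + 1, ..., t + years_in_sample - 1.
    """
    return [0] + list(range(year, year + years_in_sample))


def serially_alternating_sets(year, n_blocks=4):
    """Site sets visited in `year` under the serially alternating design.

    Set 0 is visited every year; block k (k = 1..n_blocks) is visited in
    years k, k + n_blocks, k + 2 * n_blocks, and so on.
    """
    return [0, (year - 1) % n_blocks + 1]


def sites_visited(sets, always_size, block_size):
    """Total number of sites visited, given the list of site sets."""
    return sum(always_size if s == 0 else block_size for s in sets)


def years_for_set(site_set, design, n_years=N_YEARS):
    """Years (1-based) in which a given site set is visited."""
    return [t for t in range(1, n_years + 1) if site_set in design(t)]


# Print the yearly workload under each design, then the years for site set 7.
for t in range(1, N_YEARS + 1):
    rotating = sites_visited(rotating_panel_sets(t), 8, 10)
    alternating = sites_visited(serially_alternating_sets(t), 8, 40)
    print(t, rotating, alternating)
print(years_for_set(7, rotating_panel_sets))
```

Both rules give 48 site visits in every year, and the rotating rule places site set 7 in years 4 to 7, in agreement with the description of Table 5.1.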
To make this comparison, Urquhart et al. (1993) assumed the model

$$Y_{ijk} = S_{i(j)k} + T_j + e_{ijk},$$

where $Y_{ijk}$ is a measure of the condition at site i, in year j, within site set k; $S_{i(j)k}$ is an effect specific to site i, in site set k, in year j; $T_j$ is a year effect common to all sites; and $e_{ijk}$ is a random disturbance. They also allowed for autocorrelation between the overall year effects, and between the repeated measurements at one site. They found the design of Table 5.2 to always be better for estimating the current mean and the slope in a trend because more sites are measured in the first few years of the study. However, in a more recent study which compared the two designs in terms of variance and cost, Lesser and Kalsbeek (1997) concluded that the first design tends to be better for detecting short-term change while the second design tends to be better for detecting long-term change.

Table 5.2 A serially alternating design with augmentation. Every year 48 sites are measured. Of these, eight sites are always the same and the other 40 sites are measured every four years

Site   Number                         Year
set    of sites     1   2   3   4   5   6   7   8   9  10  11  12
0      8            x   x   x   x   x   x   x   x   x   x   x   x
1      40           x               x               x
2      40               x               x               x
3      40                   x               x               x
4      40                       x               x               x

The EMAP sample design is based on approximately 12,600 points on a grid, each of which is the centre of a hexagon with area 40 km². The grid is itself within a large hexagonal region covering much of North America, as shown in Figure 5.1. The area covered by the 40 km² hexagons centred on the grid points is one sixteenth of the total area of the conterminous United States, with the area used being chosen after a random shift in the grid. Another aspect of the design is that the four sets of sites that are measured in different years are spatially interpenetrating, as indicated in Figure 5.2. This allows the estimation of parameters for the whole area every year.

Figure 5.1 The EMAP baseline grid for North America.
The shaded area shown is covered by about 12,600 small hexagons, with a spacing of 27 km between their centres.

Figure 5.2 The use of spatially interpenetrating samples for visits at four year intervals.

5.4 Designs Based on Optimization

One approach to the design of monitoring schemes is to choose the sites so that the amount of information is in some sense maximized. The main question then is how to measure the information that is to be maximized, particularly if the monitoring scheme has a number of different objectives, some of which will only become known in the future.

One possibility involves choosing a network design, or adding or subtracting stations to minimize entropy, where low entropy corresponds to high 'information' (Caselton et al., 1992). The theory is complex, and needs more prior information than will usually be available, particularly if there is no existing network to provide this. Another possibility considers the choice of a network design to be a problem of the estimation of a regression function for which a classical theory of optimal design exists (Fedorov and Mueller, 1989).

5.5 Monitoring Designs Typically Used

In practice, sample designs for monitoring often consist of selecting a certain number of sites, preferably (but not necessarily) at random from the potential sites in a region, and then measuring the variable of interest at those sites at a number of points in time. A complication is that for one reason or another some of the sites may not be measured at some of the times. A typical set of data will then look like the data in Table 5.3 for pH values measured on lakes in Norway. With this set of data, which is part of the more extensive data that are shown in Table 1.1 and discussed in Example 1.2, the main question of interest is whether there is any evidence for changes from year to year in the general level of pH and, in particular, whether the pH level was tending to increase or decrease.
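Data of this kind are naturally held as a site-by-time table with gaps. The sketch below is an illustration, not the author's analysis: it stores four lakes from Table 5.3 that were measured in all four years (lakes 4, 5, 6 and 8), masks one value as None purely to show how a missed visit can be represented, and reports the number of sites measured and the mean pH for each year. The names PH, YEARS and year_summary are hypothetical.

```python
from statistics import mean

YEARS = [1976, 1977, 1978, 1981]

# pH values for lakes 4, 5, 6 and 8 from Table 5.3, one value per year.
# None marks a visit that was not made.
PH = {
    4: [4.32, 4.23, 4.40, 4.49],
    5: [4.97, 4.74, 4.98, 5.21],
    6: [4.58, 4.55, 4.57, 4.69],
    8: [4.72, None, 4.83, 4.90],  # 1977 value masked here for illustration only
}


def year_summary(data, years):
    """Return {year: (number of sites measured, mean of the available values)}.

    The mean is None for a year in which no site was measured.
    """
    summary = {}
    for j, year in enumerate(years):
        values = [row[j] for row in data.values() if row[j] is not None]
        summary[year] = (len(values), mean(values) if values else None)
    return summary


for year, (n_sites, year_mean) in year_summary(PH, YEARS).items():
    print(year, n_sites, round(year_mean, 3))
```

Raw yearly means from incomplete data reflect which sites happened to be measured as well as any change over time, which is one reason for allowing for site differences in the analysis.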
5.6 Detection of Changes by Analysis of Variance

A relatively simple analysis for data like the Norwegian lake pH values shown in Table 5.3 involves carrying out a two factor analysis of variance, as discussed in Section 3.5. The two factors are then the site and the time. The model for the observation at site i at time j is

$$y_{ij} = \mu + S_i + T_j + e_{ij}, \qquad (5.1)$$

where $\mu$ represents an overall general level for the variable being measured, $S_i$ represents the deviation of site i from the general level, $T_j$ represents a time effect, and $e_{ij}$ represents measurement errors and other random variation that is associated with the observation at the site at the particular time.

The model (5.1) does not include a term for the interaction between sites and times as is included in the general two-factor analysis of variance model as defined in equation (4.31). This is because there is at most one observation for a site in a particular year, which means that it is not possible to separate interactions from measurement errors. Consequently, it must be assumed that any interactions are negligible.

Example 5.1 Analysis of Variance on the pH Values

The results of an analysis of variance on the pH values for Norwegian lakes are summarised in Table 5.4. The results in this table were obtained using the MINITAB package (Minitab Inc., 1994) with an option that takes into account the missing values, although many other standard statistical packages could have been used just as well. The effects in the model were assumed to be fixed rather than random (as discussed in Section 4.5), although since interactions are assumed to be negligible the same results would be obtained using random effects. It is found that there is a significant difference between the lakes (p = 0.000) and a nearly significant difference between the years (p = 0.061). Therefore there is no very strong evidence from this analysis of differences between years.
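For a complete table with exactly one observation per site and time, the sums of squares for model (5.1) have simple closed forms. The sketch below is a minimal illustration under that assumption, so it does not handle the missing values that the analysis in Example 5.1 had to allow for and it will not reproduce Table 5.4. The function names are hypothetical, and the data are four lakes from Table 5.3 (lakes 4, 5, 6 and 8), chosen here only because they were measured in all four years.

```python
# pH for lakes 4, 5, 6 and 8 (rows) in 1976, 1977, 1978 and 1981 (columns).
LAKES = [
    [4.32, 4.23, 4.40, 4.49],
    [4.97, 4.74, 4.98, 5.21],
    [4.58, 4.55, 4.57, 4.69],
    [4.72, 4.81, 4.83, 4.90],
]


def additive_anova(table):
    """Two-factor analysis of variance without interaction.

    `table` is a complete list of rows (one per site), each holding one
    value per time. Returns {source: (sum of squares, degrees of freedom)}.
    """
    n_sites, n_times = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_sites * n_times)
    site_means = [sum(row) / n_times for row in table]
    time_means = [sum(row[j] for row in table) / n_sites for j in range(n_times)]

    ss_site = n_times * sum((m - grand) ** 2 for m in site_means)
    ss_time = n_sites * sum((m - grand) ** 2 for m in time_means)
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_error = ss_total - ss_site - ss_time  # what the additive model leaves over

    return {
        "site": (ss_site, n_sites - 1),
        "time": (ss_time, n_times - 1),
        "error": (ss_error, (n_sites - 1) * (n_times - 1)),
        "total": (ss_total, n_sites * n_times - 1),
    }


def f_ratio(anova, source):
    """Mean square for `source` divided by the error mean square."""
    ss, df = anova[source]
    ss_error, df_error = anova["error"]
    return (ss / df) / (ss_error / df_error)


result = additive_anova(LAKES)
for source, (ss, df) in result.items():
    print(source, round(ss, 4), df)
print("F for sites:", round(f_ratio(result, "site"), 2))
print("F for times:", round(f_ratio(result, "time"), 2))
```

Each F ratio would then be compared with the F distribution on the corresponding degrees of freedom to obtain a significance level like those reported in the example.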
To check the assumptions of the analysis, standardized residuals (the differences between the actual observations and those predicted by the model, divided by their standard deviations) can be plotted against the lake, the year, and against their position in space for each of the four years. These plots are shown in Figures 5.3 and 5.4. These residuals show no obvious patterns so that the model seems satisfactory, except that there are one or two residuals that are rather large.

Table 5.3 Values for pH for lakes in southern Norway with the latitudes (Lat) and longitudes (Long) for the lakes (lakes with fewer than four values were not measured in every year)

Lake   Lat    Long   pH (1976, 1977, 1978, 1981)
1      58.0   7.2    4.59 4.48 4.63
2      58.1   6.3    4.97 4.60 4.96
4      58.5   7.9    4.32 4.23 4.40 4.49
5      58.6   8.9    4.97 4.74 4.98 5.21
6      58.7   7.6    4.58 4.55 4.57 4.69
7      59.1   6.5    4.80 4.74 4.94
8      58.9   7.3    4.72 4.81 4.83 4.90
9      59.1   8.5    4.53 4.70 4.64 4.54
10     58.9   9.3    4.96 5.35 5.54 5.75
11     59.4   6.4    5.31 5.14 4.91 5.43
12     58.8   7.5    5.42 5.15 5.23 5.19
13     59.3   7.6    5.72 5.73 5.70
15     59.3   9.8    5.47 5.38 5.38
17     59.1   11.8   4.87 4.76 4.87 4.90
18     59.7   6.2    5.87 5.95 5.59 6.02
19     59.7   7.3    6.27 6.28 6.17 6.25
20     59.9   8.3    6.67 6.44 6.28 6.67
21     59.8   8.9    6.06 5.80 6.09
24     60.1   12.0   5.38 5.32 5.33 5.21
26     59.6   5.9    5.41 5.94
30     60.4   10.2   5.60 6.10 5.57 5.98
32     60.4   12.2   4.93 4.94 4.91 4.93
34-1   60.5   5.5    4.90 4.87
36     60.9   7.3    5.60 5.69 5.41 5.66
38     60.9   10.0   6.72 6.59 6.39
40     60.7   12.2   5.97 6.02 5.71 5.67
41     61.0   5.0    4.68 4.72 5.02
42     61.3   5.6    5.07 5.18
43     61.0   6.9    6.23 6.34 6.20 6.29
46     61.0   9.7    6.64 6.24 6.37
47     61.3   10.8   6.15 6.23 6.07 5.68
49     61.5   4.9    4.82 4.77 5.09 5.45
50     61.5   5.5    5.42 4.82 5.34 5.54
57     61.7   4.9    4.99 5.16 5.25
58     61.7   5.8    5.31 5.77 5.60 5.55
59     61.9   7.1    6.26 5.03 5.85
65     62.2   6.4    5.99 6.10 5.99 6.13
80     58.1   6.7    4.63 4.59 4.92
81     58.3   8.0    4.47 4.36 4.50
82     58.7   7.1    4.60 4.54 4.66
83     58.9   6.1    4.88 4.99 4.86 4.92
85     59.4   11.3   4.60 4.88 4.91 4.84
86     59.3   9.4    4.85 4.65 4.77 4.84
87     59.2   7.6    5.06 5.15 5.11
88     59.4   7.3    5.97 5.82 5.90 6.17
89     59.3   6.3    5.47 6.05 5.82
94     61.0   11.5   6.05 5.97 5.78 5.75
95-1   61.2   4.6    5.70 5.50

              1976   1977   1978   1981
Mean          5.34   5.40   5.31   5.38
SD            0.65   0.66   0.57   0.56

Table 5.4 Analysis of variance table for the data on pH levels in Norwegian lakes

Source of    Sum of       Degrees of   Mean               Significance
variation    squares(1)   freedom      square    F        level (p)
Lake         58.70        47           1.249     37.95    0.000
Year         0.25         3            0.083     2.53     0.061
Error        3.85         117          0.033
Total        62.80        167

(1) The sums of squares shown here depend on the order in which effects are added into the model, which is lake and then year.

Figure 5.3 Standardized residuals from the analysis of variance model for pH in Norwegian lakes plotted against the lake number and the year number.

5.7 Detection of Changes Using Control Charts

Control charts are used to monitor industrial processes (Montgomery, 1991) and they can be used equally well with environmental data. The simplest approach involves using an x̄ chart to detect changes in a process mean, together with a range chart to detect changes in the amount of variation. These types of charts are often called Shewhart control charts after their originator (Shewhart, 1931).

Typically, the starting point is a moderately large set of data consisting of M random samples of size n, where these are taken at equally spaced intervals of time from the output of the process. This set of data is then used to estimate the process mean and standard deviation, and hence to construct the two charts. The data are then plotted on the charts. It is usually assumed that the observations are normally distributed. If the process seems to have a constant mean and standard deviation, then the sampling of the process is continued with new points being plotted to monitor whatever is being measured.
If the mean or standard deviation does not seem to have been constant for the time when the initial samples were taken, then in the industrial process situation, action is taken to bring the process under control. With environmental monitoring this may not be possible. However, the knowledge that the process being measured is not stable will be of interest anyway.

Figure 5.4 Standardized residuals from the analysis of variance model for pH in Norwegian lakes plotted against the locations of the lakes. The standardized residuals are rounded to the nearest integer for clarity.

The method for constructing the x̄ chart involves the following stages:

[...]

... 7.75 7.11 7.87 7.90 7.94 7.97 7.51 7.30 8.01 7.08 7.14 7.74 7.73 7.95

Mean Range
7.34 1.06 7.71 0.89 7.67 1.16 7.89 0.52 7.81 0.98 7.55 0.50 7.71 0.84 7.63 0.48 7.79 0.50 7.70 1.05 7.67 0.75 7.67 0.70 7.78 0.57 7.79 1.69 7.75 0.64 7.79 0.17 7.53 1.08 7.60 0.70 7.86 1.28 7.85 0.39 7.92 0.58 7.45 0.55 7.65 0.56 7.50 0.67 7.74 1.36 7.66 0.84 7.58 0.72 7.67 0.60 7.47 0.75 7.56 0.54 7.86 0.56 7.43 0.78 7.56 ...

... 7.50 7.52 7.57 7.44 7.37 6.93 7.14 7.26 7.58 7.09 7.65 7.78 7.63 7.98 7.70 7.62 7.74 7.80 7.80 7.22 7.88 7.90 8.06 8.22 7.82 7.58 7.42 7.76 7.72 7.56 7.90 7.57 7.75 7.52 7.78 7.75 7.30 7.83 7.89 7.35 7.56 7.67 7.65 7.53 7.68 7.45 7.50 7.22 7.47 7.80 8.03 7.02 7.84 7.83 7.83 8.05 7.58 7.97 7.61 7.89 7.89 7.27 7.72 7.49 7.86 7.58 7.76 8.12 7.70 7.64 7.61 7.03 6.99 6.99 7.73 8.06 7.96 7.41 7.95 7.45 7.57 ...

... 7.82 0.32 7.65 0.64 7.47 1.02 7.53 0.84 7.79 1.04 7.70 0.84 7.64 0.53 7.90 0.53 7.67 0.68 7.57 0.71 7.39 0.54 7.72 0.37 7.64 0.41 7.85 0.89 7.78 0.89 7.56 0.84 7.47 0.50 7.68 0.57 7.05 0.44 7.26 0.48 7.27 0.63 7.58 0.47 7.77 0.55 7.65 0.83 7.62 0.39 7.79 0.96 7.77 0.52 7.83 0.65 7.82 0.72 7.62 0.34 7.45 0.85 7.65 0.60 7.78 0.60 7.59 0.64 7.42 0.74 7.73 0.64 7.70 0.57

Table 5.6 (Continued) Year 1996 1997...
May Jun Jul Aug Sep Oct Nov Dec
7.29 7.50 8.12 7.64 7.59 7.60 7.07 7.65 7.51 7.81 7.16 7.67 7.97 7.17 7.52 7.65 7.62 7.10 7.85 7.39 8.18 7.67 7.57 7.97

pH Values
7.62 7.95 7.50 7.90 7.71 7.20 7.75 7.80 7.57 7.86 7.97 7.14 7.70 7.33 7.68 7.99 7.64 7.25 7.53 7.88 7.85 7.63 8.05 8.12 7.04 7.48 7.69 8.15 7.84 8.12 7.14 7.38 7.64 8.17 7.16 7.71 7.62 7.68 7.53 7.11 7.75 7.86 7.66 7.87 7.92 7.72 8.16 7.70 7.72 ...

... 7.56 7.19 7.60 7.50 8.13 7.23 7.08 7.55 7.75 6.94 7.46 7.62 7.45 7.65 7.85 7.56 8.18 7.63 7.59 7.47 7.52 7.61 7.30 7.75 7.77 7.79 7.87 8.01

pH Values
7.96 7.86 7.11 8.53 7.68 7.15 7.20 7.42 7.35 7.68 6.96 7.56 7.12 7.70 7.41 7.47 7.77 6.96 7.41 7.41 7.32 7.65 7.74 7.95 8.01 7.37 7.39 7.03 7.17 7.97 8.22 7.64 7.80 7.28 7.41 7.79 7.75 7.76 7.84 7.71 7.92 7.43 7.73 7.21 7.49 ...

... 8.13 7.82 8.09 7.51 7.65 7.13 7.25 7.80 7.58 7.30 7.22 7.91 7.97 7.63 7.80 7.69 7.80 7.59 7.07 7.26 7.83 7.74 7.55 7.68 7.13 7.32 7.61 7.85 7.83 7.77 7.23 7.64 8.08 7.89 7.23 8.10 7.17 7.72 7.71 7.82 8.29 7.62 7.88 8.19 7.32 7.91 7.51 7.04 8.13 7.81 7.75 7.83 8.05 7.75 7.97 7.60 7.62 7.78 7.16 7.68 7.26 7.37 7.82 7.29 8.11 7.75 7.77 7.54 7.42 7.71 8.08 7.21 7.95 7.61 7.48 7.80 7.45 7.47 8.03 7.70 7.63 ...

... 7.49 7.09 7.56 7.82 7.49 7.78 7.85 7.96 7.12 7.76 7.74 7.12 7.24 7.60 7.86 7.96 7.51 7.59 7.13 7.18 8.12 7.36 7.82 7.95 7.21 7.33 7.80 7.45 7.24 7.83 7.40 7.14 8.01 6.96 7.47 7.06 7.31 7.51 7.13 7.59 7.79 7.97 7.73 8.19 7.76 6.87 7.72 7.75 7.14 7.09 7.99 7.44

Mean Range
7.85 0.79 7.64 1.42 7.61 0.70 7.58 0.76 7.52 0.82 7.46 0.84 7.48 0.62 7.44 0.68 7.58 1.07 7.41 0.58 7.72 0.62 7.82 0.32 7.65 0.64 7.47 ...

... the standard deviation. For example, for samples of size 3 the standard deviation is 0.591 $\mu_R$.
Source: Tables G1 and G2 of Davies and Goldsmith (1972)

               Lower limits         Upper limits
Sample size    Action   Warning     Warning   Action
2              0.00     0.04        2.81      4.12
3              0.04     0.18        2.17      2.99
4              0.10     0.29        1.93      2.58
5              0.16     0.37        1.81      2.36
6              0.21     0.42        1.72      2.22
7              0.26     0.46        1.66      2.12
8              0.29     0.50        1.62      2.04
9              0.32     0.52        1.58      1.99
10             0.35     0.54        ...

... 0.45 7.96 0.51 7.640 0.694

Figure 5.5 Histogram of the distribution of pH for samples from lakes in the South Island of New Zealand, 1989 to 1997

Figure 5.6 Control charts for pH levels in rivers in the South Island of New Zealand: (a) x̄ chart; (b) range chart. For a process that is stable, about 1 in 40 points should plot outside one of the warning limits (LWL and UWL) and ...

... 7.88 6.96 7.85 7.23 7.56 7.57 7.71 7.39 7.77 7.82 7.73 8.21 7.98 7.69 7.56 7.73 7.22 7.72 7.26 7.72 7.91 7.50 7.66 7.77 8.24 7.47 8.07 7.66 7.53 7.15 7.72 7.03 7.77 7.51 7.47 7.74

Mean Range
7.71 0.69 7.54 0.78 7.60 0.92 7.73 0.16 7.63 0.70 7.63 0.83 7.35 0.63 7.64 0.82 7.63 0.66 7.57 0.77 7.64 0.72 7.80 0.74 7.72 1.20 7.49 1.19 7.88 0.60 7.41 0.52 7.70 0.64 7.34 0.61 7.72 0.23 7.29 0.50 7.87 0.43
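The construction of the two charts can be sketched as follows. Because the construction stages are not reproduced in this extract, the sketch rests on assumptions that are labelled in the code: warning and action limits for the x̄ chart are placed 1.96 and 3.09 standard errors from the centre line (a common convention, not confirmed by the text here), the process standard deviation is estimated as 0.591 times the mean range for samples of size 3 as quoted above, and the range chart limits use the size-3 factors 0.04, 0.18, 2.17 and 2.99 from the table above. The sample values and function names are hypothetical.

```python
from math import sqrt
from statistics import mean

# Factor converting the mean range to a standard deviation (n = 3, from the text).
SD_FACTOR = {3: 0.591}

# Multipliers of the mean range for the range chart (n = 3, from the table above).
RANGE_FACTORS = {3: {"LAL": 0.04, "LWL": 0.18, "UWL": 2.17, "UAL": 2.99}}

Z_WARNING = 1.96  # ASSUMPTION: warning limits at 1.96 standard errors
Z_ACTION = 3.09   # ASSUMPTION: action limits at 3.09 standard errors

# HYPOTHETICAL initial data: four samples of size 3.
SAMPLES = [
    [7.5, 7.7, 7.6],
    [7.4, 7.8, 7.6],
    [7.6, 7.6, 7.9],
    [7.3, 7.7, 7.5],
]


def chart_limits(samples):
    """Return (x-bar chart limits, range chart limits) from equal-sized samples."""
    n = len(samples[0])
    centre = mean(mean(s) for s in samples)
    mean_range = mean(max(s) - min(s) for s in samples)
    std_error = SD_FACTOR[n] * mean_range / sqrt(n)  # standard error of a sample mean
    x_chart = {
        "LAL": centre - Z_ACTION * std_error,
        "LWL": centre - Z_WARNING * std_error,
        "centre": centre,
        "UWL": centre + Z_WARNING * std_error,
        "UAL": centre + Z_ACTION * std_error,
    }
    range_chart = {name: f * mean_range for name, f in RANGE_FACTORS[n].items()}
    return x_chart, range_chart


def status(value, limits):
    """Classify a plotted point against the action and warning limits."""
    if value < limits["LAL"] or value > limits["UAL"]:
        return "action"
    if value < limits["LWL"] or value > limits["UWL"]:
        return "warning"
    return "in control"


x_chart, range_chart = chart_limits(SAMPLES)
for sample in SAMPLES:
    print(status(mean(sample), x_chart), status(max(sample) - min(sample), range_chart))
```

New samples would be classified in the same way as they arrive, with the limits held fixed at the values estimated from the initial data.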