Geostatistics for Environmental Scientists
Second Edition
Richard Webster, Rothamsted Research, UK
Margaret A. Oliver, University of Reading, UK

Contents

Preface
1 Introduction
  1.1 Why geostatistics?
    1.1.1 Generalizing
    1.1.2 Description
    1.1.3 Interpretation
    1.1.4 Control
  1.2 A little history
  1.3 Finding your way
2 Basic Statistics
  2.1 Measurement and summary
    2.1.1 Notation
    2.1.2 Representing variation
    2.1.3 The centre
    2.1.4 Dispersion
  2.2 The normal distribution
  2.3 Covariance and correlation
  2.4 Transformations
    2.4.1 Logarithmic transformation
    2.4.2 Square root transformation
    2.4.3 Angular transformation
    2.4.4 Logit transformation
  2.5 Exploratory data analysis and display
    2.5.1 Spatial aspects
  2.6 Sampling and estimation
    2.6.1 Target population and units
    2.6.2 Simple random sampling
    2.6.3 Confidence limits
    2.6.4 Student’s t
    2.6.5 The χ² distribution
    2.6.6 Central limit theorem
    2.6.7 Increasing precision and efficiency
    2.6.8 Soil classification
3 Prediction and Interpolation
  3.1 Spatial interpolation
    3.1.1 Thiessen polygons (Voronoi polygons, Dirichlet tessellation)
    3.1.2 Triangulation
    3.1.3 Natural neighbour interpolation
    3.1.4 Inverse functions of distance
    3.1.5 Trend surfaces
    3.1.6 Splines
  3.2 Spatial classification and predicting from soil maps
    3.2.1 Theory
    3.2.2 Summary
4 Characterizing Spatial Processes: The Covariance and Variogram
  4.1 Introduction
  4.2 A stochastic approach to spatial variation: the theory of regionalized variables
    4.2.1 Random variables
    4.2.2 Random functions
  4.3 Spatial covariance
    4.3.1 Stationarity
    4.3.2 Ergodicity
  4.4 The covariance function
  4.5 Intrinsic variation and the variogram
    4.5.1 Equivalence with covariance
    4.5.2 Quasi-stationarity
  4.6 Characteristics of the spatial correlation functions
  4.7 Which variogram?
  4.8 Support and Krige’s relation
    4.8.1 Regularization
  4.9 Estimating semivariances and covariances
    4.9.1 The variogram cloud
    4.9.2 h-Scattergrams
    4.9.3 Average semivariances
    4.9.4 The experimental covariance function
5 Modelling the Variogram
  5.1 Limitations on variogram functions
    5.1.1 Mathematical constraints
    5.1.2 Behaviour near the origin
    5.1.3 Behaviour towards infinity
  5.2 Authorized models
    5.2.1 Unbounded random variation
    5.2.2 Bounded models
  5.3 Combining models
  5.4 Periodicity
  5.5 Anisotropy
  5.6 Fitting models
    5.6.1 What weights?
    5.6.2 How complex?
6 Reliability of the Experimental Variogram and Nested Sampling
  6.1 Reliability of the experimental variogram
    6.1.1 Statistical distribution
    6.1.2 Sample size and design
    6.1.3 Sample spacing
  6.2 Theory of nested sampling and analysis
    6.2.1 Link with regionalized variable theory
    6.2.2 Case study: Youden and Mehlich’s survey
    6.2.3 Unequal sampling
    6.2.4 Case study: Wyre Forest survey
    6.2.5 Summary
7 Spectral Analysis
  7.1 Linear sequences
  7.2 Gilgai transect
  7.3 Power spectra
    7.3.1 Estimating the spectrum
    7.3.2 Smoothing characteristics of windows
    7.3.3 Confidence
  7.4 Spectral analysis of the Caragabal transect
    7.4.1 Bandwidths and confidence intervals for Caragabal
  7.5 Further reading on spectral analysis
8 Local Estimation or Prediction: Kriging
  8.1 General characteristics of kriging
    8.1.1 Kinds of kriging
  8.2 Theory of ordinary kriging
  8.3 Weights
  8.4 Examples
    8.4.1 Kriging at the centre of the lattice
    8.4.2 Kriging off-centre in the lattice and at a sampling point
    8.4.3 Kriging from irregularly spaced data
  8.5 Neighbourhood
  8.6 Ordinary kriging for mapping
  8.7 Case study
    8.7.1 Kriging with known measurement error
    8.7.2 Summary
  8.8 Regional estimation
  8.9 Simple kriging
  8.10 Lognormal kriging
  8.11 Optimal sampling for mapping
    8.11.1 Isotropic variation
    8.11.2 Anisotropic variation
  8.12 Cross-validation
    8.12.1 Scatter and regression
9 Kriging in the Presence of Trend and Factorial Kriging
  9.1 Non-stationarity in the mean
    9.1.1 Some background
  9.2 Application of residual maximum likelihood
    9.2.1 Estimation of the variogram by REML
    9.2.2 Practicalities
    9.2.3 Kriging with external drift
  9.3 Case study
  9.4 Factorial kriging analysis
    9.4.1 Nested variation
    9.4.2 Theory
    9.4.3 Kriging analysis
    9.4.4 Illustration
10 Cross-Correlation, Coregionalization and Cokriging
  10.1 Introduction
  10.2 Estimating and modelling the cross-correlation
    10.2.1 Intrinsic coregionalization
  10.3 Example: CEDAR Farm
  10.4 Cokriging
    10.4.1 Is cokriging worth the trouble?
    10.4.2 Example of benefits of cokriging
  10.5 Principal components of coregionalization matrices
  10.6 Pseudo-cross-variogram
11 Disjunctive Kriging
  11.1 Introduction
  11.2 The indicator approach
    11.2.1 Indicator coding
    11.2.2 Indicator variograms
  11.3 Indicator kriging
  11.4 Disjunctive kriging
    11.4.1 Assumptions of Gaussian disjunctive kriging
    11.4.2 Hermite polynomials
    11.4.3 Disjunctive kriging for a Hermite polynomial
    11.4.4 Estimation variance
    11.4.5 Conditional probability
    11.4.6 Change of support
  11.5 Case study
  11.6 Other case studies
  11.7 Summary
12 Stochastic Simulation
  12.1 Introduction
  12.2 Simulation from a random process
    12.2.1 Unconditional simulation
    12.2.2 Conditional simulation
  12.3 Technicalities
    12.3.1 Lower–upper decomposition
    12.3.2 Sequential Gaussian simulation
    12.3.3 Simulated annealing
    12.3.4 Simulation by turning bands
    12.3.5 Algorithms
  12.4 Uses of simulated fields
  12.5 Illustration
Appendix A Aide-mémoire for Spatial Analysis
  A.1 Introduction
  A.2 Notation
  A.3 Screening
  A.4 Histogram and summary
  A.5 Normality and transformation
  A.6 Spatial distribution
  A.7 Spatial analysis: the variogram
  A.8 Modelling the variogram
  A.9 Spatial estimation or prediction: kriging
  A.10 Mapping
Appendix B GenStat Instructions for Analysis
  B.1 Summary statistics
  B.2 Histogram
  B.3 Cumulative distribution
  B.4 Posting
  B.5 The variogram
    B.5.1 Experimental variogram
    B.5.2 Fitting a model
  B.6 Kriging
  B.7 Coregionalization
    B.7.1 Auto- and cross-variograms
    B.7.2 Fitting a model of coregionalization
    B.7.3 Cokriging
  B.8 Control
References
Index

2 Basic Statistics

Before focusing on the main topic of this book, geostatistics, we want to ensure that readers have a sound understanding of the basic quantitative methods for obtaining and summarizing information on the environment. There are two aspects to consider: one is the choice of variables and how they are measured; the other, and more important, is how to sample the environment. This chapter deals with these. Chapter 3 will then consider how such records can be used for estimation, prediction and mapping in a classical framework.

The environment varies from place to place in almost every aspect. There are infinitely many places at which we might record what it is like, but practically we can measure it at only a finite number by sampling. Equally, there are many properties by which we can describe the environment, and we must choose those that are relevant. Our choice might be based on prior knowledge of the most significant descriptors or from a preliminary analysis of data to hand.

2.1 MEASUREMENT AND SUMMARY

The simplest kind of environmental variable is binary, in which there are only two possible states, such as present or absent, wet or dry, calcareous or non-calcareous (rock or soil). They may be assigned the values 1 and 0, and they can be treated as quantitative or numerical data.

Other features, such as classes of soil, soil wetness, stratigraphy, and ecological communities, may be recorded qualitatively. These qualitative characters can be of two types: unordered and ranked. The structure of the soil, for example, is an unordered variable and may be classified into blocky, granular, platy, etc. Soil wetness classes (dry, moist, wet) are ranked in that they can be placed in order of increasing wetness. In both cases the classes may be recorded numerically, but the records should not be treated as if they were measured in any sense. They can be converted to sets of binary variables, called ‘indicators’ in geostatistics (see Chapter 11), and can often be analysed by non-parametric statistical methods.

Geostatistics for Environmental Scientists, 2nd Edition. R. Webster and M.A. Oliver. © 2007 John Wiley & Sons, Ltd

The most informative records are those for which the variables are measured fully quantitatively on continuous scales with equal intervals. Examples include the soil’s thickness, its pH, the cadmium content of rock, and the proportion of land covered by vegetation. Some such scales have an absolute zero, whereas for others the zero is arbitrary. Temperature may be recorded in kelvin (absolute zero) or in degrees Celsius (arbitrary zero). Acidity can be measured by hydrogen ion concentration (with an absolute zero) or as its negative logarithm to base 10, pH, for which the zero is arbitrarily taken as $-\log_{10} 1$ (in moles per litre). In most instances we need not distinguish between them.

Some properties are recorded as counts, e.g. the number of roots in a given volume of soil, the pollen grains of a given species in a sample from a deposit, the number of plants of a particular type in an area. Such records can be analysed by many of the methods used for continuous variables if treated with care.

Properties measured on continuous scales are amenable to all kinds of mathematical operation and to many kinds of statistical analysis. They are the ones that we concentrate on because they are the most informative, and they provide the most precise estimates and predictions. The same statistical treatment can often be applied to binary data, though because the scale is so coarse the results may be crude and inference from them uncertain. In some instances a continuous variable is deliberately converted to binary, or to an ‘indicator’ variable, by cutting its scale at some specific value, as described in Chapter 11.

Sometimes, environmental variables are recorded on coarse stepped scales in the field because refined measurement is too expensive. Examples include the percentage of stones in the soil, the root density, and the soil’s strength. The steps in their scales are not necessarily equal in terms of measured values, but they are chosen as the best compromise between increments of equal practical significance and those with limits that can be detected consistently. These scales need to be treated with some caution for analysis, but they can often be treated as fully quantitative.

Some variables, such as colour hue and longitude, have circular scales. They may often be treated as linear where only a small part of each scale is used. It is a different matter when a whole circle or part of it is represented. This occurs with slope aspect and with orientations of stones in till. Special methods are needed to summarize and analyse such data (see Mardia and Jupp, 2000), and we shall not consider them in this book.

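The two conversions to indicators mentioned in this section (one binary variable per qualitative class, and a continuous variable cut at a specific value) can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the sample records and the convention that the indicator is 1 at or below the cut-off are all assumptions made here, not taken from the text.

```python
def class_indicators(records, classes):
    """Return one 0/1 indicator list per class (hypothetical helper)."""
    return {c: [1 if r == c else 0 for r in records] for c in classes}


def threshold_indicator(values, cutoff):
    """Cut a continuous scale at `cutoff`.

    Assumed convention: 1 where the value is <= cutoff, otherwise 0.
    """
    return [1 if v <= cutoff else 0 for v in values]


# Hypothetical field records of soil wetness class at four sites.
wetness = ["dry", "moist", "wet", "moist"]
wetness_indicators = class_indicators(wetness, ["dry", "moist", "wet"])

# Hypothetical pH measurements cut at pH 6.5.
ph = [5.2, 6.8, 7.4]
acid = threshold_indicator(ph, 6.5)
```

Each site contributes exactly one 1 across the class indicators, so the ranked or unordered codes are never treated as if they were measurements.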
2.1.1 Notation

Another feature of environmental data is that they have spatial and temporal components as well as recorded values, which makes them unique or deterministic (we return to this point in Chapter 4). In representing the data we must distinguish measurement, location and time. For most classical statistical [...]

... frequencies constitutes the frequency distribution, and its graph (with frequency on the ordinate and the variate values on the abscissa) is the histogram. Figures 2.1 and 2.4 are examples. The number of classes chosen depends on the ...

Figure 2.1 Histograms: (a) exchangeable potassium (K) in mg l$^{-1}$; (b) $\log_{10}$ K, for the topsoil at Broom’s Barn Farm. The curves are of the (lognormal) probability density ...

... given by

$$\bar{z} - y s/\sqrt{N} \quad \text{and} \quad \bar{z} + y s/\sqrt{N}. \qquad (2.25)$$

These are the lower and upper limits on $\mu$, given a sample mean $\bar{z}$ and standard deviation $s$ that estimates $\sigma$ precisely, corresponding to some chosen probability or level of confidence. Values of standard normal deviates and their cumulative probabilities are published, and we list the values for a few typical confidences at which people might wish ...

... replace $y$ in expressions (2.25) by Student’s $t$, which is defined by

$$t = \frac{\bar{z} - \mu}{s/\sqrt{N}}. \qquad (2.26)$$

The true mean, $\mu$, is unknown of course, but $t$ has been worked out and tabulated for $N$ up to 120. So one chooses the confidence level, and then finds from the published table the value of $t$ corresponding to $N - 1$ degrees of freedom. The confidence limits of the mean are then $\bar{z} - t s/\sqrt{N}$ and $\bar{z} + t s/\sqrt{N}$ ...

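As a worked illustration of expressions (2.25) and of the limits based on Student’s $t$, the following Python sketch computes both intervals for a small sample. The data and function names are hypothetical. The normal deviate comes from the standard library, whereas the value of $t$ (2.262 for 9 degrees of freedom at 95% confidence) is simply read from a standard published table, as the text describes.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_deviate(confidence):
    """Two-sided standard normal deviate y for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def confidence_limits(data, deviate):
    """Lower and upper limits: mean -/+ deviate * s / sqrt(N).

    `deviate` is y from the normal distribution or t from a published table.
    """
    n = len(data)
    z_bar = mean(data)
    s = stdev(data)  # sample standard deviation (N - 1 in the denominator)
    half_width = deviate * s / sqrt(n)
    return z_bar - half_width, z_bar + half_width


# Hypothetical sample of N = 10 measurements.
sample = [4.0, 5.0, 6.0, 5.0, 5.0, 4.0, 6.0, 5.0, 5.0, 5.0]

limits_normal = confidence_limits(sample, normal_deviate(0.95))
limits_t = confidence_limits(sample, 2.262)  # tabulated t, 9 degrees of freedom, 95%
```

For a sample this small the interval based on $t$ is wider than the one based on the normal deviate, which reflects the extra uncertainty in estimating the standard deviation from only ten values.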
... stratified designs the region of interest, R, is divided into small subdivisions (strata). These are typically small squares, but they may be other shapes, of equal area. At least two sampling points are chosen randomly within each stratum. For this scheme the largest possible gap is then less than four strata.

The variance within a stratum $k$ is estimated from $n_k$ data in it by ...

... main disadvantage of systematic sampling is that classical theory provides no means of determining the variance or standard error without bias from the sample because once one sampling point has been chosen (and the orientation in two dimensions) there is no randomization. An approximation may be obtained by dividing the region into strata and computing the pooled within-stratum variance as if sampling ...

... is convenient. We then move the window along the transect in steps and compute $d_m$ at each new position. If the transect is short then the positions should overlap; if not, a satisfactory procedure is to choose the first sampling point in a new position as the last one in the previous position. In this way every sampling point contributes, and with equation (2.35) all contribute equally. Then the variance ...

... is the number of steps or positions of the window, and the quantity $m - 2 + 0.5$ is the sum of the squares of the coefficients in equation (2.35). For a two-dimensional grid the procedure is analogous. One chooses a square window. For illustration let it be of side 4. The coefficients can be assigned as follows:

−0.25  +0.5   −0.5   +0.25
+0.5   −1.0   +1.0   −0.5
−0.5   +1.0   −1.0   +0.5
+0.25  −0.5   +0.5   −0.25

The variance ...

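The stratified random design described above (square strata of equal area, with two points chosen at random in each) can be sketched as follows. This is only an illustration under stated assumptions: the region is taken to be a rectangle tiled exactly by square strata, and the function name, parameters and coordinates are hypothetical.

```python
import random


def stratified_random_sample(x_min, y_min, side, n_cols, n_rows,
                             per_stratum=2, seed=None):
    """Choose `per_stratum` random points inside each square stratum.

    Returns a list of (row, col, x, y) tuples; the region is assumed to be
    a rectangle of n_cols by n_rows square strata with edge length `side`.
    """
    rng = random.Random(seed)
    points = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = x_min + col * side  # lower-left corner of this stratum
            y0 = y_min + row * side
            for _ in range(per_stratum):
                x = x0 + rng.random() * side
                y = y0 + rng.random() * side
                points.append((row, col, x, y))
    return points


# Hypothetical 3 x 2 arrangement of 100 m strata, two points in each.
design = stratified_random_sample(0.0, 0.0, 100.0, n_cols=3, n_rows=2, seed=1)
```

Keeping the stratum indices with each point makes it straightforward to group the observations by stratum afterwards when within-stratum variances are computed.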