GIS for Environmental Decision Making - Chapter 7

CHAPTER 7

GIS and Predictive Modelling: A Comparison of Methods for Forest Management and Decision-Making

A. Felicísimo and A. Gómez-Muñoz

7.1 INTRODUCTION

GIS can be a useful tool for spatial or land-use planning, but only if several conditions are fulfilled. The key conditions relate to 1) the quality of the basic spatial information, and 2) the statistical methods applied to the spatial nature of the data. Appropriate information and methods allow the generation of robust models that support objective and methodologically sound decisions. In this study we apply several multivariate statistical methods and test their usefulness in providing robust solutions for forestry planning using GIS. We must emphasize that in our Iberian study area, where forests have progressively decreased in extent over centuries, the main aims of forestry planning are the reduction of forest fragmentation, biodiversity conservation, and restoration of degraded biotopes.

The research develops a set of likelihood or suitability models for the presence of tree species that are widely distributed over a study area of 41,000 km². The utility of suitability models has been demonstrated in some previous studies [1], but they are still not as widely employed as might be expected. A suitability model is a raster map in which each pixel is assigned a value reflecting suitability for a given use (e.g., presence of a tree species). Suitability models can be generated through diverse techniques, such as logistic regression or the non-parametric CART (classification and regression trees) and MARS (multivariate adaptive regression splines) [2-4]. All of these techniques require a vegetation map (dependent variable) and a set of environmental variables (climate, topography, geology, etc.) which potentially influence the vegetation distribution. The foundation of the method is to establish relationships between the environmental variables and the spatial distribution of the vegetation.
Typically, each vegetation type will respond in a different way as a consequence of its contrasting environmental requirements. Suitability is commonly expressed on a 0-1 scale (incompatible-ideal). The precise value depends on a set of physical and biological factors that favor or limit the growth of each type of vegetation. Once the distribution of suitability values across a region is known, decisions on land use and management can be made on the basis of objective criteria.

© 2008 by Taylor & Francis Group, LLC

The set of suitability values for a region can be considered as the potential distribution model if presented as a map: the area defined as ‘suitable’ in a model should reflect the potential area for the vegetation type under consideration. Such a model also represents the relationships between presence/absence of each forest type and the values of the potentially influential environmental variables in a given region. Usually, current forest distributions are significantly smaller than the potential spatial extents because they have been systematically logged. Potential distribution models allow the recognition and delineation of such former distribution areas in order to direct current and future management plans, provide valuable data for restoration initiatives and highlight areas where such actions should be considered a priority.

7.2 OBJECTIVES

The main objectives of the study were to 1) use several different statistical methods to generate maps of potential distributions and suitability for each of three species of Quercus (oak) in the study area, and 2) identify the most appropriate method and assess its advantages and limitations. In order to fulfill these objectives, we developed a workflow that included sampling strategies, GIS implementation of statistical models and validation of results.
7.3 STUDY AREA

The study area was Extremadura, one of the 17 Autonomous Communities of Spain, covering 41,680 km² and located in the west of the Iberian Peninsula (Figure 7.1). It has a Mediterranean climate, somewhat moderated by the relative proximity to the sea and the passage of frontal systems from the Atlantic. The study subjects, which partially cover this area, were three species of the genus Quercus that grow in forests or ‘dehesas’. Dehesas are artificial ecotypes derived from original forest clearings (Figure 7.2). Continuous forest cover disappeared centuries ago and currently only scattered patches remain over a large potential area. In some places deforestation was complete and not even the most open dehesas remain.

Trees from the genus Quercus are the dominant constituents of forests in the area, the most important species (and those considered in the analysis) being Quercus rotundifolia Lam. (holm oak, 12,680 km²; synonym: Quercus ilex L. ssp. ballota (Desf.) Samp.), Quercus suber L. (cork oak, 2,130 km²) and Quercus pyrenaica Willd. (Pyrenean oak, 950 km²). With some exceptions, Pyrenean oak appears most commonly in forests, while cork and holm oaks preferentially occur in dehesas.

Figure 7.1 Location of Extremadura in the Iberian Peninsula.

Figure 7.2 Dehesas are artificial ecotypes comparable to savannas: a Mediterranean (seasonal) grassland containing scattered trees of the genus Quercus.

7.4 DATA

A set of raster maps was compiled to reflect the spatial distribution of dependent and independent (predictive) variables.

7.4.1 Quercus Distributions

Current Quercus species distribution maps were taken from the Forestry Map of Spain (scale 1:50,000), produced by the Spanish General Directorate for Nature Conservation during the period 1986-96.
We used the digital version of the map to identify the main vegetation classes and the current spatial distributions (Figure 7.3).

Figure 7.3 Current distribution of Quercus species in the study area (black represents Pyrenean oak, Q. pyrenaica; dark gray, cork oak, Q. suber; and pale gray, holm oak, Q. rotundifolia).

7.4.2 Predictive Variables

Raster maps were generated to represent the following independent variables:

• Elevation. A digital elevation model (DEM) was constructed using Delaunay triangulation of spot height and contour data from the 1:50,000 scale topographic map of the Army Geographical Service, followed by transformation to a regular 100 m resolution grid.
• Slope angle. This was calculated from the DEM by applying Sobel's algorithm [5].
• Potential insolation. A measure was derived following the method proposed by Fernández Cepedal and Felicísimo [6]. This used the DEM to assess the extent of topographical shading given the position of the sun at different standard date periods [7]. The result was an estimate of the time that each point on the terrain surface was directly illuminated by solar radiation. The temporal resolution was 20 minutes and the spatial resolution 100 m.
• Temperature. Maps of the annual maxima and minima were interpolated from data for 140 meteorological monitoring points (National Institute of Meteorology, Spain) using the thin-plate spline method [8,9] with a spatial resolution of 500 m.
• Rainfall. Quarterly rainfall maps were interpolated from data for 276 meteorological monitoring points (National Institute of Meteorology, Spain) using the thin-plate spline method with a 500 m spatial resolution.

These variables were selected because of their potential influence on the distribution of the vegetation and the availability of sufficient data to generate GIS digital layers. Lack of data eliminated other variables (e.g., soils) commonly used in ecological modelling.
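As an illustration of the slope variable listed above, the following minimal sketch derives slope angle from a gridded DEM with a 3 × 3 Sobel-weighted gradient. It is not the authors' implementation: the function name `sobel_slope_degrees`, the handling of border cells and the toy DEM are assumptions made only for this example.

```python
import math


def sobel_slope_degrees(dem, cell_size):
    """Slope angle (degrees) for interior cells of a DEM given as a list of rows.

    Border cells are left as None because the 3 x 3 window is incomplete there
    (an assumption of this sketch, not something stated in the chapter).
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Sobel-weighted difference between the east and west columns.
            dz_dx = ((dem[r - 1][c + 1] + 2 * dem[r][c + 1] + dem[r + 1][c + 1])
                     - (dem[r - 1][c - 1] + 2 * dem[r][c - 1] + dem[r + 1][c - 1])) / (8.0 * cell_size)
            # Sobel-weighted difference between the row below and the row above.
            dz_dy = ((dem[r + 1][c - 1] + 2 * dem[r + 1][c] + dem[r + 1][c + 1])
                     - (dem[r - 1][c - 1] + 2 * dem[r - 1][c] + dem[r - 1][c + 1])) / (8.0 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope


# Toy DEM: a plane rising 50 m per 100 m cell towards the east.
dem = [[50.0 * c for c in range(4)] for _ in range(4)]
slope = sobel_slope_degrees(dem, cell_size=100.0)
print(round(slope[1][1], 3))  # atan(0.5) in degrees
```

On a 100 m grid such as the one described above, `cell_size` would be 100.0; the divisor 8 × cell size normalizes the Sobel weights so that the result is a gradient in metres per metre.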
7.5 METHODS

7.5.1 Statistical Methods

The methods used in predictive modelling are usually of two main types: global parametric and local non-parametric. Global parametric models adopt an approach where each entered predictor has a universal relationship with the response variable. An advantage of global parametric models, such as linear and logistic regression, is that they are easy and quick to compute, and their integration with a GIS is straightforward. As an example of such a model we used logistic multiple regression (LMR). This is widely employed in predictive modelling [10], but has several important limitations. For instance, ecologists frequently assume a response function which is unimodal and symmetric, yet this is often not justified [11,12].

An alternative hypothesis when modelling organism or community distributions is to assume that the response is related to predictor variables in a non-linear and local manner. Local non-parametric models are appropriate for such an approach since they use a strategy of local variable selection and reduction, and are flexible enough to allow non-linear relationships. Two examples of this type of model are CART (classification and regression trees) and MARS (multivariate adaptive regression splines).

All three types of model used in this study were calculated from stratified random samples of pixels with an approximately even representation of points where each Quercus species was present or absent. Each random sample covered about 10-20% of the total area for each species. One sample was used to generate the models, and a second to test the reliability of the predictions.

7.5.1.1 Logistic Multiple Regression

Logistic multiple regression (LMR) has been used to generate likelihood models for forecasting in a variety of fields.
It requires a dichotomous (presence/absence) dependent variable, and the predicted probability of presence takes the form shown in Equation 7.1:

$$P(i) = \frac{1}{1 + \exp\left[-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n)\right]} \qquad (7.1)$$

where $P(i)$ is the probability of presence (e.g., for a tree species), $x_1, \dots, x_n$ represent the values of the independent variables, $b_1, \dots, b_n$ the coefficients, and $b_0$ the constant term. The predicted values from the regression are probabilities which range from 0 to 1 and can be interpreted as measures of potential suitability [13].

Several studies have combined LMR with GIS tools to present such probabilities in cartographic form. For instance, Guisan et al. [14] used LMR in the ArcInfo GIS to generate a distribution model for the plant Carex curvula in the Swiss Alps. A similar study on aquatic vegetation was conducted by Van de Rijt et al. [15] using the GRASS GIS. In this study LMR was performed using a forward conditional stepwise method in SPSS® 11.5 [16] and the results were then imported back into the ArcInfo® GIS [17] for mapping.

7.5.1.2 Classification and Regression Trees

CART is a rule-based method that generates a binary tree through ‘binary recursive partitioning’, a process that splits a node based on yes/no answers about the values of the predictors [2]. Each split is based on a single variable, and while some variables can be used several times in a model, others may not be used at all. The rule generated at each step minimizes the variability within each of the two resulting subsets. Applying CART often results in a complex tree of subsets based on a node purity criterion, and this tree is usually ‘pruned back’ via cross-validation to avoid over-fitting.

The main drawback of CART models when used to predict organism distributions is that the generated models can be extremely complex and difficult to interpret. For example, work on Australian forests by Moore et al.
[18] produced a tree with 510 nodes from just 10 predictors. In this study, the optimal tree generated from the Quercus rotundifolia data set had 4889 terminal nodes. Although the complexity of such a tree does not diminish its predictive power, it makes it almost impossible to interpret, which in many studies is a key requirement. Moreover, implementation of such an analysis within a GIS is difficult. Nevertheless, as part of this study we developed a method to translate the large CART reports (text files) to AML (Arc Macro Language) files that could be run with the ArcInfo GIS. Such files can be large (e.g., the text file containing the CART decision rules for constructing the Q. rotundifolia suitability map was 1.8 MB in size) and execution times may be long (about 55 hours for the Q. rotundifolia model).

7.5.1.3 Multivariate Adaptive Regression Splines

MARS is a relatively novel technique that combines classical linear regression, mathematical construction of splines and binary recursive partitioning to produce a local model where relationships between response and predictors can be either linear or non-linear [3]. To do this, MARS approximates the underlying function through a set of adaptive piecewise linear regressions termed ‘basis functions’. For example, the first four basis functions from the Q. pyrenaica model are:

BF1 = MAX (0, PT4 - 3431)
BF2 = MAX (0, 3431 - PT4)
BF3 = MAX (0, MDE50 - 1181)
BF4 = MAX (0, 1181 - MDE50)

where PT4 is the mean rainfall for the period October-December (l/m² × 10) and MDE50 is elevation (m). Changes in the slope of these basis functions occur at points called ‘knots’ (the values 3431 and 1181 in the above examples). Regression lines are allowed to bend at the knots, which mark the end of one region of data and the beginning of another with different functional behavior. Like the subdivisions in CART, knots are established in a forward/backward stepwise way.
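The four basis functions quoted above can be evaluated directly, which may help show how each pair of hinge functions switches on one side of its knot. The sketch below is only illustrative: the function name `basis_functions` and the input values are assumptions, and no model coefficients are included because the chapter does not report them.

```python
def basis_functions(pt4, mde50):
    """Evaluate the first four Q. pyrenaica basis functions quoted in the text.

    pt4   -- mean October-December rainfall in l/m2 x 10 (knot at 3431)
    mde50 -- elevation in metres (knot at 1181)
    """
    return {
        "BF1": max(0.0, pt4 - 3431),
        "BF2": max(0.0, 3431 - pt4),
        "BF3": max(0.0, mde50 - 1181),
        "BF4": max(0.0, 1181 - mde50),
    }


# Hypothetical cell: rainfall above the rainfall knot, elevation below the elevation knot.
bf = basis_functions(pt4=3600, mde50=900)
print(bf)  # BF1 and BF4 are non-zero; BF2 and BF3 are zero
```

For any input, at most one function of each mirrored pair (BF1/BF2, BF3/BF4) is non-zero, which is what lets the fitted regression line change slope at the knot.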
A model which clearly overfits is produced first, and then those knots that contribute least to efficiency are discarded in a backwards-pruning step to avoid overfitting. The best model is selected via cross-validation, a process that applies a penalty to each term (i.e., a knot) added to the model in order to keep complexity as low as possible.

As in the CART analysis, we transformed the MARS text report files into AML and then generated the suitability models using the ArcInfo GIS.

7.5.2 Model Evaluation

The predictive capacity of a model can be evaluated as a function of the percentages of correct classifications, both for presences and absences (sensitivity and specificity parameters). The sensitivity and specificity of the model depend on the threshold or cut-off, which is set so as to classify each point according to its likelihood value. To assess model performance we used the area under the Receiver Operating Characteristic (ROC) curve, a measure commonly termed AUC [19]. The ROC curve plots sensitivity against 1 − specificity across all cut-off points of the model. We developed a method to construct the ROC curves by importing the databases associated with sample points into the SPSS statistical package. The ROC curve is recommended for comparing two-class classifiers, as it does not merely summarize performance at a single arbitrarily selected decision threshold, but across all possible decision thresholds [20,21]. AUC is a synthesized overall measure of model accuracy where 1 indicates a perfect fit and a value of 0.5 indicates that the model is performing no better than chance. AUC is also equivalent to the normalized Mann-Whitney two-sample statistic, which makes it comparable to the Wilcoxon statistic.
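The equivalence between AUC and the normalized Mann-Whitney statistic mentioned above can be made concrete with a small sketch. It is a minimal illustration, not the SPSS procedure used in the study; the function names and the suitability scores are invented for the example.

```python
def auc_mann_whitney(presence_scores, absence_scores):
    """AUC as the normalized Mann-Whitney statistic.

    Counts the presence/absence pairs in which the presence point receives
    the higher suitability score (ties count one half), then divides by the
    total number of pairs.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def sensitivity_specificity(presence_scores, absence_scores, cutoff):
    """Fractions of presences and absences classified correctly at one cut-off."""
    sensitivity = sum(s >= cutoff for s in presence_scores) / len(presence_scores)
    specificity = sum(s < cutoff for s in absence_scores) / len(absence_scores)
    return sensitivity, specificity


# Hypothetical suitability values at validation points.
presence = [0.9, 0.8, 0.6, 0.4]
absence = [0.7, 0.3, 0.2, 0.1]

auc = auc_mann_whitney(presence, absence)
sens, spec = sensitivity_specificity(presence, absence, cutoff=0.5)
print(auc, sens, spec)  # 0.875 0.75 0.75
```

The pairwise loop is quadratic in the number of sample points, so for samples of the size used in this study a rank-based formulation would be preferable; the loop is kept here because it mirrors the definition most directly.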
7.6 RESULTS

7.6.1 Suitability Models

All the LMR equations, MARS basis functions and CART classification rules were translated into ArcInfo GIS syntax. ArcInfo was subsequently used to generate the spatial suitability models, whose goodness-of-fit was evaluated by AUC values. Table 7.1 compares the overall results for different tree species and statistical methods, with bold text highlighting the best fitting models for each species.

The AUC values indicate that the LMR models provided the poorest goodness-of-fit for each species, while the CART ones were the best performers. However, there were some differences between tree species, with a relatively narrow range of AUC values for Q. pyrenaica (i.e., all the methods produce a good fit) and a much greater one in the Q. rotundifolia case. This may be related to differences in the current extent of the species (see Section 7.3), with Q. rotundifolia being the most common and therefore having potentially more complex environmental relationships. It is also worth noting that greater complexity (number of terminal nodes) in the CART models does not guarantee better results. This is an interesting finding that could assist in the practicalities of implementing such models within a GIS framework.

Table 7.1 Summary statistics for the suitability models

Method    Terminal nodes    AUC      Confidence interval (95%)

Q. pyrenaica, Pyrenean oak (sample size: 18,880 positive cases, 18,590 negative cases)
LMR       Not applicable    0.924    Not available
MARS      Not applicable    0.972    0.970-0.974
CART      56                0.970    0.968-0.972
CART      102               0.974    0.972-0.976
CART      204               0.979    0.977-0.981
CART      817               0.974    0.972-0.976

Q. suber, cork oak (sample size: 42,040 positive cases, 41,979 negative cases)
LMR       Not applicable    0.790    Not available
MARS      Not applicable    0.802    0.799-0.805
CART      525               0.971    0.970-0.972
CART      1016              0.975    0.974-0.977
CART      2355              0.975    0.973-0.976

Q. rotundifolia, holm oak (sample size: 50,394 positive cases, 50,690 negative cases)
LMR       Not applicable    0.627    Not available
MARS      Not applicable    0.767    0.764-0.770
CART      1343              0.889    0.887-0.891
CART      2347              0.894    0.892-0.896
CART      4889              0.895    0.893-0.897

Another feature of the CART model output became apparent when the results were converted into suitability maps. As illustrated in Figure 7.4a, the CART maps show abrupt transitions between areas of high and low suitability (darker and lighter shading respectively), which reflects the reliance on binary rules. In addition, due to the influence of climate variables, the suitability models frequently replicate the shapes of isopleths, which makes them visually less convincing. Although the backward pruning process in CART reduces the number of terminal nodes and makes the final model less complex, it does not eliminate such effects. These features are not present in the MARS-based maps (Figures 7.4b-7.4d), which show smoother and more continuous distributions of suitability values. For this reason, we decided to use the MARS model output to generate a potential vegetation distribution.

Figure 7.4 Suitability models: a) CART model for Q. rotundifolia, b) MARS model for Q. pyrenaica, c) MARS model for Q. suber, d) MARS model for Q. rotundifolia. Darker shading indicates higher suitability.

7.6.2 Potential Vegetation Model

Suitability models for the three tree species were combined to generate a potential vegetation distribution map that could be used to inform land management and decision-making. This map was generated through a decision rule that took into account both suitability values and proximity to the current presence of forests. We defined a function where, for each cell, the suitability value for each species was corrected by the inverse of the distance to the closest cell where the species currently grows.
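One possible reading of this distance correction is sketched below. The chapter does not give the exact function, so the weight 1 / (1 + distance), the choice of the species with the highest corrected value, the function names and the cell values are all assumptions made for illustration only.

```python
def corrected_suitability(suitability, distance):
    """Down-weight suitability by distance to the nearest current stand.

    ASSUMPTION: the weight 1 / (1 + distance) is used so that cells where the
    species already grows (distance 0) keep their full suitability; the
    chapter only says the value is corrected by the inverse of the distance.
    """
    return suitability / (1.0 + distance)


def potential_vegetation(cell):
    """Return the species with the highest corrected value, plus all scores.

    cell maps species name -> (suitability in 0-1, distance to nearest stand).
    """
    scores = {species: corrected_suitability(s, d) for species, (s, d) in cell.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical cell: distances are expressed in grid cells.
cell = {
    "Q. pyrenaica": (0.9, 8.0),
    "Q. suber": (0.6, 0.0),
    "Q. rotundifolia": (0.7, 1.0),
}
best, scores = potential_vegetation(cell)
print(best, scores)
```

In this invented cell the species with the highest raw suitability is not selected, because its nearest stand is far away; that is the kind of trade-off a proximity correction is meant to introduce.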
This correction can be considered as a coarse indicator of colonization likelihood. The result of these calculations was a model showing, for each cell, the type of forest with the highest potential value after considering colonization processes. Figure 7.5 shows the result, highlighting relatively clustered regions for Q. pyrenaica amidst more dispersed distributions for the other two Quercus species.

Figure 7.5 Potential distribution [...] (pale gray).

7.7 CONCLUDING DISCUSSION

Suitability maps represent a useful tool for environmental management as they synthesize a wide range of knowledge which is difficult to integrate in any other way. Until recently, most potential vegetation maps were developed by largely subjective methods, usually by an ‘expert’. In contrast, the approach used in this study is based on robust statistical or GIS operations, and objective cartographical information. There is an explicit procedure to produce the final result and the entire workflow of information is transparent and repeatable. The models used are based on real data (data driven) and in our experience these methods give [...] statistical tools and GIS, but it is clear that there is still a need for better integration of such capabilities in most common commercial GIS.

Transferring the potential vegetation model into practical forestry action would also require further information, especially on soil properties and economic factors. Dealing with such implementation issues is beyond the scope of this chapter, but it is evident [...] utility. Combining such model-based maps with current land-use information and management data could help provide decision support tools that would be extremely useful in many aspects of spatial or environmental planning.

7.8 ACKNOWLEDGMENTS

This study was conducted as part of Project 2PR01C023 co-funded by the Junta de Extremadura and FEDER (Fondo Europeo de Desarrollo Regional).

7.9 REFERENCES

1. Guisan, [...]
[...] de la Universidad de Oviedo, 5, 109-119, 1987.
7. Heywood, H., Standard date periods with declination limits, Nature, 204, 678, 1964.
8. Hutchinson, M.F., Climatic analyses in data sparse regions, in Climate Risk in Crop Production Models and Management for the Semiarid Tropics and Subtropics, Chow, M.C. and Bellamy, J.A., Eds., CAB International, Wallingford, 1991, 55-71.
9. Lennon, J.J. and Turner, J.R., [...] Great Britain, Journal of Animal Ecology, 64, 370-392, 1995.
10. Felicísimo, A.M., Francés, E., Fernández, J.M., González-Díez, A., and Varas, J., Modeling the potential distribution of forests with a GIS, Photogrammetric Engineering & Remote Sensing, 68, 455-461, 2002.
11. Austin, M.P. and Smith, T.M., A new model for the continuum concept, Vegetatio, 83, 35-47, 1989.
12. Yee, T.W. and Mitchell, N.D., Generalized [...] Journal of Vegetation Science, 2, 587-602, 1991.
13. Jongman, R.H.G., Ter Braak, C.J.F., and van Tongeren, O.F.R., Data Analysis in Community and Landscape Ecology, Cambridge University Press, Cambridge, 1995.
14. Guisan, A., Theurillat, J.-P., and Kienast, F., Predicting the potential distribution of plant species in an alpine environment, Journal of Vegetation Science, 9, 65-74, 1998.
15. Van de Rijt, C.W.C.J., [...] Vegetation zonation in a former tidal area: A vegetation-type response model based on DCA and logistic regression using GIS, Journal of Vegetation Science, 7, 505-518, 1996.
16. SPSS, SPSS 11.5, SPSS Inc, http://www.spss.com, 2004.
17. ESRI, ArcInfo Desktop, Environmental Systems Research Institute, http://www.esri.com, 2004.
18. Moore, D.M., Lee, B.G., and Davey, S.M., A new method for predicting vegetation distributions using decision tree analysis in a geographic information system, Environmental Management, 15, 59-71, 1991.
19. Hanley, J.A. and McNeil, B.J., The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve, Radiology, 143, 29-36, 1982.
20. Fielding, A.H. and Bell, J.F., A review of methods for the assessment of prediction errors in conservation presence/absence models, Environmental [...]
