Martin H. Trauth
MATLAB® Recipes for Earth Sciences

With text contributions by Robin Gebbers and Norbert Marwan and illustrations by Elisabeth Sillmann

With 77 Figures and a CD-ROM

Privatdozent Dr. rer. nat. habil. M.H. Trauth
University of Potsdam
Department of Geosciences
P.O. Box 60 15 53
14415 Potsdam
Germany
E-mail: trauth@geo.uni-potsdam.de

Copyright disclaimer
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

For MATLAB® product information, please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA, 01760-2098 USA
Tel: 508-647-7000
Fax: 508-647-7001
E-mail: info@mathworks.com
Web: www.mathworks.com

Library of Congress Control Number: 2005937738
ISBN-10 3-540-27983-0 Springer Berlin Heidelberg New York
ISBN-13 978-3540-27983-9 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
Springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in The Netherlands

The use of general descriptive names, registered names, trademarks, etc.
in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: Erich Kirchner
Typesetting: camera-ready by Elisabeth Sillmann, Landau
Production: Christine Jacobi
Printing: Krips bv, Meppel
Binding: Stürtz AG, Würzburg
Printed on acid-free paper 32/2132/cj 5 4 3 2 1 0

Preface

Various books on data analysis in earth sciences have been published during the last ten years, such as Statistics and Data Analysis in Geology by JC Davis, Introduction to Geological Data Analysis by ARH Swan and M Sandilands, Data Analysis in the Earth Sciences Using MATLAB® by GV Middleton or Statistics of Earth Science Data by G Borradaile. Moreover, a number of software packages have been designed for earth scientists, such as the ESRI product suite ArcGIS or the freeware package GRASS for generating geographic information systems, ERDAS IMAGINE or RSINC ENVI for remote sensing, and GOCAD and SURFER for 3D modeling of geologic features. In addition, more general software packages such as IDL by RSINC and MATLAB® by The MathWorks Inc. or the freeware software OCTAVE provide powerful tools for the analysis and visualization of data in earth sciences.

Most books on geological data analysis, such as the book by JC Davis, contain excellent theoretical introductions but no computer solutions to typical problems in earth sciences. The book by ARH Swan and M Sandilands contains a number of examples, but without the use of computers. G Middleton's book was the first to introduce MATLAB as a tool for earth scientists, but its content mainly reflects the personal interests of the author, rather than providing a complete introduction to geological data analysis.
On the software side, earth scientists often encounter the problem that a certain piece of software is designed to solve one particular geologic problem, such as the design of a geoinformation system or the 3D visualization of a fault scarp. Therefore, earth scientists have to buy a large number of software products and, even more importantly, they have to get used to each of them before being in a position to use it successfully.

This book on MATLAB Recipes for Earth Sciences is designed to help undergraduate and PhD students, postdocs and professionals to learn methods of data analysis in earth sciences and to get familiar with MATLAB, the leading software for numerical computations. The title of the book is an appreciation of the book Numerical Recipes by WH Press and others, which is still very popular after initially being published in 1986. Similar to the book by Press and others, this book provides a minimum amount of theoretical background, but then tries to teach the application of all methods by means of examples. The software MATLAB is used since it provides numerous ready-to-use algorithms for most methods of data analysis, but also gives the opportunity to modify and expand the existing routines and even develop new software. The book contains numerous MATLAB scripts to solve typical problems in earth sciences, such as simple statistics, time-series analysis, geostatistics and image processing. The book comes with a compact disk, which contains all MATLAB recipes and example data files. All MATLAB codes can be easily modified in order to be applied to the reader's data and projects.

Whereas undergraduates participating in a course on data analysis might go through the entire book, the more experienced reader will use only one particular method to solve a specific problem. To facilitate the use of this book for the various readers, I outline the concept of the book and the contents of its chapters.
1.
Chapter 1 – This chapter introduces some fundamental concepts of samples and populations. It links the various types of data, and the questions to be answered from these data, to the methods described in the following chapters.
2. Chapter 2 – A tutorial-style introduction to MATLAB designed for earth scientists. Readers already familiar with the software are advised to proceed directly to the following chapters.
3. Chapters 3 and 4 – Fundamentals in univariate and bivariate statistics. These chapters cover the very basics of how statistics works, but also introduce some more advanced topics such as the use of surrogates. The reader already familiar with basic statistics might skip these two chapters.
4. Chapters 5 and 6 – Readers who wish to work with time series are recommended to read both chapters. Time-series analysis and signal processing are tightly linked. A solid knowledge of statistics is required to successfully work with these methods. However, the two chapters are more or less independent of the previous chapters.
5. Chapters 7 and 8 – The second pair of chapters. From my experience, reading both chapters makes a lot of sense. Processing gridded spatial data and analyzing images have a number of similarities. Moreover, aerial photographs and satellite images are often projected upon digital elevation models.
6. Chapter 9 – Data sets in earth sciences are tremendously increasing in the number of variables and data points. Multivariate methods are applied to a great variety of types of large data sets, including even satellite images. The reader particularly interested in multivariate methods is advised to read Chapters 3 and 4 before proceeding to this chapter.

I hope that the various readers will now find their way through the book.
Experienced MATLAB users familiar with basic statistics are invited to proceed immediately to Chapters 5 and 6 (time series), Chapters 7 and 8 (spatial data and images) or Chapter 9 (multivariate analysis), which contain both an introduction to the subjects and very advanced and special procedures for analyzing data in earth sciences. Beginners, however, are recommended to read Chapters 1 to 4 carefully before getting into the advanced methods.

I thank the NASA/GSFC/METI/ERSDAC/JAROS and U.S./Japan ASTER Science Team and the director Mike Abrams for allowing me to include the ASTER images in the book. The book has benefited from the comments of a large number of colleagues and students. I gratefully acknowledge my colleagues who commented on earlier versions of the manuscript, namely Robin Gebbers, Norbert Marwan, Ira Ojala, Lydia Olaka, Jim Renwick, Jochen Rössler, Rolf Romer, and Annette Witt. Thanks also to the students Mathis Hein, Stefanie von Lonski and Matthias Gerber, who helped me to improve the book. I very much appreciate the expertise and patience of Elisabeth Sillmann, who created the graphics and the complete page design of the book. I also acknowledge Courtney Esposito, leading the author program at The MathWorks, Claudia Olrogge and Annegret Schumann at MathWorks Deutschland, Wolfgang Engel at Springer, and Andreas Bohlen and Brunhilde Schulz at UP Transfer GmbH. I would like to thank Thomas Schulmeister, who helped me to get a campus license for MATLAB at Potsdam University. The book is dedicated to Peter Koch, the late system administrator of the Department of Geosciences, who died during the final writing stages of the manuscript and who helped me with all kinds of computer problems during the last few years.
Potsdam, September 2005
Martin Trauth

Contents

Preface

1 Data Analysis in Earth Sciences
  1.1 Introduction
  1.2 Collecting Data
  1.3 Types of Data
  1.4 Methods of Data Analysis

2 Introduction to MATLAB
  2.1 MATLAB in Earth Sciences
  2.2 Getting Started
  2.3 The Syntax
  2.4 Data Storage
  2.5 Data Handling
  2.6 Scripts and Functions
  2.7 Basic Visualization Tools

3 Univariate Statistics
  3.1 Introduction
  3.2 Empirical Distributions
  3.3 Example of Empirical Distributions
  3.4 Theoretical Distributions
  3.5 Example of Theoretical Distributions
  3.6 The t–Test
  3.7 The F–Test
  3.8 The χ²–Test

4 Bivariate Statistics
  4.1 Introduction
  4.2 Pearson's Correlation Coefficient
  4.3 Classical Linear Regression Analysis and Prediction
  4.5 Analyzing the Residuals
  4.6 Bootstrap Estimates of the Regression Coefficients
  4.7 Jackknife Estimates of the Regression Coefficients
  4.8 Cross Validation
  4.9 Reduced Major Axis Regression
  4.10 Curvilinear Regression

5 Time-Series Analysis
  5.1 Introduction
  5.2 Generating Signals
  5.3 Autospectral Analysis
  5.4 Crossspectral Analysis
  5.5 Interpolating and Analyzing Unevenly-Spaced Data
  5.6 Nonlinear Time-Series Analysis (by N.
Marwan)

6 Signal Processing
  6.1 Introduction
  6.2 Generating Signals
  6.3 Linear Time-Invariant Systems
  6.4 Convolution and Filtering
  6.5 Comparing Functions for Filtering Data Series
  6.6 Recursive and Nonrecursive Filters
  6.7 Impulse Response
  6.8 Frequency Response
  6.9 Filter Design
  6.10 Adaptive Filtering

7 Spatial Data
  7.1 Types of Spatial Data
  7.2 The GSHHS Shoreline Data Set
  7.3 The 2-Minute Gridded Global Elevation Data ETOPO2
  7.4 The 30-Arc Seconds Elevation Model GTOPO30
  7.5 The Shuttle Radar Topography Mission SRTM
  7.6 Gridding and Contouring Background
  7.7 Gridding Example
  7.8 Comparison of Methods and Potential Artifacts
  7.9 Geostatistics (by R. Gebbers)

8 Image Processing
  8.1 Introduction
  8.2 Data Storage
  8.3 Importing, Processing and Exporting Images
  8.4 Importing, Processing and Exporting Satellite Images
  8.5 Georeferencing Satellite Images
  8.6 Digitizing from the Screen

9 Multivariate Statistics
  9.1 Introduction
  9.2 Principal Component Analysis
  9.3 Cluster Analysis
  9.4 Independent Component Analysis (by N. Marwan)

General Index

1 Data Analysis in Earth Sciences

1.1 Introduction

Earth sciences include all disciplines that are related to our planet Earth. Earth scientists make observations and gather data; they formulate and test hypotheses on the forces that have operated in a certain region in order to create its structure. They also make predictions about future changes of the planet. All these steps in exploring the system Earth include the acquisition and analysis of numerical data. An earth scientist needs a solid knowledge of statistical and numerical methods to analyze these data, as well as the ability to use suitable software packages on a computer.

This book introduces some of the most important methods of data analysis in earth sciences by means of MATLAB examples.
The examples can be used as recipes for the analysis of the reader's real data after learning their application on synthetic data. The introductory Chapter 1 deals with data acquisition (Chapter 1.2), the expected data types (Chapter 1.3) and the suitable methods for analyzing data in the field of earth sciences (Chapter 1.4). Therefore, we first explore the characteristics of a typical data set. Subsequently, we proceed to investigate the various ways of analyzing data with MATLAB.

1.2 Collecting Data

Data sets in earth sciences have a very limited sample size. They also contain a significant amount of uncertainty. Such data sets are typically used to describe rather large natural phenomena such as a granite body, a large landslide or a widespread sedimentary unit. The methods described in this book help in finding a way of predicting the characteristics of a larger population from the collected samples (Fig. 1.1). In this context, a proper sampling strategy is the first step towards obtaining a good data set. The development of a successful strategy for field sampling includes decisions on [...]

... of numbers. A simple 1-by-1 matrix is a scalar. Matrices with one column or row are vectors, time series and other one-dimensional data fields. An m-by-n matrix can be used for a digital elevation model or a grayscale image. RGB color images are usually stored as three-dimensional arrays, i.e., the colors red, green and blue are represented by an m-by-n-by-3 array. Entering matrices in MATLAB is easy. To enter...

... such arrays are done element-by-element. Whereas this does not make any difference in addition and subtraction, the multiplicative operations are different. MATLAB uses a dot as part of the notation for these operations. For instance, multiplying A and B element-by-element is performed by typing

   C = A .* B

which generates the output

   C =
        8     8    18    35
       63    24    -5    12
        2     3   -24   -45
       18     6     6    -6

2.4...
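The difference between the matrix product and the element-by-element product described above can be illustrated with a minimal sketch. The 2-by-2 matrices used here are hypothetical values chosen for easy checking by hand; they are not the matrices A and B of the original example.

```matlab
% Hypothetical 2-by-2 matrices, assumed for illustration only
A = [1 2; 3 4];
B = [5 6; 7 8];

% Addition is always element-by-element, so no dot is needed
S = A + B;

% Matrix product: rows of A are multiplied with columns of B
P = A * B;

% Element-by-element product: note the dot before the asterisk
C = A .* B;

disp(S)
disp(P)
disp(C)
```

Omitting the dot does not raise an error for square matrices of equal size, which is why the mistake is easy to overlook: the result is simply the matrix product instead of the element-wise one.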
... for their undergraduates. Similarly, The MathWorks provides classroom kits for teachers at a reasonable price. It is also possible for students to purchase a low-cost edition of the software. This student version provides an inexpensive way for students to improve their MATLAB skills. The following Chapters 2.2 to 2.7 contain a tutorial-style introduction to the software MATLAB, ...

... Storage

This chapter is on how to store, import and export data with MATLAB. In earth sciences, data are collected in a great variety of formats, which often have to be converted before being analyzed with MATLAB. On the other hand, the software provides a number of import routines to read many binary data formats in earth sciences, such as the formats used to store digital elevation models and satellite data...

...
   103   0.5353   0.5191
   104   0.5009   0.5216
   105   0.5415   -999
   106   0.501    -999

The first row contains the variable names. The columns provide the data for each sample. The absurd value -999 marks missing data in the data set. Two things have to be changed in order to convert this table into MATLAB format. First, MATLAB uses NaN as the arithmetic representation for Not-a-Number that can be used to mark gaps. Second, you...

... establishment of a data format that can be used on different computer platforms and software is the American Standard Code for Information Interchange ASCII that was first published in 1963 by the American Standards Association (ASA). ASCII as a 7-bit code consists of 2^7 = 128 characters (codes 0 to 127). Whereas ASCII-1963 was lacking lower-case letters, the update ASCII-1967, lower-case letters as well...
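The first of the two conversion steps mentioned for the table above, replacing the marker -999 by NaN, can be sketched as follows. This is a minimal illustration that assumes the two data columns have already been loaded into a numeric array; the variable names data, ngaps and colmean are hypothetical and do not come from the original text.

```matlab
% The two data columns of the table above, typed in by hand
% (assumption: the sample numbers 103 to 106 are stored elsewhere)
data = [0.5353 0.5191
        0.5009 0.5216
        0.5415 -999
        0.5010 -999];

% Replace the missing-value marker -999 by NaN using logical indexing
data(data == -999) = NaN;

% Count the gaps in the whole array
ngaps = sum(isnan(data(:)));

% Average each column while skipping the NaN entries
colmean = zeros(1, size(data, 2));
for k = 1:size(data, 2)
    valid = ~isnan(data(:, k));
    colmean(k) = mean(data(valid, k));
end

disp(ngaps)
disp(colmean)
```

The loop over the columns is used deliberately instead of a toolbox function, so that the sketch only relies on basic MATLAB commands. A plain call to mean on a column containing NaN would return NaN, which is why the gaps are excluded explicitly.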
... reference to MATLAB. Similarly, a large number of the computer codes in the leading Elsevier journal Computers and Geosciences are now written in MATLAB. It appears that the software has overtaken FORTRAN in terms of popularity. Universities and research institutions have also recognized the need for MATLAB training for their staff and students. Many earth science departments across the world offer MATLAB courses...

Fig. 1.3 Types of data in earth sciences. a Nominal data, b ordinal data, c ratio data, d interval data, e closed data, f spatial data and g directional data. For explanation see text. All data types are described in the book except for directional data...

... available methods for data analysis may require certain types of data in earth sciences. These are
1. nominal data – Information in earth sciences is sometimes presented as a list of names, e.g., the various fossil species collected from a limestone bed or the minerals identified in a thin section. In some studies, these data are converted into a binary representation, i.e., one for present and zero for absent...

... drainage-system analysis, the identification of old landscape forms and lineament analysis in tectonically-active regions.
6. Image processing – The processing and analysis of images has become increasingly important in earth sciences. These methods include manipulating images to increase the signal-to-noise ratio and to extract certain components of the image. Examples for this analysis are analyzing satellite images, ...