A Low-Cost System for Creating 3D Terrain Models from Digital Images

Howard Schultz
Aerial Vision Inc.
Amherst, MA 01002
schultz@aerialvision.com

ABSTRACT
Recent advances in digital image sensors and computer vision techniques have the potential to significantly improve the ability to monitor and study a wide variety of environments, including the relationship between forest conservation and global climate change, the impact of suburban sprawl on drinking water quality, and fire risk assessment over large tracts of public and private land. However, the full potential of these technologies cannot be realized until robust, automatic end-to-end systems are made available that enable the end user to generate meaningful results from these new data sources and analysis tools. Our goal is to improve the way in which Geographic Information System (GIS) databases are created, updated and utilized. We are building systems that enable organizations to rapidly and inexpensively generate, update, analyze and visualize high-resolution 3D digital terrain models from digital images collected by a portable camera system mounted on a single-engine aircraft. These techniques will expand the scope of GIS applications, especially in areas where low cost and short turnaround time are critical. This paper discusses the general problem of reconstructing 3D terrain models from small format digital images, and presents a system that integrates an instrument package (constructed from commercial off-the-shelf components) with a suite of analysis and terrain modeling algorithms. Preliminary results are presented that demonstrate the feasibility of the system.

1. INTRODUCTION
Understanding the impact of society on the natural world has become a central theme in scientific research and government policy formation. U.S.
government agencies and international organizations are increasing the amount of resources allocated to studying a wide range of environmental and ecological topics, including carbon cycle modeling, global warming, biodiversity and resource management. To better understand the impact of these programs, high-resolution terrain models and attribute maps over ecologically sensitive areas need to be quickly and economically generated. The primary goal of our research and development program is to enhance the ability of ecology and resource management programs to acquire, analyze and distribute high-resolution maps of important environmental attributes. This paper describes a remote sensing system under development by Aerial Vision Inc. (AVI) for collecting high-quality, high-resolution digital images and metadata, and creating geographically registered high-resolution 3D terrain models and attribute maps. The goal is to provide a user-friendly end-to-end system that will enable scientists and engineers with a minimum of specialized training in photogrammetry and computer vision to quickly and inexpensively generate high-quality GIS attribute layers. The system is based on several years of research in computer science and resource management at the University of Massachusetts, Amherst [Schultz 99b, 99c, 02b; Slaymaker 99].

Copyright © 2004 CRC Press, LLC

2. GENERAL CONSIDERATIONS
The process of creating 3D terrain models and attribute layers involves a wide range of technologies, from camera calibration to softcopy photogrammetry. This paper examines each step in the process with the goal of determining a proper mix of off-the-shelf components and new technologies for building a practical system.
In theory, a 3D terrain model can be reconstructed from a collection of images if the camera(s) are calibrated so that the orientation of the image rays is known relative to the camera coordinate system, and the position and orientation (pose) of the camera, for each exposure, is known relative to a fixed world coordinate system. The intrinsic camera parameters are determined in the laboratory. The camera pose, however, must be determined from metadata collected during data acquisition. Our approach uses an optimal filtering technique to produce accurate estimates of camera pose from reliable but noisy estimates from low-cost instruments. Softcopy photogrammetry algorithms further reduce the measurement error to a level that permits the generation of geographically registered 3D terrain models.

Camera Calibration. Camera calibration is an essential element that is often overlooked in the construction of low-cost systems. The purpose of geometric camera calibration is to build a mathematical model that defines the orientation of the rays that emanate from each pixel. Traditional calibration procedures, based on this philosophy, characterize the camera and lens separately. These methods work well provided that great care is taken in the manufacture and handling of the camera and lenses. For low-cost cameras, however, it is not possible to consider the components separately. Instead, the camera and lens must be treated as one unit. The intrinsic camera parameters (focal length, image center and lens distortion) are determined using the calibration procedures developed at the University of Massachusetts, Amherst [Kovalenko 02].

GeoSensor Networks

3. INSTRUMENTATION AND DATA ACQUISITION
A system for collecting multi-spectral images and navigation metadata is currently under development. The system is built from commercial off-the-shelf (COTS) parts and is designed to be maintained and operated with a minimum of specialized training.
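As an illustration of the geometric calibration model discussed under Camera Calibration in Section 2, the following minimal sketch maps a pixel to a unit ray in the camera coordinate system from a focal length, an image center and a single radial distortion coefficient. The one-coefficient distortion model, the function names and the numerical values are assumptions made for illustration only; they are not taken from the calibration procedure of [Kovalenko 02].

```python
import math


def pixel_to_ray(u, v, f, cx, cy, k1=0.0, iterations=20):
    """Return the unit ray (x, y, z) in the camera frame for pixel (u, v).

    Assumed model (hypothetical, for illustration):
      f      focal length in pixels
      cx, cy image center in pixels
      k1     single radial distortion coefficient, applied as
             x_distorted = x_undistorted * (1 + k1 * r_undistorted**2)
    """
    # Normalized (distorted) image coordinates.
    xd = (u - cx) / f
    yd = (v - cy) / f
    # Invert the distortion by fixed-point iteration; this converges
    # when the distortion is mild, which is assumed here.
    xu, yu = xd, yd
    for _ in range(iterations):
        scale = 1.0 + k1 * (xu * xu + yu * yu)
        xu, yu = xd / scale, yd / scale
    norm = math.sqrt(xu * xu + yu * yu + 1.0)
    return (xu / norm, yu / norm, 1.0 / norm)


def ray_to_pixel(ray, f, cx, cy, k1=0.0):
    """Project a camera-frame ray back to a pixel with the same model."""
    x, y = ray[0] / ray[2], ray[1] / ray[2]
    scale = 1.0 + k1 * (x * x + y * y)
    return (f * x * scale + cx, f * y * scale + cy)
```

Treating the camera and lens as one unit corresponds, in this sketch, to estimating f, cx, cy and k1 jointly for a specific camera and lens pair rather than taking any of them from a manufacturer's data sheet.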
To ensure optimal flexibility, the system is designed so that it can be mounted on Cessna 172/182 aircraft, which are readily available for rent at most general aviation airports in most countries. The basic system components are shown in Figure 1 and described below.

Figure 1: A data flow diagram of the navigation, instrumentation and data acquisition systems. The signal received by the GPS antenna is split between the GPS receiver and the UTC time code generator. The GPS receiver, AHRS and dual return laser altimeter download data to the system through an RS-232 interface. The digital cameras download data through a high-speed Camera Link port on the frame grabbers, which write the data directly to disk through an onboard SCSI controller. The GPS receiver, AHRS, laser altimeter, and digital cameras receive setup and control function commands from the system through the RS-232 communications ports. The pilot is guided by a light bar and flight computer.

Laser Altimeter. The system incorporates a profiling laser altimeter, manufactured by Laser Atlanta [http://www.laseratlanta.com]. The primary function of the altimeter is to measure the instantaneous height of the aircraft above the terrain at approximately 240 Hz.

Navigation System. During data collection, the navigation system guides the pilot along a predetermined grid pattern, and provides a stream of position estimates (at 5 Hz) to the data acquisition system. The system incorporates an Ag-Nav navigation system, which was originally designed for crop spraying applications. The system is capable of locating the instantaneous position of the aircraft to sub-meter accuracy at 5 Hz.

Attitude and Heading Reference System (AHRS).
A Watson Industries AHRS E304 Attitude and Heading Reference System [http://www.watson-gyro.com/products/ahrs.html] is used to determine the camera attitude and heading (where the camera is pointing) to an accuracy of 0.01°, and 0.1° relative to the vertical plane.

Data Acquisition System. The data acquisition system platform is a standard 3 GHz Pentium workstation. The laser altimeter, AHRS, and GPS receiver generate ASCII data streams at approximately 250 Hz, 17 Hz and 5 Hz, respectively. The data rate from these instruments is slow enough to be read by a standard RS-232 port. The digital cameras, on the other hand, generate data at a significantly higher rate of approximately 80 MB/sec and use a standard high-speed Camera Link interface. Because each instrument operates asynchronously and transmits data at a unique rate, the data from each instrument must be time tagged before it is written to disk.

4. DATA PROCESSING
So far the discussion has focused on procedures for acquiring images and metadata. In the remaining sections, the discussion turns to the process of converting the raw data to high-resolution 3D terrain models.

Pre-processing. The pre-processing procedure provides a simple automatic means for interfacing the raw data to a commercial GIS/softcopy software package. After the data disks are returned to the laboratory, the data are processed by a suite of pre-processing programs that automatically clean and format the raw data.

Camera Pose Recovery. Camera pose is estimated in a two-step process. First, the data are unpacked, checked for validity and converted to engineering units; the GPS, AHRS, laser altimeter, and image data are then assembled in time-coded tabular form, where each record contains the derived observation and the time of the observation.
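The assembly of asynchronous, time-tagged streams into one table can be sketched as follows. This is a minimal illustration, assuming each stream is a list of (time, value) pairs sorted by time and that linear interpolation to the exposure time is adequate; the function names, stream names and sample values are hypothetical and do not describe the actual pre-processing programs.

```python
from bisect import bisect_left


def interpolate_at(samples, t):
    """Linearly interpolate a time-sorted list of (time, value) pairs at time t.

    Note: plain linear interpolation is not appropriate for angles that
    wrap around (for example heading near 0/360 degrees); such streams
    would need to be unwrapped first.
    """
    if not samples:
        raise ValueError("empty stream")
    times = [time for time, _ in samples]
    if t < times[0] or t > times[-1]:
        raise ValueError("time outside the recorded interval")
    i = bisect_left(times, t)  # first sample at or after t
    if times[i] == t:
        return samples[i][1]
    t0, v0 = samples[i - 1]
    t1, v1 = samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def build_exposure_table(exposure_times, streams):
    """Return one record per exposure with every stream resampled to it.

    streams maps a (hypothetical) stream name to its list of samples.
    """
    table = []
    for t in exposure_times:
        record = {"time": t}
        for name, samples in streams.items():
            record[name] = interpolate_at(samples, t)
        table.append(record)
    return table
```

For example, `build_exposure_table([0.002], {"altitude_m": altimeter, "easting_m": gps_east})` would return a single record holding the altimeter and GPS values interpolated to the exposure at 0.002 s, regardless of the different rates at which the two instruments report.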
Next, from the independent noisy observations, an estimate of the camera pose for each exposure is derived using a Kalman-Bucy filter [Kalman 61, Gelb 74], which is often used in navigation systems to assimilate position and orientation information from a global positioning system (GPS) and inertial measurement unit (IMU) [Lin 91, Cook 94, Seckel 64]. In our system, photogrammetric analysis of the motion imagery provides additional observations of the aircraft motion [Sim 96]. The camera pose estimates are used to initialize the block bundle adjustment procedure of the softcopy photogrammetry routine. It is not necessary to measure the precise camera pose during flight. Instead, the navigation instruments must simply provide sufficiently accurate estimates of camera pose to initialize the softcopy photogrammetry system.

DEM Extraction. Seamless feature maps are created from a collection of small format digital images by projecting each image onto a digital elevation map (DEM). This process requires precise knowledge of the camera pose at each exposure and the shape of the terrain [Maune 01]. The DEM may be extracted from a database, such as the ones provided by the USGS, or generated from the recorded images using softcopy photogrammetry techniques. For low-resolution projects, an existing DEM may be acceptable. For high-resolution projects requiring sub-meter registration or a dense array of postings, however, the DEM must be extracted from the recorded images and metadata [Rodriguez 90]. We will implement the Terrest methodology [Schultz 94, 95, 02a] to recover high-resolution topography from a sequence of spatially overlapping digital images. Terrest has the capability of fusing the overlapping information captured by a sequence of images into a single composite DEM. The algorithm uses the overlapping information to estimate the optimal elevation, geospatial uncertainty and a reliability figure of merit for a dense array of points [Leclerc 98a, b; Schultz 99b, 02a].
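The idea of fusing overlapping elevation estimates into one value with an associated uncertainty can be illustrated with textbook inverse-variance weighting. This sketch is not the Terrest algorithm, whose details are not given here; the function name, the assumption of independent Gaussian errors, and the use of a reduced chi-square as a stand-in for a reliability figure of merit are all assumptions made for illustration.

```python
import math


def fuse_elevations(estimates):
    """Fuse independent elevation estimates for a single DEM post.

    estimates: list of (elevation_m, sigma_m) pairs, one per overlapping
    image pair (hypothetical input format).
    Returns (fused_elevation_m, fused_sigma_m, consistency), where
    consistency is the reduced chi-square of the inputs about the fused
    value; values far above 1 suggest the inputs disagree by more than
    their stated uncertainties allow.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    weights = [1.0 / (sigma * sigma) for _, sigma in estimates]
    weight_sum = sum(weights)
    fused = sum(w * z for w, (z, _) in zip(weights, estimates)) / weight_sum
    fused_sigma = math.sqrt(1.0 / weight_sum)
    dof = len(estimates) - 1
    chi2 = sum(w * (z - fused) ** 2 for w, (z, _) in zip(weights, estimates))
    consistency = chi2 / dof if dof > 0 else 0.0
    return fused, fused_sigma, consistency
```

With two estimates of equal uncertainty the fused elevation is their mean and the fused standard deviation shrinks by a factor of the square root of two; with unequal uncertainties the more precise estimate dominates.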
A high-resolution DEM and a corresponding ortho-image generated from the images are shown in Figure 2. The ground sampling distance for the DEM is 10 cm. The ortho-image is a false color image with the near infrared, red and green bands encoded in the red, green and blue color channels. The scene covers an area of approximately 100 by 150 meters. Note that individual trees (and tree gaps) are clearly visible, and that the terrain slopes downhill from right to left.

Figure 2: A high-resolution DEM and a corresponding ortho-image generated from two overlapping multi-spectral digital images. The ground sampling distance for the DEM is 10 cm.

5. CONCLUSION
Preliminary results have demonstrated the feasibility of building a low-cost, portable aerial imaging and analysis system capable of generating high-resolution 3D terrain models in the form of GIS layers. The instrumentation, data acquisition and navigation systems are built from commercial off-the-shelf parts. We have demonstrated the feasibility of generating accurate estimates of camera pose from inexpensive navigation instruments by integrating optimal filtering and standard softcopy photogrammetry techniques.

ACKNOWLEDGMENTS
The work is supported by a grant (DMI-0232361) from the National Science Foundation.

REFERENCES
Blazquez, C.H., 1989. Computer-Based Image Analysis and Tree Counting with Aerial Color Infrared Photography. Journal of Imaging Technology, Vol. 15(4), pp. 163-168.
Chester C.S., T. Charles, and W.H. Soren (Editors), 1980. Manual of Photogrammetry, Fourth edition. American Society for Photogrammetry and Remote Sensing, Falls Church, VA.
Cook, M. and M. Rycroft, 1994. Aerospace Vehicle Dynamics and Control, Clarendon Press, Oxford.
Gelb, A., 1974. Applied Optimal Estimation, M.I.T. Press, Cambridge, MA.
Kalman, R. and R. Bucy, 1961. New Results in Filtering and Prediction Theory, J.
Basic Eng., Ser. D, Vol. 83, pp. 95-108.
Kovalenko, S., 2002. Camera Calibration for Generating Mosaics, M.S. Project, University of Massachusetts, Amherst, Computer Science Department. http://vis-www.cs.umass.edu/~kovalenk/thesis/ms.html.
Leclerc, Y.G., Q.T. Luong, and P. Fua, 1998. A Framework for Detecting Changes in Terrain, IEEE Trans. Pattern Analysis and Machine Intel., Vol. 20(11), pp. 1143-1160.
Leclerc, Y.G., Q.T. Luong et al., 1998. Self-consistency: A Novel Approach to Characterizing the Accuracy and Reliability of Point Correspondence Algorithms, DARPA Image Understanding Workshop, Morgan Kaufmann.
Lin, C-F., 1991. Modern Navigation, Guidance, and Control Processing, Prentice-Hall, Englewood Cliffs, NJ.
Maune, D. (Editor), 2001. Digital Elevation Model Technologies and Applications: The DEM Users Manual, American Society for Photogrammetry and Remote Sensing.
Rodriguez, J.J. and J.K. Aggarwal, 1990. Matching Aerial Images to 3-D Terrain Maps, IEEE Trans. Pattern Analysis and Machine Intel., Vol. 12(12).
Schultz, H., A.R. Hanson, E.M. Riseman, F.R. Stolle, D. Woo, and Z. Zhu, 2002. A Self-consistency Technique for Fusing 3D Information. Invited talk at the IEEE 5th Int. Conference on Information Fusion, Annapolis, MD.
Schultz, H., A.R. Hanson, E.M. Riseman, and F.R. Stolle, 2002. Rapid Updates of GIS Databases from Digital Images. National Conference for Digital Government Research, Los Angeles, CA.
Schultz, H., E.M. Riseman, F.R. Stolle, and D.M. Woo, 1999. Error Detection and DEM Fusion Using Self-Consistency, 7th IEEE Int. Conference on Computer Vision, Kerkyra, Greece, Vol. 2, pp. 1174-1181.
Schultz, H., A. Hanson, C. Holmes, E. Riseman, D. Slaymaker, and F. Stolle, 1999. Integrating Small Format Aerial Photography, Videography, and a Laser Profiler for Environmental Monitoring, ISPRS WG III/1 Workshop on Integrated Sensor Calibration and Orientation, Portland, ME.
Schultz, H., D.
Slaymaker, A. Hanson, E. Riseman, C. Holmes, M. Powell, and M. Delaney, 1999. Cost-Effective Determination of Biomass from Aerial Images, International Workshop on Integrated Spatial Data, Portland, ME.
Schultz, H., 1995. Terrain Reconstruction from Widely Separated Images, Proc. SPIE, Vol. 2486, pp. 113-123, Orlando, FL.
Schultz, H., 1994. Terrain Reconstruction from Oblique Views, Proc. DARPA Image Understanding Workshop, Monterey, CA, pp. 1001-1008.
Seckel, E., 1964. Stability and Control of Airplanes and Helicopters, Academic Press, NY.
Sim, D.G., S.Y. Jeong, R.H. Park, R.C. Kim, S.U. Lee, and I.C. Kim, 1996. Navigation Parameter Estimation from Sequential Aerial Images, Int. Conference on Image Processing, Vol. 1, pp. 629-632.
Slaymaker, D., H. Schultz, A. Hanson, E. Riseman, C. Holmes, M. Powell, and M. Delaney, 1999. Calculating Forest Biomass With Small Format Aerial Photography, Videography and a Profiling Laser, ASPRS Proc. of the 17th Biennial Workshop on Color Photography and Videography in Resource Assessment, Reno, NV.