Handbook of Research on Geoinformatics

Hassan A. Karimi
University of Pittsburgh, USA

Information Science Reference
Hershey • New York

Director of Editorial Content: Kristin Klinger
Director of Production: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Typesetter: Jeff Ash
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.

Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com

and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com

Copyright © 2009 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.

Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Handbook of research on geoinformatics / Hassan A. Karimi, editor.
p. cm.
Includes bibliographical references and index.
Summary: "This book discusses the complete range of contemporary research topics such as computer modeling, geometry, geoprocessing, and geographic information systems" (provided by publisher).
ISBN 978-1-59904-995-3 (hardcover)
ISBN 978-1-59140-996-0 (ebook)
Geographic information systems. Research. Handbooks, manuals, etc. I. Karimi, Hassan A.
G70.212.H356 2009
910.285 dc22
2008030767

British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.

All
work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of the publisher.

If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating the library's complimentary electronic access to this publication.

Editorial Advisory Board

Yvan Bédard, Université Laval, Canada
Matt Duckham, University of Melbourne, Australia
Michael Gould, Universitat Jaume I, Spain
Stephen Hirtle, University of Pittsburgh, USA
May Yuan, University of Oklahoma, USA
Alexander Zipf, University of Applied Sciences, Germany

List of Contributors

Aditya, Trias / Gadjah Mada University, Indonesia 42
Argyreas, Nikolaos / National Center of Scientific Research “Demokritos”, Greece 422
Arpinar, Ismailcem Budak / University of Georgia, USA 161
Bai, Yuqi / George Mason University, USA 171, 213, 222
Bernabé, Miguel Ángel / Technical University of Madrid, Spain 36
Chandramouli, Magesh / Purdue University, USA 137, 320
Chen, Aijun / George Mason University, USA 171, 213, 222
Córcoles, Jose E. / Castilla La-Mancha University, Spain 1, 11
Corral, Antonio / University of Almería, Spain 20
Curtin, Kevin M. / George Mason University, USA 113, 246
D’Ulizia, Arianna / Consiglio Nazionale delle Ricerche, IRPPS, Italy 340
de la Osa, Maikel Garma / University of Havana, Cuba 65
Delmelle, Eric / University of North Carolina at Charlotte, USA 89
Dezzani, Raymond / University of Idaho, USA 89
Di, Liping / George Mason University, USA 171, 178, 196, 205, 213, 222
Duckham, Matt / University of Melbourne, Australia 254
Esbrí, Miguel Ángel / Universitat Jaume I, Spain 189
Ferri, Fernando / Consiglio Nazionale delle Ricerche, Italy 340
Gardarin, Georges / PRiSM Laboratory, France 350
Gillavry, Edward Mac / Webmapper, The Netherlands 388
Gontran, Hervé / Swiss Federal Institute of Technology (EPFL), Switzerland and JM Vuadens SA, Switzerland 51
González, Pascual / Castilla La-Mancha University, Spain 1, 11
Gould, Michael / Universitat Jaume I, Spain 36, 100, 311
Granell, Carlos / Universitat Jaume I, Spain 36, 189
Grifoni, Patrizia / Consiglio Nazionale delle Ricerche, IRPPS, Italy 340
Hakimpour, Farshad / University of Georgia, USA 161
Hanke, Henrik / University of Duisburg-Essen, Germany 269
Hansen, Stefan / Spatial/Information Systems Ltd./LISAsoft, Australia 230
Hegedüs, Péter / Budapest University of Technology and Economics, Hungary 239
Hirtle, Stephen / University of Pittsburgh, USA 58
Hosszú, Gábor / Budapest University of Technology and Economics, Hungary 239
Huang, Bo / Chinese University of Hong Kong, China 137, 320
Iqbal, Muhammad Usman / University of New South Wales, Australia 293
Kakaletris, George / University of Athens, Greece 433
Kathlene, Lyn / Colorado State University, USA 369
Katsianis, Dimitris / University of Athens, Greece 433
Klippel, Alexander / University of Melbourne, Australia 230
Kovács, Ferenc / Budapest University of Technology and Economics, Hungary 239
Kraak, Menno-Jan / International Institute of Geo-Information Science and Earth Observation (ITC), The Netherlands 42
Ku, Wei-Shinn / Auburn University, USA 285
Lazar, Alina / Youngstown State University, USA 106
Liao, Guangxuan / George Mason University, USA 222
Lim, Samsung / University of New South Wales, Australia 293
Liu, Yang / George Mason University, USA 171
Manso, Miguel Ángel / Technical University of Madrid, Spain 36
Meenar, Mahbubur R. / Temple University, USA 73, 277
Misra, Santosh K. / Cleveland State University, USA 400
Neumann, Alf / University of Cologne, Germany 269
Núñez-Rodríguez, Yurai / Queen’s University, Canada 82
Orosz, Mihály / Budapest University of Technology and Economics, Hungary 239
Pazos, Andrés / Universitat Jaume I, Spain 311
Perry, Matthew / University of Georgia, USA 161
Poveda, José / University of Texas, USA 100, 311
Quddus, Mohammed A. / Loughborough University, UK 302
Rachev, Boris / Technical University of Varna, Bulgaria 20
Richter, Kai-Florian / Universität Bremen, Germany 230
Sánchez, Yissell Arias / University of Havana, Cuba 65
Savary, Lionel / PRiSM Laboratory, France 350
Shellito, Bradley A. / Youngstown State University, USA 106
Sheth, Amit / University of Georgia, USA 161
Sikder, Iftikhar U. / Cleveland State University, USA 154, 332, 400
Skogster, Patrik / Rovaniemi University of Applied Sciences, Finland 28
Sorrentino, John A. / Temple University, USA 73, 277
Sphicopoulos, Thomas / University of Athens, Greece 433
Stoeva, Mariana / Technical University of Varna, Bulgaria 20
Thomopoulos, Stelios C. A. / National Center of Scientific Research “Demokritos”, Greece 422
Valova, Irena / University of Rousse, Bulgaria 20
Varoutas, Dimitris / University of Athens, Greece 433
Vassilakopoulos, Michael / University of Central Greece, Greece 20
Wang, Haojun / University of Southern California, USA 285
Wei, Yaxing / George Mason University, USA 171, 213, 222
Yang, Wenli / George Mason University, USA 178, 196, 205
Yang, Xiaojun / Florida State University, USA 122, 129
Yemsin, Sharmin / Temple University, USA 277
Yu, Genong / George Mason University, USA 178, 196, 205
Yuan, May / University of Oklahoma, USA 144
Yue, Peng / George Mason University, USA & Wuhan University, China 178, 196, 205
Zadorozhny, Vladimir I. / University of Pittsburgh, USA 260
Zeitouni, Karine / PRiSM Laboratory, France 350
Zhao, Baohua / University of Science and Technology China, China 222
Zhao, Peisheng / George Mason University, USA 178, 196, 205
Zimmermann, Roger / National University of Singapore, Singapore 285

Table of Contents

Preface

Section I
Spatial Databases

Chapter I. GML as Database: Present and Future
    Jose E. Córcoles, Castilla La-Mancha University, Spain
    Pascual González, Castilla La-Mancha University, Spain

Chapter II. Querying GML: A Pressing Need
    Jose E. Córcoles, Castilla La-Mancha University, Spain
    Pascual González, Castilla La-Mancha University, Spain

Chapter III. Image Database Indexing Techniques
    Michael Vassilakopoulos, University of Central Greece, Greece
    Antonio Corral, University of Almería, Spain
    Boris Rachev, Technical University of Varna, Bulgaria
    Irena Valova, University of Rousse, Bulgaria
    Mariana Stoeva, Technical University of Varna, Bulgaria

Chapter IV. Different Roles and Definitions of Spatial Data Fusion
    Patrik Skogster, Rovaniemi University of Applied Sciences, Finland

Chapter V. Spatial Data Infrastructures
    Carlos Granell, Universitat Jaume I, Spain
    Michael Gould, Universitat Jaume I, Spain
    Miguel Ángel Manso, Technical University of Madrid, Spain
    Miguel Ángel Bernabé, Technical University of Madrid, Spain

Chapter VI. Geoportals and the GDI Accessibility
    Trias Aditya, Gadjah Mada University, Indonesia
    Menno-Jan Kraak, International Institute of Geo-Information Science and Earth Observation (ITC), The Netherlands

Chapter VII. Real-Time Extraction of the Road Geometry
    Hervé Gontran, Swiss Federal Institute of Technology (EPFL), Switzerland and JM Vuadens SA, Switzerland

Section II
Mapping and Visualization

Chapter VIII. Cognitive Maps
    Stephen Hirtle, University of Pittsburgh, USA

Chapter IX. Map Overlay Problem
    Maikel Garma de la Osa, University of Havana, Cuba
    Yissell Arias Sánchez, University of Havana, Cuba

Chapter X. Dealing with 3D Surface Models: Raster and TIN
    Mahbubur R. Meenar, Temple University, USA
    John A. Sorrentino, Temple University, USA

Chapter XI. Web Map Servers Data Formats
    Yurai Núñez-Rodríguez, Queen’s University, Canada

Chapter XII. Overview, Classification and Selection of Map Projections for Geospatial Applications
    Eric Delmelle, University of North Carolina at Charlotte, USA
    Raymond Dezzani, University of Idaho, USA

Section III
Analysis

Chapter XIII. About the Point Location Problem
    José Poveda, University of Texas, USA
    Michael Gould, Universitat Jaume I, Spain

Chapter XIV. Classification in GIS Using Support Vector Machines
    Alina Lazar, Youngstown State University, USA
    Bradley A. Shellito, Youngstown State University, USA

Chapter XV. Network Modeling
    Kevin M. Curtin, George Mason University, USA

Chapter XVI. Artificial Neural Networks
    Xiaojun Yang, Florida State University, USA

Chapter XVII. Spatial Interpolation
    Xiaojun Yang, Florida State University, USA

Chapter XVIII. Spatio-Temporal Object Modeling
    Bo Huang, Chinese University of Hong Kong, China
    Magesh Chandramouli, Purdue University, USA

Chapter XIX. Challenges and Critical Issues for Temporal GIS Research and Technologies
    May Yuan, University of Oklahoma, USA

Chapter XX. Rough Sets and Granular Computing in Geospatial Information
    Iftikhar U. Sikder, Cleveland State University, USA

Section IV
Ontologies

Chapter XXI. Geospatial and Temporal Semantic Analytics
    Matthew Perry, University of Georgia, USA
    Amit Sheth, University of Georgia, USA
    Ismailcem Budak Arpinar, University of Georgia, USA
    Farshad Hakimpour, University of Georgia, USA

Chapter XXII. Geospatial Image Metadata Catalog Services
    Yuqi Bai, George Mason University, USA
    Liping Di, George Mason University, USA
    Aijun Chen, George Mason University, USA
    Yang Liu, George Mason University, USA
    Yaxing Wei, George Mason University, USA

Chapter I
GML as Database: Present and Future

Jose E. Córcoles
Castilla La-Mancha University, Spain

Pascual González
Castilla La-Mancha University, Spain

Abstract

An interesting feature of GML is that it can be considered a database, but only in the strictest sense of the term: a collection of data. As a database format, it can be queried. To do this, we need a query language with spatial operators. In addition, in order to use any query language over GML, it is necessary to find an implementation that allows all its features to be exploited; that is, efficient storage of GML documents is necessary. The general aim of this chapter is to discuss different approaches for storing and querying GML documents. To achieve this aim, we discuss well-known approaches to the storage of
XML documents (with only alphanumeric data) and their application to GML documents. Although there are many approaches to storing and retrieving XML documents with only alphanumeric features, few approaches are applicable to querying GML documents.

Introduction

An interesting feature of eXtensible Markup Language (XML) (W3C, 2005) is that it can be considered a database, but only in the strictest sense of the term: a collection of data. In many ways, this makes it no different from any other file. As a database format, XML has several advantages. For example, it is self-describing (the markup describes the structure and type names of the data, although not the semantics), it is portable (Unicode), and it can describe data in tree or graph structures. This point of view opens up a new set of XML applications, all of them involving storage and retrieval of information represented by XML.

One example of XML as a database is a restaurant catalog. It could be defined with alphanumeric features, for example name, phone number, address, and capacity. An advantage of XML is that the data is portable, and it can easily be manipulated for inserting, updating, and deleting information.

Another example could be an extension of the previous one. Besides representing alphanumeric information, the authors can include spatial features. Thus, they include a polygon (spatial coordinates) to represent the parcel where the restaurant is located. In this case, our set of data with alphanumeric features (name, number, etc.)
and spatial feature (parcel) would be a Geography Markup Language (GML) document (Open Geospatial Consortium, 2003), instead of an XML document.

Since XML (and GML by extension) is a database, it can be queried. To do this, we need a general-purpose query language to retrieve information from an XML document. Nevertheless, it is necessary to enrich the query language over XML features with spatial operators if we wish to apply it over spatial data encoded with GML. Otherwise, the query language could only be used to query alphanumeric features of an XML document and not, for example, the topological relationship between two spatial regions.

Today, there is a large set of query languages over XML. These query languages differ in syntax, available operators and environment of applicability. However, they share the same features, that is, the features of query languages over semi-structured data. This is because XML is not structured data, but instead has a flexible structure (Abiteboul et al., 1997).

More specifically, in order to use any query language over GML (Córcoles and González, 2001), it is necessary to find an implementation that allows all its features to be exploited; that is, efficient storage of GML documents is necessary. Although there are many approaches to storing and retrieving XML documents with only alphanumeric features (McHugh et al., 1997; Yoshikawa and Amagasa, 2001; Bohannon et al., 2002), few approaches are applicable to querying GML documents (Córcoles and González, 2002; Huang et al., 2006).

Background

The general aim of this chapter is to discuss different approaches for storing and querying GML documents. To achieve this aim, we discuss well-known approaches to the storage of XML documents (with only alphanumeric data) and their application to GML documents. In the following sections we show that this is not a trivial problem because, due to the resources required to query and store spatial elements, appropriate
XML-based approaches with alphanumeric operators do not obtain good results when combined with spatial operators. Furthermore, some XML-based approaches are not applicable to GML documents.

Many approaches to storing and retrieving XML documents have been implemented to date, and several database management systems for storing XML documents have been developed (e.g., McHugh et al., 1997). There are also several approaches based on the relational model or the object-oriented model. When XML documents are stored in off-the-shelf database management systems, the problem of storage model design for XML data becomes a database schema design problem. Yoshikawa and Amagasa (2001) divide such database schemas into two approaches: structure-mapping and model-mapping. In the former, the design of the database schema is based on the understanding of the Document Type Definition (DTD) or XML Schema that describes the structure of the XML documents (e.g., Bohannon et al., 2002; Kappel et al., 2000; Lee and Chu, 2000; Huang et al., 2006; Klettke and Meyer, 2000; Shanmugasundaram et al., 1999). In the latter, a fixed database schema is used to store any XML document without the assistance of an XML Schema (e.g., Jiang et al., 2002; Yoshikawa and Amagasa, 2001; Kanne and Moerkotte, 2000; Schmidt et al., 2000; Florescu and Kossmann, 1999). In addition, commercial and open source databases support particular alternatives for storing XML documents. Examples include Oracle 10g, Tamino, DBXML, X-Hive, Excelon, DB2, Apache Xindice, eXist and Ozone/XML.

Approaches to storing GML documents have been developed recently. Zhu et al. (2006) propose an approach that maps a GML schema to an object-relational database schema by using a GML schema graph, together with algorithms for storing valid GML documents into, and querying them from, the relations generated by the corresponding object-relational schema. Shrestha (2004) studied various XML technologies and their use for GML by experimenting with several approaches to XML storage with both
structured storage and unstructured storage in XML-enabled database systems, native XML database systems and hybrid storage. Jeung and Park (2004) and Li et al. (2004) proposed GML storage and query methods that extend spatial databases such as PostgreSQL/SPE and Oracle Spatial.

In the next section, we discuss approaches based on relational databases to store GML documents. In this way, we can use a complete set of data management services (including concurrency control, crash recovery, scalability, etc.) and benefit from the highly optimised relational query processor. In addition, the Relational Database Management System (RDBMS) allows us to store spatial objects according to the possibilities offered in Open Geospatial Consortium (1999). The disadvantage of these approaches is that relational databases are built to support structured data, and the requirements of processing XML (semi-structured) data are vastly different from the requirements of processing traditional data.

Approaches

Of all the alternatives for storing XML documents, three approaches are discussed:
(1) LegoDB (Bohannon et al., 2002), inspired by the solution proposed by Shanmugasundaram et al. (1999) (a structure-mapping approach);
(2) Monet (Schmidt et al., 2000), a model-mapping approach;
(3) XParent (Jiang et al., 2002), a model-mapping approach.

These approaches have been selected because they possess four important features:
(i) they perform well over XML documents with alphanumeric data (no spatial data);
(ii) they support, or could easily be updated to store, spatial information; efficient storage of spatial information offers the chance to create a spatial index for spatial objects (which is not trivial in XML);
(iii) they can be modified to store spatial objects in line with the possibilities offered in Open Geospatial Consortium (2003);
(iv) they are not dependent on a particular RDBMS.

XRel (Yoshikawa and Amagasa, 2001) is not included in this list because
although it has the second feature, it performs worse than XParent (Jiang et al., 2002). The Edge approach (Florescu and Kossmann, 1999) is not included because it stores all values in the same column and so needs type coercion in order to compare values of different types in a query. Commercial solutions do not satisfy some of the requirements mentioned above: storing GML documents in Oracle Large Objects (LOBs) does not allow spatial information to be indexed correctly, as is necessary to retrieve spatial information efficiently (Samet, 1990), and the absence of a definitive standard for these alternatives makes it impossible to use object-relational/relational storage without depending on a commercial database.

Figure 1 shows a simplified data model of a GML document representing a City Model. This example data model was developed by Córcoles and González (2001). The symbols within the circles are unique identifiers of each vertex. In this document, a city has several blocks and each block has several parcels. This data graph is used to detail the following approaches.

Figure 1. Example of data graph

LegoDB (1st Approach)

The LegoDB solution (Bohannon et al., 2002) is a structure-mapping approach, and the database schema design is based on the understanding of the DTD (XML Schema) which describes the structure of XML (GML) documents. This approach processes a DTD/XML Schema to generate a relational schema. In general, the LegoDB mapping engine creates a table for each such type (e.g., Citymodel) and maps the contents of the elements (name, population, extentof, etc.)
to columns in the tables. The mapping generates a key column that contains the id of the corresponding element, and a foreign key that keeps track of the parent-child relationship. For the same DTD/XML Schema there is a very large number of possible rewritings; the approach focuses on a limited set of rewritings that correspond to interesting storage alternatives (Bohannon et al., 2002). Figure 2 shows an example over the data graph in Figure 1. It is the simplest mapping proposed, usually called inlining, and it was advocated as one of the main heuristics in Shanmugasundaram et al. (1999). This approach respects the data types defined. Thus, no modification is needed to store spatial objects, because a complex object can be mapped directly to a column and a spatial index created for each such column.

Monet (2nd Approach)

As a variation of the Edge approach (Florescu and Kossmann, 1999), Monet (Schmidt et al., 2000) stores XML data graphs in multiple tables. In other words, Monet partitions the Edge table according to all possible label-paths. For each unique path, Monet creates a table. For example, the leftmost path in Figure 1 needs to be stored in three tables, namely state, state.name and state.name.cdata. The first two are for the element nodes and the last one is for the text node. For element nodes, the corresponding element table has three attributes: Source, Target and Ordinal. The Source and Target attributes together specify a unique edge in an XML data graph (the Source attribute is a foreign key referencing the Target attribute in the parent table). For text nodes (or attributes), Monet creates a table with two attributes, Id and Value. Unlike the Edge approach, there are no corresponding Label and Flag attributes. The table name implicitly specifies its label and type (flag). The number of tables equals the number of distinct label-paths. For the XML data graph in Figure 1, 32 tables could be obtained by the Monet approach. In the
same way as the LegoDB approach, no modification of the Monet approach is necessary for the storage of spatial objects, because Monet respects the value types and each value (simple or complex) is stored in a distinct column.

XParent (3rd Approach)

XParent (Jiang et al., 2002) is a four-table database schema, comprising LabelPath, DataPath, Element and Data:

LabelPath (Id, Len, Path)
DataPath (Pid, Cid)
Element (PathID, Did, Ordinal)
Data (PathID, Did, Ordinal, Value)

Table LabelPath stores label-paths. Each label-path, identified by a unique Id, is stored in the attribute Path. The number of edges of the label-path is recorded in the attribute Len. Because data-paths usually vary in length, and can be very long, the authors store pairs of node identifiers in the table DataPath. Here, Pid and Cid are the parent-node Id and child-node Id of an edge in the data-path. In the Element and Data tables, PathID is a foreign key referencing the Id in table LabelPath, which identifies the label-path of a node. The Did (for Data-Path Id) is a node identifier which also serves as a unique data-path identifier for the data-path ending at the node itself. DataPath keeps the parent-child relationships. Therefore, it also needs joins in order to check edge connections.

Figure 2. Example of LegoDB approach

This approach needs to be modified to support spatial objects efficiently. In the Data table all values are stored as text; that is, XParent converts values of different types (integer, string, real, etc.) into values of type string. Córcoles and González (2001) have eliminated this limitation by storing the spatial objects in different tables. In this way, the simple values continue to be stored in the Data table, and the complex objects (spatial objects) are stored in other tables (the number of tables equals the number of distinct label-paths with spatial objects, as in Monet). Thus, it is possible to store different spatial objects (state boundaryby, parcel extentof, etc.)
in different tables with different spatial indices. The structure of each new table is the same as that of the Data table, but the type of the Value column depends on the type of the spatial object to be stored. Figure 3 shows an example of data graph storage.

Query Processing

In this section, we briefly discuss how GML (XML) queries are translated into Structured Query Language (SQL) statements for the different mapping schemas, and the operations involved in the SQL statements. The general aim of this section is to show the most important operations needed to execute the same query in the different models. We want to highlight the joins that are necessary in each model. The join is the most important operation (it is very expensive), so great attention must be paid to it when designing a relational schema for storing XML documents.

An XML query is described below using the query language syntax of Córcoles and González (2001). In this mapping, we use the following syntax for the Area operator in SQL: Area(s Surface): Double Precision (Open Geospatial Consortium, 1999).

Example 1: Given the XML data graph in Figure 2, select the number of all parcels with area greater than 200.

Q1: select p.number
    from State.#.Parcel p
    where Area(p.extentof) > 200

Two label-paths are involved in this example, State.#.Parcel.Number and State.#.Parcel.Extentof (a simplification in our query language of State.CityMember.Block.BlockMember.Number and State.CityMember.Block.BlockMember.Extentof). Because all data is stored in tables in a relational database management system, for this query, the system needs to test all pairs (Number, Extentof). First, the pairs must be for the same Parcel. Second, the area of the Extentof must be greater than 200. Owing to the differences in the database schemas, to process this query, a first approximation of LegoDB uses selection and equijoins, Monet uses selection and equijoins, and XParent uses selections and equijoins.

SQL1: A translated SQL query for the
XML query Q1 using LegoDB

SQL1 shows the translated SQL query using the LegoDB schema. In this rewriting of LegoDB, all attributes of the parcel element are stored in the same table. For this reason, no joins are necessary. As mentioned above, inlining is used to represent the data graph of Figure 1. Inlining has some similarities with vertical partitioning. It reduces the need for joins when accessing the contents of an element, but it increases the size of the corresponding table.

Figure 3. Example of XParent approach

SQL2 shows the translated SQL query using the Monet schema. Monet uses a large number of small relations. It is important to know that, even for simple path queries, the number of joins is not small. In this simple example, Monet keeps all names and edges in two tables:

State.CityMember.Block.BlockMember.Parcel.Extentof.Cdata
State.CityMember.Block.BlockMember.Parcel.Number.Cdata

However, it needs to join

State.CityMember.Block.BlockMember.Parcel
State.CityMember.Block.BlockMember.Parcel.Number
State.CityMember.Block.BlockMember.Parcel.Extentof

where State.CityMember.Block.BlockMember.Parcel is the least upper bound of the two label-paths, State.CityMember.Block.BlockMember.Parcel.Number and State.CityMember.Block.BlockMember.Parcel.Extentof. Obviously, if the least upper bound ancestor is far away from the corresponding Cdata tables, more equijoins are needed.

SQL2: A translated SQL query for the XML query Q1 using Monet

SQL3 shows the translated SQL query using the XParent schema. This approach uses three joins to identify the three path identifiers. Then it uses three joins to check edge connections. In all, equijoins are used. In the same way as Monet, if the least upper bound ancestor is far away from the corresponding Data tables, more equijoins are needed between DataPath tables.

Conclusion

In this chapter, we have described three models for the storage of a GML document. These approaches are well-known storage models
for XML documents over an RDBMS. We have focused on RDBMS-based approaches because they have efficient storage and retrieval techniques and can make use of various indexing mechanisms to evaluate a query. In order to store complex objects (spatial objects), a modification of XParent has been implemented. Modification of the Monet and LegoDB models was not necessary.

Córcoles and González (2002) provide an overview of these approaches. This overview is focused on two important problems:
(i) In order to store XML (GML) documents over a relational model, a very large number of joins are necessary. This is one of the most important performance problems in the relational implementation of these query languages.
(ii) Storing and querying spatial data requires a larger amount of resources than storing and querying alphanumeric data.

SQL3: A translated SQL query for the XML query Q1 using XParent

The experiments show that the inclusion of spatial operators influences the performance of these approaches. The performance obtained with only alphanumeric operators varies greatly when we include spatial operators. We can therefore infer that the modification of XParent is not a good solution for spatial queries. Monet has good scalability in comparison with LegoDB, but its elapsed times are worse than those of LegoDB. In addition, the number of joins in Monet depends on the length of the path and the number of paths involved in the query. This limitation makes Monet a good solution only for querying spatial data in shallow documents and in queries with few attributes involved (two features not guaranteed in GML). LegoDB considerably reduces the number of joins and reaches an upper limit that does not depend on the attributes involved in the query.

In view of the results obtained, the best approach for the storage and retrieval of GML documents with our query language is LegoDB. Furthermore, we think the fact that LegoDB is a structure-mapping approach makes this solution more natural for storing GML,
because the XMLSchema (DTD) of a GML document has to be known Future work on GML should involve the development of specific GML database management systems for storing and retrieving GML (or part of it) efficiently (GML-Native) Today, research such as that of Huang et al (2006) is contributing to progress in this area R eferences Abiteboul, S., Quass, S., McHugh, J., Widom, J., & Wiener, J (1997) The Lorel Query Language for Semistructured Data International Journal on Digital Libraries, 1(1), 68-88 Bohannon, P., Freire, J., Roy, P., & Simeon, J (2002) From XML Schema to Relations: A Cost-Based Approach to XML Storage 18th International Conference on Data Engineering (ICDE2002) Córcoles, J E., & González, P (2001) A Specification of a Spatial Query Language over GML ACM-GIS 2001 9th ACM International Symposium on Advances in Geographic Information Systems Atlanta (USA) Córcoles, J E., & González, P (2002) Analysis of Different Approaches for Storing GML Documents ACM-GIS 2002 10th ACM International Symposium on Advances in Geographic Information Systems McLean (USA) Florescu, D., & Kossmann, D (1999) Storing and Querying XML Data Using a RDBMS Data Engineering Bulletin, 22, Huang C., Chiang, T Deng, D., & Lee, H (2006) Efficient GML-native processors for web-based GIS: techniques and tools ACM-GIS 2006 14th ACM International Symposium on Advances in Geographic Information Systems Arlington (USA) Jeung, H., & Park, S (2004) A GML data storage method for spatial databases Journal of GIS Association of Korea, 12(4), 307–319 Jiang, H., Lu, H., Wang, W., & Xu Yu, J (2002) Path Materialization Revisted: An efficient Storage Model for XML Data 2nd Australian Institute of Computer ethics Conference (AICE2000) Canberra Australia Kanne, C., & Moerkotte, G (2000) Efficient storage of XML data In proceedings of the international conference on Data engineering Kappel, G., Kapsammer, E., Raush.Schott, S., & Retschitzegger, W (2000) X-ray – towards integrating XML and relational 
database systems.
Klettke, M., & Meyer, H. (2000). Managing XML documents in Object-Relational databases. Workshop on the Web and Databases (WebDB).
Lee, D., & Chu, W. (2000). Constraints-preserving transformation from XML document type definition to relational schema. In Proceedings of the International Conference on Conceptual Modelling.
Li, Y., Li, J., & Zhou, S. (2004). GML storage: A spatial database approach. In Proc. of ER'04 Workshops: CoMoGIS, CoMWIM, ECDM, CoMoA, DGOV, and eCOMO, 55-66.
McHugh, J., Abiteboul, S., Goldman, R., Quass, D., & Widom, J. (1997). Lore: A database management system for semi-structured data. SIGMOD Record, 26(3).
Schmidt, A. R., Kersten, M. L., Windhouwer, M. A., & Waas, F. (2000). Efficient Relational Storage and Retrieval of XML Documents. Workshop on the Web and Databases (WebDB).
Shanmugasundaram, J., Tufte, K., He, G., Zhang, C., DeWitt, D., & Naughton, J. (1999). Relational Databases for Querying XML Documents: Limitations and Opportunities. In Proc. of the Int'l Conf. on Very Large Databases.
Shrestha, B. (2004). XML database technology and its use for GML. Master Thesis, International Institute for Geo-information Science and Earth Observation (ITC), The Netherlands.
Samet, H. (1990). Applications of Spatial Data Structures: Computer Graphics, Image Processing and GIS. Addison-Wesley.
Open Geospatial Consortium (1999). Simple Features Specification For SQL, 05-1341. Open Geospatial Consortium. Retrieved 13th January, 2005, from http://www.opengeospatial.org
Open Geospatial Consortium (2003). Geography Markup Language (GML). Retrieved 13th January, 2005, from http://www.opengis.org/techno/documents/02-023r4.pdf
W3C (2005). Extensible Markup Language (XML). Retrieved 13th January, 2005, from http://www.w3c.org/XML/
Yoshikawa, M., & Amagasa, T. (2001). XRel: A path-based approach to storage and retrieval of XML documents using relational databases. ACM Transactions on Internet Technology, 1(1).
Zhu, F., Guan, J., Zhou, J., & Zhou, S. (2006). Storing and
Querying GML in Object-Relational Databases. ACM-GIS 2006, 14th ACM International Symposium on Advances in Geographic Information Systems, Arlington (USA).

Key Terms

GML: Geography Markup Language. GML is an XML grammar written in XML Schema for the modelling, transport, and storage of geographic information.
Join Operator: Join is a dyadic operator that is written as R ⋈ S in relational algebra, where R and S are relations. The result of the join is the set of all combinations of tuples in R and S that are equal on their common attribute names. This operation is typically the most expensive one in an implementation.
SQL: Structured Query Language. SQL is the most popular computer language used to create, modify and retrieve data from relational database management systems.
XML: eXtensible Markup Language. XML is a W3C-recommended general-purpose markup language for creating special-purpose languages.

Chapter II
Querying GML: A Pressing Need

Jose E. Córcoles, Castilla La-Mancha University, Spain
Pascual González, Castilla La-Mancha University, Spain

Abstract

As a database format, XML (and GML by extension) can be queried. In order to do this, we need a general-purpose query language to retrieve information from an XML document. Nevertheless, this query language must be enriched with spatial operators if we wish to apply it to spatial data encoded in GML. Otherwise, it could only be used to query the alphanumeric features of an XML document and not, for example, the topological relationship between two spatial regions. Today, there is a large set of query languages over XML. These query languages differ with respect to syntax, available operators and environment of applicability. However, they share the same features, that is, the features of query languages over semi-structured data. With respect to GML, four GML query languages have been proposed in the literature. This chapter briefly describes these query languages over
GML.

Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

Introduction

Today, the World Wide Web speaks a common language named eXtensible Markup Language (XML) (W3C, 2005). XML is a W3C-recommended general-purpose markup language for creating special-purpose languages, capable of describing many different kinds of data. Although XML is a recent technology, its structure and original aim derive from an older proposal, the Standard Generalized Markup Language (SGML), which dates back to the 1970s. XML is mainly used on the Web as an exchange format, that is, its primary purpose is to facilitate the sharing of data across different systems. This use solves syntactic heterogeneity, as both sources use the same structure to represent their data. However, XML has been extended to many more applications, for example, configuration files, databases, and so forth.

XML has several features that make it well-suited for its purpose. Some of these are: it is in a format that is both human-readable and machine-readable; it supports Unicode, allowing almost any information in any human language to be communicated; it can represent the most general computer science data structures (records, lists and trees); its self-documenting format describes structure and field names as well as specific values; and its strict syntax and parsing requirements allow the necessary parsing algorithms to remain simple, efficient, and consistent.

Although XML has a general purpose, there is a wide range of specific-purpose languages based on XML, for example, Geography Markup Language (GML), Resource Description Framework (RDF), Scalable Vector Graphics (SVG), Mathematical Markup Language (MathML) and Virtual Reality Modeling Language (VRML). All of them are defined in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their form. In particular, GML,
defined by the Open Geospatial Consortium (Open Geospatial Consortium, 2003), plays an important role in spatial systems in general and in geospatial systems in particular. GML can be defined as follows: "An XML encoding for the transport and storage of geographic information, including both the spatial and nonspatial properties of geographic features". However, the formal definition given by the GML specification is: "Geography Markup Language is an XML grammar written in XML Schema for the modelling, transport, and storage of geographic information".

The key concepts used by GML to model the world are drawn from the OpenGIS Abstract Specification and the ISO 19100 series. GML provides a variety of objects for describing geography, including features, coordinate reference systems, geometry, topology, time, units of measure, and generalized values. A geographic feature is "an abstraction of a real world phenomenon; it is a geographic feature if it is associated with a location relative to the Earth". So a digital representation of the real world can be thought of as a set of features. The state of a feature is defined by a set of properties, where each property can be thought of as a {name, type, value} triple. The number of properties a feature may have, together with their names and types, is determined by its type definition. Geographic features with geometry are those with properties that may be geometry-valued. A feature collection is a collection of features that can itself be regarded as a feature; as a consequence, a feature collection has a feature type and thus may have its own distinct properties, in addition to the features it contains.

XML as a Database: Querying XML

Undoubtedly, an interesting advantage of XML is that it can be used as a database, even though an XML document is a database only in the strictest sense of the term, i.e., it is a collection of data. In many ways, this makes it no different from any other file. As a database format, XML has
several advantages. For example, it is self-describing (the markup describes the structure and type names of the data, although not the semantics), it is portable (Unicode), and it can describe data in tree or graph structures. These properties open up a new set of XML applications, all of them involving the storage and retrieval of information represented in XML.

One example of XML as a database is a restaurant catalog. It could be defined with alphanumeric features, e.g., name, phone number, address, capacity. An advantage of XML is that the data is portable, and it can easily be manipulated to insert, update and delete information. Another example could be an extension of the previous one. Besides representing alphanumeric information, we can include spatial features. Thus, we include a polygon (spatial coordinates) to represent the parcel where the restaurant is located. In this case, our set of data with alphanumeric features (name, number, etc.) and a spatial feature (parcel) would be a GML document (Open Geospatial Consortium, 2003), instead of an XML document.

Since XML (and GML by extension) is a database, it can be queried. In order to do this, we need a general-purpose query language to retrieve information from an XML document. Nevertheless, this query language must be enriched with spatial operators if we wish to apply it to spatial data encoded in GML. Otherwise, it could only be used to query the alphanumeric features of an XML document and not, for example, the topological relationship between two spatial regions.

Today, there is a large set of query languages over XML. These query languages differ with respect to syntax, available operators and environment of applicability. However, they share the same features, that is, the features of query languages over semi-structured data. This is because XML is not structured data, but instead has a structure that is flexible (Abiteboul et al.,
1997).

Although there is a large set of query languages over XML, the most widespread are the following: XQuery, XQL, XML-QL and Lorel.

XQL (Robie, 1998) is a notation for selecting the elements and text of XML documents. XQL can be considered a natural extension to the XSL pattern syntax (W3C, 1998). It is designed with the goal of being syntactically very simple and compact, with reduced expressive power.

XML-QL (Deutsch et al., 1999) extends Structured Query Language (SQL) with an explicit construct clause for building the document resulting from the query, and uses element patterns to match data in an XML document. XML-QL can express queries as well as transformations for integrating XML data from different sources.

Lorel (Abiteboul et al., 1997) was originally designed for querying semi-structured data and has now been extended to XML data. It is a user-friendly language in the SQL style, including a strong mechanism for type coercion and permitting very powerful path expressions, which is extremely useful when the structure of a document is not known in advance.

XQuery (W3C, 2001) was developed by the W3C Query working group for querying XML documents. Its working draft was published in 2003 and describes XQuery as a language in which queries are concise and easy to understand, and applicable across many types of XML information sources.

Querying GML

As mentioned above, since GML is an application of the XML standard to geographic data, it can be queried. However, querying geographic data often involves spatial relations and spatial operations. The XML query languages developed so far do not take these issues into account, and thus are not fully suitable for querying geographic data in GML format. The evolution of XML database systems could make it possible to store geographic data in GML format. For this to happen, the underlying database systems must have a query language capable of performing spatial queries. Until now, no such GML database systems have been developed,
but the growing interest in GML might lead to a GML database with a query language capable of performing such queries. From the literature, it is known that four GML query languages have been proposed. The following sections briefly describe these query languages.

GML-Specific Query Language

The first GML-specific query language was developed by Córcoles and González (2001). Unlike XML query languages, the main feature of this language is that it includes spatial operators in its specification. Implementing this language first requires the definition of a data model and an algebra that support both the basic features of XML query languages and spatial features. To define this data model and algebra, Córcoles and González (2001) extended the data model and algebra defined by Beech et al. (1999) to allow the representation of geometry elements and of geometry operators over these elements. With this, both GML and XML documents can be queried, because the original features from Beech et al. (1999) are preserved.

The original approach defined by Beech et al. (1999) shows how the components of an XML document and their interrelationships can be represented as a directed graph and queried with an algebra. It is very simple and powerful. Before XQuery, it was a W3C data model and algebra proposal for representing and querying XML.

The syntax chosen for this language is based on SQL. The Select-From-Where statement is widespread in query languages, since it allows the language to be learned rapidly. Note that this language makes no attempt to define all the features of SQL; only the features that are important for running simple and powerful queries are developed. On the other hand, many features of query languages over XML are not considered in traditional query languages (SQL, OQL, etc.), although they are necessary in this kind of language (e.g., path expressions). The syntax of this language is very similar to the syntax defined in Lorel (Abiteboul et
al., 1997).

The graph contains three categories of vertices (or nodes): vertices that represent data values (Vtype(v)), vertices that represent geometry elements in GML (Vgeometry), and vertices that represent the remaining non-geometry elements of a document (Velement). The figure shows the data model obtained from a GML document.

The types of elements represented by Vgeometry are: coordinate elements (GML:CoordType and GML:CoordinatesType), primitive geometry elements (GML:PointType, GML:LineStringType, GML:LinearRingType, GML:PolygonType, GML:BoxType), and aggregate geometry elements (GML:MultiPointType, GML:MultiLineStringType, GML:MultiPolygonType). Each of these types has a direct relationship with the following elements defined in the GML Schemas: Coord, Coordinates, Point, LineString, LinearRing, Polygon, Box, MultiPoint, MultiLineString and MultiPolygon.

The algebra of the query language enables the selection of documents or document components that meet given criteria. The algebra also supports the composition of XML documents from selected documents and their components. Beech et al. (1999) propose an algebra with these features that is minimal enough to provide an abstraction of the basic functionality. However, in order to complete this algebra it is necessary to define a set of spatial relationship predicates to be applied over the geometry vertices.

The original algebra from Beech et al. (1999) has the following operators: navigation (φ), Kleene star (*), map, selection (with existential and universal quantification), join, distinct, sort, unorder, and operations for result construction. All of these operators can be used in the data model explained above. In addition to these operators, the main contribution to this algebra is the definition of spatial relationship predicates. The set of predicates is obtained from Open Geospatial Consortium (1999) and is based on the Dimensionally Extended Nine-Intersection Model. The six predicates are named disjoint, touches,
crosses, within and overlaps. The definition of these predicates is given in the accompanying table. The term P is used to refer to 0-dimensional geometries (Points and MultiPoints), L is used to refer to one-dimensional geometries (LineStrings and MultiLineStrings) and A is used to refer to two-dimensional geometries (Polygons and MultiPolygons). In addition, other operators that support spatial analysis have been included: distance, buffer, convexhull, intersection, union, difference and symdifference.

[Figure: Mapping a GML document to the data model. The example shows a road feature named M11, classified as motorway, with number 11.]

To summarize, the following query uses most of the features defined. It obtains the names of all roads whose classification is like "motorway", whose number (or "num") is greater than zero, and that intersect all rivers whose names match "cam%":

    Select C.%.Road.name
    From [http://www.uclm.es//prove.XML].CityModel C
    Where C.%.clasification Like 'Motorway'
      and C.%.[number|num] > 0
      and C.CityMember.Road.%.LineString intersect_all (
            select CM.%.River.LineString
            from [http://www.uclm.es//prove.XML].CityModel CM
            where CM.CityMember.River.name like (cam%))

GML-QL

GML-QL (Vatsavai, 2002) is an extension of XQuery. In this approach, XQuery has been extended with spatial predicates and analysis functions so that it can be used to perform spatial queries. Spatial queries are characterized by spatial operations and by the spatial data types on which the operations are performed. Spatial operations often deal with the dimension, boundary, length, area and volume of spatial objects. It also involves measuring ...
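The chapter's central point, that a plain XML query can filter alphanumeric properties but needs an added spatial operator to answer questions such as "which roads intersect a river", can be illustrated with a minimal sketch. It is written in Python rather than in any of the query languages discussed, and it is not the implementation of Córcoles and González (2001). The document, its element names (cityMember, classification, number), the coordinate values and every function name below are assumptions made for illustration only, and the intersection test is an ordinary segment-orientation check rather than the Dimensionally Extended Nine-Intersection Model.

```python
import xml.etree.ElementTree as ET

# Hypothetical GML-like document. Element names loosely follow the chapter's
# CityModel / Road / River example; they are assumptions, not a real schema.
DOC = """
<CityModel xmlns:gml="http://www.opengis.net/gml">
  <cityMember><Road>
    <name>M11</name><classification>motorway</classification><number>11</number>
    <gml:LineString><gml:coordinates>0,5 20,10 80,60</gml:coordinates></gml:LineString>
  </Road></cityMember>
  <cityMember><Road>
    <name>B7</name><classification>motorway</classification><number>7</number>
    <gml:LineString><gml:coordinates>0,90 10,95</gml:coordinates></gml:LineString>
  </Road></cityMember>
  <cityMember><River>
    <name>Cam</name>
    <gml:LineString><gml:coordinates>0,40 80,20</gml:coordinates></gml:LineString>
  </River></cityMember>
</CityModel>
"""

GML_NS = {"gml": "http://www.opengis.net/gml"}


def parse_line(feature):
    """Read a feature's gml:coordinates text ("x,y x,y ...") into (x, y) tuples."""
    text = feature.findtext("gml:LineString/gml:coordinates", namespaces=GML_NS)
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


def orient(p, q, r):
    """Sign of the turn p -> q -> r: 1 (left), -1 (right), 0 (collinear)."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def in_box(p, q, r):
    """True if r lies inside the bounding box of segment p-q."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """Standard orientation test for segments a-b and c-d, including touching cases."""
    o1, o2, o3, o4 = orient(a, b, c), orient(a, b, d), orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and in_box(a, b, c)) or (o2 == 0 and in_box(a, b, d))
            or (o3 == 0 and in_box(c, d, a)) or (o4 == 0 and in_box(c, d, b)))


def lines_intersect(l1, l2):
    """Spatial operator: do two line strings share at least one point?"""
    return any(segments_intersect(a, b, c, d)
               for a, b in zip(l1, l1[1:]) for c, d in zip(l2, l2[1:]))


def motorways_crossing_rivers(xml_text, river_prefix="cam"):
    """Names of motorways with number > 0 that intersect every matching river."""
    root = ET.fromstring(xml_text.strip())
    rivers = [parse_line(r) for r in root.iter("River")
              if r.findtext("name", "").lower().startswith(river_prefix)]
    names = []
    for road in root.iter("Road"):
        # Alphanumeric part: expressible in any XML query language.
        alphanumeric_ok = (road.findtext("classification", "").lower() == "motorway"
                           and int(road.findtext("number", "0")) > 0)
        # Spatial part: needs a geometry-aware operator.
        if alphanumeric_ok and all(lines_intersect(parse_line(road), rv) for rv in rivers):
            names.append(road.findtext("name"))
    return names


print(motorways_crossing_rivers(DOC))  # ['M11']
```

Both roads pass the alphanumeric filter, so only the spatial test separates them. Note that `all()` over an empty list of rivers is true, so in this sketch a document without matching rivers returns every qualifying motorway; a real query language would have to define that case explicitly.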