Web Data Management
A Warehouse Approach

Sourav S. Bhowmick
Wee Keong Ng
Sanjay K. Madria

With 106 Illustrations

Sourav S. Bhowmick and Wee Keong Ng
School of Computer Engineering
Nanyang Technological University
50 Nanyang Avenue
Blk N4 2A-32
Nanyang, 639798
Singapore
assourav@ntu.edu.sg
awkng@ntu.edu.sg

Sanjay K. Madria
University of Missouri
Department of Computer Science
1870 Miner Circle Drive
310 Computer Science Building
Rolla, MO 65409
USA
madrias@umr.edu

Library of Congress Cataloging-in-Publication Data
Bhowmick, Sourav S.
Web data management : a warehouse approach / Sourav S. Bhowmick, Sanjay K. Madria, Wee Keong Ng.
p. cm. — (Springer professional computing)
Includes bibliographical references and index.
ISBN 0-387-00175-1 (alk. paper)
Web databases. Database management. Data warehousing.
I. Madria, Sanjay Kumar. II. Ng, Wee Keong. III. Title. IV. Series.
QA76.9.W43B46 2003
005.75′8—dc21 2003050523

ISBN 0-387-00175-1
Printed on acid-free paper.

2004 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.
SPIN 10901038
Typesetting: Pages created by the author using a Springer TeX macro package.
www.springer-ny.com
Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH

Index

table node pool, 374
tag, 52
tag attributes, 203
tag object, 79
tagless, 147
tags, 72, 147
target node type identifier, 129
Technion (Israel Institute of Technology), 21
Thunderstone, 20
TopBlend, 369
topic-specific PIW crawlers, 11
topological structure, 149
transaction management, 48
TranScm, 208
traverse, 50
treatment statement, 127
trigger, 13
truncation, 18
TSIMMIS, 37
tumour, 127
tuple, 27
tuple set, 257
type coercion, 48
type identifiers, 260
type-checking, 48
ULIXES, 46, 146, 147, 205
unbounded length paths, 203
UNIX, 21
unordered, 56
UnQL, 47, 146, 149
URL, 27, 127, 158, 202, 208
URL-minder, 369
user-driven coupling, 252
valid coupling query, 194
valid query, 189
validity checking phase, 181
validity conditions, 165
value, 100
value-driven predicate, 114
variables, 30
versions, 374
view-definition language, 42
virtual loose schema, 42
visibility, 399
visualization, visualize, 353
VScmDL, 208
W3QL, 21, 22
W3QS, 21, 30, 146, 147, 204, 208
warehouse, 149
warehouse data, 207
warehouse document pool, 374
warehouse node pool, 374
warehousing, Web, 1, 17
web algebra, 417
web algebraic operators, 16, 251, 367, 418
web bag, 273, 419
web cartesian product, 288
web coalesce, 358
web correlate, 424
web crawler, 5, 20
Web data, 1, 14, 18, 147, 203, 257, 417
web delta manager, 11
web deltas, 12
web directory services, 425
web distinct, 273, 417
Web documents, 127, 255, 357, 418, 420
web join, 213, 417
web manipulator, 11
web marts, 13
web miner, 11, 417
web objects, 202, 418
web operators, 251, 417
web project, 11, 216, 251, 417
web query, 99, 161, 202
web query processing systems, 391
web ranking, 424
web schema, 181, 207, 252, 355, 399, 418
web schema pool, 375
web select, 168, 247
Web sites, 2,
web sort, 364
web table, 12, 176, 207, 251, 353, 355, 371, 391, 418
web table generation phase, 253
web tuple pool, 374, 375
web tuples, 200, 207, 251, 391, 418
web tuples generation phase, 253
web union, 252, 417
web warehouse, 1, 2, 5, 10, 205, 207, 287, 389, 392
web warehousing, 417
WebCQ, 370
WebGUIDE, 369
WebLog, 21, 28, 30, 147, 149, 204, 208
WebOQL, 40, 44, 146, 147, 204
webs, 45
WebSQL, 21, 27, 30, 147, 149, 203, 204, 208
WHIRL, 40
WHOM, 10, 94, 418
WHOWEDA, 11, 17, 146, 207, 251, 289, 367, 418
WordNet, 32
World Wide Web,
wrapper, 7, 8, 35
Wrapper Specification Language (WSL), 38
WWW, 14, 18, 146, 390
X-Terminal, 24
XML, 8, 17, 210
XML Graph, 57
XML-QL, 52, 56, 146, 147, 205
XML-QL query, 58
Xpath, 90
Yahoo, 5, 20
YAT, 209
YATL, 52, 147, 205