Fundamentals of database systems, 6th edition

Document information

This book introduces the fundamental concepts necessary for designing, using, and implementing database systems and database applications. Our presentation stresses the fundamentals of database modeling and design, the languages and models provided by the database management systems, and database system implementation techniques. The book is meant to be used as a textbook for a one- or two-semester course in database systems at the junior, senior, or graduate level, and as a reference book. Our goal is to provide an in-depth and up-to-date presentation of the most important aspects of database systems and applications, and related technologies. We assume that readers are familiar with elementary programming and data-structuring concepts and that they have had some exposure to the basics of computer organization.

FUNDAMENTALS OF Database Systems
SIXTH EDITION

Ramez Elmasri
Department of Computer Science and Engineering
The University of Texas at Arlington

Shamkant B. Navathe
College of Computing
Georgia Institute of Technology

Addison-Wesley
Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Editor in Chief: Michael Hirsch
Acquisitions Editor: Matt Goldstein
Editorial Assistant: Chelsea Bell
Managing Editor: Jeffrey Holcomb
Senior Production Project Manager: Marilyn Lloyd
Media Producer: Katelyn Boller
Director of Marketing: Margaret Waples
Marketing Coordinator: Kathryn Ferranti
Senior Manufacturing Buyer: Alan Fischer
Senior Media Buyer: Ginny Michaud
Text Designer: Sandra Rigney and Gillian Hall
Cover Designer: Elena Sidorova
Cover Image: Lou Gibbs/Getty Images
Full Service Vendor: Gillian Hall, The Aardvark Group
Copyeditor: Rebecca Greenberg
Proofreader: Holly McLean-Aldis
Indexer: Jack Lewis
Printer/Binder: Courier, Westford
Cover Printer: Lehigh-Phoenix Color/Hagerstown

Credits and acknowledgments borrowed from other sources and reproduced with permission in this textbook appear on appropriate page within text.

The interior of this book was set in Minion and Akzidenz Grotesk.

Copyright © 2011, 2007, 2004, 2000, 1994, and 1989 Pearson Education, Inc., publishing as Addison-Wesley. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, 501 Boylston Street, Suite 900, Boston, Massachusetts 02116.

Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data
Elmasri, Ramez
Fundamentals of database systems / Ramez Elmasri, Shamkant B. Navathe.—6th ed.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-136-08620-8
1. Database management. I. Navathe, Sham. II. Title.
QA76.9.D3E57 2010
005.74—dc22

Addison-Wesley is an imprint of
10 1—CW—14 13 12 11 10
ISBN 10: 0-136-08620-9
ISBN 13: 978-0-136-08620-8

To Katrina, Thomas, and Dora (and also to Ficky)
R.E.

To my wife Aruna, mother Vijaya, and to my entire family for their love and support
S.B.N.

Preface

This book introduces the fundamental concepts necessary for designing, using, and implementing database systems and database applications. Our presentation stresses the fundamentals of database modeling and design, the languages and models provided by the database management systems, and database system implementation techniques. The book is meant to be used as a textbook for a one- or two-semester course in database systems at the junior, senior, or graduate level, and as a reference book. Our goal is to provide an in-depth and up-to-date presentation of the most important aspects of database systems and applications, and related technologies. We assume that readers are familiar with elementary programming and data-structuring concepts and that they have had some exposure to the basics of computer organization.

New to This Edition

The following key features have been added in the sixth edition:
■ A reorganization of the chapter ordering to allow instructors to start with projects and laboratory exercises very early in the course
■ The material on SQL, the relational database standard, has been moved early in the book to Chapters 4 and 5 to allow instructors to focus on this important topic at the beginning of a course
■ The material on object-relational and object-oriented databases has been updated to conform to the latest SQL and ODMG standards, and consolidated into a single chapter (Chapter 11)
■ The presentation of XML has been expanded and updated, and moved earlier in the book to Chapter 12
■ The chapters on normalization theory have been reorganized so that the first chapter (Chapter 15) focuses on intuitive normalization concepts, while the second chapter (Chapter 16) focuses on the formal theories and normalization algorithms
■ The presentation of database security threats has been updated with a discussion on SQL injection attacks and prevention techniques in Chapter 24, and an overview of label-based security with examples
■ Our presentation on spatial databases and multimedia databases has been expanded and updated in Chapter 26
■ A new Chapter 27 on information retrieval techniques has been added, which discusses models and techniques for retrieval, querying, browsing, and indexing of information from Web documents; we present the typical processing steps in an information retrieval system, the evaluation metrics, and how information retrieval techniques are related to databases and to Web search

The following are key features of the book:
■ A self-contained, flexible organization that can be tailored to individual needs
■ A Companion Website (http://www.aw.com/elmasri) includes data to be loaded into various types of relational databases for more realistic student laboratory exercises
■ A simple relational algebra and calculus interpreter
■ A collection of supplements, including a robust set of materials for instructors and students, such as PowerPoint slides, figures from the text, and an instructor’s guide with solutions

Organization of the Sixth Edition

There are significant organizational changes in the sixth edition, as well as improvements to the individual chapters. The book is now divided into eleven parts as follows:
■ Part 1 (Chapters 1 and 2) includes the introductory chapters
■ The presentation on relational databases and SQL has been moved to Part 2 (Chapters 3 through 6) of the book; Chapter 3 presents the formal relational model and relational database constraints; the material on SQL (Chapters 4 and 5) is now presented before our presentation on relational algebra and calculus in Chapter 6 to allow instructors to start SQL projects early in a course if they wish (this reordering is also based on a study that suggests students master SQL better when it is taught before the formal relational languages)
■ The presentation on entity-relationship modeling and database design is now in Part 3 (Chapters 7 through 10), but it can still be covered before Part 2 if the focus of a course is on database design
■ Part 4 covers the updated material on object-relational and object-oriented databases (Chapter 11) and XML (Chapter 12)
■ Part 5 includes the chapters on database programming techniques (Chapter 13) and Web database programming using PHP (Chapter 14, which was moved earlier in the book)
■ Part 6 (Chapters 15 and 16) contains the normalization and design theory chapters (we moved all the formal aspects of normalization algorithms to Chapter 16)
■ Part 7 (Chapters 17 and 18) contains the chapters on file organizations, indexing, and hashing
■ Part 8 includes the chapters on query processing and optimization techniques (Chapter 19) and database tuning (Chapter 20)
■ Part 9 includes Chapter 21 on transaction processing concepts; Chapter 22 on concurrency control; and Chapter 23 on database recovery from failures
■ Part 10 on additional database topics includes Chapter 24 on database security and Chapter 25 on distributed databases
■ Part 11 on advanced database models and applications includes Chapter 26 on advanced data models (active, temporal, spatial, multimedia, and deductive databases); the new Chapter 27 on information retrieval and Web search; and the chapters on data mining (Chapter 28) and data warehousing (Chapter 29)

Contents of the Sixth Edition

Part 1 describes the basic introductory concepts necessary for a good understanding of database models, systems, and languages. Chapters 1 and 2 introduce databases, typical users, and DBMS concepts, terminology, and architecture.

Part 2 describes the relational data model, the SQL standard, and the formal relational languages. Chapter 3 describes the basic relational model, its integrity constraints, and update operations. Chapter 4 describes some of the basic parts of the SQL standard for relational databases, including data definition, data modification operations, and simple SQL queries. Chapter 5 presents more complex SQL queries, as well as the SQL concepts of triggers, assertions, views, and schema modification. Chapter 6 describes the operations of the relational algebra and introduces the relational calculus.

Part 3 covers several topics related to conceptual database modeling and database design. In Chapter 7, the concepts of the Entity-Relationship (ER) model and ER diagrams are presented and used to illustrate conceptual database design. Chapter 8 focuses on data abstraction and semantic data modeling concepts and shows how the ER model can be extended to incorporate these ideas, leading to the enhanced-ER (EER) data model and EER diagrams. The concepts presented in Chapter 8 include subclasses, specialization, generalization, and union types (categories). The notation for the class diagrams of UML is also introduced in Chapters 7 and 8. Chapter 9 discusses relational database design using ER- and EER-to-relational mapping. We end Part 3 with Chapter 10, which presents an overview of the different phases of the database design process in enterprises for medium-sized and large database applications.

Part 4 covers the object-oriented, object-relational, and XML data models, and their affiliated
languages and standards. Chapter 11 first introduces the concepts for object databases, and then shows how they have been incorporated into the SQL standard in order to add object capabilities to relational database systems. It then

Index

Phantoms, transaction support in SQL, 771
PHP
  arrays, 486–488
  bibliographic references, 497
  collecting data from forms and inserting records, 493–494
  connecting to databases, 491–493
  features, 484–485
  functions, 488–490
  overview of, 481–482
  retrieval queries, 494–495
  server variables and forms, 490–491
  simple example of, 482–484
  summary and exercises, 496–497
  variables, data types, and constructs, 485–486
PHP Extension and Application Repository (PEAR), 491
Phrase queries, types of queries in IR systems, 1008
Physical clustering, of records on disks, 617
Physical data independence, in three-schema architecture, 36
Physical data models, 30
Physical database design. See also Database design
  bibliographic references, 740
  data organization in, 587
  denormalization as design decision related to query speed, 731–732
  in ER (Entity-Relationship) model, 202
  factors influencing, 727–729
  indexing decisions, 730–731
  overview of, 9, 326–327
  summary and exercises, 739–740
  tuning and, 735–736
Physical database file structures, 583
Physical database phase, in database design, 311
Physical indexes
  vs logical, 668
  ordering primary and clustering indexes, 642
Physical problems/catastrophes, recovery needed due to, 751
Physical relationships, between file records, 617
Pile file (heap), 602
Pipelined evaluation, converting query trees into query execution plans, 710
Pipelining, combining operations using, 700
Pivoting (rotation)
  functionality of data warehouses, 1078
  working with data cubes, 1070–1072
PL/SQL
  designing database programming language from scratch, 449
  impedance mismatch and, 450
  writing database applications with, 447
Plaintext, 864
Point events (facts), in temporal databases, 946
Pointers, blocks of data and, 597
Points
  on maps, 959–960
  in temporal databases, 945
Policies
  access control for e-commerce and Web, 854–855
  flow policies, 860
  for label-based security, 853
  security policies, 836
Polygons, on maps, 960
Polyinstantiation, in mandatory access control, 849–850
Polymorphism (operator overloading)
  defined, 369
  in OO systems, 357
  overview of, 367–368
  specifying in SQL, 375–376
populating (loading) databases, 33
Populations, in statistical database security, 859
Positional iterator, SQLJ, 461–462
Positive literals, in Datalog language, 973
Precedence graph (serialization graph), 763–765
Precision metrics
  finding relevant information and, 1019
  measures of relevance in IR, 1015–1017
Precision, vs security, 841
Precompilers
  DML commands and, 42
  embedded SQL and, 452
  in SQL programming, 449
Predicate-defined (condition-defined) subclasses, 252, 264
Predicate dependency graph, 982
Predicate locking, 801
Predicates
  as arity or degree of p, 973
  built-in, 972–973
  fact-defined and rule-defined, 978
  interpretation of, 976
  in Prolog languages, 970–972
  relational schemas and, 66
Prediction, as goal of data mining, 1037
Preprocessors
  embedded SQL and, 452
  in SQL programming, 449
  in Web usage analysis, 1025–1027
Presentation layer (client), in three-tier client/server architecture, 892
Pretty Good Privacy (PGP), 854
Primary file organization
  B-trees as, 651
  data organization and, 587
Primary indexes
  cost functions for SELECT operations, 713
  methods for simple selection, 686
  for ordered records (sorted files), 605
  overview of, 633–635
  searching nondense multilevel primary index, 646
  tables comparing index types, 642
  types of ordered indexes, 632
PRIMARY KEY clause, CREATE TABLE command, 95
Primary keys
  defined, 519
  normal forms based on, 516–517
  primary indexes and, 633
  relational model constraints, 69
Primary site, concurrency control techniques for distributed databases, 910–911
Primary storage, 584
Prime attributes, 519, 526
Printer servers, in client/server architecture, 45
Privacy
  information privacy vs information security, 841–842
  issues in database security, 866–867
  protecting in statistical databases, 859
Private keys, in public (asymmetric) key algorithms, 864
Privileged software, 19
Privileges
  discretionary, 842–844
  granting/revoking, 111, 844–846
  limits on propagation of, 846–847
  unauthorized escalation and abuse, 855, 858
  views for specifying, 844
Proactive updates, valid time relations and, 949
Probabilistic model, for information retrieval, 1005–1006
Procedural DMLs, 37–38
Process-driven design, 310
PROCESS RULES, in active database systems, 938
Processes
  in database design, 322
  multiprogramming and, 744
Processors, parallel, 1079
Program-data independence, 11–12, 23–24
Program-operation independence, 12
Program variables, 599
Programming languages
  advantages/disadvantages of, 477
  approaches to database programming, 449
  DBMS, 36–38
  impedance mismatch and, 450
  object-orientation creating compatibility between, 369
  Web databases. See PHP
  XML, 432–436
Programs, insulation between programs and data, 11–13
PROJECT operations
  algorithms for, 696–697
  Query processing and optimizing, 696–697
  in relational algebra, 149–150
Projection attributes, SELECT command and, 98
Projective operators, types of spatial operators, 961
Prolog language. See also Datalog language
  logic programming and, 970
  notation, 970–973
Proof-theoretic interpretation, of rules in deductive databases, 975
Properties, of association rules, 1041
Properties of relational decompositions
  dependency preservation, 552–553
  dependency-preserving and nonadditive join decomposition into 3NF schemas, 560–563
  dependency-preserving decomposition into 3NF schemas, 558–559
  insufficiency of normal forms and, 552
  nonadditive join decomposition into BCNF schemas, 559–560
  nonadditive (lossless) join, 553–556
  overview of, 544, 551
  successive nonadditive join decompositions, 557
  testing binary decompositions for nonadditive join property, 553–556
Protocols
  concurrency control, 777
  deadlock prevention, 785–787
  for ensuring serializability of transaction schedules, 767–768
Proximity queries, 1008
PSM (Persistent stored modules), 474–476
Public (asymmetric) key algorithms, 863–865
Public keys, in public (asymmetric) key algorithm, 864
Publishing XML documents, 431
Punctuation marks, text preprocessing in information retrieval, 1011
Pure time conditions, 955
QBE (Query-By-Example)
  basic retrieval in, 1091–1095
  domain calculus and, 183, 185
  grouping, aggregation, and database modification in, 1095–1098
  overview of, 1091
QMF (Query Management Facility), 185
Quadtrees, 963
Qualified aggregations, in UML class diagrams, 228
Qualified associations, in UML class diagrams, 228
Qualifier conditions, XPath, 432
Quality control, data warehousing and, 1080
Quantifiers
  collection operators in OQL, 403–405
  existential and universal, 177–178
  transforming, 180
  using in queries, 180–182
Queries. See also OQL (object query language); SQL (Structured Query Language)
  content-based retrieval, 965
  database tuning and, 736–738
  defined,
  design decisions related to query speed, 731–732
  evaluating nonrecursive Datalog queries, 981–983
  information retrieval, 1007–1009
  interactive interface for, 40
  IR systems, 1007–1009
  keyword-based, 39
  modes of interaction in IR systems, 999
  physical database design and, 728–729
  processing in databases, 19–20
  in Prolog languages, 973
  retrieval queries from database tables, 494–495
  spatial, 958, 961
  statistical, 859
  TSQL2, 954–956
Query blocks, 681
Query-By-Example. See QBE (Query-By-Example)
Query compilers, 41
Query decomposition, 905–907
Query execution plans
  converting query trees into, 709–710
  creating, 679
Query graphs
  creating, 679
  notation for, 179–180, 701–703
Query languages
  DML as, 38
  for federated databases, 886
  SQL. See also SQL (Structured Query Language)
  TSQL2. See also SQL (Structured Query Language)
Query Management Facility (QMF), 185
Query mapping, 901
Query modification, 135
Query optimizer, 41, 679
Query processing and optimizing
  aggregate functions, 698–699
  bibliographic references, 725
  catalog information used in cost functions, 712–713
  converting query trees into query execution plans, 709–710
  cost components of query execution, 711–712
  cost functions for JOIN, 715–718
  cost functions for SELECT, 713–715
  DBMS module for, 20
  disjunctive selection conditions, 688
  external sorting, 682–685
  heuristic algebraic optimization algorithm, 708–709
  heuristic optimization of query trees, 703–706
  heuristics used in query optimization, 700–701
  hybrid hash-join, 696
  implementing JOIN operations, 689–690
  implementing SELECT operations, 685
  join selection factors, 693–694
  multiple relation queries and JOIN ordering, 718–719
  nested-loop joins, 690–693
  notation for query trees and query graphs, 701–703
  operations, 700
  OUTER JOIN operations, 699–700
  overview of, 679–681
  partition-hash joins, 694–696
  PROJECT operations, 696–697
  query optimization in Oracle, 721–722
  search methods for complex selection, 686–687
  search methods for simple selection, 685–686
  selectivity and cost estimates in query optimization, 710–711
  selectivity of conditions and, 687–688
  semantic query optimization, 722–723
  set operations, 697–698
  summary and exercises, 723–725
  transformation rules for relational algebra operations, 706–708
  translating SQL queries into relational algebra, 681–682
Query processing and optimizing, in distributed databases
  data transfer costs for distributed query processing, 902–904
  distributed query processing using semijoin operation, 904
  overview of, 901–902
  query update and decomposition, 905–907
Query results
  cursors for looping over tuples in, 450
  ordering, 106–107
  path expressions and, 400–402
  retrieval queries from database tables, 494–495
Query (transaction) server, in two-tier client/server architecture, 47
Query trees
  converting into query execution plans, 709–710
  creating, 679
  notation for, 163–165, 701–703
  optimization of, 703–706
R-Trees, for spatial indexing, 962
RAID (Redundant Array of Inexpensive Disks)
  levels, 620–621
  overview of, 617–619
  performance improvements, 619–620
  reliability improvements, 619
RAM (Random Access Memory), 585
Random access storage devices, 592
Randomizing function (hash function), 606
Range queries, 686, 961
Range relations, of tuple variables, 175–176
Rational Rose
  data modeler, 338
  database design with, 337
  tools and options for data modeling, 338–342
RBAC (role-based access control), 851–852
RBG (red, blue, green) colors, 967
RDBMS (relational database management systems)
  creating indexes, 731
  ORDBMS (object-relational database management systems), 354
  providing application flexibility, 23–24
  two-tier client/server architectures and, 46
RDBs (relational databases)
  designing. See relational database design
  overview of, 395–396
  schemas. See relational database schemas
RDF (Resource Description Framework), 436
Reachability, of objects, 363
Read command, hard disks, 591
Read-only transaction, 745
READ operation, transactions, 751
Read (or Get) operation, on files, 600
Read phase, of optimistic concurrency control, 794
Read-set, of transaction, 747
Read timestamp, 789
Read-write conflicts, in transaction schedules, 757
Read/write heads, on hard disks, 591
Read/write, OSs controlling disk read/write, 40
Read-write transactions, 745–747
read_item(X), 746
Real-time database technology,
Reasoning mechanisms, in knowledge representation, 268
Recall metrics, in IR, 1015–1017, 1019
Recall/precision curve, in IR, 1017
Record-at-a-time DMLs, 38
Record-based data models, 31
Record pointers, 609
Records. See also Files (of records)
  anchor record (block anchor), 633
  blocking, 597
  catalog information used in query cost estimation, 712
  fixed-length and variable-length, 595–597
  inserting, 493–494
  mixed, 616–617
  ordered (sorted files), 603–606
  phantom records, concurrency control techniques, 800–801
  placing file records on disk, 594
  spanned/unspanned, 597–598
  in SQL/CLI, 464–468
  types of, 594–595
  unordered (heap files), 601–602
Recoverability, transaction schedules based on, 757–759
Recovery. See also Backup and recovery; Database recovery techniques
  transaction management in distributed databases, 912–913
  types of failures and, 750–751
Recursive closure operations, in relational algebra, 168–169
Recursive relationships, 168, 215
Recursive rules, in Prolog languages, 972
Red, blue, green (RBG) colors, 967
REDO phase, of ARIES recovery algorithm, 823
Redo transaction, 753
REDO, write-ahead logging and, 810–811
Redundancy, controlling in databases, 17–18
Redundant Array of Inexpensive Disks (RAID). See RAID (Redundant Array of Inexpensive Disks)
REF keyword, specifying relationships via reference, 376
Reference types, OIDs using, 373–374
References
  foreign key, 73
  representing object relationships, 360
  specifying relationships via reference, 376
Referencing relations, 73
Referential integrity constraints
  inclusion dependencies and, 571
  integrity constraints in databases, 21
  relational data model and, 73–74
  specifying in SQL, 95–96
Reflexive associations, in UML class diagrams, 227
Regression function, 1058
Regression, in data mining, 1057–1058
Regression rule, 1057
Regular entity types, 219, 287–288
Relation extension, 62
Relation intension, 62
Relation nodes
  notation for, 703
  in query graphs, 179
Relation schemas
  domains and, 61
  goodness of, 501–502
  in relational databases, 501
Relation (table) level, assigning privileges at, 842–843
Relational algebra
  aggregate functions and grouping, 166–168
  bibliographic references, 194–195
  CARTESIAN PRODUCT operation, 155–157
  complete set of relational algebra operations, 161, 164
  DIVISION operation, 162–163
  EQUIJOIN and NATURAL JOIN operations, 159–161
  examples of queries in, 171–174
  generalized projection, 165–166
  JOIN operation, 157–158
  notation for query trees, 163–165
  OUTER JOIN operations, 169–170
  OUTER UNION operation, 170–171
  overview of, 145–146
  PROJECT operation, 149–150
  recursive closure operations, 168–169
  RENAME operation, 151–152
  SELECT operation, 147–149
  sequences of operations, 151
  summary and exercises, 185–194
  transformation rules for operations, 706–708
  translating SQL queries into, 681–682
  UNION, INTERSECTION, and MINUS operations, 152–155
Relational calculus
  domain (relational) calculus, 183–185
  overview of, 146–147
  tuple relational calculus. See Tuple relational calculus
Relational completeness, of relational query languages, 174
Relational data model
  bibliographic references, 85
  characteristics of relations, 63–66
  classifying DBMSs and, 49
  concepts, 60–61
  constraints, 67–70
  correspondence to ER model, 293
  Delete operation, 77–78
  domains, attributes, tuples, and relations, 61–63
  formal languages for. See Relational algebra; Relational calculus
  Insert operation, 76–77
  integrity, referential integrity, and foreign keys, 73–74
  in list of data model types, 31
  mapping from EER model to. See EER-to-Relational mapping
  mapping from ER model to. See ER-to-Relational mapping
  notation, 66–67
  other types of constraints, 74–75
  overview of, 50, 59–60
  practical language for. See SQL (Structured Query Language)
  schemas, 70–73
  SQL compared with, 97
  summary and exercises, 79–85
  transactions and, 79
  update operations, 75–76, 78–79
Relational database design
  algorithms for, 557, 566–567
  attribute semantics in, 503–507
  bibliographic references, 302, 579
  bottom-up approach to, 544
  Boyce-Codd normal form (BCNF), 529–531
  dependency preservation properties of decompositions, 552–553
  dependency-preserving and nonadditive join decomposition into 3NF schemas, 560–563
  dependency-preserving decomposition into 3NF schemas, 558–559
  disallowing possibility for spurious tuples, 510–513
  domain-key normal form (DKNF), 574–575
  equivalence of sets of functional dependencies, 549
  first normal form (1NF), 519–523
  formal analysis of relational schemas, 513
  formal definition of fourth normal form, 533–534, 568–570
  functional dependencies based on arithmetic functions and procedures, 572–574
  functional dependency and, 513–516
  general definition of second normal form, 526–527
  general definition of third normal form, 528
  goodness of relational schemas, 501–502
  inclusion dependencies, 571–572
  inference rules for functional and multivalued dependencies, 568
  inference rules for functional dependencies, 545–549
  informal guidelines for relational schemas, 503, 513
  join dependencies and fifth normal form, 534–535
  key definitions, 518–519
  mapping from EER model to relational model. See EER-to-Relational mapping
  mapping from ER model to relational model. See ER-to-Relational mapping
  minimal sets of functional dependencies, 549–551
  multivalued dependency and fourth normal form, 531–533
  nonadditive join decomposition into 4NF relations, 570
  nonadditive join decomposition into BCNF schemas, 559–560
  nonadditive (lossless) join properties of decompositions, 553–556
  normal forms based on primary keys, 516–517
  normalization of relations, 517–518
  NULL values and dangling tuples and, 563–565
  overview of, 285
  practical use of normal forms, 518
  reducing NULL values in tuples, 509–510
  reducing redundant information in tuples, 507–509
  relational decomposition and insufficiency of normal forms, 552
  second normal form (2NF), 523
  successive nonadditive join decompositions, 557
  summary and exercises, 299–301, 575–578
  template dependencies, 572
  testing binary decompositions for nonadditive join property, 557
  third normal form (3NF), 523–525
  top-down and bottom-up approaches, 502
  tuning and, 733
Relational database management systems. See RDBMS (relational database management systems)
Relational database schemas
  algorithms for schema design, 557
  bibliographic references, 542
  clear semantics for attributes in, 503–507
  components of, 70–73
  disallowing possibility for spurious tuples, 510–513
  formal analysis of, 513
  functional dependency and, 513–516
  informal guidelines, 503, 513
  overview of, 501–502
  reducing NULL values in tuples, 509–510
  reducing redundant information in tuples, 507–509
  relation schemas in, 501
  summary and exercises, 535–542
Relational database state, 70
Relational design by analysis, 543
Relational design by synthesis, 544
Relational expressions, 983
Relational OLAP (ROLAP), 1079
Relational operators
  in deductive database systems, 980–981
  relational expressions and, 983
Relations (relation states). See also Tables
  alternative definition of, 64–65
  column-based storage of, 669–670
  defined, 61
  interpretation (meaning) of, 66
  legality of, 514
  normalization of, 517–518
  ordering tuples in, 63
  ordering values within tuples, 64
  overview of, 62–63
  values and NULLS in tuples, 65–66
Relations, temporal
  bitemporal time, 950–952
  transaction time, 949–950
  valid time, 947–949
Relationship relation (lookup table)
  mapping of binary 1:1 relationship types, 289
  mapping of binary 1:N relationship types, 290
  mapping of binary M:N relationship types, 290–291
Relationships
  in data modeling, 31
  in ODMG object model, 386
  references to, 360
  representing in OO systems, 356
  specifying by reference, 376
  symbols for, 1084
  University student database example,
Relationships, in EER model
  class/subclass relationships, 247
  specific relationship types and, 249–250
Relationships, in ER model
  attributes of relationship types, 218
  constraints on binary relationship types, 216–218
  degree of relationship greater than two, 228–232
  degree of relationship type, 213–214
  overview of, 212
  relationship types, sets, and instances, 212–213
  relationships as attributes, 214
  role names and recursive relationships, 215
Relevant sets, in probabilistic model for IR, 1005
Reliability, in distributed databases, 881, 882
Remote commands, for SQL injection attacks, 857
RENAME operation, in relational algebra, 151–152
Reorganize operation, on files, 600
Repeating field or groups, in file records, 595
Repeating history, in ARIES recovery algorithm, 821
Replication
  active rules for maintaining consistency of replicated tables, 943
  in distributed databases, 897
  example of fragmentation, allocation, and replication, 898–901
  transparency of, 880
Representational (or implementation) data models, 31
Requirements collection and analysis phase
  in database design, 200, 311–313
  database design starting with,
  of information system (IS) life cycle, 307
Reset operations, on files, 599
Resource Description Framework (RDF), 436
Response time, physical database design and, 326
Restrict option, of delete operation, 77
Result equivalence, of transaction schedules, 762
Result relations, 75
Result tables, in QBE, 1095
Retrieval operations
  database design and, 728
  from database tables, 494–495
  on files, 599
  modes of interaction in IR systems, 999
  objects, 362
  QBE (Query-By-Example), 1091–1095
  types of relational data model operations, 75
Retrieval transactions, 322
Retroactive update, valid time relations and, 949
Return values, of PHP functions, 490
Reverse engineering, Rational Rose and, 338
Revoking privileges, 844, 845–846
Rewrite blocks, file organization and, 602
Rewrite time, as disk parameter, 1089
RIFT (rotation invariant feature transform), 968
Rigorous two-phase locking, 785
Rivest, Ron, 865
ROLAP (relational OLAP), 1079
Role-based access control (RBAC), 851–852
Role hierarchy, in role-based access control, 851
Role names, and recursive relationships, 215
Roll-up display
  functionality of data warehouses, 1078
  working with data cubes, 1070–1072
ROLLBACK (or ABORT) operation, 752
Rollbacks, in database recovery, 813–815, 950
Root element, XML schema language, 429
Root tag, XML documents, 423
Roots, of tree structures, 646
Rotation. See Pivoting (rotation)
Rotation invariant feature transform (RIFT), 968
Rotational delay (rd)
  as disk parameter, 1087
  on hard disks, 591
Row-level access control, 852–853
Row-level triggers, 937
Rows. See Tuples (rows)
Rows, in SQL, 89
RSA encryption algorithm, 865
Rule consideration, in active databases
  deferred consideration, 942
  overview of, 938–939
Rule-defined predicates (views), 978
Rule sets, in active database systems, 938
Rules, in deductive databases interpretation of, 975–977 overview of, 21, 932 in Prolog/Datalog notation, 970–972 safe, 979–980 Runtime database processor DBMS component modules, 42 query execution and, 679 Runtime, specifying SQL queries at, 458–459 Safe expressions, in tuple relational calculus, 182–183 Safe rules, in deductive databases, 979–980 Sampling algorithm, in data mining, 1042 SANs (Storage Area Networks), 621–622 Saturation, hue, saturation, and value (HSV), 967 SAX (Simple API for XML), 423 Scale-invariant feature transform (SIFT), 968 Scan operations, files, 600 Scanner, for SQL, 679 Schedules (histories), of transactions characterizing based on recoverability, 757–759 characterizing based on serializability, 759–760 equivalence of, 768–770 overview of, 755–757 serial, nonserial, and conflict-serializable schedules, 761–763 testing conflict serializability of, 763–765 Schema conceptual design, 313–321 entity type describing for entity sets, 208 instances and database state and, 32–33 ontologies and, 272 relational See Relational database schemas relational data model and, 70–73 three-schema architecture See Three-schema architecture Schema construct, 32, 222 Schema diagram, 32 Schema evolution, 33 Schema matching, types of Web information integration, 1023 Schema, SQL change statements, 137–139 names, 89 overview of, 89–90 Schema (view) integration, 316–317, 319–321 Schemaless XML documents, 422 Scientific applications, 25 Scope, variable, 490 Scripting languages, PHP as, 482 SCSI (Small Computer System Interface), 591 SDL (storage definition language), 37, 110 Search engines overview of, 998–999 vertical and metasearch, 1018 Search fields, 648 Search trees, 647–649 Searches conversational, 1029–1030 faceted, 1028–1029 information retrieval See IR (Information Retrieval) measures of relevance, 1014–1015 methods for complex selection, 686–687 methods for simple selection, 685–686 navigational, informational, and transactional, 996 social searches, 
1029 Web See Web search and analysis Second normal form (2NF) general definition of, 526–527 overview of, 523 Secondary access path, 631 Secondary file organization, 587 Secondary indexes advantages of, 668 cost functions for SELECT, 714 methods for simple selection, 686 overview of, 636–642 tables comparing index types, 642 types of ordered indexes, 632–633 Secondary keys, 636 Secondary storage, 584, 711 Secret key algorithms, 863 Sectors, of hard disk, 589 Security vs precision, 841 Web security, 1028 Security and authorization subsystem, DBMS, 19 Security, database See Database security Seek time (s) as disk parameter, 1087 on hard disks, 591 Segmentation, automatic analysis of images, 967 SELECT command, SQL aggregate functions used in, 125 basic form of, 97–98 FROM clause, 107 DISTINCT keyword with, 103 information retrieval with, 97 projection attributes and selection conditions, 98, 100 in SQL retrieval queries, 129–130 SELECT-FROM-WHERE structure, of SQL queries, 98–100 SELECT operations cost functions for, 713–715 disjunctive selection conditions, 688 on files, 599 implementing, 685 in relational algebra, 147–149 search methods for complex selection, 686–687 search methods for simple selection, 685–686 selectivity of conditions, 687–688 SELECT operator (σ), 147 Select-project-join queries, 179 Selection cardinality, 712 Selection conditions in domain calculus, 184 SELECT command and, 98, 100 SELECT operation and, 147 Selection, functionality of data warehouses, 1079 Selective inheritance, in ODBs (object databases), 368 Selectivity and cost estimates, in query optimization catalog information used in cost functions, 712–713 cost components of query execution, 711–712 cost functions for JOIN, 715–718 cost functions for SELECT, 713–715 multiple relation queries and JOIN ordering, 718–719 overview of, 710–711 Selectivity, of conditions, 687–688 Self-describing data, 10–11, 416 Semantic constraints relational model constraints, 68 template 
dependencies and, 572 types of constraints, 74 Semantic data models abstraction concepts in, 268 aggregation and association, 269–271 classification and instantiation, 268 compared with knowledge representation, 267–268 ER (Entity-Relationship) model, 245 identification, 269 for information retrieval, 1006–1007 specialization and generalization, 269 Semantic query optimization, 722–723 Semantic relationships, in semantic model for IR, 1006 Semantic Web, 272–273 Semantics approach to IR, 1000 of attributes, 503–507, 514 equivalence of transaction schedules and, 769–770 heterogeneity of in federated databases, 886–887 integrity constraints and, 21 tagging images, 969 Semijoin operation, 904 Semistructured data, 416–417 Separators, XPath, 432 Sequence diagrams, UML, 329, 331 Sequential order, in accessing data blocks, 592 Sequential patterns in data mining, 1037 describing knowledge discovered by data mining, 1039 discovery of, 1057 in pattern discovery phase of Web usage analysis, 1027 Serial schedules, 761 Serializability, of transaction schedules characterizing schedules based on, 759–760 serial, nonserial, and conflict-serializable schedules, 761–763 testing conflict serializability of schedules, 763–765 used for concurrency control, 765–768 view serializability, 768–769 Serialization (precedence) graph, 763–765 Servers client program calling database server, 451 database servers, 42 DBMS module for, 29 parallel architecture for, 1079 PHP variables, 490–491 server level in two-tier client/server architecture, 47 specialized servers in client/server architecture, 45–46 Set-at-a-time DMLs, 38 Set constructor, 359 SET DIFFERENCE operation algorithms for, 697–698 in relational algebra, 152–155 Set null (set default) option, in delete operations, 77–78 Set operations algorithms for, 697–698 query processing and optimizing, 697–698 SQL, 104 Set types, in network data model, 51 Sets equivalence of, 549 explicit sets of values in SQL, 122 SQL table as multiset of tuples, 
97 tables as, 103–105 Shadow directory, 820 Shadow paging, 820–821 Shamir, Adi, 865 Shape, automatic analysis of images, 967 Shape descriptors, 965 Shared nothing architecture, 887–888 Shared subclasses (multiple inheritance), 256, 297 Shared variables, embedded SQL and, 452 Sharing data and multiuser transactions, 13–14 Sharing databases, Shrinking (second) phase, in two-phase locking, 782 SIFT (scale-invariant feature transform), 968 Simple API for XML (SAX), 423 Simple (atomic) attributes, in ER model, 205–207 Simple Object Access Protocol (SOAP), 436 Simultaneous update, 949 Single inheritance, subclasses and, 256–257 Single-level indexes clustering indexes, 635–636 overview of, 632–633 primary indexes, 633–635 secondary indexes, 636–642 tables comparing index types, 642 Single-loop joins cost functions for, 716 methods for implementing joins, 689 Single-quoted strings, PHP text processing, 485–486 Single-relation options, for mapping specialization or generalization, 295 Single-sided disks, 589 Single time points, in temporal databases, 946 Single-user systems, 49 Single-user transaction processing system, 744–745 Single-valued attributes, in ER model, 206 Singular value decompositions (SVD), 967 Slice and dice, functionality of data warehouses, 1078 Small Computer System Interface (SCSI), 591 SMART document retrieval system, 998 SMP (symmetric multiprocessor), 1079 Snowflake schema, for multidimensional data models, 1073–1074 SOAP (Simple Object Access Protocol), 436 Social searches, 1029 Software costs, choosing a DBMS, 323 Software developers, 16 Software engineers database actors on the scene, 16 design and testing of applications, 199 Sort-merge joins cost functions for, 717 methods for implementing joins, 689–690 Sort-merge strategy, 683 Sorting external, 682–685 functionality of data warehouses, 1078 implementing aggregate operations, 699 ordered records (sorted files), 603–606 Space utilization, physical database design and, 326 Spamming, Web 
spamming, 1028 Spanned/unspanned organization, of records, 597 Sparse indexes, 633 Spatial analysis, 959 Spatial applications, 25 Spatial databases applications of spatial data, 964–965 data indexing, 961–963 data mining, 963–964 data types and models, 959–960 dynamic operators, 961 operators, 960–961 overview of, 957–959 Spatial joins/overlays, 961 Spatial outliers, 965 Special purpose DBMSs, 50 Specialization/generalization constraints on, 251–254 definitions, 264 design choices for, 263–264 EER-to-Relational mapping, 294–297 generalization, 250–251 hierarchies and lattices, 254–257 in knowledge representation, 269 notation for, 1084–1085 refining conceptual schemas, 257–258 specialization, 248–250 UML (Unified Modeling Language), 265–266 Specialized servers, in client/server architecture, 45 Specific attributes (local attributes), of subclass, 249 Specific relationship types, subclasses and, 249–250 Specification, conceptualization and, 272 Speech input and output, queries and, 39 SQL-99, 942–943 SQL/CLI (Call Level Interface) database programming with, 464–468 library of functions, 448 SQL injection attacks code injection, 856 function call injection, 856–857 protecting against, 858 risks associated with, 857–858 SQL manipulation, 856 types of, 855 SQL programming techniques approaches to database programming, 449–450 bibliographic references, 479 database programming techniques and issues, 448–449 dynamic SQL, 448, 458–459 embedded SQL See Embedded SQL function calls See Function calls, database programming with impedance mismatch, 450 overview of, 447–448 sequence of interactions in, 451 SQL/PSM (SQL/Persistent Stored Modules) See SQL/PSM (SQL/Persistent Stored Modules) summary and exercises, 477–478 SQL/PSM (SQL/Persistent Stored Modules) overview of, 473 specifying persistent stored modules, 475–476 stored procedures and functions, 473–475 SQL (Structured Query Language) See also Embedded SQL * (asterisk) for retrieving all attribute values of 
selected tuples, 102–103 aliases, 101–102 bibliographic references, 114 CHECK clauses for specifying constraints on tuples, 97 clauses in simple SQL queries, 107 common data types, 92–94 CREATE TABLE command, 90–92 data definition in, 89 dealing with ambiguous attribute names, 100–101 DELETE command, 109 embedding SQL commands in Java, 459–461 external sorting, 682–685 INSERT command, 107–109 list of features in, 110–111 manipulation by SQL injection attacks, 856 missing or unspecified WHERE clauses, 102 naming constraints, 96–97 object-relational features in, 354 ordering query results, 106–107 overview of, 87–89 QBE compared with, 1098 schema and catalog concepts in, 89–90 SELECT-FROM-WHERE structure of queries, 98–100 servers, 47 specifying attribute constraints and default values, 94–95 specifying key and referential integrity constraints, 95–96 substring pattern matching and arithmetic operators, 105–106 summary and exercises, 111–114 tables as sets in, 103–105 temporal data types, 945 transaction support, 770–772 translating SQL queries into relational algebra, 681–682 UDT (user-defined types) in, 111 UPDATE command, 109–110 SQL (Structured Query Language), advanced features aggregate functions, 124–126 ALTER command, 138–139 bibliographic references, 143 clauses in retrieval queries, 129–130 comparisons involving NULL and three-valued logic, 116–117 correlated nested queries, 119–120 CREATE ASSERTION command, 131–132 CREATE TRIGGER command, 132–133 CREATE VIEW command, 134–135 DROP command, 138 EXISTS and NOT EXISTS functions, 120–122 explicit sets and renaming of attributes, 122 GROUP BY clause, 126–129 HAVING clause, 127–129 inline views, 137 nested queries, 117–119 outer and inner joins, 123–124 overview of, 115 schema change statements, 137 summary and exercises, 139–143 UNIQUE function, 122 view implementation and update, 135–137 views (virtual tables) in, 133–134 SQL (Structured Query Language), ODB extensions to dot notation for build path 
expressions, 376 encapsulation of operations, 374–375 inheritance and polymorphism, 375–376 OIDs (object identifiers) using reference types, 373–374 overview of, 369–370 specifying relationships via reference, 376 tables based on UDTs, 374 UDTs and complex structures for objects, 370–373 SQLJ embedding SQL command in Java, 459–461 retrieving multiple tuples using iterators, 461–464 SQLCODE communication variable, 454 SQLSTATE communication variable, 454 Standards database approach and, 22 database design specification, 328 SQL, 88 Star schema, 1073 Starvation, concurrency control and, 788 State in ODMG object model, 382 relational database state, 70–72 transaction, 751–752 State constraints, 75 Statechart diagrams, UML, 329, 333 Statement-level active rules, in STARBURST example, 940–942 Statement-level triggers overview of, 937 in STARBURST example, 940 Statement records, in SQL/CLI, 464–468 Static (early) binding, in ODMS, 368 Static files, 601 Static hashing, 610 Static Web pages, 420 Statistical analysis, in pattern discovery phase of Web usage analysis, 1026 Statistical approach, to IR, 1000–1002 Statistical database security, 859–860 Statistical databases, 837–838, 874 Statistical queries, 859 Steal/no-steal techniques in database recovery, 811–812 UNDO/REDO recovery algorithm, 819 Stem, of words, 1010 Stemming, text preprocessing in information retrieval, 1010 Stopwords in keyword queries, 1007 removal, 1009–1010 text/document sources, 966 Storage allocation of file blocks on disk, 598 bibliographic references, 630 buffer management and, 593–594 column-based storage of relations, 669–670 cost components of query execution, 711 covert channels, 861 database storage, 586–587 database storage reorganization, 43 database tuning and, 733 file headers (descriptors) and, 598 file systems and See Files (of records) files, fixed-length records, and variable-length records, 595–597 hardware structures of disk devices, 588–592 iSCSI (Internet SCSI), 623–624 magnetic 
tape devices, 592–593 measuring capacity, 585 memory hierarchies and, 584–586 NAS (network-attached storage), 622–623 overview of, 583–584 parallelization of access See RAID (Redundant Array of Inexpensive Disks) placing file records on disk, 594 record blocking and, 597 records and record types, 594–595 SANs (Storage Area Networks), 621–622 secondary storage devices, 587 spanned/unspanned records, 597–598 summary and exercises, 624–630 Storage Area Networks (SANs), 621–622 Storage definition language (SDL), 37, 110 Storage medium, physical, 584 Stored attributes, in ER model, 206 Stored data manager module, DBMS, 40, 42 Stored procedures, 21, 473–475 Stream-based processing, 700 Streaming XML documents, 423 Strict hierarchies, 255 Strict schedules, 759 Strict timestamp ordering, 790–791 Strict two-phase locking, 784–785 Strings pattern matching, 105 PHP text processing, 485 Strong entity types, 219, 287 Struct (tuple) constructors, 358–359 Structural constraints, of relationships, 218 Structural diagrams, UML, 329 Structured data extracting, 1022 overview of, 416 vs unstructured, 993–994 Structured domains, in UML class diagrams, 227 Structured literals, 378 Subclasses in EER model, 246–248, 264 generalizing into superclasses, 250 as leaf classes in UML, 265 options for mapping specialization or generalization, 294 predicate-defined and user-defined, 252 shared, 256 specific attributes (local attributes) of, 249 specific relationship types and, 249–250 union types or categories, 258–260 Subset of Cartesian product, 63 Subsets, of attributes, 68–69 Substring pattern matching, in SQL, 105–106 Subtrees, 646 Subtypes, 247, 365–366 SUM function aggregate functions in SQL, 124–125 grouping and, 166, 168 implementing aggregate operations, 698 Superclass/subclass relationships in EER model, 264 overview of, 247 union types or categories, 258–260 Superclasses base class and, 265 in EER model, 246–248, 264 generalization and, 250 options for mapping specialization or 
generalization, 294 specialization and, 248 Superkeys defined, 518 relational model constraints, 69 Supertypes, 247, 365 Superuser accounts, 838 Supervised learning classification and, 1051 neural networks and, 1058 Support, for association rules, 1040 Surrogate keys, 298 Survivability, challenges in database security, 867 SVD (singular value decompositions), 967 Symmetric key algorithms, 863 Symmetric multiprocessor (SMP), 1079 Synonyms, thesaurus as collection of, 1010 Syntactic analysis, in semantic model for IR, 1006 System accounts, 838 catalog, 42 definition in database application life cycle, 308 recovery needed due to system error, 750 security issues at system level, 836 System designers, 16 System environment DBMS module, 40–42 tools, application environments, and communication facilities, 43–44 utilities for, 42–43 System independent mapping, in choosing a DBMS, 326 System logs See also Logs/logging auditing and, 839–840 database recovery and, 808 tracking transaction operations, 753–754 Systems analyst, 16 Table inheritance, in SQL, 376 Tables ALTER TABLE command, 138–139 assigning privileges at table level, 842–843 base tables (relations) vs virtual relations, 90 basing on UDTs, 374 DROP TABLE command, 138 in relational model, 60, 61 retrieval queries from database tables, 494–495 in SQL, 89 SQL table as multiset of tuples, 97, 103–105 virtual See Views Tags HTML, 418–419 semistructured data and, 417 Tape jukeboxes, 586 Tape, magnetic, 592–593 Tape reel, 592 Taxonomies, 272 Technical metadata, in data warehousing, 1078 Templates dependencies, 572 in Query-By-Example, 1091 Temporal aggregation, 957 Temporal databases attribute versioning for incorporating time in OODBs, 953–954 bitemporal time relations, 950–952 options for storing tuples in temporal relations, 952–953 overview of, 943–945 querying constructs using TSQL2 language, 954–956 time representation, calendars and time dimensions, 945–947 time series data, 957 transaction time 
relations, 949–950 valid time relations, 947–949 Temporal intersection join, 952 Temporal normal form, 952 Temporal variables, 948 Temporary updates (dirty reads), concurrency control and, 748–749 Term frequency-inverse document frequency See TF-IDF (term frequency-inverse document frequency) Terminated state, transactions, 752 Terms (keywords) modes of interaction in IR systems, 999 sets of terms in Boolean model for IR, 1002 Ternary relationships choosing between binary and ternary relationships, 228–231 constraints on, 232 in ER (Entity-Relationship) model, 213–214 Tertiary storage, 584, 586 Testing conflict serializability of schedules, 763–765 in database application life cycle, 308 Texels (texture elements), 967 Text preprocessing in information retrieval, 1009–1012 sources in multimedia databases, 966 storing XML document as, 431 Texture, automatic analysis of images, 967 TF-IDF (term frequency-inverse document frequency) applying to inverted indexing, 1013 in vector space model for IR, 1003–1004 Thematic analysis, for spatial databases, 959 Theorem proving, in deductive databases, 976 Thesaurus ontologies, 272 text preprocessing in information retrieval, 1010–1011 Third normal form (3NF) dependency-preserving and nonadditive join decomposition into, 558–563 dependency-preserving decomposition into, 558–559 general definition of, 528 overview of, 523–525 Thomas’s write rule, 791 Threats, to database security, 836–837 Three-phase commit (3PC) protocol, 908 Three-schema architecture data independence and, 35–36 levels of, 34–35 overview of, 33 Three-tier architectures client/server architecture, 892–894 PHP, 482 for Web applications, 47–49 Three-valued logic, 116–117 Time constraints, on queries and transactions, 729 TIME data type, 945 Time dimensions, in temporal databases, 945–947 Time periods, in temporal databases, 946 Time representation, in temporal databases, 945–947 Time series management systems, 957 patterns in, 1039, 1057 as specialized database 
applications, 25 in temporal databases, 946, 957 Time-varying attributes, 953 Timeouts, for dealing with deadlocks, 788 TIMESTAMP data type, SQL, 93, 945 Timestamp ordering (TO) basic, 789–790 for concurrency control, 777 multiversion technique based on, 792 strict timestamp ordering, 790–791 Thomas’s write rule, 791 Timestamps overview of, 789 read and write, 789 transaction time relations and, 949 Timing channels, covert, 861 TO See Timestamp ordering (TO) Tool developers, 17 Tools, DBMS, 43–44 Top-down methodology for conceptual refinement, 257 for database design, 502 for schema design, 315–316 Topical relevance, in IR, 1015 Topological operators, 960 Topological relationships, among spatial objects, 959 Topologies, network, 879 Total categories, 260 Total participation, binary relationships and, 217 Total specialization constraint, 253 Tracks, on hard disks, 589 Trade-off analysis, 345 Training costs, in choosing a DBMS, 323–324 Transaction-id, 753 Transaction processing systems ACID properties, 754–755 bibliographic references, 775 characterizing schedules based on recoverability, 757–759 characterizing schedules based on serializability, 759–760 commit point of transactions, 754 concurrency control, 747–750 database design and, 306 equivalence of schedules, 769–770 overview of, 743–744 recovery, 750–751 schedules (histories) of transactions, 756–757 serial, nonserial, and conflict-serializable schedules, 761–763 serializability used for concurrency control, 765–768 single-user vs multiuser, 744–745 SQL support for transactions, 770–772 summary and exercises, 772–774 system log, 753–754 testing conflict serializability of schedules, 763–765 transaction states and operations, 751–752 transactions, database items, read/write operations, and DBMS buffers, 745–747 view equivalence and view serializability, 768–769 Transaction processing systems, in distributed databases catalog management, 913 concurrency control, 909–912 operating system support, 909 
overview of, 907–908 recovery, 912–913 two-phase and three-phase commit protocols, 908–909 Transaction Table, in ARIES recovery algorithm, 822 Transaction time, in temporal databases, 946 Transaction time relations, in temporal databases, 949–950 Transaction timestamp, 786 Transactional databases, distinguishing data warehouses from, 1069 Transactional searches, 996 Transactions ACID properties, 754–755 canned, 15 commit point of, 754 committed and aborted, 750 defined, designing, 322–323 interactive, 801 multiuser, 13–14 recovery needed due to transaction error, 750 relational data model and, 79 schedules (histories) of, 756–757 SQL transaction control commands, 111 states and operations, 751–752 throughput in physical database design, 327 types of, 745 Transfer rate (tr), disk blocks, 1088 Transformation approach, to image database queries, 966 Transience collections, 367 data, 586 object lifetime and, 378 objects, 355, 363 Transition constraints, 75 Transition tables, in STARBURST example, 940 Transitive closure, of relations, 168 Transitive dependencies, in 3NF, 523–524 Transparency autonomy as complement to, 882 in distributed databases, 879–881 Tree data models See Hierarchical data models Tree structures See also B+-trees; B-trees decision making in database design, 730 FP-tree (frequent-pattern tree) algorithm, 1043–1045 leaf-deep trees, 718 overview of, 646–647 R-trees, 962 search trees, 647–649 specialization hierarchy, 255 TV-trees (telescoping vector trees), 967 Triggers active rules specified by, 933 associating with database tables, 21 before, after, and instead triggers, 938 CREATE TABLE command, 132–133 CREATE TRIGGER command, 936 creating in SQL, 111 overview of, 932 row-level and statement-level, 937 specifying constraints, 74 in SQL-99, 942–943 Truth values, of atoms, 184 TSQL2 language, 954–956 Tuning databases design, 735–736 guidelines for, 738–739 implementation and, 311 indexes, 734–735 overview of, 733–734 queries, 736–738 system 
implementation and tuning, 327–328 Tuple-based constraints, 97 Tuple relational calculus examples of queries in, 178–179 existential and universal quantifiers, 177–178 expressions and formulas, 176–177 notation for query graphs, 179–180 overview of, 174–175 safe expressions, 182–183 SQL based on, 88 transforming universal and existential quantifiers, 180 tuple variables and range relations, 175–176 universal quantifier used in queries, 180–182 Tuple versioning approach, to implementing temporal databases, 947–953 bitemporal time relations, 950–952 implementation considerations, 952–953 transaction time relations and, 949–950 valid time relations and, 947–949 Tuples (rows) classification in mandatory access control, 848 combining using JOIN operation, 157–158 comparison of values in, 118 component values of, 67 dangling tuples in relational design, 563–565 defined, 61 disallowing spurious, 510–513 eliminating duplicates, 150 hypothesis tuples, 572 n-tuple for relations, 62 ordering in relations, 64 ordering values within, 64–65 reducing NULL values in, 509–510 reducing redundant information in, 507–509 retrieving all attribute values of selected, 102–103 retrieving multiple tuples in SQLJ, 461–464 retrieving multiple tuples using cursors, 455–457 SQL table as multiset of, 97 storing in temporal relations, 952–953 unspecified WHERE clause and, 102 valid time relations and, 948 values and NULLS in, 65–66 versioning for incorporating time in relational databases, 953 Tuple variables aliases and, 101 looping with iterators, 98 range relations and, 175–176 TV-trees (telescoping vector trees), 967 Two-phase commit (2PC) protocol recovery in multidatabase systems, 825–826 transaction management in distributed databases, 908 Two-phase locking basic locks, 784 binary locks, 778–780 conversion of locks, 782 overview of, 777–778 serializability guaranteed by, 782–784 shared/exclusive (read/write) locks, 780–782 variations on two-phase locking, 784–785 Two-tier 
client/server architecture, 46–47 Two-way joins, 689 Type (class) hierarchies constraints on extents corresponding to, 366–367 inheritance and, 369 in OO systems, 356 simple model for inheritance, 364–366 Type-compatible relations, 697 Type constructors atom constructor, 358 collection constructor, 359 defined, 369 ODB features included in SQL, 370 ODL and, 359–360 struct (tuple) constructor, 358–359 Type generator, 358–359 UDT (user-defined types) creating, 370–373 in SQL, 111 tables based on, 374 UML (Unified Modeling Language) class diagrams, 226–228 for database application design, 329 as design specification standard, 328 diagram types, 329–334 notation for ER diagrams, 224 object modeling with, 200 representing specialization/generalization in, 265–266 University student database example, 334–337 UMLS metathesaurus, 1010–1011 Unary relational operations CARTESIAN PRODUCT operation, 155–157 overview of, 146 PROJECT operation, 149–150 SELECT operation, 147–149 UNION, INTERSECTION, and MINUS operations, 152–155 Unbalanced trees, 646 Unconstrained write assumption, 769 UNDO/NO-REDO recovery immediate update techniques, 818–819 overview of, 807, 809 Undo operations, transactions, 753 UNDO phase, of ARIES recovery algorithm, 823 UNDO/REDO recovery immediate update techniques, 819 overview of, 807, 809 UNDO, write-ahead logging and, 810–811 Unidirectional associations, in UML class diagrams, 227 Unified Modeling Language See UML (Unified Modeling Language) UNION operation algorithms for, 697–698 in relational algebra, 152–155 SQL set operations, 104 Union types (categories) EER-to-Relational mapping, 297–299 modeling, 258–260 UNIQUE function, SQL, 122 Unique identity, in ODMS, 357 UNIQUE KEY clause, CREATE TABLE command, 96 Unique keys, in relational models, 70 Uniqueness constraints on entity attributes, 208–209 factors influencing physical database design, 729 integrity constraints in databases, 21 overview of, 68–70 specifying in SQL, 95–96 Universal quantifiers 
transforming, 180 in tuple relational calculus, 177–178 used in queries, 180–182 Universal relation assumption, 552 Universal relation schema, 552 Universal relations, 544 Universe of discourse (UoD), University student database example data records in, 6–9 EER schema applied to, 260–263 Unordered (heap files) records, 601–602 Unrepeatable read problem, 750 Unstructured data HTML and, 418–420 information retrieval dealing with, 993–994 Unsupervised learning clustering and, 1054 neural networks and, 1058 UoD (universe of discourse), Update anomalies, avoiding redundant information in tuples, 507 UPDATE command, SQL active rules and, 936 overview of, 109–110 Update operations bitemporal databases and, 950 database design and, 728 factors influencing physical database design, 729 operations on files, 599 query processing in distributed databases, 905–907 in relational data model, 78–79 types of relational data model operations, 75 Update transactions, 322 Usage projections, data warehousing and, 1080 Use case diagrams, UML, 329–331 User accounts, database security and, 839–840 User-defined subclasses, 252, 264 User-defined time, 947 User-defined types See UDT (user-defined types) User-friendly interfaces, 38 User interfaces GUIs (graphical user interfaces), 20, 39, 1061 multiple users, 20 User labels, combining with data labels, 869–870 Users classifying DBMSs by number of, 49 database actors on the scene, 15–16 measures of relevance in IR, 1015 multiuser transactions, 13–14 types of users in information retrieval, 995–996 Utilities, DBMS system, 42–43 Valid event data, 957 Valid state database states, 33 relational databases, 71 Valid time databases, 946 Valid time, in temporal databases, 946 Valid time relations, in temporal databases, 947–949 Valid XML documents, 422–425 Validation in database application life cycle, 307–308 of queries, 679 Validation (optimistic) concurrency control, 777, 794–795 Validation phase, of optimistic concurrency control, 794 Value, 
hue, saturation, and, 967 Value references, in RDBs, 396 Value sets (domains), of attributes, 209–210 Values stored in records, 594 in tuples, 65–66 Values (literals) atomic formulas as, 973 atomic literals, 378 collection literals, 382 complex types for, 358–360 in OO systems, 358 structured literals, 378 Variable-length records, 595–597 Variables bind variables (parameterized statements), 858 communication variables in SQL, 454 domain, 183 instance, 356 iterator variables, in OQL, 399–400 limited, 980 PHP, 485–486 PHP server, 490–491 PHP variable names, 484–485 program, 599 in Prolog languages, 971 scope, 490 shared, 452 temporal, 948 tuple, 98, 101, 175–176 VDL (view definition language), 37 Vector space model, for information retrieval, 1003–1005 Vertical fragmentation, in distributed databases, 881, 895 Vertical partitioning, database tuning and, 735 Vertical propagation, of privileges, 847 Vertical search engines, 1018 Very large databases, 586 Victim selection algorithm, for deadlock prevention, 788 Video applications, 25 Video clips, in multimedia databases, 932, 965 Video segments, in multimedia databases, 966 Video sources, in multimedia databases, 966 View definition language (VDL), 37 View equivalence, of transaction schedules, 768–769 View integration approach, in conceptual schema design, 315 View materialization, 135 View serializability, of transaction schedules, 768–769 Views data warehouses compared with, 1079–1080 database designers creating, 15 granting/revoking privileges, 844 multiple views of data supported in databases, 12 specifying as named queries in OQL, 402–403 Views (virtual tables), SQL vs base tables, 134 CREATE VIEW command, 134–135 implementation and update, 135–137 inline views, 137 overview of, 89, 133–134 Virtual data, in views, 12 Virtual data warehouses, 1070 Virtual private databases (VPDs), 868–869 Virtual relations, specifying with CREATE VIEW command, 90 Virtual tables See Views (virtual tables), SQL Visible/hidden 
attributes, of objects, 361 Vocabularies in inverted indexing, 1012 searching, 1013–1014 Volatile storage, 586 Voting method, distributed concurrency control based on, 912 VPDs (virtual private databases), 868–869 Wait-die transaction timestamp, 786 Wait-for graph, 787 WAL (write-ahead logging), 810–812 WANs (wide area networks), 879 Weak entity types, 219–220, 288–289 Web access control policies for, 854–855 hypertext documents and, 415 interchanging data on, 24 Web analysis, 1019, 1027 Web applications, architectures for, 47–49 Web-based user interfaces, 38 Web browsers, 38 Web clients, 38 Web content analysis agent-based approach to, 1024–1025 concept hierarchies in, 1024 database-based approach to, 1025 ontologies and, 1023–1024 overview of, 1022 segmenting Web pages and detecting noise, 1024 structured data extraction, 1022 types of Web analysis, 1019 Web information integration, 1022–1023 Web crawlers, 1028 Web databases, programming See PHP Web forms, collecting data from/inserting record into, 493–494 Web interface, for database applications, 449 Web Ontology Language (OWL), 969 Web pages analyzing link structure of, 1020–1021 content analysis, 1024 ranking, 1000 Web query interface integration, 1023 Web search and analysis analyzing link structure of Web pages, 1020–1021 comparing with information retrieval, 1018–1019 HITS ranking algorithm, 1021–1022 overview of, 1018 PageRank algorithm, 1021 practical uses of Web analysis, 1027–1028 searching the Web, 1020 Web content analysis, 1022–1025 Web searches combining browsing and retrieval, 1000 Web usage analysis, 1025–1027 Web security, 1028 Web servers middle tier in three-tier architecture, 48 specialized servers in client/server architecture, 45 Web Services Description Language (WSDL), 436 Web spamming, 1028 Web structure analysis analyzing link structure of Web pages, 1020–1022 types of Web analysis, 1019 Web usage analysis pattern analysis phase of, 1027 pattern discovery phase of, 1026–1027 
preprocessing phase of, 1025–1026 types of Web analysis, 1019 Well-formed XML, 422–425 WHERE clause DELETE command, 109 explicit sets of values in, 122 missing or unspecified, 102 in SQL retrieval queries, 129–130 UPDATE command, 109–110 Wide area networks (WANs), 879 Wildcard (*) types of queries in IR systems, 1008–1009 using with XPath, 433 WITH CHECK OPTION, view updates and, 137 WordNet thesaurus, 1011 Wound-wait transaction timestamp, 786 Wrappers, structured data extraction and, 1022 Write-ahead logging (WAL), 810–812 Write command, hard disks and, 591 Write phase, of optimistic concurrency control, 794 Write-set, of transactions, 747 Write timestamp, 789 Write-write conflicts, in transaction schedules, 757 write_item(X), 746 WSDL (Web Services Description Language), 436 XML access control, 853–854 XML declaration, 423 XML (eXtended Markup Language) data model, 51 interchanging data on Web using, 24 XML (Extensible Markup Language) bibliographic references, 443 converting graphs into trees, 441 hierarchical (tree) data model, 420–422 hierarchical XML views over flat or graph-based data, 436–440 languages, 432 languages related to, 436 overview of, 415–416 storing/extracting XML documents from databases, 431–432, 442 structured, semistructured, and unstructured data, 416–420 summary and exercises, 442–443 well-formed and valid documents, 422–425 XML schema language, 425–430 XPath, 432–434 XQuery, 434–435 XML schema language, 425–430 example schema file, 426–428 list of concepts in, 428–429 overview of, 425 XPath, 432–434 XQuery, 434–435 XSL (Extensible Stylesheet Language), 415, 436 XSLT (Extensible Stylesheet Language Transformations), 415, 436

Date posted: 12/05/2023, 22:12
