(BQ) Part 2 book Fundamentals of database systems has contents: Database design theory and normalization; file structures, indexing, and hashing; query processing and optimization, and database tuning; transaction processing, concurrency control, and recovery;...and other contents.
part Database Design Theory and Normalization This page intentionally left blank chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases I n Chapters through 6, we presented various aspects of the relational model and the languages associated with it Each relation schema consists of a number of attributes, and the relational database schema consists of a number of relation schemas So far, we have assumed that attributes are grouped to form a relation schema by using the common sense of the database designer or by mapping a database schema design from a conceptual data model such as the ER or Enhanced-ER (EER) data model These models make the designer identify entity types and relationship types and their respective attributes, which leads to a natural and logical grouping of the attributes into relations when the mapping procedures discussed in Chapter are followed However, we still need some formal way of analyzing why one grouping of attributes into a relation schema may be better than another While discussing database design in Chapters through 10, we did not develop any measure of appropriateness or goodness to measure the quality of the design, other than the intuition of the designer In this chapter we discuss some of the theory that has been developed with the goal of evaluating relational schemas for design quality—that is, to measure formally why one set of groupings of attributes into relation schemas is better than another There are two levels at which we can discuss the goodness of relation schemas The first is the logical (or conceptual) level—how users interpret the relation schemas and the meaning of their attributes Having good relation schemas at this level enables users to understand clearly the meaning of the data in the relations, and hence to formulate their queries correctly The second is the implementation (or 501 502 Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases physical storage) level—how the tuples in a base relation are stored and updated This level applies only to schemas of base relations—which will be physically stored as files—whereas at the logical level we are interested in schemas of both base relations and views (virtual relations) The relational database design theory developed in this chapter applies mainly to base relations, although some criteria of appropriateness also apply to views, as shown in Section 15.1 As with many design problems, database design may be performed using two approaches: bottom-up or top-down A bottom-up design methodology (also called design by synthesis) considers the basic relationships among individual attributes as the starting point and uses those to construct relation schemas This approach is not very popular in practice1 because it suffers from the problem of having to collect a large number of binary relationships among attributes as the starting point For practical situations, it is next to impossible to capture binary relationships among all such pairs of attributes In contrast, a top-down design methodology (also called design by analysis) starts with a number of groupings of attributes into relations that exist together naturally, for example, on an invoice, a form, or a report The relations are then analyzed individually and collectively, leading to further decomposition until all desirable properties are met The theory described in this chapter is applicable to both the top-down and bottom-up design approaches, but is more appropriate when used with the top-down approach Relational database design ultimately produces a set of relations The implicit goals of the design activity are information preservation and minimum redundancy Information is very hard to quantify—hence we consider information preservation in terms of maintaining all concepts, including attribute types, entity types, and relationship types as well as generalization/specialization relationships, which are described using a model such as the EER model Thus, the relational design must preserve all of these concepts, which are originally captured in the conceptual design after the conceptual to logical design mapping Minimizing redundancy implies minimizing redundant storage of the same information and reducing the need for multiple updates to maintain consistency across multiple copies of the same information in response to real-world events that require making an update We start this chapter by informally discussing some criteria for good and bad relation schemas in Section 15.1 In Section 15.2, we define the concept of functional dependency, a formal constraint among attributes that is the main tool for formally measuring the appropriateness of attribute groupings into relation schemas In Section 15.3, we discuss normal forms and the process of normalization using functional dependencies Successive normal forms are defined to meet a set of desirable constraints expressed using functional dependencies The normalization procedure consists of applying a series of tests to relations to meet these increasingly stringent requirements and decompose the relations when necessary In Section 15.4, we dis- 1An exception in which this approach is used in practice is based on a model called the binary relational model An example is the NIAM methodology (Verheijen and VanBekkum, 1982) 15.1 Informal Design Guidelines for Relation Schemas cuss more general definitions of normal forms that can be directly applied to any given design and not require step-by-step analysis and normalization Sections 15.5 to 15.7 discuss further normal forms up to the fifth normal form In Section 15.6 we introduce the multivalued dependency (MVD), followed by the join dependency (JD) in Section 15.7 Section 15.8 summarizes the chapter Chapter 16 continues the development of the theory related to the design of good relational schemas We discuss desirable properties of relational decomposition— nonadditive join property and functional dependency preservation property A general algorithm that tests whether or not a decomposition has the nonadditive (or lossless) join property (Algorithm 16.3 is also presented) We then discuss properties of functional dependencies and the concept of a minimal cover of dependencies We consider the bottom-up approach to database design consisting of a set of algorithms to design relations in a desired normal form These algorithms assume as input a given set of functional dependencies and achieve a relational design in a target normal form while adhering to the above desirable properties In Chapter 16 we also define additional types of dependencies that further enhance the evaluation of the goodness of relation schemas If Chapter 16 is not covered in a course, we recommend a quick introduction to the desirable properties of decomposition and the discussion of Property NJB in Section 16.2 15.1 Informal Design Guidelines for Relation Schemas Before discussing the formal theory of relational database design, we discuss four informal guidelines that may be used as measures to determine the quality of relation schema design: ■ ■ ■ ■ Making sure that the semantics of the attributes is clear in the schema Reducing the redundant information in tuples Reducing the NULL values in tuples Disallowing the possibility of generating spurious tuples These measures are not always independent of one another, as we will see 15.1.1 Imparting Clear Semantics to Attributes in Relations Whenever we group attributes to form a relation schema, we assume that attributes belonging to one relation have certain real-world meaning and a proper interpretation associated with them The semantics of a relation refers to its meaning resulting from the interpretation of attribute values in a tuple In Chapter we discussed how a relation can be interpreted as a set of facts If the conceptual design described in Chapters and is done carefully and the mapping procedure in Chapter is followed systematically, the relational schema design should have a clear meaning 503 504 Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases In general, the easier it is to explain the semantics of the relation, the better the relation schema design will be To illustrate this, consider Figure 15.1, a simplified version of the COMPANY relational database schema in Figure 3.5, and Figure 15.2, which presents an example of populated relation states of this schema The meaning of the EMPLOYEE relation schema is quite simple: Each tuple represents an employee, with values for the employee’s name (Ename), Social Security number (Ssn), birth date (Bdate), and address (Address), and the number of the department that the employee works for (Dnumber) The Dnumber attribute is a foreign key that represents an implicit relationship between EMPLOYEE and DEPARTMENT The semantics of the DEPARTMENT and PROJECT schemas are also straightforward: Each DEPARTMENT tuple represents a department entity, and each PROJECT tuple represents a project entity The attribute Dmgr_ssn of DEPARTMENT relates a department to the employee who is its manager, while Dnum of PROJECT relates a project to its controlling department; both are foreign key attributes The ease with which the meaning of a relation’s attributes can be explained is an informal measure of how well the relation is designed Figure 15.1 A simplified COMPANY relational database schema EMPLOYEE Ename F.K Ssn Bdate Address Dnumber P.K F.K DEPARTMENT Dname Dnumber Dmgr_ssn P.K DEPT_LOCATIONS F.K Dnumber Dlocation P.K PROJECT Pname F.K Pnumber Plocation P.K WORKS_ON F.K F.K Ssn Pnumber P.K Hours Dnum 15.1 Informal Design Guidelines for Relation Schemas Figure 15.2 Sample database state for the relational database schema in Figure 15.1 EMPLOYEE Ename Smith, John B Ssn 123456789 Bdate 1965-01-09 Address 731 Fondren, Houston, TX Wong, Franklin T 333445555 1955-12-08 638 Voss, Houston, TX 999887777 1968-07-19 3321 Castle, Spring, TX Wallace, Jennifer S 987654321 Narayan, Ramesh K 666884444 1941-06-20 1962-09-15 291Berry, Bellaire, TX 975 Fire Oak, Humble, TX English, Joyce A 1972-07-31 5631 Rice, Houston, TX Jabbar, Ahmad V 453453453 987987987 1969-03-29 980 Dallas, Houston, TX Borg, James E 888665555 1937-11-10 450 Stone, Houston, TX Zelaya, Alicia J Dnumber DEPT_LOCATIONS DEPARTMENT Dnumber Dmgr_ssn Dnumber Dlocation Research 333445555 Houston Administration 987654321 Stafford Headquarters 888665555 Bellaire Sugarland Houston Dname WORKS_ON Ssn PROJECT Pnumber Hours Pname Pnumber Plocation Dnum 123456789 32.5 ProductX Bellaire 123456789 7.5 ProductY Sugarland Houston 666884444 40.0 ProductZ 453453453 453453453 20.0 20.0 Computerization 10 Stafford Reorganization 20 Houston 333445555 333445555 10.0 10.0 Newbenefits 30 Stafford 333445555 333445555 10 10.0 20 10.0 999887777 30 10 30.0 10.0 10 30 35.0 5.0 30 20.0 20 15.0 20 Null 999887777 987987987 987987987 987654321 987654321 888665555 505 506 Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases The semantics of the other two relation schemas in Figure 15.1 are slightly more complex Each tuple in DEPT_LOCATIONS gives a department number (Dnumber) and one of the locations of the department (Dlocation) Each tuple in WORKS_ON gives an employee Social Security number (Ssn), the project number of one of the projects that the employee works on (Pnumber), and the number of hours per week that the employee works on that project (Hours) However, both schemas have a well-defined and unambiguous interpretation The schema DEPT_LOCATIONS represents a multivalued attribute of DEPARTMENT, whereas WORKS_ON represents an M:N relationship between EMPLOYEE and PROJECT Hence, all the relation schemas in Figure 15.1 may be considered as easy to explain and therefore good from the standpoint of having clear semantics We can thus formulate the following informal design guideline Guideline Design a relation schema so that it is easy to explain its meaning Do not combine attributes from multiple entity types and relationship types into a single relation Intuitively, if a relation schema corresponds to one entity type or one relationship type, it is straightforward to interpret and to explain its meaning Otherwise, if the relation corresponds to a mixture of multiple entities and relationships, semantic ambiguities will result and the relation cannot be easily explained Examples of Violating Guideline The relation schemas in Figures 15.3(a) and 15.3(b) also have clear semantics (The reader should ignore the lines under the relations for now; they are used to illustrate functional dependency notation, discussed in Section 15.2.) A tuple in the EMP_DEPT relation schema in Figure 15.3(a) represents a single employee but includes additional information—namely, the name (Dname) of the department for which the employee works and the Social Security number (Dmgr_ssn) of the department manager For the EMP_PROJ relation in Figure 15.3(b), each tuple relates an employee to a project but also includes Figure 15.3 Two relation schemas suffering from update anomalies (a) EMP_DEPT and (b) EMP_PROJ (a) EMP_DEPT Ename Ssn Bdate Address Dnumber Dname (b) EMP_PROJ Ssn Pnumber FD1 FD2 FD3 Hours Ename Pname Plocation Dmgr_ssn 15.1 Informal Design Guidelines for Relation Schemas the employee name (Ename), project name (Pname), and project location (Plocation) Although there is nothing wrong logically with these two relations, they violate Guideline by mixing attributes from distinct real-world entities: EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees and projects and the WORKS_ON relationship Hence, they fare poorly against the above measure of design quality They may be used as views, but they cause problems when used as base relations, as we discuss in the following section 15.1.2 Redundant Information in Tuples and Update Anomalies One goal of schema design is to minimize the storage space used by the base relations (and hence the corresponding files) Grouping attributes into relation schemas has a significant effect on storage space For example, compare the space used by the two base relations EMPLOYEE and DEPARTMENT in Figure 15.2 with that for an EMP_DEPT base relation in Figure 15.4, which is the result of applying the NATURAL JOIN operation to EMPLOYEE and DEPARTMENT In EMP_DEPT, the attribute values pertaining to a particular department (Dnumber, Dname, Dmgr_ssn) are repeated for every employee who works for that department In contrast, each department’s information appears only once in the DEPARTMENT relation in Figure 15.2 Only the department number (Dnumber) is repeated in the EMPLOYEE relation for each employee who works in that department as a foreign key Similar comments apply to the EMP_PROJ relation (see Figure 15.4), which augments the WORKS_ON relation with additional attributes from EMPLOYEE and PROJECT Storing natural joins of base relations leads to an additional problem referred to as update anomalies These can be classified into insertion anomalies, deletion anomalies, and modification anomalies.2 Insertion Anomalies Insertion anomalies can be differentiated into two types, illustrated by the following examples based on the EMP_DEPT relation: ■ ■ 2These To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or NULLs (if the employee does not work for a department as yet) For example, to insert a new tuple for an employee who works in department number 5, we must enter all the attribute values of department correctly so that they are consistent with the corresponding values for department in other tuples in EMP_DEPT In the design of Figure 15.2, we not have to worry about this consistency problem because we enter only the department number in the employee tuple; all other attribute values of department are recorded only once in the database, as a single tuple in the DEPARTMENT relation It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation The only way to this is to place NULL values in the anomalies were identified by Codd (1972a) to justify the need for normalization of relations, as we shall discuss in Section 15.3 507 508 Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases Redundancy EMP_DEPT Ename Smith, John B Ssn Bdate Dnumber Address 123456789 1965-01-09 731 Fondren, Houston, TX Wong, Franklin T 333445555 1955-12-08 638 Voss, Houston, TX Zelaya, Alicia J 999887777 1968-07-19 3321 Castle, Spring, TX Dname Research Dmgr_ssn 333445555 Research 333445555 Administration 987654321 Wallace, Jennifer S 987654321 1941-06-20 291 Berry, Bellaire, TX Administration 987654321 Narayan, Ramesh K 666884444 1962-09-15 975 FireOak, Humble, TX Research 333445555 English, Joyce A 453453453 1972-07-31 5631 Rice, Houston, TX Research 333445555 Jabbar, Ahmad V 987987987 1969-03-29 980 Dallas, Houston, TX Administration 987654321 Borg, James E 888665555 1937-11-10 Headquarters 888665555 450 Stone, Houston, TX Redundancy Redundancy EMP_PROJ Hours 32.5 Ssn 123456789 Pnumber 123456789 7.5 666884444 40.0 453453453 20.0 453453453 20.0 333445555 10.0 333445555 333445555 10 333445555 999887777 Ename Smith, John B Pname ProductX Smith, John B ProductY Sugarland Narayan, Ramesh K ProductZ Houston English, Joyce A ProductX Bellaire English, Joyce A ProductY Sugarland Wong, Franklin T ProductY Sugarland 10.0 Wong, Franklin T ProductZ Houston 10.0 Wong, Franklin T Computerization Stafford 20 10.0 Wong, Franklin T Reorganization Houston 30 30.0 Zelaya, Alicia J Newbenefits Stafford 999887777 10 10.0 Zelaya, Alicia J Computerization Stafford 987987987 10 35.0 Jabbar, Ahmad V Computerization Stafford 987987987 30 5.0 Jabbar, Ahmad V Newbenefits Stafford 987654321 30 20.0 Wallace, Jennifer S Newbenefits Stafford 987654321 20 15.0 Wallace, Jennifer S Reorganization Houston 888665555 20 Null Borg, James E Reorganization Houston Plocation Bellaire Figure 15.4 Sample states for EMP_DEPT and EMP_PROJ resulting from applying NATURAL JOIN to the relations in Figure 15.2 These may be stored as base relations for performance reasons attributes for employee This violates the entity integrity for EMP_DEPT because Ssn is its primary key Moreover, when the first employee is assigned to that department, we not need this tuple with NULL values any more This problem does not occur in the design of Figure 15.2 because a department is entered in the DEPARTMENT relation whether or not any employees work for it, and whenever an employee is assigned to that department, a corresponding tuple is inserted in EMPLOYEE 1158 Index Phantoms, transaction support in SQL, 771 PHP arrays, 486–488 bibliographic references, 497 collecting data from forms and inserting records, 493–494 connecting to databases, 491–493 features, 484–485 functions, 488–490 overview of, 481–482 retrieval queries, 494–495 server variables and forms, 490–491 simple example of, 482–484 summary and exercises, 496–497 variables, data types, and constructs, 485–486 PHP Extension and Application Repository (PEAR), 491 Phrase queries, types of queries in IR systems, 1008 Physical clustering, of records on disks, 617 Physical data independence, in three-schema architecture, 36 Physical data models, 30 Physical database design See also Database design bibliographic references, 740 data organization in, 587 denormalization as design decision related to query speed, 731–732 in ER (Entity-Relationship) model, 202 factors influencing, 727–729 indexing decisions, 730–731 overview of, 9, 326–327 summary and exercises, 739–740 tuning and, 735–736 Physical database file structures, 583 Physical database phase, in database design, 311 Physical indexes vs logical, 668 ordering primary and clustering indexes, 642 Physical problems/catastrophes, recovery needed due to, 751 Physical relationships, between file records, 617 Pile file (heap), 602 Pipelined evaluation, converting query trees into query execution plans, 710 Pipelining, combining operations using, 700 Pivoting (rotation) functionality of data warehouses, 1078 working with data cubes, 1070–1072 PL/SQL designing database programming language from scratch, 449 impedance mismatch and, 450 writing database applications with, 447 Plaintext, 864 Point events (facts), in temporal databases, 946 Pointers, blocks of data and, 597 Points on maps, 959–960 in temporal databases, 945 Policies access control for e-commerce and Web, 854–855 flow policies, 860 for label-based security, 853 security policies, 836 Polygons, on maps, 960 Polyinstantiation, in mandatory access control, 849–850 Polymorphism (operator overloading) defined, 369 in OO systems, 357 overview of, 367–368 specifying in SQL, 375–376 populating (loading) databases, 33 Populations, in statistical database security, 859 Positional iterator, SQLJ, 461–462 Positive literals, in Datalog language, 973 Precedence graph (serialization graph), 763–765 Precision metrics finding relevant information and, 1019 measures of relevance in IR, 1015–1017 Precision, vs security, 841 Precompilers DML commands and, 42 embedded SQL and, 452 in SQL programming, 449 Predicate-defined (conditiondefined) subclasses, 252, 264 Predicate dependency graph, 982 Predicate locking, 801 Predicates as arity or degree of p, 973 built-in, 972–973 fact-defined and rule-defined, 978 interpretation of, 976 in Prolog languages, 970–972 relational schemas and, 66 Prediction, as goal of data mining, 1037 Preprocessors embedded SQL and, 452 in SQL programming, 449 in Web usage analysis, 1025–1027 Presentation layer (client), in threetier client/server architecture, 892 Pretty Good Privacy (PGP), 854 Primary file organization B-trees as, 651 data organization and, 587 Primary indexes cost functions for SELECT operations, 713 methods for simple selection, 686 for ordered records (sorted files), 605 overview of, 633–635 searching nondense multilevel primary index, 646 tables comparing index types, 642 types of ordered indexes, 632 PRIMARY KEY clause, CREATE TABLE command, 95 Primary keys defined, 519 normal forms based on, 516–517 primary indexes and, 633 relational model constraints, 69 Primary site, concurrency control techniques for distributed databases, 910–911 Primary storage, 584 Prime attributes, 519, 526 Printer servers, in client/server architecture, 45 Privacy information privacy vs information security, 841–842 issues in database security, 866–867 protecting in statistical databases, 859 Private keys, in public (asymmetric) key algorithms, 864 Privileged software, 19 Index Privileges discretionary, 842–844 granting/revoking, 111, 844–846 limits on propagation of, 846–847 unauthorized escalation and abuse, 855, 858 views for specifying, 844 Proactive updates, valid time relations and, 949 Probabilistic model, for information retrieval, 1005–1006 Procedural DMLs, 37–38 Process-driven design, 310 PROCESS RULES, in active database systems, 938 Processes in database design, 322 multiprogramming and, 744 Processors, parallel, 1079 Program-data independence, 11–12, 23–24 Program-operation independence, 12 Program variables, 599 Programming languages advantages/disadvantages of, 477 approaches to database programming, 449 DBMS, 36–38 impedance mismatch and, 450 object-orientation creating compatibility between, 369 Web databases See PHP XML, 432–436 Programs, insulation between programs and data, 11–13 PROJECT operations algorithms for, 696–697 Query processing and optimizing, 696–697 in relational algebra, 149–150 Projection attributes, SELECT command and, 98 Projective operators, types of spatial operators, 961 Prolog language See also Datalog language logic programming and, 970 notation, 970–973 Proof-theoretic interpretation, of rules in deductive databases, 975 Properties, of association rules, 1041 Properties of relational decompositions dependency preservation, 552–553 dependency-preserving and nonadditive join decomposition into 3NF schemas, 560–563 dependency-preserving decomposition into 3NF schemas, 558–559 insufficiency of normal forms and, 552 nonadditive join decomposition into BCNF schemas, 559–560 nonadditive (lossless) join, 553–556 overview of, 544, 551 successive nonadditive join decompositions, 557 testing binary decompositions for nonadditive join property, 553–556 Protocols concurrency control, 777 deadlock prevention, 785–787 for ensuring serializability of transaction schedules, 767–768 Proximity queries, 1008 PSM (Persistent stored modules), 474–476 Public (asymmetric) key algorithms, 863–865 Public keys, in public (asymmetric) key algorithm, 864 Publishing XML documents, 431 Punctuation marks, text preprocessing in information retrieval, 1011 Pure time conditions, 955 QBE (Query-By-Example) basic retrieval in, 1091–1095 domain calculus and, 183, 185 grouping, aggregation, and database modification in, 1095–1098 overview of, 1091 QMF (Query Management Facility), 185 Quadtrees, 963 Qualified aggregations, in UML class diagrams, 228 Qualified associations, in UML class diagrams, 228 Qualifier conditions, XPath, 432 Quality control, data warehousing and, 1080 Quantifiers collection operators in OQL, 403–405 existential and universal, 177–178 1159 transforming, 180 using in queries, 180–182 Queries See also OQL (object query language); SQL (Structured Query Language) content-based retrieval, 965 database tuning and, 736–738 defined, design decisions related to query speed, 731–732 evaluating nonrecursive Datalog queries, 981–983 information retrieval, 1007–1009 interactive interface for, 40 IR systems, 1007–1009 keyword-based, 39 modes of interaction in IR systems, 999 physical database design and, 728–729 processing in databases, 19–20 in Prolog languages, 973 retrieval queries from database tables, 494–495 spatial, 958, 961 statistical, 859 TSQL2, 954–956 Query blocks, 681 Query-By-Example See QBE (Query-By-Example) Query compilers, 41 Query decomposition, 905–907 Query execution plans converting query trees into, 709–710 creating, 679 Query graphs creating, 679 notation for, 179–180, 701–703 Query languages DML as, 38 for federated databases, 886 SQL See also SQL (Structured Query Language) TSQL2 See also SQL (Structured Query Language) Query Management Facility (QMF), 185 Query mapping, 901 Query modification, 135 Query optimizer, 41, 679 Query processing and optimizing aggregate functions, 698–699 bibliographic references, 725 catalog information used in cost functions, 712–713 1160 Index converting query trees into query execution plans, 709–710 cost components of query execution, 711–712 cost functions for JOIN, 715–718 cost functions for SELECT, 713–715 DBMS module for, 20 disjunctive selection conditions, 688 external sorting, 682–685 heuristic algebraic optimization algorithm, 708–709 heuristic optimization of query trees, 703–706 heuristics used in query optimization, 700–701 hybrid hash-join, 696 implementing JOIN operations, 689–690 implementing SELECT operations, 685 join selection factors, 693–694 multiple relation queries and JOIN ordering, 718–719 nested-loop joins, 690–693 notation for query trees and query graphs, 701–703 operations, 700 OUTER JOIN operations, 699–700 overview of, 679–681 partition-hash joins, 694–696 PROJECT operations, 696–697 query optimization in Oracle, 721–722 search methods for complex selection, 686–687 search methods for simple selection, 685–686 selectivity and cost estimates in query optimization, 710–711 selectivity of conditions and, 687–688 semantic query optimization, 722–723 set operations, 697–698 summary and exercises, 723–725 transformation rules for relational algebra operations, 706–708 translating SQL queries into relational algebra, 681–682 Query processing and optimizing, in distributed databases data transfer costs for distributed query processing, 902–904 distributed query processing using semijoin operation, 904 overview of, 901–902 query update and decomposition, 905–907 Query results cursors for looping over tuples in, 450 ordering, 106–107 path expressions and, 400–402 retrieval queries from database tables, 494–495 Query (transaction) server, in twotier client/server architecture, 47 Query trees converting into query execution plans, 709–710 creating, 679 notation for, 163–165, 701–703 optimization of, 703–706 R-Trees, for spatial indexing, 962 RAID (Redundant Array of Inexpensive Disks) levels, 620–621 overview of, 617–619 performance improvements, 619–620 reliability improvements, 619 RAM (Random Access Memory), 585 Random access storage devices, 592 Randomizing function (hash function), 606 Range queries, 686, 961 Range relations, of tuple variables, 175–176 Rational Rose data modeler, 338 database design with, 337 tools and options for data modeling, 338–342 RBAC (role-based access control), 851–852 RBG (red, blue, green) colors, 967 RDBMS (relational database management systems) creating indexes, 731 ORDBMS (object-relational database management systems), 354 providing application flexibility, 23–24 two-tier client/server architectures and, 46 RDBs (relational databases) designing See relational database design overview of, 395–396 schemas See relational database schemas RDF (Resource Description Framework), 436 Reachability, of objects, 363 Read command, hard disks, 591 Read-only transaction, 745 READ operation, transactions, 751 Read (or Get) operation, on files, 600 Read phase, of optimistic concurrency control, 794 Read-set, of transaction, 747 Read timestamp, 789 Read-write conflicts, in transaction schedules, 757 Read/write heads, on hard disks, 591 Read/write, OSs controlling disk read/write, 40 Read-write transactions, 745–747 read_item(X), 746 Real-time database technology, Reasoning mechanisms, in knowledge representation, 268 Recall metrics, in IR, 1015–1017, 1019 Recall/precision curve, in IR, 1017 Record-at-a-time DMLs, 38 Record-based data models, 31 Record pointers, 609 Records See also Files (of records) anchor record (block anchor), 633 blocking, 597 catalog information used in query cost estimation, 712 fixed-length and variable-length, 595–597 inserting, 493–494 mixed, 616–617 ordered (sorted files), 603–606 phantom records, concurrency control techniques, 800–801 placing file records on disk, 594 spanned/unspanned, 597–598 in SQL/CLI, 464–468 types of, 594–595 unordered (heap files), 601–602 Recoverability, transaction schedules based o, 757–759 Recovery See also Backup and recovery; Database recovery techniques Index transaction management in distributed databases, 912–913 types of failures and, 750–751 Recursive closure operations, in relational algebra, 168–169 Recursive relationships, 168, 215 Recursive rules, in Prolog languages, 972 Red, blue, green (RBG) colors, 967 REDO phase, of ARIES recovery algorithm, 823 Redo transaction, 753 REDO, write-ahead logging and, 810–811 Redundancy, controlling in databases, 17–18 Redundant Array of Inexpensive Disks (RAID) See RAID (Redundant Array of Inexpensive Disks) REF keyword, specifying relationships via reference, 376 Reference types, OIDs using, 373–374 References foreign key, 73 representing object relationships, 360 specifying relationships via reference, 376 Referencing relations, 73 Referential integrity constraints inclusion dependencies and, 571 integrity constraints in databases, 21 relational data model and, 73–74 specifying in SQL, 95–96 Reflexive associations, in UML class diagrams, 227 Regression function, 1058 Regression, in data mining, 1057–1058 Regression rule, 1057 Regular entity types, 219, 287–288 Relation extension, 62 Relation intension, 62 Relation nodes notation for, 703 in query graphs, 179 Relation schemas domains and, 61 goodness of, 501–502 in relational databases, 501 Relation (table) level, assigning privileges at, 842–843 Relational algebra aggregate functions and grouping, 166–168 bibliographic references, 194–195 CARTESIAN PRODUCT operation, 155–157 complete set of relational algebra operations, 161, 164 DIVISION operation, 162–163 EQUIJOIN and NATURAL JOIN operations, 159–161 examples of queries in, 171–174 generalized projection, 165–166 JOIN operation, 157–158 notation for query trees, 163–165 OUTER JOIN operations, 169–170 OUTER UNION operation, 170–171 overview of, 145–146 PROJECT operation, 149–150 recursive closure operations, 168–169 RENAME operation, 151–152 SELECT operation, 147–149 sequences of operations, 151 summary and exercises, 185–194 transformation rules for operations, 706–708 translating SQL queries into, 681–682 UNION, INTERSECTION, and MINUS operations, 152–155 Relational calculus domain (relational) calculus, 183–185 overview of, 146–147 tuple relational calculus See Tuple relational calculus Relational completeness, of relational query languages, 174 Relational data model bibliographic references, 85 characteristics of relations, 63–66 classifying DBMSs and, 49 concepts, 60–61 constraints, 67–70 correspondence to ER model, 293 Delete operation, 77–78 domains, attributes, tuples, and relations, 61–63 formal languages for See Relational algebra; Relational calculus Insert operation, 76–77 1161 integrity, referential integrity, and foreign keys, 73–74 in list of data model types, 31 mapping from EER model to See EER-to-Relational mapping mapping from ER model to See ER-to-Relational mapping notation, 66–67 other types of constraints, 74–75 overview of, 50, 59–60 practical language for See SQL (Structured Query Language) schemas, 70–73 SQL compared with, 97 summary and exercises, 79–85 transactions and, 79 update operations, 75–76, 78–79 Relational database design algorithms for, 557, 566–567 attribute semantics in, 503–507 bibliographic references, 302, 579 bottom-up approach to, 544 Boyce-Codd normal form (BCNF), 529–531 dependency preservation properties of decompositions, 552–553 dependency-preserving and nonadditive join decomposition into 3NF schemas, 560–563 dependency-preserving decomposition into 3NF schemas, 558–559 disallowing possibility for spurious tuples, 510–513 domain-key normal form (DKNF), 574–575 equivalence of sets of functional dependencies, 549 first normal form (1NF), 519–523 formal analysis of relational schemas, 513 formal definition of fourth normal form, 533–534, 568–570 functional dependencies based on arithmetic functions and procedures, 572–574 functional dependency and, 513–516 general definition of second normal form, 526–527 general definition of third normal form, 528 goodness of relational schemas, 501–502 1162 Index inclusion dependencies, 571–572 inference rules for functional and multivalued dependencies, 568 inference rules for functional dependencies, 545–549 informal guidelines for relational schemas, 503, 513 join dependencies and fifth normal form, 534–535 key definitions, 518–519 mapping from EER model to relational model See EER-toRelational mapping mapping from ER model to relational model See ER-toRelational mapping minimal sets of functional dependencies, 549–551 multivalued dependency and fourth normal form, 531–533 nonadditive join decomposition into 4NF relations, 570 nonadditive join decomposition into BCNF schemas, 559–560 nonadditive (lossless) join properties of decompositions, 553–556 normal forms based on primary keys, 516–517 normalization of relations, 517–518 NULL values and dangling tuples and, 563–565 overview of, 285 practical use of normal forms, 518 reducing NULL values in tuples, 509–510 reducing redundant information in tuples, 507–509 relational decomposition and insufficiency of normal forms, 552 second normal form (2NF), 523 successive nonadditive join decompositions, 557 summary and exercises, 299–301, 575–578 template dependencies, 572 testing binary decompositions for nonadditive join property, 557 third normal form (3NF), 523–525 top-down and bottom-up approaches, 502 tuning and, 733 Relational database management systems See RDBMS (relational database management systems) Relational database schemas algorithms for schema design, 557 bibliographic references, 542 clear semantics for attributes in, 503–507 components of, 70–73 disallowing possibility for spurious tuples, 510–513 formal analysis of, 513 functional dependency and, 513–516 informal guidelines, 503, 513 overview of, 501–502 reducing NULL values in tuples, 509–510 reducing redundant information in tuples, 507–509 relation schemas in, 501 summary and exercises, 535–542 Relational database state, 70 Relational design by analysis, 543 Relational design by synthesis, 544 Relational expressions, 983 Relational OLAP (ROLAP), 1079 Relational operators in deductive database systems, 980–981 relational expressions and, 983 Relations (relation states) See also Tables alternative definition of, 64–65 column-based storage of, 669–670 defined, 61 interpretation (meaning) of, 66 legality of, 514 normalization of, 517–518 ordering tuples in, 63 ordering values within tuples, 64 overview of, 62–63 values and NULLS in tuples, 65–66 Relations, temporal bitemporal time, 950–952 transaction time, 949–950 valid time, 947–949 Relationship relation (lookup table) mapping of binary 1:1 relationship types, 289 mapping of binary 1:N relationship types, 290 mapping of binary M:N relationship types, 290–291 Relationships in data modeling, 31 in ODMG object model, 386 references to, 360 representing in OO systems, 356 specifying by reference, 376 symbols for, 1084 University student database example, Relationships, in EER model class/subclass relationships, 247 specific relationship types and, 249–250 Relationships, in ER model attributes of relationship types, 218 constraints on binary relationship types, 216–218 degree of relationship greater than two, 228–232 degree of relationship type, 213–214 overview of, 212 relationship types, sets, and instances, 212–213 relationships as attributes, 214 role names and recursive relationships, 215 Relevant sets, in probabilistic model for IR, 1005 Reliability, in distributed databases, 881, 882 Remote commands, for SQL injection attacks, 857 RENAME operation, in relational algebra, 151–152 Reorganize operation, on files, 600 Repeating field or groups, in file records, 595 Repeating history, in ARIES recovery algorithm, 821 Replication active rules for maintaining consistency of replicated tables, 943 in distributed databases, 897 example of fragmentation, allocation, and replication, 898–901 transparency of, 880 Representational (or implementation) data models, 31 Requirements collection and analysis phase in database design, 200, 311–313 database design starting with, of information system (IS) life cycle, 307 Index Reset operations, on files, 599 Resource Description Framework (RDF), 436 Response time, physical database design and, 326 Restrict option, of delete operation, 77 Result equivalence, of transaction schedules, 762 Result relations, 75 Result tables, in QBE, 1095 Retrieval operations database design and, 728 from database tables, 494–495 on files, 599 modes of interaction in IR systems, 999 objects, 362 QBE (Query-By-Example), 1091–1095 types of relational data model operations, 75 Retrieval transactions, 322 Retroactive update, valid time relations and, 949 Return values, of PHP functions, 490 Reverse engineering, Rational Rose and, 338 Revoking privileges, 844, 845–846 Rewrite blocks, file organization and, 602 Rewrite time, as disk parameter, 1089 RIFT (rotation invariant feature transform), 968 Rigorous two-phase locking, 785 Rivest, Ron, 865 ROLAP (relational OLAP), 1079 Role-based access control (RBAC), 851–852 Role hierarchy, in role-based access control, 851 Role names, and recursive relationships, 215 Roll-up display functionality of data warehouses, 1078 working with data cubes, 1070–1072 ROLLBACK (or ABORT) operation, 752 Rollbacks, in database recovery, 813–815, 950 Root element, XML schema language, 429 Root tag, XML documents, 423 Roots, of tree structures, 646 Rotation See Pivoting (rotation) Rotation invariant feature transform (RIFT), 968 Rotational delay (rd) as disk parameter, 1087 on hard disks, 591 Row-level access control, 852–853 Row-level triggers, 937 Rows See Tuples (rows) Rows, in SQL, 89 RSA encryption algorithm, 865 Rule consideration, in active databases deferred consideration, 942 overview of, 938–939 Rule-defined predicates (views), 978 Rule sets, in active database systems, 938 Rules, in deductive databases interpretation of, 975–977 overview of, 21, 932 in Prolog/Datalog notation, 970–972 safe, 979–980 Runtime database processor DBMS component modules, 42 query execution and, 679 Runtime, specifying SQL queries at, 458–459 Safe expressions, in tuple relational calculus, 182–183 Safe rules, in deductive databases, 979–980 Sampling algorithm, in data mining, 1042 SANs (Storage Area Networks), 621–622 Saturation, hue, saturation, and value (HSV), 967 SAX (Simple API for XML), 423 Scale-invariant feature transform (SIFT), 968 Scan operations, files, 600 Scanner, for SQL, 679 Schedules (histories), of transactions characterizing based on recoverability, 757–759 characterizing based on serializability, 759–760 equivalence of, 768–770 overview of, 755–757 serial, nonserial, and conflictserializable schedules, 761–763 1163 testing conflict serializability of, 763–765 Schema conceptual design, 313–321 entity type describing for entity sets, 208 instances and database state and, 32–33 ontologies and, 272 relational See Relational database schemas relational data model and, 70–73 three-schema architecture See Three-schema architecture Schema construct, 32, 222 Schema diagram, 32 Schema evolution, 33 Schema matching, types of Web information integration, 1023 Schema, SQL change statements, 137–139 names, 89 overview of, 89–90 Schema (view) integration, 316–317, 319–321 Schemaless XML documents, 422 Scientific applications, 25 Scope, variable, 490 Scripting languages, PHP as, 482 SCSI (Small Computer System Interface), 591 SDL (storage definition language), 37, 110 Search engines overview of, 998–999 vertical and metasearch, 1018 Search fields, 648 Search trees, 647–649 Searches conversational, 1029–1030 faceted, 1028–1029 information retrieval See IR (Information Retrieval) measures of relevance, 1014–1015 methods for complex selection, 686–687 methods for simple selection, 685–686 navigational, informational, and transactional, 996 social searches, 1029 Web See Web search and analysis Second normal form (2NF) general definition of, 526–527 overview of, 523 Secondary access path, 631 1164 Index Secondary file organization, 587 Secondary indexes advantages of, 668 cost functions for SELECT, 714 methods for simple selection, 686 overview of, 636–642 tables comparing index types, 642 types of ordered indexes, 632–633 Secondary keys, 636 Secondary storage, 584, 711 Secret key algorithms, 863 Sectors, of hard disk, 589 Security vs precision, 841 Web security, 1028 Security and authorization subsystem, DBMS, 19 Security, database See Database security Seek time (s) as disk parameter, 1087 on hard disks, 591 Segmentation, automatic analysis of images, 967 SELECT command, SQL aggregate functions used in, 125 basic form of, 97–98 FROM clause, 107 DISTINCT keyword with, 103 information retrieval with, 97 projection attributes and selection conditions, 98, 100 in SQL retrieval queries, 129–130 SELECT-FROM-WHERE structure, of SQL queries, 98–100 SELECT operations cost functions for, 713–715 disjunctive selection conditions, 688 on files, 599 implementing, 685 in relational algebra, 147–149 search methods for complex selection, 686–687 search methods for simple selection, 685–686 selectivity of conditions, 687–688 SELECT operator (σ), 147 Select-project-join queries, 179 Selection cardinality, 712 Selection conditions in domain calculus, 184 SELECT command and, 98, 100 SELECT operation and, 147 Selection, functionality of data warehouses, 1079 Selective inheritance, in ODBs (object databases), 368 Selectivity and cost estimates, in query optimization catalog information used in cost functions, 712–713 cost components of query execution, 711–712 cost functions for JOIN, 715–718 cost functions for SELECT, 713–715 multiple relation queries and JOIN ordering, 718–719 overview of, 710–711 Selectivity, of conditions, 687–688 Self-describing data, 10–11, 416 Semantic constraints relational model constraints, 68 template dependencies and, 572 types of constraints, 74 Semantic data models abstraction concepts in, 268 aggregation and association, 269–271 classification and instantiation, 268 compared with knowledge representation, 267–268 ER (Entity-Relationship) model, 245 identification, 269 for information retrieval, 1006–1007 specialization and generalization, 269 Semantic query optimization, 722–723 Semantic relationships, in semantic model for IR, 1006 Semantic Web, 272–273 Semantics approach to IR, 1000 of attributes, 503–507, 514 equivalence of transaction schedules and, 769–770 heterogeneity of in federated databases, 886–887 integrity constraints and, 21 tagging images, 969 Semijoin operation, 904 Semistructured data, 416–417 Separators, XPath, 432 Sequence diagrams, UML, 329, 331 Sequential order, in accessing data blocks, 592 Sequential patterns in data mining, 1037 describing knowledge discovered by data mining, 1039 discovery of, 1057 in pattern discovery phase of Web usage analysis, 1027 Serial schedules, 761 Serializability, of transaction schedules characterizing schedules based on, 759–760 serial, nonserial, and conflictserializable schedules, 761–763 testing conflict serializability of schedules, 763–765 used for concurrency control, 765–768 view serializability, 768–769 Serialization (precedence) graph, 763–765 Servers client program calling database server, 451 database servers, 42 DBMS module for, 29 parallel architecture for, 1079 PHP variables, 490–491 server level in two-tier client/ server architecture, 47 specialized servers in client/server architecture, 45–46 Set-at-a-time DMLs, 38 Set constructor, 359 SET DIFFERENCE operation algorithms for, 697–698 in relational algebra, 152–155 Set null (set default) option, in delete operations, 77–78 Set operations algorithms for, 697–698 query processing and optimizing, 697–698 SQL, 104 Set types, in network data model, 51 Sets equivalence of, 549 explicit sets of values in SQL, 122 SQL table as multiset of tuples, 97 tables as, 103–105 Shadow directory, 820 Shadow paging, 820–821 Shamir, Adi, 865 Shape, automatic analysis of images, 967 Shape descriptors, 965 Shared nothing architecture, 887–888 Index Shared subclasses (multiple inheritance), 256, 297 Shared variables, embedded SQL and, 452 Sharing data and multiuser transactions, 13–14 Sharing databases, Shrinking (second) phase, in twophase locking, 782 SIFT (scale-invariant feature transform), 968 Simple API for XML (SAX), 423 Simple (atomic) attributes, in ER model, 205–207 Simple Object Access Protocol (SOAP), 436 Simultaneous update, 949 Single inheritance, subclasses and, 256–257 Single-level indexes clustering indexes, 635–636 overview of, 632–633 primary indexes, 633–635 secondary indexes, 636–642 tables comparing index types, 642 Single-loop joins cost functions for, 716 methods for implementing joins, 689 Single-quoted strings, PHP text processing, 485–486 Single-relation options, for mapping specialization or generalization, 295 Single-sided disks, 589 Single time points, in temporal databases, 946 Single-user systems, 49 Single-user transaction processing system, 744–745 Single-valued attributes, in ER model, 206 Singular value decompositions (SVD), 967 Slice and dice, functionality of data warehouses, 1078 Small Computer System Interface (SCSI), 591 SMART document retrieval system, 998 SMP (symmetric multiprocessor), 1079 Snowflake schema, for multidimensional data models, 1073–1074 SOAP (Simple Object Access Protocol), 436 Social searches, 1029 Software costs, choosing a DBMS, 323 Software developers, 16 Software engineers database actors on the scene, 16 design and testing of applications, 199 Sort-merge joins cost functions for, 717 methods for implementing joins, 689–690 Sort-merge strategy, 683 Sorting external, 682–685 functionality of data warehouses, 1078 implementing aggregate operations, 699 ordered records (sorted files), 603–606 Space utilization, physical database design and, 326 Spamming, Web spamming, 1028 Spanned/unspanned organization, of records, 597 Sparse indexes, 633 Spatial analysis, 959 Spatial applications, 25 Spatial databases applications of spatial data, 964–965 data indexing, 961–963 data mining, 963–964 data types and models, 959–960 dynamic operators, 961 operators, 960–961 overview of, 957–959 Spatial joins/overlays, 961 Spatial outliers, 965 Special purpose DBMSs, 50 Specialization/generalization constraints on, 251–254 definitions, 264 design choices for, 263–264 EER-to-Relational mapping, 294–297 generalization, 250–251 hierarchies and lattices, 254–257 in knowledge representation, 269 notation for, 1084–1085 refining conceptual schemas, 257–258 specialization, 248–250 UML (Unified Modeling Language), 265–266 1165 Specialized servers, in client/server architecture, 45 Specific attributes (local attributes), of subclass, 249 Specific relationship types, subclasses and, 249–250 Specification, conceptualization and, 272 Speech input and output, queries and, 39 SQL-99, 942–943 SQL/CLI (Call Level Interface) database programming with, 464–468 library of functions, 448 SQL injection attacks code injection, 856 function call injection, 856–857 protecting against, 858 risks associated with, 857–858 SQL manipulation, 856 types of, 855 SQL programming techniques approaches to database programming, 449–450 bibliographic references, 479 database programming techniques and issues, 448–449 dynamic SQL, 448, 458–459 embedded SQL See Embedded SQL function calls See Function calls, database programming with impedance mismatch, 450 overview of, 447–448 sequence of interactions in, 451 SQL/PSM (SQL/Persistent Stored Modules) See SQL/PSM (SQL/ Persistent Stored Modules) summary and exercises, 477–478 SQL/PSM (SQL/Persistent Stored Modules) overview of, 473 specifying persistent stored modules, 475–476 stored procedures and functions, 473–475 SQL (Structured Query Language) See also Embedded SQL * (asterisk) for retrieving all attribute values of selected tuples, 102–103 aliases, 101–102 bibliographic references, 114 CHECK clauses for specifying constraints on tuples, 97 1166 Index clauses in simple SQL queries, 107 common data types, 92–94 CREATE TABLE command, 90–92 data definition in, 89 dealing with ambiguous attribute names, 100–101 DELETE command, 109 embedding SQL commands in Java, 459–461 external sorting, 682–685 INSERT command, 107–109 list of features in, 110–111 manipulation by SQL injection attacks, 856 missing or unspecified WHERE clauses, 102 naming constraints, 96–97 object-relational features in, 354 ordering query results, 106–107 overview of, 87–89 QBE compared with, 1098 schema and catalog concepts in, 89–90 SELECT-FROM-WHERE structure of queries, 98–100 servers, 47 specifying attribute constraints and default values, 94–95 specifying key and referential integrity constraints, 95–96 substring pattern matching and arithmetic operators, 105–106 summary and exercises, 111–114 tables as sets in, 103–105 temporal data types, 945 transaction support, 770–772 translating SQL queries into relational algebra, 681–682 UDT (user-defined types) in, 111 UPDATE command, 109–110 SQL (Structured Query Language), advanced features aggregate functions, 124–126 ALTER command, 138–139 bibliographic references, 143 clauses in retrieval queries, 129–130 comparisons involving NULL and three-valued logic, 116–117 correlated nested queries, 119–120 CREATE ASSERTION command, 131–132 CREATE TRIGGER command, 132–133 CREATE VIEW command, 134–135 DROP command, 138 EXISTS and NOT EXISTS functions, 120–122 explicit sets and renaming of attributes, 122 GROUP BY clause, 126–129 HAVING clause, 127–129 inline views, 137 nested queries, 117–119 outer and inner joins, 123–124 overview of, 115 schema change statements, 137 summary and exercises, 139–143 UNIQUE function, 122 view implementation and update, 135–137 views (virtual tables) in, 133–134 SQL (Structured Query Language), ODB extensions to dot notation for build path expressions, 376 encapsulation of operations, 374–375 inheritance and polymorphism, 375–376 OIDs (object identifiers) using reference types, 373–374 overview of, 369–370 specifying relationships via reference, 376 tables based on UDTs, 374 UDTs and complex structures for objects, 370–373 SQLJ embedding SQL command in Java, 459–461 retrieving multiple tuples using iterators, 461–464 SQLODE communication variable, 454 SQLSTATE communication variable, 454 Standards database approach and, 22 database design specification, 328 SQL, 88 Star schema, 1073 Starvation, concurrency control and, 788 State in ODMG object model, 382 relational database state, 70–72 transaction, 751–752 State constraints, 75 Statechart diagrams, UML, 329, 333 Statement-level active rules, in STARBURST example, 940–942 Statement-level triggers overview of, 937 in STARBURST example, 940 Statement records, in SQL/CLI, 464–468 Static (early) binding, in ODMS, 368 Static files, 601 Static hashing, 610 Static Web pages, 420 Statistical analysis, in pattern discovery phase of Web usage analysis, 1026 Statistical approach, to IR, 1000–1002 Statistical database security, 859–860 Statistical databases, 837–838, 874 Statistical queries, 859 Steal/no-steal techniques in database recovery, 811–812 UNDO/REDO recovery algorithm, 819 Stem, of words, 1010 Stemming, text preprocessing in information retrieval, 1010 Stopwords in keyword queries, 1007 removal, 1009–1010 text/document sources, 966 Storage allocation of file blocks on disk, 598 bibliographic references, 630 buffer management and, 593–594 column-based storage of relations, 669–670 cost components of query execution, 711 covert channels, 861 database storage, 586–587 database storage reorganization, 43 database tuning and, 733 file headers (descriptors) and, 598 file systems and See Files (of records) files, fixed-length records, and variable-length records, 595–597 hardware structures of disk devices, 588–592 iSCSI (Internet SCSI), 623–624 magnetic tape devices, 592–593 Index measuring capacity, 585 memory hierarchies and, 584–586 NAS (network-attached storage), 622–623 overview of, 583–584 parallelization of access See RAID (Redundant Array of Inexpensive Disks) placing file records on disk, 594 record blocking and, 597 records and record types, 594–595 SANs (Storage Area Networks), 621–622 secondary storage devices, 587 spanned/unspanned records, 597–598 summary and exercises, 624–630 Storage Area Networks (SANs), 621–622 Storage definition language (SDL), 37, 110 Storage medium, physical, 584 Stored attributes, in ER model, 206 Stored data manager module, DBMS, 40, 42 Stored procedures, 21, 473–475 Stream-based processing, 700 Streaming XML documents, 423 Strict hierarchies, 255 Strict schedules, 759 Strict timestamp ordering, 790–791 Strict two-phase locking, 784–785 Strings pattern matching, 105 PHP text processing, 485 Strong entity types, 219, 287 Struct (tuple) constructors, 358–359 Structural constraints, of relationships, 218 Structural diagrams, UML, 329 Structured data extracting, 1022 overview of, 416 vs unstructured, 993–994 Structured domains, in UML class diagrams, 227 Structured literals, 378 Subclasses in EER model, 246–248, 264 generalizing into superclasses, 250 as leaf classes in UML, 265 options for mapping specialization or generalization, 294 predicate-defined and userdefined, 252 shared, 256 specific attributes (local attributes) of, 249 specific relationship types and, 249–250 union types or categories, 258–260 Subset of Cartesian product, 63 Subsets, of attributes, 68–69 Substring pattern matching, in SQL, 105–106 Subtrees, 646 Subtypes, 247, 365–366 SUM function aggregate functions in SQL, 124–125 grouping and, 166, 168 implementing aggregate operations, 698 Superclass/subclass relationships in EER model, 264 overview of, 247 union types or categories, 258–260 Superclasses base class and, 265 in EER model, 246–248, 264 generalization and, 250 options for mapping specialization or generalization, 294 specialization and, 248 Superkeys defined, 518 relational model constraints, 69 Supertypes, 247, 365 Superuser accounts, 838 Supervised learning classification and, 1051 neural networks and, 1058 Support, for association rules, 1040 Surrogate keys, 298 Survivability, challenges in database security, 867 SVD (singular value decompositions), 967 Symmetric key algorithms, 863 Symmetric multiprocessor (SMP), 1079 Synonyms, thesaurus as collection of, 1010 Syntactic analysis, in semantic model for IR, 1006 System accounts, 838 catalog, 42 definition in database application life cycle, 308 1167 recovery needed due to system error, 750 security issues at system level, 836 System designers, 16 System environment DBMS module, 40–42 tools, application environments, and communication facilities, 43–44 utilities for, 42–43 System independent mapping, in choosing a DBMS, 326 System logs See also Logs/logging auditing and, 839–840 database recovery and, 808 tracking transaction operations, 753–754 Systems analyst, 16 Table inheritance, in SQL, 376 Tables ALTER TABLE command, 138–139 assigning privileges at table level, 842–843 base tables (relations) vs virtual relations, 90 basing on UDTs, 374 DROP TABLE command, 138 in relational model, 60, 61 retrieval queries from database tables, 494–495 in SQL, 89 SQL table as multiset of tuples, 97, 103–105 virtual See Views Tags HTML, 418–419 semistructured data and, 417 Tape jukeboxes, 586 Tape, magnetic, 592–593 Tape reel, 592 Taxonomies, 272 Technical metadata, in data warehousing, 1078 Templates dependencies, 572 in Query-By-Example, 1091 Temporal aggregation, 957 Temporal databases attribute versioning for incorporating time in OODBs, 953–954 bitemporal time relations, 950–952 options for storing tuples in temporal relations, 952–953 overview of, 943–945 1168 Index querying constructs using TSQL2 language, 954–956 time representation, calendars and time dimensions, 945–947 time series data, 957 transaction time relations, 949–950 valid time relations, 947–949 Temporal intersection join, 952 Temporal normal form, 952 Temporal variables, 948 Temporary updates (dirty reads), concurrency control and, 748–749 Term frequency-inverse document frequency See TF-IDF (term frequency-inverse document frequency) Terminated state, transactions, 752 Terms (keywords) modes of interaction in IR systems, 999 sets of terms in Boolean model for IR, 1002 Ternary relationships choosing between binary and ternary relationships, 228–231 constraints on, 232 in ER (Entity-Relationship) model, 213–214 Tertiary storage, 584, 586 Testing conflict serializability of schedules, 763–765 in database application life cycle, 308 Texels (texture elements), 967 Text preprocessing in information retrieval, 1009–1012 sources in multimedia databases, 966 storing XML document as, 431 Texture, automatic analysis of images, 967 TF-IDF (term frequency-inverse document frequency) applying to inverted indexing, 1013 in vector space model for IR, 1003–1004 Thematic analysis, for spatial databases, 959 Theorem proving, in deductive databases, 976 Thesaurus ontologies, 272 text preprocessing in information retrieval, 1010–1011 Third normal form (3NF) dependency-preserving and nonadditive join decomposition into, 558–563 dependency-preserving decomposition into, 558–559 general definition of, 528 overview of, 523–525 Thomas’s write rule, 791 Threats, to database security, 836–837 Three-phase commit (3PC) protocol, 908 three-schema architecture data independence and, 35–36 levels of, 34–35 overview of, 33 Three-tier architectures client/server architecture, 892–894 PHP, 482 for Web applications, 47–49 Three-valued logic, 116–117 Time constraints, on queries and transactions, 729 TIME data type, 945 Time dimensions, in temporal databases, 945–947 Time periods, in temporal databases, 946 Time representation, in temporal databases, 945–947 Time series management systems, 957 patterns in, 1039, 1057 as specialized database applications, 25 in temporal databases, 946, 957 Time-varying attributes, 953 Timeouts, for dealing with deadlocks, 788 TIMESTAMP data type, SQL, 93, 945 Timestamp ordering (TO) basic, 789–790 for concurrency control, 777 multiversion technique based on, 792 strict timestamp ordering, 790–791 Thomas’s write rule, 791 Timestamps overview of, 789 read and write, 789 transaction time relations and, 949 Timing channels, covert, 861 TO See Timestamp ordering (TO) Tool developers, 17 Tools, DBMS, 43–44 Top-down methodology for conceptual refinement, 257 for database design, 502 for schema design, 315–316 Topical relevance, in IR, 1015 Topological operators, 960 Topological relationships, among spatial objects, 959 Topologies, network, 879 Total categories, 260 Total participation, binary relationships and, 217 Total specialization constraint, 253 Tracks, on hard disks, 589 Trade-off analysis, 345 Training costs, in choosing a DBMS, 323–324 Transaction-id, 753 Transaction processing systems ACID properties, 754–755 bibliographic references, 775 characterizing schedules based on recoverability, 757–759 characterizing schedules based on serializability, 759–760 commit point of transactions, 754 concurrency control, 747–750 database design and, 306 equivalence of schedules, 769–770 overview of, 743–744 recovery, 750–751 schedules (histories) of transactions, 756–757 serial, nonserial, and conflictserializable schedules, 761–763 serializability used for concurrency control, 765–768 single-user vs multiuser, 744–745 SQL support for transactions, 770–772 summary and exercises, 772–774 system log, 753–754 testing conflict serializability of schedules, 763–765 transaction states and operations, 751–752 transactions, database items, read/write operations, and DBMS buffers, 745–747 view equivalence and view serializability, 768–769 Index Transaction processing systems, in distributed databases catalog management, 913 concurrency control, 909–912 operating system support, 909 overview of, 907–908 recovery, 912–913 two-phase and three-phase commit protocols, 908–909 Transaction Table, in ARIES recovery algorithm, 822 Transaction time, in temporal databases, 946 Transaction time relations, in temporal databases, 949–950 Transaction timestamp, 786 Transactional databases, distinguishing data warehouses from, 1069 Transactional searches, 996 Transactions ACID properties, 754–755 canned, 15 commit point of, 754 committed and aborted, 750 defined, designing, 322–323 interactive, 801 multiuser, 13–14 recovery needed due to transaction error, 750 relational data model and, 79 schedules (histories) of, 756–757 SQL transaction control commands, 111 states and operations, 751–752 throughput in physical database design, 327 types of, 745 Transfer rate (tr), disk blocks, 1088 Transformation approach, to image database queries, 966 Transience collections, 367 data, 586 object lifetime and, 378 objects, 355, 363 Transition constraints, 75 Transition tables, in STARBURST example, 940 Transitive closure, of relations, 168 Transitive dependencies, in 3NF, 523–524 Transparency autonomy as complement to, 882 in distributed databases, 879–881 Tree data models See Hierarchical data models Tree structures See also B+-trees; B-trees decision making in database design, 730 FP-tree (frequent-pattern tree) algorithm, 1043–1045 leaf-deep trees, 718 overview of, 646–647 R-trees, 962 search trees, 647–649 specialization hierarchy, 255 TV-trees (telescoping vector trees), 967 Triggers active rules specified by, 933 associating with database tables, 21 before, after, and instead triggers, 938 CREATE TABLE command, 132–133 CREATE TRIGGER command, 936 creating in SQL, 111 overview of, 932 row-level and statement-level, 937 specifying constraints, 74 in SQL-99, 942–943 Truth values, of atoms, 184 TSQL2 language, 954–956 Tuning databases design, 735–736 guidelines for, 738–739 implementation and, 311 indexes, 734–735 overview of, 733–734 queries, 736–738 system implementation and tuning, 327–328 Tuple-based constraints, 97 Tuple relational calculus examples of queries in, 178–179 existential and universal quantifiers, 177–178 expressions and formulas, 176–177 notation for query graphs, 179–180 overview of, 174–175 safe expressions, 182–183 SQL based on, 88 transforming universal and existential quantifiers, 180 1169 tuple variables and range relations, 175–176 universal quantifier used in queries, 180–182 Tuple versioning approach, to implementing temporal databases, 947–953 bitemporal time relations, 950–952 implementation considerations, 952–953 transaction time relations and, 949–950 valid time relations and, 947–949 Tuples (rows) classification in mandatory access control, 848 combining using JOIN operation, 157–158 comparison of values in, 118 component values of, 67 dangling tuples in relational design, 563–565 defined, 61 disallowing spurious, 510–513 eliminating duplicates, 150 hypothesis tuples, 572 n-tuple for relations, 62 ordering in relations, 64 ordering values within, 64–65 reducing NULL values in, 509–510 reducing redundant information in, 507–509 retrieving all attribute values of selected, 102–103 retrieving multiple tuples in SQLJ, 461–464 retrieving multiple tuples using cursors, 455–457 SQL table as multiset of, 97 storing in temporal relations, 952–953 unspecified WHERE clause and, 102 valid time relations and, 948 values and NULLS in, 65–66 versioning for incorporating time in relational databases, 953 Tuples variables aliases and, 101 looping with iterators, 98 range relations and, 175–176 TV-trees (telescoping vector trees), 967 1170 Index Two-phase commit (2PC) protocol recovery in multidatabase systems, 825–826 transaction management in distributed databases, 908 Two-phase locking basic locks, 784 binary locks, 778–780 conversion of locks, 782 overview of, 777–778 serializability guaranteed by, 782–784 shared/exclusive (read/write) locks, 780–782 variations on two-phase locking, 784–785 Two-tier client/server architecture, 46–47 Two-way joins, 689 Type (class) hierarchies constraints on extents corresponding to, 366–367 inheritance and, 369 in OO systems, 356 simple model for inheritance, 364–366 Type-compatible relations, 697 Type constructors atom constructor, 358 collection constructor, 359 defined, 369 ODB features included in SQL, 370 ODL and, 359–360 struct (tuple) constructor, 358–359 Type generator, 358–359 UDT (user-defined types) creating, 370–373 in SQL, 111 tables based on, 374 UML (Unified Modeling Language) class diagrams, 226–228 for database application design, 329 as design specification standard, 328 diagram types, 329–334 notation for ER diagrams, 224 object modeling with, 200 representing specialization/generalization in, 265–266 University student database example, 334–337 UMLS metathesaurus, 1010–1011 Unary relational operations CARTESIAN PRODUCT operation, 155–157 overview of, 146 PROJECT operation, 149–150 SELECT operation, 147–149 UNION, INTERSECTION, and MINUS operations, 152–155 Unbalanced trees, 646 Unconstrained write assumption, 769 UNDO/NO-REDO recovery immediate update techniques, 818–819 overview of, 807, 809 Undo operations, transactions, 753 UNDO phase, of ARIES recovery algorithm, 823 UNDO/REDO recovery immediate update techniques, 819 overview of, 807, 809 UNDO, write-ahead logging and, 810–811 Unidirectional associations, in UML class diagrams, 227 Unified Modeling Language See UML (Unified Modeling Language) UNION operation algorithms for, 697–698 in relational algebra, 152–155 SQL set operations, 104 Union types (categories) EER-to-Relational mapping, 297–299 modeling, 258–260 UNIQUE function, SQL, 122 Unique identity, in ODMS, 357 UNIQUE KEY clause, CREATE TABLE command, 96 Unique keys, in relational models, 70 Uniqueness constraints on entity attributes, 208–209 factors influencing physical database design, 729 integrity constraints in databases, 21 overview of, 68–70 specifying in SQL, 95–96 Universal quantifiers transforming, 180 in tuple relational calculus, 177–178 used in queries, 180–182 Universal relation assumption, 552 Universal relation schema, 552 Universal relations, 544 Universe of discourse (UoD), University student database example data records in, 6–9 EER schema applied to, 260–263 Unordered (heap files) records, 601–602 Unrepeatable read problem, 750 Unstructured data HTML and, 418–420 information retrieval dealing with, 993–994 Unsupervised learning clustering and, 1054 neural networks and, 1058 UoD (universe of discourse), Update anomalies, avoiding redundant information in tuples, 507 UPDATE command, SQL active rules and, 936 overview of, 109–110 Update operations bitemporal databases and, 950 database design and, 728 factors influencing physical database design, 729 operations on files, 599 query processing in distributed databases, 905–907 in relational data model, 78–79 types of relational data model operations, 75 Update transactions, 322 Usage projections, data warehousing and, 1080 Use case diagrams, UML, 329–331 User accounts, database security and, 839–840 User-defined subclasses, 252, 264 User-defined time, 947 User-defined types See UDT (userdefined types) User-friendly interfaces, 38 User interfaces GUIs (graphical user interfaces), 20, 39, 1061 multiple users, 20 User labels, combining with data labels, 869–870 Users classifying DBMSs by number of, 49 Index database actors on the scene, 15–16 measures of relevance in IR, 1015 multiuser transactions, 13–14 types of users in information retrieval, 995–996 Utilities, DBMS system, 42–43 Valid event data, 957 Valid state database states, 33 relational databases, 71 Valid time databases, 946 Valid time, in temporal databases, 946 Valid time relations, in temporal databases, 947–949 valid XML documents, 422–425 Validation in database application life cycle, 307–308 of queries, 679 Validation (optimistic) concurrency control, 777, 794–795 Validation phase, of optimistic concurrency control, 794 Value, hue, saturation, and, 967 Value references, in RDBs, 396 Value sets (domains), of attributes, 209–210 Values stored in records, 594 in tuples, 65–66 Values (literals) atomic formulas as, 973 atomic literals, 378 collection literals, 382 complex types for, 358–360 in OO systems, 358 structured literals, 378 Variable-length records, 595–597 Variables bind variables (parameterized statements), 858 communication variables in SQL, 454 domain, 183 instance, 356 iterator variables, in OQL, 399–400 limited, 980 PHP, 485–486 PHP server, 490–491 PHP variable names, 484–485 program, 599 in Prolog languages, 971 scope, 490 shared, 452 temporal, 948 tuple, 98, 101, 175–176 VDL (view definition language), 37 Vector space model, for information retrieval, 1003–1005 Vertical fragmentation, in distributed databases, 881, 895 Vertical partitioning, database tuning and, 735 Vertical propagation, of privileges, 847 Vertical search engines, 1018 Very large databases, 586 Victim selection algorithm, for deadlock prevention, 788 Video applications, 25 Video clips, in multimedia databases, 932, 965 Video segments, in multimedia databases, 966 Video sources, in multimedia databases, 966 View definition language (VDL), 37 View equivalence, of transaction schedules, 768–769 View integration approach, in conceptual schema design, 315 View materialization, 135 View serializability, of transaction schedules, 768–769 Views data warehouses compared with, 1079–1080 database designers creating, 15 granting/revoking privileges, 844 multiple views of data supported in databases, 12 specifying as named queries in OQL, 402–403 Views (virtual tables), SQL vs base tables, 134 CREATE VIEW command, 134–135 implementation and update, 135–137 inline views, 137 overview of, 89, 133–134 Virtual data, in views, 12 Virtual data warehouses, 1070 Virtual private databases (VPDs), 868–869 Virtual relations, specifying with CREATE VIEW command, 90 1171 Virtual tables See Views (virtual tables), SQL Visible/hidden attributes, of objects, 361 Vocabularies in inverted indexing, 1012 searching, 1013–1014 Volatile storage, 586 Voting method, distributed concurrency control based on, 912 VPDs (virtual private databases), 868–869 Wait-die transaction timestamp, 786 Wait-for graph, 787 WAL (write-ahead logging), 810–812 WANs (wide area networks), 879 Weak entity types, 219–220, 288–289 Web access control policies for, 854–855 hypertext documents and, 415 interchanging data on, 24 Web analysis, 1019, 1027 Web applications, architectures for, 47–49 Web-based user interfaces, 38 Web browsers, 38 Web clients, 38 Web content analysis agent-based approach to, 1024–1025 concept hierarchies in, 1024 database-based approach to, 1025 ontologies and, 1023–1024 overview of, 1022 segmenting Web pages and detecting noise, 1024 structured data extraction, 1022 types of Web analysis, 1019 Web information integration, 1022–1023 Web crawlers, 1028 Web databases, programming See PHP Web forms, collecting data from/inserting record into, 493–494 Web interface, for database applications, 449 Web Ontology Language (OWL), 969 Web pages analyzing link structure of, 1020–1021 1172 Index content analysis, 1024 ranking, 1000 Web query interface integration, 1023 Web search and analysis analyzing link structure of Web pages, 1020–1021 comparing with information retrieval, 1018–1019 HITS ranking algorithm, 1021–1022 overview of, 1018 PageRank algorithm, 1021 practical uses of Web analysis, 1027–1028 searching the Web, 1020 Web content analysis, 1022–1025 Web searches combining browsing and retrieval, 1000 Web usage analysis, 1025–1027 Web security, 1028 Web servers middle tier in three-tier architecture, 48 specialized servers in client/server architecture, 45 Web Services Description Language (WSDL), 436 Web spamming, 1028 Web structure analysis analyzing link structure of Web pages, 1020–1022 types of Web analysis, 1019 Web usage analysis pattern analysis phase of, 1027 pattern discovery phase of, 1026–1027 preprocessing phase of, 1025–1026 types of Web analysis, 1019 Well-formed XML, 422–425 WHERE clause DELETE command, 109 explicit sets of values in, 122 missing or unspecified, 102 in SQL retrieval queries, 129–130 UPDATE command, 109–110 Wide area networks (WANs), 879 Wildcard (*) types of queries in IR systems, 1008–1009 using with XPath, 433 WITH CHECK OPTION, view updates and, 137 WordNet thesaurus, 1011 Wound-wait transaction timestamp, 786 Wrappers, structured data extraction and, 1022 Write-ahead logging (WAL), 810–812 Write command, hard disks and, 591 Write phase, of optimistic concurrency control, 794 Write-set, of transactions, 747 Write timestamp, 789 Write-write conflicts, in transaction schedules, 757 write_item(X), 746 WSDL (Web Services Description Language), 436 XML access control, 853–854 XML declaration, 423 XML (eXtended Markup Language) data model, 51 interchanging data on Web using, 24 XML (Extensible Markup Language) bibliographic references, 443 converting graphs into trees, 441 hierarchical (tree) data model, 420–422 hierarchical XML views over flat or graph-based data, 436–440 languages, 432 languages related to, 436 overview of, 415–416 storing/extracting XML documents from databases, 431–432, 442 structured, semistructured, and unstructured data, 416–420 summary and exercises, 442–443 well-formed and valid documents, 422–425 XML schema language, 425–430 XPath, 432–434 XQuery, 434–435 XML schema language, 425–430 example schema file, 426–428 list of concepts in, 428–429 overview of, 425 XPath, 432–434 XQuery, 434–435 XSL (Extensible Stylesheet Language), 415, 436 XSLT (Extensible Stylesheet Language Transformations), 415, 436 ... the value of one of the attributes of a particular department—say, the manager of department 5—we must update the tuples of all employees who work in that department; otherwise, the database will... LOTS1A LOTS1B 1NF LOTS2 2NF LOTS2 3NF Price Tax_rate 527 528 Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases 15.4 .2 General Definition of Third Normal Form... Franklin T 20 .0 ProductX Bellaire Smith, John B 20 .0 ProductX Bellaire English, Joyce A Smith, John B English, Joyce A Pnumber Hours 32. 5 ProductX * 123 456789 32. 5 123 456789 7.5 * 123 456789 * 123 456789