1.2 The Database Life Cycle 7 number of data dependencies that need to be analyzed. This is accomplished by inserting conceptual data modeling and inte- gration steps [steps II(a) and II(b) of Figure 1.2] into the tradi- Figure 1.2 (continued) Step III Physical design Step II(c) Transformation of the conceptual model to SQL tables Step II(d) Normalization of SQL tables Customer Product prod-no prod-name qty-in-stock cust-no sales-name sales-name addr addr dept dept job-level job-level job-level vacation-days vacation-days Order-product order-no prod-no Order order-no sales-name cust-n o cust-name . . . Salesperson Decomposition of tables and removal of update anomalies Indexing Clustering Partitioning Materialized views Denormalization Salesperson Sales-vacations create table customer (cust_no integer, cust_name char(15), cust_addr char(30), sales_name char(15), prod_no integer, primary key (cust_no), foreign key (sales_name) references salesperson foreign key (prod_no) references product); Teorey.book Page 7 Saturday, July 16, 2005 12:57 PM 8CHAPTER 1Introduction tional relational design approach. The objective of these steps is an accurate representation of reality. Data integrity is preserved through normalization of the candidate tables created when the conceptual data model is transformed into a relational model. The purpose of physical design is to optimize performance as closely as possible. As part of the physical design, the global schema can some- times be refined in limited ways to reflect processing (query and transaction) requirements if there are obvious, large gains to be made in efficiency. This is called denormalization. It consists of selecting dominant processes on the basis of high frequency, high volume, or explicit priority; defining simple extensions to tables that will improve query performance; evaluating total cost for query, update, and storage; and considering the side effects, such as possible loss of integrity. This is particularly important for Online Analytical Processing (OLAP) applications. IV. Database implementation, monitoring, and modifica- tion. Once the design is completed, the database can be created through implementation of the formal schema using the data definition language (DDL) of a DBMS. Then the data manipula- tion language (DML) can be used to query and update the data- base, as well as to set up indexes and establish constraints, such as referential integrity. The language SQL contains both DDL and DML constructs; for example, the create table command rep- resents DDL, and the select command represents DML. As the database begins operation, monitoring indicates whether performance requirements are being met. If they are not being satisfied, modifications should be made to improve performance. Other modifications may be necessary when requirements change or when the end users’ expectations increase with good performance. Thus, the life cycle continues with monitoring, redesign, and modifications. In the next two chapters we look first at the basic data modeling concepts and then—starting in Chapter 4—we apply these concepts to the database design process. 1.3 Conceptual Data Modeling Conceptual data modeling is the driving component of logical database design. Let us take a look at how this component came about, and why Teorey.book Page 8 Saturday, July 16, 2005 12:57 PM 1.3 Conceptual Data Modeling 9 it is important. Schema diagrams were formalized in the 1960s by Charles Bachman. He used rectangles to denote record types and directed arrows from one record type to another to denote a one-to- many relationship among instances of records of the two types. The entity-relationship (ER) approach for conceptual data modeling, one of the two approaches emphasized in this book and described in detail in Chapter 2, was first presented in 1976 by Peter Chen. The Chen form of the ER model uses rectangles to specify entities, which are somewhat analogous to records. It also uses diamond-shaped objects to represent the various types of relationships, which are differentiated by numbers or letters placed on the lines connecting the diamonds to the rectangles. The Unified Modeling Language (UML) was introduced in 1997 by Grady Booch and James Rumbaugh and has become a standard graphi- cal language for specifying and documenting large-scale software sys- tems. The data modeling component of UML (now UML-2) has a great deal of similarity with the ER model and will be presented in detail in Chapter 3. We will use both the ER model and UML to illustrate the data modeling and logical database design examples throughout this book. In conceptual data modeling, the overriding emphasis is on simplic- ity and readability. The goal of conceptual schema design, where the ER and UML approaches are most useful, is to capture real-world data requirements in a simple and meaningful way that is understandable by both the database designer and the end user. The end user is the person responsible for accessing the database and executing queries and updates through the use of DBMS software, and therefore has a vested interest in the database design process. The ER model has two levels of definition—one that is quite simple and another that is considerably more complex. The simple level is the one used by most current design tools. It is quite helpful to the database designer who must communicate with end users about their data require- ments. At this level you simply describe, in diagram form, the entities, attributes, and relationships that occur in the system to be conceptual- ized, using semantics that are definable in a data dictionary. Specialized constructs, such as “weak” entities or mandatory/optional existence notation, are also usually included in the simple form. But very little else is included, to avoid cluttering up the ER diagram while the designer’s and end user’s understandings of the model are being reconciled. An example of a simple form of ER model using the Chen notation is shown in Figure 1.3. In this example, we want to keep track of video- tapes and customers in a video store. Videos and customers are repre- sented as entities Video and Customer, and the relationship “rents” Teorey.book Page 9 Saturday, July 16, 2005 12:57 PM 10 CHAPTER 1 Introduction shows a many-to-many association between them. Both Video and Cus- tomer entities have a few attributes that describe their characteristics, and the relationship “rents” has an attribute due date that represents the date that a particular video rented by a specific customer must be returned. From the database practitioner’s standpoint, the simple form of the ER model (or UML) is the preferred form for both data modeling and end user verification. It is easy to learn and applicable to a wide variety of design problems that might be encountered in industry and small busi- nesses. As we will demonstrate, the simple form can be easily translated into SQL data definitions, and thus it has an immediate use as an aid for database implementation. The complex level of ER model definition includes concepts that go well beyond the simple model. It includes concepts from the semantic models of artificial intelligence and from competing conceptual data models. Data modeling at this level helps the database designer capture more semantics without having to resort to narrative explanations. It is also useful to the database application programmer, because certain integrity constraints defined in the ER model relate directly to code— code that checks range limits on data values and null values, for exam- ple. However, such detail in very large data model diagrams actually detracts from end user understanding. Therefore, the simple level is recommended as the basic communication tool for database design verification. Figure 1.3 A simple form of ER model using the Chen notation due-datecust-id cust-name NN Customer Video video-id copy-no title rents Teorey.book Page 10 Saturday, July 16, 2005 12:57 PM 1.4 Summary 11 1.4 Summary Knowledge of data modeling and database design techniques is impor- tant for database practitioners and application developers. The database life cycle shows the steps needed in a methodical approach to designing a database,, from logical design, which is independent of the system environment, to physical design, which is based on the details of the database management system chosen to implement the database. Among the variety of data modeling approaches, the ER and UML data models are arguably the most popular ones in use today, due to their simplicity and readability. A simple form of these models is used in most design tools; it is easy to learn and to apply to a variety of industrial and business applications. It is also a very useful tool for communicating with the end user about the conceptual model and for verifying the assumptions made in the modeling process. A more complex form, a superset of the simple form, is useful for the more experienced designer who wants to capture greater semantic detail in diagram form, while avoiding having to write long and tedious narrative to explain certain requirements and constraints. 1.5 Literature Summary Much of the early data modeling work was done by Bachman [1969, 1972], Chen [1976], Senko et al. [1973], and others. Database design textbooks that adhere to a significant portion of the relational database life cycle described in this chapter are Teorey and Fry [1982], Muller [1999], Stephens and Plew [2000], Simsion and Witt [2001], and Hernan- dez and Getz [2003]. Temporal (time-varying) databases are defined and discussed in Jensen and Snodgrass [1996] and Snodgrass [2000]. Other well used approaches for conceptual data modeling include IDEF1X [Bruce, 1992; IDEF1X, 2005] and the data modeling component of the Zachmann Framework [Zachmann, 1987; Zachmann Institute for Frame- work Advancement, 2005]. Schema evolution during development, a frequently occurring problem, is addressed in Harriman, Hodgetts, and Leo [2004]. Teorey.book Page 11 Saturday, July 16, 2005 12:57 PM . data modeling and database design techniques is impor- tant for database practitioners and application developers. The database life cycle shows the steps needed in a methodical approach to designing a. data modeling and logical database design examples throughout this book. In conceptual data modeling, the overriding emphasis is on simplic- ity and readability. The goal of conceptual schema design, . designing a database, , from logical design, which is independent of the system environment, to physical design, which is based on the details of the database management system chosen to implement the database. Among