Database Modeling & Design Fourth Edition- P25 doc

107 6 Normalization his chapter focuses on the fundamentals of normal forms for relational databases and the database design step that normalizes the candidate tables [step II(d) of the database design life cycle]. It also inves- tigates the equivalence between the conceptual data model (e.g., the ER model) and normal forms for tables. As we go through the examples in this chapter, it should become obvious that good, thoughtful design of a conceptual model will result in databases that are either already normalized or can be easily normalized with minor changes. This truth illus- trates the beauty of the conceptual modeling approach to database design, in that the experienced relational database designer will develop a natural gravitation toward a normalized model from the beginning. For most database practitioners, Sections 6.1 through 6.4 cover the critical normalization needed for everyday use, through Boyce-Codd normal form (BCNF). Section 6.5 covers the higher normal forms of mostly theoretical interest; however, we do show the equivalency between higher normal forms and ternary relationships in the ER model and UML for the interested reader. 6.1 Fundamentals of Normalization Relational database tables, whether they are derived from ER or UML models, sometimes suffer from some rather serious problems in terms of T Teorey.book Page 107 Saturday, July 16, 2005 12:57 PM 108 CHAPTER 6 Normalization performance, integrity and maintainability. For example, when the entire database is defined as a single large table, it can result in a large amount of redundant data and lengthy searches for just a small number of target rows. It can also result in long and expensive updates, and dele- tions in particular can result in the elimination of useful data as an unwanted side effect. Such a situation is shown in Figure 6.1, where products, salespersons, customers, and orders are all stored in a single table called Sales. In this table, we see that certain product and customer information is stored redundantly, wasting storage space. Certain queries, such as “Which customers ordered vacuum cleaners last month?” would require a search of the entire table. Also, updates such as changing the address of the customer Dave Bachmann would require changing many rows. Finally, deleting an order by a valued customer such as Qiang Zhu (who bought an expensive computer), if that is his only outstanding order, deletes the only copy of his address and credit rating as a side effect. Such information may be difficult (or sometimes impossible) to recover. These problems also occur for situations in which the database has already been set up as a collection of many tables, but some of the tables are still too large. If we had a method of breaking up such a large table into smaller tables so that these types of problems would be eliminated, the database would be much more efficient and reliable. Classes of relational database schemes or table definitions, called normal forms, are commonly used to accomplish this goal. The creation of a normal form database table is called normalization. Normalization is accomplished by analyzing the Figure 6.1 Single table database DVD player Plymouth product_name Sales order_no cust_name cust_addr credit date 1-3-03 4-15-05 9-12-04 12-5-04 5-10-05 8-3-02 10-1-04 4-19-99 1-4-04 3-15-04 sales_name Carl Bloch Ted Hanss Dick Phillips Fred Remley R. Metz Paul Basile Carl Bloch Carl Bloch Dick Phillips Fred Remley 6 10 8 3 7 8 7 8 6 6 Austin Ann Arbor Detroit Chicago Mumbai Chicago Detroit Austin East Lansing 1458 2730 2460 519 1986 1817 1865 1885 1943 2315 Dave Bachmann Qiang Zhu Mike Stolarchuck Peter Honeyman Charles Antonelli C.V. Ravishankar Charles Antonelli Betsy Karmeisool Dave Bachmann Sakti Pramanik vacuum cleaner computer refrigerator radio CD player vacuum cleaner vacuum cleaner refrigerator television Teorey.book Page 108 Saturday, July 16, 2005 12:57 PM 6.1 Fundamentals of Normalization 109 interdependencies among individual attributes associated with those tables and taking projections (subsets of columns) of larger tables to form smaller ones. Let us first review the basic normal forms, which have been well established in the relational database literature and in practice. 6.1.1 First Normal Form Definition. A table is in first normal form (1NF) if and only if all columns contain only atomic values, that is, each column can have only one value for each row in the table. Relational database tables, such as the Sales table illustrated in Figure 6.1, have only atomic values for each row and for each column. Such tables are considered to be in first normal form, the most basic level of normalized tables. To better understand the definition for 1NF, it helps to know the dif- ference between a domain, an attribute, and a column. A domain is the set of all possible values for a particular type of attribute, but may be used for more than one attribute. For example, the domain of people’s names is the underlying set of all possible names that could be used for either customer-name or salesperson-name in the database table in Fig- ure 6.1. Each column in a relational table represents a single attribute, but in some cases more than one column may refer to different attributes from the same domain. When this occurs, the table is still in 1NF because the values in the table are still atomic. In fact, standard SQL assumes only atomic values and a relational table is by default in 1NF. A nice explanation of this is given in Muller [1999]. 6.1.2 Superkeys, Candidate Keys, and Primary Keys A table in 1NF often suffers from data duplication, update performance, and update integrity problems, as noted above. To understand these issues better, however, we must define the concept of a key in the con- text of normalized tables. A superkey is a set of one or more attributes, which, when taken collectively, allows us to identify uniquely an entity or table. Any subset of the attributes of a superkey that is also a superkey, and not reducible to another superkey, is called a candidate key. A primary key is selected arbitrarily from the set of candidate keys to be used in an index for that table. Teorey.book Page 109 Saturday, July 16, 2005 12:57 PM 110 CHAPTER 6 Normalization As an example, in Figure 6.2 a composite of all the attributes of the table forms a superkey because duplicate rows are not allowed in the relational model. Thus, a trivial superkey is formed from the composite of all attributes in a table. Assuming that each department address (dept_addr) in this table is single valued, we can conclude that the composite of all attributes except dept_addr is also a superkey. Looking at smaller and smaller composites of attributes and making realistic assumptions about which attributes are single valued, we find that the composite (report_no, author_id) uniquely determines all the other attributes in the table and is therefore a superkey. However, neither report_no nor author_id alone can determine a row uniquely, and the composite of these two attributes cannot be reduced and still be a superkey. Thus, the composite (report_no, author_id) becomes a candidate key. Since it is the only candidate key in this table, it also becomes the primary key. A table can have more than one candidate key. If, for example, in Figure 6.2, we had an additional column for author_ssn, and the composite of report_no and author_ssn uniquely determine all the other attributes of the table, then both (report_no, author_id) and (report_no, author_ssn) would be candidate keys. The primary key would then be an arbitrary choice between these two candidate keys. Other examples of multiple candidate keys can be seen in Figure 5.5 (see Chapter 5). In Figure 5.5a the table uses_notebook has three candidate keys: (emp_id, project_name), (emp_id, notebook_no), and (project_name, notebook_no); and in Figure 5.5b the table assigned_to has two candidate keys: (emp_id, loc_name) and (emp_id, project_name). Figures 5.5c and 5.5d each have only a single candidate key. Figure 6.2 Report table 5789 design report_no Report editor dept_no dept_name author_iddept_addr author_name mantei bolton koenig fry umar koenig author_addr cs-tor mathrev mathrev folkstone prise mathrev 53 44 71 26 38 71 design design analysis analysis analysis argus1 argus1 argus1 argus2 argus2 argus2 woolf woolf woolf koenig koenig koenig 15 15 15 27 27 27 4216 4216 4216 5789 5789 Teorey.book Page 110 Saturday, July 16, 2005 12:57 PM 6.1 Fundamentals of Normalization 111 6.1.3 Second Normal Form To explain the concept of second normal form (2NF) and higher, we introduce the concept of functional dependence, which was briefly described in Chapter 2. The property of one or more attributes that uniquely determine the value of one or more other attributes is called functional dependence. Given a table (R), a set of attributes (B) is function- ally dependent on another set of attributes (A) if, at each instant of time, each A value is associated with only one B value. Such a functional dependence is denoted by A -> B. In the preceding example from Figure 6.2, let us assume we are given the following functional dependencies for the table report: report: report_no -> editor, dept_no dept_no -> dept_name, dept_addr author_id -> author_name, author_addr Definition. A table is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key. An attribute is fully dependent on the primary key if it is on the right side of an FD for which the left side is either the primary key itself or something that can be derived from the primary key using the transitivity of FDs. An example of a transitive FD in report is the following: report_no -> dept_no dept_no -> dept_name Therefore we can derive the FD (report_no -> dept_name), since dept_name is transitively dependent on report_no. Continuing our example, the composite key in Figure 6.2, (report_no, author_id), is the only candidate key and is therefore the primary key. However, there exists one FD (dept_no -> dept_name, dept_addr) that has no component of the primary key on the left side, and two FDs (report_no -> editor, dept_no and author_id -> author_name, author_addr) that contain one component of the primary key on the left side, but not both components. As such, report does not satisfy the condition for 2NF for any of the FDs. Teorey.book Page 111 Saturday, July 16, 2005 12:57 PM . conceptual modeling approach to database design, in that the experienced relational database designer will develop a natural gravitation toward a normalized model from the beginning. For most database. fundamentals of normal forms for relational databases and the database design step that normalizes the candidate tables [step II(d) of the database design life cycle]. It also inves- tigates the. A -> B. In the preceding example from Figure 6.2, let us assume we are given the following functional dependencies for the table report: report: report_no -> editor, dept_no dept_no ->

Định dạng
Số trang	5
Dung lượng	184,72 KB