Step 5. Definition of the Minimum Set of Normalized Tables

The minimum set of normalized tables has now been computed. We define them below in terms of the table name, the attributes in the table, the FDs in the table, and the candidate keys for that table:

    R1: ABC (AB->C with key AB)
    R2: AEF (A->EF with key A)
    R3: EG (E->G with key E)
    R4: DGI (G->DI with key G)
    R5: DFJ (F->DJ with key F)
    R6: DKLMNP (D->KLMNP, L->D, with keys D, L)
    R7: PQRT (PQR->T with key PQR)
    R8: PRS (PR->S with key PR)

Note that this result is not only 3NF, but also BCNF, which is very frequently the case. This fact suggests a practical algorithm for a (near) minimum set of BCNF tables: Use Bernstein’s algorithm to attain a minimum set of 3NF tables, then inspect each table for further decomposition (or partial replication, as shown in Section 6.1.5) to BCNF.

6.5 Fourth and Fifth Normal Forms

Normal forms up to BCNF were defined solely on FDs, and, for most database practitioners, either 3NF or BCNF is a sufficient level of normalization. However, there are in fact two more normal forms that are needed to eliminate the rest of the currently known anomalies. In this section, we will look at different types of constraints on tables: multivalued dependencies and join dependencies. If these constraints do not exist in a table, which is the most common situation, then any table in BCNF is automatically in fourth normal form (4NF), and fifth normal form (5NF) as well. However, when these constraints do exist, there may be further update (especially delete) anomalies that need to be corrected. First, we must define the concept of multivalued dependency.

6.5.1 Multivalued Dependencies

Definition. In a multivalued dependency (MVD), X->>Y holds on table R with table scheme RS if, whenever a valid instance of table R(X,Y,Z) contains a pair of rows that contain duplicate values of X, then the instance also contains the pair of rows obtained by interchanging the Y values in the original pair.

Teorey.book Page 127 Saturday, July 16, 2005 12:57 PM

This includes situations where only pairs of rows exist. Note that X and Y may contain either single or composite attributes. An MVD X->>Y is trivial if Y is a subset of X, or if X union Y = RS. Finally, an FD implies an MVD, so a single row with a given value of X also satisfies the MVD, albeit trivially.

The following examples show where an MVD does and does not exist in a table. In R1, the first four rows satisfy all conditions for the MVDs X->>Y and X->>Z. Note that MVDs appear in pairs because of the cross-product type of relationship between Y and Z = RS-X-Y as the two right sides of the two MVDs. The fifth and sixth rows of R1 (when the X value is 2) satisfy the row interchange conditions in the above definition. In both rows, the Y value is 2, so the interchanging of Y values is trivial. The seventh row (3,3,3) satisfies the definition trivially.

In table R2, however, the Y values in the fifth and sixth rows are different (1 and 2), and interchanging the 1 and 2 values for Y results in a row (2,2,2) that does not appear in the table. Thus, in R2 there is no MVD between X and Y or between X and Z, even though the first four rows satisfy the MVD definition. Note that for the MVD to exist, all rows must satisfy the criterion for an MVD.
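The row-interchange test in the definition can be sketched in Python. This is only an illustrative check of one table instance, not a way to discover MVDs. The function name holds_mvd and the encoding of columns as positions (X = 0, Y = 1, Z = 2) are assumptions made for this example; the rows are those of the example tables R1 and R2.

```python
from itertools import combinations


def holds_mvd(rows, x, y):
    """Check the MVD X ->> Y on one table instance.

    rows: iterable of tuples; x, y: lists of column positions.
    For every pair of rows that agree on X, the two rows obtained by
    interchanging their Y values must also be present in the table.
    """
    table = set(rows)
    width = len(next(iter(table)))
    for r, s in combinations(table, 2):
        if all(r[i] == s[i] for i in x):
            # Interchange the Y values of the pair.
            r_swapped = tuple(s[i] if i in y else r[i] for i in range(width))
            s_swapped = tuple(r[i] if i in y else s[i] for i in range(width))
            if r_swapped not in table or s_swapped not in table:
                return False
    return True


# Column positions: X = 0, Y = 1, Z = 2.
R1 = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (2, 2, 1), (2, 2, 2), (3, 3, 3)]
R2 = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (2, 2, 1), (2, 1, 2)]

print(holds_mvd(R1, x=[0], y=[1]))  # X ->> Y on R1: True
print(holds_mvd(R1, x=[0], y=[2]))  # X ->> Z on R1: True
print(holds_mvd(R2, x=[0], y=[1]))  # X ->> Y on R2: False, (2,2,2) is missing
```

A result of True only says that the current rows are consistent with the MVD; it does not establish that the MVD is a constraint of the database.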
In table R3, the first three rows do not satisfy the criterion for an MVD, since changing Y from 1 to 2 in the second row results in a row that does not appear in the table. Similarly, changing Z from 1 to 2 in the third row results in a nonappearing row. Thus, R3 does not have any MVDs between X and Y or between X and Z.

    R1:  X Y Z      R2:  X Y Z      R3:  X Y Z
         1 1 1           1 1 1           1 1 1
         1 1 2           1 1 2           1 1 2
         1 2 1           1 2 1           1 2 1
         1 2 2           1 2 2           2 2 1
         2 2 1           2 2 1           2 2 2
         2 2 2           2 1 2
         3 3 3

By the same argument, in table R1 we have the MVDs Y->>X and Y->>Z, but none with Z on the left side. Tables R2 and R3 have no MVDs at all.

The following inference rules for MVDs are somewhat analogous to the inference rules for functional dependencies given in Section 6.4 [Beeri, Fagin, and Howard, 1977]. They are quite useful in the analysis and decomposition of tables into 4NF.

Multivalued Dependency Inference Rules

    Reflexivity          X ->> X
    Augmentation         If X ->> Y, then XZ ->> Y.
    Transitivity         If X ->> Y and Y ->> Z, then X ->> (Z-Y).
    Pseudotransitivity   If X ->> Y and YW ->> Z, then XW ->> (Z-YW).
                         (Transitivity is a special case of pseudotransitivity when W is null.)
    Union                If X ->> Y and X ->> Z, then X ->> YZ.
    Decomposition        If X ->> Y and X ->> Z, then X ->> Y intersect Z and X ->> (Z-Y).
    Complement           If X ->> Y and Z = R-X-Y, then X ->> Z.
    FD Implies MVD       If X -> Y, then X ->> Y.
    FD, MVD Mix          If X ->> Z and Y -> Z’ (where Z’ is contained in Z, and Y and Z are disjoint), then X -> Z’.

6.5.2 Fourth Normal Form

The goal of 4NF is to eliminate nontrivial MVDs from a table by projecting them onto separate smaller tables, and thus to eliminate the update anomalies associated with the MVDs. This type of normal form is reasonably easy to attain if you know where the MVDs are. In general, MVDs must be defined from the semantics of the database; they cannot be determined from just looking at the data. The current set of data can only verify whether your assumption about an MVD is currently true or not, but this may change each time the data is updated.

Definition. A table R is in fourth normal form (4NF) if and only if it is in BCNF and, whenever there exists an MVD in R (say X->>Y), at least one of the following holds: the MVD is trivial, or X is a superkey for R.

Applying this definition to the three tables in the example in the previous section, we see that R1 is not in 4NF because at least one nontrivial MVD exists and no single column is a superkey. In tables R2 and R3, however, there are no MVDs. Thus, these two tables are at least 4NF.

As an example of the transformation of a table that is not in 4NF to two tables that are in 4NF, we observe the ternary relationship skill-required, shown in Figure 6.6. The relationship skill-required is defined as follows: “An employee must have all the required skills needed for a project to work on that project.” For example, in Table 6.5 the project with proj_no = 3 requires skill types A and B of all its employees (see employees 101 and 102). The table skill_required has no FDs, but it does have several nontrivial MVDs, and is therefore only in BCNF. In such a case it can have a lossless decomposition into two many-to-many binary relationships between the entities Employee and Project, and Project and Skill. Each of these two new relationships represents a table in 4NF. It can also have a lossless decomposition resulting in a binary many-to-many relationship between the entities Employee and Skill, and Project and Skill.

A two-way lossless decomposition occurs when skill_required is projected over (emp_id, proj_no) to form skill_req1 and projected over (proj_no, skill_type) to form skill_req3.
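As a rough check of the 4NF definition against skill_required, the following Python sketch tests each single-attribute MVD on the rows of Table 6.5 and reports those whose left side is not a superkey. The helper names mvd_holds and is_superkey are hypothetical, and the test uses the cross-product form of the MVD condition (for each X value, the rows must be exactly the product of their Y values and their remaining values). As noted above, the data can only confirm that an assumed MVD holds in the current instance.

```python
from collections import defaultdict

ATTRS = ("emp_id", "proj_no", "skill_type")

# Rows of skill_required as listed in Table 6.5.
skill_required = [
    (101, 3, "A"), (101, 3, "B"), (101, 4, "A"), (101, 4, "C"),
    (102, 3, "A"), (102, 3, "B"), (103, 5, "D"),
]


def mvd_holds(rows, lhs, rhs):
    """True if lhs ->> rhs holds in this instance (lhs and rhs disjoint)."""
    x = [ATTRS.index(a) for a in lhs]
    y = [ATTRS.index(a) for a in rhs]
    z = [i for i in range(len(ATTRS)) if i not in x and i not in y]
    groups = defaultdict(set)
    for row in set(rows):
        groups[tuple(row[i] for i in x)].add(row)
    for group in groups.values():
        y_vals = {tuple(r[i] for i in y) for r in group}
        z_vals = {tuple(r[i] for i in z) for r in group}
        # The group is a full cross product exactly when the sizes match.
        if len(group) != len(y_vals) * len(z_vals):
            return False
    return True


def is_superkey(rows, lhs):
    """True if no two distinct rows share the same lhs values."""
    x = [ATTRS.index(a) for a in lhs]
    keys = [tuple(r[i] for i in x) for r in set(rows)]
    return len(keys) == len(set(keys))


# With three attributes, an MVD between two single attributes is nontrivial.
violations = [
    (lhs, rhs)
    for lhs in ATTRS
    for rhs in ATTRS
    if lhs != rhs
    and mvd_holds(skill_required, [lhs], [rhs])
    and not is_superkey(skill_required, [lhs])
]
print(violations)
# [('proj_no', 'emp_id'), ('proj_no', 'skill_type')]
```

Both reported MVDs have proj_no on the left side, and proj_no is not a superkey, which is consistent with skill_required being in BCNF but not in 4NF.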
Projection over (emp_id, proj_no) to form skill_req1 and over (emp_id, skill_type) to form skill_req2, however, is not lossless. A three-way lossless decomposition occurs when skill_required is projected over (emp_id, proj_no), (emp_id, skill_type), and (proj_no, skill_type).

Figure 6.6 Ternary relationship with multiple interpretations [figure: entities Employee, Skill, and Project, each with connectivity N; interpretations (1) skill-required, (2) skill-in-common, (3) skill-used]

Tables in 4NF avoid certain update anomalies (or inefficiencies). For instance, a delete anomaly exists when two independent facts get tied together unnaturally so that there may be bad side effects of certain deletes. For example, in skill_required, the last row of a skill_type may be lost if an employee is temporarily not working on any projects. An update inefficiency may occur when adding a new project in skill_required, which requires insertions for many rows to include all the required skills for that new project. Likewise, loss of a project requires many deletions. These inefficiencies are avoided when

Table 6.5 The Table skill_required and Its Three Projections

    skill_required
    emp_id  proj_no  skill_type    MVDs (nontrivial)
    101     3        A             proj_no ->> skill_type
    101     3        B             proj_no ->> emp_id
    101     4        A
    101     4        C
    102     3        A
    102     3        B
    103     5        D

    skill_req1          skill_req2            skill_req3
    emp_id  proj_no     emp_id  skill_type    proj_no  skill_type
    101     3           101     A             3        A
    101     4           101     B             3        B
    102     3           101     C             4        A
    103     5           102     A             4        C
                        102     B             5        D
                        103     D
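The lossless and lossy projections described above can be checked directly on the rows of Table 6.5. The sketch below, in Python, builds the three projections and rejoins them; the helper name project and the variable names for the joins are assumptions made for this illustration. A decomposition is lossless for this instance when the natural join of the projections returns exactly the original rows.

```python
# Rows of skill_required as (emp_id, proj_no, skill_type), from Table 6.5.
skill_required = {
    (101, 3, "A"), (101, 3, "B"), (101, 4, "A"), (101, 4, "C"),
    (102, 3, "A"), (102, 3, "B"), (103, 5, "D"),
}


def project(rows, cols):
    """Project a set of rows onto the given column positions."""
    return {tuple(r[c] for c in cols) for r in rows}


skill_req1 = project(skill_required, (0, 1))  # (emp_id, proj_no)
skill_req2 = project(skill_required, (0, 2))  # (emp_id, skill_type)
skill_req3 = project(skill_required, (1, 2))  # (proj_no, skill_type)

# Natural join of skill_req1 and skill_req3 on proj_no.
join_1_3 = {(e, p, s) for (e, p) in skill_req1 for (p2, s) in skill_req3 if p == p2}

# Natural join of skill_req1 and skill_req2 on emp_id.
join_1_2 = {(e, p, s) for (e, p) in skill_req1 for (e2, s) in skill_req2 if e == e2}

# Three-way join: additionally require (proj_no, skill_type) in skill_req3.
join_1_2_3 = {row for row in join_1_2 if (row[1], row[2]) in skill_req3}

print(join_1_3 == skill_required)         # True: lossless
print(join_1_2 == skill_required)         # False: not lossless
print(sorted(join_1_2 - skill_required))  # [(101, 3, 'C'), (101, 4, 'B')]
print(join_1_2_3 == skill_required)       # True: three-way join is lossless
```

The two spurious rows produced by joining skill_req1 with skill_req2 pair employee 101 with skills that the respective projects do not require, which is why that decomposition loses information.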