132 CHAPTER 6 Normalization

skill_required is decomposed into skill_req1 and skill_req3. In general (but not always), decomposing a table into 4NF tables reduces data redundancy.

6.5.3 Decomposing Tables to 4NF

Algorithms to decompose tables into 4NF are difficult to develop. Let’s look at some straightforward approaches to 4NF from BCNF and lower normal forms. First, if a table is in BCNF, it either has no nontrivial FDs, or the left side of each nontrivial FD is a superkey. Thus, if the only MVDs in this table are derived from its FDs, their left sides are all superkeys, and the table is in 4NF by definition. If, however, there are other nontrivial MVDs whose left sides are not superkeys, the table is only in BCNF and must be decomposed to achieve higher normalization.

The basic decomposition process for a BCNF table is to select the most important MVD (or, if that is not possible, to select one arbitrarily), define its complement MVD, and decompose the table into two tables: one containing the attributes on the left and right sides of that MVD, and one containing the attributes on the left and right sides of its complement. This decomposition is lossless because both new tables share the same set of attributes, namely the left side of both MVDs. In the new tables, these MVDs are now trivial because each one covers every attribute of its table. However, other MVDs may still be present, and further decompositions by MVDs and their complements may be necessary. This process of arbitrarily selecting MVDs for decomposition continues until only trivial MVDs remain, leaving the final tables in 4NF.

As an example, let R(A,B,C,D,E,F) have no FDs and the MVDs A ->> B and CD ->> EF. The first decomposition of R is into two tables, R1(A,B) and R2(A,C,D,E,F), by applying the MVD A ->> B and its complement A ->> CDEF. Table R1 is now in 4NF, because A ->> B is trivial and is the only MVD in the table. Table R2, however, is still only in BCNF, because of the nontrivial MVD CD ->> EF.
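The MVD/MVD-complement procedure just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a general 4NF algorithm: attribute sets are modeled as frozensets of single-letter names, the table is assumed to have no FDs (so its only superkey is the full attribute set and every nontrivial MVD violates 4NF), and only the MVDs passed in are checked. An MVD X ->> Y is carried into a sub-table T as X ->> (Y ∩ T) − X whenever X lies inside T; MVDs that could be inferred from the given ones are not derived. The function name decompose_4nf is hypothetical.

```python
from typing import FrozenSet, List, Tuple

# An MVD X ->> Y, stored as the pair (X, Y).
MVD = Tuple[FrozenSet[str], FrozenSet[str]]


def decompose_4nf(table: FrozenSet[str], mvds: List[MVD]) -> List[FrozenSet[str]]:
    """Hypothetical sketch: split by an MVD and its complement until only trivial MVDs remain.

    Assumes the table has no FDs, so any nontrivial MVD violates 4NF.
    Only the MVDs given are checked; implied MVDs are not inferred.
    """
    pending = [table]
    finished: List[FrozenSet[str]] = []
    while pending:
        t = pending.pop()
        for lhs, rhs in mvds:
            if not lhs <= t:
                continue  # left side not inside this table: the MVD does not apply
            right = (rhs & t) - lhs  # X ->> Y restricted to the attributes of t
            if right and (lhs | right) != t:
                # Nontrivial: split into X u Y' and its complement X u (t - X - Y').
                # Both pieces share X, so the split is lossless.
                pending.append(lhs | right)
                pending.append(t - right)
                break
        else:
            finished.append(t)  # no listed MVD is nontrivial in this table
    return finished


R = frozenset("ABCDEF")
mvds = [(frozenset("A"), frozenset("B")), (frozenset("CD"), frozenset("EF"))]
print(sorted("".join(sorted(t)) for t in decompose_4nf(R, mvds)))
# -> ['AB', 'ACD', 'CDEF']
```

For the example R(A,B,C,D,E,F), the sketch returns the tables (A,B), (A,C,D), and (C,D,E,F); listing the two MVDs in the opposite order returns the same three tables, because here the MVDs share no attributes.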
We then decompose R2 into R21(C,D,E,F) and R22(C,D,A) by applying the MVD CD ->> EF and its complement CD ->> A. Both R21 and R22 are now in 4NF. If we had applied the MVD/complement rule in the opposite order, using CD ->> EF and its complement CD ->> AB first, this method would have produced the same three 4NF tables. However, this does not happen in all cases; it happens only in tables whose MVDs have no attributes in common.

In general, this method has the unfortunate side effect of potentially losing some or all of the FDs and MVDs. Therefore, any decision to transform tables from BCNF to 4NF must weigh further normalization, with its elimination of delete anomalies, against the preservation of FDs and possibly MVDs. It should also be noted that this approach derives a feasible, but not necessarily a minimum, set of 4NF tables.

Teorey.book Page 132 Saturday, July 16, 2005 12:57 PM
6.5 Fourth and Fifth Normal Forms 133

A second approach to decomposing BCNF tables is to ignore the MVDs completely and split each BCNF table into a set of smaller tables, with the candidate key of each BCNF table serving as the candidate key of each new table and the nonkey attributes distributed among the new tables in some semantically meaningful way. This form of decomposition by candidate key (that is, superkey) is lossless because the new tables join uniquely on the candidate key; it usually results in the simplest form of 5NF tables, those with a candidate key, one nonkey attribute, and no MVDs. However, if one or more MVDs still exist, further decomposition must be done with the MVD/MVD-complement approach given above. Decomposition by candidate keys preserves FDs, but the MVD/MVD-complement approach preserves neither FDs nor MVDs.

Tables that are not yet in BCNF can also be decomposed directly into 4NF using the MVD/MVD-complement approach.
Such tables can often be decomposed into smaller minimum sets than those derived by transforming into BCNF first and then into 4NF, but at a greater cost in lost FDs. In most database design situations, it is preferable to develop BCNF tables first, then evaluate the need to normalize further while preserving the FDs.

6.5.4 Fifth Normal Form

Definition. A table R is in fifth normal form (5NF), or project-join normal form (PJ/NF), if and only if every join dependency in R is implied by the keys of R.

As we recall, a lossless decomposition of a table means that the table can be decomposed into two or more projections whose natural join (in any order) reproduces the original table, without any spurious or missing rows. This general lossless decomposition constraint, involving any number of projections, is also known as a join dependency (JD). A join dependency is defined as follows: given n subsets of the attributes of a table R whose union is the full set of attributes of R, R satisfies a join dependency over these n subsets if and only if R is equal to the natural join of its projections on them. A JD is trivial if one of the subsets is the full set of attributes of R itself.

5NF, or PJ/NF, requires satisfaction of the membership algorithm [Fagin, 1979], which determines whether a JD is a member of the set of logical consequences of (that is, can be derived from) the set of key dependencies known for this table. In effect, for any 5NF table, every dependency (FD, MVD, JD) is determined by the keys. As a practical matter, we note that because JDs are very difficult to determine in large databases with many attributes, 5NF tables are not easily derivable, and logical database design typically produces BCNF tables.

We should also note that, by the preceding definitions, the fact that a table is decomposable does not necessarily mean that it is not in 5NF.
For example, consider a simple table with four attributes (A,B,C,D), one FD (A -> BCD), and no MVDs or JDs not implied by this FD. It could be decomposed into three tables, A -> B, A -> C, and A -> D, all based on the same superkey A; however, it is already in 5NF without the decomposition. Thus, the decomposition is not required for normalization. On the other hand, decomposition can be a useful tool for performance improvement in some instances.

The following example demonstrates that a table representing a ternary relationship may not have any two-way lossless decompositions; however, it may have a three-way lossless decomposition, which is equivalent to three binary relationships, based on the three possible projections of this table. This situation occurs in the relationship skill-in-common (Figure 6.6), which is defined as “The employee must apply the intersection of his or her available skills with the skills needed to work on certain projects.” In this example, skill-in-common is less restrictive than skill-required because it allows an employee to work on a project even if he or she does not have all the skills required for that project.

As Table 6.6 shows, the three projections of skill_in_common result in a three-way lossless decomposition. There are no two-way lossless decompositions and no nontrivial MVDs; thus, the table skill_in_common is in 4NF. It is not in 5NF, however: its only key is the combination of all three attributes, so the nontrivial three-way JD is not implied by the key.

Table 6.6 The Table skill_in_common and Its Three Projections

skill_in_common
emp_id  proj_no  skill_type
101     3        A
101     3        B
101     4        A
101     4        B
102     3        A
102     3        B
103     3        A
103     4        A
103     5        A
103     5        C

skill_in_com1      skill_in_com2         skill_in_com3
emp_id  proj_no    emp_id  skill_type    proj_no  skill_type
101     3          101     A             3        A
101     4          101     B             3        B
102     3          102     A             4        A
103     3          102     B             4        B
103     4          103     A             5        A
103     5          103     C             5        C

The ternary relationship in Figure 6.6 can be interpreted yet another way.
The meaning of the relationship skill-used is “We can selectively record different skills that each employee applies to working on individual projects.” It is equivalent to a table in 5NF that cannot be decomposed into either two or three binary tables. Note by studying Table 6.7 that the associated table, skill_used, has no nontrivial MVDs or JDs: even the three-way natural join of its projections produces a spurious row.

Table 6.7 The Table skill_used, Its Three Projections, and Natural Joins of Its Projections

skill_used
emp_id  proj_no  skill_type
101     3        A
101     3        B
101     4        A
101     4        C
102     3        A
102     3        B
102     4        A
102     4        B

Three projections on skill_used result in:

skill_used1        skill_used2           skill_used3
emp_id  proj_no    proj_no  skill_type   emp_id  skill_type
101     3          3        A            101     A
101     4          3        B            101     B
102     3          4        A            101     C
102     4          4        B            102     A
                   4        C            102     B

join skill_used1 with skill_used2 to form:

skill_used_12
emp_id  proj_no  skill_type
101     3        A
101     3        B
101     4        A
101     4        B
101     4        C
102     3        A
102     3        B
102     4        A
102     4        B
102     4        C

join skill_used_12 with skill_used3 to form:

skill_used_123
emp_id  proj_no  skill_type
101     3        A
101     3        B
101     4        A
101     4        B        (spurious)
101     4        C
102     3        A
102     3        B
102     4        A
102     4        B

A table may have constraints that are FDs, MVDs, and JDs. An MVD is a special case of a JD. To determine the level of normalization of a table, analyze the FDs first to determine normalization through BCNF; then analyze the MVDs to determine which BCNF tables are also in 4NF; then, finally, analyze the JDs to determine which 4NF tables are also in 5NF.
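The claims about Tables 6.6 and 6.7 can be checked mechanically by projecting a table, naturally joining the projections, and comparing the result with the original rows. The following Python sketch does this by brute force. Representing each row as a frozenset of (attribute, value) pairs, and the helper names make_rows, project, natural_join, join_of_projections, and satisfies_jd, are illustrative assumptions rather than anything defined in the text; the nested-loop join is adequate for tables this small but is not intended for large data.

```python
from typing import FrozenSet, Iterable, List, Sequence, Set, Tuple

Row = FrozenSet[Tuple[str, str]]  # one row as a set of (attribute, value) pairs
ATTRS = ("emp_id", "proj_no", "skill_type")


def make_rows(data: Iterable[Sequence[str]]) -> Set[Row]:
    return {frozenset(zip(ATTRS, values)) for values in data}


def project(rows: Set[Row], attrs: Sequence[str]) -> Set[Row]:
    keep = set(attrs)
    # Duplicate rows collapse automatically because the result is a set.
    return {frozenset(p for p in row if p[0] in keep) for row in rows}


def natural_join(left: Set[Row], right: Set[Row]) -> Set[Row]:
    out: Set[Row] = set()
    for lrow in left:
        ld = dict(lrow)
        for rrow in right:
            rd = dict(rrow)
            # Rows combine only if they agree on every attribute they share.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(lrow | rrow)
    return out


def join_of_projections(rows: Set[Row], subsets: List[Sequence[str]]) -> Set[Row]:
    result = project(rows, subsets[0])
    for attrs in subsets[1:]:
        result = natural_join(result, project(rows, attrs))
    return result


def satisfies_jd(rows: Set[Row], subsets: List[Sequence[str]]) -> bool:
    """True if the table equals the natural join of its projections on `subsets`."""
    return join_of_projections(rows, subsets) == rows


skill_in_common = make_rows([
    ("101", "3", "A"), ("101", "3", "B"), ("101", "4", "A"), ("101", "4", "B"),
    ("102", "3", "A"), ("102", "3", "B"),
    ("103", "3", "A"), ("103", "4", "A"), ("103", "5", "A"), ("103", "5", "C"),
])
skill_used = make_rows([
    ("101", "3", "A"), ("101", "3", "B"), ("101", "4", "A"), ("101", "4", "C"),
    ("102", "3", "A"), ("102", "3", "B"), ("102", "4", "A"), ("102", "4", "B"),
])
THREE_WAY = [("emp_id", "proj_no"), ("proj_no", "skill_type"), ("emp_id", "skill_type")]
TWO_WAY = [("emp_id", "proj_no"), ("emp_id", "skill_type")]

print(satisfies_jd(skill_in_common, THREE_WAY))  # True: lossless three-way decomposition
print(satisfies_jd(skill_in_common, TWO_WAY))    # False: adds spurious rows such as 103 3 C
spurious = join_of_projections(skill_used, THREE_WAY) - skill_used
print(sorted(tuple(dict(r)[a] for a in ATTRS) for r in spurious))  # [('101', '4', 'B')]
```

The first two results match the text: skill_in_common has a three-way lossless decomposition but not a two-way one. For skill_used, the three-way join adds exactly the row 101 4 B that Table 6.7 marks as spurious.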