Microsoft SQL Server 2008 Study Guide, Part 11 (PDF)


Nielsen c03.tex V4 - 07/21/2009 12:07pm Page 62 Part I Laying the Foundation

To use the standard organization chart as an example, each tuple in the employee entity represents one employee. Each employee reports to a supervisor who is also listed in the employee entity. The ReportsToID foreign key points to the supervisor’s primary key.

Because EmployeeID is a primary key and ReportsToID is a foreign key, the relationship cardinality is one-to-many, as shown in Figure 3-12. One manager may have several direct reports, but each employee may have only one manager.

FIGURE 3-12 The reflexive, or recursive, relationship is a one-to-many relationship between two tuples of the same entity. This shows the organization chart for members of the Adventure Works IT department.

    Contact (Primary Key: ContactID)   ReportsToID (Foreign Key)
    Ken Sánchez                        <NULL>
    Jean Trenary                       Ken Sánchez
    Stephanie Conroy                   Jean Trenary
    François Ajenstat                  Jean Trenary
    Dan Wilson                         Jean Trenary

A bill of materials is a more complex form of the recursive pattern because a part may be built from several source parts, and the part may be used to build several parts in the next step of the manufacturing process, as illustrated in Figure 3-13.

FIGURE 3-13 The conceptual diagram of a many-to-many recursive relationship shows multiple cardinality at each end of the relationship. (Diagram: the Part entity related to itself.)

An associative entity is required to resolve the many-to-many relationship between the component parts being used and the part being assembled. In the MaterialSpecification sample database, the BoM (bill of materials) associative entity has two foreign keys that both point to the Part entity, as shown in Figure 3-14. The first foreign key points to the part being built. The second foreign key points to the source parts.
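Both reflexive patterns can be sketched with Python’s built-in sqlite3 module standing in for SQL Server, so the DDL below is SQLite syntax rather than T-SQL. This is a minimal illustration under stated assumptions, not the MaterialSpecification schema itself: the Contact rows follow Figure 3-12, the BoM column names AssemblyID and ComponentID follow Figure 3-14, and the BoM rows mirror the chapter’s sample data (Part A built from Thing1 and a bolt, and used in Widget and SuperWidget). PartID and the helpers direct_reports, components_of, and used_in are hypothetical names.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (SQLite DDL, not T-SQL).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Reflexive one-to-many: ReportsToID refers back to ContactID in the same table.
conn.execute("""
    CREATE TABLE Contact (
        ContactID   INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        ReportsToID INTEGER REFERENCES Contact (ContactID)  -- NULL for the top manager
    )""")
conn.executemany(
    "INSERT INTO Contact (ContactID, Name, ReportsToID) VALUES (?, ?, ?)",
    [(1, "Ken Sánchez", None),
     (2, "Jean Trenary", 1),
     (3, "Stephanie Conroy", 2),
     (4, "François Ajenstat", 2),
     (5, "Dan Wilson", 2)])

def direct_reports(manager_name):
    """Names of the contacts whose ReportsToID points at the named manager."""
    rows = conn.execute("""
        SELECT e.Name
        FROM Contact AS e
        JOIN Contact AS m ON e.ReportsToID = m.ContactID
        WHERE m.Name = ?
        ORDER BY e.Name""", (manager_name,))
    return [name for (name,) in rows]

# Reflexive many-to-many: the BoM associative table holds two foreign keys to Part.
conn.execute("CREATE TABLE Part (PartID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE)")
conn.execute("""
    CREATE TABLE BoM (
        AssemblyID  INTEGER NOT NULL REFERENCES Part (PartID),  -- part being built
        ComponentID INTEGER NOT NULL REFERENCES Part (PartID),  -- source part
        PRIMARY KEY (AssemblyID, ComponentID)
    )""")
conn.executemany("INSERT INTO Part (Name) VALUES (?)",
                 [(n,) for n in ("Widget", "SuperWidget", "Part A", "Thing1", "Bolt")])

def part_id(name):
    """Look up the surrogate key of a part by name."""
    return conn.execute("SELECT PartID FROM Part WHERE Name = ?", (name,)).fetchone()[0]

for assembly, component in [("Part A", "Thing1"), ("Part A", "Bolt"),
                            ("Widget", "Part A"), ("SuperWidget", "Part A")]:
    conn.execute("INSERT INTO BoM (AssemblyID, ComponentID) VALUES (?, ?)",
                 (part_id(assembly), part_id(component)))

# One query shape serves both directions; only the roles of the two aliases swap.
BOM_QUERY = """
    SELECT {wanted}.Name
    FROM BoM
    JOIN Part AS a ON BoM.AssemblyID = a.PartID
    JOIN Part AS c ON BoM.ComponentID = c.PartID
    WHERE {given}.Name = ?
    ORDER BY {wanted}.Name"""

def components_of(assembly_name):
    """Parts consumed directly by the named assembly."""
    rows = conn.execute(BOM_QUERY.format(wanted="c", given="a"), (assembly_name,))
    return [name for (name,) in rows]

def used_in(component_name):
    """Assemblies that consume the named part directly."""
    rows = conn.execute(BOM_QUERY.format(wanted="a", given="c"), (component_name,))
    return [name for (name,) in rows]

print(direct_reports("Jean Trenary"))
print(components_of("Part A"))
print(used_in("Part A"))
```

Because both foreign keys in BoM point at Part, the same associative table answers “what is this part made of?” and “where is this part used?”; only the join direction changes.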
FIGURE 3-14 The physical implementation of the many-to-many reflexive relationship must include an associative entity to resolve the many-to-many relationship, just like the many-to-many two-entity relationship. (Diagram: a Part table listing Widget, Super Widget, Part A, Part B, Part C, Thing1, and Bolt, and a BoM table whose AssemblyID and ComponentID foreign keys both point to Part.)

In the sample data, Part A is constructed from two parts (a Thing1 and a bolt) and is used in the assembly of two parts (Widget and SuperWidget). The first foreign key points to the material being built. The second foreign key points to the source material.

Entity-Value Pairs Pattern

Every couple of months, I hear about data modelers working with the entity-value pairs pattern, also known as the entity-attribute-value (EAV) pattern, sometimes called the generic pattern or property bag/property table pattern, illustrated in Figure 3-15. In the SQL Server 2000 Bible, I called it the “dynamic/relational pattern.”

FIGURE 3-15 The entity-value pairs pattern is a simple design with only four tables: class/type, attribute/column, object/item, and value. The value table stores every value for every attribute for every item in one long list. (Diagram: Class/Category, Object/Item, Attribute/Property, and Value tables.)

This design can be popular when applications require dynamic attributes. Sometimes it’s used as an OO DBMS physical design within an RDBMS product. It’s also gaining popularity with cloud databases.

At first blush, the entity-value pairs pattern is attractive, novel, and appealing. It offers unlimited logical design alterations without any physical schema changes: the ultimate flexible, extensible design.

But there are problems. Many problems . . .
■ The entity-value pairs pattern lacks data integrity, specifically data typing. The data type is the most basic data constraint. The basic entity-value pairs pattern stores every value in a single nvarchar or sql_variant column and ignores data typing. One option, which I wouldn’t recommend, is to create a value table for each data type. While this adds data typing, it certainly complicates the code.
■ It’s difficult to query the entity-value pairs pattern. I’ve seen two solutions. The most common method is hard-coding .NET code to extract and normalize the data. Another option is to code-gen a table-valued UDF or crosstab view for each class/type to extract the data and return a normalized data set. This has the advantage of being usable in normal SQL queries, but performance and inserts/updates remain difficult. Either solution defeats the dynamic goal of the pattern.
■ Perhaps the greatest complaint against the entity-value pairs pattern is that it’s nearly impossible to enforce referential integrity.

Can the value-pairs pattern be an efficient, practical solution? I doubt it. I continue to hear of projects using this pattern that initially look promising and then fail under the weight of querying once it’s fully populated. Nulltheless, someday I’d like to build out a complete EAV code-gen tool and test it under a heavy load, just for the fun of it.

Database design layers

I’ve observed that every database can be visualized as three layers: the domain integrity (lookup) layer, the business visible layer, and the supporting layer, as drawn in Figure 3-16.

FIGURE 3-16 Visualizing the database as three layers can be useful when designing the conceptual diagram and coding the SQL DDL implementation.
• Domain Integrity: lookup tables
• Business Entities (Visible): objects the user can describe
• Supporting Entities: associative tables

While you are designing the conceptual diagram, visualizing the database as three layers can help organize the entities and clarify the design. When the database design moves into the SQL DDL implementation phase, the database design layers become critical in optimizing the primary keys for performance.

The center layer contains those entities that the client or subject-matter expert would readily recognize and understand. These are the main work tables that contain working data such as transaction, account, or contact information. When a user enters data on a daily basis, these are the tables hit by the insert and update. I refer to this layer as the visible layer or the business entity layer.

Above the business entity layer is the domain integrity layer. This top layer has the entities used for validating foreign key values. These tables may or may not be recognizable by the subject-matter expert or a typical end-user. The key point is that they are used only to maintain the list of what’s legal for a foreign key, and they are rarely updated once initially populated.

Below the visible layer live the tables that are a mystery to the end-user; associative tables used to materialize a many-to-many logical relationship are a perfect example of a supporting table. Like the visible layer, these tables are often heavily updated.

Normal Forms

Taking a detailed look at the normal forms moves this chapter into a more formal study of relational database design.

Contrary to popular opinion, the forms are not a progressive methodology, but they do represent a progressive level of compliance. Technically, you can’t be in 2NF until 1NF has been met. Don’t plan on designing an entity and moving it through first normal form to second normal form, and so on.
Each normal form is simply a different type of data integrity fault to be avoided.

First normal form (1NF)

The first normal form means the data is in an entity format, such that the following three conditions are met:

■ Every unit of data is represented within scalar attributes. A scalar value is a value “capable of being represented by a point on a scale,” according to Merriam-Webster. Every attribute must contain one unit of data, and each unit of data must fill one attribute. Designs that embed multiple pieces of information within an attribute violate the first normal form. Likewise, if multiple attributes must be combined in some way to determine a single unit of data, then the attribute design is incomplete.
■ All data must be represented in unique attributes. Each attribute must have a unique name and a unique purpose. An entity should have no repeating attributes. If the attributes repeat, or the entity is very wide, then the object is too broadly designed. A design that repeats attributes, such as an order entity that includes item1, item2, and item3 attributes to hold multiple line items, violates the first normal form.
■ All data must be represented within unique tuples. If the entity design requires or permits duplicate tuples, that design violates the first normal form. If the design requires multiple tuples to represent a single item, or multiple items are represented by a single tuple, then the table violates first normal form.

For an example of the first normal form in action, consider the listing of base camps and tours from the Cape Hatteras Adventures database.

Table 3-3 shows base camp data in a model that violates the first normal form. The repeating tour attribute is not unique.
TABLE 3-3 Violating the First Normal Form

    BaseCamp        Tour1                     Tour2                     Tour3
    Ashville        Appalachian Trail         Blue Ridge Parkway Hike
    Cape Hatteras   Outer Banks Lighthouses
    Freeport        Bahamas Dive
    Ft. Lauderdale  Amazon Trek
    West Virginia   Gauley River Rafting

To redesign the data model so that it complies with the first normal form, resolve the repeating group of tour attributes into a single unique attribute, as shown in Table 3-4, and then move any multiple values to a unique tuple. The BaseCamp entity contains a unique tuple for each base camp, and the Tour entity’s BaseCampID refers to the primary key in the BaseCamp entity.

TABLE 3-4 Conforming to the First Normal Form

Tour Entity
    BaseCampID (FK)  Tour
    1                Appalachian Trail
    1                Blue Ridge Parkway Hike
    2                Outer Banks Lighthouses
    3                Bahamas Dive
    4                Amazon Trek
    5                Gauley River Rafting

BaseCamp Entity
    BaseCampID (PK)  Name
    1                Ashville
    2                Cape Hatteras
    3                Freeport
    4                Ft. Lauderdale
    5                West Virginia

Another example of a data structure that desperately needs to adhere to the first normal form is a corporate product code that embeds the department, model, color, size, and so forth within the code. I’ve even seen product codes that were so complex they included digits to signify the syntax for the following digits.

In a theoretical sense, this type of design is wrong because the attribute isn’t a scalar value. In practical terms, it has the following problems:

■ Using a digit or two for each data element means that the database will soon run out of possible data values.
■ Databases don’t index based on the internal values of a string, so searches require scanning the entire table and parsing each value.
■ Business rules are difficult to code and enforce.

Entities with non-scalar attributes need to be completely redesigned so that each individual data attribute has its own attribute.
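The redesign from Table 3-3 to Table 3-4 amounts to turning each non-empty repeating column into its own tuple. Here is a minimal Python sketch, assuming the wide rows are held in memory as tuples; to_first_normal_form is a hypothetical helper, and BaseCampID values are simply assigned in row order, as in Table 3-4.

```python
# Table 3-3 as tuples: BaseCamp followed by the repeating Tour1..Tour3 columns
# (None marks an empty cell).
wide_rows = [
    ("Ashville", "Appalachian Trail", "Blue Ridge Parkway Hike", None),
    ("Cape Hatteras", "Outer Banks Lighthouses", None, None),
    ("Freeport", "Bahamas Dive", None, None),
    ("Ft. Lauderdale", "Amazon Trek", None, None),
    ("West Virginia", "Gauley River Rafting", None, None),
]

def to_first_normal_form(rows):
    """Resolve the repeating tour columns into two entities.

    Returns (base_camps, tours):
      base_camps: one (BaseCampID, Name) tuple per base camp
      tours:      one (BaseCampID, Tour) tuple per tour, BaseCampID acting as the FK
    BaseCampID values are assigned in row order, starting at 1.
    """
    base_camps, tours = [], []
    for base_camp_id, (name, *tour_columns) in enumerate(rows, start=1):
        base_camps.append((base_camp_id, name))
        for tour in tour_columns:
            if tour is not None:  # empty repeating cells produce no tuple
                tours.append((base_camp_id, tour))
    return base_camps, tours

base_camps, tours = to_first_normal_form(wide_rows)
for row in base_camps:
    print(row)
for row in tours:
    print(row)
```

With this shape, a base camp that adds a fourth tour needs one more Tour tuple rather than a new Tour4 attribute.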
Smart keys may be useful for humans, but they are best generated by combining data from the tables.

Second normal form (2NF)

The second normal form ensures that each attribute does in fact describe the entity. It’s a dependency issue: does the attribute depend on, or describe, the item identified by the primary key?

If the entity’s primary key is a single value, this isn’t too difficult. Composite primary keys can sometimes get into trouble with the second normal form if the attributes aren’t dependent on every attribute in the primary key. If an attribute depends on one of the primary key attributes but not the other, that is a partial dependency, which violates the second normal form.

An example of a data model that violates the second normal form is one in which the base camp phone number is added to the BaseCampTour entity, as shown in Table 3-5. Assume that the primary key (PK) is a composite of both the BaseCamp and the Tour, and that the phone number is a permanent phone number for the base camp, not a phone number assigned for each tour.

TABLE 3-5 Violating the Second Normal Form

    PK-BaseCamp     PK-Tour                   Base Camp PhoneNumber
    Ashville        Appalachian Trail         828-555-1212
    Ashville        Blue Ridge Parkway Hike   828-555-1212
    Cape Hatteras   Outer Banks Lighthouses   828-555-1213
    Freeport        Bahamas Dive              828-555-1214
    Ft. Lauderdale  Amazon Trek               828-555-1215
    West Virginia   Gauley River Rafting      828-555-1216

The problem with this design is that the phone number is an attribute of the base camp but not the tour, so the PhoneNumber attribute is only partially dependent on the entity’s primary key.

An obvious practical problem with this design is that updating the phone number requires either updating multiple tuples or risking having two phone numbers for the same base camp.
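The update anomaly can be reproduced in a few lines, again with Python’s sqlite3 module standing in for SQL Server (SQLite syntax, not T-SQL). This minimal sketch loads only the two Ashville rows of Table 3-5 into a composite-key BaseCampTour table, alongside a split BaseCamp and Tour design, then changes the phone number in each. NEW_PHONE and phones_for are hypothetical names.

```python
import sqlite3

# SQLite stands in for SQL Server here; the DDL is SQLite syntax, not T-SQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Violates 2NF: PhoneNumber depends on BaseCamp, only part of the composite key.
    CREATE TABLE BaseCampTour (
        BaseCamp    TEXT NOT NULL,
        Tour        TEXT NOT NULL,
        PhoneNumber TEXT NOT NULL,
        PRIMARY KEY (BaseCamp, Tour)
    );
    INSERT INTO BaseCampTour VALUES
        ('Ashville', 'Appalachian Trail',       '828-555-1212'),
        ('Ashville', 'Blue Ridge Parkway Hike', '828-555-1212');

    -- Conforms to 2NF: the phone number is stored once, with the base camp.
    CREATE TABLE BaseCamp (
        BaseCamp    TEXT PRIMARY KEY,
        PhoneNumber TEXT NOT NULL
    );
    CREATE TABLE Tour (
        BaseCamp TEXT NOT NULL REFERENCES BaseCamp (BaseCamp),
        Tour     TEXT NOT NULL,
        PRIMARY KEY (BaseCamp, Tour)
    );
    INSERT INTO BaseCamp VALUES ('Ashville', '828-555-1212');
    INSERT INTO Tour VALUES
        ('Ashville', 'Appalachian Trail'),
        ('Ashville', 'Blue Ridge Parkway Hike');
""")

NEW_PHONE = "828-555-9999"  # hypothetical replacement number

# Unnormalized design: an update that touches only one of the Ashville tuples.
conn.execute("""UPDATE BaseCampTour SET PhoneNumber = ?
                WHERE BaseCamp = 'Ashville' AND Tour = 'Appalachian Trail'""",
             (NEW_PHONE,))
# Normalized design: one tuple holds the number, so one update is complete.
conn.execute("UPDATE BaseCamp SET PhoneNumber = ? WHERE BaseCamp = 'Ashville'",
             (NEW_PHONE,))

def phones_for(sql):
    """Sorted distinct phone numbers returned by a query."""
    return sorted(phone for (phone,) in conn.execute(sql))

unnormalized = phones_for(
    "SELECT DISTINCT PhoneNumber FROM BaseCampTour WHERE BaseCamp = 'Ashville'")
normalized = phones_for("""
    SELECT DISTINCT b.PhoneNumber
    FROM Tour AS t JOIN BaseCamp AS b ON t.BaseCamp = b.BaseCamp
    WHERE t.BaseCamp = 'Ashville'""")
print(unnormalized)  # two different numbers for one base camp
print(normalized)    # a single number
```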
The solution is to remove the partially dependent attribute from the entity with the composite keys, and create an entity with a unique primary key for the base camp, as shown in Table 3-6. This new entity is then an appropriate location for the dependent attribute.

TABLE 3-6 Conforming to the Second Normal Form

Tour Entity
    PK-Base Camp    PK-Tour
    Ashville        Appalachian Trail
    Ashville        Blue Ridge Parkway Hike
    Cape Hatteras   Outer Banks Lighthouses
    Freeport        Bahamas Dive
    Ft. Lauderdale  Amazon Trek
    West Virginia   Gauley River Rafting

Base Camp Entity
    PK-Base Camp    PhoneNumber
    Ashville        828-555-1212
    Cape Hatteras   828-555-1213
    Freeport        828-555-1214
    Ft. Lauderdale  828-555-1215
    West Virginia   828-555-1216

The PhoneNumber attribute is now fully dependent on the entity’s primary key. Each phone number is stored in only one location, and no partial dependencies exist.

Third normal form (3NF)

The third normal form checks for transitive dependencies. A transitive dependency is similar to a partial dependency in that they both refer to attributes that are not fully dependent on a primary key. A dependency is transitive when attribute1 is dependent on attribute2, which is dependent on the primary key.

The second normal form is violated when an attribute depends on part of the key. The third normal form is violated when the attribute does depend on the key but also depends on another non-key attribute.

The key phrase when describing third normal form is that every attribute “must provide a fact about the key, the whole key, and nothing but the key.”

Just as with the second normal form, the third normal form is resolved by moving the non-dependent attribute to a new entity.

Continuing with the Cape Hatteras Adventures example, a guide is assigned as the lead guide responsible for each base camp.
The BaseCampGuide attribute belongs in the BaseCamp entity, but it is a violation of the third normal form if other information describing the guide is stored in the base camp, as shown in Table 3-7.

TABLE 3-7 Violating the Third Normal Form

Base Camp Entity
    BaseCampPK      BaseCampPhoneNumber  LeadGuide     DateofHire
    Ashville        1-828-555-1212       Jeff Davis    5/1/99
    Cape Hatteras   1-828-555-1213       Ken Frank     4/15/97
    Freeport        1-828-555-1214       Dab Smith     7/7/2001
    Ft. Lauderdale  1-828-555-1215       Sam Wilson    1/1/2002
    West Virginia   1-828-555-1216       Lauren Jones  6/1/2000

The DateofHire describes the guide, not the base, so the hire-date attribute is not directly dependent on the BaseCamp entity’s primary key. The DateofHire’s dependency is transitive (it describes the key and a non-key attribute) in that it goes through the LeadGuide attribute.

Creating a Guide entity and moving its attributes to the new entity resolves the violation of the third normal form and cleans up the logical design, as demonstrated in Table 3-8.

TABLE 3-8 Conforming to the Third Normal Form

Base Camp Entity
    BaseCampPK      LeadGuide
    Ashville, NC    Jeff Davis
    Cape Hatteras   Ken Frank
    Freeport        Dab Smith
    Ft. Lauderdale  Sam Wilson
    West Virginia   Lauren Jones

LeadGuide Entity
    LeadGuidePK     DateofHire
    Jeff Davis      5/1/99
    Ken Frank       4/15/97
    Dab Smith       7/7/2001
    Sam Wilson      1/1/2002
    Lauren Jones    6/1/2000

Best Practice

If the entity has a good primary key and every attribute is scalar and fully dependent on the primary key, then the logical design is in the third normal form. Most database designs stop at the third normal form.

The additional forms prevent problems with more complex logical designs. If you tend to work with mind-bending modeling problems and develop creative solutions, then understanding the advanced forms will prove useful.
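The decomposition from Table 3-7 to Table 3-8 can be checked with a short Python sketch, assuming the table is held in memory as tuples. decompose_3nf and rejoin are hypothetical helpers: the first moves DateofHire into a guide entity keyed by LeadGuide, and the second joins the pieces back together to confirm that the split loses nothing. Unlike Table 3-8, the sketch keeps the phone number with the base camp, since it depends directly on BaseCampPK. Sample rows can only contradict a dependency, never prove one; whether LeadGuide really determines DateofHire is a business rule.

```python
# Table 3-7 as tuples: (BaseCampPK, BaseCampPhoneNumber, LeadGuide, DateofHire).
table_3_7 = [
    ("Ashville",       "1-828-555-1212", "Jeff Davis",   "5/1/99"),
    ("Cape Hatteras",  "1-828-555-1213", "Ken Frank",    "4/15/97"),
    ("Freeport",       "1-828-555-1214", "Dab Smith",    "7/7/2001"),
    ("Ft. Lauderdale", "1-828-555-1215", "Sam Wilson",   "1/1/2002"),
    ("West Virginia",  "1-828-555-1216", "Lauren Jones", "6/1/2000"),
]

def decompose_3nf(rows):
    """Move the transitively dependent DateofHire into a guide entity.

    Returns (base_camps, guides):
      base_camps: (BaseCampPK, PhoneNumber, LeadGuide) tuples; LeadGuide is now an FK
      guides:     {LeadGuidePK: DateofHire}, so each hire date is stored once
    Raises ValueError if one guide appears with two hire dates, because then
    the data contradicts the assumed dependency LeadGuide -> DateofHire.
    """
    base_camps, guides = [], {}
    for camp, phone, guide, hired in rows:
        base_camps.append((camp, phone, guide))
        if guides.setdefault(guide, hired) != hired:
            raise ValueError(f"conflicting DateofHire values for {guide}")
    return base_camps, guides

def rejoin(base_camps, guides):
    """Join the two entities back together on LeadGuide."""
    return [(camp, phone, guide, guides[guide]) for camp, phone, guide in base_camps]

base_camps, guides = decompose_3nf(table_3_7)
print(guides)
print(rejoin(base_camps, guides) == table_3_7)  # True when the split is lossless
```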
The Boyce-Codd normal form (BCNF)

The Boyce-Codd normal form occurs between the third and fourth normal forms, and it handles a problem with an entity that has multiple candidate keys. One of the candidate keys is chosen as the primary key and the others become alternate keys. For example, a person might be uniquely identified by his or her social security number (SSN), employee number, and driver’s license number. If the SSN is the primary key, then the employee number and driver’s license number are the alternate keys.

The Boyce-Codd normal form simply stipulates that in such a case every attribute must describe every candidate key. If an attribute describes one of the candidate keys but not another candidate key, then the entity violates BCNF.

Fourth normal form (4NF)

The fourth normal form deals with problems created by complex composite primary keys. If two independent attributes are brought together to form a primary key along with a third attribute, but the two attributes don’t really uniquely identify the entity without the third attribute, then the design violates the fourth normal form.

For example, assume the following conditions:

1. The BaseCamp and the base camp’s LeadGuide were used as a composite primary key.
2. An Event and the Guide were brought together as a primary key.
3. Because both used a guide, all three were combined into a single entity.

The preceding example violates the fourth normal form.

The fourth normal form is used to help identify entities that should be split into separate entities. Usually this is only an issue if large composite primary keys have brought too many disparate objects into a single entity.

Fifth normal form (5NF)

The fifth normal form provides the method for designing complex relationships that involve multiple (three or more) entities.
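A relationship among three entities is resolved physically by an entity carrying one foreign key per related entity. The minimal sketch below assumes a manufacturing operation that links an operator, a machine, and a bill of materials, and uses Python’s sqlite3 module as a stand-in for SQL Server (SQLite syntax, not T-SQL). The table names Operator, Machine, BillOfMaterials, and Operation, their columns, the sample rows, and the try_insert helper are all hypothetical, and making the three foreign keys the composite primary key is one design choice among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Operator        (OperatorID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Machine         (MachineID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE BillOfMaterials (BoMID      INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- Resolution entity: one foreign key per entity in the ternary relationship.
    CREATE TABLE Operation (
        OperatorID INTEGER NOT NULL REFERENCES Operator (OperatorID),
        MachineID  INTEGER NOT NULL REFERENCES Machine (MachineID),
        BoMID      INTEGER NOT NULL REFERENCES BillOfMaterials (BoMID),
        PRIMARY KEY (OperatorID, MachineID, BoMID)
    );

    INSERT INTO Operator        VALUES (1, 'Operator 1'), (2, 'Operator 2');
    INSERT INTO Machine         VALUES (1, 'Machine 1'),  (2, 'Machine 2');
    INSERT INTO BillOfMaterials VALUES (1, 'Widget'),     (2, 'SuperWidget');
    INSERT INTO Operation       VALUES (1, 1, 1), (1, 2, 2), (2, 1, 2);
""")

def try_insert(operator_id, machine_id, bom_id):
    """Insert one Operation tuple; return False if a key or FK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Operation VALUES (?, ?, ?)",
                         (operator_id, machine_id, bom_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(2, 99, 1))  # no Machine 99: rejected by the foreign key
print(try_insert(2, 2, 1))   # valid new combination: accepted
```

Each foreign key rejects a tuple that refers to a missing operator, machine, or bill of materials, and the three-column primary key rejects a repeated combination.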
A three-way or ternary relationship, if properly designed, is in the fifth normal form. The cardinality of any of the relationships could be one or many. What makes it a ternary relationship is the number of related entities.

As an example of a ternary relationship, consider a manufacturing process that involves an operator, a machine, and a bill of materials. From one point of view, this could be an operation entity with three foreign keys. Alternatively, it could be thought of as a ternary relationship with additional attributes.

Just like a two-entity many-to-many relationship, a ternary relationship requires a resolution entity in the physical schema design to resolve the many-to-many relationship into multiple artificial one-to-many relationships; but in this case the resolution entity has three or more foreign keys.

In such a complex relationship, the fifth normal form requires that each entity, if separated from the ternary relationship, remains a proper entity without any loss of data.

It’s commonly stated that third normal form is enough. Boyce-Codd, fourth, and fifth normal forms may be complex, but violating them can cause severe problems. It’s not a matter of more entities vs. fewer entities; it’s a matter of properly aligned attributes and keys.

As I mentioned earlier in this chapter, Louis Davidson (aka Dr. SQL) and I co-present a session at conferences on database design. I recommend his book Pro SQL Server 2008 Relational Database Design and Implementation (Apress, 2008).

Summary

Relational database design, covered in Chapter 2, showed why the database physical schema is critical to the database’s performance. This chapter looked at the theory behind the logical correctness of the database design and the many patterns used to assemble a database schema.

■ There are three phases in database design: the conceptual (diagramming) phase, the SQL DDL (create table) phase, and the physical layer (partition and file location) phase.
Databases designed with only the conceptual phase perform poorly.

Posted: 04/07/2014, 09:20
