Nielsen c03.tex V4 - 07/21/2009 12:07pm Page 52 Part I Laying the Foundation

For some entities, there might be multiple possible primary keys to choose from: employee number, driver’s license number, national ID number (SSN). In this case, all the potential primary keys are known as candidate keys. Candidate keys that are not selected as the primary key are then known as alternate keys. It’s important to document all the candidate keys because later, at the SQL DDL layer, they will need unique constraints.

At the conceptual diagramming phase, a primary key might be obvious (an employee number, an automobile VIN, a state or region name), but often there is no clearly recognizable uniquely identifying value for each item in reality. That’s OK, as that problem can be solved later at the SQL DDL layer.

Foreign keys

When two entities (tables) relate to one another, one entity is typically the primary entity and the other entity is the secondary entity. The connection between the two entities is made by replicating the primary key from the primary entity in the secondary entity. The duplicated attributes in the secondary entity are known as a foreign key. Informally this type of relationship is sometimes called a parent-child relationship. Enforcing the foreign key is referred to as referential integrity.

The classic example of a primary key and foreign key relationship is the order and order details relationship. Each order (primary entity) can have multiple order detail rows (secondary entity). The order’s primary key is duplicated in the order detail entity, providing the link between the two entities, as shown in Figure 3-3.

You’ll see several examples of primary keys and foreign keys in the “Data Design Patterns” section later in this chapter.

Cardinality

The cardinality of the relationship describes the number of tuples (rows) on each side of the relationship.
Either side of the relationship may be restricted to allow zero, one, or multiple tuples. The type of key enforces the restriction on multiple tuples. Primary keys are by definition unique and enforce the single-tuple restriction, whereas foreign keys permit multiple tuples.

There are several possible cardinality combinations, as shown in Table 3-2. Within this section, each of the cardinality possibilities is examined in detail.

FIGURE 3-3 A one-to-many relationship consists of a primary entity and a secondary entity. The secondary entity’s foreign key points to the primary entity’s primary key. In this case, the Sales.SalesOrderDetail’s SalesOrderID is the foreign key that relates to Sales.SalesOrderHeader’s primary key.

TABLE 3-2 Common Relationship Cardinalities

Relationship Type | First Entity’s Key | Second Entity’s Key
One-to-one | Primary entity–primary key–single tuple | Primary entity–primary key–single tuple
One-to-many | Primary entity–primary key–single tuple | Secondary entity–foreign key–multiple tuples
Many-to-many | Multiple tuples | Multiple tuples

Optionality

The second property of the relationship is its optionality. The difference between an optional relationship and a mandatory relationship is critical to the data integrity of the database.

Some relationships are mandatory, or strong. These secondary tuples (rows) require that the foreign key point to a primary key. The secondary tuple would be incomplete or meaningless without the primary entity. For the following examples, it’s critical that the relationship be enforced:

■ An order-line item without an order is meaningless.
■ An order without a customer is invalid.
■ In the Cape Hatteras Adventures database, an event without an associated tour tuple is a useless event tuple.
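As a rough illustration of a mandatory (strong) relationship, the following sketch uses Python’s built-in sqlite3 module rather than SQL Server, so the DDL is SQLite syntax and only approximates what the T-SQL would look like. The table and column names (SalesOrder, SalesOrderDetail, Quantity) and the try_insert helper are assumptions made for this example. The NOT NULL foreign key is what makes the relationship mandatory: every order-detail row must point to an existing order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE SalesOrder (
    SalesOrderID INTEGER PRIMARY KEY            -- primary entity's primary key
);
CREATE TABLE SalesOrderDetail (
    SalesOrderDetailID INTEGER PRIMARY KEY,
    -- Foreign key: a copy of the order's primary key. NOT NULL makes it mandatory.
    SalesOrderID INTEGER NOT NULL REFERENCES SalesOrder (SalesOrderID),
    Quantity INTEGER NOT NULL
);
INSERT INTO SalesOrder (SalesOrderID) VALUES (1);
""")


def try_insert(order_id, quantity):
    """Hypothetical helper: attempt to add a detail row and report the outcome."""
    try:
        conn.execute(
            "INSERT INTO SalesOrderDetail (SalesOrderID, Quantity) VALUES (?, ?)",
            (order_id, quantity),
        )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


valid_result = try_insert(1, 2)       # points to an existing order
orphan_result = try_insert(99, 1)     # no order 99: referential integrity fails
missing_result = try_insert(None, 1)  # no order at all: NOT NULL fails

for label, result in [("valid", valid_result),
                      ("orphan", orphan_result),
                      ("missing", missing_result)]:
    print(label, "->", result)
```

Only the first row is stored; the other two are refused by the database itself, which is the point of declaring the relationship as mandatory instead of relying on application code.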
Conversely, some relationships are optional, or weak. The secondary tuple can stand alone without the primary tuple. The object in reality that is represented by the secondary tuple would exist with or without the primary tuple. For example:

■ A customer is valid with or without a discount code.
■ In the OBXKites sample database, an order may or may not have a priority code. Whether the order points to a valid tuple in the order priority entity or not, it’s still a valid order.

Some database developers prefer to avoid optional relationships and so they design all relationships as mandatory and point tuples that wouldn’t need a foreign key value to a surrogate tuple in the primary table. For example, rather than allow nulls in the discount attribute for customers without discounts, a “no discount” tuple is inserted into the discount entity and every customer without a discount points to that tuple.

There are two reasons to avoid surrogate null tuples (pointing to a “no discount” tuple): The design adds work when work isn’t required (additional inserts and foreign key checks), and it’s easier to locate a tuple without the relationship by selecting where the column is null. The null value is a standard and useful design element. Ignoring the benefits of nullability only creates additional work for both the developer and the database.

From a purist’s point of view, a benefit of using the surrogate null tuple is that the “no discount” is explicit and a null value can then actually mean unknown or missing, rather than “no discount.”

Some rare situations call for a complex optionality based on a condition. Depending on a rule, the relationship may or may not need to be enforced. For example:

■ If an organization sometimes sells ad hoc items that are not in the item entity, then the relationship may, depending on the item, be considered optional. The orderdetail entity can use two attributes for the item.
If the ItemID attribute is used, then it must point to a valid item entity primary key.
■ However, if the NonStandardItemDescription attribute is used instead, the ItemID attribute is left null.
■ A check constraint ensures that for each row, either the ItemID or the NonStandardItemDescription is null.

How the optionality is implemented is up to the SQL DDL layer. The only purpose of the conceptual design layer is to model the organization’s objects, their relationships, and their business rules.

Data schema diagrams for the sample databases are in Appendix B. The code to create the sample database may be downloaded from www.sqlserverbible.com.

Data-Model Diagramming

Data modelers use several methods to graphically work out their data models. The Chen ER diagramming method is popular, and Visio Professional includes it and five others. The method I prefer, Information Engineering - E/R Diagramming, is rather simple and works well on a whiteboard, as shown in Figure 3-4. The cardinality of the relationship is indicated by a single line or by three lines (crow’s feet). If the relationship is optional, a circle is placed near the foreign key.

FIGURE 3-4 A simple method for diagramming logical schemas
[Diagram: a Primary Table connected to a Secondary Table.]

Another benefit of this simple diagramming method is that it doesn’t require an advanced version of Visio. Visio is OK as a starting point, but it doesn’t give you a nice life cycle like a dedicated modeling tool. There are several more powerful tools, but it’s really a personal preference.

Data Design Patterns

Design is all about building something new by combining existing concepts or items using patterns. The same is true for database design. The building blocks are tables, rows, and columns, and the patterns are one-to-many, many-to-many, and others. This section explains these patterns.
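Returning to the conditional optionality rule from the Optionality discussion (an order detail row identifies its item either by ItemID or by NonStandardItemDescription), the sketch below shows one way such a rule might look once it reaches the DDL layer. It runs on Python’s built-in sqlite3 module, not SQL Server, and the check constraint assumes the rule means that exactly one of the two columns is filled in; the Item and OrderDetail table layouts, the sample item, and the try_insert helper are likewise assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Item (
    ItemID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE OrderDetail (
    OrderDetailID              INTEGER PRIMARY KEY,
    ItemID                     INTEGER REFERENCES Item (ItemID),  -- nullable: optional
    NonStandardItemDescription TEXT,
    -- Assumed reading of the rule: exactly one of the two columns is used per row.
    CHECK (
        (ItemID IS NOT NULL AND NonStandardItemDescription IS NULL)
        OR (ItemID IS NULL AND NonStandardItemDescription IS NOT NULL)
    )
);
INSERT INTO Item (ItemID, Name) VALUES (1, 'Kite');
""")


def try_insert(item_id, description):
    """Hypothetical helper: return True if the row is accepted, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO OrderDetail (ItemID, NonStandardItemDescription) "
            "VALUES (?, ?)",
            (item_id, description),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "standard item": try_insert(1, None),              # valid foreign key
    "ad hoc item": try_insert(None, "Custom banner"),  # description instead of key
    "both filled": try_insert(1, "Custom banner"),     # violates the check
    "neither filled": try_insert(None, None),          # violates the check
    "unknown item": try_insert(99, None),              # violates the foreign key
}
for label, accepted in results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

The foreign key and the check constraint work together here: the foreign key validates ItemID whenever it is present, and the check decides when it may be absent.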
Once the entities (nouns and verbs) are organized, the next step is to determine the relationships among the objects. Each relationship connects two entities using their primary and foreign keys.

Clients or business analysts should be able to describe the common relationships between the objects using terms such as includes, has, or contains. For example, a customer may place (has) many orders. An order may include (contains) many items. An item may be on many orders. Based on these relationship descriptions, the best data design pattern may be chosen.

One-to-many pattern

By far the most common relationship is a one-to-many relationship; this is the classic parent-child relationship. Several tuples (rows) in the secondary entity relate to a single tuple in the primary entity. The relationship is between the primary entity’s primary key and the secondary entity’s foreign key, as illustrated in the following examples:

■ In the Cape Hatteras Adventures database, each base camp may have several tours that originate from it. Each tour may originate from only one base camp, so the relationship is modeled as one base camp relating to multiple tours. The relationship is made between the BaseCamp’s primary key and the Tour entity’s BaseCampID foreign key, as diagrammed in Figure 3-5. Each Tour’s foreign key attribute contains a copy of its BaseCamp’s primary key.

FIGURE 3-5 The one-to-many relationship relates zero to many tuples (rows) in the secondary entity to a single tuple in the primary entity.
[Diagram: the Base Camp entity (primary key: Base Camp) related to the Tour entity (foreign key: Base Camp).]

■ Each customer may place multiple orders.
While each order has its own unique OrderID primary key, the Order entity also has a foreign key attribute that contains the CustomerID of the customer who placed the order. The Order entity may have several tuples with the same CustomerID, which defines the relationship as one-to-many.
■ A non-profit organization has an annual pledge drive. As each donor makes an annual pledge, the pledges go into a secondary entity that can store any number of years’ worth of pledges, one tuple per year.

One-to-one pattern

At the conceptual diagram layer, one-to-one relationships are quite rare. Typically, one-to-one relationships are used in the SQL DDL or the physical layer to partition the data for some performance or security reason.

One-to-one relationships connect two entities with primary keys at both entities. Because a primary key must be unique, each side of the relationship is restricted to one tuple.

For example, an Employee entity can store general information about the employee. However, more sensitive classified information is stored in a separate entity as shown in Figure 3-6. While security can be applied on a per-attribute basis, or a view can project selected attributes, many organizations choose to model sensitive information as two one-to-one entities.

FIGURE 3-6 This one-to-one relationship partitions employee data, segmenting classified information into a separate entity.
[Diagram: the Employee entity (primary key: EmployeeID) related to the Employee_Classified entity (primary key: EmployeeID).]

Many-to-many pattern

In a many-to-many relationship, both sides may relate to multiple tuples (rows) on the other side of the relationship. The many-to-many relationship is common in reality, as shown in the following examples:

■ The classic example is members and groups.
A member may belong to multiple groups, and a group may have multiple members.
■ In the OBXKites sample database, an order may have multiple items, and each item may be sold on multiple orders.
■ In the Cape Hatteras Adventures sample database, a guide may qualify for several tours, and each tour may have several qualified guides.

In a conceptual diagram, the many-to-many relationship can be diagrammed by signifying multiple cardinality at each side of the relationship, as shown in Figure 3-7.

FIGURE 3-7 The many-to-many logical model shows multiple tuples on both ends of the relationship.
[Diagram: the Customer and Event entities joined by a many-to-many relationship.]

Many-to-many relationships are nearly always optional. For example, the many customers-to-many events relationship is optional because the customer and the tour/event are each valid without the other.

The one-to-one and the one-to-many relationship can typically be constructed from items within an organization that users can describe and understand. That’s not always the case with many-to-many relationships.

To implement a many-to-many relationship in SQL DDL, a third table, called an associative table (sometimes called a junction table), is used, which artificially creates two one-to-many relationships between the two entities (see Figure 3-8). Figure 3-9 shows the associative entity with data to illustrate how it has a foreign key to each of the two many-to-many primary entities. This enables each primary entity to assume a one-to-many relationship with the other entity.

FIGURE 3-8 The many-to-many implementation adds an associative table to create artificial one-to-many relationships for both tables.

FIGURE 3-9 In the associative entity (Customer_mm_Event), each customer can be represented multiple times, which creates an artificial one-event-to-many-customers relationship.
Likewise, each event can be listed multiple times in the associative entity, creating a one-customer-to-many-events relationship.
[Diagram: the Customer, Customer_mm_Event (foreign keys: CustomerID and EventID), and Event entities with sample rows.]

In some cases the subject-matter experts will readily recognize the associative table:

■ In the case of the many orders to many products example, the associative entity is the order details entity.
■ A class may have many students and each student may attend many classes. The associative entity would be recognized as the registration entity.

In other cases an organization might understand that the relationship is a many-to-many relationship, but there’s no term to describe the relationship. In this case, the associative entity is still required to resolve the many-to-many relationship; just don’t discuss it with the subject-matter experts.

Typically, additional facts and attributes describe the many-to-many relationship. These attributes belong in the associative entity. For example:

■ In the case of the many orders to many products example, the associative entity (order details entity) would include the quantity and sales price attributes.
■ In the members and groups example, the member_groups associative entity might include the datejoined and status attributes.

When designing attributes for associative entities, it’s critical that every attribute actually describe only the many-to-many relationship and not one of the primary entities. For example, including a product name describes the product entity and not the many orders to many products relationship.
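The associative-table approach described above can be sketched with Python’s built-in sqlite3 module. This is SQLite, not SQL Server, and everything beyond the Customer, Event, and Customer_mm_Event entity names and their CustomerID and EventID keys is an assumption for illustration: the Name columns, the composite primary key, the sample rows, and the names helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Event    (EventID    INTEGER PRIMARY KEY, Name TEXT NOT NULL);

-- Associative table: one foreign key to each primary entity.
-- The composite primary key (an assumption) stops the same pair appearing twice.
CREATE TABLE Customer_mm_Event (
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
    EventID    INTEGER NOT NULL REFERENCES Event (EventID),
    PRIMARY KEY (CustomerID, EventID)
);

INSERT INTO Customer VALUES (1, 'John'), (2, 'Paul'), (3, 'Bob');
INSERT INTO Event VALUES (1, 'Appalachian Trail'), (2, 'Blue Ridge Parkway Hike'),
                         (3, 'Outer Banks Lighthouses');
INSERT INTO Customer_mm_Event VALUES (1, 1), (1, 2), (2, 1), (2, 3);
""")


def names(sql, param):
    """Hypothetical helper: run a one-column query and return the values as a list."""
    return [row[0] for row in conn.execute(sql, (param,))]


# One customer to many events, read through the associative table.
events_for_john = names("""
    SELECT e.Name
    FROM Customer_mm_Event AS ce
    JOIN Customer AS c ON c.CustomerID = ce.CustomerID
    JOIN Event    AS e ON e.EventID    = ce.EventID
    WHERE c.Name = ?
    ORDER BY e.Name
""", "John")

# One event to many customers, read through the same table.
customers_on_trail = names("""
    SELECT c.Name
    FROM Customer_mm_Event AS ce
    JOIN Customer AS c ON c.CustomerID = ce.CustomerID
    JOIN Event    AS e ON e.EventID    = ce.EventID
    WHERE e.Name = ?
    ORDER BY c.Name
""", "Appalachian Trail")

# Registering the same customer for the same event twice is refused.
try:
    conn.execute("INSERT INTO Customer_mm_Event VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(events_for_john)
print(customers_on_trail)
print(duplicate_rejected)
```

Attributes that describe the relationship itself, such as a quantity or a date joined, would be added as further columns on the associative table, never on Customer or Event.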
Supertype/subtype pattern

One of my favorite design patterns, which I don’t see used often enough, is the supertype/subtype pattern. It supports generalization, and I use it extensively in my designs. The supertype/subtype pattern is also perfectly suited to modeling an object-oriented design in a relational database.

The supertype/subtype relationship leverages the one-to-one relationship to connect one supertype entity with one or more subtype entities. This extends the supertype entity with what appear to be flexible attributes.

The textbook example is a database that needs to store multiple types of contacts. All contacts have basic contact data such as name, location, phone number, and so on. Some contacts are customers with customer attributes (credit limits, loyalty programs, etc.). Some contacts are vendors with vendor-specific data.

While it’s possible to use separate entities for customers and vendors, an alternative design is to use a single Contact entity (the supertype) to hold every contact, regardless of their type, and the attributes common to every type (probably just the name and contact attributes). Separate entities (the subtypes) hold the attributes unique to customers and vendors. A customer would have a tuple (row) in the contact and the customer entities. A vendor would have tuples in both the contact and vendor entities. All three entities share the same primary key, as shown in Figure 3-10.

Sometimes data modelers who use the supertype/subtype pattern add a type attribute in the supertype entity so it’s easy to quickly determine the type without searching the subtypes. This works well but it restricts the tuples to a single subtype.

FIGURE 3-10 The supertype/subtype pattern uses an optional one-to-one relationship that relates a primary key to a primary key.
[Diagram: the Contact, Customer, and Vendor entities with sample rows, each keyed on ContactID; Customer holds customer loyalty data and Vendor holds vendor status.]

Without the type attribute, it’s possible to allow tuples to belong to multiple subtypes. Sometimes this is referred to as allowing the supertype to have multiple roles. In the contact example, multiple roles (e.g., a contact who is both an employee and customer) could mean the tuple has data in the supertype entity (e.g., contact entity) and each role subtype entity (e.g., employee and customer entities).

Nordic O/R DBMS

Nordic (New Object/Relational Design) is my open-source experiment to transform SQL Server into an object-oriented database. Nordic builds on the supertype/subtype pattern and uses T-SQL code generation to create a T-SQL API façade that supports classes with multiple inheritance, attribute inheritance, polymorphism, inheritable class roles, object morphing, and inheritable class-defined workflow state. If you want to play with Nordic, go to www.CodePlex.com/nordic.

Domain integrity lookup pattern

The domain integrity lookup pattern, informally called the lookup table pattern, is very common in production databases. This pattern serves only to limit the valid options for an attribute, as illustrated in Figure 3-11.

FIGURE 3-11 The domain integrity lookup pattern uses a foreign key to ensure that only valid data is entered into the attribute.
[Diagram: the Region entity (primary key: RegionID) with rows NC North Carolina, CO Colorado, and NY New York; the Contact entity (primary key: ContactID, foreign key: RegionID) with rows John NC, Paul CO, Earnest Baked Good CO, Nulls-R-Us NY, and Frank’s General Store NC.]

The classic example is the state, or region, lookup entity.
Unless the organization regularly deals with several states as clients, the state lookup entity only serves to ensure that the state attributes in other entities are entered correctly. Its only purpose is data consistency.

Recursive pattern

A recursive relationship pattern (sometimes called a self-referencing, unary, or self-join relationship) is one that relates back to itself. In reality, these relationships are quite common:

■ An organizational chart represents a person reporting to another person.
■ A bill of materials details how a material is constructed from other materials.
■ Within the Family sample database, a person relates to his or her mother and father.

Chapter 17, “Traversing Hierarchies,” deals specifically with modeling and querying recursive relationships within SQL Server 2008.
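The organizational chart case of the recursive pattern can be sketched as a table whose foreign key points back at its own primary key. The example below uses Python’s built-in sqlite3 module and SQLite’s recursive common table expression, so it is not SQL Server syntax and is not the approach of any particular later chapter; the Employee table, its ManagerID column, and the three sample employees are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    -- Self-referencing foreign key: optional, because the top of the chart has no manager.
    ManagerID  INTEGER REFERENCES Employee (EmployeeID)
);
INSERT INTO Employee VALUES (1, 'Ann', NULL), (2, 'Ben', 1), (3, 'Cy', 2);
""")

# Walk upward from one employee to the top of the chart.
reporting_chain = [
    row[0]
    for row in conn.execute("""
        WITH RECURSIVE Chain (EmployeeID, Name, ManagerID, Depth) AS (
            SELECT EmployeeID, Name, ManagerID, 0
            FROM Employee
            WHERE Name = ?
            UNION ALL
            SELECT m.EmployeeID, m.Name, m.ManagerID, c.Depth + 1
            FROM Employee AS m
            JOIN Chain AS c ON m.EmployeeID = c.ManagerID
        )
        SELECT Name FROM Chain ORDER BY Depth
    """, ("Cy",))
]
print(reporting_chain)

# The self-reference is still enforced: a manager must be an existing employee.
try:
    conn.execute("INSERT INTO Employee VALUES (4, 'Dee', 42)")
    bad_manager_rejected = False
except sqlite3.IntegrityError:
    bad_manager_rejected = True
print(bad_manager_rejected)
```

The same single-table shape covers the bill of materials example; a person with both a mother and a father would simply carry two self-referencing foreign keys instead of one.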