At this point, you should have a list of the main entities and, although it might not be complete, at least a reasonable first pass at the main attributes for each entity. Now comes an important phase in designing a database: breaking out those attributes that can occur several times for each entity and deciding how the different entities relate to each other. The number of rows in one entity that can relate to a row in another is often referred to as cardinality.
Some people like to consider the relationships even before generating an attribute list. We find that listing the main attributes helps in understanding the entities, so we perform that step first. There is no definitive right or wrong way; use whichever works best for you.
Drawing Relationship Diagrams
With databases, a graphical representation of the structure of the data can be extremely helpful in understanding the design. At this stage, you are working on what is termed a conceptual model. You are not yet concerned about the finer implementation detail, but more about the logical structure of the data. In a conceptual data model, tables are shown as boxes, with relationships between the tables shown using lines, with symbols at the end of the line indicating the type of relationship, or the cardinality. Relationships between tables are always in two directions; therefore, there will always be a symbol at each end, and you read the diagram toward the table of interest. The symbols we will be using here are shown in Table 12-1.
■Note There are many different diagramming techniques and styles in use in database circles. We will use a common notation; you will find other notation styles in use.
Suppose we had a relationship between two tables, A and B, as shown in Figure 12-1.
Figure 12-1. Simple relationship between two tables
This means that the tables have the following relationship:
• For each row in table A, there must be exactly one row in table B.
• For each row in table B, there can be zero, one, or many rows in table A.
Table 12-1. Cardinality Symbols
Relationship    Symbol
Zero or one
Exactly one
Zero or many
One or many
For example, if table A is order and table B is customer, this would say, “For each order, there must be exactly one customer. For each customer, there can be zero, one, or many orders.”
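To make the two directions concrete, here is a minimal Python sketch that treats the order/customer example as in-memory data and checks it against those rules. The sample data, the `(order_id, customer_id)` pair representation, and the function names are hypothetical, chosen only for illustration; this is not how a database enforces the relationship, which comes later in the design.

```python
from collections import Counter

# Hypothetical sample data: a set of customer ids, and orders stored as
# (order_id, customer_id) pairs. Each pair holds a single customer id, so an
# order can never refer to more than one customer.
customers = {1, 2, 3}
orders = [(101, 1), (102, 1), (103, 3)]


def check_order_customer(customers, orders):
    """Report orders that break 'each order has exactly one customer'."""
    problems = []
    for order_id, customer_id in orders:
        # "At most one" is guaranteed by the pair structure; this checks the
        # "at least one" half: the customer must actually exist.
        if customer_id not in customers:
            problems.append(f"order {order_id} has no valid customer")
    return problems


def orders_per_customer(customers, orders):
    """Count orders for each customer; zero is allowed ('zero, one, or many')."""
    counts = Counter(customer_id for _, customer_id in orders)
    return {c: counts[c] for c in sorted(customers)}


print(check_order_customer(customers, orders))  # []
print(orders_per_customer(customers, orders))   # {1: 2, 2: 0, 3: 1}
```

Customer 2 has no orders, which the “zero, one, or many” end of the relationship permits, whereas an order whose customer is missing would be reported.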
Now that we have the basics of the diagram elements for drawing table relationships, we can look at our example with customers, orders, and products. Our customer table has no attributes that can occur more than once, so we can leave it alone for now. Let’s tackle our item table next, as this is reasonably straightforward.
Our only difficulty with the item table is that each item could have more than one barcode.
As we discussed earlier in the book, having an unknown number of repeating columns in a database table is not generally possible. (PostgreSQL does have an array data type, but that is quite unusual and should be used with caution; we prefer to stick to standard column types.) Suppose most items have two barcodes, but some have three, so we decide that an easy solution is to add three columns to the item table: barcode1, barcode2, and barcode3. This seems like a nice solution to the problem, but it doesn’t stand up to closer scrutiny. What happens when a product comes along that has four barcodes? Do we redesign our database structure to add a fourth barcode column? How many columns are “enough”? As we saw in Chapter 2, having repeated columns is very inflexible, and is almost always the wrong solution.
Another solution we might think of is to use a variable-length string and “hide” the barcodes in it, separated by a character we know doesn’t typically appear in barcodes, such as a semicolon. Again, this is a very bad solution, because we have stored many pieces of information in the same location: to find an item from one of its barcodes, for example, we would have to search inside every string. As with a good spreadsheet, it’s very important to ensure that each piece of information is stored separately, so that each can be processed independently.
We need to separate the repeating information, the barcodes, into a new table. That way, we can arrange to store an arbitrary number of barcodes for each item. While we are breaking out the barcodes, we also need to consider the relationship between an item and a barcode.
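The difference between fixed columns and a separate table can be sketched in a few lines of Python. The sketch assumes hypothetical item rows using the rejected barcode1..barcode3 design and turns them into one `(barcode, item_id)` row per barcode. The barcode values, the `item_id` link, and the `split_barcodes` function are all invented for illustration; the text deliberately leaves the columns that join the tables until later.

```python
# Hypothetical item rows in the rejected "repeated columns" design.
wide_items = [
    {"item_id": 1, "description": "coffee",
     "barcode1": "BC-101", "barcode2": "BC-102", "barcode3": None},
    {"item_id": 2, "description": "donut",
     "barcode1": "BC-201", "barcode2": None, "barcode3": None},
    {"item_id": 3, "description": "milkshake",
     "barcode1": None, "barcode2": None, "barcode3": None},
]

BARCODE_COLUMNS = ("barcode1", "barcode2", "barcode3")


def split_barcodes(wide_items):
    """Return one (barcode, item_id) row for every non-empty barcode column."""
    rows = []
    for item in wide_items:
        for column in BARCODE_COLUMNS:
            code = item[column]
            if code is not None:  # unused slots simply produce no row
                rows.append((code, item["item_id"]))
    return rows


barcode_rows = split_barcodes(wide_items)
print(barcode_rows)  # [('BC-101', 1), ('BC-102', 1), ('BC-201', 2)]

# A further barcode for an item is just another row; no new column is needed.
barcode_rows.append(("BC-103", 1))
```

Once the barcodes live in their own rows, an item with four, five, or no barcodes needs no change to the structure at all.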
Thinking from the item side first, we know that each item could have no barcodes, one barcode, or many barcodes. Thinking from the barcode end, we know that each barcode must be associated with exactly one item. A barcode on a product is always the lowest level of identifier, identifying different versions of products, such as promotional packs or overfill packs, while the core product remains the same. We can draw this relationship as shown in Figure 12-2.
Figure 12-2. The relationship between item and barcode entities
This shows that each item can have zero, one, or many barcodes, but a barcode belongs to exactly one item. Notice that we have not identified any columns to join the two tables. This will come later. The important thing at this point is to determine relationships, not how we will enforce them in the database.
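As an informal illustration of reading the relationship in both directions, the Python sketch below stores hypothetical barcodes in a dictionary keyed by barcode. Because a dictionary key holds only one value, each barcode maps to exactly one item; inverting the mapping gives each item a list of zero, one, or many barcodes. The names and data are assumptions for illustration only.

```python
# Hypothetical data. A dict key can hold only one value, so each barcode
# maps to exactly one item, mirroring "a barcode belongs to exactly one item".
items = {1: "coffee", 2: "donut", 3: "milkshake"}
barcode_to_item = {"BC-101": 1, "BC-102": 1, "BC-201": 2}


def barcodes_by_item(items, barcode_to_item):
    """Invert the mapping: each item gets a list of zero, one, or many barcodes."""
    result = {item_id: [] for item_id in items}  # an empty list is allowed
    for code, item_id in barcode_to_item.items():
        # Raises KeyError if a barcode names an item that does not exist.
        result[item_id].append(code)
    return result


print(barcodes_by_item(items, barcode_to_item))
# {1: ['BC-101', 'BC-102'], 2: ['BC-201'], 3: []}
```

The milkshake ends up with an empty list, which is exactly the “zero” case at the item end of Figure 12-2.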
Now we can move on to the order table, which is slightly harder to analyze. The first problem is how to represent the products that have been ordered. Often, orders will consist of more than one product, so we know that we have a repeating set of information relating to orders.
As before, this means that we must separate the products being ordered into another table. We will call our main order table orderinfo, and call the table we split out to hold the ordered products orderline, since we can imagine each row of this table corresponding to a line on a paper order.
Now we need to think about the relationship between the orderinfo and orderline tables.
It makes no sense to have an order for nothing, and we do not want to prevent a single order from having multiple items, so each orderinfo row must have one or many orderline rows. Thinking about an orderline, we realize that each orderline must relate to exactly one actual order, so for each orderline entry, there must be exactly one orderinfo entry. Figure 12-3 illustrates this relationship.
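Unlike the customer end of the order relationship, this one is “one or many” rather than “zero or many”, so an order with no lines is an error. A minimal Python sketch of checking both ends, assuming hypothetical in-memory data in the initial design (one line per item, no quantity yet):

```python
# Hypothetical data: order ids, and order lines as (order_id, item) pairs in
# the initial design (one line per item, no quantity yet).
orderinfo = {1, 2}
orderline = [(1, "coffee"), (1, "donut"), (2, "milkshake")]


def check_order_lines(orderinfo, orderline):
    """Check 'each order has one or many lines' and 'each line has exactly one order'."""
    problems = []
    line_orders = [order_id for order_id, _ in orderline]
    orders_with_lines = set(line_orders)
    # One or many: every order must appear on at least one line.
    for order_id in sorted(orderinfo):
        if order_id not in orders_with_lines:
            problems.append(f"order {order_id} has no lines")
    # Exactly one: every line must point at an order that exists.
    for order_id in line_orders:
        if order_id not in orderinfo:
            problems.append(f"line refers to unknown order {order_id}")
    return problems


print(check_order_lines(orderinfo, orderline))        # []
print(check_order_lines(orderinfo | {3}, orderline))  # ['order 3 has no lines']
```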
Figure 12-3. The initial design for the orderline to orderinfo relationship
If you think about this a little more carefully, you can see a possible snag. When people go into a shop, they do not generally order things one at a time:
• I’d like a coffee please.
• I’d like a coffee please.
• I’d like a donut please.
• I’d like a milkshake please.
• I’d like a coffee please.
• I’d like a donut please.
They are much more likely to express their order as follows:
I’d like three coffees and two donuts and a milkshake please.
Currently, our design copes perfectly with the first situation, but it can cope with the second only by breaking it down into many single-item lines.
Now we might decide this is okay, but if we are going to print out an order for a large round of coffees, milkshakes, and donuts, it’s going to look a bit silly to the customer if each item has a separate line. We are also making life difficult for ourselves if we offer a discount on multiple items ordered at the same time. For these reasons, we decide it would be better to store a quantity against each line, as shown in Figure 12-4. This way, each type of product appears in an order only once, and the quantity required is stored in a separate column.
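The effect of adding a quantity can be shown with a short Python sketch that collapses the six single requests from the shop example into one line per product. The function name and the `(product, quantity)` tuple layout are assumptions for illustration, not part of the design itself.

```python
from collections import Counter


def to_order_lines(requests):
    """Collapse individual requests into one (product, quantity) line per product.

    Counter keeps products in the order they were first requested.
    """
    counts = Counter(requests)
    return [(product, quantity) for product, quantity in counts.items()]


requests = ["coffee", "coffee", "donut", "milkshake", "coffee", "donut"]
print(to_order_lines(requests))
# [('coffee', 3), ('donut', 2), ('milkshake', 1)]
```

Six single-item lines become three order lines, matching “three coffees and two donuts and a milkshake”, and the total of the quantities still equals the number of items ordered.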
Figure 12-4. The corrected design for the orderline to orderinfo relationship
Now we have a basic conceptual design for all our entities, as shown in Figure 12-5. It’s time to relate them to each other.
Figure 12-5. First full set of entities
We can see that we have three core groups of entities; now we need to look at how the three groups relate to each other. In this simple database, it’s immediately obvious that customer rows must relate to orderinfo rows. Looking at the relationship between items and orders, we can see that the relationship is not between orderinfo and item, but between orderline and item.
How exactly do customers relate to orders? Clearly, each order must relate to a single customer, and each customer could have many orders, but could a customer have no orders?
Although not very likely, it could happen, perhaps while a customer account is being set up, so we will allow the possibility of a customer with no orders.
Similarly, we must define the exact relationship between item and orderline. Each orderline is for an item, so this relationship is exactly one. In the opposite direction, item to orderline, any individual item might never have been ordered, or could appear on many different order lines, so the relationship is zero or many. Adding these relationships gives us Figure 12-6.
We now have what we believe to be a complete map of all the major entities and their most important attributes, broken down where we think we need to store them in individual columns, and a diagram showing the relationship between them. We have our first conceptual database design.
Figure 12-6. The full conceptual data model
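One informal way to double-check a diagram like Figure 12-6 is to write its relationships down as data and read each one back as a pair of plain sentences, one per direction. The Python sketch below does this for the four relationships described in the text; the `RELATIONSHIPS` table layout and the `describe` function are hypothetical conveniences, not part of any diagramming notation.

```python
# Hypothetical encoding of the relationships in the full conceptual model.
# Each entry reads: every <child> row relates to exactly one <parent> row, and
# every <parent> row has at least <min_children> <child> rows
# (0 = "zero, one, or many", 1 = "one or many").
RELATIONSHIPS = [
    ("orderinfo", "customer", 0),
    ("orderline", "orderinfo", 1),
    ("orderline", "item", 0),
    ("barcode", "item", 0),
]


def describe(relationships):
    """Turn each relationship into two sentences, one for each direction."""
    sentences = []
    for child, parent, min_children in relationships:
        many = "zero, one, or many" if min_children == 0 else "one or many"
        sentences.append(f"Each {child} row relates to exactly one {parent} row.")
        sentences.append(f"Each {parent} row relates to {many} {child} rows.")
    return sentences


for sentence in describe(RELATIONSHIPS):
    print(sentence)
```

Reading the output line by line against the diagram is a quick way to spot a symbol drawn at the wrong end of a line.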
Validating the Conceptual Design
At this point, it’s vital that you stop and validate your initial conceptual design. A mistake at this stage will be much harder to correct later. It is a well-known tenet of software engineering that the earlier you find an error, the less it costs to fix. Some studies have suggested that the cost of correcting an error increases by a factor of ten for each stage in the development process.
Invest in getting the requirements capture correct and the initial design right.
This doesn’t mean you can’t take an iterative approach if you prefer, but it is a little harder with database design. This is because after the first iteration, you may have significant volumes of live data in your database. Migrating this data to a later design can be challenging in its own right, not to mention that application developers will not be impressed at having to change their code to handle an “improvement” in the underlying database design!
If you have access to the future users of the system, this is the point at which you should go back and talk to them. Show them the diagram, and explain to them what it means, step by step, to check that what you have designed conforms to their expectations of the system. If your design is partially based on an existing database, go back and revisit the original, to check that you have not missed anything vital. Most users will understand a basic entity relationship diagram such as this, provided that you sit with them and talk them through it. Not only does it help you validate the design, but it also makes users feel involved and consulted in the development.