Beginning Databases with PostgreSQL, part 7 (66 pages, 1.95 MB)

CHAPTER 12 ■ DATABASE DESIGN

• Last name and ZIP code. This is better, but still not guaranteed to be unique, since there could be a husband and wife who are both customers.

• First name, last name, and ZIP code. This is probably unique, but again not a certainty. It's also rather messy and inefficient to need to use three columns to get to a unique key. One is much preferable, though we will accept two.

There is no clear candidate key for the customer table, so we will need to generate a logical key that is unique for each customer. To be consistent, we will always name logical keys <table name>_id, which gives us customer_id.

orderinfo table: This table has exactly the same problem as the customer table. There is no clear way of uniquely identifying each row, so again, we will create a key: orderinfo_id.

item table: We could use the description here, but descriptions could be quite a large text string, and long text strings do not make good keys, since they are slow to search. There is also a small risk that descriptions might not always be unique, even though they probably should be. Again, we will create a key: item_id.

orderline table: This table sits between the orderinfo table and the item table. If we decide that any particular item will appear on an order only once, because we handle multiple items on the same order using a quantity column, we could consider the item to be a candidate key. In practice, this won't work, because if two different customers order the same item, it will appear in two different orderline rows. We know that we will need to find some way of relating each orderline row to its parent order in orderinfo, and since there is no column present yet that can do this, we know we will need to add one. We can postpone briefly the problem of candidate keys in the orderline table, and come back to it in a moment.

Establishing Foreign Keys

After establishing primary keys, you can work on the mechanism to use to relate the tables together.
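The choice between a compound candidate key and a generated logical key can be sketched in SQL. This is only an illustration, not the book's schema: the table names and char(10) placeholder types below are hypothetical, chosen to mirror the as-yet-untyped design.

```sql
-- Hypothetical sketch: a three-column candidate key is legal,
-- but messy to reference from other tables.
CREATE TABLE customer_sketch (
    fname   char(10),
    lname   char(10),
    zipcode char(10),
    UNIQUE (fname, lname, zipcode)   -- probably unique, but not guaranteed
);

-- The route the text takes instead: generate a logical key,
-- named <table name>_id by convention, here customer_id.
CREATE TABLE customer_sketch2 (
    customer_id serial PRIMARY KEY,  -- PostgreSQL generates the values
    fname       char(10),
    lname       char(10),
    zipcode     char(10)
);
```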
The conceptual model shows the way the tables relate to each other, and you have also established what uniquely identifies each row in a table. When you establish foreign keys, often all you need to do is ensure that the column you have in one table identified as a primary key also appears in all the other tables that are directly related to that table.

After adjusting some column names in our tables to make them a little more meaningful, and changing the relationship lines to a physical model version, where we simply draw an arrow that points at the "must exist" table, we have a diagram that looks like Figure 12-7. Notice how the diagram has changed from the conceptual model as we move to the physical model. Now we are showing information about how tables could be physically related, not about the cardinality of those relationships. We have shown the primary key columns underlined. Don't worry about the data types or sizes for columns yet; that will be a later step. We have deliberately left all the column types as char(10). We will revisit the types and sizes of all the columns shortly.

For now, we need to work out how to relate tables. Usually, this simply entails checking that the primary key in the "must exist" table also exists in the table that is related to it. In this case, we needed to add customer_id to orderinfo, orderinfo_id to orderline, and item_id to barcode.

MatthewStones_4789C12.fm Page 373 Tuesday, March 8, 2005 2:21 PM

Figure 12-7. Initial conversion to a physical data model

Notice the orderline table in Figure 12-7. We can see that the combination of item_id and orderinfo_id will always be unique. Adding in the extra column we need has solved our missing primary key problem.

We have one last optimization to make to our schema. We know that, for our particular business, we have a very large number of items, but wish to keep only a few of them in stock.
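The three foreign-key additions described above can be sketched as DDL. This is an abridged sketch: it assumes the customer and item tables already exist with integer keys, omits all unrelated columns, and the barcode_ean column name and width are assumptions.

```sql
-- Sketch only: unrelated columns omitted.
CREATE TABLE orderinfo (
    orderinfo_id serial PRIMARY KEY,
    customer_id  integer NOT NULL REFERENCES customer(customer_id)
);

CREATE TABLE orderline (
    orderinfo_id integer NOT NULL REFERENCES orderinfo(orderinfo_id),
    item_id      integer NOT NULL REFERENCES item(item_id),
    quantity     integer NOT NULL,
    PRIMARY KEY (orderinfo_id, item_id)   -- the combination is always unique
);

CREATE TABLE barcode (
    barcode_ean char(13) PRIMARY KEY,     -- assumed name and width
    item_id     integer NOT NULL REFERENCES item(item_id)
);
```

The composite primary key on orderline is exactly the "extra column solved our missing primary key problem" observation: once orderinfo_id is present, (orderinfo_id, item_id) becomes the key.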
This means that for our item table, quantity_in_stock will almost always be zero. For just a single column, this is unimportant, but consider the problem if we wanted to store a large amount of information for a stocked item, such as the date it arrived at the warehouse, a warehouse location, expiry dates, and batch numbers. These columns would always be empty for unstocked items. For the purposes of demonstration, we will separate the stock information from the item information, and hold it in its own table. This is sometimes referred to as a subsidiary table.

Our physical design, with relationships added, primary keys defined (shown as underlined), and the stock information broken out, looks like Figure 12-8. Notice we have been careful to ensure that all related columns have the same name. We didn't need to do this. We could have had a customer_ident in the orderinfo table that matched customer_id in the customer table. However, as we stressed earlier, database designs that emphasize consistency are much easier to work with. So, unless there are very good reasons to do otherwise, we strongly urge you to keep column names identical for columns that are related to each other.

Figure 12-8. Conversion to physical data model with stock as a subsidiary table

It's also a good idea to be consistent in your naming. If you need an ident column as a primary key for a table, then stick to a naming rule, preferably one that is <table name>_<something>. It doesn't matter if you use id, ident, key, or pk as the suffix. What is important is that the naming is consistent across the database.

Establishing Data Types

Once you have the tables, columns, and relationships, you can work through each table in turn, adding data types to each column. At this stage, you also need to identify any columns that will need to accept NULL values, and declare the remaining columns as NOT NULL.
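The subsidiary table of Figure 12-8 can be sketched like this. Reusing item_id as both primary key and foreign key keeps the one-to-one relationship explicit; only stocked items need a row here, so the mostly-empty columns disappear from item.

```sql
-- Stock information broken out of item into a subsidiary table.
-- A row exists only for items actually held in stock.
CREATE TABLE stock (
    item_id  integer PRIMARY KEY REFERENCES item(item_id),
    quantity integer NOT NULL
);
```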
Notice that we start from the assumption that columns should be declared NOT NULL, and look for exceptions. This is a better approach than assuming NULL is allowed, because, as explained in Chapter 2, NULL values in columns are often hard to handle, so you should minimize their occurrence where you can.

Generally, columns to be used as primary keys or foreign keys should be set to a native data type that can be efficiently stored and processed, such as integer. PostgreSQL will automatically enforce a constraint to prevent primary keys from storing NULL values.

Assigning a data type for currency is often a difficult choice. Some people prefer a money type, if the database supports it. PostgreSQL does have a money type, but the documentation urges people to use numeric instead, which is what we have chosen to do in our sample database. You should generally avoid using a type with undefined rounding characteristics, such as a floating-point type like float(P). Fixed-precision types, such as numeric(P,S), are much safer for working with financial information, because the rounding behavior is defined.

For text strings, there is a wide choice of options. When you know the length of a field exactly, and it is a fixed length, such as barcode, you will generally choose a char(N) type, where N is the required length. For other short text strings, we also prefer to use fixed-length strings, such as char(4) for a title. This is largely a matter of preference, however, and it would be just as valid to use a variable-length type for these strings.

For variable-length text columns, PostgreSQL has the text type, which supports variable-length character strings. Unfortunately, this is not standard and, although similar extensions do appear in other databases, the ISO/ANSI standard defines only a varchar(N) text type, where N specifies a maximum length of the string.
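The currency advice can be made concrete with a sketch of the item table. The numeric(7,2) precision here is an assumption for illustration (values up to 99999.99); the point is simply that a fixed-precision type, not float, carries the prices.

```sql
-- Fixed-precision numeric has defined rounding; float(P) does not.
CREATE TABLE item (
    item_id     serial PRIMARY KEY,
    description varchar(64) NOT NULL,
    cost_price  numeric(7,2),       -- assumed precision: up to 99999.99
    sell_price  numeric(7,2)
);
```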
We value portability quite highly, so we stick with the more standard varchar(N) type. Again, consistency is very important. Make sure all your numeric type fields have exactly the same precision. Check that commonly used columns such as description and name, which might appear in several tables in your database, aren't defined differently (and thus used in different ways) in each. The fewer unique types and character lengths that you need to use, the easier your database will be to manage.

Let's work through the customer table, seeing how we assign types. The first thing to do is give a type to customer_id. It's a column we added specially to be a primary key, so we can make it efficient by using an integer type.

Titles will be things like Mr, Mrs, or Dr. This is always a short string of characters; therefore, we make it a char(4) type. Some designers prefer to always use varchar to reduce the number of types being used, and that would also be a perfectly valid choice. It's possible not to know someone's title, so we will allow this field to store NULL values.

We then come to fname and lname, for first and last names. It's unlikely these will ever need to exceed 32 characters, but we know the length will be quite variable, so we make them both varchar(32). We also decide that we could accept fname being a NULL, but not lname. Not knowing a customer's last name seems unreasonable.

In this database, we have chosen to keep all the address parts together, in a single long field. As was discussed earlier, this is probably oversimplified for the real world, but addresses are always a design challenge; there is no fixed right answer. You need to do what is appropriate for each particular design.

Notice that we store phone as a character string. It is almost always a mistake to store phone numbers as numbers in a database, because that approach does not allow international dialing codes to be stored.
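Collecting the walk-through above into DDL gives something like the following sketch. The type choices for customer_id, title, fname, and lname come straight from the text; the addressline, zipcode, and phone names and widths are assumptions added to make the example self-contained.

```sql
CREATE TABLE customer (
    customer_id serial PRIMARY KEY,
    title       char(4),               -- NULL allowed: the title may be unknown
    fname       varchar(32),           -- NULL allowed
    lname       varchar(32) NOT NULL,  -- a missing last name seems unreasonable
    addressline varchar(64),           -- assumed: address kept as one long field
    zipcode     char(10) NOT NULL,     -- assumed width
    phone       varchar(16)            -- a string, never a numeric type
);
```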
For example, +44 (0)116 … would be a common way of representing a United Kingdom dialing code, where the country code is 44, but if you are already in the United Kingdom, you need to add a 0 before the area code, rather than dialing the +44. Also, storing a number with leading zeros will not work in a numeric field, and in phone numbers, leading zeros are very important.

We continue assigning types to columns in this way. The final type allocation for our physical database design is shown in Figure 12-9.

Figure 12-9. Final conversion to physical data model

Completing the Table Definitions

At this point, you should go back and double-check that all the information you wish to store in the database is present. All the entities should be represented, and all the attributes listed with appropriate types.

You may also decide to add some lookup, or static data, tables. For example, in our sample database, we might have a lookup table of cities or titles. Generally, these lookup tables are unrelated to any other tables, and they are simply used by the application as a convenient way of soft-coding values to offer the user. You could hard-code these options into an application, but in general, storing them in a database, from which they can be loaded into an application at runtime, makes it much easier to modify the options. Then the application doesn't need to be changed to add new options. You just need to insert additional rows in the database lookup table.

Implementing Business Rules

After the table definitions are complete, you would write, or generate from a tool, the SQL to create the database schema. If all is well, you can implement any additional business rules. For each rule, you must consider if it is best implemented as a constraint, as discussed in Chapter 8, or as a trigger, as shown in Chapter 10.
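A lookup table of titles, as suggested above, might look like the following. The table and column names here are hypothetical; note the table is deliberately unrelated to any other table.

```sql
-- Hypothetical static-data table: loaded by the application at
-- runtime to offer the user a list of choices.
CREATE TABLE title_lookup (
    title char(4) PRIMARY KEY
);

INSERT INTO title_lookup VALUES ('Mr');
INSERT INTO title_lookup VALUES ('Mrs');
INSERT INTO title_lookup VALUES ('Miss');
INSERT INTO title_lookup VALUES ('Dr');

-- Adding a new option later needs no application change:
INSERT INTO title_lookup VALUES ('Prof');
```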
In general, you use constraints if possible, as these are much easier to work with. Some examples of constraints that we might wish to use in our simple database were shown in Chapter 10.

Checking the Design

By now, you should have a database implemented, complete with constraints and possibly triggers to enforce business rules. Before handing over your completed work, and celebrating a job well done, it's time to test your database again. Just because a database isn't code in the conventional sense doesn't mean you can't test it. Testing is a necessity, not an optional extra!

Get some sample data, if possible part of the live data that will go into the database. Insert some of these sample rows. Check that attempting to insert NULL values into columns you don't think should ever be NULL results in an error. Attempt to delete data that is referenced by other data. Try to manipulate data to break the business rules you have implemented as triggers or constraints. Write some SQL to join tables together to generate the kind of data you would expect to find on reports.

Once your database has gone into production, it is difficult to update your design. Anything other than a minor change probably means stopping the system, unloading live data into text files, updating the database design, and reloading the data. This is not something you want to undertake any more than absolutely necessary. Similarly, once faulty data has been loaded into a table, you will often find it is referenced by other data and difficult to correct or remove from the database. Time spent testing the design before it goes live is time well spent. If possible, go back to your intended users and show them the sample data being extracted from the database, and how you can manipulate it.
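Business rules expressed as constraints might look like the following. These two rules are invented for illustration, not taken from the book's Chapter 10 examples; the point is only the ALTER TABLE ... ADD CONSTRAINT ... CHECK form.

```sql
-- Hypothetical business rules as constraints:
ALTER TABLE orderline
    ADD CONSTRAINT quantity_positive CHECK (quantity > 0);

ALTER TABLE item
    ADD CONSTRAINT sensible_prices CHECK (sell_price >= cost_price);
```

An INSERT or UPDATE that breaks either rule is rejected by PostgreSQL itself, with no application or trigger code required.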
Even at this belated stage, there is much to be gained by discovering an error, even a minor one, before the system goes live.

Normal Forms

No chapter on database design would be complete without a mention of normal forms and database normalization. We have left these toward the end of the chapter, since they are rather dry when presented on their own. Now that we have walked through the design stages, you should see how the final design has conformed to these rules.

The origins of database normalization are commonly traced to a paper written by E. F. Codd in 1969 and published in Communications of the ACM, Vol. 13, No. 6, June 1970. In later work, various normal forms were defined. Each normal form builds on previous rules and applies more stringent requirements to the design. In classic normalization theory, there are five normal forms, although others have been defined, such as Boyce-Codd normal form. You will be pleased to learn that only the first three forms are commonly used, and those are the ones we will look at here.

The advantage of structuring your data so that it conforms to at least the first three normal forms is that you will find it much easier to manage. Databases that are not well normalized are almost always significantly harder to maintain and more prone to storing invalid data.

First Normal Form

First normal form requires that each attribute in a table cannot be further subdivided and that there are no repeating groups. For example, in our database design, we separate the customer name into a title, first name, and last name. We know we may wish to use them separately, so we must consider them as individual attributes and store them separately.

The second part—no repeating groups—we saw in Chapter 2 when we looked at what happened when we tried to use a simple spreadsheet to store customers and their orders.
Once a customer had more than one order, we had repeating information for that customer, and our spreadsheet no longer had the same number of rows in all columns.

If we had decided earlier to hold both first names in the fname column of our customer table, this would have violated first normal form, because the column fname would actually be holding first names, which are clearly divisible entities. Sometimes, you need to take a pragmatic approach and argue that, provided that you are confident you will never need to consider different first names separately, they are, for the purposes of a particular database design, a single entity. Alternatively, you could decide to store only a single first name, which is an equally valid approach and the one we took for our sample database.

Another example of violating first normal form—one that is seen with worrying frequency—is to store in a single column a character string where different character positions have different meanings. For example, characters 1 through 3 tell you the warehouse, 4 through 11 the bay, and 12 the shelf. This is a clear violation of first normal form, since you do need to consider subdivisions of the column separately. In practice, this turns out to be very hard to manage. Information being stored in this way should always be considered a design mistake, not a judicious stretching of the first normal form rule.

Second Normal Form

Second normal form says that no information in a row must depend on only part of the primary key. Suppose in our orderline table we had stored the date that the order was placed, as shown in Figure 12-10.

Figure 12-10. Example of breaking second normal form

Recall that our primary key for orderline is a composite of orderinfo_id and item_id. The date the order was placed depends on only the orderinfo information, not on the item ordered, so this would have violated second normal form.
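The Figure 12-10 violation, and its fix, can be sketched in SQL. Table names with the _bad/_fixed suffixes are hypothetical, used here only to contrast the two layouts.

```sql
-- Violates second normal form: date_placed depends only on
-- orderinfo_id, which is just part of the composite primary key.
CREATE TABLE orderline_bad (
    orderinfo_id integer,
    item_id      integer,
    quantity     integer,
    date_placed  date,
    PRIMARY KEY (orderinfo_id, item_id)
);

-- Fixed: the date belongs with the order as a whole,
-- not with each line of it.
CREATE TABLE orderinfo_fixed (
    orderinfo_id serial PRIMARY KEY,
    date_placed  date
);
```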
Sometimes, you may find you are storing data that looks as though it may violate second normal form, but in practice it does not. Suppose we changed our prices frequently. Customers would rightly expect to pay the price shown on the day they ordered, not on the day it was shipped. In order to do this, we would need to store the selling price in the orderline table to record the price in effect on the day the order was placed. This would not violate second normal form, because the price stored in the orderline table would depend on both the item and the actual order.

Third Normal Form

Third normal form is very similar to second normal form, but more general. It says that no information in a column that is not the primary key can depend on anything except the primary key. This is often stated as, "Non-key values must depend on the key, the whole key, and nothing but the key."

Suppose in our customer table we had stored a customer's age and date of birth, as shown in Figure 12-11. This would violate third normal form, because the customer's age depends on the date of birth, a non-key column, as well as the actual customer, which is given by customer_id, the primary key.

Figure 12-11. Example of breaking third normal form

Although putting your database into third normal form (making its structure conform to all of the first three normalization rules) is almost always the preferred solution, there are occasions when it's necessary to break the rules. This is called denormalizing the database, and is occasionally necessary to improve performance. You should always design a fully normalized database first, however, and denormalize it only if you know that you have a serious problem with performance.

Common Patterns

In database design, there are a number of common patterns that occur over and over again.
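The usual cure for the age/date-of-birth violation is to store only the birth date and derive the age at query time. A minimal sketch (the _3nf table name is hypothetical), using PostgreSQL's age() function:

```sql
-- Store only the birth date; never store the derived age.
CREATE TABLE customer_3nf (
    customer_id   serial PRIMARY KEY,
    date_of_birth date
);

-- Compute the age whenever it is needed:
SELECT customer_id, age(date_of_birth) FROM customer_3nf;
```

Because the age is never materialized, it can never drift out of step with the date of birth.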
It’s useful to recognize these patterns, because they generally can be solved in the same way. Here, we will look briefly at three standard problems that have standard solutions. Many-to-Many You have two entities, which seem to have a many-to-many relationship between them. It is never correct to implement a many-to-many table relationship in the physical database, so you need to break the relationship down further. The solution is almost always to insert an additional table, a link table, between the two tables that apparently have a many-to-many relationship. Suppose we had two tables, author and book. Each author could have written many books, and each book, like this one, could have had contributions from more than one author. How do we represent this in a physical database? The solution is to insert a table in between the other two tables. This link table normally contains the primary key of each of the other tables. For the author and book example, we would create a new table, bookauthor. As shown in Figure 12-12, this new table has a composite primary key, where each component is the primary key of one of the other tables. MatthewStones_4789C12.fm Page 380 Tuesday, March 8, 2005 2:21 PM CHAPTER 12 ■ DATABASE DESIGN 381 Figure 12-12. Many-to-many relationship Now each author can appear in the author table exactly once, but have many entries in the bookauthor table, one for each book the author has written. Each book appears exactly once in the book table, but can appear in the bookauthor table more than once, if the book has more than one author. However, each individual entry in the bookauthor table is unique—the combination of book and author occurs only once. Hierarchy Another frequent pattern is a hierarchy. This can appear in many different guises. Suppose we have many shops, each shop is in a geographic area, and these areas are grouped into larger areas known as regions. 
It might be tempting to use the design shown in Figure 12-13, where each shop stores the area and region in which it resides.

Figure 12-13. Flawed hierarchy

Although this might work, it's not ideal. Once we know the area, we also know the region, so storing both the area and region in the shop table is violating third normal form. The region stored in the shop table depends on the area, which is not the primary key for the shop table. A much better design is shown in Figure 12-14. This design correctly shows the hierarchy of shop in an area, which is itself in a region.

It may still be that you need to denormalize this ideal design for performance reasons, storing the region_id in the shop table. In this case, you should write a trigger to ensure that the region_id stored in the shop table is always correctly aligned with that found by looking for the region via the area table. This approach would add cost to the design, and increase the complexity of insertions and updates, in order to reduce the database query costs.

Figure 12-14. Better hierarchy

Recursive Relationships

The recursive relationship pattern is not quite as common as the other two, but occurs frequently in a couple of situations: representing the hierarchy of staff in a company and parts explosion, where parts in an item-type table are themselves composed of other parts from the same table.

Let's consider the staff example. All staff, from the most junior to senior managers, have many attributes in common, such as name, phone number, employee number, salary, grades, and address. Therefore, it seems logical to have a single table that is common to all members of staff to store those details. How do we then store the hierarchy of management, particularly as different areas of the company may have a different number of levels of management to be represented?
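The normalized hierarchy of Figure 12-14 looks like the following sketch. The name columns are assumptions; the essential point is that shop stores only its area, and the region is reached through the area table.

```sql
CREATE TABLE region (
    region_id serial PRIMARY KEY,
    name      varchar(32) NOT NULL
);

CREATE TABLE area (
    area_id   serial PRIMARY KEY,
    region_id integer NOT NULL REFERENCES region(region_id),
    name      varchar(32) NOT NULL
);

CREATE TABLE shop (
    shop_id serial PRIMARY KEY,
    area_id integer NOT NULL REFERENCES area(area_id)
    -- no region_id: the region is derivable via area
);
```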
One answer is a recursive relationship, where each entry for a staff member in the person table stores a manager_id, to record the person who is their manager. The clever bit is that the managers' information is stored in the same person table, generating a recursive relationship. So, to find a person's manager, we pick up their manager_id, and look back in the same table for that to appear as an emp_id. We have stored a complex relationship, with an arbitrary number of levels, in a simple one-table structure, as illustrated in Figure 12-15.

Figure 12-15. Recursive relationship

Suppose we wanted to represent a slightly more complex hierarchy, such as shown in Figure 12-16.

[…]

…important foundation of good design with relational databases. Finally, we looked at three common problem patterns that appear in database design, and how they are conventionally solved. In the next chapter, we will begin to look at ways to build client applications using PostgreSQL, starting with the libpq library, which allows access to PostgreSQL from C.

CHAPTER 13 ■ ACCESSING POSTGRESQL FROM C USING LIBPQ

In this chapter, we are going to begin examining ways to create client applications for PostgreSQL. Up until now in this book, we have mostly used either command-line applications such as psql that are part of the PostgreSQL distribution, or graphical tools such as pgAdmin III that have been developed specifically for PostgreSQL. In Chapter 5, we learned…

[…]

…-I/usr/local/pgsql/include -lpq

Other Linux distributions and other platform installations may place the include files and libraries in different places. Generally, they will be in the include and lib directories of the base PostgreSQL install directory. Later in this chapter, we'll see how using a makefile can make building PostgreSQL applications a little easier.

[…]

…tell the compiler to link with -lpq and specify the PostgreSQL library directory as a place to look for libraries by using the -L option. A typical libpq program has this structure:

#include <libpq-fe.h>

main()
{
    /* Connect to a PostgreSQL database */
LOOP:
    /* Execute SQL statement */
    /* Read query results */
    /* Disconnect from database */
}

[…]

# Makefile for sample programs
# in Beginning PostgreSQL
# Edit the base directories for your
# PostgreSQL installation
INC=/usr/local/pgsql/include
LIB=/usr/local/pgsql/lib
CFLAGS=-I$(INC)
LDLIBS=-L$(LIB) -lpq
ALL=async1 async2 connect create cursor cursor2 print select1 select2 import

all: …

[…]

Making Database Connections

In general, a PostgreSQL…

[…]

…associated with a connection. All of these values will not change during the lifetime of a connection.

Executing SQL with libpq

Now that we can connect to a PostgreSQL database from within a C program, the next step is to execute SQL statements. The query process is initiated with…

[…]

PQexec(myconnection, "INSERT INTO number VALUES (42, 'The Answer')");

Note that any double quotes within the SQL statement will need to be escaped with backslashes, as is necessary with psql. As with connection structures, result objects must also be freed when we are finished with them. We can do this with PQclear, which will also handle NULL pointers. Note that results are not cleared automatically…

[…]

…of our own to execute SQL statements, check the results, and print errors. We will add more functionality to it as we go along. The initial version follows. With it, we can execute SQL queries almost as easily as we can enter commands to psql. Save this code in a file called create.c:

[…]

PQconnectdb("");

if(PQstatus(conn) == CONNECTION_OK)
{
    printf("connection made\n");
    doSQL(conn, "BEGIN work");
    doSQL(conn, "DECLARE mycursor CURSOR FOR "
                "SELECT fname, lname FROM customer");
    doSQL(conn, "FETCH ALL IN mycursor");
    doSQL(conn, "CLOSE mycursor");
    doSQL(conn, "COMMIT work");
…

[…]

lname = Stones(6),
fname = Adrian(6), lname = Matthew(7),
fname = Simon(5), lname = Cozens(6),
fname = Neil(4), lname = Matthew(7),
fname = Richard(7), lname = Stones(6),
fname = Ann(3), lname = Stones(6),
fname = Christine(9), lname = Hickman(7),
fname = Mike(4), lname = Howard(6),
…
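Returning to the recursive person table from the Common Patterns section of Chapter 12, the self-reference can be written directly in SQL. This is a sketch: the person table's columns are abbreviated, and the recursive query form (WITH RECURSIVE) is available only in PostgreSQL 8.4 and later, after the version the book describes.

```sql
-- Each person points at their manager in the same table.
CREATE TABLE person (
    emp_id     serial PRIMARY KEY,
    name       varchar(32) NOT NULL,
    manager_id integer REFERENCES person(emp_id)  -- NULL at the top level
);

-- One level: each person alongside their direct manager.
SELECT p.name, m.name AS manager
  FROM person p
  LEFT JOIN person m ON p.manager_id = m.emp_id;

-- Arbitrary levels (PostgreSQL 8.4+): walk down from the top.
WITH RECURSIVE chain AS (
    SELECT emp_id, name, manager_id FROM person WHERE manager_id IS NULL
    UNION ALL
    SELECT p.emp_id, p.name, p.manager_id
      FROM person p JOIN chain c ON p.manager_id = c.emp_id
)
SELECT * FROM chain;
```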

Posted: 09/08/2014, 14:20