CHAPTER 4 • DATABASE DESIGN AND NORMALIZATION

Third Normal Form

The rules for Third Normal Form are that the database must be in Second Normal Form and that all nonkey fields must depend directly on the primary key. The most obvious violations of Third Normal Form are calculated fields. If you design an Invoice table that includes Quantity, Price, and TotalPrice fields (with TotalPrice being simply Quantity multiplied by Price), you've violated Third Normal Form. You can derive the total price any time you need it from the Quantity and Price values in the record. Storing it means that any change to Quantity or Price forces a second change to TotalPrice to keep the record self-consistent.

Third Normal Form also helps you see that some tables need to be split into multiple pieces. For example, in the Second Normal Form version of the animal feed example, if a supplier moved to a different city, you'd need to change more than one row of the Feed table. This is an inefficient and potentially error-prone process. You're better off moving the list of suppliers and cities to its own table. Tables 4.8, 4.9, and 4.10 show the animal feed database in Third Normal Form.

Another way to think about Third Normal Form is that it's concerned with making each table contain information about only one thing. In the Second Normal Form version of these tables, the Feed table contained both facts about feeds and facts about suppliers. Now the supplier facts are in their own table. There is still a SupplierName field in the Feed table, because you still need to be able to trace the relationships between the tables and preserve referential integrity. Likewise, you can use the Breed field in the Animal table and the Breed field in the Feed table to trace the relationships between animals and feeds. For example, llamas eat pasture and llama feed.

TABLE 4.8: ANIMAL TABLE IN THIRD NORMAL FORM

Name*      Breed
Danny      Horse
Tango      Llama
Scotty     Llama
Genghis    Goat

TABLE 4.9: FEED TABLE IN THIRD NORMAL FORM

Breed*    Feed*                SupplierName
Horse     Pasture              Jones
Horse     Sweet Feed           Grange
Llama     Pasture              Jones
Llama     Purina Llama Feed    Grange
Goat      Hi-Pro               CostCo

TABLE 4.10: SUPPLIERCITY TABLE IN THIRD NORMAL FORM

Supplier*    City
Jones        Endicott
Grange       Colfax
CostCo       Spokane
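To make these relationships concrete, here is one way the Third Normal Form tables might be declared in T-SQL. This is only a sketch (CREATE TABLE is covered in Chapter 11), and the data types and sizes are our own assumptions, not the book's:

-- Illustrative declarations for the Third Normal Form feed database.
CREATE TABLE SupplierCity (
    Supplier varchar(50) NOT NULL PRIMARY KEY,
    City     varchar(50) NOT NULL
)

CREATE TABLE Feed (
    Breed        varchar(50) NOT NULL,
    Feed         varchar(50) NOT NULL,
    SupplierName varchar(50) NOT NULL
        REFERENCES SupplierCity (Supplier),  -- preserves referential integrity
    PRIMARY KEY (Breed, Feed)
)

CREATE TABLE Animal (
    Name  varchar(50) NOT NULL PRIMARY KEY,
    Breed varchar(50) NOT NULL  -- traces to Feed.Breed, though not an enforced key
)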
Boyce-Codd Normal Form

There's still one problem with the feed tables in Third Normal Form. Although the SupplierName field in the Feed table does depend on the primary key of the table (that is, knowing the Breed and Feed, you can deduce the SupplierName), the field depends on only a part of that key. So if you decide to buy a type of feed from a different supplier, you might need to fix multiple rows of the table. Boyce-Codd Normal Form, sometimes called BCNF, adds the restriction that every column not in the primary key must depend on the entire primary key. This is not the case in Table 4.9 (in the previous section), because the Supplier depends only on the Feed column. Once again, the problem can be remedied by splitting the tables further. Tables 4.11 through 4.14 show the example feed database in BCNF.

TABLE 4.11: ANIMAL TABLE IN BCNF

Name*      Breed
Danny      Horse
Tango      Llama
Scotty     Llama
Genghis    Goat

TABLE 4.12: FEED TABLE IN BCNF

Breed*    Feed*
Horse     Pasture
Horse     Sweet Feed
Llama     Pasture
Llama     Purina Llama Feed
Goat      Hi-Pro

TABLE 4.13: FEEDSUPPLIER TABLE IN BCNF

Feed*                Supplier
Pasture              Jones
Sweet Feed           Grange
Purina Llama Feed    Grange
Hi-Pro               CostCo

TABLE 4.14: SUPPLIERCITY TABLE IN BCNF

Supplier*    City
Jones        Endicott
Grange       Colfax
CostCo       Spokane

If you examine these tables and think about the sorts of information you might like to change in the database, you can see that any potential change will affect only one row of a table at a time. This is the end result of normalization: a set of tables that can be updated easily, without the need to change more than one piece of data for any single update.
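For example, if you decided to start buying Sweet Feed from CostCo instead of the Grange, the new FeedSupplier table needs only a single-row change (a hypothetical update against the tables above):

-- Switching a feed to a different supplier now touches exactly one row.
UPDATE FeedSupplier
SET Supplier = 'CostCo'
WHERE Feed = 'Sweet Feed'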
Advanced Normalization

It's worth mentioning that BCNF is not the end of the road for normalization. Database researchers have identified additional normal forms, including Fourth Normal Form and Fifth Normal Form. For most everyday databases, though, putting your tables into BCNF should be sufficient. In fact, if your database is relatively straightforward, it may already be in Fifth Normal Form when you design it in BCNF. If the database is complex enough to be subject to the problems that lead to Fourth and Fifth Normal Forms, you might want to consult someone who does a lot of normalization for guidance.

Fourth Normal Form

Fourth Normal Form addresses the issues that arise when there are dependencies of sets of entities. For example, suppose you're designing tables for a database used by a college math department to track course assignments. There might be a set of books used in each course and a set of teachers who teach each course. One approach would be to create a single table as shown in Table 4.15.

TABLE 4.15: EXAMPLE TABLE NOT IN FOURTH NORMAL FORM

Teacher*    Course*     Text*
George      Algebra     Fundamentals of Algebra
George      Algebra     Advanced Algebra
Phyllis     Algebra     Fundamentals of Algebra
Phyllis     Algebra     Advanced Algebra
Ethel       Geometry    Plato's Solids
Ethel       Geometry    Mickey Does Geometry
Adam        Geometry    Plato's Solids
Adam        Geometry    Mickey Does Geometry

This table is in Third Normal Form, but it still suffers from a problem when you try to insert a new teacher for an existing course with multiple texts. For example, if you added another teacher for the Geometry course, you'd have to add two rows to the table, one for each text used in the course.

In this case, the table contains what is called a multi-valued dependency. The course doesn't determine the teacher uniquely, but it does determine a set of teachers. The same applies to the relation between course and text: the course doesn't determine the text, but it does determine a set of texts. To obtain Fourth Normal Form, you can break this single table down into two tables, one for each relation implied in the first table. These two tables are shown in Tables 4.16 and 4.17.

TABLE 4.16: COURSETEACHER TABLE IN FOURTH NORMAL FORM

Course*     Teacher*
Algebra     George
Algebra     Phyllis
Geometry    Ethel
Geometry    Adam

TABLE 4.17: COURSETEXT TABLE IN FOURTH NORMAL FORM

Course*     Text*
Algebra     Fundamentals of Algebra
Algebra     Advanced Algebra
Geometry    Plato's Solids
Geometry    Mickey Does Geometry

Now you can assign a new teacher to a course, or a new text to a course, with only a single insertion operation. Further, you retain the flexibility to have one teacher teach multiple courses, which would not be the case if you used Teacher as the primary key in the CourseTeacher table.

Fifth Normal Form

Fifth Normal Form addresses an issue where a table can't be decomposed into two tables without losing information, but it can be decomposed into more than two tables. Examples that demonstrate this tend to be highly artificial and difficult to understand, so we won't try to give one here. The important thing is to know that Fifth Normal Form is mainly an academic notion, not one of practical database design. It's hard to find such dependencies in any real database, and the inefficiencies they produce are not large in practice. In other words, it's not really worth knowing more than this about Fifth Normal Form.

Denormalization

Just as normalization is the process of arranging data in a fashion that allows making changes without redundancy, denormalization is the process of deliberately introducing redundancy to your data. Theoretically, of course, one should never denormalize data. However, in the real world, things are not quite that simple. Sometimes it may be necessary to denormalize data in the interest of performance. An overnormalized database can be slow on a network due to the number of joins that have to be performed to retrieve data from multiple tables. For instance, in the Farms database, suppose you need to know all the cities where you purchased food for a particular animal. That would require retrieving information from all of the tables in the database.

TIP: When you are forced to denormalize data for performance, make sure you document your decision, so that another developer doesn't think you simply made a mistake.

Although it's not possible to tell you exactly how (or whether) to denormalize tables in all circumstances, we can offer some guidance. If your normalized data model produces tables with multipart primary keys, particularly if those keys have four or more columns in them and are used in joins with other tables, you should consider denormalizing the data by introducing arbitrary surrogate keys. Identity columns, combined with UNIQUE constraints, provide a convenient means for creating these surrogate keys. You can then add arbitrary foreign keys to tables that join back to the main table and enforce the join on the surrogate keys instead. This will often provide a substantial performance benefit, because SQL Server can resolve relationships between tables faster when those relationships are represented in a single field.
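As a sketch of this technique, here is what a surrogate-keyed version of the Feed table from earlier in the chapter might look like. The FeedID column, the FeedPurchase table, and all names and types here are invented for illustration:

-- An identity column plus a UNIQUE constraint creates a surrogate key.
CREATE TABLE Feed (
    FeedID int IDENTITY(1,1) NOT NULL UNIQUE,  -- surrogate key
    Breed  varchar(50) NOT NULL,
    Feed   varchar(50) NOT NULL,
    PRIMARY KEY (Breed, Feed)                  -- original multipart key
)

-- Related tables can now join back on the single FeedID column.
CREATE TABLE FeedPurchase (
    PurchaseID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FeedID       int NOT NULL REFERENCES Feed (FeedID),  -- single-column join
    PurchaseDate datetime NOT NULL
)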
If producing calculated values such as maximum historic prices involves complex queries with many joins, you should consider denormalizing the data by adding calculated columns to your tables to hold these values. You can use triggers to ensure that these calculated columns are updated whenever one of the columns that they depend on is updated (for more on triggers, see Chapter 14).

If your database contains extremely large tables, you should consider denormalizing the data by creating multiple redundant tables instead. You may do this either by column or by row. For example, if an Employees table contains many columns and some of these (such as hire date) are very infrequently used, it may help performance to move the less frequently used columns to a separate table. By reducing the volume of data in the main table, you can make it faster to access this data. If the Employees table is worldwide and most queries require information about employees from only one region, you can speed up the queries by creating separate tables for each region.

If data is no longer live and is being used for archiving, or is otherwise read-only, denormalizing by storing calculated values in fields can make certain queries run faster. In this case, you might also consider using Microsoft Analysis Server to store the nonlive data for fast analysis. We'll talk about Analysis Server in Chapter 27.

TIP: If you split a table into multiple tables by row, you can still query all the data by using the Transact-SQL UNION operator. You'll learn about the UNION operator in Chapter 6.

If queries on a single table frequently use only one column from a second table, consider including a copy of that single field in the first table. For example, you might choose to include the SupplierCity field in the Feed table, even though the table already includes the SupplierName, because you always print your shopping list organized by the city where each feed store is located. In this case, of course, you'll need to write code to ensure that the SupplierCity field is updated every time the SupplierName is changed. This code might take the form of a stored procedure that is used to update supplier information.
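Here is a sketch of what such a stored procedure might look like. The procedure name and its parameters are invented, and the @@ERROR checks follow the transaction advice given in the Constraints section later in this chapter:

-- Hypothetical procedure that keeps the redundant SupplierCity column
-- in the Feed table synchronized with the SupplierCity table.
CREATE PROCEDURE UpdateSupplierCity
    @SupplierName varchar(50),
    @NewCity      varchar(50)
AS
BEGIN TRANSACTION

UPDATE SupplierCity
SET City = @NewCity
WHERE Supplier = @SupplierName

IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN
END

UPDATE Feed
SET SupplierCity = @NewCity  -- the denormalized copy
WHERE SupplierName = @SupplierName

IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN
END

COMMIT TRANSACTION
GO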
WARNING: Remember that you should never denormalize your data without a specific business reason for the denormalization. Careless denormalization can ruin the integrity of your data and lead to slower performance as well. If you denormalize too far, you'll end up including many extra fields in each table, and it takes time to move that extra data from one place in your application to another.

Making the Trade-offs

So, given a list of rules for normalization and a set of ideas for denormalization, how do you make the trade-offs between the two? Although it's impossible to give a cookbook recipe for coming up with the perfect database, here's a strategy that's worked well for many people in practice:

1. Inspect the data to be stored in your database. Be sure you talk to end users at this point to get a sense of what they really need to know. Don't just ask what they think needs to be stored; ask what they need to do with the data. Often this question will reveal additional data that needs to be stored.

2. Normalize the database design to BCNF.

3. Armed with the BCNF design of the database, review the list of operations that users wish to perform with the data. Make sure that there's enough data to perform each of these operations. Also make sure that none of the operations require multiple simultaneous rows to be updated in the same table (a sign that you've not completely normalized the database).

4. Implement the BCNF version of the database. Build the necessary user interface to allow users to work with the data.

5. Deploy a pilot version of the application.

6. During the pilot program, collect information using SQL Profiler on all operations performed.

7. Use the SQL Profiler information to tune the indexes in your database, and inspect it to identify bottlenecks. SQL Profiler was covered in Chapter 3, and you'll learn about index tuning in Chapter 25.

8. Interview users to identify any operations during which the database isn't performing quickly enough.

9. Use the information from steps 7 and 8 to selectively denormalize the database.

10. Repeat steps 5 through 9 until the database delivers adequate performance.

TIP: If you must maintain the design of a large database with many tables, or if you're frequently involved in database design projects, you may find a third-party design product to be helpful. These products allow you to concentrate on the logical design of the database and automatically produce the physical design to match. Tools in this category include Platinum ERwin (http://www.platinum.com/products/appdev/erwin_ps.htm) and Visio Enterprise (http://www.visio.com/visio2000/enterprise/).

Tools for Normalization in SQL Server

SQL Server supplies a number of tools that help you maintain your database in a normalized form. These tools help make sure that only sensible data is inserted in tables and that only sensible changes can be made. Anytime you can enforce normalization directly at the server, you don't have to write application code to do so. This is a big win for most databases. In this section, we'll look briefly at these tools:

• Identity columns
• Constraints
• Rules
• Declarative referential integrity
• Triggers
• Database diagrams

All of these tools are covered in more detail later in the book, but let's get the big picture before we dig into the details.

Identity Columns

A simple tool for enforcing entity integrity is the identity column. An identity column is a column in a table for which SQL Server automatically supplies values. By default, the first value is 1, and each succeeding value is one more than the previous value, but both the starting value (the seed) and the increment can be specified by the database designer. An identity column provides a handy way to include a surrogate key in a table's design. Surrogate keys often improve join performance by relating tables on small numeric columns rather than on more natural textual data.

NOTE: You'll learn how to create identity columns in Chapter 11.
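For example, the following hypothetical table asks SQL Server to number its rows automatically, starting at 100 and counting by 10 (the table and column names are our own):

-- SQL Server supplies AnimalID automatically: 100, 110, 120, and so on.
CREATE TABLE AnimalRegistry (
    AnimalID int IDENTITY(100, 10) NOT NULL,
    Name     varchar(50) NOT NULL
)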
Constraints

SQL Server uses constraints to enforce limitations on the data that can be entered into a particular column in a table; a constraint is a rule that governs what data is acceptable for that column. You can use UNIQUE, DEFAULT, and CHECK constraints to enforce entity, domain, and user-defined integrity. In addition, SQL Server uses PRIMARY KEY and FOREIGN KEY constraints to implement referential integrity. These two types of constraints are discussed in their own section later in this chapter. Chapter 8 shows you how to create constraints when you're building tables in your own databases.

TIP: If a constraint is violated, the command that caused the violation is terminated and has no effect. However, if this command is part of a batch transaction, the transaction will continue. If statements in a transaction may violate constraints, you should check the value of the @@ERROR global variable and execute a ROLLBACK TRANSACTION statement if @@ERROR is not equal to zero. Chapter 8 has more information on using transactions in SQL Server.

UNIQUE Constraints

A UNIQUE constraint specifies that all values in a given column must be unique; that is, the column must have a different value in every row in the table. A table can have multiple UNIQUE constraints, in which case they must all be satisfied for every row. UNIQUE constraints bring entity integrity to a table because they guarantee that every row is different. Any table that has a primary key consisting of a single column should also have a UNIQUE constraint applied to this column. If you're using SQL Server's declarative referential integrity (DRI), SQL Server will automatically create a unique index on this column for you.

WARNING: If you've used Microsoft Access, you might expect a SQL Server identity column to automatically enforce entity integrity, but this is not the case. You can insert duplicated values into an identity column. To enforce entity integrity, you should also apply a UNIQUE constraint to the column.

DEFAULT Constraints

A DEFAULT constraint gives you a way to supply a default value for a column in any table. That is, the constraint provides the value that will be stored with new rows when no value for the column is otherwise specified. DEFAULT constraints can help enforce domain integrity by providing reasonable values for new records. They also help with some user-defined integrity problems: For example, all new customers might start with an account balance of zero.
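Pulling these ideas together, here is a hypothetical table that applies a UNIQUE constraint to an identity column (as the warning above recommends) and uses a DEFAULT constraint for the zero starting balance; all names and types are assumptions:

-- UNIQUE enforces entity integrity on the identity column;
-- DEFAULT starts every new customer with a zero balance.
CREATE TABLE Customers (
    CustomerID   int IDENTITY(1,1) NOT NULL UNIQUE,
    CustomerName varchar(50) NOT NULL,
    Balance      money NOT NULL DEFAULT 0
)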
[...] some of the tools that SQL Server supplies for enforcing normalization within a database. You'll learn much more about those tools in the coming chapters. First, though, it's time to learn about the language used within SQL Server itself: Transact-SQL.

PART II: Transact-SQL

LEARN TO:
• Understand [...]

CHAPTER 5 • TRANSACT-SQL OVERVIEW AND BASICS

[...]
• Chapters 6 and 7 will introduce you to some common SQL queries
• Chapter 8 covers some advanced SQL topics
• Chapter 10 will show you how to use SQL to construct database objects

What Is Transact-SQL?

Transact-SQL is simply Microsoft's implementation of the standard Structured Query Language (SQL). Sometimes called T-SQL, but usually just called SQL (at least by developers who work with Microsoft products), [...]

[...] SQL-92. At that time, SQL Server was compliant with the entry level of the standard. Since then, the federal testing program has been discontinued, and SQL Server has been revised. The bottom line for you, as a developer working with SQL Server, is that most basic SQL is the same from product to product. What you learn by learning the SQL implemented by SQL Server is close enough to ANSI SQL-92 to give you a head start if you ever move to a different product.

SQL Configuration Options

Over the years, SQL Server has moved more and more into compliance with SQL-92. This has posed some problems for database administrators who depended on nonstandard features in previous versions. So SQL Server provides several mechanisms for adjusting the behavior of its SQL in certain circumstances. These mechanisms—the SET [...]

Using SET for ANSI Compatibility

The SET statement is one of the workhorses of the SQL language. You can use SET in SQL scripts to alter a wide range of server behaviors. In particular, SET can be used to change some defaults in SQL Server's processing to adhere to the SQL-92 standard. Let's start with one of the possible SET statements [...]
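The specific statements are elided from this preview, but as a general illustration, ANSI compatibility settings look like this in a query window. The option names are standard T-SQL; the particular selection here is ours, not the book's:

-- A few of the SET options that control ANSI SQL-92 behavior.
SET ANSI_NULLS ON      -- comparisons with NULL evaluate to unknown, per SQL-92
SET ANSI_WARNINGS ON   -- warn when aggregates ignore nulls; error on overflow
SET ANSI_DEFAULTS ON   -- turns on a whole group of ANSI options at once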
TIP: For the most part, the default behavior with SQL Server is SET ANSI_DEFAULTS ON followed by SET CURSOR_CLOSE_ON_COMMIT OFF and SET IMPLICIT_TRANSACTIONS OFF. This is the set of choices made by the SQL Server ODBC driver and the SQL Server OLE DB provider when they connect to the server. Because all of the built-in tools (Query Analyzer, SQL Profiler, and so on) use the SQL [...]

[...] last user exits.
• AUTO_CREATE_STATISTICS ON tells SQL Server to build any statistics needed by a query whenever that query is optimized.
• AUTO_SHRINK ON tells SQL Server that it's OK to shrink this database if it [...]

[...] return a null.
• NUMERIC_ROUNDABORT tells SQL Server to terminate a query if any loss of precision occurs in a query expression.
• RECURSIVE_TRIGGERS tells SQL Server to use the results of triggers to trigger other triggers.
• RECOVERY FULL causes SQL Server to log enough information to be robust in the case of any media failure. RECOVERY BULK_LOGGED causes SQL Server to compress log information for certain [...]

[...] Refer to SQL Server Books Online if you'd like to see the whole list.

[...] trying to use an application written for an older version of SQL Server.

[...] sp_dbcmptlevel is to make SQL Server behave as if it were a previous version of itself. That is, if you execute:

sp_dbcmptlevel 'Northwind', 60

the Northwind database will behave as if it's running on SQL Server 6.0 instead of SQL Server 2000. Changing the [...]

TIP: You should limit the use of sp_dbcmptlevel to applications that you're migrating from a previous version of SQL Server. There's no cause to use it with new applications.

T-SQL Syntax and Conventions

Reading Syntax Diagrams

Here's the full set of rules for reading the syntax diagrams of T-SQL statements:
• Words in UPPERCASE are SQL [...]