
The Language of SQL - P43



Despite these provisos, certain database design principles have evolved over time to guide us in our quest for an optimal design structure. It should be said from the outset that the most influential architect of relational database design is E.F. Codd, who published his groundbreaking article, “A Relational Model of Data for Large Shared Data Banks,” in 1970. This article laid the foundation for what we now call the relational model and the concept of normalization.

Goals of Normalization

The term normalization refers to a specific process that allows designers to turn unstructured data into a properly designed set of tables and data elements.

The best way to understand normalization is to illustrate what it isn’t. To do this, we’ll start with the presentation of a poorly designed table with a number of obvious problems. The following is a table named Grades, and it attempts to present information about all of the grades that students have received for the tests they’ve taken. Each row represents a grade for a particular student.

Test          Student  Date        TotalPoints  Grade  TestFormat              Teacher  Assistant
Pronoun Quiz  Amy      2009-03-02  10           8      Multiple Choice         Smith    Collins
Pronoun Quiz  Jon      2009-03-02  10           6      Multiple Choice         Jones    Brown
Solids Quiz   Beth     2009-03-03  20           17     Multiple Choice         Kaplan   NULL
China Test    Karen    2009-02-04  50           45     Essay                   Harris   Taylor
China Test    Alex     2009-03-04  50           38     Essay                   Harris   Taylor
Grammar Test  Karen    2009-03-05  100          88     Multiple Choice, Essay  Smith    Collins

Let’s first list the information that each column in this table is meant to provide.
The columns are:

■ Test: A description of the test or quiz given
■ Student: The student who took the test
■ Date: The date on which the test was taken
■ TotalPoints: The total number of possible points for the test
■ Grade: The number of points that the student received
■ TestFormat: The format of the test, either essay, multiple choice, or both
■ Teacher: The teacher who gave the test
■ Assistant: The person who assisted the teacher in this class

Chapter 19 ■ Principles of Database Design

We’re going to assume that the primary key for this table is a composite primary key consisting of the Test and Student columns. Each row in the table is meant to express a grade for a specific test and student.

Let’s now discuss two obvious problems with this table. First, certain data is unnecessarily duplicated. For example, you can see that the Pronoun Quiz, which was given on 2009-03-02, had a total of 10 points. The problem, however, is that this information needs to be repeated on every row for that quiz. It would be better if we could simply record the total points for that particular quiz once.

A second problem is that data is repeated within certain single cells. We have a row for which the TestFormat is both Multiple Choice and Essay. This was done because the test had both types of questions. But this makes the data difficult to utilize. If we wanted to retrieve all tests with essay questions, how could we do that?

To be more general, the main problem with this table is that it attempts to put all information into a single table. It would be much better to break down the information in this table into separate entities, such as students, grades, and teachers, representing each entity as a separate table. The power of SQL can then be used to join tables together to retrieve any needed information.

With this discussion in mind, let’s now formalize what the process of normalization hopes to accomplish. There are two main goals:

■ Eliminate redundant data.
The above example clearly illustrates the issue of redundant data. But why is this important? What exactly is the problem with listing the same data on multiple rows? Besides the obvious duplication of effort, one answer is that redundancy reduces flexibility. When data is repeated, that means that any changes to particular values affect multiple rows rather than just one.

■ Eliminate insert, delete, and update anomalies. The problem of redundant data also relates to this second goal, which is to eliminate insert, delete, and update anomalies. Let’s say, for example, that one particular teacher gets married and changes her name. You would like the data to reflect the new name. You now need to do an update on all rows that contain her name. Because the data is stored redundantly, you need to update a large amount of data, rather than just one row.

There are also insert and delete anomalies. For example, let’s say you just hired a new teacher to teach music. You would like to record that somewhere in your database. However, since that teacher hasn’t yet given any tests, there is nowhere to put this information, since you don’t have a table specific to the entity of teachers.

Similarly, a delete anomaly would occur if you wanted to delete a row, but doing so would also eliminate some related piece of information. To use another example, if you had a database of books and wanted to delete a row for a book by Nathaniel Hawthorne, and if that were the only book for Mr. Hawthorne, then that row deletion would not only eliminate the book, but also the fact that Nathaniel Hawthorne is an author whose other books you might acquire in the future.

How to Normalize Data

We’ve been throwing around the term normalization for a while. It’s now time to be more specific about what it means. The term itself originates with E.F. Codd, and it refers to a series of recommended steps taken to remove redundancy and update anomalies from a database design.

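The update, insert, and delete anomalies described under Goals of Normalization can be reproduced against the single-table Grades design. The following is a minimal sketch, not part of the original text, that assumes Python's built-in sqlite3 module and an in-memory database. The NOT NULL constraints on the key columns, the new surname Smith-Lee, and the new teacher Rivera are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The original single-table design. NOT NULL on the key columns is an
# assumption: SQLite would otherwise accept NULLs in a composite key.
conn.execute("""
    CREATE TABLE Grades (
        Test        TEXT NOT NULL,
        Student     TEXT NOT NULL,
        Date        TEXT,
        TotalPoints INTEGER,
        Grade       INTEGER,
        TestFormat  TEXT,
        Teacher     TEXT,
        Assistant   TEXT,
        PRIMARY KEY (Test, Student)
    )
""")
conn.executemany(
    "INSERT INTO Grades VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Pronoun Quiz", "Amy", "2009-03-02", 10, 8, "Multiple Choice", "Smith", "Collins"),
        ("Pronoun Quiz", "Jon", "2009-03-02", 10, 6, "Multiple Choice", "Jones", "Brown"),
        ("Solids Quiz", "Beth", "2009-03-03", 20, 17, "Multiple Choice", "Kaplan", None),
        ("China Test", "Karen", "2009-02-04", 50, 45, "Essay", "Harris", "Taylor"),
        ("China Test", "Alex", "2009-03-04", 50, 38, "Essay", "Harris", "Taylor"),
        ("Grammar Test", "Karen", "2009-03-05", 100, 88, "Multiple Choice, Essay", "Smith", "Collins"),
    ],
)

# Update anomaly: one name change has to touch every row that repeats the name.
# 'Smith-Lee' is a hypothetical new surname.
cursor = conn.execute("UPDATE Grades SET Teacher = 'Smith-Lee' WHERE Teacher = 'Smith'")
rows_updated = cursor.rowcount

# Insert anomaly: a newly hired teacher (hypothetical 'Rivera') has given no
# tests, so there are no Test and Student values to form the primary key.
try:
    conn.execute("INSERT INTO Grades (Teacher) VALUES ('Rivera')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete anomaly: removing Beth's only grade also removes every record of
# the teacher Kaplan.
conn.execute("DELETE FROM Grades WHERE Test = 'Solids Quiz' AND Student = 'Beth'")
kaplan_rows = conn.execute(
    "SELECT COUNT(*) FROM Grades WHERE Teacher = 'Kaplan'"
).fetchone()[0]

print(rows_updated, insert_rejected, kaplan_rows)
```

With the six sample rows, the rename touches two rows, the insert of a teacher without a test is rejected, and deleting Beth's grade leaves no row that mentions Kaplan.
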
The steps involved in the normalization process are commonly referred to as first normal form, second normal form, third normal form, and so on. Although certain individuals have described steps up to sixth normal form, the usual practice is to go only through first, second, and third normal form. When data is in third normal form, it is generally said to be sufficiently normalized.

We are not going to describe the entire set of rules and procedures for converting data into first, second, and third normal form. There are texts that will lead you through the process in great detail, showing you how to transform data first into first normal form, then into second normal form, and then finally into third normal form.

Instead, we are going to summarize the rules for getting your data into third normal form. In practice, an experienced database administrator can jump from unstructured data to third normal form without having to follow every intermediate procedure. We will do the same thing here.

The three main rules for normalizing your data are as follows:

■ Eliminate repeating data. This rule means that no multivalued attributes are allowed. In the previous example, we cannot allow a value such as Multiple Choice, Essay to exist in a single data cell. The existence of multiple values in a single cell creates obvious difficulties in retrieving data by any given specified value.

A corollary to this rule is that repeated columns are not allowed. In our example, the database might have been designed so that, rather than a single column named TestFormat, we had two separate columns named TestFormat1 and TestFormat2. With this alternative approach, we might have placed the value Multiple Choice in the TestFormat1 column and Essay in the TestFormat2 column. This would not be permitted. We don’t want to have repeated data, whether it is multiple values in a single column or multiple columns to handle similar data.

■ Eliminate partial dependencies. This rule refers primarily to situations where the primary key for a table is a composite primary key, meaning a key composed of multiple columns. The rule states that no column in the table can be related only to part of the primary key.

Let’s illustrate with an example. As mentioned, the primary key in the Grades table is a composite key consisting of the Student and Test columns. The problem occurs with columns such as TotalPoints. The TotalPoints column is really an attribute of the test and has nothing to do with students. This rule mandates that all non-key columns in a table refer to the entire key, and not just a part of the key. In essence, partial dependencies indicate that the data in the table relates to more than one entity.

■ Eliminate transitive dependencies. This rule refers to situations where a column in the table refers not to the primary key, but to another non-key column in the same table. In this example, the Assistant column is really an attribute of the Teacher column. The fact that Assistant relates to the teacher and not to anything in the primary key (test or student) indicates that the information doesn’t belong in this table.

So we’ve seen the problems and have talked about the rules for fixing the data. How are proper database design changes actually determined? This is where experience comes in. And there is generally not a single solution to the problem.

That said, the following is one solution to this design problem. In this new design, several tables have been created from the one original table, and all data is now in normalized form. Figure 19.1 shows the tables in the new design, shown without data.

The primary keys in each table are shown in bold. A number of ID columns with auto-increment values have been added to the tables, allowing relationships between the tables to be defined. All the other columns are the same as shown before.

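Before looking at the individual tables, it may help to see how the partial and transitive dependencies named in the rules above show up in the sample rows. The following is a minimal sketch, not from the original text, using Python's built-in sqlite3 module. The helper name determined_by is hypothetical, and the table carries only the columns needed for the check. Sample data can only refute a dependency, never prove it, so a True result means merely that the six rows are consistent with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Grades (
        Test TEXT NOT NULL, Student TEXT NOT NULL, TotalPoints INTEGER,
        Grade INTEGER, Teacher TEXT, Assistant TEXT,
        PRIMARY KEY (Test, Student)
    )
""")
conn.executemany(
    "INSERT INTO Grades VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Pronoun Quiz", "Amy", 10, 8, "Smith", "Collins"),
        ("Pronoun Quiz", "Jon", 10, 6, "Jones", "Brown"),
        ("Solids Quiz", "Beth", 20, 17, "Kaplan", None),
        ("China Test", "Karen", 50, 45, "Harris", "Taylor"),
        ("China Test", "Alex", 50, 38, "Harris", "Taylor"),
        ("Grammar Test", "Karen", 100, 88, "Smith", "Collins"),
    ],
)


def determined_by(determinant, dependent):
    """Return True if every value of `determinant` maps to exactly one
    value of `dependent` in the sample rows (hypothetical helper)."""
    # Column names cannot be bound as parameters, so they are interpolated;
    # this is acceptable here only because the names are hard-coded below.
    sql = f"""
        SELECT {determinant}
        FROM Grades
        GROUP BY {determinant}
        HAVING COUNT(DISTINCT IFNULL({dependent}, '<NULL>')) > 1
    """
    return conn.execute(sql).fetchall() == []


# Partial dependency: TotalPoints follows from Test alone, half of the key.
print(determined_by("Test", "TotalPoints"))
# Transitive dependency: Assistant follows from the non-key column Teacher.
print(determined_by("Teacher", "Assistant"))
# Grade needs the whole key: neither Test nor Student alone fixes it.
print(determined_by("Test", "Grade"), determined_by("Student", "Grade"))
```

Columns that pass the first two checks are candidates for moving into their own tables, while Grade, which depends on the full key, stays where it is.
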
The main point to notice is that every entity discussed in this example has been broken out into separate tables. The Students table has information about each student. The only attribute in this table is the student name. The Grades table has information about each grade. It has a composite primary key of StudentID and TestID because each grade is tied to a student and to a specific test given. The Tests table has information about each test given, such as the date, TeacherID, the test description, and the total points for the test. The Formats table has information about the test formats. Multiple rows are added to this table for each test, to show whether the test is multiple choice, essay, or both. The Teachers table has information about each teacher, including the teacher’s assistant, if there is one.

The following shows the data contained in these new tables, corresponding to the data in the original Grades table.

Figure 19.1 Normalized design.
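
Since the figure and its data are not reproduced here, the five tables described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the book's own schema: it uses Python's built-in sqlite3 module, the column names and types are guesses based on the prose description, the ID values are assigned by hand, and only three of the original six rows are loaded. The teacher Rivera is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Column names and types are assumptions inferred from the description.
conn.executescript("""
    CREATE TABLE Teachers (
        TeacherID INTEGER PRIMARY KEY AUTOINCREMENT,
        Teacher   TEXT NOT NULL,
        Assistant TEXT
    );
    CREATE TABLE Students (
        StudentID INTEGER PRIMARY KEY AUTOINCREMENT,
        Student   TEXT NOT NULL
    );
    CREATE TABLE Tests (
        TestID      INTEGER PRIMARY KEY AUTOINCREMENT,
        Test        TEXT NOT NULL,
        Date        TEXT,
        TotalPoints INTEGER,
        TeacherID   INTEGER REFERENCES Teachers (TeacherID)
    );
    CREATE TABLE Formats (
        TestID     INTEGER NOT NULL REFERENCES Tests (TestID),
        TestFormat TEXT NOT NULL,
        PRIMARY KEY (TestID, TestFormat)
    );
    CREATE TABLE Grades (
        StudentID INTEGER NOT NULL REFERENCES Students (StudentID),
        TestID    INTEGER NOT NULL REFERENCES Tests (TestID),
        Grade     INTEGER,
        PRIMARY KEY (StudentID, TestID)
    );
""")

# A subset of the original rows, with hand-assigned IDs for readability.
conn.executemany("INSERT INTO Teachers VALUES (?, ?, ?)",
                 [(1, "Smith", "Collins"), (2, "Harris", "Taylor"), (3, "Kaplan", None)])
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [(1, "Karen"), (2, "Alex"), (3, "Beth")])
conn.executemany("INSERT INTO Tests VALUES (?, ?, ?, ?, ?)",
                 [(1, "Grammar Test", "2009-03-05", 100, 1),
                  (2, "China Test", "2009-03-04", 50, 2),
                  (3, "Solids Quiz", "2009-03-03", 20, 3)])
# One row per format, so a mixed test simply gets two rows.
conn.executemany("INSERT INTO Formats VALUES (?, ?)",
                 [(1, "Multiple Choice"), (1, "Essay"), (2, "Essay"), (3, "Multiple Choice")])
conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)",
                 [(1, 1, 88), (2, 2, 38), (3, 3, 17)])

# "All tests with essay questions" is now a plain equality filter.
essay_tests = [row[0] for row in conn.execute("""
    SELECT t.Test
    FROM Tests t
    JOIN Formats f ON f.TestID = t.TestID
    WHERE f.TestFormat = 'Essay'
    ORDER BY t.Test
""")]

# Joins rebuild a row of the original flat table when it is needed.
karen_row = conn.execute("""
    SELECT t.Test, s.Student, t.Date, t.TotalPoints, g.Grade, te.Teacher, te.Assistant
    FROM Grades g
    JOIN Students s  ON s.StudentID = g.StudentID
    JOIN Tests t     ON t.TestID = g.TestID
    JOIN Teachers te ON te.TeacherID = t.TeacherID
    WHERE s.Student = 'Karen'
""").fetchone()

# A new teacher can be recorded before giving any test (hypothetical name).
conn.execute("INSERT INTO Teachers (Teacher) VALUES ('Rivera')")
teacher_count = conn.execute("SELECT COUNT(*) FROM Teachers").fetchone()[0]

print(essay_tests, karen_row, teacher_count)
```

In this sketch, the essay query returns the China Test and the Grammar Test, which answers the question the single-table design could not answer cleanly, and the new teacher is stored without any test or student row.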

Date posted: 05/07/2014, 05:20
