Students table:

StudentID  Student
1          Amy
2          Jon
3          Beth
4          Karen
5          Alex

Teachers table:

TeacherID  Teacher  Assistant
1          Smith    Collins
2          Jones    Brown
3          Kaplan   NULL
4          Harris   Taylor

Tests table:

TestID  TeacherID  Test          Date        TotalPoints
1       1          Pronoun Quiz  2009-03-02  10
2       2          Pronoun Quiz  2009-03-02  10
3       3          Solids Quiz   2009-03-03  20
4       4          China Test    2009-03-04  50
5       1          Grammar Test  2009-03-05  100

Formats table:

TestID  TestFormat
1       Multiple Choice
2       Multiple Choice
3       Multiple Choice
4       Essay
5       Multiple Choice
5       Essay

Grades table:

StudentID  TestID  Grade
1          1       8
2          2       6
3          3       17
4          4       45
5          4       38
4          5       88

Your first impression might be that we have unnecessarily complicated the situation, rather than improved it. For example, the Grades table is now a mass of numbers, the meaning of which is not completely obvious on quick inspection. This is true. However, remembering the ability of SQL to join tables together easily, you can also see that there is now much greater flexibility in this new design. Not only are we free to join together only those tables needed for any particular analysis, but we can now add new columns to these tables much more readily, without affecting anything else. Our information has become more modularized.

For example, if we should decide that we want to capture additional information about each student, such as address and phone, we can simply add new columns to the Students table. Additionally, when we want to modify a student’s address or phone later, it only affects one row in the table.

The Art of Database Design

Ultimately, designing a database is much more than simply going through the normalization procedures. Database design is really more of an art than a science, and it requires asking and thinking about relevant business issues.

In our grades example, we presented one possible database design as an illustration of how to normalize data. In truth, there are many possibilities for designing this database.
Much depends on the realities of how the data will be accessed and modified. Numerous questions can be asked to ascertain whether your design is as flexible and meaningful as it needs to be. For example:

■ Are there other tables that need to be added to our database? One obvious choice would be a Subjects table, so you could easily select tests by subject, such as English or Math. If you did this, would you relate the subject to the test or to the teacher who gave the test?
■ Is it possible for a grade to count in more than one subject? Maybe the English and Social Studies teachers are doing a combined lesson and want certain tests to count for both subjects. How do you account for that?
■ What do you do if a child flunks a grade and is now taking the same tests for a second year? How do you differentiate his grade now from last year’s grades?
■ How do you allow for special rules that teachers might implement, such as dropping the lowest quiz score in a particular time period?
■ Are there special analysis requirements for the data? If there is more than one teacher for the same subject, do you want to be able to compare the average grades for the students of each teacher, to make sure that one teacher isn’t unfairly inflating grades?

The list of possible questions is endless. But the point is that data doesn’t exist in a vacuum. There is a necessary interaction between data design and requirements in the real world. Databases need to be designed in such a way as to allow for needed flexibility.

However, there is also a danger that databases can be overly designed to a point where the data becomes unintelligible. A zealous database administrator may decide to create 20 tables to allow for every possible situation. That, too, is inadvisable. Database design is something of a balancing act in search of a design that is sufficiently flexible but also intuitive and understandable by users of the system.

Chapter 19 ■ Principles of Database Design
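To make the last of these questions concrete, here is a minimal sketch that loads four of the normalized tables into an in-memory SQLite database through Python's standard sqlite3 module, joins them so the Grades rows become readable, and then compares the average percentage score per teacher. The choice of SQLite and Python is an assumption made only so the example can be run; the SQL itself is ordinary join and GROUP BY syntax. The Grades rows are read from the table listing as (StudentID, TestID, Grade) triples, which is also an assumption, and because the design has no Subjects table the comparison is by teacher only.

```python
import sqlite3

# In-memory SQLite database holding four of the normalized tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Student TEXT);
CREATE TABLE Teachers (TeacherID INTEGER PRIMARY KEY, Teacher TEXT, Assistant TEXT);
CREATE TABLE Tests (
    TestID INTEGER PRIMARY KEY,
    TeacherID INTEGER REFERENCES Teachers(TeacherID),
    Test TEXT, Date TEXT, TotalPoints INTEGER);
CREATE TABLE Grades (
    StudentID INTEGER REFERENCES Students(StudentID),
    TestID INTEGER REFERENCES Tests(TestID),
    Grade INTEGER);

INSERT INTO Students VALUES (1,'Amy'),(2,'Jon'),(3,'Beth'),(4,'Karen'),(5,'Alex');
INSERT INTO Teachers VALUES
    (1,'Smith','Collins'),(2,'Jones','Brown'),(3,'Kaplan',NULL),(4,'Harris','Taylor');
INSERT INTO Tests VALUES
    (1,1,'Pronoun Quiz','2009-03-02',10),
    (2,2,'Pronoun Quiz','2009-03-02',10),
    (3,3,'Solids Quiz','2009-03-03',20),
    (4,4,'China Test','2009-03-04',50),
    (5,1,'Grammar Test','2009-03-05',100);
-- Assumed reading of the Grades listing: (StudentID, TestID, Grade).
INSERT INTO Grades VALUES (1,1,8),(2,2,6),(3,3,17),(4,4,45),(5,4,38),(4,5,88);
""")

# Join the key columns back to names so each grade is readable.
readable_grades = conn.execute("""
SELECT s.Student, t.Test, te.Teacher, g.Grade, t.TotalPoints
FROM Grades g
JOIN Students s  ON s.StudentID  = g.StudentID
JOIN Tests t     ON t.TestID     = g.TestID
JOIN Teachers te ON te.TeacherID = t.TeacherID
ORDER BY g.StudentID, g.TestID
""").fetchall()

# Average percentage score of the grades on each teacher's tests.
teacher_averages = conn.execute("""
SELECT te.Teacher,
       ROUND(AVG(100.0 * g.Grade / t.TotalPoints), 1) AS AvgPercent
FROM Grades g
JOIN Tests t     ON t.TestID     = g.TestID
JOIN Teachers te ON te.TeacherID = t.TeacherID
GROUP BY te.Teacher
ORDER BY te.Teacher
""").fetchall()

for row in readable_grades:
    print(row)
for teacher, avg_percent in teacher_averages:
    print(teacher, avg_percent)
```

Only the tables needed for each question are joined: the first query touches all four, while the second leaves Students out entirely.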
Alternatives to Normalization

We have emphasized that normalization is the overriding principle that should be followed in designing a database. In certain situations, however, there are viable alternatives that might make more sense.

For example, in the realm of data warehouse systems and software, many practitioners advocate utilizing a star schema design for databases rather than normalization. In a star schema, a certain amount of redundancy is allowed and encouraged. The emphasis is on creating a data structure that more intuitively reflects business realities, and also one that allows for quick processing of data by special analytical software.

To give a brief overview of star schema designs, the main idea is to create a central fact table, which is related to any number of dimension tables. The fact table contains all the quantitative numbers that are additive in nature. In our prior example, the Grade column is such a number, since we can add up grades to obtain a meaningful total grade. The dimension tables contain information on all the entities that are related to the central facts, such as subject, time, teacher, student, and so on.

Furthermore, special analytical software exists that allows database developers to create cubes from their star schema databases. These cubes extend analysis capabilities, allowing users to drill down predefined hierarchies, which are defined in the various dimensions. A user of such a system would be able to drill down from viewing a semester’s worth of grades for a student, to his grades in any individual week.

Figure 19.2 shows what a database with a star schema design might look like for our grades example. In this design, the Grades table is the central fact table. The other tables are all dimension tables. The first four columns in the Grades table (Date, TestID, StudentID, and TeacherID) are there only to relate the table to each of the dimensions.
The other two columns have the additive numeric quantities we talked about. Notice that TotalPoints is now in the Grades table. In our normalized design, it was an attribute of the Tests table. By putting both the Grade and TotalPoints in the Grades table, we can use our analytical software to easily sum up grades and compute average grades (Grade divided by TotalPoints) for any set of data.

Figure 19.2 Star schema design.

Certainly, this is only a brief introduction to the subject of designing databases for data warehouses. It illustrates the point that there are many different ways to design a database, and the best way often relates to the type of software that will be used with the data.

Looking Ahead

This chapter covered the principles of database design. We went over the basics of the normalization process, showing how a database with a single table can be converted into a more flexible structure with multiple tables, related by additional key columns. We also emphasized that database design is not merely a technical exercise. Attention must be paid to organizational realities and to considerations as to how the data will be utilized. Finally, we briefly described one alternative to the conventional normalized design, in an effort to emphasize that there is often more than one approach to this endeavor.

In our final chapter, “Strategies for Displaying Data,” we’re going to discuss some interesting possibilities for using reporting software tools to complement our knowledge of SQL. In our quest to sharpen our SQL skills, we must not forget that there is a world beyond SQL. We want to make sure that we don’t expend our efforts in SQL when the underlying objective can be accomplished more effectively through other means.
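Returning briefly to the star schema described under Alternatives to Normalization, the following is a rough sketch of a Grades fact table with the four key columns and the two additive measures, queried with plain SQL instead of cube software. It uses Python's standard sqlite3 module, which is an assumption made only so the sketch runs. Only two dimension tables are created, their columns are hypothetical, and the fact rows are derived by assumption from the normalized example tables; the actual layout in Figure 19.2 may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical columns; only two dimensions sketched).
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Student TEXT);
CREATE TABLE Teachers (TeacherID INTEGER PRIMARY KEY, Teacher TEXT);

-- Central fact table: four key columns plus two additive measures.
CREATE TABLE Grades (
    Date TEXT,
    TestID INTEGER,
    StudentID INTEGER REFERENCES Students(StudentID),
    TeacherID INTEGER REFERENCES Teachers(TeacherID),
    Grade INTEGER,
    TotalPoints INTEGER);

INSERT INTO Students VALUES (1,'Amy'),(2,'Jon'),(3,'Beth'),(4,'Karen'),(5,'Alex');
INSERT INTO Teachers VALUES (1,'Smith'),(2,'Jones'),(3,'Kaplan'),(4,'Harris');
-- TotalPoints is repeated on every fact row: deliberate redundancy.
INSERT INTO Grades VALUES
    ('2009-03-02',1,1,1,8,10),
    ('2009-03-02',2,2,2,6,10),
    ('2009-03-03',3,3,3,17,20),
    ('2009-03-04',4,4,4,45,50),
    ('2009-03-04',4,5,4,38,50),
    ('2009-03-05',5,4,1,88,100);
""")

# Sum both measures per teacher; their ratio is the average grade.
by_teacher = conn.execute("""
SELECT te.Teacher, SUM(g.Grade), SUM(g.TotalPoints)
FROM Grades g
JOIN Teachers te ON te.TeacherID = g.TeacherID
GROUP BY te.Teacher
ORDER BY te.Teacher
""").fetchall()

# The same two sums sliced by date, using only the fact table.
by_date = conn.execute("""
SELECT Date, SUM(Grade), SUM(TotalPoints)
FROM Grades
GROUP BY Date
ORDER BY Date
""").fetchall()

for label, earned, possible in by_teacher + by_date:
    print(label, earned, possible, round(earned / possible, 3))
```

Because both measures sit on every fact row, any grouping of the rows can be summed and divided without first joining back to a Tests table, which is the trade the star schema makes in exchange for the repeated TotalPoints values.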