I also add plenty of sample data in the script. You don’t want to work with actual data early on, because you are guaranteed to mess up somewhere during the process. However, it is a good idea to work with sample data that is a copied sub- set of the actual data. Your sample data should anticipate some of the anomalies that might occur in actual data. (For example, what if a person doesn’t have a middle name?) My entire script for the spy database is available on the book’s CD as buildSpy.sql. All SQL code fragments shown in the rest of this chapter come from that file and use the MySQL syntax. If you can’t use MySQL or want to try an alter- native, check out appendix B for information on SQLite, an intriguing alternative to MySQL. SQLite scripts and database files for all the database examples in the book are packaged on the CD that accompanies this book. Setting Up the System I began my SQL script with some comments that describe the database and a few design decisions I made when building the database: ###################################### # buildSpy.sql # builds and populates all databases for spy examples # uses mysql - should adapt easily to other rdbms # by Andy Harris for PHP/MySQL for Abs. Beg ###################################### 368 P H P 5 /M y S Q L P r o g r a m m i n g f o r t h e A b s o l u t e B e g i n n e r IN THE REAL WORLD Professional programmers often use expensive software tools to help build data diagrams, but you don’t need anything more than paper and pencil to draw ER figures. I do my best data design with a partner drawing on a white board. I like to talk through designs out loud and look at them in a large format. Once I’ve got a sense of the design, I usually use a vector-based drawing program to pro- duce a more formal version of the diagram. This type of drawing tool is useful because it allows you to connect elements together, already has the crow’s feet lines available, and allows you to move ele- ments around without disrupting the lines between them. Dia is an excellent open-source program for drawing all kinds of diagrams. I used it to produce all the ER figures in this chapter. A copy of Dia is on the CD that accompanies this book. ###################################### # conventions ###################################### # primary key = table name . ID # primary key always first field # all primary keys autonumbered # all field names camel-cased # only link tables use underscore # foreign keys indicated although mySQL does not enforce # every table used as foreign reference has a name field ###################################### ###################################### #housekeeping ###################################### use chapter11; DROP TABLE IF EXISTS badSpy; DROP TABLE IF EXISTS agent; DROP TABLE IF EXISTS operation; DROP TABLE IF EXISTS specialty; DROP TABLE IF EXISTS agent_specialty; DROP TABLE IF EXISTS spyFirst; Notice that I specified a series of conventions. These self-imposed rules help make my database easier to manage. Some of the rules might not make sense yet (because I haven’t identified what a foreign key is, for instance), but the impor- tant thing is that I have clearly identified some rules that help later on. The code then specifies the chapter11 database and deletes all tables if they already existed. This behavior ensures that I start with a fresh version of the data. Creating the agent Table The normalized agent table is quite simple. The actual table is shown in Table 11.3. The only data remaining in the agent table is the agent’s name and a numerical field for the operation. The operationID field is used as the glue that holds together the agent and operation tables. I’ve added a few things to improve the SQL code that creates the agent table. 369 C h a p t e r 1 1 D a t a N o r m a l i z a t i o n 370 P H P 5 /M y S Q L P r o g r a m m i n g f o r t h e A b s o l u t e B e g i n n e r Agent ID Name Operation ID 1 Bond 1 2 Falcon 1 3 Cardinal 2 4 Blackford 2 TABLE 11.3 THE AGENT TABLE These improvements enhance the behavior of the agent table, and simplify the table tremendously. CREATE TABLE agent ( agentID int(11) NOT NULL AUTO_INCREMENT, name varchar(50) default NULL, operationID int(11) default NULL, PRIMARY KEY (agentID), FOREIGN KEY (operationID) REFERENCES operation (operationID) ); Recall that the first field in a table is usually called the primary key. Primary keys must be unique and each record must have one. • I named each primary key according to a special convention. Primary key names always begin with the table name and end with ID. I added this convention because it makes things easier when I write programs to work with this data. • The NOT NULL modifier ensures that all records of this table must have a primary key. • The AUTO_INCREMENT identifier is a special tool that allows MySQL to pick a new value for this field if no value is specified. This will ensure that all entries are unique. • I added an indicator at the end of the CREATE TABLE statement to indicate that agentID is the primary key of the agent table. Not all databases use the AUTO_INCREMENT feature the same way as MySQL, but most offer an alternative. You might need to look up some other way to automati- cally generate key fields if you aren’t using MySQL. Check the Help system for whatever DBMS you’re using to learn any specific quirks. TRAP 371 C h a p t e r 1 1 D a t a N o r m a l i z a t i o n Creating a Reference to the operation Table Take a careful look at the operationID field. This field contains an integer, which refers to a particular operation. I also added an indicator specifying operationID as a foreign key reference to the operation table. The operationID field in the agent table contains a reference to the primary key of the operation table. This type of field is referred to as a foreign key. Some DBMS systems require you to specify primary and foreign keys. MySQL cur- rently does not require this, but it’s a good idea to do so anyway for two reasons. First, it’s likely that future versions of MySQL will require these statements, because they improve a database’s reliability. Second, it’s good to specify in the code when you want a field to have a special purpose, even if the DBMS doesn’t do anything with that information. Inserting a Value into the agent Table The INSERT statements for the agent table have one new trick made possible by the primary key’s AUTO_INCREMENT designation. INSERT INTO agent VALUES( null, ‘Bond’, 1 ); The primary key is initialized with the value null. This might be surprising because primary keys are explicitly designed to never contain a null value. Since the agentID field is set to AUTO_INCREMENT, the null value is automatically replaced with an unused integer. This trick ensures that each primary key value is unique. Building the operation Table The new operation table contains information referring to an operation. TRICK Operation ID Name Description Location 1 Dancing Elephant Infiltrate suspicious zoo London 2 Enduring Angst Make bad guys feel really guilty Lower Volta 3 Furious Dandelion Plant crabgrass in enemy lawns East Java TABLE 11.4 THE OPERATION TABLE Each operation gets its own record in the operation table. All the data corre- sponding to an operation is stored in the operation record. Each operation’s data is stored only one time. This has a number of positive effects: • It’s necessary to enter operation data only once per operation, saving time on data entry. • Since there’s no repeated data, you won’t have data inconsistency prob- lems (like the circus/zoo problem). • The new database requires less space, because there’s no repeated data. • The operation is not necessarily tied to an agent, so you won’t accidentally delete all references to an operation by deleting the only agent assigned to that mission. (Remember, this could happen with the original data design.) • If you need to update operation data, you don’t need to go through every agent to figure out who was assigned to that operation. (Again, you would have had to do this with the old database design.) The SQL used to create the operation table is much like that used for the agent table: CREATE TABLE operation ( operationID int(11) NOT NULL AUTO_INCREMENT, name varchar(50) default NULL, description varchar(50) default NULL, location varchar(50) default NULL, PRIMARY KEY (`OperationID`) ); INSERT INTO operation VALUES( null, ‘Dancing Elephant’, ‘Infiltrate suspicious zoo’, ‘London’ ); As you can see, the operation table conforms to the rules of normalization, and it also is much like the agent table. Notice that I’m being very careful about how I name things. SQL is (theoretically) case-insensitive, but I’ve found that this is not always true. (I have found this especially in MySQL, where the Windows ver- sions appear unconcerned about case, but UNIX versions treat operationID and OperationID as different field names.) I specified that all field names will use camel-case (just like you’ve been doing with your PHP variables). I also named the key field according to my own formula (table name followed by ID). 372 P H P 5 /M y S Q L P r o g r a m m i n g f o r t h e A b s o l u t e B e g i n n e r . databases for spy examples # uses mysql - should adapt easily to other rdbms # by Andy Harris for PHP/ MySQL for Abs. Beg ###################################### 368 P H P 5 /M y S Q L P r o g r a m m i n g f o r t h e A b s o l u t e B e g i n n e r IN. names.) I specified that all field names will use camel-case (just like you’ve been doing with your PHP variables). I also named the key field according to my own formula (table name followed by ID). 372 P H P 5 /M y S Q L P r o g r a m m i n g f o r t h e A b s o l u t e B e g i n n e r