Joe Celko's SQL for Smarties - Advanced SQL Programming P5

CHAPTER 1: DATABASE DESIGN

When you use NULLs in math calculations, they propagate in the results so that the answer is another NULL. When you use them in logical expressions or comparisons, they return a logical value of UNKNOWN and give SQL its strange three-valued logic. They sort either always high or always low in the collation sequence. They group together for some operations, but not for others. In short, NULLs cause a lot of irregular features in SQL, which we will discuss later. Your best bet is just to memorize the situations and the rules for NULLs when you cannot avoid them.

CHECK() Constraint

The CHECK() constraint tests the rows of the table against a logical expression, which SQL calls a search condition, and rejects rows whose search condition returns FALSE. However, the constraint accepts rows when the search condition returns TRUE or UNKNOWN. This is not the same rule as the WHERE clause, which rejects rows that test UNKNOWN. The reason for this “benefit-of-the-doubt” feature is so that it will be easy to write constraints on NULL-able columns.

The usual technique is to do simple range checking, such as CHECK (rating BETWEEN 1 AND 10), or to verify that a column’s value is in an enumerated set, such as CHECK (sex IN (0, 1, 2, 9)), with this constraint. Remember that the sex column could also be set to NULL, unless a NOT NULL constraint is also added to the column’s declaration.

Although it is optional, it is a really good idea to use a constraint name. Without it, most SQL products will create a huge, ugly, unreadable random string for the name, since they need to have one in the schema tables. If you provide your own, you can drop the constraint more easily and understand the error messages when the constraint is violated.
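The difference between CHECK() and WHERE on an UNKNOWN result is easy to see by running it. The following is a minimal sketch, assuming SQLite driven through Python's standard sqlite3 module; the Reviews table and the valid_rating constraint name are hypothetical and exist only for this illustration. Other SQL products should behave the same way on this point, but their error messages will differ.

```python
import sqlite3


def check_vs_where():
    """Contrast CHECK() (accepts UNKNOWN) with WHERE (rejects UNKNOWN)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Reviews
        (review_id INTEGER NOT NULL PRIMARY KEY,
         rating INTEGER,  -- NULL-able on purpose
         CONSTRAINT valid_rating
         CHECK (rating BETWEEN 1 AND 10))
        """
    )
    conn.execute("INSERT INTO Reviews VALUES (1, 7)")     # search condition is TRUE
    conn.execute("INSERT INTO Reviews VALUES (2, NULL)")  # search condition is UNKNOWN
    try:
        conn.execute("INSERT INTO Reviews VALUES (3, 11)")  # search condition is FALSE
        error_text = None
    except sqlite3.IntegrityError as exc:
        error_text = str(exc)

    # Rows that the CHECK() constraint let into the table.
    stored = [row[0] for row in conn.execute(
        "SELECT review_id FROM Reviews ORDER BY review_id")]
    # Rows that the very same predicate selects in a WHERE clause.
    matched = [row[0] for row in conn.execute(
        "SELECT review_id FROM Reviews "
        "WHERE rating BETWEEN 1 AND 10 ORDER BY review_id")]
    # NULLs propagate through math and make comparisons UNKNOWN (NULL).
    null_math = conn.execute("SELECT 1 + NULL, NULL = NULL").fetchone()
    conn.close()
    return stored, matched, null_math, error_text


if __name__ == "__main__":
    print(check_vs_where())
```

The row with the NULL rating is stored because the constraint gives it the benefit of the doubt, yet the same BETWEEN predicate in a WHERE clause does not return it. The captured error text is what a named constraint is meant to make readable.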
A single CHECK() clause can, for example, enforce the rule that a firm must not hire anyone younger than 21 years of age for a job that requires a liquor-serving license by comparing the applicant’s birth date and hire date. However, you cannot put the current system date into the CHECK() clause logic, for the obvious reason that it is always changing.

The real power of the CHECK() clause comes from writing complex expressions that verify relationships with other rows, with other tables, or with constants. Before SQL-92, the CHECK() constraint could only reference columns in the table in which it was declared. In Standard SQL, the CHECK() constraint can reference any schema object. As an example of how complex things can get, consider a database of movies.

1.1 Schema and Table Creation

First, let’s enforce the rule that no country can export more than ten titles.

CREATE TABLE Exports
(movie_title CHAR(25) NOT NULL,
 country_code CHAR(2) NOT NULL, -- use 2-letter ISO nation codes
 sales_amt DECIMAL(12, 2) NOT NULL,
 PRIMARY KEY (movie_title, country_code),
 CONSTRAINT National_Quota
 CHECK (-- reference to same table
        10 >= ALL (SELECT COUNT(movie_title)
                     FROM Exports AS E1
                    GROUP BY E1.country_code))
);

When doing a self-join in a constraint, you must use the base table name for one copy of the table; you cannot use correlation names for all of them. Let’s make sure no movies from different countries have the same title.

CREATE TABLE ExportMovies
(movie_title CHAR(25) NOT NULL,
 country_code CHAR(2) NOT NULL,
 sales_amt DECIMAL(12, 2) NOT NULL,
 PRIMARY KEY (movie_title, country_code),
 CONSTRAINT National_Quota
 CHECK (NOT EXISTS -- self-join
        (SELECT *
           FROM ExportMovies AS E1
          WHERE ExportMovies.movie_title = E1.movie_title
            AND ExportMovies.country_code <> E1.country_code))
);

Here is a way to enforce the rule that you cannot export a movie to its own country of origin.
CREATE TABLE ExportMovies
(movie_title CHAR(25) NOT NULL,
 country_code CHAR(2) NOT NULL,
 sales_amt DECIMAL(12, 2) NOT NULL,
 PRIMARY KEY (movie_title, country_code),
 CONSTRAINT Foreign_film
 CHECK (NOT EXISTS -- reference to second table
        (SELECT *
           FROM Movies AS M1
          WHERE M1.movie_title = ExportMovies.movie_title
            AND M1.country_of_origin = ExportMovies.country_code)));

These table-level constraints often use a NOT EXISTS() predicate. Despite the fact that you can often do a lot of work in a single constraint, it is better to write a lot of small constraints, so that you know exactly what went wrong when one of them is violated.

Another important point to remember is that all constraints are true if the table is empty. This problem is handled by the CREATE ASSERTION statement, which we will discuss shortly.

UNIQUE and PRIMARY KEY Constraints

The UNIQUE constraint says that no duplicate values are allowed in the column or columns involved.

<unique specification> ::= UNIQUE | PRIMARY KEY

File system programmers understand the concept of a PRIMARY KEY, but for the wrong reasons. They are used to a file, which can have only one key because that key is used to determine the physical order of the records within the file. Tables have no order; the term PRIMARY KEY in SQL has to do with defaults in referential actions, which we will discuss later.

There are some subtle differences between UNIQUE and PRIMARY KEY. A table can have only one PRIMARY KEY but many UNIQUE constraints. A PRIMARY KEY is automatically declared to have a NOT NULL constraint on it, but a UNIQUE column can have NULLs in a row unless you explicitly add a NOT NULL constraint. Adding the NOT NULL whenever possible is a good idea, as it makes the column into a proper relational key.

There is also a multiple-column form of the <unique specification>, which is usually written at the end of the column declarations.
It is a list of columns in parentheses after the proper keyword, and it means that the combination of those columns is unique. For example, I might declare PRIMARY KEY (city, department) so I can be sure that though I have offices in many cities and many identical departments in those offices, there is only one personnel department in Chicago.

REFERENCES Clause

The <references specification> is the simplest version of a referential constraint definition, which can be quite tricky. For now, let us just consider the simplest case:

<references specification> ::=
  [CONSTRAINT <constraint name>]
  REFERENCES <referenced table name>[(<reference column>)]

This relates two tables together, so it is different from the other options we have discussed so far. What this says is that the value in this column of the referencing table must appear somewhere in the referenced table’s column named in the constraint. Furthermore, the referenced column must be in a UNIQUE constraint. For example, you can set up a rule that the Orders table can have orders only for goods that appear in the Inventory table.

If no <reference column> is given, then the PRIMARY KEY column of the referenced table is assumed to be the target. This is one of those situations where the PRIMARY KEY is important, but you can always play it safe and explicitly name a column. There is no rule to prevent several columns from referencing the same target column. For example, we might have a table of flight crews that has pilot and copilot columns that both reference a table of certified pilots.

A circular reference is a relationship in which one table references a second table, which in turn references the first table. The old gag about “you cannot get a job until you have experience, and you cannot get experience until you have a job!” is the classic version of this.

Notice that the columns in a multicolumn FOREIGN KEY must match a multicolumn PRIMARY KEY or UNIQUE constraint.
The syntax is:

[CONSTRAINT <constraint name>]
FOREIGN KEY (<column list>)
REFERENCES <referenced table name>[(<reference column list>)]

Referential Actions

The REFERENCES clause can have two subclauses that take actions when a database event changes the referenced table. This feature came with Standard SQL and took a while to be implemented in most SQL products. The two database events are updates and deletes, and the subclauses look like this:

<referential triggered action> ::=
    <update rule> [<delete rule>]
  | <delete rule> [<update rule>]
<update rule> ::= ON UPDATE <referential action>
<delete rule> ::= ON DELETE <referential action>
<referential action> ::= CASCADE | SET NULL | SET DEFAULT | NO ACTION

When the referenced table is changed, one of the referential actions is set in motion by the SQL engine.

1. The CASCADE option will change the values in the referencing table to the new value in the referenced table. This is a very common method of DDL programming that allows you to set up a single table as the trusted source for an identifier. This way the system can propagate changes automatically. This removes one of the arguments for nonrelational system-generated surrogate keys. In early SQL products that were based on a file system for their physical implementation, the values were repeated for both the referenced and referencing tables. Why? The tables were regarded as separate units, like files. Later SQL products regarded the schema as a whole. The referenced values appeared once in the referenced table, and the referencing tables obtained them by following pointer chains to that one occurrence in the schema. The results are much faster update cascades, a physically smaller database, faster joins, and faster aggregations.

2. The SET NULL option will change the values in the referencing table to a NULL. Obviously, the referencing column needs to be NULL-able.

3. The SET DEFAULT option will change the values in the referencing table to the default value of that column. Obviously, the referencing column needs to have some DEFAULT declared for it, but each referencing column can have its own default in its own table.

4. The NO ACTION option explains itself. Nothing is changed in the referencing table, and it is possible that some error message about reference violation will be raised. If a referential constraint does not specify an ON UPDATE or ON DELETE rule, then NO ACTION is implicit. You will also see the reserved word RESTRICT in some products instead of NO ACTION.

Standard SQL has more options about how matching is done between the referenced and referencing tables. Most SQL products have not implemented them, so I will not mention them anymore.

Standard SQL has deferrable constraints. This option lets the programmer turn a constraint off during a transaction, so that the table can be put into a state that would otherwise be illegal. However, at the end of the transaction, all the constraints are enforced. Many SQL products have implemented these options, and they can be quite handy, but I will not mention them until we get to the section on transaction control.

1.1.4 UNIQUE Constraints versus UNIQUE Indexes

UNIQUE constraints are not the same thing as UNIQUE indexes. Technically speaking, indexes do not even exist in Standard SQL. They were considered too physical to be part of a logical model of a language. In practice, however, virtually all products have some form of “access enhancement” for the DBA to use, and most often, it is an index.

The column referenced by a FOREIGN KEY has to be either a PRIMARY KEY or a column with a UNIQUE constraint; a unique index on the same set of columns cannot be referenced, since the index is on one table and not a relationship between two tables.
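To tie the referential actions to the rule that a FOREIGN KEY must target a PRIMARY KEY or a UNIQUE column, here is a minimal sketch, assuming SQLite driven through Python's standard sqlite3 module. The sku and upc columns, the ShelfTags table, and all of the data values are hypothetical; only the Orders and Inventory table names come from the text. The PRAGMA line is specific to SQLite, which leaves foreign key enforcement off unless asked.

```python
import sqlite3


def referential_actions_demo():
    """Show CASCADE and SET NULL firing when the referenced table changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript(
        """
        CREATE TABLE Inventory
        (sku CHAR(5) NOT NULL PRIMARY KEY,
         upc CHAR(12) NOT NULL UNIQUE);

        CREATE TABLE Orders
        (order_nbr INTEGER NOT NULL PRIMARY KEY,
         sku CHAR(5) NOT NULL
           REFERENCES Inventory (sku)      -- targets the PRIMARY KEY
           ON UPDATE CASCADE
           ON DELETE CASCADE);

        CREATE TABLE ShelfTags
        (tag_nbr INTEGER NOT NULL PRIMARY KEY,
         upc CHAR(12)                      -- NULL-able, as SET NULL requires
           REFERENCES Inventory (upc)      -- targets a UNIQUE column
           ON UPDATE SET NULL
           ON DELETE SET NULL);

        INSERT INTO Inventory VALUES ('A1001', '012345678905');
        INSERT INTO Inventory VALUES ('B2002', '036000291452');
        INSERT INTO Orders VALUES (1, 'A1001');
        INSERT INTO Orders VALUES (2, 'B2002');
        INSERT INTO ShelfTags VALUES (10, '012345678905');
        """
    )
    # An order for goods that are not in the Inventory table is refused.
    try:
        conn.execute("INSERT INTO Orders VALUES (3, 'Z9999')")
        orphan_rejected = False
    except sqlite3.IntegrityError:
        orphan_rejected = True

    # ON UPDATE CASCADE: the new sku value is copied into Orders.
    conn.execute("UPDATE Inventory SET sku = 'A1002' WHERE sku = 'A1001'")
    # ON UPDATE SET NULL: the shelf tag loses its upc value.
    conn.execute("UPDATE Inventory SET upc = '012345678912' WHERE sku = 'A1002'")
    # ON DELETE CASCADE: the order for the deleted item goes with it.
    conn.execute("DELETE FROM Inventory WHERE sku = 'B2002'")

    result = {
        "orphan_rejected": orphan_rejected,
        "orders": conn.execute(
            "SELECT order_nbr, sku FROM Orders ORDER BY order_nbr").fetchall(),
        "shelf_tags": conn.execute(
            "SELECT tag_nbr, upc FROM ShelfTags ORDER BY tag_nbr").fetchall(),
    }
    conn.close()
    return result


if __name__ == "__main__":
    print(referential_actions_demo())
```

After the three changes to Inventory, order 1 carries the new sku, order 2 is gone, and the shelf tag has a NULL upc. Swapping SET NULL for SET DEFAULT would also need a DEFAULT clause on the referencing column.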
Although there is no order to a constraint, an index is ordered, so the unique index might be an aid for sorting. Some products construct special index structures for the declarative referential integrity (DRI) constraints, which in effect “pre-JOIN” the referenced and referencing tables.

All the constraints can be defined as equivalent to some CHECK constraint. For example:

PRIMARY KEY = CHECK (UNIQUE (SELECT <key columns> FROM <table>)
                     AND (<key columns>) IS NOT NULL)
UNIQUE      = CHECK (UNIQUE (SELECT <key columns> FROM <table>))
NOT NULL    = CHECK (<column> IS NOT NULL)

These predicates can be reworded in terms of other predicates and subquery expressions, and then passed on to the optimizer.

1.1.5 Nested UNIQUE Constraints

One of the basic tricks in SQL is representing a one-to-one or many-to-many relationship with a table that references the two (or more) entity tables involved by their primary keys. This third table has several popular names, such as “junction table” or “join table,” but we know that it is a relationship. This type of table needs constraints to ensure that the relationships work properly.

For example, here are two tables:

CREATE TABLE Boys
(boy_name VARCHAR(30) NOT NULL PRIMARY KEY
 );

CREATE TABLE Girls
(girl_name VARCHAR(30) NOT NULL PRIMARY KEY
 );

Yes, I know using names for a key is a bad practice, but it will make my examples easier to read. There are a lot of different relationships that we can make between these two tables. If you don’t believe me, just watch the Jerry Springer Show sometime.
The simplest relationship table looks like this:

CREATE TABLE Couples
(boy_name VARCHAR(30) NOT NULL
   REFERENCES Boys (boy_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL
   REFERENCES Girls(girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE);

The Couples table allows us to insert rows like this:

('Joe Celko', 'Hilary Duff')
('Joe Celko', 'Lindsay Lohan')
('Toby McGuire', 'Lindsay Lohan')
('Joe Celko', 'Hilary Duff')

Oops! I am shown twice with Hilary Duff, because the Couples table does not have its own key. This mistake is easy to make, but the way to fix it is not obvious.

CREATE TABLE Orgy
(boy_name VARCHAR(30) NOT NULL
   REFERENCES Boys (boy_name)
   ON DELETE CASCADE
   ON UPDATE CASCADE,
 girl_name VARCHAR(30) NOT NULL
   REFERENCES Girls(girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Orgy table gets rid of the duplicated rows and makes this a proper table. The primary key for the table is made up of two or more columns and is called a compound key because of that fact. These are valid rows now.

('Joe Celko', 'Hilary Duff')
('Joe Celko', 'Lindsay Lohan')
('Toby McGuire', 'Lindsay Lohan')

But the only restriction on the couples is that they appear only once. Every boy can be paired with every girl, much to the dismay of the Moral Majority. I think I want to make a rule that guys can have as many gals as they want, but the gals have to stick to one guy.

The way I do this is to use a NOT NULL UNIQUE constraint on the girl_name column, which makes it a key. It is a simple key, since it is only one column, but it is also a nested key, because it appears as a subset of the compound PRIMARY KEY.
CREATE TABLE Playboys
(boy_name VARCHAR(30) NOT NULL
   REFERENCES Boys (boy_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL UNIQUE -- nested key
   REFERENCES Girls(girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Playboys table is a proper table, without duplicated rows, but it also enforces the condition that I get to play around with one or more ladies.

('Joe Celko', 'Hilary Duff')
('Joe Celko', 'Lindsay Lohan')

The women might want to go the other way and keep company with a series of men.

CREATE TABLE Playgirls
(boy_name VARCHAR(30) NOT NULL UNIQUE -- nested key
   REFERENCES Boys (boy_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL
   REFERENCES Girls(girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Playgirls table would permit these rows from our original set.

('Joe Celko', 'Lindsay Lohan')
('Toby McGuire', 'Lindsay Lohan')

Think about all of these possible keys for a minute. The compound PRIMARY KEY is now redundant. If each boy appears only once in the table, or each girl appears only once in the table, then each (boy_name, girl_name) pair can appear only once. However, the redundancy can be useful in searching the table, because it will probably create extra indexes that give us a covering of both names. The query engine then can use just the index and never touch the base tables.

The Moral Majority is pretty upset about this Hollywood scandal and would love for us to stop running around and settle down in nice stable couples.

CREATE TABLE Marriages
(boy_name VARCHAR(30) NOT NULL UNIQUE -- nested key
   REFERENCES Boys (boy_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 girl_name VARCHAR(30) NOT NULL UNIQUE -- nested key
   REFERENCES Girls(girl_name)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 PRIMARY KEY(boy_name, girl_name)); -- redundant compound key!!
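The Orgy, Playboys, Playgirls, and Marriages tables differ only in which column carries a nested UNIQUE constraint, so their behavior can be compared side by side. The sketch below is one way to do that, assuming SQLite driven through Python's standard sqlite3 module; the VARIANTS dictionary and the accepted_rows helper are hypothetical names made up for this illustration. It tries the four original rows in order against each table and reports which ones survive.

```python
import sqlite3

# The four candidate rows from the text, in their original order.
ROWS = [
    ("Joe Celko", "Hilary Duff"),
    ("Joe Celko", "Lindsay Lohan"),
    ("Toby McGuire", "Lindsay Lohan"),
    ("Joe Celko", "Hilary Duff"),
]

# Extra column constraint per table: (on boy_name, on girl_name).
VARIANTS = {
    "Orgy": ("", ""),
    "Playboys": ("", "UNIQUE"),
    "Playgirls": ("UNIQUE", ""),
    "Marriages": ("UNIQUE", "UNIQUE"),
}


def accepted_rows(table_name):
    """Insert ROWS one at a time and return the rows the table accepts."""
    boy_extra, girl_extra = VARIANTS[table_name]
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.execute("CREATE TABLE Boys (boy_name VARCHAR(30) NOT NULL PRIMARY KEY)")
    conn.execute("CREATE TABLE Girls (girl_name VARCHAR(30) NOT NULL PRIMARY KEY)")
    conn.execute(
        f"""
        CREATE TABLE {table_name}
        (boy_name VARCHAR(30) NOT NULL {boy_extra}
           REFERENCES Boys (boy_name)
           ON UPDATE CASCADE ON DELETE CASCADE,
         girl_name VARCHAR(30) NOT NULL {girl_extra}
           REFERENCES Girls (girl_name)
           ON UPDATE CASCADE ON DELETE CASCADE,
         PRIMARY KEY (boy_name, girl_name))
        """
    )
    conn.executemany("INSERT INTO Boys VALUES (?)",
                     [("Joe Celko",), ("Toby McGuire",)])
    conn.executemany("INSERT INTO Girls VALUES (?)",
                     [("Hilary Duff",), ("Lindsay Lohan",)])
    kept = []
    for row in ROWS:
        try:
            conn.execute(f"INSERT INTO {table_name} VALUES (?, ?)", row)
            kept.append(row)
        except sqlite3.IntegrityError:
            pass  # a key constraint rejected this row
    conn.close()
    return kept


if __name__ == "__main__":
    for name in VARIANTS:
        print(name, accepted_rows(name))
```

Because the rows are tried one at a time, which rows survive in Playgirls depends on insertion order: the first row to claim a boy wins, so a different ordering can yield the subset listed in the text.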
Since one of the goals of an RDBMS (relational database management system) is to remove redundancy, why would I have that compound primary key in the Marriages table? One reason might be to get a covering index on both columns for performance. But the more likely answer is that this is an error that a smart optimizer will spot. I leave same-sex marriages as an exercise for the reader.

The Marriages table allows us to insert these rows from the original set.

('Joe Celko', 'Hilary Duff')
('Toby McGuire', 'Lindsay Lohan')

However, SQL products and theory do not always match. Many products make the assumption that the PRIMARY KEY is somehow special in the data model and will be the way that they should access the table most of the time.

In fairness, making special provision for the PRIMARY KEY is not a bad assumption, because the REFERENCES clause uses the PRIMARY KEY of the referenced table as the default. Many new SQL programmers are not aware that a FOREIGN KEY constraint can also reference any UNIQUE constraint in the same table or in another table. The following nightmare code will give you an idea of the possibilities. The multiple column versions follow the same syntax.

CREATE TABLE Foo
(foo_key INTEGER NOT NULL PRIMARY KEY,
 self_ref INTEGER NOT NULL
   REFERENCES Foo(foo_key),
 outside_ref_1 INTEGER NOT NULL
   REFERENCES Bar(bar_key),
 outside_ref_2 INTEGER NOT NULL
   REFERENCES Bar(other_key)
 );
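The Foo table can be exercised once a Bar table exists. The text never shows Bar, so the definition below is an assumption: bar_key is taken to be its PRIMARY KEY and other_key a NOT NULL UNIQUE column. This is a minimal sketch using SQLite through Python's standard sqlite3 module, with made-up data values and a hypothetical helper name, to show a row referencing its own table and two different keys of another table.

```python
import sqlite3


def foo_bar_demo():
    """Return the foo_key values of the candidate rows that Foo accepts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript(
        """
        -- Assumed shape of Bar; the source text does not define it.
        CREATE TABLE Bar
        (bar_key INTEGER NOT NULL PRIMARY KEY,
         other_key INTEGER NOT NULL UNIQUE);

        CREATE TABLE Foo
        (foo_key INTEGER NOT NULL PRIMARY KEY,
         self_ref INTEGER NOT NULL
           REFERENCES Foo (foo_key),
         outside_ref_1 INTEGER NOT NULL
           REFERENCES Bar (bar_key),
         outside_ref_2 INTEGER NOT NULL
           REFERENCES Bar (other_key));

        INSERT INTO Bar VALUES (1, 100);
        """
    )
    candidates = [
        (1, 1, 1, 100),  # references itself and both keys of Bar
        (2, 1, 1, 999),  # 999 is not a value of Bar.other_key
        (3, 7, 1, 100),  # no Foo row has foo_key = 7
        (4, 1, 1, 100),  # points back at the first Foo row
    ]
    accepted = []
    for row in candidates:
        try:
            conn.execute("INSERT INTO Foo VALUES (?, ?, ?, ?)", row)
            accepted.append(row[0])
        except sqlite3.IntegrityError:
            pass  # some REFERENCES clause was violated
    conn.close()
    return accepted


if __name__ == "__main__":
    print(foo_bar_demo())
```

Only the first and last candidates get in. The second fails against the UNIQUE column of Bar and the third fails against Foo's own PRIMARY KEY, which shows that all three REFERENCES clauses are enforced independently.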
