Joe Celko's SQL for Smarties - Advanced SQL Programming P6

CHAPTER 1: DATABASE DESIGN
1.1 Schema and Table Creation

    CREATE TABLE Bar
    (bar_key INTEGER NOT NULL PRIMARY KEY,
     other_key INTEGER NOT NULL UNIQUE,
     ...);

1.1.6 Overlapping Keys

But let’s get back to the nested keys. Just how far can we go with them? My favorite example is a teacher’s schedule kept in a table like this [I am leaving out reference clauses and CHECK() constraints]:

    CREATE TABLE Schedule
    (teacher_name VARCHAR(15) NOT NULL,
     class_name CHAR(15) NOT NULL,
     room_nbr INTEGER NOT NULL,
     period INTEGER NOT NULL,
     PRIMARY KEY (teacher_name, class_name, room_nbr, period));

That choice of a primary key is the most obvious one: use all the columns. Typical rows would look like this:

    ('Mr. Celko', 'Database 101', 222, 6)

The rules we want to enforce are:

1. A teacher is in only one room each period.
2. A teacher teaches only one class each period.
3. A room has only one class each period.
4. A room has only one teacher in it each period.

Stop reading and see what you come up with for an answer. Okay, now consider using one constraint for each rule in the list, thus.

    CREATE TABLE Schedule_1 -- version one, WRONG!
    (teacher_name VARCHAR(15) NOT NULL,
     class_name CHAR(15) NOT NULL,
     room_nbr INTEGER NOT NULL,
     period INTEGER NOT NULL,
     UNIQUE (teacher_name, room_nbr, period), -- rule #1
     UNIQUE (teacher_name, class_name, period), -- rule #2
     UNIQUE (class_name, room_nbr, period), -- rule #3
     UNIQUE (teacher_name, room_nbr, period), -- rule #4
     PRIMARY KEY (teacher_name, class_name, room_nbr, period));

We know that there are four ways to pick three things from a set of four things. While column order is important in creating an index, we can ignore it for now and then worry about index tuning later. I could drop the PRIMARY KEY as redundant if I have all four of these constraints in place. But what happens if I drop the PRIMARY KEY and then one of the constraints?

    CREATE TABLE Schedule_2 -- still wrong
    (teacher_name VARCHAR(15) NOT NULL,
     class_name CHAR(15) NOT NULL,
     room_nbr INTEGER NOT NULL,
     period INTEGER NOT NULL,
     UNIQUE (teacher_name, room_nbr, period), -- rule #1
     UNIQUE (teacher_name, class_name, period), -- rule #2
     UNIQUE (class_name, room_nbr, period)); -- rule #3

I can now insert these rows in the second version of the table:

    ('Mr. Celko', 'Database 101', 222, 6)
    ('Mr. Celko', 'Database 102', 223, 6)

This gives me a very tough sixth-period teaching load, because I have to be in two different rooms at the same time. Things can get even worse when another teacher is added to the schedule:

    ('Mr. Celko', 'Database 101', 222, 6)
    ('Mr. Celko', 'Database 102', 223, 6)
    ('Ms. Shields', 'Database 101', 223, 6)

Ms. Shields and I are both in room 223, trying to teach different classes at the same time. Matthew Burr looked at the constraints and the rules, and he came up with this analysis.

    CREATE TABLE Schedule_3 -- correct version
    (teacher_name VARCHAR(15) NOT NULL,
     class_name CHAR(15) NOT NULL,
     room_nbr INTEGER NOT NULL,
     period INTEGER NOT NULL,
     UNIQUE (teacher_name, period), -- rules #1 and #2
     UNIQUE (room_nbr, period),
     UNIQUE (class_name, period)); -- rules #3 and #4

If a teacher is in only one room each period, then given a period and a teacher I should be able to determine only one room; i.e., room is functionally dependent upon the combination of teacher and period. Likewise, if a teacher teaches only one class each period, then class is functionally dependent upon the combination of teacher and period. The same thinking holds for the last two rules: class is functionally dependent upon the combination of room and period, and teacher is functionally dependent upon the combination of room and period.

With the constraints that were provided in the first version, you will find that the rules are not enforced. For example, I could enter the following rows:

    ('Mr. Celko', 'Database 101', 222, 6)
    ('Mr. Celko', 'Database 102', 223, 6)

These rows violate the first and second rules. However, the unique constraints first provided in Schedule_2 do not capture this violation and will allow the rows to be entered. The following constraint:

    UNIQUE (teacher_name, room_nbr, period)

checks the complete combination of teacher, room, and period, and since ('Mr. Celko', 222, 6) is different from ('Mr. Celko', 223, 6), the DDL does not find any problem with both rows being entered, even though that means that Mr. Celko is in more than one room during the same period. The constraint:

    UNIQUE (teacher_name, class_name, period)

does not catch its associated rule either, since ('Mr. Celko', 'Database 101', 6) is different from ('Mr. Celko', 'Database 102', 6). As a result, Mr. Celko is able to teach more than one class during the same period, thus violating rule #2. It seems that we’d also be able to add the following row:

    ('Ms. Shields', 'Database 103', 222, 6)

This violates the third and fourth rules.

1.1.7 CREATE ASSERTION Constraints

In Standard SQL, CREATE ASSERTION allows you to apply a constraint on the tables within a schema, but not to attach the constraint to any particular table. The syntax is:

    <assertion definition> ::=
       CREATE ASSERTION <constraint name> <assertion check>
       [<constraint attributes>]

    <assertion check> ::=
       CHECK <left paren> <search condition> <right paren>

As you would expect, there is a DROP ASSERTION statement, but no ALTER statement. An assertion can do things that a CHECK() clause attached to a table cannot do, because it is outside of the tables involved. A CHECK() constraint is always TRUE if the table is empty. For example, it is very hard to make a rule that the total number of employees in the company must be equal to the total number of employees in all the health plan tables.

    CREATE ASSERTION Total_health_Coverage
    CHECK ((SELECT COUNT(*) FROM Personnel)
           = (SELECT COUNT(*) FROM HealthPlan_1)
           + (SELECT COUNT(*) FROM HealthPlan_2)
           + (SELECT COUNT(*) FROM HealthPlan_3));

1.1.8 Using VIEWs for Schema Level Constraints

Until you can get CREATE ASSERTION constraints, you have to use procedures and triggers to get the same effects. Consider a schema for a chain of stores that has three tables, thus:

    CREATE TABLE Stores
    (store_nbr INTEGER NOT NULL PRIMARY KEY,
     store_name CHAR(35) NOT NULL,
     ...);

    CREATE TABLE Personnel
    (ssn CHAR(9) NOT NULL PRIMARY KEY,
     last_name CHAR(15) NOT NULL,
     first_name CHAR(15) NOT NULL,
     ...);

The first two explain themselves. The third table, following, shows the relationship between stores and personnel, namely who is assigned to what job at which store and when this happened. Thus:

    CREATE TABLE JobAssignments
    (store_nbr INTEGER NOT NULL
       REFERENCES Stores (store_nbr)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     ssn CHAR(9) NOT NULL
       REFERENCES Personnel (ssn)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     start_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
     end_date TIMESTAMP,
     CHECK (start_date <= end_date),
     job_type INTEGER DEFAULT 0 NOT NULL -- unassigned = 0
       CHECK (job_type BETWEEN 0 AND 99),
     PRIMARY KEY (store_nbr, ssn, start_date));

Let’s invent some job_type codes, such as 0 = 'unassigned', 1 = 'stockboy', etc., until we get to 99 = 'Store Manager'. We have a rule that each store has, at most, one manager. In Standard SQL, you could write a constraint like this:

    CREATE ASSERTION ManagerVerification
    CHECK (1 >= ALL (SELECT COUNT(*)
                       FROM JobAssignments
                      WHERE job_type = 99
                      GROUP BY store_nbr));

This is actually a bit subtler than it looks. The WHERE clause discards the non-managers before the grouping, so a store with no manager forms no group and passes the test vacuously. Even if you change the >= to =, a store must have exactly one manager only if it has any managers at all.
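As a rough illustration of the at-most-one-manager rule, the query that such an assertion would have to evaluate can be tried out from Python's standard sqlite3 module. This is only a sketch under stated assumptions: SQLite has neither CREATE ASSERTION nor the ALL quantifier, so the test is rewritten here as NOT EXISTS with HAVING COUNT(*) > 1; the table is trimmed to the columns that matter, and the helper name manager_rule_holds and the sample ssn values are invented for the example.

```python
import sqlite3

# Assumption: the quantified test is rewritten as NOT EXISTS ... HAVING,
# because SQLite does not support the ALL quantifier or CREATE ASSERTION.
AT_MOST_ONE_MANAGER = """
SELECT NOT EXISTS
       (SELECT store_nbr
          FROM JobAssignments
         WHERE job_type = 99
         GROUP BY store_nbr
        HAVING COUNT(*) > 1)
"""


def manager_rule_holds(conn):
    """Hypothetical helper: True when no store has two or more managers."""
    return bool(conn.execute(AT_MOST_ONE_MANAGER).fetchone()[0])


conn = sqlite3.connect(":memory:")
# Trimmed-down JobAssignments: dates and foreign keys are left out.
conn.execute("""
CREATE TABLE JobAssignments
(store_nbr INTEGER NOT NULL,
 ssn CHAR(9) NOT NULL,
 job_type INTEGER DEFAULT 0 NOT NULL
   CHECK (job_type BETWEEN 0 AND 99),
 PRIMARY KEY (store_nbr, ssn))
""")
conn.executemany(
    "INSERT INTO JobAssignments VALUES (?, ?, ?)",
    [(1, "111111111", 99),   # manager of store 1
     (1, "222222222", 1),    # stockboy at store 1
     (2, "333333333", 1)])   # store 2 has no manager at all

# Store 2 forms no group in the subquery, so the rule still holds.
ok_before = manager_rule_holds(conn)

# A second manager for store 1 breaks the rule.
conn.execute("INSERT INTO JobAssignments VALUES (1, '444444444', 99)")
ok_after = manager_rule_holds(conn)

print(ok_before, ok_after)
```

Note that nothing here stops the offending row from being inserted; the query only detects the violation afterward, which is exactly the gap that an assertion, a trigger, or a declarative redesign has to close.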
But as we said, most SQL products still do not allow CHECK() constraints that apply to the table as a whole, nor do they support the schema-level CREATE ASSERTION statement.

So, how to do this? You might use a trigger, which will involve proprietary, procedural code. Despite the SQL/PSM Standard, most vendors implement very different trigger models and use their proprietary 4GL language in the body of the trigger.

We need to set TRIGGERs that validate the state of the table after each INSERT and UPDATE operation. If we DELETE an employee, this will not create more than one manager per store. The skeleton for these triggers would be something like this:

    CREATE TRIGGER CheckManagers
    AFTER UPDATE ON JobAssignments -- same for INSERT
    IF 1 >= ALL (SELECT COUNT(*)
                   FROM JobAssignments
                  WHERE job_type = 99
                  GROUP BY store_nbr)
    THEN COMMIT;
    ELSE ROLLBACK;
    END IF;

But being a fanatic, I want a pure SQL solution that is declarative within the limits of most current SQL products. Let’s create two tables. This first table is a Personnel table for the store managers only, and it is keyed on the store number, so a store can appear in it only once. Notice the use of DEFAULT and CHECK() on their job_type to ensure that this is really a “managers only” table.

    CREATE TABLE Job_99_Assignments
    (store_nbr INTEGER NOT NULL PRIMARY KEY
       REFERENCES Stores (store_nbr)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     ssn CHAR(9) NOT NULL
       REFERENCES Personnel (ssn)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     start_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
     end_date TIMESTAMP,
     CHECK (start_date <= end_date),
     job_type INTEGER DEFAULT 99 NOT NULL
       CHECK (job_type = 99));

This second table is a Personnel table for employees who are not 'store manager' and it is keyed on Social Security numbers. Notice the use of DEFAULT for a starting position of 'unassigned' and CHECK() on their job_type to ensure that this is really a “no managers allowed” table.

    CREATE TABLE Job_not99_Assignments
    (store_nbr INTEGER NOT NULL
       REFERENCES Stores (store_nbr)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     ssn CHAR(9) NOT NULL PRIMARY KEY
       REFERENCES Personnel (ssn)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     start_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
     end_date TIMESTAMP,
     CHECK (start_date <= end_date),
     job_type INTEGER DEFAULT 0 NOT NULL
       CHECK (job_type BETWEEN 0 AND 98) -- no 99 code
    );

From these two tables, build this UNIONed view of all the job assignments in the entire company and show that to users.

    CREATE VIEW JobAssignments (store_nbr, ssn, start_date, end_date, job_type)
    AS
    (SELECT store_nbr, ssn, start_date, end_date, job_type
       FROM Job_not99_Assignments
     UNION ALL
     SELECT store_nbr, ssn, start_date, end_date, job_type
       FROM Job_99_Assignments);

The key and job_type constraints in each table, working together, will guarantee at most one manager per store. The next step is to add INSTEAD OF triggers to the VIEW or write stored procedures, so the users can insert, update, and delete from it easily.
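To see the two keys and the two CHECK() constraints cooperating, the split-table design can be exercised in SQLite through Python's standard sqlite3 module. This is a minimal sketch rather than a port of the full schema: the date columns and the CASCADE actions are dropped for brevity, SQLite stands in for a Standard SQL product, and the helper try_insert, the store name, and the sample ssn values are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
CREATE TABLE Stores
(store_nbr INTEGER NOT NULL PRIMARY KEY,
 store_name CHAR(35) NOT NULL);

CREATE TABLE Personnel
(ssn CHAR(9) NOT NULL PRIMARY KEY,
 last_name CHAR(15) NOT NULL);

-- Managers only: the key on store_nbr allows one row per store.
CREATE TABLE Job_99_Assignments
(store_nbr INTEGER NOT NULL PRIMARY KEY REFERENCES Stores (store_nbr),
 ssn CHAR(9) NOT NULL REFERENCES Personnel (ssn),
 job_type INTEGER DEFAULT 99 NOT NULL CHECK (job_type = 99));

-- Everyone else: the CHECK() keeps the manager code out.
CREATE TABLE Job_not99_Assignments
(store_nbr INTEGER NOT NULL REFERENCES Stores (store_nbr),
 ssn CHAR(9) NOT NULL PRIMARY KEY REFERENCES Personnel (ssn),
 job_type INTEGER DEFAULT 0 NOT NULL CHECK (job_type BETWEEN 0 AND 98));

CREATE VIEW JobAssignments AS
SELECT store_nbr, ssn, job_type FROM Job_not99_Assignments
UNION ALL
SELECT store_nbr, ssn, job_type FROM Job_99_Assignments;

INSERT INTO Stores VALUES (1, 'Main Street');
INSERT INTO Personnel VALUES ('111111111', 'Celko');
INSERT INTO Personnel VALUES ('222222222', 'Shields');
INSERT INTO Personnel VALUES ('333333333', 'Burr');
INSERT INTO Job_99_Assignments (store_nbr, ssn) VALUES (1, '111111111');
INSERT INTO Job_not99_Assignments (store_nbr, ssn, job_type)
VALUES (1, '222222222', 1);
""")


def try_insert(sql, params):
    """Hypothetical helper: report whether the constraints accept a row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# A second manager for store 1 collides with the key on store_nbr.
second_manager = try_insert(
    "INSERT INTO Job_99_Assignments (store_nbr, ssn) VALUES (?, ?)",
    (1, "333333333"))

# Sneaking code 99 into the non-manager table trips its CHECK().
sneaky_manager = try_insert(
    "INSERT INTO Job_not99_Assignments (store_nbr, ssn, job_type) "
    "VALUES (?, ?, ?)",
    (1, "333333333", 99))

rows = conn.execute(
    "SELECT store_nbr, ssn, job_type FROM JobAssignments ORDER BY ssn"
).fetchall()
print(second_manager, sneaky_manager, rows)
```

Both bad rows are turned away by ordinary declarative constraints, with no trigger code involved, and the view still presents one combined list of assignments to its readers.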
A simple stored procedure, without error handling or input validation, would be:

    CREATE PROCEDURE InsertJobAssignments
    (IN new_store_nbr INTEGER,
     IN new_ssn CHAR(9),
     IN new_start_date TIMESTAMP,
     IN new_end_date TIMESTAMP,
     IN new_job_type INTEGER)
    LANGUAGE SQL
    IF new_job_type <> 99
    THEN INSERT INTO Job_not99_Assignments
         VALUES (new_store_nbr, new_ssn, new_start_date, new_end_date, new_job_type);
    ELSE INSERT INTO Job_99_Assignments
         VALUES (new_store_nbr, new_ssn, new_start_date, new_end_date, new_job_type);
    END IF;

Likewise, a procedure to terminate an employee:

    CREATE PROCEDURE FireEmployee (IN new_ssn CHAR(9))
    LANGUAGE SQL
    BEGIN ATOMIC
    -- the job type is not passed in, so clear the employee from both tables
    DELETE FROM Job_not99_Assignments WHERE ssn = new_ssn;
    DELETE FROM Job_99_Assignments WHERE ssn = new_ssn;
    END;

If a developer attempts to change the JobAssignments VIEW directly with an INSERT, UPDATE, or DELETE, he will get an error message telling him that the VIEW is not updatable because it contains a UNION operation. That is a good thing in one way, because we can force the developer to use only the stored procedures.

Again, this is an exercise in programming a solution within certain limits. The TRIGGER is probably going to give better performance than the VIEW.

1.1.9 Using PRIMARY KEYs and ASSERTIONs for Constraints

Let’s do another version of the “stores and personnel” problem given in section 1.1.8.

    CREATE TABLE JobAssignments
    (ssn CHAR(9) NOT NULL PRIMARY KEY -- nobody is in two Stores
       REFERENCES Personnel (ssn)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     store_nbr INTEGER NOT NULL
       REFERENCES Stores (store_nbr)
       ON UPDATE CASCADE
       ON DELETE CASCADE);

The key on the Social Security number will ensure that nobody is at two stores, and that a store can have many employees assigned to it. Ideally, you want an SQL-92 constraint to check that each employee does have a branch assignment. The first attempt is usually something like this.

    CREATE ASSERTION Nobody_Unassigned
    CHECK (NOT EXISTS
           (SELECT *
              FROM Personnel AS P
                   LEFT OUTER JOIN
                   JobAssignments AS J
                   ON P.ssn = J.ssn
             WHERE J.ssn IS NULL
               AND P.ssn
                   IN (SELECT ssn FROM JobAssignments
                       UNION
                       SELECT ssn FROM Personnel)));

However, this example is overkill and does not prevent an employee from being at more than one store. There are probably indexes on the Social Security number values in both Personnel and JobAssignments tables, so getting a COUNT() function should be cheap. This assertion will also work.

    CREATE ASSERTION Everyone_assigned_one_store
    CHECK ((SELECT COUNT(ssn) FROM JobAssignments)
           = (SELECT COUNT(ssn) FROM Personnel));

This is a surprise to people at first, because they expect to see a JOIN to do the one-to-one mapping between personnel and job assignments. But the PK-FK (primary key-foreign key) requirement provides that for you. Any unassigned employee will make the Personnel table bigger than the JobAssignments table, and an employee in JobAssignments must have a match in Personnel. Good optimizers extract things like that as predicates and use them, which is why we want declarative referential integrity, instead of triggers and application-side logic.

You will need to have a stored procedure that inserts into both tables as a single transaction. The updates and deletes will cascade and clean up the job assignments.

Let’s change the specs a bit and allow employees to work at more than one store. If we want to have employees in multiple Stores, we could change the keys on JobAssignments, thus.

    CREATE TABLE JobAssignments
    (ssn CHAR(9) NOT NULL
       REFERENCES Personnel (ssn)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     store_nbr INTEGER NOT NULL
       REFERENCES Stores (store_nbr)
       ON UPDATE CASCADE
       ON DELETE CASCADE,
     PRIMARY KEY (ssn, store_nbr));

Then use a COUNT(DISTINCT ssn) in the assertion.

    CREATE ASSERTION Everyone_assigned_at_least_once
    CHECK ((SELECT COUNT(DISTINCT ssn) FROM JobAssignments)
           = (SELECT COUNT(ssn) FROM Personnel));

You must be aware that the uniqueness constraints and assertions work together; a change in one or both of them can also change this rule.

1.1.10 Avoiding Attribute Splitting

Attribute splitting takes many forms. It occurs when you have a single attribute, but put its values in more than one place in the schema. The most common form of attribute splitting is to create separate tables for each value. Another form of attribute splitting is to create separate rows in the same table for part of each value. These concepts are probably easier to show with examples.

Attribute Split Tables

If I were to create a database with a table for male employees and a separate table for female employees, you would immediately see that ...

Date posted: 06/07/2014, 09:20
