CHAPTER 1: DATABASE DESIGN
1.4 Other Schema Objects

<domain constraint> ::=
   [<constraint name definition>]
   <check constraint definition> [<constraint attributes>]

<alter domain statement> ::=
   ALTER DOMAIN <domain name> <alter domain action>

<alter domain action> ::=
     <set domain default clause>
   | <drop domain default clause>
   | <add domain constraint definition>
   | <drop domain constraint definition>

It is important to note that a DOMAIN has to be defined with a basic data type and not with other DOMAINs. Once declared, a DOMAIN can be used in place of a data type declaration on a column.

The CHECK() clause is where you can put the code for validating data items with check digits, ranges, lists, and other conditions. Since the DOMAIN is in one place, you can make a good argument for writing the following:

   CREATE DOMAIN StateCode AS CHAR(2)
      DEFAULT '??'
      CONSTRAINT valid_state_code
      CHECK (VALUE IN ('AL', 'AK', 'AZ', ...));

instead of:

   CREATE DOMAIN StateCode AS CHAR(2)
      DEFAULT '??'
      CONSTRAINT valid_state_code
      CHECK (VALUE IN (SELECT state FROM StateCodeTable));

The second method would have been better if you did not have a DOMAIN and had to replicate the CHECK() clause in multiple tables in the database. This would collect the values and their changes in one place instead of many.

1.4.4 CREATE TRIGGER Statement

A TRIGGER is a feature in many versions of SQL that will execute a block of procedural code against the database when a table event occurs. TRIGGERs were not part of the SQL-92 Standard, but were proposed in the SQL3 working document. You can think of a TRIGGER as a generalization of the referential actions.

The procedural code is usually written in a proprietary language, but some products let you attach programs in standard procedural languages. A TRIGGER could be used to automatically handle discontinued merchandise, for example, by creating a credit slip in place of the original order item data.
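
The discontinued-merchandise idea above can be sketched in one product's proprietary trigger syntax. The following is only an illustration, assuming SQLite (driven from Python's standard sqlite3 module); the tables Inventory, OrderItems, and CreditSlips, the trigger name credit_discontinued, and the helper functions are all hypothetical and do not come from the text. Other products will spell the trigger differently.

```python
import sqlite3

def build_store():
    """Create a hypothetical schema with a trigger in SQLite's own syntax."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Inventory (
            sku TEXT PRIMARY KEY,
            discontinued INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE OrderItems (
            order_nbr INTEGER NOT NULL,
            sku TEXT NOT NULL,
            qty INTEGER NOT NULL,
            PRIMARY KEY (order_nbr, sku));
        CREATE TABLE CreditSlips (
            order_nbr INTEGER NOT NULL,
            sku TEXT NOT NULL,
            qty INTEGER NOT NULL);

        -- Table event: an item is flagged as discontinued.
        -- Action: replace its open order items with credit slips.
        CREATE TRIGGER credit_discontinued
        AFTER UPDATE OF discontinued ON Inventory
        WHEN NEW.discontinued = 1
        BEGIN
            INSERT INTO CreditSlips (order_nbr, sku, qty)
            SELECT order_nbr, sku, qty
              FROM OrderItems
             WHERE sku = NEW.sku;
            DELETE FROM OrderItems WHERE sku = NEW.sku;
        END;
    """)
    return conn

def discontinue(conn, sku):
    """Flag one item; the trigger does the rest. Returns (slips, open items)."""
    with conn:  # commit on success, roll back on error
        conn.execute("UPDATE Inventory SET discontinued = 1 WHERE sku = ?", (sku,))
    slips = conn.execute(
        "SELECT order_nbr, sku, qty FROM CreditSlips ORDER BY order_nbr").fetchall()
    items = conn.execute(
        "SELECT order_nbr, sku, qty FROM OrderItems ORDER BY order_nbr, sku").fetchall()
    return slips, items

def load_sample(conn):
    """Insert a few made-up rows for demonstration."""
    conn.executemany("INSERT INTO Inventory (sku) VALUES (?)",
                     [("widget",), ("gadget",)])
    conn.executemany("INSERT INTO OrderItems VALUES (?, ?, ?)",
                     [(1, "widget", 2), (1, "gadget", 1), (2, "widget", 5)])

if __name__ == "__main__":
    store = build_store()
    load_sample(store)
    print(discontinue(store, "widget"))
```

Note that the application only issues the UPDATE; the credit slips appear as a side effect, which is exactly why such code needs to be checked carefully and kept simple.
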

There is a Standard syntax for TRIGGERs, based on the SQL/PSM Standard, but it is not widely implemented. You should look at what your particular vendor has given you if you want to work with TRIGGERs.

The advantage of TRIGGERs over declarative referential integrity (DRI) is that you can do everything that DRI can, and almost anything else, too. The disadvantages are that the optimizer cannot get any data from the procedural code, the TRIGGERs take longer to execute, and they are not portable from product to product.

My advice would be to avoid TRIGGERs when you can use declarative referential integrity instead. If you do use them, check the code very carefully and keep it simple so that you will not hurt performance.

1.4.5 CREATE PROCEDURE Statement

CREATE PROCEDURE allows you to declare and name a module of procedural code written in SQL/PSM or another ANSI X3J programming language. The two major differences between a TRIGGER and a PROCEDURE are that a procedure can accept parameters and return values, and that it is explicitly invoked by a CALL from a user session and not by a database event.

Again, many SQL products have had their own versions of procedures, so you should look at what your particular vendor has given you, check the code very carefully, and keep it simple so you will not hurt performance.

SQL/PSM, the language for procedural code (see Understanding SQL’s Stored Procedures by Jim Melton), is an ISO Standard. Even with the move to the ISO Standard, existing implementations still have their own proprietary syntax in many places.

1.4.6 DECLARE CURSOR Statement

I will not spend much time with cursors in this book, but you should understand them at a high level, since you will see them in actual code. Despite a standard syntax, every product has a proprietary version of cursors, because cursors are a low-level construct that works close to the physical implementation in the product.

A CURSOR is a way of converting an SQL result set into a sequential data structure that looks like a simple sequential file. This structure can then be handled by the procedural host language that contains the SQL statement that builds it. In fact, the whole cursor process looks like an old-fashioned magnetic tape system!

You might have noticed that in SQL, the keyword CREATE builds persistent schema objects. The keyword DECLARE builds transient objects that disappear with the end of the session in which they were built. For this reason, you say DECLARE CURSOR, not CREATE CURSOR.

First, you allocate working storage in the host program with a BEGIN DECLARE ... END DECLARE section. This allocation sets up an area where SQL variables can be converted into host language data types, and vice versa. NULLs are handled by declaring INDICATOR variables in the host language BEGIN DECLARE section. The INDICATOR variables are paired with the appropriate host variables. An INDICATOR is an exact numeric data type with a scale of zero, that is, some kind of integer in the host language.

DECLARE CURSOR Statement

The DECLARE CURSOR statement must appear next. The SQL-92 syntax is fairly representative of actual products, but you must read your manual.

<declare cursor> ::=
   DECLARE <cursor name> [INSENSITIVE] [SCROLL] CURSOR
   FOR <cursor specification>

<cursor specification> ::=
   <query expression> [<order by clause>]
   [<updatability clause>]

<updatability clause> ::=
   FOR {READ ONLY | UPDATE [OF <column name list>]}

<order by clause> ::= ORDER BY <sort specification list>

<sort specification list> ::=
   <sort specification> [{<comma> <sort specification>}...]

<sort specification> ::=
   <sort key> [<collate clause>] [<ordering specification>]

<sort key> ::= <column name>

<ordering specification> ::= ASC | DESC

A few things need explaining.
First of all, the ORDER BY clause is part of a cursor, not part of a SELECT statement. Because some SQL products, such as SQL Server and Sybase, allow the user to create implicit cursors, many newbies get this wrong. Implicit cursors are easy to implement in products that evolved from sequential file systems and still expose this architecture to the user, in violation of Dr. Codd’s rules. Oracle is probably the worst offender as of this writing, but some of the “micro-SQLs” are just as bad.

If any of INSENSITIVE, SCROLL, or ORDER BY is specified, or if the working table is read-only, then an <updatability clause> of READ ONLY is implicit. Otherwise, an <updatability clause> of FOR UPDATE without a <column name list> is implicit.

OPEN Statement

The OPEN <cursor name> statement positions an imaginary read/write head before the first record in the cursor. FETCH statements can then move this imaginary read/write head from record to record. When the read/write head moves past the last record, an exception is raised, like an EOF (end of file) flag in a magnetic tape file system.

Watch out for this model! In some file systems, the read/write head starts on the first record and the EOF flag is set to TRUE when it reads the last record. Simply copying the algorithms from your procedural code into SQL/PSM might not work.

FETCH Statement

<fetch statement> ::=
   FETCH [[<fetch orientation>] FROM] <cursor name>
   INTO <fetch target list>

<fetch orientation> ::=
   NEXT | PRIOR | FIRST | LAST
   | {ABSOLUTE | RELATIVE} <simple value specification>

The FETCH statement takes one row from the cursor, then converts each SQL data type into a host-language data type and puts the result into the appropriate host variable. If the SQL value was a NULL, the INDICATOR is set to -1; if no indicator was specified, an exception condition is raised. As you can see, the host program must be sure to check the INDICATORs, because otherwise the value of the parameter will be garbage.
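
The OPEN and FETCH model described so far can be imitated with a short host-language loop. This is only a rough sketch, assuming Python's standard sqlite3 module rather than embedded SQL: that module has no DECLARE CURSOR statement and no INDICATOR variables, so the function fetch_with_indicators, the Personnel table, and the convention of pairing each value with -1 for NULL and 0 otherwise are assumptions made for illustration.

```python
import sqlite3

def fetch_with_indicators(conn, query):
    """Read a result set one row at a time, pairing each value with an
    emulated indicator: -1 when the SQL value was NULL, 0 otherwise."""
    cur = conn.execute(query)      # roughly OPEN: positioned before the first row
    rows = []
    while True:
        row = cur.fetchone()       # roughly FETCH NEXT
        if row is None:            # moved past the last row: end of data
            break
        rows.append([(value, -1 if value is None else 0) for value in row])
    cur.close()                    # roughly CLOSE
    return rows

def make_personnel():
    """Build a small hypothetical table that contains one NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Personnel (emp_name TEXT NOT NULL, dept TEXT)")
    conn.executemany("INSERT INTO Personnel VALUES (?, ?)",
                     [("Alice", "Sales"), ("Bob", None)])
    return conn

if __name__ == "__main__":
    query = "SELECT emp_name, dept FROM Personnel ORDER BY emp_name"
    for record in fetch_with_indicators(make_personnel(), query):
        print(record)
```

The loop tests for end of data after each fetch attempt, not before it, which matches the "head starts before the first record" model and not the "EOF is set when the last record is read" model.
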
If the parameter is passed to the host language without any problems, the INDICATOR is set to zero. If the value being passed to the host program is a non-NULL character string and it has an indicator, the indicator is set to the length of the SQL string and can be used to detect string overflows or to set the length of the parameter.

The <fetch orientation> tells the read/write head which way to move. NEXT and PRIOR read one record forward or backward from the current position. FIRST and LAST put the read/write head on the first or last record, respectively. The ABSOLUTE fetch moves to a given record number. The RELATIVE fetch moves the read/write head forward or backward (n) records from the current position. Again, this is a straight imitation of a sequential file system.

CLOSE Statement

The CLOSE <cursor name> statement resets the cursor read/write head to a position before the first row in the cursor. The cursor still exists, but must be reopened before it can be used. This is similar to the CLOSE FILE operations in FORTRAN or COBOL, but with an important difference: the cursor can be recomputed when it is reopened!

DEALLOCATE Statement

The DEALLOCATE CURSOR statement frees up the working storage in the host program. Think of it as dismounting a tape from the tape drive in a sequential file system.

How to Use a CURSOR

The best performance improvement technique for cursors inside the database is not to use them. SQL engines are designed for set processing, and they work better with sets of data than with individual rows.

The times when a cursor is unavoidable usually involve corrections to the database caused by an improper design, or cases where a cursor is faster because of the physical implementation in the product. For example, a cursor can be used to take redundant duplicates out of a table that does not have a key.

The old argument for cursors in the original Sybase SQL Server training course was this example.
You own a bookstore and you want to change prices; all books $25 and over are reduced 10%, and all books under $25 are increased 15%.

   BEGIN ATOMIC
   UPDATE Books
      SET price = price * 0.90
    WHERE price >= $25.00;
   UPDATE Books
      SET price = price * 1.15
    WHERE price < $25.00;
   END;

Oops! Look at a book that was $25.00: ((25.00 * 0.90) * 1.15) = $25.875. The first UPDATE drops it to $22.50, which puts it under $25, so the second UPDATE raises it again. So you were told to cursor through the table and change each row with a cursor.

Today you write:

   UPDATE Books
      SET price = CASE WHEN price < $25.00
                       THEN price * 1.15
                       WHEN price >= $25.00
                       THEN price * 0.90
                       ELSE price END;

But Steve Kass pointed out that even back then, it was possible to avoid a cursor:

   BEGIN ATOMIC
   UPDATE Books
      SET price = price * 1.80
    WHERE price >= $25.00;
   UPDATE Books
      SET price = price * 1.15
    WHERE price < $25.00;
   UPDATE Books
      SET price = price * 0.50
    WHERE price >= $45.00;
   END;

The first UPDATE moves every book priced at $25 or more up to at least $45, out of reach of the second UPDATE; the third UPDATE then halves those prices, for a net factor of 0.90. However, this code makes three passes through the Books table, instead of just one. That could be worse than a cursor!

Limit the number of rows and columns in the cursor’s SELECT statement to only those required for the desired result set. This limitation will avoid unnecessary fetching of data, which in turn will require fewer server resources and increase cursor performance.

Use FOR READ ONLY instead of UPDATE cursors, if possible. You will have to watch the transaction isolation level, however.

Opening an INSENSITIVE cursor can cause its rows to be copied to a working table in many products or locked at the table level in others.

Do a CLOSE cursor as soon as you are finished with the result set. This will release any locks on the rows. Always remember to deallocate your CURSORs when you are finished.

Look for your product options. For example, SQL Server has FAST_FORWARD and FORWARD_ONLY cursor options when working with unidirectional, read-only result sets. Using FAST_FORWARD defines a FORWARD_ONLY, READ_ONLY cursor with a number of internal performance optimizations.
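
The bookstore price change discussed earlier can be checked with a small runnable sketch. It assumes SQLite through Python's standard sqlite3 module, and it stores prices as integer cents (2500 for $25.00) with percentage multipliers so that the arithmetic is exact; that representation, the sample titles, and the function names are assumptions for illustration and are not part of the original example. Integer division truncates fractions of a cent.

```python
import sqlite3

# Hypothetical sample data: (title, price in cents).
SAMPLE_BOOKS = [("A", 2000), ("B", 2500), ("C", 4000)]

def fresh_books():
    """Return a new in-memory database holding the sample Books table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Books (title TEXT PRIMARY KEY, price INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Books VALUES (?, ?)", SAMPLE_BOOKS)
    return conn

def prices(conn):
    """Map each title to its current price in cents."""
    return dict(conn.execute("SELECT title, price FROM Books"))

def two_pass_update(conn):
    """The flawed version: a $25.00 book is cut, then raised again."""
    with conn:  # one transaction, standing in for BEGIN ATOMIC ... END
        conn.execute(
            "UPDATE Books SET price = price * 90 / 100 WHERE price >= 2500")
        conn.execute(
            "UPDATE Books SET price = price * 115 / 100 WHERE price < 2500")
    return prices(conn)

def case_update(conn):
    """One pass: each row is tested against its original price only."""
    with conn:
        conn.execute("""
            UPDATE Books
               SET price = CASE WHEN price < 2500 THEN price * 115 / 100
                                WHEN price >= 2500 THEN price * 90 / 100
                                ELSE price END""")
    return prices(conn)

def three_pass_update(conn):
    """The three-UPDATE trick: park the expensive books at 1.80 times
    their price, raise the cheap ones, then halve the parked ones."""
    with conn:
        conn.execute(
            "UPDATE Books SET price = price * 180 / 100 WHERE price >= 2500")
        conn.execute(
            "UPDATE Books SET price = price * 115 / 100 WHERE price < 2500")
        conn.execute(
            "UPDATE Books SET price = price * 50 / 100 WHERE price >= 4500")
    return prices(conn)

if __name__ == "__main__":
    print("two passes:  ", two_pass_update(fresh_books()))
    print("CASE:        ", case_update(fresh_books()))
    print("three passes:", three_pass_update(fresh_books()))
```

Only the book priced at exactly the boundary exposes the two-pass bug in this sample, which is why the flaw is easy to miss in casual testing.
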

Be careful if you are using a CURSOR loop to modify a large number of rows contained within a transaction. Depending on the transaction isolation level, those rows may remain locked until the transaction is committed or rolled back, possibly causing resource contention on the server.

In Standard SQL, there is an SQLSTATE code that tells you if the result set of a GROUP BY has members that excluded NULLs from their aggregate computations. This warning can be raised in the DECLARE CURSOR statement, the OPEN statement, or when the row representing such a grouping is FETCHed. Know how your product handles this situation.

The truth is that the host languages have to use cursors because they are designed for sequential file systems.

Positioned UPDATE and DELETE Statements

Obviously, the cursor needs an explicit or implicit <updatability clause> of FOR UPDATE for this to work, and it has to be in the same module as the positioned statements. You get an exception when you try to change a READ ONLY cursor, or if the CURSOR is not positioned on a record.

The clause CURRENT OF <cursor name> refers to the record that the imaginary read/write head is on. This cursor record has to map back to one and only one row in the base table.

UPDATE Statement:

<update statement: positioned> ::=
   UPDATE <table name>
   SET <set clause list>
   WHERE CURRENT OF <cursor name>

The cursor remains positioned on its current row, even if an exception condition is raised during the update attempt.

DELETE FROM Statement:

<delete statement: positioned> ::=
   DELETE FROM <table name>
   WHERE CURRENT OF <cursor name>

If, while the cursor is open, another DELETE FROM or UPDATE statement attempts to modify the current cursor record, then a cursor operation conflict warning is raised. The transaction isolation level then determines what happens.
If the <delete statement: positioned> deleted the last cursor record, then the position of the cursor is after the last record; otherwise, the position of the cursor is before the next cursor record.

CHAPTER 2

Normalization

The relational model of data, and the normal forms of the relational model, were first defined by Dr. E. F. Codd (Codd 1970), and then extended by other writers after him. Dr. Codd invented the term “normalized relations” by borrowing from the political jargon of the day. The branch of mathematics called relations deals with mappings among sets defined by predicate calculus from formal logic. Just as in an algebraic equation, there are many forms of the same relational statement, but the normal forms of relations are certain formally defined desirable constructions. The goal of normal forms is to avoid certain data anomalies that can occur in unnormalized tables. Data anomalies are easier to explain with an example, but first please be patient while I define some terms.

A predicate is a statement of the form A(X), which means that X has the property A. For example, “John is from Indiana” is a predicate statement; here, “John” is the subject and “is from Indiana” is the predicate. A relation is a predicate with two or more subjects. “John and Bob are brothers” is an example of a relation. The common way of visualizing a set of relational statements is as a table, in which the columns are attributes of the relation, and each row is a specific relational statement.

When Dr. Codd defined the relational model, he gave 0 to 12 rules for the visualization of the relation as a table: