Joe Celko's SQL for Smarties - Advanced SQL Programming P27

232 CHAPTER 8: TABLE OPERATIONS

    CREATE TABLE Foo
    (col_a CHAR(1) NOT NULL,
     col_b INTEGER NOT NULL);

    INSERT INTO Foo VALUES ('A', 0), ('B', 0), ('C', 0);

    CREATE TABLE Bar
    (col_a CHAR(1) NOT NULL,
     col_b INTEGER NOT NULL);

    INSERT INTO Bar VALUES ('A', 1), ('A', 2), ('B', 1), ('C', 1);

You run this proprietary UPDATE with a FROM clause:

    UPDATE Foo
       SET Foo.col_b = Bar.col_b
      FROM Foo INNER JOIN Bar
           ON Foo.col_a = Bar.col_a;

The result of the UPDATE cannot be determined, because two rows in Bar match the 'A' row in Foo. The value of the column will depend upon either the order of insertion (if there are no clustered indexes present) or the order of clustering (but only if the cluster is not fragmented).

8.5 MERGE Statement

SQL:2003 added a single statement to mimic a common magnetic tape file system "merge and insert" procedure. The business logic, in pseudocode, is like this:

    FOR EACH row IN the Transactions table
    DO IF working row NOT IN Master table
       THEN INSERT working row INTO the Master table;
       ELSE UPDATE Master table
               SET Master table columns to the Transactions table values
             WHERE they meet a matching criteria;
       END IF;
    END FOR;

In the 1950s, we would sort the transaction tape(s) and the Master tape on the same key, read each one looking for a match, and then perform whatever logic was needed. In its simplest form, the MERGE statement looks like this:

    MERGE INTO <table name> [AS [<correlation name>]]
    USING <table reference> ON <search condition>
    {WHEN [NOT] MATCHED [AND <search condition>]
     THEN <modification operation>}
    [ELSE IGNORE]

You will notice that the use of a correlation name in the MERGE INTO clause is in complete violation of the principle that a correlation name effectively creates a temporary table. There are several other places where SQL:2003 destroyed the original SQL language model, but you do not have to write irregular syntax in all cases.

After a row is matched (or not) to the target table, you can add more <search condition>s in the WHEN clauses.
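As a minimal sketch of extra search conditions in the WHEN clauses, the following assumes a product that supports MERGE with AND conditions and a DELETE action (for example, PostgreSQL 15 or later). The Inventory and InventoryChanges tables and their columns are hypothetical and are not taken from the original text.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE Inventory
(part_nbr INTEGER NOT NULL PRIMARY KEY,
 qty_on_hand INTEGER NOT NULL);

CREATE TABLE InventoryChanges
(part_nbr INTEGER NOT NULL PRIMARY KEY,
 new_qty INTEGER NOT NULL);

INSERT INTO Inventory VALUES (1, 50), (2, 10);
INSERT INTO InventoryChanges VALUES (1, 75), (2, 0), (3, 0), (4, 20);

-- WHEN clauses are tried in order; the first one whose condition
-- holds for a source row is the one that fires.
MERGE INTO Inventory
USING InventoryChanges AS C
   ON Inventory.part_nbr = C.part_nbr
WHEN MATCHED AND C.new_qty = 0
     THEN DELETE                               -- part 2 is removed
WHEN MATCHED
     THEN UPDATE SET qty_on_hand = C.new_qty   -- part 1 becomes 75
WHEN NOT MATCHED AND C.new_qty > 0
     THEN INSERT (part_nbr, qty_on_hand)       -- part 4 is added;
          VALUES (C.part_nbr, C.new_qty);      -- part 3 (zero qty) is skipped
```

The target table is left without a correlation name here, which avoids the irregular syntax discussed above. Products differ in which WHEN clause combinations they accept, so check your manual before relying on this form.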
The <modification operation> clause can include insertion, update, or delete operations that follow the same rules as those single statements. This approach can hide complex programming logic in a single statement.

Let's assume that we have a table of Personnel salary changes at the branch office in a table called PersonnelChanges. Here is a MERGE statement that will take the contents of the PersonnelChanges table and merge them with the Personnel table. Both of them use emp_nbr as the key. This is a typical, but very simple, use of MERGE INTO.

    MERGE INTO Personnel
    USING (SELECT emp_nbr, salary, bonus, comm
             FROM PersonnelChanges) AS C
       ON Personnel.emp_nbr = C.emp_nbr
    WHEN MATCHED
    THEN UPDATE
            SET (salary, bonus, comm)
                = (C.salary, C.bonus, C.comm)
    WHEN NOT MATCHED
    THEN INSERT (emp_nbr, salary, bonus, comm)
         VALUES (C.emp_nbr, C.salary, C.bonus, C.comm);

Think about it for a minute. If there is a match, then all you can do is update the row. If there is no match, then all you can do is insert the new row.

There are proprietary versions of this statement and other options. In particular, look for the term "upsert" in the literature. These statements are most often used for adding data to a data warehouse.

If you do not have this statement, you can get the same effect from this pseudocode block:

    BEGIN ATOMIC
    UPDATE T1
       SET (a, b, c, ...)
           = (SELECT a, b, c, ...
                FROM T2
               WHERE T1.somekey = T2.somekey)
     WHERE EXISTS
           (SELECT *
              FROM T2
             WHERE T1.somekey = T2.somekey);

    INSERT INTO T1
    SELECT *
      FROM T2
     WHERE NOT EXISTS
           (SELECT *
              FROM T1
             WHERE T1.somekey = T2.somekey);
    END;

For performance, first do the UPDATE, then the INSERT INTO. If you INSERT INTO first, all rows just inserted will be affected by the UPDATE as well.

CHAPTER 9 Comparison or Theta Operators

Dr. Codd introduced the term "theta operators" in his early papers to refer to what a programmer would have called comparison predicate operators. The large number of data types in SQL makes doing comparisons a little harder than in other programming languages. Values of one data type have to be promoted to values of the other data type before the comparison can be done. The available data types are implementation- and hardware-dependent, so read the manuals for your product.

The comparison operators are overloaded and will work for <numeric>, <character>, and <datetime> data types. The symbols and meanings for comparison operators are shown in Table 9.1.

Table 9.1 Symbols and Meanings for Comparison Operators

operator   numeric        character                      datetime
===========================================================================
<  :       less than      (collates before)              (earlier than)
=  :       equal          (collates equal to)            (same time as)
>  :       greater than   (collates after)               (later than)
<= :       at most        (collates before or equals)    (no later than)
<> :       not equal      (not the same as)              (not the same time as)
>= :       at least       (collates after or equals)     (no earlier than)

You will also see != or ~= for "not equal to" in some older SQL implementations. These symbols are borrowed from the C and PL/I programming languages, respectively, and have never been part of standard SQL. It is a bad habit to use them, since it destroys the portability of your code and makes it harder to read.

9.1 Converting Data Types

Numeric data types are all mutually comparable and mutually assignable. If an assignment will result in a loss of the most significant digits, an exception condition is raised. If the least significant digits are lost, the implementation defines what rounding or truncating has occurred and does not report an exception condition. Most often, one value is converted to the same data type as the other, and then the comparison is done in the usual way.
The chosen data type is the "higher" of the two, using the following ordering: SMALLINT, INTEGER, BIGINT, DECIMAL, NUMERIC, REAL, FLOAT, DOUBLE PRECISION.

Floating-point hardware will often affect comparisons for REAL, FLOAT, and DOUBLE PRECISION numbers. There is no good way to avoid this, since it is not always reasonable to use DECIMAL or NUMERIC in their place. A host language will probably use the same floating-point hardware, so at least errors will be constant across the application.

CHARACTER and CHARACTER VARYING data types are comparable if and only if they are taken from the same character repertoire. That means that ASCII characters cannot be compared to graphics characters, English cannot be compared to Arabic, and so on. In most implementations, this is not a problem, because the database has only one repertoire. The comparison takes the shorter of the two strings and pads it with spaces. The strings are compared position by position from left to right, using the collating sequence for the repertoire (ASCII or EBCDIC, in most cases).

Temporal (or <datetime>, as they are called in the standard) data types are mutually assignable only if the source and target of the assignment have the same <datetime> fields. That is, you cannot compare a date and a time.

The CAST() operator can do explicit type conversions before you do a comparison. Table 9.2 shows the valid combinations of source and target data types in Standard SQL. Y means that the combination is syntactically valid without restriction; M indicates that the combination is valid subject to other syntax rules; and N indicates that the combination is not valid. The codes mean yes, maybe, and no in English.
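As a small illustration of using CAST() before a comparison, the sketch below casts a timestamp and a character string to DATE so that the two sides have the same <datetime> fields. It assumes PostgreSQL-style syntax. The Shipments table, its columns, and its data are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE Shipments
(ship_id INTEGER NOT NULL PRIMARY KEY,
 shipped_at TIMESTAMP NOT NULL,
 due_date_txt CHAR(10) NOT NULL);  -- ISO 8601 text, 'yyyy-mm-dd'

INSERT INTO Shipments
VALUES (1, TIMESTAMP '2024-03-01 14:30:00', '2024-03-01'),
       (2, TIMESTAMP '2024-03-05 09:00:00', '2024-03-02');

-- Both sides are converted to DATE first, so the comparison is
-- date against date instead of timestamp against character string.
SELECT ship_id
  FROM Shipments
 WHERE CAST(shipped_at AS DATE) > CAST(due_date_txt AS DATE);
-- Only ship_id 2 qualifies: it was shipped after its due date.
```

Casting the timestamp to a date discards the time of day, so shipment 1 is treated as on time even though it went out in the afternoon of its due date.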
Table 9.2 Valid Combinations of Source and Target Data Types in Standard SQL

<value   |  <cast target>
 expr>   |  EN  AN  VC  FC  VB  FB  D   T   TS  YM  DT
===============================================================
EN       |  Y   Y   Y   Y   N   N   N   N   N   M   M
AN       |  Y   Y   Y   Y   N   N   N   N   N   N   N
C        |  Y   Y   M   M   Y   Y   Y   Y   Y   Y   Y
B        |  N   N   Y   Y   Y   Y   N   N   N   N   N
D        |  N   N   Y   Y   N   N   Y   N   Y   N   N
T        |  N   N   Y   Y   N   N   N   Y   Y   N   N
TS       |  N   N   Y   Y   N   N   Y   Y   Y   N   N
YM       |  M   N   Y   Y   N   N   N   N   N   Y   N
DT       |  M   N   Y   Y   N   N   N   N   N   N   Y

In Table 9.2,

EN = Exact Numeric
AN = Approximate Numeric
C  = Character (Fixed- or Variable-length)
FC = Fixed-length Character
VC = Variable-length Character
B  = Bit String (Fixed- or Variable-length)
FB = Fixed-length Bit String
VB = Variable-length Bit String
D  = Date
T  = Time
TS = Timestamp
YM = Year-Month Interval
DT = Day-Time Interval

9.2 Row Comparisons in SQL

Standard SQL generalized the theta operators so they would work on row expressions and not just on scalars. This feature is not yet popular, but it is very handy for situations where a key is made from more than one column, and so forth. This makes SQL more orthogonal, and it has an intuitive feel to it. Take three row constants:

    A = (10, 20, 30, 40);
    B = (10, NULL, 30, 40);
    C = (10, NULL, 30, 100);

It seems reasonable to define a row comparison as valid only when the data types of each corresponding column in the rows are union-compatible. If not, the operation is an error and should report a warning. It also seems reasonable to define the result of the comparison as the ANDed results of each corresponding column using the same operator. That is, (A = B) becomes:

    ((10, 20, 30, 40) = (10, NULL, 30, 40));

becomes:

    ((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 40))

becomes:

    (TRUE AND UNKNOWN AND TRUE AND TRUE);

becomes:

    (UNKNOWN);

This seems to be reasonable and conforms to the idea that a NULL is a missing value that we expect to resolve at a future date, so we cannot draw a conclusion about this comparison just yet.
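The (A = B) comparison above can be tried directly in a product that supports row value constructors in comparisons. This is a minimal sketch assuming PostgreSQL-style syntax. The column alias a_equals_b is made up for the example.

```sql
-- Rows A and B from the text, written as row value constructors.
-- The NULL is given an explicit data type with CAST().
SELECT CASE
         WHEN (10, 20, 30, 40) = (10, CAST(NULL AS INTEGER), 30, 40)
           THEN 'TRUE'
         WHEN NOT ((10, 20, 30, 40) = (10, CAST(NULL AS INTEGER), 30, 40))
           THEN 'FALSE'
         ELSE 'UNKNOWN'
       END AS a_equals_b;
-- Neither WHEN branch fires, because the comparison and its negation
-- are both UNKNOWN, so the CASE expression returns 'UNKNOWN'.
```

The CASE expression is needed because a query result cannot display UNKNOWN directly. Testing both the predicate and its negation separates FALSE from UNKNOWN.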
Now consider the comparison (A = C), which becomes:

    ((10, 20, 30, 40) = (10, NULL, 30, 100));

becomes:

    ((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 100));

becomes:

    (TRUE AND UNKNOWN AND TRUE AND FALSE);

becomes:

    (FALSE);

There is no way to pick a value for column 2 of row C such that the UNKNOWN result will change to TRUE, because the fourth column is always FALSE. This leaves you with a situation that is not very intuitive. The first case can resolve to TRUE or FALSE, but the second case can only go to FALSE.

Standard SQL decided that the theta operators would work as shown in the rules below. The expression RX <comp op> RY is shorthand for a row RX compared to a row RY; likewise, RXi means the i-th column in the row RX. The results are still TRUE, FALSE, or UNKNOWN, if there is no error in type matching. The rules favor solid tests for TRUE or FALSE, using UNKNOWN as a last resort.

The idea of these rules is the same principle that you would use to compare words alphabetically. As you read the columns from left to right, match them by position and compare each one. This is how it would work if you were alphabetizing words. The rules are:

1. RX = RY is TRUE if and only if RXi = RYi for all i.
2. RX <> RY is TRUE if and only if RXi <> RYi for some i.
3. RX < RY is TRUE if and only if RXi = RYi for all i < n and RXn < RYn for some n.
4. RX > RY is TRUE if and only if RXi = RYi for all i < n and RXn > RYn for some n.
5. RX <= RY is TRUE if and only if RX = RY or RX < RY.
6. RX >= RY is TRUE if and only if RX = RY or RX > RY.
7. RX = RY is FALSE if and only if RX <> RY is TRUE.
8. RX <> RY is FALSE if and only if RX = RY is TRUE.
9. RX < RY is FALSE if and only if RX >= RY is TRUE.
10. RX > RY is FALSE if and only if RX <= RY is TRUE.
11. RX <= RY is FALSE if and only if RX > RY is TRUE.
12. RX >= RY is FALSE if and only if RX < RY is TRUE.
13. RX <comp op> RY is UNKNOWN if and only if RX <comp op> RY is neither TRUE nor FALSE.

The negations are defined so that the NOT operator will still have its usual properties. Notice that a NULL in a row may give an UNKNOWN result in a comparison. Consider this expression:

    (a, b, c) < (x, y, z)

which becomes:

    ((a < x)
     OR ((a = x) AND (b < y))
     OR ((a = x) AND (b = y) AND (c < z)))

The standard allows a single-row expression of any sort, including a single-row subquery, on either side of a comparison. Likewise, the BETWEEN predicate can use row expressions in any position in Standard SQL.

CHAPTER 10 Valued Predicates

"Valued predicates" is my term for a set of related unary Boolean predicates that test for the logical value or NULL value of their operands. IS NULL has always been part of SQL, but the logical IS predicate was new to SQL-92, and is not well implemented at this time.

10.1 IS NULL Predicate

The IS NULL predicate is a test for a NULL value in a column with the syntax:

    <null predicate> ::= <row value constructor> IS [NOT] NULL

It is the only way to test whether an expression is NULL or not, and it has been in SQL-86 and all later versions of the standard. The SQL-92 standard extended it to accept a <row value constructor>, instead of a single column or scalar expression, as we saw in Section 9.2. This extended version will start showing up in implementations when other row expressions are allowed.

If all the values in row R are the NULL value, then R IS NULL is TRUE; otherwise, it is FALSE. If none of the values in R are the NULL value, R IS NOT NULL is TRUE; otherwise, it is FALSE. The case where the row is a mix of NULL and ...
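The two rules stated above for R IS NULL and R IS NOT NULL can be sketched as follows, assuming a product that accepts a row value constructor in the null predicate (PostgreSQL does). The Readings table and its data are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE Readings
(reading_id INTEGER NOT NULL PRIMARY KEY,
 low_val INTEGER,
 high_val INTEGER);

INSERT INTO Readings
VALUES (1, NULL, NULL),  -- every value in the row is NULL
       (2, 10, NULL),    -- a mix of NULL and non-NULL values
       (3, 10, 20);      -- no value in the row is NULL

SELECT reading_id,
       CASE WHEN ROW(low_val, high_val) IS NULL
            THEN 'TRUE' ELSE 'FALSE' END AS row_is_null,
       CASE WHEN ROW(low_val, high_val) IS NOT NULL
            THEN 'TRUE' ELSE 'FALSE' END AS row_is_not_null
  FROM Readings
 ORDER BY reading_id;
-- reading_id 1: row_is_null = 'TRUE',  row_is_not_null = 'FALSE'
-- reading_id 2: row_is_null = 'FALSE', row_is_not_null = 'FALSE'
-- reading_id 3: row_is_null = 'FALSE', row_is_not_null = 'TRUE'
```

For row 2, both predicates are FALSE under the "otherwise" parts of the two rules, so R IS NOT NULL is not the same test as NOT (R IS NULL) once a row has more than one column.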
