CHAPTER 10: VALUED PREDICATES

non-NULL values is defined by Table 10.1, where Degree means the number of columns in the row expression.

Table 10.1 Cases Where a Row Is a Mix of NULL and non-NULL Values

Expression    R IS NULL   R IS NOT NULL   NOT R IS NULL   NOT R IS NOT NULL
===========================================================================
Degree = 1
  NULL        TRUE        FALSE           FALSE           TRUE
  No NULL     FALSE       TRUE            TRUE            FALSE
===========================================================================
Degree > 1
  All NULLs   TRUE        FALSE           FALSE           TRUE
  Some NULLs  FALSE       FALSE           TRUE            TRUE
  No NULLs    FALSE       TRUE            TRUE            FALSE

Note that R IS NOT NULL has the same result as NOT R IS NULL if and only if R is of degree 1; for rows of higher degree, the two differ when only some of the columns are NULL. This is a break in the usual pattern of predicates with a NOT option in them. Here are some examples:

    (1, 2, 3) IS NULL = FALSE
    (1, NULL, 3) IS NULL = FALSE
    (1, NULL, 3) IS NOT NULL = FALSE
    (NULL, NULL, NULL) IS NULL = TRUE
    (NULL, NULL, NULL) IS NOT NULL = FALSE
    NOT (1, 2, 3) IS NULL = TRUE
    NOT (1, NULL, 3) IS NULL = TRUE
    NOT (1, NULL, 3) IS NOT NULL = TRUE
    NOT (NULL, NULL, NULL) IS NULL = FALSE
    NOT (NULL, NULL, NULL) IS NOT NULL = TRUE

10.1.1 Sources of NULLs

It is important to remember where NULLs can occur. They are more than just a possible value in a column. Aggregate functions on empty sets, OUTER JOINs, arithmetic expressions with NULLs, and OLAP operators can all return NULLs. These constructs often show up as columns in VIEWs.

10.2 IS [NOT]{TRUE | FALSE | UNKNOWN} Predicate

This predicate tests a condition that has the truth-value TRUE, FALSE, or UNKNOWN, and returns TRUE or FALSE. The syntax is:

    <Boolean test> ::= <Boolean primary> [IS [NOT] <truth value>]
    <truth value> ::= TRUE | FALSE | UNKNOWN
    <Boolean primary> ::= <predicate>
        | <left paren> <search condition> <right paren>

As you would expect, the expression IS NOT <truth value> is the same as NOT (IS <truth value>), so the predicate can be defined as shown in Table 10.2.
Table 10.2 Defining the Predicate: True, False, or Unknown

IS condition |  TRUE    FALSE   UNKNOWN
=======================================
TRUE         |  TRUE    FALSE   FALSE
FALSE        |  FALSE   TRUE    FALSE
UNKNOWN      |  FALSE   FALSE   TRUE

If you are familiar with some of Chris Date’s writings, his MAYBE(x) predicate is not the same as the ANSI (x) IS NOT FALSE predicate, but it is equivalent to the (x) IS UNKNOWN predicate. Date’s predicate excludes the case where all conditions in the predicate are TRUE.

Date points out that it is difficult to ask a conditional question in English. To borrow one of his examples (Date 1990), consider the problem of finding employees who might be programmers born before January 18, 1975 with a salary less than $50,000. The statement of the problem is a bit unclear as to what the “might be” covers: just being a programmer, or all three conditions. Let’s assume that we want some doubt on any of the three conditions. With this predicate, the answer is fairly easy to write:

    SELECT *
      FROM Personnel
     WHERE (job = 'Programmer'
            AND dob < CAST ('1975-01-18' AS DATE)
            AND salary < 50000) IS UNKNOWN;

This could be expanded in the old SQL-89 to:

    SELECT *
      FROM Personnel
     WHERE (job IS NULL
            AND dob < CAST ('1975-01-18' AS DATE)
            AND salary < 50000.00)
        OR (job = 'Programmer'
            AND dob IS NULL
            AND salary < 50000.00)
        OR (job = 'Programmer'
            AND dob < CAST ('1975-01-18' AS DATE)
            AND salary IS NULL)
        OR (job IS NULL
            AND dob IS NULL
            AND salary < 50000.00)
        OR (job IS NULL
            AND dob < CAST ('1975-01-18' AS DATE)
            AND salary IS NULL)
        OR (job = 'Programmer'
            AND dob IS NULL
            AND salary IS NULL)
        OR (job IS NULL
            AND dob IS NULL
            AND salary IS NULL);

The problem is that every possible combination of NULLs and non-NULLs has to be tested. Since there are three predicates involved, this gives us 2^3 = 8 combinations to check out. The combination with no NULLs at all is left out of the expansion, because a row for which all three conditions are TRUE does not satisfy IS UNKNOWN; the other seven each need their own disjunct.
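The counting argument can be checked with a small model of SQL's three-valued logic. The sketch below is written in Python rather than SQL so that it runs anywhere; it assumes that None stands for both NULL and UNKNOWN, and the helper names (and3, compare, might_qualify) and the sample column values are illustrative, not taken from Date or from the Standard.

```python
from itertools import product

UNKNOWN = None  # assumption: Python None models SQL's UNKNOWN (and NULL)

def and3(*values):
    """SQL three-valued AND: any FALSE wins, then any UNKNOWN, else TRUE."""
    if any(v is False for v in values):
        return False
    if any(v is UNKNOWN for v in values):
        return UNKNOWN
    return True

def compare(column, test):
    """A comparison involving a NULL column yields UNKNOWN."""
    return UNKNOWN if column is None else test(column)

# Hypothetical tests mirroring the three conditions of the query.
tests = {
    "job": lambda v: v == "Programmer",
    "dob": lambda v: v < "1975-01-18",  # ISO date strings sort chronologically
    "salary": lambda v: v < 50000,
}

def might_qualify(row):
    """Model of (job = ... AND dob < ... AND salary < ...) IS UNKNOWN."""
    result = and3(*(compare(row[c], tests[c]) for c in tests))
    return result is UNKNOWN  # IS UNKNOWN itself is always TRUE or FALSE

# Values that pass each test; each column is either one of these or NULL.
passing = {"job": "Programmer", "dob": "1970-05-01", "salary": 40000}

combos = []
for nulls in product([False, True], repeat=3):
    row = {c: (None if is_null else passing[c])
           for c, is_null in zip(tests, nulls)}
    combos.append((nulls, might_qualify(row)))

matching = [nulls for nulls, hit in combos if hit]
print(len(combos), len(matching))  # prints: 8 7
```

Of the eight NULL/non-NULL combinations, only the one with no NULLs fails the test, because its conjunction is TRUE rather than UNKNOWN. A row with a NULL in one column and a known FALSE in another (for example, a salary of 60000) is also rejected, since FALSE dominates the AND.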
The IS UNKNOWN predicate does not have to bother with the combinations, only the final logical value.

10.3 IS [NOT] NORMALIZED Predicate

<string> IS [NOT] NORMALIZED determines whether a Unicode string is in one of the four normal forms (D, C, KD, or KC). The use of the words “normal form” here is not the same as in a relational context. In the Unicode model, a single character can be built from several other characters. Accent marks can be put on basic Latin letters. Certain combinations of letters can be displayed as ligatures (‘ae’ becomes ‘æ’). Some writing systems, such as Hangul (Korean) and Vietnamese, build glyphs by concatenating symbols in two dimensions. Some languages have special forms of one letter that are determined by context, such as the terminal sigma in Greek or accented ‘u’ in Czech. In short, writing is more complex than just putting one letter after another.

The Unicode standard defines the order of such constructions in their normal forms. You can still produce the same results with different orderings and sometimes with different combinations of symbols. But it is very handy when you are searching such text to know that it is normalized, rather than to try to parse each glyph on the fly. You can find details about normalization and links to free software at www.unicode.org.

CHAPTER 11
CASE Expressions

The CASE expression is probably the most useful addition in SQL-92. This is a quick overview of how to use the expression, but you will find other tricks spread throughout the book. The reason it is so important is that:

1. It works with any data type.
2. It allows the programmer to avoid procedural code by replacing IF-THEN-ELSE control flow with a CASE expression inside the query.
3. It makes SQL statements equivalent to primitive recursive functions.
You can look up what that means in a book on the theory of computation, but it is a nice mathematical property that guarantees certain kinds of problems can be solved.

11.1 The CASE Expression

The CASE expression allows the programmer to pick a value based on a logical expression in the code. ANSI stole the idea and the syntax from the Ada programming language. Here is the syntax for a <case specification>:

    <case specification> ::= <simple case> | <searched case>

    <simple case> ::=
        CASE <case operand>
            <simple when clause>...
            [<else clause>]
        END

    <searched case> ::=
        CASE
            <searched when clause>...
            [<else clause>]
        END

    <simple when clause> ::= WHEN <when operand> THEN <result>
    <searched when clause> ::= WHEN <search condition> THEN <result>
    <else clause> ::= ELSE <result>
    <case operand> ::= <value expression>
    <when operand> ::= <value expression>
    <result> ::= <result expression> | NULL
    <result expression> ::= <value expression>

The searched CASE expression is probably the most used version of the expression. First, the expression is given a data type by finding the highest data type in its THEN clauses. The WHEN ... THEN clauses are evaluated in left-to-right order. The first WHEN clause that tests TRUE returns the value given in its THEN clause. And, yes, you can nest CASE expressions inside each other. If no explicit ELSE clause is given for the CASE expression, then the database will insert an implicit ELSE NULL clause. If you wish to return a NULL from a THEN, however, you should use a CAST (NULL AS <data type>) expression to establish the data type for the compiler.
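The first-match rule and the implicit ELSE NULL described above can be checked with a small experiment. This is only a sketch: it assumes SQLite, driven through Python's standard sqlite3 module, as a stand-in for a SQL-92 product, and the grade() helper, the :score parameter, and the thresholds are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A searched CASE with no ELSE clause. The WHEN clauses overlap on purpose:
# a score of 95 also satisfies the second and third tests.
GRADE_SQL = """
    SELECT CASE WHEN :score >= 90 THEN 'A'
                WHEN :score >= 80 THEN 'B'
                WHEN :score >= 70 THEN 'C'
           END
"""

def grade(score):
    """Evaluate the CASE expression for one score and return its result."""
    return conn.execute(GRADE_SQL, {"score": score}).fetchone()[0]

results = {score: grade(score) for score in (95, 85, 50, None)}
print(results)  # prints: {95: 'A', 85: 'B', 50: None, None: None}
```

The score of 95 returns 'A' because only the first WHEN clause that tests TRUE is used. The score of 50 fails every test and the NULL score makes every test UNKNOWN, so both fall through to the implicit ELSE NULL, which Python shows as None.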
    -- this works
    CASE WHEN 1 = 1
         THEN NULL
         ELSE CAST(NULL AS INTEGER) END

    -- this works
    CASE WHEN 1 = 1
         THEN CAST(NULL AS INTEGER)
         ELSE NULL END

    -- this does not work; no <result> to establish a data type
    CASE WHEN 1 = 1
         THEN NULL
         ELSE NULL END

    -- might or might not work in your SQL
    CAST (CASE WHEN 1 = 1
               THEN NULL
               ELSE NULL END AS INTEGER)

I recommend always writing an explicit ELSE clause, so that you can change it later when you find a value to return. I would also recommend that you explicitly cast a NULL in the CASE expression THEN clause to the desired data type.

If the THEN clauses have results of different data types, the compiler will find the most general one and CAST() the others to it. But again, actual implementations might have slightly different ideas about how and when this casting should be done.

The <simple case> is defined as a searched CASE expression in which all the WHEN clauses are made into equality comparisons against the <case operand>. For example:

    CASE iso_sex_code
      WHEN 0 THEN 'Unknown'
      WHEN 1 THEN 'Male'
      WHEN 2 THEN 'Female'
      WHEN 9 THEN 'N/A'
      ELSE NULL END

This could also be written as:

    CASE
      WHEN iso_sex_code = 0 THEN 'Unknown'
      WHEN iso_sex_code = 1 THEN 'Male'
      WHEN iso_sex_code = 2 THEN 'Female'
      WHEN iso_sex_code = 9 THEN 'N/A'
      ELSE NULL END

There is a gimmick in this definition, however. The expression:

    CASE foo
      WHEN 1 THEN 'bar'
      WHEN NULL THEN 'no bar'
    END

becomes:

    CASE WHEN foo = 1 THEN 'bar'
         WHEN foo = NULL THEN 'no bar' -- problem!
         ELSE NULL END

The WHEN foo = NULL clause is always UNKNOWN, so 'no bar' is never returned. To catch a NULL, write a searched CASE expression that tests foo IS NULL.

This definition can get really weird with a random number generator in the expression. Let’s assume that RANDOM() uses a seed value and returns a uniformly distributed random floating point number between 0.0000 and 0.99999999 . . . 99 whenever it is called.

This expression will fall through to the ELSE clause surprisingly often (roughly a third of the time, if each WHEN test draws a fresh random number) instead of always returning a number word between one and five.
    SET pick_one = CASE CAST((5.0 * RANDOM()) + 1 AS INTEGER)
                     WHEN 1 THEN 'one'
                     WHEN 2 THEN 'two'
                     WHEN 3 THEN 'three'
                     WHEN 4 THEN 'four'
                     WHEN 5 THEN 'five'
                     ELSE 'This should not happen' END;

The expansion will reproduce the CAST() expression for each WHEN clause, and the RANDOM() function will be reevaluated each time. You need to be sure that it is evaluated only once.

    BEGIN
    DECLARE pick_a_number INTEGER;
    SET pick_a_number = CAST((5.0 * RANDOM()) + 1 AS INTEGER);
    SET pick_one = CASE pick_a_number
                     WHEN 1 THEN 'one'
                     WHEN 2 THEN 'two'
                     WHEN 3 THEN 'three'
                     WHEN 4 THEN 'four'
                     WHEN 5 THEN 'five'
                     ELSE 'This should not happen' END;
    END;

The variable pick_a_number is also expanded in the WHEN clause, but because it is not a function call, it is not evaluated over and over.

11.1.1 The COALESCE() and NULLIF() Functions

The SQL-92 Standard defines other functions in terms of the CASE expression, which makes the language a bit more compact and easier to implement. For example, the COALESCE() function can be defined for one or two expressions by:

1) COALESCE (<value exp #1>) is equivalent to (<value exp #1>)

2) COALESCE (<value exp #1>, <value exp #2>) is equivalent to

    CASE WHEN <value exp #1> IS NOT NULL
         THEN <value exp #1>
         ELSE <value exp #2> END

Then we can recursively define it for (n) expressions, where (n >= 3), in the list by:

    COALESCE (<value exp #1>, <value exp #2>, ..., <value exp #n>)

as equivalent to:

    CASE WHEN <value exp #1> IS NOT NULL
         THEN <value exp #1>
         ELSE COALESCE (<value exp #2>, ..., <value exp #n>)
    END

Likewise, NULLIF (<value exp #1>, <value exp #2>) is equivalent to: