112 CHAPTER 6: CODING CHOICES WHEN 9 THEN 'N/A' ELSE NULL END could also be written as: CASE WHEN iso_sex_code = 0 THEN 'Unknown' WHEN iso_sex_code = 1 THEN 'Male' WHEN iso_sex_code = 2 THEN 'Female' WHEN iso_sex_code = 9 THEN 'N/A' ELSE NULL END There is a gimmick in this definition, however. The expression: CASE foo WHEN 1 THEN 'bar' WHEN NULL THEN 'no bar' END becomes: CASE WHEN foo = 1 THEN 'bar' WHEN foo = NULL THEN 'no_bar' —error! ELSE NULL END The second WHEN clause is always UNKNOWN. Use the simple CASE expression when it is appropriate. 6.2.2.3 Other CASE Expressions The SQL-92 standard defines other functions in terms of the CASE expression, which makes the language a bit more compact and easier to implement. For example, the COALESCE () function can be defined for one or two expressions by: 1. COALESCE (<value exp #1>) is equivalent to (<value exp #1>) 2. COALESCE (<value exp #1>, <value exp #2>) is equivalent to: CASE WHEN <value exp #1> IS NOT NULL THEN <value exp #1> ELSE <value exp #2> END 6.2 Pick Compact Constructions over Longer Equivalents 113 Then we can recursively define it for ( n ) expressions, where ( n >= 3), in the list by: COALESCE (<value exp #1>, <value exp #2>, , n ) as equivalent to: CASE WHEN <value exp #1> IS NOT NULL THEN <value exp #1> ELSE COALESCE (<value exp #2>, , n) END Likewise, NULLIF (<value exp #1>, <value exp #2>) is equivalent to: CASE WHEN <value exp #1> = <value exp #2> THEN NULL ELSE <value exp #1> END Use the most compact form of these CASE expressions, and do not expand them out to their definitions. 6.2.3 Avoid Redundant Expressions Rationale: Most modern SQL engines are pretty smart. This was not always the case, so older SQL programmers will sometimes add redundant predicates to a where clause. For example, if none of the columns in the table Foobar is NULL-able, then given: SELECT a, b, c FROM Foobar WHERE a = b AND b = c AND c = a; One of the three search conditions is redundant, because it can be deduced from the other two. Redundant predicates only confuse the human readers and do not give information to a good optimizer. 114 CHAPTER 6: CODING CHOICES Exceptions: If your SQL has a bad optimizer and needs the extra help, then add redundant predicates. 6.2.4 Seek a Compact Form Rationale: Many of the earlier SQL engines could not use an index on a column if it were in an expression, and they did not do any algebraic optimizations. Today, we do this bit of cleanup work because a simpler form of an expression is easier to maintain and to read: SELECT a, b, c FROM Foobar WHERE a + 2 = b - 4; And a little algebra becomes: SELECT a, b, c FROM Foobar WHERE a = b + 2; Exceptions: If your SQL has a really good optimizer, and the complicated form is easier for a human being to read for some reason, then use it. Sometimes there is no simple form. 6.2.4.1 Use BETWEEN, Not AND-ed Predicates Rationale: Consider this simple query: SELECT a, b, c FROM Foobar WHERE a <= b AND b <= c; which can be written as: SELECT a, b, c FROM Foobar WHERE b BETWEEN a AND c; 6.2 Pick Compact Constructions over Longer Equivalents 115 The BETWEEN is more compact and gives the reader information about the relationship among three columns that might not be so obvious amid a longer list of search conditions. Exceptions: This rule makes sense from a readability standpoint, but it does not always stand up in terms of performance. Consider DB2 for z/OS in which “<column name> BETWEEN <expression> AND <expression> is both indexable and a stage one predicate.” Without explaining what a stage one predicate is, it is preferred for performance. However, “<value> BETWEEN <column name 1>AND <column name 2>” is both stage two and nonindexable, but formulating the same using two <= predicates could be both stage one and indexable and therefore preferable for performance. Likewise, the same execution plan applies to “<column name 1> BETWEEN <column name 2> AND <column name 3>” predicates. This will differ from DBMS to DBMS and platform to platform. As optimizers get better, this will be less and less true. 6.2.4.2 Use IN(), Not OR-ed predicates Rationale: The IN() predicate was first introduced in the Pascal programming language. In SQL it has two forms; the list and the subquery. The list form has a comma-separated list of values or expressions on the right- hand side. The predicate returns a TRUE result if there is a match in that list with the left-hand side of the predicate. It is shorthand for a list or OR-ed predicates. For example consider: SELECT a, b, c FROM Foobar WHERE a = b OR a = c; which can be written as: SELECT a, b, c FROM Foobar WHERE a IN (b, c); The IN() is more compact and gives the reader information about the relationship among three columns that might not be so obvious amid a 116 CHAPTER 6: CODING CHOICES longer list of search conditions. The list can also consist of scalar expressions, but that is not common. Exceptions: Watch out for NULLs! The IN () predicate is defined as a chain of OR-ed predicates, thus: a IN (x, y, z) means ((a = x) OR (a = y) OR (a = z)) Therefore: a IN (x, y, NULL) means ((a = x) OR (a = y) OR (a = NULL)) ((a = x) OR (a = y) OR UNKNOWN) We are now in SQL’s three-valued logic. Remember that a NULL is not the same thing as an UNKNOWN; SQL-92 has no Boolean data type; and you cannot use AND, OR, and NOT on a NULL. The NOT IN () predicate is defined as the negation of the IN(): a NOT IN (x, y, z) means: NOT (a IN (x, y, z)) NOT ((a = x) OR (a = y) OR (a = z)) (NOT(a = x) AND NOT(a = y) AND NOT(a = z)) DeMorgan's law ((a <> x) AND (a <> y) AND (a <> z)) definition Now put in a NULL for one of the list elements: ((a <> x) AND (a <> y) AND UNKNOWN) (UNKNOWN) If you wish to have a match on a NULL in a list, then you can COALESCE() the NULLs to the left-hand expression, thus: WHERE a IN (x, y, COALESCE (z, a)) . 6.2.3 Avoid Redundant Expressions Rationale: Most modern SQL engines are pretty smart. This was not always the case, so older SQL programmers will sometimes add redundant predicates to a. Exceptions: If your SQL has a bad optimizer and needs the extra help, then add redundant predicates. 6.2.4 Seek a Compact Form Rationale: Many of the earlier SQL engines could not use. OR-ed predicates Rationale: The IN() predicate was first introduced in the Pascal programming language. In SQL it has two forms; the list and the subquery. The list form has a comma-separated