Joe Celko's SQL for Smarties - Advanced SQL Programming P29

CHAPTER 11: CASE EXPRESSIONS

11.1 The CASE Expression

    CASE WHEN <value exp #1> = <value exp #2>
         THEN NULL
         ELSE <value exp #1> END

11.1.2 CASE Expressions with GROUP BY

A CASE expression is very useful with a GROUP BY query. For example, to determine how many employees of each gender you have in each department of your Personnel table, you can write:

    SELECT dept_nbr,
           SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS males,
           SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS females
      FROM Personnel
     GROUP BY dept_nbr;

or:

    SELECT dept_nbr,
           COUNT(CASE WHEN gender = 'M' THEN 1 ELSE NULL END) AS males,
           COUNT(CASE WHEN gender = 'F' THEN 1 ELSE NULL END) AS females
      FROM Personnel
     GROUP BY dept_nbr;

I am not sure if there is any general rule as to which form will run faster. Aggregate functions remove NULLs before they perform their operations, so the order of execution might be different in the ELSE 0 and the ELSE NULL versions.

The previous example shows the CASE expression inside the aggregate function; it is also possible to put aggregate functions inside a CASE expression. For example, assume you are given a table of employees' skills:

    CREATE TABLE PersonnelSkills
    (emp_id CHAR(11) NOT NULL,
     skill_id CHAR(11) NOT NULL,
     primary_skill_ind CHAR(1) NOT NULL
       CONSTRAINT primary_skill_given
       CHECK (primary_skill_ind IN ('Y', 'N')),
     PRIMARY KEY (emp_id, skill_id));

Each employee has a row in the table for each of her skills. If the employee has multiple skills, she will have multiple rows in the table, and the primary skill indicator will be a 'Y' for her main skill. If she only has one skill (which means one row in the table), the value of primary_skill_ind is indeterminate. The problem is to list each employee once along with her only skill, if she only has one row in the table, or her primary skill, if she has multiple rows in the table.
    SELECT emp_id,
           CASE WHEN COUNT(*) = 1
                THEN MAX(skill_id)
                ELSE MAX(CASE WHEN primary_skill_ind = 'Y'
                              THEN skill_id
                              ELSE NULL END)
           END AS main_skill
      FROM PersonnelSkills
     GROUP BY emp_id;

This solution looks at first like a violation of the rule in SQL that prohibits nested aggregate functions, but if you look closely, it is not. The outermost CASE expression resolves to an aggregate function, namely MAX(). The ELSE clause simply has to return an expression inside its MAX() that can be resolved to a single value.

11.1.3 CASE, CHECK() Clauses and Logical Implication

Complicated logical predicates can be put into a CASE expression that returns either 1 (TRUE) or 0 (FALSE):

    CONSTRAINT implication_example
    CHECK (CASE WHEN dept_nbr = 'D1'
                THEN CASE WHEN salary < 44000.00
                          THEN 1 ELSE 0 END
                ELSE 1 END = 1)

This is a logical implication operator. It is usually written as an arrow with two stems (⇒), and its definition is usually stated as "a true premise cannot imply a false conclusion" or as "if a then b."

In English, the above condition says "if an employee is in department 'D1', then his salary is less than $44,000.00," which is not the same as saying (dept_nbr = 'D1' AND salary < 44000.00) in the constraint. In standard Boolean logic, there is a simple transformation called the Smisteru rule (after the engineer who discovered it), which says that (A ⇒ B) is equivalent to (NOT (A) OR B).

In SQL, the Data Declaration Language (DDL) uses predicates in CHECK() constraints and treats TRUE and UNKNOWN alike. The Data Manipulation Language (DML) uses predicates in the WHERE and ON clauses and treats FALSE and UNKNOWN alike. How do you define logical implication with two different rules?
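The two rules can be seen side by side in a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for a generic SQL engine; the table name Demo and its column x are hypothetical and do not come from the text.

```python
import sqlite3

# Sketch only: SQLite stands in for a generic SQL engine, and the
# table Demo with its column x is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demo (x INTEGER CHECK (x > 0))")

# DDL side: for a NULL, the predicate (x > 0) is UNKNOWN, and the
# CHECK() constraint accepts the row just as it accepts a TRUE result.
conn.execute("INSERT INTO Demo VALUES (5)")
conn.execute("INSERT INTO Demo VALUES (NULL)")

# A FALSE result is the only one the constraint rejects.
try:
    conn.execute("INSERT INTO Demo VALUES (-1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# DML side: the same predicate in a WHERE clause discards the
# UNKNOWN row just as it discards a FALSE one.
stored = conn.execute("SELECT COUNT(*) FROM Demo").fetchone()[0]
selected = conn.execute(
    "SELECT COUNT(*) FROM Demo WHERE x > 0").fetchone()[0]

print(stored, selected, rejected)
```

The NULL row is stored because the constraint lets UNKNOWN through, yet the WHERE clause does not return it. That asymmetry is what the comparison of the two implication predicates turns on.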
Let's try the Smisteru transform first:

    CREATE TABLE Foobar_DDL_1
    (a CHAR(1) CHECK (a IN ('T', 'F')),
     b CHAR(1) CHECK (b IN ('T', 'F')),
     CONSTRAINT implication_example
       CHECK (NOT (A = 'T') OR (B = 'T')));

    INSERT INTO Foobar_DDL_1 VALUES ('T', 'T');
    INSERT INTO Foobar_DDL_1 VALUES ('T', 'F');  -- fails
    INSERT INTO Foobar_DDL_1 VALUES ('T', NULL);
    INSERT INTO Foobar_DDL_1 VALUES ('F', 'T');
    INSERT INTO Foobar_DDL_1 VALUES ('F', 'F');
    INSERT INTO Foobar_DDL_1 VALUES ('F', NULL);
    INSERT INTO Foobar_DDL_1 VALUES (NULL, 'T');
    INSERT INTO Foobar_DDL_1 VALUES (NULL, 'F');
    INSERT INTO Foobar_DDL_1 VALUES (NULL, NULL);

    SELECT * FROM Foobar_DDL_1;

    Results
    a     b
    ===========
    T     T
    T     NULL
    F     T
    F     F
    F     NULL
    NULL  T
    NULL  F
    NULL  NULL

Now my original version:

    CREATE TABLE Foobar_DDL
    (a CHAR(1) CHECK (a IN ('T', 'F')),
     b CHAR(1) CHECK (b IN ('T', 'F')),
     CONSTRAINT implication_example_2
       CHECK (CASE WHEN A = 'T'
                   THEN CASE WHEN B = 'T'
                             THEN 1 ELSE 0 END
                   ELSE 1 END = 1));

    INSERT INTO Foobar_DDL
    VALUES ('T', 'T'),
           ('T', 'F'),   -- fails
           ('T', NULL),  -- also fails: the inner CASE falls through to ELSE 0
           ('F', 'T'),
           ('F', 'F'),
           ('F', NULL),
           (NULL, 'T'),
           (NULL, 'F'),
           (NULL, NULL);

    SELECT * FROM Foobar_DDL;

    Results
    a     b
    ===========
    T     T
    F     T
    F     F
    F     NULL
    NULL  T
    NULL  F
    NULL  NULL

Both agree that a TRUE premise cannot lead to a FALSE conclusion, but Smisteru allows ('T', NULL). Not quite the same implication operators!
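The difference between the two constraints can be checked mechanically. The following is a minimal sketch, assuming SQLite (via Python's sqlite3 module) treats CHECK() results the way the text describes; the helper accepted_rows is hypothetical, and each row is inserted separately so that one rejection does not affect the others.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

# The Smisteru form of the constraint, as in Foobar_DDL_1.
conn.execute("""
    CREATE TABLE Foobar_DDL_1
    (a CHAR(1) CHECK (a IN ('T', 'F')),
     b CHAR(1) CHECK (b IN ('T', 'F')),
     CONSTRAINT implication_example
       CHECK (NOT (a = 'T') OR (b = 'T')))""")

# The nested CASE form of the constraint, as in Foobar_DDL.
conn.execute("""
    CREATE TABLE Foobar_DDL
    (a CHAR(1) CHECK (a IN ('T', 'F')),
     b CHAR(1) CHECK (b IN ('T', 'F')),
     CONSTRAINT implication_example_2
       CHECK (CASE WHEN a = 'T'
                   THEN CASE WHEN b = 'T' THEN 1 ELSE 0 END
                   ELSE 1 END = 1))""")


def accepted_rows(table):
    """Hypothetical helper: try all nine (a, b) pairs, one INSERT each,
    and return the pairs the table's constraints let through."""
    accepted = []
    for a, b in product(["T", "F", None], repeat=2):
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (a, b))
            accepted.append((a, b))
        except sqlite3.IntegrityError:
            pass  # constraint violation: the row is rejected
    return accepted


smisteru = accepted_rows("Foobar_DDL_1")
case_form = accepted_rows("Foobar_DDL")

# Rows that only the Smisteru version accepts.
only_smisteru = set(smisteru) - set(case_form)
print(len(smisteru), len(case_form), only_smisteru)
```

Under these assumptions the only row the two tables disagree on is ('T', NULL), shown as ('T', None) on the Python side.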
Let's now look at the query side of the house:

    CREATE TABLE Foobar_DML
    (a CHAR(1) CHECK (a IN ('T', 'F')),
     b CHAR(1) CHECK (b IN ('T', 'F')));

    INSERT INTO Foobar_DML VALUES ('T', 'T');
    INSERT INTO Foobar_DML VALUES ('T', 'F');
    INSERT INTO Foobar_DML VALUES ('T', NULL);
    INSERT INTO Foobar_DML VALUES ('F', 'T');
    INSERT INTO Foobar_DML VALUES ('F', 'F');
    INSERT INTO Foobar_DML VALUES ('F', NULL);
    INSERT INTO Foobar_DML VALUES (NULL, 'T');
    INSERT INTO Foobar_DML VALUES (NULL, 'F');
    INSERT INTO Foobar_DML VALUES (NULL, NULL);

Using the Smisteru rule as the search condition:

    SELECT *
      FROM Foobar_DML
     WHERE (NOT (A = 'T') OR (B = 'T'));

    Results
    a     b
    ==========
    T     T
    F     T
    F     F
    F     NULL
    NULL  T

Using the original predicate:

    SELECT *
      FROM Foobar_DML
     WHERE CASE WHEN A = 'T'
                THEN CASE WHEN B = 'T'
                          THEN 1 ELSE 0 END
                ELSE 1 END = 1;

    Results
    a     b
    ==========
    T     T
    F     T
    F     F
    F     NULL
    NULL  T
    NULL  F
    NULL  NULL

This is why I used the CASE expression; it works the same way in both the DDL and DML.

11.1.4 Subquery Expressions and Constants

Subquery expressions are SELECT statements inside of parentheses. Of course, there is more to it than that. The four flavors of subquery expressions are tabular, columnar, row, and scalar subquery expressions. As you might guess from the names, the tabular or table subquery returns a table as a result, so it can appear any place that a table is used in SQL-92, which usually means the FROM clause.

The columnar subquery returns a table with a single column in it. This was the important one in the original SQL-86 and SQL-89 standards, because the IN, <comp op> ALL and <comp op> {ANY|SOME} predicates were based on the ability of the language to convert the single column into a list of comparisons connected by ANDs or ORs.

The row subquery returns a single row. It can be used anywhere a row can be used. This sort of query is the basis for the singleton SELECT statement used in embedded SQL.
It is not used too much right now, but with the extension of theta operators to handle row comparisons, it might become more popular.

The scalar subquery returns a single scalar value. It can be used anywhere a scalar value can be used, which usually means it is in the SELECT or WHERE clauses. If a scalar subquery returns an empty result, it is converted to a NULL. If a scalar subquery returns more than one row, you get a cardinality violation.

I will make the very general statement now that the performance of scalar subqueries depends largely on the architecture of the hardware upon which your SQL is implemented. A massively parallel machine can allocate a processor to each scalar subquery and get drastic performance improvement.

A table constant of any shape can be constructed using the VALUES() expression. New SQL programmers think that this is only an option in the INSERT INTO statement. However, Standard SQL allows you to use it to build a row as a comma-separated list of scalar expressions, and then build a table as a comma-separated list of those row constructors. Consider this lookup table of ZIP code ranges by state:

    CREATE VIEW StateZipcodes (state_code, low_zip, high_zip)
    AS VALUES ('AK', 99500, 99998),
              ('GA', 30000, 30399),
              ('WY', 82000, 83100);

This table cannot be changed without dropping the VIEW and rebuilding it. It has no named base table.

11.2 Rozenshtein Characteristic Functions

A characteristic function converts a logical expression into a one if it is TRUE and to a zero if it is FALSE. This is what we have been doing with some of the CASE expressions shown here, but not under that name. The literature uses a lowercase delta (δ) or a capital chi (Χ) as the symbol for this operator. Programmers first saw this in Ken Iverson's APL programming language, and then later in Donald Knuth's books on programming theory.
The name comes from the fact that it is used to define a set by giving a rule for membership in the set.

David Rozenshtein found ways of implementing characteristic functions with algebraic expressions on numeric columns in the Sybase T-SQL language (Rozenshtein 1995) before they had a CASE expression in their product. Without going into the details, I will borrow Dr. Rozenshtein's notation and give the major formulas for putting converted numeric comparisons into a computed characteristic function:

    δ(a = b)  becomes (1 - ABS(SIGN(a - b)))
    δ(a <> b) becomes (ABS(SIGN(a - b)))
    δ(a < b)  becomes (1 - SIGN(1 + SIGN(a - b)))
    δ(a <= b) becomes (SIGN(1 - SIGN(a - b)))
    δ(a > b)  becomes (1 - SIGN(1 - SIGN(a - b)))
    δ(a >= b) becomes (SIGN(1 + SIGN(a - b)))

The basic logical operators can also be put into computed characteristic functions. If we ignore NULLs and use standard Boolean logic, we can write these expressions:

    NOT δ(a)         becomes (1 - δ(a))
    (δ(a) AND δ(b))  becomes SIGN(δ(a) * δ(b))
    (δ(a) OR δ(b))   becomes SIGN(δ(a) + δ(b))

If you remember George Boole's original notation for Boolean Algebra, this will look very familiar. But be aware that if a or b is a NULL, then the results will be a NULL and not a one or zero, something Mr. Boole never thought about.

Character strings can be handled with the POSITION function, if you are careful.

    δ(a = s)  becomes POSITION(a IN s)
    δ(a <> s) becomes SIGN(1 - POSITION(a IN s))

Rozenshtein's book gives more tricks (Rozenshtein 1995), but many of them depend on Sybase's T-SQL functions and are not portable. Another problem is that the code can become very hard to read, so that what is happening is not obvious to the next programmer to read the code. At this point in time, using the CASE expression is the better choice, since a human being must maintain the code.
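The numeric formulas above can be checked by brute force. This is only a sketch in Python rather than T-SQL: the sign helper stands in for SQL's SIGN(), the dictionary and function names are hypothetical, and NULLs are ignored as in the text.

```python
from itertools import product


def sign(x):
    """Stand-in for SQL's SIGN(): returns -1, 0, or 1."""
    return (x > 0) - (x < 0)


# The six comparison formulas, written with the arithmetic from the text.
DELTA = {
    "=":  lambda a, b: 1 - abs(sign(a - b)),
    "<>": lambda a, b: abs(sign(a - b)),
    "<":  lambda a, b: 1 - sign(1 + sign(a - b)),
    "<=": lambda a, b: sign(1 - sign(a - b)),
    ">":  lambda a, b: 1 - sign(1 - sign(a - b)),
    ">=": lambda a, b: sign(1 + sign(a - b)),
}

# The ordinary comparisons each formula is supposed to reproduce.
EXPECTED = {
    "=":  lambda a, b: a == b,
    "<>": lambda a, b: a != b,
    "<":  lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">":  lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
}


# The logical operators on 0/1 characteristic values.
def delta_not(p):
    return 1 - p


def delta_and(p, q):
    return sign(p * q)


def delta_or(p, q):
    return sign(p + q)


# Compare every formula with the real comparison on a small integer grid.
mismatches = [
    (op, a, b)
    for op in DELTA
    for a, b in product(range(-3, 4), repeat=2)
    if DELTA[op](a, b) != int(EXPECTED[op](a, b))
]
print(mismatches)
```

An empty mismatch list means each formula returned 1 exactly where the comparison was true and 0 elsewhere on the tested grid. It says nothing about NULLs, which the formulas propagate rather than convert.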
CHAPTER 12: LIKE Predicate

The LIKE predicate is a string pattern-matching test with the syntax:

    <like predicate> ::=
        <match value> [NOT] LIKE <pattern> [ESCAPE <escape character>]

    <match value> ::= <character value expression>
    <pattern> ::= <character value expression>
    <escape character> ::= <character value expression>

The expression M NOT LIKE P is equivalent to NOT (M LIKE P), which follows the usual syntax pattern in SQL. Two wildcards are allowed in the <pattern> string. They are the '%' and '_' characters. The '_' character represents a single arbitrary character; the '%' character represents an arbitrary substring, possibly of length zero. Notice that there is no way to represent zero or one arbitrary character. This is not the case in many text-search languages, and can lead to problems or very complex predicates.

Any other character in the <pattern> represents that character itself. This means that SQL patterns are case-sensitive, but many vendors allow you to set case sensitivity on or off at the database system level.
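The wildcard rules can be made concrete with a small matcher. This is an illustrative sketch in Python, not how any SQL product implements LIKE: the functions like_to_regex and like are hypothetical, matching is case-sensitive as the text describes, and the escape handling is simplified.

```python
import re


def like_to_regex(pattern, escape=None):
    """Hypothetical helper: translate a LIKE pattern into a compiled regex.

    '%' matches any substring (possibly empty), '_' matches exactly one
    character, and every other character matches itself.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape:
            # Simplification: the character after the escape is taken
            # literally, whatever it is.
            i += 1
            if i >= len(pattern):
                raise ValueError("pattern ends with the escape character")
            parts.append(re.escape(pattern[i]))
        elif ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    # DOTALL lets the wildcards match newline characters too.
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern, escape=None):
    """Return True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern, escape).fullmatch(value) is not None


print(like("Celko", "C%"), like("Celko", "c%"), like("Celk", "Celk_"))
```

Because no wildcard means "zero or one character", that test has to be spelled out as two predicates, for example like(s, "Celk") or like(s, "Celk_").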
