CHAPTER 15: EXISTS() PREDICATE

    SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
      FROM Personnel AS P1
     WHERE P1.birthday
           NOT IN (SELECT C1.birthday
                     FROM Celebrities AS C1
                    WHERE C1.birth_city = 'New York');

and you would think that the EXISTS version would be:

    SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
      FROM Personnel AS P1
     WHERE NOT EXISTS
           (SELECT *
              FROM Celebrities AS C1
             WHERE C1.birth_city = 'New York'
               AND C1.birthday = P1.birthday);

Assume that Gloria Glamour is our only New Yorker and we still do not know her birthday. In the NOT EXISTS version, the subquery will be empty for every employee, because her NULL birthday will not test equal to any known employee birthday. The NOT EXISTS predicate therefore returns TRUE for every row, and every employee is returned. But now look at the IN predicate version, which will have a single NULL in the subquery result. This predicate reduces to NOT (Personnel.birthday = NULL), which is always UNKNOWN, and we will get no employees back.

Likewise, you cannot, in general, transform the quantified comparison predicates into EXISTS predicates, because of the possibility of NULL values. Remember that x NOT IN <subquery> is equivalent to x <> ALL <subquery>, and x IN <subquery> is equivalent to x = ANY <subquery>, and this will not surprise you.

In general, the EXISTS predicates will run faster than the IN predicates. The problem is in deciding whether to build the query or the subquery first; the optimal approach depends on the size and distribution of values in each, and that cannot usually be known until runtime.

15.2 EXISTS and INNER JOINs

The [NOT] EXISTS predicate is almost always used with a correlated subquery. Very often the subquery can be “flattened” into a JOIN, which will frequently run faster than the original query.
Our sample query can be converted into:

    SELECT P1.emp_name, ' has the same birthday as a famous person!'
      FROM Personnel AS P1, Celebrities AS C1
     WHERE P1.birthday = C1.birthday;

The advantage of the JOIN version is that it allows us to show columns from both tables. We should make the query more informative by rewriting it:

    SELECT P1.emp_name, ' has the same birthday as ', C1.emp_name
      FROM Personnel AS P1, Celebrities AS C1
     WHERE P1.birthday = C1.birthday;

This new query could be written with an EXISTS() predicate, but that is a waste of resources:

    SELECT P1.emp_name, ' has the same birthday as ', C1.emp_name
      FROM Personnel AS P1, Celebrities AS C1
     WHERE EXISTS
           (SELECT *
              FROM Celebrities AS C2
             WHERE P1.birthday = C2.birthday
               AND C1.emp_name = C2.emp_name);

15.3 NOT EXISTS and OUTER JOINs

The NOT EXISTS version of this predicate is almost always used with a correlated subquery. Very often the subquery can be “flattened” into an OUTER JOIN, which will frequently run faster than the original query. Our other sample query was:

    SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
      FROM Personnel AS P1
     WHERE NOT EXISTS
           (SELECT *
              FROM Celebrities AS C1
             WHERE C1.birth_city = 'New York'
               AND C1.birthday = P1.birthday);

which we can replace with:

    SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
      FROM Personnel AS P1
           LEFT OUTER JOIN
           Celebrities AS C1
           ON C1.birth_city = 'New York'
              AND C1.birthday = P1.birthday
     WHERE C1.emp_name IS NULL;

This assumes that we know each and every celebrity name in the Celebrities table. If the column in the WHERE clause could have NULLs in its base table, then we could not distinguish them from the NULLs generated by the OUTER JOIN. The test for NULL should always be on (a column of) the primary key, which cannot be NULL. Relating this back to the example, how could a celebrity be a celebrity with an unknown name? Even The Unknown Comic had a name (“The Unknown Comic”).
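The difference between the NOT IN, NOT EXISTS, and LEFT OUTER JOIN forms of the New Yorker query can be checked with a small experiment. The following is a minimal sketch and not part of the original text: it assumes Python's sqlite3 module as a stand-in for a Standard SQL engine, and the two employee rows, their dates, and the TEXT column types are hypothetical. Only Gloria Glamour and her unknown birthday come from the discussion of the two predicates.

```python
import sqlite3

# In-memory database; SQLite is used here only as a convenient stand-in
# for a Standard SQL engine (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel
(emp_name TEXT NOT NULL PRIMARY KEY,
 birthday TEXT NOT NULL);

CREATE TABLE Celebrities
(emp_name TEXT NOT NULL PRIMARY KEY,
 birth_city TEXT NOT NULL,
 birthday TEXT);  -- NULL means the birthday is unknown

-- Hypothetical employees, invented for this demonstration.
INSERT INTO Personnel VALUES ('Alice', '1970-01-15'), ('Bob', '1982-06-30');

-- Our only New Yorker, whose birthday we do not know.
INSERT INTO Celebrities VALUES ('Gloria Glamour', 'New York', NULL);
""")

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# NOT IN: the lone NULL makes the predicate UNKNOWN for every employee.
not_in_rows = names("""
    SELECT P1.emp_name
      FROM Personnel AS P1
     WHERE P1.birthday NOT IN (SELECT C1.birthday
                                 FROM Celebrities AS C1
                                WHERE C1.birth_city = 'New York')
     ORDER BY P1.emp_name""")

# NOT EXISTS: the correlated subquery is empty, so the predicate is TRUE.
not_exists_rows = names("""
    SELECT P1.emp_name
      FROM Personnel AS P1
     WHERE NOT EXISTS (SELECT *
                         FROM Celebrities AS C1
                        WHERE C1.birth_city = 'New York'
                          AND C1.birthday = P1.birthday)
     ORDER BY P1.emp_name""")

# LEFT OUTER JOIN, testing the celebrity key column for a generated NULL.
outer_join_rows = names("""
    SELECT P1.emp_name
      FROM Personnel AS P1
           LEFT OUTER JOIN Celebrities AS C1
           ON C1.birth_city = 'New York'
              AND C1.birthday = P1.birthday
     WHERE C1.emp_name IS NULL
     ORDER BY P1.emp_name""")

print("NOT IN:         ", not_in_rows)
print("NOT EXISTS:     ", not_exists_rows)
print("LEFT OUTER JOIN:", outer_join_rows)
```

Under these assumptions the NOT IN version returns no employees, while the NOT EXISTS and OUTER JOIN versions agree with each other and return both employees. That agreement is what makes the OUTER JOIN a usable replacement for NOT EXISTS, but not for NOT IN.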
15.4 EXISTS() and Quantifiers

Formal logic makes use of quantifiers that can be applied to propositions. The two forms are “For all x, P(x)” and “For some x, P(x)”. The first is written with ∀ (an inverted uppercase A) and the second with ∃ (a reversed uppercase E), if you want to look up formulas in a textbook. The quantifiers put into symbols such statements as “all men are mortal” or “some Cretans are liars” so they can be manipulated.

The big question more than 100 years ago was that of existential import in formal logic. Everyone agreed that saying “all men are mortal” implies that “no men are not mortal,” but does it also imply that “some men are mortal,” that is, that we have to have at least one man who is mortal? Existential import lost the battle, and the modern convention is that “All men are mortal” has the same meaning as “There are no men who are immortal,” but does not imply that any men exist at all. This is the convention followed in the design of SQL.

Consider the statement “some salesmen are liars” and the way we would write it with the EXISTS() predicate in SQL:

    EXISTS(SELECT *
             FROM Personnel AS P1, Liars AS L1
            WHERE P1.job = 'Salesman'
              AND P1.emp_name = L1.emp_name);

If we are more cynical about salesmen, we might want to formulate the predicate “all salesmen are liars” with the EXISTS predicate in SQL, using the transform rule just discussed:

    NOT EXISTS(SELECT *
                 FROM Personnel AS P1
                WHERE P1.job = 'Salesman'
                  AND P1.emp_name
                      NOT IN (SELECT L1.emp_name
                                FROM Liars AS L1));

That says, informally, “there are no salesmen who are not liars” in English. In this case, the IN predicate can be changed into a JOIN, which should improve performance and be a bit easier to read.

15.5 EXISTS() and Referential Constraints

Standard SQL was designed so that the declarative referential constraints could be expressed as EXISTS() predicates in a CHECK() clause.
For example:

    CREATE TABLE Addresses
    (addressee_name CHAR(25) NOT NULL PRIMARY KEY,
     street_loc CHAR(25) NOT NULL,
     city_name CHAR(20) NOT NULL,
     state_code CHAR(2) NOT NULL
       REFERENCES ZipCodeData(state_code));

could be written as:

    CREATE TABLE Addresses
    (addressee_name CHAR(25) NOT NULL PRIMARY KEY,
     street_loc CHAR(25) NOT NULL,
     city_name CHAR(20) NOT NULL,
     state_code CHAR(2) NOT NULL,
     CONSTRAINT valid_state_code
       CHECK (EXISTS(SELECT *
                       FROM ZipCodeData AS Z1
                      WHERE Z1.state_code = Addresses.state_code)));

There is no advantage to this expression for the DBA, since you cannot attach referential actions to a CHECK() constraint. However, an SQL database can use the same mechanisms in the SQL compiler for both constructions.

15.6 EXISTS and Three-Valued Logic

This example is due to an article by Lee Fesperman at FirstSQL. It uses Chris Date’s “SupplierParts” table with three rows:

    CREATE TABLE SupplierParts
    (sup_nbr CHAR(2) NOT NULL PRIMARY KEY,
     part_nbr CHAR(2) NOT NULL,
     qty INTEGER CHECK (qty > 0));

    sup_nbr  part_nbr  qty
    ======================
    'S1'     'P1'      NULL
    'S2'     'P1'       200
    'S3'     'P1'      1000

The row (‘S1’, ‘P1’, NULL) means that supplier ‘S1’ supplies part ‘P1’ but we do not know what quantity he has.

The query we wish to answer is “Find suppliers of part ‘P1’, but not in a quantity of 1000 on hand.” The correct answer is ‘S2’. All suppliers in the table supply ‘P1’, but we do know that ‘S3’ supplies the part in quantity 1000, and we do not know in what quantity ‘S1’ supplies the part. The only supplier we can select for certain is ‘S2’.
An SQL query to retrieve this result would be:

    SELECT spx.sup_nbr
      FROM SupplierParts AS spx
     WHERE spx.part_nbr = 'P1'
       AND 1000
           NOT IN (SELECT spy.qty
                     FROM SupplierParts AS spy
                    WHERE spy.sup_nbr = spx.sup_nbr
                      AND spy.part_nbr = 'P1');

According to Standard SQL, this query should return only ‘S2’, but when we transform the query into an apparently equivalent version, using EXISTS instead, we obtain:

    SELECT spx.sup_nbr
      FROM SupplierParts AS spx
     WHERE spx.part_nbr = 'P1'
       AND NOT EXISTS
           (SELECT *
              FROM SupplierParts AS spy
             WHERE spy.sup_nbr = spx.sup_nbr
               AND spy.part_nbr = 'P1'
               AND spy.qty = 1000);

which will return (‘S1’, ‘S2’). You can argue that this is the wrong answer because we do not definitely know whether or not ‘S1’ supplies ‘P1’ in quantity 1000. The EXISTS() predicate will return TRUE or FALSE, even in situations where a subquery’s predicate returns an UNKNOWN (i.e., NULL = 1000).

The solution is to modify the predicate that deals with the quantity in the subquery to explicitly say whether you do or do not want to give the “benefit of the doubt” to the NULL. You have several alternatives:

1. (spy.qty = 1000) IS NOT FALSE

   This uses the new predicates in Standard SQL for testing logical values. Frankly, this is confusing to read and worse to maintain.

2. (spy.qty = 1000 OR spy.qty IS NULL)

   This uses another test predicate, but the optimizer can probably use any index on the qty column.

3. (COALESCE(spy.qty, 1000) = 1000)

   This is portable and easy to maintain. The only disadvantage is that some SQL products might not be able to use an index on the qty column, because it is in an expression.

The real problem is that the query was formed with a double negative in the form of a NOT EXISTS and an implicit IS NOT FALSE condition. The problem stems from the fact that the EXISTS() predicate is one of the few two-valued predicates in SQL, and that (NOT (NOT UNKNOWN)) = UNKNOWN.

For another approach based on Dr.
Codd’s second relational model, visit www.FirstSQL.com and read some of the white papers by Lee Fesperman. He used the two NULLs Codd proposed to develop a product.

CHAPTER 16 Quantified Subquery Predicates

A quantifier is a logical operator that states the quantity of objects for which a statement is TRUE. This is a logical quantity, not a numeric quantity; it relates a statement to the whole set of possible objects. In everyday life, you see statements like “There is only one mouthwash that stops dinosaur breath,” “All doctors drive Mercedes,” or “Some people got rich investing in cattle futures,” which are quantified.

The first statement, about the mouthwash, is a uniqueness quantifier. If there were two or more products that could save us from dinosaur breath, the statement would be FALSE. The second statement has what is called a universal quantifier, since it deals with all doctors; find one exception and the statement is FALSE. The last statement has an existential quantifier, since it asserts that one or more people exist who got rich on cattle futures; find one example and the statement is TRUE.

SQL has forms of these quantifiers that are not quite like those in formal logic. They are based on extending the use of comparison predicates to allow result sets to be quantified, and they use SQL’s three-valued logic, so they do not return just TRUE or FALSE.

16.1 Scalar Subquery Comparisons

Standard SQL allows both scalar and row comparisons, but most queries use only scalar expressions. If a subquery returns a single-row, single-column result table, it is treated as a scalar value in Standard SQL in virtually any place a scalar could appear.
For example, to find out if we have any teachers who are more than one year older than the students, we could write:

    SELECT T1.teacher_name
      FROM Teachers AS T1
     WHERE T1.birthday
           > (SELECT MAX(S1.birthday) - INTERVAL '365' DAY
                FROM Students AS S1);

In this case, the scalar subquery will be run only once and reduced to a constant value by the optimizer before scanning the Teachers table.

A correlated subquery is more complex, because it will have to be executed for each value from the containing query. For example, to find which suppliers have sent us fewer than 100 parts, we would use this query. Notice how the SUM(quantity) has to be computed for each supplier number, sup_nbr.

    SELECT sup_nbr, sup_name
      FROM Suppliers
     WHERE 100 > (SELECT SUM(quantity)
                    FROM Shipments
                   WHERE Shipments.sup_nbr = Suppliers.sup_nbr);

If a scalar subquery returns a NULL, we have rules for handling comparison with NULLs. But what if it returns an empty result, as it would for a supplier that has not shipped us anything? In Standard SQL, the empty result table is converted to a NULL of the appropriate data type.

In Standard SQL, you can place scalar or row subqueries on either side of a comparison predicate as long as they return comparable results. But you must be aware of the rules for row comparisons. For example, the following query will find the product manager who has more of his product at the stores than in the warehouse:

    SELECT manager_name, product_nbr
      FROM Stores AS S1
     WHERE (SELECT SUM(qty)
              FROM Warehouses AS W1
             WHERE S1.product_nbr = W1.product_nbr)
           < (SELECT SUM(qty)
                FROM RetailStores AS R1
               WHERE S1.product_nbr = R1.product_nbr);

Here is a programming tip: the main problem with writing these queries is avoiding a subquery result with more than one row in it. You can guarantee uniqueness in several ways. An aggregate function on an ungrouped table will always be a single value. A JOIN with the containing query based on a key will always be a single value.
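The handling of an empty or NULL scalar subquery result in the supplier query can be tried out directly. The following is a minimal sketch and not part of the original text: it assumes Python's sqlite3 module as a stand-in for a Standard SQL engine, and the supplier names, shipment quantities, and column types are hypothetical. Supplier 'S3' is deliberately given no shipments.

```python
import sqlite3

# In-memory database; SQLite stands in for a Standard SQL engine here
# (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers
(sup_nbr TEXT NOT NULL PRIMARY KEY,
 sup_name TEXT NOT NULL);

CREATE TABLE Shipments
(sup_nbr TEXT NOT NULL,
 quantity INTEGER NOT NULL);

-- Hypothetical data: S3 has never shipped anything.
INSERT INTO Suppliers VALUES ('S1', 'Acme'), ('S2', 'Bolt'), ('S3', 'Cog');
INSERT INTO Shipments VALUES ('S1', 40), ('S1', 30), ('S2', 500);
""")

# A scalar subquery that finds no rows at all yields NULL (None in Python).
empty_scalar = conn.execute(
    "SELECT (SELECT quantity FROM Shipments WHERE sup_nbr = 'S3')"
).fetchone()[0]

# The correlated SUM() per supplier; S3's total is NULL, not zero.
totals = conn.execute("""
    SELECT S.sup_nbr,
           (SELECT SUM(quantity)
              FROM Shipments
             WHERE Shipments.sup_nbr = S.sup_nbr)
      FROM Suppliers AS S
     ORDER BY S.sup_nbr""").fetchall()

# The query from the text: 100 > NULL is UNKNOWN, so S3 is not returned.
under_100 = [row[0] for row in conn.execute("""
    SELECT sup_nbr
      FROM Suppliers
     WHERE 100 > (SELECT SUM(quantity)
                    FROM Shipments
                   WHERE Shipments.sup_nbr = Suppliers.sup_nbr)
     ORDER BY sup_nbr""")]

# Treating "no shipments" as zero has to be stated explicitly.
under_100_coalesced = [row[0] for row in conn.execute("""
    SELECT sup_nbr
      FROM Suppliers
     WHERE 100 > COALESCE((SELECT SUM(quantity)
                             FROM Shipments
                            WHERE Shipments.sup_nbr = Suppliers.sup_nbr), 0)
     ORDER BY sup_nbr""")]

print(empty_scalar, totals, under_100, under_100_coalesced)
```

With this data the query from the text returns only 'S1'. A supplier with no shipments is silently dropped unless the NULL is converted, for example with COALESCE(), which is a design choice about whether "no shipments" should count as "fewer than 100 parts."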
16.2 Quantifiers and Missing Data

The quantified predicates are used with subquery expressions to compare a single value to those of the subquery, and take the general form <value expression> <comp op> <quantifier> <subquery>. The predicate "<value expression> <comp op> [ANY|SOME] <table expression>" is equivalent to taking each row, s, (assume that they are numbered from 1 to n) of <table expression> and testing "<value expression> <comp op> s" with ORs between the expanded expressions:

    ((<value expression> <comp op> s1)
     OR (<value expression> <comp op> s2)
     ...
     OR (<value expression> <comp op> sn))

When you get a single TRUE result, the whole predicate is TRUE. As long as <table expression> has cardinality greater than zero and one non-NULL value, you will get a result of TRUE or FALSE. The keyword SOME is the same as ANY, and the choice is just a matter of style and readability. Likewise, "<value expression> <comp op> ALL <table expression>" takes each row, s, of <table expression> and tests <value expression> <comp op> s with ANDs between the expanded expressions:

    ((<value expression> <comp op> s1)
     AND (<value expression> <comp op> s2)
     ...
     AND (<value expression> <comp op> sn))
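The OR and AND expansions of the quantified predicates can be sketched outside SQL to see how three-valued logic propagates through them. The following is a minimal illustration and not part of the original text: it is written in Python, it models UNKNOWN as None, and the function names compare, quantified_any, and quantified_all are hypothetical. The results for an empty table expression (FALSE for ANY, TRUE for ALL) follow Standard SQL.

```python
import operator
from typing import Callable, Iterable, Optional

# Three-valued truth value: True, False, or None standing for UNKNOWN.
Truth = Optional[bool]

def compare(op: Callable[[int, int], bool],
            x: Optional[int], s: Optional[int]) -> Truth:
    """A comparison with a NULL (None) on either side is UNKNOWN."""
    if x is None or s is None:
        return None
    return op(x, s)

def quantified_any(op: Callable[[int, int], bool],
                   x: Optional[int], rows: Iterable[Optional[int]]) -> Truth:
    """x <comp op> ANY rows: the comparisons chained with three-valued OR."""
    result: Truth = False          # an empty set of rows gives FALSE
    for s in rows:
        t = compare(op, x, s)
        if t is True:
            return True            # a single TRUE decides the whole predicate
        if t is None:
            result = None          # UNKNOWN unless a later row is TRUE
    return result

def quantified_all(op: Callable[[int, int], bool],
                   x: Optional[int], rows: Iterable[Optional[int]]) -> Truth:
    """x <comp op> ALL rows: the comparisons chained with three-valued AND."""
    result: Truth = True           # an empty set of rows gives TRUE
    for s in rows:
        t = compare(op, x, s)
        if t is False:
            return False           # a single FALSE decides the whole predicate
        if t is None:
            result = None          # UNKNOWN unless a later row is FALSE
    return result

# x = ANY (...) behaves like x IN (...).
print(quantified_any(operator.eq, 5, [1, None, 5]))   # TRUE despite the NULL
print(quantified_any(operator.eq, 5, [1, None]))      # UNKNOWN
# x <> ALL (...) behaves like x NOT IN (...).
print(quantified_all(operator.ne, 5, [1, None]))      # UNKNOWN
print(quantified_all(operator.gt, 5, [1, None, 9]))   # FALSE despite the NULL
```

The sketch shows why a NULL in the table expression matters only when no other row already decides the outcome: one TRUE settles ANY and one FALSE settles ALL, and otherwise the NULL leaves the predicate UNKNOWN.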