■ Although I use SELECT COUNT(*) in some of the DBMS-specific subqueries in the DBMS Tip in this section, you should be wary of using an aggregate function in a subquery’s SELECT clause. The existence test in Listing 8.59, for example, always is true because COUNT(*) always will return a row (with the value zero here). I could argue that the result, Figure 8.59, is flawed logically because no publisher ID XXX exists.

■ To run Listings 8.55, 8.57, and 8.58 in Microsoft Access, change SELECT * to SELECT 1. Additionally, in Listing 8.57 add the clause FROM authors to the outer query, and in Listing 8.58 add the clause FROM title_authors to the outer query.

To run Listings 8.57 and 8.58 in Oracle, add the clause FROM DUAL to the outer query; see the DBMS Tip in “Creating Derived Columns” in Chapter 5.

To run Listings 8.55, 8.57, and 8.58 in DB2, change SELECT * to SELECT 1. Additionally, in Listings 8.57 and 8.58, add the clause FROM SYSIBM.SYSDUMMY1 to the outer query; see the DBMS Tip in “Creating Derived Columns” in Chapter 5. For example, change Listing 8.57 to:

    SELECT DISTINCT 'Yes' AS "Duplicates?"
      FROM SYSIBM.SYSDUMMY1
      WHERE EXISTS
        (SELECT 1
           FROM authors
           GROUP BY au_id
           HAVING COUNT(*) > 1);

In MySQL, to run Listing 8.57 add the clause FROM authors to the outer query, and in Listing 8.58 add the clause FROM title_authors to the outer query. MySQL 4.0 and earlier don’t support subqueries; see the DBMS Tip in “Understanding Subqueries” earlier in this chapter.

To run Listings 8.55, 8.57, and 8.58 in PostgreSQL, change SELECT * to SELECT 1.

Chapter 8: Testing Existence with EXISTS

Listing 8.59 Be careful when using aggregate functions in a subquery SELECT clause. See Figure 8.59 for the result.

    SELECT pub_id
      FROM publishers
      WHERE EXISTS
        (SELECT COUNT(*)
           FROM titles
           WHERE pub_id = 'XXX');

    pub_id
    ------
    P01
    P02
    P03
    P04

Figure 8.59 The result of Listing 8.59.
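The pitfall in Listing 8.59 can be reproduced in a few lines. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module (SQLite isn't one of the DBMSs covered here), and the two small tables are hypothetical stand-ins for the sample database, holding just enough rows to show the effect.

```python
import sqlite3

# Hypothetical miniature versions of the publishers and titles tables
# (assumption: only the columns needed for this test are included).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publishers (pub_id TEXT PRIMARY KEY);
    CREATE TABLE titles (title_id TEXT PRIMARY KEY, pub_id TEXT);
    INSERT INTO publishers VALUES ('P01'), ('P02'), ('P03'), ('P04');
    INSERT INTO titles VALUES ('T01', 'P01'), ('T02', 'P03');
""")

# The aggregate subquery always yields exactly one row (a count of 0 here),
# so EXISTS is true for every publisher.
with_count = conn.execute("""
    SELECT pub_id
      FROM publishers
      WHERE EXISTS
        (SELECT COUNT(*) FROM titles WHERE pub_id = 'XXX')
      ORDER BY pub_id
""").fetchall()

# Without the aggregate, the subquery yields no rows for the nonexistent
# publisher ID, so EXISTS is false and nothing is returned.
without_count = conn.execute("""
    SELECT pub_id
      FROM publishers
      WHERE EXISTS
        (SELECT * FROM titles WHERE pub_id = 'XXX')
      ORDER BY pub_id
""").fetchall()

print(with_count)     # all four publishers
print(without_count)  # empty list
```

The only difference between the two statements is the subquery's SELECT clause, which is what makes the aggregate version easy to get wrong.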
Comparing Equivalent Queries

As you’ve seen in this chapter and the preceding one, you can express the same query in different ways (different syntax, same semantics). To expand on this point, I’ve written the same query six semantically equivalent ways. Each of the statements in Listing 8.60 lists the authors who have written (or cowritten) at least one book. See Figure 8.60 for the result.

The first two queries (inner joins) will run at the same speed as one another. Of the third through sixth queries (which use subqueries), the last one probably is the worst performer. The DBMS will stop processing the other subqueries as soon as it encounters a single matching value. But the subquery in the last statement has to count all the matching rows before it returns either true or false. Your DBMS’s optimizer should run the inner joins at about the same speed as the fastest subquery statement.

You might find this programming flexibility to be attractive, but people who design DBMS optimizers don’t, because they’re tasked with considering all the possible ways to express a query, figuring out which one performs best, and reformulating your query internally to its optimal form. (Entire careers are devoted to solving these types of optimization problems.) If your DBMS has a flawless optimizer, it will run all six of the queries in Listing 8.60 at the same speed. But that situation is unlikely, so you’ll have to experiment with your DBMS to see which version runs fastest.

Listing 8.60 These six queries are equivalent semantically; they all list the authors who have written (or cowritten) at least one book. See Figure 8.60 for the result.
    SELECT DISTINCT a.au_id
      FROM authors a
      INNER JOIN title_authors ta
        ON a.au_id = ta.au_id;

    SELECT DISTINCT a.au_id
      FROM authors a, title_authors ta
      WHERE a.au_id = ta.au_id;

    SELECT au_id
      FROM authors a
      WHERE au_id IN
        (SELECT au_id
           FROM title_authors);

    SELECT au_id
      FROM authors a
      WHERE au_id = ANY
        (SELECT au_id
           FROM title_authors);

    SELECT au_id
      FROM authors a
      WHERE EXISTS
        (SELECT *
           FROM title_authors ta
           WHERE a.au_id = ta.au_id);

    SELECT au_id
      FROM authors a
      WHERE 0 <
        (SELECT COUNT(*)
           FROM title_authors ta
           WHERE a.au_id = ta.au_id);

    au_id
    -----
    A01
    A02
    A03
    A04
    A05
    A06

Figure 8.60 Each of the six statements in Listing 8.60 returns this result.

✔ Tips

■ You should compare queries against large test tables (more than 10,000 or even 100,000 rows) so that speed and memory differences will be obvious.

■ DBMSs provide tools to let you measure the efficiency of queries. Tables 8.4 and 8.5 list the commands that time queries and show their execution plans.

Table 8.4 Timing Queries

    DBMS         Command
    ----------   ----------------------------------------------
    Access       Not available
    SQL Server   SET STATISTICS TIME ON
    Oracle       SET TIMING ON
    DB2          db2batch
    MySQL        The mysql command-line utility prints
                 execution times by default.
    PostgreSQL   \timing

Table 8.5 Showing Query Execution Plans

    DBMS         Command
    ----------   ----------------------
    Access       Not available
    SQL Server   SET SHOWPLAN_TEXT ON
    Oracle       EXPLAIN PLAN
    DB2          EXPLAIN or db2expln
    MySQL        EXPLAIN
    PostgreSQL   EXPLAIN

SQL Tuning

After you learn the basics of SQL, your next step is to tune your SQL statements so that they run efficiently, which means learning about your DBMS’s optimizer. Performance tuning involves some platform-independent general principles, but the most effective tuning relies on the idiosyncrasies of the specific DBMS. Tuning is beyond the scope of this book, but the internet has plenty of discussion groups and articles; search for tuning (or performance or optimization) together with the name of your DBMS.
A good book to get started with is Peter Gulutzan and Trudy Pelzer’s SQL Performance Tuning (Addison-Wesley), which covers eight DBMSs, or Dan Tow’s SQL Tuning (O’Reilly), which covers Microsoft SQL Server, Oracle, and DB2. If you look up one of these books on Amazon.com, you can find other tuning books by clicking the Similar Items link.

9 Set Operations

Recall from Chapter 2 that set theory is fundamental to the relational model. But whereas mathematical sets are unchanging, database sets are dynamic: they grow, shrink, and otherwise change over time. This chapter covers the following SQL set operators, which combine the results of two SELECT statements into one result:

◆ UNION returns all the rows returned by both queries, with duplicates removed.

◆ INTERSECT returns all rows common to both queries (that is, all distinct rows retrieved by both queries).

◆ EXCEPT returns all rows from the first query without the rows that appear in the second query, with duplicates removed.

These set operations aren’t joins, but you can mix and chain them to combine two or more tables.

Combining Rows with UNION

A UNION operation combines the results of two queries into a single result that has the rows returned by both queries. (This operation differs from a join, which combines columns from two tables.) A UNION expression removes duplicate rows from the result; a UNION ALL expression doesn’t remove duplicates.

Unions are simple, but they have some restrictions:

◆ The SELECT-clause lists in the two queries must have the same number of columns (column names, arithmetic expressions, aggregate functions, and so on).

◆ The corresponding columns in the two queries must be listed in the same order in the two queries.

◆ The corresponding columns must have the same data type or must be implicitly convertible to the same type.

◆ If the names of corresponding columns match, that column name is used in the result.
If the corresponding column names differ, it’s up to the DBMS to determine the column name in the result. Most DBMSs take the result’s column names from the first individual query in the UNION statement. If you want to rename a column in the result, use an AS clause in the first query; see “Creating Column Aliases with AS” in Chapter 4.

◆ An ORDER BY clause can appear in only the final query in the UNION statement. The sort is applied to the final, combined result. Because the result’s column names depend on the DBMS, it’s often easiest to use relative column positions to specify the sort order; see “Sorting Rows with ORDER BY” in Chapter 4.

◆ GROUP BY and HAVING can be specified in the individual queries only; they can’t be used to affect the final result.

Listing 9.1 List the states where authors and publishers are located. See Figure 9.1 for the result.

    SELECT state FROM authors
    UNION
    SELECT state FROM publishers;

    state
    -----
    NULL
    CA
    CO
    FL
    NY

Figure 9.1 Result of Listing 9.1.

Listing 9.2 List the states where authors and publishers are located, including duplicates. See Figure 9.2 for the result.

    SELECT state FROM authors
    UNION ALL
    SELECT state FROM publishers;

    state
    -----
    NY
    CO
    CA
    CA
    NY
    CA
    FL
    NY
    CA
    NULL
    CA

Figure 9.2 Result of Listing 9.2.

To combine rows:

◆ Type:

    select_statement1
    UNION [ALL]
    select_statement2;

select_statement1 and select_statement2 are SELECT statements. The number and the order of the columns must be identical in both statements, and the data types of corresponding columns must be compatible. Duplicate rows are eliminated from the result unless ALL is specified.

Listing 9.1 lists the states where authors and publishers are located. By default, UNION removes duplicate rows from the result. See Figure 9.1 for the result.

Listing 9.2 is the same as Listing 9.1 except that it includes the ALL keyword, so all rows are included in the results, and duplicates aren’t removed.
See Figure 9.2 for the result.

Listing 9.3 lists the names of all the authors and publishers. The AS clause in the first query names the column in the result. The ORDER BY clause uses a relative column position instead of a column name to sort the result. See Figure 9.3 for the result.

Listing 9.3 List the names of all the authors and publishers. See Figure 9.3 for the result.

    SELECT au_fname || ' ' || au_lname AS "Name"
      FROM authors
    UNION
    SELECT pub_name
      FROM publishers
      ORDER BY 1 ASC;

    Name
    -------------------
    Kellsey
    Abatis Publishers
    Christian Kells
    Core Dump Books
    Hallie Hull
    Klee Hull
    Paddy O'Furniture
    Sarah Buchman
    Schadenfreude Press
    Tenterhooks Press
    Wendy Heydemark

Figure 9.3 Result of Listing 9.3.

Listing 9.4 expands on Listing 9.3 and defines the extra column Type to identify which table each row came from. The WHERE conditions retrieve the authors and publishers from New York state only. See Figure 9.4 for the result.

Listing 9.5 adds a third query to Listing 9.4 to retrieve the titles of books published in New York state also. See Figure 9.5 for the result.

Listing 9.6 is similar to Listing 9.5 except that it lists the counts of each author, publisher, and book, instead of their names. See Figure 9.6 for the result.

Listing 9.4 List the names of all the authors and publishers located in New York state, sorted by type and then by name. See Figure 9.4 for the result.

    SELECT
        'author' AS "Type",
        au_fname || ' ' || au_lname AS "Name",
        state
      FROM authors
      WHERE state = 'NY'
    UNION
    SELECT
        'publisher',
        pub_name,
        state
      FROM publishers
      WHERE state = 'NY'
      ORDER BY 1 ASC, 2 ASC;

    Type       Name               state
    ---------  -----------------  -----
    author     Christian Kells    NY
    author     Sarah Buchman      NY
    publisher  Abatis Publishers  NY

Figure 9.4 Result of Listing 9.4.
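The difference between UNION and UNION ALL in Listings 9.1 and 9.2 is easy to check on a small scale. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical cut-down copy of the authors and publishers tables (three rows each, not the full sample data), so the row counts differ from Figures 9.1 and 9.2.

```python
import sqlite3

# Hypothetical reduced sample data (assumption: only au_id/pub_id and state).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (au_id TEXT PRIMARY KEY, state TEXT);
    CREATE TABLE publishers (pub_id TEXT PRIMARY KEY, state TEXT);
    INSERT INTO authors VALUES ('A01', 'NY'), ('A02', 'CO'), ('A03', 'CA');
    INSERT INTO publishers VALUES ('P01', 'NY'), ('P02', 'CA'), ('P03', NULL);
""")

# UNION removes duplicate rows. The ORDER BY appears once, after the last
# query, and uses a relative column position.
union_rows = conn.execute("""
    SELECT state FROM authors
    UNION
    SELECT state FROM publishers
    ORDER BY 1 ASC
""").fetchall()

# UNION ALL keeps every row from both queries, duplicates included.
union_all_rows = conn.execute("""
    SELECT state FROM authors
    UNION ALL
    SELECT state FROM publishers
""").fetchall()

print(union_rows)           # four distinct values, NULL (None) among them
print(len(union_all_rows))  # six rows: three authors plus three publishers
```

Note that where the null sorts is DBMS dependent; SQLite happens to place it first in an ascending sort.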
Listing 9.5 List the names of all the authors and publishers located in New York state and the titles of books published in New York state, sorted by type and then by name. See Figure 9.5 for the result.

    SELECT
        'author' AS "Type",
        au_fname || ' ' || au_lname AS "Name"
      FROM authors
      WHERE state = 'NY'
    UNION
    SELECT
        'publisher',
        pub_name
      FROM publishers
      WHERE state = 'NY'
    UNION
    SELECT
        'title',
        title_name
      FROM titles t
      INNER JOIN publishers p
        ON t.pub_id = p.pub_id
      WHERE p.state = 'NY'
      ORDER BY 1 ASC, 2 ASC;

    Type       Name
    ---------  --------------------------
    author     Christian Kells
    author     Sarah Buchman
    publisher  Abatis Publishers
    title      1977!
    title      How About Never?
    title      Not Without My Faberge Egg
    title      Spontaneous, Not Annoying

Figure 9.5 Result of Listing 9.5.

Listing 9.6 List the counts of all the authors and publishers located in New York state and the titles of books published in New York state, sorted by type. See Figure 9.6 for the result.

    SELECT
        'author' AS "Type",
        COUNT(au_id) AS "Count"
      FROM authors
      WHERE state = 'NY'
    UNION
    SELECT
        'publisher',
        COUNT(pub_id)
      FROM publishers
      WHERE state = 'NY'
    UNION
    SELECT
        'title',
        COUNT(title_id)
      FROM titles t
      INNER JOIN publishers p
        ON t.pub_id = p.pub_id
      WHERE p.state = 'NY'
      ORDER BY 1 ASC;

    Type       Count
    ---------  -----
    author     2
    publisher  1
    title      4

Figure 9.6 Result of Listing 9.6.

In Listing 9.7, I revisit Listing 5.30 in “Evaluating Conditional Values with CASE” in Chapter 5. But instead of using CASE to change book prices and simulate if-then logic, I use multiple UNION queries. See Figure 9.7 for the result.

Listing 9.7 Raise the price of history books by 10 percent and psychology books by 20 percent, and leave the prices of other books unchanged. See Figure 9.7 for the result.
    SELECT title_id, type, price,
        price * 1.10 AS "New price"
      FROM titles
      WHERE type = 'history'
    UNION
    SELECT title_id, type, price, price * 1.20
      FROM titles
      WHERE type = 'psychology'
    UNION
    SELECT title_id, type, price, price
      FROM titles
      WHERE type NOT IN ('psychology', 'history')
      ORDER BY type ASC, title_id ASC;

    title_id  type        price  New price
    --------  ----------  -----  ---------
    T06       biography   19.95  19.95
    T07       biography   23.95  23.95
    T10       biography   NULL   NULL
    T12       biography   12.99  12.99
    T08       children    10.00  10.00
    T09       children    13.95  13.95
    T03       computer    39.95  39.95
    T01       history     21.99  24.19
    T02       history     19.95  21.95
    T13       history     29.99  32.99
    T04       psychology  12.99  15.59
    T05       psychology   6.95   8.34
    T11       psychology   7.99   9.59

Figure 9.7 Result of Listing 9.7.

UNION Commutativity

In theory, the order in which the SELECT statements (tables) occur in a UNION should make no speed difference. But in practice your DBMS might run

    small_table1
    UNION
    small_table2
    UNION
    big_table;

faster than

    small_table1
    UNION
    big_table
    UNION
    small_table2;

because of the way the optimizer merges intermediate results and removes duplicate rows. The results are DBMS dependent. Experiment.

✔ Tips

■ UNION is a commutative operation: A UNION B is the same as B UNION A.

■ The SQL standard gives INTERSECT higher precedence than UNION and EXCEPT, but your DBMS might use a different order. Use parentheses to specify order of evaluation in queries with mixed set operators; see “Determining the Order of Evaluation” in Chapter 5.

■ Don’t use UNION where a compound condition will suffice. Because a UNION returns the rows that satisfy either query, the equivalent compound condition joins the two tests with OR:

    SELECT DISTINCT *
      FROM mytable
      WHERE col1 = 1 OR col2 = 2;

usually is faster than

    SELECT * FROM mytable
      WHERE col1 = 1
    UNION
    SELECT * FROM mytable
      WHERE col2 = 2;

■ If you mix UNION and UNION ALL in a single statement, use parentheses to specify order of evaluation.
Take these two statements, for example:

    SELECT * FROM table1
    UNION ALL
    (SELECT * FROM table2
     UNION
     SELECT * FROM table3);

and:

    (SELECT * FROM table1
     UNION ALL
     SELECT * FROM table2)
    UNION
    SELECT * FROM table3;

The first statement eliminates duplicates in the union of table2 and table3 but doesn’t eliminate duplicates in the union of that result and table1. The second statement includes duplicates in the union of table1 and table2 but eliminates duplicates in the subsequent union with table3, so ALL has no effect on the final result of this statement.

■ For UNION operations, the DBMS performs an internal sort to identify and remove duplicate rows; hence, the result of a UNION might be sorted even if you don’t specify an ORDER BY clause. UNION ALL doesn’t sort because it doesn’t need to remove duplicates. Sorting is computationally expensive, so don’t use UNION when UNION ALL will suffice.

■ In Microsoft Access and Microsoft SQL Server, use + to concatenate strings (see “Concatenating Strings with ||” in Chapter 5). To run Listings 9.3 through 9.5, change the concatenation expressions to:

    au_fname + ' ' + au_lname

In MySQL, use CONCAT() to concatenate strings (see “Concatenating Strings with ||” in Chapter 5). To run Listings 9.3 through 9.5, change the concatenation expressions to:

    CONCAT(au_fname, ' ', au_lname)

In older PostgreSQL versions, convert the floating-point numbers in Listing 9.7 to DECIMAL; see “Converting Data Types with CAST()” in Chapter 5. To run Listing 9.7, change new-price calculations to:

    price * CAST((1.10) AS DECIMAL)
    price * CAST((1.20) AS DECIMAL)
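The tip above about mixing UNION and UNION ALL can be tried out with three tiny tables. This is a rough sketch under several assumptions: it uses Python's built-in sqlite3 module, the single-column tables and their values are made up for illustration, and, because SQLite doesn't accept parentheses around the operands of a set operator, each parenthesized group is written as a subquery in a FROM clause instead.

```python
import sqlite3

# Hypothetical one-column tables with overlapping values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER);
    CREATE TABLE table2 (x INTEGER);
    CREATE TABLE table3 (x INTEGER);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES (2), (3);
    INSERT INTO table3 VALUES (3), (4);
""")

# Equivalent of: table1 UNION ALL (table2 UNION table3).
# Duplicates are removed inside the group but kept in the outer UNION ALL.
first = conn.execute("""
    SELECT x FROM table1
    UNION ALL
    SELECT x FROM (SELECT x FROM table2 UNION SELECT x FROM table3)
    ORDER BY 1
""").fetchall()

# Equivalent of: (table1 UNION ALL table2) UNION table3.
# The final UNION removes every duplicate, so ALL has no visible effect.
second = conn.execute("""
    SELECT x FROM (SELECT x FROM table1 UNION ALL SELECT x FROM table2)
    UNION
    SELECT x FROM table3
    ORDER BY 1
""").fetchall()

print(first)   # the value 2 appears twice
print(second)  # each value appears once
```

On a DBMS that supports the parenthesized form, the statements in the tip can be run as written; the subquery rewrite is only a workaround for this sketch.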