Nielsen c11.tex V4 - 07/23/2009 1:54pm Page 272 Part II Manipulating Data With Select

The result includes P.Code, but not the name or description. Of course, it’s possible to simply group by every column to be returned, but that’s sloppy. The following query performs the aggregate summation in a subquery that is then joined with the Product table so that every column is available without additional work:

    SELECT P.Code, P.ProductName, Sales.QuantitySold
      FROM dbo.Product AS P
        JOIN (SELECT ProductID, SUM(Quantity) AS QuantitySold
                FROM dbo.OrderDetail
                GROUP BY ProductID) AS Sales
          ON P.ProductID = Sales.ProductID
      ORDER BY P.Code;

If you use SQL Server Management Studio’s Query Designer, a derived table may be added to the query. Figure 11-1 illustrates the previous query being constructed using the GUI tool.

FIGURE 11-1 Derived tables may be included within Query Designer by using the context menu and selecting Add Derived Table.

Including Data with Subqueries and CTEs 11

The query is fast and efficient, it provides the required aggregate data, and all the product columns can be added to the output columns.
The result is as follows:

    Code  ProductName    QuantitySold
    1002  Dragon Flight         47.00
    1003  Sky Dancer             5.00
    1004  Rocket Kite            2.00
    1012  Falcon F-16            5.00

Another example of using a derived table to solve a problem answers the question "How many children has each mother borne?" from the Family sample database:

    USE Family;

    SELECT p.PersonID, p.FirstName, p.LastName, c.Children
      FROM dbo.Person AS p
        JOIN (SELECT m.MotherID, COUNT(*) AS Children
                FROM dbo.Person AS m
                WHERE m.MotherID IS NOT NULL
                GROUP BY m.MotherID) AS c
          ON p.PersonID = c.MotherID
      ORDER BY c.Children DESC;

The subquery performs the aggregate count, and its columns are joined with the Person table to present the final results, as follows:

    PersonID  FirstName  LastName  Children
    6         Audry      Halloway  4
    8         Melanie    Campbell  3
    12        Alysia     Halloway  3
    20        Grace      Halloway  2

Row constructors

New for SQL Server 2008, row constructors provide a convenient way to supply hard-coded values directly in a subquery. The VALUES clause is wrapped in parentheses, as is every hard-coded row. It requires an alias and a column alias list, also in parentheses. Row constructors can be used in the FROM clause and joined just like any other type of data source.

The next query creates a row constructor data source called MyRowConstructor with two columns, a and b:

    SELECT a, b
      FROM (VALUES (1, 2), (3, 4), (5, 6), (7, 8), (9, 10)
           ) AS MyRowConstructor(a, b);

Result:

    a  b
    1  2
    3  4
    5  6
    7  8
    9  10

All, some, and any

Though not as popular as IN, three other options are worth examining when using a subquery in a WHERE clause. Each provides a twist on how items in the subquery are matched with the WHERE clause’s test value. With ALL, the comparison must be true for every value the subquery returns. With SOME and ANY, which are equivalent keywords, it must be true for at least one of the values in the subquery.

The next query demonstrates a simple ALL subquery.
In this case, the SELECT returns 'True' if 1 is less than every value in the subquery:

    SELECT 'True' AS 'AllTest'
      WHERE 1 < ALL
        (SELECT a
           FROM (VALUES (2), (3), (5), (7), (9)
                ) AS ValuesTable(a)
        );

Result:

    AllTest
    True

Be very careful with the ALL condition if the subquery might return a null. A null value in the subquery results keeps ALL from returning true: the comparison against the null evaluates to unknown, and it’s impossible to prove that the test is true for every value in the subquery if one of those values is unknown. In this query, the last value is changed from a 9 to null and the query no longer returns true:

    SELECT 'True' AS 'AllTest'
      WHERE 1 < ALL
        (SELECT a
           FROM (VALUES (2), (3), (5), (7), (NULL)
                ) AS ValuesTable(a)
        );

Result (empty result set):

    AllTest

The SOME and ANY conditional tests return true if the condition is met for any value in the subquery result set. For example:

    SELECT 'True' AS 'SomeTest'
      WHERE 5 = SOME
        (SELECT a
           FROM (VALUES (2), (3), (5), (7), (9)
                ) AS MyTable(a)
        );

Result:

    SomeTest
    True

The ANY and SOME conditions are similar to the IN condition. In fact, = ANY and = SOME are exactly like IN. ANY and SOME have the extra capability of testing with other comparison operators, such as <, <=, >, >=, and <>.

Correlated Subqueries

Correlated subqueries sound impressive, and they are. They are used in the same ways that simple subqueries are used, the difference being that correlated subqueries reference columns in the outer query. They do this by referencing the name or alias of a table in the outer query. This capability to limit the subquery by the outer query makes these queries powerful and flexible. Because correlated subqueries can reference the outer query, they are especially useful for complex WHERE conditions.
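As a minimal sketch of the alias reference just described, the following batch returns each customer's largest order. The @Orders table variable and its rows are hypothetical and are not part of any sample database used in this chapter. The inner query refers to O.CustomerID, a column of the outer query's alias O, and that reference is what makes the subquery correlated.

```sql
-- Hypothetical data for illustration only; @Orders is an assumption,
-- not a table from the book's sample databases.
DECLARE @Orders TABLE (OrderID INT, CustomerID INT, Amount MONEY);

INSERT INTO @Orders (OrderID, CustomerID, Amount)
  VALUES (101, 1, 50.00), (102, 1, 75.00),
         (201, 2, 20.00), (202, 2, 10.00);

-- The subquery references O.CustomerID from the outer query,
-- so it is evaluated in the context of each outer row.
SELECT O.CustomerID, O.OrderID, O.Amount
  FROM @Orders AS O
  WHERE O.Amount = (SELECT MAX(I.Amount)
                      FROM @Orders AS I
                      WHERE I.CustomerID = O.CustomerID)
  ORDER BY O.CustomerID;
```

With this data, the query returns order 102 for customer 1 and order 201 for customer 2. The whole batch must be run together, because a table variable exists only for the duration of the batch that declares it.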
Correlating in the WHERE clause

The capability to reference the outer query also means that correlated subqueries won’t run by themselves, because the reference to the outer query would cause the query to fail.

The logical execution order is as follows:

1. The outer query is executed once.
2. The subquery is executed once for every row in the outer query, substituting the values from the outer query into each execution of the subquery.
3. The subquery’s results are integrated into the result set.

If the outer query returns 100 rows, then SQL Server will execute the logical equivalent of 101 queries: one for the outer query, and one subquery for every row returned by the outer query. In practice, the SQL Server Query Optimizer will likely figure out a way to perform the correlated subquery without actually performing the 101 queries. In fact, I’ve sometimes seen correlated subqueries outperform other query plans. If they solve your problem, then don’t avoid them for performance reasons.

To explore correlated subqueries, the next few queries, based on the Outer Banks Adventures sample database, use them to compare the locations of customers and tour base camps.
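Before turning to the sample database, the logical execution order listed above can be walked through on a very small scale. The @Dept and @Emp table variables and their contents are hypothetical. The second statement only mimics by hand what the subquery logically looks like for one outer row. It is a sketch of the logical model, not a description of what the Query Optimizer actually does.

```sql
-- Hypothetical data for illustration only.
DECLARE @Dept TABLE (DeptID INT, DeptName VARCHAR(20));
DECLARE @Emp  TABLE (EmpID INT, DeptID INT);

INSERT INTO @Dept (DeptID, DeptName)
  VALUES (1, 'Sales'), (2, 'Support'), (3, 'Legal');
INSERT INTO @Emp (EmpID, DeptID)
  VALUES (10, 1), (11, 1), (12, 2);

-- Correlated form: the subquery is logically evaluated once per @Dept row,
-- with that row's D.DeptID substituted into its WHERE clause.
SELECT D.DeptName
  FROM @Dept AS D
  WHERE EXISTS (SELECT *
                  FROM @Emp AS E
                  WHERE E.DeptID = D.DeptID)
  ORDER BY D.DeptID;

-- Hand-written equivalent of the subquery for the outer row where DeptID = 3.
-- It returns no rows, which is why Legal is rejected by the query above.
-- Running the subquery text on its own, with D.DeptID left in place, fails
-- because D is not defined outside the outer query.
SELECT *
  FROM @Emp AS E
  WHERE E.DeptID = 3;
```

With this data, the first query returns Sales and Support: three logical subquery evaluations, one per department row, of which two find at least one employee.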
First, the following data-modification queries set up the data:

    USE CHA2;

    UPDATE dbo.BaseCamp SET Region = 'NC' WHERE BaseCampID = 1;
    UPDATE dbo.BaseCamp SET Region = 'NC' WHERE BaseCampID = 2;
    UPDATE dbo.BaseCamp SET Region = 'BA' WHERE BaseCampID = 3;
    UPDATE dbo.BaseCamp SET Region = 'FL' WHERE BaseCampID = 4;
    UPDATE dbo.BaseCamp SET Region = 'WV' WHERE BaseCampID = 5;

    UPDATE dbo.Customer SET Region = 'ND' WHERE CustomerID = 1;
    UPDATE dbo.Customer SET Region = 'NC' WHERE CustomerID = 2;
    UPDATE dbo.Customer SET Region = 'NJ' WHERE CustomerID = 3;
    UPDATE dbo.Customer SET Region = 'NE' WHERE CustomerID = 4;
    UPDATE dbo.Customer SET Region = 'ND' WHERE CustomerID = 5;
    UPDATE dbo.Customer SET Region = 'NC' WHERE CustomerID = 6;
    UPDATE dbo.Customer SET Region = 'NC' WHERE CustomerID = 7;
    UPDATE dbo.Customer SET Region = 'BA' WHERE CustomerID = 8;
    UPDATE dbo.Customer SET Region = 'NC' WHERE CustomerID = 9;
    UPDATE dbo.Customer SET Region = 'FL' WHERE CustomerID = 10;

This sample set of data produces the following matrix between customer locations and base-camp locations:

    SELECT DISTINCT c.Region, b.Region
      FROM dbo.Customer AS c
        INNER JOIN dbo.Event_mm_Customer AS ec
          ON c.CustomerID = ec.CustomerID
        INNER JOIN dbo.Event AS e
          ON ec.EventID = e.EventID
        INNER JOIN dbo.Tour AS t
          ON e.TourID = t.TourID
        INNER JOIN dbo.BaseCamp AS b
          ON t.BaseCampID = b.BaseCampID
      WHERE c.Region IS NOT NULL
      ORDER BY c.Region, b.Region;

Result:

    Customer  BaseCamp
    Region    Region
    BA        BA
    BA        FL
    BA        NC
    FL        FL
    FL        NC
    FL        WV
    NC        BA
    NC        FL
    NC        NC
    NC        WV
    ND        BA
    ND        FL
    ND        NC
    NE        FL
    NE        WV
    NJ        FL
    NJ        NC
    NJ        WV

With this data foundation, the first query asks, "Who lives in the same region as one of our base camps?" The query uses a correlated subquery to locate base camps that share the same Region as the customer.
The subquery is executed for every row in the Customer table, using the outer query’s named range, C, to reference the outer query. If a BaseCamp match exists for that row, then the EXISTS condition is true and the row is accepted into the result set:

    SELECT C.FirstName, C.LastName, C.Region
      FROM dbo.Customer AS C
      WHERE EXISTS
        (SELECT *
           FROM dbo.BaseCamp AS B
           WHERE B.Region = C.Region)
      ORDER BY C.LastName, C.FirstName;

The same query written with joins requires a DISTINCT predicate to eliminate duplicate rows, usually resulting in worse performance. However, it can refer to columns in every referenced table, something a correlated subquery within a WHERE EXISTS can’t do:

    SELECT DISTINCT C.FirstName, C.LastName, C.Region
      FROM dbo.Customer AS C
        INNER JOIN dbo.BaseCamp AS B
          ON C.Region = B.Region
      ORDER BY C.LastName, C.FirstName;

Result:

    FirstName  LastName  Region
    Melissa    Anderson  NC
    Lauren     Davis     NC
    Wilson     Davis     NC
    Jane       Doe       BA
    John       Frank     NC
    Francis    Franklin  FL

A more complicated comparison asks, "Who has gone on a tour in his or her home region?" The answer lies in the Event_mm_Customer table, a resolution (or junction) table between the Event and Customer tables that stores the logical many-to-many relationship between customers and events (multiple customers may attend a single event, and a single customer may attend multiple events). The Event_mm_Customer table may be thought of as analogous to a customer’s ticket to an event.

The outer query logically runs through every Event_mm_Customer row to determine whether there EXISTS any result from the correlated subquery. The subquery is filtered by the current EventID and customer Region from the outer query. In an informal way of thinking, the query checks every ticket and creates a list of events in a customer’s home region that the customer has attended.
If anything is in the list, then the WHERE EXISTS condition is true for that row. If the list is empty, WHERE EXISTS is not satisfied and the customer row in question is eliminated from the result set:

    USE CHA2;

    SELECT DISTINCT C.FirstName, C.LastName, C.Region AS Home
      FROM dbo.Customer AS C
        INNER JOIN dbo.Event_mm_Customer AS EC
          ON C.CustomerID = EC.CustomerID
      WHERE C.Region IS NOT NULL
        AND EXISTS
          (SELECT *
             FROM dbo.Event AS E
               INNER JOIN dbo.Tour AS T
                 ON E.TourID = T.TourID
               INNER JOIN dbo.BaseCamp AS B
                 ON T.BaseCampID = B.BaseCampID
             WHERE B.Region = C.Region
               AND E.EventID = EC.EventID)
      ORDER BY C.LastName;

Result:

    FirstName  LastName  Home
    Melissa    Anderson  NC
    Lauren     Davis     NC
    Jane       Doe       BA
    John       Frank     NC
    Francis    Franklin  FL

The same query can be written using joins. Although it might be easier to read, the following query took 131 milliseconds, compared to only 80 milliseconds taken by the preceding correlated subquery:

    SELECT DISTINCT C.FirstName, C.LastName, C.Region AS Home,
        T.TourName, B.Region
      FROM dbo.Customer AS C
        INNER JOIN dbo.Event_mm_Customer AS EC
          ON C.CustomerID = EC.CustomerID
        INNER JOIN dbo.Event AS E
          ON EC.EventID = E.EventID
        INNER JOIN dbo.Tour AS T
          ON E.TourID = T.TourID
        INNER JOIN dbo.BaseCamp AS B
          ON T.BaseCampID = B.BaseCampID
      WHERE C.Region = B.Region
        AND C.Region IS NOT NULL
      ORDER BY C.LastName;

The join query has the advantage of including the columns from the Tour table without having to explicitly return them from the subquery. The join also lists Lauren and John twice, once for each in-region tour (and yes, the Amazon Trek tour is based out of Ft.
Lauderdale):

    FirstName  LastName  Home  TourName                 Region
    Melissa    Anderson  NC    Outer Banks Lighthouses  NC
    Lauren     Davis     NC    Appalachian Trail        NC
    Lauren     Davis     NC    Outer Banks Lighthouses  NC
    Jane       Doe       BA    Bahamas Dive             BA
    John       Frank     NC    Appalachian Trail        NC
    John       Frank     NC    Outer Banks Lighthouses  NC
    Francis    Franklin  FL    Amazon Trek              FL

Although correlated subqueries can be mind-bending, the flexibility and potential performance gains are worth it. Make sure that the correlated subquery returns the correct answer.

Correlating a derived table using APPLY

A subquery used as a derived table can reference the outer query, which makes it a correlated subquery. This technique leverages the previous correlated subquery method by using the correlation in the WHERE clause of the derived table subquery.

Subqueries used as derived tables aren’t allowed to reference the outer query if they are included in the outer query with a JOIN. However, the CROSS APPLY or OUTER APPLY method of including the subquery allows passing data to the subquery.

First, to set up some sample data:

    USE tempdb;

    CREATE TABLE TableA (ID INT);
    INSERT INTO TableA VALUES (1);
    INSERT INTO TableA VALUES (2);

    CREATE TABLE TableB (ID INT);
    INSERT INTO TableB VALUES (1);
    INSERT INTO TableB VALUES (3);

The following query uses a CROSS APPLY to pass every row from the outer query to the derived table subquery. The subquery then filters its rows to those that match IDs. The CROSS APPLY returns every row from the outer query that had a match in the subquery. Functionally, it’s equivalent to an inner join between TableA and TableB:

    SELECT B.ID AS Bid, A.ID AS Aid
      FROM TableB AS B
        CROSS APPLY
          (SELECT ID
             FROM TableA
             WHERE TableA.ID = B.ID) AS A;

Result:

    Bid  Aid
    1    1

The next query uses the same correlated derived table subquery, but changes to an OUTER APPLY to include all rows from the outer query.
This query is the same as a left outer join from TableB to TableA:

    SELECT B.ID AS Bid, A.ID AS Aid
      FROM TableB AS B
        OUTER APPLY
          (SELECT ID
             FROM TableA
             WHERE TableA.ID = B.ID) AS A;

Result:

    Bid  Aid
    1    1
    3    NULL

Relational Division

Recall that a cross join, discussed in the previous chapter, is relational multiplication: two data sets are multiplied to create a Cartesian product. In theory, all joins are cross joins with some type of conditional restriction. Even an inner join is the relational-multiplication product of two tables restricted to those results that match keys.

Relational division complements relational multiplication just as basic math division complements multiplication. If the purpose of relational multiplication is to produce a product set from two multiplier sets, then the purpose of relational division is to divide one data set (the dividend data set) by another data set (the divisor data set) to find the quotient data set, as shown in Figure 11-2. In other words, if the Cartesian product is known, and one of the multiplier data sets is known, then relational division can deduce the missing multiplier set.

FIGURE 11-2 Relational division is the inverse of relational multiplication, deducing the quotient set by dividing the dividend set by the divisor set.

[Figure: two panels. Set Multiplication: Factor, Factor, Cartesian Product. Set Division: Dividend, Divisor, Quotient.]

While this may sound academic, relational division can be very practical. The classic example of relational division answers the question "Which students have passed every required course?" An exact