CHAPTER 43 Transact-SQL Programming Guidelines, Tips, and Tricks

10      Clutch             3             2
16      Gear Box           3             2
5       Radiator           2             2
6       Intake Manifold    2             2
7       Exhaust Manifold   2             2
8       Carburetor         2             2
13      Piston             2             2
14      Crankshaft         2             2

In Listing 43.39, the filter WHERE lvl < 2 in the recursive member is used as a recursion termination check; recursion stops when lvl = 2. The filter on the outer query (WHERE lvl = 2) removes all parts above the second level. Logically, the filter in the outer query (lvl = 2) is sufficient by itself to return only the desired rows, but for performance reasons, you should include the filter in the recursive member to stop the recursion as soon as two levels below the drivetrain are returned.

SQL Server allows the use of local variables in a CTE to help make the query more generic. For example, you can use variables instead of constants for the part ID and level, as shown in Listing 43.40.

LISTING 43.40 Using Local Variables in a Recursive CTE

    DECLARE @partid AS INT, @lvl AS INT;
    SET @partid = 22; -- Car
    SET @lvl = 2;     -- two levels

    WITH PartsCTE(partid, partname, parentpartid, lvl)
    AS
    (
        SELECT partid, partname, parentpartid, 0
        FROM PARTS
        WHERE partid = @partid
        UNION ALL
        SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1
        FROM Parts as P
        JOIN PartsCTE as PP
            ON P.parentpartid = PP.Partid
        WHERE lvl < @lvl
    )
    SELECT PartID, Partname, ParentPartid, lvl
    FROM PartsCTE
    Go

    PartID  Partname       ParentPartid  lvl
    22      Car            NULL          0
    1       DriveTrain     22            1
    23      Body           22            1
    24      Frame          22            1
    2       Engine         1             2
    3       Transmission   1             2
    4       Axle           1             2
    12      Drive Shaft    1             2

You can also use recursive CTEs to perform aggregations, such as counting the total number of subparts that make up each parent part, as shown in Listing 43.41.
LISTING 43.41 Performing Aggregation with a Recursive CTE

    WITH PartsCTE(parentpartid, lvl)
    AS
    (
        SELECT parentpartid, 0
        FROM PARTS
        WHERE parentpartid is not null
        UNION ALL
        SELECT P.parentpartid, lvl+1
        FROM Parts as P
        JOIN PartsCTE as PP
            ON PP.parentpartid = P.Partid
        WHERE P.parentpartid is not null
    )
    SELECT C.parentpartid, P.PartName, COUNT(*) AS cnt
    FROM PartsCTE C
    JOIN PArts P on C.ParentPartID = P.PartID
    GROUP BY C.parentpartid, P.PArtName
    go

    parentpartid  PartName       cnt
    1             DriveTrain     20
    2             Engine         8
    3             Transmission   8
    8             Carburetor     1
    13            Piston         1
    16            Gear Box       5
    22            Car            23

In the example in Listing 43.41, the anchor member returns one row containing the parentpartid for each part. It filters out rows with a NULL parentpartid because a NULL marks the top of the hierarchy, which has no parent part. The recursive member returns the parentpartid of each parent of the previously returned parts, again excluding any NULL values. Eventually, the CTE contains, for each part, as many occurrences as that part has direct or indirect subparts. The outer query is then left with the tasks of grouping the results by parentpartid and returning the count of occurrences. A join to Parts is included to get the corresponding partname for each parent part, which makes the results more meaningful.

Suppose you want to generate a report that is a bit more readable, with the subparts sorted and indented according to hierarchical dependencies. Listing 43.42 provides a way you could accomplish this.
LISTING 43.42 Generating a Formatted Report with a Recursive CTE

    WITH PartsCTE(partid, partname, parentpartid, lvl, sortcol)
    AS
    (
        SELECT partid, partname, parentpartid, 0,
               cast(partid as varbinary(max))
        FROM Parts
        WHERE partid = 22
        UNION ALL
        SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1,
               CAST(sortcol + CAST(P.partid AS BINARY(4)) AS VARBINARY(max))
        FROM Parts AS P
        JOIN PartsCTE AS PP
            ON P.parentpartID = PP.PartID
    )
    SELECT REPLICATE(' ', lvl) + right('>',lvl) + partname AS partname
    FROM PArtsCTE
    order by sortcol
    go

    partname
    Car
     >DriveTrain
      >Engine
       >Radiator
       >Intake Manifold
       >Exhaust Manifold
       >Carburetor
        >Float Valve
       >Piston
        >Piston Rings
       >Crankshaft
      >Transmission
       >Flywheel
       >Clutch
       >Gear Box
        >Reverse Gear
        >First Gear
        >Second Gear
        >Third Gear
        >Fourth Gear
      >Axle
      >Drive Shaft
     >Body
     >Frame

In this example, you use a varbinary string as the sortcol to sort subparts according to the partid value. The anchor member is the starting point, generating a binary value for the partid of the root part. In each iteration, the recursive member appends the current part ID, converted to a binary value, to the sortcol of its parent part. The outer query then sorts the result by sortcol, which groups the subparts under each immediate parent part.

Setting the MAXRECURSION Option

To help avoid infinite recursion in CTEs, SQL Server, by default, sets a MAXRECURSION value of 100. If a recursive CTE attempts to perform more than 100 recursions, it is aborted, with the following error message:

    Msg 530, Level 16, State 1, Line 1
    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

You can override the default MAXRECURSION setting by using the OPTION(MAXRECURSION value) query hint to force termination of the query after a specific number of recursive iterations have been invoked. The value can be between 0 and 32,767; a value of 0 removes the limit. Listing 43.43 shows an example.
LISTING 43.43 Controlling the Number of Recursions with MAXRECURSION

    WITH PartsCTE(partid, partname, parentpartid, lvl)
    AS
    (
        SELECT partid, partname, parentpartid, 0
        FROM PARTS
        WHERE partid = 22 -- Car
        UNION ALL
        SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1
        FROM Parts as P
        JOIN PartsCTE as PP
            ON P.partid = PP.Partid
    )
    SELECT PartID, Partname, ParentPartid, lvl
    FROM PartsCTE
    OPTION (MAXRECURSION 10)
    go

    Msg 530, Level 16, State 1, Line 2
    The statement terminated. The maximum recursion 10 has been exhausted before statement completion.

Note that the recursive member in this listing joins each part to itself (P.partid = PP.Partid), so the recursion never ends on its own and the MAXRECURSION limit is what stops it.

Keep in mind that if you use MAXRECURSION to control the number of levels of recursion in a CTE, your application receives the error message. It is not considered good programming practice to use code that returns errors in valid situations. Certain applications may discard query results if an error message is received. Instead, it is recommended that you use the level counter to limit recursion, as shown earlier in this chapter, in Listing 43.39. Reserve the MAXRECURSION hint for use as a safeguard against infinite loops caused by bad data or coding mistakes.

Ranking Functions

SQL Server 2005 introduced four new ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. These functions allow you to analyze data and provide ranking values to result rows of a query. For example, you might use these ranking functions for assigning sequential integer row IDs to result rows or for presentation, paging, or scoring purposes.

All four ranking functions follow a similar syntax pattern:

    function_name() OVER(
        [PARTITION BY partition_by_list]
        ORDER BY order_by_list)

The ROW_NUMBER Function

The ROW_NUMBER function allows you to provide sequential integer values to the result rows of a query, based on the order of the rows in the result. You specify that order in the OVER clause, which must include an ORDER BY clause.
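As a minimal sketch of the paging use mentioned above, the following query numbers the rows of the titles table and returns only one page of them. It assumes that titles has title_id and title columns (the text only confirms pub_id and type), and the @page and @pagesize variables are hypothetical names introduced for illustration.

```sql
-- Paging sketch. Assumptions: titles has title_id and title columns;
-- @page and @pagesize are illustrative names, not part of the original text.
DECLARE @page AS INT, @pagesize AS INT;
SET @page = 2;       -- which page to return (1-based)
SET @pagesize = 10;  -- rows per page

WITH NumberedTitles
AS
(
    -- title_id is assumed to be unique, so the numbering is deterministic
    SELECT title_id, title,
           ROW_NUMBER() OVER (ORDER BY title_id) AS rownum
    FROM titles
)
SELECT title_id, title, rownum
FROM NumberedTitles
WHERE rownum BETWEEN (@page - 1) * @pagesize + 1
                 AND @page * @pagesize
ORDER BY rownum;
```

The row number is computed inside a CTE because a ranking function cannot be referenced directly in a WHERE clause; the outer query filters on the computed column instead.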
The ROW_NUMBER function has been a feature long desired by SQL Server developers. For example, suppose you want to return the publishers and total number of titles per publisher and list the result rows, in descending order, with a numeric score assigned to each row. The query shown in Listing 43.44 generates the desired results by using the ROW_NUMBER function, specifying ordering over the num_titles column, in descending order.

LISTING 43.44 Using ROW_NUMBER to Rank Publishers by Number of Titles

    select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles,
        ROW_NUMBER () OVER (order by count(*) DESC) as Rank
    from publishers p
    join titles t
        on p.pub_id = t.pub_id
    group by p.pub_id, p.pub_name
    order by count(*) desc
    go

    pub_id  pub_name                       num_titles  Rank
    9911    Jones Jones and Johnson        44          1
    9904    Strawberry Publications        34          2
    9907    Incandescent Imprints          33          3
    9905    Gooseberry Titles              32          4
    9909    North American Press           30          5
    9912    Landlocked Books               30          6
    9913    Blackberry’s                   28          7
    9914    Normanskill Printing Company   28          8
    9910    Sidney’s Books and More        28          9
    9906    Tomato Books                   28          10
    9903    Kumquat Technical Publishing   28          11

In this example, the publisher with the highest number of titles got row number 1, and the publisher with the tenth-highest number of titles got row number 10. The ROW_NUMBER function always generates a distinct row number for each row, according to the requested sort.

If the ORDER BY list specified within the OVER() option is not on a unique key, the ordering of the row numbers is nondeterministic. Publishers that have the same number of titles are each assigned a different row number, and the sequence of the row numbers assigned to those publishers could differ between invocations of the query. In the results for Listing 43.44, for example, five different publishers have the same number of titles (28).
Because SQL Server has to assign different row numbers to the different publishers, you should assume that the row numbers were assigned in arbitrary order among those publishers.

To ensure that the result is always deterministic, specify a unique ORDER BY list. For example, adding pub_id to the ORDER BY list ensures that in the case of a tie between publishers, the lowest pub_id is always assigned the lower row number, as shown in Listing 43.45.

LISTING 43.45 Using a Unique ORDER BY List for Deterministic ROW_NUMBER Results

    select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles,
        ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank
    from publishers p
    join titles t
        on p.pub_id = t.pub_id
    group by p.pub_id, p.pub_name
    order by count(*) desc
    go

    pub_id  pub_name                       num_titles  Rank
    9911    Jones Jones and Johnson        44          1
    9904    Strawberry Publications        34          2
    9907    Incandescent Imprints          33          3
    9905    Gooseberry Titles              32          4
    9909    North American Press           30          5
    9912    Landlocked Books               30          6
    9903    Kumquat Technical Publishing   28          7
    9906    Tomato Books                   28          8
    9910    Sidney’s Books and More        28          9
    9913    Blackberry’s                   28          10
    9914    Normanskill Printing Company   28          11

In the previous two examples, the sequence of row numbers is generated across the entire result set as one group. By using the PARTITION BY clause, you can instead have ranking values calculated independently within groups of rows.

Partitioning by ROW_NUMBER()

PARTITION BY allows you to specify a list of expressions that identify the groups of rows for which ranking values should be calculated independently. For example, the query in Listing 43.46 assigns row numbers within each type of book separately, in num_titles and pub_id order.
LISTING 43.46 Using PARTITION BY to Rank Rows Within Groups

    select top 20 WITH TIES p.pub_id, pub_name, type, count(*) as num_titles,
        ROW_NUMBER () OVER (partition by type
                            order by count(*) DESC, p.pub_id) as Rank
    from publishers p
    join titles t
        on p.pub_id = t.pub_id
    group by p.pub_id, p.pub_name, type
    order by type, count(*) desc
    go

    pub_id  pub_name                       type       num_titles  Rank
    9906    Tomato Books                   biography  4           1
    9911    Jones Jones and Johnson        biography  4           2
    9905    Gooseberry Titles              biography  2           3
    9900    Boysenberry Books              biography  1           4
    9903    Kumquat Technical Publishing   biography  1           5
    9904    Strawberry Publications        biography  1           6
    9909    North American Press           biography  1           7
    9913    Blackberry’s                   biography  1           8
    9914    Normanskill Printing Company   biography  1           9
    9916    Nordome Titles                 biography  1           10
    9918    Significant Titles Company     biography  1           11
    1389    Algodata Infosystems           business   3           1
    0736    New Moon Books                 business   1           2
    9911    Jones Jones and Johnson        children   21          1
    9914    Normanskill Printing Company   children   13          2
    9905    Gooseberry Titles              children   12          3
    9901    GGG&G                          children   11          4
    9903    Kumquat Technical Publishing   children   11          5
    9915    Beanplant General              children   9           6
    9900    Boysenberry Books              children   8           7
    9913    Blackberry’s                   children   8           8

The RANK and DENSE_RANK Functions

The RANK and DENSE_RANK functions are similar to the ROW_NUMBER function in the sense that they also provide ranking values according to a specified sort. The difference is that rather than assign a unique ranking value to each row, RANK and DENSE_RANK assign the same ranking value to rows with the same values in the specified sort columns when the ORDER BY list is not unique.

The difference between RANK and DENSE_RANK is that with the DENSE_RANK function, there are no gaps in the ranking. After a tie, RANK skips as many ranking values as there are additional tied rows, whereas DENSE_RANK continues with the next consecutive value.
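The gap behavior can be seen in isolation with a small query that needs no sample database. This is a minimal sketch: the Scores CTE and its five rows are hypothetical data made up for illustration.

```sql
-- Self-contained comparison of the three functions.
-- Scores and its rows are hypothetical, illustrative data.
WITH Scores(pub_name, num_titles)
AS
(
    SELECT 'Pub A', 30 UNION ALL
    SELECT 'Pub B', 30 UNION ALL
    SELECT 'Pub C', 28 UNION ALL
    SELECT 'Pub D', 28 UNION ALL
    SELECT 'Pub E', 25
)
SELECT pub_name, num_titles,
       -- pub_name is added as a tiebreaker so the row numbers are deterministic
       ROW_NUMBER() OVER (ORDER BY num_titles DESC, pub_name) AS row_num,
       RANK()       OVER (ORDER BY num_titles DESC) AS rank_val,
       DENSE_RANK() OVER (ORDER BY num_titles DESC) AS dense_rank_val
FROM Scores
ORDER BY num_titles DESC, pub_name;
```

With this data, row_num runs from 1 through 5, rank_val is 1, 1, 3, 3, 5 (values 2 and 4 are skipped after each tie), and dense_rank_val is 1, 1, 2, 2, 3.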
Listing 43.47 modifies the query shown in Listing 43.44 by replacing the ROW_NUMBER function with RANK and DENSE_RANK and provides a good example of the differences between the two.

LISTING 43.47 Using RANK and DENSE_RANK

    select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles,
        RANK() OVER (order by count(*) DESC) as Rank,
        DENSE_RANK() OVER (order by count(*) DESC) as Dense_Rank
    from publishers p
    join titles t
        on p.pub_id = t.pub_id
    group by p.pub_id, p.pub_name
    order by count(*) desc
    go

    pub_id  pub_name                       num_titles  Rank  Dense_Rank
    9911    Jones Jones and Johnson        44          1     1
    9904    Strawberry Publications        34          2     2
    9907    Incandescent Imprints          33          3     3
    9905    Gooseberry Titles              32          4     4
    9909    North American Press           30          5     5
    9912    Landlocked Books               30          5     5
    9913    Blackberry’s                   28          7     6
    9914    Normanskill Printing Company   28          7     6
    9910    Sidney’s Books and More        28          7     6
    9906    Tomato Books                   28          7     6
    9903    Kumquat Technical Publishing   28          7     6

Notice that in this result set, all publishers with the same number of titles get the same RANK and DENSE_RANK values.

NOTE
If the ORDER BY list for a ranking function is unique, ROW_NUMBER, RANK, and DENSE_RANK produce exactly the same values.

The NTILE Function

The NTILE function assigns a ranking value by separating the result rows of a query into a specified number of approximately even-sized groups. Each group of rows is assigned the same ranking number, starting with 1 for the first group, 2 for the second, and so on. You specify the number of groups you want the result set divided into as the argument to the NTILE function. The number of rows in a group is determined by dividing the total number of rows in the result set by the number of groups. If there’s a remainder, n, the first n groups have an additional row assigned to them. Listing 43.48 provides an example of using the NTILE function, so you can compare it to the ROW_NUMBER function.
LISTING 43.48 Using the NTILE Function

    select p.pub_id, pub_name, count(*) as num_titles,
        NTILE(5) OVER (order by count(*) DESC) as NTILE,
        ROW_NUMBER() OVER (order by count(*) DESC) as RowNum
    from publishers p
    join titles t
        on p.pub_id = t.pub_id
    group by p.pub_id, p.pub_name
    order by count(*) desc
    go

    pub_id  pub_name                       num_titles  NTILE  RowNum
    9911    Jones Jones and Johnson        44          1      1
    9904    Strawberry Publications        34          1      2
    9907    Incandescent Imprints          33          1      3
    9905    Gooseberry Titles              32          1      4
    9909    North American Press           30          1      5
    9912    Landlocked Books               30          2      6
    9913    Blackberry’s                   28          2      7
    9914    Normanskill Printing Company   28          2      8
    9910    Sidney’s Books and More        28          2      9
    9906    Tomato Books                   28          2      10
    9903    Kumquat Technical Publishing   28          3      11
    9902    Lemon Legal Publishing         27          3      12
    9901    GGG&G                          25          3      13
    9908    Springfield Publishing         25          3      14
    9900    Boysenberry Books              23          4      15
    9916    Nordome Titles                 22          4      16
    9915    Beanplant General              21          4      17
    9917    BFG Books                      17          4      18
    9918    Significant Titles Company     17          5      19
    0877    Binnet & Hardley               6           5      20
    1389    Algodata Infosystems           6           5      21
    0736    New Moon Books                 5           5      22

In this example, NTILE is used to divide the result set into five groups. Because the result contains 22 rows, one for each publisher, there are 4 rows in each group, with 2 left over. The 2 extra rows are added to the first two groups, which therefore contain 5 rows each.

The NTILE function provides a way to generate a histogram with an even distribution of items for each step. In the previous example, the first step represents the publishers with the highest number of titles, and the last step represents the publishers with the lowest number of titles. You can use this information in a CASE expression to provide descriptive, meaningful alternatives to the ranking numbers, as shown in Listing 43.49.
LISTING 43.49 Using a CASE Expression to Provide Meaningful Labels to Ranking Values

    select p.pub_id, pub_name, count(*) as num_titles,
        case NTILE(5) OVER (order by count(*) DESC)
            when 1 then 'Highest'
            when 2 then 'Above Average'
            when 3 then 'Average'
            when 4 then 'Below Average'
            when 5 then 'Lowest'
        end as Ranking
    from publishers p
    join titles t
        on p.pub_id = t.pub_id
    group by p.pub_id, p.pub_name
    order by pub_id
    go
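The NTILE group-size rule described earlier (22 rows divided into five groups gives 4 rows per group, with the 2 leftover rows going to the first two groups) can be checked without the sample database. The following is a minimal sketch: Numbers and Tiled are hypothetical helper CTEs introduced for illustration, with Numbers generating the integers 1 through 22 by recursion.

```sql
-- Numbers and Tiled are hypothetical helper CTEs, not part of the sample database.
WITH Numbers(n)
AS
(
    SELECT 1
    UNION ALL
    SELECT n + 1
    FROM Numbers
    WHERE n < 22              -- termination check in the recursive member
),
Tiled
AS
(
    -- split the 22 generated rows into five groups
    SELECT n, NTILE(5) OVER (ORDER BY n) AS tile
    FROM Numbers
)
SELECT tile, COUNT(*) AS rows_in_tile
FROM Tiled
GROUP BY tile
ORDER BY tile
OPTION (MAXRECURSION 30);     -- safeguard only; the WHERE clause ends the recursion
```

The query should report 5 rows for tiles 1 and 2 and 4 rows for tiles 3, 4, and 5. It also follows the earlier recommendation on recursion: the filter in the recursive member stops the recursion, and the MAXRECURSION hint acts only as a safety net.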