CHAPTER 42 What's New for Transact-SQL in SQL Server 2008

TABLE 42.1 Join Methods Used for WHEN Clauses

    Specified WHEN Clauses                                    Join Method
    WHEN MATCHED clause only                                  INNER JOIN
    WHEN NOT MATCHED BY TARGET clause, but not the            LEFT OUTER JOIN from source to target
      WHEN NOT MATCHED BY SOURCE clause
    WHEN MATCHED clause and the WHEN NOT MATCHED BY SOURCE    RIGHT OUTER JOIN from source to target
      clause, but not the WHEN NOT MATCHED BY TARGET clause
    WHEN NOT MATCHED BY TARGET clause and the                 FULL OUTER JOIN
      WHEN NOT MATCHED BY SOURCE clause
    WHEN NOT MATCHED BY SOURCE clause only                    ANTI SEMI JOIN

The combination of WHEN clauses specified in the MERGE statement determines the join method that SQL Server will use to process the query (see Table 42.1).

To improve the performance of the MERGE statement, you should make sure you have appropriate indexes to support the join columns between the source table and target table. Any additional columns in the source table index that will help to cover the query may help improve performance even more (for information on index covering, see Chapter 34, "Data Structures, Indexes, and Performance"). The indexes should ensure that the join keys are unique and, if possible, sort the data in the tables in the order it will be processed so additional sort operations are not necessary. Unique indexes supporting the join conditions for the MERGE statement will improve query performance because the query optimizer does not need to perform extra validation processing to locate and update duplicate rows.

To better understand how the MERGE statement works, let's look at an example. First, you need to set up some data in a source table. In the bigpubs2008 database, there is a table called stores.
For this example, let's assume you want to set up a new table that keeps track of each store's inventory. The table supports an application that monitors each store's inventory and sends notifications when certain items run low, and it lets each store search other stores' inventories to locate rare and out-of-print books that they may have available. On a daily basis, each store uploads a full refresh of its current inventory to a staging table (inventory_load), which is the source table for the MERGE. You then use the inventory_load table to modify the store's inventory in the store_inventory table (which is the target table for the MERGE operation).

First, let's create the new store_inventory table (see Listing 42.1). Just for the sake of the example, you can create and populate it with the existing data from the sales table for stor_id 'A011' and create a primary key constraint on the stor_id and title_id columns.

The next step is to load the inventory_load table. Normally, in a real-world scenario, this table would likely be populated via a BULK INSERT statement or SQL Server Integration Services. However, for the sake of this example, you simply are going to create some test data by creating and populating the inventory_load table using SELECT INTO with data merged from the sales data for both stor_id 'A011' and 'A017'. When the inventory_load table is created and populated, you can create a primary key on the stor_id and title_id columns as well to support the join with the store_inventory table.

The next step is to build out the MERGE statement. Following are the rules to be applied:

- If there is a matching row between the source and target tables and the qty value is different, update the qty value in the target table to the value in the source table.
- If a row in the source table doesn't have a match in the target table, this is a new inventory item, so insert the new row into the target table.
- If a row in the target table doesn't have a matching row in the source table, that inventory item no longer exists, so delete it from the target table.

Also for the sake of the example, so that you can see just what the MERGE statement ends up doing, the OUTPUT clause has been added with the $action column included. The $action column displays which operation (INSERT, UPDATE, DELETE) was performed on each row. The remaining output columns display the title_id and qty values from the inserted and deleted rows, labeled here as the source and target values for each row processed. Because the OUTPUT clause converts NULL to an empty string, a blank title_id and qty on one side indicates a nonmatching row.

LISTING 42.1 A MERGE Example

    use bigpubs2008
    go
    if OBJECT_ID('store_inventory') is not null
        drop table store_inventory
    go
    -- Create and populate the store_inventory table
    select stor_id, title_id, qty = SUM(qty), update_dt = GETDATE()
        into store_inventory
        from sales s
        where stor_id = 'A011'
        group by stor_id, title_id
    go
    -- add primary key on store_inventory to support the join to source table
    alter table store_inventory add constraint PK_store_inventory
        primary key (stor_id, title_id)
    go
    if OBJECT_ID('inventory_load') is not null
        drop table inventory_load
    go
    -- Now, create and populate the inventory_load table
    select stor_id = 'A011', title_id, qty = SUM(qty)
        into inventory_load
        from sales s
        where stor_id like 'A01[17]'
          and title_id not like '%8'
        group by title_id
    go
    -- add primary key on inventory_load to support the join to target table
    alter table inventory_load add constraint PK_inventory_load
        primary key (stor_id, title_id)
    go
    select * from store_inventory
    go
    -- perform the merge, updating any matching rows with different quantities,
    -- adding any rows in source not in the target, and deleting any rows
    -- from the target that are not in the source.
    -- Output clause is specified to display the results of the MERGE
    MERGE INTO store_inventory as s
    USING inventory_load as i
       ON s.stor_id = i.stor_id
      and s.title_id = i.title_id
    WHEN MATCHED and s.qty <> i.qty
       THEN UPDATE
            SET s.qty = i.qty,
                update_dt = getdate()
    WHEN NOT MATCHED
       THEN INSERT (stor_id, title_id, qty, update_dt)
            VALUES (i.stor_id, i.title_id, i.qty, getdate())
    WHEN NOT MATCHED BY SOURCE
       THEN DELETE
    OUTPUT $action,
           isnull(inserted.title_id, '') as src_titleid,
           isnull(str(inserted.qty, 5), '') as src_qty,
           isnull(deleted.title_id, '') as tgt_titleid,
           isnull(str(deleted.qty, 5), '') as tgt_qty
    ;
    go
    select * from store_inventory
    go

If you run the script in Listing 42.1, you should see output like the following.

    stor_id  title_id   qty  update_dt
    A011     CH0741    1452  2010-03-25 00:34:25.597
    A011     CH3348      24  2010-03-25 00:34:25.597
    A011     FI0324    1392  2010-03-25 00:34:25.597
    A011     FI0392    1176  2010-03-25 00:34:25.597
    A011     FI1552    1476  2010-03-25 00:34:25.597
    A011     FI1872     540  2010-03-25 00:34:25.597
    A011     FI3484    1428  2010-03-25 00:34:25.597
    A011     FI3660     984  2010-03-25 00:34:25.597
    A011     FI4020    1704  2010-03-25 00:34:25.597
    A011     FI4970    1140  2010-03-25 00:34:25.597
    A011     FI4992     180  2010-03-25 00:34:25.597
    A011     FI5832    1632  2010-03-25 00:34:25.597
    A011     NF8918    1140  2010-03-25 00:34:25.597
    A011     PC9999    1272  2010-03-25 00:34:25.597
    A011     TC7777    1692  2010-03-25 00:34:25.597

    (15 row(s) affected)

    $action  src_titleid  src_qty  tgt_titleid  tgt_qty
    INSERT   BU2075          1536
    DELETE                         CH3348            24
    INSERT   CH5390           888
    INSERT   CH7553           540
    INSERT   FI1950          1308
    INSERT   FI2100          1104
    INSERT   FI3822           996
    UPDATE   FI4970          1632  FI4970          1140
    INSERT   FI7040          1596
    INSERT   LC8400           732
    DELETE                         NF8918          1140

    (11 row(s) affected)

    stor_id  title_id   qty  update_dt
    A011     BU2075    1536  2010-03-25 00:54:54.547
    A011     CH0741    1452  2010-03-25 00:34:25.597
    A011     CH5390     888  2010-03-25 00:54:54.547
    A011     CH7553     540  2010-03-25 00:54:54.547
    A011     FI0324    1392  2010-03-25 00:34:25.597
    A011     FI0392    1176  2010-03-25 00:34:25.597
    A011     FI1552    1476  2010-03-25 00:34:25.597
    A011     FI1872     540  2010-03-25 00:34:25.597
    A011     FI1950    1308  2010-03-25 00:54:54.547
    A011     FI2100    1104  2010-03-25 00:54:54.547
    A011     FI3484    1428  2010-03-25 00:34:25.597
    A011     FI3660     984  2010-03-25 00:34:25.597
    A011     FI3822     996  2010-03-25 00:54:54.547
    A011     FI4020    1704  2010-03-25 00:34:25.597
    A011     FI4970    1632  2010-03-25 00:54:54.547
    A011     FI4992     180  2010-03-25 00:34:25.597
    A011     FI5832    1632  2010-03-25 00:34:25.597
    A011     FI7040    1596  2010-03-25 00:54:54.547
    A011     LC8400     732  2010-03-25 00:54:54.547
    A011     PC9999    1272  2010-03-25 00:34:25.597
    A011     TC7777    1692  2010-03-25 00:34:25.597

    (21 row(s) affected)

If you examine the results and compare the before and after contents of the store_inventory table, you see that eight new rows were inserted into store_inventory, two rows were deleted, and one row was updated.

MERGE Statement Best Practices and Guidelines

The MERGE statement is a great addition to the T-SQL language. It provides a concise and efficient mechanism to perform multiple operations on a table based on contents in a source table without having to resort to using a cursor or running multiple set-oriented operations against the table. However, there are some guidelines and best practices you should keep in mind to help ensure you get the best performance from your MERGE statements.

First, you should try to reduce the number of rows accessed by the MERGE statement early in the process by specifying any additional search condition in the ON clause that filters out rows that do not need to be processed. You should avoid using the conditions in the WHEN clauses as row filters. However, you need to be careful if you are using any of the WHEN NOT MATCHED clauses because the elimination of rows via the ON clause may cause unexpected and incorrect results. Because the additional search conditions specified in the ON clause are not used for matching the source and target data, they can be misapplied.
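To make the ON clause pitfall concrete, the following is a minimal sketch rather than an example from the chapter's database. The table variables @tgt and @src, their columns, and the qty < 20 filter are all hypothetical names and values chosen only for illustration. The first MERGE places the filter in the ON clause, so a source row that fails the filter is treated as not matched even though its key already exists in the target. The second MERGE keeps only the matching criteria in the ON clause and moves the filter into the WHEN clauses.

```sql
-- Hypothetical demo tables; names and data are illustrative only.
declare @tgt table (id int primary key, qty int);
declare @src table (id int primary key, qty int);

insert @tgt (id, qty) values (1, 10), (2, 20);
insert @src (id, qty) values (1, 15), (2, 25);

-- Risky: the filter (i.qty < 20) sits in the ON clause. Source row 2 fails
-- the ON condition, so it is treated as NOT MATCHED and the INSERT tries to
-- add a second row with id = 2, which violates the primary key.
begin try
    merge into @tgt as t
    using @src as i
       on t.id = i.id and i.qty < 20
    when matched then
        update set t.qty = i.qty
    when not matched then
        insert (id, qty) values (i.id, i.qty);
end try
begin catch
    print 'MERGE with ON-clause filter failed: ' + error_message();
end catch;

-- Safer: ON holds only the matching criteria; the filter moves to the
-- WHEN clauses, so source row 2 is matched and simply skipped.
merge into @tgt as t
using @src as i
   on t.id = i.id
when matched and i.qty < 20 then
    update set t.qty = i.qty
when not matched and i.qty < 20 then
    insert (id, qty) values (i.id, i.qty);

select id, qty from @tgt order by id;
```

Under these assumptions, the first MERGE should fail as a whole and leave @tgt unchanged, while the second should update only the row with id 1. Without a primary key on the target, the first form would not raise an error at all; it would silently insert a duplicate row, which is the harder problem to spot.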
To ensure correct results are obtained, you should specify only search conditions in the ON clause that determine the criteria for matching data in the source and target tables. That is, specify only columns from the target table that are compared to the corresponding columns of the source table. Do not include comparisons to other values such as a constant.

To filter out rows from the source or target tables, you should consider using one of the following methods.

- Specify the search condition for row filtering in the appropriate WHEN clause. For example,

      WHEN NOT MATCHED AND qty > 0 THEN INSERT

- Define a view on the source or target that returns the filtered rows and reference the view as the source or target table. If the view is used as the target, make sure the view is updateable (for more information about updating data by using a view, see Chapter 27, "Creating and Managing Views").
- Use the WITH <common table expression> clause to filter out rows from the source or target tables. However, if you are not careful, this method is similar to specifying additional search criteria in the ON clause and may produce incorrect results. You should test this approach thoroughly before implementing it (for information on using common table expressions, see Chapter 43, "Transact-SQL Programming Guidelines, Tips, and Tricks").

Insert over DML

Another T-SQL enhancement in SQL Server 2008 applies to the use of the OUTPUT clause. The OUTPUT clause allows you to return data from a modification statement (INSERT, UPDATE, MERGE, or DELETE) as a result set or into a table variable or an output table. In SQL Server 2008, you can include one of these Data Manipulation Language (DML) statements with an OUTPUT clause within the context of an INSERT...SELECT statement.

In the MERGE statement in Listing 42.1, the OUTPUT clause was used to display the rows affected by the statement.
Suppose that you want the output of this statement to be put into a separate audit or processing table. In SQL Server 2008, you can do so by using the MERGE statement with its OUTPUT clause as a derived table in the FROM clause of the SELECT in an INSERT...SELECT statement. To demonstrate this approach, you first need to create a table for storing that data:

    if OBJECT_ID('inventory_audit') is not null
        drop table inventory_audit
    go
    CREATE TABLE inventory_audit
    (
        Action       varchar(10) not null,
        Src_title_id varchar(6)  null,
        Src_qty      int         null,
        Tgt_title_id varchar(6)  null,
        Tgt_qty      int         null,
        Loginname    varchar(30) null default suser_name(),
        Action_DT    datetime2   null default sysdatetime()
    )

Now you can place a SELECT statement on top of the MERGE command to supply the rows for an INSERT into the inventory_audit table (see Listing 42.2).

LISTING 42.2 Insert over DML Example

    -- NOTE: to see the results for this example you first need to
    -- clear out and repopulate the store_inventory table
    Truncate table store_inventory
    Insert store_inventory (stor_id, title_id, qty, update_dt)
    select stor_id, title_id, qty = SUM(qty), update_dt = GETDATE()
        from sales s
        where stor_id = 'A011'
        group by stor_id, title_id
    go
    insert inventory_audit
        (action,
         Src_title_id,
         Src_qty,
         Tgt_title_id,
         Tgt_qty,
         Loginname,
         Action_DT)
    select *, SUSER_NAME(), SYSDATETIME()
    from (
        MERGE INTO store_inventory as s
        USING inventory_load as i
           ON s.stor_id = i.stor_id
          and s.title_id = i.title_id
        WHEN MATCHED and s.qty <> i.qty
           THEN UPDATE
                SET s.qty = i.qty,
                    update_dt = getdate()
        WHEN NOT MATCHED
           THEN INSERT (stor_id, title_id, qty, update_dt)
                VALUES (i.stor_id, i.title_id, i.qty, getdate())
        WHEN NOT MATCHED BY SOURCE
           THEN DELETE
        OUTPUT $action,
               isnull(inserted.title_id, '') as src_titleid,
               isnull(str(inserted.qty, 5), '') as src_qty,
               isnull(deleted.title_id, '') as tgt_titleid,
               isnull(str(deleted.qty, 5), '') as tgt_qty
        ) changes (action,
                   Src_title_id,
                   Src_qty,
                   Tgt_title_id,
                   Tgt_qty);
    go
    select * from inventory_audit
    go

    Action  Src_title_id  Src_qty  Tgt_title_id  Tgt_qty  Loginname  Action_DT
    INSERT  BU2075           1536                      0  rrankins   2010-04-02 22:20:59.48
    DELETE                      0  CH3348             24  rrankins   2010-04-02 22:20:59.48
    INSERT  CH5390            888                      0  rrankins   2010-04-02 22:20:59.48
    INSERT  CH7553            540                      0  rrankins   2010-04-02 22:20:59.48
    INSERT  FI1950           1308                      0  rrankins   2010-04-02 22:20:59.48
    INSERT  FI2100           1104                      0  rrankins   2010-04-02 22:20:59.48
    INSERT  FI3822            996                      0  rrankins   2010-04-02 22:20:59.48
    UPDATE  FI4970           1632  FI4970           1140  rrankins   2010-04-02 22:20:59.48
    INSERT  FI7040           1596                      0  rrankins   2010-04-02 22:20:59.48
    INSERT  LC8400            732                      0  rrankins   2010-04-02 22:20:59.48
    DELETE                      0  NF8918           1140  rrankins   2010-04-02 22:20:59.48

GROUP BY Clause Enhancements

SQL Server 2008 introduces a number of enhancements and changes to the way grouped aggregate result sets are specified. These changes include the following:

- ROLLUP and CUBE operator syntax changes
- New GROUPING SETS operator
- New GROUPING_ID() function

ROLLUP and CUBE Operator Syntax Changes

The ROLLUP and CUBE operators produce additional aggregate groupings and are appended to the GROUP BY clause. Prior to SQL Server 2008, to include ROLLUP or CUBE groupings, you had to specify the WITH ROLLUP or WITH CUBE options in the GROUP BY clause after the list of grouping columns. In SQL Server 2008, the syntax now follows the ANSI standard for ROLLUP and CUBE; you first designate the ROLLUP or CUBE option and then provide the grouping columns to these operators as a comma-separated list enclosed in parentheses.
The new syntax is

    GROUP BY [ROLLUP | CUBE ( non-aggregate_column_list ) ]

Following are examples using the pre-2008 syntax:

    SELECT type, pub_id, AVG(price) AS average
    FROM titles
    GROUP BY type, pub_id
    WITH CUBE

    SELECT pub_id, type, SUM(ytd_sales) as ytd_sales
    FROM dbo.titles
    where type like '%cook%' or type = 'business'
    GROUP BY type, pub_id
    WITH ROLLUP

An example of the new ANSI standard syntax supported in SQL Server 2008 is as follows:

    SELECT type, pub_id, AVG(price) AS average
    FROM titles
    GROUP BY CUBE (type, pub_id)

    SELECT pub_id, type, SUM(ytd_sales) as ytd_sales
    FROM dbo.titles
    where type like '%cook%' or type = 'business'
    GROUP BY ROLLUP (type, pub_id)

NOTE
The old-style CUBE and ROLLUP syntax is still supported for backward-compatibility purposes but is being deprecated. You should convert any existing queries using the pre-2008 WITH CUBE or WITH ROLLUP syntax to the new syntax to ensure future compatibility.

GROUPING SETS

The CUBE and ROLLUP operators allow you to run a single query and generate multiple sets of groupings. However, the sets of groupings are fixed. For example, if you use GROUP BY ROLLUP (A, B, C), you get aggregates generated for the following groupings of nonaggregate columns:

- GROUP BY A, B, C
- GROUP BY A, B
- GROUP BY A
- A super-aggregate for all rows

If you use GROUP BY CUBE (A, B, C), you get aggregates generated for the following groupings of nonaggregate columns:

- GROUP BY A, B, C
- GROUP BY A, B
- GROUP BY A, C
- GROUP BY B, C
- GROUP BY A
- GROUP BY B
- GROUP BY C
- A super-aggregate for all rows

SQL Server 2008 introduces the GROUPING SETS operator in addition to the CUBE and ROLLUP operators for performing several groupings in a single query. With GROUPING SETS, only the specified groups are aggregated instead of the full set of aggregations generated by CUBE or ROLLUP.
GROUPING SETS enables you to generate results with multiple groupings in a single query, without having to resort to writing multiple GROUP BY queries and combining the results using a UNION ALL statement. The GROUPING SETS operator supports concatenating column groupings and an optional super-aggregate row. The syntax for defining grouping sets is as follows:

    GROUP BY [ GROUPING SETS ( ( ) | grouping_set_item | grouping_set_item_list [ ,...n ] ) ]

The GROUPING SETS items can be single columns or a list of columns. The null field list "( )" can also be used to generate a super-aggregate (that is, a grand total for the entire result set). A non-nested list of columns works as separate simple GROUP BY statements, which are then combined in an implied UNION ALL. A nested list of columns in parentheses within the GROUPING SETS item list works as a GROUP BY on that set of columns. Table 42.2 demonstrates examples of GROUPING SETS clauses and the corresponding groupings that the query generates.

TABLE 42.2 Grouping Sets Examples

    GROUPING SETS Clause                    Equivalent Statement
    GROUP BY GROUPING SETS (A,B,C)          GROUP BY A UNION ALL GROUP BY B UNION ALL GROUP BY C
    GROUP BY GROUPING SETS ((A,B,C))        GROUP BY A,B,C
    GROUP BY GROUPING SETS (A,(B,C))        GROUP BY A UNION ALL GROUP BY B,C
    GROUP BY GROUPING SETS ((A,C),(B,C))    GROUP BY A,C UNION ALL GROUP BY B,C
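As a sketch of how a non-nested list plus the empty grouping set maps onto the UNION ALL form, the following pair of queries uses the titles table from the bigpubs2008 database referenced earlier in this chapter. The choice of columns (type, pub_id, ytd_sales) and the specific grouping sets are assumptions made for illustration, not an example taken from the chapter.

```sql
use bigpubs2008
go
-- One query: totals by type, totals by pub_id, and a grand total
-- produced by the empty grouping set ( ).
select type, pub_id, sum(ytd_sales) as ytd_sales
from dbo.titles
group by grouping sets (type, pub_id, ());
go
-- The same groupings written as separate GROUP BY queries combined
-- with UNION ALL. Columns that do not take part in a grouping are
-- returned as NULL, just as GROUPING SETS returns them.
select type, null as pub_id, sum(ytd_sales) as ytd_sales
from dbo.titles
group by type
union all
select null, pub_id, sum(ytd_sales)
from dbo.titles
group by pub_id
union all
select null, null, sum(ytd_sales)
from dbo.titles;
go
```

Both forms should return the same set of rows, although the row order is not guaranteed to match without an ORDER BY clause. The GROUPING SETS form states the groupings once against a single reference to the table, which is the main reason to prefer it over the hand-written UNION ALL.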