UNION Versus UNION ALL Performance

You should use UNION ALL instead of UNION if there is no need to eliminate duplicate rows from the result sets being combined. The UNION statement has to combine the result sets in a worktable in order to remove any duplicate rows. UNION ALL simply concatenates the result sets together, without the overhead of putting them into a worktable to eliminate duplicates.

Use IF EXISTS Instead of SELECT COUNT(*)

You should use IF EXISTS instead of SELECT COUNT(*) when you are checking only for the existence of any matching data values and do not need to determine the number of matching rows. IF EXISTS stops the processing of the SELECT query as soon as the first matching row is found, whereas SELECT COUNT(*) continues searching until all matches are found, wasting I/O and CPU cycles. For example, you could replace

if (SELECT count(*) FROM dbo.sales WHERE stor_id = '6380') > 0

with an IF EXISTS check similar to

if exists (SELECT * FROM dbo.sales WHERE stor_id = '6380')

Avoid Unnecessary ORDER BY or DISTINCT Clauses

When a T-SQL query contains an ORDER BY or DISTINCT clause, a worktable is often required to process the final result of the query if SQL Server cannot determine that the rows will already be retrieved in the desired sort order or that a unique key in the result already makes the rows distinct. If a query requires a worktable, that adds extra overhead and I/O to put the results into the worktable in tempdb and to do the sorting necessary to order the results or to eliminate duplicate rows. This can extend the processing time of the query and delay the return of the final result to the client application. If it is not absolutely necessary for the rows returned to the application to be in a specific order (for example, when returning rows to a grid control whose contents can be re-sorted by any column in the grid itself), you should leave the ORDER BY clause off your queries. Likewise, you should not arbitrarily include the DISTINCT clause in your queries; use it only when it is actually necessary to eliminate duplicate rows from the result set.

Temp Tables Versus Table Variables Versus Common Table Expressions

SQL Server 2008 provides multiple options for working with temporary result sets in T-SQL code:

- Temporary tables
- Table variables
- Derived tables
- Common table expressions

One of the questions you may consider is "Which method should I use and when?" Whether you use a temporary table, table variable, derived table, or common table expression depends, in part, on how often and for how long you intend to use it. This section provides some general recommendations to consider.

You should use table variables instead of temporary tables in stored procedures whenever possible or feasible. Table variables are generally held in memory and avoid much of the I/O overhead and system table contention that can occur in tempdb with normal temporary tables. However, remember that table variables exist only for the duration of the SQL batch or stored procedure in which they are defined.

In SQL Server 2005 and later, you also have the option of using derived tables or common table expressions in your queries to generate and hold intermediate result sets that can be further processed by the main query. A derived table is a subquery contained in a FROM clause that can be referred to by an alias and used as a table in the query. Derived tables and common table expressions can be thought of as dynamic views that exist only for the duration of the query.
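For example, assuming the titles and sales tables in the bigpubs2008 sample database (with a qty column in sales and a title_id column in both tables, as in the pubs-style schemas), the following sketch builds the same intermediate result first as a derived table and then as a common table expression; the two forms are logically equivalent here:

-- derived table: a subquery in the FROM clause acting as a query-scoped view
select t.title_id, t.price, ts.total_qty
from dbo.titles t
join (select title_id, sum(qty) as total_qty
      from dbo.sales
      group by title_id) as ts
    on ts.title_id = t.title_id

-- the same intermediate result expressed as a common table expression
;with title_sales (title_id, total_qty) as
(
    select title_id, sum(qty)
    from dbo.sales
    group by title_id
)
select t.title_id, t.price, ts.total_qty
from dbo.titles t
join title_sales ts
    on ts.title_id = t.title_id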
Derived tables are handy when you need a result set only once within a single query. You should consider using derived tables or common table expressions when possible to avoid the use of table variables or temporary tables entirely, especially if the temporary table or table variable would be used only once by a single query.

NOTE

For more information on common table expressions and how to use them, see the section "Common Table Expressions" later in this chapter.

You should generally consider using temporary tables only when you need to share data between an application and stored procedures or between stored procedures. Also, if the temporary result set is going to be very large (that is, larger than can be held in SQL Server cache memory), you should consider storing it in a temporary table rather than a table variable.

NOTE

In SQL Server 2008, you can define table data types, which make it possible to pass table variables to stored procedures as table parameters, so temp tables aren't the only way to share data between stored procedures. However, there are some limitations: primarily, the contents of a table parameter passed to a stored procedure are read-only and cannot be modified within the stored procedure. If you want to share data between stored procedures and have the ability to add, remove, or modify rows in any of the stored procedures, temporary tables are still the best solution.

If you need to use temporary tables, you can follow these general guidelines to help improve their performance:

- Select only the columns actually required by the subsequent SQL statements into the temp table (that is, avoid using select *). This helps reduce the size of the temp table, thereby reducing the number of writes to tempdb; it also speeds up access to the data within the temp table because more rows fit on a data page, reducing the number of data pages the query needs to access.

- Select only the rows needed by the subsequent queries, again to help limit the size of the temp table and reduce the amount of I/O in tempdb.

- If the temporary table will be accessed multiple times by queries using search arguments (SARGs), consider creating an index on the temporary table if it can be used to speed up the queries against the temp table and reduce I/O. Of course, this option should be considered only if the time and I/O saved by having an index on the temporary table significantly exceed the time and I/O required to create the index.

Avoid Unnecessary Function Executions

If you call a SQL Server function (for example, suser_name(), getdate()) repeatedly within a procedure or in T-SQL code, you should consider using a local variable to hold the value returned by the function and then use the local variable throughout your SQL statements rather than repeatedly executing the SQL Server function. This saves CPU cycles within your stored procedures and T-SQL code.
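As a minimal sketch of this idea (the variable names are arbitrary, and the query against dbo.sales simply assumes the pubs-style stor_id, ord_num, and ord_date columns), each function is evaluated once and the cached value is reused:

declare @now datetime,
        @cur_user sysname

select @now = getdate(),
       @cur_user = suser_name()

-- reuse the cached values instead of calling the functions in every statement
select stor_id, ord_num, ord_date
from dbo.sales
where ord_date <= @now

print 'Report run at ' + convert(varchar(30), @now, 120) + ' by ' + @cur_user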
NOTE

For additional performance-related query recommendations, specifically related to how queries are optimized, see Chapter 35.

Cursors and Performance

In contrast to most other programming languages, SQL is a set-based processing language. You retrieve sets of rows, update sets of rows, and delete sets of rows. The set of rows affected is determined by the search conditions specified in the query. Unfortunately, most programmers are used to performing record-oriented operations on data and often want to apply the same technique to SQL Server data. Admittedly, at times, processing rows as a single result set with a single query can seem difficult or impossible. However, because of the performance implications, cursors should not be used just because it's easier to program that way.

NOTE

SQL Server 2008 introduces the new MERGE statement, which provides another set-oriented option for processing a set of input rows and making a row-by-row determination of which rows to ignore and which to insert, update, or delete in the target table. For more information on using the MERGE statement, see Chapter 42, "What's New for Transact-SQL in SQL Server 2008."

When to Use Cursors

Application performance can sometimes be slow due to the improper use of cursors. You should always try to write your T-SQL code so SQL Server can perform what it is good at: set-based operations. It makes little sense to have an advanced relational database management system (RDBMS) and use it only for one-row-at-a-time retrievals. For example, many update operations performed using cursors can be performed with a single UPDATE statement using the CASE expression. Consider the cursor shown in Listing 43.5.

LISTING 43.5 Updating the titles Table by Using a Cursor

/* This is a SQL script to update book prices
   dependent on current price and ytd_sales */

/* declare cursor */
declare titles_curs cursor for
    select ytd_sales, price
    from dbo.titles
    for update of price

declare @ytd_sales int,
        @price money

open titles_curs

fetch next from titles_curs into @ytd_sales, @price
if (@@fetch_status = -1)
begin
    print 'No books found'
    close titles_curs
    deallocate titles_curs
    return
end

while (@@fetch_status = 0)
begin
    if @ytd_sales < 500
        update titles set price = @price * .75
            where current of titles_curs
    else if @price > $15
        update titles set price = @price * .9
            where current of titles_curs
    else
        update titles set price = @price * 1.15
            where current of titles_curs

    fetch next from titles_curs into @ytd_sales, @price
end

if (@@fetch_status = -2)
    raiserror ('Attempt to fetch a row failed', 16, 1)

close titles_curs
deallocate titles_curs

This cursor can be replaced with a simple, single UPDATE statement using the CASE expression, as shown in Listing 43.6.

LISTING 43.6 The titles Cursor Example Performed with a Single UPDATE Statement Using the CASE Expression

update titles
set price = case when ytd_sales < 500 then price * .75
                 when price > $15 then price * .90
                 else price * 1.15
            end

The advantages of this approach are a significant performance improvement and much cleaner, simpler code. In testing the performance of the single update versus the cursor using the bigpubs2008 database, the cursor required, on average, around 100 milliseconds (ms) to complete. The single UPDATE statement required, on average, about 10 ms. (Your results may vary depending on hardware capabilities.) Although both completed within a subsecond response time, consider that the cursor took 10 times longer to complete than the single update. Factor that out over hundreds of thousands or millions of rows, and you could be looking at a significant performance difference.
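If you want to reproduce this kind of comparison on your own system (the exact timings will of course differ), one simple approach is to run each version with SET STATISTICS TIME enabled and compare the elapsed times reported. A sketch for the single-update version follows; the cursor script in Listing 43.5 can be wrapped the same way:

set statistics time on
go

update titles
set price = case when ytd_sales < 500 then price * .75
                 when price > $15 then price * .90
                 else price * 1.15
            end
go

set statistics time off
go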
Why is the cursor so much slower? Well, for one thing, a table scan performed by an UPDATE, a DELETE, or a SELECT uses internal, compiled C code to loop through the result set, whereas a cursor uses interpreted SQL code. In addition, with a cursor, you are executing multiple statements per row retrieved. The titles cursor example is a relatively simple one; it performs only one or two conditional checks and a single update per row, yet it is still many times slower. Because of the overhead required to process cursors, set-oriented operations typically run much faster, even if multiple passes of the table are required.

Although set-oriented operations are almost always faster than cursor operations, the one possible disadvantage of using a single update is locking concurrency. Even though a single update runs faster than a cursor, while it is running, the single update might end up locking the entire table for an extended period of time, which would prevent other users from accessing the table during the update. If concurrent access to the table is more important than the time it takes for the update to complete, you might want to consider using a cursor. A cursor locks the data only a row at a time instead of locking the entire table (as long as each row is committed individually and the entire cursor is not wrapped in a transaction).

Another situation in which you might want to consider using cursors is for scrolling applications where the result sets can be quite large. Consider a customer service application. The customer representative might need to pull up a list of cases and case contacts associated with a customer. If the result sets are small, you can just pull the entire result set down into a list box and let the user scroll through it, with no need for a cursor. However, if thousands of rows are likely, you might want to pull back only a block of rows at a time, especially if the user needs to look at only a few of the rows to get the information he or she needs. It probably wouldn't be worth pulling all that data across the network just for a few rows. In this type of situation, you might want to use a scrollable API server cursor. This way, you can retrieve the appropriate number of rows to populate the list box and then use the available scrolling options to quickly fetch to the bottom of the list, using the LAST or ABSOLUTE n options, or go backward or forward by using the RELATIVE option.

NOTE

You need to be careful using the scrollable API server cursor approach in a multitier environment. Many multitier architectures include a middle data layer that often uses connection sharing for multiple clients, and users are typically assigned any available connection when they need to access SQL Server. Users do not necessarily use the same connection each time. Therefore, if a user created a cursor on one connection, the next time the user submitted a fetch through the data layer, he or she might get a different connection, and the cursor would not be available. One solution to this problem is to go back to retrieving the entire result set down to the client application. Another possible solution is to use a global temp table as a type of homemade insensitive cursor to hold the result set and grab the data from the global temp table in chunks, as needed. With the temp table approach, you need to make sure a sequential key is on the table so you can quickly grab the block of rows you need. You also need to be aware of the potential impact on tempdb performance and the size requirements of tempdb if the result sets are large and you have many concurrent users.
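A minimal sketch of that temp table approach follows; the ##customer_cases table and the dbo.cases table with its cust_id, case_id, and case_desc columns are purely illustrative names, not objects that exist in the sample databases:

-- build the "homemade insensitive cursor" once, with a sequential key
create table ##customer_cases
(row_no    int identity(1, 1) primary key,
 case_id   int,
 case_desc varchar(100))

insert ##customer_cases (case_id, case_desc)
    select case_id, case_desc
    from dbo.cases
    where cust_id = 12345
    order by case_id   -- identity values are assigned in this order

-- then fetch one block of rows at a time by key range
declare @first_row int, @block_size int
select @first_row = 1, @block_size = 50

select case_id, case_desc
from ##customer_cases
where row_no between @first_row and @first_row + @block_size - 1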
As a general rule, you should use cursors only as a last resort when no set-oriented solution is feasible. If you have decided that a cursor is the appropriate solution, you should try to make it as efficient as possible by limiting the number of commands executed within the cursor loop as much as possible. Also, you should try to keep the cursor processing on the server side, within stored procedures. If you will be performing multiple fetches over the network (for example, to support a scrolling application), you should use an API server cursor. You should avoid client-side cursors that perform many cursor operations in the client application; otherwise, you will find your application making excessive requests to the server, and the volume of network roundtrips will make for a sloth-like application.

Variable Assignment in UPDATE Statements

One commonly overlooked feature in T-SQL is the ability to assign values to local variables in the SET clause of the UPDATE statement. This capability can help improve query performance by reducing locking and CPU contention and by reducing the number of statements required in a T-SQL batch.

The simplified syntax of the SET clause for assigning values to variables is as follows:

SET { column_name = { expression | DEFAULT | NULL }
    | @variable = expression
    | @variable = column = expression
    } [ ,...n ]

One common use of variable assignment in UPDATE statements is when you have a table that is used for storing and generating unique key values. To demonstrate this, you can create the keygen table and populate it as shown in Listing 43.7.

LISTING 43.7 Creating and Populating the keygen Table

create table keygen (keytype char(1), keyval int)
go
insert keygen (keytype, keyval) values ('x', 1)
go

The typical approach to retrieving a key value and updating the keygen table to generate the next key is to issue a SELECT statement and an UPDATE statement within a transaction. Listing 43.8 shows an example of this.

LISTING 43.8 Retrieving and Updating keyval with SELECT and UPDATE

begin tran

declare @newkey int

-- select current keyval into @newkey
select @newkey = keyval
from keygen (XLOCK)
where keytype = 'x'

update keygen
set keyval = keyval + 1
where keytype = 'x'

commit

select @newkey as newkey
go

newkey
-----------
1

TIP

In Listing 43.8, the XLOCK hint is specified in the SELECT statement. This prevents two separate user processes from running this T-SQL batch at the same time and both acquiring the same keyval. With the XLOCK hint, only one of the processes can acquire an exclusive lock; the other process waits until the lock is released and then acquires the next keyval. The use of XLOCK is definitely preferable to HOLDLOCK because HOLDLOCK in this type of scenario often leads to a deadlock situation.
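As an aside, in SQL Server 2005 and later the OUTPUT clause offers another way to capture the value generated by a single-statement update; because the increment and the read happen in one atomic UPDATE, the explicit XLOCK hint is no longer needed. This is only a sketch of that alternative, and the variable-assignment technique described next is usually the more direct option when all you need is a scalar value. (Note that running this sketch consumes a key value of its own, so the newkey values shown in the listings that follow assume it has not been run.)

declare @newkeys table (keyval int)

update keygen
set keyval = keyval + 1
output inserted.keyval into @newkeys
where keytype = 'x'

select keyval as newkey from @newkeys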
By using variable assignment in an UPDATE statement, you can eliminate the SELECT statement altogether and capture the keyval in the same statement you use to update the keygen table, as shown in Listing 43.9.

LISTING 43.9 Using Variable Assignment in an UPDATE to Update and Retrieve keyval

declare @newkey int

update keygen
set keyval = keyval + 1,
    @newkey = keyval
where keytype = 'x'

select @newkey as newkey
go

newkey
-----------
2

Notice that the value assigned to the local variable using the syntax shown in Listing 43.9 is the value of the keyval column prior to the update. If you prefer to assign the value of the column after the column is updated, you use the @variable = column = expression syntax, as shown in Listing 43.10.

LISTING 43.10 Using Variable Assignment in an UPDATE to Update and Retrieve keyval After Update

declare @newkey int

update keygen
set @newkey = keyval = keyval + 1
where keytype = 'x'

select @newkey as newkey
go

newkey
-----------
4

You need to be aware that the variable assignment is performed for every row that qualifies for the update. The resulting value of the local variable is the value from the last row updated.

Another use for variable assignment in UPDATE statements is to accumulate the sum of a column into a local variable across all the rows being updated. The alternative approach would be to use a cursor, as shown in Listing 43.11.

LISTING 43.11 Using a Cursor to Accumulate a Sum of a Column for Each Row Updated

declare c1 cursor for
    select isnull(ytd_sales, 0)
    from titles
    where type = 'business'
    for update of price
go

declare @ytd_sales_total int,
        @ytd_sales int
select @ytd_sales_total = 0

open c1
fetch c1 into @ytd_sales
while @@fetch_status = 0
begin
    update titles
    set price = price
    where current of c1

    select @ytd_sales_total = @ytd_sales_total + @ytd_sales
    fetch c1 into @ytd_sales
end

select @ytd_sales_total as ytd_sales_total

close c1
deallocate c1
go

ytd_sales_total
---------------
30788

By using variable assignment in an UPDATE statement, you can replace the cursor in Listing 43.11 with a single UPDATE statement, as shown in Listing 43.12.

LISTING 43.12 Using Variable Assignment in an UPDATE Statement to Accumulate a Sum of a Column for Each Row Updated

declare @ytd_sales_total int
set @ytd_sales_total = 0

update titles
set price = price,
    @ytd_sales_total = @ytd_sales_total + isnull(ytd_sales, 0)
where type = 'business'

select @ytd_sales_total as ytd_sales_total
go

ytd_sales_total
---------------
30788

As you can see from the examples presented in this chapter, using variable assignment in UPDATE statements results in much more concise and efficient T-SQL code than using cursors or other alternatives. When your code is more concise, it runs faster and requires fewer CPU resources. Faster, more efficient code also reduces the amount of time locks are held, which reduces the chance of locking contention and helps improve overall application performance.

T-SQL Tips and Tricks

The following sections provide some general tips and tricks to help you get the most from your T-SQL code.

Date Calculations

Occasionally, you may find that you need to start with a date value and use it to calculate some other date. For example, your SQL code might need to determine the first day of the month or the last day of the month. As you may know, working with the datetime data type in SQL Server can be a bit of a challenge. You probably already know how to use the datepart() function to extract specific components of a date (for example, year, month, day). You can then use those components, along with a number of functions, to calculate a date that you might need.
This section provides some examples of algorithms you can use to generate some commonly needed date values.

The DATEDIFF function calculates the difference between two dates, where the difference is expressed in terms of an interval, such as hours, days, weeks, months, or years. The DATEADD function calculates a date by adding an interval of time to a date; the intervals are the same as those used by the DATEDIFF function. Using the DATEADD and DATEDIFF functions in combination to calculate specific dates requires thinking outside the box a bit.
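For example, the following sketches show a few commonly needed calculations built this way; each one first uses DATEDIFF to count whole intervals from a base date of 0 (which SQL Server interprets as January 1, 1900) and then uses DATEADD to add those intervals back onto the base date:

-- first day of the current month
select dateadd(month, datediff(month, 0, getdate()), 0)

-- midnight of the current day (today's date with the time stripped off)
select dateadd(day, datediff(day, 0, getdate()), 0)

-- last day of the current month (one day before the first day of next month)
select dateadd(day, -1, dateadd(month, datediff(month, 0, getdate()) + 1, 0))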