Nielsen c22.tex V4 - 07/23/2009 4:52pm Page 582
Part IV Developing with SQL Server

In the following example cursor loop, a row is inserted and deleted so that you can test how data changes affect the different types of cursors:

  USE AdventureWorks2008;

  -- setup sample row for deleted row cursor test
  DELETE Production.Location
    WHERE Name LIKE 'Paul%';

  INSERT Production.Location (Name, CostRate, Availability)
    VALUES ('PaulsTest', 1, 1);

  -- set-up variables
  DECLARE
    @LocationID SMALLINT,
    @LocationName VARCHAR(50),
    @CostRate SMALLMONEY,
    @Availability DECIMAL(8,2);

  -- Step 1 / Declare the Cursor
  -- alternative cursor types:
  -- STATIC | KEYSET | DYNAMIC | FAST_FORWARD
  DECLARE cLocation CURSOR STATIC
    FOR SELECT LocationID, Name, CostRate, Availability
          FROM Production.Location
          ORDER BY Name;

  -- Step 2 / Open the Cursor
  OPEN cLocation;

  -- Step 3 / Prime the Cursor
  FETCH cLocation
    INTO @LocationID, @LocationName, @CostRate, @Availability;

  -- set-up the print output
  PRINT '@@Fetch_Status LocationID Name ';
  PRINT ' ';

  WHILE @@Fetch_Status <> -1 -- = 0
    BEGIN; -- while loop
      -- perform some work with the data
      -- but only if row actually found
      IF @@Fetch_Status = 0
        BEGIN;
          PRINT CAST(@@Fetch_Status as CHAR(10))
            + ' ' + CAST(@LocationID as CHAR(10))
            + ' ' + @LocationName;
        END;
      IF @@Fetch_Status = -2
        PRINT 'Hit Deleted Row';

      -- Step 3 / Iterating through the cursor
      FETCH cLocation
        INTO @LocationID, @LocationName, @CostRate, @Availability;

      -- Insert and delete rows during cursor run for cursor type testing
      IF @LocationID = 40
        BEGIN;
          INSERT Production.Location (Name, CostRate, Availability)
            VALUES ('PaulsINSERT', 1, 1);
          DELETE Production.Location
            WHERE Name = 'PaulsTest';
        END;
    END; -- while loop

  PRINT '';
  PRINT 'Final @@Fetch_Status: ' + Cast(@@Fetch_Status as char(2));

  -- Step 4 / Close
  CLOSE cLocation;

  -- Step 5 / Deallocate
  DEALLOCATE cLocation;

Watching the cursor

In SQL Server 2008, there are three ways to observe the cursor in action:

■ Step through the cursor WHILE loop using the new T-SQL debugger. The T-SQL debugger is covered in Chapter 6, "Using Management Studio," and there's a screencast demonstrating the debugger on www.sqlserverbible.com.
■ Use SQL Profiler to watch the T-SQL statements. Select the T-SQL / SQL:StmtCompleted event. There are cursor events listed in SQL Profiler, but they apply to ADO cursors, not T-SQL server-side cursors. SQL Profiler is explained in Chapter 56, "Tracing and Profiling."
■ Insert simple PRINT and SELECT statements to display the cursor progress to the client, as shown in the previous example.

Cursor options

Focusing on the enhanced T-SQL cursor, SQL Server supports four basic types of cursors; their differences lie in how they store the data set as the cursor is working with the data.

To test these cursor types, change the option in the previous code sample and watch for inserted and deleted rows:

■ Static: Copies all the data into tempdb, and the cursor iterates over the copy of the data. Any changes (inserts, updates, or deletes) to the real data are not seen by the cursor. This type of cursor is generally the fastest.
■ Keyset: Only the minimum number of columns needed to identify the rows in the correct order are copied to tempdb. The cursor walks through the data by internally joining the keyset table in tempdb with the real data. Updates and deletes are seen by the cursor, but not inserts. This is the only cursor type that experiences deleted rows as @@fetch_status = -2, so be sure to test for deleted rows. Keyset cursors, compared to static cursors, write less to tempdb when creating the cursor set, but they must perform most of the cursor SELECT statement for every fetch. Therefore, if the SELECT statement used to define the cursor references several data sources, avoid keyset cursors.
■ Dynamic: The cursor iterates over the original real data. All changes are seen by the cursor without any special handling of the changes. If a row is inserted after the cursor location, then the cursor will see that row when the cursor reaches the new row. If a row is deleted, then the cursor will simply not see the row when it reaches where the row had been.
■ Fast_Forward: This is the "high-performance" cursor option introduced in SQL Server 2000. Basically, it's a read-only, forward-only dynamic cursor.

While there are numerous options I'm purposely ignoring in this chapter, two others worth mentioning are:

■ Forward_only: The cursor may move only to the next row using FETCH [next].
■ Scroll: The cursor may move to any row in any direction using any FETCH option: first, last, prior, next, relative, or absolute.

Some developers, having heard that cursors are slow, propose to replace the cursor with a manual WHILE loop. This technique, which I call a surrogate cursor, is still nothing more than a row-by-row iteration, and accomplishes nothing toward optimization and performance.

Update cursor

Because the cursor is already iterating through the data set, SQL Server knows which row is the current row.
The cursor pointer can be referenced within a SQL UPDATE or DELETE command's WHERE clause to manipulate the correct data. The cursor DECLARE command's FOR UPDATE option enables updating using the cursor. Specific columns may be listed; or, if no columns are listed, then any column may be updated:

  DECLARE cDetail CURSOR
    FOR SELECT DetailID
          FROM Detail
          WHERE AdjAmount IS NULL
    FOR UPDATE OF AdjAmount;

Within the cursor loop, after the row has been fetched, a DML command may include the cursor within the WHERE clause using the CURRENT OF syntax. The following example, from the KilltheCursor.sql script, references the cDetail cursor:

  UPDATE Detail
    SET AdjAmount = @SprocResult
    WHERE CURRENT OF cDetail;

You would think that the update cursor would have a performance advantage when iterating through rows and updating them. In testing, however, Hugo Kornelis (this book's primary tech editor) and I have found that the update cursor is actually slightly slower than other cursor options.

There are more cursor options, but if you're that deep into writing a cursor, you've gone too far.

Cursor scope

Because cursors tend to be used in the most convoluted situations, understanding cursor scope is important. The scope of the cursor determines whether the cursor lives only in the batch in which it was created or extends to any called procedures. The scope can be configured as the cursor is declared:

  DECLARE CursorName CURSOR Local | Global
    FOR Select Statement;

The default cursor scope is set at the database level with the cursor_default option:

  ALTER DATABASE Family SET CURSOR_DEFAULT LOCAL;

The current cursor scope is important to the execution of the procedure.
To examine the current default setting, use the DATABASEPROPERTYEX() function:

  SELECT DATABASEPROPERTYEX('Family', 'IsLocalCursorsDefault');

Result:

  1

Cursors and transactions

When compared with set-based solutions, cursors seem to have the edge with locking. Some argue that a million-row, set-based update transaction might lock the entire table, whereas performing the same update within a cursor would lock a single row at a time, so while the cursor might take several times longer, at least it's not blocking other transactions. You can decide where you fall in this particular debate. Executing set-based queries in controlled batches of rows (for example, 1,000 rows per set-based batch) is another way to manage the locking issues presented by very large transactions. This can be done using SET ROWCOUNT (but it's deprecated) or by using the row_number() windowing and ranking function.

Regarding locking, one technique that is sometimes used to improve the performance of cursors is to wrap the entire cursor within a logical transaction. There are pros and cons to this solution. While it can improve the cursor's performance by as much as 50 percent because fewer locks need to be taken and released, the penalty is the blocking caused by the coarser granularity and longer life span of the locks in the transaction.
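The batching idea described earlier in this section can be sketched as follows. This is a minimal illustration, not code from the chapter's scripts: it uses UPDATE TOP (n), a third batching option alongside the SET ROWCOUNT and row_number() approaches named above, and it assumes the dbo.Detail table and its nullable AdjAmount column used elsewhere in this chapter. The batch size and the placeholder calculation are assumptions chosen only to show the loop's shape.

```sql
-- Hypothetical sketch: update rows in batches of 1,000 so that each
-- UPDATE runs as its own short transaction and releases its locks quickly.
DECLARE
  @BatchSize    INT = 1000,   -- assumed batch size
  @RowsAffected INT = 1;

WHILE @RowsAffected > 0
  BEGIN;
    UPDATE TOP (@BatchSize) dbo.Detail
      SET AdjAmount = 0           -- placeholder for the real set-based calculation
      WHERE AdjAmount IS NULL;    -- rows already updated drop out of the next batch

    -- capture the row count immediately; zero rows means the work is done
    SET @RowsAffected = @@ROWCOUNT;
  END;
```

Each pass is still a set-based statement, so this is not a surrogate cursor; the loop only limits how many rows a single transaction touches. The loop terminates only because the updated rows no longer satisfy the WHERE clause, so the real calculation must never leave AdjAmount NULL.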
Cursor Strategies

Most SQL Server developers would agree that cursors should be avoided unless they are required, which raises the question, "When are cursors the best solution?"

In that spirit, here are the four "Paul Approved" specific situations when using a cursor is the right solution:

■ Iterating over a stored procedure: When a stored procedure must be executed several times, once for each row or value, and the stored procedure can't be refactored into a set-based solution, or it's a system stored procedure, then a cursor is the right way to iteratively call the stored procedure.
■ Iterating over DDL code: When DDL code must be dynamically executed multiple times, using a cursor is the appropriate solution. For example, in several places in the Nordic code (my O/R DBMS, available on CodePlex.com), it's necessary to iterate over multiple rows or columns, generating a dynamic SQL statement for each row or column. For instance, each IsSearchable column must be inserted into the SearchWordList table as a separate row, so a cursor selects each IsSearchable column and a dynamic SQL statement performs the insert. This type of technique can also be used for automating admin tasks. However, as with the problem of denormalizing a list, if the DDL order is not important, such as when generating the updated() portions of the AutoAudit triggers (see Chapter 53, "Data Audit Triggers"), then the multiple-assignment variable solution performs best.
■ Cumulative Totals/Running Sums: While there are set-based solutions, a cursor is the best-performing solution in these cases because it only has to add the next row's value to the cumulative value. See Chapter 12, "Aggregating Data," for a complete cursor solution to this type of problem.
■ Time-Sensitive Data: Some time-sensitive problems, depending on the database design, can benefit by using a cursor to determine the duration between events.
Like the cumulative totals problem, time-sensitive data requires comparing the current row with the last row. Although there are possible set-based solutions, in some cases I've seen cursors perform better than set-based solutions. For example, if a table holds manufacturing data using a row for every process event, then a cursor might be required to analyze the time difference between manufacturing events, or to identify the previous event that caused a problem in the current events.

It's important to note that a difficult problem or one with complex logic is not a justified reason for resorting to a cursor.

Refactoring Complex-Logic Cursors

A cursor that's wrapped around a complex-logic problem is often considered the most difficult cursor to kill. The difficulty arises when the logic includes multiple formulas, variable amounts, and multiple exceptions. I have found three techniques useful when refactoring an iterative solution into a set-based solution:

■ Embed the logic into a user-defined function.
■ Break up the solution into multiple set-based queries.
■ Use CASE expressions to embed variable logic, and even dynamic formulas, into the query.

Testing a Complex-Logic Problem

Imagine a billing situation with multiple billing formulas and multiple exceptions. Here are the business rules for the sample complex-logic cursor.

Variable Formula:

■ 1 (Normal): BaseRate * Amount * ActionCode's BaseMultiplier
■ 2 (Accelerated Rate Job): BaseRate * Amount * Variable Acceleration Rate
■ 3 (Prototype Job): Amount * ActionCode's BaseMultiplier

Exceptions:

■ If there's an executive override on the order, then ignore the ActionCode's BaseMultiplier.
■ If the transaction occurs on a weekend, then multiply the adjusted amount by an additional 2.5.
■ Premium clients receive a 20% discount to their adjusted rate.
■ The adjusted rate is zero if the client is a pro bono client.
That's it: three formulas and four exceptions. Typically, that's enough to justify writing a cursor . . . but is it?

The CursorTest script sets up the preceding problem and then tests it with several possible options, including several cursor types, a surrogate cursor, and the three set-based methods for refactoring a complex cursor against a progressively growing set of data. You can see the results in Figure 22-1.

FIGURE 22-1
Cursors fail to scale, as demonstrated by the results of the CursorTest performance test. The vertical scale indicates the time to complete the formula for all the rows, ranging from 0 to 35,000 milliseconds. The horizontal scale (1-10) indicates the size of the data. Each iteration adds the same amount of data.
[Chart series: Fast Forward Cursor/Update; Update Cursor; Fast Forward Cursor/Update from Sproc; SQL-92 Cursor/Update from Sproc; Multiple Queries; Query w/ Function; Query w/ Case]

If you want to examine the code for this test, test it, and tweak it for yourself, download the CursorTest.sql script from www.sqlserverbible.com.

Update query with user-defined function

This solution appears surprisingly simplistic, but looks can be deceiving. This solution hides all the logic within the user-defined function. Although it would appear that SQL Server calls the function for every row of the query, embedding the function within an UPDATE DML statement has its benefits. Examining the query execution plan shows that the Query Optimizer can sometimes incorporate the function's logic within the query plan and generate an excellent set-based solution. Here's an example from the CursorTest script:

  UPDATE dbo.Detail
    SET AdjAmount = dbo.fCalcAdjAmount(DetailID)
    WHERE AdjAmount IS NULL;
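To give a sense of what a function hidden behind such an UPDATE might contain, here is a hypothetical sketch. It is not the fCalcAdjAmount function from the CursorTest script, which is not reproduced in this chapter. The name fCalcAdjAmountSketch, the NUMERIC(12,2) return type, the dbo schema on every table, and the assumption that each unqualified column name is unique across the joined tables are all illustrative guesses; the table and column names are borrowed from the queries shown elsewhere in this chapter. It covers only the three formulas and the executive override, not the weekend or client adjustments.

```sql
-- Hypothetical sketch only; the real dbo.fCalcAdjAmount is in CursorTest.sql.
-- The return type and the ownership of unqualified columns are assumptions.
CREATE FUNCTION dbo.fCalcAdjAmountSketch (@DetailID INT)
RETURNS NUMERIC(12,2)
AS
BEGIN
  DECLARE @AdjAmount NUMERIC(12,2);

  SELECT @AdjAmount =
      CASE
        -- executive override: ignore the ActionCode's BaseMultiplier
        WHEN AC.Formula IN (1, 3) AND ExecOverRide = 1
          THEN BaseRate * Amount
        -- 1-Normal
        WHEN AC.Formula = 1
          THEN BaseRate * Amount * BaseMultiplier
        -- 2-Accelerated: the rate is a data-driven value
        WHEN AC.Formula = 2
          THEN BaseRate * Amount
               * (SELECT Value FROM dbo.Variable WHERE Name = 'AccRate')
        -- 3-Prototype
        WHEN AC.Formula = 3
          THEN Amount * BaseMultiplier
      END
    FROM dbo.Detail AS D
      JOIN dbo.ActionCode AS AC
        ON D.ActionCode = AC.ActionCode
      JOIN dbo.[Order] AS O
        ON O.OrderID = D.OrderID
    WHERE D.DetailID = @DetailID;

  RETURN @AdjAmount;
END;
```

A function like this would be called the same way as in the UPDATE above, passing DetailID for each row. Whether the optimizer folds the function's logic into the plan or calls it row by row is worth checking in the execution plan rather than assuming.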
Multiple queries

The second set-based method uses an individual query for each formula and exception. The WHERE clauses of the queries restrict their operation to only those rows that require their respective formula or exception.

This solution introduces a data-driven database design component. The acceleration rate is supplied as a data-driven value from the Variable table using a scalar subquery, and the exceptions are handled using data-driven joins to the ClientType and DayofWeekMultiplier tables:

  -- Executive override: formulas 1 and 3 ignore the BaseMultiplier
  UPDATE dbo.Detail
    SET AdjAmount = BaseRate * Amount
    FROM Detail
      JOIN ActionCode
        ON Detail.ActionCode = ActionCode.ActionCode
      JOIN [Order]
        ON [Order].OrderID = Detail.OrderID
    WHERE (Formula = 1 OR Formula = 3)
      AND ExecOverRide = 1
      AND AdjAmount IS NULL;

  -- 1-Normal BaseRate * Amount * ActionCode's BaseMultiplier
  UPDATE dbo.Detail
    SET AdjAmount = BaseRate * Amount * BaseMultiplier
    FROM Detail
      JOIN ActionCode
        ON Detail.ActionCode = ActionCode.ActionCode
      JOIN [Order]
        ON [Order].OrderID = Detail.OrderID
    WHERE Formula = 1
      AND ExecOverRide = 0
      AND AdjAmount IS NULL;

  -- 2-Accelerated BaseRate * Amount * Acceleration Rate
  -- (no semicolon after the subquery; the statement continues with FROM)
  UPDATE dbo.Detail
    SET AdjAmount = BaseRate * Amount
          * (SELECT Value
               FROM dbo.Variable
               WHERE Name = 'AccRate')
    FROM Detail
      JOIN ActionCode
        ON Detail.ActionCode = ActionCode.ActionCode
      JOIN [Order]
        ON [Order].OrderID = Detail.OrderID
    WHERE Formula = 2
      AND AdjAmount IS NULL;

  -- 3-Prototype Amount * ActionCode's BaseMultiplier
  UPDATE dbo.Detail
    SET AdjAmount = Amount * BaseMultiplier
    FROM Detail
      JOIN ActionCode
        ON Detail.ActionCode = ActionCode.ActionCode
      JOIN [Order]
        ON [Order].OrderID = Detail.OrderID
    WHERE Formula = 3
      AND ExecOverRide = 0
      AND AdjAmount IS NULL;

  -- Exceptions
  -- WeekEnd Adjustment
  UPDATE dbo.Detail
    SET AdjAmount *= Multiplier
    FROM Detail
      JOIN [Order]
        ON [Order].OrderID = Detail.OrderID
      JOIN DayOfWeekMultiplier DWM
        ON CAST(DatePart(dw,[Order].TransDate) as SMALLINT) = DWM.DayOfWeek;

  -- Client Adjustments
  UPDATE dbo.Detail
    SET AdjAmount *= Multiplier
    FROM Detail
      JOIN [Order]
        ON [Order].OrderID = Detail.OrderID
      JOIN Client
        ON [Order].ClientID = Client.ClientID
      JOIN ClientType
        ON Client.ClientTypeID = ClientType.ClientTypeID;

Query with case expression

The third refactoring strategy uses a CASE expression and data-driven values to solve complexity within a single query. The CASE expression's power derives from the fact that it incorporates flexible logic within a single query.

Data-driven values and formulas are also incorporated into the query using joins to connect the base row with the correct lookup values. Data-driven designs also reduce maintenance costs because values can be easily changed without programming alterations.

In this example, the CASE expression selects the correct formula based on the values within the ActionCode table. The executive override is hard-coded into the CASE expression, but with a little work that too could be data driven.

As with the multiple query solution, the acceleration rate and exceptions are data driven:

  UPDATE dbo.Detail
    SET AdjAmount = DWM.Multiplier * ClientType.Multiplier *
      CASE
        WHEN ActionCode.Formula = 1 AND ExecOverRide = 0
          THEN BaseRate * Amount * BaseMultiplier
        WHEN (ActionCode.Formula = 1 OR ActionCode.Formula = 3)
             AND ExecOverRide = 1
          THEN BaseRate * Amount
        WHEN ActionCode.Formula = 2
          THEN BaseRate * Amount
               * (SELECT Value
                    FROM dbo.Variable
                    WHERE Name = 'AccRate')
        WHEN (Formula = 3 AND ExecOverRide = 0)
          THEN Amount * BaseMultiplier
      END
    FROM Detail
      JOIN ActionCode
        ON Detail.ActionCode = ActionCode.ActionCode
      JOIN [Order]
        ON [Order].OrderID = Detail.OrderID
      JOIN Client
        ON [Order].ClientID = Client.ClientID
      JOIN ClientType
        ON Client.ClientTypeID = ClientType.ClientTypeID
      JOIN DayOfWeekMultiplier DWM
        ON CAST(DatePart(dw,[Order].TransDate) as SMALLINT) = DWM.DayOfWeek
    WHERE AdjAmount IS NULL;

Summary

To quote another book author and MVP, my friend Bill Vaughn, "Cursors are evil!" Perhaps they are not the greatest evil in database development, but Bill has a point.

When an optimization improves performance by an order of magnitude (e.g., hours to minutes, minutes to seconds), that's when the job is fun. As shown earlier, cursors have their place, but I've never seen any business requirements that couldn't be solved with a set-based query.

There's no better way to optimize a stored procedure than to find one that has an unnecessary cursor. When you're looking for low-hanging fruit, cursors are about the juiciest you can find.

The next chapter continues the theme of developing smarter T-SQL code by adding error handling to the mix.