Nielsen c64.tex V4 - 07/21/2009 4:08pm Page 1342
Part IX Performance Tuning and Optimization
Chapter 64 Indexing Strategies

FIGURE 64-13 Filtering by two indexes adds a merge join into the mix.

Examining the performance statistics in Table 64-1, the multiple-index approach has a Query Optimizer cost of 0.12 and uses four logical reads.

For infrequent queries, Query Path 7, with its multiple indexes, is more than adequate, and much better than no index at all. However, for those few queries that run constantly, the next query path is a better solution for multiple criteria.

Query Path 8: Filter by Ordered Composite Index

For raw performance, the fastest solution to the "multiple WHERE clause criteria" problem is a single composite index, as demonstrated in Query Path 8. Creating a composite index with ProductID and StartDate as key columns sets up the test:

    DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID;
    DROP INDEX Production.WorkOrder.IX_WorkOrder_StartDate;

    CREATE INDEX IX_WorkOrder_ProductID
      ON Production.WorkOrder (ProductID, StartDate);

Rerunning the same query:

    SELECT WorkOrderID, StartDate
      FROM Production.WorkOrder
      WHERE ProductID = 757
        AND StartDate = '2002-01-04';

The query execution plan, shown in Figure 64-14, is a simple single-index seek operation, and it performs wonderfully.

FIGURE 64-14 Filtering two criteria using a composite index performs like greased lightning.

Query Path 9: Filter by Unordered Composite Index

One common indexing myth is that the order of the index key columns doesn't matter; that is, SQL Server can use an index so long as the column is anywhere in the index. Like most myths, it's a half-truth. Searching a b-tree index requires the data for the leading columns, in the order of the columns.
Searching for col1, col2, and so on works great, but searching for the columns out of order (for example, col2 without col1) requires scanning all the leaf-level data.

Query Path 9 demonstrates the inefficiency of filtering by an unordered composite index. In the following example, StartDate is the second key in the composite index, so the data is there. Will the query use the index?

    SELECT WorkOrderID
      FROM Production.WorkOrder
      WHERE StartDate = '2002-01-04';

The Query Optimizer uses the IX_WorkOrder_ProductID composite non-clustered index, as shown in Figure 64-15, because it's narrower than the clustered index, enabling more rows to fit on a page. But because the filter is on the second key column, SQL Server can't navigate the b-tree of the index; instead, it is forced to scan every row and filter within the scan operation to select the correct rows. Essentially, it's doing the same thing as manually scanning a telephone book for everyone with a first name of Paul.

Query Path 10: Non-SARGable Expressions

SQL Server's Query Optimizer examines the conditions within the query's WHERE clause to determine which indexes are actually useful. If SQL Server can optimize the WHERE condition using an index, the condition is referred to as a search argument, or SARG for short. However, not every condition is a "SARGable" search argument.

The final query path walks through a series of anti-patterns: WHERE clauses designed with conditions that can't use b-tree indexes and that fall back to an index scan.
■ Wrapping the column in an expression forces SQL Server to evaluate the expression for every row before it can determine whether the row passes the WHERE clause criteria:

    SELECT WorkOrderID
      FROM Production.WorkOrder
      WHERE ProductID + 2 = 759;

  The solution to this non-SARGable issue is to apply a little algebra and rewrite the query with the expression on the other side of the equals sign:

    SELECT WorkOrderID
      FROM Production.WorkOrder
      WHERE ProductID = 759 - 2;

FIGURE 64-15 Filtering by the second key column of an index forces an index scan.

■ Multiple conditions that are ANDed together are SARGs, but ORed conditions might not be useful for the b-tree:

    SELECT WorkOrderID, StartDate
      FROM Production.WorkOrder
      WHERE ProductID = 757
        OR StartDate = '2002-01-04';

■ Negative search conditions (<>, !>, !<, NOT EXISTS, NOT IN, NOT LIKE) are not easily optimizable. It's easy to prove that a row exists, but proving that it doesn't exist requires examining every row:

    SELECT WorkOrderID, StartDate
      FROM Production.WorkOrder
      WHERE ProductID NOT IN (400, 800, 950);

  However, a few negative values can sometimes be SARGable, so it's worth testing. Often, it's the number of rows returned that forces a scan, not the negative condition.

■ Conditions that begin with wildcards aren't SARGable.
  An index can quickly locate WorkOrderID = 757, but must scan every row to find any WorkOrderID ending in 7:

    SELECT WorkOrderID, StartDate
      FROM Production.WorkOrder
      WHERE WorkOrderID LIKE '%7';

■ If the WHERE clause applies a function to the column, such as a string or date function, a scan is required so that every row can be tested with the function applied to the data:

    SELECT WorkOrderID, StartDate
      FROM Production.WorkOrder
      WHERE DateName(dw, StartDate) = 'Monday';

  SQL Server 2008 does include some optimizations that can avoid the scan when working with date conversions.

The type of access (index scan vs. index seek) affects not only the performance of reading data from the single table, but also join performance. The type of join chosen by SQL Server depends on whether or not the data is ordered. Typically, if the data is being read as efficiently as possible from the single table, the data is then passed to an efficient join. However, inefficient table access is compounded by the subsequent inefficient join performance.

A Comprehensive Indexing Strategy

An index strategy deals with the overall application, rather than fixing isolated problems to the detriment of the whole. In my consulting practice, I've found that the key to indexing is knowing when you need a bookmark lookup vs. when to design indexes to avoid bookmark lookups.

Identifying key queries

Analyzing a full query workload, which includes a couple of days of operations (and nightly or weekend workloads), will likely reveal that although there may be a few hundred distinct queries, the majority of the CPU time is spent on the top handful of queries. I've tuned systems on which 95 percent of the CPU time was spent on only five queries. Those top queries demand flat-out performance, while the other queries might be able to afford a bookmark lookup.

To identify those top queries:

1. Create a Profiler trace to capture all queries or stored procedures:
   ■ Profiler events: T-SQL SQL:Completed and Remote Procedure Call:Completed
   ■ Profiler columns: TextData, ApplicationName, CPU, Reads, Writes, Duration, SPID, EndTime, DatabaseName, and RowCounts
   It's critical not to filter the trace to capture only long-running queries (a common suggestion is to set the filter to capture only queries with a duration > 1 sec). Every query must be captured.
2. Test the trace definition using Profiler for a few moments, and then stop the trace. Be sure to filter out applications or databases not being analyzed.
3. In the trace properties, add a stop time to the trace definition (so it will capture a full day's and night's workload), and set up the trace to write to a file.
4. Generate a trace script using the File ➪ Export ➪ Script Trace Definition ➪ for SQL Server 2005–2008 menu command.
5. Check the script. You may need to edit it to supply a filename and path, and double-check the start and stop times. Execute the trace script on the production server.
6. Pull the trace file into Profiler and then save it to a table using the File ➪ Save As ➪ Trace Table menu command.
7. Profiler exports the TextData column as an nText data type, and that just won't do. The following code converts it to an nVarChar(max) column, which is much friendlier with string functions:

    ALTER TABLE trace
      ALTER COLUMN TextData NVARCHAR(MAX);

8. Run the following aggregate query to summarize the query load.
   This query assumes that the trace data was saved to a table creatively named trace:

    SELECT SUBSTRING(TextData, 1, CHARINDEX(' ', TextData, 6)) AS QueryStart,
        COUNT(*) AS 'Count',
        SUM(Duration) AS 'SumDuration',
        AVG(Duration) AS 'AvgDuration',
        MAX(Duration) AS 'MaxDuration',
        CAST(SUM(Duration) AS NUMERIC(20,2))
          / (SELECT SUM(Duration) FROM trace) AS 'Percentage',
        SUM(RowCounts) AS 'SumRows'
      FROM trace
      GROUP BY SUBSTRING(TextData, 1, CHARINDEX(' ', TextData, 6))
      ORDER BY SUM(Duration) DESC;

   The SUBSTRING/CHARINDEX expression groups the rows by the start of each statement, up to the first space found at or after the sixth character. For a call such as EXEC pGetOrder 42, that yields the EXEC keyword plus the procedure name. The top queries will become obvious.

Table CRUD analysis

For each table involved with one of the top queries, it's important to collect in one presentation the top queries and stored procedures that hit that table. Plot the access using a CRUD (create, retrieve, update, delete) matrix, as shown in Table 64-2. This example analyzes a fictitious OrderDetail table and, for simplicity, examines only three fictitious procedures. The abbreviations are as follows: S for a selected column, O for an ORDER BY column, W for a column referenced in the WHERE clause, G for a GROUP BY column, and U for an updated column.

The next step is to design the fewest indexes that satisfy the table's needs. This process first determines the clustered index and then creates indexes for every procedure and query that accesses the table, as shown in the following list and Table 64-3. The numbers in the chart indicate the ordinal position of the column in the index. An included column is listed as I.
TABLE 64-2 Table CRUD Usage Analysis

    Column            pGetOrder   pCheckQuantity   pShipOrder
    OrderDetailID     S                            W
    OrderID           W           W
    ProductID         S           S
    NonStockProduct   S
    Quantity          S           S
    UnitPrice         S
    ExtendedPrice     S
    ShipRequestDate   S           W
    ShipDate          S                            U
    ShipComment       S                            U

TABLE 64-3 Table Strategic Index Plan

    Column            CI     Ix1    Ix2
    OrderDetailID                   1
    OrderID           1      (cl)   (cl)
    ProductID                I
    NonStockProduct
    Quantity                 I
    UnitPrice
    ExtendedPrice
    ShipRequestDate          1
    ShipDate
    ShipComment

Because the OrderDetail table is often selected using the OrderID column, and this column can also be used to gather multiple rows into a single data page, OrderID is the best candidate for the clustered index (CI). The clustered index will consist of one column, OrderID, so a 1 goes in the OrderID row for the clustered index, indicating that it's the first column of the clustered index. The clustered index satisfies the pGetOrder procedure.

The pCheckQuantity procedure verifies the quantity on hand prior to shipping. It filters the rows by ShipRequestDate and OrderID. Creating a non-clustered index with ShipRequestDate as the key will index both the ShipRequestDate column and the OrderID column, because the clustered index key is present in the leaf node of the non-clustered index. Because the procedure needs only four columns, adding ProductID and Quantity as included columns will enable Ix1 to completely cover the needs of the query and significantly improve performance.

The third procedure can be satisfied by adding a non-clustered index, Ix2, on the OrderDetailID column.

Although this example has only three procedures and may seem simplistic, if the plan focuses on the top queries, most production tables will, in fact, have only a handful of queries or stored procedures.
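
The plan in Table 64-3 could be expressed in T-SQL roughly as follows. This is only a sketch: OrderDetail is a fictitious table, so the dbo schema and the index names are assumptions made for illustration, not part of the original example.

```sql
-- Hypothetical DDL for the fictitious OrderDetail table.
-- The dbo schema and all index names are illustrative assumptions.

-- CI: cluster on OrderID so that the rows for one order share data pages.
-- The index is not declared unique because one order has many detail rows.
CREATE CLUSTERED INDEX CI_OrderDetail_OrderID
  ON dbo.OrderDetail (OrderID);

-- Ix1: supports pCheckQuantity. ShipRequestDate is the key column, OrderID
-- is carried in the leaf level as the clustering key, and the INCLUDE
-- columns make the index cover the procedure's four columns.
CREATE NONCLUSTERED INDEX Ix1_OrderDetail_ShipRequestDate
  ON dbo.OrderDetail (ShipRequestDate)
  INCLUDE (ProductID, Quantity);

-- Ix2: supports pShipOrder, which locates a single row by OrderDetailID.
CREATE NONCLUSTERED INDEX Ix2_OrderDetail_OrderDetailID
  ON dbo.OrderDetail (OrderDetailID);
```

If OrderDetailID is declared as a non-clustered primary key, the index that enforces that constraint would already play the role of Ix2, and the last statement would be unnecessary.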
The Database Engine Tuning Advisor is a SQL Server 2008 utility that can analyze a single query or a set of queries and recommend indexes and partitions to improve performance.

My indexing strategy is based on knowing when to use a bookmark lookup vs. when to avoid a bookmark lookup. The Database Engine Tuning Advisor doesn't know whether a given query should or should not have a bookmark lookup, so it can't follow the strategy. Therefore, I recommend that you avoid the Database Engine Tuning Advisor. If you understand how queries work, you don't need the Advisor anyway.

Selecting the clustered index

Selecting the clustered index is a critical piece of the performance puzzle, perhaps the most important piece of the physical schema. A clustered index can affect performance in several ways:

■ When an index seek operation finds a row using a clustered index, the data is right there, with no bookmark lookup. This makes the column used to select the row, probably the primary key, an ideal candidate for a clustered index.
■ Clustered indexes gather rows with the same or similar values onto the smallest possible number of data pages, thus reducing the number of data pages required to retrieve a set of rows. Clustered indexes are therefore excellent for columns that are often used to select a range of rows, such as secondary table foreign keys like OrderDetail.OrderID.
■ Inserting data in the middle of a clustered index is always a bad idea. The page splits can cripple performance, so carefully consider the actual data usage for every clustered index.

The MOC (Microsoft Official Curriculum) used to teach that the primary purpose of the clustered index was gathering together similar rows (the second bullet above). When I wrote the SQL Server 2000 Bible, I also believed that was the primary reason for a clustered index. I now believe that avoiding a bookmark lookup is a stronger case for designing a clustered index, and the second bullet only sometimes applies.
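
One way to see whether mid-index inserts are hurting a clustered index is to look at its fragmentation, because page splits tend to leave pages out of order. This goes slightly beyond the discussion above. The following sketch assumes the Production.WorkOrder table from the earlier query paths. The choice of table and of the LIMITED scan mode are illustrative, and what counts as too much fragmentation depends on the workload.

```sql
-- Sketch: report fragmentation for the clustered index (index_id = 1)
-- of Production.WorkOrder in the current database.
-- The table and the 'LIMITED' scan mode are assumptions for illustration.
SELECT i.name AS IndexName,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
  FROM sys.dm_db_index_physical_stats(
         DB_ID(),                              -- current database
         OBJECT_ID(N'Production.WorkOrder'),   -- table to inspect
         1,                                    -- clustered index only
         NULL,                                 -- all partitions
         'LIMITED') AS ps
  JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id;
```

Running the query before and after a representative batch of inserts gives a rough picture of how much the insert pattern disturbs the clustered index.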
Creating base indexes

Even before tuning, the locations of a few indexes are easy to determine. These base indexes are the first step in building a solid set of indexes. Here are a few steps to keep in mind when building these base indexes:

1. Create a clustered index for every table. For primary tables, cluster on the column most likely used to select the row, probably the primary key. For secondary tables that are most commonly retrieved as a set of related rows, create a clustered index on the most important foreign key to group those related rows together.
2. Create non-clustered indexes for the columns of every foreign key, except for the foreign key that was indexed in step 1. Use only the foreign key values as index keys. I've developed a script that will create a non-clustered index for every foreign key (download from www.sqlserverbible.com).
3. Create a single-column index for every column expected to be referenced in a WHERE clause, an ORDER BY, or a GROUP BY.

While this indexing plan is far from perfect, and it's definitely not a final indexing plan, it provides an initial compromise between no indexes and tuned indexes, and can serve as a baseline performance measurement to compare against future index tuning.

Additional tuning will likely involve creating composite indexes and removing unnecessary indexes.

Best Practice

When planning indexes, there's a subtle tension between serving the needs of select queries vs. update queries. While an index may improve query performance, there's a performance cost because when a row is inserted, updated, or deleted, the indexes must be updated as well. Nonetheless, some indexing is necessary for write operations. The update or delete operation must locate the row prior to performing the write operation, and useful indexes facilitate locating that row, thereby speeding up write operations.
Therefore, when planning indexes, be careful to create the fewest indexes that accomplish the job.

SQL Server exposes index usage statistics via dynamic management views. Specifically, sys.dm_db_index_operational_stats and sys.dm_db_index_usage_stats uncover information about how indexes are being used. In addition, there are four dynamic management objects that reveal indexes that the Query Optimizer looked for but didn't find: sys.dm_db_missing_index_groups, sys.dm_db_missing_index_group_stats, sys.dm_db_missing_index_columns, and sys.dm_db_missing_index_details.

Specialty Indexes

Beyond the standard clustered and non-clustered indexes, SQL Server offers two types of indexes I refer to as specialty indexes. Filtered indexes, new in SQL Server 2008, include less data, and indexed views, available since SQL Server 2000, build out custom sets of data. Both are considered high-end performance-tuning indexes.

Filtered indexes

Until SQL Server 2008, every non-clustered index indexed every key value and every row. Filtered indexes allow adding a WHERE clause to the CREATE INDEX statement. This option is only available for non-clustered indexes (how could a clustered index not include some rows?).

A filtered index not only includes fewer rows at the leaf level, but also includes fewer values in the intermediate levels. It's this reduction in intermediate levels that causes the reads to be fewer for any index seek.

An example of employing a filtered index in AdventureWorks2008 is the ScrapReasonID column in the Production.WorkOrder table. Fortunately for Adventure Works, the company scrapped only 612 parts (0.8%) over the life of the database. The existing IX_WorkOrder_ScrapReasonID index includes every row. The ScrapReasonID foreign key in the Production.WorkOrder table allows nulls for work orders that were not scrapped.
The index includes all the null values, with pointers to the WorkOrder rows that have a NULL ScrapReasonID. The current index uses 109 pages. The following script re-creates the index with a WHERE clause that excludes all the NULL values:

    DROP INDEX Production.WorkOrder.IX_WorkOrder_ScrapReasonID;

    CREATE INDEX IX_WorkOrder_ScrapReasonID
      ON Production.WorkOrder (ScrapReasonID)
      WHERE ScrapReasonID IS NOT NULL;

The new index uses only two pages.

Interestingly, when selecting all the work orders with a scrap reason that's not null, there's no noticeable difference between using the filtered index and the non-filtered index. That's because there aren't enough intermediate levels to make a significant difference. For a much larger table, the difference would be worth testing, and most likely the filtered index would provide a benefit.

Filtered indexes, because of their compact size, not only reduce disk usage but are also easier to maintain.

Best Practice

When designing a covering index (see index kata #6) to solve a specific query, probably one of the top handful by CPU duration according to the indexing strategy, consider filtering the covering index if it works with a relatively small subset of the data and the overall table is large.

Another situation that might benefit from filtered indexes is building a unique index on a column in which multiple rows have null values. A normal unique index allows only a single row to include a null value in the key columns. However, building a unique index that excludes null in the WHERE clause creates a unique index that permits an unlimited number of null values.
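
A unique filtered index of the kind just described might look like the following sketch. The dbo.Member table, its columns, and the index name are hypothetical and exist only to illustrate the technique; they are not part of AdventureWorks2008.

```sql
-- Hypothetical table: BadgeCode is optional, but must be unique when present.
CREATE TABLE dbo.Member (
  MemberID  INT IDENTITY PRIMARY KEY,
  BadgeCode VARCHAR(20) NULL
);

-- Uniqueness is enforced only for rows that satisfy the filter,
-- so rows with a NULL BadgeCode never enter the index.
CREATE UNIQUE NONCLUSTERED INDEX UX_Member_BadgeCode
  ON dbo.Member (BadgeCode)
  WHERE BadgeCode IS NOT NULL;

INSERT INTO dbo.Member (BadgeCode) VALUES (NULL);     -- accepted
INSERT INTO dbo.Member (BadgeCode) VALUES (NULL);     -- accepted: NULL rows are outside the index
INSERT INTO dbo.Member (BadgeCode) VALUES ('A-100');  -- accepted
INSERT INTO dbo.Member (BadgeCode) VALUES ('A-100');  -- rejected with a duplicate key error
```

Without the WHERE clause, the second NULL insert would be the one rejected, because an unfiltered unique index treats NULL as a single key value.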