Nielsen c64.tex V4 - 07/21/2009 4:08pm Page 1332 Part IX Performance Tuning and Optimization

Conventional wisdom holds that this is the fastest possible query path, and it is snappy when returning a single row; however, from a rows-returned-per-millisecond perspective, it’s one of the slowest query paths.

A common myth is that seeks can return only single rows, and that seeking multiple rows must therefore be very slow compared to scans. As the next two query paths indicate, that’s not true.

Query Path 3: Range Seek Query

The third query path selects a narrow range of consecutive values using a between operator in the WHERE clause:

    SELECT *
      FROM Production.WorkOrder
      WHERE WorkOrderID between 10000 and 10010;

The Query Optimizer must first determine whether there’s a suitable index to select the range. In this case it’s the same key column in the clustered index as in Query Path 2.

A range seek query has an interesting query execution plan. The seek predicate (listed in the index seek properties), which defines how the query is navigating the b-tree, has both a start and an end, as shown in Figure 64-6. This means the operation is seeking the first row and then quickly scanning and returning every row to the end of the range, as illustrated in Figure 64-7.

To further investigate the range seek query path, this next query pushes the range to the limit by selecting every row in the table. Both queries are tested just to prove that between is logically the same as >= with <=:

    SELECT *
      FROM Production.WorkOrder
      WHERE WorkOrderID >= 1 and WorkOrderID <= 72591;

    SELECT *
      FROM Production.WorkOrder
      WHERE WorkOrderID between 1 and 72591;

At first blush it would seem that this query should generate the same query execution plan as the first query path (select * from table), but, just like the narrow range query, the between operator needs a consecutive range of rows, which causes the Query Optimizer to select an index seek to return ordered rows.
Keep in mind that there’s no guarantee that another row won’t be added after the query plan is generated and before it’s executed. Therefore, for range queries, an index seek is the fastest possible way to ensure that only the correct rows are selected.

Indexing Strategies 64

FIGURE 64-6 The clustered index seek’s seek predicate has a start and an end, indicating the range of rows searched for using the b-tree index.

Index seeks and index scans both perform well when returning large sets of data. The minor difference between the two queries’ durations listed in the performance chart (refer to Table 64-1) is more likely due to variance in my computer’s performance. There were some iterations of the index seek that performed faster than some iterations of the index scan.

FIGURE 64-7 An index seek operation has the option of seeking to find the first row, and then sequentially scanning a block of data. (Figure labels: Clustered Index PK_WorkOrder_WorkOrderID; Seek; Scan)

Query Path 4: Filter by Non-Key Column

The previous query paths were simple to solve because the filter column matched the clustered index key column and all the data was available from one index; but what if that isn’t the case? Consider this query:

    SELECT *
      FROM Production.WorkOrder
      WHERE StartDate = '2003-06-25';

There’s no index with a key column of StartDate. This means that the Query Optimizer can’t choose a fast b-tree index and must resort to scanning the entire table and then manually searching for rows that match the WHERE clause.

Without an index, this query path is 23 times slower than the clustered index seek query path. The cost isn’t the filter operation alone (which is only 7 percent of the total query cost).
The real cost is having to scan in every row and pass 72,592 rows to the filter operation, as shown in the query execution plan in Figure 64-8.

Note that this query execution plan suggests a missing index. Management Studio will even generate the code to create the missing index using the context menu, not that I’d suggest using that as an indexing strategy. (Too often the missing index is not the best index, and it often wants to build a non-clustered index that includes every column.)

FIGURE 64-8 Query path 4 (filter by non-key column) passes every row from an index scan to a filter operation to manually select the rows.

Query Path 5: Bookmark Lookup

This bookmark lookup query path is a two-edged sword. For infrequent queries, it’s the perfect query path, but for the handful of queries that consume the majority of the server’s CPU, this query path will kill performance.

To demonstrate a bookmark lookup query path, the following query filters by ProductID while returning all the base table’s columns:

    SELECT *
      FROM Production.WorkOrder
      WHERE ProductID = 757;

To rephrase the query in pseudo-code: find the rows for Product 757 and give me all the columns for those rows.

There is an index on the ProductID column, so the Query Optimizer has two possible options:

■ Scan the entire clustered index to access all the columns, and then filter the results to find the right rows. Essentially, this would be the same as query path 4.

■ Perform an index seek on the IX_WorkOrder_ProductID index to fetch the 11 rows. In the process, it learns the WorkOrderID values for those 11 rows (because the clustered index key columns are in the leaf level of the non-clustered index). Then it can index seek those 11 rows from the clustered index to fetch the other columns.
This jump, from the non-clustered index used to find the rows to the clustered index to complete the columns needed for the query, is called a bookmark lookup and is shown in Figure 64-9.

FIGURE 64-9 The non-clustered index is missing a column. To solve the query, SQL Server has to perform a bookmark lookup (the dashed line) from the non-clustered index to the clustered index. This illustration shows a single row. In reality it’s often hundreds or thousands of rows scattered throughout the clustered index. (Figure labels: PK_WorkOrder_WorkOrderID; IX_WorkOrder_ProductID)

The real cost of the bookmark lookup is that the rows are typically scattered throughout the clustered index. Locating the 11 rows in the non-clustered index was a single page hit, but those 11 rows might be on 11 different pages in the clustered index. With a larger number of selected rows the problem intensifies. Selecting 1,000 rows with a bookmark lookup might mean reading 3–4 pages from the non-clustered index and then reading more than a thousand pages from the clustered index b-tree and leaf level. Eventually, SQL Server will decide that the bookmark lookup is more expensive than just scanning the clustered index.

In the Zen mindset of indexing, the best query path is one that can return all the data by navigating a single index. The bookmark lookup has to navigate two indexes, which is wasteful.

The query execution plan for a bookmark lookup shows the two indexes as data sources for a nested loop join (as shown in Figure 64-10). For each row returned by the seek of the non-clustered index, the nested loop join is requesting the matching rows from the clustered index by calling the key lookup.
If you think of SQL Server as having tables with indexes, this query execution plan appears confusing; but if you think of SQL Server as a collection of indexes with varying amounts of data, then fetching data from two indexes and joining the results makes sense.

FIGURE 64-10 The query execution plan shows the bookmark lookup as an index seek being joined with a key lookup.

It’s frequently said that Select * is wrong because it returns too many columns, and the extra data is considered wasteful. I agree that Select * is wrong, but the real reason isn’t the extra network traffic; it’s the bookmark lookup that is almost always generated by a Select *.

The following query builds on the last bookmark lookup query and demonstrates more about the bookmark lookup problem; the difference is that this query requests only one column that’s not available from the non-clustered index:

    SELECT WorkOrderID, StartDate
      FROM Production.WorkOrder
      WHERE ProductID = 757;

Consider the performance difference (again, refer to Table 64-1) between this query path and the select * bookmark lookup query path. Their performance is nearly identical. It doesn’t take many columns to force a bookmark lookup; a single column missing from the non-clustered index means SQL Server must also look to the clustered index to solve the query.

There are only two ways to avoid the bookmark lookup problem:

■ Filter by the clustered index key columns so the query can be satisfied using the clustered index (Query Path 2 or 3).

■ Design a covering index (the next query path).
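One way to observe the lookup cost described above is to compare the logical reads that SQL Server reports for the two bookmark lookup queries. The following is a minimal sketch, not a benchmark. It assumes the AdventureWorks sample database and assumes that IX_WorkOrder_ProductID is still keyed on ProductID alone with no included columns. The read counts depend on your copy of the data, so none are shown here.

```sql
-- Assumption: AdventureWorks sample database, with IX_WorkOrder_ProductID
-- keyed on ProductID only (no INCLUDE columns yet).
-- SET STATISTICS IO reports logical reads per table in the Messages output.
SET STATISTICS IO ON;

-- Bookmark lookup that fetches every column
SELECT *
  FROM Production.WorkOrder
  WHERE ProductID = 757;

-- Bookmark lookup that needs only one column missing from the index
SELECT WorkOrderID, StartDate
  FROM Production.WorkOrder
  WHERE ProductID = 757;

SET STATISTICS IO OFF;
```

If the two queries report similar logical reads, that is consistent with the point that a single missing column is enough to force the lookup.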
Query Path 6: Covering Index

If a non-clustered index includes every column required by the query (and that means every column referenced by the query: SELECT columns, JOIN ON condition columns, GROUP BY columns, WHERE clause columns, and windowing columns), then SQL Server’s Query Optimizer can choose to solve the query using only that non-clustered index. When this occurs the index is said to cover the needs of the query; in other words, it’s a covering index.

An index by itself isn’t a covering index; rather, it becomes a covering index for a specific query when the Query Optimizer can solve the query using only the non-clustered index.

Query Path 5’s second query selected the StartDate column. Because StartDate isn’t part of the IX_WorkOrder_ProductID index, SQL Server was forced to use an evil bookmark lookup. To solve the problem, the following code adds StartDate to the IX_WorkOrder_ProductID index so the index can cover the query:

    DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID;

    CREATE INDEX IX_WorkOrder_ProductID
      ON Production.WorkOrder (ProductID)
      INCLUDE (StartDate);

The INCLUDE option (added in SQL Server 2005) adds the StartDate column to the leaf level of the IX_WorkOrder_ProductID index. The Query Optimizer can now solve the queries with an index seek (as shown in Figure 64-11):

    SELECT WorkOrderID, StartDate
      FROM Production.WorkOrder
      WHERE ProductID = 757; -- 9 rows

    SELECT WorkOrderID, StartDate
      FROM Production.WorkOrder
      WHERE ProductID = 945; -- 1,105 rows

FIGURE 64-11 With the StartDate column included in the index, the queries are solved with an index seek: a perfect covering index.

A nuance of the non-clustered index structure proves to be useful when designing covering indexes.
This next query filters by the non-clustered index key and returns the clustered index key value:

    SELECT WorkOrderID
      FROM Production.WorkOrder
      WHERE ProductID = 757;

The IX_WorkOrder_ProductID non-clustered index has the ProductID column as the key column, so that data is available in the b-tree. Even though the clustered index key, WorkOrderID, doesn’t show up anywhere in the IX_WorkOrder_ProductID dialogs in Management Studio, it’s there. WorkOrderID is the clustered index key column, so every non-clustered index on the WorkOrder table includes WorkOrderID.

The next query is a rare example of a covering index. Compared to the previous query path, this query adds the StartDate column in the WHERE clause. Conventional wisdom would say that this query requires an index scan because it filters by a non-key column (StartDate is an included column in the index and not a key column):

    SELECT WorkOrderID
      FROM Production.WorkOrder
      WHERE ProductID = 945
        AND StartDate = '2002-01-04';

In this case the index seek operator uses the b-tree index (keyed by ProductID) to seek the rows matching ProductID = 945. This can be seen in the index seek properties as the seek predicate (as illustrated in Figure 64-12). But then, the index seek operator continues to select the correct rows by filtering the rows by the included column (AND StartDate = '2002-01-04'). In the index seek properties, the predicate is filtering by the StartDate column.

The performance difference between the bookmark lookup solution and the covering index is dramatic. When comparing the Query Optimizer cost and the logical reads (refer to Table 64-1), the query paths that use a covering index are about 12 times more efficient. The duration appears less so due to my limited hardware.
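Whether a filter column can serve as a seek predicate or only as a residual predicate depends on whether it is a key column or an included column. The following sketch lists both kinds for every index on the table, using the sys.indexes, sys.index_columns, and sys.columns catalog views. It assumes the AdventureWorks sample database and makes no claim about what your copy will return. Like the Management Studio dialogs, these views would not be expected to list the clustered index key that is carried implicitly in each non-clustered index.

```sql
-- Assumption: run in the AdventureWorks sample database.
-- Lists each index on Production.WorkOrder with its key columns
-- (key_ordinal > 0) and included columns (is_included_column = 1).
SELECT i.name                AS index_name,
       i.type_desc           AS index_type,
       c.name                AS column_name,
       ic.key_ordinal,
       ic.is_included_column
  FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
      ON ic.object_id = i.object_id
     AND ic.index_id  = i.index_id
    JOIN sys.columns AS c
      ON c.object_id = ic.object_id
     AND c.column_id = ic.column_id
  WHERE i.object_id = OBJECT_ID(N'Production.WorkOrder')
  ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal;
```

A column reported with is_included_column = 1 lives only in the leaf level, which is why a filter on it appears as a predicate rather than a seek predicate.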
Query Path 7: Filter by 2 x NC Indexes

A common indexing dilemma is how to index for multiple WHERE clause criteria. Is it better to create one composite index that includes both key columns? Or do two single-key column indexes perform better? Query Paths 7 through 9 evaluate the options.

The following code reconfigures the indexes: one index keyed on ProductID, and one keyed on StartDate:

    DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID;

    CREATE INDEX IX_WorkOrder_ProductID
      ON Production.WorkOrder (ProductID);

    CREATE INDEX IX_WorkOrder_StartDate
      ON Production.WorkOrder (StartDate);

With these indexes in place, this query filters by both key columns:

    SELECT WorkOrderID, StartDate
      FROM Production.WorkOrder
      WHERE ProductID = 757
        AND StartDate = '2002-01-04';

FIGURE 64-12 The index seek operator can have a seek predicate, which uses the b-tree, and a predicate, which functions as a non-indexed filter.

To use both indexes, SQL Server uses a merge join to request rows from each index seek and then correlate the data to return the rows that meet both criteria, as shown in Figure 64-13.
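To inspect the plan for the two-index query without opening the graphical plan, the estimated plan can be requested as text. This is a sketch assuming the AdventureWorks sample database and a client such as Management Studio or sqlcmd, where GO is the batch separator. SET SHOWPLAN_TEXT must be the only statement in its batch, and while it is on, the query is compiled but not executed. The operators you see may differ from the figure, depending on statistics and version.

```sql
-- Assumption: AdventureWorks sample database with the two single-column
-- indexes from this query path in place.
SET SHOWPLAN_TEXT ON;
GO

-- Returns the estimated plan as text instead of running the query
SELECT WorkOrderID, StartDate
  FROM Production.WorkOrder
  WHERE ProductID = 757
    AND StartDate = '2002-01-04';
GO

SET SHOWPLAN_TEXT OFF;
GO
```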