Microsoft SQL Server 2008 R2 Unleashed- P130 doc

CHAPTER 35 Understanding Query Optimization

When the index union strategy is used on a heap table (such as the sales_noclust table), you see a query plan similar to the one shown in Figure 35.11. Notice that the merge join is replaced with a concatenation operation, and the stream aggregate is replaced with a distinct sort operation. Although the steps are slightly different from the index intersection strategy, the result is similar: a list of unique RIDs is returned, and they are used to retrieve the matching data rows in the table itself.

FIGURE 35.11 An execution plan for an index union strategy on a heap table.

When the OR in the query involves only a single column and a nonclustered index exists on the column, the Query Optimizer in SQL Server 2008 typically resolves the query with an index seek against the nonclustered index and then a bookmark lookup to retrieve the data rows. Consider the following query:

    select * from sales
    where ord_date in ('6/15/2005', '9/28/2008', '6/25/2008')

This query is the same as the following:

    select * from sales
    where ord_date = '6/15/2005'
       or ord_date = '9/28/2008'
       or ord_date = '6/25/2008'

To process this query, SQL Server performs a single index seek that looks for each of the search values and then uses the bookmarks returned (clustered index keys or, on a heap, RIDs) to retrieve the matching rows from the target table. No removal of duplicates is necessary because each OR condition matches a distinct set of rows. Figure 35.12 shows an example of the query plan for multiple OR conditions against a single column.

FIGURE 35.12 An execution plan using index seek to retrieve rows for an OR condition on a single column.

Index Joins

Besides using the index intersection and index union strategies, another way of using multiple indexes on a single table is to join two or more indexes to create a covering index. This is similar to an index intersection, except that the final bookmark lookup is not required because the merged index rows contain all the necessary information. Consider the following example:

    select stor_id from sales
    where qty = 816
      and ord_date = '1/2/2008'

Again, the sales table contains indexes on both the qty and ord_date columns. Each of these indexes contains the clustered index key as a bookmark, and the clustered index key contains the stor_id column. In this instance, when the Query Optimizer merges the two indexes using a merge join, joining them on the matching clustered index keys, the index rows in the merge set have all the information needed to resolve the query because stor_id, as part of the clustered index key, is carried in the nonclustered index rows. There is no need to perform a bookmark lookup on the data page. By joining the two index result sets, SQL Server creates the same effect as having one covering index on qty, ord_date, and stor_id on the table.

If you use the same numbers as in the “Index Intersection” section presented earlier, the cost of the index join would be as follows:

    8 pages (1 root page + 1 intermediate page + the 6 leaf pages to find all the bookmarks for the 1,200 matching index rows on qty)
    + 4 pages (1 root page + 1 intermediate page + 2 leaf pages to find all the bookmarks for the 212 matching index rows on ord_date)
    = 12 pages

Figure 35.13 shows an example of the execution plan for an index join. Notice that it does not include the bookmark lookup present in the index intersection execution plan (refer to Figure 35.8).

FIGURE 35.13 An execution plan for an index join.

Optimizing with Indexed Views

In SQL Server 2008, when you create a unique clustered index on a view, the result set for the view is materialized and stored in the database with the same structure as a table that has a clustered index. Changes made to the data in the underlying tables of the view are automatically reflected in the view the same way as changes to a table are reflected in its indexes. In the Developer and Enterprise Editions of SQL Server 2008, the Query Optimizer automatically considers using the index on the view to speed up access for queries run directly against the view. The Query Optimizer in the Developer and Enterprise Editions of SQL Server 2008 also considers using the indexed view for searches against the underlying base table, when appropriate.

NOTE

Although indexed views can be created in any edition of SQL Server 2008, they are considered for query optimization only in the Developer and Enterprise Editions of SQL Server 2008. In other editions of SQL Server 2008, indexed views are not used to optimize the query unless the view is explicitly referenced in the query and the NOEXPAND Query Optimizer hint is specified. For example, to force the Query Optimizer to consider using the sales_Qty_Rollup indexed view in the Standard Edition of SQL Server 2008, you execute the query as follows:

    select * from sales_Qty_Rollup WITH (NOEXPAND)
    where stor_id between 'B914' and 'B999'

The NOEXPAND hint is allowed only in SELECT statements, and the indexed view must be referenced directly in the query. (Only the Developer and Enterprise Editions consider using an indexed view that is not directly referenced in the query.) As always, you should use Query Optimizer hints with care. When the NOEXPAND hint is included in the query, the Query Optimizer cannot consider other alternatives for optimizing the query.
Consider the following example, which creates an indexed view on the sales table, containing stor_id and sum(qty) grouped by stor_id:

    set quoted_identifier on
    go
    if object_id('sales_Qty_Rollup') is not null
        drop view sales_Qty_Rollup
    go
    create view sales_qty_rollup
    with schemabinding
    as
        select stor_id, sum(qty) as total_qty, count_big(*) as id
        from dbo.sales
        group by stor_id
    go
    create unique clustered index idx1 on sales_Qty_Rollup (stor_id)
    go

The creation of the clustered index on the view essentially creates a clustered table in the database with the three columns stor_id, total_qty, and id. As you would expect, the following query on the view itself uses a clustered index seek on the view to retrieve the result rows from the view instead of having to scan or search the sales table itself:

    select * from sales_Qty_Rollup
    where stor_id between 'B914' and 'B999'

However, the following query on the sales table uses the indexed view sales_qty_rollup to retrieve the result set as well:

    select stor_id, sum(qty)
    from sales
    where stor_id between 'B914' and 'B999'
    group by stor_id

Essentially, the Query Optimizer recognizes the indexed view as another index on the sales table that covers the query. The execution plan in Figure 35.14 shows the indexed view being searched in place of the table.

FIGURE 35.14 An execution plan showing an indexed view being searched to satisfy a query on a base table.

NOTE

The seven required SET options must be set appropriately not only when the indexed view is created; they must also be set the same way in any session that is to use the indexed view in queries. The required SET option settings are as follows:

    SET ARITHABORT ON
    SET CONCAT_NULL_YIELDS_NULL ON
    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    SET ANSI_PADDING ON
    SET ANSI_WARNINGS ON
    SET NUMERIC_ROUNDABORT OFF

If these SET options are not set appropriately for the session running a query that could make use of an indexed view, the indexed view is not used, and the table is searched instead.

For more information on indexed views, see Chapters 27, “Creating and Managing Views,” and 34, “Data Structures, Indexes, and Performance.”

In rare situations, using the indexed view in the Enterprise, Datacenter, or Developer Editions of SQL Server 2008 leads to poor query performance, and you might want to prevent the Query Optimizer from using the indexed view. To force the Query Optimizer to ignore the indexed view(s) and optimize the query using the indexes on the underlying base tables, you specify the EXPAND VIEWS query option, as follows:

    select * from sales_Qty_Rollup
    where stor_id between 'B914' and 'B999'
    OPTION (EXPAND VIEWS)

Optimizing with Filtered Indexes

SQL Server 2008 introduces the capability to define filtered indexes and statistics on a subset of rows rather than on the entire rowset in a table. This is done by specifying simple predicates in the index create statement to restrict the set of rows included in the index. Filtered statistics help solve a common problem in estimating the number of matching rows when the estimates become skewed due to a large number of duplicate values (or NULLs) in an index or due to data correlation between columns. Filtered indexes provide query optimization benefits when you frequently query specific subsets of your data rows.
If a filtered index exists on a table, the optimizer recognizes when a search predicate is compatible with the filtered index; it considers using the filtered index to optimize the query if the selectivity is good.

For example, the titles table in the bigpubs2008 database contains a large percentage of rows where ytd_sales is 0. A nonclustered index typically doesn’t help for searches in which ytd_sales is 0 because the selectivity isn’t adequate, and a table scan would be performed. An advantageous approach then is to create a filtered index on ytd_sales that excludes the values of 0, which reduces the size of the index and makes it more efficient.

For example, first create an unfiltered index on ytd_sales on the titles table:

    create index ytd_sales_unfiltered on titles (ytd_sales)

Then, execute the following two queries:

    select * from titles where ytd_sales = 0
    select * from titles where ytd_sales = 10

As you can see from the query plan displayed in Figure 35.15, a query where ytd_sales = 0 still uses a table scan instead of the index because the selectivity is poor, whereas the query where ytd_sales = 10 uses the index.

FIGURE 35.15 An execution plan showing index not being used due to poor selectivity.

Now, drop the unfiltered index and create a filtered index that excludes values of 0:

    drop index titles.ytd_sales_unfiltered
    go
    create index ytd_sales_filtered on titles (ytd_sales)
    where ytd_sales <> 0

Re-run the queries and examine the query plan again. Figure 35.16 shows that the query where ytd_sales = 0 still uses a table scan as before, but the query where ytd_sales = 10 is able to use the filtered index.

FIGURE 35.16 An execution plan showing the filtered index being used.

In this case, it may be beneficial to define the filtered index instead of a normal index on ytd_sales because the filtered index requires less space and is more efficient by excluding all the rows with ytd_sales values of 0, especially if the majority of the queries against the table are searching for ytd_sales values that are nonzero.

NOTE

For more information on creating and using filtered indexes, see Chapter 34.

Join Selection

The job of the Query Optimizer is incredibly complex. The Query Optimizer can consider literally thousands of options when determining the optimal execution plan. The statistics are simply one of the tools that the Query Optimizer can use to help in the decision-making process. In addition to examining the statistics to determine the most efficient access paths for SARGs and join clauses, the Query Optimizer must consider the optimum order in which to access the tables, the appropriate join algorithms to use, the appropriate sorting algorithms, and many other details too numerous to list here. The goal of the Query Optimizer during join selection is to determine the most efficient join strategy.

As mentioned at the beginning of this chapter, delving into the detailed specifics of the various join strategies and their costing algorithms is beyond the scope of a single chapter on optimization. In addition, some of these costing algorithms are proprietary and not publicly available. The goal of this section, then, is to present an overview of the most common query processing algorithms that the Query Optimizer uses to determine an efficient execution plan.

Join Processing Strategies

If you are familiar with SQL, you are probably very familiar with using joins between tables in creating SQL queries. A join occurs any time the SQL Server Query Optimizer has to compare two inputs to determine an output. The join can occur between one table and another table, between an index and a table, or between an index and another index (as described previously, in the section “Index Intersection”).
The SQL Server Query Optimizer uses three primary types of join strategies when it must compare two inputs: nested loops joins, merge joins, and hash joins. The Query Optimizer must consider each one of these algorithms to determine the most appropriate and efficient algorithm for a given situation.

Each of the three supported join algorithms could be used for any join operation. The Query Optimizer examines all the possible alternatives, assigns costs to each, and chooses the least expensive join algorithm for a given situation. Merge and hash joins often greatly improve the query processing performance for very large data tables and data warehouses.

Nested Loops Joins

The nested loops join algorithm is by far the simplest of the three join algorithms. The nested loops join uses one input as the “outer” loop and the other input as the “inner” loop. As you might expect, SQL Server processes the outer input one row at a time. For each row in the outer input, the inner input is searched for matching rows.

Figure 35.17 illustrates a query that uses a nested loops join. Note that in the graphical execution plan, the outer loop is represented as the top input table, and the inner loop is represented as the bottom input table. In most instances, the Query Optimizer chooses the input table with the fewest qualifying rows to be the outer loop to limit the number of iterative lookups against the inner table. However, the Query Optimizer may choose the input table with the greater number of qualifying rows as the outer table if the I/O cost of searching that table first and then performing the iterative loops on the other table is lower than the alternative.

The nested loops join is the easiest join strategy for which to estimate the I/O cost.
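The outer/inner processing just described can be sketched in a few lines of Python. This is only an illustration of the general nested loops technique, not SQL Server's implementation: the function names, the dictionary standing in for an index on the inner input, and the sample rows (including the city column) are all hypothetical.

```python
from collections import defaultdict


def build_index(rows, key):
    """Group the inner input's rows by join key (a stand-in for an index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def nested_loops_join(outer, inner_index, key):
    """For each outer row, search the inner input for matching rows."""
    joined = []
    lookups = 0
    for outer_row in outer:          # outer input: processed one row at a time
        lookups += 1                 # one search of the inner input per outer row
        for inner_row in inner_index.get(outer_row[key], []):
            joined.append({**inner_row, **outer_row})
    return joined, lookups


# Hypothetical sample data: two outer rows, three inner rows.
stores = [
    {"stor_id": "A001", "city": "Boston"},
    {"stor_id": "B914", "city": "Austin"},
]
sales = [
    {"stor_id": "B914", "qty": 10},
    {"stor_id": "B914", "qty": 25},
    {"stor_id": "C200", "qty": 5},
]

rows, lookups = nested_loops_join(stores, build_index(sales, "stor_id"), "stor_id")
for row in rows:
    print(row)
print("inner lookups:", lookups)
```

The lookup counter grows by one for every outer row, whether or not that row finds a match, which is why choosing the smaller input as the outer loop usually limits the work done against the inner input.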
The cost of the nested loops join is calculated as follows:

    Number of I/Os to read in outer input
    + Number of matching rows × Number of I/Os per lookup on inner input
    = Total logical I/O cost for query

FIGURE 35.17 An execution plan for a nested loops join.

The Query Optimizer evaluates the I/O costs for the various possible join orders as well as the various possible access paths and indexes available to determine the most efficient join order.

The nested loops join is efficient for queries that typically affect only a small number of rows. As the number of rows in the outer loop increases, the effectiveness of the nested loops join strategy diminishes. The reason is the increased number of logical I/Os required as the number of qualifying rows increases.

Also, if there are no useful indexes on the join columns, the nested loops join is not an efficient join strategy because it requires a table scan lookup on the inner table for each row in the outer table. Lacking useful indexes for the join, the Query Optimizer often opts to perform a merge or hash join.

Merge Joins

The merge join algorithm is much more effective than the nested loops join for dealing with large data volumes or when the lack of limiting SARGs or useful indexes on SARGs leads to a table scan of one or both tables involved in the join. A merge join works by retrieving one row from each input and comparing them, matching on the join column(s). Figure 35.18 illustrates a query that uses a merge join.

A merge join requires that both inputs be sorted on the merge columns, that is, the columns specified in the equality (ON) clauses of the join predicate. A merge join does not work unless both inputs are sorted. In the query shown in Figure 35.18, both tables have a clustered index on stor_id, so the merge column (stor_id) is already sorted for each table. If the merge columns are not already sorted, a separate sort operation may be required before the merge join operation.

FIGURE 35.18 An execution plan for a merge join.

When the input is sorted, the merge join operation retrieves a row from each input and compares them, returning the rows if they are equal. If the rows are not equal, the lower-value row is discarded, and another row is obtained from that input. This process repeats until all rows have been processed.

Usually, the Query Optimizer chooses a merge join strategy, as in this example, when the data volume is large and both columns are contained in an existing presorted index, such as a clustered primary key. If either of the inputs is not already sorted, the Query Optimizer has to perform an explicit sort before the join. Figure 35.19 shows an example of a sort being performed before the merge join is performed.
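The compare-and-advance loop described above can be sketched in Python as follows. This is a minimal illustration of the general merge join technique under the assumption that both inputs are already sorted on the join key; it is not SQL Server's implementation, and the function name, the rewind handling for repeated keys, and the sample rows are hypothetical.

```python
def merge_join(left, right, key):
    """Join two inputs that are ALREADY sorted on `key`."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1                   # discard the lower-value row (left side)
        elif left_key > right_key:
            j += 1                   # discard the lower-value row (right side)
        else:
            # Equal keys: pair this left row with every right row sharing the key.
            run_start = j
            while j < len(right) and right[j][key] == left_key:
                joined.append({**left[i], **right[j]})
                j += 1
            i += 1
            # If the next left row repeats the key, rescan the same right-side run.
            if i < len(left) and left[i][key] == left_key:
                j = run_start
    return joined


# Hypothetical sample data, sorted on stor_id as a clustered index would provide.
stores = [
    {"stor_id": "A001", "city": "Boston"},
    {"stor_id": "B914", "city": "Austin"},
    {"stor_id": "C300", "city": "Denver"},
]
sales = [
    {"stor_id": "B914", "qty": 10},
    {"stor_id": "B914", "qty": 25},
    {"stor_id": "C200", "qty": 5},
]

matches = merge_join(stores, sales, "stor_id")
for row in matches:
    print(row)
```

Each input is read once from front to back, which is what makes the approach attractive for large, presorted inputs. If either list were unsorted, it would have to be sorted first (for example, with sorted(rows, key=lambda r: r["stor_id"])), which corresponds to the explicit sort step mentioned above.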
