Microsoft SQL Server 2008 R2 Unleashed- P126 doc

CHAPTER 34 Data Structures, Indexes, and Performance

In addition, the functions in the computed column must be deterministic. A deterministic function is one that returns the same result every time it is called with the same set of input parameters.

When you create a clustered index on a computed column, it is no longer a virtual column in the table. The computed value for the column is stored in the data rows of the table. If you create a nonclustered index on a computed column, the computed value is stored in the nonclustered index rows but not in the data rows, unless you also have a clustered index on the computed column.

Be aware of the overhead involved with indexes on computed columns. Updates to the columns that the computed columns are based on result in updates to the index on the computed column as well.

Indexes on computed columns can be useful when you need an index on large character fields. As discussed earlier, the smaller an index, the more efficient it is. You could create a computed column on the large character field by using the CHECKSUM() function. CHECKSUM() generates a 4-byte integer that is relatively unique for character strings but not absolutely unique. (Different character strings can generate the same checksum, so when searching against the checksum, you need to include the character string as an additional search argument to ensure that you are matching the right row.) The benefit is that you can create an index on the 4-byte integer generated by CHECKSUM() and use it to search against the character string instead of having to create an index on the large character column itself. Listing 34.7 shows an example of applying this solution.

LISTING 34.7 Using an Index on a Computed Checksum Column

    -- The first statement is used to disable any previously created DDL triggers
    -- in the database which would prevent creating a new constraint.
    DISABLE TRIGGER ALL ON DATABASE
    go
    -- First add the computed column to the table
    alter table titles add title_checksum as CHECKSUM(title)
    go
    -- Next, create an index on the computed column
    create index NC_titles_titlechecksum on titles(title_checksum)
    go
    -- In your queries, include both the checksum column and the title column
    -- in your search argument
    select title_id, ytd_sales
    from titles
    where title_checksum = checksum('Fifty Years in Buckingham Palace Kitchens')
      and title = 'Fifty Years in Buckingham Palace Kitchens'

Download from www.wowebook.com

SQL Server 2008 also supports persisted computed columns. With persisted computed columns, SQL Server stores the computed values in the table without requiring an index on the computed column. Like indexed computed columns, persisted computed columns are updated when any other columns on which the computed column depends are updated.

Persisted computed columns allow you to create an index on a computed column that is defined with a deterministic, but imprecise, expression. This option enables you to create an index on a computed column when SQL Server cannot determine with certainty whether a function that returns a computed column expression (for example, a CLR function that is created in the Microsoft .NET Framework) is both deterministic and precise.

Filtered Indexes and Statistics

As discussed earlier in this chapter, a nonclustered index contains a row for every row in the table, even rows with a large number of duplicate key values where the nonclustered index will not be an effective method for finding those rows. For these situations, SQL Server 2008 introduces filtered indexes. Filtered indexes are an optimized form of nonclustered indexes, created by specifying a search predicate when defining the index. This search predicate acts as a filter to create the index on only the data rows that match the search predicate.
This reduces the size of the index and essentially creates an index that covers your queries, which return only a small percentage of rows from a well-defined subset of data within your table.

Filtered indexes can provide the following advantages over full-table indexes:

• Improved query performance and plan quality: A well-designed filtered index improves query performance and execution plan quality because it is smaller than a full-table nonclustered index and has filtered statistics. Filtered statistics are more accurate than full-table statistics because they cover only the rows contained in the filtered index.
• Reduced index maintenance costs: Filtered indexes are maintained only when data modifications affect the data values contained in the index. Also, because a filtered index contains only the frequently accessed data, the smaller size of the index reduces the cost of updating the statistics.
• Reduced index storage costs: Filtered indexes can reduce disk storage for nonclustered indexes when a full-table index is not necessary. You can replace a full-table nonclustered index with multiple filtered indexes without significantly increasing the storage requirements.

Following are some of the situations in which filtered indexes can be useful:

• When a column contains mostly NULL values, but your queries search only for rows where data values are NOT NULL.
• When a column contains a large number of duplicate values, but your queries typically ignore those values and search only for the more unique values.
• When you want to enforce uniqueness on a subset of values, for example, a column on which you want to allow NULL values. A unique constraint allows only one NULL value; however, a filtered index can be defined as unique over only the rows that are NOT NULL.
• When queries retrieve only a particular range of data values and you want to index these values but not the entire table. For example, you have a table that contains a large number of historical values, but you want to search only values for the current year or quarter. You can create a filtered index on the desired range of values and possibly even use the INCLUDE option to add columns so your index fully covers your queries.

Now, you may be asking, “Can’t some of the preceding solutions be accomplished using indexed views?” Yes, they can, but filtered indexes provide a better alternative. The most significant advantage is that filtered indexes can be used in any edition of SQL Server 2008, whereas indexed views are chosen by the optimizer only in the Developer, Enterprise, and Datacenter Editions unless you use the NOEXPAND hint in other editions. In addition:

• Filtered indexes have reduced index maintenance costs (the query processor uses fewer CPU resources to update a filtered index than an indexed view).
• The Query Optimizer considers using a filtered index in more situations than the equivalent indexed view.
• You can perform online rebuilds of filtered indexes (online index rebuilds are not supported for indexed views).
• Filtered indexes can be nonunique, whereas indexed views must be unique.

Based on these advantages, it is recommended that you use filtered indexes instead of indexed views when possible. Consider replacing indexed views with filtered indexes when the view references only one table, the view query doesn’t return computed columns, and the view predicate uses simple comparison logic and doesn’t contain a view.

Creating and Using Filtered Indexes

To define filtered indexes, you use the normal CREATE INDEX command but include a WHERE condition as a search predicate to specify which data rows the filtered index should include.
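The unique-over-non-NULL case from the list of situations above can be sketched as follows. This is a minimal illustration, not an example from the chapter: the demo_members table, its columns, and the index name are hypothetical, and the session is assumed to use the default SET options that filtered indexes require (for example, QUOTED_IDENTIFIER and ANSI_NULLS ON).

```sql
-- Hypothetical demo table; it is not part of the bigpubs2008 database.
create table dbo.demo_members
(
    member_id int identity(1,1) not null primary key,
    email     varchar(100) null
)
go
-- The index is unique only over the rows where email is NOT NULL.
create unique nonclustered index UQ_demo_members_email
    on dbo.demo_members (email)
    where email is not null
go
-- Both inserts succeed: rows with a NULL email are not in the filtered index.
insert into dbo.demo_members (email) values (null)
insert into dbo.demo_members (email) values (null)
-- Succeeds: first occurrence of this non-NULL value.
insert into dbo.demo_members (email) values ('a@example.com')
-- Fails with a duplicate key error: the value is already in the unique index.
insert into dbo.demo_members (email) values ('a@example.com')
go
```

A regular unique constraint on the same column would have rejected the second NULL row, which is the difference the filter predicate makes here.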
In the current implementation, you can specify only simple search predicates such as IN; the comparison operators IS NULL, IS NOT NULL, =, <>, !=, >, >=, !>, <, <=, !<; and the logical operator AND. In addition, filtered indexes cannot be created on computed columns, user-defined data types, Hierarchyid, or spatial types.

For example, assume you need to search only the sales table in the bigpubs2008 database for sales since 9/1/2008. The majority of the rows in the sales table have order dates prior to 9/1/2008. To create a filtered index on the ord_date column, you would execute a command like the following:

    create index ord_date_filt on sales (ord_date)
        WHERE ord_date >= '2008-09-01 00:00:00.000'

Now, let’s look at a couple of queries that may or may not use the new filtered index. First, let’s consider the following query looking for any sales for 9/15/2008:

    select * from sales where ord_date = '9/15/2008'

If you look at the execution plan in Figure 34.28, you can see that the filtered index, ord_date_filt, is used to locate the qualifying row values. The clustered index, UPKCL_sales, is used as the row locator to retrieve the data rows (as described earlier in the “Nonclustered Indexes” section).

FIGURE 34.28 Query plan for a query that uses a filtered index.

NOTE
For more information on understanding and analyzing query plans, see Chapter 36.

If you run the following query using a data value that’s outside the range of values stored in the filtered index, you see that the filtered index is not used (see Figure 34.29):

    select * from sales where ord_date = '8/15/2008'

FIGURE 34.29 Query plan for a query using a value not in the filtered index.
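To check which indexes on a table are filtered, and with what predicate, you can query the sys.indexes catalog view. The following is a minimal sketch that assumes the sales table and the ord_date_filt index from the example above; has_filter and filter_definition are columns of sys.indexes in SQL Server 2008.

```sql
-- List the indexes on the sales table along with any filter predicate.
select i.name,
       i.type_desc,
       i.has_filter,          -- 1 if the index was created with a WHERE clause
       i.filter_definition    -- text of the filter predicate, NULL if unfiltered
from sys.indexes as i
where i.object_id = object_id('sales')
order by i.name
```

For ord_date_filt, has_filter should be 1 and filter_definition should show the ord_date predicate; the clustered index UPKCL_sales should show has_filter = 0.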
Now let’s consider a query that you might expect would use the filtered index but does not:

    select stor_id, qty from sales where ord_date > '9/15/2008'

The data values are within the range of values for the filtered index, but due to the number of rows that match, SQL Server determines that using the filtered nonclustered index to locate the matching rows and then retrieving the data rows via the clustered index row locators requires more I/Os than simply performing a clustered index scan of the entire table (the same query plan as shown in Figure 34.29).

In this case, you might want to use included columns on the filtered index so that the data values for the query can be retrieved using index covering without incurring the extra cost of using the row locators to retrieve the actual data rows. The following example creates a filtered index on ord_date that includes stor_id and qty:

    create index ord_date_filt2 on sales (ord_date)
        INCLUDE (qty, stor_id)
        WHERE ord_date >= '2008-09-01 00:00:00.000'

If you rerun the same query and examine the query plan, you see that the filtered index is used this time, and SQL Server uses index covering (see Figure 34.30). You can tell that it’s using index covering with the ord_date_filt2 index because there is no use of the clustered index to retrieve the data rows. Using the row locators is unnecessary because all the information requested by the query can be retrieved from the index leaf rows, which contain the values of the included columns as well.

Creating and Using Filtered Statistics

Similar to the way you use filtered indexes, SQL Server 2008 also lets you create filtered statistics. Like filtered indexes, filtered statistics are also created over a subset of rows in the table based on a specified filter predicate.
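One way to see which statistics on a table carry a filter predicate is the sys.stats catalog view. The following is a minimal sketch, assuming the bigpubs2008 sales table and the filtered indexes created above; the column names come from sys.stats in SQL Server 2008.

```sql
-- List only the filtered statistics defined on the sales table.
select s.name,
       s.auto_created,        -- 1 if created automatically by the query processor
       s.user_created,        -- 1 if created with CREATE STATISTICS
       s.filter_definition    -- text of the filter predicate
from sys.stats as s
where s.object_id = object_id('sales')
  and s.has_filter = 1
```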
Creating a filtered index on a column autocreates the corresponding filtered statistics. In addition, filtered statistics can be created explicitly by including the WHERE clause with the CREATE STATISTICS statement.

FIGURE 34.30 Query plan using index covering on a filtered index with included columns.

Filtered statistics can be used to avoid a common issue with statistics where the cardinality estimation is skewed due to a large number of NULL or duplicate values, or due to a data correlation between columns. For example, let’s consider the titles table in the bigpubs2008 database. All the cooking books (type = 'trad_cook' or 'mod_cook') are published by a single publisher (pub_id = '0877'). However, SQL Server stores column-level statistics on each of these columns independent of each other. Based on the statistics, SQL Server estimates there are six rows in the titles table where pub_id = '0877', and five rows where the type is either 'trad_cook' or 'mod_cook'. However, let’s assume you were to execute the following query:

    select * from titles
    where pub_id = '0877'
      and type in ('trad_cook', 'mod_cook')

When the Query Optimizer estimates the selectivity of this query where each search predicate is part of an AND condition, it assumes the conditions are independent of one another and estimates the number of matching rows by taking the intersection of the two conditions. Essentially, it multiplies the selectivity of each of the two conditions together to determine the total selectivity. The selectivity of each is 0.011 (6/537) and 0.009 (5/537), which, when multiplied together, comes out to approximately 0.0001, so the optimizer estimates at most only a single row will match. However, because all five cooking books are published by pub_id '0877', in actuality a total of five rows match.
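The arithmetic behind that estimate can be written out as a short worked example. It uses only the figures quoted above (537 rows in titles, six rows for the publisher, five rows for the two cooking types) and assumes the amsmath package for \text.

```latex
% Independence assumption: the two selectivities are multiplied.
\[
\frac{6}{537} \times \frac{5}{537} = \frac{30}{288369} \approx 0.000104,
\qquad
0.000104 \times 537 \approx 0.056 \text{ rows (estimated)}
\]
% Correlated data: every cooking book has pub\_id 0877, so the type
% predicate alone determines the result.
\[
\frac{5}{537} \approx 0.0093,
\qquad
0.0093 \times 537 \approx 5 \text{ rows (actual)}
\]
```

The estimated row count is well under one row, which is consistent with the optimizer's "at most a single row" estimate, while the true selectivity is roughly 90 times higher.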
Now, in this example, the difference between one row and five rows is likely not significant enough to make a big difference in query performance, but a similar estimation error could be quite large with other data sets, leading the optimizer to possibly choose an inappropriate, and considerably more expensive, query plan.

Filtered statistics can help solve this problem by letting you capture these types of data correlations in your column statistics. For example, to capture the fact that all cooking books are also published by the same publisher, you could create the filtered statistics using the following statement:

    create statistics pub_id_type on titles (pub_id, type)
        where pub_id = '0877' and type in ('trad_cook', 'mod_cook')

When these filtered statistics are defined and the same query is run, SQL Server uses the filtered statistics to determine that the query will match five rows instead of only one. Although this solution could require you to define a number of filtered statistics, it can be effective for fixing your most critical queries where poor cardinality estimates due to data correlation or data skew are causing the Query Optimizer to choose poorly performing query plans.

Choosing Indexes: Query Versus Update Performance

I/O is the primary factor in determining query performance. The challenge for a database designer is to build a physical data model that provides efficient data access. Creating indexes on database tables allows SQL Server to access data with fewer I/Os. Defining useful indexes during the logical and physical data modeling step is crucial. The SQL Server Query Optimizer relies heavily on index key distribution and index density to determine which indexes to use for a query. The Query Optimizer in SQL Server can use multiple indexes in a query (through index intersection) to reduce the I/O required to retrieve information.
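The key distribution and density information that the optimizer relies on can be inspected with DBCC SHOW_STATISTICS. The following is a minimal sketch, assuming the pub_id_type filtered statistics from the earlier example have been created on the titles table.

```sql
-- Header row for the statistics object: row counts, sampling, and
-- (for filtered statistics) the filter expression.
dbcc show_statistics ('titles', pub_id_type) with stat_header
go
-- Density values for each prefix of the statistics columns.
dbcc show_statistics ('titles', pub_id_type) with density_vector
go
```

The same command accepts an index name in place of the statistics name, so it can also be pointed at an index such as ord_date_filt on the sales table.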
In the absence of indexes, the Query Optimizer performs a table scan, which can be costly from an I/O standpoint.

Although indexes provide a means for faster access to data, they slow down data modification statements due to the extra overhead of having to maintain the index during inserts, updates, and deletes.

In a DSS environment, defining many indexes can help your queries and does not create much of a performance issue because the data is relatively static and doesn’t get updated frequently. You typically load the data, create the indexes, and forget about it until the next data load. As long as you have the necessary indexes to support the user queries and they’re getting decent response time, the penalties of having too many indexes in a DSS environment are the space wasted for indexes that possibly won’t be used, the additional time required to create the excessive indexes, and the additional time required to back up and run DBCC checks on the data.

In an OLTP environment, on the other hand, too many indexes can lead to significant performance degradation, especially if the number of indexes on a table exceeds four or five. Think about it for a second. Every single-row insert is at least one data page write and one or more index page writes (depending on whether a page split occurs) for every index on the table. With eight nonclustered indexes, that would be a minimum of nine writes to the database for a single-row insert. Therefore, for an OLTP environment, you want as few indexes as possible: typically only the indexes required to support the update and delete operations and your critical queries, and to enforce your uniqueness constraints.

The natural solution, in a perfect world, would be to create a lot of indexes for a DSS environment and as few indexes as possible in an OLTP environment. Unfortunately, in the real world, you typically have an environment that must support both DSS and OLTP applications.
How do you resolve the competing indexing requirements of the two environments? Meeting the indexing needs of DSS and OLTP applications requires a bit of a balancing act, with no easy solution. It often involves making hard decisions as to which DSS queries might have to live with table scans and which updates have to contend with additional overhead.

One solution is to have two separate databases: one for DSS applications and another for OLTP applications. Obviously, this approach requires some method of keeping the databases in sync. The method chosen depends on how up-to-date the DSS database has to be. If you can afford some lag time, you could consider using a dump-and-load mechanism, such as Log Shipping or periodic full database restores. If the DSS system requires up-to-the-minute data, you might want to consider using replication or database mirroring.

Another possible alternative is to have only the required indexes in place during normal processing periods to support the OLTP requirements. At the end of the business day, you can create the indexes necessary to support the DSS queries and reports, and they can run as batch jobs after normal processing hours. When the DSS reports are complete, you can drop the additional indexes, and you’re ready for the next day’s processing. Note that this solution assumes that the time required to create the additional indexes is offset by the time saved by the faster running of the DSS queries. If the additional indexes do not result in substantial time savings, they are probably not necessary and need not be created in the first place. The queries need to be more closely examined to select the appropriate indexes to best support them.

As you can see, it is important to choose indexes carefully to provide a good balance between data search and data modification performance.
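The end-of-day approach described above can be sketched as a simple batch script. This is an illustration under assumptions: the index name and the choice of key and included columns are hypothetical, and the reporting queries themselves are left as a placeholder.

```sql
-- After normal processing hours: create the index that the reports need.
-- NC_sales_rpt_orddate is a hypothetical name used only for this sketch.
create nonclustered index NC_sales_rpt_orddate
    on sales (ord_date)
    include (stor_id, qty)
go

-- ... run the DSS queries and reports here ...

-- Before the next business day: drop the reporting index so that OLTP
-- inserts, updates, and deletes do not have to maintain it.
drop index NC_sales_rpt_orddate on sales
go
```

Timing the create step against the time the reports save is the check the text calls for; if the index build costs more than it saves, the index is probably not worth creating.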
The application environment usually governs the choice of indexes. For example, if the application is mainly OLTP with transactions requiring fast response time, creating too many indexes might have an adverse impact on performance. On the other hand, the application might be a DSS with few transactions doing data modifications. In that case, it makes sense to create a number of indexes on the columns frequently used in queries.

Identifying Missing Indexes

When developing an index design for your database and applications, you should make sure you create appropriate indexes for the various queries that will be executed against your tables. However, it can be quite a chore to identify all the queries you may need to create indexes for. Fortunately, SQL Server 2008 provides a couple of tools to help you identify any indexes you may need in your database: the Database Engine Tuning Advisor and the missing index dynamic management objects.

The Database Engine Tuning Advisor

The Database Engine Tuning Advisor is a tool that can analyze a SQL script file or a set of queries captured in a SQL Profiler trace and recommend changes to your indexing scheme. After performing its analysis, the Database Engine Tuning Advisor provides recommendations for new or more effective indexes, indexed views, and partitioning schemes, along with the estimated improvement in execution time should the recommendation be implemented. You can choose to implement the recommendations immediately or later, or you can save the SQL statements to a script file. For detailed information on using the Database Engine Tuning Advisor, see Chapter 55.

Although the Database Engine Tuning Advisor is a useful tool, and its recommendations have improved since it was introduced in SQL Server 2005, it does still have some limitations.
For one, because the Database Engine Tuning Advisor gathers statistics by sampling the data, repeatedly running the tool on the same workload may produce different results as different samples are used. In addition, if you impose constraints, such as specifying maximum disk space for tuning recommendations, the Database Engine Tuning Advisor may be forced to drop certain existing indexes, and the resulting recommendation may produce a negative expected improvement. The Database Engine Tuning Advisor may also not make recommendations under the following circumstances:

• The table being tuned contains fewer than 10 data pages.
• The recommended indexes would not offer enough improvement in query performance over the current physical database design.
• The user who runs the Database Engine Tuning Advisor is not a member of the db_owner database role or the sysadmin fixed server role.

Missing Index Dynamic Management Objects

In addition to the Database Engine Tuning Advisor, SQL Server 2008 provides the missing index dynamic management objects to help identify potentially missing indexes in your database. The missing index dynamic management objects are the following:

• sys.dm_db_missing_index_group_stats: Returns summary information about missing index groups, such as the performance improvements that could be gained by implementing a specific group of missing indexes.
• sys.dm_db_missing_index_groups: Returns information about a specific group of missing indexes, such as the group identifier and identifiers of all missing indexes contained in that group.
• sys.dm_db_missing_index_details: Returns detailed information about a missing index; for example, it returns the name and identifier of the table where the index is missing, and the columns and column types that should make up the missing index.
• sys.dm_db_missing_index_columns: Returns information about the database table columns missing an index.

After running a typical workload on SQL Server, you can query the dynamic management objects to retrieve information about possible missing indexes. Listing 34.8 provides a sample query that displays the missing index information for a query on the sales table that was run between 10:30 and 10:40 p.m. on 2/21/2010.

LISTING 34.8 Querying the Missing Index Dynamic Management Objects

    SELECT mig.index_group_handle as handle,
           convert(varchar(30), statement) AS table_name,
           convert(varchar(12), column_name) AS Column_name,
           convert(varchar(10), column_usage) as ColumnUsage,
           avg_user_impact as avg_impact
    FROM sys.dm_db_missing_index_details AS mid
    CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)
    INNER JOIN sys.dm_db_missing_index_groups AS mig
        ON mig.index_handle = mid.index_handle
    inner join sys.dm_db_missing_index_group_stats AS migs
        ON migs.group_handle = mig.index_group_handle
    where mid.object_id = object_id('sales')
      and last_user_seek between '2010-02-21 22:30' and '2010-02-21 22:40'
    ORDER BY mig.index_group_handle, mig.index_handle, column_id;
    GO

    handle  table_name                   Column_name  ColumnUsage  avg_impact
    2       [bigpubs2008].[dbo].[sales]  stor_id      INCLUDE      87.46
    2       [bigpubs2008].[dbo].[sales]  qty          INEQUALITY   87.46

In this example, the optimizer recommends an index on the qty column to support an inequality operator. It is also recommended that the stor_id column be specified as an included column in the index. This index is estimated to improve performance by 87.46%.

Although the missing index feature provides some helpful information for identifying potentially missing indexes in your database, it too has a few limitations:

• It is not intended to fine-tune the existing indexes, only to recommend additional indexes when no useful index is found that can be used to satisfy a search or join condition.
• It reports only included columns for some queries. You need to determine whether the included columns should be specified as additional index key columns instead.
• It may return different costs for the same missing index group for different executions.
• It does not suggest filtered indexes.
• It is unable to provide recommendations for clustered indexes, indexed views, or table partitioning (you should use the Database Engine Tuning Advisor instead for these recommendations).

Probably the key limitation is that although the missing index feature is helpful for identifying indexes that may be useful for you to define, it’s not a substitute for a well-thought-out index design.

Missing Index Feature Versus Database Engine Tuning Advisor

The missing index dynamic management objects are a lightweight, server-side, always-on feature for identifying and correcting potential indexing oversights. The Database Engine Tuning Advisor, on the other hand, is a comprehensive client-side tool that can be used to assess the physical database design and recommend new physical design structures for improving performance, including not only indexes, but also indexed views or partitioning schemes.

The Database Engine Tuning Advisor and the missing index feature can return different recommendations, even for a single-query workload. The reason is that the index key column recommendations from the missing index dynamic management objects are not order sensitive. On the other hand, the Database Engine Tuning Advisor recommendations include ordering of the key columns for indexes to optimize query performance.
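As a complement to Listing 34.8, the missing index suggestions for the current database can be ranked so that the most promising ones are reviewed first. The following is a minimal sketch: the est_benefit expression is a commonly used rule of thumb and an assumption of this example, not a value reported by SQL Server or a formula from this chapter.

```sql
-- Rank missing index suggestions in the current database.
select top 10
       mid.statement as table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_total_user_cost,
       migs.avg_user_impact,
       -- Heuristic score (assumption): query cost x expected improvement x usage.
       migs.avg_total_user_cost * migs.avg_user_impact
           * (migs.user_seeks + migs.user_scans) as est_benefit
from sys.dm_db_missing_index_group_stats as migs
inner join sys.dm_db_missing_index_groups as mig
        on mig.index_group_handle = migs.group_handle
inner join sys.dm_db_missing_index_details as mid
        on mid.index_handle = mig.index_handle
where mid.database_id = db_id()
order by est_benefit desc
```

Because the suggested key columns are not order sensitive, the equality and inequality column lists still need to be reviewed by hand before any CREATE INDEX statement is written from them.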