Microsoft SQL Server 2008 R2 Unleashed - P122

CHAPTER 34 Data Structures, Indexes, and Performance

LISTING 34.4 DBCC SHOW_STATISTICS Output for the aunmind Index on the authors Table

    dbcc show_statistics (authors, aunmind)
    go

    Name     Updated              Rows  Rows Sampled  Steps  Density  Average key length  String Index  Filter Expression  Unfiltered Rows
    aunmind  Mar 14 2010 10:20PM  172   172           148    1        24.06977            YES           NULL               172

    (1 row(s) affected)

    All density   Average Length  Columns
    0.00625       6.406977        au_lname
    0.005813953   13.06977        au_lname, au_fname
    0.005813953   24.06977        au_lname, au_fname, au_id

    (3 row(s) affected)

    RANGE_HI_KEY    RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
    Ahlberg         0           2        0                    1
    Alexander       0           1        0                    1
    Amis            0           1        0                    1
    Arendt          0           1        0                    1
    Arnosky         0           1        0                    1
    Bate            0           1        0                    1
    Bauer           0           1        0                    1
    Benchley        0           1        0                    1
    Bennet          0           1        0                    1
    Blotchet-Halls  0           1        0                    1
    del Castillo    0           1        0                    1
    Dillard         0           1        0                    1
    Doctorow        0           1        0                    1
    Doyle           0           1        0                    1
    Durrenmatt      2           1        2                    1
    Eastman         0           1        0                    1
    Gringlesby      0           1        0                    1
    Grisham         0           1        0                    1
    Gunning         0           1        0                    1
    Hill            0           1        0                    1
    Hutchins        3           2        3                    1
    Ionesco         0           1        0                    1
    Ishiguro        0           1        0                    1
    Tyler           0           1        0                    1
    Van Allsburg    0           1        0                    1
    Van der         0           1        0                    1
    Van der Meer    0           1        0                    1
    von Goethe      0           1        0                    1
    Walker          0           1        0                    1
    Warner          0           1        0                    1
    White           0           2        0                    1
    Wilder          0           1        0                    1
    Williams        0           2        0                    1
    Wilson          0           1        0                    1
    Yates           0           1        0                    1
    Yokomoto        0           1        0                    1
    Young           0           1        0                    1

Looking at the output, you can determine that the statistics were last updated on March 14, 2010. At the time the statistics were generated, the table had 172 rows, and all 172 rows were sampled to generate the statistics (no filtering was applied). The average key length is 24.06977 bytes. From the All density information, you can see that this index is highly selective. (A low density means high selectivity; index densities are covered shortly.) After the general information and the index densities, the index histogram is displayed.

The Statistics Histogram

Up to 200 sample values can be stored in the statistics histogram. Each sample value is called a step.
The sample value stored in each step is the endpoint of a range of values. Three values are stored for each step:

• RANGE_ROWS: This indicates how many other rows are inside the range between the current step and the step prior, not including the step values themselves.
• EQ_ROWS: This is the number of rows that have the same value as the sample value. In other words, it is the number of duplicate values for the step.
• Range density: This indicates the number of distinct values within the range.

The range density information is actually displayed in two separate columns, DISTINCT_RANGE_ROWS and AVG_RANGE_ROWS:

• DISTINCT_RANGE_ROWS is the number of distinct values between the current step and the step prior, not including the step values themselves.
• AVG_RANGE_ROWS is the average number of rows per distinct value within the range of the step.

In the output in Listing 34.4, distinct key values in the first column of the index are stored as the sample values in the histogram. Because most of the values for au_lname are unique, most of the range values are 0. You can see that there is a duplicate in the index key for the last name of Hutchins (EQ_ROWS is 2).

For comparison purposes, Listing 34.5 shows a snippet of the DBCC SHOW_STATISTICS output for the titleidind index on the sales table in bigpubs2008.
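Before moving on to that listing, the step columns defined above can be made concrete with a small model. The following is a minimal Python sketch, not SQL Server's algorithm: the function name `build_steps` is hypothetical, the step endpoints are supplied by the caller (how SQL Server chooses them is not modeled), and the sample names are made up rather than taken from the authors table.

```python
from collections import Counter


def build_steps(values, step_keys):
    """Return one dict per histogram step for the given key values.

    step_keys are the RANGE_HI_KEY values to use. They are supplied by the
    caller here (an assumption); how SQL Server picks them is not modeled.
    """
    counts = Counter(values)
    distinct_values = sorted(counts)
    steps = []
    previous_key = None
    for hi_key in sorted(step_keys):
        # Distinct values strictly between the previous step and this step.
        in_range = [
            v for v in distinct_values
            if (previous_key is None or v > previous_key) and v < hi_key
        ]
        range_rows = sum(counts[v] for v in in_range)
        distinct_range_rows = len(in_range)
        # Listing 34.4 reports 1 for an empty range, so the sketch does too.
        avg_range_rows = range_rows / distinct_range_rows if in_range else 1
        steps.append({
            "RANGE_HI_KEY": hi_key,
            "RANGE_ROWS": range_rows,
            "EQ_ROWS": counts[hi_key],
            "DISTINCT_RANGE_ROWS": distinct_range_rows,
            "AVG_RANGE_ROWS": avg_range_rows,
        })
        previous_key = hi_key
    return steps


# Made-up last names, not rows from the authors table.
names = ["Doyle", "Dumas", "Dunne", "Durrenmatt", "Eastman", "Eastman"]
for step in build_steps(names, ["Doyle", "Durrenmatt", "Eastman"]):
    print(step)
```

With this input, the Durrenmatt step has two rows and two distinct values in its range, which mirrors the shape of the Durrenmatt row in Listing 34.4, and the duplicated Eastman value shows up as an EQ_ROWS of 2.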
LISTING 34.5 DBCC SHOW_STATISTICS Output for the titleidind Index on the sales Table in the bigpubs2008 Database

    dbcc show_statistics (sales, 'titleidind')
    go

    Name        Updated              Rows    Rows Sampled  Steps  Density      Average key length  String Index  Filter Expression  Unfiltered Rows
    titleidind  Mar 14 2010 10:39PM  168725  152432        188    0.003537365  26.40519            YES           NULL               168725

    All density    Average Length  Columns
    0.001858736    6               title_id
    5.99844E-06    10              title_id, stor_id
    5.926804E-06   26.4007         title_id, stor_id, ord_num

    RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS   DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
    BI0194        0           274.8199  0                    1
    BI2184        639.6047    312.9337  2                    277.1448
    BI2915        893.1208    271.811   3                    261.8779
    BI3976        637.2789    260.778   2                    276.137
    BI8448        1685.068    281.8409  6                    300.0652
    BU1111        616.3464    276.8259  2                    267.0668
    BU7832        357.0157    299.8948  1                    296.2236
    CH0249        1067.558    279.8349  3                    313.0259
    CH0639        1019.879    284.8499  3                    299.0454
    CH0671        316.3136    259.7751  1                    262.4521
    CH0847        1333.867    266.796   5                    295.557
    CH1260        1069.884    287.8589  3                    313.7079
    CH1380        612.8576    311.9307  2                    265.5551
    CH1568        645.4193    286.8559  2                    279.6643
    CH1692        974.525     275.8229  3                    285.7469
    CH2080        329.1057    285.8529  1                    273.066
    CH2240        715.1943    273.817   2                    309.8983
    CH2256        352.364     310.9277  1                    292.364
    CH2360        630.3014    293.8768  2                    273.1136
    CH2480        626.8126    311.9307  2                    271.6019
    CH2574        679.1439    279.8349  2                    294.2774
    CH2610        334.9203    280.8379  1                    277.8905
    CH2706        343.0607    300.8978  1                    284.6448
    CH2856        326.7799    287.8589  1                    271.1362
    FI9853        623.3239    295.8828  2                    270.0902
    FI9965        625.6497    323.9666  2                    271.098
    LC1680        629.1384    286.8559  2                    272.6097
    LC5292        647.7451    265.793   2                    280.6721
    MC3021        610.5318    244.7302  2                    264.5473
    NF2924        652.3968    266.796   2                    282.6877
    NF8918        669.8406    310.9277  2                    290.2462
    PC9999        665.1889    275.8229  2                    288.2306
    PS2106        709.3798    259.7751  2                    307.3788
    TC3218        617.5093    291.8708  2                    267.5707
    TC4203        29.23513    293.8768  0                    284.9097
    TC7777        29.23513    269.805   0                    284.9097

As you can see in this example, there are a greater number of rows per range and a greater number of duplicates for each step value. The histogram uses 188 steps, and the sample values for the 168,725 rows in the table are distributed across those 188 step values. In this example, 152,432 rows, rather than the whole table, were sampled to generate the statistics.

How the Statistics Histogram Is Used

The histogram steps are used for SARGs only when a constant expression is compared against an indexed column and the value of the constant expression is known at query compile time. The following SARG examples show where histogram steps can be used:

• where col_a = getdate()
• where cust_id = 12345
• where monthly_sales < 10000 / 12
• where l_name like 'Smith' + '%'

Some constant expressions cannot be evaluated until query runtime. They include search arguments that contain local variables or subqueries and also join clauses, such as the following:

• where price = @avg_price
• where total_sales > (select sum(qty) from sales)
• where titles.pub_id = publishers.pub_id

For these types of statements, you need some other way of estimating the number of matching rows. In addition, because histogram steps are kept only on the first column of the index, SQL Server must use a different method for determining the number of matching rows for SARGs that specify multiple column values for a composite index, such as the following:

    select * from sales
    where title_id = 'BI3976'
    and stor_id = 'P648'

When the histogram is not used or cannot be used, SQL Server uses the index density values to estimate the number of matching rows.

Index Densities

SQL Server stores the density values of each column in the index for use in queries where the SARG value is not known until runtime or when the SARG is on multiple columns of the index.
For composite keys, SQL Server stores the density for the first column of the composite key; for the first and second columns; for the first, second, and third columns; and so on. This information is shown in the All density section of the DBCC SHOW_STATISTICS output in Listings 34.4 and 34.5.

Index density is essentially the inverse of the number of unique key values. The density of each key is calculated by using the following formula:

    Key density = 1.00 / Count of distinct key values in the table

Therefore, the density for the au_lname column in the authors table in the bigpubs2008 database is calculated as follows:

    Select Density = 1.00 / (select count(distinct au_lname) from authors)
    go

    Density
    0.0062500000000

The density for the combination of the columns au_lname and au_fname is as follows:

    Select Density = 1.00 / (select count(distinct au_lname + au_fname) from authors)
    go

    Density
    0.0058139534883

Notice that, unlike with the selectivity ratio, a smaller index density indicates a more selective index. As the density value approaches 1, the index becomes less selective and essentially useless. When the index selectivity is poor, the Query Optimizer might choose to do a table scan or a leaf-level index scan rather than perform an index seek because it is more cost-effective.

TABLE 34.8 Index Densities for the titleidind Index on the sales Table

    Key Column                  Index Density
    title_id                    0.001858736
    title_id, stor_id           5.99844E-06 (.00000599844)
    title_id, stor_id, ord_num  5.926804E-06 (.000005926804)

TIP
Watch out for database indexes that have poor selectivity. Such indexes are often more of a detriment to the performance of the system than they are a help. Not only are they usually not used for data retrieval, but they also slow down your data modification statements because of the additional index overhead. You should identify such indexes and consider dropping them.
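The density formula and the per-prefix densities stored for a composite key can be sketched outside the database. The following Python is a minimal illustration under stated assumptions: the function name `prefix_densities` and the sample rows are hypothetical, and the sketch counts distinct column tuples rather than concatenated strings, so two different name pairs that happen to concatenate to the same string are still counted separately.

```python
def prefix_densities(rows, key_columns):
    """Return [(column_prefix, density)] for each leading prefix of the key.

    Density is 1.0 / (number of distinct values of the prefix), following
    the key density formula in the text.
    """
    result = []
    for n in range(1, len(key_columns) + 1):
        prefix = key_columns[:n]
        # A set of tuples keeps one entry per distinct combination of values.
        distinct = {tuple(row[col] for col in prefix) for row in rows}
        result.append((prefix, 1.0 / len(distinct)))
    return result


# Made-up rows, not data from the authors table.
rows = [
    {"au_lname": "Smith", "au_fname": "Ann", "au_id": 1},
    {"au_lname": "Smith", "au_fname": "Bob", "au_id": 2},
    {"au_lname": "Jones", "au_fname": "Ann", "au_id": 3},
    {"au_lname": "Jones", "au_fname": "Ann", "au_id": 4},
]
for prefix, density in prefix_densities(rows, ["au_lname", "au_fname", "au_id"]):
    print(", ".join(prefix), density)
```

In this toy data the density falls from 0.5 to about 0.333 to 0.25 as columns are added, because each added column can only keep or increase the number of distinct combinations.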
Typically, the density value should become smaller (that is, more selective) as you add more columns to the key. For example, in Listing 34.5, the densities get progressively smaller (and thus, more selective) as additional columns are factored in, as shown in Table 34.8.

Estimating Rows Using Index Statistics

How does the Query Optimizer use the index statistics to estimate the number of rows that match the SARGs in a query? SQL Server uses the histogram information when searching for a known value being compared to the leading column of the index key, especially when the search spans a range or when there are duplicate values in the key. Consider this query on the sales table in the bigpubs2008 database:

    select * from sales
    where title_id = 'BI3976'

Because there are duplicates of title_id in the table, SQL Server uses the histogram on title_id (refer to Listing 34.5) to estimate the number of matching rows. For the value of BI3976, it would look at the EQ_ROWS value, which is 260.778. This indicates that there are approximately 261 rows in the table that have a title_id value of BI3976.

When an exact match for the search argument is not found as a step in the histogram, SQL Server uses the AVG_RANGE_ROWS value for the next step greater than the search value. For example, SQL Server would estimate that a search value of 'BI4184' would match, on average, approximately 300.0652 rows because that is the AVG_RANGE_ROWS value for the step value of 'BI8448', which is the next step greater than the search value (it immediately follows the 'BI3976' step).

When the query is a range retrieval that spans multiple steps, SQL Server sums the RANGE_ROWS and EQ_ROWS values between the endpoints of the range retrieval.
For example, when we use the histogram in Listing 34.5, if the search argument were where title_id <= 'BI3976', the row estimate would be 274.8199 + 639.6047 + 312.9337 + 893.1208 + 271.811 + 637.2789 + 260.778, or 3290.3470 rows.

As mentioned previously, when the histogram cannot be used, SQL Server uses just the index density to estimate the number of matching rows. The formula is straightforward for an equality search; it looks like this:

    Row Estimate = Number of Rows in Table × Index Density

For example, to estimate the number of matching rows for any given title_id in the sales table, multiply the number of rows in the sales table by the index density for the title_id key (0.001862197), as follows:

    select count(*) * 0.001862197 as 'Row Estimate'
    from sales
    go

    Row Estimate
    314.199188825

If a query specifies both the title_id and stor_id as SARGs, and if the SARG for title_id is a constant expression that can be evaluated at optimization time, SQL Server uses both the index density on title_id and stor_id and the histogram on title_id to estimate the number of matching rows. For some data values, the estimated number of matching rows for title_id and stor_id calculated using the index density could be greater than the estimated number of rows that match the specific title_id, as determined by the histogram. SQL Server uses whichever is the smaller of the two to calculate the row estimate.

Multiplying the number of rows in the sales table by the index density for title_id, stor_id (5.997505E-06), you can see that it is nearly unique, essentially matching only a single row:

    select count(*) * 5.997505E-06 as 'Row Estimate'
    from sales

    Row Estimate
    1.011929031125

In this example, SQL Server would use the index density on title_id and stor_id to estimate the number of matching rows. In this case, it is estimated that the query will return, on average, one matching row.
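The estimation rules in this section can be replayed as plain arithmetic. The following Python sketch uses the first five steps of Listing 34.5 and the figures quoted in the text; the function names are hypothetical, and the code only models the rules as described here, not the Query Optimizer itself.

```python
import bisect

# First five steps of Listing 34.5:
# (RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, AVG_RANGE_ROWS)
STEPS = [
    ("BI0194", 0.0, 274.8199, 1.0),
    ("BI2184", 639.6047, 312.9337, 277.1448),
    ("BI2915", 893.1208, 271.811, 261.8779),
    ("BI3976", 637.2789, 260.778, 276.137),
    ("BI8448", 1685.068, 281.8409, 300.0652),
]
KEYS = [step[0] for step in STEPS]


def estimate_equal(value):
    """EQ_ROWS for a step value, else AVG_RANGE_ROWS of the next higher step."""
    i = bisect.bisect_left(KEYS, value)
    if i == len(KEYS):
        raise ValueError("value lies beyond the steps included in this sketch")
    key, _range_rows, eq_rows, avg_range_rows = STEPS[i]
    return eq_rows if key == value else avg_range_rows


def estimate_less_or_equal(step_value):
    """Sum RANGE_ROWS and EQ_ROWS up to a step (step values only)."""
    last = KEYS.index(step_value)
    return sum(range_rows + eq_rows
               for _key, range_rows, eq_rows, _avg in STEPS[:last + 1])


def estimate_by_density(table_rows, density):
    """Row Estimate = Number of Rows in Table x Index Density."""
    return table_rows * density


def estimate_composite(histogram_estimate, table_rows, composite_density):
    """Use the smaller of the histogram and composite-density estimates."""
    return min(histogram_estimate,
               estimate_by_density(table_rows, composite_density))


print(estimate_equal("BI3976"))
print(estimate_equal("BI4184"))
print(round(estimate_less_or_equal("BI3976"), 4))
print(estimate_by_density(168725, 0.001862197))
print(estimate_composite(estimate_equal("BI3976"), 168725, 5.997505E-06))
```

The five printed values correspond, in order, to the equality estimate for a step value, the estimate for a value that falls between steps, the range estimate for title_id <= 'BI3976', the density-only estimate for title_id, and the composite estimate for title_id plus stor_id.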
Generating and Maintaining Index and Column Statistics

At this point, you might ask, "How do the index statistics get created?" and "How are they maintained?" The index statistics are first created when you create the index on a table that already contains data rows or when you run the UPDATE STATISTICS command. Index statistics can also be automatically updated by SQL Server. SQL Server can be configured to constantly monitor the update activity on the indexed key values in a database and update the statistics through an internal process, when appropriate.

Auto-Update Statistics

To automatically update statistics, an internal SQL Server process monitors the updates to a table's columns to determine when statistics should be updated. SQL Server internally keeps track of the number of modifications made to a column via column modification counters (colmodctrs). SQL Server uses information about the table and the colmodctrs to determine whether statistics are out of date and need to be updated. Statistics are considered out of date in the following situations:

• When the table size has gone from 0 to >0 rows
• When the number of rows in the table at the time the statistics were gathered was 500 or fewer and the colmodctr of the leading column of the statistics object has changed by more than 500
• When the table had more than 500 rows at the time the statistics were gathered and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table

If the statistics are defined on a temporary table, there is an additional threshold for updating statistics every six column modifications if the table contains fewer than 500 rows.

The colmodctrs are incremented in the following situations:

• When a row is inserted into the table
• When a row is deleted from the table
• When an indexed column is updated

Whenever the index statistics have been updated for a column, the colmodctr for that column is reset to 0.

When SQL Server generates an update of the column statistics, it generates the new statistics based on a sampling of the data values in the table. Sampling helps minimize the overhead of the AutoStats process. The sampling is random across the data pages, and the values are taken from the table or the smallest nonclustered index on the columns needed to generate the statistics. After a data page containing a sampled row has been read from disk, all the rows on the data page are used to update the statistical information.

CAUTION
Having up-to-date statistics on tables helps ensure that optimum execution plans are being generated for queries at all times. In most cases, you would want SQL Server to automatically keep the statistics updated. However, it is possible that Auto-Update Statistics can cause an update of the index statistics to run at inappropriate times in a production environment or, in a high-volume environment, to run too often. If this problem is occurring, you might want to turn off the AutoStats feature and set up a scheduled job to update statistics during off-peak periods. Do not forget to update statistics periodically; otherwise, the resulting performance problems might end up being much worse than the momentary ones caused by the AutoStats process.

To determine how often the AutoStats process is being run, you can use SQL Server Profiler to see when an automatic update of index statistics is occurring by monitoring the Auto Stats event in the Performance event class. (For more information on using SQL Server Profiler, see Chapter 6.)

If necessary, it is possible to turn off the AutoStats behavior by using the sp_autostats system stored procedure.
This stored procedure allows you to turn the automatic updating of statistics on or off for a specific index or all the indexes of a table. The following command turns off the automatic update of statistics for an index named aunmind on the authors table:

    Exec sp_autostats 'authors', 'OFF', 'aunmind'

When you run sp_autostats and supply only the table name, it displays the current settings for the table as well as for the database. Following are the settings for the authors table:

    Exec sp_autostats 'authors'
    go

    Global statistics settings for [bigpubs2008]:
      Automatic update statistics: ON
      Automatic create statistics: ON

    settings for table [authors]

    Index Name                 AUTOSTATS  Last Updated
    [UPKCL_auidind]            ON         2009-10-19 01:23:47.263
    [aunmind]                  OFF        2010-03-14 22:20:52.177
    [_WA_Sys_state_4AB81AF0]   ON         2009-10-19 01:23:47.263
    [au_fname]                 ON         2009-10-19 01:23:47.280
    [phone]                    ON         2009-10-19 01:23:47.293
    [address]                  ON         2009-10-19 01:23:47.310
    [city]                     ON         2009-10-19 01:23:47.310
    [zip]                      ON         2009-10-19 01:23:47.310

There are three other ways to disable auto-updating of statistics for an index:

• Specify the STATISTICS_NORECOMPUTE clause when creating the index.
• Specify the NORECOMPUTE option when running the UPDATE STATISTICS command.
• Specify the NORECOMPUTE option when creating statistics with the CREATE STATISTICS command. (You learn more about this command in the "Creating Statistics" section, later in the chapter.)

You can also turn AutoStats on or off for the entire database by setting the database option in SQL Server Management Studio. To do this, right-click the database in Object Explorer to bring up the Database Properties dialog, select the Options page, and set the Auto Update Statistics option to True or False (False turns it off).
You can also disable or enable the AutoStats option for a database by using the ALTER DATABASE command:

    ALTER DATABASE dbname SET AUTO_UPDATE_STATISTICS { ON | OFF }

NOTE
What actually happens when you execute sp_autostats or use the NORECOMPUTE option in the UPDATE STATISTICS command to turn off auto-update statistics for a specific index or table? SQL Server internally sets a flag in the system catalog to inform the internal SQL Server process not to update the index statistics for the table or index that has had the option turned off using any of these commands. To re-enable Auto Update Statistics, you either run UPDATE STATISTICS without the NORECOMPUTE option or execute the sp_autostats system stored procedure and specify the value 'ON' for the second parameter.

Asynchronous Statistics Updating

In versions prior to SQL Server 2005, when SQL Server determined that the statistics being examined to optimize a query were out of date, the query would wait for the statistics update to complete before compilation of the query plan would continue. This is still the default behavior in SQL Server 2008. However, the database option AUTO_UPDATE_STATISTICS_ASYNC can be enabled to support asynchronous statistics updating. When the AUTO_UPDATE_STATISTICS_ASYNC option is enabled, queries do not have to wait for the statistics to be updated before compiling. Instead, SQL Server puts the out-of-date statistics on a queue to be updated by a worker thread, which runs as a background process. The query and any other concurrent queries compile immediately by using the existing out-of-date statistics. Because there is no delay for updated statistics, query response times are more predictable, even if the out-of-date statistics may cause the Query Optimizer to choose a less-efficient query plan. Queries that start after the updated statistics are ready use the updated statistics.
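Whether the update then runs synchronously or asynchronously, it is triggered by the out-of-date tests listed under Auto-Update Statistics. Those tests can be sketched as a single predicate. The following Python is a hedged illustration only: the function and parameter names are hypothetical, the 20% term is applied to the row count at the time the statistics were gathered (an assumption, since the text says only "the number of rows in the table"), and the temporary-table rule is read as "six or more modifications on a table with fewer than 500 rows."

```python
def statistics_out_of_date(rows_at_last_stats, current_rows, colmodctr,
                           is_temp_table=False):
    """Model the out-of-date tests described under Auto-Update Statistics.

    rows_at_last_stats: row count when the statistics were gathered
    current_rows:       row count now
    colmodctr:          modifications to the leading statistics column
                        since the statistics were last updated
    """
    # Table size has gone from 0 to more than 0 rows.
    if rows_at_last_stats == 0 and current_rows > 0:
        return True
    # Extra threshold for small temporary tables (assumed reading: >= 6).
    if is_temp_table and rows_at_last_stats < 500 and colmodctr >= 6:
        return True
    # Small tables: more than 500 modifications.
    if rows_at_last_stats <= 500:
        return colmodctr > 500
    # Larger tables: more than 500 + 20% of the rows.
    return colmodctr > 500 + 0.20 * rows_at_last_stats


# The sales table in Listing 34.5 had 168,725 rows when sampled, so under
# these assumptions the threshold is 500 + 33,745 = 34,245 modifications.
print(statistics_out_of_date(168725, 168725, 34000))
print(statistics_out_of_date(168725, 170000, 35000))
```

For a table the size of sales, tens of thousands of modifications can accumulate before the statistics are flagged, which is one reason a scheduled UPDATE STATISTICS job can still be worthwhile on large tables.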
