SQL Server Tacklebox- P23 ppt


4 – Managing data growth

Also, it should be noted that users will be unable to access the Book_List table for the duration of the index build. Essentially, SQL Server has to physically order those millions of records to align with the definition of the clustered index.

Let's see what the index took out of my hide by way of space. The former index space for this table was 8KB and data space was over 3GB. What does sp_spaceused tell me now? See Figure 4.17.

Figure 4.17: Building the clustered index has increased the index_size to 5376KB.

An increase in index_size to 5376KB does not seem too significant. When you create a clustered index, the database engine takes the data in the heap (table) and physically sorts it. In the simplest terms, a heap and a clustered table (a table with a clustered index) both store the actual data; one is just physically sorted. So, I would not expect adding a clustered index on the Read_ID column to cause much growth in index_size.

However, while the data size and index size for the Book_List table did not grow significantly, the space allocated for the database did double, as you can see from Figure 4.18.

Figure 4.18: Creating the clustered index caused the data file to double in size.

So not only did the index addition take the table offline for the duration of the build, 12 minutes, it also doubled the space on disk. The reason for the growth is that, to reorganize the data from a heap into a clustered table, SQL Server needed additional working space, almost as much again as the table itself. Notice, though, that after the process has completed there is nearly 50% free space in the expanded file.

The question remains: did I benefit from adding this index, and do I need to add any covering non-clustered indexes? First, let's consider the simple query shown in Listing 4.2.
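The excerpt does not show the statements behind Figures 4.17 and 4.18. As a minimal sketch of the build-and-measure sequence described above, assuming the table lives in the dbo schema and using a hypothetical index name (IX_Book_List_Read_ID), it might look like this:

```sql
-- Table and column names come from the text. The dbo schema and the
-- index name IX_Book_List_Read_ID are assumptions for illustration.

-- Space used by the table before the build (index_size, data).
EXEC sp_spaceused N'dbo.Book_List';

-- Build the clustered index on Read_ID. This physically orders the rows,
-- and the table is unavailable while the build runs.
CREATE CLUSTERED INDEX IX_Book_List_Read_ID
    ON dbo.Book_List (Read_ID);

-- Space used by the table after the build.
EXEC sp_spaceused N'dbo.Book_List';

-- With no arguments, sp_spaceused reports on the current database,
-- including database_size and unallocated space.
EXEC sp_spaceused;
```

Comparing the two table-level results shows the change in index_size, while the database-level call shows the growth of the file and the free space left inside it. With the index in place, the next step is the query in Listing 4.2.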
It returns data based on a specified range of Read_ID values (I know the table holds Read_ID values between 1 and 2902000).

    Select book_list.Read_ID,
           book_list.Read_Date,
           book_list.Book,
           book_list.Person
    from book_list
    where Read_ID between 756000 and 820000

Listing 4.2: A query on the Read_ID column.

This query returned 64,001 records in 2 seconds which, at first glance, appears to be the sort of performance I'd expect. However, to confirm this, I need to examine the execution plan, as shown in Figure 4.19.

Figure 4.19: Beneficial use of clustered index for the Book_list table.

You can see that an Index Seek operation was used, which indicates that this index has indeed served our query well. It means that the engine was able to use the key values stored in the index to go straight to the required range of rows. If, instead, I had seen an Index Scan, this would indicate that the engine decided to scan every single row of the index in order to retrieve the ones required. An Index Scan is similar in concept to a table scan and both are generally inefficient, especially when dealing with such large record sets. However, the query engine will sometimes choose to do a scan even if a usable index is in place if, for example, a high percentage of the rows need to be returned. This is often an indicator of an inefficient WHERE clause.

Let's say I now want to query a field that is not included in the clustered index, such as Read_Date. I would like to know how many books were read on July 24th of 2008. The query would look something like that shown in Listing 4.3.

    Select count(book_list.Read_ID),
           book_list.Read_Date
    from book_list
    where book_list.Read_Date between '07/24/2008 00:00:00'
                                  and '07/24/2008 11:59:59'
    Group By book_list.Read_Date

Listing 4.3: A query that is not covered by the clustered index.
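One detail in Listing 4.3 deserves a second look. Without an AM/PM marker, '07/24/2008 11:59:59' is one second before noon, so the range as written covers only the first half of July 24th. If the intent is the whole day, a half-open range is a common way to express it. The sketch below assumes Read_Date is a datetime column (the excerpt does not state its type), and the column alias books_read is added for readability:

```sql
-- Variant of Listing 4.3 that covers all of July 24th, 2008.
-- Assumption: Read_Date is a datetime column.
SELECT COUNT(book_list.Read_ID) AS books_read,
       book_list.Read_Date
FROM   book_list
WHERE  book_list.Read_Date >= '20080724'  -- midnight at the start of the day
  AND  book_list.Read_Date <  '20080725'  -- up to, but excluding, the next midnight
GROUP BY book_list.Read_Date;
```

The 'YYYYMMDD' literal form avoids any dependence on the session's date format settings. The timings and row counts reported next are for Listing 4.3 as written.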
Executing the query in Listing 4.3, and waiting for the results to return, is a bit like watching paint dry or, something I like to do frequently, watching a hard drive defragment. It took 1 minute and 28 seconds to complete, and returned 123 records, one per distinct Read_Date value in the range, averaging about 1,000 books per record. The execution plan for this query, not surprisingly, shows that an index scan was used, as you can see in Figure 4.20.

Figure 4.20: Clustered index scan for field with no index.

What was a bit surprising, though, is that the memory allocation for SQL Server shot through the roof as this query was executed. Figure 4.21 shows the memory consumption at 2.51G, which is pretty drastic considering the system only has 2G of RAM.

Figure 4.21: Memory utilization resulting from date range query.

The reason for the memory increase is that, since there was no available index to limit the data for the query, SQL Server had to load several million records into the buffer cache in order to give me back the 123 rows I needed. Unless you have enabled AWE, and set max server memory to 2G (say) less than total server memory (see memory configurations for SQL Server in Chapter 1), SQL Server will grab more than its fair share of memory, and the server will begin paging and thrashing the disks. This will have a substantial impact on performance.

If there is one thing that I know for sure with regard to SQL Server configuration and management, it is that once SQL Server has acquired memory, it does not like to give it back to the OS unless prodded to do so. Even though the query I ran completed many minutes ago, total memory use on the server still hovers at 2.5G, most of it held by SQL Server.

It's clear that I need to create indexes that will cover the queries I need to run, and so avoid SQL Server doing such an expensive index scan.
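The max server memory setting mentioned above is changed through sp_configure. The following is only a sketch: the value of 1024 MB is an illustrative assumption, not a recommendation from the text, and the right figure depends on the total memory of the server.

```sql
-- 'max server memory (MB)' is an advanced option, so expose those first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cap the memory SQL Server may use. 1024 is an assumed, illustrative
-- value; choose one that leaves enough for the operating system.
EXEC sp_configure 'max server memory (MB)', 1024;
RECONFIGURE;

-- Display the configured and running values to confirm the change.
EXEC sp_configure 'max server memory (MB)';
```

A cap like this only contains the symptom. The cure is still to give every query an index that covers it.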
I know that covering every query is not always possible in a production environment, with many teams of developers all writing their own queries in their own style, but in my isolated environment it is an attainable goal.

The first thing I need to do is restart SQL Server to get back down to a manageable level of memory utilization. While there are other methods to reduce the memory footprint, such as freeing the buffer cache (DBCC DROPCLEANBUFFERS), I have the luxury of an isolated environment and restarting SQL Server will give me a "clean start" for troubleshooting. Having done this, I can add two non-clustered indexes, one which will cover queries on the Book field and the other the Read_Date field.

Having created the two new indexes, let's take another look at space utilization in the Book_List table, using sp_spaceused, as shown in Figure 4.22.

Figure 4.22: Increased index size for 2 non-clustered indexes.

The index_size has risen from 5MB to 119MB, which seems fairly minimal, and an excellent trade-off assuming we get the expected boost in the performance of the Read_Date query.

If you are a DBA, working alongside developers who give you their queries for analysis, this is where you hold your breath. Breath held, I click execute. And … the query went from 1 minute 28 seconds to 2 seconds without even a baby's burp in SQL Server memory. The new execution plan, shown in Figure 4.23, tells the full story.
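The excerpt does not include the statements used to create the two non-clustered indexes. A minimal sketch, assuming the dbo schema, hypothetical index names, and a Book column whose data type is allowed as an index key, could be:

```sql
-- Index names are hypothetical; the text names only the columns.
-- Assumption: Book has a type that is valid as an index key column.
CREATE NONCLUSTERED INDEX IX_Book_List_Book
    ON dbo.Book_List (Book);

CREATE NONCLUSTERED INDEX IX_Book_List_Read_Date
    ON dbo.Book_List (Read_Date);

-- Re-check space: index_size should now include both new indexes.
EXEC sp_spaceused N'dbo.Book_List';
```

Because the Read_Date query touches only Read_Date and Read_ID, and a non-clustered index on a clustered table carries the clustering key (Read_ID) in its rows, the second index can answer that query without visiting the table itself.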

Date posted: 04/07/2014, 23:20