Microsoft SQL Server 2008 Study Guide, Part 137

Nielsen c64.tex V4 - 07/21/2009 4:08pm Page 1322 Part IX Performance Tuning and Optimization

Indexes only become useful as they serve the needs of a query, so designing indexes means thinking about how the query will navigate the indexes to reach the data. “Zen and the Art of Indexing” means that you see the query path in your mind’s eye and design the shortest path from the query to the data.

What’s New With Indexes?
Indexing is critical to SQL Server performance, and Microsoft has steadily invested in SQL Server’s indexing capabilities. Back in SQL Server 2005, my favorite new feature was included columns for non-clustered indexes, which made non-clustered indexes more efficient as covering indexes. With SQL Server 2008, Microsoft has again added several significant new indexing features:
■ Filtered indexes mean that a non-clustered index can be created that indexes only a subset of the data. This is perfect for situations like a manufacturing orders table in which only 2 percent of the orders are active.
■ The new star-join optimization uses bitmap filters for performance gains of up to seven times when joining a single table (the fact table) with several lookup (dimension) tables.
■ The new FORCESEEK table hint, as the name implies, forces the Query Optimizer to choose a seek operation instead of a scan.

Indexing Basics
You can’t master indexing without a solid understanding of how indexes work. Please don’t skip this section. To apply the strategies described later in this chapter, you must grok the b-tree.

The b-tree index
Conventional wisdom says that SQL Server has two types of indexes: clustered and non-clustered. A closer look, however, reveals that SQL Server has in fact only one type of index: the b-tree, or balanced tree, index. Internally, both clustered and non-clustered indexes are b-tree indexes. B-tree indexes exist on index pages and have a root level, one or more intermediate levels, and a leaf or node level.
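You can inspect these levels for a real index with the sys.dm_db_index_physical_stats dynamic management function. In DETAILED mode it returns one row per b-tree level. The sketch below assumes the AdventureWorks2008 sample database, which is used later in this chapter, is installed. It is an illustration rather than part of the book's script. An index_level of 0 is the leaf level, and the highest index_level is the root.

```sql
-- Sketch: list the b-tree levels of every index on Production.WorkOrder.
-- Assumes the AdventureWorks2008 sample database is installed.
USE AdventureWorks2008;

SELECT  i.name          AS IndexName,
        ps.index_depth  AS TotalLevels,   -- number of levels in this b-tree
        ps.index_level  AS IndexLevel,    -- 0 = leaf; index_depth - 1 = root
        ps.page_count   AS Pages          -- pages on this level
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID(N'Production.WorkOrder'), NULL, NULL, 'DETAILED') AS ps
JOIN    sys.indexes AS i
          ON  i.object_id = ps.object_id
          AND i.index_id  = ps.index_id
WHERE   ps.alloc_unit_type_desc = N'IN_ROW_DATA'   -- ignore LOB/overflow allocation units
ORDER BY i.name, ps.index_level DESC;
```

The root level of each index should show a single page. The leaf level holds most of the pages.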
The columns actually sorted by the b-tree index are called the index’s key columns, as shown in Figure 64-1. The difference between clustered and non-clustered indexes is the amount and type of data stored at the leaf level.

While this chapter discusses the strategies of designing and optimizing indexes and does include some code examples that demonstrate creating indexes, the sister Chapter 20, “Creating the Physical Database Schema,” details the actual syntax and Management Studio methods of creating indexes.

Over time, indexes typically become fragmented, which significantly hurts performance. For more information on index maintenance, turn to Chapter 42, “Maintaining the Database.”

1322 www.getcoolebook.com

After you’ve read this chapter, I highly recommend digging deeper into the internals of SQL Server’s indexes with my favorite SQL Server book, Kalen Delaney’s SQL Server 2008 Internals (Microsoft Press, 2009).

FIGURE 64-1 The b-tree index is the most basic element of SQL Server. This figure illustrates a simplified view of a clustered index with an identity column as the clustered index key. The first name is the data column.
[Figure: Balanced Tree Index with Key Columns and Data Columns. Root level: 1-6, 7-12; intermediate level: 1-3, 4-6, 7-9, 10-12; leaf level (key, first name): 1 Matt, 2 Paul, 3 Beth, 4 Nick, 5 Steve, 6 Zack, 7 Tom, 8 Hank, 9 Greg, 10 Susan, 11 Albert, 12 Ingrid]

Clustered indexes
In SQL Server, when all the data columns are attached to the b-tree index’s leaf level, it’s called a clustered index, and some might call it a table or base table (refer to Figure 64-1). A clustered index is often called the physical sort order of the table, which is mostly, or at least logically, true. Logically, the clustered index pages will have the data in the clustered index sort order. Physically, on the disk, those pages form a linked list: each page links to the next page and the previous page in the list.
In a perfect world the pages would be in the same order as the list, but in reality they are often moved around due to page splits and fragmentation (more on page splits later in this chapter). In this case, the links probably jump around a bit.

A table may only have one physical sort order, and therefore only one clustered index.

The quintessential example of a clustered index is a telephone book (the old-fashioned printed kind, not the Internet search type). The telephone book itself is a clustered index. The last name and first name columns are the index keys, and the rest of the data (address, phone number) is attached to the index.

A telephone book even simulates a b-tree index. Open a telephone book to the middle. Choose the side with the name you want to find, and then split that side in half. In a few halves and splits, you’ll be at the page with the name you’re looking for. Your eye can now quickly scan that page and find the last name and first name you want. Because the address and phone number are printed right next to the names, no more searching is needed.

Non-clustered indexes
SQL Server can also create non-clustered indexes, which are similar to the indexes in the back of a book. This type of index is keyed, or sorted, by the keywords, and the page numbers are pointers to the book’s content.

Internally, SQL Server non-clustered indexes are b-tree indexes and point to the base table, which is either a clustered index or a heap. If the base table is a clustered index, then the clustered index keys (every sort-by column) are included at every level of the non-clustered index b-tree and leaf level. If the base table is a heap, then the heap RID (row ID) is used.

For example, the non-clustered index illustrated in Figure 64-2 uses the first name column as its key column, so that’s the data sorted by the b-tree.
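The following T-SQL sketch builds a first-name index like the one in Figure 64-2. The table dbo.FigureEmployee and the index IX_FigureEmployee_FirstName are hypothetical names chosen for illustration. They are not objects from the book's sample databases. As in Figure 64-1, the table is clustered on an identity column.

```sql
-- Hypothetical table modeled on Figures 64-1 and 64-2 (not a book object).
CREATE TABLE dbo.FigureEmployee (
    EmployeeID INT IDENTITY(1,1) NOT NULL,
    FirstName  NVARCHAR(50)      NOT NULL,
    CONSTRAINT PK_FigureEmployee PRIMARY KEY CLUSTERED (EmployeeID)  -- the base table
);

INSERT INTO dbo.FigureEmployee (FirstName)
VALUES (N'Matt'), (N'Paul'), (N'Beth'), (N'Nick'), (N'Steve'), (N'Zack'),
       (N'Tom'), (N'Hank'), (N'Greg'), (N'Susan'), (N'Albert'), (N'Ingrid');

-- Non-clustered b-tree sorted by first name; each leaf row also carries
-- the clustered key (EmployeeID) as its pointer back to the base table.
CREATE NONCLUSTERED INDEX IX_FigureEmployee_FirstName
    ON dbo.FigureEmployee (FirstName);

-- Both requested columns exist in the non-clustered index, so the base table
-- need not be read (on a table this small the optimizer may still choose a scan).
SELECT EmployeeID, FirstName
FROM   dbo.FigureEmployee
WHERE  FirstName = N'Nick';
```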
The non-clustered index points to the base table by including the clustered index key column. In Figure 64-2, the clustered index key column is the identity column used in Figure 64-1.

Since SQL Server 2005, additional unsorted columns can be included in the leaf level. The employee’s title and department columns could be added to the previous index, which is extremely useful in designing covering indexes (described in the next section).

A SQL Server table may have up to 999 non-clustered indexes, but I’ve never seen a well-normalized table that required more than a dozen well-designed indexes.

FIGURE 64-2 This simplified illustration of a non-clustered index has a b-tree index with first name as the key column. The non-clustered index includes pointers to the clustered index key column.
[Figure: Balanced Tree Index on the Key Columns, with the leaf level showing Clustered Keys or Heap RowID, Included Columns (2005), and Filtered (2008). Root level: A-M, N-Z; intermediate level: A-G, H-M, N-St, Su-Z; leaf level (first name, clustered key): Albert 11, Beth 3, Greg 9, Hank 8, Ingrid 12, Matt 1, Nick 4, Paul 2, Steve 5, Susan 10, Tom 7, Zack 6]

Composite indexes
A composite index is a clustered or non-clustered index that is keyed, or sorted, on multiple columns. Composite indexes are common in production.

The order of the columns in a composite index is important. For a search to take advantage of a composite index, it must include the index columns from left to right. If the composite index is lastname, firstname, a search for firstname can’t seek quickly through the b-tree index, but a search for lastname, or for lastname and firstname, will use the b-tree. Various methods of indexing for multiple columns are examined in Query Paths 9 through 11 later in this chapter.

A similar problem is searching for words within a column but not at the beginning of the text string stored in the column.
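The following sketch shows both problems: a composite index searched without its leading column, and a string searched without its leading characters. The table dbo.FigureContact and its index are hypothetical. Compare the actual execution plans. The comments mark where a seek is possible, but on a nearly empty table the optimizer may prefer a scan anyway, so load representative rows before drawing conclusions.

```sql
-- Hypothetical table (not a book object) with a composite index on (LastName, FirstName).
CREATE TABLE dbo.FigureContact (
    ContactID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_FigureContact PRIMARY KEY CLUSTERED,
    LastName  NVARCHAR(50) NOT NULL,
    FirstName NVARCHAR(50) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_FigureContact_LastName_FirstName
    ON dbo.FigureContact (LastName, FirstName);

-- Can seek: the predicates start with the leftmost key column.
SELECT ContactID FROM dbo.FigureContact WHERE LastName = N'Nielsen';
SELECT ContactID FROM dbo.FigureContact WHERE LastName = N'Nielsen' AND FirstName = N'Paul';

-- Cannot seek through the b-tree: FirstName is only the second key column.
SELECT ContactID FROM dbo.FigureContact WHERE FirstName = N'Paul';

-- The same problem inside a string: a fixed prefix can seek,
-- but a leading wildcard forces a scan.
SELECT ContactID FROM dbo.FigureContact WHERE LastName LIKE N'Niel%';
SELECT ContactID FROM dbo.FigureContact WHERE LastName LIKE N'%sen';
```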
For these word searches, SQL Server can use Integrated Full-Text Search (iFTS), covered in Chapter 19, “Using Integrated Full-Text Search.”

Unique indexes and constraints
Because primary keys are the unique method of identifying any row, indexes and primary keys are intertwined; in fact, a primary key must be indexed. By default, creating a primary key automatically creates a unique clustered index (if the table doesn’t already have a clustered index), but it can optionally create a unique non-clustered index instead.

A unique index limits data to being unique, so it’s like a constraint; and a unique constraint builds a unique index to quickly check the data. In fact, a unique constraint and a unique index are the exact same thing: creating either one builds a unique constraint/index. The only difference between a unique constraint/index and a primary key is that a primary key cannot allow nulls, whereas a unique constraint/index can permit a single null value.

The page split problem
Every b-tree index must maintain the key column data in the correct sort order. Inserts, updates, and deletes will affect that data. As the data is inserted or modified, if the index page to which a value needs to be added is full, then SQL Server must split the page into two less-than-full pages so it can insert the value in the correct position.

Turning again to the telephone book example, if several new Nielsens moved into the area and the Nie page 515 now had to accommodate 20 additions, a simulated page split would take several steps:
1. Cut page 515 in half, making two pages; call them 515a and 515b.
2. Print out and tape the new Nielsens to page 515a.
3. Tape page 515b inside the back cover of the telephone book.
4. Make a note on page 515a that the Nie listing continues on page 515b, located at the end of the book, and a note on page 515b indicating that the listing continues from page 515a.
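The cost of the splits described above can be observed with the following sketch, which is not from the book. It uses two hypothetical tables with identical rows. The first is clustered on a random NEWID() key, so inserts land in the middle of already-full pages. The second is clustered on an ever-increasing IDENTITY key, so inserts always go to the last page. After the inserts, the random-key table should need noticeably more leaf pages, and those pages should be less full.

```sql
-- Hypothetical demo tables (not from the book): same row size, different key order.
CREATE TABLE dbo.SplitRandom (
    ID     UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),   -- random insert position
    Filler CHAR(500)        NOT NULL DEFAULT 'x',
    CONSTRAINT PK_SplitRandom PRIMARY KEY CLUSTERED (ID)
);
CREATE TABLE dbo.SplitSequential (
    ID     INT IDENTITY(1,1) NOT NULL,                  -- always appends to the last page
    Filler CHAR(500)         NOT NULL DEFAULT 'x',
    CONSTRAINT PK_SplitSequential PRIMARY KEY CLUSTERED (ID)
);

-- Insert one row at a time, as an OLTP workload would
-- (a single transaction only to avoid thousands of log flushes).
DECLARE @i INT = 0;
BEGIN TRANSACTION;
WHILE @i < 5000
BEGIN
    INSERT INTO dbo.SplitRandom DEFAULT VALUES;
    INSERT INTO dbo.SplitSequential DEFAULT VALUES;
    SET @i += 1;
END;
COMMIT TRANSACTION;

-- Leaf level (index_level = 0) of each clustered index; DETAILED mode
-- is required to get avg_page_space_used_in_percent.
SELECT  N'SplitRandom' AS TableName, page_count, avg_page_space_used_in_percent
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.SplitRandom'), 1, NULL, 'DETAILED')
WHERE   index_level = 0
UNION ALL
SELECT  N'SplitSequential', page_count, avg_page_space_used_in_percent
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.SplitSequential'), 1, NULL, 'DETAILED')
WHERE   index_level = 0;
```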
Page splits cause several performance-related problems:
■ The page split operation is expensive because it involves several steps and moving data. I’ve personally seen page splits reduce an intensive insert process’s performance by 90 percent.
■ If, after the page split, there still isn’t enough room, then the page will be split again. This can occur repeatedly, depending on the circumstances.
■ The data structure is left fragmented and can no longer be read in a single contiguous pass. The data structure has more empty space, which means less data is read with every page read and less data is stored in the buffer per page.

Index selectivity
Another aspect of index tuning is the selectivity of the index. An index that is very selective has more distinct index values and selects fewer data rows per index value. A primary key or unique index has the highest possible selectivity; each index key relates to only one row.

An index with only a few distinct values spread across a large table is less selective. Indexes that are less selective may not even be useful as indexes. A column with three values spread throughout the table is a poor candidate for an index. A bit column, with only two possible values, has low selectivity and is rarely useful as the sole key of an index.

SQL Server uses its internal index statistics to track the selectivity of an index. DBCC SHOW_STATISTICS reports the last date on which the statistics were updated, and basic information about the index statistics, including the usefulness of the index. A low density indicates that the index is very selective.
A high density indicates that a given index node points to several table rows and that the index may be less useful, as shown in this code sample:

Use CHA2;  -- the book's CHA2 sample database
DBCC Show_Statistics (Customer, IxCustomerName);  -- statistics for index IxCustomerName on table Customer

Result (formatted and abridged; the full listing includes details for every value in the index):

Statistics for INDEX ‘IxCustomerName’.
Updated    Rows   Rows Sampled   Steps   Density   Average key length
May 1,02   42     42             33      0.0       11.547619

All density     Average Length   Columns
3.0303031E-2    6.6904764        LastName
2.3809524E-2    11.547619        LastName, FirstName

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Sometimes changing the order of the key columns can improve the selectivity of an index and its performance. Be careful, however, because other queries may depend on the order for their performance.

Unordered heaps
It’s also possible to create a table without a clustered index, in which case the data is stored in an unordered heap. Instead of being identified by the clustered index key columns, the rows are identified internally using the heap’s RowID. The RowID is an actual physical location composed of three values, FileID:PageNum:SlotNum, and cannot be directly queried. Any non-clustered indexes store the heap’s RowID in all levels of the index to point to the heap, instead of using the clustered index key columns to point to the clustered index.

Because a heap does not include a clustered index, a heap’s primary key must be a non-clustered index.

Why Use Heaps?
I believe heaps add no value and nearly always require a bookmark lookup (explained in Query Path 5), so I avoid creating heaps.

Developers who like heaps tend to be the same developers who prefer natural primary keys (as opposed to surrogate primary keys). Natural primary keys are nearly always unordered.
When natural primary keys are used for clustered indexes, they generate a lot of page splits, which kills performance. Heaps simply add new rows at the end of the heap, so they avoid the natural primary key page split problem.

Some developers claim that heaps are faster than clustered indexes for inserts. This is true only when the clustered index is designed in a way that generates page splits. When insert performance is compared between heaps and clustered surrogate primary keys, there is little measurable difference, or the clustered index is slightly faster.

Heaps are organized by RIDs (row IDs, which include the file, page, and row). Any seek operation (detailed soon) into a heap must use a non-clustered index and a bookmark lookup (detailed in Query Path 5 later in this chapter).

Query operations
Although there are dozens of logical and physical query execution operations, SQL Server uses three primary operations to actually fetch the data:
■ Table scan: Reads the entire heap and, most likely, passes all the data to a secondary filter operation.
■ Index scan: Reads the entire leaf level (every row) of the clustered index or non-clustered index. The index scan operation might filter the rows and return only those rows that meet the criteria, or it might pass all the rows to another filter operation, depending on the complexity of the criteria. The data may or may not be ordered.
■ Index seek: Locates specific rows using the b-tree and returns only the selected rows in an ordered list, as illustrated in Figure 64-3.

The Query Optimizer chooses the fetch operation with the least cost. Sequentially reading the data is a very efficient task, so an index scan and filter operation may actually be cheaper than an index seek with a bookmark lookup (see Query Path 5 below) involving hundreds of random I/O index seeks. It’s all about correctly guessing the number of rows touched and returned by each operation in the query execution plan.
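You can trigger each of the three fetch operations on purpose. The sketch below assumes AdventureWorks2008 and that the actual execution plan is enabled in Management Studio. #WorkOrderHeap is a hypothetical temporary copy built with SELECT INTO. SELECT INTO copies rows but creates no indexes, so the copy is a heap.

```sql
-- Sketch: trigger each of the three fetch operations.
-- Assumes AdventureWorks2008; view the actual execution plan for each query.
USE AdventureWorks2008;

-- SELECT INTO copies the rows but creates no indexes, so the copy is a heap.
SELECT *
INTO   #WorkOrderHeap
FROM   Production.WorkOrder;

-- 1. Table scan: the heap has no b-tree to navigate, even for a single key value.
SELECT * FROM #WorkOrderHeap WHERE WorkOrderID = 1234;

-- 2. Clustered index scan: no index is keyed on ScrappedQty,
--    so every leaf-level row is read and filtered.
SELECT * FROM Production.WorkOrder WHERE ScrappedQty > 0;

-- 3. Clustered index seek: the b-tree leads straight to one key value.
SELECT * FROM Production.WorkOrder WHERE WorkOrderID = 1234;

DROP TABLE #WorkOrderHeap;
```

Queries 1 and 3 use the same predicate. This isolates the effect of having, or lacking, a b-tree to navigate.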
Path of the Query
Indexes exist to serve queries; an index by itself serves no purpose. The best way to understand how to design efficient indexes is to observe and learn from the various possible paths queries take through the indexes to locate data.

FIGURE 64-3 An index-seek operation navigates the b-tree index, selects a beginning row, and then scans all the required rows.
[Figure: Clustered Index Seek, showing the seek to a beginning row followed by a scan]

There are twelve kata (a Japanese word for martial arts choreographed patterns or movements), or query paths, with different combinations of indexes combined with index seeks and scans. These kata begin with a simple index scan and progress toward more complex query paths. Not every query path is an efficient query path. There are nine good paths, and three paths that should be avoided.

A good test table for observing the twelve query paths in the AdventureWorks2008 database is the Production.WorkOrder table. It has 72,591 rows, only 10 columns, and a single-column clustered primary key.
Here’s the table definition:

CREATE TABLE [Production].[WorkOrder](
    [WorkOrderID]   [int] IDENTITY(1,1) NOT NULL,
    [ProductID]     [int] NOT NULL,
    [OrderQty]      [int] NOT NULL,
    [StockedQty]    AS (isnull([OrderQty]-[ScrappedQty],(0))),  -- computed: quantity not scrapped
    [ScrappedQty]   [smallint] NOT NULL,
    [StartDate]     [datetime] NOT NULL,
    [EndDate]       [datetime] NULL,
    [DueDate]       [datetime] NOT NULL,
    [ScrapReasonID] [smallint] NULL,
    [ModifiedDate]  [datetime] NOT NULL,
    -- the clustered primary key is the base table's b-tree
    CONSTRAINT [PK_WorkOrder_WorkOrderID] PRIMARY KEY CLUSTERED ([WorkOrderID] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];

As installed, the WorkOrder table has three indexes, each with one column as identified in the index name:
■ PK_WorkOrder_WorkOrderID (clustered)
■ IX_WorkOrder_ProductID (non-unique, non-clustered)
■ IX_WorkOrder_ScrapReasonID (non-unique, non-clustered)

Performance data for each kata, listed in Table 64-1, was captured by watching the T-SQL ➪ SQL:StmtCompleted and Performance ➪ Showplan XML Statistics Profile events in Profiler, and by examining the query execution plan. The key performance indicators are the query execution plan optimizer costs (Cost) and the number of logical reads (Reads).

For the duration column, I ran each query multiple times and averaged the results. Of course, your SQL Server machine is probably beefier than my notebook. I urge you to run the script on your own SQL Server instance, take your own performance measurements, and study the query execution plans. The Rows per ms column is calculated from the number of rows returned and the average duration.
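If Profiler isn't handy, the SET STATISTICS IO and SET STATISTICS TIME session options report similar measurements in the Messages tab of Management Studio: logical reads per table, plus CPU and elapsed time in milliseconds. The sketch below is an alternative, not the method used to build Table 64-1, so its figures need not match the table exactly. It assumes AdventureWorks2008.

```sql
-- Sketch: per-statement reads and timings without a Profiler trace.
-- Assumes the AdventureWorks2008 sample database.
USE AdventureWorks2008;

SET STATISTICS IO ON;    -- logical, physical, and read-ahead reads per table
SET STATISTICS TIME ON;  -- parse/compile and execution CPU and elapsed times (ms)

SELECT *
FROM   Production.WorkOrder
WHERE  WorkOrderID = 1234;   -- the Query Path 2 query

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Dividing the rows returned by the reported elapsed time gives a figure comparable to the table's Rows per ms column.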
Before executing each query path, the following code clears the buffers:

DBCC FREEPROCCACHE;     -- remove all cached execution plans
DBCC DROPCLEANBUFFERS;  -- remove clean pages from the buffer pool so reads start cold

Query Path 1: Fetch All
The first query path sets a baseline for performance by simply requesting all the data from the base table:

SELECT *
  FROM Production.WorkOrder;

Without a WHERE clause and with every column selected, the query must read every row from the clustered index. A clustered index scan (illustrated in Figure 64-4) sequentially reads every row.

This query takes the longest of all the query paths, so it might seem to be a slow query. However, when you compare the number of rows returned per millisecond, the index scan returns the most rows per millisecond of any query path.

Query Path 2: Clustered Index Seek
The second query path adds a WHERE clause to the first query and filters the result to a single row using a clustered key value:

SELECT *
  FROM Production.WorkOrder
  WHERE WorkOrderID = 1234;

TABLE 64-1 Query Path Performance
(C Ix = clustered index; NC Ix = non-clustered index; BML = bookmark lookup)

Path | Kata | Execution Plan | Rows | Cost | Reads | Missing Index | Duration (ms) | Rows per ms
1 | Fetch All | C Ix Scan | 72,591 | .485 | 526 | | 1,196 | 60.71
2 | Clustered Index Seek | C Ix Seek | 1 | .003 | 2 | | 7 | .14
3 | Range Seek Query (narrow) | C Ix Seek (Seek keys start-end) | 11 | .003 | 3 | | 13 | .85
  | Range Seek Query (wide) | C Ix Seek (Seek keys start-end) | 72,591 | .485 | 526 | | 1,257 | 57.73
4 | Filter by non-Key Column | C Ix Scan → filter (predicate) | 55 | .519 | 526 | NC (include all columns) | 170 | .32
5 | Bookmark Lookup (Select *) | NC Ix Seek → BML | 9 | .037 | 29 | | 226 | .04
  | Bookmark Lookup (Select clustered key, non-key col) | NC Ix Seek → BML | 9 | .037 | 29 | | 128 | .07
6 | Covering Index (narrow) | NC Ix Seek (Seek Predicate) | 9 | .003 | 2 | | 30 | .30
  | Covering Index (wide) | NC Ix Seek (Seek Predicate) | 1,105 | .005 | 6 | | 106 | 10.46
  | NC Seek Selecting Clustered Key (narrow) | NC Ix Seek (Seek Predicate) | 9 | .003 | 2 | | 46 | .20
  | NC Seek Selecting Clustered Key (wide) | NC Ix Seek (Seek Predicate) | 1,105 | .004 | 4 | | 46 | 24.02
  | Filter by Include Column | NC Ix Seek (Seek Predicate + Predicate) | 1 | .003 | 2 | | 51 | .02
7 | Filter by 2 x NC Indexes | 2 x NC Ix Seek (Predicate) → Merge Join | 1 | .012 | 4 | | 63 | .02
8 | Filter by Ordered NC Composite Index | NC Ix Seek (Seek Predicate w/ 2 prefixes) | 1 | .003 | 2 | | 56 | .02
9 | Filter by Unordered NC Composite Index | NC Ix Scan | 118 | .209 | 173 | NC by missing key, include C Key | 72 | 1.64
10 | Filter by Expression | NC Ix Scan | 9 | .209 | 173 | | 111 | .08

FIGURE 64-4 The clustered index scan sequentially reads all the rows from the clustered index.
[Figure: Clustered Index PK_WorkOrder_WorkOrderID]

The Query Optimizer has two clues that only one row meets the WHERE clause criteria: statistics, and the fact that WorkOrderID is the primary key constraint, so it must be unique. WorkOrderID is also the clustered index key, so the Query Optimizer knows there’s a great index available to locate a single row. The clustered index seek operation navigates the clustered index b-tree and quickly locates the desired row, as illustrated in Figure 64-5.

FIGURE 64-5 A clustered index seek navigates the b-tree index and locates the row in a snap.
[Figure: Clustered Index PK_WorkOrder_WorkOrderID]
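The two clues the Query Optimizer relies on here, uniqueness and the clustered key, are visible in the catalog views. The following sketch assumes AdventureWorks2008. It lists each index on Production.WorkOrder with its type, its uniqueness and primary-key flags, and its declared columns.

```sql
-- Sketch: show how each index on Production.WorkOrder is defined.
-- Assumes the AdventureWorks2008 sample database.
USE AdventureWorks2008;

SELECT  i.name            AS IndexName,
        i.type_desc       AS IndexType,       -- CLUSTERED or NONCLUSTERED
        i.is_unique,                          -- 1 = no duplicate keys allowed
        i.is_primary_key,
        c.name            AS ColumnName,
        ic.key_ordinal                        -- position in the key; 0 = included column
FROM    sys.indexes       AS i
JOIN    sys.index_columns AS ic
          ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN    sys.columns       AS c
          ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE   i.object_id = OBJECT_ID(N'Production.WorkOrder')
ORDER BY i.index_id, ic.key_ordinal;
```

sys.index_columns lists only the columns declared for each index. It does not show the clustered key that each non-clustered index carries internally.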
