CHAPTER 25 Creating and Managing Indexes

[Figure: B-tree diagram with levels labeled Root Page, Intermediate Page, Leaf Page, and Data Page.]
FIGURE 25.2 A simplified diagram of a nonclustered index.

A nonclustered index is also structured as a B-tree. Figure 25.2 shows a simplified diagram of a nonclustered index defined on a first name column.

As with a clustered index, in a nonclustered index, all index key values are stored in the nonclustered index levels in sorted order, based on the index key(s). This sort order is typically different from the sort order of the table itself. The main difference between a nonclustered index and a clustered index is that the leaf row of a nonclustered index is independent of the data rows in the table. The leaf level of a nonclustered index contains a row for every data row in the table, along with a pointer to locate the data row. This pointer is either the clustered index key for the data row, if the table has a clustered index on it, or the data page ID and row ID of the data row if the table is stored as a heap structure (that is, if the table has no clustered index defined on it).

To locate a data row via a nonclustered index, SQL Server starts at the root node and navigates through the appropriate index pages in the intermediate levels of the index until it reaches the leaf page, which should contain the index key for the desired data row. It then scans the keys on the leaf page until it locates the desired index key value. SQL Server then uses the pointer to the data row stored with the index key to retrieve the corresponding data row.
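Because the pointer stored in a nonclustered index depends on whether the table is a heap or has a clustered index, it can help to check how a table is organized. The following is a minimal sketch that queries the sys.indexes catalog view; the choice of Person.Person (the AdventureWorks sample table used later in this chapter) is only an assumption for illustration.

```sql
-- List the indexes on a table and show how the table itself is stored.
-- Person.Person is assumed here only as a convenient sample table.
SELECT i.index_id,
       i.name,
       i.type_desc,   -- HEAP, CLUSTERED, NONCLUSTERED, and so on
       i.is_unique
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'Person.Person')
ORDER BY i.index_id;
```

A row with type_desc = HEAP (index_id 0) means the table has no clustered index, so its nonclustered indexes point to data rows by page ID and row ID. A row with type_desc = CLUSTERED (index_id 1) means they carry the clustered index key instead.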
NOTE
For a more detailed discussion of clustered tables versus heap tables (that is, tables with no clustered indexes) and more detailed descriptions of clustered and nonclustered index key structures and index key rows, as well as how SQL Server internally maintains indexes, see Chapter 34.

The efficiency of the index lookup and the types of lookups should drive the selection of nonclustered indexes. In the book index example, a single page reference is a very simple lookup for the book reader and requires little work. If, however, many pages are referenced in the index, and those pages are spread throughout the book, the lookup is no longer simple, and much more work is required to get all the information.

You should choose your nonclustered indexes with the book index example in mind. You should consider using nonclustered indexes for the following:

- Queries that do not return large result sets
- Columns that are frequently used in the WHERE clause for exact-match searches
- Columns that have many distinct values (that is, high cardinality)
- All columns referenced in a critical query (a special nonclustered index called a covering index that eliminates the need to go to the underlying data pages)

Having a good understanding of your data access is essential to creating nonclustered indexes. Fortunately, SQL Server comes with tools such as the SQL Server Profiler and Database Engine Tuning Advisor that can help you evaluate your data access paths and determine which columns are the best candidates. SQL Profiler is discussed in more detail in Chapter 6, “SQL Server Profiler.” In addition, Chapter 34 discusses the use of the SQL Server Profiler and Database Engine Tuning Advisor to assist in developing an optimal indexing strategy.

Creating Indexes

The following sections examine the most common means for creating indexes in SQL Server.
Microsoft provides several different methods for creating indexes. The method used is often a matter of personal preference, but there are situations in which a given method has distinct advantages.

Creating Indexes with T-SQL

Transact-SQL (T-SQL) is the most fundamental means for creating an index. This method was available in all previous versions of SQL Server. It is a very powerful option for creating indexes because the T-SQL statements that create indexes can be stored in a file and run as part of a database installation or upgrade. In addition, T-SQL scripts that were used in prior SQL Server versions to create indexes can be reused with very little change.

You can create indexes by using the T-SQL CREATE INDEX command. Listing 25.1 shows the basic CREATE INDEX syntax. Refer to SQL Server 2008 Books Online for the full syntax.

LISTING 25.1 CREATE INDEX Syntax

    CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
        ON <object> ( column [ ASC | DESC ] [ ,...n ] )
        [ INCLUDE ( column_name [ ,...n ] ) ]
        [ WHERE <filter_predicate> ]
        [ WITH ( <relational_index_option> [ ,...n ] ) ]

Table 25.1 lists the CREATE INDEX arguments.

TABLE 25.1 Arguments for CREATE INDEX

Argument: UNIQUE
Explanation: Indicates that no two rows in the index can have the same index key values. Inserts into a table with a UNIQUE index will fail if a row with the same index key value already exists in the table.

Argument: CLUSTERED | NONCLUSTERED
Explanation: Defines the index as clustered or nonclustered. NONCLUSTERED is the default. Only one clustered index is allowed per table.

Argument: index_name
Explanation: Specifies the name of the index to be created.

Argument: object
Explanation: Specifies the name of the table or view to be indexed.

Argument: column_name
Explanation: Specifies the column or columns that are to be indexed.

Argument: ASC | DESC
Explanation: Specifies the sort direction for the particular index column. ASC creates an ascending sort order and is the default. The DESC option causes the index to be created in descending order.

Argument: INCLUDE ( column [ ,...n ] )
Explanation: Allows a column to be added to the leaf level of an index without being part of the index key. This is a new argument.

Argument: WHERE <filter_predicate>
Explanation: This argument, new to SQL Server 2008, is used to create a filtered index. The filter_predicate contains a WHERE clause that limits the number of rows in the table that are included in the index.

Argument: relational_index_option
Explanation: Specifies the index option to use when creating the index.

Following is a simple example using the basic syntax of the CREATE INDEX command:

    CREATE NONCLUSTERED INDEX [NC_Person_LastName]
    ON [Person].[Person]
    (
        [LastName] ASC
    )

This example creates a nonclustered index on the Person.Person table, based on the LastName column. The NONCLUSTERED and ASC keywords are not necessary because they are the defaults. Because the UNIQUE keyword is not specified, duplicates are allowed in the index (that is, multiple rows in the table can have the same LastName).

Unique indexes are more involved because they serve two roles: they provide fast access to the data via the index’s columns, but they also serve as a constraint by allowing only one row to exist in a table for each combination of column values in the index. They can be clustered or nonclustered. Unique indexes are also defined on a table whenever you define a unique or primary key constraint on a table. The following example shows the creation of a nonclustered unique index:

    CREATE UNIQUE NONCLUSTERED INDEX [AK_CreditCard_CardNumber]
    ON [Sales].[CreditCard]
    (
        [CardNumber] ASC
    )

This example creates a nonclustered index named AK_CreditCard_CardNumber on the Sales.CreditCard table.
This index is based on a single column in the table. When it is created, this index prevents credit card rows with the same credit card number from being inserted into the CreditCard table.

The relational index options listed in Table 25.2 allow you to define more sophisticated indexes or specify how an index is to be created.

TABLE 25.2 Relational Index Options for CREATE INDEX

Argument: PAD_INDEX = { ON | OFF }
Explanation: Determines whether free space is allocated to the non-leaf-level pages of an index. The percentage of free space is determined by FILLFACTOR.

Argument: FILLFACTOR = fillfactor
Explanation: Determines how full SQL Server makes the leaf-level pages of the index, and therefore how much free space is left on each page. The fillfactor value is a percentage from 0 to 100; for example, a value of 80 fills each leaf page to 80 percent and leaves 20 percent free. The default value is 0. If fillfactor is 0 or 100, the index leaf-level pages are filled to capacity.

Argument: SORT_IN_TEMPDB = { ON | OFF }
Explanation: Specifies whether intermediate sort results that are used to create the index are stored in tempdb. Using tempdb can speed up the creation of the index (if tempdb is on a separate disk), but it requires more disk space.

Argument: IGNORE_DUP_KEY = { ON | OFF }
Explanation: Determines whether multirow inserts will fail when duplicate rows in the insert violate a unique index. When this option is set to ON, duplicate key values are ignored, and the rest of the multirow insert succeeds. When it is OFF (the default), the entire multirow insert fails if a duplicate is encountered.

Argument: STATISTICS_NORECOMPUTE = { ON | OFF }
Explanation: Determines whether distribution statistics used by the Query Optimizer are recomputed. When ON, the statistics are not automatically recomputed.

Argument: DROP_EXISTING = { ON | OFF }
Explanation: Determines whether an index with the same name is dropped prior to re-creation. This can provide some performance benefits over dropping the existing index first and then creating it. Clustered indexes see the most benefit.

Argument: ONLINE = { ON | OFF }
Explanation: Determines whether the index is built such that the underlying table is still available for queries and data modification during the index creation. This new feature is discussed in more detail in the “Online Indexing Operations” section, later in this chapter.

Argument: ALLOW_ROW_LOCKS = { ON | OFF }
Explanation: Determines whether row locks are allowed when accessing the index. The default for this new feature is ON.

Argument: ALLOW_PAGE_LOCKS = { ON | OFF }
Explanation: Determines whether page locks are allowed when accessing the index. The default for this new feature is ON.

Argument: MAXDOP = number_of_processors
Explanation: Determines the number of processors that can be used during index operations. The default for this new feature is 0, which causes an index operation to use the actual number of processors or fewer, depending on the workload on the system. This can be a useful option for index operations on large tables that may impact performance during the operation. For example, if you have four processors, you can specify MAXDOP = 2 to limit the index operation to use only two of the four processors.

Argument: DATA_COMPRESSION = { NONE | ROW | PAGE } [ ON PARTITIONS ( { <partition_number_expression> | <range> } [ ,...n ] ) ]
Explanation: Determines whether data compression is used on the specified index. The compression can be done on the row or page level, and specific index partitions can be compressed if the index uses partitioning.
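The effect of IGNORE_DUP_KEY described in Table 25.2 is easier to see with a small experiment. The following is a minimal sketch that uses a temporary table; the table name #PromoCode, its column, and the index name are hypothetical and are not part of the AdventureWorks sample database.

```sql
-- Hypothetical scratch table used only to demonstrate IGNORE_DUP_KEY.
CREATE TABLE #PromoCode (Code varchar(10) NOT NULL);

-- Unique index that discards duplicate keys instead of failing the statement.
-- FILLFACTOR and PAD_INDEX are included only to show several options together.
CREATE UNIQUE NONCLUSTERED INDEX UX_PromoCode_Code
ON #PromoCode (Code)
WITH (IGNORE_DUP_KEY = ON, FILLFACTOR = 90, PAD_INDEX = ON);

-- Multirow insert that contains a duplicate key value.
-- With IGNORE_DUP_KEY = ON, the duplicate is skipped with a warning
-- and the remaining rows are inserted.
INSERT INTO #PromoCode (Code)
VALUES ('SPRING'), ('SUMMER'), ('SPRING');

SELECT Code FROM #PromoCode ORDER BY Code;

DROP TABLE #PromoCode;
```

If the index were created with IGNORE_DUP_KEY = OFF instead, the same INSERT statement would fail as a whole and the table would remain empty.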
The following example creates a more complex index that utilizes several of the index options described in Table 25.2:

    CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName]
    ON [Person].[Person]
    (
        [LastName] ASC,
        [FirstName] ASC,
        [MiddleName] ASC
    )
    WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, FILLFACTOR = 80)

This example creates a nonclustered composite index on the person’s last name (LastName), first name (FirstName), and middle name (MiddleName). It utilizes some of the commonly used options and demonstrates how multiple options can be used in a single CREATE statement.

TIP
SQL Server Management Studio (SSMS) has several methods for generating the T-SQL code that creates indexes. You therefore rarely need to type index CREATE statements from scratch. Instead, you can use the friendly GUI screens that enable you to specify the common index options, and then you can generate the T-SQL script that can be executed to create the index.

Additional syntax options (not listed here) relate to backward compatibility and the creation of indexes on XML columns. Refer to Chapter 47, “Using XML in SQL Server 2008,” and the SQL Server Books Online documentation for further details.

Creating Indexes with SSMS

SQL Server 2008 has many options for creating indexes within SSMS. You can create indexes within SSMS via the Database Engine Tuning Advisor, database diagrams, the Table Designer, and several places within the Object Explorer. The means available from the Object Explorer are the simplest to use and are the focus of this section. The other options are discussed in more detail in related chapters of this book.

Index creation in the Object Explorer is facilitated by the New Index screen. You can launch this screen from SSMS by expanding the database tree in the Object Explorer and navigating to the Indexes node of the table that you want to add the index to.
Then you right-click the Indexes node and select New Index. A screen like the one shown in Figure 25.3 is displayed.

FIGURE 25.3 Using Object Explorer to create indexes.

The name and options that are populated in Figure 25.3 are based on the person index created in the previous T-SQL section. The LastName, FirstName, and MiddleName columns were selected and added as part of this new index by clicking the Add button, which displays a screen with all the columns in the table that are available for the index. You simply select the column(s) you want to include on the index. This populates the Index Key Columns grid on the default General page.

You can select other options for an index by changing the Select a Page options available on the top-left side of the New Index screen. The Options, Included Columns, Storage, Spatial, and Filter pages each provide a series of options that relate to the corresponding category and are utilized when creating the index.

Of particular interest is the Included Columns page. This page allows you to select columns that you want to include in the leaf-level pages of the index but don’t need as part of the index key. For example, you could consider using included columns if you have a critical query that often selects last name, first name, and address from a table but uses only the last name and first name as search arguments in the WHERE clause. This may be a situation in which you would want to consider the use of a covering index that places all the referenced columns from the query into a nonclustered index. In the case of our critical query, the address column can be added to the index as an included column. It is not included in the index key, but it is available in the leaf-level pages of the index so that the additional overhead of going to the data pages to retrieve the address is not needed.
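The covering-index scenario with an included address column can also be expressed directly in T-SQL with the INCLUDE argument. The following is a minimal sketch; the dbo.CustomerContact table, its columns, and the index name are hypothetical and exist only to illustrate the idea.

```sql
-- Hypothetical table for illustration; not part of AdventureWorks.
CREATE TABLE dbo.CustomerContact
(
    CustomerContactID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    LastName          nvarchar(50)  NOT NULL,
    FirstName         nvarchar(50)  NOT NULL,
    AddressLine1      nvarchar(100) NULL
);

-- LastName and FirstName form the index key because they are the search arguments.
-- AddressLine1 is stored only at the leaf level so the index covers the query.
CREATE NONCLUSTERED INDEX IX_CustomerContact_LastName_FirstName
ON dbo.CustomerContact (LastName ASC, FirstName ASC)
INCLUDE (AddressLine1);

-- The "critical query": every referenced column is available in the index,
-- so the data pages do not need to be visited to return AddressLine1.
SELECT LastName, FirstName, AddressLine1
FROM dbo.CustomerContact
WHERE LastName = N'Smith'
  AND FirstName = N'Ann';
```

Keeping the address out of the index key keeps the upper levels of the B-tree smaller than they would be if all three columns were key columns, while still avoiding the trip to the data pages.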
The Spatial and Filter option pages are new to SQL Server 2008. The Spatial page can be used to create spatial indexes on a column that is defined as a spatial data type; that is, either geometry or geography. If your table contains a column of this data type, you can use the Index Type drop-down to change the index type to Spatial. After this is done, you can add a column that is defined as a spatial data type to the index. Finally, you can select the Spatial option page, as shown in Figure 25.4, that allows you to fully define a spatial index. The meanings of the parameters on this page are beyond the scope of this chapter and are discussed in more detail in Chapter 34.

FIGURE 25.4 Spatial Index options page.

The Filter option page allows you to define a filtering criterion to limit the rows that are included in the index. The page, shown in Figure 25.5, is relatively simple with a single input area that contains your filtering criterion. This criterion is basically the contents of a WHERE clause that is similar to what you would use in a query window to filter the rows in your result.

FIGURE 25.5 Filter Index options page.

The filter expression shown in Figure 25.5 was defined for an index on the PersonType column, which is found in the Person.Person table of the AdventureWorks2008 sample database. Many of the rows in this table have a PersonType value equal to 'IN', so a filtered index that does not include rows with this value will dramatically reduce the size of the index and make searches on values other than 'IN' relatively fast.

After selecting all the options you want for your index via the New Index screen, you have several options for actually creating the index. You can script the index, schedule the index creation for a later time, or simply click OK to allow the New Index screen to add the index immediately. As mentioned earlier, it is a good idea to use this New Index screen to specify the index options, and then you can click the Script button to generate all the T-SQL statements needed to create the index. You can then save this script to a file to be used for generating a database build script or for maintaining a record of the indexes defined in a database.

Managing Indexes

There are two different aspects to index management. The first aspect is the management of indexes by the SQL Server database engine. Fortunately, the engine does a good job of managing the indexes internally so that limited manual intervention is required. This is predicated on a well-designed database system and the use of SQL Server features, such as automatic updates to distribution statistics.

The other aspect of index management typically comes into play when performance issues arise. Index adjustments and maintenance of these indexes make up the bulk of this effort.

Managing Indexes with T-SQL

One of the T-SQL features available with SQL Server 2008 is the ALTER INDEX statement. This statement simplifies many of the tasks associated with managing indexes. Index operations such as index rebuilds and changes to fill factor that were previously handled with DBCC commands are now available via the ALTER INDEX statement. The basic syntax for ALTER INDEX is as follows:

    ALTER INDEX { index_name | ALL }
        ON [ { database_name.[ schema_name ]. | schema_name. } ] { table_or_view_name }
        { REBUILD [ WITH ( <rebuild_index_option> [ ,...n ] ) ]
        | REORGANIZE [ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
        | DISABLE
        | SET ( <set_index_option> [ ,...n ] )
        }

Let’s look at a few examples that demonstrate the power of the ALTER INDEX statement. The first example simply rebuilds the primary key index on the Production.Product table:

    ALTER INDEX [PK_Product_ProductID] ON [Production].[Product] REBUILD

This offline operation is equivalent to the DBCC DBREINDEX command.
The specified index is dropped and re-created, removing all fragmentation from the index pages. This is done dynamically, without the need to drop and re-create constraints that reference any of the affected indexes. If it is run on a clustered index, the data pages of the table are defragmented as well. If you specify the ALL option for the ALTER INDEX command, all indexes as well as the data pages of the table (if the table has a clustered index) are defragmented.
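The other branches of the ALTER INDEX syntax follow the same pattern as the rebuild. The following is a minimal sketch of each one against the Production.Product table; the index name AK_Product_Name is an assumption about the sample database, and the option values are chosen only for illustration. Because DISABLE makes an index unusable until it is rebuilt, a script like this belongs on a test copy of the database.

```sql
-- Rebuild every index on the table (the ALL option), changing the fill factor
-- so that leaf pages are filled to 80 percent.
ALTER INDEX ALL ON [Production].[Product]
REBUILD WITH (FILLFACTOR = 80, SORT_IN_TEMPDB = ON);

-- Reorganize a single index in place and compact large object (LOB) pages.
ALTER INDEX [PK_Product_ProductID] ON [Production].[Product]
REORGANIZE WITH (LOB_COMPACTION = ON);

-- Change index options without rebuilding the index.
-- AK_Product_Name is assumed to exist; substitute an index from your own table.
ALTER INDEX [AK_Product_Name] ON [Production].[Product]
SET (ALLOW_PAGE_LOCKS = ON, STATISTICS_NORECOMPUTE = OFF);

-- Disable a nonclustered index, and then enable it again by rebuilding it.
ALTER INDEX [AK_Product_Name] ON [Production].[Product] DISABLE;
ALTER INDEX [AK_Product_Name] ON [Production].[Product] REBUILD;
```

REORGANIZE is generally the lighter-weight choice when fragmentation is modest, whereas REBUILD re-creates the index and is the only way to apply a new fill factor to existing pages.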