CHAPTER 34 Data Structures, Indexes, and Performance

sp_estimate_data_compression_savings
    [ @schema_name = ] 'schema_name' ,
    [ @object_name = ] 'object_name' ,
    [ @index_id = ] index_id ,
    [ @partition_number = ] partition_number ,
    [ @data_compression = ] 'data_compression'

You can estimate the data compression savings for a table for either row or page compression by specifying either 'ROW' or 'PAGE' as the value for the @data_compression parameter. If the table is already compressed, you can also estimate its size with compression disabled by specifying 'NONE' as the value for @data_compression. You can also use the sp_estimate_data_compression_savings procedure to estimate the space savings for compression on a specific index or partition.

The following example estimates the space savings if page compression were applied to the sales_big table in the bigpubs2008 database versus row compression:

use bigpubs2008
go
exec sp_estimate_data_compression_savings 'dbo', 'sales_big', null, null, 'PAGE'
go

object_name  schema_name  index_id  partition_number  size_with_current_compression_setting(KB)  size_with_requested_compression_setting(KB)  sample_size_with_current_compression_setting(KB)  sample_size_with_requested_compression_setting(KB)
sales_big    dbo          1         1                 116512                                     39128                                        40016                                             13440
sales_big    dbo          2         1                 36648                                      22128                                        10904                                             6584

exec sp_estimate_data_compression_savings 'dbo', 'sales_big', null, null, 'ROW'
go

object_name  schema_name  index_id  partition_number  size_with_current_compression_setting(KB)  size_with_requested_compression_setting(KB)  sample_size_with_current_compression_setting(KB)  sample_size_with_requested_compression_setting(KB)
sales_big    dbo          1         1                 116512                                     97936                                        40344                                             33912
sales_big    dbo          2         1                 36648                                      27176                                        10992                                             8152

You can see in this example that the space savings from page compression would be significant, with an estimated reduction in the size of the table itself (index_id = 1) from 113MB (116,512 KB) to 38MB
(39,128 KB), a savings of more than 66%. Row compression would not provide nearly as significant a savings, with an estimated reduction in size from 113MB to only 95MB (97,936 KB), a savings of only 16%.

If you compress the table, you can compare the estimated space savings to the actual size. For example, let’s look at the initial size of the sales_big table:

use bigpubs2008
go
select sum(page_count) as pages,
       sum(compressed_page_count) as compressed_pages
  from sys.dm_db_index_physical_stats
       (DB_ID(), OBJECT_ID('sales_big'), 1, null, 'DETAILED')
 where index_level = 0
SELECT SUM(used_page_count / 128.0) AS size_in_MB
  FROM sys.dm_db_partition_stats
 WHERE object_id = OBJECT_ID('dbo.sales_big') AND index_id = 1
GO

pages  compressed_pages
14519  0

size_in_MB
113.742187

Now, implement page compression on the sales_big table:

ALTER TABLE sales_big REBUILD WITH (DATA_COMPRESSION=PAGE)

Now, re-examine the size of the sales_big table:

select sum(page_count) as pages,
       sum(compressed_page_count) as compressed_pages
  from sys.dm_db_index_physical_stats
       (DB_ID(), OBJECT_ID('sales_big'), 1, null, 'DETAILED')
 where index_level = 0
SELECT SUM(used_page_count / 128.0) AS size_in_MB
  FROM sys.dm_db_partition_stats
 WHERE object_id = OBJECT_ID('dbo.sales_big') AND index_id = 1
GO

pages  compressed_pages
4452   4451

size_in_MB
34.906250

In this example, you can see that the table was reduced in size significantly, from 14,519 pages to 4,452 pages (113.7MB to 34.9MB), closely in line with the estimated space savings. You can also see that compression was effective, compressing 4,451 of 4,452 pages.

Be aware that you may not always get the predicted space savings because of the effects of fill factor and the actual size of the rows.
For example, if you have a row that is 8,000 bytes long and compression reduces its size by 40%, still only one row can fit on the data page, so there is no space savings for that page.

If the results of running sp_estimate_data_compression_savings indicate that the table will grow, many of the rows in the table are using nearly the full precision of their data types, and the small overhead needed for the compressed format is more than the savings from compression. In this case, there is no advantage to enabling compression.

Managing Data Compression with SSMS

The preceding examples show the T-SQL commands you can use to evaluate and manage row and page compression in SQL Server 2008. SSMS provides a Data Compression Wizard for evaluating and performing data compression activities. To invoke the Data Compression Wizard, right-click the table in the Object Explorer, select Storage, and then select Manage Compression. Click Next to move past the Welcome page to bring up the Select Compression Type page, as shown in Figure 34.11.

On the Select Compression Type page, you can choose the compression type to use at the partition level or use the same compression type for all partitions. You can also see the estimated savings for the selected compression type by clicking the Calculate button. After you click Calculate, the wizard displays the current partition size and requested compression size in the corresponding columns (note that it might take a few moments to do the calculation).

After making your selections, click Next to display the Select an Output Option page. Here, you can have the wizard generate a script of commands you can run manually to implement the selected compression type.
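The commands involved are ordinary T-SQL. The following is a minimal sketch of an evaluate-then-apply sequence similar to what the wizard does; it is not the wizard's literal output. It assumes the sales_big table in bigpubs2008 from the earlier examples, and the choice of index_id 2 and of ALTER INDEX ALL (to cover the table and its indexes in one statement) is ours for illustration:

```sql
use bigpubs2008
go
-- Estimate ROW compression savings for a single index (index_id = 2)
exec sp_estimate_data_compression_savings
     @schema_name = 'dbo',
     @object_name = 'sales_big',
     @index_id = 2,
     @partition_number = null,
     @data_compression = 'ROW'
go
-- Rebuild the clustered index (the table) and all nonclustered indexes
-- with page compression
ALTER INDEX ALL ON dbo.sales_big REBUILD WITH (DATA_COMPRESSION = PAGE)
go
-- Confirm the compression setting now recorded for each partition
select index_id, partition_number, data_compression_desc
  from sys.partitions
 where object_id = OBJECT_ID('dbo.sales_big')
 order by index_id, partition_number
go
```

After the rebuild, data_compression_desc should report PAGE for each row returned. Rebuilding a large table can take time and log space, so running it from a saved script during a maintenance window is often preferable to running it immediately.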
If you choose to generate a script, you can save the script to a file, to the Clipboard, or to a new query window in SSMS. You also have the option to run the compression changes immediately or to schedule a SQL Agent job to run the changes at a specified time.

FIGURE 34.11 The Data Compression Wizard’s Select Compression Type page.

Understanding Table Structures

A table is logically defined as a set of columns with certain properties, such as the data type, nullability, constraints, and so on. Information about data types, column properties, constraints, and other information related to defining and creating tables can be found in Chapters 24, “Creating and Managing Tables,” and 27, “Creating and Managing Views.”

Internally, a table is contained in one or more partitions. A partition is a user-defined unit of data organization. By default, a table has at least one partition that contains all the table pages. This partition resides in a single filegroup, as described earlier. When a table has multiple partitions, the data is partitioned horizontally so that groups of rows are mapped into individual partitions, based on a specified column. The partitions can be placed in one or more filegroups in the database. The table is treated as a single logical entity when queries or updates are performed on the data. Figure 34.12 shows the organization of a table in SQL Server 2008.

Each table has one row in the sys.objects catalog view, and each table and index in a database is represented by a single row in the sys.indexes catalog view. Each partition of a table or index is represented by one or more rows in the sys.partitions catalog view. Each partition can have three types of data, each stored on its own set of pages: in-row data pages, row-overflow pages, and LOB data pages.
Each of these types of pages has an allocation unit, which is listed in the sys.allocation_units view. There is always at least one allocation unit for the in-row data.

FIGURE 34.12 Table organization in SQL Server 2008.

The following sample query shows how to view the partition and allocation information for the databaselog and currency tables in the AdventureWorks2008R2 database:

use AdventureWorks2008R2
go
SELECT convert(varchar(15), o.name) AS table_name,
       p.index_id as indid,
       convert(varchar(30), i.name) AS index_name,
       convert(varchar(18), au.type_desc) AS allocation_type,
       au.data_pages as d_pgs,
       partition_number as ptn
  FROM sys.allocation_units AS au
  JOIN sys.partitions AS p ON au.container_id = p.partition_id
  JOIN sys.objects AS o ON p.object_id = o.object_id
  JOIN sys.indexes AS i ON p.index_id = i.index_id
                       AND i.object_id = p.object_id
 WHERE o.name = N'databaselog' OR o.name = N'currency'
 ORDER BY o.name, p.index_id;

table_name   indid  index_name                    allocation_type    d_pgs  ptn
Currency     1      PK_Currency_CurrencyCode      IN_ROW_DATA        1      1
Currency     2      AK_Currency_Name              IN_ROW_DATA        1      1
DatabaseLog  0      NULL                          IN_ROW_DATA        753    1
DatabaseLog  0      NULL                          LOB_DATA           0      1
DatabaseLog  0      NULL                          ROW_OVERFLOW_DATA  0      1
DatabaseLog  2      PK_DatabaseLog_DatabaseLogID  IN_ROW_DATA        3      1

In this example, you can see that the DatabaseLog table (which is a heap table) has three allocation units associated with the table (LOB, row-overflow, and in-row data) and one allocation unit for the nonclustered index PK_DatabaseLog_DatabaseLogID. The currency table (which is a clustered table) has one in-row allocation unit for the table (index_id = 1) and one for the nonclustered index (AK_Currency_Name).

In SQL Server 2008, there are two types of tables: heap tables and clustered tables. Let’s look at how they are stored.
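To see which of the two storage types each table in a database uses, you can look at the row that sys.indexes holds for the table itself: index_id 0 for a heap or index_id 1 for a clustered index. The following query is a minimal sketch that assumes the AdventureWorks2008R2 database used in the preceding example; the column aliases are ours:

```sql
use AdventureWorks2008R2
go
-- Each user table has exactly one row in sys.indexes with index_id 0 (heap)
-- or index_id 1 (clustered index), never both.
select SCHEMA_NAME(o.schema_id) as table_schema,
       o.name as table_name,
       i.type_desc as storage_type   -- HEAP or CLUSTERED
  from sys.objects as o
  join sys.indexes as i on i.object_id = o.object_id
 where o.type = 'U'
   and i.index_id in (0, 1)
 order by storage_type, table_schema, table_name
go
```

Given the earlier output, DatabaseLog should be listed as HEAP and Currency as CLUSTERED.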
Heap Tables

A table without a clustered index is a heap table. There is no imposed ordering of the data rows for a heap table. Additionally, there is no direct linkage between the pages in a heap table.

By default, a heap has a single partition. A heap has one row in sys.partitions, with an index ID of 0, for each partition used by the heap. When a heap has multiple partitions, each partition has a heap structure that contains the data for that specific partition. For example, if a heap has four partitions, there are four heap structures (one in each partition) and four rows in sys.partitions.

Depending on the data types in the heap, each heap structure has one or more allocation units to store and manage the data for each partition. At a minimum, each heap has one IN_ROW_DATA allocation unit per partition. The heap also has one LOB_DATA allocation unit per partition if it contains large object columns. It also has one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable-length columns that exceed the 8,060-byte row size limit.

To access the contents of a heap, SQL Server uses the IAM pages. In SQL Server 2008, each heap table has at least one IAM page. The address of the first IAM page is available in the undocumented sys.system_internals_allocation_units system view. The column first_iam_page points to the first IAM page in the chain of IAM pages that manage the space allocated to the heap in a specific partition.
The following query returns the first IAM pages for each of the allocation units for the heap table DatabaseLog in AdventureWorks2008R2:

use AdventureWorks2008R2
go
select p.partition_number as ptn,
       type_desc,
       filegroup_id,
       first_iam_page
  from sys.system_internals_allocation_units i
 inner join sys.partitions p on p.hobt_id = i.container_id
 where p.object_id = OBJECT_ID('DatabaseLog')
   and index_id = 0
go

ptn  type_desc          filegroup_id  first_iam_page
1    IN_ROW_DATA        1             0xAA0000000100
1    LOB_DATA           1             0xB90000000100
1    ROW_OVERFLOW_DATA  1             0x000000000000

Note that the value 0x000000000000 for the first_iam_page for ROW_OVERFLOW_DATA indicates that no extents have yet been allocated for storing row-overflow data.

NOTE
The sys.system_internals_allocation_units system view is reserved for Microsoft SQL Server internal use only. Future compatibility and availability of this view are not guaranteed.

The data pages and rows in the heap are not sorted in any specific order and are not linked. The IAM page registers which extents are used by the table. SQL Server can then simply scan the allocated extents referenced by the IAM page, in physical order. This essentially avoids the problem of page chain fragmentation during reads because SQL Server always reads full extents in sequential order. Using the IAM pages to set the scan sequence also means that rows from the heap often are not returned in the order in which they were inserted.

As discussed earlier, each IAM can map a maximum of 63,903 extents for a table. As a table uses extents beyond the range of those 63,903 extents, more IAM pages are created for the heap table as needed. A heap table also has at least one IAM page for each file on which the heap table has extents allocated. Figure 34.13 illustrates the structure of a heap and how its contents are traversed using the IAM pages.
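As a rough worked example of the 63,903-extent figure, the following sketch computes how much space a single IAM page can map and a lower bound on the number of IAM pages for each allocation unit of the DatabaseLog heap. It assumes 8KB pages and 8 pages (64KB) per extent. The min_iam_pages column name and the calculation are ours, and the result is only a lower bound: because a heap needs at least one IAM page for each file on which it has extents, the real count can be higher.

```sql
use AdventureWorks2008R2
go
-- Space mapped by one IAM page: 63,903 extents x 64KB per extent, in MB
select 63903 * 64 / 1024.0 as mb_mapped_per_iam_page
go
-- Lower bound on IAM pages for each allocation unit of the DatabaseLog heap:
-- total pages / 8 pages per extent / 63,903 extents per IAM page, rounded up
select au.type_desc,
       au.total_pages,
       ceiling(au.total_pages / 8.0 / 63903) as min_iam_pages
  from sys.allocation_units as au
  join sys.partitions as p on au.container_id = p.partition_id
 where p.object_id = OBJECT_ID('DatabaseLog')
   and p.index_id = 0
go
```

The first query works out to just under 4,000MB (about 3.9GB) per IAM page, so a small heap such as DatabaseLog needs only one IAM page per allocation unit that has any pages allocated.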
FIGURE 34.13 The structure of a heap table.

Clustered Tables

A clustered table is a table that has a clustered index defined on it. When you create a clustered index, the data rows in the table are physically sorted in the order of the columns in the index key. The data pages are chained together in a doubly linked list (each page points to the next page and to the previous page). Normally, data pages are not linked. Only index pages within a level are linked in this manner to allow for ordered scans of the data in an index level. Because the data pages of a clustered table constitute the leaf level of the clustered index, they are chained as well. This allows for an ordered table scan. The page pointers are stored in the page header. Figure 34.14 shows a simplified example of the data pages of a clustered table. (Note that the figure shows only the data pages.)

FIGURE 34.14 The data page structure of a clustered table.

TIP
More details on the structure and maintenance of clustered tables are provided in the remainder of this chapter.

Understanding Index Structures

When you run a query against a table that has no indexes, SQL Server has to read every page of the table, looking at every row on each page to find out whether each row satisfies the search arguments.
SQL Server has to scan all the pages because there’s no way of knowing whether any rows found are the only rows that satisfy the search arguments. This search method is referred to as a table scan.

A table scan is not an efficient way to retrieve data unless you really need to retrieve all rows. The Query Optimizer in SQL Server always calculates the cost of performing a table scan and uses that as a baseline when evaluating other access methods. The various access methods and query plan cost analysis are discussed in more detail in Chapter 35, “Understanding Query Optimization.”

Suppose that a table is stored on 10,000 pages; even if only one row is to be returned or modified, all the pages must be searched, resulting in a scan of approximately 80MB of data (that is, 10,000 pages × 8KB per page = 80,000KB).

Indexes are structures stored separately from the actual data pages; they contain pointers to data pages or data rows. Indexes are used to speed up access to the data; they are also the mechanism used to enforce the uniqueness of key values.

Indexes in SQL Server are balanced trees (B-trees; see Figure 34.15). There is a single root page at the top of the tree, which branches out into N pages at each intermediate level until it reaches the bottom (leaf level) of the index. The leaf level has one row stored for each row in the table. The index tree is traversed by following pointers from the upper-level pages down through the lower-level pages. Each level of the index is linked as a doubly linked list.

FIGURE 34.15 The basic structure of a B-tree index, showing Level 2 (Root), Level 1 (Intermediate), and Level 0 (Leaf).

An index can have many intermediate levels, depending on the number of rows in the table, index type, and index key width. The maximum number of columns in an index is 16; the maximum width of an index row is 900 bytes.
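To see how many levels an existing index actually has, you can query sys.dm_db_index_physical_stats in DETAILED mode, which returns one row per index level. The following is a minimal sketch that assumes the sales_big table in bigpubs2008 from the compression examples and its clustered index (index_id = 1):

```sql
use bigpubs2008
go
-- One row per level of the clustered index on sales_big.
-- index_level 0 is the leaf level; the highest index_level is the root.
select index_depth,
       index_level,
       page_count,
       record_count
  from sys.dm_db_index_physical_stats
       (DB_ID(), OBJECT_ID('dbo.sales_big'), 1, null, 'DETAILED')
 order by index_level desc
go
```

The first row returned is the root level, which should report a page_count of 1, and page_count should increase at each level down to the leaf. Because DETAILED mode reads every page of the index, it can be slow on large tables.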
To provide a more efficient mechanism to identify and locate specific rows within a table quickly and easily, SQL Server supports two types of indexes: clustered and nonclustered.

Clustered Indexes

When you create a clustered index, all rows in the table are sorted and stored in the clustered index key order. Because the rows are physically sorted by the index key, you can have only one clustered index per table. You can think of the structure of a clustered index as being similar to a filing cabinet: the data pages are like folders in a file drawer in alphabetical order, and the data rows are like the records in the file folder, also in sorted order. You can think of the intermediate levels of the index tree as the file drawers, also in alphabetical order, that assist you in finding the appropriate file folder. Figure 34.16 shows an example of a clustered index tree structure.

In Figure 34.16, note that the data page chain is in clustered index order. However, the rows on each page might not be physically sorted in clustered index order, depending on when rows were inserted or deleted in the page. SQL Server still keeps the proper sort order of the rows via the row IDs and the row offset table. A clustered index is useful for range-retrieval queries or searches against columns with duplicate values because the rows within the range are physically located in the same page or on adjacent pages.

The data pages of the table are also the leaf level of a clustered index. To find all clustered index key values, SQL Server must eventually scan all the data pages.

SQL Server performs the following steps when searching for a value using a clustered index:

1. Queries the system catalogs for the page address for the root page of the index. (For a clustered index, the root_page column in sys.system_internals_allocation_units points to the top of the clustered index for a specific partition.)
2. Compares the search value against the key values stored on the root page.
3. Finds the highest key value on the page where the key value is less than or equal to the search value.
4. Follows the page pointer stored with the key to the appropriate page at the next level down in the index.
5. Continues following page pointers (that is, repeats steps 3 and 4) until the data page is reached.
6. Searches the rows on the data page to locate any matches for the search value. If no matching row is found on that data page, the table contains no matching values.
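One way to observe this traversal is to compare the depth of a clustered index with the number of pages SQL Server reads for a single-value search. The following is a minimal sketch using a hypothetical temporary table; the table name, column names, and row count are ours and are not taken from the sample databases:

```sql
use tempdb
go
-- Hypothetical demo table with a unique clustered index on id
create table #seek_demo
(
    id     int       not null,
    filler char(200) not null default 'x'
)
create unique clustered index ci_seek_demo on #seek_demo (id)
go
-- Load 100,000 sequential key values
insert into #seek_demo (id)
select top (100000)
       row_number() over (order by (select null))
  from sys.all_columns as a
 cross join sys.all_columns as b
go
-- Number of levels from the root page down to the data pages
select index_depth
  from sys.dm_db_index_physical_stats
       (DB_ID(), OBJECT_ID('tempdb..#seek_demo'), 1, null, 'LIMITED')
go
-- Search for a single key value and report the pages read
set statistics io on
select id, filler from #seek_demo where id = 54321
set statistics io off
go
drop table #seek_demo
go
```

For a search like this, the logical reads reported by STATISTICS IO should normally match index_depth: one page read at each level as SQL Server follows the page pointers from the root page down to the data page.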