Maintaining Your Database • Chapter 7

As you have learned from earlier chapters, tables can be stored as a heap, where rows are stored in no particular order, or as a clustered index, where rows are stored in the order defined by the index. You can use compression on both types of table. Nonclustered indexes are stored separately from the table on which they are defined. Indexes can also be created on views, which are then referred to as "indexed views". You can use compression on nonclustered indexes for tables and views.

Finally, data compression can be configured separately for each partition of a partitioned table or index. For example, if a table is partitioned into current and historical partitions, you can choose to enable data compression on the historical partition only.

To enable compression on a new table or index, use the CREATE TABLE or CREATE INDEX statement with the DATA_COMPRESSION = ROW | PAGE option. To enable compression on an existing table or index, use the DATA_COMPRESSION option with the ALTER TABLE ... REBUILD or ALTER INDEX ... REBUILD statement. Enabling compression on an existing object rebuilds that object, so it can consume considerable time and resources. Enabling data compression on a table has no effect on that table's nonclustered indexes; each nonclustered index must be compressed separately. The syntax for enabling data compression is shown in Examples 7.2 through 7.6.
Example 7.2 Enabling Data Compression on a New Table: Syntax
CREATE TABLE [database_name].[schema_name].table_name
(<Column Definition List>)
WITH (DATA_COMPRESSION = ROW | PAGE)

Example 7.3 Enabling Data Compression on an Existing Table: Syntax
ALTER TABLE [database_name].[schema_name].table_name
REBUILD WITH (DATA_COMPRESSION = ROW | PAGE | NONE)

Example 7.4 Enabling Data Compression on a New Nonclustered Index: Syntax
CREATE NONCLUSTERED INDEX index_name
ON table_name (<Column List>)
WITH (DATA_COMPRESSION = ROW | PAGE)

Example 7.5 Enabling Data Compression on an Existing Nonclustered Index: Syntax
ALTER INDEX index_name
ON table_name
REBUILD WITH (DATA_COMPRESSION = ROW | PAGE | NONE)

Example 7.6 Enabling Data Compression on a Partitioned Table: Syntax
ALTER TABLE partitioned_table_name
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = ROW | PAGE | NONE)

Row versus Page Compression

Row compression attempts to reduce disk space by storing all fixed-length data types, including numeric data types, as variable length. This can reduce the size of each individual row, allowing you to fit more rows on a page. Row compression uses compression metadata to describe the offset of each value within the row. However, this space saving is not always achieved. For example, when the values stored in fixed-length columns consume the entire length of the column, no space saving occurs. In fact, more space is used in this scenario, because the compression metadata must still be written to the page.

Row compression has no effect on data types whose storage cannot be shortened, such as tinyint, smalldatetime, date, and uniqueidentifier. It also has no effect on data types that are already stored as variable length, such as varchar, nvarchar, and varbinary. Finally, special data types such as text, image, xml, table, sql_variant, and cursor are not affected by row-level compression.
The bit data type is always negatively affected because, together with the metadata overhead, each bit column requires four bits of storage, as opposed to the single byte normally shared by up to eight bit columns.

Page compression applies the following compression techniques to each data page:

  Row compression
  Prefix compression
  Dictionary compression

These techniques are applied in order when the data page becomes full. This is why page compression has a profound negative effect on write performance. Page compression goes further than row compression when attempting to save space. When you enable page compression, row compression is automatically enabled.

Prefix compression identifies repeating values in each column and stores each repeating value once in the compression information (CI) structure that follows the page header. Each occurrence of the repeating value in the column is then replaced by a reference to the value in the CI structure. The reference can also indicate a partial match.

Dictionary compression is applied after prefix compression. This type of compression identifies repeating values anywhere on the page and stores each of them once in the CI structure. Repeating values throughout the page are then replaced by a reference. Dictionary compression is not limited to a single column; it is applied to the entire page.

Estimating Space Savings Using sp_estimate_data_compression_savings

It can be difficult to decide whether or not to implement data compression. To take the guesswork out of this decision, SQL Server 2008 provides a handy stored procedure: sp_estimate_data_compression_savings. This stored procedure takes the schema name and the name of the table or indexed view, an optional index number (specify NULL for all indexes or 0 for the heap), an optional partition number, and the type of data compression for which to calculate the estimate.
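The arguments described above can also be passed by name, which makes each one easier to identify. The following is a minimal sketch rather than a prescribed usage: it assumes the AdventureWorks sample database and that Purchasing.PurchaseOrderDetail has a clustered index (index number 1) stored on a single partition. The values chosen for the index number, partition number, and compression type are illustrative only.

```sql
USE AdventureWorks;
GO
-- Estimate ROW compression savings for the clustered index (index_id = 1)
-- on partition 1 of Purchasing.PurchaseOrderDetail.
-- Assumption: the table has a clustered index and is not partitioned.
EXECUTE sp_estimate_data_compression_savings
    @schema_name      = 'Purchasing',
    @object_name      = 'PurchaseOrderDetail',
    @index_id         = 1,
    @partition_number = 1,
    @data_compression = 'ROW';
GO
```

Replacing 'ROW' with 'PAGE' or 'NONE' changes the compression setting being estimated, and passing NULL for the index or partition number widens the estimate to all indexes or all partitions.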
This stored procedure is also useful if you have a compressed table and want to know how much space the table would consume uncompressed. The following columns are included in the results of the sp_estimate_data_compression_savings stored procedure:

  Object_name: The name of the table or indexed view for which you are calculating the savings.
  Schema_name: The schema that this table or view belongs to.
  Index_id: The index number; 0 stands for the heap, 1 for the clustered index, and other numbers for nonclustered indexes.
  Partition_number: The number of the partition; 1 stands for a nonpartitioned table or index.
  Size_with_current_compression_setting (KB): The current size of the object.
  Size_with_requested_compression_setting (KB): The estimated size of the object, without fragmentation or padding.
  Sample_size_with_current_compression_setting (KB): The size of the sample using the existing compression setting.
  Sample_size_with_requested_compression_setting (KB): The size of the sample using the requested compression setting.

In Example 7.7, we will use sp_estimate_data_compression_savings with the Purchasing.PurchaseOrderDetail table.

Example 7.7 Estimating Compression Savings
USE AdventureWorks;
GO
-- Estimate PAGE compression savings for all indexes and partitions
EXECUTE sp_estimate_data_compression_savings 'Purchasing', 'PurchaseOrderDetail', NULL, NULL, 'PAGE';

Using Sparse Columns

Sparse columns reduce the amount of space taken up by null values. However, sparse columns increase the time it takes to retrieve values that are not null. Most columns that allow nulls can be marked as sparse. The best practice is to use sparse columns when the technique saves at least 20 to 40 percent of space and you are not concerned about reduced read performance for non-null values. Columns are marked as sparse within the CREATE TABLE or ALTER TABLE statements, as shown in Examples 7.8 through 7.10.
Example 7.8 Creating a Sparse Column in a New Table: Syntax
CREATE TABLE [database_name].[schema_name].table_name
(Column1 int PRIMARY KEY,
Column2 varchar(50) SPARSE NULL)

Example 7.9 Marking an Existing Column as Sparse: Syntax
ALTER TABLE [database_name].[schema_name].table_name
ALTER COLUMN Column2 ADD SPARSE

Example 7.10 Marking an Existing Column as Non-Sparse: Syntax
ALTER TABLE [database_name].[schema_name].table_name
ALTER COLUMN Column2 DROP SPARSE

New & Noteworthy…

Using Column Sets

Sparse columns are often used with a new feature of SQL Server called column sets. A column set is like a computed column that, when queried, returns an XML fragment representing all non-null values stored in the sparse columns of a single table. Like a computed column, a column set consumes no storage space except for table metadata. Unlike a computed column, a column set can be updated: you update the XML that the column set returns. This makes column sets especially useful for storing a large number of properties that are often null.

Consider using column sets when it is difficult to work with a large number of columns individually and many of the values in these columns are null. Column sets can offer improved performance except in situations where many indexes are defined on the table.
Example 7.11 demonstrates the use of column sets:

Example 7.11 Using Column Sets
-- Three sparse columns plus an XML column set that exposes them together
CREATE TABLE Planets
(PlanetID int IDENTITY PRIMARY KEY,
PlanetName nvarchar(50) SPARSE NULL,
PlanetType nvarchar(50) SPARSE NULL,
Radius int SPARSE NULL,
PlanetDescription XML COLUMN_SET FOR ALL_SPARSE_COLUMNS);
GO
INSERT Planets (PlanetName, PlanetType, Radius)
VALUES ('Earth', NULL, NULL),
('Jupiter', 'Gas Giant', 71492),
('Venus', NULL, 6051);
GO
-- The column set returns only the non-null sparse values of each row
SELECT PlanetDescription FROM Planets

Results:
PlanetDescription
<PlanetName>Earth</PlanetName>
<PlanetName>Jupiter</PlanetName><PlanetType>Gas Giant</PlanetType><Radius>71492</Radius>
<PlanetName>Venus</PlanetName><Radius>6051</Radius>

-- Updating the column set updates the underlying sparse columns
UPDATE Planets
SET PlanetDescription = '<PlanetName>Earth</PlanetName><PlanetType>Terrestrial Planet</PlanetType><Radius>6371</Radius>'
WHERE PlanetName = 'Earth';
GO
SELECT * FROM Planets

Results:
PlanetID  PlanetDescription
1         <PlanetName>Earth</PlanetName><PlanetType>Terrestrial Planet</PlanetType><Radius>6371</Radius>
2         <PlanetName>Jupiter</PlanetName><PlanetType>Gas Giant</PlanetType><Radius>71492</Radius>
3         <PlanetName>Venus</PlanetName><Radius>6051</Radius>

DROP TABLE Planets;
GO
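In Example 7.11, SELECT * returns the column set in place of the individual sparse columns. The sparse columns can still be queried by name, as the sketch below shows. It assumes that the Planets table from Example 7.11 still exists, that is, the statements are run before the DROP TABLE statement. The second query uses the sys.columns catalog view, whose is_sparse and is_column_set flags are available in SQL Server 2008 and later; it is included only as one possible way to inspect the table definition.

```sql
-- Sparse columns remain addressable by name even when a column set exists.
SELECT PlanetID, PlanetName, PlanetType, Radius
FROM Planets;
GO

-- Show which columns of Planets are sparse and which one is the column set.
SELECT name, is_sparse, is_column_set
FROM sys.columns
WHERE object_id = OBJECT_ID('Planets');
GO
```

Selecting the columns by name returns ordinary relational values, including NULL where no value is stored, which is often more convenient than parsing the XML fragment.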