CHAPTER 42 What’s New for Transact-SQL in SQL Server 2008

    declare @smallBox GEOMETRY = 'polygon((0 0, 0 2, 2 2, 2 0, 0 0))';
    declare @largeBox GEOMETRY = 'polygon((1 1, 1 4, 4 4, 4 1, 1 1))';
    declare @line GEOMETRY = 'linestring(0 2, 4 4)';

    select @smallBox
    union all
    select @largeBox
    union all
    select @smallBox.STIntersection(@largeBox)
    union all
    select @line

Spatial Data Types: Where to Go from Here?

The preceding sections provide only a brief introduction to spatial data types and how to work with geometry and geography data. For more information on working with spatial data, in addition to Books Online, you might want to visit the Microsoft SQL Server 2008 Spatial Data page at http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx. This page provides links to whitepapers and other technical documents related to working with spatial data in SQL Server 2008.

In addition, all examples here deal with spatial data only as data values and coordinates. Spatial data is often most useful when it can be displayed visually, such as on a map. SQL Server 2008 R2 Reporting Services provides new map controls and a map wizard for creating map reports based on spatial data. For more information, see Chapter 53, “SQL Server 2008 Reporting Services.”

Change Data Capture

In SQL Server 2008, Microsoft introduced a new feature called Change Data Capture (CDC), which is designed to make it much easier and less resource intensive to identify and retrieve changed data from tables in an online transaction processing (OLTP) database. In a nutshell, CDC captures and records INSERT, UPDATE, and DELETE activity in an OLTP database and stores it in a form that is easily consumed by an application, such as a SQL Server Integration Services (SSIS) package.

In the past, capturing data changes for your tables for auditing or extract, transform, and load (ETL) purposes required using replication, time stamp columns, triggers, complex queries, or expensive third-party tools.
None of these other methods are easy to implement, and many of them use a lot of server resources, negatively affecting the performance of the OLTP server. Change Data Capture provides for a more efficient mechanism for capturing the data changes in a table.

NOTE
Change Data Capture is available only in the SQL Server 2008 Developer, Enterprise, and Datacenter Editions.

The source of change data for Change Data Capture is the SQL Server transaction log. As inserts, updates, and deletes are applied to tables, entries that describe those changes are added to the transaction log. When Change Data Capture is enabled for a database, a SQL Server Agent capture job is created to invoke the sp_replcmds system procedure. This procedure is an internal server function and is the same mechanism used by transactional replication to harvest changes from the transaction log.

NOTE
If replication is already enabled for the database, the transactional log reader used for replication is also used for CDC. This strategy significantly reduces log contention when both replication and Change Data Capture are enabled for the same database.

The principal task of the Change Data Capture process is to scan the log and identify changes to data rows in any tables configured for Change Data Capture. As these changes are identified, the process writes column data and transaction-related information to the Change Data Capture tables. The changes can then be read from these change tables to be applied as needed.

The Change Data Capture Tables

When CDC is enabled for a database and one or more tables, an associated Change Data Capture table is created for each table being monitored. The Change Data Capture tables are used to store the changes made to the data in corresponding source tables, along with some metadata used to track the changes. By default, the name of the CDC change table is schemaname_tablename_CT and is based on the name of the source table.

The first five columns of a Change Data Capture change table are metadata columns and contain additional information relevant to the recorded change:

. __$start_lsn: Identifies the commit log sequence number (LSN) assigned to the change. This value can be used to determine the order of the transactions.
. __$end_lsn: Is currently not used and in SQL Server 2008 is always NULL.
. __$seqval: Can be used to order changes that occur within the same transaction.
. __$operation: Records the operation associated with the change: 1 = delete, 2 = insert, 3 = update before image (delete), and 4 = update after image (insert).
. __$update_mask: Is a variable bit mask with one defined bit for each captured column to identify what columns were changed. For insert and delete entries, the update mask always has all bits set. Update rows have the bits set only for the columns that were modified.

The remaining columns in the Change Data Capture change table are identical to the columns from the source table in name and type and are used to store the column data gathered from the source table when an insert, update, or delete operation is performed on the table.

For every row inserted into the source table, a single row is inserted into the change table, and this row contains the column values inserted into the source table. Every row deleted from the source table is also inserted as a single row into the change table but contains the column values in the row before the delete operation. An update operation is captured as a delete followed by an insert, so two rows are captured for each update: one row entry to capture the column values before the update, and a second row entry to capture the column values after the update.

In addition to the Change Data Capture tables, the following Change Data Capture metadata tables are also created:

. cdc.change_tables: Contains one row for each change table in the database, created when Change Data Capture is enabled on a source table.
. cdc.index_columns: Contains one row for each index column used by Change Data Capture to uniquely identify rows in the source table. By default, these are the columns of the primary key of the source table, but a different unique index on the source table can be specified when Change Data Capture is enabled on the source table. A primary key or unique index is required on the source table only if Net Change Tracking is enabled.
. cdc.captured_columns: Contains one row for each column tracked in each source table. By default, all columns of the source table are captured, but you can include or exclude columns when enabling Change Data Capture for a table by specifying a column list.
. cdc.ddl_history: Contains a row for each Data Definition Language (DDL) change made to any table enabled for Change Data Capture. You can use this table to determine when a DDL change occurred on a source table and what the change was.
. cdc.lsn_time_mapping: Contains a row for each transaction stored in a change table and is used to map between log sequence number (LSN) commit values and the actual time the transaction was committed.

Although you can query the Change Data Capture tables directly, it is not recommended. Instead, you should use the Change Data Capture functions, which are discussed later.

All these objects associated with a CDC instance are created in the special schema called cdc when Change Data Capture is enabled for a database.

Enabling CDC for a Database

Before you can begin capturing data changes for a table, you must first enable the database for Change Data Capture. You do this by running the stored procedure sys.sp_cdc_enable_db within the desired database context.
When a database is enabled for Change Data Capture, the cdc schema, the cdc user, and the metadata tables are created, along with the system functions that are used to query for change data.

NOTE
To determine whether a database is already enabled for CDC, you can check the value in the is_cdc_enabled column in the sys.databases catalog view. A value of 1 indicates that CDC is enabled for the specified database.

The following SQL code enables CDC for the AdventureWorks2008R2 database and then checks that CDC is enabled by querying the sys.databases catalog view:

    use AdventureWorks2008R2
    go
    exec sys.sp_cdc_enable_db
    go
    select is_cdc_enabled
    from sys.databases
    where name = 'AdventureWorks2008R2'
    go

    is_cdc_enabled
    1

NOTE
Although the examples presented here are run against the AdventureWorks2008R2 database, they can also be run against the AdventureWorks2008 database. However, you should be aware that some of the column values displayed may not be exactly the same.

Enabling CDC for a Table

When the database is enabled for Change Data Capture, you can use the sys.sp_cdc_enable_table stored procedure to enable a Change Data Capture instance for any tables in that database. The sys.sp_cdc_enable_table stored procedure supports the following parameters:

. @source_schema: Specifies the name of the schema in which the source table resides.
. @source_name: Specifies the name of the source table.
. @role_name: Indicates the name of the database role used to control access to Change Data Capture tables. If this parameter is set to NULL, no role is used to limit access to the change data. If the specified role does not exist, SQL Server creates a database role with the specified name.
. @capture_instance: Specifies the name of the capture instance used to name the instance-specific Change Data Capture objects. By default, this is the source schema name plus the source table name in the format schemaname_sourcename. A source table can have a maximum of two capture instances.
. @supports_net_changes: Is set to 1 or 0 to indicate whether support for querying for net changes is to be enabled for this capture instance. If this parameter is set to 1, the source table must have a defined primary key, or an alternate unique index must be specified for the @index_name parameter.
. @index_name: Specifies the name of a unique index to use to uniquely identify rows in the source table.
. @captured_column_list: Specifies the source table columns to be included in the change table. By default, all columns are included in the change table.
. @filegroup_name: Specifies the filegroup to be used for the change table created for the capture instance. If this parameter is NULL or not specified, the default filegroup is used. If possible, it is recommended that you place the Change Data Capture change tables on a filegroup separate from the one that holds your source tables.
. @allow_partition_switch: Indicates whether the SWITCH PARTITION command of ALTER TABLE can be executed against a table that is enabled for Change Data Capture. The default is 1 (enabled). If any partition switches occur, Change Data Capture does not track the changes resulting from the switch. This causes data inconsistencies when the change data is consumed.

The @source_schema, @source_name, and @role_name parameters are the only required parameters. All the others are optional and apply default values if not specified.
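
To see how the optional parameters fit together, the following is a minimal sketch only. The table dbo.MyOrders, its columns, the role cdc_reader, the capture instance name, and the filegroup CDC_FG are all hypothetical names chosen for illustration; they are not part of the AdventureWorks2008R2 sample database, and the filegroup is assumed to exist already.

```sql
-- Sketch only: dbo.MyOrders, cdc_reader, dbo_MyOrders_v1, and CDC_FG are
-- hypothetical names. The table is assumed to have a primary key on OrderID,
-- and the CDC_FG filegroup is assumed to have been created beforehand.
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'MyOrders',
    @role_name            = N'cdc_reader',       -- created if it does not exist
    @capture_instance     = N'dbo_MyOrders_v1',  -- overrides the default name
    @supports_net_changes = 1,                   -- requires a primary key or @index_name
    @captured_column_list = N'OrderID, CustomerID, OrderStatus',
    @filegroup_name       = N'CDC_FG'
```

Because net changes are requested here, the key column (OrderID) is kept in the captured column list so that rows can still be identified uniquely in the change table.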

To implement basic change data tracking for a table, let’s first create a copy of the Customer table to play around with:

    select * into MyCustomer from Sales.Customer
    alter table MyCustomer add primary key (CustomerID)

Now, to enable CDC on the MyCustomer table, you can execute the following:

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name = N'MyCustomer',
        @role_name = NULL

NOTE
If this is the first time you are enabling CDC for a table in the database, you may see the following messages, which indicate that SQL Server is enabling the SQL Agent jobs to begin capturing the data changes in the database:

    Job 'cdc.AdventureWorks2008R2_capture' started successfully.
    Job 'cdc.AdventureWorks2008R2_cleanup' started successfully.

The Capture job that is created generally runs continuously and is used to move changed data to the CDC tables from the transaction log. The Cleanup job runs on a scheduled basis to remove older data from the CDC tables so that they don’t grow too large. By default, it automatically removes data that is more than three days old. The properties of these jobs can be viewed and modified using the sys.sp_cdc_help_jobs and sys.sp_cdc_change_job procedures, respectively.

To determine whether or not a source table has been enabled for Change Data Capture, you can query the is_tracked_by_cdc column in the sys.tables catalog view for that table:

    select is_tracked_by_cdc
    from sys.tables
    where name = 'MyCustomer'
    go

    is_tracked_by_cdc
    1

TIP
To get information on which tables are configured for CDC and what the settings for each are, you can execute the sys.sp_cdc_help_change_data_capture stored procedure. It reports the name and ID of the source and change tables, the CDC table properties, the columns included in the capture, and the date the CDC was enabled/created for the source table.
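
The job and configuration procedures mentioned above can be called as in the following sketch. The seven-day retention value is only an assumed example, not a recommendation; the retention is expressed in minutes, so the three-day default corresponds to 4320.

```sql
USE AdventureWorks2008R2
GO
-- list the capture and cleanup jobs and their current settings
EXEC sys.sp_cdc_help_jobs
GO
-- Assumed example value: keep change data for 7 days (10080 minutes)
-- instead of the default 3 days (4320 minutes)
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 10080
GO
-- report the CDC configuration for the MyCustomer table only
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name   = N'MyCustomer'
GO
```

If a changed setting does not appear to take effect, you may need to stop and restart the job with sys.sp_cdc_stop_job and sys.sp_cdc_start_job.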

Querying the CDC Tables

After you enable change data tracking for a table, SQL Server begins capturing any data changes for the table in the Change Data Capture tables. To identify the data changes, you need to query the Change Data Capture tables. Although you can query the Change Data Capture tables directly, it is recommended that you use the CDC functions instead. The main CDC table-valued functions (TVFs) are

. cdc.fn_cdc_get_all_changes_capture_instance
. cdc.fn_cdc_get_net_changes_capture_instance

NOTE
The Change Data Capture change table and associated CDC table-valued functions created along with it constitute what is referred to as a capture instance. A capture instance is created for every source table that is enabled for CDC.

Each capture instance is given a unique name based on the schema and table names. For example, if the table named sales.products is CDC enabled, the capture instance created is named sales_products. The name of the CDC change table within the capture instance is sales_products_CT, and the names of the two associated CDC query functions are cdc.fn_cdc_get_all_changes_sales_products and cdc.fn_cdc_get_net_changes_sales_products.

Both of the CDC table-valued functions require two parameters to define the range of log sequence numbers to use as the upper and lower bounds to determine which records are to be included in the returned result set. A third required parameter, row_filter_option, specifies the content of the metadata columns as well as the rows to be returned in the result set.

Two values can be specified for row_filter_option for the cdc.fn_cdc_get_all_changes_capture_instance function: "all" and "all update old". If "all" is specified, the function returns all changes within the specified log sequence number (LSN) range. For changes due to an update operation, only the row containing the new values after the update is returned.
If "all update old" is specified, the function returns all changes within the specified LSN range. For changes due to an update operation, this option returns both the before and after update copies of the row.

For the cdc.fn_cdc_get_net_changes_capture_instance function, three values can be specified for the row_filter_option parameter: "all", "all with mask", and "all with merge".

If "all" is specified, the function returns the LSN of the final change to the row and the operation needed to apply that change in the __$start_lsn and __$operation metadata columns. The __$update_mask column is always NULL.

If "all with mask" is specified, the function returns the LSN of the final change to the row and the operation needed to apply the change to the row. In addition, if __$operation equals 4 (that is, the row contains the after update values), the columns actually modified in the update are identified by the bit mask returned in the __$update_mask column.

If the "all with merge" option is passed, the function returns the LSN of the final change to the row and the operation needed to apply the change to the row. The __$operation column will have one of two values: 1 for delete and 5 to indicate that the operation needed to apply the change is either an insert or update. The column __$update_mask is always NULL.

So how do you determine what LSNs to specify to return the rows you need? Fortunately, SQL Server provides several functions to help determine the appropriate LSN values for use in querying the TVFs:

. sys.fn_cdc_get_min_lsn: Returns the smallest LSN associated with a capture instance validity interval. The validity interval is the time interval for which change data is currently available for its capture instances.
. sys.fn_cdc_get_max_lsn: Returns the largest LSN in the validity interval.
. sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time: Are used to correlate LSN values with a standard time value.
. sys.fn_cdc_increment_lsn and sys.fn_cdc_decrement_lsn: Can be used to make an incremental adjustment to an LSN value. This adjustment is sometimes necessary to ensure that changes are not duplicated in consecutive query windows.

So, before you can start querying the CDC tables, you need to generate some records in them by running some data modifications against the source tables. First, you need to run the statements in Listing 42.21 against the MyCustomer table to generate some records in the dbo_MyCustomer_CT Change Data Capture change table.

LISTING 42.21 Some Data Modifications to Populate the MyCustomer CDC Capture Table

    -- delete an existing row
    delete MyCustomer where CustomerID = 22

    -- insert a new row
    Insert MyCustomer (PersonID, StoreID, TerritoryID,
                       AccountNumber, rowguid, ModifiedDate)
    Values (20778, null, 9,
            'AW' + RIGHT('00000000'
                 + convert(varchar(8), IDENT_Current('MyCustomer')), 8),
            NEWID(), GETDATE())

    -- update the row that was just inserted
    declare @ident int
    select @ident = SCOPE_IDENTITY()

    update MyCustomer
       set TerritoryID = 3,
           ModifiedDate = GETDATE()
     where CustomerID = @ident

Now that you have some rows in the CDC capture table, you can start retrieving them. First, you need to identify the min and max LSN values to pass to the cdc.fn_cdc_get_all_changes_dbo_MyCustomer function. This can be done using the sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn functions. Listing 42.22 puts all these pieces together to return the records stored in the CDC capture table.

LISTING 42.22 Querying the MyCustomer CDC Capture Table

    USE AdventureWorks2008R2
    GO
    -- declare variables to represent beginning and ending lsn
    DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
    -- get the first LSN for table changes
    SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
    -- get the last LSN for table changes
    SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
    -- get all changes in the range using "all update old" parameter
    SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_MyCustomer
        (@from_lsn, @to_lsn, 'all update old');
    GO

    __$start_lsn           __$seqval              __$operation __$update_mask CustomerID PersonID StoreID TerritoryID AccountNumber rowguid                              ModifiedDate
    0x00000039000014400004 0x00000039000014400002 1            0x7F           22         NULL     494     3           AW00000022    9774AED6-D673-412D-B481-2573E470B478 2008-10-13 11:15:07.263
    0x00000039000014410004 0x00000039000014410003 2            0x7F           30119      20778    NULL    9           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:44.267
    0x000000390000144C0004 0x000000390000144C0002 3            0x48           30119      20778    NULL    9           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:44.267
    0x000000390000144C0004 0x000000390000144C0002 4            0x48           30119      20778    NULL    3           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263

Because the option "all update old" is specified in Listing 42.22, all the rows in the dbo_MyCustomer_CT capture table are returned, including the deleted row, the inserted row, and both the before and after copies of the updated row.

If you want to return only the final version of each row within the LSN range (and @supports_net_changes was set to 1 when CDC was enabled for the table), you can use the cdc.fn_cdc_get_net_changes_capture_instance function, as shown in Listing 42.23.

LISTING 42.23 Querying the MyCustomer CDC Capture Table for Net Changes

    USE AdventureWorks2008R2
    GO
    -- declare variables to represent beginning and ending lsn
    DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
    -- get the first LSN for table changes
    SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
    -- get the last LSN for table changes
    SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
    -- get net changes in the range using "all with merge" parameter
    SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
        (@from_lsn, @to_lsn, 'all with merge');
    GO

    __$start_lsn           __$operation __$update_mask CustomerID PersonID StoreID TerritoryID AccountNumber rowguid                              ModifiedDate
    0x00000039000014400004 1            NULL           22         NULL     494     3           AW00000022    9774AED6-D673-412D-B481-2573E470B478 2008-10-13 11:15:07.263
    0x000000390000144C0004 5            NULL           30119      20778    NULL    3           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
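
The listings above query the full validity interval. In practice you often want only the changes committed within a time window, along with an indication of which columns an update touched. The following is a sketch under stated assumptions: the 24-hour window is an arbitrary choice for illustration, and the query relies on the sys.fn_cdc_map_time_to_lsn, sys.fn_cdc_get_column_ordinal, and sys.fn_cdc_is_bit_set system functions together with the dbo_MyCustomer capture instance created earlier.

```sql
USE AdventureWorks2008R2
GO
-- Assumption for illustration: the window of interest is the last 24 hours
DECLARE @begin_time DATETIME = DATEADD(HOUR, -24, GETDATE()),
        @end_time   DATETIME = GETDATE()
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10), @min_lsn BINARY(10)
DECLARE @territory_ordinal INT

-- translate the time window into an LSN range
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time)
SELECT @to_lsn   = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time)

-- never ask for changes older than the capture instance can supply
SELECT @min_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
IF @from_lsn < @min_lsn SET @from_lsn = @min_lsn

-- position of the TerritoryID column within the update mask
SELECT @territory_ordinal = sys.fn_cdc_get_column_ordinal('dbo_MyCustomer', 'TerritoryID')

-- query only when the window maps to a usable LSN range
IF @from_lsn IS NOT NULL AND @to_lsn IS NOT NULL AND @from_lsn <= @to_lsn
    SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn) AS commit_time,
           __$operation,
           CustomerID,
           TerritoryID,
           sys.fn_cdc_is_bit_set(@territory_ordinal, __$update_mask) AS TerritoryID_changed
    FROM cdc.fn_cdc_get_all_changes_dbo_MyCustomer(@from_lsn, @to_lsn, 'all')
    ORDER BY __$start_lsn, __$seqval
GO
```

The lower bound is clamped to the capture instance's minimum LSN because the change functions raise an error when the requested range falls outside the validity interval. For insert and delete rows, TerritoryID_changed is always 1 because the update mask has all bits set for those operations.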