Nielsen c59.tex V4 - 07/21/2009 3:58pm Page 1282
Part VIII Monitoring and Auditing

Here’s the catch: The context isn’t automatic; it must be added to each and every DML command. In addition, it uses a WITH clause, just like a common table expression, so the syntax is confusing. While I’m glad it’s possible to capture the context, I’m not a huge fan of the implementation.

The following code creates the varbinary variable and passes it to Change Tracking as part of an UPDATE command:

    DECLARE @AppContext VARBINARY(128)
      = CAST('Maui/Pn' AS VARBINARY(128));

    WITH CHANGE_TRACKING_CONTEXT (@AppContext)
    UPDATE HumanResources.Department
      SET GroupName = 'Certified Master w/Context'
      WHERE Name = 'Row Three';

When querying ChangeTable, the SYS_CHANGE_CONTEXT column returns the context data. The CAST() function converts it to readable text:

    SELECT CT.SYS_CHANGE_VERSION, CT.DepartmentID,
        CT.SYS_CHANGE_OPERATION,
        d.Name, d.GroupName, d.ModifiedDate,
        CAST(CT.SYS_CHANGE_CONTEXT AS VARCHAR(128)) AS ApplicationContext
      FROM CHANGETABLE (CHANGES HumanResources.Department, 5) AS CT
        LEFT OUTER JOIN HumanResources.Department d
          ON d.DepartmentID = CT.DepartmentID
      ORDER BY CT.SYS_CHANGE_VERSION;

Removing Change Tracking

It’s as easy to remove Change Tracking as it is to enable it: Disable it on every table, and then remove it from the database.

If the goal is to stop tracking a single table, then the same ALTER command that enabled Change Tracking can disable it:

    ALTER TABLE HumanResources.Department
      DISABLE CHANGE_TRACKING;

When Change Tracking is disabled on a table, all stored ChangeTable data (the PKs and the columns updated) is lost.

If the goal is to remove Change Tracking from the database, then Change Tracking must first be removed from every table in the database. One way to accomplish this is to leverage the sp_MSforeachtable stored procedure:

    EXEC sp_MSforeachtable
      'ALTER TABLE ? DISABLE CHANGE_TRACKING;';

However, after much testing, I can only warn that sp_MSforeachtable often fails to remove Change Tracking from every table.

A less elegant, but more reliable, method of ensuring that Change Tracking is completely removed from every table in the database is to walk through the sys.change_tracking_tables catalog view and build one ALTER TABLE command per tracked table:

    DECLARE @SQL NVARCHAR(MAX) = '';

    -- QUOTENAME() protects schema and table names that need delimiters
    SELECT @SQL = @SQL + 'ALTER TABLE '
        + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
        + ' DISABLE CHANGE_TRACKING;'
      FROM sys.change_tracking_tables ct
        JOIN sys.tables t
          ON ct.object_id = t.object_id
        JOIN sys.schemas s
          ON t.schema_id = s.schema_id;

    PRINT @SQL;
    EXEC sp_executesql @SQL;

Only after Change Tracking is disabled on every table can Change Tracking be removed from the database:

    ALTER DATABASE AdventureWorks2008
      SET CHANGE_TRACKING = OFF;

Even though Change Tracking is removed from the database, the Change Tracking version number isn’t reset, so if Change Tracking is restarted it won’t cause a synchronization nightmare.

Summary

Designing a DIY synchronization system involves triggers that either update row timestamps or write keys to a table. Change Tracking does all the hard work, adds auto cleanup, is relatively easy to set up and use, and reliably returns the net changes.

Without question, using Change Tracking sets you up for success with ETL processes and mobile device synchronization.

Microsoft introduces several new auditing and monitoring technologies with SQL Server 2008. The next chapter continues exploring these new technologies with Change Tracking’s big brother, Change Data Capture.

Change Data Capture

IN THIS CHAPTER
High-end BI ETL
Leveraging the T-Log

I know almost nothing about the CDC in Atlanta.
The little I do know about the Centers for Disease Control comes from watching Dustin Hoffman in the movie Outbreak. Fortunately for me and you, this chapter is about the other CDC: Change Data Capture.

There’s power hidden in the transaction log (T-Log), and Change Data Capture (CDC) harnesses the transaction log to capture data changes with the least possible impact on performance. Any data written to the transaction log can be captured asynchronously by CDC after the transaction is complete, so the capture doesn’t affect the original transaction’s performance.

CDC can track any data from the T-Log, including any DML INSERT, UPDATE, DELETE, and MERGE command, and DDL CREATE, ALTER, and DROP.

Changes are stored in change tables: tables created by CDC with the same columns as the tracked tables plus a few extra CDC-specific columns.

All the changes are captured, so CDC can return all the intermediate values or just the net changes.

Because CDC gathers its data by reading the log, the data in the change tables is organized the same way the transaction log is organized: by T-Log log sequence numbers, known as LSNs. (Kalen Delaney told a joke about Oracle’s founder Larry Ellison being inside SQL Server: just look at the transaction log and there’s LSN! Ha!)

There are only a few drawbacks to CDC:

■ Cost: It requires Enterprise Edition.
■ Code: Personally, this really irks me. CDC uses system stored procedures instead of standardized ALTER statements.
■ Code: There’s no UI for configuring change data capture in Management Studio.
■ T-Log I/O: The transaction log will experience about twice as much I/O because CDC reads from the log.
■ Performance hit: Although it can vary greatly, expect an approximate 10% performance hit on the OLTP server running CDC.
■ Disk space: Because CDC essentially stores a copy of every transaction’s data, there’s the potential that it can grow like a transaction log gone wild.

Where change data capture shines is in gathering data for ETL from a high-traffic OLTP database to a data warehouse. Of the possible options, change data capture has the least performance hit, and it does a great job of providing the right set of data for Business Intelligence ETL (extract-transform-load). When you think big-dollar BI, think change data capture.

Enabling CDC

Change data capture is enabled at the database level first, and then for every table that needs to be tracked.

Because change data capture reads from the transaction log, one might think that CDC requires the database to be set to the full recovery model so that the transaction log is kept. However, SQL Server doesn’t truncate the log until after the transactions have been read by CDC, so CDC will work with any recovery model, even simple.

Also, and this is very important, change data capture uses SQL Agent jobs to capture and clean up the data, so SQL Agent must be running or data will not be captured.

Enabling the database

To enable the database, execute the sys.sp_cdc_enable_db system stored procedure in the current database.
It has no parameters:

    EXEC sys.sp_cdc_enable_db;

The is_cdc_enabled column in sys.databases can be used to determine which databases have CDC enabled:

    SELECT *
      FROM sys.databases
      WHERE is_cdc_enabled = 1;

This procedure creates six system tables in the current database:

■ cdc.captured_columns: Stores metadata for the tracked tables’ columns
■ cdc.change_tables: Stores metadata for tracked tables
■ cdc.ddl_history: Tracks DDL activity
■ cdc.index_columns: Tracks table indexes
■ cdc.lsn_time_mapping: Used for calculating clean-up time
■ dbo.systranschemas: Tracks schema changes

These are listed in Object Explorer under the Database ➪ Tables ➪ System tables node.

Enabling tables

Once the database has been prepared for CDC, tables may be set up for CDC using the sys.sp_cdc_enable_table stored procedure, which has several options:

■ @source_schema: The tracked table’s schema
■ @source_name: The name of the table to be tracked
■ @role_name: The role with permission to view CDC data

The last six parameters are optional:

■ @capture_instance: May be used to create multiple capture instances for the table. This is useful if the schema is changed.
■ @supports_net_changes: Allows seeing just the net changes, and requires the primary key. The default is true.
■ @index_name: The name of the unique index, if there’s no primary key for the table (but you’d never do that, right?)
■ @captured_column_list: Determines which columns are tracked. The default is to track all columns.
■ @filegroup_name: The filegroup the change table will be stored on. If not specified, then the change table is created on the default filegroup.
■ @allow_partition_switch: Allows ALTER TABLE ... SWITCH PARTITION on the CDC table

Note that the last parameter, @allow_partition_switch, was changed late in the development of SQL Server 2008, and some sources incorrectly list it as @partition_switch.
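As a minimal sketch of how the optional parameters fit together, the following call spells several of them out explicitly. The capture instance name and the column list are illustrative assumptions, not values from this chapter, and the rest of the chapter's examples assume the default capture instance instead.

```sql
-- Sketch only: enable CDC on one table with explicit optional parameters.
-- 'HumanResources_Department_v2' is a hypothetical capture instance name;
-- a table can have at most two capture instances at a time.
EXEC sys.sp_cdc_enable_table
    @source_schema          = N'HumanResources',
    @source_name            = N'Department',
    @role_name              = NULL,   -- no gating role
    @capture_instance       = N'HumanResources_Department_v2',
    @supports_net_changes   = 1,      -- the primary key column must be captured
    @captured_column_list   = N'DepartmentID, Name, GroupName',
    @allow_partition_switch = 1;

-- Review the resulting configuration for the table
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'HumanResources',
    @source_name   = N'Department';
```

Naming the capture instance explicitly is mostly useful when a second instance is created alongside an existing one, for example while migrating consumers after a schema change.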
The following batch configures CDC to track changes made to the HumanResources.Department table:

    EXEC sys.sp_cdc_enable_table
      @source_schema = 'HumanResources',
      @source_name = 'Department',
      @role_name = NULL;

With the first table that’s enabled, SQL Server generates two SQL Agent jobs:

■ cdc.dbname_capture
■ cdc.dbname_cleanup

With every table that’s enabled for CDC, SQL Server creates a change table (named cdc.<capture_instance>_CT; here, cdc.HumanResources_Department_CT) and adds rows to these metadata tables:

■ cdc.change_tables
■ cdc.index_columns
■ cdc.captured_columns

For an excellent article on tuning the performance of change data capture under various loads, see http://msdn.microsoft.com/en-us/library/dd266396.aspx.

Working with Change Data Capture

It isn’t difficult to work with change data capture. The trick is to understand the transaction log’s log sequence numbers.

Assuming AdventureWorks2008 has been freshly installed, the following scripts make some data changes so there will be some activity in the log for change data capture to gather:

    INSERT HumanResources.Department (Name, GroupName)
      VALUES ('CDC New Row', 'SQL Rocks'),
             ('Test Two', 'CDC Rocks');

    UPDATE HumanResources.Department
      SET Name = 'Changed Name'
      WHERE Name = 'CDC New Row';

    INSERT HumanResources.Department (Name, GroupName)
      VALUES ('Row Three', 'PBM Rocks'),
             ('Row Four', 'TVP Rocks');

    UPDATE HumanResources.Department
      SET GroupName = 'T-SQL Rocks'
      WHERE Name = 'Test Two';

    DELETE FROM HumanResources.Department
      WHERE Name = 'Row Four';

With five transactions complete, there should be some activity in the log. The following DMVs can reveal information about the log:

    SELECT * FROM sys.dm_cdc_log_scan_sessions;
    SELECT * FROM sys.dm_repl_traninfo;
    SELECT * FROM sys.dm_cdc_errors;

Examining the log sequence numbers

The data changes are organized in the change tables by log sequence number (LSN).
Converting a given datetime to an LSN is essential to working with change data capture. The sys.fn_cdc_map_time_to_lsn function is designed to do just that. The first parameter defines the LSN search (called the LSN boundary option), and the second parameter is the point in time. Possible searches are as follows:

■ smallest greater than
■ smallest greater than or equal
■ largest less than
■ largest less than or equal

Each of the search options defines how the function will locate the nearest LSN in the change tables. The following sample query defines a range beginning with January 1, 2009 and ending with December 31, 2009, and returns the LSNs that bound that range:

    SELECT
      sys.fn_cdc_map_time_to_lsn
        ('smallest greater than or equal', '20090101')
        AS BeginLSN,
      sys.fn_cdc_map_time_to_lsn
        ('largest less than or equal', '20091231')
        AS EndLSN;

Result:

    BeginLSN               EndLSN
    0x0000002F000001330040 0x0000003B000002290001

The sys.fn_cdc_get_min_lsn() and sys.fn_cdc_get_max_lsn() functions serve as anchor points to begin the walk through the log. The min function requires a capture instance name (by default, the schema and table names joined by an underscore) and returns the oldest LSN available for it. The max function has no parameters and returns the most recent LSN in the change tables:

    DECLARE @BeginLSN VARBINARY(10)
      = sys.fn_cdc_get_min_lsn('HumanResources_Department');
    SELECT @BeginLSN;

    DECLARE @EndLSN VARBINARY(10)
      = sys.fn_cdc_get_max_lsn();
    SELECT @EndLSN;

There’s not much benefit to knowing the hexadecimal LSN values by themselves, but the LSNs can be passed to other functions to select data from the change tables.

Querying the change tables

Change data capture creates a function for each table being tracked using the name cdc.fn_cdc_get_all_changes_ concatenated with the schema and name of the table.
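To see which functions exist in a given database, one option is to read the capture instances from cdc.change_tables and rebuild the function names from them. This is only a sketch of the naming rule described above; the column aliases are arbitrary.

```sql
-- Sketch: list each capture instance and the all-changes function
-- that CDC generated for it (name = fixed prefix + capture instance).
SELECT OBJECT_SCHEMA_NAME(ct.source_object_id) AS SourceSchema,
       OBJECT_NAME(ct.source_object_id)        AS SourceTable,
       ct.capture_instance,
       N'cdc.fn_cdc_get_all_changes_' + ct.capture_instance
                                               AS AllChangesFunction,
       ct.supports_net_changes
  FROM cdc.change_tables AS ct
  ORDER BY ct.capture_instance;
```

For the table enabled earlier with default settings, the capture instance is HumanResources_Department, which gives the function name used in the queries that follow.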
The following script uses the sys.fn_cdc_map_time_to_lsn function to determine the LSN range values, store them in variables, and then pass the variables to the Department table’s custom change data capture function:

    -- with variables
    DECLARE
      @BeginLSN VARBINARY(10)
        = sys.fn_cdc_map_time_to_lsn
            ('smallest greater than or equal', '20090101'),
      @EndLSN VARBINARY(10)
        = sys.fn_cdc_map_time_to_lsn
            ('largest less than or equal', '20091231');

    SELECT __$start_lsn, __$seqval, __$operation, __$update_mask,
        DepartmentID, Name, GroupName, ModifiedDate
      FROM cdc.fn_cdc_get_all_changes_HumanResources_Department
        (@BeginLSN, @EndLSN, 'all')
      ORDER BY __$start_lsn;

Result (Name column not shown):

    __$start_lsn           __$seqval              __$operation
    0x0000005400001D6E0008 0x0000005400001D6E0003 2
    0x0000005400001D6E0008 0x0000005400001D6E0006 2
    0x0000005400001D700007 0x0000005400001D700002 4
    0x0000005400001D7D0008 0x0000005400001D7D0003 2
    0x0000005400001D7D0008 0x0000005400001D7D0006 2
    0x0000005400001D7F0004 0x0000005400001D7F0002 4
    0x0000005400001D810005 0x0000005400001D810003 1

    __$update_mask DepartmentID GroupName   ModifiedDate
    0x0F           17           SQL Rocks   2009-03-07 11:21:48.720
    0x0F           18           CDC Rocks   2009-03-07 11:21:48.720
    0x02           17           SQL Rocks   2009-03-07 11:21:48.720
    0x0F           19           PBM Rocks   2009-03-07 11:21:55.387
    0x0F           20           TVP Rocks   2009-03-07 11:21:55.387
    0x04           18           T-SQL Rocks 2009-03-07 11:21:48.720
    0x0F           20           TVP Rocks   2009-03-07 11:21:55.387

It’s also possible to pass the functions directly to the table’s change data capture function. This is essentially the same code as the previous query, but slightly simpler, which is usually a good thing:

    SELECT *
      FROM cdc.fn_cdc_get_all_changes_HumanResources_Department
        (sys.fn_cdc_map_time_to_lsn
           ('smallest greater than or equal', '20090101'),
         sys.fn_cdc_map_time_to_lsn
           ('largest less than or equal', '20091231'),
         'all') AS CDC
      ORDER BY __$start_lsn;

You can also convert an LSN directly to a time using the sys.fn_cdc_map_lsn_to_time() function.
The next query extends the previous query by returning the time of the transaction:

    -- with lsn converted to time
    SELECT
        sys.fn_cdc_map_lsn_to_time(__$start_lsn) AS StartLSN,
        *
      FROM cdc.fn_cdc_get_all_changes_HumanResources_Department
        (sys.fn_cdc_map_time_to_lsn
           ('smallest greater than or equal', '20090101'),
         sys.fn_cdc_map_time_to_lsn
           ('largest less than or equal', '20091231'),
         'all') AS CDC
      ORDER BY __$start_lsn;

The __$operation column returned by the change data capture custom table functions identifies the type of DML that caused the data change. Similar to a DML trigger, the data can be the before (deleted table) or after (inserted table) image of an update.

The default ‘all’ parameter directs CDC to return only the after, or new, image from an update operation. The ‘all update old’ option, shown in the following example, tells CDC to return a row for both the before update image and the after update image.

This query uses a row constructor subquery to spell out the meaning of the operation:

    SELECT
        sys.fn_cdc_map_lsn_to_time(__$start_lsn) AS StartLSN,
        Operation.Description AS 'Operation',
        DepartmentID, Name, GroupName
      FROM cdc.fn_cdc_get_all_changes_HumanResources_Department
        (sys.fn_cdc_map_time_to_lsn
           ('smallest greater than or equal', '20090101'),
         sys.fn_cdc_map_time_to_lsn
           ('largest less than or equal', '20091231'),
         'all update old') AS CDC
        JOIN (VALUES
                (1, 'delete'),
                (2, 'insert'),
                (3, 'update/deleted'),  -- 'all update old' option to view
                (4, 'update/inserted')
             ) AS Operation(OperationID, Description)
          ON CDC.__$operation = Operation.OperationID
      ORDER BY __$start_lsn;
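The chapter opening mentions that CDC can return just the net changes instead of every intermediate value. A minimal sketch of that query follows, assuming the table was enabled with net-change support (the default when it has a primary key) and using the default capture instance name; CDC generates a companion function with the cdc.fn_cdc_get_net_changes_ prefix for this purpose.

```sql
-- Sketch: one net-change row for each source row changed in the LSN range.
-- The min/max LSN functions bound the range to what the capture instance
-- actually holds.
DECLARE
  @BeginLSN BINARY(10)
    = sys.fn_cdc_get_min_lsn('HumanResources_Department'),
  @EndLSN BINARY(10)
    = sys.fn_cdc_get_max_lsn();

SELECT __$start_lsn, __$operation,
       DepartmentID, Name, GroupName
  FROM cdc.fn_cdc_get_net_changes_HumanResources_Department
         (@BeginLSN, @EndLSN, N'all')
  ORDER BY __$start_lsn;
```

For an ETL load that only needs the final state of each changed row, the net-changes function usually means fewer rows to move than the all-changes function.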