Nielsen c59.tex V4 - 07/21/2009 3:58pm Page 1272
Part VIII Monitoring and Auditing

Enabling all tables

Enabling every table in a large database for Change Tracking can be cumbersome, because it means scripting the ALTER command for every table. Fortunately, sp_MSforeachtable, an undocumented Microsoft stored procedure, is the salve that binds the wound. sp_MSforeachtable executes like a cursor, executing a command, enclosed in single quotes, for every table in the current database. The ? placeholder is replaced with the schema.tablename for every table. If an error occurs, then it’s reported in the message pane, but sp_MSforeachtable trudges along with the next table.

This script enables Change Tracking for every table in the current database:

    EXEC sp_MSforeachtable
      'ALTER TABLE ? Enable Change_tracking With (track_columns_updated = on);';

Internal tables

Change Tracking stores its data in internal tables. There’s no reason to directly query these tables to use Change Tracking. However, it is useful to look at the space used by these tables when considering the cost of using Change Tracking and to estimate disk usage.

Query sys.internal_tables to find the internal tables. Of course, your Change Tracking table(s) will have a different name:

    SELECT s.name + '.' + o.name as [table],
        i.name as [ChangeTracking],
        ct.is_track_columns_updated_on,
        ct.min_valid_version,
        ct.begin_version, ct.cleanup_version
      FROM sys.internal_tables i
        JOIN sys.objects o
          ON i.parent_id = o.object_id
        JOIN sys.schemas s
          ON o.schema_id = s.schema_id
        JOIN sys.change_tracking_tables ct
          ON o.object_id = ct.object_id
      WHERE i.name LIKE 'change_tracking%'
      ORDER BY [table]

Result (abbreviated):

    table                      ChangeTracking
    -------------------------  -----------------------------
    HumanResources.Department  sys.change_tracking_757577737

Armed with the name, it’s easy to find the disk space used.
Because Change Tracking was just enabled in this database, the internal table is still empty:

    EXEC sp_spaceused 'sys.change_tracking_757577737'

Result:

    name                       rows  reserved  data  index_size  unused
    -------------------------  ----  --------  ----  ----------  ------
    change_tracking_757577737  0     0 KB      0 KB  0 KB        0 KB

This query combines the Change Tracking configuration with the internal name:

    SELECT s.name + '.' + o.name as [table],
        i.name as [ChangeTracking],
        ct.is_track_columns_updated_on,
        ct.min_valid_version,
        ct.begin_version, ct.cleanup_version
      FROM sys.internal_tables i
        JOIN sys.objects o
          ON i.parent_id = o.object_id
        JOIN sys.schemas s
          ON o.schema_id = s.schema_id
        JOIN sys.change_tracking_tables ct
          ON o.object_id = ct.object_id
      WHERE i.name LIKE 'change_tracking%'
      ORDER BY [table]

Querying Change Tracking

Once Change Tracking is enabled for a table, SQL Server begins to store information about which rows have changed. This data may be queried to select only the changed data from the source table, which is perfect for synchronization.

Version numbers

The key to understanding Change Tracking is that it numbers every transaction with a database-wide version number, which becomes important when working with the changed data. This version number may be viewed using a function:

    SELECT Change_tracking_current_version();

Result:

    0

The current version number is the number of the latest version stored by Change Tracking, so if the current version is 5, then there is a version 5 in the database, and the next transaction will be version 6.
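As a small read-only illustration of the rule above, the following sketch captures the current version in a variable and reports the version the next transaction should receive. It assumes the current database is the one with Change Tracking enabled; the variable name and the Note column are made up for this example, and the NULL branch relies on the function returning NULL when Change Tracking is not enabled for the database.

```sql
-- Sketch only: reads the database-wide version without changing any data.
-- @CurrentVersion is a hypothetical variable name used for illustration.
DECLARE @CurrentVersion BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

SELECT DB_NAME() AS DatabaseName,
       @CurrentVersion AS CurrentVersion,
       CASE
         WHEN @CurrentVersion IS NULL
           THEN 'Change Tracking is not enabled for this database'
         ELSE 'The next transaction will be version '
              + CAST(@CurrentVersion + 1 AS VARCHAR(20))
       END AS Note;
```

Because the query only reads the version, it can be run at any point in the chapter without disturbing the version numbers shown in the results.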
The following code makes inserts and updates to the HumanResources.Department table while watching the Change Tracking version number:

    INSERT HumanResources.Department (Name, GroupName)
      VALUES ('CT New Row', 'SQL Rocks'),
             ('Test Two', 'SQL Rocks');

    SELECT Change_tracking_current_version();

Result:

    1

The inserts added two new rows, with primary key values of DepartmentID 17 and 18.

And now an update:

    UPDATE HumanResources.Department
      SET Name = 'Changed Name'
      WHERE Name = 'CT New Row';

The update affected row DepartmentID = 17. Testing the Change Tracking version shows that it has been incremented to 2:

    SELECT Change_tracking_current_version();

Result:

    2

The version number is critical to querying ChangeTable (explained in the next section), and it must be within the range of the oldest possible version number for a given table and the current database version number. The old data is probably being cleaned up automatically, so the oldest possible version number will likely vary for each table.

The following query can report the valid version number range for any table. In this case, it returns the current valid queryable range for HumanResources.Department:

    SELECT
        Change_tracking_min_valid_version
          (Object_id(N'HumanResources.Department')) as 'oldest',
        Change_tracking_current_version() as 'current';

Result:

    oldest  current
    ------  -------
    0       2

Changes by the row

Here’s where Change Tracking shows results. The primary keys of the rows that have been modified since (or after) a given version number can be found by querying the ChangeTable table-valued function, passing to it the Change Tracking table and a beginning version number. For example, passing table XYZ and version number 10 to ChangeTable will return the changes for version 11 and following that were made to table XYZ.
Think of the version number as the number of the last synchronization, so this synchronization needs all the changes after the last synchronization.

In this case, the Change Tracking table is HumanResources.Department and the beginning version is 0:

    SELECT *
      FROM ChangeTable
        (Changes HumanResources.Department, 0) as CT;

Result:

    SYS_CHANGE_VERSION  SYS_CHANGE_CREATION_VERSION  SYS_CHANGE_OPERATION  SYS_CHANGE_COLUMNS  SYS_CHANGE_CONTEXT  DepartmentID
    ------------------  ---------------------------  --------------------  ------------------  ------------------  ------------
    2                   1                            I                     NULL                NULL                17
    1                   1                            I                     NULL                NULL                18

Since version number 0, two rows have been inserted. The update to row 17 is still reported as an insert because, for the purposes of synchronization, row 17 must be inserted.

If version number 1 is passed to ChangeTable, then the result should show only change version 2:

    SELECT *
      FROM ChangeTable
        (Changes HumanResources.Department, 1) as CT;

Result (formatted to include the SYS_CHANGE_COLUMNS data):

    SYS_CHANGE_VERSION  SYS_CHANGE_CREATION_VERSION  SYS_CHANGE_OPERATION  SYS_CHANGE_COLUMNS  SYS_CHANGE_CONTEXT  DepartmentID
    ------------------  ---------------------------  --------------------  ------------------  ------------------  ------------
    2                   1                            U                     0x0000000002000000  NULL                17

This time row 17 shows up as an update, because when version 2 occurred, row 17 already existed, and version 2 updated the row. A synchronization based on changes made since version 1 would need to update row 17.

Note that as a table-valued function, ChangeTable must have an alias.

Synchronizing requires joining with the source table. The following query reports the changed rows from HumanResources.Department since version 1.
The left outer join is necessary to pick up any deleted rows which, by definition, no longer exist in the source table and would therefore be missed by an inner join:

    SELECT CT.SYS_CHANGE_VERSION as Version,
        CT.DepartmentID, CT.SYS_CHANGE_OPERATION as Op,
        d.Name, d.GroupName
      FROM ChangeTable
          (Changes HumanResources.Department, 1) as CT
        LEFT OUTER JOIN HumanResources.Department d
          ON d.DepartmentID = CT.DepartmentID
      ORDER BY CT.SYS_CHANGE_VERSION;

Result:

    Version  DepartmentID  Op  Name          GroupName
    -------  ------------  --  ------------  ---------
    2        17            U   Changed Name  SQL Rocks

As expected, the result shows row 17 being updated. The ChangeTable data source returns no row data other than the primary key; the join pulls in the data from HumanResources.Department.

Coding a synchronization

Knowing which rows have been changed means that it should be easy to merge those changes into a synchronization table.

The trick is synchronizing a set of data while changes are still being made at the source, without locking the source. Assuming the previous synchronization was at version 20, and the current version is 60, then 20 is passed to ChangeTable. But what becomes the new current version? The current version just before the ChangeTable is queried and the data is merged? What if more changes occur during the synchronization?

The new SQL Server 2008 MERGE command would seem to be the perfect solution. It does support the output clause. If the version is stored in the synchronization target table, then the output clause’s inserted table can return the insert and update operation new versions, and the max() versions can be determined. But deletion operations return only the deleted virtual table, which would return the version number of the last change made to the deleted row, and not the version number of the deletion event.
The solution is to capture all the ChangeTable data to a temp table, determine the max version number for that synchronization set, store that version number, and then perform the synchronization merge. As much as I hate temp tables, it’s the only clean solution.

The following script sets up a synchronization from HumanResources.Department to HRDeptSynch. Synchronization typically occurs from one device to another, or one database to another. Here, AdventureWorks2008 is the source database, and tempdb will serve as the target database. Assume the tempdb.dbo.HRDeptSynch table was last synchronized before any changes were made to AdventureWorks2008.HumanResources.Department in this chapter. By including the database name in the code, there’s no need to issue a USE command:

    -- create synch master version table
    CREATE TABLE Tempdb.dbo.SynchMaster (
      TableName SYSNAME,
      LastSynchVersion INT,
      SynchDateTime DATETIME
      )

    -- initialize for HRDeptSynch
    INSERT Tempdb.dbo.SynchMaster (TableName, LastSynchVersion)
      VALUES ('HRDeptSynch', 0)

    -- create target table
    CREATE TABLE Tempdb.dbo.HRDeptSynch (
      DepartmentID SmallINT,
      Name NVARCHAR(50),
      GroupName NVARCHAR(50),
      Version INT
      )

    -- Populate Synch table with baseline original data
    INSERT Tempdb.dbo.HRDeptSynch (DepartmentID, Name, GroupName)
      SELECT DepartmentID, Name, GroupName
        FROM AdventureWorks2008.HumanResources.Department;

Another good idea in this process is to check

    Change_tracking_min_valid_version
      (Object_id(N'HumanResources.Department')) as 'oldest'

to verify that the synchronization won’t miss cleaned-up data.
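The validity check suggested above might be sketched as follows. This is only an illustration: it assumes the current database is AdventureWorks2008, that the Tempdb.dbo.SynchMaster row created by the setup script exists, and the variable names and message text are made up for the example.

```sql
-- Sketch: confirm the stored synch version is still inside the valid range
-- before running an incremental synchronization.
-- Assumes the current database is AdventureWorks2008.
DECLARE @LastSynchVersion BIGINT,
        @MinValidVersion  BIGINT;

-- version recorded by the previous synchronization
SELECT @LastSynchVersion = LastSynchVersion
  FROM Tempdb.dbo.SynchMaster
  WHERE TableName = 'HRDeptSynch';

-- oldest version for which change data is still available for the table
SET @MinValidVersion =
  CHANGE_TRACKING_MIN_VALID_VERSION(
    OBJECT_ID(N'HumanResources.Department'));

IF @MinValidVersion IS NULL
  -- NULL here means the table is not enabled for Change Tracking
  RAISERROR('Change Tracking is not enabled for HumanResources.Department.', 16, 1);
ELSE IF @LastSynchVersion IS NULL
     OR @LastSynchVersion < @MinValidVersion
  -- some changes have already been cleaned up; reload the baseline
  RAISERROR('Stored synch version is too old; reload the baseline data.', 16, 1);
ELSE
  PRINT 'Stored synch version is valid; incremental synchronization is safe.';
```

If the stored version is older than the minimum valid version, the change data needed to catch up has already been cleaned up, so repopulating the target from the source is the safer choice.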
The following stored procedure uses Change Tracking, a synch master table, a temp table, and the new SQL Server MERGE command to synchronize any changes in the source table (HumanResources.Department) into the target table (Tempdb.dbo.HRDeptSynch):

    USE AdventureWorks2008;
    go

    CREATE PROC pHRDeptSynch
    AS
    SET NoCount ON;

    DECLARE
      @LastSynchMaster INT,
      @ThisSynchMaster INT;

    CREATE TABLE #HRDeptSynch (
      Version INT,
      Op CHAR(1),
      DepartmentID SmallINT,
      Name NVARCHAR(50),
      GroupName NVARCHAR(50)
      );

    SELECT @LastSynchMaster = LastSynchVersion
      FROM Tempdb.dbo.SynchMaster
      WHERE TableName = 'HRDeptSynch';

    -- capture the changes made since the last synchronization
    INSERT #HRDeptSynch (Version, Op, DepartmentID, Name, GroupName)
      SELECT CT.SYS_CHANGE_VERSION as Version,
          CT.SYS_CHANGE_OPERATION as Op,
          CT.DepartmentID, d.Name, d.GroupName
        FROM ChangeTable
            (Changes HumanResources.Department, @LastSynchMaster) as CT
          LEFT OUTER JOIN HumanResources.Department d
            ON d.DepartmentID = CT.DepartmentID
        ORDER BY CT.SYS_CHANGE_OPERATION;

    -- max version of this synchronization set; when nothing changed,
    -- keep the previous version rather than storing a NULL
    SELECT @ThisSynchMaster = IsNull(Max(Version), @LastSynchMaster)
      FROM #HRDeptSynch;

    MERGE INTO Tempdb.dbo.HRDeptSynch as Target
      USING
        (SELECT Version, Op, DepartmentID, Name, GroupName
           FROM #HRDeptSynch)
        AS Source (Version, Op, DepartmentID, Name, GroupName)
      ON Target.DepartmentID = Source.DepartmentID
      WHEN NOT MATCHED AND Source.Op = 'I'
        THEN INSERT (DepartmentID, Name, GroupName)
          VALUES (Source.DepartmentID, Source.Name, Source.GroupName)
      WHEN MATCHED AND Source.Op = 'U'
        THEN UPDATE
          SET Name = Source.Name,
              GroupName = Source.GroupName
      WHEN MATCHED AND Source.Op = 'D'
        THEN DELETE;

    UPDATE Tempdb.dbo.SynchMaster
      SET LastSynchVersion = @ThisSynchMaster,
          SynchDateTime = GETDATE()
      WHERE TableName = 'HRDeptSynch';
    Go

To put the stored procedure through its paces, the following script makes several modifications to the source table and calls pHRDeptSynch:

    INSERT HumanResources.Department (Name, GroupName)
      VALUES ('Row Three', 'Data Rocks!'),
             ('Row Four', 'SQL Rocks!');

    UPDATE HumanResources.Department
      SET GroupName = 'SQL Server 2008 Bible'
      WHERE Name = 'Test Two';

    EXEC pHRDeptSynch;

    DELETE FROM HumanResources.Department
      WHERE Name = 'Row Four';

    EXEC pHRDeptSynch;

    EXEC pHRDeptSynch;

    DELETE FROM HumanResources.Department
      WHERE Name = 'Test Two';

    EXEC pHRDeptSynch;

To test the results, the next two queries search for out-of-synch conditions. The first query uses a set-difference query with a FULL OUTER JOIN and two IS NULLs to find any mismatched rows on either side of the join:

    -- check for out-of-synch rows:
    SELECT *
      FROM HumanResources.Department Source
        FULL OUTER JOIN tempdb.dbo.HRDeptSynch Target
          ON Source.DepartmentID = Target.DepartmentID
      WHERE Source.DepartmentID IS NULL
        OR Target.DepartmentID IS NULL

There is no result set.

The second verification query simply joins the tables and compares the data columns in the WHERE clause to return any rows with mismatched data:

    -- Check for out-of-synch data
    SELECT *
      FROM HumanResources.Department Source
        LEFT OUTER JOIN tempdb.dbo.HRDeptSynch Target
          ON Source.DepartmentID = Target.DepartmentID
      WHERE Source.Name != Target.Name
        OR Source.GroupName != Target.GroupName

There is no result set.

Good. The Change Tracking and the synchronization stored procedure worked, and the stored version number is absolutely the correct version number for the next synchronization.

To check the versions, the next two queries look at Change Tracking’s current version and the version stored in SynchMaster:

    SELECT Change_tracking_current_version();

Result:

    6

    SELECT * FROM tempdb.dbo.SynchMaster;

Result:

    TableName    LastSynchVersion  SynchDateTime
    -----------  ----------------  -----------------------
    HRDeptSynch  6                 2009-01-16 18:00:42.643

Although lengthy, this exercise showed how to leverage Change Tracking and the new MERGE command to build a complete synchronization system.
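The two version checks can also be combined into one query. The following is a minimal sketch rather than part of the synchronization procedure; it assumes the tempdb.dbo.SynchMaster table from the setup script and is run from the source database, and the VersionsBehind column name is made up for the example.

```sql
-- Sketch: compare each stored synch version with the current
-- database-wide Change Tracking version.
SELECT sm.TableName,
       sm.LastSynchVersion,
       CHANGE_TRACKING_CURRENT_VERSION() AS CurrentVersion,
       CHANGE_TRACKING_CURRENT_VERSION() - sm.LastSynchVersion
         AS VersionsBehind,   -- hypothetical column name
       sm.SynchDateTime
  FROM tempdb.dbo.SynchMaster sm
  ORDER BY sm.TableName;
```

Because the version number is database-wide, a nonzero VersionsBehind value does not necessarily mean that the synchronized table itself has pending changes; the newer versions may belong to other tracked tables.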
Change Tracking Options

It’s completely reasonable to use only the ChangeTable function to design a Change Tracking system, but three advanced options are worth exploring.

Column tracking

If Change Tracking was enabled for the table with the track_columns_updated option on (it’s off by default), then SQL Server stores which columns are updated in a bitmap that costs four bytes per changed column (to store the column’s column_id).

The CHANGE_TRACKING_IS_COLUMN_IN_MASK function returns 1 (true) if the column was updated. It requires two parameters: the column’s column_id and the bit-mapped column. The bit-mapped column that actually stores the data is the SYS_CHANGE_COLUMNS column in the ChangeTable row.

The following query demonstrates the function, and the easiest way to pass in the column_id:

    SELECT CT.SYS_CHANGE_VERSION,
        CT.DepartmentID, CT.SYS_CHANGE_OPERATION,
        d.Name, d.GroupName, d.ModifiedDate,
        CHANGE_TRACKING_IS_COLUMN_IN_MASK(
          ColumnProperty(
            Object_ID('HumanResources.Department'),
            'Name', 'ColumnID'),
          SYS_CHANGE_COLUMNS) as IsChanged_Name,
        CHANGE_TRACKING_IS_COLUMN_IN_MASK(
          ColumnProperty(
            Object_ID('HumanResources.Department'),
            'GroupName', 'ColumnID'),
          SYS_CHANGE_COLUMNS) as IsChanged_GroupName
      FROM ChangeTable
          (Changes HumanResources.Department, 1) as CT
        LEFT OUTER JOIN HumanResources.Department d
          ON d.DepartmentID = CT.DepartmentID;

Determining latest version per row

The Change Tracking version is a database-wide version number, but it is possible to determine the latest version for every row in a table, regardless of the last synchronization, using the ChangeTable’s version option.
The CROSS APPLY calls the table-valued function for every row in the outer query:

    SELECT d.DepartmentID, CT.SYS_CHANGE_VERSION
      FROM HumanResources.Department d
        CROSS APPLY ChangeTable
          (Version HumanResources.Department,
            (DepartmentID), (d.DepartmentID)) as CT
      ORDER BY d.DepartmentID;

Result (abbreviated):

    DepartmentID  Sys_Change_Version
    ------------  ------------------
    15            NULL
    16            NULL
    17            2
    19            3

To find the last synchronized version per row since a specific version, use ChangeTable with the Changes option. In this example, row 17 was last updated with version 2, so requesting the most recent versions since version 2 returns a NULL for row 17:

    SELECT d.DepartmentID, CT.SYS_CHANGE_VERSION
      FROM HumanResources.Department d
        LEFT OUTER JOIN ChangeTable
            (Changes HumanResources.Department, 2) as CT
          ON d.DepartmentID = CT.DepartmentID
      ORDER BY d.DepartmentID;

Result (abbreviated):

    DepartmentID  Sys_Change_Version
    ------------  ------------------
    15            NULL
    16            NULL
    17            NULL
    19            3

Capturing application context

It’s possible to pass information about the DML’s context to Change Tracking. Typically the context could be the username, application, or workstation name. The context is passed as a varbinary data type. Adding context to Change Tracking opens the door for Change Tracking to be used to gather OLTP audit trail data.
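As a hedged sketch of how a context might be attached, the following example uses the WITH CHANGE_TRACKING_CONTEXT clause on an update and then reads the value back from the SYS_CHANGE_CONTEXT column. The context string 'HRApp:NightlySynch' and the new GroupName value are invented for illustration, and the example assumes the 'Row Three' department inserted earlier in the chapter still exists.

```sql
-- Sketch: tag a change with an application context and read it back.
-- The context value and the updated GroupName are hypothetical.
DECLARE @VersionBefore BIGINT = CHANGE_TRACKING_CURRENT_VERSION();
DECLARE @Context VARBINARY(128) =
  CAST('HRApp:NightlySynch' AS VARBINARY(128));

-- the context applies only to the DML statement that follows the clause
WITH CHANGE_TRACKING_CONTEXT (@Context)
UPDATE HumanResources.Department
  SET GroupName = 'Context Demo'
  WHERE Name = 'Row Three';

-- return only the changes made after the version captured above,
-- converting the varbinary context back to readable text
SELECT CT.DepartmentID,
       CT.SYS_CHANGE_VERSION,
       CT.SYS_CHANGE_OPERATION,
       CAST(CT.SYS_CHANGE_CONTEXT AS VARCHAR(128)) AS ChangeContext
  FROM CHANGETABLE
      (CHANGES HumanResources.Department, @VersionBefore) AS CT;
```

Capturing the version before the update keeps the final query limited to the change just made. Note that this sketch modifies the table and advances the version number, so later results would no longer match those printed in the chapter.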