New in SQL Server 2008 are the Change Tracking and Change Data Capture features, which, as their names imply, automatically track which rows have been changed, making selecting from the source database much easier.

Now that we've looked at an incremental load using T-SQL, let's consider how SQL Server Integration Services can accomplish the same task without all the hand-coding.

Incremental loads in SSIS

SQL Server Integration Services (SSIS) is Microsoft's application bundled with SQL Server that simplifies data integration and transformations, and in this case, incremental loads. For this example, we'll use SSIS's lookup transformation (for the join functionality) combined with the conditional split transformation (for the WHERE clause conditions). Before we begin, let's reset our database tables to their original state using the T-SQL code in listing 8.

Listing 8 Resetting the tables

USE SSISIncrementalLoad_Source
GO
TRUNCATE TABLE dbo.tblSource

-- insert an "unchanged" row
INSERT INTO dbo.tblSource (ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblSource (ColID,ColA,ColB,ColC)
VALUES(1, 'B', '1/1/2007 12:02 AM', -2)

-- insert a "new" row
INSERT INTO dbo.tblSource (ColID,ColA,ColB,ColC)
VALUES(2, 'N', '1/1/2007 12:03 AM', -3)

USE SSISIncrementalLoad_Dest
GO
TRUNCATE TABLE dbo.tblDest

-- insert an "unchanged" row
INSERT INTO dbo.tblDest (ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblDest (ColID,ColA,ColB,ColC)
VALUES(1, 'C', '1/1/2007 12:02 AM', -2)

With the tables back in their original state, we'll create a new project using Business Intelligence Development Studio (BIDS).

Creating the new BIDS project

To follow along with this example, first open BIDS and create a new project. We'll name the project SSISIncrementalLoad, as shown in figure 1. Once the project loads, open Solution Explorer, right-click the package, and rename Package1.dtsx to SSISIncrementalLoad.dtsx.

Figure 1 Creating a new BIDS project named SSISIncrementalLoad

When prompted to rename the package object, click the Yes button. From here, follow this straightforward series:

1. From the toolbox, drag a data flow onto the Control Flow canvas. Double-click the data flow task to edit it.
2. From the toolbox, drag and drop an OLE DB source onto the Data Flow canvas. Double-click the OLE DB Source connection adapter to edit it.
3. Click the New button beside the OLE DB Connection Manager drop-down, and click the New button in the dialog that opens to create a new data connection. Enter or select your server name and connect to the SSISIncrementalLoad_Source database you created earlier. Click the OK button to return to the Connection Manager configuration dialog box.
4. Click the OK button to accept your newly created data connection as the connection manager you want to define. Select dbo.tblSource from the Table drop-down.
5. Click the OK button to complete defining the OLE DB source adapter.
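Selecting dbo.tblSource from the Table drop-down in step 4 pulls every column of the source table into the data flow. As an aside that isn't part of the original walkthrough: if you prefer to be explicit about the columns entering the pipeline, the same adapter can be switched to the SQL command data access mode and given a query. Either way, the source effectively feeds the pipeline the result of a statement like this:

-- every source row enters the pipeline; the lookup and conditional split
-- downstream decide which rows are new, changed, or unchanged
SELECT ColID, ColA, ColB, ColC
FROM dbo.tblSource;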
Defining the lookup transformation

Now that the source adapter is defined, let's move on to the lookup transformation that'll join the data from our two tables. Again, there's a standard series of steps in SSIS:

1. Drag and drop a lookup transformation from the toolbox onto the Data Flow canvas.
2. Connect the OLE DB connection adapter to the lookup transformation by clicking on the OLE DB Source, dragging the green arrow over the lookup, and dropping it.
3. Right-click the lookup transformation and click Edit (or double-click the lookup transformation) to edit it. You should now see something like the example shown in figure 2.

Figure 2 Using SSIS to edit the lookup transformation

When the editor opens, click the New button beside the OLE DB Connection Manager drop-down (as you did earlier for the OLE DB source adapter). Define a new data connection, this time to the SSISIncrementalLoad_Dest database. After setting up the new data connection and connection manager, configure the lookup transformation to connect to dbo.tblDest.

Click the Columns tab. On the left side are the columns currently in the SSIS data flow pipeline (from SSISIncrementalLoad_Source.dbo.tblSource). On the right side are the columns available from the lookup destination you just configured (from SSISIncrementalLoad_Dest.dbo.tblDest). We need all of the destination table's columns returned, so check every check box beside the columns listed on the right; these columns feed our WHERE clauses and our JOIN ON clause. We don't want to map every column between source and destination, though; only the ColID columns should be mapped. The mappings drawn between the Available Input Columns and the Available Lookup Columns define the JOIN ON clause. Multi-select the mappings between ColA, ColB, and ColC by clicking on them while holding the Ctrl key, then right-click any of them and click Delete Selected Mappings to remove these columns from our JOIN ON clause, as shown in figure 3.

Figure 3 Using the Lookup Transformation Editor to establish the correct mappings

Add the text Dest_ to each column's output alias. These aliased destination columns are added to the data flow pipeline so that we can distinguish between source and destination values farther down the pipeline.

Setting the lookup transformation behavior

Next we need to modify our lookup transformation's behavior. By default, the lookup operates like an INNER JOIN, but we need a LEFT (OUTER) JOIN. Click the Configure Error Output button to open the Configure Error Output screen. On the Lookup Output row, change the Error column from Fail Component to Ignore Failure. This tells the lookup transformation not to fail when it can't find a match in the destination table for a source row's ColID value, which effectively makes the lookup behave like a LEFT JOIN instead of an INNER JOIN. Click OK to complete the lookup transformation configuration.
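Configured this way, the lookup plays the part of the LEFT OUTER JOIN we would write in T-SQL: it matches source rows to destination rows on ColID and carries the destination columns along under their Dest_ aliases, leaving them NULL when no match exists. As a conceptual sketch only (SSIS doesn't literally generate this statement):

-- rough T-SQL equivalent of the lookup's output
SELECT s.ColID, s.ColA, s.ColB, s.ColC,
       d.ColID AS Dest_ColID,
       d.ColA  AS Dest_ColA,
       d.ColB  AS Dest_ColB,
       d.ColC  AS Dest_ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource AS s
LEFT OUTER JOIN SSISIncrementalLoad_Dest.dbo.tblDest AS d
  ON d.ColID = s.ColID;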
From the toolbox, drag and drop a conditional split transformation onto the Data Flow canvas. Connect the lookup to the conditional split as shown in figure 4. Right-click the conditional split and click Edit to open the Conditional Split Transformation Editor. The editor is divided into three sections: the upper-left section contains a list of available variables and columns, the upper-right section contains a list of available operations you can perform on values in a conditional expression, and the lower section contains a list of the outputs you can define using the SSIS Expression Language.

Expand the NULL Functions folder in the upper-right section of the Conditional Split Transformation Editor, and expand the Columns folder in the upper-left section. Click in the Output Name column and enter New Rows as the name of the first output. From the NULL Functions folder, drag and drop the ISNULL( <<expression>> ) function to the Condition column of the New Rows condition. Next, drag Dest_ColID from the Columns folder and drop it onto the <<expression>> text in the Condition column. New rows should now be defined by the condition ISNULL( [Dest_ColID] ). This defines the WHERE clause for new rows, setting it to WHERE Dest_ColID IS NULL.

Type Changed Rows into a second Output Name column. Add the expression (ColA != Dest_ColA) || (ColB != Dest_ColB) || (ColC != Dest_ColC) to the Condition column for the Changed Rows output. This defines our WHERE clause for detecting changed rows, setting it to WHERE ((Dest_ColA != ColA) OR (Dest_ColB != ColB) OR (Dest_ColC != ColC)). Note that || is the OR operator in SSIS expressions.

Change the default output name from Conditional Split Default Output to Unchanged Rows. It's important to note here that the data flow task acts on rows. It can be used to manipulate (transform, create, or delete) data in the columns of a row, but the sources, destinations, and transformations in the data flow task act on rows. In a conditional split transformation, a row is sent to an output when the SSIS Expression Language condition for that output evaluates as true. A conditional split transformation behaves like a switch statement in C# or Select Case in Visual Basic: a row is sent to the first output whose condition evaluates as true, so if two or more conditions are true for a given row, the row goes to the first matching output in the list and is never checked against the later conditions. Click the OK button to complete configuration of the conditional split transformation.
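In T-SQL terms, the two conditions you just entered act as WHERE clauses over the lookup's output, and any row satisfying neither of them falls through to the Unchanged Rows default output. Approximately, and again only as an illustrative sketch rather than anything the package executes:

-- New Rows: ISNULL([Dest_ColID]) is true, so no matching ColID exists in the destination
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource AS s
LEFT OUTER JOIN SSISIncrementalLoad_Dest.dbo.tblDest AS d
  ON d.ColID = s.ColID
WHERE d.ColID IS NULL;

-- Changed Rows: a match exists, but at least one column value differs
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource AS s
INNER JOIN SSISIncrementalLoad_Dest.dbo.tblDest AS d
  ON d.ColID = s.ColID
WHERE d.ColA <> s.ColA
   OR d.ColB <> s.ColB
   OR d.ColC <> s.ColC;

Keep in mind that if any of the compared columns allow NULLs, plain inequality tests (whether in T-SQL or in the SSIS expression) will miss changes involving NULL values and need extra handling.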
Drag and drop an OLE DB destination connection adapter and an OLE DB command transformation onto the Data Flow canvas. Click on the conditional split and connect it to the OLE DB destination. A dialog box will display, prompting you to select a conditional split output (one of the outputs you defined in the last step); select the New Rows output. Next, connect the OLE DB command transformation to the conditional split's Changed Rows output. Your Data Flow canvas should appear similar to the example in figure 4.

Figure 4 The Data Flow canvas shows a graphical view of the transformation.

Configure the OLE DB destination by aiming it at the SSISIncrementalLoad_Dest.dbo.tblDest table. Click the Mappings item in the list to the left. Make sure the ColID, ColA, ColB, and ColC source columns are mapped to their matching destination columns (aren't you glad we prepended Dest_ to the destination columns?). Click the OK button to complete configuring the OLE DB destination connection adapter.

Double-click the OLE DB command to open the Advanced Editor for the OLE DB Command dialog box. Set the Connection Manager column to your SSISIncrementalLoad_Dest connection manager. Click on the Component Properties tab, and click the ellipsis (...) beside the SqlCommand property. The String Value Editor displays. Enter the following parameterized T-SQL statement into the String Value text box:

UPDATE dbo.tblDest
SET ColA = ?
   ,ColB = ?
   ,ColC = ?
WHERE ColID = ?

The question marks in this parameterized T-SQL statement map by ordinal to columns named Param_0 through Param_3. Map them to the pipeline's ColA, ColB, ColC, and ColID columns, in that order, so that for each row the command effectively issues this UPDATE:

UPDATE SSISIncrementalLoad_Dest.dbo.tblDest
SET ColA = SSISIncrementalLoad_Source.dbo.tblSource.ColA
   ,ColB = SSISIncrementalLoad_Source.dbo.tblSource.ColB
   ,ColC = SSISIncrementalLoad_Source.dbo.tblSource.ColC
WHERE ColID = SSISIncrementalLoad_Source.dbo.tblSource.ColID

Figure 5 The Advanced Editor shows a representation of the data flow prior to execution.

As you can see in figure 5, the query is executed on a row-by-row basis. For performance with large amounts of data, you'll want to employ set-based updates instead.
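A common set-based alternative, not part of this chapter's package, is to land the Changed Rows output in a staging table with an OLE DB destination and then run a single joined UPDATE from an Execute SQL task after the data flow finishes. A sketch, assuming a hypothetical staging table named dbo.stgUpdates in the destination database with the same four columns:

-- dbo.stgUpdates is an assumed staging table loaded from the Changed Rows output
UPDATE d
SET d.ColA = u.ColA
   ,d.ColB = u.ColB
   ,d.ColC = u.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest AS d
INNER JOIN SSISIncrementalLoad_Dest.dbo.stgUpdates AS u
  ON u.ColID = d.ColID;

This touches the destination table once per batch instead of once per changed row, which is usually the difference that matters at scale.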
Click the OK button when mapping is completed. If you execute the package with debugging (press F5), the package should succeed. Note that one row takes the New Rows output from the conditional split, and one row takes the Changed Rows output from the conditional split transformation. Although not visible, our third source row doesn't change, and it would be sent to the Unchanged Rows output, which is simply the default conditional split output renamed. Any row that doesn't meet any of the predefined conditions in the conditional split is sent to the default output.

Summary

The incremental load design pattern is a powerful way to leverage the strengths of the SSIS 2005 data flow task to transport data from a source to a destination. By using this method, you only insert or update rows that are new or have changed.

About the author

Andy Leonard is an architect with Unisys Corporation, a SQL Server database and Integration Services developer, a SQL Server MVP, a PASS regional mentor (Southeast US), and an engineer. He's a coauthor of several books on SQL Server topics. Andy founded and manages VSTeamSystemCentral.com, where he maintains several blogs (Applied Team System, Applied Database Development, and Applied Business Intelligence), and he also blogs for SQLBlog.com. Andy's background includes web application architecture and development, VB, and ASP; SQL Server Integration Services (SSIS); data warehouse development using SQL Server 2000, 2005, and 2008; and test-driven database development.