308 Hands-On Microsoft SQL Server 2008 Integration Services

because both the containers were running under one transaction that was started by the package, and when one of the tasks in a container fails, the transaction rolls back all the work done by previous tasks. One lesson to learn from this exercise is that the parent container, which is the package in this case, must have its TransactionOption property set to Required to start a transaction, and the child containers need to have at least the Supported attribute for this property.

Exercise (Case III: Transaction Spanning over Multiple Packages)

In the last part of this exercise, you will use a transaction to roll back the inconsistent data when your loading process uses multiple packages. When you have multiple packages to process, you use the Execute Package task to embed them inside a single package to run them. The Execute Package task is basically a wrapper task that enables a package to be used inside another package. The Execute Package task is covered in Chapter 5.

26. Right-click the SSIS Packages node in the Solution Explorer window and choose New SSIS Package from the context menu. You will see that the new package has been added with the default name of Package1.dtsx and the screen is switched to the new package. Note that the Designer shows these two packages as tabs.
27. Go to Package.dtsx, right-click the localhost.Campaign Connection Manager, and choose Copy. Switch back to Package1.dtsx and paste this connection manager in the Connection Managers area.
28. Again go to Package.dtsx and cut the Sequence Container 1 with Loading Vehicle Task, return to Package1.dtsx, and paste this container on the Control Flow. You will see a validation error about the connection manager on the Loading Vehicle task. This is because the ID for the localhost.Campaign Connection Manager has been changed.
29. Double-click the Loading Vehicle task icon to open the editor.
In the Connection field, choose localhost.Campaign Connection Manager from the drop-down list and click OK.

You’ve divided the first package into two separate packages. To run these two packages as a single job, you need to create a new package and call these two packages using the Execute Package task.

30. Right-click the SSIS Packages node in the Solution Explorer window and choose New SSIS Package from the context menu. When the new blank package is loaded, drop two Execute Package tasks on the Control Flow surface.
31. Rename the first Execute Package task Package and the second task Package1. Join Package to Package1 using an on-success precedence constraint.
32. Double-click the Package icon to open the editor. Go to the Package page and change the Location field value to File System.

Chapter 8: Advanced Features of Integration Services 309

33. Click in the Connection field and then click the drop-down arrow and choose <New Connection . . .>. In the File Connection Manager Editor’s File field, type C:\SSIS\Projects\Maintaining data Integrity with Transactions\Package.dtsx and click OK. You will see Package.dtsx displayed in the Connection field. Click OK to close the Execute Package Task Editor.
34. As in the last two steps, open the editor for the Package1 task, change the Location to File System, and add a file connection manager in the Connection field pointing to C:\SSIS\Projects\Maintaining data Integrity with Transactions\Package1.dtsx as the existing file. Close the Execute Package Task Editor after making these changes.
35. Click anywhere on the blank surface of the Control Flow panel and press F4 to open the Properties window for the package. Scroll down and locate the Transactions section and set the TransactionOption property to Required. This will run both the Execute Package tasks and hence the child packages in the context of a single transaction.
However, before proceeding any further, verify that the TransactionOption is set to the default value on the Package and Package1 tasks and on the Package1.dtsx package. Package.dtsx will have this property set to Required, which is okay, as this will also enable it to join the transaction started by Package2.dtsx. At this time, your package will look like the one shown in Figure 8-7.

Figure 8-7 Calling multiple packages using the Execute Package tasks

36. Go to the Solution Explorer window, right-click Package2.dtsx, and then select Execute Package from the context menu. You will see that Package.dtsx executes successfully and then Package1.dtsx executes but fails, and its components turn red.
37. Switch to SQL Server Management Studio and run the command you created in Step 10 in the first sequence of steps to see the results. You will see that still no record has been added to the tables, despite the fact that Package.dtsx executed successfully. This is because both the packages were running under one transaction. When the Loading Vehicle task failed in the Package1.dtsx package, the transaction rolled back not only all the tasks in this package but also the tasks in the other package, Package.dtsx.

Review

You’ve seen how you can use a transaction to combine various tasks, containers, and even packages so that they behave as a single atomic unit that commits or rolls back as a whole. You’ve worked with the Sequence container to combine a set of tasks as a logical unit and have learned a new trick of copying and pasting tasks among packages to increase productivity.

While all the preceding is useful when you want to use distributed transactions, you cannot use distributed transactions in all situations. Sometimes you may need to use native transaction support. Native transactions are those provided by the RDBMS in use.
A simple case could be that you create and populate a temporary table in one task and want to use it later in another task. This kind of requirement cannot be met using the distributed transaction support. In SSIS, when you configure a task, you specify a connection manager on each task. So, when a task is run, a connection is opened specifically for that task, and this connection is closed when the defined operation on the task has been performed. Closing the connection in this way prevents native transactions, which need the same connection to be retained across all the tasks involved. SSIS provides a Boolean property on the connection manager, intuitively named RetainSameConnection, that allows you to keep a connection open across all the involved tasks. To use this property, click the connection manager, set RetainSameConnection to True, and then use this connection manager in all the tasks that participate in the native transaction. One of the main benefits of using a native transaction is that you can build a logic-based commit or rollback of the transaction, which is otherwise not possible with distributed transactions, as they can commit or roll back only on success or failure of the tasks involved.

Restarting Packages with Checkpoints

If you’re like most other information analysts and update your data warehouse every night, this feature will be of much interest to you. After having set up logging for your packages, every morning you’d be checking the logs for the last night’s update process to see how the update went. You usually expect that the update process has been successful, but what if the update process has failed? You will have to rerun your package during the daytime, and I know you wouldn’t be happy about this, because doing this work during business hours involves some serious implications.
Your users will not get the latest updates and will experience poor performance of the involved database servers while you rerun the update process. If you’ve worked with DTS 2000 packages, you know that DTS 2000 doesn’t support restarting a package from the point of failure. You have to rerun the package from the start or manually run the tasks individually, which is quite involved and sometimes impossible to do. This is where Integration Services comes to the rescue by providing improved functionality for restarting a package. By using checkpoints with Integration Services packages, you can restart your failed packages from the point of failure and can save the work that has completed successfully.

Integration Services writes all the information that is required to restart a failed package in a checkpoint file. This file is created whenever you run a package the first time after a successful completion, and it is deleted when the package successfully completes. However, if an Integration Services package fails and is configured to use checkpoints, the checkpoint file is not deleted; instead, it is updated with information that is required to rerun the package from that point.

When you rerun your package, Integration Services checks two things before executing the package: whether the package is configured to use checkpoints and whether the checkpoint file exists (that is, whether the package failed while executing last time). If it finds that the package is configured to use checkpoints and the checkpoint file exists, it reads the checkpoint file associated with the package, gets the required information from the file, and restarts the package from the point of failure. The checkpoint file contains all the necessary information for a package to restart at the point of failure, such as the execution results of all the completed units of work, the current values of variables involved, and package configuration information.
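The restart logic described above can be modeled with a short Python sketch. This is only an illustration of the idea, not how SSIS implements checkpoints: the function name run_with_checkpoints, the JSON file layout, and the use of task names as identifiers are all assumptions made for this example. A real checkpoint file is XML and also stores variable values and package configuration information.

```python
import json
import os


def run_with_checkpoints(tasks, checkpoint_path):
    """Run (name, action) pairs in order and return the names executed in this run.

    Hypothetical model only: it records just the names of completed tasks.
    """
    completed = []
    # A leftover file means the previous run failed: resume after the recorded work.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            completed = json.load(fh)["completed"]

    executed = []
    for name, action in tasks:
        if name in completed:
            continue  # finished successfully in the earlier run, so skip it
        action()  # an exception here leaves the checkpoint file on disk
        completed.append(name)
        executed.append(name)
        # Record progress after every successful task.
        with open(checkpoint_path, "w", encoding="utf-8") as fh:
            json.dump({"completed": completed}, fh)

    # Every task succeeded, so the checkpoint file is no longer needed.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed
```

If a task raises an exception, the file stays on disk. Calling the function again with the same checkpoint_path after fixing that task runs only the tasks that had not completed, and the file is removed once the run succeeds.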
You decide the key positions in your package that would be good candidates for the point of restart and can be written as checkpoints in the file. For example, you would definitely designate a checkpoint immediately after the task that loads a large data set or downloads multiple large files from an FTP site. In case of failure of the package after successfully downloading files or completing loading the data set, the package will be restarted after these tasks, as the checkpoint defines the starting place. As mentioned earlier, the checkpoint file also contains the package configuration information, that is, the information about the configurations under which the package was running. This avoids reloading of package configurations, as this is read from the checkpoint file and hence maintains the original configurations under which the package was running at the time of failure.

To enable your package to record checkpoint information, you set the following properties at the package level:

• CheckpointUsage: You can access this property in the Checkpoints section of the package Properties window. This property can have one of three values: Never, Always, or IfExists. The default value is Never, which means the checkpoints are not enabled and no checkpoint file will be created; hence, the package will always start processing from the beginning whenever it is executed. The second value is Always, which, if selected, will make the package always use a checkpoint file. If the package has failed in the previous execution and you’ve somehow deleted or lost the checkpoint file, the package will fail to execute. The third possible value is IfExists, which, when selected, makes the package use a checkpoint file if it exists and start the package from the point of failure in the previous execution. You can reuse a checkpoint file over and over for the same package.
However, if the checkpoint file doesn’t exist, the package will always start from the beginning. The checkpoint file is specific to a package. Before executing a package, SSIS checks if the PackageID in the checkpoint file is the same as that of the package. If there is a mismatch, SSIS won’t execute the package.
• SaveCheckpoints: After enabling your package to use checkpoints, you can set this property to True to indicate that checkpoints should be saved.
• CheckpointFileName: Using this property, you can specify the path and the file into which you would like to save checkpoints.

Along with these properties, you also need to set the FailPackageOnFailure property, available in the Execution section in the Properties window on the package and the containers, to True to specify that the package will fail when a failure occurs. This property helps in setting the checkpoints on the tasks that you want to make as points of restart. If you do not set this property on any task or container in the package, the checkpoint file will not include any information for the containers on failure and will restart the package from the beginning.

It is interesting to note the following points concerning the smallest unit that can be restarted:

• The smallest unit that can be restarted is a task.
• The Data Flow task, which is a special task in Integration Services enclosing the data flow engine, can consist of several data flow transformations. This task is considered similar to any other Control Flow task as far as checkpoints are concerned and cannot be started from halfway where it failed. If you have massive pipeline operations in your package and you’re concerned about rerunning packages, it is better that you divide up the data transformations work between multiple Data Flow tasks.
• The Foreach Loop Container is also considered an atomic unit of work that will either commit or restart completely to iterate over all the values provided by the enumerator used.
• When checkpoints are used with a For Loop Container, the checkpoint file saves the last value of the variable, and hence the loop restarts from the same point where it left off.

The use of an atomic unit of work actually calls for a discussion on transactions and checkpoints, as transactions convert the tasks and the packages involved into an atomic unit of work. Let’s understand the checkpoints and their operation within the scope of a transaction in the following Hands-On exercise.

Hands-On: Restarting a Failed Package Using Checkpoints

In this exercise, you will simulate a package failure and configure your package with checkpoints to restart it from the point of failure.

Method

You will use the package you developed in the last exercise and apply checkpoint configurations to it. In the second part, you will use transactions over the package to see its behavior.

Exercise (Apply Checkpoint Configurations to Your Package)

In the first part of this Hands-On, you configure the Integration Services package to use checkpoints and execute the package to see its execution behavior.

1. Open BIDS and create a new Integration Services Project with the following details:
    Name        Restarting failed package
    Location    C:\SSIS\Projects
2. When a blank project is created, delete the Package.dtsx package under the SSIS Packages node in the Solution Explorer window. Then, right-click the SSIS Packages node and choose Add Existing Package from the context menu.
3. In the Add Copy Of Existing Package dialog box, select Package Location as the File System. In the Package Path field, type C:\SSIS\Projects\Maintaining data Integrity with Transactions\Package.dtsx and click OK to add this package. Once the package has been added, open it in the Designer.
4.
Drop an Execute SQL task from the Toolbox onto the Designer surface outside the Sequence container and rename this task Loading Vehicle. Double-click the task icon to open the editor. In the General page’s Connection field, choose the localhost.Campaign Connection Manager and type the following SQL statement in the SQLStatement field:

    INSERT INTO Vehicle (CustomerID, Series, Model)
    VALUES ('N501', 'X11 Series', 'Saloon')

You already know that this SQL statement is without the mandatory VIN field; hence it will fail the Loading Vehicle task. Join the Sequence Container with the Loading Vehicle task using an on-success precedence constraint.
5. Click anywhere on the blank surface of the Designer and press F4 to open the Properties of the package. First, make sure that the package is not configured to use transactions. Scroll down and locate the TransactionOption property, and change its value to Supported.
6. Scroll up in the Properties window and locate the Checkpoints section. Specify the following settings in this section:
    SaveCheckpoints       True
    CheckpointUsage       IfExists
    CheckpointFileName    C:\SSIS\Projects\Restarting failed package\checkpoints.chk
7. Because we want to include the restart information of the Loading Vehicle task in the checkpoint file, click the Loading Vehicle task on the Designer surface. You will see that the context of the Properties window changes to show the properties of the Loading Vehicle task. Locate the FailPackageOnFailure property in the Execution section and change its value to True.
8. Press F5 to execute the package. You already know the result of the execution. The Sequence Container and the two Execute SQL tasks in it execute successfully and turn green, but the Loading Vehicle task fails and shows up in red. Press SHIFT-F5 to switch back to design mode.
9. Let’s see what has happened in the background while the package was executing.
Open SQL Server Management Studio and run the following query to see the records imported into the database:

    SELECT n.[CustomerID], [FirstName], [SurName], [Email], [Type], [VIN], [Series], [Model]
    FROM [Campaign].[dbo].[NewCustomer] n
    LEFT OUTER JOIN [Campaign].[dbo].[EmailAddress] e
        ON n.CustomerID = e.CustomerID
    LEFT OUTER JOIN [Campaign].[dbo].[Vehicle] v
        ON n.CustomerID = v.CustomerID

You will see that the customer information and its e-mail information have been loaded while the vehicle information fields have null values. Using Windows Explorer, navigate to the C:\SSIS\Projects\Restarting failed package folder and note that the checkpoints.chk file has been created. Open this XML-formatted file and note that it contains information about the failure of the package and the container involved in the failure.
10. Change the SQL statement of the Loading Vehicle task to include the VIN information with the following query:

    INSERT INTO Vehicle (CustomerID, VIN, Series, Model)
    VALUES ('N501', 'UV123WX456YZ789', 'X11 Series', 'Saloon')

11. Again execute the package. This time you will see that only the Loading Vehicle task is executed and the earlier two tasks and the Sequence container did not run at all (see Figure 8-8). This is because the package reads the checkpoint file before executing and finds the information about where to start executing. Press SHIFT-F5 to switch back to design mode.

Figure 8-8 Restarting package with checkpoints

12. Explore to the C:\SSIS\Projects\Restarting failed package folder and note that the checkpoints.chk file does not exist.
13. Switch to SQL Server Management Studio and run the script specified in Step 9 to see the result set. You will see one record containing customer, e-mail, and vehicle information.
Run the following queries to clear the tables:

    DELETE [Campaign].[dbo].[NewCustomer]
    DELETE [Campaign].[dbo].[EmailAddress]
    DELETE [Campaign].[dbo].[Vehicle]

Exercise (Effect of Transaction on Checkpoints)

To set transactions on this package, we need to set the TransactionOption value to Required. So, let’s do it.

14. Click anywhere on the blank surface of the Control Flow panel and press F4 to open the Properties window. Scroll down and locate the TransactionOption property in the Transactions section. Set it to the Required value so that it starts a transaction. But SSIS doesn’t allow you to do this and throws an error as shown in Figure 8-9.

Figure 8-9 Error thrown while trying to use transactions alongside checkpoints

This behavior differs from Integration Services 2005, in which you could use transactions and checkpoints in the same package and Integration Services left proper usage and management of both of them to you. In that case the transactions roll back the information of the checkpoint file and cause that package to execute all over again. This is actually applicable to containers in simple packages also. But there is a potential for error or misbehavior when you are using Integration Services 2005 with checkpoints and transactions in a complex package; that is, if your package consists of a complex container hierarchy and a subcontainer commits before the parent container fails, the subcontainers do not get rolled back and also do not get recorded in the checkpoint file. This causes those subcontainers to be executed again when the parent container is restarted. Similarly, the Foreach Loop container does not record any information in the checkpoint file about the iterations it may have already done before failing and gets executed all over again when restarted. So, when you’re planning to use checkpoints alongside the transactions, use caution and test thoroughly.
Integration Services 2008 R2, by contrast, stops you from doing that altogether due to the complexity and risk involved, and you can’t use transactions and checkpoints in your packages at the same time.

Review

You’ve seen in this exercise that the checkpoints can help you restart a package precisely from the task where the package failed. You also understand that you need to be careful while using transactions and checkpoints on packages with complex container hierarchies in Integration Services 2005. On the other hand, Integration Services 2008 R2 doesn’t allow you to implement checkpoints and transactions at the same time.

Expressions and Variables

You learned about variables and property expressions in Chapter 3 and have used them in various Hands-On exercises in subsequent chapters. With DTS 2000, use of variables was considered an advanced feature that allowed you to add some dynamic behavior to your packages. However, use of variables in Integration Services is made easier and has been tied into SSIS package design so much that the packages developed without using variables are reduced to ad hoc data operations, most of which can be done using the SQL Server Import and Export Wizard.

On the other hand, use of property expressions is a new feature in Integration Services that provides an ability to set values for component properties dynamically using variables that are updated at run time by other tasks. Property expressions allow you to evaluate values generated at run time by other tasks and use the evaluated values to update properties exposed by the concerned task at run time. This is quite a powerful feature, as it allows you to read and evaluate the values that exist only at run time and modify the property or behavior of other tasks in the package.
Though you’ve used variables and expressions in the Hands-On exercises earlier, here you will do another exercise that uses variables and particularly property expressions extensively to update properties of the send mail task to generate personalized mails.
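The idea behind property expressions, binding a property to a formula that is evaluated against run-time variable values, can be pictured with a small Python model. This is a conceptual sketch only and does not use the SSIS object model or expression language: the Task class and its evaluate_expressions method are hypothetical, and the variable names Email and FirstName and the sample values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Variables = Dict[str, str]


@dataclass
class Task:
    """Hypothetical stand-in for a task whose properties can be bound to expressions."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)
    expressions: Dict[str, Callable[[Variables], str]] = field(default_factory=dict)

    def evaluate_expressions(self, variables: Variables) -> None:
        # Overwrite each bound property with the value its expression yields now.
        for prop, expression in self.expressions.items():
            self.properties[prop] = expression(variables)


send_mail = Task(
    name="Send Mail",
    properties={"ToLine": "", "Subject": ""},
    expressions={
        "ToLine": lambda v: v["Email"],
        "Subject": lambda v: "Offer for " + v["FirstName"],
    },
)

# Values that exist only at run time, for example set by an earlier task in a loop.
send_mail.evaluate_expressions({"Email": "a@example.com", "FirstName": "Ann"})
```

Calling evaluate_expressions again with a different set of variable values overwrites the same properties, which is how one task definition could produce a differently addressed message on each iteration.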