78 Hands-On Microsoft SQL Server 2008 Integration Services

the FileUsageType property of the connection manager to indicate how you want to use the File Connection Manager, that is, whether you want to create a file or folder or use an existing one.

Flat File Connection Manager

This connection manager provides access to data in a flat file. It is used to extract data from a flat-file source or load data to a flat-file destination, and it can use delimited, fixed-width, or ragged-right format. This connection manager accesses only one file. If you want to reference multiple flat files, you must use a Multiple Flat Files Connection Manager.

FTP Connection Manager

Use this connection manager whenever you want to upload or download files using File Transfer Protocol (FTP). It enables you to connect to an FTP server using anonymous authentication or basic authentication. The default port used for an FTP connection is 21. The FTP Connection Manager can send and receive files using active or passive mode. In active mode, the server initiates the data connection back to the client; in passive mode, the client initiates the data connection to the server.

HTTP Connection Manager

Whenever you want to upload or download files using HTTP (default port 80), use this connection manager. It enables you to connect to a web server using HTTP. The Web Service task provided in Integration Services uses this connection manager. Like the FTP Connection Manager, the HTTP Connection Manager allows connections using anonymous authentication or basic authentication.

MSMQ Connection Manager

When you’re working with mainframe systems or with systems built on a messaging architecture, you will need to use Message Queuing within your packages, which requires an MSMQ Connection Manager. For example, if you want to use the Message Queue task in Integration Services, you need to add an MSMQ Connection Manager. An MSMQ Connection Manager enables a package to connect to a message queue.
Chapter 3: Nuts and Bolts of the SSIS Workflow 79

Analysis Services Connection Manager

If you are creating an Analysis Services project or database as part of your solution, you may want to update or process Analysis Services objects as part of your SSIS jobs. For example, your SSIS packages might update the data mart nightly, after which you may want to process the cube and dimensions to include the latest data in the SSAS database. For reasons such as these, you may include the Analysis Services Connection Manager in your SSIS packages. This connection manager provides access to Analysis Services objects such as cubes and dimensions by allowing you to connect to an Analysis Services database or to an Analysis Services project in the same solution, though you can connect to an Analysis Services project only at design time. You will use the Analysis Services Connection Manager with the Analysis Services Processing task, the Analysis Services Execute DDL task, or the Data Mining Model Training destination in your package.

Multiple Files Connection Manager

When you have to connect to multiple files within your Script task or Script component scripts, you will use the Multiple Files Connection Manager. When you add this connection manager, you can add multiple files or folders to be referenced. Those files and folders show up as a pipe-delimited list in the ConnectionString property of this connection manager. You can also use wildcards to specify multiple files or folders. Suppose, for example, that you want to use all the text files in the C:\SSIS folder. You could add the Multiple Files Connection Manager by choosing a single file in the C:\SSIS folder, then open the Properties window of the connection manager and set the ConnectionString property to C:\SSIS\*.txt.
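To make the pipe-delimited list and the wildcard form of the ConnectionString property more concrete, the following Python sketch shows how such a string could be resolved into individual file paths. It only illustrates the semantics described above and is not how Integration Services resolves the property internally; the function name expand_connection_string and the sample file names are assumptions made for this example.

```python
import glob
import os
import tempfile


def expand_connection_string(connection_string):
    """Split a pipe-delimited path list and expand wildcard entries.

    Hypothetical helper for illustration only; it is not an SSIS API.
    """
    files = []
    for entry in connection_string.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        if "*" in entry or "?" in entry:
            # Wildcard entry: replace it with the matching files, sorted
            # so the result is deterministic.
            files.extend(sorted(glob.glob(entry)))
        else:
            # Literal entry: keep it as written, whether or not it exists.
            files.append(entry)
    return files


if __name__ == "__main__":
    # Build a throwaway folder that stands in for C:\SSIS.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.txt", "b.txt", "notes.log"):
            with open(os.path.join(folder, name), "w") as handle:
                handle.write("sample\n")
        connection_string = "|".join(
            [os.path.join(folder, "*.txt"), os.path.join(folder, "notes.log")]
        )
        for path in expand_connection_string(connection_string):
            print(os.path.basename(path))
```

In a Script task you would normally let the connection manager hand you the resolved files; the sketch is useful mainly for reasoning about which files a given ConnectionString value is likely to cover.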
Similar to the File Connection Manager, the Multiple Files Connection Manager has a FileUsageType property to indicate the usage type, that is, whether you want to create a file or folder or use an existing one.

Multiple Flat Files Connection Manager

Because the Flat File Connection Manager can reference only one flat file, you use the Multiple Flat Files Connection Manager when you need to reference more than one flat file. You can access data in flat files having delimited, fixed-width, or ragged-right format. In the GUI of this connection manager, you can select multiple files by using the Browse button and highlighting multiple files. These files are then listed as a pipe-delimited list in the connection manager. You can also use wildcards to specify multiple files. Suppose, for example, that you want to use all the .txt flat files in the C:\SSIS folder. To do this, you would enter C:\SSIS\*.txt in the File Names field. Note, however, that all these files must have the same format.

So, when you have multiple flat files to import from a folder, you have two options. The first is to loop over the files using a Foreach Loop Container, reading the filenames and passing them one by one to the Flat File Connection Manager so that the files are imported iteratively. The second option is to use a Multiple Flat Files Connection Manager, which needs no looping construct; instead, this connection manager reads all the files, collates the data, and passes the data directly to the downstream components in a single iteration, as if the data were coming from a single source such as a database table rather than from multiple flat files.
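The difference between the two options described above can be sketched outside SSIS. The following Python example is an analogy under stated assumptions, not SSIS code: the function names import_per_file and import_collated, the comma-delimited format, and the header row are all hypothetical choices made for illustration.

```python
import csv
import os
import tempfile
from itertools import chain


def read_rows(path):
    """Read one delimited flat file and return its data rows (header skipped)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return list(reader)


def import_per_file(paths):
    """Option 1 (loop): one file per iteration, so each batch keeps its source name."""
    for path in paths:
        yield os.path.basename(path), read_rows(path)


def import_collated(paths):
    """Option 2 (collate): all files presented as one row set, with no per-file lineage."""
    return list(chain.from_iterable(read_rows(path) for path in paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        samples = {
            "jan.csv": "id,name\n1,Ann\n",
            "feb.csv": "id,name\n2,Bob\n3,Cy\n",
        }
        paths = []
        for name, text in samples.items():
            path = os.path.join(folder, name)
            with open(path, "w", newline="") as handle:
                handle.write(text)
            paths.append(path)

        for source, rows in import_per_file(paths):
            print(source, rows)        # lineage is available per batch
        print(import_collated(paths))  # one combined row set
```

The per-file version holds only one file's rows at a time and knows which file each batch came from; the collated version is simpler to consume downstream but needs all rows at once and loses the source of each row unless you add it yourself.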
Both options are useful in particular scenarios. For example, if you have to import several files from the same folder and you’re not much concerned about auditing and lineage (that is, where the data is coming from), you can use the Multiple Flat Files Connection Manager method. This method bulk-imports the data quickly compared with the looping construct that deals with each file in turn. The price of that speed is paid in resource utilization. Because all the files are read within the same batch, CPU utilization and memory requirements are quite high in this case, although for a short duration that depends on the file sizes. The iterative method, on the other hand, deals with one file at a time, requiring less CPU and memory, but for a longer duration. Based on the file sizes, the lineage and auditing requirements, the resources available on your server, and the time window available to import data, you can choose one of these two methods.

ODBC Connection Manager

This connection manager enables an Integration Services package to connect to a wide range of relational database management systems (RDBMS) using the Open Database Connectivity (ODBC) protocol.

OLE DB Connection Manager

This connection manager enables an Integration Services package to connect to a data source using an OLE DB provider. OLE DB is a COM-based data access specification that Microsoft designed as a successor to ODBC, with the aim of being faster, more efficient, and more stable; it is an open specification for accessing several kinds of data. Many of the Integration Services tasks and data flow components use the OLE DB Connection Manager. For example, the OLE DB source adapter and OLE DB destination adapter use the OLE DB Connection Manager to extract and load data, and the OLE DB Connection Manager is one of the connections that the Execute SQL task can use to connect to an SQL Server database to run queries.
SMO Connection Manager

SQL Server Management Objects (SMO) is a collection of objects that can be programmed to manage SQL Server. SMO is an upgrade to SQL-DMO, a set of APIs you use to create and manage SQL Server database objects. SMO performs better, is more scalable, and is easier to use than SQL-DMO. The SMO Connection Manager enables an Integration Services package to connect to an SMO server and hence enables you to manage SQL Server objects using SMO scripts. For example, Integration Services transfer tasks use an SMO connection to transfer objects from one server to another.

SMTP Connection Manager

An SMTP Connection Manager enables an Integration Services package to connect to a Simple Mail Transfer Protocol (SMTP) server. For example, when you want to send an e-mail notification from a package, you can use the Send Mail task and configure it to use an SMTP Connection Manager to connect to an SMTP server.

SQL Server Compact Edition Connection Manager

When you need to connect to an SQL Server Compact database, you will use an SQL Server Compact Connection Manager. The SQL Server Compact Destination adapter uses this connection to load data into a table in an SQL Server Compact Edition database. If you’re running a package that uses this connection manager on a 64-bit server, you will need to run it in 32-bit mode, as the SQL Server Compact Edition provider is available only in a 32-bit version.

WMI Connection Manager

Windows Management Instrumentation (WMI) enables you to access management information in enterprise systems, such as networks, computers, managed devices, and other managed components, using the Web-Based Enterprise Management (WBEM) standard. Using a WMI Connection Manager, your Integration Services package can manage and automate administrative tasks in an enterprise environment.
Microsoft Connector 1.0 for SAP BI

You can import and export data between Integration Services and SAP BI by using Microsoft Connector 1.0 for SAP BI. Using this connector in Integration Services, you can integrate a non-SAP data source with SAP BI or use SAP BI as a data source in your data integration application. The Microsoft Connector for SAP BI is a set of managed components that transfers data from and to an SAP NetWeaver BI version 7 system in both Full and Delta modes via standard interfaces. This connector is not part of the default installation; it is an add-in to Integration Services, and you have to download the installation files separately from the Microsoft SQL Server 2008 Feature Pack download web page. The SAP BI Connector can be installed on an Enterprise or a Developer Edition of SQL Server 2008 Integration Services; however, you can transfer data between SAP BI 7.0 and any version of SQL Server from SQL Server 2000 onward. The SAP BI connector provides three main components:

- SAP BI Source
- SAP BI Destination
- SAP BI Connection Manager

As you can guess, SAP BI Source can be used to extract data from an SAP BI system, SAP BI Destination can be used to load data into an SAP BI system, and the SAP BI Connection Manager helps to manage the RFC connection between the Integration Services package and SAP BI. When you install the SAP BI connector, the SAP BI Connection Manager is displayed in the list of connection managers; however, you will need to add the SAP BI Source and SAP BI Destination manually. You can do this by right-clicking Data Flow Sources in the Toolbox, selecting the Choose Items option, and selecting SAP BI Source from the list in the SSIS Data Flow Items tab. Similarly, you can add the SAP BI Destination by right-clicking Data Flow Destinations in the Toolbox.
Figure 3-2 shows the SAP BI Connection Manager in the Add SSIS Connection Manager dialog box, the SAP BI Source in the Data Flow Sources section, and the SAP BI Destination in the Data Flow Destinations section of the Toolbox.

Microsoft Connector for Oracle by Attunity

The Microsoft connectors for Oracle and Teradata are developed by Attunity and have been implemented in the same fashion as the SAP BI connector. That is, when you install these connectors, you get a connection manager, a Source component, and a Destination component, though you will have to add the source and destination components to the Data Flow Designer Toolbox manually. Refer to Figure 3-2 to see how these components have been implemented. The Oracle connector has been developed to achieve optimal performance when transferring data from or to an Oracle database using Integration Services. The connector is implemented as a set of managed components and is available for the Enterprise and Developer Editions of SQL Server 2008 Integration Services only. The Attunity Oracle Connector supports Oracle databases of version 9.2.0.4 and higher and requires Oracle client software version 10.x or 11.x to be installed on the same computer where SSIS will be using this connector. With this connector, you can use:

- Fast Load: Bulk Load Destination using OCI (Oracle Call Interface) Direct Path.
- Arrayed Load: Bulk Load Destination in batches, where the entire batch is inserted under the same transaction.
- Bulk Extract Source: using OCI Array Binding.

Microsoft Connector for Teradata by Attunity

The Microsoft Connector for Teradata is a set of managed components developed to achieve optimal performance for transferring data from or to a Teradata database using Integration Services. The connector is available for the Enterprise and Developer Editions of SQL Server 2008 Integration Services only.
The SSIS components for Teradata (that is, Teradata Source, Teradata Destination, and Teradata Connection Manager; see Figure 3-2) use the Teradata Parallel Connector (TPC) for connectivity.

Figure 3-2 SSIS connection managers and data flow sources and destinations

The Microsoft Connector for Teradata supports:

- Teradata Database version 2R6.0
- Teradata Database version 2R6.1
- Teradata Database version 2R6.2
- Teradata Database version 12.0

To use this connector, you will have to install Teradata Parallel Transporter (TPT) version 12.0 and the Teradata ODBC driver (version 12 recommended) on the same computer where SSIS will be using this connector. You can use this connector for:

- Bulk Load Destination using TPT FastLoad
- Incremental Load Destination using TPT Tpump
- Bulk Extract Source using TPT

Data Sources and Data Source Views

So far we have talked about connection managers that can be added to packages. However, you might have noticed two folders, Data Sources and Data Source Views, in your project in Solution Explorer. These folders can also contain data source connections, but these are only design-time objects and aren’t available at run time. The connection managers embedded in the packages are what is used at run time.

Data Sources

You can create design-time data source objects in Integration Services, Analysis Services, and Reporting Services projects in BIDS. A data source is a connection to a data store, for example, a database. You can create a data source by right-clicking the Data Sources node and selecting the New Data Source option. This starts the Data Source Wizard, which helps you create a data source. The data source object is therefore created outside the package, and you reference it later in the package. Once a data source is created, it can be referenced by multiple packages.
You can reference a data source in a package by right-clicking in the Connection Managers area and selecting the New Connection from Data Source option from the context menu. When you reference a data source inside a package, it is added as a connection manager and is used at run time. This approach of creating a data source outside a package and then referencing it, or embedding it in the package as a connection manager, has several benefits. You can provide a consistent approach in your packages to make managing connections easier. You can update all the connection managers in the various packages that reference a data source by making a change in one place only, in the data source itself, because the data source provides synchronization between itself and the connection managers. Finally, you can delete a data source at any time without affecting the connection managers in the packages. This is possible because there is no dependency between the two. Connection managers don’t need data sources to be able to work, as they are complete in themselves. The only link between a data source and the connection managers that reference it is that the connection managers are synchronized with it when changes occur. Data sources and data source views are only design-time objects that help in managing connection managers across several packages. At run time, the package doesn’t need a data source to be present, because it uses the connection managers that are embedded in it. Data sources are not used when building packages programmatically.

Data Source View

A data source view, built on a data source, is a named, saved subset that defines the underlying schema of a relational data source. A data source view can include metadata that can define sources, destinations, and lookup tables for SSIS tasks, transformations, and data adapters.
While a data source is a connection to a data store, data source views are used to reference more specific objects such as tables or views or their subsets. Because you can apply filters on a data source view, you can in fact create multiple data source view objects from a single data source. For example, a data source can reference a database, while different data source views can be created to reference its different tables or views. To use a data source view in a package, you must first add the data source to the package.

Using data source views can be beneficial. You can use a data source view in multiple packages, and refreshing a data source view picks up the changes in its underlying data sources. Data source views can also cache metadata of the data sources on which they are built, and you can extend a data source view by adding calculated columns, new relationships, and so on. You can consider this an additional abstraction layer for polishing the data model or aligning the metadata with your package requirements. This can be a very powerful facility when you’re dealing with third-party databases or working with systems where it is not easy for you to make a change.

The data source view can be referenced by data flow components such as the OLE DB source and the Lookup transformation. To reference a data source view, you instantiate the data source and then refer to the data source view in the component. Figure 3-3 shows an OLE DB source referencing a CampaignZone1 data source view, where Campaign is a data source. Once you add a data source view to a package, it is resolved to an SQL statement and stored in a property of the component using it. You create a data source view by using the Data Source View Wizard and then modify it in the Data Source View Designer. Data source views are not used when building packages programmatically.

SSIS Variables

Variables are used to store values.
They enable SSIS objects to communicate with each other in the package as well as between parent and child packages at run time. You can use variables in a variety of ways. For example, you can load the results of an Execute SQL task into a variable, change the way a package works by dynamically updating its parameters at run time using variables, control looping within a package by using a loaded variable, raise an error when a variable is altered, use variables in scripts, or evaluate them as an expression.

Figure 3-3 Referencing a data source view inside an OLE DB source

DTS 2000 provides global variables, for which users set the values in a single area in the package and then use those values over and over. This allows users to extend the dynamic abilities of packages. Because global variables are defined at the package level, managing all the variables in a single place can become quite challenging for complex packages. SSIS has improved on this shortcoming by assigning a scope to variables. Scopes are discussed in greater detail a bit later in the chapter in the section “User-Defined Variables.”

Integration Services provides two types of variables, system variables and user-defined variables, that you can configure and use in your packages. System variables are made available in the package and provide environmental information or the state of the system at run time. You don’t have to create the system variables, as they are provided for you, and hence you can use them in your packages straightaway. However, you must create a user-defined variable before you can use it in your package. To see the variables available in a package in BIDS, either go to the Variables window or go to the Package Explorer tab and expand the Variables folder.

System Variables

The preconfigured variables provided in Integration Services are called system variables.
While you create user-defined variables to meet the needs of your packages, you cannot create additional system variables. They are read-only; however, you can configure them to raise an event when they change their value. System variables store informative values about the packages and their objects, which can be used in expressions to customize packages, containers, tasks, and event handlers. Different containers have different system variables available to them. For example, PackageID is available in the package scope, whereas TaskID is available in the Data Flow Task scope. Some of the more frequently used system variables for different containers are defined in Table 3-1.

Using these system variables, you can extract interesting information from the packages on the fly. For example, at run time you can use system variables to log who started which package at what time. This is exactly what you are going to do in the following Hands-On exercise.

Hands-On: Using System Variables to Create Custom Logs

This exercise demonstrates how you can create a custom log for an Integration Services package.