To train data mining models using this destination, you need a connection to SQL Server Analysis Services, where the mining structure and the mining models reside. For this, you can use an Analysis Services Connection Manager to connect to an instance of Analysis Services or to an Analysis Services project. The Data Mining Model Training Editor has two tabs, Connection and Columns, in which you configure the required properties. In the Connection tab, you specify the connection manager for Analysis Services in the Connection Manager field and then select the mining structure that contains the mining models you want to train with this data. Once you select a mining structure in the Mining structure field, the list of mining models is displayed in the Mining models area, and this destination adapter trains all the models contained within the specified mining structure. In the Columns tab, you map the available input columns to the mining structure columns. Processing a mining model requires sorted input data, which you can achieve by adding a Sort transformation before the Data Mining Model Training destination.

DataReader Destination

When your ADO.NET-compliant application needs to access data from the data flow of an SSIS package, you can use the DataReader destination. Integration Services can serve data straight from the pipeline to your ADO.NET application, so processing happens on demand when the application requests it through the ADO.NET DataReader interface; the SSIS data processing extension makes the data available via the DataReader destination. An excellent use of the DataReader destination is as a data source for an SSRS report. The DataReader destination doesn't have a custom UI but uses the Advanced Editor to expose all its properties, organized in three tabs. You can specify the Name, Description, LocaleID, and ValidateExternalMetadata properties in the Common Properties section of the Component Properties tab. In the Custom Properties section, you can specify a ReadTimeout value in milliseconds, and if this value is exceeded, you can choose to fail the component via the FailOnTimeout field. In the Input Columns tab, you can select the columns you want to output, assign each of them an output alias, and specify a usage type of READONLY or READWRITE from the drop-down list box. Finally, the Input And Output Properties tab lists only the input column details, as the DataReader destination has only one input and no error output.
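To illustrate the consumer side, the following is a minimal sketch of an ADO.NET client reading rows from a DataReader destination through the SSIS data processing extension (the DtsConnection and DtsCommand classes in the Microsoft.SqlServer.Dts.DtsClient namespace). The package path and the destination name DataReaderDest are hypothetical, and the dtexec-style connection string switch should be verified against your installation.

```csharp
using System;
using System.Data;
using Microsoft.SqlServer.Dts.DtsClient;   // SSIS data processing extension

class ReadFromPackage
{
    static void Main()
    {
        using (DtsConnection connection = new DtsConnection())
        {
            // The connection string uses dtexec-style switches to locate the package (hypothetical path).
            connection.ConnectionString = @"-f C:\SSIS\ExtractCustomers.dtsx";
            connection.Open();

            DtsCommand command = new DtsCommand(connection);
            // CommandText is the name of the DataReader destination inside the package (hypothetical name).
            command.CommandText = "DataReaderDest";

            // Executing the command runs the package and streams rows as they reach the destination.
            using (IDataReader reader = command.ExecuteReader(CommandBehavior.Default))
            {
                while (reader.Read())
                {
                    Console.WriteLine(reader[0]);
                }
            }
        }
    }
}
```

Because the rows are streamed while the package executes, the client starts receiving data as soon as the pipeline delivers it to the destination.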
Dimension Processing Destination

One of the frequent uses of Integration Services is to load data warehouse dimensions using the dimension processing destination. This destination can be used to load and process an SQL Server Analysis Services dimension. Being a destination, it has one input and no output, and it does not support an error output. The dimension processing destination has a custom user interface, but the Advanced Editor can also be used to modify properties that are not available in the custom editor. In the Dimension Processing Destination Editor, the properties are grouped logically in three pages. In the Connection Manager page, you specify the connection manager for Analysis Services to connect to the Analysis Services server or an Analysis Services project. Using this connection manager, the Dimension Processing Destination Editor reads all the dimensions in the source and displays them as a list for you to select the one you want to process. Next you choose the processing method from the add (incremental), full, or update options. In the Mappings page, you can map the Available Input Columns to the Available Destination Columns using a drag-and-drop operation. The Advanced page allows you to configure error handling in the dimension processing destination. You can choose from several options to configure the way you want errors to be handled:

- By default, this destination uses the default Analysis Services error handling, which you can change by clearing the Use Default Error Configuration check box.
- When the dimension processing destination processes a dimension to populate values from the underlying columns, an unacceptable key value may be encountered. In such cases, you can use the Key Error Action field to specify that the record be discarded by selecting the DiscardRecord value, or you can convert the unacceptable key value to the UnknownMember value. UnknownMember is a property of the Analysis Services dimension indicating that the supporting column doesn't have a value.
- Next you can specify the processing error limits and choose to either ignore errors or stop on error. If you select the Stop On Error option, you can specify the error threshold using the Number Of Errors option. You can also specify the on-error action, either to stop processing or to stop logging when the error threshold is reached, by selecting the StopProcessing or StopLogging value.
- You can also configure specific error conditions such as these:
  - When the destination raises a Key Not Found error, you can select IgnoreError or ReportAndStop; by default, it is ReportAndContinue.
  - Similarly, you can configure the Duplicate Key error, for which the default action is IgnoreError. You can set it to ReportAndStop or ReportAndContinue if you wish.
  - When a null key is converted to the UnknownMember value, you can choose ReportAndStop or ReportAndContinue. By default, the destination will IgnoreError.
  - When a null key value is not allowed in the data, this destination will ReportAndContinue by default. However, you can set it to IgnoreError or ReportAndStop.
- You can specify a path for the error log using the Browse button.
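For context on what the add (incremental), full, and update methods do, the same processing options exist when you process a dimension directly against Analysis Services, for example through Analysis Management Objects (AMO). The following is a minimal sketch of that separate mechanism, not the destination's own behavior; the server, database, and dimension names are hypothetical, and the ProcessType values shown are the usual AMO equivalents.

```csharp
using Microsoft.AnalysisServices;   // AMO (Analysis Management Objects)

class ProcessDimensionSketch
{
    static void Main()
    {
        Server server = new Server();
        server.Connect("Data Source=localhost");                               // hypothetical server

        Database database = server.Databases.FindByName("AdventureWorksDW");   // hypothetical database
        Dimension dimension = database.Dimensions.FindByName("Customer");      // hypothetical dimension

        // ProcessAdd    ~ add (incremental): inserts new members only
        // ProcessFull   ~ full: discards and fully rebuilds the dimension
        // ProcessUpdate ~ update: rereads the data and applies member changes
        dimension.Process(ProcessType.ProcessUpdate);

        server.Disconnect();
    }
}
```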
Excel Destination

Using the Excel destination, you can output data straight to worksheets or ranges in an Excel workbook. You use an Excel Connection Manager to connect to the workbook. Like the Excel source, the Excel destination treats the worksheets and ranges in an Excel workbook as tables or views. The Excel destination has one regular input and one error output. This destination has its own custom user interface that you can use to configure its properties; the Advanced Editor can also be used to modify the remaining properties. The Excel Destination Editor lists its properties in three pages. In the Connection Manager page, you can select the name of the connection manager from the drop-down list in the OLE DB Connection Manager field. Then you can choose one of these three data access mode options:

- Table or view: Lets the Excel destination load data into an Excel worksheet or named range; specify the name of the worksheet or the range in the Name Of The Excel Sheet field.
- Table name or view name variable: Works like the Table Or View option except that the name of the table or view is contained within a variable that you specify in the Variable Name field.
- SQL command: Allows you to load the results of an SQL statement into an Excel file.

In the Mappings page, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation. In the Error Output page, you can configure the behavior of the Excel destination for errors and truncations; for each column, you can ignore the failure, redirect the row, or fail the component when an error or a truncation occurs.

Flat File Destination

Every now and then you may need to output data from disparate sources to a text file, as this is often the most convenient way to share data with external systems. You can build an Integration Services package to connect to those disparate sources, extract data using customized extraction rules, and output the required data set to a text file using the flat file destination adapter. This destination requires a Flat File Connection Manager to connect to a text file. When you configure a Flat File Connection Manager, you also configure various properties that specify the type of the file and how the data will reside in it. For example, you can choose the format of the file to be delimited, fixed width, or ragged right (also called mixed format). You also specify how the columns and rows will be delimited and the data type of each column. In this way, the Flat File Connection Manager provides a basic structure for the file, which the destination adapter uses as is. This destination has one input and no error output. The Flat File destination has a simple customized user interface, though you can also use the Advanced Editor to configure some of the properties. In the Flat File Destination Editor, you specify the connection manager you want to use for this destination in the Flat File Connection Manager field and select the "Overwrite data in the file" check box if you want to overwrite the existing data in the flat file. Next you are given an opportunity to provide a block of text in the Header field, which is added before the data as a header to the file. In the Mappings page, you can map Available Input Columns to the Available Destination Columns.

OLE DB Destination

You can use the OLE DB destination when you want to load your transformed data into OLE DB-compliant databases, such as Microsoft SQL Server, Oracle, or Sybase database servers. This destination adapter requires an OLE DB Connection Manager with an appropriate OLE DB provider to connect to the data destination. The OLE DB destination has one regular input and one error output. This destination adapter has a custom user interface that can be used to configure most of the properties; alternatively, you can use the Advanced Editor. In the OLE DB Destination Editor, you specify an OLE DB connection manager in the Connection Manager page. If you haven't configured an OLE DB Connection Manager in the package yet, you can create a new connection by clicking New. Once you've specified the OLE DB Connection Manager, you can select the data access mode from the drop-down list. Depending on the option you choose, the editor interface changes to collect the relevant information.
Here you have five options to choose from:

- Table or view: You can load data into a table or view in the database specified by the OLE DB Connection Manager. Select the table or the view from the drop-down list in the Name Of The Table Or The View field. If you don't already have a table in the database where you want to load data, you can create a new table by clicking New. When you click New, a CREATE TABLE statement is generated for you. The columns use the same data types and lengths as the input columns, which you can change if you want. However, if you provide the wrong data type or a shorter column length, you will not be warned and may get errors at run time. If you are happy with the CREATE TABLE statement, all you need to do is provide a table name by replacing the [OLE DB Destination] string after CREATE TABLE in the SQL statement.
- Table or view - fast load: The data is loaded into a table or view as in the preceding option; however, when you select the fast load data access mode, you can configure the following additional options (a short programmatic sketch of these settings follows this section):
  - Keep identity: During loading, the OLE DB destination needs to know whether to keep the identity values coming in the data or to assign unique values itself to the columns configured with an identity key.
  - Keep nulls: Tells the OLE DB destination to keep the null values in the data.
  - Table lock: Acquires a table lock during the bulk load operation to speed up the loading process. This option is selected by default.
  - Check constraints: Checks the constraints on the destination table during the data loading operation. This option is selected by default.
  - Rows per batch: Specifies the number of rows in a batch. The loading operation handles the incoming rows in batches, and the setting in this box affects the buffer size, so you should test a suitable value for this field based on the memory available to the process at run time on your server.
  - Maximum insert commit size: Specifies the maximum number of rows that the OLE DB destination commits as one unit during loading. The default value of 2147483647 means that this many rows are considered a single batch and handled together, that is, they commit or fail as a single batch. Use this setting carefully, taking into consideration how busy your system is and how many rows you want to handle in a single batch. A smaller value means more commits, so the overall load takes longer; on a transactional server hosting other applications, this can be a good way to share resources. On a dedicated reporting or data mart server, or when you load at a time when other activity on the server is light, a higher value reduces the overall loading time.

Make sure you use the fast load data access mode when loading double-byte character set (DBCS) data; otherwise, you may get corrupted data loaded into your table or view. DBCS is a character set in which each character is represented by two bytes. Environments using ideographic writing systems such as Japanese, Korean, and Chinese use DBCS, as these languages contain more characters than can be represented by 256 code points. These double-byte characters are commonly called Unicode characters.
Examples of data types that support Unicode data in SQL Server are nchar, nvarchar, and ntext, whereas Integration Services has the DT_WSTR and DT_NTEXT data types to support Unicode character strings.

- Table name or view name variable: This data access mode works like the table or view access mode except that you supply the name of a variable, in the Variable Name field, that contains the name of the table or the view.
- Table name or view name variable - fast load: This data access mode works like the table or view - fast load access mode except that you supply the name of a variable, in the Variable Name field, that contains the name of the table or the view. You still specify the fast load options in this data access mode.
- SQL command: Loads the result set of an SQL statement. You can provide the SQL query in the SQL Command Text dialog box or build a query by clicking Build Query.

In the Mappings page, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation, and in the Error Output page, you can specify the behavior when an error occurs.
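The fast load options described above surface as custom properties on the OLE DB destination (AccessMode, OpenRowset, FastLoadKeepIdentity, FastLoadKeepNulls, and FastLoadOptions), which you normally set in the editor but can also set when building a package programmatically. The following is a minimal sketch, not a complete loading package: the ProgID string, the numeric AccessMode value for "OpenRowset Using FastLoad," and the table name are assumptions you should verify for your installation, and the component would still need a connection manager, a source, and column mappings before it could run.

```csharp
// Requires references to Microsoft.SQLServer.ManagedDTS and Microsoft.SqlServer.DTSPipelineWrap.
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class BuildOleDbDestination
{
    static void Main()
    {
        Package package = new Package();
        Executable exec = package.Executables.Add("STOCK:PipelineTask");
        MainPipe pipeline = (MainPipe)((TaskHost)exec).InnerObject;

        // Add an OLE DB destination to the data flow.
        IDTSComponentMetaData100 destination = pipeline.ComponentMetaDataCollection.New();
        destination.ComponentClassID = "DTSAdapter.OLEDBDestination.2";   // assumed ProgID; verify for your version
        CManagedComponentWrapper instance = destination.Instantiate();
        instance.ProvideComponentProperties();

        // Fast load settings; 3 is assumed to map to the "OpenRowset Using FastLoad" access mode.
        instance.SetComponentProperty("AccessMode", 3);
        instance.SetComponentProperty("OpenRowset", "[dbo].[FactSales]");   // hypothetical table
        instance.SetComponentProperty("FastLoadKeepIdentity", false);
        instance.SetComponentProperty("FastLoadKeepNulls", false);
        instance.SetComponentProperty("FastLoadOptions", "TABLOCK,CHECK_CONSTRAINTS,ROWS_PER_BATCH = 1000");
    }
}
```

In a real build you would also attach the OLE DB connection manager and call AcquireConnections and ReinitializeMetaData on the wrapper so the destination picks up the table's columns for mapping.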
Partition Processing Destination

The partition processing destination is used to load and process an SQL Server Analysis Services partition and works like the dimension processing destination. This destination has a custom user interface that is similar to the one for the dimension processing destination. This destination adapter requires an Analysis Services Connection Manager to connect to the cubes and their partitions that reside on an Analysis Services server or in an Analysis Services project. The Partition Processing Destination Editor has three pages to configure properties. In the Connection Manager page, you can specify an Analysis Services Connection Manager and choose from the three processing methods: Add (incremental) for incremental processing; Full, which is the default option and performs full processing of the partition; and Data only, which performs update processing of the partition. In the Mappings page, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation. In the Advanced page, you can configure error-handling options for the various types of errors that can occur; these options are similar to those available on the Advanced page of the dimension processing destination.

Raw File Destination

Sometimes you may need to stage data between processes, for which you will want to extract data at the fastest possible speed. For example, if you have multiple packages that work on a data set one after another, that is, a package needs to export the data at the end of its operation for the next package to continue working on it, a raw file destination and raw file source combination can be an excellent choice. The raw file destination writes raw data to the destination raw file in an SSIS-native form that doesn't require translation. This raw data can be imported back into the system using the raw file source discussed earlier. Using the raw file destination to export data and the raw file source to import it back results in high performance for the staging or export/import operation. However, if you have binary large object (BLOB) data that needs to be handled in this fashion, the raw file destination cannot help you, as it doesn't support BLOB objects. The Raw File Destination Editor has two pages to expose the configurable properties. The Connection Managers page allows you to select an access mode, File name or File name from variable, to specify how the filename information is provided. You can either specify the filename and path in the File Name field directly or use a variable to pass these details. Note that the Raw File destination doesn't use a connection manager to connect to the raw file, so you don't specify a connection manager on this page; it connects to the raw file directly using the specified filename or by reading the filename from a variable. Next, you can choose from the following four options to write data to a file in the Write Option field:

- Append: Lets you use an existing file and append data to the data it already contains. This option requires that the metadata of the appended data match the metadata of the existing data in the file.
- Create Always: This is the default option and always creates a new file using the filename details provided either directly in the File Name field or indirectly in a variable specified in the Variable Name field.
- Create Once: When you use the data flow inside repeating logic, that is, inside a loop container, you may want to create a new file in the first iteration of the loop and then append data to the file in the second and later iterations. You can achieve this by using this option.
- Truncate And Append: If you have an existing raw file that you want to write the data into, but want to delete the existing data before the new data is written, you can use this option to truncate the existing file first and then append the data to it.

In all these options, wherever you use an existing file, the metadata of the data being loaded to the destination must match the metadata of the specified file. In the Columns tab, you can select the columns you want to write into the raw file and assign them an output alias as well.

Recordset Destination

Sometimes you may need to take a record set from the data flow and pass it to other elements in the package. Of course, in this instance you do not want to write to external storage and then read from it unnecessarily. You can achieve this by using a variable and the recordset destination, which populates an in-memory ADO record set into the variable at run time. This destination adapter doesn't have its own custom user interface but uses the Advanced Editor to expose its properties. When you double-click this destination, the Advanced Editor for Recordset Destination opens and displays properties organized in three tabs. In the Component Properties tab, you specify the name of the variable that will hold the record set in the Variable Name field. In the Input Columns tab, you can select the columns you want to extract to the variable and assign an alias to each of the selected columns, along with specifying whether each is a read-only or a read-write column. As this destination has only one input and no error output, the Input And Output Properties tab lists only the input columns.
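Downstream of the data flow, the variable holds an ADO record set that other package elements can consume, for example a Foreach Loop using the Foreach ADO enumerator, or a Script Task. The following is a minimal sketch of the Script Task approach; the variable name User::CustomerRows is hypothetical, and the code assumes you have listed that variable as a read-only variable for the task.

```csharp
// Body of a Script Task's Main() method (SSIS 2008, C#).
// Add these namespaces at the top of the generated ScriptMain class:
//   using System.Data;
//   using System.Data.OleDb;

public void Main()
{
    var table = new DataTable();
    var adapter = new OleDbDataAdapter();

    // Fill a DataTable from the ADO record set stored in the package variable.
    adapter.Fill(table, Dts.Variables["User::CustomerRows"].Value);

    foreach (DataRow row in table.Rows)
    {
        // Work with each record; here we just raise an informational event.
        bool fireAgain = true;
        Dts.Events.FireInformation(0, "Recordset demo", row[0].ToString(), string.Empty, 0, ref fireAgain);
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
```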
Script Component Destination

You can use the script component as a data flow destination when you choose Destination in the Select Script Component Type dialog box. When deployed as a destination, this component supports only one input and no output because, as you know, data flow destinations don't have an output. The script component as a destination is covered in detail in Chapter 11.
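To give a flavor of what Chapter 11 covers, here is a minimal sketch of the code a script component destination might contain. The designer generates the ScriptMain class and the Input0Buffer type from the input columns you select; the column name CustomerName and the output path are hypothetical.

```csharp
using System.IO;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

public class ScriptMain : UserComponent
{
    private StreamWriter writer;

    public override void PreExecute()
    {
        base.PreExecute();
        writer = new StreamWriter(@"C:\Temp\Customers.txt");   // hypothetical target file
    }

    // Called once for every row that reaches the destination; there is no output buffer to write to.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        writer.WriteLine(Row.CustomerName);
    }

    public override void PostExecute()
    {
        writer.Close();
        base.PostExecute();
    }
}
```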
SQL Server Compact Destination

Integration Services also provides an SQL Server Compact destination, enabling your packages to write data straight to an SQL Server Compact database table. This destination uses the SQL Server Compact Connection Manager to connect to an SQL Server Compact database. The SQL Server Compact Connection Manager lets your package connect to a compact database file, and then you can specify the table you want to update in the SQL Server Compact destination. You need to create an SQL Server Compact Connection Manager before you can configure an SQL Server Compact destination. This destination does not have a custom user interface and hence uses the Advanced Editor to expose its properties. When you double-click this destination, the Advanced Editor for SQL Server Compact Destination opens with four tabs. Choose the connection manager for a Compact database in the Connection Manager tab. Specify the table you want to update in the Table Name field under the Custom Properties section of the Component Properties tab. In the Column Mappings tab, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation. The Input and Output Properties tab shows the External Columns and Input Columns in the Input collection and the Output Columns in the Error Output collection. The SQL Server Compact destination has one input and supports an error output.

SQL Server Destination

We have looked at two different ways to import data into SQL Server: using the Bulk Insert task in Chapter 5 and the OLE DB destination earlier in this chapter. Though both are capable of importing data into SQL Server, they suffer from some limitations. The Bulk Insert task is a faster way to import data, but it is part of the control flow, not the data flow, and doesn't let you transform data before the import. The OLE DB destination is part of the data flow and lets you transform the data before the import; however, it isn't the fastest way to import data into SQL Server. The SQL Server destination combines the benefits of both components: it lets you transform the data before the import and uses the speed of the Bulk Insert task to import data into local SQL Server tables and views. The SQL Server destination can write data to a local SQL Server only. So, if you want to import data faster into an SQL Server table or view on the same server where the package is running, use an SQL Server destination rather than an OLE DB destination. Being a destination adapter, it has one input only and does not support an error output. The SQL Server destination has a custom user interface, though you can also use the Advanced Editor to configure its properties. In the Connection Manager page of the SQL Destination Editor, you can specify a connection manager, a data source, or a data source view in the Connection Manager field to connect to an SQL Server database. Then select a table or view from the drop-down list in the Use A Table Or View field. You also have the option to create a new connection manager or a new table or view by clicking the New buttons provided. In the Mappings page, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation. You specify the bulk insert options in the Advanced page of the SQL Destination Editor dialog box. You can configure the following ten options in this page:

- Keep identity: This option is not checked by default. Check this box to keep the identity values coming in the data rather than using the unique values assigned by SQL Server.
- Keep nulls: This option is not checked by default. Check this box to retain the null values.
- Table lock: This option is checked by default. Uncheck it if you don't want to lock the table during loading; the lock may impact the availability of the tables being loaded to other applications or users. If you want to allow concurrent use of the SQL Server tables being loaded by this destination, uncheck this box; however, if you are running this package at a quiet time, that is, when no other applications or users are accessing the tables being loaded, or you do not want to allow concurrent use of those tables, it is better to leave the default setting.
- Check constraints: This option is checked by default, which means any constraint on the table being loaded will be checked during loading. If you're confident the data being loaded does not break any constraints and you want a faster import, you may uncheck this box to save the processing overhead of checking constraints.
- Fire triggers: This option is not checked by default. Check this box to let the bulk insert operation execute insert triggers on the target tables during loading. Executing insert triggers on the destination table may affect the performance of the loading operation.
- First row: Specify a value for the first row from which the bulk insert will start.
- Last row: Specify a value in this field for the last row to insert.
- Maximum number of errors: Provide a value for the maximum number of rows that can fail to import because of errors in the data before the bulk insert operation stops. Leave the First Row, Last Row, and Maximum Number Of Errors fields blank to indicate that you do not want to specify any limits; if you're using the Advanced Editor, use a value of -1 to indicate the same.
- Timeout: Specify the number of seconds in this field before the bulk insert operation times out.
- Order columns: Specify a comma-delimited list of columns in this field to sort the data on, in ascending or descending order.

Data Flow Paths

First, think of how you connect tasks in the control flow. You click the first task in the control flow to highlight it and display a green arrow representing output from the task. Then you drag the green arrow onto the next task in the work flow to create a connection between the tasks, represented by a green line by default. The green line, called a precedence constraint, enables you to define the conditions under which the following tasks can execute. In the data flow, you connect the components in much the same way.