408 Hands-On Microsoft SQL Server 2008 Integration Services 34. Go to Connection page, click the New button next to the OLE DB Connection Manager field, and choose localhost.Campaign from the Data Connections list; then click OK to add this connection manager. 35. Select the Use results of an SQL query option and type in the following SQL query. SELECT [City], UPPER([Postcode]) Postcode FROM [Campaign].[dbo].[PostalCity] 36. Go to the Columns page and map the Postcode column in the Available Input Columns list to the Postcode in the Available Lookup columns. Figure 10-5 Configurations of Character Map transformation Chapter 10: Data Flow Transformations 409 37. Click in the City Column check box to add this column as a new column in the output, as shown in Figure 10-6. 38. Click OK to close the Lookup transformation. Rename this transformation as Adding City Column. 39. From the Data Flow destinations section of the Toolbox, drag and drop the Flat File destination below the Adding City Column transformation. Drag the green arrow from the Adding City Column and drop it on the Flat File destination. This will open the Input Output Selection dialog box. Select Lookup No Match Output in the Output field and click OK to close. Figure 10-6 Adding a new column in the output using the Lookup transformation 410 Hands-On Microsoft SQL Server 2008 Integration Services 40. Double-click the Flat File destination to open its editor and click the New button next to Flat File Connection Manager. Select the Delimited Option radio button in the pop-up dialog box asking you to select a Flat File Format, and then click OK. This will open the Flat File Connection Manager Editor. 41. Type No Match Lookups in the Connection Manager Name field and C:\SSIS\ RawFiles\NoMatchLookups.txt in the File Name field. Click to check the Column names in the first data row option. 42. Go to the Columns page and check out that {CR}{LF} is selected as Row delimiter and Comma {,} is selected as Column delimiter. Go to the Advanced page and check out the properties of the columns. Note that the connection manager has correctly picked up the data types and the length of the columns. Click OK to close it and return to the Flat File Destination Editor. Leave the Overwrite Data in the file option selected. 43. Go to the Mappings page to create mappings between the Available Input Columns and Available Destination Columns and then click OK to close it. Rename Flat File destination as No Match Lookups File. Exercise (Delete Duplicates and Load PersonContact) In this part of the exercise, you will first delete the records from the PersonContact table that are to be updated and are also coming in the pipeline and will then insert all the records coming in the data flow into the PersonContact table. 44. Drop the OLE DB command from the Toolbox below the Adding City Column transformation. Connect both of these transformations using the green data flow path. 45. Double-click the OLE DB command to open the Advanced Editor for OLE DB Command. Choose localhost.Campaign in the Connection Manager field from the drop-down list in the Connection Managers tab. Go to the Component Properties tab. 46. Type Deleting Duplicates in the Name field. Type the following SQL statement in the SqlCommand field and click OK. DELETE PersonContact WHERE FirstName = ? AND LastName = ? 47. Go to Column Mappings page, where you will see Param_0 and Param_1 columns in the Available Destination Columns. These represent parameters that you’ve used in the preceding SQL statement. Map Param_0 to FirstName and Param_1 to LastName columns in the Available Input columns (see Figure 10-7) and click Refresh. This should remove any validation error appearing at the bottom of the Advanced Editor. Click OK to close. 48. Drop an OLE DB destination on the Designer surface and join it with the green data flow path from the Deleting Duplicates transformation. Chapter 10: Data Flow Transformations 411 49. Double-click the OLE DB destination to open the editor. You will see localhost .Campaign selected in the OLE DB Connection Manager field, as this is the only OLE DB Connection Manager configured in the package. Leave “Table or view—fast load” selected in the Data access mode field. Select the [dbo]. [PersonContact] table from the drop-down list in the “Name of the table or the view” field. 50. Go to the Mappings page to create the required mappings. Click OK to close this. Rename this destination PersonContact. After some adjustments, your package should look like one shown in Figure 10-8. Press -- to save all the items in the project. Figure 10-7 Mapping parameters in the OLE DB Command transformation 412 Hands-On Microsoft SQL Server 2008 Integration Services Exercise (Execute the Package) Here, you will add a data viewer on the Data Flow path to see the data flowing through the pipeline at run time and execute the package. 51. You can add as many data viewers to the package as you like before executing the package, but as a minimum, add a data viewer before the Deriving Salutation component and before the Deleting Duplicates component. To add a data viewer, double-click the green line joining the two components to open the Data Flow Path Editor. Go to the Data Viewers page, click Add, and then click OK in the Configure Data Viewer dialog box to return; then click OK again to add a grid type data viewer. 52. Press 5 to execute this package. As the package is executed, you will see two data viewer output windows appear. If you adjust them on the screen (without Figure 10-8 Updating the PersonContact package Chapter 10: Data Flow Transformations 413 dropping them on each other) so that you can see the background package as well, you can notice that the package execution is halted till the place where the data viewer is added in the data flow. 53. In the beginning of the package execution, note that 20 rows from flat file and 30 rows from Excel file are combined by the Union All transformation. In the first data viewer, you can see that 50 records (see in the status bar at the bottom of the data viewer window) have been combined and are flowing as a single data collection. Detach this data viewer and see the data in the second data viewer, where the Salutation has been derived and the City column has been populated with city names. This data viewer shows only 46 records, as 4 records did not find an exact match in the lookup table and so were sent out to the No Match Lookups output—i.e., No Match Lookups File. Click Detach to complete the package execution. 54. Press -5 to stop debugging. Close the project and exit BIDS. Review In this first data flow Hands-On exercise, you used several components that are the basic building blocks in a data flow. You also saw how you can use these components one after another to convert data to match the succeeding components requirements, while keeping in line with the final destination’s requirements. You also captured records from the Lookup transformation for which there were no exact matches in the Lookup table. The concept of a Lookup table is extensively used to standardize data and to identify updates and inserts in the loading process. There are examples in Books Online that you can refer to understand more about configurations of lookup transformation in full cache mode or partial or no cache mode. Later in the exercise you have used an OLE DB Command transformation that uses parameters to map to input column values. If you are updating a table with a large number of columns, it will become confusing and troublesome to configure OLE DB transformation as the parameter names are not intuitive. Fortunately, this problem can be solved by using a stored procedure than using raw SQL in the SqlCommand property. For example, in the preceding case you could also create a stored procedure as follows: CREATE PROCEDURE uspDeletingDuplicates @FirstName VARCHAR(50), @LastName VARCHAR(50) AS BEGIN SET NOCOUNT ON; DELETE PersonContact WHERE FirstName = @FirstName AND LastName = @LastName END 414 Hands-On Microsoft SQL Server 2008 Integration Services And you could execute the stored procedure using SqlCommand field as EXEC dbo.uspDeletingDuplicates ?, ? And in Column Mappings tab, you could have mapped columns using the variable names as shown in Figure 10-9. As you can make out, it is much easier to understand and maintain. Figure 10-9 Updating the PersonContact package Chapter 10: Data Flow Transformations 415 Rowset Transformations The Rowset transformations work on record sets. These transformations first receive all the rows and then do the data processing as the operations they perform need all the rows upfront. Among these are the Aggregate transformation, Sort transformation, Percentage and Row Sampling transformations, and Pivot and Unpivot transformations. In this section, you will work through two Hands-On exercises built around these transformations—one built around the Aggregate transformation and the other one built around the Pivot transformation, in which you will also use the Sort transformation along with other data flow components. Sort Transformation The Sort transformation allows you to sort input rows in ascending or descending order by selecting one or more input columns for sort order criteria, similar to the ORDER BY clause of T-SQL. To sort the records, as you can imagine, this transformation requires collecting all the rows before applying sorting order. While sifting through the records, this transformation can also look for duplicate records with the same sort key values and can “de-dupe” them. The Sort transformation supports one input and one output to perform its operation. The user interface of this transformation is simple and has a list of Available Input Columns in the upper half of the dialog box, along with check boxes before and after the column names. The selection of the check boxes after the column names—i.e., the Pass Through check boxes—allow the columns to be included in the sorted output; the selection of check boxes before the column names allow you to sort the input records on those columns. The lower half of the dialog has five fields to specify the columns you want to work with and the criteria to apply for sort order. When you open the Sort Transformation Editor, you will realize that all the Pass Through check boxes have already been selected. This means by default all the columns will be sent to the Sort Transformation output columns. To configure the sort order, you select the input columns by clicking the check boxes before the Available Input Columns. Alternatively, you can select a column from the drop-down list invoked by clicking in the Input Column field. As you select an input column, notice that Output Alias, Sort Type, and Sort Order are automatically assigned to this column, though you can modify these values. The default value for the Sort Type is ascending, but it can be changed to descending. The Sort Type setting applies a sorting order to the data in the selected column and has nothing to do with the Sort Order field, which applies a sort order to the columns. The Sort Order is a numerical value assigned to a column on the basis of its position in the list of columns selected for sorting. This value starts with 1 for the first column and is increased by 1 for subsequent columns selected. The Sort Order value 416 Hands-On Microsoft SQL Server 2008 Integration Services determines which column is sorted first. The column with the smaller value is always sorted before a column with bigger Sort Order value—i.e., the column with a Sort Order value of 1 will be sorted before the column with a Sort Order value of 2, and so on. The Sort Type and Sort Order are represented by a single property of the input column in the Advanced Editor. If you open the Advanced Editor for Sort Transformation, go to the Input and Output Properties tab, expand the Sort Input and then expand Input columns, you will see a list of input columns that are available to the transformation and have been selected to pass through. If you click any of the input columns and scroll to the bottom of its properties, you will see the property NewSortKeyPosition, which holds a positive or negative value. The positive value indicates that the sort type is ascending, and a negative value indicates that the sort type is descending. The numerical value— i.e., 1, 2, or 3 and so on—represents the sort order property of customer user interface. A value of 0 indicates that the column is not included in the sort criteria; in fact, these are the columns that have only Pass Through check boxes selected against them. The input columns that are not included in the sort criteria, but are selected to pass through, are copied along with the sorted columns to the output columns. Once the Sort transformation collects all the rows, it can apply the sorting criteria to the data rows. A sort operation can be an expensive process and can cause performance issues. You need to decide what you want to achieve before using this transformation; for example, do you want to provide all the server power to the sort process or want to limit usage of resources for sort operation? In the Advanced Editor for Sort Transformation, you can specify the maximum number of threads that can be used by the Sort transformation in the MaximumThreads property in the Component Properties tab. The default value for this property is –1, which indicates an infinite number of threads available to the transformation. Returning to the Sort Transformation Editor, where you’ve selected input columns and specified sort criteria, notice the Comparison Flags field. If you click in this field, you will see a list of options. The purpose of this field is to specify how the sorting should handle the data comparison. As you sort columns, the Sort transformation compares the data to allocate a proper sort order for the data rows. You can specify the options in this field that can affect the sensitivity of the comparison of the data. For example, you can choose the Ignore Case option to specify that uppercase and lowercase data is to be treated equally. Six options are available, and you can choose more than one option here: Ignore Case c You choose this option to specify that while comparing data, the Sort transformation will not distinguish between uppercase and lowercase letters. Ignore Kana Type c If you are using the Japanese language, selecting this option requires the Sort transformation to ignore the distinction between hiragana and katakana, the two types of Japanese kana characters. For the benefit of those who Chapter 10: Data Flow Transformations 417 are less acquainted with Japanese, kana is a general term used to express the two types of Japanese syllabic scripts. Hiragana is a cursive and flowing variety of kana used in most modern Japanese texts, and katakana is a relatively angular kana used for writing foreign words or official documents such as telegrams. Ignore Nonspacing Characters c e data you deal with may contain diacritics, especially in the age of Internet when the data entry is left in the hands of end users across the globe. A diacritic is a mark added to a letter to indicate a special phonetic value—such as the acute accent of resumé. Selecting the check box for this option treats the spacing characters and diacritics alike. Ignore Character Width c Using this option allows you to treat the single-byte characters (non-Unicode) and double-byte characters (Unicode characters) alike. Integration Services may automatically convert the data to Unicode data before comparing text data. Ignore Symbols c Sometimes it is useful to compare the string data by eliminating the symbols and white-space characters. is may be due to poor data quality or standardization, again due to free style data entered by users. e Sort transformation ignores the symbols when you select this option and treats HNO# and HNO as identical. Sort Punctuation As Symbols c You can configure the Sort transformation to remove all punctuation symbols except hyphens and apostrophes appended before the string data before comparison. Using this option treats .NET and NET as identical. The sort criteria you specify for the input column generates a sort key value used to compare string data and sort it appropriately. If duplicate rows are included and you want to remove these, you can do so by selecting the “Remove rows with duplicate sort values” option provided on the lower-left side of the Sort Transformation Editor. If you can extract data from the source in a sorted way, then you can avoid using Sort transform. However, some components in data flow need to know that the data is sorted or not and on what fields; for example, the Merge Join transformation will need this information before joining data from two different sources. You can do so by using the IsSorted and SortKeyPosition advanced properties on source adapters. Refer to Chapter 15, particularly Figure 15-4 and Figure 15-5 and the explanation around them, to understand how to configure them. Percentage Sampling Transformation When you need to give out data to call centers for telesales activities, you are generally asked to create a sample set from a data segmentation. Sometimes the requirement is defined as a percentage—for example, you may be asked to create a sample set of . @FirstName AND LastName = @LastName END 414 Hands-On Microsoft SQL Server 2008 Integration Services And you could execute the stored procedure using SqlCommand field as EXEC dbo.uspDeletingDuplicates. 10-6 Adding a new column in the output using the Lookup transformation 410 Hands-On Microsoft SQL Server 2008 Integration Services 40. Double-click the Flat File destination to open its editor and. project. Figure 10-7 Mapping parameters in the OLE DB Command transformation 412 Hands-On Microsoft SQL Server 2008 Integration Services Exercise (Execute the Package) Here, you will add a data viewer