Hands-On Microsoft SQL Server 2008 Integration Services (Part 41)
Chapter 10: Data Flow Transformations
Most of the business requirements will need you to work with data and make it suitable for deriving some business value out of it. The data and schema are modified in Integration Services using data flow transformations. In this chapter, you will explore the preconfigured transformations in detail. You will also work through Hands-On exercises that cover most of the transformations from the perspective of their usability. As you progress through the chapter, the Hands-On exercises will include progressively more complex transformations and business scenarios, and of course they will be more interesting as well. To keep such a progression and your interest in the chapter, the transformations may not appear in the order in which they appeared in Chapter 9. We'll start with the easiest ones first.

Row Transformations

Row transformations are simple compared to other types of transformations. Starting with the simplest one, the Copy Column transformation, which copies an input column to output columns, we will study the Character Map transformation, the Data Conversion transformation, and then the Derived Column transformation. The next two transformations deal with columnar data: the Export Column transformation exports partial data from the pipeline into a file, and the Import Column transformation reads data from a file and adds it into the pipeline. The complexity of the functions these transformations can perform increases as we go, until we reach the last two transformations in this category, Script Component and OLE DB Command, which can be programmed to perform the functions you want. Let's start our journey with the Copy Column transformation.

Copy Column Transformation

The Copy Column transformation is probably the simplest of the transformations available in SQL Server Integration Services. It does precisely what it says: it copies a column in the data flow to add a new column to the data flow. Sometimes you may need to output a column twice for other applications, and as SSIS does not provide the flexibility of mapping a column twice in the destination adapters, you would use this transformation to fill the gap. The new column can then be given a name within the transformation. To perform the desired function, this transformation has been designed to support one input and one output.

The Copy Column transformation has a custom user interface defined by the Copy Column Transformation Editor. In the editor, you can select the columns you want to copy either from the upper half, by checking the boxes in front of columns in the list under Available Input Columns, or from the drop-down list provided in the Input Column field. You can select multiple columns to be copied, or choose to create multiple copies of a column in the same transformation. The selected columns will be copied and added as new columns to the data flow output columns of this transformation. When you select a column in the Input Column field, the Output Alias shows the input column name prefixed with a Copy Of string by default. You can change this alias and specify a more appropriate name for the column. As this transformation does not cause any changes to the data schema or data quality, it doesn't introduce any errors in the data flow and hence doesn't have an error output.
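If it helps to think in set-based terms, the effect of this transformation is similar to the following T-SQL, which is only a minimal sketch with hypothetical table and column names:

    -- The set-based equivalent of Copy Column: same data surfaced twice
    SELECT CustomerName,
           CustomerName AS [Copy Of CustomerName]  -- copied into a new output column
    FROM   dbo.Customer;

The transformation does the same thing row by row inside the pipeline, without touching the source.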
Character Map Transformation

You use this transformation to apply string functions to string data type columns. Configuring this transformation and applying any of the following string functions to a column are simple operations. The converted data can then either be populated in a new column, or the transformation can perform an in-place data conversion. An in-place change means that the data will be modified in the existing column and no new column will be added to the data flow. The string operation you specify is applied row by row as the data flows through this transformation. To perform its function, this transformation supports one input, one output, and one error output.

To add the Character Map transformation to the data flow and open its editor, drop the transformation into your Data Flow task and join it with an upstream component before you double-click it. The user interface in the Character Map Transformation Editor is simple and intuitive. In the upper part, a list provides the Available Input Columns with a check box in front of each column. Select a column by clicking the check box, and a line is added in the lower half of the window with the selected column shown in the Input Column field. Alternatively, you can select a column for applying string operations by clicking in the Input Column field and selecting a column from the drop-down list. Note that the Character Map transformation can be used only for columns with string data types. If you configure this transformation with a column that has a data type other than string, validation errors will occur, with the error message stating that the input column has an unsupported data type.

The lower half of the transformation UI has four columns:

Input Column    Here you select the column to which you want to apply Character Map operations.

Destination    The Destination column allows you to specify whether you want the modified data to be copied into a New Column, which is the default option, or treated as an in-place change. You select either option by clicking in the column and choosing from the drop-down list. When you select the New Column value, the Output Alias field shows the column name prefixed with Copy Of for the new column, and when you choose In-Place Change, the Output Alias shows the column name as is.

Operation    The Operation column allows you to select any of the string operations listed in the following table. You can select multiple operations by clicking multiple check boxes in the list; the Operation column will show the selected operations in a comma-delimited list. However, there are some restrictions on the selection of multiple operations on the same column; for example, you cannot select the Uppercase and Lowercase operations to be applied to the same column in one transformation. Refer to Microsoft SQL Server 2008 Books Online for more details on these restrictions.

Operation            Description
Lowercase            Converts the characters of a column to lowercase.
Uppercase            Converts the characters of a column to uppercase.
Byte reversal        Reverses the byte order of the column data.
Hiragana             Converts katakana characters of the column data to hiragana characters.
Katakana             Converts hiragana characters of the column data to katakana characters. Both hiragana and katakana are part of the Japanese syllabary.
Half width           Converts full-width characters of the column data to half-width characters.
Full width           Converts half-width characters of the column data to full-width characters.
Linguistic casing    Applies linguistic casing rules instead of the system rules.
Simplified Chinese   Converts traditional Chinese characters of the column data to simplified Chinese characters.
Traditional Chinese  Converts simplified Chinese characters of the column data to traditional Chinese characters.

Output Alias    This column is used to specify a name for the newly added data column in the data flow.

You can specify the error-handling behavior of this transformation by clicking the Configure Error Output button. As this component applies string operations to text columns, truncations can occur in the transformed data. For example, when you convert half-width data to full-width data, the column length may not support that and the data can be truncated. In such cases, you can use the Configure Error Output button to specify the action this component should take. When you click this button, the Configure Error Output dialog box opens, where you can specify whether errors should fail the component, be ignored, or redirect the failing rows to the error output for each of the input or output columns that you've selected or created in the transformation.
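Several of these operations have direct T-SQL counterparts, which can help when deciding whether to transform in the pipeline or at the source. A minimal sketch, assuming a hypothetical dbo.Customer table:

    -- New Column behavior: original column kept, converted copy added
    SELECT CustomerName,
           UPPER(CustomerName) AS [Copy Of CustomerName]  -- Uppercase operation
    FROM   dbo.Customer;

    -- In-Place Change behavior: existing column overwritten
    UPDATE dbo.Customer
    SET    CountryCode = LOWER(CountryCode);  -- Lowercase operation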
Data Conversion Transformation

When data is coming from disparate sources into a data warehouse or a data mart, data type mismatches can occur during the data upload process. One of the functions performed while loading data into a data warehouse or a data mart is to convert the data type of a column to one that matches the data type of the destination column. This transformation supports one input, one output, and one error output to perform its functions.

When data is read from the data source, a data type is assigned to the input column depending on the data and the data source, and it may not be exactly what you want. For example, a date column read from a text file is generally assigned a string data type, which needs to be converted into a date/time data type before loading into a destination that expects a date/time data type. This problem was handled in DTS 2000 using the CAST or CONVERT functions of T-SQL within the Execute SQL task. SSIS provides the Data Conversion transformation as a preconfigured component to perform this operation, though you can still write T-SQL code using the OLE DB Command transformation within the data flow.

The Data Conversion transformation allows you to convert the data type of an input column to a different data type and add the converted data to a new output column. During data conversion, the data of the input column is parsed and then converted into a new Integration Services data type before being written to a new output column. By now you know that Integration Services provides two types of parsing routines: the locale-insensitive fast parsing routine and the locale-sensitive standard parsing routine. You can specify the parsing routine for each of the output columns in the Advanced Editor for Data Conversion. To specify the parsing method, open the Advanced Editor for Data Conversion, go to the Input And Output Properties tab, and click the appropriate output column by expanding the Output Columns folder under Data Conversion Output. Then go to the Custom Properties section of the selected output column and choose between the False and True values for the FastParse option.
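For comparison with the DTS 2000 approach mentioned above, here is a minimal T-SQL sketch of the same kind of conversion, assuming a hypothetical staging table that holds the date as text:

    SELECT OrderID,
           CAST(OrderDateText AS datetime)       AS OrderDate,     -- CAST syntax
           CONVERT(datetime, OrderDateText, 120) AS OrderDateAlt   -- CONVERT, style 120 = yyyy-mm-dd hh:mi:ss
    FROM   staging.Orders;

The Data Conversion transformation performs the equivalent conversion row by row in the pipeline, writing the result to a new output column rather than modifying the input column.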
In the Data Conversion Transformation Editor, you can select the input columns you want to convert to a different data type either in the Input Column field or by selecting check boxes in the list of Available Input Columns. You can select multiple columns for conversion, or apply multiple conversions to the same column. As the converted data is copied into a new output column, you can define an output alias for the newly created column in the data flow. In the Data Type column, you can specify the data type to which you want to convert the selected input column; it lets you select a data type from the drop-down list of available Integration Services data types. Depending on the data type you select, you can set other attributes of the new data type, such as length, precision, scale, and code page. When specifying the code page for a new string data type output column while converting data between string data type columns, you must keep the code page of the new output column the same as that of the input column.

As the Data Conversion transformation creates new columns with modified data types and modifies the data flow, errors and truncations can occur in the data. To handle these, the transformation provides an error-handling mechanism via the Configure Error Output button. In the Configure Error Output dialog box, you will find a list of all the output columns that you have created in this transformation. You can specify whether to fail the component, ignore the error, or redirect the failing row, for both errors and truncations, for each of the columns.

Derived Column Transformation

The Derived Column transformation enables you to perform more complex derivations on input column data. Until now you've been performing simple operations on the data, such as copying a column, applying string operations using the Character Map transformation, or changing data types. With this transformation in your toolkit, you can derive your columns using complex derivation rules. Using this transformation, you will be able to perform operations such as deriving sensible information from a Notes or Comments column, concatenating two text strings from two columns or from a column and a variable to populate a single column, performing mathematical operations on a numerical column, using conditional derivations, and much more. This transformation is designed to have one input, one regular output, and one error output. As this transformation will be making changes to the data and the data schema, an error or a truncation can occur in the data. To handle such an instance, this transformation provides a Configure Error Output button to invoke the Configure Error Output dialog box, where you can specify whether to fail the transformation, ignore the error, or redirect the failing rows to an error output, for each derived column, for both errors and truncations.

This is a very versatile transformation within the Data Flow task that is used repeatedly within packages, and its capabilities also appear in other components and transformations. For instance, the transformations you have studied so far are special applications of a Derived Column transformation, though with a slightly different UI. The user interface of the Derived Column transformation resembles the Expression Builder dialog box. In the Derived Column field in the lower half of the window, you can either select the Add As New Column option to add the derived data into a new column or choose to replace an existing column. Depending on your choice, the Derived Column Name field will be filled in with a default name that you can change to a more appropriate name.
The Expression field is the most important field in the Derived Column Transformation Editor. Here you specify the expression used to derive the data. You can use any combination of variables, input columns, mathematical functions, string functions, date/time functions, null functions, type casts, and operators to build this expression. After specifying an expression to derive data, you can specify the data type, length, precision, scale, and code page for the data. Once you have successfully configured one row in the editor, you can add more rows to derive additional columns, and hence perform multiple derivations using the same transformation. The upper-left part of the window lists variables and input columns that you can drag and drop into the Expression field. Similarly, the upper-right part of the window lists functions and operators that can also be embedded in the expression simply by dragging and dropping. There are six categories of such functions and operators:

Mathematical Functions    Allow you to perform mathematical operations, such as returning the absolute value with ABS, rounding a numeric value to the closest integer using ROUND, or calculating the square of a numeric value using SQUARE.

String Functions    Allow you to perform string operations using functions such as FINDSTRING, SUBSTRING, TRIM, LOWER, UPPER, LEN, and RIGHT.

Date/Time Functions    Allow you to manipulate date and time values using functions such as DAY, MONTH, YEAR, DATEADD, DATEPART, and DATEDIFF.

Null Functions    Allow you to find out whether an expression is null using the ISNULL function, or return a null value of a specified data type using functions such as NULL(DT_WSTR, <length>).

Type Casts    Allow you to cast your expressions to a specified data type; for instance, the (DT_I4) cast will convert the result of your expression into a four-byte signed integer.

Operators    Allow you to perform various types of operations in your expressions, including add, subtract, multiply, divide, concatenate, equal, greater than or equal to, logical AND, logical OR, and the conditional operation. You will be using these versatile operators quite frequently, so make sure you acquaint yourself with all the available operators.

While configuring the Derived Column transformation is not difficult, writing expressions to derive the data can sometimes be challenging, especially for a newbie. So, here are some examples to get you started quickly. These examples are only for practice and do not cover most of the situations you might come across in real life; however, you might find them a good starting point.

To trim a column CustomerName, you can use either of the following expressions:

    TRIM([CustomerName])
    RTRIM(LTRIM([CustomerName]))

The following expression returns the position of the first semicolon (;) in the string:

    FINDSTRING("abcd;cc;c", ";", 1)

You can combine functions with other functions to solve more complex problems. For example, using SUBSTRING with FINDSTRING, you can truncate the preceding string to keep only the characters before the first semicolon:

    SUBSTRING("abcd;cc;c", 1, FINDSTRING("abcd;cc;c", ";", 1) - 1)

In the following example, you use simple operators to derive a PercentageProfit column and round off the result to a precision of 2:

    ROUND(((SalePrice - CostPrice) * 100) / CostPrice, 2)

The next example works on a MiddleName column. When you get a null value in this column and do not want to pass it through, but prefer to replace the null with a blank string value, you can use the ISNULL function along with a conditional operator to return a blank string:

    ISNULL(MiddleName) ? " " : MiddleName

The conditional operator evaluates a Boolean condition, which is on the left side of this expression, before the question mark: ISNULL(MiddleName). If it is true, the operator returns the first value, shown after the question mark; if the Boolean evaluates to false, it returns the second value, shown after the colon in the preceding expression. Another similar example is with numeric data, where DamageCost contains nulls but you want to pass a 0 instead:

    ISNULL(DamageCost) ? 0 : DamageCost
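Note that ISNULL in the SSIS expression language only tests for null and returns a Boolean, which is why it is paired with the conditional operator above. This differs from the T-SQL ISNULL function, which performs the replacement itself. A minimal T-SQL sketch of the same two derivations, with a hypothetical table and column names:

    SELECT ISNULL(MiddleName, ' ') AS MiddleName,  -- T-SQL ISNULL replaces the null directly
           ISNULL(DamageCost, 0)   AS DamageCost
    FROM   dbo.SourceTable;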
You can build quite complex expressions using this conditional operator, as you can also nest conditional operations. This example uses a conditional operator along with an OR operator to create a slightly more complex Boolean condition. Here, the Country field contains two-digit country codes, which are fine for all countries other than the UK, for which two values exist in the data: UK and GB. You want to pass only UK if the country code is either UK or GB, and let other codes pass through as is:

    Country == "UK" || Country == "GB" ? "UK" : Country

The last example again uses the conditional operator, but this time with dates. Here you return a 1900-01-01 date in the date/time format if EndDate is null; otherwise, you let the date/time value pass through as is:

    ISNULL(EndDate) ? (DT_DBTIMESTAMP)"1900-01-01 00:00:00.000" : EndDate

Just as you have used a DT_DBTIMESTAMP type cast here, you can cast a date/time value into any other required data type format using one of the several date/time type casts.

Export Column Transformation

The Export Column transformation has one input, one output, and one error output. Using this transformation, you can export data from the data flow to an external file. For example, you can export images from the data flow and save them to individual files. To use this transformation, two columns must exist in the data flow: one that contains the data (i.e., the images) and a second that contains the paths of the files to which you want to save those images. By properly configuring the Export Column transformation, you can save the image in each row to a separate file specified by the column that contains the file path information.
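For example, a source query feeding this transformation might supply both columns as follows; this is only a sketch, with AdventureWorks-style names used as hypothetical examples:

    SELECT LargePhoto AS ImageData,   -- DT_IMAGE column to be extracted
           'C:\Images\Photo_' + CAST(ProductPhotoID AS varchar(10)) + '.gif' AS FilePath
    FROM   Production.ProductPhoto;

Each row then carries its own destination path, so the transformation can write every image to its own file.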
When you open the transformation editor after connecting an input, you can select the columns you want to extract in the Extract Column field and select the column that specifies the file paths in the File Path Column field. Both fields provide drop-down lists of the columns in the data flow; however, the Extract Column field displays only the columns that have the DT_TEXT, DT_NTEXT, or DT_IMAGE data type. You will also be able to select the Allow Append and Force Truncate check boxes. The Allow Append check box allows the transformation to append data to an existing file, and the Force Truncate check box allows the transformation to delete and re-create the file before writing data into it. The editor's user interface doesn't allow you to select both check boxes simultaneously; if you set both of these properties to true programmatically, the component fails. At run time, the transformation exports data row by row and writes the exported data to the files.

Depending on your selections for the Allow Append and Force Truncate options, and on whether the files to which you want to save data already exist, the transformation decides on the appropriate action. For example, if the file to which data is to be written does not exist, the transformation creates a new file and writes the data to it, irrespective of your settings for these check boxes. However, if the file already exists, the transformation uses the information from the check boxes: if the Allow Append check box is selected, the transformation opens the file and writes the data at the end of it, and if the Force Truncate option is checked, the transformation deletes and re-creates the file and writes the data into it. If you select neither check box and the file exists, a run-time error occurs, because you are effectively not letting the transformation either append to or truncate the file, prohibiting it from writing data to the file.

The last option you will find in the user interface is the Write Byte-Order Mark check box. A byte-order mark (BOM) is a ZERO WIDTH NO-BREAK SPACE character used to denote the endianness of a string. This is particularly useful for images saved in the TIFF format, which stores pixel values as words, where the byte order makes a difference. Images stored in JPEG or GIF format are not word-oriented, and the byte order does not matter. Note that a BOM is written only when the data type is DT_NTEXT and the data is written to a new file.

Import Column Transformation

As you might expect, the Import Column transformation is the reverse of the Export Column transformation and has one input, one output, and one error output. Using this transformation, you can read data from files and add it to columns in the data flow. For example, you can use this transformation to add images to the data flow alongside the text fields. An input column in the transformation input contains the file paths of the files whose contents you want to add to the data flow. At run time, this transformation processes each row, gets the file path information from the input column, and then loads the data from the file into an output column in the data flow.

This transformation doesn't have its own user interface; it uses the Advanced Editor to expose its properties for configuration. When you open the Advanced Editor for Import Column, you can select the input column that contains the file path information in the Input Columns tab. Then you can add the output column into which you want to copy the data in the Input And Output Properties tab. To add an output column, select the Output Columns collection in the Import Column Output node and click Add Column. When this new column is added, select the desired data type for the column; the data type of the output column must be DT_TEXT, DT_NTEXT, or DT_IMAGE. Note the ID of this column and expand the Input Columns collection in the Import Column Input node. You will see the column that you selected in the Input Columns tab earlier. Scroll down in the Custom Properties section to locate the FileDataColumnID property and specify the ID of the new output column in this property. By specifying the ID of the output column in the FileDataColumnID property of the input column, you tell the transformation which column is to receive the imported data. If the data type of the new output column is DT_NTEXT, you can also set the ExpectBOM field to True if the file is expected to begin with a byte-order mark.
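If you want a set-based comparison, T-SQL's OPENROWSET with the BULK option does for a single file what Import Column does for every row in the pipeline; the file path here is hypothetical:

    SELECT BulkColumn AS ImageData   -- returned as a single varbinary(max) value
    FROM   OPENROWSET(BULK 'C:\Images\Photo_1.gif', SINGLE_BLOB) AS ImageFile;

The advantage of the Import Column transformation is that the file path comes from a column, so it can load a different file for each row.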
Script Component

You have already learned about this data flow component under Data Flow Sources and Destinations, and by now you understand that it can also be used as a transformation. The basic functionality of this component remains the same: it enables you to write custom code and include that custom code as a component inside the package. Using the Script component as a transformation, you can write custom code to call functions that are not available in SSIS (i.e., you can call a .NET assembly to use working code that is available outside SSIS), build transformations that are not available in SSIS, or apply multiple transformations with custom code to enhance the performance of your package. You could in fact write custom code to do all the transformations within the same Script component, but you probably shouldn't, for reasons of simplicity, modularity, and performance. All this is covered in greater detail in Chapter 11.

OLE DB Command Transformation

The Execute SQL task of DTS 2000 was commonly used to perform a variety of functions, from workflow operations to data-related operations such as truncating tables and log files; creating or dropping tables and views; and updating, deleting, or inserting data in database objects. As the control flow and the data flow engines have been separated in Integration Services, equivalent facilities have been made available in both engines to perform the appropriate functions within each. The Execute SQL task provided in the control flow of Integration Services is designed to perform workflow functions such as truncating tables, creating and dropping tables, and updating variable values using the results of an SQL statement, while the OLE DB Command transformation in the data flow engine provides for data-related operations.

This transformation allows you to run an SQL statement that can insert, update, or delete rows in a database table. As this transformation is designed as a row transformation, the SQL statement actually runs for each row of the data flowing through it. The real power of the OLE DB Command transformation lies in its ability to run a parameterized query to perform insert, update, or delete operations against each row. This also means that you can run a stored procedure with parameters against each row of the data. Though this component is easy to use, because it processes rows one by one, its performance on a rowset of considerable size will be noticeably worse than that of alternative set-based approaches such as staging and loading.

The OLE DB Command transformation has one input and one output, and also supports one error output; however, it doesn't have a custom user interface and hence uses the Advanced Editor to expose its properties. In the Connection Managers tab of the Advanced Editor for OLE DB Command, you specify the connection to the database in which you want to update, delete, or insert data.
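To make this concrete, here is a sketch of the kind of parameterized statement you might enter in the transformation's SqlCommand property; the table and column names are hypothetical. Each ? marker becomes a parameter (Param_0, Param_1, and so on) that you map to an input column in the Column Mappings tab, and the statement runs once for every row flowing through:

    -- Entered in the SqlCommand property; ? markers map to input columns
    UPDATE dbo.Customer
    SET    Phone = ?,          -- Param_0, mapped to an input column
           Email = ?           -- Param_1
    WHERE  CustomerID = ?;     -- Param_2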
