Hands-On Microsoft SQL Server 2008 Integration Services part 48 pps

448 Hands-On Microsoft SQL Server 2008 Integration Services Dimension Columns Change Type Country Historical attribute Email Fixed attribute Home Phone Changing attribute Work Phone Changing attribute Mobile Phone Changing attribute When you’ve configured all the columns, the wizard screen should look as shown in Figure 10-20. Click Next to move on. 16. In the Fixed and Changing Attribute Options screen, uncheck the “Fail the transformation if changes are detected in a fixed attribute” option, because we will make changes to the Email field to test how it goes. In a production environment, you need to consider both of the options in this page in light of your requirements. Click Next to open the Historical Attribute Options screen. Figure 10-20 Setting change types on the dimension columns Chapter 10: Data Flow Transformations 449 17. You have two options to record a Historical attribute. If you select a single column option, you can then select a column and the value pair to indicate the Current/ Expired or True/False values. Alternatively, you can select start and end dates to identify current and expired records. Select the radio button next to the “Use start and end dates to identify current and expired records” option. Select StartDate in the Start Date column and EndDate in the End Date column fields. Then select System:: StartTime in the “Variable to set date values” field, as shown in Figure 10-21. This will update the values of StartDate and EndDate fields using the start time of the package provided by the specified system variable. Click Next. 18. By default, the inferred member support is enabled in the SCD transformation, and you can select from either of the two options to identify the inferred member record. In the first option, you can specify null as the column values for all columns with an inferred member change type, and in the second option, you specify an indicator column. Select the radio button for “Use a Boolean column to indicate whether the current record is an inferred member.” Select IsInferred Column from the drop-down list in the Inferred Member Indicator field as shown in Figure 10-22. Click Next to review the outputs summary of the SCD Wizard. Figure 10-21 Setting Historical attribute options in the SCD Wizard 450 Hands-On Microsoft SQL Server 2008 Integration Services 19. Click Finish after reviewing the summary information. The SCD Wizard will take a while to build the data flow for you, but when it’s done, you will have a lot to review. After moving some of the components around, your data flow should look like the one shown in Figure 10-23. The SCD Wizard has added a data flow to the four outputs out of six outputs available. Let’s do a quick review. Inferred Member Updates Output c Double-click the OLE DB Command transformation attached to this output and go to the Component Properties tab. Check the SQL Statement in the SqlCommand field to find that this statement will update the DimCustomer for the Inferred member record (IsInferred = 1) and also reset its status while updating (IsInferred = 0). Click Cancel. Changing Attribute Updates Output c Double-click the OLE DB Command attached to this output and go to the Component Properties tab. Check the SQL Statement in the SqlCommand field to find that this statement will update the Changing attributes—i.e., the phone numbers for the member records. Click Cancel to exit. Figure 10-22 Inferred member support in the SCD Wizard Chapter 10: Data Flow Transformations 451 Historical Attribute Inserts Output c Double-click the Derived Column component attached to this output and note that this transformation creates a new column EndDate and populates the package start time in this new column. Click Cancel. Double-click the OLE DB Command attached to the output of the Derived Column transformation and go to the Component Properties tab. Check the SQL Statement in the SqlCommand field to find that this statement will update the member record of the DimCustomer table with the EndDate derived in the previous component hence expiring the active record. Click Cancel. After the OLE DB Command, the output is then combined with the data flow of New Output using the Union All component. Double-click the Union All component to open the editor and note that the EndDate field created earlier has not been passed on to the New Output data flow. is will actually leave the EndDate column value null, indicating that the added record is the active member. So, in the Historical Figure 10-23 Data flow created by the SCD Wizard 452 Hands-On Microsoft SQL Server 2008 Integration Services Attribute Inserts Output data flow, the current member is first expired and then a new row is inserted for the same member with the new attributes as an active member. New Output c is output contains new records that are combined with the “to be inserted” records from Historical Attribute Inserts output as stated in the preceding item. After that, the output is connected to Derived Column component. Double- click the Derived Column component and note that it adds a new StartDate column and populates it with a package start time using a system variable. Finally, it connects to an OLE DB Destination to insert these new records into the DimCustomer table. Two other outputs, Fixed Attribute Output and Unchanged Output, are available at the Loading DimCustomer SCD transformation for which SCD Wizard has not created any data flow. However, if you want to capture the rows from those outputs, you can create a downstream data flow for them. Exercise (Execute Loading Slowly Changing Dimension Package) In this final part, you will execute the package to see how SCD deals with different types of updates. 20. Double-click the New Output Data Flow Path to open the Data Flow Path Editor. Go to Data Viewers page and then click Add. Click OK to add a grid-type data viewer in the Configure Data Viewer dialog box. Click OK again to close the editor. Similarly add data viewers to the other three outputs. 21. You are now ready to run this package. First, though, let’s see the data that is going to be played and replayed to see the responses of SCD transformation to different changes. Open SQL Server Management Studio and connect to the database engine. Open a new query pane and run the following query to see records in the Customer table: Select * from [Campaign].[dbo].[Customer] Note that the Customer table contains 14 records and the CustomerID of the last four records is NULL. As per our package design, the Conditional Split transformation should filter out these records. Switch to BIDS, right-click the Loading Slowly Changing Dimension.dtsx package in the Solution Explorer, and choose Execute Package from the context menu. As the package executes, the data viewer that gets data first will pop up, which is the one on New Output in our case. If you move around on the screen, as shown Chapter 10: Data Flow Transformations 453 in Figure 10-24, you will Note that the Customers Filter component diverts four rows to the Excel destination and sends the other ten rows to Loading DimCustomer SCD transformation. Further, the SCD transformation identifies that all these records are in fact new records and diverts them to New Output. No record flows through any other output. Click the Detach button on the New Output Data Viewer to let the data flow to the destination and be inserted into DimCustomer table. Click the Detach button on other data viewers and close them. Press - 5 to return to design mode. Figure 10-24 Executing Loading Slowly Changing Dimension Package 454 Hands-On Microsoft SQL Server 2008 Integration Services 22. Let’s make changes to Customer’s data and see how this transformation works with them. Run the CustomerChanges.sql from the C:\SSIS\RawFiles folder in the query pane of SQL Server Management Studio. This script makes changes to the first three records in Customer table as follows: CustomerID Column Changed New Value 101 [Home Phone] 020885711000 102 [Email] peter.gormlay@AffordingIT.co.uk 103 [Address Line 1] [Address Line 2] 15 Abercrombie Avenue Wooburn Green Then it adds an Inferred member record first in DimCustomer table with minimal information as shown here: CustomerID Columns Value 111 [IsInferred] [StartDate] 1 getdate() Then inserts additional information for inferred member in the Customer table, as shown next, to simulate a real-life scenario: CustomerID Columns Value 111 [CustomerID] [FirstName] [LastName] [Address Line 1] [Address Line 2] [Address Line 3] [City] [State] [Postcode] [Country] [Email] [Home Phone] [Work Phone] [Mobile Phone] 111 Mark Morris Flat 22, Crescent Flats The Ridgeway Sketty Chertsey West Midlands PE7 3RQ United Kingdom mmark@AffordingIT.co.uk 079576516756 23. After running the CustomerChanges.sql script successfully, switch to BIDS and execute the package again. As the package executes, this time three data viewers will pop up. First, you will see a record in the data viewer attached to Changing Attribute Updates output. Note that this is the record for which you changed the Chapter 10: Data Flow Transformations 455 home phone number, which is allowed, and no history is to be kept. Click Detach to let it process. Then look at the data viewer attached to the Historical Attribute Inserts Output will display a record with CustomerID = 103. Note that this is the record that changed the address for which a history is to be kept. This record will first update the existing record with EndDate and then will be combined with the New Output using Union All transformations so as to be inserted as a new record. Click Detach to let it process. Next look at the data viewer attached to Inferred Member Updates Output that shows a record with CustomerID = 111. This record will be updated in the DimCustomer table. Click Detach to let it process. The New Output path will also be processed and the package will complete. 24. Press -5 to switch back to design mode. Press -- to save all the files and close the project. 25. Switch to SQL Server Management Studio and check the DimCustomer table to see that it has been loaded with the expected values. Review This exercise has explained loading a dimension, which is a daily chore of a DBA’s life. Loading a data warehouse dimension has been made easy with the Slowly Changing Dimension transformation, and configuring this transformation has been made easier still by the SCD Wizard. The SCD Wizard does so much work for you that you will probably use it every time you need to configure a SCD transformation. With the ease of use and coverage for most of the loading scenarios for a dimension, you may think perhaps SCD has nailed the problem; however, SCD has a shortcoming: its performance when it comes to loading into a large dimension. The use of OLE DB components to update each member in a dimension slows down the SCD to almost unacceptable levels of performance. You will need to do some work here to improve performance. One solution could be to create indexes on input and destination columns, while the other one could be to replace OLE DB Command with a staging OLE DB destination and an Execute SQL task to upload a data set virtually converting a row-based operation into a set-based operation. I’ve seen other customized solutions where developers have derived different flags, first using a script task and then loading the data set straightaway, converting the row-based decision-making logic at run time to a set-based operation and gaining performance. However, whichever method you use, testing for performance is the last, must-do step. Data Mining Query Transformation You can use a Data Mining Query transformation whenever you want to perform prediction queries against data mining models. This transformation has one input and one output and no error output. As you can envision, to execute your Data Mining 456 Hands-On Microsoft SQL Server 2008 Integration Services Extensions (DMX) query will require you to create a connection to the data mining model, so the user interface of this transformation provides two tabs—Mining Model and Query—for you to configure. In the Mining Model tab, you specify the Analysis Services Connection Manager to connect to an Analysis Services project or server. You select the mining structure from the drop-down list in the Mining Structure field. Once you’ve selected a mining structure, the Mining Models field lists the mining models associated with that mining structure. In the Query tab, you can either type in your query or use the graphical query designer by clicking Build New Query. The graphical query designer helps you build DMX queries. If you type directly in the query mode and later on switch to design mode, you will get a prompt alerting you that your changes may get lost. The designer in the Query Builder allows you to select mining models and data flow input columns from two list boxes. You can drag and drop fields from these boxes onto the designer cells and build custom DMX queries for evaluating data flow input data against an existing mining model You can create more than one prediction query using multiple mining models. However, the mining models you select must belong to the same mining structure. Term Lookup Transformation Using the Term Lookup transformation, you can count the number of times a text term occurs in the input data row and create custom word lists and word frequency statistics. This transformation reads the terms from a lookup table to look for matches in an input column and then, by default, adds two columns named Term and Frequency to the output containing the term and the count for the term. The transformation supports one input and one output. The reference data can be in SQL Server 2000 and later or in an Access database, and this transformation can do a lookup only on a column that is either the DT_WSTR or the DT_NTEXT data type. When you open the editor, you will see the following three tabs: Reference Table C Specify an OLE DB Connection Manager and a reference table name from which the transformation can read lookup terms. Term Lookup C Map an Input column to a Reference column to indicate that the lookup terms in the Reference column are to be counted in the Input column. You also select the columns that you want to pass through the data flow. Based on your selections, an advanced property—the InputColumnType of the Input column— is set that is available in Advanced Editor. Selecting a column only for pass-through sets the InputColumnType property to 0, mapping a column for lookup only sets the InputColumnType property to 1, and selecting a column for both pass-through and lookup operations sets the InputColumnType property to 2. Chapter 10: Data Flow Transformations 457 Advanced c Select to use a case-sensitive term lookup in which uppercase words are treated separate from lowercase words. However, if a word is a first word in a sentence and its first letter is capitalized, this can still match with a lowercase equivalent—so, for example, the word travel will match with the word Travel in the sentence “Travel in style and comfort.” It is relatively straightforward to configure this transformation. The mechanics are also simple. This transformation loads the terms it is to look up from the reference table in its private memory. It works in a fully precached mode only, so it loads all the values before processing lookups against the input column. However, to get accurate results you need to understand how this transformation behaves for different types of matches. The Term Lookup transformation extracts the required term from the input column by breaking the text into sentences, breaking sentences into words, and then normalizing the words. To extract the matching term, this transformation observes the following rules: If you specify the singular form of the word or phrase in the reference table, this c transformation will match both singular and plural forms of the word or phrase. If you use a plural form of the noun or noun phrase in the reference table, the c transformation matches it only with a plural form of the noun or noun phrase in the input data. If you want to do a match for nouns and noun phrases that contain special c characters such as %, @, &, $, #, *, :, ;, ., , , !, ?, <, >, +, =, ^, ~, |, \, /, (, ), [, ], {, }, “, and ‘, you can do so by including these special characters in the nouns and noun phrases in the reference table. e Term Lookup transformation returns only one result for any lookup input c column in which multiple overlapping terms are involved. While normalizing the input column words, the Term Lookup transformation c will affect the last word in the lemmatized noun phrase for normalization. Term Extraction Transformation Using a Term Extraction transformation, you can extract terms from the text of an input column and can thus build a list of terms used repeatedly in the input column for text mining and data analysis. This transformation, however, is limited in that it can extract only nouns or noun phrases or a combination of both, in English text only. It is aware of linguistic information about English and comes with its own English dictionary. At run time, the Term Extraction transformation reads the specified input . 448 Hands-On Microsoft SQL Server 2008 Integration Services Dimension Columns Change Type Country Historical attribute Email. Package 454 Hands-On Microsoft SQL Server 2008 Integration Services 22. Let’s make changes to Customer’s data and see how this transformation works with them. Run the CustomerChanges .sql from the. in the Historical Figure 10-23 Data flow created by the SCD Wizard 452 Hands-On Microsoft SQL Server 2008 Integration Services Attribute Inserts Output data flow, the current member is first expired

Định dạng
Số trang	10
Dung lượng	478,22 KB