
Microsoft SQL Server 2008 R2 Unleashed - P215


CHAPTER 52 SQL Server Integration Services

- Your bulk data movement doesn't have to be faster than the speed of light. Unfortunately, SSIS must utilize conventional connection techniques to these data sources. It must also create intermediate buffers to hold data during the transformation steps. This usually disqualifies SSIS on the high-performance side of requirements (at least for large, bulk data movements with any type of data transformations defined). However, many performance enhancements are present in SSIS and the data providers that are now supported, which has resulted in about a 50% increase in bulk data movement speeds. Alternative importing/exporting facilities such as bcp offer better performance but lack the flexibility of SSIS.

The following additional SSIS data sources and destinations are supported:

- An XML source for extracting data from XML documents directly
- Full insert and updating support for SQL Server Mobile destinations
- Reading and writing to Raw data files (sources and destinations)
- Creating an in-memory ADO recordset via a destination
- Direct access to a number of Analysis Services object destinations (for example, mining models, cubes, and dimensions)
- The ADO.NET DataReader source and destination for reading and writing to any .NET framework data provider

SQL Server 2008 now supports the following additional SSIS data transformations:

- Data warehousing operations, such as the Aggregate, Pivot, Unpivot, and Slowly Changing Dimension transformations
- Enhanced text data mining via the Term Extraction and Term Lookup transformations
- Caching for Lookup transformations
- Enhancing data values from a lookup table via the Lookup and Fuzzy Lookup transformations
- The identification of similar data rows via the Fuzzy Grouping transformation
- Multiple downstream data flow component data distribution via the Conditional Split and Multicast transformations
- The merging and combining of data rows from multiple upstream data flow components via the Union All, Merge, and Merge Join transformations
- Extensive copying and modifying of column data values, using the Copy Column, Data Conversion, and Derived Column transformations
- Sample rowset extractions, using the Percentage Sampling and Row Sampling transformations
- Sorting of data and identification of duplicate data rows via the Sort transformation

SSIS includes a set of tools and features that support managing, editing, executing, and migrating DTS packages from earlier versions of SQL Server. You can see all available DTS packages in SSMS (in a separate branch). You can also choose to migrate old DTS packages (from SQL Server 2000) forward to SSIS packages (to SQL Server 2008) via the Package Migration Wizard. It's quite easy. If you can't migrate your old DTS packages yet, you can directly execute DTS packages from SSIS packages. If you need to be able to design changes to existing DTS packages, you can either download the special DTS designer version for SQL Server 2008 from Microsoft's website, or just bite the bullet and migrate them forward. We recommend migrating as soon as is feasible.

SSIS Architecture and Concepts

You can think of SSIS as a data import/export/transformation layer in the overall system architecture that you are deploying for most of your Microsoft-based applications and a few non-Microsoft applications (see Figure 52.5). SSIS allows you to "data enable" almost all the individual applications or systems that are part of your overall implementation, such as OLTP databases, multidimensional cubes, OLAP data warehouses, Excel files, Access databases, flat files, other heterogeneous database sources, and even web services.

The Integration Services object model includes both native and managed APIs for doing most SSIS work.
This includes APIs for any of the SSIS tools, the command-line utilities, and even custom applications. SSIS Designer and the Integration Services Wizard both use the Integration Services object model.

SSIS includes the integration service itself (that is, the service that manages all SSIS packages), the Integration Services object model, the SSIS runtime and runtime executables, and the data flow task (which has a data flow engine, source, transformation, and destination components).

FIGURE 52.5 SSIS architecture.

Microsoft uses SSIS packages to implement any data movement/transformation. Basically, Microsoft treats SSIS packages as if they are managed code and requires that you create Integration Services projects and deployment utilities as part of managing these SSIS packages. In addition, separate Integration Services Connection projects can now be created to aid in connection and provider services. All in all, this is a very good approach that significantly reduces errors and allows you to go through a reasonably formal release to production (that is, development and deployment) cycle.

SSIS packages contain a collection of connections, control flow elements, data flow elements, event handlers, variables, and configurations. They take the form of tasks, containers, transformations, and workflows. SSIS packages go through one or more steps that are executed either sequentially or in parallel at package execution time. In a nutshell, when an SSIS package is executed, it does the following:

1. Connects to any identified data source
2. Copies data (and database objects, if needed)
3. Transforms data
4. Disconnects from the data sources
5. Notifies users, processes, and even other packages of events (such as sending an email when something is done or has errors)

The basic SSIS package consists of the following:

- SSIS packages: A package is a discrete, named collection of connections, control flow, and data flows that implement data movement/data transformation.
- SSIS control flow and tasks: One or more tasks and containers drive what the package does. You organize control flow based on what you want the package to do. Tasks are the actions taken to accomplish the desired data transformation and movement. A task can execute any SQL statement, send mail, bulk insert data, execute an ActiveX script, run a Visual Studio Tools for Applications (VSTA) script, or launch another package or an external program.
- SSIS containers: A container groups one or more related tasks that you want to manage together (and reuse together).
- Workflows: Workflows are definable precedence constraints that allow you to link two tasks, based on whether the first task executes, executes successfully, or executes unsuccessfully. Workflow containers are the wrappers for the tasks and are the means for the flow of control. A task can run alone, parallel to another task, or sequentially, according to precedence constraints. Precedence constraints are of three types:
  - Unconditional: It does not matter whether the preceding step failed or succeeded.
  - On success: The preceding step must have been successful for the execution of the next step.
  - On failure: The preceding step must have failed for the next step to execute; this path is typically used to report or handle the error.
- SSIS data flow: The data flow identifies the sources and destinations that extract and load data; identifies the transformations that manipulate or enhance the data; and provides the paths that link sources, transformations, and destinations.
- SSIS data flow task: A data flow task creates, orders, and runs the data flows themselves, using a data flow engine.
- SSIS transformations: Transformations are one or more functions or operations applied against a piece of data before the data arrives at the destination.
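
The three precedence constraint types listed above can be modeled in a few lines. What follows is a minimal Python sketch, not SSIS code: the names `Constraint`, `should_run`, `run_workflow`, and `demo` are hypothetical and exist only for illustration, and tasks are assumed to be plain callables that return True on success.

```python
from enum import Enum


class Constraint(Enum):
    """The three precedence constraint types named in the text."""
    UNCONDITIONAL = "unconditional"
    ON_SUCCESS = "on success"
    ON_FAILURE = "on failure"


def should_run(constraint, preceding_succeeded):
    """Decide whether the next step runs, given how the preceding step ended."""
    if constraint is Constraint.UNCONDITIONAL:
        return True
    if constraint is Constraint.ON_SUCCESS:
        return preceding_succeeded
    return not preceding_succeeded  # ON_FAILURE


def run_workflow(first_task, followers):
    """Run first_task, then every (constraint, name, task) follower that qualifies.

    Hypothetical model: a task is a callable returning True on success,
    and a raised exception counts as failure. Returns the names of the
    steps that were executed, in order.
    """
    def attempt(task):
        try:
            return bool(task())
        except Exception:
            return False

    executed = ["first"]
    succeeded = attempt(first_task)
    for constraint, name, task in followers:
        if should_run(constraint, succeeded):
            attempt(task)
            executed.append(name)
    return executed


def demo():
    """Run the same followers after a succeeding and a failing first task."""
    followers = [
        (Constraint.ON_SUCCESS, "load", lambda: True),
        (Constraint.ON_FAILURE, "notify", lambda: True),
        (Constraint.UNCONDITIONAL, "cleanup", lambda: True),
    ]
    ok = run_workflow(lambda: True, followers)
    failed = run_workflow(lambda: False, followers)
    return ok, failed
```

In this model, the "notify" step runs only when the first task fails, and "cleanup" runs in both cases, which mirrors how an on-failure path is commonly used to send an alert while an unconditional path handles housekeeping.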
In SSIS, everything is pretty much a task or a collection of tasks (one or more containers, tasks in containers), as you can see in Figure 52.6. Control flow determines the overall execution of the package and of the data flows that access the data, transform it, and write it. Precedence constraints determine the overall control flow, connecting the executables, containers, and tasks into an ordered control flow.

FIGURE 52.6 SSIS package elements.

SSIS also has several objects that extend package functionality:

- SSIS event handlers: These workflow tasks run in response to events raised by a package, task, or container. This is much the same as event handling in most programming languages, such as Java or C#. If a task (or package or container) has some issue (that is, raises an event), the event handler can be used to handle the issue appropriately. Typical events in data transformation processing that need to be handled with event handlers might include connections not being established, disk space issues, and so on. You can even have the event handlers send emails or initiate other workflows.
- SSIS configurations: These objects are used to help parameterize many of the previously hard-bound characteristics of packages at runtime. When a package is run, the configuration information is loaded (updating the values of the package's properties), and then the package is run using the new configuration values (all without having to modify the package). SSIS configurations use the classic property/value pair paradigm to represent the properties that are to be configurable. Following are the various methods of representing configurations:
  - XML configuration file: This file identifies the configuration property/value pairs for any number of configuration values.
    The following sample XML configuration file is for a package named UnleashedPackage and sets the value of the variable PKGVar:

      <?xml version="1.0"?>
      <DTSConfiguration>
        <DTSConfigurationHeading>
          <DTSConfigurationFileInfo
            GeneratedBy="DatabaseArchitechs\PBertucci"
            GeneratedFromPackageName="UnleashedPackage"
            GeneratedFromPackageID="{3GV09721-816B-4E28-9878-0DE37A150234}"
            GeneratedDate="7/09/2009 7:12:22 AM"/>
        </DTSConfigurationHeading>
        <Configuration ConfiguredType="Property"
          Path="\Package.Variables[User::PKGVar].Value"
          ValueType="Int32">
          <ConfiguredValue>0</ConfiguredValue>
        </Configuration>
      </DTSConfiguration>

    A configuration header contains information about the configuration file. This element includes attributes such as when the file was created and the name of the person who generated the file. In addition, a configuration element contains information about each configuration. This element includes attributes such as the property path and configured value of a property.
  - Configuration table in SQL Server: This table stores configuration entries for use by the packages.
  - Environment variables (VARs): These can be referenced by the package.
  - Parent package VARs: These can be used by child packages.
  - Entry in Registry: The Registry can also contain the configuration values.
- SSIS logging: Logging can be done from any task or package to write out any type of logging information desired. When a supplied logging provider is used, a package can provide a rich runtime history. Logs are associated with packages (that is, the reference point), but any task (or container) can write to any package's log. In this way, it is possible to have consolidated logs of a driver package with the full execution history of all child packages. The log providers (out of the box) write to a flat file (text file) or to SQL Server tables. Other custom logging providers can be used, though.
  You can log what you need to log: start date/time, end date/time, records transformed, errors, and so on.
- SSIS variables: SSIS has both system variables and user-defined variables. System variables provide runtime package object information to tasks or other packages. This information is helpful when you want to reference these system variables to help decide what to do next. (They can be used in expressions, scripts, and configurations.) User-defined variables are for specialized values that are not available as system variables and that only have to be used within a package's scope. Again, these variables can be used in expressions, scripts, and configurations within a package.

SSIS packages can run other packages. This capability is very helpful when you want to granularly break out common data transformations for reuse by many different higher-level solutions (that is, higher-level packages that execute common detail-level transformation packages).

NOTE
When an SSIS package is first created, it is given a globally unique identifier (GUID) that is added to the package's ID property and a name that is added to its NAME property. After these are created, they become part of the reference mechanism for the package itself. If you ever copy a package as the basis of a new package, you have to change these two properties so they are unique (that is, new GUID and new NAME property). If you simply want to give an existing package a new NAME or ID value, you can do so directly or with the dtutil command-line utility.

You can also create packages that can be restarted at a point of failure, including restarting specific tasks within a package (and not all the tasks in a package). This is a super addition to SQL Server 2008. If a package had more than one data flow task and one completed but the others didn't, you could restart just the data flow tasks that had not completed without rerunning the ones that had worked fine.
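
The restart behavior described above can be pictured with a small model. This is a conceptual Python sketch only: it does not use or reproduce the actual SSIS restart mechanism, and `run_with_restart`, the JSON file that records finished tasks, and the task names are all assumptions made for illustration.

```python
import json
import os
import tempfile


def run_with_restart(tasks, checkpoint_path):
    """Run (name, callable) tasks in order, skipping ones already recorded as done.

    Completed task names are saved to checkpoint_path after each success.
    The file is removed once every task has finished. Returns the names
    of the tasks that actually ran during this call.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)

    ran = []
    for name, task in tasks:
        if name in done:
            continue  # finished in an earlier run; do not repeat it
        task()  # an exception here stops the run and keeps the saved progress
        ran.append(name)
        done.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)

    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # clean finish: the next run starts fresh
    return ran


def demo():
    """Fail on the second task, then rerun and report what the rerun executed."""
    path = os.path.join(tempfile.mkdtemp(), "package.progress.json")
    state = {"fail": True}

    def flaky():
        if state["fail"]:
            raise RuntimeError("simulated failure")

    tasks = [("extract", lambda: None), ("transform", flaky), ("load", lambda: None)]
    try:
        run_with_restart(tasks, path)
    except RuntimeError:
        pass  # first run stops at "transform"; "extract" is already recorded
    state["fail"] = False
    return run_with_restart(tasks, path)
```

In this toy model the second run skips "extract" and executes only "transform" and "load", which is the same idea as rerunning just the data flow tasks that had not completed.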
Long-running packages can also create checkpoints to provide milestones from which to restart. This capability will save many sleepless nights for the folks doing production support for data transformation processing.

SSIS Tools and Utilities

SSIS includes several tools that simplify package creation, execution, and management. These tools are available within the Visual Studio/BI Development Studio IDE (as shown in the drop-down list in Figure 52.7) or integrated into other component-based tools (such as SSMS, as shown in Figure 52.8).

FIGURE 52.7 Package creation options within Visual Studio/BI Development Studio.

Just as easily, you can invoke SSIS functionality (for example, the SSIS Import and Export Wizard) from within SSMS (see Figure 52.8). Also, within SSMS, you can organize packages; execute packages (via the Execute Package utility); import and export packages to and from the SQL Server msdb database, the SSIS package store, and the filesystem (.dtsx files); and migrate DTS packages (older SQL Server version packages).

FIGURE 52.8 Invoking SSIS import/export data (package creation) capability from within SSMS.

Following are the primary working environments for creating, managing, and deploying SSIS packages:

- Import and Export Wizard: You can use this wizard, available within Visual Studio/BI Development Studio or from SSMS, to build packages to import, export, and transform data or to copy database objects (see Figure 52.9). This is an easy way to quickly create the basic SSIS packages that you need and to deploy them.

FIGURE 52.9 The Import and Export Wizard from Visual Studio/BI Development Studio.

- SSIS Designer: This standard GUI is available in the Visual Studio/BI Development Studio, as part of an SSIS project. It lets you construct/manipulate packages containing complex workflows, multiple connections to heterogeneous data sources, and even event-driven logic (see Figure 52.10). This is the same IDE that all code development uses in the .NET platform, making it extremely easy to start developing right away.

FIGURE 52.10 The SSIS Designer IDE.

- SSIS command-line utilities: A number of utilities are available within SSMS to aid you in running and managing SSIS packages (see Figure 52.11). One example is the Execute Package utility (which uses dtexec and dtutil command-line utilities). If the utility accesses a package that is stored in msdb, the command prompt may require a username and password.

FIGURE 52.11 The Integration Services branch in SSMS.

- SSIS Query Builder: Query Builder provides an easy-to-use GUI for quickly developing SQL queries, testing the queries, and embedding them into the SSIS packages that you are developing. It is sort of like a mini SQL Query Profiler. It is entirely point-and-click oriented. Figure 52.12 shows the point at which you can invoke the Query Builder as you add Execute SQL Task as part of an SSIS package to the SQL Task Editor. Figure 52.13 shows the full Query Builder interface, along with a SQL statement that is being developed that retrieves address information from the AdventureWorks2008 Person.Address table.
- SSIS Expression Builder: You can use Expression Builder to develop the simple or complex expressions that get used by a package (the expression property of the package configuration). These expressions do things like validate working directories on a local machine where an SSIS package has been deployed, along with other complex evaluations that you want to have used by an SSIS package property. This graphical tool enhances your ability to use these types of expressions for your SSIS packages.
  It not only helps you develop the expressions, but also evaluates them to make sure they are providing the proper results (much like what Query Builder does for SQL statements). Figure 52.14 shows a typical expression palette of both the variables that can have expressions defined for them and some of the functions (such as string functions) that can be used with the expression.

Finally, after you have created SSIS packages, you need to execute them from the command line, from within SQL programs, or from other .NET-supported programming languages. You can easily do this by using the dtexec package execution utility. You manage packages by using the dtutil utility.

A Data Transformation Requirement

Let's consider a real-life data export requirement that is best served by using SSIS. The requirement is for a small business intelligence data mart (on SQL Server 2008) to be spun off each week from the main OLTP database (also on SQL Server 2008) that addresses a product sales manager's need to see the total year-to-date business that a customer has generated. This data mart is merely a standard SQL Server database and tables that have been transformed (that is, aggregated) for a targeted purpose. As an option, the manager

Posted: 05/07/2014, 02:20