DTS Connections and the Data Transformation Tasks
PART II
As indicated by its name, the Transform Data task is at the heart of Data Transformation
Services. This task is a data pump that moves data from a data source to a data destination,
giving you the opportunity to modify each record as you move it.
Three chapters of this book are devoted to the Transform Data task:
• This chapter outlines the task’s basic functionality and properties.
• Chapter 7, “Writing ActiveX Scripts for a Transform Data Task,” describes the use of
ActiveX scripts to programmatically control data transformations. This chapter also discusses
creating and using lookups.
• Chapter 9, “The Multiphase Data Pump,” shows how to use the new SQL Server 2000
capability to write code for eight different events in the operation of the data pump.
There are also chapters devoted to the other two data transformation tasks:
• Chapter 8, “The Data Driven Query Task,” describes a task that can define several output
queries in the process of data transformation.
• Chapter 10, “The Parallel Data Pump Task,” describes a new task that lets the data pump
use hierarchical recordsets.
Additional key information relating to the Transform Data task can be found in these chapters:
• Chapter 5, “DTS Connections”
• Chapter 27, “Handling Errors in a Package and Its Transformations”
• Chapter 28, “High Performance DTS Packages”
• Chapter 32, “Creating a Custom Transformation with VC++”
NOTE
It’s possible to get confused about the naming of the Transform Data task. Some people
refer to it as the Data Pump task, reflecting the DataPumpTask and DataPumpTask2
objects that implement this task. It is also called the Data Transformation task.
When to Use the Transform Data Task
I have built DTS packages that don’t have any Transform Data tasks, and I have built other
packages in which this task did all the movement and manipulation of the data.
The Transform Data task is one of the most versatile of all the DTS tasks. Many of the others
have limitations that prevent them from being used in certain circumstances. The Transform
Data task can be used with a variety of data sources and destinations, it delivers high
performance, and you can manipulate data very precisely.
09 0672320118 CH06 11/13/00 4:56 PM Page 126
I decide whether or not to use the Transform Data task by going through a process of elimination.
If another task will do the job better, I choose it. If I can’t use any of the other tasks
because of their limitations, I use the Transform Data task.
Consider these specialized situations where other tasks are more effective:
• If you are transferring whole databases from SQL Server 7.0/2000 to SQL Server 2000,
use the Transfer Databases task.
• If you are transferring database objects (tables, views, stored procedures, and so on)
from a SQL Server 7.0/2000 database to a SQL Server 7.0/2000 database, use a Transfer
SQL Server Objects task.
• If you need to choose between several queries when transforming each row of data,
consider using the Data Driven Query task. (But the Transform Data task in SQL Server
2000 now allows you to modify data using lookups, which removes some of the Data
Driven Query task’s advantage in this area.)
• If your data source is a text file, your data destination is SQL Server, you are not
transforming the data as it’s being imported, and you want the fastest possible speed for your
data movement, use the Bulk Insert task.
• If you are moving data between tables in the same type of relational database, consider
using an Execute SQL task. It will be faster than the Transform Data task, but you lose
the flexibility of row-by-row processing.
• If you are moving hierarchical rowsets, take advantage of the new Parallel Data Pump
task.
• If you need to move data files to another location, use the FTP task.
In all other cases, use the Transform Data task to transform your data.
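The process of elimination above can be sketched as an ordered series of checks, assuming a job can be described by a handful of yes/no flags. This is a Python illustration of the decision order only, not part of DTS. The `Job` fields and the `choose_task` function are hypothetical names, and the checks flatten judgment calls that the list leaves to you (it only says to "consider" the Data Driven Query and Execute SQL tasks).

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical yes/no description of a data-movement job (not a DTS object)."""
    whole_databases: bool = False        # SQL Server 7.0/2000 databases to SQL Server 2000
    database_objects: bool = False       # tables, views, stored procedures, and so on
    several_queries_per_row: bool = False
    text_source: bool = False
    sql_server_destination: bool = False
    transforming_rows: bool = True       # modifying data as it is imported
    same_database_type: bool = False     # source and destination in the same kind of RDBMS
    needs_row_by_row: bool = False       # flexibility a set-based query would lose
    hierarchical_rowsets: bool = False
    moving_files: bool = False


def choose_task(job: Job) -> str:
    """Return the first specialized task that fits; otherwise fall back to Transform Data."""
    if job.whole_databases:
        return "Transfer Databases"
    if job.database_objects:
        return "Transfer SQL Server Objects"
    if job.several_queries_per_row:
        return "Data Driven Query"
    if job.text_source and job.sql_server_destination and not job.transforming_rows:
        return "Bulk Insert"
    if job.same_database_type and not job.needs_row_by_row:
        return "Execute SQL"
    if job.hierarchical_rowsets:
        return "Parallel Data Pump"
    if job.moving_files:
        return "FTP"
    return "Transform Data"
```

For example, `choose_task(Job(text_source=True, sql_server_destination=True, transforming_rows=False))` returns `"Bulk Insert"`. A default `Job()` matches none of the specialized cases and falls through to `"Transform Data"`.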
The Transform Data Task
CHAPTER 6
TIP
When I was first learning DTS development, I used the Transform Data task a lot
more than I do now.
I’ve realized that there are many situations where one or more Execute SQL tasks will
move my data significantly faster. The Transform Data task is a high-speed data
pump, but it still has to process each row of data sequentially, and set-oriented SQL
queries can often outperform it.
I’ve also started using the Bulk Insert task more often because it delivers much better
performance.
If you need the Transform Data task, use it. It gives you Rapid Application
Development and excellent performance. But it’s also good to be aware of the alternatives.
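The contrast in this tip, between processing rows one at a time and handing a whole set to the database engine, can be sketched in Python. SQLite stands in for SQL Server here (an assumption), and the table names are hypothetical. Both functions produce the same destination rows. One issues one INSERT per row, like a pump; the other issues a single set-oriented statement, like an Execute SQL task. No timings are claimed, so measure on your own server.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for a staging table and a fact table in the same database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE SalesStage (SaleID INTEGER, Amount REAL);
        CREATE TABLE SalesFact  (SaleID INTEGER, Amount REAL);
        """
    )
    conn.executemany(
        "INSERT INTO SalesStage VALUES (?, ?)",
        [(i, i * 1.5) for i in range(1, 1001)],
    )
    return conn


def load_row_by_row(conn: sqlite3.Connection) -> None:
    """Pump style: read every source row, transform it, and insert it on its own."""
    rows = conn.execute("SELECT SaleID, Amount FROM SalesStage").fetchall()
    for sale_id, amount in rows:
        conn.execute("INSERT INTO SalesFact VALUES (?, ?)", (sale_id, amount * 2))


def load_set_based(conn: sqlite3.Connection) -> None:
    """Execute SQL style: one set-oriented statement does the same work."""
    conn.execute("INSERT INTO SalesFact SELECT SaleID, Amount * 2 FROM SalesStage")
```

The row-by-row version is the one to keep when each row needs logic that SQL cannot express. Otherwise, the set-based statement lets the engine optimize the whole operation at once.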
Creating a New Transform Data Task
You can create Transform Data tasks in the Package Designer, in the DTS Import/Export
Wizard, and in code.
Using the Package Designer
You can create a new Transform Data task in the Package Designer in several different ways. I
recommend the new way provided in SQL Server 2000:
1. Create two connections, one for the data source and the other for the data destination.
2. Select the Transform Data task from the task palette, the toolbar, the Task menu, or Add
Task on the pop-up menu.
3. An icon will appear that contains the words “Select source connection.” Move the cursor
to the connection you are going to use for the source and select it.
4. The icon will change and will now have the words “Select destination connection,” as
shown in Figure 6.1. Click on the connection to be used for the destination. You’ve just
created a Transform Data task.
FIGURE 6.1
An icon directs you to choose a source connection and then a destination connection.
You can also create a Transform Data task by doing any of the following:
• Reverse steps 2 and 3. If you select a connection before choosing the Transform Data
task, that connection will be used as the source.
• Select a connection for the source. Press and hold the Shift key while selecting the
connection for the destination. Then select the Transform Data task.
• Draw a marquee around the two connections to be used for the Transform Data task.
Then select the Transform Data task. The first connection included in the marquee will
usually be used as the source (but not always).
Using the DTS Import/Export Wizard
If you want to create Transform Data tasks for several tables at the same time, consider using
the Import/Export Wizard. If the tables have the same names in the source and the destination,
those tables will be connected automatically. If any table does not exist in the destination, the
wizard will also make an Execute SQL task with a CREATE TABLE statement for that table. This
statement creates a destination table with the same design and structure as the source table.
The wizard sets a precedence constraint so that the table is created before the Transform Data
task is executed.
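The wizard behavior described above can be modeled in Python to show which tasks are built and in what order. This is a hypothetical model, not the wizard's code. The function name and the short task descriptions are illustrative, and the real wizard uses longer names of the form "Copy Data from ... Task". Tables with matching names get a single load. A missing table gets a CREATE TABLE step, and a precedence constraint makes the load wait for that step.

```python
from typing import List, Optional, Tuple

# (task description, task type, description of the task it must wait for)
PlanEntry = Tuple[str, str, Optional[str]]


def plan_wizard_tasks(source_tables: List[str], dest_tables: List[str]) -> List[PlanEntry]:
    """Hypothetical model of the tasks the Import/Export Wizard builds for each table."""
    existing = set(dest_tables)
    plan: List[PlanEntry] = []
    for table in source_tables:
        load = f"Copy Data to {table}"
        if table in existing:
            # Same name on both sides: the tables are matched automatically.
            plan.append((load, "Transform Data", None))
        else:
            create = f"Create Table {table}"
            plan.append((create, "Execute SQL", None))
            # Precedence constraint: the table must exist before the load runs.
            plan.append((load, "Transform Data", create))
    return plan
```

For instance, `plan_wizard_tasks(["Employee", "Customer"], ["Employee"])` yields one load for Employee. For Customer, it yields a create step followed by a load that depends on it.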
Using Code
The Transform Data task is implemented in SQL Server 2000 with a DataPumpTask2 object.
This object inherits all the collections, properties, and methods of the SQL Server 7.0
DataPumpTask object and adds some new properties. All these collections and properties are
described in this chapter. The last two sections of the chapter have code samples showing how
to create a Transform Data task and all the different types of transformations.
The Description and Name of the Task
The Source tab of the Transform Data Task Properties dialog has a place to enter a description
of the task. This sets the Description property of the task, which is displayed for each task in
the DTS Designer and when the package is executed.
The Description property of a task is more important than the Name property, unless you
want to refer to a task in code. The names of many of the tasks, including the Transform Data
task, are not shown in the Package Designer interface. If you want to view or set the Name
property, you have to use Disconnected Edit or code.
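The most convenient way to refer to a task in code is by using its name, as shown in this sample
of VBScript:
Dim pkg, tsk, cus
Set pkg = DTSGlobalVariables.Parent       ' the package running this ActiveX script
Set tsk = pkg.Tasks("tskLoadSalesFact")   ' look up the task by its Name property
Set cus = tsk.CustomTask                  ' the DataPumpTask2 object that does the work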
TIP
When I create a task using the Package Designer, I often rename it immediately using
Disconnected Edit. The name has to be changed in two places: the Name property of
the Task object and the TaskName property of the Step object.
The default names created by the Package Designer are not very descriptive:
DTSTask_DTSDataPumpTask_1
DTSTask_DTSDataPumpTask_2
DTSTask_DTSDataPumpTask_3
The names created by the Import/Export Wizard are very descriptive, but they are
long and difficult to type in code:
Copy Data from dbEmployee to [SalesDataMart].[dbo].[Employee] Task
Copy Data from dbCustomer to [SalesDataMart].[dbo].[Customer] Task
Copy Data from dbProductInfo to [SalesDataMart].[dbo].[Product] Task
I prefer task names that are short but also descriptive:
tskLoadEmployee
tskLoadCustomer
tskLoadProduct
Make sure you change the TaskName of the Step object at the same time as you
change the Name of the Task object. If you don’t, the task will not be executed.
I don’t believe there are any other risks in changing task names in Disconnected Edit,
unless the existing names are referenced in code.
If you aren’t planning to refer to a task in code, you don’t need to rename it. But if
you are referencing your tasks in ActiveX scripts or exporting your packages to VB for
editing, you can make your code clearer by creating better task names.
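The rule in this tip, that a task's Name and its step's TaskName must change together, can be shown with a small Python model. The classes below are hypothetical stand-ins for the DTS Package, Task, and Step objects, not the real COM object model, and the step name used in the example is illustrative. The `rename_task` method updates both places in one operation. The `orphaned_steps` method finds steps whose TaskName no longer points at any task; according to the tip, such steps would not execute.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    name: str


@dataclass
class Step:
    name: str
    task_name: str  # must match the Name of the task this step executes


@dataclass
class Package:
    tasks: Dict[str, Task] = field(default_factory=dict)  # keyed by Task.name
    steps: List[Step] = field(default_factory=list)

    def rename_task(self, old: str, new: str) -> None:
        """Rename a task and every step's TaskName reference in one operation."""
        if new in self.tasks:
            raise ValueError(f"a task named {new!r} already exists")
        task = self.tasks.pop(old)
        task.name = new
        self.tasks[new] = task
        for step in self.steps:
            if step.task_name == old:
                step.task_name = new

    def orphaned_steps(self) -> List[str]:
        """Steps whose TaskName matches no task; in DTS these would not run."""
        return [s.name for s in self.steps if s.task_name not in self.tasks]
```

Renaming `DTSTask_DTSDataPumpTask_1` to `tskLoadEmployee` through `rename_task` leaves no orphaned steps. Changing only the task's name, as happens when you forget the Step object in Disconnected Edit, leaves the step pointing at a name that no longer exists.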
The Source of a Transform Data Task
The Source tab of the Transform Data Task Properties dialog, shown in Figure 6.2, displays
the name of the source connection. You cannot change this connection without using code or
Disconnected Edit.
FIGURE 6.2
The first tab of the Transform Data Task Properties dialog displays the data source properties.
In some cases, you have the opportunity to specify which data from the source is to be used.
Your choices differ depending on the type of source you are using—a text file, a relational
database, or a multidimensional database.
Text File Source
If the data source is a text file, you don’t have any more choices to make on this tab. The file,
as it is specified in the connection, will be the source for the transformation.
NOTE
You cannot use binary files as the source for the Transform Data task. You have to
convert them to text files first, and you cannot use any of the built-in DTS tasks to do
this conversion.
SQL Table, View, or Query for a Relational Database Source
If the data source is a relational database, you can choose between using a table, a view, or a
query as the source for the transformation. A list shows the names of all the tables and views.
If you elect to use a query as the transformation source, you have three options for creating the
query:
• Type the query into the box on the Source tab.
• Choose the Browse button to find a file that has a SQL statement in it.
• Choose the Build Query button and design the query in the Data Transformation Services
Query Designer.
There is also a Parse Query button that checks the query syntax and the validity of all the field
and table names used.
TIP
Do as much of the data manipulation as possible in the source query of the data
transformation. Consider using CASE statements or joins to lookup tables to homogenize
data values. You can greatly improve performance, especially if you are able to
move from ActiveX Script transformations to the faster Copy Column transformations.
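Here is what homogenizing data in the source query can look like, using a CASE expression and a LEFT JOIN to a lookup table. SQLite stands in for SQL Server in this Python sketch (an assumption), and the `Customer` and `StateLookup` tables and their columns are hypothetical. The CASE, UPPER, COALESCE, and LEFT JOIN constructs used here are also valid T-SQL, so the same query text could go on the Source tab. The cleaned columns can then be mapped with simple Copy Column transformations instead of script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (CustomerID INTEGER, Gender TEXT, StateCode TEXT);
    CREATE TABLE StateLookup (StateCode TEXT, StateName TEXT);
    INSERT INTO Customer VALUES (1, 'M', 'WA'), (2, 'female', 'or'), (3, NULL, 'WA');
    INSERT INTO StateLookup VALUES ('WA', 'Washington'), ('OR', 'Oregon');
    """
)

# Source query: all the cleanup happens here, so no per-row script is needed.
SOURCE_QUERY = """
SELECT c.CustomerID,
       CASE UPPER(c.Gender)
            WHEN 'M' THEN 'Male'   WHEN 'MALE' THEN 'Male'
            WHEN 'F' THEN 'Female' WHEN 'FEMALE' THEN 'Female'
            ELSE 'Unknown'                      -- also catches NULL
       END AS Gender,
       COALESCE(s.StateName, 'Unknown') AS StateName
FROM Customer AS c
LEFT JOIN StateLookup AS s ON s.StateCode = UPPER(c.StateCode)
ORDER BY c.CustomerID
"""

rows = conn.execute(SOURCE_QUERY).fetchall()
```

The inconsistent inputs `'M'`, `'female'`, and `NULL` come out as `'Male'`, `'Female'`, and `'Unknown'`. The lower-case state code `'or'` still finds its lookup row.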
The Data Transformation Services Query Designer
The Data Transformation Services Query Designer is shown in Figure 6.3. It is the same query
designer that is available in the Enterprise Manager for looking at table data and for creating a
view.
FIGURE 6.3
The Data Transformation Services Query Designer provides an interactive design environment for creating queries.
There are four panes in the Query Designer:
• The Diagram pane is shown at the top of Figure 6.3. Any changes that you make in this
pane are immediately reflected in the Grid and SQL panes. In the Diagram pane, you can
do the following:
  • Drag tables into the pane from the table list at the left.
  • Join tables by dragging a field from one table to another.
  • Right-click the join line to open a dialog for setting the properties of the join.
  • Select fields to include in the query output.
  • Right-click a field and choose it for sorting.
  • Highlight a field and click the Group By button on the toolbar.
• The Grid pane provides a more detailed view for specifying how individual columns are
used in the query. Changes in this pane are immediately reflected in the Diagram pane
and the SQL pane.
• The SQL pane shows the text of the SQL statement that is being generated for this query.
Changes here are not reflected immediately in the Diagram and Grid panes, but they are
applied as soon as you click any object outside the SQL pane.
• The Results pane shows the results of running the query you are designing. The effects
of the changes you make in the query design are not reflected until you rerun the query
by clicking the Execute button on the toolbar.
TIP
Right-clicking in any of the panes brings up a menu that includes the Properties dialog
for the query. Among other things, you can limit the output to the TOP X or TOP X PERCENT
of the records in a result set.
MDX Query for a Multidimensional Cube Source
You may also want to get data from an OLAP cube. You can connect to Microsoft OLAP
Services cubes with the Microsoft OLE DB Provider for OLAP Services.
On the Source tab of the Transform Data Task Properties dialog, select SQL Query and type
your MDX statement in the box. You can also use the Browse button to find a file that has the
MDX statement in it. Don’t try to use the Query Designer. It’s not ready to generate MDX
queries, at least not yet.
I’ve used MDX statements to return a single value to verify the results of a data load and cube
processing. For example, if I know the number of new orders that are being imported into the
cube’s fact table, I can query the cube before and after it’s processed to verify that number:
select {[Measures].[Order Count]} on columns from OrdersCube
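The check itself is simple arithmetic: the Order Count returned by the MDX query after processing, minus the count before processing, should equal the number of orders imported. Here is a minimal Python sketch of that comparison. It assumes the two counts have already been captured by running the query above, and that nothing else changed the fact table between the two queries. The function name is hypothetical.

```python
def check_order_count(before: int, after: int, imported: int) -> str:
    """Compare the cube's growth in Order Count with the number of orders imported."""
    growth = after - before
    if growth == imported:
        return "ok"
    return f"mismatch: cube grew by {growth}, but {imported} orders were imported"
```

With 1,000 orders before, 1,250 after, and 250 imported, the check returns `"ok"`. Any other result points to rows that were lost, duplicated, or not yet processed into the cube.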
NOTE
You could choose the Table/View option instead, but the choices that show up in the list
are entire cubes. You will generate a cellset that returns every cell of the cube, at the
lowest level of every dimension. It can take a long time to load even a
small cube like Warehouse from the Foodmart sample OLAP database.
NOTE
The MDX language allows you to return a cellset of any number of dimensions from
0 to 64. The Transform Data task can only handle 1- and 2-dimension cellsets.
The task won’t handle the following valid MDX query, which returns a 0-dimension
cellset:
select from warehouse
This query fails because it doesn’t supply a column heading, so the resulting value
can’t be referenced to create a transformation.
Using XML as the Source
You can use an XML document as the data source for a Transform Data task if you have an
OLE DB provider that supports XML. An XML provider was not shipped with the initial
release of SQL Server 2000.
NOTE
I have used the DataDirect XML ADO Provider from Merant.
Using Parameters in a Source Query
One of the new features in SQL Server 2000 is the ability to use parameters in a source query
of the Transform Data task:
SELECT ProductID, Quantity, Price, SalesDate
FROM Sales
WHERE SalesDate = ?
You assign a value to the parameter by using a global variable. This reference is resolved at
runtime.
You make the assignments by clicking the Parameters button. Then, on the Parameter
Mapping dialog (shown in Figure 6.4), choose a global variable to use as the Input Global
Variable for each of your parameters.
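The placeholder-to-variable mapping can be imitated in Python, where the sqlite3 module happens to use the same `?` placeholder as the source query above. This sketch mimics the Parameter Mapping dialog: an ordered list of global variable names, one per `?`, is resolved to values only when the query runs. SQLite stands in for SQL Server, and the variable name `gvSalesDate`, the sample rows, and the `run_source_query` function are all hypothetical. None of this is the DTS API.

```python
import sqlite3
from typing import Dict, List, Sequence

SOURCE_QUERY = """
SELECT ProductID, Quantity, Price, SalesDate
FROM Sales
WHERE SalesDate = ?
"""


def run_source_query(conn: sqlite3.Connection, query: str,
                     mapping: Sequence[str], global_vars: Dict[str, object]) -> List[tuple]:
    """Bind each ? placeholder, in order, to the value of its mapped global variable."""
    # Naive placeholder count: assumes no literal ? characters inside strings.
    if query.count("?") != len(mapping):
        raise ValueError("each ? needs exactly one mapped global variable")
    params = [global_vars[name] for name in mapping]  # resolved at run time
    return conn.execute(query, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (ProductID INTEGER, Quantity INTEGER, Price REAL, SalesDate TEXT)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)",
                 [(1, 5, 9.99, "2000-11-13"), (2, 3, 4.5, "2000-11-14")])
```

Calling `run_source_query(conn, SOURCE_QUERY, ["gvSalesDate"], {"gvSalesDate": "2000-11-13"})` returns only the first sale. Changing the variable's value before the next run changes the result without editing the query text.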
FIGURE 6.4
You map the parameters in your source query to global variables using the Parameter Mapping dialog.
If you want to create a new global variable, click the Create Global Variables button. Within
the Global Variables dialog, you can create, modify, or delete each global variable in the DTS
package. Each global variable must have a unique name and a datatype. You can also assign
the variable a default value.
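The rules in this paragraph (a unique name, a datatype, and an optional default value) can be captured in a small registry. This Python sketch is hypothetical, not the DTS GlobalVariables collection. Python types stand in for DTS datatypes, and the type checks are this sketch's own rule, an assumption rather than documented DTS behavior.

```python
from typing import Any, Dict, Optional, Tuple


class GlobalVariables:
    """Hypothetical registry: each variable has a unique name, a datatype, and a value."""

    def __init__(self) -> None:
        self._vars: Dict[str, Tuple[type, Any]] = {}

    def create(self, name: str, datatype: type, default: Optional[Any] = None) -> None:
        if name in self._vars:
            raise KeyError(f"global variable {name!r} already exists")
        if default is not None and not isinstance(default, datatype):
            raise TypeError(f"default for {name!r} must be {datatype.__name__}")
        self._vars[name] = (datatype, default)

    def delete(self, name: str) -> None:
        del self._vars[name]

    def __getitem__(self, name: str) -> Any:
        return self._vars[name][1]

    def __setitem__(self, name: str, value: Any) -> None:
        # Modify an existing variable only; its datatype stays fixed.
        datatype, _ = self._vars[name]
        if not isinstance(value, datatype):
            raise TypeError(f"{name!r} holds {datatype.__name__} values")
        self._vars[name] = (datatype, value)
```

For example, `gv.create("gvSalesDate", str, "2000-11-13")` registers a string variable with a default value. A second `create` with the same name raises `KeyError`.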
[...] in the Testing Transformation dialog, and the data produced by the test is shown in the View Data dialog.
The Collections That Implement a Transformation
A Transform Data task has a Transformations collection that contains one object for each transformation that has been defined. Each mapping line corresponds to one Transformation object.
'Assume DTS.Transformation variable ...
[...] file
• 2 (DTSExceptionFile_ErrorFile): Create the SQL Server 2000 Error Text file. This cannot be used at the same time as the 7.0 format file because they are assigned the same filename.
• 4 (DTSExceptionFile_SourceRowFile): Create the SQL Server 2000 Source Row file.
• 8 (DTSExceptionFile_DestRowFile): Create the SQL Server 2000 Destination Row file.
• 256 (DTSExceptionFile_Ansi): Create ...
[...] Properties of the Transform Data Task
You can set error handling, data movement, and SQL Server-specific properties on the Options tab of the Transform Data Task Properties dialog, shown in Figure 6.22.
FIGURE 6.22
Error handling and data movement are among the properties set on the Options tab of the Transform Data Task Properties dialog.
[...] TransformDataTest, the following assignments would be made:
• Task Description: TransformDataTest
• Step Description: TransformDataTest
• Source Connection Description: SourceTransformDataTest
• Destination Connection Description: DestTransformDataTest
• Task Name: tskTransformDataTest
• Step Name: stpTransformDataTest
• Destination Connection Name: conDestTransformDataTest ...
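The naming excerpt above derives every description and name from one base name, TransformDataTest. Here is a short Python sketch of that pattern. It is limited to the assignments that survive in the excerpt; the rest of the list, such as the source connection's name, is cut off and deliberately not guessed. The function `names_from_base` is hypothetical and is not part of Listing 6.1.

```python
from typing import Dict


def names_from_base(base: str) -> Dict[str, str]:
    """Apply the prefix pattern shown in the excerpt to a single base name."""
    return {
        "Task Description": base,
        "Step Description": base,
        "Source Connection Description": "Source" + base,
        "Destination Connection Description": "Dest" + base,
        "Task Name": "tsk" + base,
        "Step Name": "stp" + base,
        "Destination Connection Name": "conDest" + base,
    }
```

Deriving everything from one base name keeps the Task Name (`tsk...`) and Step Name (`stp...`) visibly paired, which makes mismatches easy to spot.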
[...] this transformation
The Transformation object itself has two collections, one containing the source columns and the other containing the destination columns. These collections are referenced in Visual Basic as the SourceColumns and DestinationColumns of the Transformation object:
[...]
The Transformation Types
In the SQL Server 7.0 version of Data Transformation Services, you could choose between two types of transformations, Copy Column or ActiveX script. There are seven more choices in SQL Server 2000.
The DateTime String
In the previous version of DTS, it was possible to convert dates to new formats, but it took a lot of ActiveX programming. You can get the same results much faster with the new DateTime String transformation.
NOTE
[...] provider for SQL Server for the destination connection.
Chapter 28, “High Performance DTS Packages,” has charts showing the relative performance of the Transform Data task with different options.
The most important performance choice with the Transform Data task is to use fast load, which is selected by default. A data transformation with fast load executes about 130 times faster than a data transformation [...]
[...] the Custom Transformation
You can create a new type of transformation, or use a Custom Transformation that someone else has made. For more information about Custom Transformations, refer to Chapter 32.
FIGURE 6.21
The ActiveX Script Transformation Properties dialog gives you a place to write code that executes for each row of data.
Other [...]
[...] requires references to the Microsoft DTSPackage Object Library and the Microsoft DTSDataPump Scripting Object Library.
LISTING 6.1 The Visual Basic Code to Create a Transform Data Task
Option Explicit
Public Function fctCreateTransformDataTask( _
    pkg As DTS.Package2, _
    Optional sBaseName As String = "TransformDataTask", _
    Optional sSourceDataSource As String = "", Optional sDestDataSource As String = "", ...
[...] However, a new feature in SQL Server 2000 is the addition of the Populate from Source button on the Define Columns dialog. Clicking this button automatically rematches the columns from the source.
DataPumpTask Destination Properties
The properties for the destination of a Transform Data task are similar to those for the source:
• DestinationConnectionID: An ...