
Microsoft SQL Server 2000 Data Transformation Services- P13


FIGURE 28.5 Column names versus column ordinal numbers in scripts in Transform Data tasks.

There are two problems with switching from column names to column ordinal numbers:

• The code is harder to read and write.
• The ordinal numbers do not consistently identify the columns. All the ordinal numbers of the columns are changed whenever you view the Source Columns tab or the Destination Columns tab of the Transformation Options dialog.

Listing 28.1 shows VBScript code for an ActiveX Script task that dynamically modifies all the ActiveX Script transformations in all the Transform Data tasks in the package, replacing the column names with the column ordinal numbers. This code is included on the CD in a package stored in the ReplaceNamesWithOrdinals.dts file. This package also has a task that switches all the ordinal numbers back to names, which can be run at the end of the package execution.

LISTING 28.1 VBScript Code That Switches Column Names to Column Ordinal Numbers for All Transform Data Tasks in a DTS Package

    Option Explicit

    Function Main()
        Dim pkg, tsk, cus, trn, col
        Dim sScript, sFind, sReplace

        Set pkg = DTSGlobalVariables.Parent

        For Each tsk In pkg.Tasks
            Select Case tsk.CustomTaskID
                Case "DTSDataPumpTask", "DTSDataPumpTask2"
                    Set cus = tsk.CustomTask
                    For Each trn In cus.Transformations
                        Select Case trn.TransformServerID
                            Case "DTSPump.DataPumpTransformScript", _
                                 "DTSPump.DataPumpTransformScriptProperties2"
                                sScript = trn.TransformServerProperties("Text").Value

                                For Each col In trn.DestinationColumns
                                    sFind = "DTSDestination(""" & col.Name & """)"
                                    sReplace = "DTSDestination(" & CStr(col.Ordinal) & ")"
                                    sScript = Replace(sScript, sFind, sReplace)
                                Next

                                For Each col In trn.SourceColumns
                                    sFind = "DTSSource(""" & col.Name & """)"
                                    sReplace = "DTSSource(" & CStr(col.Ordinal) & ")"
                                    sScript = Replace(sScript, sFind, sReplace)
                                Next

                                trn.TransformServerProperties("Text").Value = sScript
                        End Select
                    Next
            End Select
        Next

        Main = DTSTaskExecResult_Success
    End Function

Fetch Buffer Size, Table Lock, and Insert Batch Size

When you use Fast Load, you can choose from several specific loading options. We found that two of these options, Table Lock and Insert Batch Size, had a definite effect on performance. We expect that some of the other Fast Load options could also affect performance in specific situations.

Table 28.6 and Figure 28.6 show the effect on performance of Table Lock, Insert Batch Size, and Fetch Buffer Size (a setting that can be used whether or not Fast Load is selected). Our tests were conducted without other users in the database. The default choice, with Table Lock off, an Insert Batch Size of 0 (load all records in a single batch), and a Fetch Buffer Size of 1, is shown first. We were unable to increase the Fetch Buffer Size beyond 5000.
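These Fast Load settings can also be applied programmatically rather than through the Transformation Options dialog. The sketch below is illustrative only: the property names (UseFastLoad, FastLoadOptions, InsertCommitSize, FetchBufferSize) follow the DataPumpTask2 interface, but the numeric value assumed for the table-lock constant should be verified against the DTS type library before use.

```vbscript
' Illustrative sketch: setting the Fast Load options of Table 28.6
' on every Transform Data task in the package from an ActiveX Script task.
Option Explicit

' Assumed constant value for the table-lock Fast Load option;
' verify against the DTS type library in your environment.
Const DTSFastLoad_TableLock = 4

Function Main()
    Dim pkg, tsk, cus
    Set pkg = DTSGlobalVariables.Parent
    For Each tsk In pkg.Tasks
        If tsk.CustomTaskID = "DTSDataPumpTask2" Then
            Set cus = tsk.CustomTask
            cus.UseFastLoad = True                       ' Fast Load must be on
            cus.FastLoadOptions = DTSFastLoad_TableLock  ' Table Lock on
            cus.InsertCommitSize = 0                     ' 0 = one batch for all records
            cus.FetchBufferSize = 5000                   ' largest value we could use
        End If
    Next
    Main = DTSTaskExecResult_Success
End Function
```

This combination (Lock On, Insert 0, Fetch 5000) corresponds to the fastest row in Table 28.6.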
TABLE 28.6 The Effect of Table Lock, Insert Batch Size, and Fetch Buffer Size in the Transform Data Task

    Test Description                        Records Per Second    % Diff from Individual Transformations
    Lock Off, Insert 0, Fetch 1             15,000                0%
    Lock On, Insert 0, Fetch 1              16,000                6.7% faster
    Lock Off, Insert 0, Fetch 5000          16,000                6.7% faster
    Lock Off, Insert 5000, Fetch 1          13,300                11.3% slower
    Lock On, Insert 100,000, Fetch 5000     21,000                40% faster
    Lock On, Insert 0, Fetch 5000           22,000                46.7% faster

FIGURE 28.6 The effect of Table Lock, Insert Batch Size, and Fetch Buffer Size in the Transform Data task.

Moving Transformation Logic to the Source Query

Chapter 7, "Writing ActiveX Scripts for a Transform Data Task," has a series of examples that show how you can move transformation logic from an ActiveX script into a source query:

• Simple string manipulation:

    select au_lname + ', ' + au_fname as au_fullname
    from AuthorName

• Assigning an unknown value:

    select
        case
            when au_lname is null or au_lname = '' then 'Unknown Name'
            when au_fname is null or au_fname = '' then au_lname
            else au_lname + ', ' + au_fname
        end as au_fullname
    from AuthorName

• Looking up an unknown value in another table. For our test, 17% of the values were found in the lookup table. The lookup table contained a total of 23 records, so all of them could be stored in the lookup's cache at the same time:

    select
        case
            when a.au_lname is null or a.au_lname = ''
              or a.au_fname is null or a.au_fname = '' then lkp.FullName
            else au_lname + ', ' + au_fname
        end as au_fullname
    from AuthorName a
    inner join tblAuthorNameList lkp on a.au_id = lkp.au_id
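When the logic has been moved into the source query, the DTS side of the task only needs a Copy Column transformation, and the query itself can be assigned programmatically through the task's SourceSQLStatement property. A minimal sketch, assuming a hypothetical task name (tskLoadAuthors) and the AuthorName table from the examples above:

```vbscript
' Illustrative sketch: assigning the first example's query as the
' source query of a Transform Data task, so the transformation
' itself can be a simple Copy Column.
Option Explicit

Function Main()
    Dim pkg, tsk, cus
    Set pkg = DTSGlobalVariables.Parent
    Set tsk = pkg.Tasks("tskLoadAuthors")   ' hypothetical task name
    Set cus = tsk.CustomTask
    cus.SourceSQLStatement = _
        "select au_lname + ', ' + au_fname as au_fullname " & _
        "from AuthorName"
    Main = DTSTaskExecResult_Success
End Function
```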
Table 28.7 and Figure 28.7 show the performance of these three examples, comparing the transformation logic placed in the script of an ActiveX Script transformation against the same logic placed in the source query with a Copy Column transformation.

TABLE 28.7 ActiveX Scripts Versus Source Queries in Transform Data Tasks

    Test Description               Records Per Second    % Diff from Simple Query
    Query—Simple manipulation      22,000                0%
    Script—Simple manipulation     6,000                 72.7% slower
    Query—Assigning value          21,000                4.5% slower
    Script—Assigning value         4,800                 78.2% slower
    Query—Table lookup             18,500                15.9% slower
    Script—Table lookup            2,700                 87.7% slower

FIGURE 28.7 ActiveX Scripts versus source queries in Transform Data tasks.

These tests show the performance benefits of moving transformation logic into the source query. That's especially true when a lookup is involved. In the last example in this test, the source query improved performance by almost a factor of seven. The more complex the data transformation, the more beneficial it is to put the logic into the source query. Unfortunately, those more complex situations are where the option of writing a transformation script is the most convenient.

NOTE: My development strategy is to create all but the simplest transformations with scripts and then, if I need the better performance, convert that transformation logic partially or completely into source queries.

Moving Logic into a Custom Transformation

Chapter 32, "Creating a Custom Transformation with VC++," shows how to create a custom transformation that finds the average value for a set of integer source fields. Table 28.8 and Figure 28.8 show the performance of this custom transformation compared with the performance of a transformation script and a source query with the same logic. This test was run with 10 fields being averaged together.

TABLE 28.8 Transformation Script Versus Custom Transformation Versus Source Query

    Test Description          Records Per Second    % Diff from Script
    Transformation Script     3,333                 0%
    Custom Transformation     15,150                354% faster
    Source Query              15,625                369% faster

FIGURE 28.8 Transformation script versus custom transformation versus source query.

Performance of the Transform Data Task and the Data Driven Query Task

Our testing indicates that when the Transform Data task uses Fast Load, it inserts records more than 100 times faster than the Data Driven Query task. If you cannot use Fast Load, the two tasks insert records at approximately the same speed.

The reason you use the Data Driven Query task is so you can choose from among several queries as the result of a transformation. With SQL Server 2000, you can also do this with a Transform Data task by using a data modification lookup. You can reproduce the functionality of a Data Driven Query task in a Transform Data task by doing the following:

• Replacing the Insert Query with a transformation.
• Replacing the other three queries with data modification lookups.
• Changing the logic of the transformation script. When the Insert Query should be executed, return a value of DTSTransformStat_OK. When any of the other queries should be executed, return a value of DTSTransformStat_SkipInsert and include code that executes the appropriate data modification lookup.
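A transformation script following that pattern might look like the sketch below. The lookup names (lkpCustomerExists, updCustomer) and the column names are hypothetical; the sketch assumes the task defines one lookup with a SELECT COUNT(*) query and one data modification lookup with a parameterized UPDATE statement.

```vbscript
' Illustrative sketch: reproducing insert-or-update Data Driven Query
' behavior inside a Transform Data task with Fast Load.
' Assumed lookups defined on the task:
'   lkpCustomerExists: select count(*) from Customers where CustomerID = ?
'   updCustomer:       update Customers set CustomerName = ? where CustomerID = ?
Option Explicit

Function Main()
    If DTSLookups("lkpCustomerExists").Execute(DTSSource("CustomerID")) = 0 Then
        ' New record: let the Transform Data task insert it.
        DTSDestination("CustomerID") = DTSSource("CustomerID")
        DTSDestination("CustomerName") = DTSSource("CustomerName")
        Main = DTSTransformStat_OK
    Else
        ' Existing record: run the UPDATE through the data
        ' modification lookup and skip the insert.
        DTSLookups("updCustomer").Execute _
            DTSSource("CustomerName"), DTSSource("CustomerID")
        Main = DTSTransformStat_SkipInsert
    End If
End Function
```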
NOTE: One additional advantage of using the data modification lookups is that you're not limited to four possible queries, as you are with the Data Driven Query task. Of course, you can also use data modification lookups in the Data Driven Query task to give yourself the extra possibilities.

Our testing indicates that a Data Driven Query task update query is approximately 30% faster than an update performed by a lookup in a Transform Data task.

If you have a data transformation that performs an insert 10% of the time and an update 90% of the time, the data transformation should be faster with the Data Driven Query task than with the Transform Data task using Fast Load. The performance advantage of the update queries in the Data Driven Query task is greater than the performance disadvantage for the insert queries.

If you have a data transformation that performs an insert 50% of the time and an update 50% of the time, the data transformation should be faster in a Transform Data task using Fast Load. The performance advantage of the Fast Load inserts in the Transform Data task should far outweigh the performance disadvantage on the updates.

Choosing a Scripting Language

You can gain a performance improvement by using a different language in your transformation scripts. Books Online states that VBScript is approximately 10% faster than JScript, and that JScript is approximately 10% faster than PerlScript.

NOTE: We have used VBScript exclusively in our ActiveX scripts. We have not tested the other scripting languages.

Use of Parallel Processing to Improve Performance

You can greatly improve the performance of a DTS package by maximizing the opportunities for parallel processing.
This is especially true when the DTS package is executing on a server with multiple processors. You can achieve a higher level of parallel processing by doing the following:

• Setting precedence constraints so that as many tasks as possible are allowed to execute at the same time.
• Creating additional connections to the same database. One connection cannot be used by more than one task at a time, so tasks using the same connection can't be executed in parallel.
• Setting the Step object's ExecuteInMainThread property to FALSE for all the steps. If two steps are both set to execute on the main thread, they can't be executed in parallel.
• Increasing the Package object's MaxConcurrentSteps property. By default, this property is set to 4, which is too low in situations where you have many processors available.

Some factors limit the use of these strategies:

• The logic of your transformation might require that some tasks be completed before others are started. If so, you can force serial execution with precedence constraints.
• If you are using transactions, you have to prevent access to a database from two different connections at the same time or the package will fail. You can avoid this problem by setting the precedence constraints so that the tasks execute serially, or by having only a single connection to each database. Either way, you lose the performance benefit of parallel processing.
• Some tasks must be executed on the main thread or they will generate errors. This is true for any custom task that is not free-threaded (including all custom tasks built with Visual Basic), tasks that modify properties in custom tasks that are not free-threaded, and any task with a script that calls a COM object written in Visual Basic.
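The two thread-related settings in the list above can be applied in a few lines of package-manipulation code. A sketch using the documented Step and Package properties (the step loop assumes none of your steps fall under the main-thread restrictions just described):

```vbscript
' Illustrative sketch: preparing a package for parallel execution.
Option Explicit

Function Main()
    Dim pkg, stp
    Set pkg = DTSGlobalVariables.Parent

    ' Default is 4; raise it when more processors are available.
    pkg.MaxConcurrentSteps = 8

    ' Let steps run on worker threads instead of the main thread.
    ' Do NOT do this for steps whose tasks are not free-threaded.
    For Each stp In pkg.Steps
        stp.ExecuteInMainThread = False
    Next

    Main = DTSTaskExecResult_Success
End Function
```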
Effect of Indexing on Performance

The indexes used on tables involved with data transformations can have a very significant impact on the performance of those transformations:

• Proper indexing on source tables can improve the speed of the transformation tasks and the Execute SQL task if those tasks are filtering the records in the data source with a WHERE clause, ordering records with an ORDER BY clause, or aggregating records with a GROUP BY clause.
• Indexes on destination tables can decrease the speed of transformations because the indexes have to be adjusted for the new records that are being entered.

The amount of performance improvement or degradation due to indexes depends heavily on the details of the particular situation. The use of indexes always involves tradeoffs:

• Is it faster to take the time to build an index that could improve performance, or to execute the source query without a proper index?
• Is it faster to drop the indexes for the data destination, import the data, and re-create the indexes, or to leave the indexes in place?
• Are all the indexes on the destination table really needed? Could some of them be eliminated so that the transformation process can be completed more quickly? Or could they be dropped and re-created after the DTS package has finished?

If you don't have any data in your destination table to begin with, you should normally drop the indexes and build them after the data transformation. If you already have data in the destination table, you should test the transformation with and without the indexes in place to see which gives you the best performance.

NOTE: You could divide your DTS package into several packages executed on several servers to achieve an even greater level of parallel execution. The packages can all be executed from one package with the Execute Package task described in Chapter 18.
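A common arrangement for the drop-and-re-create approach is an ActiveX Script task (or Execute SQL task) before the transformation that drops the destination indexes, with a matching task afterward that rebuilds them. A hedged sketch using ADO from VBScript; the connection string, table name, and index name are placeholders, not objects from this book's sample data:

```vbscript
' Illustrative sketch: dropping a destination index before a load.
' A matching task after the transformation re-creates the index.
Option Explicit

Function Main()
    Dim cn
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=(local);" & _
            "Initial Catalog=MyDestDB;Integrated Security=SSPI;"   ' placeholder
    cn.Execute "DROP INDEX tblDestination.ix_tblDestination_Name"  ' SQL Server 2000 syntax
    cn.Close
    Main = DTSTaskExecResult_Success
End Function
```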
NOTE: Of course, you may need to leave indexes in place because of other users who need to access the data.

TIP: You can sometimes greatly improve performance by using a covering index: an index that includes all the fields used in a query, arranged in the appropriate order for that particular query. Unfortunately, because covering indexes are so large, they can significantly hurt the performance of database inserts and updates.

Considering Tools Other Than DTS Because of Performance

Sometimes you may want to consider using non-DTS tools to achieve better performance.

Using bcp for Exporting from SQL Server to Text Files

It is faster to use bcp to bulk copy data from SQL Server to a text file than to use the Transform Data task. The high-performance Bulk Insert task cannot be used for moving data from SQL Server to a text file.

Using Replication

If you want to keep two databases synchronized with each other, you should use replication instead of DTS. Replication is often easier to set up, it often has better performance, and you have more synchronization options. DTS is needed when you're changing (transforming) data. If you're just copying data between two databases on a periodic basis, consider replication.

Conclusion

DTS gives you many tools for creating high-performance transformations, but there's still a lot of work to do if you want to achieve the highest possible performance.
[...] SQL Server 2000 user interface and the DTS object model.

CAUTION: If you upgrade a server from SQL Server 7.0 to SQL Server 2000, you also have to upgrade the Meta Data Services information models. If you don't, you will receive an error and will not be allowed to save packages to Meta Data Services. SQL Server Books Online describes how to do this upgrade at Data Transformation Services\Sharing Meta Data\DTS [...]

[...] that have been saved to Meta Data Services. The Meta Data Browser is a new tool in SQL Server 2000 for accessing meta data. You can use the Meta Data Browser in SQL Server 2000 or as a standalone snap-in for the Microsoft Management Console. When you use it inside SQL Server, you access the Meta Data Browser under the Meta Data Services node in the Enterprise Manager. All the databases and packages that [...]

[...] Browser in SQL Server 7.0. You can access it by highlighting the Meta Data node under the Data Transformation Services node for a particular SQL Server in the Enterprise Manager.
• The Meta Data Browser is a new tool in SQL Server 2000. You can find it in the Enterprise Manager in the Meta Data Services node for a particular SQL Server.
You can use the DTS Browser to update some of the business meta data contained [...]

[...] Meta Data Services with DTS. Meta Data Services gives you a centralized place to store and access data transformation meta data.

Meta Data

Meta data is data about data. It is a description of the structures that are used to store, transform, and retrieve data. There are two kinds of meta data in an enterprise information system:
• Business meta data [...]

[...] also derived from the Database Information Model.
• The Data Transformation Services Information Model stores data transformation information that is specific to Microsoft's Data Transformation Services. This model is derived from the Database Transformation Information Model.

The Meta Data Services SDK

You can download the Meta Data Services Software Development Kit (SDK) from Microsoft's Web site. This [...]

[...] another one for data types. A generic model specifies the relationships between models. The information models that relate to databases and data transformations are shown in Figure 29.10: the Uniform Modeling Language Information Model, the Database Information Model, the SQL Server Information Model, the Database Transformation Information Model, the OLAP Information Model, the Data Transformation Services Information Model, and the Microsoft [...]

[...] Meta Data Services. That chapter also discusses the use of the PackageRepository object to retrieve general information about the packages stored in Meta Data Services.

NOTE: Microsoft SQL Server 2000 Meta Data Services was called the Microsoft Repository in SQL Server 7.0. There have been many enhancements in the newer version, especially with the new Meta Data Browser and the ability to export meta data [...]

[...] primary tools for viewing the information in Meta Data Services. You have the new Meta Data Browser. You also have the DTS Browser, which provides the same capabilities as the SQL Server 7.0 Repository Browser. The term "repository" is still used to describe the database that physically stores the meta data (which was called "metadata" in SQL Server 7.0). SQL Server Books Online has made these changes throughout [...]

[...] Meta Data Services, but it would require a good deal of specialized programming. You should save packages to Meta Data Services because you want to handle the meta data in an organized and consistent way.

The DTS Browser

The SQL Server 2000 Enterprise Manager provides two ways to view the meta data of databases and DTS packages that have been stored in Meta Data Services. [...]

The Repository Database

The repository is the database that is used for the physical storage of the meta data. By default, the msdb system database in SQL Server is used as the repository. The data is stored in a set of tables with prefixes that identify the various information models (Dbm for Database Model, Dtm for Data Transformation model, and so on). [...]

Posted: 07/11/2013, 20:15