to adjust names and locations for specific target servers. UNC is a method of identifying a path so
that it can be accessed from anywhere on the network where a package may run; it takes the form of
\\servername\sharename\path\file.ext. The many file configuration managers are listed here:
■ Flat File: Presents a text file as if it were a table, with locale and header options. The file can
be in one of three formats:
■ Delimited: File data is separated by column delimiters (e.g., comma) and row delimiters (e.g., {CR}{LF}).
■ Fixed Width: File data has known sizes without column or row delimiters. When opened
in Notepad, such a file appears as if all data is on a single line.
■ Ragged Right: File data is interpreted using fixed width for all columns except the last, which is terminated by the row delimiter.
Only files that use the delimited format are able to interpret zero-length strings as null.
■ Multiple Flat Files: Same as the Flat File connection manager, but it allows multiple files to
be selected, either individually or using wildcards. Data then appears as a single large table to Integration Services elements.
■ File: Identifies a file or folder in the file system without specifying content. Such file pointers are used by several elements within Integration Services, including the file system and FTP tasks for file manipulation and the Execute SQL task to identify the file from which a SQL statement should be read. The usage type (Create file, Existing file, Create folder, Existing folder) ensures that the correct type of file pointer is created.
■ Multiple Files: Same as the File connection manager, but it allows multiple files to be selected, either individually or using wildcards.
■ Excel: Identifies a file containing a group of cells that can be interpreted as a table (0 or 1 header rows, with data rows below without row or column gaps).
Special
Beyond Database and File connection managers, several other types are provided:
■ Cache: Defines a data cache location. The cache is first populated using the Cache transform and then used by Lookup transforms within Data Flow tasks. The cache is a write-once, read-many data store: All the data to be included in the cache must be written by a single Cache transform but can then be used by many Lookup transforms. Configuring the connection manager requires that index columns be selected, so it is often easiest to use the New button from within the Cache transform
to create the connection manager, as it provides the column metadata.
Configure the connection manager by marking the columns that will be used to look up rows
in the Columns tab. Mark the first column in the lookup as index position 1, the second as 2, and so on. The lookups performed on a cache must use all of the marked columns and no others to find the row. By default, the cache is created in memory and is available only in the current package. Make the cache available on disk for use by subsequent packages by enabling file cache on the General tab and identifying the .caw file to be used to store the cached data.
■ FTP: Defines a connection to an FTP server. For most situations, entering the server name and credentials is sufficient to define the connection. This manager is used with the FTP task to move and remove files or create and remove directories using FTP.
■ HTTP: Defines a connection to a Web Service. Enter the URL of the WSDL (Web
Service Description Language) for the Web Service in question — for example, http://MyServer/
reportserver/reportservice.asmx?wsdl points to the WSDL for Reporting
Services on MyServer. Used with the Web Service task to access Web Service
methods.
■ MSMQ: Defines a connection to a Microsoft Message Queue; used in conjunction with a
Message Queue task to send or receive queued messages.
■ SMO: Specifies the name and authentication method to be used with Database Transfer tasks
(Transfer Objects, Transfer Logins, etc.).
■ SMTP: Specifies the name of the Simple Mail Transfer Protocol server for use with the Send
Mail task. Older SMTP server versions may not support all the commands necessary to send
e-mail from Integration Services.
■ WMI: Defines a server connection for use with Windows Management Instrumentation tasks,
which enable logged and current event data to be collected.
Control flow elements
The Control Flow tab provides an environment for defining the overall work flow of the package. The
following elements are the building blocks of that work flow.
Containers
Containers provide important features for an Integration Services package, including iteration over a
group of tasks and isolation for error and event handling.
In addition to containers, the Integration Services Designer will also create task groups. Define a group
by selecting a number of Control Flow items, right-clicking one of the selected items, and choosing
Group. This encloses several tasks in a group box that can be collapsed into a single title bar. Note,
however, that this group has no properties and cannot participate in the container hierarchy — in short,
it is a handy visual device that has no effect on how the package executes.
The containers available are as follows:
■ TaskHost: This container is not visible in a package, but implicitly hosts any task that is not
otherwise enclosed in a container. Understanding this default container helps in understanding error
and event handler behaviors.
■ Sequence: This simply contains a number of tasks without any iteration features, but it
provides a shared event and error-handling context, allows shared variables to be scoped to the
container level instead of the package level, and enables the entire container to be disabled at
once during debugging.
■ For Loop: This container provides the advantages of a Sequence container but runs the
tasks in the container as if the tasks were in a C# for loop. For example, given an
integer variable @LoopCount, assigning the For Loop properties InitExpression to
@LoopCount=0, EvalExpression to @LoopCount<3, and AssignExpression to
@LoopCount=@LoopCount+1 will execute the contents of the container three times, with
@LoopCount containing the values 0, 1, and 2 on successive iterations.
■ Foreach Loop: This container provides iteration over the contents of the container based on various lists of items:
■ File: Each file in a wildcarded directory command.
■ Item: Each item in a manually entered list.
■ ADO: Each row in a variable containing an ADO recordset or ADO.NET data set.
■ ADO.NET Schema Rowset: Each item in the schema rowset.
■ Nodelist: Each node in an XPath result set.
■ SMO: A list of server objects (such as jobs, databases, or file groups).
Describe the list to be iterated on the Collection page, and then map each item being iterated over to a corresponding variable. For example, a File loop requires a single string variable
mapped to index 0, but an ADO loop requires n variables for n columns, with indexes 0
through n-1.
Control flow tasks
Tasks that can be included in control flow are as follows:
■ ActiveX Script: Enables legacy VB and Java scripts to be included in Integration Services. New scripts should use the Script task instead. Consider migrating legacy scripts where possible, because this task will not be available in future versions of SQL Server.
■ Analysis Services Execute DDL: Sends Analysis Services Scripting Language (ASSL) scripts to
an Analysis Services server to create, alter, or process cube and data mining structures. Often such scripts can be created using the Script option in SQL Server Management Studio.
■ Analysis Services Processing Task: Identifies an Analysis Services database, a list of objects
to process, and processing options.
■ Bulk Insert: Provides the fastest mechanism to load a flat file into a database table without transformations. Specify a source file and destination table as a minimum configuration. If the source file is a simple delimited file, then specify the appropriate row and column delimiters;
otherwise, create and specify a format file that describes the layout of the source file. Error rows cannot be redirected, but rather cause the task to fail.
■ Data Flow: Provides a flexible structure for loading, transforming, and storing data as configured on the Data Flow tab. See the section ‘‘Data Flow Components,’’ later in this chapter, for the components that can be configured in a Data Flow task.
■ Data Profiling: Builds an XML file to contain an analysis of selected tables. Available analyses include null ratio, column length for string columns, statistics for numeric columns, value distribution, candidate keys, and inter-column dependencies. Open the resulting file in the Data Profile Viewer to explore the results. Alternately, the analysis results can be sent to an XML variable for programmatic inspection as part of a data validation regimen.
Configure by setting the destination and file overwrite behavior on the General page. Select profiles to run either by pressing the Quick Profile button to select many profiles for a single table or by switching to the Profile Requests page to add profiles for one or more tables
individually. Add a new profile request manually by clicking the Profile Type pull-down list on
the first empty row.
■ Data Mining Query: Runs prediction queries against existing, trained data mining models.
Specify the Analysis Services database connection and mining structure name on the
Mining Model tab. On the Build Query tab, enter the DMX query, using the Build New Query
button to invoke the Query Builder if desired. The DMX query can be parameterized by
placing parameter names of the form @MyParamName in the query string. If parameters are
used, then map from the parameter name (without the @ prefix) to a corresponding
variable name on the Parameter Mapping tab. Results can be handled by sending them to
variable(s) on the Result Set tab, to a database table on the Output tab, or both:
■ Single-row result sets can be stored directly into variables on the Result Set tab by mapping
each Result (column) Name returned by the query to the corresponding target variable,
choosing the Single Row result type for each mapping.
■ Multiple-row result sets can be stored in a variable of type Object for later use with a
Foreach loop container or other processing. On the Result Set tab, map a single Result
Name of 0 (zero) to the object variable, with a result type of Full Result Set.
■ Independent of any variable mappings, both single-row and multiple-row result sets can be
sent to a table by specifying the database connection and table name on the Output tab.
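As a sketch of what such a parameterized prediction query might look like, assuming a hypothetical mining model named [BikeBuyerModel], a data source [MyDataSource], and a @MinAge parameter (none of which are defined in this chapter):

```sql
-- Hypothetical DMX prediction query for the Build Query tab; the model,
-- data source, table, and @MinAge parameter are illustrative names only.
SELECT
    t.CustomerKey,
    Predict([BikeBuyerModel].[Bike Buyer]) AS PredictedBuyer
FROM [BikeBuyerModel]
PREDICTION JOIN
    OPENQUERY([MyDataSource],
        'SELECT CustomerKey, Age, YearlyIncome FROM dbo.NewCustomers') AS t
    ON  [BikeBuyerModel].[Age] = t.Age
    AND [BikeBuyerModel].[Yearly Income] = t.YearlyIncome
WHERE t.Age > @MinAge
```

Mapping a variable to MinAge (without the @ prefix) on the Parameter Mapping tab supplies the value at runtime.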
■ Execute DTS 2000 Package: Enables legacy DTS packages to be executed as part of the
Integration Services work flow. Specify the package location, authentication information, and
DTS-style Inner/Outer variable mappings. Optionally, once the package is identified, it can be
loaded as part of the Integration Services package. Additional downloads are required in SQL
Server 2008 to enable DTS package execution; see Books Online for details.
■ Execute Package: Executes the specified Integration Services package, enabling packages to
be broken down into smaller, reusable pieces. Invoking a child package requires substantial
overhead, so consider the number of invocations per run when considering child packages.
For example, one or two child packages per file or table processed is probably fine, but one
package per row processed is probably not. The child package will participate in a transaction
if the Execute Package task is configured to participate. Variables available to the Execute
Package task can be used by the child package by creating a ‘‘parent package variable’’
configuration in the child package, mapping each parent package variable to a locally defined package
variable as needed.
■ Execute Process: Executes an external program or batch file. Specify the program to be run in
the Executable property, including the extension (e.g., MyApp.exe), and the full path if the
program is not included in the computer’s PATH setting (e.g., C:\stuff\MyApp.exe). Place
any switches or arguments that would normally follow the program name on the command
line in the Arguments property. Set other execution-time parameters as appropriate, such as
WorkingDirectory or SuccessValue, so Integration Services knows if the task succeeded.
The StandardInputVariable property allows the text of a variable to be supplied to
applications that read from StdIn (e.g., find or grep). The StandardOutputVariable
and StandardErrorVariable properties enable the task’s normal and error messages to be
captured in variables.
■ Execute SQL: Runs a SQL script or query, optionally returning results into variables. On the General page of the editor, set the ConnectionType and Connection properties to specify which database the query will run against. SQLSourceType specifies how the query will be entered:
■ Direct Input: Enter the query into the SQLStatement property by typing in the property page, pressing the ellipses to enter the query in a text box, pressing the Browse button to read the query from a file into the property, or pressing the Build Query button to invoke the Query Builder.
■ File connection: Specify a file from which the query will be read at runtime.
■ Variable: Specify a variable that contains the query to be run.
A query can be made dynamic either by using parameters or by setting the SQLStatement property using the Expressions page of the editor. Using expressions is slightly more complicated but much more flexible, as parameter use is limited: parameters can appear only in the WHERE clause and, with the exception of ADO.NET connections, only for stored procedure executions or simple queries. If parameters are to be used, then the query is entered with a marker for each parameter to be replaced, and then each marker is mapped to a variable via the Parameter Mapping page. Parameter markers and mapping vary according to connection manager type:
■ OLE DB: Write the query leaving a ? to mark each parameter location, and then refer to each parameter using its order of appearance in the query to determine a name: 0 for the first parameter, 1 for the second, and so on.
■ ODBC: Same as OLE DB, except parameters are named starting at 1 instead of 0.
■ ADO: Write the query using ? to mark each parameter location, and specify any non-numeric parameter name for each parameter. For ADO, it is the order in which the variables appear on the mapping page (and not the name) that determines which parameter they will replace.
■ ADO.NET: Write the query as if the parameters were variables declared in Transact-SQL (e.g., SELECT name FROM mytable WHERE id = @ID), and then refer to the parameter
by name for mapping.
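To illustrate the difference in marker syntax, here is the same hypothetical query written for two connection types (table and column names are illustrative, not from this chapter):

```sql
-- OLE DB: parameters are positional; map variables to names 0, 1, ...
SELECT name FROM dbo.MyTable WHERE id = ? AND region = ?

-- ADO.NET: parameters are named; map variables to ID and Region
SELECT name FROM dbo.MyTable WHERE id = @ID AND region = @Region
```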
The ResultSet property (General page) specifies how query results are returned to variables:
■ None: Results are not captured.
■ Single row: Results from a singleton query can be stored directly into variables. On the Result Set tab, map each result name returned by the query to the corresponding target variable. As with input parameters, result names vary according to connection manager type. OLE DB, ADO, and ADO.NET connections map columns by numeric order starting
at 0. ODBC also allows numeric mapping but starts at 1 for the first column. In addition, OLE DB and ADO connections allow columns to be mapped by column name instead of number.
■ Full result set: Multiple-row result sets are stored in a variable of type Object for later use with a Foreach loop container or other processing. On the Result Set tab, map a single result name of 0 (zero) to the object variable, with a result type of Full Result Set.
■ XML: Results are stored in an XML DOM document for later use with a Foreach loop container or other processing. On the Result Set tab, map a single result name of 0 (zero)
to the object variable, with a result type of Full Result Set.
■ File System Task: Provides a number of file (copy, delete, move, rename, set attributes)
and folder (copy, create, delete, delete content, move) operations. Source and destination
files/folders can be specified by either a File connection manager or a string variable that
contains the path. Remember to set the appropriate usage type when configuring a File
connection manager (e.g., Create folder vs. Existing folder). Set the OverwriteDestination or
UseDirectoryIfExists properties to obtain the desired behavior for preexisting objects.
■ FTP: Supports a commonly used subset of FTP functionality, including send/receive/delete files
and create/remove directories. Specify the server via an FTP connection manager. Any remote
file/path can be specified via either direct entry or a string variable that contains the file/path.
A local file/path can be specified via either a File connection manager or a string variable that
contains the file/path. Wildcards are accepted in filenames. Use OverwriteFileAtDest to
specify whether target files can be overwritten, and IsAsciiTransfer to switch between
ASCII and binary transfer modes.
■ Message Queue: Sends or receives queued messages via MSMQ. Specify the message
connection, send or receive, and the message type.
New in 2008
Script tasks and script components now use the Visual Studio Tools for Applications (VSTA) development
environment. This enables C# code to be used in addition to the Visual Basic code supported by SQL
Server 2005. Scripts also have full access to Web and other assembly references, compared to the subset of
.NET assemblies available in SQL Server 2005.
■ Script: This task allows either Visual Basic 2008 or Visual C# 2008 code to be embedded in a
task. Properties include the following:
■ ScriptLanguage: Choose which language to use to create the task. Once the script has
been viewed/edited, this property becomes read-only.
■ ReadOnlyVariables/ReadWriteVariables: List the read and read/write variables to be
accessed within the script, separated by commas, in these properties. Attempting to access
a variable not listed in these properties results in a run-time error. Entries are case sensitive, so myvar and MyVar are considered different variables, although using the new Select Variables dialog will eliminate typos.
■ EntryPoint: Name of the class that contains the entry point for the script. There is
normally no reason to change the default name (ScriptMain). It generates the following code shell:
Public Class ScriptMain
    Public Sub Main()
        '
        ' Add your code here
        '
        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class
At the end of execution, the script must return Dts.TaskResult as either success or failure
to indicate the outcome of the task. Variables can be referenced through the Dts.Variables collection. For example, Dts.Variables("MyVar").Value exposes the value of the MyVar variable. Be aware that the collection is case sensitive, so referencing "myvar" will not return the value of "MyVar". The Dts object exposes several other useful members, including the Dts.Connections collection to access connection managers, the Dts.Events.Fire methods to raise events, and the Dts.Log method to write log entries. See ‘‘Interacting with the Package in the Script Task’’ in SQL Server 2008 Books Online for additional details.
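A minimal Main() illustrating these members might look like the following sketch; the variable names MyVar and RowCount are hypothetical and would need to be listed in the ReadOnlyVariables and ReadWriteVariables properties for the script to run:

```vb
Public Sub Main()
    ' Read a variable (the collection is case sensitive)
    Dim inputValue As String = CStr(Dts.Variables("MyVar").Value)

    ' Update a read/write variable
    Dts.Variables("RowCount").Value = 42

    ' Raise an informational event and write a log entry
    Dim fireAgain As Boolean = True
    Dts.Events.FireInformation(0, "Script Task", _
        "Processing " & inputValue, "", 0, fireAgain)
    Dts.Log("Script task step complete", 0, Nothing)

    Dts.TaskResult = Dts.Results.Success
End Sub
```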
■ Send Mail: Sends a text-only SMTP e-mail message. Specify the SMTP configuration manager and all the normal e-mail fields (To, From, etc.). Separate multiple addresses by commas (not semicolons). The source of the message body is specified by the MessageSourceType property: Direct Input for entering the body as text in the MessageSource property, File Connection to read the message from a file at runtime, or Variable to use the contents
of a string variable as the message body. Attachments are entered as pipe-delimited file specs.
Missing attachment files cause the task to fail.
■ Transfer Database: Copies or moves an entire database between SQL Server instances.
Choose between the faster DatabaseOffline method (which detaches, copies files, and reattaches the databases) or the slower DatabaseOnline method (which uses SMO to create the target database). Identify the source and destination servers via SMO connection managers.
For the DatabaseOnline method, specify the source and destination database names, and the path for each destination file to be created. The DatabaseOffline method requires the
same information, plus a network share path for each source and destination file, as the copy
must move the physical files. Specifying UNC paths for the network share path is the most general approach, but packages that are running on one of the servers can reference local paths for that server. Using the DatabaseOnline method requires that any objects on which the database depends, such as logins, be in place before the database is transferred.
■ Transfer Error Messages: Transfers custom error messages (à la sp_addmessage) from one server to another. Identify the source and destination servers via SMO connection managers and the list of messages to be transferred.
■ Transfer Jobs: Copies SQL Agent jobs from one SQL Server instance to another. Identify the source and destination servers via SMO connection managers and the list of jobs to be transferred. Any resources (e.g., databases) required by the jobs being copied must be available
for the copy to succeed.
■ Transfer Logins: Copies logins from one SQL Server instance to another. Identify the source and destination servers via SMO connection managers and the list of logins to be transferred.
The list may consist of selected logins, all logins on the source server, or all logins that have access to selected databases (see the LoginsToTransfer property in the Task dialog).
■ Transfer Master Stored Procedures: Copies any custom stored procedures from the master database on one server to the master database on another server. Identify the source and destination servers via SMO connection managers, and then select to either copy all custom stored procedures or individually mark the procedures to be copied.
■ Transfer Objects: Copies any database-level object from one SQL Server instance to another.
Identify the source and destination servers via SMO connection managers and the database on each server. For each type of object, select to either copy all such objects or to individually identify which objects to transfer, and then enable copy options (e.g., DropObjectsFirst,
■ Web Service: Executes a Web Service call, storing the output in either a file or a
variable. Specify an HTTP connection manager and a local file in which to store WSDL
information. If the HTTP connection manager points directly at the WSDL file (e.g.,
http://MyServer/MyService/MyPage.asmx?wsdl for the MyService Web Service
on MyServer), then use the Download WSDL button to fill the local copy of the WSDL file;
otherwise, manually retrieve and create the local WSDL file. Setting OverwriteWSDLFile to
true will store the latest Web Service description into the local file each time the task is run.
Once connection information is established, switch to the Input page to choose the service
and method to execute, and then enter any parameters required by the chosen method. The
Output page provides options to output to either a file, as described by a File connection
manager, or a variable. Take care to choose a variable with a data type compatible with the
result the Web Service will return.
■ WMI Data Reader: Executes a Windows Management Instrumentation Query Language (WQL) query against
a server to retrieve event log, configuration, and other management information. Select a WMI
connection manager and specify a WQL query (e.g., SELECT * FROM win32_ntlogevent
WHERE logfile = ‘system’ AND timegenerated > ‘20080911’ for all system event
log entries since 9/11/2008) from direct input, a file containing a query, or a string
variable containing a query. Choose an output format by setting the OutputType property to
‘‘Data table’’ for a comma-separated values list, ‘‘Property name and value’’ for one
name/property combination per row with an extra newline between records, or
‘‘Property value’’ for one property value per row without names. Use DestinationType and
Destination to send the query results to either a file or a string variable.
■ WMI Event Watcher: Similar to a WMI Data Reader, but instead of returning data, the task
waits for a WQL-specified event to occur. When the event occurs or the task times out,
the SSIS task events WMIEventWatcherEventOccurred or WMIEventWatcherEventTimeout
can fire, respectively. For either occurrence, specify the action (log and fire event, or
log only) and the task disposition (return success, return failure, or watch again). Set the task
timeout (in seconds) using the Timeout property, with 0 specifying no timeout.
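As an example of the kind of WQL event query the task can watch for, the following sketch (a common WMI pattern; the drive letter and threshold are illustrative) fires when free space on drive C: drops below roughly 1GB, polling every 30 seconds:

```sql
SELECT * FROM __InstanceModificationEvent WITHIN 30
WHERE TargetInstance ISA 'Win32_LogicalDisk'
  AND TargetInstance.DeviceID = 'C:'
  AND TargetInstance.FreeSpace < 1073741824
```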
■ XML: Performs operations on XML documents, including comparing two documents (diff),
merging two documents, applying diff output (diffgram) to a document, validating a document
against a DTD, and performing XPath queries or XSLT transformations. Choose a source
document as direct input, a file, or a string variable, and an output as a file or a string variable. Set
other properties as appropriate for the selected OperationType.
Maintenance Plan tasks
Maintenance Plan tasks provide the same elements that are used to build maintenance plans for use
in custom package development. Tasks use an ADO.NET connection manager to identify the server
being maintained, but any database selected in the connection manager is superseded by the databases
identified within each Maintenance Plan task. Any questions about what a particular task does can be
answered by pressing the View T-SQL button on the maintenance task.
For more information about database maintenance, see Chapter 42, ‘‘Maintaining the
Database.’’
The available tasks are as follows:
■ Back Up Database: Creates a native SQL backup of one or more databases.
■ Check Database Integrity: Performs a DBCC CHECKDB.
■ Execute SQL Server Agent Job: Starts the selected SQL Agent job via the sp_start_job stored procedure.
■ Execute T-SQL Statement: A simplified SQL-Server-only statement execution. It does not return results or set variables; use the Execute SQL task for more complex queries.
■ History Cleanup: Trims old entries from backup/restore, maintenance plan, and SQL Agent job history.
■ Maintenance Cleanup: Prunes old maintenance plan, backup, or other files.
■ Notify Operator: Performs an sp_notify_operator, sending a message to selected on-duty operators defined on that SQL Server.
■ Rebuild Index: Issues an ALTER INDEX REBUILD for each table, indexed view, or both in the selected databases.
■ Reorganize Index: Uses ALTER INDEX REORGANIZE to reorganize either all or selected indexes within the databases chosen. It optionally compacts large object data.
■ Shrink Database: Performs a DBCC SHRINKDATABASE.
■ Update Statistics: Issues an UPDATE STATISTICS statement for column, index, or all statistics in the selected databases.
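Pressing View T-SQL on these tasks reveals statements along the following lines; the database, table, and job names here are illustrative only, and the exact generated options will vary with the task settings:

```sql
DBCC CHECKDB (N'MyDatabase');                            -- Check Database Integrity
ALTER INDEX ALL ON dbo.MyTable REBUILD;                  -- Rebuild Index
ALTER INDEX ALL ON dbo.MyTable REORGANIZE;               -- Reorganize Index
UPDATE STATISTICS dbo.MyTable;                           -- Update Statistics
DBCC SHRINKDATABASE (N'MyDatabase');                     -- Shrink Database
EXEC msdb.dbo.sp_start_job @job_name = N'MyNightlyJob';  -- Execute SQL Server Agent Job
```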
Data flow components
This section describes the individual components that can be configured within a Data Flow task:
sources of data for the flow, destinations that output the data, and optional transformations that can
change the data in between. See the ‘‘Data Flow’’ section earlier in this chapter for general information
about configuring a Data Flow task.
Sources
Data Flow sources supply the rows of data that flow through the Data Flow task. Right-clicking a source
on the design surface reveals that each source has two different editing options: Edit (basic) and Show
Advanced Editor, although in some cases the basic Edit option displays the Advanced Editor anyway.
The common steps to configuring a source are represented by the pages of the basic editor:
■ Connection Manager: Specify the particular table, file(s), view, or query that will provide the data for this source. Several sources will accept either a table name or a query string from a variable.
■ Columns: Choose which columns will appear in the data flow. Optionally, change the default names of the columns in the data flow.
■ Error Output: Specify what to do for each column should an error occur. Each type of error can be ignored, cause the component to fail (the default), or redirect the problem row to an error output. Truncation errors occur when a string is longer than the destination allows; ‘‘Error’’ errors catch all other types of failures. Don’t be confused by the ‘‘Description’’ column; it is not another type of error, but merely provides a description of the context under which the error could occur.
The advanced editor provides the same capabilities as the basic editor in a different format, plus much
finer control over input and output columns, including names and data types. When the rows sent
to the data flow are already sorted, they can be marked as such using the advanced editor. On the
Input and Output Properties tab, choose the top node of the tree and set the IsSorted property
to true. Then select each of the output (data flow) columns that make up the sort and enter a
SortKeyPosition value, beginning with 1 and incrementing by 1 for each column used in sorting.
To mark a column as sorted descending, specify a negative SortKeyPosition. For example, giving
the Date and Category columns SortKeyPosition values of -1 and 2, respectively, will mark the Date
descending and the Category ascending.
The available sources are as follows:
■ OLE DB: The preferred method of reading database data. It requires an OLE DB connection
manager.
■ ADO.NET: Uses an ADO.NET connection manager to read database data, either by identifying
a database object or by entering a query to execute.
■ Flat File: Requires a Flat File connection manager. Delimited files translate zero-length strings
into null values for the data flow when the RetainNulls property is true.
■ Excel: Uses an Excel connection manager and either a worksheet or named ranges as tables. A
SQL command that selects a subset of rows can be constructed using the Build Query button.
Data types are assigned to each column by sampling the first few rows, but can be adjusted
using the advanced editor.
■ Raw: Reads a file written by the Integration Services Raw File destination (see the following
‘‘Destinations’’ section) in a preprocessed format, making this a very fast method of retrieving
data, often used when data processed by one stage of a package needs to be stored and reused
by a later stage. Because the data has already been processed once, no error handling or output
configuration is required. The input filename is directly specified without using a connection
manager.
■ XML: Reads a simple XML file and presents it to the data flow as a table, using either an
inline schema (a header in the XML file that describes the column names and data types) or
an XSD (XML Schema Definition) file. The XML source does not use a connection manager;
instead, specify the input filename and then either specify an XSD file or indicate that the file
contains an inline schema (set the UseInlineSchema property to true or select the check
box in the basic editor).
■ Script: A script component can act as a source, destination, or transformation of a data flow.
Use a script as a source to generate test data or to format a complex external source of data.
For example, a poorly formatted text file could be read and parsed into individual columns by
a script. Start by dragging a script transform onto the design surface, choosing Source from the
pop-up Select Script Component Type dialog. On the Inputs and Outputs page of the editor,
add as many outputs as necessary, renaming them as desired. Within each output, define
columns as appropriate, carefully choosing the corresponding data types. On the Script page of
the editor, list the read and read/write variables to be accessed within the script, separated by
commas, in the ReadOnlyVariables and ReadWriteVariables properties, respectively.
Click the Edit Script button to expose the code itself, and note that the primary method to be
coded overrides CreateNewOutputRows, as shown in this simple example:
Public Overrides Sub CreateNewOutputRows()
    'Create 20 rows of random integers between 1 and 100
    Randomize()
    Dim i As Integer