Expert Reference Series of White Papers

Bust a Move with Your SSIS – Passing Package Variables

1-800-COURSES
www.globalknowledge.com

Bill Kenworthy, Global Knowledge Instructor, MCDBA

Introduction

Integration Services (SSIS), the next generation of the Extract, Transform, and Load (ETL) feature included with Microsoft SQL Server 2005, has befuddled database administrators. This is due to the new programming paradigm and the complexity of the development environment: SSIS bundles many new capabilities with a Visual Studio front end. This paper explores the creation of sample development data using the most basic features of this new interface. The challenge of declaring and passing package variables alone contains enough material for an hour-long skill-building session with this new tool.

Extract, Transform, and Load (ETL)

Adapting to the changes in the ETL feature is like a rock band taking over a square dance. It’s time to let go, learn to dance to a new beat, and bust some moves with this great new tool, SSIS. The complexity of SSIS has caused many administrators to retreat to the familiar tempo of DTS. However, as we explore this tool, you will find the creativity available in this new environment compelling enough to bust some moves of your own.

The goal of this paper is to demonstrate the generation of a table of random data for use in a development environment, exposing the nuances of passing variable values into the various task objects that make up the package.

One of the challenges of database application development is having data available for testing and reporting. Adding to this challenge are the typical privacy requirements that prevent the use of actual patient identification. In my experience, it is always better to develop with test data that is as reflective of the production data as possible, especially when interviewing end users with a prototype of the
system under development.

I will walk through the steps needed to create an SSIS package that builds a table of test data. Producing the end result of the package, the Person table, involves just a couple of challenges. The package is configured with package-level variables whose values are passed from one task to another, which raises the challenges of declaring the variables and referencing them throughout the package. Our goal is to generate random patient name and SSN information to insert into the Person table, whose schema is shown in Step 1. This table replaces a table containing an extract of live production data, to facilitate application testing. Overwriting the contents of a table with randomly generated data obscures the private details in a database, which are protected by legal constraints, while still allowing proper acceptance testing and comparison to an existing system before cutover to the new system.

Copyright ©2007 Global Knowledge Training LLC. All rights reserved.

The project starts with the generation of a simple database containing five tables and two stored procedures. An SSIS solution then uses this database to populate the Person table with random data in three columns: FirstName, LastName, and SSN. The name data is generated from the contents of two driving tables, FName and LName, which contain the seed data for name generation. Name generation produces a table containing an identity column and columns for the first and last name. SSN generation fills a separate table and requires no seed data.

Name generation takes place inside a For Loop container that uses a package variable to control the number of rows generated. Once the loop completes, execution passes to an Execute SQL Task that runs the SSN generation stored procedure. After this task completes, a Merge Join is the last major data manipulation activity. The Merge Join, contained in a Data Flow Task, combines the two
staging tables into the final product. The moves that make this package possible are creating package variables, passing the variable values to the task objects, and paying careful attention to matching column data types. A data dictionary for the development database is contained in Resource A.

The Project

The tasks to be accomplished in this project:
1. Generate tables and stored procedures for the project
2. Create an SSIS project, adding the needed variables
3. Add an Execute SQL Task truncating the working tables
4. Add a For Loop container and configure the looping parameters
5. Call a stored procedure in the loop, passing a parameter
6. Generate the SSN data using an Execute SQL Task
7. Add a Data Flow Task
8. Configure the Merge Join

The following sections describe the individual steps of creating the package; the configuration entries in the screen shots are also valuable for duplicating this demonstration.

The database and its objects are the foundation upon which we build our transformation. The solution requires seed tables, staging tables that hold temporary results, and a final table holding the merged contents of the two staging tables. The database diagram is shown in Figure 1.

Step 1: Generate tables and stored procedures for the project

The data dictionary for this database is contained in Resource A, and the script for generating the schema and stored procedures is in Resource B. The code has hard-coded references to the DEV database, so the script should be run in the context of a database with that name.

Figure 1 Database Schema for the project

Step 2: Open an Integration Services Project

Open your project, then right-click any clear space on the Control Flow pane and choose Variables from the context menu to open the variable declaration dialog box. Add two Int32 variables, Counter and MaxRows; set MaxRows to 1000 and give Counter its initial value, as shown in Figure 2.

Figure 2 Variables dialog box

The Counter variable is used to pass
the current loop index into the Execute SQL Task inside the For Loop container (Steps 4 and 5). MaxRows is the number of rows to be inserted into the Person table. Note that references to variables in this environment are case-sensitive. Resource C lists the SQL Server 2005 Books Online topic that describes variables, which links to the related How-to topics.

Step 3: Add a SQL Task truncating the working tables

Add an Execute SQL Task to the Control Flow window as the first task in the project and set its properties as shown in Figure 3. Note that the ConnectionType is configured as ADO.NET. Although this task doesn’t pass parameters in its SQLStatement property, I like to keep the settings of similar objects consistent; specifying ADO.NET as the connection type allows parameters to be referenced using the @ naming convention. The SQL query simply truncates the Person and Name tables. The Books Online reference for this object is listed in Resource C.

Figure 3 SQL Task to truncate the working tables

Step 4: Add a For Loop container and configure the looping parameters

The For Loop container defines a repeating control flow in a package. In this package, the For Loop repeats the execution of the MakeAName stored procedure until the number of rows configured in the MaxRows variable has been inserted into the Name table. The container uses three expressions to define the loop: InitExpression, EvalExpression, and AssignExpression (the increment). As you can see in Figure 4, the variable @Counter is used for indexing in the loop. This reference is case-sensitive and must match a package variable name, and the @ prefix is required in this property page; for example, @MaxRows refers to the MaxRows package variable. The Books Online reference for this object is listed in Resource C.

Figure 4 Configuring the For Loop Task

Step 5: Call a stored procedure in the loop, passing a parameter

An Execute SQL Task configured with connection type ADO.NET calls the stored procedure MakeAName (Figure 5). Note my preference for the ConnectionType property: each connection type supports a different syntax for passing parameters. ADO.NET supports the @ reference, while the other connection types use a ? (question mark). I prefer the @ syntax because it is consistent with the syntax used in Transact-SQL. The Books Online reference for this object is listed in Resource C.

Figure 5 Property page for the SQL Task embedded in the Loop Task

Figure 6 Parameter map entry passes variable value from loop

The second part of configuring this SQL Task is the parameter mapping (Figure 6), which ties the variable referenced in the SQL statement to the package variable value passed into the SQL Task by its parent container. The first three columns are selected from combo box choices; the developer enters the Parameter Name value by hand.

Figure 7 Control Flow diagram of the project to this point

Test it!
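Before running the debugger, it helps to know what a correct run should leave behind. The sketch below is not SSIS code: it assumes Python's built-in sqlite3 module as a stand-in for SQL Server and models the For Loop and the Execute SQL Task inside it as a plain loop around a parameterized INSERT. The function name rows_after_loop, the generated name values, and a Counter start value of 0 are all assumptions, since the actual InitExpression and EvalExpression appear only in Figure 4. SQLite's :counter placeholder plays the role of ADO.NET's @counter, and the dictionary passed to execute() plays the role of the parameter mapping.

```python
import sqlite3


def rows_after_loop(max_rows: int, start: int = 0) -> int:
    """Model the For Loop / Execute SQL Task pair; return the Name row count.

    InitExpression   ~ counter = start        (start value is an assumption)
    EvalExpression   ~ counter < max_rows
    AssignExpression ~ counter = counter + 1
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Name (PersonID INTEGER PRIMARY KEY, "
        "FirstName TEXT, LastName TEXT)"
    )
    counter = start
    while counter < max_rows:
        # Hypothetical stand-in for MakeAName: one INSERT per iteration.
        # :counter is a named placeholder; the dict maps the loop variable
        # onto it, much as the parameter mapping maps Counter to @counter.
        conn.execute(
            "INSERT INTO Name (FirstName, LastName) "
            "VALUES ('First' || :counter, 'Last' || :counter)",
            {"counter": counter},
        )
        counter += 1
    count = conn.execute("SELECT COUNT(*) FROM Name").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    print(rows_after_loop(1000))           # 1000: Counter starts at 0
    print(rows_after_loop(1000, start=1))  # 999: an off-by-one to watch for
```

With MaxRows set to 1000, finding 999 or 1001 rows in the Name table after a test run usually means the Counter start value and the comparison in EvalExpression do not agree.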
Now the project is at a point where it can be tested. Your package should resemble the package shown in Figure 7. If your package errors out when you run it in the debugger, consult the Execution Results view for error messages and resolve the errors.

Step 6: Generate the SSN data using an Execute SQL Task

After the loop completes, an Execute SQL Task runs the MakeSSN stored procedure, which fills the SSN staging table with one row for each row in the Name table.

Figure 8 Configuration of the SQL Task that follows execution of the loop

Step 7: Add a Data Flow Task

Assemble the data flow objects as shown in Figure 9; the properties to be set are listed in Resource D. The Books Online references for the objects used in this data flow are listed in Resource C. The Merge Join properties are detailed in Step 8.

Figure 9 The Data Flow objects

Step 8: Configure the Merge Join

The entire configuration of the Merge Join is shown in Figure 10; this is the only property page for the Merge Join.

Figure 10 Configuration of the Merge Join Object

Data typing is strict in this environment: the data type of each output column must match that of the corresponding column in the Person table. The Name staging table stores FirstName and LastName as varchar and the SSN table stores SSN as char, while the Person table uses nvarchar and nchar, so the output columns are set to Unicode. Figure 11 shows the FirstName output column configured as a Unicode string; the LastName column in the same datareader and the SSN column in the SSN datareader should be set to Unicode as well.

Figure 11 Configuring the datareader column datatype properties

Figure 12 Final control flow of the project

The finished project should have a Control Flow diagram like the one shown in Figure 12. In this example, annotations have been added to label each task in the flow.

Figure 13 shows a few of the rows from the Person table populated by the SSIS package. A weakness in my calls to the RAND() function inside the MakeSSN procedure shows up as a lot of commonality in the second and third segments of the SSN column. In this snapshot of the data, the value 71 is very popular in the second segment of the string, and the last four characters show a strong mode: there are clumps of similar values, and ‘7137’ shows up in rows 98-103. The cause is that MakeSSN reseeds RAND() on every call with @counter modulo a small number (30 for the last segment), and RAND() always returns the same value for the same seed, so each segment can take only as many distinct values as there are seeds. I think including a Common Language Runtime (CLR) assembly with a function that generates a random SSN string would significantly improve the package and could also improve performance.

Figure 13 The data generated by the package

Summary

I’ve presented a common development scenario, generating representative test data, and a possible solution to this requirement. The solution demonstrates a control flow containing a looping container, several SQL tasks, and a data flow using a Merge Join. Declaring package variables and passing their values between tasks requires careful attention to detail when configuring the various tasks to share values.

SSIS offers a flexible programming structure with practically no limit to extension. This flexibility brings with it a finer-grained structure for controlling a group of tasks, whose complexity warrants careful experimentation. The environment provides many opportunities for tapping into the power of the .NET Framework, but it also brings some new baggage, such as case sensitivity, connection type requirements, and strict type casting.

Learn More

Learn more about how you can improve productivity, enhance efficiency, and sharpen your competitive edge. Check out the following Global Knowledge courses:
Implementing and Maintaining Microsoft SQL Server 2005 Integration Services
Microsoft Certified IT Professional: Database Administrator Boot Camp
SQL Server 2005 Administration
SQL Server 2005 for Business Intelligence
SQL Server 2005 for Developers
SQL Server 2005 for Reporting Services

For more information or to register, visit www.globalknowledge.com or call 1-800-COURSES to speak with a
sales representative.

Our courses and enhanced, hands-on labs offer practical skills and tips that you can immediately put to use. Our expert instructors draw upon their experiences to help you understand key concepts and how to apply them to your specific work situation. Choose from our more than 700 courses, delivered through Classrooms, e-Learning, and On-site sessions, to meet your IT and management training needs.

About the Author

Bill Kenworthy has been working with SQL Server since version 6.0. His love for database challenges is reflected in his writing. Bill lives with his wife and dogs at the end of a dirt road in northern Washington State.

Resources

A. Data Dictionary for the project

Seed and staging tables
FName, LName: Two seed tables (their row counts need not be equal). These tables contain the first and last name values that are selected randomly and inserted into a row in the Name table.
Name, SSN: Working tables holding the name and SSN working data.

Production table
Person: Stores patient name and SSN data.

Stored procedures
MakeAName: Requires an integer parameter that is used to seed the RAND() function. The procedure inserts a row into the Dev.dbo.Name table, providing values for the FirstName and LastName columns. The name values are randomly selected from the seed tables.
MakeSSN: Populates the SSN table with unique combinations of characters generated by the RAND() function. The stored procedure checks the size of the Name table and inserts the same number of rows into the SSN staging table.

B. Script for creation of the database objects

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    IF NOT EXISTS (SELECT *
        FROM sys.objects
        WHERE object_id = OBJECT_ID(N'[dbo].[FName]') AND type in (N'U'))
    BEGIN
    CREATE TABLE [dbo].[FName](
        [Id] [int] IDENTITY(1,1) NOT NULL,
        [FirstName] [nvarchar](50) NULL
    ) ON [PRIMARY]
    END
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    IF NOT EXISTS (SELECT *
        FROM sys.objects
        WHERE object_id = OBJECT_ID(N'[dbo].[LName]') AND type in (N'U'))
    BEGIN
    CREATE TABLE [dbo].[LName](
        [Id] [int] IDENTITY(1,1) NOT NULL,
        [LastName] [nvarchar](50) NULL
    ) ON [PRIMARY]
    END
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    IF NOT EXISTS (SELECT *
        FROM sys.objects
        WHERE object_id = OBJECT_ID(N'[dbo].[SSN]') AND type in (N'U'))
    BEGIN
    CREATE TABLE [dbo].[SSN](
        [Id] [int] IDENTITY(1,1) NOT NULL,
        [SSN] [char](11) NULL
    ) ON [PRIMARY]
    END
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    IF NOT EXISTS (SELECT *
        FROM sys.objects
        WHERE object_id = OBJECT_ID(N'[dbo].[Name]') AND type in (N'U'))
    BEGIN
    CREATE TABLE [dbo].[Name](
        [PersonID] [int] IDENTITY(1,1) NOT NULL,
        [FirstName] [varchar](50) NULL,
        [LastName] [varchar](50) NULL
    ) ON [PRIMARY]
    END
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    IF NOT EXISTS (SELECT *
        FROM sys.objects
        WHERE object_id = OBJECT_ID(N'[dbo].[Person]') AND type in (N'U'))
    BEGIN
    CREATE TABLE [dbo].[Person](
        [PersonID] [int] IDENTITY(1,1) NOT NULL,
        [FirstName] [nvarchar](50) NULL,
        [LastName] [nvarchar](50) NULL,
        [SSN] [nchar](11) NULL
    ) ON [PRIMARY]
    END
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    IF NOT EXISTS (SELECT *
        FROM sys.objects
        WHERE object_id = OBJECT_ID(N'[dbo].[MakeAName]') AND type in (N'P', N'PC'))
    BEGIN
    EXEC dbo.sp_executesql @statement = N'
    CREATE procedure [dbo].[MakeAName]
        @counter int
    as
    Declare @fid int
    Declare @lid int
    declare @NumFirstNames int
    declare @NumLastNames int
    Declare @random int
    Declare @fname varchar(50)
    Declare @Lname varchar(50)

    select @NumFirstNames = count(*) from fname
    select @NumLastNames = count(*) from lname

    set @counter = @counter * 131
    if @counter is null
    begin
        set @counter = 99999
    end

    Set @fid = @NumFirstNames - cast(rand(@counter) * @NumFirstNames as int)
    Set @lid = @NumLastNames - cast(rand(@counter) *
            @NumLastNames as int)

    -- Get a first name and last name pair using a random pointer pair
    select @fname = FirstName, @lname = LastName
    from fname, lname
    where fname.id = @fid and lname.id = @lid

    -- Use the randomly paired first and last name to insert a row
    Insert Dev.dbo.Name (FirstName, LastName) values (@fname, @lname)
    '
    END
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    IF NOT EXISTS (SELECT *
        FROM sys.objects
        WHERE object_id = OBJECT_ID(N'[dbo].[MakeSSN]') AND type in (N'P', N'PC'))
    BEGIN
    EXEC dbo.sp_executesql @statement = N'
    CREATE Procedure [dbo].[MakeSSN]
    as
    declare @counter int
    declare @X int
    declare @Y int
    declare @Z1 int
    declare @Z2 int
    declare @maxrows int
    declare @a char(3)
    declare @b char(2)
    declare @c char(4)
    declare @SSN char(11)
    declare @test int

    truncate table dbo.SSN
    -- truncate table dbo.test   (dbo.test is not created by this script)

    -- Generate one SSN per row in the Name staging table
    Select @maxrows = count(*) from dbo.Name
    Set @counter = 0
    set @x = 111

    While @counter < @maxrows
    Begin
        Set @test = 0
        -- Area number cycles through 111-842
        if (@x < 842) set @x = @x + 1   -- increment lost in extraction; 1 assumed
        else set @x = 111
        Set @Y = cast(rand(@counter % 7) * 100 as int)   -- divisor lost in extraction (one digit); 7 is a placeholder
        Set @Z1 = cast(rand(@counter % 30) * 10000 as int)

        Set @a = cast(@x as varchar(3))
        Set @b = cast(@y as varchar(2))
        Set @c = cast(@z1 as varchar(4))
        set @ssn = @a + ''-'' + @b + ''-'' + @c

        -- Skip duplicates; insert only 11-character values
        select @test = count(SSN) From SSN where SSN = @SSN
        if @test = 0
        Begin
            if len(@ssn) = 11
            begin
                insert ssn (ssn) values (@ssn)
                set @counter = @counter + 1
            end
            else print ''too short''
        end
        print @counter
        print @ssn
    end
    '
    END
    GO

C. Topics in the SQL Server Books Online referenced in the text

    Using Variables in Packages
    Execute SQL Task
    For Loop Container
    DataReader Source
    Sort Transformation
    Merge Join Transformation
    SQL Server Destination

D. Table of Settings for the Dataflow Objects

Name Datareader
    General: IDbConnection = connection manager of your creation
    Component Properties: Name = Name Datareader
    Component Properties: SQL Command = SELECT * FROM dbo.Name
    Input and Output Properties: DataReaderOutput > Output Columns > FirstName = Unicode string [DT_WSTR]
    Input and Output Properties: DataReaderOutput > Output Columns > LastName = Unicode string [DT_WSTR]

SSN Datareader
    General: IDbConnection = connection manager of your creation
    Component Properties: Name = SSN Datareader
    Component Properties: SQL Command = SELECT * FROM dbo.SSN
    Input and Output Properties: DataReaderOutput > Output Columns > SSN = Unicode string [DT_WSTR]

Name Sort
    Input Column = PersonID; Output Alias; Sort Type; Sort Order

SSN Sort
    Input Column; Output Alias; Sort Type; Sort Order

Merge Join
    See Figure 10 in Step 8

SQL Server Destination
    Connection Manager = connection of your creation
    Mappings = straight across; nothing is mapped to the PersonID (identity) column of the Person table
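The Name Sort and SSN Sort entries in Resource D exist because the Merge Join expects both of its inputs sorted on the join key. The Python sketch below (not SSIS code) shows the logic of an inner merge join under stated assumptions: Name rows join to SSN rows on PersonID = Id, the join type is inner, and keys are unique on each side, as identity columns are. The join columns and type are assumptions, since the actual settings appear only in Figure 10; merge_join is a hypothetical helper name.

```python
from typing import Iterable, Iterator, Optional, Tuple

NameRow = Tuple[int, str, str]  # (PersonID, FirstName, LastName)
SsnRow = Tuple[int, str]        # (Id, SSN)


def merge_join(names: Iterable[NameRow],
               ssns: Iterable[SsnRow]) -> Iterator[Tuple[str, str, str]]:
    """Inner merge join of two inputs sorted ascending on unique int keys.

    Both inputs are read once, front to back. The algorithm trusts the sort
    order: if an input is not sorted, matches are silently skipped.
    """
    name_iter, ssn_iter = iter(names), iter(ssns)
    name: Optional[NameRow] = next(name_iter, None)
    ssn: Optional[SsnRow] = next(ssn_iter, None)
    while name is not None and ssn is not None:
        if name[0] == ssn[0]:
            yield (name[1], name[2], ssn[1])      # one Person row
            name, ssn = next(name_iter, None), next(ssn_iter, None)
        elif name[0] < ssn[0]:
            name = next(name_iter, None)          # PersonID with no SSN row
        else:
            ssn = next(ssn_iter, None)            # SSN row with no name


if __name__ == "__main__":
    names = [(1, "Ann", "Lee"), (2, "Bo", "Diaz"), (4, "Cy", "Ng")]
    ssns = [(1, "111-11-1111"), (2, "112-22-2222"), (3, "113-33-3333")]
    print(list(merge_join(names, ssns)))
    # [('Ann', 'Lee', '111-11-1111'), ('Bo', 'Diaz', '112-22-2222')]

    # Same rows, but the Name input is out of order: both matches are lost.
    print(len(list(merge_join(list(reversed(names)), ssns))))  # 0
```

Because a merge join only moves forward through each input, unsorted data does not raise an error in this sketch; it simply drops matches, which is why both inputs are sorted on the key before they reach the join.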
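The clumping visible in Figure 13 can be traced to the MakeSSN code in Resource B: every call passes a seed to RAND() (for the last segment, @counter % 30), and RAND() returns the same value whenever it is given the same seed, so that segment can take at most 30 distinct values no matter how many rows are generated. The Python sketch below makes this counting argument concrete. random.Random stands in for T-SQL RAND() (an assumption: the actual numbers differ from SQL Server's, but the bound does not depend on the generator), and distinct_segments is a hypothetical helper name.

```python
import random


def distinct_segments(n_rows: int, modulus: int, reseed_each_row: bool) -> int:
    """Count distinct 4-digit segments generated for n_rows rows.

    reseed_each_row=True mimics MakeSSN: a fresh generator seeded with
    counter % modulus for every row, so repeated seeds repeat values.
    reseed_each_row=False seeds one generator once and keeps drawing.
    """
    shared = random.Random(2007)  # arbitrary seed, used only when not reseeding
    segments = set()
    for counter in range(n_rows):
        rng = random.Random(counter % modulus) if reseed_each_row else shared
        segments.add(int(rng.random() * 10000))  # like cast(rand(..) * 10000 as int)
    return len(segments)


if __name__ == "__main__":
    print(distinct_segments(1000, 30, reseed_each_row=True))   # never more than 30
    print(distinct_segments(1000, 30, reseed_each_row=False))  # far more variety
```

In T-SQL terms, the second variant corresponds to seeding RAND() once and then calling RAND() without an argument, since later unseeded calls on the same connection continue from the seeded sequence; the CLR function suggested in the text is another route.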