Exercise 7-2. Filling the Data Warehouse
11. Move the precedence constraint arrows around by clicking them and dragging them
Your SSIS package should now look similar to Figure 7-24.
Note: A red dot may appear on some tasks indicating that the task is not configured. As noted, we resolve this later in the chapter.
In this exercise, you began configuring your SSIS package by outlining the various steps required utilizing sequence containers and SSIS tasks. Currently these tasks do not do anything because they have not been fully configured. In addition, the package will not work until you create one or more connections to the source and destination objects. We look at how this is done in the next section.
Data Connections
After you have outlined your ETL process on the Control Flow tab, the next step is to create connections so that you can configure each task. Creating connections is easy; all you have to do is go down to the bottom of the screen, find the Connection Managers tray, and add a connection by right-clicking the background of the Connection Managers surface area and selecting New Connection from the context menu (Figure 7-25). This
Figure 7-24. The SSIS package at the end of Exercise 7-2
Connection managers are defined by their connection type. There are about 20 different connection types available; some are installed by default, and others can be downloaded. Of all of these choices, you will frequently use the following connection types:
• The file connection manager
• The OLE DB connection manager
• The ADO.NET connection manager
Selecting one of these options is equivalent to selecting New Connection and choosing the provider from the dialog you saw in Figure 7-25. Microsoft conveniently placed these choices at the top of the context menu.
The File Connection Manager
The file connection manager allows you to work with both files and folders. This handy little connection manager can perform operations such as creating new files and folders as well as accessing data from existing files.
In the business intelligence world, it is common to import data from files into staging tables and then distribute it to the various tables in the data warehouse. Therefore, it is very likely that you will use these connections when working with SSIS.
Figure 7-25. Adding a new connection to the Connection Manager tray
An example scenario would be to have a number of branch offices that upload files to a corporate folder. You could process these files into a staging table by combining the file connection manager with an SSIS Foreach Loop container, which loops through the files one by one and applies the same processing tasks to each of them in turn.
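The looping pattern in this scenario can be sketched outside SSIS. The following is only an illustration in Python, not SSIS itself: an in-memory SQLite database stands in for SQL Server, and the folder layout, the StagingSales table, and its column names are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_branch_files(folder: Path, conn: sqlite3.Connection) -> int:
    """Load every *.csv file in `folder` into one staging table, file by file."""
    # Hypothetical staging table; SQLite stands in for the real staging database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS StagingSales "
        "(SourceFile TEXT, Title TEXT, Quantity INTEGER)"
    )
    files_loaded = 0
    # sorted() gives a predictable order, one iteration per file in the folder.
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as handle:
            rows = [
                (path.name, row["Title"], int(row["Quantity"]))
                for row in csv.DictReader(handle)
            ]
        # The same processing step is applied to each file in turn.
        conn.executemany("INSERT INTO StagingSales VALUES (?, ?, ?)", rows)
        files_loaded += 1
    conn.commit()
    return files_loaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "branch_a.csv").write_text("Title,Quantity\nBook A,3\n")
        (folder / "branch_b.csv").write_text("Title,Quantity\nBook B,5\n")
        connection = sqlite3.connect(":memory:")
        print(load_branch_files(folder, connection), "files staged")
```

Recording the source file name with each row is a design choice of this sketch; it makes it easier to trace a bad row back to the branch office that uploaded it.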
The OLE DB Connection Manager
The OLE DB connection manager is the most common connection manager you will use, since it allows you to connect to anything that has an OLE DB provider. That is a very wide range of connection options when you consider that anything from an Excel spreadsheet to an old mainframe database will have OLE DB providers available.
The OLE DB connection manager is designed to be flexible when the data types of the source and destination data sources differ. Because of this built-in flexibility, it is the connection manager that gives you the least trouble when it comes to data conversion errors.
Note
■ SSIS has a picky attitude toward data conversions, which can be quite frustrating. These issues are less likely to occur when using OLE DB connections. This is an important consideration when choosing to use OLE DB connections in your SSIS packages!
As usual, a gain in flexibility comes at a cost. The cost in this case is a decrease in speed. As a result, when your project requires raw performance, you probably want to use a more specialized connection manager such as the ADO.NET connection manager. For ease of use and compatibility on comparatively smaller projects, use the OLE DB connection manager.
If you are working with a mere several thousand rows, we recommend using an OLE DB connection manager. With only thousands of rows (as opposed to millions), you will notice very little difference in performance and will appreciate the flexibility and ease of use of the OLE DB connection manager.
The ADO.NET Connection Manager
The ADO.NET connection manager is preconfigured to use a .NET provider for accessing data sources, and depending upon the size and type of data you are working with, it may give increased performance over the generic OLE DB connection manager. The performance increase when used with the newer versions of SQL Server can be substantial, but with older or non-Microsoft databases, there is little to no performance gain, no matter how much data is involved.
The ADO.NET connection manager is limited in the types of connections it can make, particularly in comparison with the OLE DB connection manager, but it connects to all versions of Microsoft SQL Server.
The data types used in an ADO.NET connection manager are much more specific to Microsoft’s .NET data types.
As you work with them, you begin to see that the list of types looks very different than the more generic OLE DB data types, which are based on an ANSI standard and not the .NET standard. This is important because SSIS creates metadata to describe all of its source, destination, and transformation components. If the metadata of your SSIS task does not identify a compatible data type, your task will either process with a warning or fail to process at all.
Note
■ We recommend using the OLE DB connection manager whenever possible since it provides the great-
connection managers. But keep in mind that they are more difficult to work with. Microsoft’s website provides a great deal of information on this topic. For more information, search for the topic of “SSIS data types” at http://msdn.microsoft.com.
Configuring a Connection
Each time you add a new connection, you need to configure it. Once you select a connection manager and click the Add button, a configuration dialog appears (Figure 7-26).
Figure 7-26. Each connection manager type has its own configuration manager
This new dialog is dependent on which type of connection manager you selected, and each connection manager type has its own configuration dialog. If you select an ADO.NET connection manager, for example, you are presented with the Configure ADO.NET Connection Manager dialog. If, instead, you choose an OLE DB connection manager, you will be presented with the OLE DB version of this dialog.
In both the OLE DB and ADO.NET configuration dialogs, Visual Studio remembers any previous connections you have created either in this project or in past projects. Therefore, if you have connected to a database in a previous project, these connections are still available. For example, since we previously created a
If you have not created a connection to a given database prior to this, you will not see a connection available and will need to click the New button to create one.
Once you click the New button, yet another dialog appears! In this one, you start your configuration by typing in the name of the database server. If you are connecting to the server on your computer, you can type in localhost. And as covered in previous chapters, using (local) with parentheses or simply putting a period (.) can work, if localhost does not.
Tip
■ If you are using a named instance of SQL Server, you have to configure the name as localhost\<MyNamedInstance>. For more information on connecting to your local server, see Chapter 5.
After choosing the name of the server, you choose the database you want to connect to. To determine which database a connection manager uses, you can type in the database name or can use the dropdown button beneath the “Select or enter a database name” label, as shown in Figure 7-27.
With both the server and the database selected, clicking the Test Connection button tells you whether your connection is successful. When the connection works correctly, click OK to close the dialog and create a new connection manager in the Connection Manager tray.
In our example, we are importing data from Pubs to our DWPubsSales data warehouse. Therefore, we need a connection to both databases. Each connection manager provides a connection to only a single database at a time, so we must create and configure a separate connection manager for each database.
After you have created your connections, you can edit and review their properties using the Properties window (Figure 7-28). This is convenient since you often create a package on one computer but move it to another later. When you do so, you can adjust the connection for use on the new computer by clicking the ConnectionString property to launch the same dialog shown in Figure 7-27.
Figure 7-28. Reviewing the connection manager properties
Although it is possible to have multiple connections to the same database, you typically use only one connection manager per database for all of your SSIS tasks. This allows you to reconfigure one connection manager while affecting all of the SSIS tasks using that connection.
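Because each connection manager ultimately stores its settings as a connection string, it can help to see how the server name and database name combine into one. The sketch below assembles OLE DB-style strings for the two databases in our example. The helper function is hypothetical, and the provider name (SQLNCLI11) and Windows authentication setting are assumptions, so compare the result with the ConnectionString property that Visual Studio generates on your machine.

```python
from typing import Optional


def build_connection_string(
    database: str,
    host: str = "localhost",
    instance: Optional[str] = None,
    provider: str = "SQLNCLI11",  # assumed provider; yours may differ
) -> str:
    """Return an OLE DB-style connection string for a single database."""
    # A named instance is addressed as host\instance; a default instance
    # needs only the host name (localhost, (local), or a period).
    data_source = f"{host}\\{instance}" if instance else host
    parts = {
        "Provider": provider,
        "Data Source": data_source,
        "Initial Catalog": database,
        "Integrated Security": "SSPI",  # assumes Windows authentication
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# One string per database, mirroring one connection manager per database.
connections = {
    name: build_connection_string(name) for name in ("Pubs", "DWPubsSales")
}

if __name__ == "__main__":
    for name, text in connections.items():
        print(name, "->", text)
```

Moving a package to another computer then amounts to changing the host or instance value in one place, which is the same benefit you get from sharing a single connection manager among all tasks that use a database.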
Tip
■ After you create a connection manager, you can reconfigure it using either its property dialog or the Visual Studio property window, but be aware that some settings appear only in the Visual Studio property window and not in the dialog. And for other tasks the opposite is true.
Figure 7-29. Editing an Execute SQL task
Execute SQL Tasks
An Execute SQL task allows you to run SQL code on a connected database. For example, to use an Execute SQL task designed to drop foreign key constraints, you configure it to connect to the proper database by selecting the appropriate connection manager and then add SQL code to drop the constraints.
Editing Your Execute SQL Task
To add code, open the Execute SQL Task Editor window by right-clicking the task and choosing Edit from the pop-up context menu (Figure 7-29).
When the editing dialog displays, you will see a number of properties that can be configured; however, the two you must configure are the Connection and SQL Statement properties.
In SSIS, editing windows are usually divided into pages. These pages are listed on the left side of the dialog.
In an Execute SQL task, configure the connection by first selecting the General page and then clicking the Connection setting. A dropdown box appears allowing you to select an existing connection manager or even to create a new one (Figure 7-30).
Figure 7-30. Configuring a connection for an Execute SQL task
The connections available in the dropdown box are context sensitive to the connection type selected in the ConnectionType property. If the connection type is not set to OLE DB but instead is set to another connection type such as ADO.NET, any OLE DB connection managers that have been created will not appear in the dropdown selection. Switching the ConnectionType property to OLE DB allows them to appear (Figure 7-30).
To add SQL code, you need to configure the SQL Statement property. After clicking this property, you will see that an ellipsis button appears. When you click the ellipsis button, a dialog appears where you can add your SQL code (Figure 7-31). Hidden buttons like this one are becoming commonplace in Microsoft applications. If you do not immediately see a way to configure a property, try clicking it to see whether it contains a hidden button.
Tip
■ In Microsoft’s most recent user interfaces, dropdown boxes, buttons, or ellipsis icons do not appear until you click a property setting. This newer, sexier interface can cause some problems when trying to figure out how a given property is configured. Just remember: when in doubt, click the property!
SQL code can be typed into the dialog, but a better way is to create the ETL code beforehand with SQL Server Management Studio (as we did in Chapter 6) and then copy and paste the code into the SSIS window. One advantage to this method is that you can test and correct your code before adding it to your Execute SQL task.
Once you have selected a connection and added your SQL code, you can close the Execute SQL Task dialog window and test your work by executing it.
Executing Your Execute SQL Tasks
SSIS packages consist of XML code that describes various tasks. Each task placed on the designer surface is a collection of programming instructions in an XML format. When you configure a task, you are filling in attributes and elements of the XML programming code, as shown in Figure 7-32.
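To make the idea of "filling in attributes and elements" concrete, the sketch below parses a small XML fragment and reads back the two settings you configured for an Execute SQL task. The fragment is hand-written and heavily simplified; a real .dtsx file carries many more attributes, and the element names, namespace URIs, table name, and constraint name used here are illustrative assumptions rather than a copy of the actual file.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment loosely modelled on a package file.
# All names and namespace URIs are assumptions made for illustration.
TASK_XML = """
<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts"
                DTS:ObjectName="Drop the Foreign Key Constraints">
  <DTS:ObjectData>
    <SQLTask:SqlTaskData
        xmlns:SQLTask="www.microsoft.com/sqlserver/dts/tasks/sqltask"
        SQLTask:Connection="DWPubsSales"
        SQLTask:SqlStatementSource="ALTER TABLE dbo.FactSales DROP CONSTRAINT FK_Example;" />
  </DTS:ObjectData>
</DTS:Executable>
"""

NAMESPACES = {
    "DTS": "www.microsoft.com/SqlServer/Dts",
    "SQLTask": "www.microsoft.com/sqlserver/dts/tasks/sqltask",
}


def qualified(prefix: str, local_name: str) -> str:
    """Build the {namespace}name form that ElementTree uses for attributes."""
    return "{%s}%s" % (NAMESPACES[prefix], local_name)


def describe_task(xml_text: str) -> dict:
    """Return the task name, connection, and SQL statement stored in the XML."""
    root = ET.fromstring(xml_text.strip())
    task_data = root.find("DTS:ObjectData/SQLTask:SqlTaskData", NAMESPACES)
    return {
        "name": root.get(qualified("DTS", "ObjectName")),
        "connection": task_data.get(qualified("SQLTask", "Connection")),
        "sql": task_data.get(qualified("SQLTask", "SqlStatementSource")),
    }


if __name__ == "__main__":
    for key, value in describe_task(TASK_XML).items():
        print(f"{key}: {value}")
```

The point of the sketch is only that the Connection and SQL Statement settings from the editor end up as ordinary XML attributes that any XML-aware tool can read.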
Like any other programming language, the code by itself does not do anything unless software exists that can read the programming instructions and perform the action accordingly. This type of software is often referred to as a runtime environment or runtime engine. Visual Studio includes an SSIS debugging engine that launches the SSIS runtime environment. After creating your package, you will no longer need Visual Studio to run your
Figure 7-31. Configuring a SQL statement for an Execute SQL task
Figure 7-32. The XML code behind the Execute SQL task
Still, for a developer, running the code within Visual Studio is convenient. Right-click a task and select Execute Task from the context menu, as shown in Figure 7-33, to launch Visual Studio’s debugging engine, which in turn executes the code associated with the SSIS task you selected using the SSIS runtime environment.
As each task executes, the task displays a yellow wheel and then either a red X or green check mark (Figure 7-34). The yellow status means that the underlying code within the task is currently running in the debugging engine. When the task icon turns green, it means that all the code has stopped running and that the execution of that code was successful. When the task icon turns red, it means that the execution of the code was unsuccessful.
No matter the outcome, the debugging engine does not automatically shut down but instead continues running in the background. This is evidenced by the word (Running) within parentheses being displayed at the top of the Visual Studio window (Figure 7-34).
Figure 7-34. A task changes color to indicate its status as it is running
You cannot edit the SSIS package while the debugger is running. Therefore, once all of the tasks have completed, stop the debugger by selecting the hyperlinked message that we circled in Figure 7-34 or by selecting the Stop Debugging option from the Visual Studio Debug menu. Once the SSIS code stops running, you can continue to edit your SSIS package.
You may have noticed that there is an indicator on the sequence container as well as the individual tasks (Figure 7-34). You can debug more than one task at a time within Visual Studio, either by executing the SSIS package file as a whole or by executing all of the tasks within a sequence container.
To execute a set of tasks within a sequence container, right-click the container and choose Execute Container from the context menu. To execute all the tasks within a particular SSIS package, select the package file in Solution Explorer, right-click it, and then choose Execute Package from the context menu, as shown in Figure 7-35.
We recommend that you test individual SSIS tasks as you go so that any error can be resolved sooner rather than later. This approach is not always practical, however, when certain tasks are contingent on others running or when the database must be in a certain state before your SSIS code executes, such as having the foreign keys dropped before table data can be truncated. In cases like these, you need to pay close attention to the logical order of your ETL tasks and create a strategy that is appropriate to what you are trying to accomplish.
Be sure to test the tasks both individually and collectively whenever possible, and you will resolve many issues that developers have when creating, deploying, and executing SSIS packages.
Note
■ We recommend that you run the entire package twice. This is because your testing process may have inadvertently set the state of your database objects to something other than normal. For example, some tables may be filled, but others may not be. Since most packages eventually are set up by a SQL administrator to run automatically at night, nobody wants to find out that a package scheduled to run at 2 a.m. has failed. Running the package successfully the first time may give you a false positive, while running it a second time will be similar to how it was
Figure 7-35. Executing the entire SSIS package
The Progress/Execution Results Tabs
The Visual Studio debugger tracks whether each task completes successfully on both the Progress and Execution Results tabs. It may seem odd, but both of these tabs are the same. The tab title changes to Progress while the package is running, as shown in Figure 7-36, but changes back to Execution Results (Figure 7-37) when it stops.
Figure 7-36. Viewing the Progress tab
Figure 7-37. Viewing the Execution Results tab
When all the tasks are executed successfully, you see “100 percent complete” in the breakdown of each task. If the task fails, you are notified in the same window with a circled red “!” symbol and an associated error message. You use this information to troubleshoot the cause of your problem.
To see an example of this, you could run the task that drops a foreign key constraint once and then run that same task a second time. SQL Server raises an error when you try to drop foreign key constraints that do not exist. This error is then caught by the SSIS runtime engine and sent to Visual Studio’s Output window.
That is the error you see in Figure 7-38.
Figure 7-38. Reading error messages on the Output window
The information about the error is not always useful and, if not read carefully, can even be misleading. In Figure 7-38, note that the output message states “connection not established correctly,” but if you look carefully, you find that the output text uses the word possible to describe the reason for the failure.
When we made this screenshot, the connection was working fine. The problem was that we had already run the “Drop the Foreign Key Constraints” task, and when the task ran a second time, the error occurred because those foreign key constraints no longer existed.
The error message also mentions a problem with the query, but we know there is no problem with it because we tested the code in SQL Server Management Studio. In the end, you must use common sense to troubleshoot the cause of these errors and not necessarily rely on information from the Output window alone.
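One common way to keep a drop task from failing on a second run is to guard the statement with an existence check. This is not the approach used in the package so far; it is a minimal sketch of an alternative. The Python helper below only generates the T-SQL text that you would paste into the SQL Statement property, and the table and constraint names passed to it are hypothetical.

```python
def drop_fk_statement(table: str, constraint: str, guarded: bool = True) -> str:
    """Return T-SQL that drops a foreign key, optionally only if it exists.

    The names are inserted into the text as-is, so pass only trusted,
    hard-coded identifiers (this is a code generator, not a query API).
    """
    drop = f"ALTER TABLE {table} DROP CONSTRAINT {constraint};"
    if not guarded:
        # Unguarded form: raises an error if the constraint is already gone.
        return drop
    # Guarded form: checks the sys.foreign_keys catalog view first, so a
    # second run simply does nothing instead of raising an error.
    return (
        "IF EXISTS (SELECT 1 FROM sys.foreign_keys "
        f"WHERE name = N'{constraint}' "
        f"AND parent_object_id = OBJECT_ID(N'{table}'))\n"
        f"    {drop}"
    )


if __name__ == "__main__":
    # Hypothetical table and constraint names, used only for illustration.
    print(drop_fk_statement("dbo.FactSales", "FK_FactSales_DimTitles"))
```

As with any ETL code, test the generated statement in SQL Server Management Studio before adding it to the task. The trade-off of the guard is that a misspelled constraint name no longer produces an error, so the task can succeed without dropping anything.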
You have seen how to create a new SSIS project, outline the ETL process in the package, create connections, and configure an SSIS task. You will soon put this knowledge to work by creating some connections and configuring the two Execute SQL tasks in your ETLProcessForDWPubsSales.dtsx package during Exercise 7-3.