Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 822 Part V Data Connectivity FIGURE 36-1 The Agent Security dialog FIGURE 36-2 The Snapshot Agent Security dialog 822 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 823 Replicating Data 36 Creating a push subscription to a transactional/snapshot publication The previous two sections discussed the two ways of replicating data, a push and a pull. This section focuses on how to create push subscriptions to transactional and snapshot publications: 1. Connect to your publisher in SQL Server Management Studio and expand the Replication folder and the Local Publications folder. 2. Locate your publication, right-click on it, and select New Subscriptions. Click Next to exit the splash screen. 3. In the Publication dialog, ensure that the publication you wish to create a subscription to is highlighted. If it is not already selected, do so now; you may have to expand other databases to find it. Once your publication i s highlighted, click Next. 4. In the distribution Agent Location dialog, accept the default, which is Run all agents on the Distributor, MySQLServerName (push subscriptions), where MySQLServerName isthenameof your publisher. 5. Click Next to advance to the Subscribers dialog. All subscribers currently enabled will appear in this dialog. If your subscriber does not appear, then click the Add Subscriber button. Note that this button is a drop-down button that enables you to create SQL Server and Non-SQL Server Subscribers. You can create subscriptions to SQL Servers, as well as Oracle and DB2 subscribers using this wizard. With the replication stored procedures it is possible to replicate to any ODBC level 2 compliant and above data source. For more information on how to replicate to Oracle and DB2 subscribers, please refer to this link: http://msdn.microsoft.com/en-us/library/ms188360.aspx. 6. After clicking the Add Subscriber button, you are prompted to connect to your Subscriber. This dialog looks very similar to the Connect to SQL Server dialogs you are familiar with from connecting to SQL Server. Once you have added the subscriber, you will be able to select the subscription database. Select it in the drop-down box, and then click Next. 7. The Distribution Agent Security dialog will appear. This is similar to the Snapshot Agent Security dialog. Enter the account under which you wish the distribution agent to run. You can also specify the accounts you wish to use to connect to the distributor and subscriber here. 8. Click OK when you have completed the setup of these accounts. 9. Click Next to advance to the Synchronization Schedule dialog. From here, you can set a schedule. The default is Run Continuously, which means that the agent will always be running in the case of transactional replication (this setting has no effect on snapshot publications). If you wish to run your distribution agent on a schedule, click the drop-down button and define a schedule or have the agent run on demand. If you select the option to run the agent on demand, then you have to run the agent either through Replication Monitor, by running the job for the distribution agent, or by expanding the publication, locating the sub- scriber, right-clicking on it, and selecting View Synchronization Status and then clicking the Start button. 10. Click Next to advance to the Initialize Subscriptions dialog. The options are to initialize, which means the snapshot will be applied at the subscriber, or not to initialize, which means that 823 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 824 Part V Data Connectivity you need to put all the required objects in place. This includes all tables, the data, stored procedures, functions, and views, as well as replication stored procedures. a. To create the replication stored procedures, use the command sp_scriptpublication customprocs ‘PublicationName’ in your publication database. The stored procedures will appear in the results pane. C opy them into the query window and run them in your subscription database. Under most circumstances the best choice is to select the initialize option. b. If you do select the initialize option, two selections are available in the drop-down list: At First Synchronization and Immediately. At first synchronization means the snapshot will be generated when the distribution agent runs. Immediately means the snapshot will be generated and applied on the subscriber when you complete the dialog. 11. Click Next to advance to the Wizard options. Here you can specify whether you want the snapshot created immediately, scripted, or both. 12. Click Finish to complete the dialog and create your subscription. Creating a pull subscription to a transactional/snapshot publication Creating a pull subscription is very similar to creating a push subscription. The following steps show you how: 1. Connect to your subscriber using SQL Server Management Studio, expand the Replication folder, and then right-click the Local Subscriptions folder. 2. Select New Subscriptions, and click Next at the splash screen. In the drop-down box, select your publisher, and then expand your publication database and select your publication. 3. Click Next. In the Distribution Agent location dialog, select ‘‘Run each agent at its Subscriber (pull subscriptions)’’ and click Next. 4. Click the check box next to your subscriber, and select your subscription database on the right side of the screen. Click Next to advance to the Distribution Agent Security dialog. 5. Select the security context you wish the binary to run under, and choose the account you want to use to connect to the subscriber. This account should be in the dbo_owner role on the subscriber. 6. Click Next. In the Synchronization Schedule dialog, choose how frequently you want the subscriber to connect. The options are to run continuously, run on demand, or run on a schedule. 7. Click Next to continue to the Initialize Subscriptions dialog. The options are to initialize your subscription immediately, at first synchronization, or not to initialize your subscription at all (by not selecting the initialize check box). Follow the notes in Creating a Push Subscription for more details on this option. 8. Click Next to advance to the Wizard options. Here you can specify whether you want the snapshot created immediately, scripted, or both. 9. Click Finish to complete the dialog and create your subscription. 824 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 825 Replicating Data 36 Creating a peer-to-peer topology To create a peer-to-peer topology you must be running the Enterprise Edition of SQL Server. Create a transactional replication publication, and once you have completed the publication creation, right-click on the publication in the Local Publications folder. In the Subscription Options tab, ensure that Allow Peer-to-Peer Subscriptions is set to true. Click OK. Once this is done, follow these steps to set up a peer-to-peer topology: 1. Return to the publication, right-click on the publication again, and this time select Configure Peer-to-Peer Topology. Click Next at the splash screen. 2. In the Publication dialog that appears, select your publication database and publication. These should be highlighted. 3. Click Next to launch the Configure Topology dialog. Right-click and select Add a New Peer Node. 4. Enter your subscriber name, select the appropriate authentication type, and click Next. You will then be prompted for a database and Peer Originator ID. Choose 2 for the Originator ID; the publisher will have an Originator ID of 1. 5. Right-click the database icon in the center of the Configure Topology dialog, and select Connect to All Displayed Nodes. Click Next. This launches the Log Agent Security dialog. 6. Select a security context under which the Log Reader Agent will run, and click Next. This launches the Distributor Security dialog. 7. Select an account under which the Distribution Agent should run and how the Distribution Agent should connect to the Subscriber. You need to repeat this for each node in the topology. Click Next. This launches the New Peer Initialization dialog. You will need to restore the publication database on each of the peers or place the tables and related replication metadata in place. 8. Click Next and then Finish to complete the peer-to-peer topology configuration. Creating a merge publication Creating merge publications is very similar to creating transactional or snapshot publications. Follow these steps to create a merge publication: 1. Connect to your publisher in SQL Server Management Studio, expand the Replication folder, and right-click on the Local Publications folder. Select New Publication and click Next. 2. Select the database you wish to merge replicate, and click Next in the Publication Database dialog. 3. For Publication Type, select Merge publication and click Next. 4. For Subscriber types, select the type of SQL Server to which you are replicating. It is possible to select multiple subscriber types; for example, you can replicate to SQL 2005 and SQL 2008 publishers. 5. Click Next to Advance to the Articles dialog. From here, you can choose what you wish to replicate — for example, tables, views, stored procedures, or functions. If you e xpand a table, you will notice that you have the option to select or deselect columns that you wish to 825 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 826 Part V Data Connectivity replicate. The Article Properties button enables you to control how tables will be replicated (for example, you can choose to replicate user indexes). Click Next. The Article Issues dialog will then warn you that a unique identifier (GUID) column will be added to all tables you are indexing. 6. Click Next to advance to the Filter Rows dialog. Merge replication is designed to only replicate a subset of the data to the subscriber. Part of the reason for this is because merge replication is frequently used over low bandwidth lines, such as over a phone line or the Internet. By filtering rows you can minimize the amount of data that is sent to your sub- scriber. You can also use join filters. Basically, join filters extend a filter you place on a table to all the other tables that have foreign key relationships on the filtered column. Consider the SalesTerritory table in the AdventureWorks2008 database. It is joined to the SalesOrderHeader table by TerritoryID,andtheSalesOrderHeaderTable is joined to the SalesOrderDetail table by the SalesOrderID column. If you filter on the TerritoryID, subscribers would only get the related data for a particular TerritoryID. To use join filters, click the Add button in the Filter Table Rows dialog and select Automati- cally Generate Filters, or click the Add Filter button and select the tables and rows you want to filter on. You then have the option to click the Add Button again and select Add Join to Extend the Selected Filter. You can also filter on Host_Name() and USER_NAME(), both of which can be overridden by the Merge Agent (HostName and PublisherLogin, respectively). 7. Once you have created your filters, click Next to launch the Snapshot Agent dialog; in most cases you will want to accept the default. Click Next to launch the Snapshot Security Agent dialog and set the appropriate accounts for your Snapshot Agent. 8. Click Next to advance to the Snapshot Options dialog, where you can either generate your publication or script it out. Click Next to name your publication and then click Finish. When your publication has been created, click Close. Creating merge replication subscriptions is almost identical to creating subscriptions to transactional and snapshot publications. There are two differences. The first is that there is a Subscription Type dialog. This controls conflicts. Conflicts arise when an attempt is made to update a row that has been deleted on the subscriber between syncs, or the same primary key value is assigned on the publisher and subscriber between syncs. The Subscription Type dialog controls how conflicts are resolved. For example, you can assign a value of 75% to your subscriber. This means that the subscriber change will remain on the publisher unless another subscriber with a higher priority syncs that row at a later point in time. In this case, the subscriber with a higher priority will replace the value that came from the lower priority subscriber. The other subscription type is Client, which means that the first value to the publisher will win any conflicts. The other difference is that you will be able to add a value for hostname that supplies a value to your filter. Web synchronization One other feature of merge replication is web synchronization. Merge r eplication is frequently used to replicate to servers in branch offices over WANs or the Internet. To reduce the exposure of SQL Servers to viruses, worms, Trojan horses, and hackers, Microsoft created web synchronization, whereby the sub- scriber connects to a web server over port 80 or port 443 and an ISAPI filter redirects traffic to a SQL 826 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 827 Replicating Data 36 Server. While most firewall administrators are reluctant to open port 1433 (the TCP/IP port that SQL Server listens on), they have no problem leaving port 80 or 443 open. To configure web synchronization, execute the following steps: 1. Connect to your publisher using SQL Server Management Studio, and expand the Replication and Local Publications folders. 2. Right-click your publication and select Configure Web Synchronization. Click Next. 3. Specify whether your subscribers are running SQL Server or SQL CE and select Next. 4. Enter the name of the web server and choose to either create a new virtual directory or use an existing one. You will receive a prompt to accept the copying of an ISAPI extension that will process your web synchronization. Click Yes, and then Next to launch a dialog for Authentication Access. I recommend that you use Basic Authentication. 5. For the domain, enter the name of the domain as it appears on the certificate; for the realm, enter the name of your fully qualified domain name as it appears on your certificate. After clicking Next, the Directory Access dialog appears. 6. Select an account or a group here that will be used to connect to your snapshot share. This group should have read rights to access the snapshot share. 7. Click Next to advance to the Snapshot Share Access dialog. Enter the name of the snapshot share as a UNC: \\MyServerName\ShareName. The share must pre-exist. 8. Click Next. If you have not already configured a publication to use this share as its snapshot folder, you will get a prompt telling you that the share is empty. Ensure that this is the share you want to use and click Next to continue to the Complete the Wizard dialog. 9. Confirm your choices and click Finish. After the Web Configuration Wizard completes, you will get a success or failure report. T he latter report enables you to determine which component failed and to rerun the wizard to correct those portions. Summary Replication is a complex and powerful feature of SQL Server, and fully describing it could easily take a book by itself. Using the wizards and dialogs that Microsoft has written into Management Studio greatly simplifies the process of configuring and deploying replication. Key points from this chapter include the following: ■ Replication can be a good fit for your data distribution needs. ■ Replication uses a publisher — distributor — subscriber metaphor. ■ Transactional replication is one-way replication by default and the fastest and most popular replication method. ■ Peer-to-peer replication is bi-directional transactional replication, and an Enterprise Edition–only feature. ■ Merge replication is a best fit for bi-directional replication, especially when the publisher and subscribers are occasionally or frequently offline. The next chapter continues the data connectivity theme with Microsoft’s BI solution for moving data: Integration Services. 827 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 828 www.getcoolebook.com Nielsen c37.tex V4 - 07/21/2009 2:13pm Page 829 Performing ETL with Integration Services IN THIS CHAPTER Integration Services variables and expressions Constructing control and data flows Package event handlers Debugging Full list of package elements Configuring packages for multiple environments Package deployment I ntegration Services is most commonly described as an extract-transform-load (ETL) tool. ETL tools are traditionally associated with preparing data for warehousing, analysis, and reporting, but Integration Services represents a step beyond the traditional role. It is really a robust programming environment that happens to be good at data and database-related tasks. Many prospective users have been intimidated by the Integration Services learning curve, sticking to the Transact-SQL that they know instead of investigating a more powerful ETL tool. This has made traditional SQL approaches one of the largest competitors of Integration Services, but those who take the time to understand this tool will find several advantages: ■ Simple, fast methods for moving large quantities of data, minimizing database load, and batching data into destination tables to keep blocking and transaction log sizes down ■ The capability to chain together many tasks, with complete control over ordering and error and exception handling. Many tasks can be executed in parallel. ■ Connections to read or write most any type of data without special programming or linked server calls ■ Common data and database management tasks are implemented without the need to write code; a .NET scripting environment is available for more custom tasks, plus Integration Services is fully extensible with custom assemblies. ■ Resulting packages are as manageable as the situation requires, with several deployment, configuration, auditing, restart, and logging options. While careful coding in SQL or other languages can approach the same core func- tionality as Integration Services, most projects require significant effort and end 829 www.getcoolebook.com Nielsen c37.tex V4 - 07/21/2009 2:13pm Page 830 Part V Data Connectivity up with minimal exception handling — often something like ‘‘stop and send me an e-mail if there is a problem.’’ One colleague recently received 8,000 e-mails over a two-day period from such a data loading system. Integration Services enables you to avoid many of the tedious details and spend more time building robust applications. It also excels at identifying problems and performing recovery operations within the application itself. Many of the Integration Services features, such as error row redirection, complex precedence constraints, data conversion, and fuzzy lookup, are well suited to recovering from data problems, rather than ignoring or failing on them. There is nothing better than building a system integration application that one seldom even thinks about. New in 2008 L ook throughout this chapter for new features added in SQL Server 2008, including enhanced ADO.NET connectivity, upgrade of the scripting environment to use Visual Studio Tools for Applications (VSTA), enhanced lookup functionality, including cache creation and tuning, and the new Data Profiling task. Additionally, the Integration Services runtime environment has been made more scalable through increased parallelism. Design Environment One of the best ways to understand Integration Services is to understand its design environment. Begin by opening a new Integration Services project within the Business Intelligence Development Studio. The Integration Services template is located in the Business Intelligence folder. The window that appears should look similar to what is shown in Figure 37-1. Beyond the ever-present Solution Explorer and Properties panes, several panes and tabs are used in building a package (use the View menu to display any missing panes): ■ Connection Managers pane: Connection managers are pointers to files, databases, and so on, that are used to provide context for the execution of tasks placed on the design surface. For example, an Execute SQL task requires a database connection. ■ Toolbox: The toolbox provides a list of tasks that can be dragged onto the design surface. The list of available tasks varies according to the active tab in the main pane. ■ Control Flow tab: This is the primary design surface on which tasks are placed, configured, and ordered by connecting tasks with precedence arrows. ■ Data Flow tab: One of the tasks that can be configured on the Control Flow tab is a Data Flow task, used to move and transform data. The Data Flow tab is used to configure Data Flow tasks; think of it as a Properties window on steroids. ■ Event Handlers tab: Events are exposed for the overall package and each task within it. Tasks are placed here to execute for any event, such as OnError or OnPreExecute. 830 www.getcoolebook.com Nielsen c37.tex V4 - 07/21/2009 2:13pm Page 831 Performing ETL with Integration Services 37 ■ Package Explorer tab: This tab lists all the package’s elements in a single tree view. This can be helpful for discovering configured elements not always obvious in other views, such as event handlers and scoped variables. FIGURE 37-1 The Control Flow tab of Integration Services’ design environment The package runs by executing Control Flow tasks, beginning with tasks that have no incoming prece- dence constraints (arrows). As each task completes, the next task executes based on the precedence con- straints until all tasks are complete. Refer to Figure 37-1 for an example of three tasks sequenced by two such arrows. While the tasks in this example are ordered for serial execution, many combinations are possible (see the section ‘‘Control Flow Precedence’’ later in this chapter). For users of Data Transformation Services (DTS) in SQL 7.0 and SQL 2000, this approach to building a package will be familiar, but why the Data Flow tab? Each data flow is a single task on the Control Flow tab; drill down (e.g., via a double-click) to view a single data flow’s configuration on the Data Flow tab (see Figure 37-2). Note that it does not contain tasks but data sources, destinations, and transformations. The arrows between the boxes are not prece- dence indicators but data inputs and outputs that determine how data flows from source to destination. 831 www.getcoolebook.com . button is a drop-down button that enables you to create SQL Server and Non -SQL Server Subscribers. You can create subscriptions to SQL Servers, as well as Oracle and DB2 subscribers using this. types, select the type of SQL Server to which you are replicating. It is possible to select multiple subscriber types; for example, you can replicate to SQL 2005 and SQL 2008 publishers. 5. Click. dialog, accept the default, which is Run all agents on the Distributor, MySQLServerName (push subscriptions), where MySQLServerName isthenameof your publisher. 5. Click Next to advance to the Subscribers