Nielsen c35.tex V4 - 07/21/2009 2:10pm Page 812 Part V Data Connectivity Service Broker can handle complex message groups, such as multiple line items of an order that may not appear consecutively in the queue due to other messages being received simultaneously. The conversa- tion group can be used to select out the related messages. Monitoring Service Broker While Management Studio has no visibility in the activity of a queue, nor summary page reports for the queue object, you can select directly from the queue or select a count(). In addition, there are database catalog views to shed light on the queue: * sys.dm_broker_activated_tasks * sys.dm_broker_connections * sys.dm_broker_forwarded_messages * sys.dm_broker_queue_monitors SQL Trace/Profiler has a Broker event class with 10 Service Broker–related events that can be traced. Summary Service Broker is one of those technologies that provides no benefit ‘‘out of the box.’’ Unless you make the effort to architect the database using Service Broker, it offers no advantage. However, if you do take the time to design the database using Service Broker, you’ll see significant scalability benefits, as Service Broker queues buffer the workload. The next chapter continues the progression through SQL Server technologies and discusses ADO.NET 2.0 and its powerful methods for connectivity. 812 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 813 Replicating Data IN THIS CHAPTER Replication concepts Configuring replication R eplication is an optional native SQL Server 2008 component that is used to copy data and other database objects from one database or server to another. Replication is used for many purposes, listed here in order from most popular to rarely used: ■ Offloading reporting from an OLTP server to a reporting server ■ Data consolidation — for example, consolidating branch office data to a central server ■ Data distribution — for example, distributing data from a central server to a set of member servers to improve read performance ■ Disaster recovery — replication can be used to keep a DR (disaster recovery) server synchronized with the main server, and clients can be manually redirected to the DR with minimal interruption ■ Synchronizing data with a central server and a mobile sales force ■ Synchronizing data with handheld devices (such as PDAs and smartphones) Replication processes can be made to be highly scalable, and typically can syn- chronize data between servers/databases with acceptable latency. Latency reflects the lag of time between when data is sent (replicated) from the source server and received at the destination server. Replication is not the o nly way to move data between servers. There a re several alternatives, each with its own pros and cons: ■ bcp utility ■ SSIS 813 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 814 Part V Data Connectivity ■ Distributed transactions ■ Triggers ■ Copy Database Wizard ■ Backup and restore ■ Log shipping and database mirroring Bulk copy program (bcp) is a command-line tool that can be used to send tabular data to the file sys- tem, and from there to a remote server. While it can be scripted, it is slower than replication processes, requires significant work to set up, and the DBA/developer needs to ensure that all objects are in place on the destination server. For example, all tables, views, stored procedures, and functions need to be on the destination server. There is no provision for change tracking. In other words, bcp can’t tell what has changed in the data, and only sends the changes to the destination server. The solution requires change tracking — a way to determine what has been inserted/updated/deleted on the source server. These may involve using Change Data Capture or the Change Tracking features in SQL 2008. SSIS can be thought of as a programmatic interface to a high-performance bcp utility. It can be faster than bcp. As with bcp, it requires that the DBA/developer place all objects on the destination server, and there is no provision for change tracking. Distributed transactions normally involve using MS DTC (Microsoft Distributed Transaction Coordi- nator). With a distributed transaction, the transaction is committed on the source server, then on the destination server, and then the application can do the next unit of work. (This is sometimes called a split write.) The application has to be configured to use distributed transactions, and the network connection must be stable and have ample bandwidth; otherwise, the transactions will fail. With distributed transactions, only changes are ‘‘replicated.’’ The DBA/developer needs to place all tables (along with the initial data), stored procedures, views, and functions on the destination server. Triggers are very similar to distributed transactions. With distributed transactions, all application code (for example, stored procedures, and sometimes ADO.NET code) must be rewritten for the distributed transactions. With triggers, the ‘‘replication’’ logic is incorporated on the trigger. And like distrib- uted transactions, only changes are ‘‘replicated.’’ The DBA/developer needs to place all tables (along with the initial data), stored procedures, views, and functions on the destination server. There is also overhead with using triggers, especially over a network. The Copy Database Wizard will move or copy a database from one server to another. It is intended for a single use move or copy. In the move mode, you can only move the database one time. In the copy mode, the database can be copied multiple times if you specify the options to delete the database and the database files that might exist on the destination server. Backup and restore will copy the entire database to the destination server. The level of granularity possi- ble for the preceding options are tables (bcp, SSIS) and transactions (triggers, distributed transactions). Backup and restore, log shipping, database mirroring, and the Copy Database Wizard ‘‘replicate’’ entire databases. As the name suggests, backup and restore involves backing up the database on the source server and restoring it on the destination server. This option is not scalable for large databases, and the database must go offline while the database is being restored. It is not a good option in environments with real-time data requirements, as the data becomes progressively out of date until the latest backup is restored on the destination server. 814 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 815 Replicating Data 36 Log shipping is continuous backup and restore. The log is backed up on the source server and applied to a previously restored database backup on the destination server. Log shipping is not considered to be scalable, especially for large databases or large numbers of databases. The database on the destina- tion server is not accessible with log shipping. There are options to make it accessible, but it will be in read-only mode, and users need to be kicked off when the next log is r eady to be applied. Database mirroring is continuous log shipping. Changes to the database transaction log are continually shipped from the source server to the destination server. The database on the destination server will be inaccessible while being mirrored. There are two modes of database mirroring: high performance and high safety. With high safety, application writes on the source server are not committed on the source server until they are also committed on the destination server. This can cause increased latency for all writes on the destination server, which may make database mirroring not a good fit for your particular requirements. High-performance mode does not have this problem, as changes occurring on the source server are applied to the destination server asynchronously. However, the high-performance option is only available on the Enterprise Edition of SQL Server 2005 and SQL Server 2008. What’s New with Replication? T here are several new features in SQL Server 2008 replication: ■ Integration with database mirroring. If you have a remote distributor, you can con- figure your publisher on your principal to failover to your mirror without having to reinitialize y our subscribers. ■ Much f aster snapshot delivery on Windows 2008 servers ■ A new Wizard for deploying nodes in a peer-to-peer topology. For more information, consult http://msdn.microsoft.com/en-us/library/dd263442.aspx. ■ Conflict detection in peer-to-peer replication ■ Ability to make schema changes in peer-to-peer replication without having to stop all users from using the topology while changes are deployed. Replication Concepts SQL Server replication operates according to a publishing metaphor. There can be three types of servers in a replication topology: ■ Publisher: The source server ■ Distributor: For transactional replication and peer-to-peer replication, the distributor is where the changes are stored until they are replicated to the destination server. For merge replication, the distributor is merely a repository for replication process history. Changes and historical information are stored in a database called the distribution database. ■ Subscriber: The destination server 815 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 816 Part V Data Connectivity Types of replication Based on the publishing metaphor, SQL Server 2008 offers five basic types of replication, each serving a different purpose: ■ Snapshot replication: A point-in-time image of database objects (a snapshot) is copied from the source server to the destination server. This image generation and deployment can be scheduled at whatever interval makes sense for your requirements; however, it is best used when the majority of your data seldom changes, and when it does, it changes at the same time. ■ Transactional replication: Transactions occurring on the source server are asynchronously captured and stored in a repository (called a distribution database) and then applied, again asynchronously, on the destination server. ■ Oracle publishing: This is a variant of transactional replication. Instead of SQL Server being the source server, an Oracle server is the source server, and changes are replicated from the Oracle server to SQL Server. This SQL Server can be the final destination for the Oracle server’s data, or it can act as a gateway, and changes can be replicated downstream to other SQL Servers, or other RDBMs. Oracle publishing is only available on SQL Server Enterprise Edition and above. ■ Peer-to-peer replication: Another variant of transactional replication that is used to replicate data to one or more nodes. Each node can publish data to member nodes in a peer-to-peer replication topology. Should one node go offline, changes occurring on the offline node and the other member servers will be synchronized when that node comes back online. Changes are replicated bi-directionally, so a change occurring on Node A will be replicated to Node B, and changes occurring on Node B will be replicated to Node A. Peer-to-peer replication is an Enterprise Edition–only feature that is scalable to approximately 10 nodes, but your results may vary depending on your replicated workload, your hardware, and your available bandwidth. ■ Merge replication: As the name indicates, merge replication is used to merge changes occur- ring on the destination server with changes occurring on the source server, and vice versa. It is highly scalable to hundreds if not thousands of destination servers. With merge replication, there is a central clearinghouse for changes that determines which changes go where. With peer-to-peer replication, any member node in the topology can assume the clearinghouse role. Replication agents As you might imagine, a lot of work is required to move data between the various servers in the pub- lishing metaphor. To do so, SQL Server replication makes use of three agents: ■ Snapshot agent: Generates the tabular and schema data or schemas for the objects you wish to replicate. The tables and schema data, and related replication metadata, is frequently referred to as the snapshot. The snapshot agent is used by all replication types. The snapshot agent writes the tabular/schema data to the file system. ■ Distribution agent: Used by snapshot replication to apply the snapshot on the subscribers, and used by transactional replication to apply the snapshot on the subscriber and to replicate subsequent changes occurring on the publisher to the subscriber. 816 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 817 Replicating Data 36 ■ Merge agent: Detects changes that have occurred on the publisher and the subscriber since the last time these agents ran and merges them together to form a consistent set on both the publisher and the subscriber. In some cases, the same primary key value will be assigned on the publisher and one or more subscribers between runs of the merge agent (called a sync). When the merge agent runs it detects this conflict and l ogs it to conflict tables that can be viewed using the conflict viewer. With merge replication, the data that is in conflict will persist on the publisher and the subscriber by default. For example, if a primary key value of 1,000 for a table is assigned on the publisher, and then the same value is assigned on the same table on the subscriber, when the merge agent runs it will log the conflict, but keep the publisher’s values for the row with a PK (primary key) of 1,000 on the publisher, and keep the subscriber’s values for the row with a PK of 1,000 on the subscriber. Merge replication has a rich set of features to handle conflicts, including one that skips changes to different columns occurring on the same row between publisher and subscriber. This is termed column-level conflict tracking. For example, a change to John Smith’s home phone number occurring on the publisher and his cell phone number occurring on the subscriber would be merged to have both changes persisting on both the publisher and sub- scriber. By default, merge replication uses row-level conflict tracking that might result in the change to John Smith’s home phone number being updated on both the publisher and the subscriber, but his cell phone change being rolled back, with this conflict and the conflicting values being logged to the conflict tables. Best Practice A single server can serve as both the publisher and the distributor and even as the subscriber. An excellent configuration for experimenting with replication is a server with multiple SQL Server instances. However, when performance is an issue, a dedicated distributor server is the best plan. This remote distributor can act as a distributor for multiple publishers; in fact, you can configure this remote distributor to have a separate distribution database for each publisher. The publisher server organizes multiple articles (an article is a data source: a single table, view, function, or stored procedure) into a publication. You may find that you get better performance by grouping large articles (tables) into their own publication. The distributor server manages the replication process. The publisher can initiate the subscription and push data to the subscriber server, or the subscriber can set up the subscription and pull the subscription from the publisher. Transactional consistency The measure of transactional consistency is the degree of synchronization between two replicated servers. As the lag time between synchronizations increases, transactional consistency decreases. If the data is identical on both servers most of the time, transactional consistency is said to be high. Conversely, a replication system that passes changes every two weeks by e-mail has low transactional consistency. 817 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 818 Part V Data Connectivity Configuring Replication Using wizards is the simplest way to implement replication. Developers and DBAs generally avoid wiz- ards because they have limited features, b ut implementing replication without wizards requires numer- ous calls to arcane stored procedures and is a tedious and painful process prone to user errors. Before configuring r eplication, it is important to understand the limitations of various SQL Server editions. For example, SQL Server Express can only act as a subscriber, and the number of subscribers each edition can have is limited. For example, the Standard Edition can only have five subscribers, the Web 25 subscribers. Merge replication can only be used to replicate to subscribers with same version or lower. For example, you can’t have a SQL Server 2005 publisher merge replicating to SQL Server 2008 subscribers; however, a SQL Server 2008 publisher can replicate to a SQL Server 2005 subscriber. Merge replication is the only replication type that can replicate to SQL Server CE subscribers. Creating a publisher and distributor To enable a server as a publisher you must first configure it as a subscriber. While you can configure the publisher with a local or remote distributor, it is recommended that you configure the distrib- utor first, before creating your first publication. This way, if there is a problem, it will be easier to troubleshoot. The following steps walk you through the process of creating your first distributor: 1. Connect to the server that will be acting as your publisher/distributor or remote distributor using SQL Server Management Studio. You need to use the SQL Server 2008 version of SQL Server Management Studio for this. 2. Once you have connected, right-click on the replication folder and select the menu option Configure Distribution. If you do not see the Configure Distribution option, either your SQL Server edition is SQL Server Express or you do not have the replication components installed. To install the replication components, you need to run Setup again. 3. After clicking through the initial splash screen, you will have the option to select which server you should use as your distributor: either the local server or a remote server. If you are using a remote server, you need to ensure that the remote server is already configured as a distributor. Because this is a local distributor, select the default option and click Next. 4. You will be prompted for a folder to serve as the default location where the snapshot agent deposits the snapshot. Select a different location if the default folder does not have adequate space for your snapshots, or if you want to minimize I/O contention. The snapshot generation process is an I/O-intensive process during snapshot generation. You do have the option to select a snapshot folder or share for each publication when you create it, so the snapshot folder location is not of critical importance. 5. Once you have selected the location for your snapshot folder or snapshot share, click Next. The Distribution Database dialog enables you to name your distribution database and select folders where the database data and log files will reside. 818 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 819 Replicating Data 36 Best Practice I f you have a large number of subscribers, or you are replicating over a WAN, you should use a share for your snapshot folder, or use FTP along with pull subscribers (you will configure FTP server details when you create your publication). With pull subscribers, the merge or distribution agent or process runs on the s ubscriber. With push subscribers, the distribution and merge agents run on the publisher/distributor or distributor. If you are using push subscribers with a remote distributor, your snapshot folder must also be configured as a snapshot share. It is not a good security practice to use the Admin shares (i.e., C$), but rather a share name that hides the path of the actual physical snapshot folder location, and does not require the distribution or merge agents to run under an account that has rights to access the snapshot share. Best Practice O ptimal configuration of a distributor or a distribution server is on a 64-bit server with ample RAM and RAID 10 drives. The distributor server will be I/O and network bound, so the more RAM available for caching and the greater the available network bandwidth, the greater the throughput of your transactional replication solution. Merge replication is CPU and network bound, so these best practices do not apply for it. 6. Click Next to enable the publishers that you wish to use this distributor. If this is a local publisher/distributor, your publisher will already be selected. If not, you need to click the Add button if you want to enable other publishers to use this distributor. 7. Click Next to assign a distributor password. This allows remote publishers to use this distributor as their distributor. 8. Click Next, Next again, and Finish to com plete the creation of your distributor. To configure a publisher to use a remote distributor, follow these steps: 1. Connect to the publisher using SQL Server Management Studio, right-click on the Replication folder, and select Configure Distribution. 2. When you get to the option to select which server you wish to use as your distributor, select the ‘‘Use the f ollowing server as the Distributor’’ option. 3. Click the Add button and enter the connection information to connect to the remote distribu- tor. You will be prompted for the password you configured to access the remote distributor. 4. Click Next, Next again, and Finish. Your remote distributor is now ready to use. Creating a snapshot/transactional publication Once a distributor is set up for your server, you can create your publications. A publication is defined as a collection of articles, where an article is an item to be published. An article in SQL Server can be a 819 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 820 Part V Data Connectivity table, a view, an indexed view, a user-defined function, or even a stored procedure or its execution. If you choose to replicate the execution of a stored procedure, the stored procedure call will be executed on the subscriber. For example, if you fire a stored procedure that updates 10,000 rows on a table, this table is replicated, and the execution of the stored procedure is executed, only the stored procedure call will be executed. If the replication of the stored procedure execution was not replicated, 10,000 update statements would have to be replicated by the publisher, through the distributor to the subscriber. As you can imagine, there are considerable benefits to doing this. Typically tables are published, but views can also be published. You just need to ensure that the base tables referenced by the views are also published. To create a publication, execute the following steps: 1. Connect to your publisher using SQL Server Management Studio, and expand the Replication folder, then right-click on the local publication folder and select New Publication. 2. After clicking through the initial splash screen, select the database you wish to replicate from the Publication Databases section. 3. Click Next. In the Replication Types dialog that appears, select the replication type you wish to use. You will then get a dialog entitled Articles, from which you can select the type of objects you wish to replicate. 4. Expand each object type tree and select the articles you wish to replicate. For e xample, if you wish to replicate tables, expand the table tree and select the individual tables you wish to replicate. You can elect to replicate all tables by selecting the check box next to the table tree. You a lso have the option to replicate only a subset of the columns in tables you are replicating. If you see a table with what appears to be a red circle with a slash through it next to the table, this table does not have a primary key and you will be unable to replicate it in a transactional publication. Snapshot and merge replication allow you replicate tables without primary keys. If you highlight a table and then click the article properties drop-down box, you can configure options regarding how the table will be replicated to the subscriber. For example, you can replicate user triggers, include foreign key dependencies, and determine what will happen if a table with the same name already exists on the subscriber. The options are as follows: ■ Drop the subscriber table ■ Do nothing ■ Keep the table, but delete all of its data ■ Keep the table, but delete all of the data that meets your filtering c riteria (covered in the next step) 5. Once you have selected the objects you wish to replicate, click Next. The Filter Table Rows dialog will appear. From here, you can configure filtering criteria that sends only a subset of the rows to the subscriber. For example, if you were replicating a table with a state column, 820 www.getcoolebook.com Nielsen c36.tex V4 - 07/21/2009 2:11pm Page 821 Replicating Data 36 you may decide that the subscriber should have only rows from California. To enable that, you would click the Add button, select the table in the drop-down box in the ‘‘Select table to filter option,’’ click the State column in the ‘‘Complete the filter statement’’ section, and then add the state value. In code, it might look like this: SELECT <published_columns> FROM [dbo].[SalesStaff] WHERE [State]=’CA’ This would ensure that the subscriber only receives data and changes from sales staff when the value of State is CA. 6. Once you have e nabled your filters, click Next. The next dialog is Snapshot Agent, which controls two snapshot options: ■ Create a snapshot immediately and keep the snapshot available to initialize subscriptions. ■ Schedule the Snapshot Agent to run at the following times. The first option generates the snapshot immediately; and every replicated change that occurs in the publication is not only replicated to the subscriber, but also added to the snapshot files. This is a great option when you have to deploy a lot of snapshots frequently, but it does add a constant load to your publisher. The second option to schedule the snapshot agent generates a snapshot on a schedule, so the snapshot files are updated each time you run the snapshot agent. Changes not in the snapshot have to be stored i n the distribution agent, which may mean extra storage requirements on the distributor. For most DBAs/developers, it is not a good practice to enable these options. 7. Click Next to configure Agent Security. This option allows you to select the security context you wish your replication agents to run under. By default, SQL Server runs the replication agents under the same account under which the SQL Server agent account runs. This is not considered to be a good security practice, as buffer overflow, worm attacks, or Trojan attacks might be able to hijack the replication agent and run commands with the same security context as the SQL Server Agent on the publisher, distributor, or subscriber. This dialog enables you to control which account the replication agent is going to run under; ideally, this will be an account with as few rights as possible on the publisher, distributor, or subscriber. Figure 36-1 illustrates this dialog. 8. Click the Security Settings button to display the Snapshot Agent Security dialog shown in Figure 36-2. From here, you can enter the Windows or SQL Server Agent account under which you wish the snapshot agent to run. If you choose a Windows account, it needs to be added using the following syntax: DomainName\AccountName. 9. Once you have selected the agents you wish to use, click OK, and then Next to exit the Agent Security dialog. 10. The next dialog is the Wizard Actions dialog. This enables you to create the publication imme- diately, create a script to create the publication, or both. Once you have made your selection, click Next. In the Complete the Wizard dialog that appears, you can name your publication. 11. Once you have given your publication a name, click Finish to create it. After you have created your publication, you can now create one or more subscriptions to it. 821 www.getcoolebook.com . replication. Instead of SQL Server being the source server, an Oracle server is the source server, and changes are replicated from the Oracle server to SQL Server. This SQL Server can be the final. For example, you can’t have a SQL Server 2005 publisher merge replicating to SQL Server 2008 subscribers; however, a SQL Server 2008 publisher can replicate to a SQL Server 2005 subscriber. Merge replication. available on the Enterprise Edition of SQL Server 2005 and SQL Server 2008. What’s New with Replication? T here are several new features in SQL Server 2008 replication: ■ Integration with database