Nielsen c31.tex V4 - 07/21/2009 2:03pm Page 702
Part V Data Connectivity
Executing Distributed Queries 31

        @datasrc = 'C:\SQLServerBible\CHA1_Schedule.xls',
        @provstr = 'Excel 5.0';

Excel spreadsheets are not multi-user spreadsheets. SQL Server can’t perform a distributed query that accesses an Excel spreadsheet while that spreadsheet is open in Excel.

Linking to MS Access

Not surprisingly, SQL Server links easily to MS Access databases. SQL Server uses the OLE DB Jet provider to connect to Jet and request data from the MS Access .mdb file.

FIGURE 31-3 Prior to the conversion to SQL Server, the Cape Hatteras Adventures company was managing its tour schedule in the CHA1_Schedule.xls spreadsheet.

FIGURE 31-4 Tables are defined within the Excel spreadsheet as named ranges. The CHA1_Schedule spreadsheet has five named ranges.

Because Access is a database, there’s no trick to preparing it for linking, as there is with Excel. Each Access table will appear as a table under the Linked Servers node in Management Studio.

The Cape Hatteras Adventures customer/prospect list was stored in Access prior to upsizing the database to SQL Server. The following code from the CHA2_Convert.sql script links to the CHA1_Customers.mdb Access database so SQL Server can retrieve the data and populate the SQL Server tables:

    EXEC sp_addlinkedserver
        'CHA1_Customers',
        'Access 2003',
        'Microsoft.Jet.OLEDB.4.0',
        'C:\SQLServerBible\CHA1_Customers.mdb';

If you are having difficulty with a distributed query, one of the first places to check is the security context.
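One way to begin that check is to ask SQL Server how logins are mapped for the linked server. The following is only a sketch: it assumes the CHA1_Customers linked server from the sp_addlinkedserver call has already been created, and the rows returned will depend on your own configuration. Both sp_helplinkedsrvlogin and the sys.servers catalog view are standard SQL Server objects.

```sql
-- List the login mappings defined for one linked server.
-- Assumption: the CHA1_Customers linked server already exists.
EXEC sp_helplinkedsrvlogin @rmtsrvname = 'CHA1_Customers';

-- Confirm how each linked server is registered (provider and data source).
SELECT name, provider, data_source
  FROM sys.servers
  WHERE is_linked = 1;
```

If the mapping listed for the login running the query is not what the external data source expects, that is a likely cause of the failure.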
Excel expects that connections do not establish a security context, so the non-mapped user login should be set to no security context:

    EXEC sp_addlinkedsrvlogin
        @rmtsrvname = 'CHA1_Schedule',
        @useself = 'false';

Developing Distributed Queries

Once the link to the external data source is established, SQL Server can reference the external data within queries. Table 31-2 shows the four basic syntax methods that are available, which differ in query-processing location and setup method.

TABLE 31-2 Distributed Query Method Matrix

                                       Query-Execution Location
    Link Setup                         Local SQL Server    External Data Source (Pass-Through)
    ---------------------------------  ------------------  -----------------------------------
    Linked Server                      Four-part name      Four-part name, OpenQuery()
    Ad Hoc Link Declared in the Query  OpenDataSource()    OpenRowSet()

Distributed queries and Management Studio

Management Studio doesn’t supply a graphical method for initiating a distributed query. There’s no way to drag a linked server or remote table into the Query Designer. However, the distributed query can be entered manually in the SQL pane and then executed as a query. In the Query Editor, the name of the linked server can be dragged from the Object Explorer into the editor window.

Distributed views

Views are saved SQL SELECT statements. While I don’t recommend building a client/server application based on views, they are useful for ad hoc queries. Because most users (and even developers) are unfamiliar with the various methods of performing distributed queries, wrapping a distributed query inside a view might be a good idea.

Local-distributed queries

A local-distributed query sounds like an oxymoron, but it’s a query that pulls the external data into SQL Server and then processes the query at the local SQL Server. Because the processing occurs at the local SQL Server, local-distributed queries use T-SQL syntax and are sometimes called T-SQL distributed queries.
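To make the distributed-view suggestion concrete, here is a minimal sketch of a view that wraps a local-distributed query. It assumes the CHA1_Customers linked server to the Access database already exists; the view name vCHA1_Customers is hypothetical, and the column names are taken from the Customers table used elsewhere in this chapter. The four-part name it relies on is explained in the next section.

```sql
-- Hypothetical view name; assumes the CHA1_Customers linked server exists.
-- CREATE VIEW must be the first statement in its batch.
CREATE VIEW dbo.vCHA1_Customers
AS
  SELECT ContactFirstName, ContactLastName
    FROM CHA1_Customers...Customers;  -- database and owner omitted for Access
GO  -- batch separator recognized by Management Studio and sqlcmd

-- Users can now query the view as if it were a local table.
SELECT ContactFirstName, ContactLastName
  FROM dbo.vCHA1_Customers;
```

The point of the design is that anyone querying the view never has to know the distributed-query syntax, and if the link changes, only the view definition has to be edited.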
Using the four-part name

If the data is in another SQL Server, then a complete four-part name is required:

    Server.Database.Schema.ObjectName

The four-part name may be used in any SELECT or data-modification query. On my writing computer is a second instance of SQL Server called [SQL2008RC0\London]. The object’s owner name is required if the query accesses an external SQL Server. The following query retrieves the Person table from that second instance:

    SELECT LastName, FirstName
      FROM [SQL2008RC0\London].Family.dbo.Person;

Result:

    LastName   FirstName
    ---------  ---------
    Halloway   Kelly
    Halloway   James

When performing an INSERT, UPDATE, or DELETE command as a distributed query, either the four-part name or a distributed query function must be substituted for the table name. For example, the following SQL code, extracted from the CHA2_Convert.sql script that populates the CHA2 sample database, uses the four-part name as the source for an INSERT command. The query retrieves base camps from the Excel spreadsheet and inserts them into SQL Server:

    INSERT BaseCamp(Name)
      SELECT DISTINCT [Base Camp]
        FROM CHA1_Schedule...[Base_Camp]
        WHERE [Base Camp] IS NOT NULL;

If you’ve already executed CHA2_Convert.sql and populated your copy of CHA2, then you may want to re-execute CHA2_Create.sql in order to start with an empty database.

As another example of using the four-part name for a distributed query, the following code updates the Family database on the second SQL Server instance:

    UPDATE [SQL2008RC0\London].Family.dbo.Person
      SET LastName = 'Wilson'
      WHERE PersonID = 1;

OpenDataSource()

Using the OpenDataSource() function is functionally the same as using a four-part name to access a linked server, except that the OpenDataSource() function defines the link within the function instead of referencing a pre-defined linked server.
While defining the link in code bypasses the linked server requirement, if the link location changes, then the change will affect every query that uses OpenDataSource(). In addition, OpenDataSource() won’t accept variables as parameters.

The OpenDataSource() function is substituted for a server in the four-part name and may be used within any DML statement.

The syntax for the OpenDataSource() function seems simple enough:

    OPENDATASOURCE ( provider_name, init_string )

However, there’s more to it than first appearances suggest. The init string is a semicolon-delimited string containing several parameters (the exact parameters used depend on the external data source and are not described here; see Books Online for a full overview). The potential parameters within the init string include data source, location, extended properties, connection timeout, user ID, password, and catalog. The init string must define the entire external data-source connection, and the security context, within the function. No quotes are required around the parameters within the init string. A common error in building OpenDataSource() distributed queries is mixing up the commas and semicolons.

If OpenDataSource() is connecting to another SQL Server using Windows authentication, then authentication delegation via Kerberos security is required.

A relatively straightforward example of the OpenDataSource() function is using it as a means of accessing a table within another SQL Server instance:

    SELECT FirstName, Gender
      FROM OPENDATASOURCE(
        'SQLOLEDB',
        'Data Source=SQL2008VPC\London;User ID=Joe;Password=j'
        ).Family.dbo.Person;

Result:

    FirstName  Gender
    ---------  ------
    Adam       M
    Alexia     F

The following example of a distributed query that uses OpenDataSource() references the Cape Hatteras Adventures sample database.
Because an Access location contains only one database and its tables don’t need an owner to qualify them, the database and owner are omitted from the four-part name:

    SELECT ContactFirstName, ContactLastName
      FROM OPENDATASOURCE(
        'Microsoft.Jet.OLEDB.4.0',
        'Data Source = C:\SQLServerBible\CHA1_Customers.mdb'
        )...Customers;

Result:

    ContactFirstName  ContactLastName
    ----------------  ---------------
    Neal              Garrison
    Melissa           Anderson
    Gary              Quill

To illustrate using OpenDataSource() in an update query, the following example updates rows inside the CHA1_Schedule.xls Excel 2000 spreadsheet. A named range was previously defined as Tours ‘=Sheet1!$E$5:$E$24’, which now appears to the SQL query as a table within the data source. Rather than update an individual spreadsheet cell, this query performs an UPDATE operation that affects every row in which the Tour column is equal to Gauley River Rafting and sets the Base Camp column to the value Ashville.

The distributed SQL Server query will use OLE DB to call the Jet engine, which will open the Excel spreadsheet file. Because the spreadsheet is opened by a user, the file is now unavailable to anyone else. Excel is a single-user database. The OpenDataSource() function supplies only the server name in a four-part name; as with Access, the database and owner values are omitted:

    UPDATE OpenDataSource(
        'Microsoft.Jet.OLEDB.4.0',
        'Data Source=C:\SQLServerBible\CHA1_Schedule.xls;User ID=Admin;Password=;Extended properties=Excel 5.0'
        )...Tour
      SET [Base Camp] = 'Ashville'
      WHERE Tour = 'Gauley River Rafting';

Figure 31-5 illustrates the query execution plan for the distributed UPDATE query, beginning at the right with a Remote Scan operation that returns all 19 rows from the Excel named range. The data is then processed within SQL Server.
The details of the Remote Update logical operation reveal that the distributed UPDATE query actually updated only two rows.

To complete the example, the following query reads from the same Excel spreadsheet and verifies that the update took place. Again, the OpenDataSource() function is only pointing the distributed query to an external server:

    SELECT *
      FROM OpenDataSource(
        'Microsoft.Jet.OLEDB.4.0',
        'Data Source=C:\SQLServerBible\CHA1_Schedule.xls;User ID=Admin;Password=;Extended properties=Excel 5.0'
        )...Tour
      WHERE Tour = 'Gauley River Rafting';

FIGURE 31-5 The query execution plan for the distributed query using OpenDataSource()

Result:

    Base Camp  Tour
    ---------  --------------------
    Ashville   Gauley River Rafting
    Ashville   Gauley River Rafting

Pass-through distributed queries

A pass-through query executes a query at the external data source and returns the result to SQL Server. The primary reason for using a pass-through query is to reduce the amount of data being passed from the server (the external data source) to the client (SQL Server). Rather than pull a million rows into SQL Server so that it can use 25 of them, it may be better to select those 25 rows from the external data source.

Be aware that the pass-through query will use the query syntax of the external data source. If the external data source is Oracle or Access, then PL/SQL or Access SQL, respectively, must be used in the pass-through query.

In the case of a pass-through query that modifies data, the type of remote data source determines whether the update is performed locally or remotely:

■ When another SQL Server is being updated, the remote SQL Server will perform the update.
■ When non-SQL Server data is being updated, the data providers determine where the update will be performed.
Often, the pass-through query merely selects the correct rows remotely. The selected rows are returned to SQL Server, modified inside SQL Server, and then returned to the remote data source for the update.

Just as two forms of local-distributed queries exist, one for linked servers and one for external data sources defined in the query, two forms of explicitly declared pass-through distributed queries exist. OpenQuery() uses an established linked server, and OpenRowSet() declares the link within the query.

Using the four-part name

If the distributed query is accessing another SQL Server, then the four-part name becomes a hybrid distributed query method. Depending on the FROM clause and the WHERE clause, SQL Server will attempt to pass as much of the query as possible to the external SQL Server to improve performance.

When building a complex distributed query using the four-part name, it’s difficult to predict how much of the query SQL Server will pass through. I’ve seen SQL Server take a single query and, depending on the WHERE clause, pass through the whole query, turn each table into a separate pass-through query, or pass through only one table.

OpenQuery()

For pass-through queries, the OpenQuery() function leverages a linked server, so it’s the easiest to develop. It also handles changes in server configuration without changing the code.

The OpenQuery() function is used within the SQL DML statement as a table. The function accepts only two parameters: the name of the linked server and the pass-through query. The next query uses OpenQuery() to retrieve data from the CHA1_Schedule Excel spreadsheet:

    SELECT *
      FROM OPENQUERY(CHA1_Schedule,
        'SELECT * FROM Tour WHERE Tour = "Gauley River Rafting"');

Result:

    Tour                  Base Camp
    --------------------  ---------
    Gauley River Rafting  Ashville
    Gauley River Rafting  Ashville

The OpenQuery() pass-through query requires almost no processing by SQL Server. The Remote Scan returns exactly two rows to SQL Server.
The WHERE clause is executed by the Jet engine as it reads from the Excel spreadsheet.

In the next example, OpenQuery() asks the Jet engine to extract only the two rows requiring the update. The actual UPDATE operation is performed in SQL Server, and the result is written back to the external data set. In effect, the pass-through query is performing only the SELECT portion of the UPDATE command:

    UPDATE OPENQUERY(CHA1_Schedule,
        'SELECT * FROM Tour WHERE Tour = "Gauley River Rafting"')
      SET [Base Camp] = 'Ashville';

OpenRowSet()

The OpenRowSet() function is the pass-through counterpart to the OpenDataSource() function. Both require the remote data source to be fully specified in the distributed query. OpenRowSet() adds a parameter to specify the pass-through query:

    SELECT ContactFirstName, ContactLastName
      FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
        'C:\SQLServerBible\CHA1_Customers.mdb'; 'Admin'; '',
        'SELECT * FROM Customers WHERE CustomerID = 1');

Result:

    ContactFirstName  ContactLastName
    ----------------  ---------------
    Tom               Mercer

Best Practice

Of the four distributed-query methods, the best option is the OpenQuery() function. With OpenQuery(), you have specific control over which data will be processed where. In addition, it has the advantage of predefined links, making the query more robust if the server configuration changes.

To perform an update using the OpenRowSet() function, use the function in place of the table being modified. The following code sample modifies the customer’s last name in an Access database. The WHERE clause of the UPDATE command is handled by the pass-through portion of the OpenRowSet() function:

    UPDATE OPENROWSET('Microsoft.Jet.OLEDB.4.0',
        'C:\SQLServerBible\CHA1_Customers.mdb'; 'Admin'; '',
        'SELECT * FROM Customers WHERE CustomerID = 1')
      SET ContactLastName = 'Wilson';

Distributed Transactions

Transactions are key to data integrity.
If the logical unit of work includes modifying data outside the local SQL Server, then a standard transaction is unable to handle the atomicity of the transaction. If a failure should occur in the middle of the transaction, then a mechanism must be in place to roll back the partial work; otherwise, a partial transaction will be recorded and the database will be left in an inconsistent state.

Chapter 66, “Managing Transactions, Locking, and Blocking,” explores the ACID properties of a database and transactions.

Distributed Transaction Coordinator

SQL Server uses the Distributed Transaction Coordinator (DTC) to handle multiple-server transactions, commits, and rollbacks. The DTC service uses a two-phase commit scheme for multiple-server transactions. The two-phase commit ensures that every server is available and handling the transaction by performing the following steps:

1. Each server is sent a “prepare to commit” message.
2. Each server performs the first phase of the commit, ensuring that it is capable of committing the transaction.
3. Each server replies when it has finished preparing for the commit.
4. Only after every participating server has responded positively to the “prepare to commit” message is the actual commit message sent to each server.

If the logical unit of work only involves reading from the external SQL Server, then the DTC is not required. Only when remote updates are occurring is a transaction considered a distributed transaction.

The Distributed Transaction Coordinator is a separate service from SQL Server. DTC is started or stopped with the SQL Server Service Manager. Only one instance of DTC runs per server regardless of how many SQL Server instances may be installed or running on that server. The actual service name is msdtc.exe, and it consumes only about 2.5 MB of memory.
DTC must be running when a distributed transaction is initiated or the transaction will fail.

Developing distributed transactions

Distributed transactions are similar to local transactions with a few extensions to the syntax:

    SET xact_abort on;
    BEGIN DISTRIBUTED TRANSACTION;

In case of error, the xact_abort connection option will cause the current transaction, rather than only the current T-SQL statement, to be rolled back. The xact_abort ON option is required for any distributed transactions accessing a remote SQL Server and for most other OLE DB connections as well; but if xact_abort ON is not in the code, then SQL Server will automatically convert the transaction to xact_abort ON as soon as a distributed query is executed.

The BEGIN DISTRIBUTED TRANSACTION command, which determines whether the DTC service is available, is not strictly required. If a transaction is initiated with only BEGIN TRAN, then the transaction is escalated to a distributed transaction, and DTC is checked as soon as a distributed query is executed. It’s considered a better practice to use BEGIN DISTRIBUTED TRANSACTION so that DTC is checked at the beginning of the transaction. When DTC is not running, an 8501 error is raised automatically.
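Putting the pieces of this section together, a complete distributed transaction might look like the following sketch. It is illustrative only: it assumes DTC is running, that the [SQL2008RC0\London] instance used earlier in the chapter is reachable as a linked server, and that a local Family database with a matching dbo.Person table exists (the local copy is a hypothetical added for the example).

```sql
-- Roll back the whole transaction, not just the failing statement, on error.
SET XACT_ABORT ON;

-- Checks that DTC is available before any work is done.
BEGIN DISTRIBUTED TRANSACTION;

  -- Local update (hypothetical local copy of the Family database).
  UPDATE Family.dbo.Person
    SET LastName = 'Wilson'
    WHERE PersonID = 1;

  -- Remote update through the four-part name; this is what makes the
  -- transaction distributed and brings DTC's two-phase commit into play.
  UPDATE [SQL2008RC0\London].Family.dbo.Person
    SET LastName = 'Wilson'
    WHERE PersonID = 1;

-- Both updates are committed together, or neither is.
COMMIT TRANSACTION;
```

If either UPDATE fails, XACT_ABORT rolls back both the local and the remote change, so the two servers are not left disagreeing about the person’s last name.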