ETL Techniques • Chapter 8

In the last few sections we have discussed a broad range of bulk load operations. The BCP utility and the BULK INSERT statement both assume that you are moving data between flat files and SQL Server. Sometimes, however, the data you want is already in another database engine somewhere else, and you would like to get it straight from that database rather than going through a flat file as an intermediary. Distributed queries are one way to do that, and we’ll talk about those next.

Distributed Queries

Distributed queries make it possible for SQL Server to work with data in external data sources. The external data sources could be other SQL Server instances, an Oracle instance, an Excel file, and so on. There are a number of methods available for working with data in external data sources. You can specify the data source as part of the query using functions like OPENROWSET or OPENDATASOURCE. Because these methods specify the data connection within the query, and not as a preconfigured server object, we call them “ad hoc distributed queries.” You can also formally define the external data sources before you use them by defining linked servers. These linked servers can be used in four-part, fully qualified object names to directly reference tables and views in the external data sources.

We will talk about ad hoc distributed queries and linked servers, but first let’s quickly cover the use of fully qualified names.

Understanding Fully Qualified Object Names

The fully qualified name of an object in SQL Server is made up of four parts, with each part separated by a period. The format of a fully qualified name is:

    [InstanceName].[DatabaseName].[SchemaName].ObjectName

The [InstanceName] is optional. If you don’t specify it, SQL Server assumes the instance that the client is connected to. In fact, initially you can use only the name of the SQL Server instance to which you are connected.
However, through the use of linked servers or the OPENDATASOURCE function, you can provide information about another instance to use in your fully qualified names. SQL Server can then use that information to connect to that other instance on your behalf.

The [DatabaseName] is also optional. It defaults to the database the connection is currently in. However, you can specify an alternative database name if the object you need to access is in another database on the same instance.

The [SchemaName] is optional as well. It defaults to the default schema of the user account the connection is running as. If the object isn’t found in the user’s default schema, SQL Server makes a second check for an object with the same name in the dbo schema.

Schemas can be used in databases to help organize objects for security purposes. They also organize objects into different namespaces. This allows you to have two objects with the same name in the same database, as long as they belong to different schemas. If your database uses schemas heavily (as AdventureWorks and AdventureWorks2008 do), you should be in the habit of always including the schema name in your object identifiers.

Actually, schema qualifying your object identifiers is a best practice even if you don’t use schemas to organize your objects. Even if you stick every object in the dbo schema, you would still benefit from always including the dbo schema reference in your identifiers. There are a couple of reasons for this. The most important is that you will get better reuse of cached plans. Because your object references are specific, the optimizer knows exactly which objects are being referenced and can create shareable copies of the plan in the procedure cache. If you don’t schema qualify, name resolution depends on the default schema of whoever runs the query, so the cached plans are far less shareable across connections. Furthermore, it gets you in the habit of schema qualifying your objects.
This is a good habit to have given the increasing reliance on schemas in SQL Server environments, as well as on other platforms like Oracle.

Finally, you must specify the object name. This is the only piece that is not optional. If the object name, or any other component (instance, database, or schema), contains nonstandard characters, then you must enclose the element in either square brackets ([ ]) or, if the QUOTED_IDENTIFIER setting is turned on, double quotes (" ").

For example, if your SQL Server instance is named “SQL08,” and on the instance there is a database named “AdventureWorks2008,” and in the database there is a schema named “Person,” and in that schema there is a table named “Person,” the fully qualified name of the Person table is:

    [SQL08].[AdventureWorks2008].[Person].[Person]

The preceding name is written using square brackets to delimit the identifiers. However, none of the identifiers contains any special characters, so the square brackets are not required. You could have just written:

    SQL08.AdventureWorks2008.Person.Person

Remember that you don’t have to specify a component of the fully qualified identifier if the default is appropriate. The following example queries data from a table in the AdventureWorks2008 database even though our connection is in the master database:

    USE master;
    SELECT * FROM AdventureWorks2008.Production.Product;

Finally, here is an example of querying data from a table named Demo in the dbo schema of the AdventureWorks2008 database. Because the dbo schema is always checked if the object isn’t found in the user’s default schema, you can omit the schema and rely on the default.

    USE master;
    SELECT * FROM AdventureWorks2008..Demo;

Notice the double dots (“..”) in the identifier in the preceding example. Those have to be there so that SQL Server knows that you have given it the database name and object name, but not the schema name.
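As a rough illustration of how a multi-part name breaks down, the built-in PARSENAME function can split a name string into its parts. This is only a sketch for exploring names and is not part of the chapter's own examples; PARSENAME works on the string alone and does not check that the instance, database, or table exists. Part numbers count from the right: 1 is the object, 2 the schema, 3 the database, and 4 the instance.

```sql
-- Sketch only: split the chapter's example name into its four parts.
-- PARSENAME inspects the string; it does not verify that the objects exist.
SELECT
    PARSENAME('SQL08.AdventureWorks2008.Person.Person', 4) AS InstanceName,
    PARSENAME('SQL08.AdventureWorks2008.Person.Person', 3) AS DatabaseName,
    PARSENAME('SQL08.AdventureWorks2008.Person.Person', 2) AS SchemaName,
    PARSENAME('SQL08.AdventureWorks2008.Person.Person', 1) AS ObjectName;

-- A three-part name has no instance part, so part 4 comes back as NULL.
SELECT
    PARSENAME('AdventureWorks2008.Production.Product', 4) AS InstanceName,
    PARSENAME('AdventureWorks2008.Production.Product', 3) AS DatabaseName;
```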
SQL Server always parses object identifiers from right to left. The right-most name is the object, the next name to the left is the schema, then the database, and then the instance.

Great, so now you know how to fully qualify an object name. Let’s start looking at how we can reach data in other systems from SQL Server.

Enabling Ad Hoc Distributed Queries

Allowing clients to submit ad hoc queries that access external data has a number of security implications. What is the external data they are accessing? Whom do they access it as? Are there any issues that could destabilize the SQL Server instance?

Beginning with SQL Server 2005, Microsoft adopted a “secure out-of-the-box” installation model. This means that while the SQL Server platform has a massive number of features, a large number of them are turned off by default. Administrators can then enable and safely configure only the features they want to use and leave the rest off. This reduces the number of features that attackers can attempt to exploit. When many features in SQL Server are enabled, it is a big target; when fewer features are enabled, SQL Server is a smaller target. That is where the term “Surface Area” comes from. The smaller you are, the smaller your surface area, and the harder you are to attack.

Ad hoc distributed queries are one of those features that need to be turned on if you want to use them. The following code shows you how to use the sp_configure system stored procedure to enable ad hoc distributed queries, or to disable them if necessary. You must be a member of sysadmin (or hold CONTROL SERVER permission) on the instance to set these options. The option is also an “advanced” option, so you must first enable the viewing of advanced options.
Here is the code to use:

    -- Turn on ad hoc distributed queries
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
    RECONFIGURE;

    -- Turn off ad hoc distributed queries
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'Ad Hoc Distributed Queries', 0;
    RECONFIGURE;

In addition to enabling ad hoc distributed queries in general, each OLE DB provider can be set to either allow or disallow ad hoc access. If you set an OLE DB provider to disallow ad hoc access, only non-sysadmin users are restricted. Connections that run with sysadmin privileges will still be allowed to perform ad hoc distributed queries using the provider. This makes it possible for you to allow ad hoc queries using specific OLE DB providers while preventing ad hoc access with others.

To prevent a specific provider from allowing ad hoc access, you can use the Object Explorer in SSMS. Here are the basic steps:

1. Connect to your SQL Server instance in the SSMS Object Explorer.
2. Expand Server Objects | Linked Servers | Providers.
3. You will see a list of the OLE DB providers on your system.
4. Right-click the provider you wish to configure ad hoc access for and select Properties.
5. In the list of Provider Options, set the Disallow adhoc access option to meet your requirements.

Exam Warning

Notice that Disallow adhoc access is a negative property. Turning the property on disables ad hoc access, while turning it off enables ad hoc access.

For example, the OLE DB Provider for ODBC Drivers, known as the MSDASQL provider, allows access to any ODBC data source. That makes it rather dangerous, because any ODBC data source can be used as long as it exists on the server. If you want to prevent users who are not system administrators from using ad hoc queries to access ODBC data sources, you could disable ad hoc access for that provider. Figure 8.2 shows an example of the provider properties for the MSDASQL provider.
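To make the scenario concrete, an ad hoc query through the MSDASQL provider might look like the following sketch. The ODBC driver name, the server name SQL08, and the inner query are assumptions chosen for illustration; substitute a driver and data source that actually exist on your server.

```sql
-- Sketch only: an ad hoc query through the OLE DB Provider for ODBC Drivers (MSDASQL).
-- Assumptions: the 'Ad Hoc Distributed Queries' option is enabled, the legacy
-- "SQL Server" ODBC driver is installed, and an instance named SQL08 is reachable.
SELECT p.FirstName, p.LastName
FROM OPENROWSET(
        'MSDASQL',                                                  -- provider name
        'Driver={SQL Server};Server=SQL08;Trusted_Connection=yes;', -- ODBC connection string
        'SELECT FirstName, LastName FROM AdventureWorks2008.Person.Person'
     ) AS p;
```

A query of this kind, run by a login that is not a system administrator, is the case that the Disallow adhoc access property governs.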
Once you have set the Disallow adhoc access property for the MSDASQL provider, any users who are not system administrators and who attempt to perform an ad hoc query using that provider will receive the following error:

    Msg 7415, Level 16, State 1, Line 1
    Ad hoc access to OLE DB provider 'MSDASQL' has been denied. You must access this provider through a linked server.

Figure 8.2 The Disallow Ad Hoc Access Property

EXERCISE 8.3
Enabling Ad Hoc Distributed Queries

In this exercise, you will review and set the Ad Hoc Distributed Queries server configuration option. This option must be enabled for ad hoc distributed query functions like OPENROWSET() and OPENDATASOURCE() to work. This exercise assumes that you have administrative privileges on the SQL Server instance you are working with and that you are running the exercise from the same computer where the SQL Server instance is installed.
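Before changing anything, it can help to review the option's current value. One way to do so, offered here as a suggestion and not as one of the exercise's own steps, is to query the sys.configurations catalog view:

```sql
-- Sketch only: check whether ad hoc distributed queries are currently enabled.
-- value is the configured setting; value_in_use is what the running instance uses.
-- A value of 1 means enabled, 0 means disabled.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'Ad Hoc Distributed Queries';
```

If value and value_in_use differ, the setting has been changed with sp_configure but RECONFIGURE has not yet been run.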