ETL Techniques • Chapter 8

When the preceding command is run, it produces a data file named Presidents.tsv that looks like this:

    1 George Washington
    2 John Adams
    3 Thomas Jefferson

You could turn around and import the data right back into the same table with the following command (again, the command should be entered on a single line):

    bcp AdventureWorks2008.dbo.Presidents in Presidents.tsv -T -f Presidents.fmt

The catch is that the data file contains PresidentID values that conflict with the PresidentID values of the same records already in the table. What would happen? By default, SQL Server ignores the identity values in the data file and generates new ones. Three new rows would be added with the same names as before, but with new PresidentID values.

If you wanted to override the identity behavior of the PresidentID column, you could use the -E option of BCP to keep the identity values from the data file. In this case the load would fail: the column also has a primary key constraint on it, and inserting rows with the same PresidentID values as the existing rows would violate that constraint.

What if the source file didn't have any values for the PresidentIDs? If you were to edit the Presidents.tsv file that was produced by your earlier export, you could manually remove the ID values to make the file look like this:

    George Washington
    John Adams
    Thomas Jefferson

To make this work, you would then need to edit the format file as well. The format file would need to indicate that there are now only two fields in the data file rather than three, and that the data file fields map to the FirstName and LastName columns in the AdventureWorks2008.dbo.Presidents table.
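Removing the ID values by hand is practical for three rows, but the same edit could be scripted for a larger file. The following is a minimal Python sketch, not part of the BCP tooling, that drops the first tab-delimited field from every line. It assumes a character-mode, tab-delimited file like the one above; the function names are hypothetical.

```python
def strip_first_field(line: str) -> str:
    """Drop everything up to and including the first tab on one line."""
    _, sep, rest = line.partition("\t")
    # A line with no tab has only one field; leave it untouched.
    return rest if sep else line


def strip_id_column(text: str) -> str:
    """Remove the leading field from every line, preserving line endings."""
    return "".join(
        strip_first_field(line) for line in text.splitlines(keepends=True)
    )


if __name__ == "__main__":
    # Sample content mirroring the three-row Presidents.tsv file.
    sample = "1\tGeorge\tWashington\r\n2\tJohn\tAdams\r\n3\tThomas\tJefferson\r\n"
    print(strip_id_column(sample), end="")
```

When reading and writing a real file, opening it with newline="" keeps the "\r\n" row terminators intact so that they still match the format file.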
Your edited Presidents.fmt file would look like this:

    10.0
    2
    1   SQLCHAR   0   50   "\t"     2   FirstName   SQL_Latin1_General_CP1_CI_AS
    2   SQLCHAR   0   50   "\r\n"   3   LastName    SQL_Latin1_General_CP1_CI_AS

Notice in the preceding format file that the first field in the data file is mapped to the second column in the table (FirstName) and that the second field in the data file is mapped to the third column in the table (LastName). You completely ignore the first column in the table (PresidentID). SQL Server will use the IDENTITY property on that field to generate those values.

You could finally run the same bcp command to import the data file as before, and SQL Server would use the identity property to generate the PresidentID values automatically.

    bcp AdventureWorks2008.dbo.Presidents in Presidents.tsv -T -f Presidents.fmt

No discussion on BCP format files would be complete without looking at BCP's new XML format files. These files were introduced in SQL Server 2005, and although they can be much easier to work with than the traditional format files, they still don't appear to be as widely used.
The following command generates a format file just like the first example in this topic, but this time it uses the -x option to produce an XML format file:

    bcp AdventureWorks2008.dbo.Presidents format null -T -c -f Presidents.fmt.xml -x

The format file that is produced by the preceding command looks like the following example:

    <?xml version="1.0"?>
    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <RECORD>
        <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="12"/>
        <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="50"
               COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
        <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="50"
               COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
      </RECORD>
      <ROW>
        <COLUMN SOURCE="1" NAME="PresidentID" xsi:type="SQLINT"/>
        <COLUMN SOURCE="2" NAME="FirstName" xsi:type="SQLVARYCHAR"/>
        <COLUMN SOURCE="3" NAME="LastName" xsi:type="SQLVARYCHAR"/>
      </ROW>
    </BCPFORMAT>

After our description of the traditional format file, the preceding options should make sense. If you are creating format files for new systems and don't need to maintain backward compatibility with older SQL Server versions, the XML format files may actually be easier to work with and work better with future versions.

The BCP command can be a little difficult to work with, but once you get the command line switches and format files right, it can be an extremely efficient way to import or export data from SQL Server. BCP is a command line utility, however, and runs outside of the SQL Server process. Consequently, the data has to be sent, or "marshaled," between the client and the server. If there is a lot of data to load, this might be a good thing. Why? Well, because the bcp command could be run multiple times in parallel from multiple workstations and the load could be distributed across many machines.
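To make the idea of parallel loads concrete, here is a rough Python sketch that builds one bcp import command per data file and launches them concurrently. It is a simplification under stated assumptions: it starts several bcp processes from a single machine rather than from multiple workstations, it assumes the data has already been split into chunk files (the Presidents_part*.tsv names are hypothetical), and it uses only the bcp switches shown in this chapter. The helper names are illustrative, not part of any SQL Server tooling.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def build_bcp_import(table: str, data_file: str, format_file: str) -> List[str]:
    """Build the argument list for one trusted-connection bcp import."""
    return ["bcp", table, "in", data_file, "-T", "-f", format_file]


def run_parallel(commands: List[List[str]], workers: int = 4) -> List[int]:
    """Run each command as its own process; return exit codes in input order."""
    def run(cmd: List[str]) -> int:
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Hypothetical chunk files produced by splitting one large export.
    chunks = ["Presidents_part1.tsv", "Presidents_part2.tsv"]
    commands = [
        build_bcp_import("AdventureWorks2008.dbo.Presidents", chunk, "Presidents.fmt")
        for chunk in chunks
    ]
    for command in commands:
        print(" ".join(command))
    # Calling run_parallel(commands) would start the loads; it requires bcp
    # on the PATH and a reachable SQL Server instance.
```

Each worker thread only waits on its own bcp process, so threads are sufficient here; the actual work happens in the separate bcp processes and on the server.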
However, if the data file is manageable enough for a single process to handle, it might be more efficient to do it from within SQL Server itself. The BULK INSERT statement enables you to do just that. Before we get into BULK INSERT, though, give BCP a try yourself.

EXERCISE 8.1

Using BCP

In this exercise, you will export data from the AdventureWorks2008.Person.Person table to a data file using BCP. This exercise assumes that you have administrative privileges on the SQL Server instance you are working with, that you have the AdventureWorks2008 sample database installed on your SQL Server instance, and that you are running the exercise from the same computer where the SQL Server instance is installed.

1. Create a folder off the root of the C: drive named BCP (C:\BCP).
2. Open a command prompt in Windows and change to the C:\BCP directory.
3. Use the following command to export the AdventureWorks2008.Person.Person table data to a file:

    bcp AdventureWorks2008.Person.Person out C:\BCP\Person.tsv -w -T

4. View the C:\BCP\Person.tsv file in Notepad.

Using the BULK INSERT Statement

The BULK INSERT Transact-SQL statement closely mimics the BCP command line utility. There are some differences, though. The BULK INSERT statement can only import data into SQL Server. It won't export data to a file like BCP. Moreover, because it is executed by the SQL Server database engine, you don't have the ability to run multiple loads in parallel from different machines to optimize performance.

Exam Warning

Remember that the BCP command line utility can import and export data from SQL Server. The BULK INSERT Transact-SQL statement, however, can only import data into SQL Server.

The basic syntax for a BULK INSERT in SQL Server looks like this:

    BULK INSERT {dbtable} FROM {datafile} [WITH (option, ...)]

As with BCP, the dbtable states the target of the load. The datafile parameter specifies the path to the data file.
Remember that you are submitting a statement that the server will execute, so the path used must be resolvable by the server. A number of options can be specified, and most of them closely mirror an equivalent option in BCP. Table 8.4 shows some common BULK INSERT statement options.

Table 8.4 Common BULK INSERT Options

    BULK INSERT WITH Option   BCP Equivalent   Description
    DATAFILETYPE              -c -w -n -N      Specifies the data file type of the source file. This can be 'char', 'widechar', 'native', or 'widenative'.
    FIELDTERMINATOR           -t               Specifies the field terminator character or characters.
    ROWTERMINATOR             -r               Specifies the row terminator character or characters.
    FORMATFILE                -f               Specifies the path to a format file. As with the data file, the path must be resolvable by the server.

There are other options, and we will discuss some of them in the section of this chapter on performance.

So let's look at a quick example. Earlier you exported a Unicode, tab-delimited set of data from the Person.Person table. Assume the path to the data file is C:\bcp\Person.tsv. In the following script you will first truncate (or clear) the AdventureWorks2008.Person.PersonCopy table, and then you will import the data file contents into the table using the BULK INSERT statement:

    -- Clear the table from previous loads
    TRUNCATE TABLE AdventureWorks2008.Person.PersonCopy;

    -- Bulk insert new data into the table
    BULK INSERT AdventureWorks2008.Person.PersonCopy
    FROM 'C:\bcp\Person.tsv'
    WITH (DATAFILETYPE='widechar');

Remember that BCP is a command line tool, so you run BCP commands from a Windows command prompt. BULK INSERT, however, is a Transact-SQL statement, so you need to run the preceding statement in a query window in SQL Server Management Studio (or some other tool like sqlcmd).
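Because BULK INSERT is ordinary Transact-SQL text, a client program could assemble the statement from the syntax and options described above before submitting it. The following Python sketch shows one possible way to do that. The function names are hypothetical, the sketch only builds the statement text (it does not connect to SQL Server), and it assumes the table name comes from a trusted source because only the string values are quoted.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a T-SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_bulk_insert(table: str, data_file: str, **options: str) -> str:
    """Compose BULK INSERT {dbtable} FROM {datafile} [WITH (option, ...)]."""
    # Assumption: `table` is trusted input; it is inserted without escaping.
    statement = f"BULK INSERT {table} FROM {sql_literal(data_file)}"
    if options:
        pairs = ", ".join(
            f"{name} = {sql_literal(value)}" for name, value in options.items()
        )
        statement += f" WITH ({pairs})"
    return statement + ";"


if __name__ == "__main__":
    # Rebuilds the statement from the example script; the path must be
    # resolvable by the server, not by the client running this code.
    print(
        build_bulk_insert(
            "AdventureWorks2008.Person.PersonCopy",
            r"C:\bcp\Person.tsv",
            DATAFILETYPE="widechar",
        )
    )
```

Other options from Table 8.4, such as FIELDTERMINATOR or ROWTERMINATOR, can be passed as additional keyword arguments in the same way; raw strings such as r"\t" keep the backslash sequences intact so that SQL Server, not Python, interprets them.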