The Real MCTS SQL Server 2008 Exam 70-432 Prep Kit - P69

322 Chapter8•ETLTechniques need to describe the structure of the file to BCP. When you use character or Unicode files, however, you must describe the structure of the data in the file to BCP. For example if it is a delimited file, the delimiters need to be specified for BCP to recognize them. HeadoftheClass… DealingwithCharactersandCollations The file storage type that is used can be a common problem when sharing data between different database systems, operating systems, and organi- zations. If you receive a data file that has the data stored as character data, the way those characters are encoded can be an issue. In SQL Server you describe the encoding of the character data as well as how that data can be sorted and compared using collations. The Unicode character set is able to represent thousands of possible character symbols. The Unicode character set is sufficient for representing characters from all the major languages, alphabets and cultures in the world. However, non-Unicode character sets typically can represent only 256 possible symbols. So when you create SQL Server instances, databases, and character columns, you need to specify the character set that has the 256 characters you want. When you are transferring data between two systems, it is possible that the two systems may have elected to use different sets of characters for their non-Unicode data. BCP gives you a number of ways to deal with the differences. You can use the command line arguments to let bcp know that the data file contains either character (-c) or unicode (-w). You can also specify the specific code page (or character set) that the data file was encoded with by including the –C argument. Finally, you can do column-specific collation assignments using bcp format files. You have probably worked with either comma-separated value (csv) or tab- separated value (tsv) files in the past. 
They store data as values with a delimiter (a comma, a tab, or something else) between each of the values. The rows typically end with a line feed ("\n") or a carriage return and a line feed ("\r\n"). The following example exports the same data that you got before from the AdventureWorks2008.Person.Person table, but this time you'll use a nonnative Unicode format (-w) for the data, and you will specify a comma as the delimiter (-t,).

    bcp AdventureWorks2008.Person.Person out person.csv -w -t, -T

If you were to open the person.csv file that is created by the preceding statement, it would look similar to the following (the output has been trimmed for readability). Notice that the field values are separated by commas, as was specified in the command line:

    1,EM,0,,Ken,J,Sánchez,,0,,<IndividualSurvey
    2,EM,0,,Terri,Lee,Duffy,,1,,<IndividualSurvey
    3,EM,0,,Roberto,,Tamburello,,0,,<IndividualSurvey

Now try to import the data back into the same AdventureWorks2008.Person.PersonCopy table that you used before. Because it already has data in it, you will truncate the table first. To do that you can run the following statement in a query window in SQL Server Management Studio:

    TRUNCATE TABLE AdventureWorks2008.Person.PersonCopy;

Next, you'll try to load the data into the newly truncated table. Review the script and the output. Notice that you receive an error:

    bcp AdventureWorks2008.Person.PersonCopy in person.csv -w -t, -T

    Starting copy
    SQLState = 22005, NativeError = 0
    Error = [Microsoft][SQL Server Native Client 10.0]Invalid character value for cast specification

The cause of the error is that the actual data has commas in it (this is common in fields that contain human-entered notes or comments). BCP reads a comma in the data as if it were the field delimiter. The fields in that row then no longer line up with the table's columns, and the load fails. If you had to stay with a nonnative file, you could specify an alternate field terminator.
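The failure described above can be reproduced outside BCP. The following is a minimal Python sketch, not BCP's actual parser: it splits a record on the field terminator the way a simple delimited reader would. The sample record and the helper name split_fields are hypothetical and chosen only for illustration.

```python
# Illustration only: a naive delimited reader, not BCP's real parsing logic.

def split_fields(record, terminator):
    """Split one record into fields on the given field terminator."""
    return record.split(terminator)

# Hypothetical row with three fields; the last one is a free-text note
# that happens to contain a comma.
fields = ["1", "Ken", "Prefers email, not phone"]

comma_record = ",".join(fields)
tab_record = "\t".join(fields)

comma_fields = split_fields(comma_record, ",")
tab_fields = split_fields(tab_record, "\t")

# The comma inside the note is read as a field terminator,
# so the reader sees four fields instead of three.
print(len(comma_fields), comma_fields)

# With a tab terminator the note survives intact.
print(len(tab_fields), tab_fields)
```

The extra field produced in the comma case is the kind of misalignment that leads to a cast error when a value lands in a column of the wrong type.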
When picking either field or row terminators, you want to select a character, or a character sequence, that doesn't occur in the data itself. In this case you could try a tab ("\t") or something like a pipe character (|) that almost never occurs in human-entered data. If you ran the preceding example with no -t option, the default tab delimiter would have been used to delimit the fields, and because there are luckily no tabs in the actual data, it should work. Here is what that command would look like:

    bcp AdventureWorks2008.Person.Person out Person.tsv -w -T

The data file produced by the preceding statement would be tab delimited. You could then successfully import it into your Person.PersonCopy table using a very similar statement:

    bcp AdventureWorks2008.Person.PersonCopy in Person.tsv -w -T

You can see where getting BCP to work with your data could be problematic. It has already become a problem pulling data from one of your own SQL tables. It can get even more troublesome when you have to make data from business partners load successfully into your own tables. As the data formatting specification becomes more complex, you need the power of format files. In the next section we'll talk about format files.

Using Format Files

Format files allow you to more explicitly describe the structure of the data file and how it maps to the corresponding SQL Server table or view. For native data files or simple character or Unicode data file types, you can probably specify all the information that BCP needs to parse the files just using the command line switches. However, if the files use fixed field widths rather than delimiters, or if different fields use different delimiters, the command line options fall short. There are also times when the data file you are using has a different number of columns than the target table you want to load the data into. In those situations format files become a requirement.
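To see why fixed-width files need a per-field description, consider this minimal Python sketch. The field names, widths, and sample record are hypothetical. The point is only that nothing in the file itself marks where one field ends, so the reader must be told each field's length, which is the kind of information a format file carries.

```python
# Illustration only: reading a fixed-width record requires knowing every
# field's width in advance. Names and widths here are hypothetical.

FIELD_WIDTHS = [("ID", 4), ("FirstName", 10), ("LastName", 12)]

def parse_fixed_width(record, layout):
    """Slice a record into named fields using (name, width) pairs."""
    row = {}
    offset = 0
    for name, width in layout:
        # Take exactly `width` characters, then drop the padding spaces.
        row[name] = record[offset:offset + width].rstrip()
        offset += width
    return row

# Build a sample record by padding each value to its field width.
record = "1".ljust(4) + "Ken".ljust(10) + "Sánchez".ljust(12)

row = parse_fixed_width(record, FIELD_WIDTHS)
print(row)
```

Changing any width in FIELD_WIDTHS without changing the file shifts every later field, which is why a single command line switch cannot describe such a layout.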
A common situation where format files are needed is when the target object has an identity column that generates primary key values, but the data file does not include the values for the column. There will be a mismatch between the number of columns in the data file and the target table.

Creating a format file is easiest when you have BCP do the initial work for you. There are a number of ways you can do this task. You could run the bcp command with insufficient input and have it prompt you for the details, or you could specify the details needed on the command line, but ask that it generate a format file for you by using the "format" and "-f" options. Finally, you could have it produce a newer XML format file by including not only the "format" and "-f" options but also the "-x" option.

To demonstrate using format files, you will start with a simple table that has three columns in it. The following script would generate the table and load it with some sample data:

    USE AdventureWorks2008;

    CREATE TABLE dbo.Presidents
    (PresidentID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
     FirstName varchar(50) NOT NULL,
     LastName varchar(50) NOT NULL);

    -- PresidentID is omitted; the IDENTITY property generates it.
    INSERT INTO dbo.Presidents (FirstName, LastName) VALUES ('George','Washington');
    INSERT INTO dbo.Presidents (FirstName, LastName) VALUES ('John','Adams');
    INSERT INTO dbo.Presidents (FirstName, LastName) VALUES ('Thomas','Jefferson');

Next, you will have BCP create a format file for a character type data file and have it name the format file Presidents.fmt. Because you are only generating a format file and not really moving any data, there is no data file. That explains the "nul" where the data file path would normally be:

    bcp AdventureWorks2008.dbo.Presidents format nul -T -c -f Presidents.fmt

The file that is produced by the preceding command looks like this:

    10.0
    3
    1   SQLCHAR   0   12   "\t"     1   PresidentID   ""
    2   SQLCHAR   0   50   "\t"     2   FirstName     SQL_Latin1_General_CP1_CI_AS
    3   SQLCHAR   0   50   "\r\n"   3   LastName      SQL_Latin1_General_CP1_CI_AS

Let's break the preceding format file down.
The first row states the version of BCP that the format file is from (v10.0 is SQL Server 2008's BCP utility). The second row lists how many fields there are in the data file. In this case there are three fields. The next three rows describe each of the data fields, and the corresponding SQL table column they map to.

Table 8.3 explains each of the elements of the format file field definitions, using the second field definition in the format file as the sample:

Table 8.3 Format File Field Definition

Purpose | Sample Value | Description
Host file field order | 2 | Indicates the ordinal position of the field as it is in the data file.
Host field data type | SQLCHAR | The storage type of the data in the data file. In our example everything is just SQLCHAR because the file is a character file.
Host field prefix length | 0 | Can be zero unless the field contains NULLs. Learn more in the SQL Server 2008 documentation.
Host field data length | 50 | The length of the host file data field in bytes. The FirstName column in the original table was 50 characters, or 50 bytes wide.
Host file field terminator | "\t" | The character that will be used in the data file to indicate the end of the field. The "\t" value here means that the tab character is the field terminator.
Server column number | 2 | The position of the destination column in the target database object.
Server column name | FirstName | The name of the destination column in the target database object.
Server column collation | SQL_Latin1_General_CP1_CI_AS | The collation of the destination column in the target database object.

So now that you have a format file, use it during an export from the AdventureWorks2008.dbo.Presidents table (the following command is printed in the book on two lines, but should be entered as a single line):

    bcp AdventureWorks2008.dbo.Presidents out Presidents.tsv -T -f Presidents.fmt
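The layout described above is regular enough to read programmatically. The following Python sketch parses the non-XML format file text generated earlier into one dictionary per field. It is an illustration under assumptions, not a Microsoft tool: the function name parse_format_file and the dictionary keys are invented here, and the sketch assumes whitespace-separated columns with no spaces inside any value.

```python
# Illustration only: a minimal reader for a non-XML BCP format file.
# Assumes whitespace-separated columns and no spaces inside any value.

FORMAT_FILE_TEXT = r'''10.0
3
1 SQLCHAR 0 12 "\t" 1 PresidentID ""
2 SQLCHAR 0 50 "\t" 2 FirstName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 50 "\r\n" 3 LastName SQL_Latin1_General_CP1_CI_AS
'''

def parse_format_file(text):
    """Return (version, fields) parsed from format file text."""
    lines = [line for line in text.splitlines() if line.strip()]
    version = lines[0].strip()       # first row: BCP version
    field_count = int(lines[1])      # second row: number of fields
    fields = []
    for line in lines[2:2 + field_count]:
        (order, data_type, prefix_len, data_len,
         terminator, col_num, col_name, collation) = line.split()
        fields.append({
            "host_field_order": int(order),
            "host_data_type": data_type,
            "prefix_length": int(prefix_len),
            "data_length": int(data_len),
            # Escape text is kept as written in the file, e.g. \t
            "terminator": terminator.strip('"'),
            "server_column_number": int(col_num),
            "server_column_name": col_name,
            # A collation written as "" becomes an empty string
            "collation": collation.strip('"'),
        })
    return version, fields

version, fields = parse_format_file(FORMAT_FILE_TEXT)
print(version, len(fields))
for field in fields:
    print(field["server_column_name"], field["terminator"], field["collation"])
```

Each dictionary corresponds to one row of Table 8.3, so the second entry carries the same sample values the table walks through.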

Date posted: 07/07/2014, 00:20
