
The Real MCTS SQL Server 2008 Exam 70-432 Prep Kit – P71


332 Chapter8•ETLTechniques After running the statement, however, you get a simple report back from SQL Server: (19972 row(s) affected) Of course, you could use the same format files (either traditional, or XML) that we discussed earlier. So as you can see the BULK INSERT statement is very similar in functionality to the BCP command line utility. From the previous two sections you should have a pretty good idea about the mechanics of bulk inserting data. You may be wondering what all the parameters we haven’t discussed are for. Mostly, they have to do with performance. In the next two sections, well discuss a few pointers on maximizing the performance of your bulk loads. We’ll start by looking at how the transaction log is used during bulk operations. But first, get your hands dirty and try a BULK INSERT. EXERCISE 8.2 Us i n g BULK iNSERT In this exercise, you will export and import the data file that you created previously in Exercise 8.1 back into SQL Server. This exercise assumes that you have administrative privileges on the SQL Server instance you are working with, that you have the AdventureWorks2008 sample database installed on your SQL Server instance, and that you are running the exercise from the same computer where the SQL Server instance is installed. 1. Launch SQL Server Management Studio and open a new query window in the AdventureWorks2008 database. 2. Create the target table by running the following T-SQL statement: SELECT TOP 0 * INTO AdventureWorks2008.Person.PersonCopy FROM AdventureWorks2008.Person.Person; 3. Use the following T-SQL statement to load the data from the data file into the new table: BULK INSERT AdventureWorks2008.Person.PersonCopy FROM 'C:\bcp\Person.tsv' WITH (DATAFILETYPE='widechar'); 4. Run the following query to view the imported data: SELECT * FROM AdventureWorks2008.Person.PersonCopy; ETLTechniques•Chapter8 333 Recovery Model and Bulk Operations Every SQL Server database has an option that determines its recovery model. The recovery model of the database determines how the transaction log can be used for backups, and how much detail is recorded in the live log for bulk operations. A database’s recovery model can be set to FULL, BULK_LOGGED, or SIMPLE. The FULL recovery model specifies that all transactions, including bulk opera- tions, will be fully logged in the transaction log. The problem with having the FULL recovery model turned on when you are doing bulk operations is that every record that is inserted gets completely logged in the databases transaction log. If you are loading several records, you might end up with a problem. It can fill the data- bases transaction log up, and the logging activity itself can slow down the bulk operation. The FULL recovery model does make it possible to do point-in-time restores, even partway through a bulk operation, using the transaction log in the event of a failure. The BULK_LOGGED recovery model records all regular transactions fully just liked the FULL recovery model. Bulk operations are minimally logged, however. What does that mean? Rather than recording the details of every row that was written, the transaction log tracks only which data pages and extents were modified by the bulk operation. The upside is that you don’t bloat the log with a large number of inserts, and because less I/O is being performed against the log, performance can increase. The downside is that the transaction log alone no longer has all the information required to recover the database to a consistent state. 
The FULL recovery model specifies that all transactions, including bulk operations, will be fully logged in the transaction log. The problem with having the FULL recovery model turned on when you are doing bulk operations is that every record that is inserted gets completely logged in the database's transaction log. If you are loading a large number of records, you might end up with a problem: the inserts can fill the database's transaction log, and the logging activity itself can slow down the bulk operation. The FULL recovery model does make it possible to do point-in-time restores, even partway through a bulk operation, using the transaction log in the event of a failure.

The BULK_LOGGED recovery model records all regular transactions fully, just like the FULL recovery model. Bulk operations are minimally logged, however. What does that mean? Rather than recording the details of every row that was written, the transaction log tracks only which data pages and extents were modified by the bulk operation. The upside is that you don't bloat the log with a large number of inserts, and because less I/O is being performed against the log, performance can increase. The downside is that the transaction log alone no longer has all the information required to recover the database to a consistent state.

When you back up a transaction log that contains information about bulk operations, the actual data extents that were modified by the bulk operation are included in the log backup. That sounds odd, but it's true. The log backup actually contains extents from the data files, thereby making it possible to restore the transaction log backup and get back all the data that the bulk operation inserted as well. You should also note that while the live log can remain small (because it doesn't have to log every insert performed as part of the bulk load), the log backup will be large, because it contains the actual database extents that were modified.

However, when you are using the BULK_LOGGED recovery model, there is some exposure to loss. If a catastrophic failure were to occur after the bulk operation completed, but before you had a chance to back up the log or the database, you could lose the data that was loaded. This implies that when you are using the BULK_LOGGED recovery model, you must perform at least a transaction log backup of the database immediately after the bulk operation completes. A transaction log backup is enough, but it doesn't hurt to do full or differential database backups as well.

Regardless of whether you are using the FULL or BULK_LOGGED recovery model, SQL Server will keep all entries in the transaction log until they are backed up using a BACKUP LOG statement, thereby ensuring that you can back up a contiguous chain of all transactions that have occurred on your database and that you can then restore the database using the transaction log backups. This is true even with the BULK_LOGGED recovery model, as long as you back up the log immediately after a bulk operation occurs.

The SIMPLE recovery model is not typically recommended for production databases. The big reason is that SQL Server can clear entries from the log even though they may not have been backed up yet. However, as far as how the log works with bulk operations, SIMPLE is the same as BULK_LOGGED. After a bulk operation is performed, however, you have no option of doing a log backup; you must follow up with a full or differential database backup.

So what recovery model should you be using? SIMPLE isn't a viable option for critical production databases because it doesn't allow you to back up the transaction log. FULL is the best option in terms of recoverability because it allows you to back up the log, and the log contains all the details. BULK_LOGGED, however, can offer performance and maintenance benefits when doing bulk operations. The answer, then, is really a mixture of FULL and BULK_LOGGED. It is generally recommended that you leave your production databases with a FULL recovery model. When doing a bulk operation, you would first run a statement to change the recovery model to BULK_LOGGED, do the bulk load, run another statement to change the recovery model back to FULL, and then back up the transaction log.

A couple of other requirements must be met for minimal logging to occur. Minimal logging requires that the target table not be replicated and that a TABLOCK be placed on the table by the bulk operation. It also requires that the target table not have any indices on it, unless the table is empty. If the table already has data in it and it has one or more indices, it may be better to drop the indices before the load, and then rebuild them after, as the sketch below illustrates. Of course, this should be tested in your own environment.
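As an illustration of the drop-and-rebuild approach, the following sketch assumes a hypothetical nonclustered index named IX_PersonCopy_LastName on the Person.PersonCopy table (the table as created in Exercise 8.2 has no indexes); substitute the actual indexes on your own target table, which sp_helpindex will list:

-- Drop the nonclustered index before the load (index name is hypothetical)
DROP INDEX IX_PersonCopy_LastName
ON AdventureWorks2008.Person.PersonCopy;

-- ...perform the bulk load here...

-- Re-create the index after the load completes
CREATE NONCLUSTERED INDEX IX_PersonCopy_LastName
ON AdventureWorks2008.Person.PersonCopy (LastName);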
The following sample code shows an example of a minimally logged BULK INSERT:

ALTER DATABASE AdventureWorks2008 SET RECOVERY BULK_LOGGED;

BULK INSERT AdventureWorks2008.Person.PersonCopy
FROM 'C:\bcp\Person.tsv'
WITH (DATAFILETYPE='widechar', TABLOCK);

ALTER DATABASE AdventureWorks2008 SET RECOVERY FULL;

BACKUP LOG AdventureWorks2008 TO DISK='C:\…\SomeFile.bak';

Note that the preceding code is only a sample. The AdventureWorks2008 database actually uses the SIMPLE recovery model by default. Although the code shown in this example would work, it assumes that a full database backup has already been performed; log backups can't be run unless a full backup has been performed. If you do try the preceding code, you might want to set the recovery model back to SIMPLE when you are done.

Using the right recovery model and BCP options to enable minimal logging can help improve performance by not writing as much detail to the live transaction log for a database. These steps reduce the amount of work the hard drives must do and can accelerate the performance of your bulk loading. They can also make the load more manageable by not bloating the transaction log with a large amount of data. This bloat alone could actually cause a bulk load to fail if the log filled to capacity. Figure 8.1 shows a Performance Monitor chart of the Percent Log Used counter for the AdventureWorks2008 database. The chart shows the log utilization for two bulk loads. The first load was not minimally logged; the second load was. You can see the dramatic difference between the two modes.

Figure 8.1 Minimal Logging Performance Impact

There are other ways to optimize performance, though. In the next section we will cover some ways to optimize the performance of bulk load operations.

Optimizing Bulk Load Performance

The whole point of performing bulk loads is performance. Well, performance and convenience, but performance is probably the critical part. You want to get as much data into the server as fast as you can, and with as little impact on the server as possible. As we discussed in the previous topic, configuring your bulk loads to be minimally logged can significantly improve the performance and decrease the negative impacts of bulk loads. However, you have other options that you can use to help manage bulk loads as well as improve their performance. These options include breaking the data into multiple batches, and presorting the data to match the clustered index on the target table.

Both BCP and BULK INSERT support breaking the load of large files down into smaller batches. The default behavior is that a single batch is used. Each batch equates to a transaction; therefore, the default is that the bulk operation is performed as a single transaction. One big problem with this is that either the entire load succeeds or the entire load fails. It also means that the transaction log information that is maintained for the bulk load can't be cleared from the log until the bulk operation completes. You can optimize the loading of your bulk data by breaking it down into smaller batches. This allows you to fail only the batch, rather than the whole load, if an error occurs. When you restart the process, you can restart with the specific batch (using the FIRSTROW option of BULK INSERT or the -F option of BCP). It also allows the log to be cleared if backup operations run during the bulk load time frame. Finally, it allows you to break a larger data file into pieces and have it be run by multiple clients in parallel. Of course, if you didn't have a performance problem to start with, using batches can actually make things worse, so you really need to test with the options to find the optimal settings for your situation. A sketch of the batching syntax appears below.
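As a sketch of the batching options: BULK INSERT accepts a BATCHSIZE option, and the BCP utility accepts the equivalent -b switch. Assuming the same Person.tsv file from the earlier exercises, the following would commit the load in transactions of 5,000 rows each:

-- T-SQL: load in batches of 5,000 rows per transaction
BULK INSERT AdventureWorks2008.Person.PersonCopy
FROM 'C:\bcp\Person.tsv'
WITH (DATAFILETYPE='widechar', BATCHSIZE=5000);

-- Command-line equivalent (-w for widechar data, -T for a trusted connection
-- to the local default instance)
bcp AdventureWorks2008.Person.PersonCopy in C:\bcp\Person.tsv -w -b 5000 -T

If a batch fails, the batches that committed before it remain in the table; you could then restart the load at the failing row with FIRSTROW (or -F) rather than reloading the whole file.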
You can also help improve the performance of your bulk loads by making sure that the data in the data file is sorted in the same order as the clustered index key on the target table. If you know this is the case, you can tell the bulk operation that the data is presorted by using the ORDER hint of the BCP utility or the BULK INSERT statement. This can improve the performance of bulk loads into tables with clustered indexes. In addition, it may be beneficial to drop nonclustered indices on the table before the load, and re-create them after the load. If the table is empty to start with, this may not help, but if the table has data in it before the load, then it could provide a performance improvement. Of course, you should test this with your own databases.
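As a sketch of the ORDER hint, assume the target table has a clustered index on BusinessEntityID (the primary key of Person.Person in AdventureWorks2008; the PersonCopy table created by SELECT INTO in Exercise 8.2 has no indexes, so you would need to create such a clustered index first for the hint to have any effect) and that the data file is already sorted by that column:

-- Tell BULK INSERT the file is presorted to match the clustered index key
BULK INSERT AdventureWorks2008.Person.PersonCopy
FROM 'C:\bcp\Person.tsv'
WITH (DATAFILETYPE='widechar', TABLOCK,
      ORDER (BusinessEntityID ASC));

The hint lets SQL Server avoid an extra sort step when inserting the rows into the clustered index.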
