SQL Server Tacklebox – P15

we are loading data from the text file back into the SQL_Conn table, which currently holds 58K rows. The -h "TABLOCK" hint forces a lock on the receiving table, which is one of the requirements for guaranteeing that the transactions are minimally logged. The -b option tells BCP to batch the transactions every n rows, in this case every 50,000 rows. If anything goes wrong during the BCP load, the rollback only extends back to the end of the last completed batch. So, say I wanted to load 100,000 records, and I batched the BCP load every 20,000 records. If there were an issue while loading record 81,002, I would know that 80,000 records had been successfully imported. I would lose only 1,002 records, as they would roll back to the last 20,000-record mark, which would be 80,000 records.

The batch file (a sketch appears at the end of this section) takes one parameter, which is the number of times to run the BCP command in order to load the required number of rows into the table. How did I choose 20 iterations? Simple math: 20 * 58,040 = 1,160,800 records. As you can see in Figure 3.2, this is exactly the number of rows now in the SQL_Conn table, after 20 iterations of the BCP command, using the 58,040 records in the f1_out.txt file as the source.

Figure 3.2: Query to count SQL_Conn after loading over 1 million records.

NOTE
For what it is worth, I have also used this batch file to load a terabyte's worth of data, to test how we could effectively manage such a large data store.

If you re-run the BCP command in Listing 3.2, to output the query results to a file, you will find that the process takes more than a minute for a million rows, as opposed to the previous 3 seconds for 58K rows, indicating that the time to output the records remains good (58,040 / 3 ≈ 19,346 records per second; 19,346 * 60 seconds ≈ 1.16 million rows per minute). I am still seeing nearly 20,000 records per second despite the increase in data volume, attesting to the efficiency of the old, tried and true BCP.

Filtering the output using queryout

Rather than working with the entire table, you can use the queryout option of BCP to limit the data you will be exporting, by way of a filtered T-SQL query. Suppose I want to export data only from a particular time period, say for a run_date greater than October 1st, 2008. The query is shown in Listing 3.4.

Select * from dba_rep..SQL_Conn where run_date > '10/01/2008'

Listing 3.4: Query to filter BCP output.

There are many duplicate rows in the SQL_Conn table, and no indexes defined, so I would expect this query to take many seconds, possibly half a minute, to execute. The BCP command is shown in Listing 3.5.

bcp "Select * from dba_rep..SQL_Conn where run_date > '10/01/2008'" queryout "C:\Writing\Simple Talk Book\Ch3\bcp_query_dba_rep.txt" -n -T

Listing 3.5: BCP output statement limiting rows to a specific date range, using the queryout option.

As you can see in Figure 3.3, this supposedly inefficient query ran through more than a million records and dumped 64,488 of them out to a file in 28 seconds, averaging over 2,250 records per second.

Figure 3.3: BCP with queryout option.

Of course, at this point I could fine-tune the query, or make recommendations for re-architecting the source table to add indexes if necessary, before moving this type of process into production. However, I am satisfied with the results and can move safely on to the space age of data migration: SSIS.
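Before moving on, the load batch file itself is not reproduced in this excerpt, so here is a minimal sketch of what such a loop might look like, assuming the native-format export in f1_out.txt and a local default instance. The file path, server name and batch file name are my own placeholders, not the book's originals.

@echo off
rem Hypothetical reconstruction of the BCP load batch file (not the book's original).
rem %1 = number of times to run the BCP import (e.g. 20).
set ITERATIONS=%1

for /l %%i in (1,1,%ITERATIONS%) do (
  echo Running BCP load %%i of %ITERATIONS% ...
  bcp dba_rep..SQL_Conn in "C:\Writing\Simple Talk Book\Ch3\f1_out.txt" -n -T -S localhost -b 50000 -h "TABLOCK"
)

Called with 20 as its single parameter, a file along these lines would append 20 copies of the 58,040-row file to SQL_Conn, batching every 50,000 rows and table-locking each load as described above.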
SSIS

We saw an example of an SSIS package in the previous chapter, when discussing the DBA Repository. The repository is loaded with data from several source servers, via a series of data flow objects in an SSIS package (Populate_DBA_Rep). Let's dig a little deeper into an SSIS data flow task. Again, we'll use the SQL_Conn table, which we loaded with 1 million rows of data in the previous section, as the source, and use SSIS to selectively move data to an archive table; a process that happens frequently in the real world.

Figure 3.4 shows the data flow task, "SQL Connections Archive", which will copy the data from the source SQL_Conn table to the target archive table, SQL_Conn_Archive, in the same DBA_Rep database. There is only a single connection manager object. This is a quite simple example of using SSIS to migrate data, but it is an easy solution to build on.

Figure 3.4: Simple SSIS data flow.

Inside the SQL Connections Archive data flow, there are two data flow objects, an OLE DB Source and an OLE DB Destination, as shown in Figure 3.5.

Figure 3.5: Source and destination OLE DB objects in SSIS.

We'll use the OLE DB source to execute the query in Listing 3.4 against the source SQL_Conn table, to return the same 64,488 records we dumped out to a file previously. Instead of a file, however, the results will be sent to the OLE DB destination object, which writes them to the SQL_Conn_Archive table.

Figure 3.6 shows the Source Editor of the OLE DB source object, including the qualified query to extract the rows from the source table, SQL_Conn. For the Data Access Mode, notice that I am using "SQL command"; the other options are "Table or view", "Table name or view name variable" and "SQL command from variable". I am using SQL command here so as to have control over which fields, and which subset of the data, I wish to move, which is often a criterion for real-world requests. Notice that I am filtering the data with a WHERE clause, selecting only transactions with a run_date greater than '10/01/08'.

Figure 3.6: Source Editor for SQL_Conn query.

Figure 3.7 shows the editor for the OLE DB Destination object, where we define the target table, SQL_Conn_Archive, to which the rows will be copied. There are a few other properties of the destination object that are worth noting. I have chosen to use the Fast Load option for the data access mode, and I have enabled the Table Lock option which, as you might recall from the BCP section, is required to ensure minimally logged transactions.
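For readers who like to see the data flow's effect expressed in code, the following is a rough T-SQL sketch of what the "SQL Connections Archive" package accomplishes. It is an illustration only, not part of the book's package, and it assumes that SQL_Conn_Archive already exists with the same structure as SQL_Conn; the TABLOCK hint plays the same role here as the Table Lock option on the OLE DB destination.

-- Hypothetical T-SQL equivalent of the SSIS archive data flow (illustration only).
-- Assumes SQL_Conn_Archive already exists and matches SQL_Conn's structure.
INSERT INTO dba_rep..SQL_Conn_Archive WITH (TABLOCK)
SELECT *
FROM dba_rep..SQL_Conn
WHERE run_date > '10/01/2008';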
