Figure 4.5: Physical Disk performance object in Perfmon.

With all monitoring systems a go, I am ready to load up a heap table called Book_List that I created in the All_Books_Ever_Read database. The BooksList.txt file has approximately 58,000 records, so I'm going to use the BCP batch file technique (see Listing 3.3, in Chapter 3) to iterate through the file 50 times and load 2.9 million records into the database.

Now it is time to begin the load. A quick peek at Perfmon, shown in Figure 4.6, confirms the current absence of activity prior to executing a hefty query.

Figure 4.6: Perfmon low disk activity.

Executing load … now! Please don't turn (or create) the next page …!! Sorry, I could not resist the Sesame Street reference to The Monster at the End of This Book. In fact, the load proceeds with little fanfare. Imagine this is being done in the middle of the afternoon, perhaps after a big lunch or, worse, early in the AM (DBA:M most likely), before your second sip of coffee, with you blissfully unaware of what's unfolding on one of your servers. Figure 4.7 shows the BCP bulk insert process running.

Figure 4.7: BCPing data into the All_Books_Ever_Read database.

You can see that the batch process ran 50 times at an average of 2.5 seconds a run, with a total load time of roughly 2 minutes. Not bad for 2.9 million records. Now for the bad news: Figure 4.8 shows how much growth can be directly attributed to the load process.

Figure 4.8: Log file growth when loading millions of records into a table.

NOTE: For comparison, in a test I ran without ever having backed up the database, the data file grew to over 3 GB, but the log file grew only to 150 MB.

Both the data file and the log file have grown to over 3 GB. The Profiler trace, as shown in Figure 4.9, reveals that a combined total of 3,291 Auto Grow events took place during this data load. Notice also that the duration of these events, when combined, is not negligible.

Figure 4.9: Data and log file growth captured with Profiler.

Finally, Figure 4.10 shows the Perfmon output during the load. As you can see, % Disk Time obviously took a hit, at 44.192%. This is not horrible in and of itself; I/O processes require disk reads and writes and, because Avg. Disk Queue Length is healthily under 3, the disk is able to keep up with the demands. However, if the disk being monitored has a % Disk Time of 80% or more, coupled with a high (>20) Avg. Disk Queue Length, then there will be performance degradation because the disk cannot meet the demand. Inefficient queries or file growth may be the culprits.

Figure 4.10: Perfmon disk monitor.

Average and Current Disk Queue Lengths are indicators of whether or not bottlenecks might exist in the disk subsystem. In this case, an Average Disk Queue Length of 1.768 is not intolerably high and indicates that, on average, fewer than 2 requests were queued, waiting for read or write I/O to complete on the disk. What this also tells me is that loading 2.9 million records into a heap table, batching or committing every 50,000 records, and using the defaults of the Model database, is going to cause significant I/O lag, resulting not just from loading the data, but also from the need to grow the data and log files a few thousand times.
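Incidentally, you don't need to have a Profiler session running to count these events after the fact. The following is only a sketch, assuming the default trace is enabled (it is, out of the box, on SQL Server 2005 and later) and that its files haven't yet rolled over; the DatabaseName filter matches my All_Books_Ever_Read example, so adjust it for your own system.

-- Sketch: summarize Auto Grow events recorded in the default trace.
-- Duration is reported in microseconds, so divide to get milliseconds.
DECLARE @TraceFile NVARCHAR(260);

SELECT @TraceFile = path
FROM   sys.traces
WHERE  is_default = 1;

SELECT te.name                 AS EventName,
       COUNT(*)                AS EventCount,
       SUM(tg.Duration) / 1000 AS TotalDurationMs
FROM   sys.fn_trace_gettable(@TraceFile, DEFAULT) AS tg
       JOIN sys.trace_events AS te
            ON tg.EventClass = te.trace_event_id
WHERE  te.name IN ('Data File Auto Grow', 'Log File Auto Grow')
       AND tg.DatabaseName = 'All_Books_Ever_Read'
GROUP BY te.name;

Run immediately after the load, this should paint much the same picture as the Profiler trace in Figure 4.9: a few thousand growth events whose combined duration is anything but free.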
Furthermore, with so much activity, the database is susceptible to unabated log file growth unless you perform regular log backups to remove inactive log entries from the log file. Many standard maintenance procedures implement full backups for newly created databases, but not all databases receive transaction log backups. This could come back to bite you, like the monster at the end of this chapter, if you forget to change the recovery model from Full to Simple, or if you restore a database from another system and unwittingly leave it in Full recovery mode.

Appropriately sizing your data and log files

Having seen the dramatic impact of such bulk load operations on file size, what I really want to know now is how much I could reduce the I/O load, and therefore increase the speed of the load process, if the engine hadn't had to grow the files 3,291 times, in 1 MB increments for the data file and 10% increments for the log file. In order to find out, I need to repeat the load process, but with the data and log files already appropriately sized to handle it. I can achieve this by simply truncating the table and backing up the transaction log. This will not shrink the physical data or log files, but it will free up all of the space inside them.

Before I do that, take a look at the sort of space allocation information that is provided by the sp_spaceused built-in stored procedure, shown in Figure 4.11.

Figure 4.11: Output of sp_spaceused for the loaded Book_List table.

As you can see, the Book_List table is using all 3.3 GB of the space allocated to the database for the 2.9 million records. Now, simply issue the TRUNCATE command.

TRUNCATE TABLE Book_List

And then rerun sp_spaceused. The results are shown in Figure 4.12.

Figure 4.12: sp_spaceused after truncation.

You can verify that the data file, although now "empty", is still 3.3 GB in size, using the Shrink File task in the SSMS GUI. Right-click on the database and select "Tasks | Shrink | Files". You can see in Figure 4.13 that the All_Books_Ever_Read.mdf file is still 3.3 GB in size, but has 99% available free space. What this means to me as a DBA, knowing I am going to load the same 2.9 million records, is that I do not expect the data file to grow again.

Figure 4.14 shows the command window after re-running the BCP bulk insert process, superimposed on the resulting Profiler trace.
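Of course, in this test the files are the right size only because the first, painful load has already grown them. On a new system, I could get the same head start by pre-sizing the files and choosing more sensible growth increments than the Model database defaults, along the lines of the sketch below. The logical file names and the sizes here are assumptions for illustration, based on SQL Server's default naming convention, so check your own names with sp_helpfile and pick sizes appropriate to your expected data volume before trying anything similar.

-- Sketch only: pre-size the data and log files so a 2.9 million-row load
-- does not trigger thousands of Auto Grow events. The logical file names
-- below are assumed; verify them first (USE All_Books_Ever_Read; EXEC sp_helpfile;).
USE master;

ALTER DATABASE All_Books_Ever_Read
    MODIFY FILE (NAME = All_Books_Ever_Read, SIZE = 3500MB, FILEGROWTH = 500MB);

ALTER DATABASE All_Books_Ever_Read
    MODIFY FILE (NAME = All_Books_Ever_Read_log, SIZE = 3500MB, FILEGROWTH = 500MB);

Whether you prefer fixed increments or percentages, the aim is the same: file growth should be a rare, planned event, not something that happens a few thousand times in the middle of a load.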