CHAPTER 39 Monitoring SQL Server Performance

FIGURE 39.28 Viewing CPU utilization in the Utility Explorer.

    SELECT TOP 10 spid, lastwaittype, dbid, uid, cpu, physical_io,
           memusage, status, loginame, program_name
    FROM sys.sysprocesses
    ORDER BY cpu DESC

This statement lists the top 10 CPU resource hogs active in SQL Server. After you identify which processes are causing a burden on the CPU, you can check whether they can be either turned off or moved to a different server. If they cannot be turned off or moved, you might want to consider upgrading the processor.

The same information is available via the new DMV:

    SELECT TOP 10 session_id, command, database_id, user_id, cpu_time,
           reads, writes, logical_reads
    FROM sys.dm_exec_requests
    ORDER BY cpu_time DESC

Taking a peek at the SQL Server schedulers (using the sys.dm_os_schedulers DMV) also shows whether runnable tasks are backing up while they wait for CPU time. If the runnable_tasks_count values are nonzero, there aren’t enough CPU time slices available to run the current SQL Server workload.
The following example shows how you query the sys.dm_os_schedulers view for this information:

    SELECT scheduler_id, current_tasks_count, runnable_tasks_count
    FROM sys.dm_os_schedulers

And finally, to get an idea of the top CPU hogs among the cached SQL statements and stored procedures in SQL Server, you can query the sys.dm_exec_query_stats DMV and aggregate on total_worker_time to get the total CPU consumption, as follows:

    SELECT TOP 50 sum(total_worker_time) AS Total_CPU,
           sum(execution_count) AS Total_Count,
           count(*) AS Total_Statements,
           plan_handle
    FROM sys.dm_exec_query_stats
    GROUP BY plan_handle
    ORDER BY 1 DESC
    GO

To actually see the SQL code that is behind the plan_handle, you can execute the sys.dm_exec_sql_text function to get your “hog” list:

    SELECT total_worker_time, B.text
    FROM sys.dm_exec_query_stats A
    CROSS APPLY sys.dm_exec_sql_text (A.plan_handle) AS B
    ORDER BY 1 DESC

Monitoring Memory

Memory, like a processor, is divided into segments for each process running on the server. If there is a demand for more memory than is available, the operating system has to use virtual memory to supplement the physical memory. Virtual memory is storage allocated on the hard disk; it is named PAGEFILE.SYS under Windows.

It is important to remember that when the operating system or SQL Server can’t find something in physical memory and has to retrieve it from virtual memory stored on disk, performance degrades. You need to work on minimizing this situation, known as swapping or page faulting.

Monitoring Memory with Performance Monitor

Table 39.4 reflects the main performance objects and counters that are best utilized to monitor memory for SQL Server.

TABLE 39.4 Memory-Related Performance Objects and Counters

Performance Monitor Object
    Description

Process: Working Set|sqlservr
    The set of memory pages touched recently by the threads in the process (SQL Server, in this case).
SQLServer:Buffer Manager: Buffer cache hit ratio
    The percentage of pages that were found in the buffer pool without having to incur a read from disk.

SQLServer:Buffer Manager: Total Pages
    The total number of pages in the buffer pool, including database pages, free pages, and stolen pages.

SQLServer:Memory Manager: Total Server Memory (KB)
    The total amount of dynamic memory the server is currently consuming.

SQLServer:Memory Manager: SQL Cache Memory (KB)
    The total amount of dynamic memory the SQL Server cache is currently consuming.

To observe the level of page faulting, you can look at the Memory: Page Faults/sec and Process: Page Faults/sec (for a SQL Server instance) counters. Next in line are the SQLServer:Buffer Manager: Buffer cache hit ratio and SQLServer:Buffer Manager: Total Pages counters. They directly indicate how well SQL Server is finding data in its controlled memory (cache). You want a ratio of 90% or higher here.

If the Memory: Pages/sec counter is greater than 0 or the Memory: Page Reads/sec counter is greater than 5, the operating system is being forced to use the disk to resolve memory references. These are called hard faults. The Memory: Pages/sec counter is one of the best indicators of the amount of paging that Windows is doing and the adequacy of SQL Server’s current memory configuration.

Because the memory used by SQL Server 2008 dynamically grows and shrinks, you might want to track the exact usage by using either the Process: Working Set|sqlservr or the SQLServer:Memory Manager: Total Server Memory (KB) counter. These counters indicate the current size of the memory used by the SQL Server process. If these numbers are consistently high compared to the amount of physical memory in the machine, you are probably ready to install more memory on the server.
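The same figures can also be read from inside SQL Server. The following is a minimal sketch, assuming SQL Server 2008 or later and VIEW SERVER STATE permission. It computes the buffer cache hit ratio from sys.dm_os_performance_counters (where the raw value has to be divided by its companion "base" counter) and then sets SQL Server's physical memory use next to the machine's total and available memory. The column aliases are illustrative only.

```sql
-- Buffer cache hit ratio: the raw counter must be divided by its "base" counter.
SELECT CAST(100.0 * r.cntr_value / NULLIF(b.cntr_value, 0) AS decimal(5,2))
           AS buffer_cache_hit_ratio_pct
FROM sys.dm_os_performance_counters AS r
JOIN sys.dm_os_performance_counters AS b
  ON b.object_name = r.object_name
WHERE r.object_name LIKE '%Buffer Manager%'
  AND r.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base';

-- SQL Server's physical memory use next to the machine's physical memory.
SELECT pm.physical_memory_in_use_kb,
       sm.total_physical_memory_kb,
       sm.available_physical_memory_kb,
       sm.system_memory_state_desc
FROM sys.dm_os_process_memory AS pm
CROSS JOIN sys.dm_os_sys_memory AS sm;
```

The LIKE pattern on object_name is used because the object prefix differs between default instances (SQLServer:) and named instances (MSSQL$InstanceName:).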
If you see a performance degradation because SQL Server must continually grow and shrink its memory, you should either remove some of the other services or processes running or configure SQL Server to use a fixed memory allocation.

TABLE 39.4 Memory-Related Performance Objects and Counters (continued)

Performance Monitor Object
    Description

SQLServer:Plan Cache: Cache hit ratio
    The ratio between cache hits and lookups for query plans in the plan cache.

SQLServer:Plan Cache: Cache Pages
    The total number of pages in the plan cache.

Memory: Pages/sec
    The number of pages read from or written to disk to resolve hard page faults. This value usually gives a direct indication of memory issues.

Memory: Page Reads/sec
    The number of times the disk was read to resolve hard page faults.

Memory: Page Faults/sec
    The overall rate at which faulted pages are handled by the processor.

Process: Page Faults/sec|sqlservr
    The rate of page faults occurring in the threads associated with a process (SQL Server, in this case).

Monitoring Memory in SSMS

If you are collecting performance data into the MDW using the Data Collector for a SQL Server instance, you can view more detailed memory usage and memory wait information in the Server Activity History report (refer to Figure 39.12). You can click on the Memory Usage graph to drill down and examine the details of the total memory usage in SQL Server and the cache and page ratios. It also displays a list of all processes in the server and the total working set size. You can use this information to see if other running processes may be using memory that should be left available for SQL Server.

DMVs or System Views for Monitoring Memory

The DMVs associated with memory are numerous. The ones you’ll most likely utilize are memory clerks, memory pools, and cache counters.
You can find memory allocations by type by using the sys.dm_os_memory_clerks DMV:

    SELECT type, sum(multi_pages_kb)
    FROM sys.dm_os_memory_clerks
    WHERE multi_pages_kb <> 0
    GROUP BY type
    ORDER BY 2 DESC

To see how the cache is being used, you can query sys.dm_os_memory_cache_counters:

    SELECT substring(name,1,25) AS Name, single_pages_kb, single_pages_in_use_kb
    FROM sys.dm_os_memory_cache_counters
    ORDER BY single_pages_kb DESC

Finally, when you want to see the total pages allocated to the different objects in memory, you use the sys.dm_os_memory_objects DMV:

    SELECT substring(type,1,25) AS Type,
           sum(pages_allocated_count) AS Total_Memory_Allocated
    FROM sys.dm_os_memory_objects
    GROUP BY type
    ORDER BY 2 DESC

Several new DMVs were introduced in SQL Server 2008 to simplify retrieval of information that can be helpful with memory troubleshooting. In some cases, the newly introduced DMVs provide information that was previously available only in DBCC MEMORYSTATUS output. Table 39.5 provides a summary of the new DMVs available for memory troubleshooting.

TABLE 39.5 New Memory-Related DMVs in SQL Server 2008

DMV
    Description

sys.dm_os_memory_brokers
    Provides information about memory allocations using the internal SQL Server memory manager. The information provided can be useful in determining very large memory consumers.

sys.dm_os_memory_nodes and sys.dm_os_memory_node_access_stats
    Provide information about physical nonuniform memory access (NUMA) memory nodes and node access statistics grouped by the type of the page. (sys.dm_os_memory_node_access_stats is populated only if dynamic trace flag 842 is enabled due to its performance impact.)

sys.dm_os_nodes
    Provides information about CPU node configuration for SQL Server. This DMV also reflects software NUMA (soft-NUMA) configuration.
sys.dm_os_process_memory
    Provides overview information about SQL Server memory usage, including the total physical memory in use and the memory utilization percentage.

sys.dm_os_sys_memory
    Provides overview information about the system memory usage, including total physical memory on the system and the available physical memory.

sys.dm_resource_governor_configuration, sys.dm_resource_governor_resource_pools, and sys.dm_resource_governor_workload_groups
    Provide information about the state of the Resource Governor feature of SQL Server 2008. Some of the configuration parameters of Resource Governor affect how SQL Server allocates memory; you should check these parameters during memory troubleshooting.

For more information on using Resource Governor and its impact on memory allocation, see Chapter 40, “Managing Workloads with the Resource Governor.”

Monitoring the Disk System

SQL Server performance depends heavily on the I/O subsystem. SQL Server is constantly reading pages from and writing pages to disk via the data cache. It is also constantly writing to disk via the transaction log. Focusing on the database data files, transaction log files, and especially tempdb can yield significant performance gains for your SQL Server platform. Table 39.6 lists the essential performance objects and counters related to monitoring the disk system.

Slow disk I/O causes a reduction in the transaction throughput. To identify which disks are receiving all the attention, you should monitor both the Physical Disk and Logical Disk performance objects. You have many more opportunities to tune at the disk level than with other components, such as processors. This has long been the area where database administrators and system administrators have been able to get better performance.

You can start by looking at the behavior of the Physical Disk: Current Disk Queue Length and Physical Disk: Avg.
Disk Queue Length counters for all disks or for each particular disk. This way, you can identify where most of the activity is, from a disk-usage point of view. As you monitor each individual disk, you might see that some drives are not as busy as others. You can relocate heavily used resources to spread out the disk activity and shorten the long queue lengths you have uncovered. Common techniques for this are to relocate indexes away from tables, isolate read-only tables away from volatile tables, and so on. You need to take special care with tempdb. The best practice is to isolate it away from all other disk I/O processing.

TABLE 39.6 Disk Usage-Related Performance Objects and Counters

Performance Monitor Object
    Description

Physical Disk: Current Disk Queue Length
    The number of outstanding requests (read/write) for a disk.

Physical Disk: Avg. Disk Queue Length
    The average number of both read and write requests queued for disks.

Physical Disk: Disk Read Bytes/sec
    The rate at which bytes are transferred from the disk during read operations.

Physical Disk: Disk Write Bytes/sec
    The rate at which bytes are transferred to the disk during write operations.

Physical Disk: % Disk Time
    The percentage of elapsed time that the selected disk drive is busy servicing read or write requests.

Logical Disk: Current Disk Queue Length
    The number of outstanding requests (read/write) for a disk.

Logical Disk: Avg. Disk Queue Length
    The average number of both read and write requests queued for disks.

Logical Disk: Disk Read Bytes/sec
    The rate at which bytes are transferred from the disk during read operations.

Logical Disk: Disk Write Bytes/sec
    The rate at which bytes are transferred to the disk during write operations.

Logical Disk: % Disk Time
    The percentage of elapsed time that the selected disk drive is busy servicing read or write requests.
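To see which drives carry most of SQL Server's own I/O before relocating anything, the file-level statistics can be rolled up by drive. This is a sketch only. It assumes the database files live on drive-letter paths, so that the first two characters of physical_name identify the drive, and it uses the sys.dm_io_virtual_file_stats function described later in this chapter, passing NULL for both parameters to cover every file. The numbers are cumulative since SQL Server was last started, and the column aliases are illustrative.

```sql
-- Roll up cumulative file I/O by drive letter (assumes paths such as D:\Data\...).
SELECT LEFT(mf.physical_name, 2)      AS drive,
       COUNT(*)                       AS file_count,
       SUM(vfs.num_of_reads)          AS total_reads,
       SUM(vfs.num_of_writes)         AS total_writes,
       SUM(vfs.io_stall_read_ms)      AS read_stall_ms,
       SUM(vfs.io_stall_write_ms)     AS write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
GROUP BY LEFT(mf.physical_name, 2)
ORDER BY SUM(vfs.io_stall_read_ms) + SUM(vfs.io_stall_write_ms) DESC;
```

Drives at the top of the list have accumulated the most I/O stall time and are the first candidates for redistributing files.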
The Physical Disk: % Disk Time counter for each physical disk drive shows the percentage of time that the disk is active; a continuously high value could indicate an underperforming disk subsystem.

Of course, the monitoring up to this point shows only half the picture if drives are partitioned into multiple logical drives. To see the work on each logical drive, you need to examine the logical disk counters; in fact, you can monitor read and write activity separately with Logical Disk: Disk Write Bytes/sec and Logical Disk: Disk Read Bytes/sec. For response times (the Avg. Disk sec/Read and Avg. Disk sec/Write counters), you should be looking for averages below 20ms. If the averages are over 50ms, the disk subsystem is in serious need of replacement, reconfiguration, or redistribution.

If you use RAID, you need to know how many physical drives are in each RAID array to figure out the monitored values of disk queuing for any one disk. In general, you just divide the disk queue value by the number of physical drives in the disk array. For example, an average queue length of 8 on a four-drive array works out to about 2 per drive. This calculation gives you a fairly accurate number for each physical disk’s queue length.

Monitoring SQL Server’s Disk Activity

In the preceding section, we looked at monitoring overall disk activity. In this section, we examine what SQL Server’s contribution is to all this disk activity. Disk activity can be categorized into reads and writes. SQL Server carries out writes to the disk for the following processes:

- Logging records
- Writing dirty cache pages at the end of a transaction
- Freeing space in the page cache

Logging is a constant occurrence in any database that allows modifications, and SQL Server attempts to optimize this process by batching a number of writes together. To see how much work is done on behalf of the database logs, you can examine the SQLServer:Databases:Log Bytes Flushed/sec and SQLServer:Databases:Log Flushes/sec counters. The first tells you the quantity of the work, and the second tells you the frequency.
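The two log counters are also exposed through sys.dm_os_performance_counters. In that view, the "/sec" counters hold cumulative totals rather than rates, so a rate has to be derived from two samples. The sketch below assumes a 10-second sampling window and a temporary table named #log_sample, both chosen only for illustration.

```sql
-- First sample of the cumulative log counters, one row per database.
SELECT instance_name, counter_name, cntr_value
INTO #log_sample
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Databases%'
  AND counter_name IN ('Log Bytes Flushed/sec', 'Log Flushes/sec');

-- Sampling window (assumed: 10 seconds).
WAITFOR DELAY '00:00:10';

-- Second sample minus first, divided by the window length, gives a rate.
SELECT RTRIM(s2.instance_name)                AS database_name,
       RTRIM(s2.counter_name)                 AS counter_name,
       (s2.cntr_value - s1.cntr_value) / 10.0 AS avg_per_second
FROM sys.dm_os_performance_counters AS s2
JOIN #log_sample AS s1
  ON s1.instance_name = s2.instance_name
 AND s1.counter_name = s2.counter_name
WHERE s2.object_name LIKE '%:Databases%'
ORDER BY avg_per_second DESC;

DROP TABLE #log_sample;
```

If the window length is changed, the divisor (10.0) has to be changed to match.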
The third kind of write occurs to make space within the page cache. This is carried out by the Lazy Writer process, which you can track with the counter SQLServer:Buffer Manager:Lazy writes/sec. You also can easily monitor the amount of reading SQL Server is doing by using the counter SQLServer:Buffer Manager:Page reads/sec.

Monitoring Disk Performance in SSMS

If you are collecting performance data into the MDW using the Data Collector for a SQL Server instance, you can view more detailed disk performance in the Server Activity History report (refer to Figure 39.12). You can click on the Disk Usage graph to drill down and examine the details on Disk Response Time, Average Disk Queue Length, and Disk Transfer Rate. In addition, the System Disk Usage detail report lists the top 20 processes with the highest I/O writes or highest I/O reads per second. This information can help you determine which processes besides SQL Server are performing a large number of disk reads and/or writes that could be affecting the I/O performance of SQL Server.

DMVs or System Views for Monitoring Disk System Items

There are several I/O-related DMVs and functions. They cover backup tape I/O, pending I/O requests, I/O on cluster shared drives, and virtual file I/O statistics.

The best of these is the sys.dm_io_virtual_file_stats function, which allows you to see the I/O activity for each file in a database. You supply the database ID as the first parameter, along with the file ID of the database file as the second parameter.
This yields an accumulating set of statistics that can be used to isolate and characterize heavy I/O:

    SELECT cast(db_name(database_id) as varchar(12)) AS dbname,
           file_id,
           num_of_reads AS numreads,
           num_of_bytes_read AS bytesread,
           num_of_bytes_written AS byteswritten,
           size_on_disk_bytes AS size
    FROM sys.dm_io_virtual_file_stats (5,1)
    GO

    dbname        file_id  numreads  bytesread  byteswritten  size
    ------------  -------  --------  ---------  ------------  ----------
    UnleashedMDW        1      7022  411140096     759291904  1048576000

In addition, the OS wait stats for I/O latch waits are great for identifying when a page that needs to be read or written is not available in the data cache. These latch waits account for the physical I/O waits when a page is accessed for reading or writing. When the page is not found in the cache, an asynchronous I/O gets posted. If there is any delay in the I/O, the PAGEIOLATCH_EX or PAGEIOLATCH_SH latch waits are affected. An increased number of latch waits indicates that an I/O bottleneck exists. The following query reveals this latch wait information:

    SELECT substring(wait_type,1,15) AS Latch_Waits,
           waiting_tasks_count,
           wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type LIKE 'PAGEIOLATCH%'
    ORDER BY wait_type
    GO

    Latch_Waits      waiting_tasks_count  wait_time_ms
    ---------------  -------------------  ------------
    PAGEIOLATCH_DT                     0             0
    PAGEIOLATCH_EX                  2871         61356
    PAGEIOLATCH_KP                     0             0
    PAGEIOLATCH_NL                     0             0
    PAGEIOLATCH_SH                  7305         98120
    PAGEIOLATCH_UP                  1372          7318

Monitoring Other SQL Server Performance Items

In addition to the main system items already covered, other items in SQL Server affect system performance, and you should include them in your performance monitoring approach.

Monitoring Locks

One of the often-overlooked areas of performance degradation is locking. You need to ensure that the correct types of locks are issued and that the worst kind of lock, a blocking lock, is kept to a minimum. A blocking lock, as its name implies, prevents other users from continuing their own work.
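Blocked sessions can also be listed directly with T-SQL. The following is a minimal sketch, not a complete blocking monitor. It returns each request that is currently waiting on another session, together with the blocker's session ID and the statement text of the blocked request. The column aliases are illustrative.

```sql
-- Requests that are currently blocked by another session.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time     AS wait_time_ms,
       r.wait_resource,
       t.text          AS blocked_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;
```

An empty result set means that no request was blocked at the moment the query ran, so it is worth sampling several times during a busy period.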
An easy way to gauge the level of blocking is to watch the SQLServer:General Statistics:Processes blocked counter. (The SQLServer:Memory Manager:Lock Blocks counter is sometimes used for this purpose, but it reports the number of lock structures in use, not the number of blocked sessions.) If the Processes blocked counter frequently indicates a value greater than 0, you need to examine the queries being executed or even revisit the database design.

For a more detailed discussion on monitoring locking in SQL Server and minimizing locking contention, see Chapter 37, “Locking and Performance.”

Monitoring Users

Even though you cannot always trace performance problems directly to the number of users connected, it is a good idea to occasionally monitor how this number fluctuates. It is fairly easy to trace one particular user who is causing a massive performance problem. The leverage point here is to see the current number of user connections with the SQLServer:General Statistics:User Connections counter in conjunction with other objects and counters. It is easy to say that the disk subsystem is a bottleneck, but how many users is SQL Server supporting at the time?

Summary

Attacking SQL Server performance is not a simple task because so many variables are involved. Tuning queries and proper database design are a huge part of this, but dealing with SQL Server as an engine that consumes resources and the physical machine is equally important. This is why it is so critical to take an orderly, methodical approach when undertaking this task. As pointed out in this chapter, you need to peel apart the box on which SQL Server has been installed, one component at a time (network, CPU, memory, and disk). This way, you can explore each individual layer or component in a clear and concise manner. Within a short amount of time, you will be able to identify the biggest performance offenders and resolve them.

The next chapter, “Managing Workloads with the Resource Governor,” discusses the new feature in SQL Server 2008 that lets you control the allocation of resources to SQL Server sessions.
This feature can help avoid situations in which a runaway query consumes excessive resources in SQL Server.

CHAPTER 40 Managing Workloads with the Resource Governor

IN THIS CHAPTER

- Overview of Resource Governor
- Resource Governor Components
- Configuring Resource Governor
- Monitoring Resource Usage
- Modifying Your Resource Governor Configuration

If you have ever had a user kick off a runaway report that brought the system to its knees, effectively halting your production online transaction processing (OLTP) activity, you might have wished for a mechanism in SQL Server that would limit the amount of hardware resources allocated to ad hoc reporting requests so that normal production activity was not affected. Such a mechanism could prevent certain processes from consuming too many of the available SQL Server resources, ensuring that your more critical, higher-priority processes would consistently have access to the resources they need.

Fortunately, SQL Server 2008 now provides such a mechanism: Resource Governor. Resource Governor allows you to classify different types of sessions on your server, which in turn allows you to control how server resources are assigned to a given activity. In SQL Server 2005 and earlier, queries fought among themselves to decide which one would grab the necessary resources first, and it was hard to predict who would win out. By using Resource Governor, you are able to instruct SQL Server to limit the resources a particular session can access. This capability can help ensure that your OLTP processes continue to provide predictable performance that isn’t adversely affected by unpredictable activity. For example, with Resource Governor, you can specify that no more than 20% of CPU and/or memory resources should be allocated to running reports. When this feature is enabled, no matter how many reports are run, they can never exceed their designated resource allocation.
Of course, this reduces the performance of the reports, but at least your production OLTP performance isn’t as negatively affected by runaway reports anymore.
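The 20% reporting example might look roughly like the following. This is a hedged sketch rather than a recommended configuration: the pool name ReportPool, the group name ReportGroup, the classifier function dbo.fn_report_classifier, and the login report_user are all hypothetical names introduced here for illustration. Note also that MAX_CPU_PERCENT is applied when there is contention for CPU, so a reporting session may use more CPU while the server is otherwise idle.

```sql
USE master;
GO
-- Hypothetical pool that caps reporting work at 20% of CPU and memory.
CREATE RESOURCE POOL ReportPool
WITH (MAX_CPU_PERCENT = 20, MAX_MEMORY_PERCENT = 20);
GO
-- Hypothetical workload group that draws its resources from that pool.
CREATE WORKLOAD GROUP ReportGroup
USING ReportPool;
GO
-- Classifier: route the (hypothetical) report_user login to ReportGroup;
-- every other session stays in the built-in default group.
CREATE FUNCTION dbo.fn_report_classifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @group sysname;
    SET @group = N'default';
    IF SUSER_NAME() = N'report_user'
        SET @group = N'ReportGroup';
    RETURN @group;
END;
GO
-- Register the classifier and apply the configuration.
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_report_classifier);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
```

The classifier runs for each new session, so sessions that are already connected keep their existing group until they reconnect.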