CHAPTER 41 A Performance and Tuning Methodology

Performance and Tuning Design Guidelines

We outline some of the major performance and tuning design guidelines here. There are, of course, many more, but if you at least consider and apply the ones outlined here, you should end up with a decently performing SQL Server implementation. As we have described previously, performance and tuning should first be "designed in" to your SQL Server implementation. Many of the guidelines discussed here can be adopted easily in this way. However, when you put off performance and tuning until later, you have fewer options to apply and see less performance improvement when you do make changes. Remember, addressing performance and tuning is like peeling an onion. And, for this reason, we present our guidelines in that way—layer by layer. This approach provides a great reference point for each layer and a list you can check off as you develop your SQL Server–based implementation. Just ask yourself whether you have considered the specific layer guidelines when you are dealing with that layer. Also, several chapters take you through the full breadth and depth of the options and techniques introduced in many of these guidelines. We point you to those chapters as we outline the guidelines.

Hardware and Operating System Guidelines

Let's start with the salient hardware and operating system guidelines that you should be considering:

Hardware/Physical Server:

- Server sizing/CPUs—Physical (or virtual) servers that will host a SQL Server instance should be roughly sized to handle the maximum processing load plus 35% more CPUs (and you should always round up). As an example, for a workload that you anticipate may be fully handled by a four-CPU server configuration, we recommend automatically increasing the number of CPUs to six. We also always leave at least one CPU for the operating system. So, if six CPUs are on the server, you should allocate only five to SQL Server.
You can find details on configuring CPUs in Chapter 55, "Configuring, Tuning, and Optimizing SQL Server Options," and details on monitoring CPU utilization in Chapter 39, "Monitoring SQL Server Performance."

- Memory—The amount of memory you might need is often directly related to the amount of data you need to have in cache to achieve 100% or near-100% cache hit ratios, which, of course, yields higher overall performance. We don't believe there is such a thing as too much memory for SQL Server, but we do recognize that some memory must be left to the operating system to handle OS-level processing, connections, and so on. So, in general, you should make 90% of memory available to SQL Server and 10% to the OS. You can find details on configuring memory in Chapter 49 and details on monitoring memory utilization in Chapter 39.

- Disk/SAN/NAS/RAID—Your disk subsystem can be a major contributor to performance degradation if not handled properly. We recognize that there are many different options available here. We generally try to have some separate devices on different I/O channels so that disk I/O isolation techniques can be used. This means that you isolate heavy I/O away from other heavy I/O activity; otherwise, disk head contention causes massive slowdowns in physical I/O. When you use SAN/NAS storage, much of the storage is just logical drives that are heavily cached. This type of situation limits the opportunity to spread out heavy I/O, but the caching layers often alleviate that problem. In general, RAID 10 is great for high update activity, and RAID 5 is great for mostly read-only activity. You can find more information on RAID and storage options in Chapter 38, "Database Design and Performance."

Operating System:

- Page file location—When physical memory is exceeded, paging occurs to the page file.
You need to make sure that the page file is not located on one of your database disk locations; otherwise, performance of the whole server degrades rapidly.

- Process priority—You should never lower the priority of the SQL Server processes or push them to the background. You should always have them set as high as possible.

- Memory—As mentioned earlier, you should make sure that at least 10% of memory is available to the OS for all its housekeeping, connection handling, process threads, and so on.

- OS version—You should make sure you are using as recent a version of the operating system as you can and have updated it with the latest patches or service packs. Also, you often must remove other software from your server, such as specialized virus protection. We have lost track of the number of SQL Server implementations we have found with some third-party virus software installed (and enabled) on them, where all files and communication to the server were interrupted by the virus scans. Rely on Microsoft Windows and your firewalls for this protection rather than a third-party virus solution that gets in the way of SQL Server. If your organization requires some type of virus protection on the server, at least disable scanning of the database device files.

Network:

- Packet sizes/traffic—With greater bandwidth and faster network adapters (typically at least 1Gbps now), we recommend you utilize larger packet sizes to accommodate your heavier-traffic SQL Server instances. Packets of 8KB and larger are easily handled now. Information on configuring the SQL Server packet size is available in Chapter 49.

- Routers/switches/balancers—Depending on whether you are using SQL clustering or have multitiered application servers, you likely should utilize some type of load balancing at the network level to spread out connections and avoid bottlenecks.
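The packet size guideline can be applied with sp_configure. The following is a minimal sketch; the 8192-byte value is only an illustration, and you should test any change against your own network and workload before applying it:

```sql
-- Network packet size is an advanced option, so expose advanced options first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Raise the instance-wide default packet size from 4096 to 8192 bytes.
-- (Illustrative value; clients can still override it per connection.)
EXEC sp_configure 'network packet size (B)', 8192;
RECONFIGURE;
```

Note that this is an instance-wide default; individual clients can request a different packet size in their connection strings.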
SQL Server Instance Guidelines

Next comes the SQL Server instance itself and the critical items that must be considered:

- SQL Server configuration—We do not list many of the SQL Server instance options here, but many of the default options are more than sufficient to deal with most SQL Server implementations. See Chapter 49 for information on all the available options.

- SQL Server device allocations—Devices should be treated with care and not overallocated. SQL Server databases utilize files and devices as their underlying allocation from the operating system. You do not want dozens and dozens of smaller files or devices for each database; having all these files or devices makes them harder to administer, move, and manipulate. We often come into a SQL Server implementation and simplify the device allocations before we do any other work on the database. At a minimum, you should create separate data devices and log devices so that you can easily isolate (separate) them.

- tempdb database—Perhaps the most misunderstood SQL Server shared resource is tempdb. The general guideline for tempdb is to minimize explicit usage (overuse) of it by limiting temp table creation, sorts, queries using the DISTINCT clause, and so on; otherwise, you are creating a hot spot in your SQL Server instance that is mostly not in your control. You might find it hard to believe, but indexing, table design, and even not executing certain SQL statements can have a huge impact on what gets done in tempdb, and thus a huge effect on performance. And, of course, you need to isolate tempdb away from all other databases. For additional information on placing and monitoring tempdb, see Chapters 38 and 39.

- master database—There is one simple guideline here: protect the master database at all costs. This means frequent backups and isolation of master away from all other databases.
- model database—It seems harmless enough, but all databases in SQL Server utilize the model database as their base allocation template. We recommend you tailor it for your particular environment.

- Memory—The best way to utilize and allocate memory to SQL Server depends on a number of factors: how many other SQL Server instances are running on the same physical server; what type of SQL Server–based application it is (heavy update versus heavy read); and how much of your application has been written with stored procedures, triggers, and so on. In general, you want to give as much of the OS memory to SQL Server as you can, but this amount should never exceed 90% of the available memory at the OS level. You don't want SQL Server or the OS to start thrashing via the page file or competing against each other for memory. Also, when more than one SQL Server instance is on the same physical server, you need to divide the memory correctly for each; don't pit them against each other. More information on configuring and monitoring SQL Server memory is available in Chapters 39 and 49.

Database-Level Guidelines

- Database allocations—We like to use an approach of putting database files for heavily used databases on the same drives as lightly used databases when more than one database is being managed by a single SQL Server instance. In other words, pair big with small, not big with big. This approach is termed reciprocal database pairing. You should also not have too many databases on a single SQL Server instance; if the server fails, so do all the applications using the databases managed by that one instance. It's all about risk mitigation. Remember the adage "never put all your eggs in one basket." Databases have two primary file allocations: one for their data portion and the other for their transaction log portion.
You should always isolate these file allocations from each other onto separate disk subsystems with separate I/O channels if possible. The transaction log is a hot spot for highly volatile applications (those with frequent update activity). Isolate, isolate, and isolate some more. There is also a notion of something called reciprocal database device location; more information is available on this in Chapters 38 and 39.

You need to size your database files large enough to avoid database file fragmentation. Heavily fragmented database files can lead to excessive file I/O within the operating system and poor I/O performance. For example, if you know your database is going to grow to 500GB, size your database files at 500GB from the start so that the operating system can allocate a contiguous 500GB file. In addition, be sure to disable the Auto-Shrink database option. Allowing your database files to continuously grow and shrink also leads to excessive file fragmentation as file space is allocated and deallocated in small chunks.

- Database backup/recovery/administration—You should create a database backup and recovery schedule that matches the database's update volatility and recovery point objective. All too often a set schedule is used when, in fact, it is not the schedule that should drive how often you do backups or how fast you must recover from failure.

Table Design Guidelines

- Table designs—Given the massively increased CPU, memory, and disk I/O speeds that now exist, the general guideline is to create as "normalized" a table design as is humanly possible. No longer is it necessary to massively denormalize for performance; most normalized table designs are easily supported by SQL Server. Normalized table designs ensure that data has high integrity and low overall redundant data maintenance. See Dr. E. F.
Codd's original work on relational database design (The Relational Model for Database Management: Version 2, Addison-Wesley, 1990). Denormalize for performance as a last resort! For more information on normalization and denormalization techniques, see Chapter 38.

NOTE

Too often, we have seen developers and database designers guess at the performance problems they expect to encounter and denormalize the database design before any real performance testing has even been done. This, more often than not, results in an unnecessarily, and sometimes excessively, denormalized database design. Overly denormalized databases require additional code to maintain the denormalized data, which often ends up creating more performance problems than it solves, not to mention the greater potential for data integrity issues when data is heavily denormalized. It is always best to start with as normalized a database as possible and begin testing early in the development process with real data volumes to identify potential areas where denormalization may be necessary for performance reasons. Then, and only when absolutely necessary, you can begin to look at areas in your table design where denormalization may provide a performance benefit.

- Data types—You must be consistent! In other words, you need to take the time to make sure you have the same data type definitions for columns that will be joined and/or come from the same data domain—int to int, and so on. Often, the use of user-defined data types goes a long way toward standardizing the underlying data types across tables and databases; this is a very strong method of ensuring consistency.

- Defaults—Defaults can help greatly in providing valid data values in columns that are common or that have been specified as mandatory (NOT NULL).
Defaults are tied to the column and are consistently applied, regardless of the application that touches the table.

- Check constraints—Check constraints can also be useful if you need to have checks of data values as part of your table definition. Again, this is a consistency capability at the column level that guarantees that only correct data ends up in the column. Let us add a word of warning, though: you have to be aware of the insert and update errors that can occur in your application from invalid data values that don't meet the check constraints.

- Triggers—Often, triggers are used to maintain denormalized data, custom audit logs, and referential integrity. Triggers are typically used when you want certain behavior to occur on updates, inserts, and deletes, regardless of where they are initiated from. Triggers can result in cascading changes to related (dependent) tables or in failures to perform modifications because of restrictions. Keep in mind that triggers add overhead to even the simplest of data modification operations in your database and are a classic item to look at for performance issues. You should implement triggers sparingly, implementing only those that are "appropriate" for the level of integrity or activity required by your applications, and no more than necessary. Also, you need to be careful to keep the code within your triggers as efficient as possible so the impact on your data modifications is kept to a minimum. For more information on coding and using triggers, see Chapter 30, "Creating and Managing Triggers."

- Primary keys/foreign keys—For OLTP and normalized table designs, you need to utilize explicit primary key and foreign key constraints where possible. For many read-only tables, you may not even have to specify a primary key or foreign key at all.
In fact, you will often be penalized with poorer load times for bulk loads or updates to tables that are used mostly as lookup tables, because SQL Server must invoke and enforce integrity constraints if they are defined. If you don't absolutely need them (such as with read-only tables), don't specify them.

- Table allocations—When creating tables, you should consider using the fill factor (free space) option (when you have a clustered index) to correspond to the volatility of the updates, inserts, and deletes that will be occurring in the table. Fill factor leaves free space in the index and data pages, allowing room for subsequent inserts without incurring a page split. You should avoid page splits as much as possible because they increase the I/O cost of insert and update operations. For more information on fill factor and page splits, see Chapter 34, "Data Structures, Indexes, and Performance."

- Table partitioning—It can be extremely powerful to segregate a table's data into physical partitions that are accessed via some natural subsetting such as a date or key range. Queries that can take advantage of partitions can reduce I/O by searching only the appropriate partitions rather than the entire table. For more information on table partitioning, see Chapters 24, "Creating and Managing Tables," and 34.

- Purge/archive strategy—You should anticipate the growth of your tables and determine whether a purge/archive strategy will be needed. If you need to archive or purge data from large tables that are expected to continue to grow, it is best to plan for archiving and purging from the beginning. Many times, your archive/purge method may require modifications to your table design to support an efficient archive/purge process.
In addition, if you are archiving data to improve performance of your OLTP applications but the historical data needs to be maintained for reporting purposes, this also often requires incorporating the historical data into your database and application design. It is much easier to build an archive/purge method into your database and application from the start than to retrofit something into an existing system. Performance of the archive/purge process is often better when it's planned from the beginning as well.

Indexing Guidelines

In general, you need to be sure not to overindex your tables, especially tables that require good performance for data modifications! Common mistakes include creating redundant indexes on primary keys that already have primary key constraints defined or creating multiple indexes with the same set of leading columns. You should understand when an index is required based on need, not just the desire to have an index. Also, you should make sure that the indexes you define have sufficient cardinality to be useful for your queries. In most performance and tuning engagements that we do, we spend a good portion of our time removing indexes or redefining them correctly to better support the queries being executed against the tables. For more information on defining useful indexes and how queries are optimized, see Chapters 34 and 35, "Understanding Query Optimization."

Following are some indexing guidelines:

- Have an indexing strategy that matches the database/table usage; this is paramount. Do not index OLTP tables with a DSS indexing strategy, and vice versa.

- For composite indexes, try to keep the more selective columns leftmost in the index.

- Be sure to index columns used in joins. Joins are processed inefficiently if no index is defined on the join column(s).

- Tailor your indexes for your most critical queries and transactions.
You cannot index for every possible query that might be run against your tables. However, your applications will perform better if you can identify your critical and most frequently executed queries and design indexes to support them.

- Avoid indexes on columns that have poor selectivity. The Query Optimizer is not likely to use them, so they would simply take up space and add unnecessary overhead during inserts, updates, and deletes.

- Use clustered indexes when you need to keep your data rows physically sorted in a specific column order. If your data is growing sequentially or is primarily accessed in a particular order (such as range retrievals by date), a clustered index allows you to achieve this more efficiently.

- Use nonclustered indexes to provide quicker direct access to data rows than a table scan when searching for data values not covered by your clustered index. Create nonclustered indexes wisely. You can often add a few other data columns to the end of the nonclustered index definition to help satisfy SQL queries completely in the index, so the data page never has to be read and the extra I/O is avoided. This is termed "covering your query": all query columns can be satisfied from the index structure.

- Consider specifying a clustered index fill factor (free space) value to minimize page splits for volatile tables. Keep in mind, however, that the fill factor is lost over time as rows are added to the table and pages fill up. You might need to implement a database maintenance job that runs periodically to rebuild your indexes and reapply the fill factor to the data and index pages.

- Be extremely aware of the table/index statistics that the optimizer has available to it. When your table has changed by more than 20% from updates, inserts, or deletes, the data distribution can be affected quite a bit, and the optimizer's decisions can change greatly.
You'll often want to ensure that the Auto-Update Statistics option is enabled for your databases to help ensure that index statistics are kept up-to-date as your data changes.

View Design Guidelines

In general, you can have as many views as you want. Views are not tables and do not take up any storage space (unless you create an index on the view); they are merely an abstraction for convenience. Except for indexed views, views do not store any data; the results of a view are materialized at the time a query is run against the view, and the data is retrieved from the underlying tables. Views can be used to hide complex queries, to control data access, and in the same place as a table in the FROM clause of any SQL statement.

Following are some view design guidelines:

- Use views to hide tables that change their structure often. By using views to provide a stable view of the data to your application, you can greatly reduce programming changes.

- Utilize views to control security and to control access to table data at the data value level.

- Be careful of overusing views containing complex multitable queries, especially code that joins such views together. When the query is materialized, what may appear to be a simple join between two or three views can result in an expensive join between numerous tables, sometimes including joins to a single table multiple times.

- Use indexed views to dramatically improve performance for data access done via views. Essentially, SQL Server creates an indexed lookup via the view to the underlying tables' data. There is storage and overhead associated with these views, so be careful when you utilize this performance feature.
Although indexed views can help improve the performance of SELECT statements, they add overhead to INSERT, UPDATE, and DELETE statements because the rows in the indexed view need to be maintained as data rows are modified, similar to the maintenance overhead of indexes. For more information on creating and using views, see Chapter 27, "Creating and Managing Views."

Transact-SQL Guidelines

Overall, how you write your Transact-SQL (T-SQL) code can have one of the greatest impacts on your SQL Server performance. Regardless of how well you've optimized your server configuration and database design, poorly written and inefficient SQL code still results in poor performance. The following sections list some general guidelines to help you write efficient, faster-performing code.

General T-SQL Coding Guidelines

- Use IF EXISTS instead of SELECT COUNT(*) when checking only for the existence of any matching data values. IF EXISTS stops the processing of the SELECT query as soon as the first matching row is found, whereas SELECT COUNT(*) continues searching until all matches are found, wasting I/O and CPU cycles.

- Using EXISTS/NOT EXISTS in a subquery is preferable to IN/NOT IN for sets that are queried. As the potential size of the set used in the IN clause gets larger, the performance benefit increases.

- Avoid unnecessary ORDER BY or DISTINCT clauses. Unless the Query Optimizer determines that the rows will be returned in sorted order or that all rows are unique, these operations require a worktable for processing the results, which incurs extra overhead and I/O. Avoid these operations if it is not imperative for the rows to be returned in a specific order or if it's not necessary to eliminate duplicate rows.

- Use UNION ALL instead of UNION if you do not need to eliminate duplicate rows from the result sets being combined with the UNION operator.
The UNION statement has to combine the result sets in a worktable to remove any duplicate rows. UNION ALL simply concatenates the result sets together, without the overhead of putting them into a worktable to remove duplicates.

- Use table variables instead of temporary tables whenever possible or feasible. Table variables are memory resident and do not incur the I/O overhead and system table and I/O contention that can occur in tempdb with normal temporary tables.

- If you need to use temporary tables, keep them as small as possible so they are created and populated more quickly, use less memory, and incur less I/O. Select only the required columns rather than using SELECT *, and retrieve only the rows from the base table that you actually need to reference. The smaller the temporary table, the faster it is to create and access.

- If a temporary table is of sufficient size and will be accessed multiple times, it is often cost effective to create an index on it on the column(s) that will be referenced in the search arguments (SARGs) of queries against the temporary table. Do this only if the time it takes to create the index plus the time the queries take to run using the index is less than the total time the queries would take to run against the temporary table without the index.

- Avoid unnecessary function executions. If you call a SQL Server function (for example, getdate()) repeatedly within T-SQL code, consider using a local variable to hold the value returned by the function and referencing the local variable throughout your SQL statements rather than repeatedly executing the function. This saves CPU cycles within your T-SQL code.

- Try to use set-oriented operations instead of cursor operations whenever possible and feasible. SQL Server is optimized for set-oriented operations, so they are almost always faster than cursor operations performing the same task.
However, one potential exception to this rule is when performing a large set-oriented operation leads to locking concurrency issues. Even though a single update runs faster than a cursor, while it is running, the single update might end up locking the entire table, or large portions of it, for an extended period of time, preventing other users from accessing the table during the update. If concurrent access to the table is more important than the time it takes for the update itself to complete, you might want to consider using a cursor.

- Consider using the MERGE statement, introduced in SQL Server 2008, when you need to perform multiple types of modification against a table (UPDATE, INSERT, or DELETE), because it enables you to perform these operations in a single pass of the table rather than a separate pass for each operation.

- Consider using the OUTPUT clause to return results from INSERT, UPDATE, or DELETE statements rather than having to perform a separate lookup against the table.

- Use search arguments that can be effectively optimized by the Query Optimizer. Try to avoid using any negative logic in your SARGs (for example, !=, <>, NOT IN) or performing operations on, or applying functions to, the columns in the SARG. Avoid using expressions in your SARGs whose search value cannot be evaluated until runtime (such as local variables, functions, and aggregations in subqueries), because the optimizer cannot accurately determine the number of matching rows; it has no value to compare against the histogram values during query optimization. Consider putting such queries into stored procedures and passing in the value of the expression as a parameter, because SQL Server evaluates the value of a parameter prior to optimizing the stored procedure.

- Avoid data type mismatches on join columns.
- Avoid writing large, complex queries whenever possible. Complex queries with a large number of tables and join conditions can take a long time to optimize. It may not be possible for the Query Optimizer to analyze the entire set of plan alternatives, and a suboptimal query plan could be chosen. Typically, if a query involves more than 12 tables, the Query Optimizer will likely have to rely on heuristics and shortcuts to generate a query plan and may miss some optimal strategies.

For more tips and information on coding effective and efficient queries, see Chapters 43, "Transact-SQL Programming Guidelines, Tips, and Tricks," and 35.

Stored Procedure Guidelines

- Use stored procedures for SQL execution from your applications. Stored procedure execution can be more efficient than ad hoc SQL due to reduced network traffic and query plan caching for stored procedures.

- Use stored procedures to make your database something of a "black box" as far as your application code is concerned. If all database access is managed through stored procedures, the applications are shielded from possible changes to the underlying database structures. You can simply modify the existing stored procedures to reflect the changes to the database structures without requiring any changes to the front-end application code.

- Ensure that your parameter data types match the column data types they are being compared against to avoid data type mismatches and poor query optimization.
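These stored procedure guidelines can be sketched with a small hypothetical example (the dbo.Orders table, its columns, and the procedure name are ours for illustration, not from the text). The parameter is declared with the same data type as the column it is compared against:

```sql
-- Hypothetical example: table, column, and procedure names are illustrative.
-- @CustomerID is declared as int to match the Orders.CustomerID column,
-- avoiding an implicit conversion that could hurt query optimization.
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerID int
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END;
```

Applications would then call EXEC dbo.GetOrdersByCustomer @CustomerID = 42; rather than issuing the SELECT directly, shielding the front-end code from changes to the underlying table structure.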
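As a closing illustration of two of the general T-SQL coding guidelines above (IF EXISTS versus SELECT COUNT(*), and UNION ALL versus UNION), here is a sketch against the same hypothetical dbo.Orders table used earlier; the names and date boundary are ours, not from the text:

```sql
-- Existence check: IF EXISTS stops at the first matching row,
-- whereas SELECT COUNT(*) must find every match before the comparison.
IF EXISTS (SELECT * FROM dbo.Orders WHERE CustomerID = 42)
    PRINT 'Customer 42 has orders';

-- UNION ALL concatenates the result sets with no worktable;
-- it is appropriate here because the two date ranges cannot overlap,
-- so no duplicate rows need to be eliminated.
SELECT OrderID FROM dbo.Orders WHERE OrderDate <  '20120101'
UNION ALL
SELECT OrderID FROM dbo.Orders WHERE OrderDate >= '20120101';
```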