10. In the background, when a checkpoint occurs (a SQL Server internal event) or the lazy writer runs, SQL Server writes any dirty (modified) data pages to the data file. It tries to find sequential pages to improve the performance of the write. Even though I've listed it here as step 10, this can happen at nearly any point during the transaction, or after it, depending on the amount of data being changed and the memory pressure on the system. When the write finishes, SQL Server receives a "write complete" message from Windows.

11. At the conclusion of the background write operation, SQL Server marks the oldest open transaction in the transaction log. All older, committed transactions have been confirmed in the data file and are now confirmed in the transaction log. The DBCC OPENTRAN command reports the oldest open transaction.

Transaction complete

The sequence comes full circle and returns the database to a consistent state.

12. The database finishes in a consistent state.

Transaction-log rollback

If the transaction is rolled back, the DML operations are reversed in memory, and a transaction-abort entry is made in the log. More often than not, the time taken to perform a rollback will be greater than the time taken to make the changes in the first place.

Transaction log recovery

The primary benefit of a write-ahead transaction log is that it maintains the atomic transactional property in the case of system failure. If SQL Server should cease functioning (perhaps due to a power failure or physical disaster), the transaction log is automatically examined when SQL Server restarts, as follows:

■ If any entries are in the log as DML operations but are not committed, they are rolled back.

■ To test this feature you must be brave. Begin a transaction and shut down SQL Server before issuing a COMMIT TRANSACTION (using the Services applet); this performs a SHUTDOWN WITH NOWAIT. A T-SQL sketch of this test follows this list. Simply closing Query Analyzer won't do it; Query Analyzer will request permission to commit the pending transactions and will roll back the transactions if permission isn't given. If SQL Server is shut down normally (this varies greatly, as there are many ways to stop it, some of which shut down gracefully and others that don't), it waits for any pending tasks to complete before stopping.

■ If you have followed the steps outlined previously and you shut the system down just before step 7, the transaction log entries will be identical to those shown later (refer to Figure 66-10).

■ Start SQL Server, and it will recover from the crash very nicely and roll back the unfinished transaction. This can be seen in the SQL Server ErrorLog.

■ If any entries are in the log as DML operations and committed but not marked as written to the data file, they are written to the data file. This feature is nearly impossible to demonstrate.
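If you do want to try the brave test described above, a minimal sketch might look like the following. The database and table names are hypothetical placeholders; the only real commands used are BEGIN TRANSACTION, SHUTDOWN WITH NOWAIT (which requires sysadmin or serveradmin rights), and DBCC OPENTRAN.

-- Hypothetical test table in a hypothetical test database; use only on a sacrificial system
USE TestDB;
BEGIN TRANSACTION;
UPDATE dbo.TestTable
   SET Description = 'uncommitted change'
 WHERE TestTableID = 1;

-- Do NOT commit; stop the instance abruptly instead:
SHUTDOWN WITH NOWAIT;

-- After restarting the instance, the SQL Server error log shows the database
-- being recovered and the unfinished transaction being rolled back.
-- DBCC OPENTRAN then reports the oldest open transaction, if any remains:
DBCC OPENTRAN ('TestDB');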
Transaction Performance Strategies

Transaction integrity theory can seem daunting at first, and SQL Server has numerous tools to control transaction isolation. If the database is low usage or primarily read-only, transaction locking and blocking won't be a problem. However, for heavy-usage OLTP databases, you'll want to apply the theory and working knowledge from this chapter using these strategies. Also, if you are mixing reporting and OLTP workloads, expect significant blocking issues: reporting queries generally place locks at the page or table level, which conflicts with an OLTP workload that wants row-level locks.

Because locking and blocking comprise the fourth optimization strategy, ensure that steps one through three are covered before tackling locking and blocking:

1. Begin with Smart Database Design: Start with a clean, simplified schema to reduce the number of unnecessary joins and reduce the amount of code used to shuttle data from bucket to bucket.

2. Use efficient set-based code, rather than painfully slow iterative cursors or loops. Large set-based operations can cause locking and blocking. Chapter 22, "Kill the Cursor!," explains how to break up large set-based operations into smaller batches to alleviate this problem.

3. Use a solid indexing strategy to eliminate unnecessary table scans and increase the speed of transactions.

To identify locking problems, use the Activity Monitor or SQL Profiler. To reduce the severity of a locking problem, do the following:

■ Evaluate and test using the read committed snapshot isolation level. Depending on your error handling and hardware capabilities, snapshot isolation can significantly reduce concurrency contention.

■ Check the transaction isolation level and ensure that it's not any higher than required.

■ Make sure transactions begin and commit quickly. Redesign any transaction that includes a cursor it doesn't truly need. Move any code that isn't required for transactional consistency out of the transaction.

■ If two procedures are deadlocking, make sure they lock the resources in the same order.

■ Make sure client applications access the database through the data abstraction layer.

■ Consider forcing row-level locks with the (ROWLOCK) hint to prevent the locks from escalating.

Evaluating database concurrency performance

It's easy to build a database that doesn't exhibit lock contention and concurrency issues when tested with a handful of users. The real test occurs when several hundred users are all updating orders. Concurrency testing requires a concerted effort. At one level, it can involve everyone available running the same front-end form concurrently. A .NET program that constantly simulates a user viewing data and updating data is also useful. A good test is to run 20 instances of a script that constantly pounds the database and then let the test crew use the application. Performance Monitor (covered in Chapter 55, "Performance Monitor") can watch the number of locks.
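While the test runs, you can also watch locking directly from T-SQL by polling the sys.dm_tran_locks DMV. The following is only a monitoring sketch; the grouping shown is one reasonable summary, not the only way to slice the data.

-- Summarize current locks by resource type, mode, and status during a concurrency test
SELECT resource_type,
       request_mode,
       request_status,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
GROUP BY resource_type, request_mode, request_status
ORDER BY lock_count DESC;

Run the query repeatedly (or from a scheduled job) while the test crew works the application; a steady climb in WAIT rows is an early sign of blocking trouble.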
Best Practice

Multi-user concurrency should be tested during the development process several times. To quote the MCSE exam guide, "don't let the real test be your first test."

Summary

A transaction is a logical unit of work. Although the default SQL Server transaction isolation level works well for most applications, there are several means of manipulating and controlling the locks. To develop a serious SQL Server application, your understanding of the ACID database principles, SQL Server's transaction log, and locking will contribute to the quality, performance, and reliability of the database.

Major points from this chapter include the following:

■ Transactions must be ACID: atomic (all or nothing), consistent (before and after the transaction), isolated (not affected by another transaction), and durable (once committed, always committed).

■ SQL Server transactions are durable because of the write-ahead transaction log.

■ SQL Server transactions are isolated because of locks or snapshot isolation.

■ Using traditional transaction isolation, readers block writers, and writers block readers and other writers.

■ SQL Server offers four traditional transaction isolation levels: read uncommitted, read committed, repeatable read, and serializable. Read committed, the default transaction isolation level, is the right isolation for most OLTP databases.

■ Never ever use read uncommitted (or the NOLOCK hint).

■ Snapshot isolation means reading the before image of the transaction instead of waiting for the transaction to commit. Using snapshot isolation, readers don't block writers, and writers don't block readers; only writers block other writers.

The next chapter continues the optimization theme with one of my favorite new features — data compression. High-transaction databases always struggle with I/O performance, and data compression is the perfect solution for reducing I/O.

Data Compression

IN THIS CHAPTER

Understanding compression
Reducing I/O
Whole-database compression procedures
Compression strategies

Pushing a database into the tens of thousands of transactions per second requires massive amounts of raw I/O performance. At those rates, today's servers can supply the CPU and memory, but I/O struggles. By reducing the raw size of the data, data compression trades I/O for CPU, improving performance.

Data compression is easy — easy to enable, and easy to benefit from, so why a full chapter on data compression? Data compression is the sleeper of the SQL Server 2008 new feature list. Like online indexing in SQL Server 2005, I believe that data compression will become the compelling reason to upgrade for many large SQL Server IT shops. In other words, data compression doesn't warrant an entire chapter because of its complexity or length, but because of its value. Its impact is such that it deserves center stage, at least for this chapter.

Understanding Data Compression

Every IT professional is familiar with data compression, such as zip files and .jpg compression, to name a couple of popular compression technologies. But SQL Server data compression is specific to the SQL Server storage engine and has a few database-specific requirements. First, there has to be zero risk of loss of data fidelity. Second, it has to be completely transparent — enabled without any application code changes.

SQL Server data compression isn't like .jpg compression, where you can choose the level of compression and more compression means more data loss. With SQL Server data compression, the data is transparently compressed by the storage engine, and every compressed data page retains every data value when decompressed.

Don't confuse data compression with backup compression — the two technologies are completely independent.
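As a point of contrast, backup compression is requested per backup rather than set per object, and in SQL Server 2008 it is likewise an Enterprise Edition feature. The following one-liner is only an illustrative sketch; the database name and backup path are placeholders.

-- Backup compression (a separate feature): compresses only the backup stream,
-- regardless of whether any table in the database uses data compression
BACKUP DATABASE AdventureWorks2008
TO DISK = 'D:\Backup\AdventureWorks2008.bak'
WITH COMPRESSION;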
The following data objects may be compressed:

■ Entire heap
■ Entire clustered index
■ Entire non-clustered index
■ Entire indexed view (specifically, the materialized clustered index of an indexed view)
■ Single partition of a partitioned table or index

While indexes can be compressed, they are not automatically compressed with the table's compression type. All objects, including indexes, must be individually, manually enabled for compression.

Data compression limitations:

■ Heaps or clustered indexes with sparse data may not be compressed.
■ Filestream data and LOB data are not compressed.
■ Tables with rows that potentially exceed 8,060 bytes and use row overflow cannot be compressed.
■ Data compression does not overcome the row limit. The data must always be able to be stored uncompressed.

Data compression pros and cons

Data compression offers several benefits and a few trade-offs, so while using data compression is probably a good thing, it's worth understanding the pros and cons.

The most obvious con is the financial cost. Data compression is only available with the Enterprise Edition. If you are already using Enterprise Edition, great; if not, then moving from Standard to Enterprise is a significant budget request.

Data compression uses CPU. If your server is CPU pressured, then turning on data compression will probably hurt performance. Depending on the data mix and the transaction rate, enabling data compression might slow down the application.

Not all tables and indexes compress well. In my testing, some objects will compress up to 70%, but many tables will see little compression, or even grow in size when compressed. Therefore, you shouldn't simply enable compression for every object; it takes some study and analysis to choose compression wisely.

With these three possible drawbacks understood, there are plenty of reasons to enable data compression (assuming the data compresses well):

■ Data compression significantly reduces the I/O bottleneck for a high-transaction database.
■ Data compression significantly reduces the memory footprint of data, thus caching more data in memory and probably improving overall performance.
■ More rows on a page mean that scans and count(*)-type operations are faster.
■ Compressed data means SAN shadow copies are smaller.
■ Database snapshots are smaller and more efficient with data compression.
■ SANs and high-performance disks are expensive. Compressed data means less disk space is required, which means more money is left in the budget to attend a SQL Server conference in Maui.
■ Compressed data means backup and restore durations are reduced, and less storage space is used for backups.

There are hardware-based data compression solutions that compress data as it's written to disk. While these can reduce disk space and off-load the CPU overhead of compression, they fail to reduce the I/O load on SQL Server or the data's memory footprint within SQL Server.

There are two types, or levels, of data compression in SQL Server 2008: row level and page level. Each has a specific capability and purpose. So that you can best understand how and when to employ data compression, the following sections describe how each works.
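Because not every object compresses well, it's worth estimating the savings before committing to either level. The sp_estimate_data_compression_savings system procedure samples the object and reports its projected compressed size. This is only a sketch; the schema and table names are placeholders.

-- Estimate how well a table would compress at the page level before enabling anything.
-- Passing NULL for @index_id and @partition_number covers all indexes and partitions.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'SomeLargeTable',   -- placeholder table name
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';             -- or 'ROW' or 'NONE'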
Row compression

Row compression converts the storage of every fixed-length data type column (both character and numeric data types) to a variable-length data type column. Row compression grew out of the vardecimal compression added with SQL Server 2005 SP2. Depending on the number of fixed-length columns and the actual length of the data, this level may or may not provide significant gain.

While you'll still see the columns as fixed length when viewing the database, under the covers the storage engine is actually writing the values as if the columns were variable length. A char(50) column is treated as if it's a varchar(50) column. When row compression is enabled, SQL Server also uses a new variable-length row format that reduces the per-column metadata overhead from 2 bytes to 4 bits.

Row-level data compression is designed specifically for third-party databases that have several fixed-length columns but don't allow schema changes.
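As a quick preview of the syntax covered later in the "Applying Data Compression" section, enabling row compression is a single rebuild per object. The table name below is a placeholder.

-- Enable row compression on a table (rebuilds its heap or clustered index)
ALTER TABLE dbo.ThirdPartyTable
REBUILD WITH (DATA_COMPRESSION = ROW);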
Page compression

SQL Server page compression automatically includes row compression and takes compression two steps further, adding prefix compression and then dictionary compression. Page compression applies only to leaf-level pages (clustered indexes or heaps) and not to the b-tree root or intermediate pages.

Prefix compression may appear complex at first, but it's actually very simple and efficient. For prefix compression, the storage engine follows these steps for each column:

1. The storage engine examines all the values and selects the most common prefix value for the data in the column.

2. The longest actual value beginning with the prefix is then stored in the compression information (CI) structure.

3. If the prefix is present at the beginning of a data value, a number is inserted at the beginning of the value to indicate how many characters of the prefix it shares. The non-prefix portion of the value (the part to the right of the prefix) is left in place.

Prefix compression actually examines bytes, so it applies to both character and numeric data. For example, assume the storage engine were applying prefix compression to the following data, which includes two columns, shown in Figure 67-1.

FIGURE 67-1
The sample data before page compression is enabled. [The page holds a page header, an empty compression information (CI) anchor row, and two columns of raw data: Nielsen, Nelson, Nelsen, Nelsen and Paul, Joe, Joseph, Joshua.]

For the first column, the best prefix is Nels. The longest value beginning with the prefix is Nelson, so that's written to the CI structure, as shown in Figure 67-2. For the second column, the best prefix is Jos and the longest value is Joseph. The prefixes are written to an anchor row at the beginning of each page.

The values are then updated with the prefix (see Figure 67-2). The first value, Nielsen, begins with one letter of the prefix, so 1ielsen is written, which doesn't save any space. But the compression ratio is much better for values that include more of the prefix — for instance, Nelson is compressed into just the number 6 because it contains six characters of the prefix with nothing remaining. Nelsen is compressed into 4en, meaning that it begins with four letters of the prefix followed by en.

FIGURE 67-2
Prefix compression identifies the best prefix for each column and then stores the prefix character count in each row instead of the prefix characters. [The CI anchor row now holds Nelson and Joseph, and the data values are stored as prefix-match counts plus any remaining characters, such as 1ielsen, 6, 4en, 0Paul, 2e, and 3hua.]

As demonstrated, depending on the commonality of the data set, prefix compression can significantly compress the data without any loss of data fidelity. In this simple example, prefix compression alone reduced the data from 42 bytes to 29 bytes, saving 30%.

Notice that in this example, one value, Paul, doesn't match the prefix at all. It's stored as 0Paul, which increases the length. If this is the case for most of the rows, and prefix compression offers no benefit for a given column, the storage engine will leave the anchor row null and not use prefix compression for that column. This is one reason why tables will sometimes actually grow when compressed.

Once the data is prefix compressed, the storage engine applies dictionary compression. Every value is scanned, and any common values are replaced with a token that is stored in the compression information area of the page. Prefix compression occurs at the column level, while dictionary compression occurs across all columns at the page level.

Compression sequence

The cool thing about data compression is that it's completely handled by the storage engine and transparent to every process outside of the storage engine. This means that the data is compressed on the disk and is still compressed when it's read into memory. The storage engine decompresses the data as it's being passed from the storage engine to the query processor, as illustrated in Figure 67-3.

FIGURE 67-3
The storage engine compresses and decompresses data as it's written to and read from the buffer. [The diagram shows the disk or SAN and the relational database engine's query optimizer, query processor, and storage engine (buffer), with the data compression layer handled by the storage engine.]

If the object is row compressed, or page compressed (which automatically includes row compression), then row compression is always enabled for every page of the object. Page compression, however, is a different story:

■ The storage engine enables page compression on a page-by-page basis when there's a benefit for that page (a query for checking how many pages were actually compressed follows this list). When the storage engine creates a new page, it's initially uncompressed and remains uncompressed as rows are added to the page. Why compress a page that's only half full anyway?

■ When the page is full but SQL Server wants to add another row to it, the storage engine tests the page for compression. If the page compresses enough to add the new row, then the page is compressed.

■ Once the page is a compressed page, any new rows will be inserted compressed (but they won't trigger recalculation of the compression information, the prefix anchor row, or the dictionary tokens).

■ Pages might be recompressed (and the prefixes and dictionary tokens recalculated) when a row is updated or when the page would otherwise need to be split, based on an algorithm that factors in the number of updates to the page, the number of rows on the page, the average row length, and the amount of space that page compression can save for the page.

■ Heaps are recompressed only by an index rebuild or a bulk load.

■ In the case of a page split, both pages inherit the page compression information (compression status, prefixes, and dictionary tokens) of the old page.

■ During an index rebuild of an object with page compression, the point at which a page is considered full still honors the fill factor setting, so the free space is still guaranteed.

■ Row inserts, updates, and deletes are normally written to the transaction log in row compression format, but not in page compression format. The exception is a page split: because it is a physical operation, the page-compressed values are logged.
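Because page compression is applied page by page, a page-compressed object usually contains a mix of compressed and uncompressed pages. As a hedged sketch, the SQL Server 2008 version of sys.dm_db_index_physical_stats reports a compressed_page_count column you can compare to the total page count; the table name here is a placeholder.

-- How many leaf pages of a page-compressed table are actually stored compressed?
-- DETAILED mode reads every page, so run this on a test system or a small table.
SELECT index_id,
       partition_number,
       page_count,
       compressed_page_count
FROM sys.dm_db_index_physical_stats(
        DB_ID(),
        OBJECT_ID('dbo.SomeLargeTable'),   -- placeholder table name
        NULL, NULL, 'DETAILED');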
Applying Data Compression

Although data compression is complicated, actually enabling data compression is a straightforward task using either the Data Compression Wizard or an ALTER command.

Determining the current compression setting

When working with compression, the first task is to confirm the current compression setting. Using the Management Studio UI, there are two ways to view the compression type for any single object:

■ The Table Properties or Index Properties Storage page displays the compression setting as a read-only value.

■ The Data Compression Wizard, found in Object Explorer (context menu ➪ Storage ➪ Manage Compression), opens with the current compression selected.

To see the current compression setting for every object in the database, run this query:

SELECT O.object_id, S.name AS [schema], O.name AS [Object],
       I.index_id AS Ix_id, I.name AS IxName, I.type_desc AS IxType,
       P.partition_number AS P_No, P.data_compression_desc AS Compression
  FROM sys.schemas AS S
       JOIN sys.objects AS O
         ON S.schema_id = O.schema_id
       JOIN sys.indexes AS I
         ON O.object_id = I.object_id
       JOIN sys.partitions AS P
         ON I.object_id = P.object_id
        AND I.index_id = P.index_id
 WHERE O.type = 'U'
 ORDER BY S.name, O.name, I.index_id;
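Once you know the current setting, changing it is a rebuild away. The statements below are a hedged sketch of the ALTER syntax rather than a recommendation for any particular object; the table and index names are placeholders.

-- Page-compress a table (its heap or clustered index)
ALTER TABLE dbo.SomeLargeTable
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Non-clustered indexes are not affected by the table's setting; compress them separately
ALTER INDEX IX_SomeLargeTable_Column
ON dbo.SomeLargeTable
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- To remove compression, rebuild with NONE
ALTER TABLE dbo.SomeLargeTable
REBUILD WITH (DATA_COMPRESSION = NONE);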