Nielsen c68.tex V4 - 07/21/2009 4:18pm Page 1442

Part IX Performance Tuning and Optimization

FIGURE 68-3 The partition function is used by the partition scheme to place the data in separate filegroups.

[Figure: boundaries defined by the partition function (1/1/2002, 1/1/2003, 1/1/2004, 1/1/2005) separate partitions Part01 through Part05, whose locations are defined by the partition scheme; the table is created on the partition scheme.]

only specify the boundary values between ranges; they don’t define the upper or lower values for the whole table. A boundary value can only exist in one partition.

The ranges are defined as left or right. If a row has a partition column value that is the same as a boundary value, then SQL Server needs to know in which partition to put the row.

Left ranges mean that data equal to the boundary is included in the partition to the left of the boundary. A boundary of '12/31/2004' would create two partitions. The lower partition would include all data up to and including '12/31/2004', and the right partition would include any data greater than '12/31/2004'.

Right ranges mean that data equal to the boundary goes into the partition on the right of the boundary value. To separate at the new year starting 2008, a right range would set the boundary at '1/1/2008'. Any values less than the boundary go into the left, or lower, partition. Any data with a date equal to or later than the boundary goes into the next partition.

These two functions use left and right ranges to create the same result for date values that carry no time component:

    CREATE PARTITION FUNCTION pfyears(DateTime)
      AS RANGE LEFT FOR VALUES
      ('12/31/2001', '12/31/2002', '12/31/2003', '12/31/2004');

or

    CREATE PARTITION FUNCTION pfYearsRT(DateTime)
      AS RANGE RIGHT FOR VALUES
      ('1/1/2002', '1/1/2003', '1/1/2004', '1/1/2005');

These functions both create four defined boundaries, and thus five partitions.
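To see where a boundary value lands under each range direction, the $PARTITION function can be evaluated against literal values. The following is only a minimal sketch: the function names pfDemoLeft and pfDemoRight are hypothetical (they are not from the text) and were chosen so the demo does not collide with any existing partition function. The third probe value carries a time component, which is the case where a left range on 12/31 and a right range on 1/1 stop being equivalent for a DateTime column.

```sql
-- Hypothetical demo functions; the names are assumptions for illustration.
CREATE PARTITION FUNCTION pfDemoLeft (DATETIME)
  AS RANGE LEFT FOR VALUES
  ('20011231', '20021231', '20031231', '20041231');
GO
CREATE PARTITION FUNCTION pfDemoRight (DATETIME)
  AS RANGE RIGHT FOR VALUES
  ('20020101', '20030101', '20040101', '20050101');
GO

-- Ask each function which partition number a given value maps to.
SELECT v.ProbeValue,
       $PARTITION.pfDemoLeft(v.ProbeValue)  AS LeftPartition,
       $PARTITION.pfDemoRight(v.ProbeValue) AS RightPartition
FROM (SELECT CAST('20041231 00:00' AS DATETIME) AS ProbeValue
      UNION ALL
      SELECT CAST('20050101 00:00' AS DATETIME)
      UNION ALL
      SELECT CAST('20041231 12:00' AS DATETIME)) AS v
ORDER BY v.ProbeValue;
GO

-- Clean up the demo objects.
DROP PARTITION FUNCTION pfDemoLeft;
DROP PARTITION FUNCTION pfDemoRight;
GO
```

Midnight on 12/31/2004 maps to partition 4 under both functions, and 1/1/2005 maps to partition 5 under both. Noon on 12/31/2004 maps to partition 5 under the left range but partition 4 under the right range, so a right range on the first day of the year is usually the safer choice when the column stores times.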
SQL Server 2008’s table partitions are declarative, meaning the table is segmented by data values. A hash partition segments the data randomly. SQL Server does not have hash partitioning. You can create a hash function on a computed column, but your client application needs to understand this computation to allow for partition elimination. Another option to randomly spread the data across multiple disk subsystems is to define the table using a filegroup and then add multiple files to the filegroup.

See Figure 68-4.

FIGURE 68-4 The partition configuration can be viewed in Object Explorer.

Three catalog views expose information about partition functions: sys.partition_functions, sys.partition_range_values, and sys.partition_parameters.

Creating partition schemes

The partition scheme builds on the partition function to specify the physical locations for the partitions. The physical partition tables may all be located in the same filegroup or spread over several filegroups.

The first example partition scheme, named psYearsAll, uses the pfYearsRT partition function and places all the partitions in the Primary filegroup:

    CREATE PARTITION SCHEME psYearsAll
      AS PARTITION pfYearsRT
      ALL TO ([Primary]);

To place the table partitions in their own filegroups, omit the ALL keyword and list the filegroups individually. This creates five partitions to match the four boundary values specified in the function:

    CREATE PARTITION SCHEME psYearsFiles
      AS PARTITION pfYearsRT
      TO (Part01, Part02, Part03, Part04, Part05);

The partition functions and schemes must be created using T-SQL code, but once they’ve been created you can view them in Management Studio’s Object Explorer under the database Storage node.

To examine information about partition schemes programmatically, query sys.partition_schemes.
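As a rough sketch of how these catalog views can be combined, the two queries below list the boundary values of every partition function in the current database and the filegroup behind each partition of every scheme. The queries use only system catalog views; the column aliases are illustrative.

```sql
-- Boundary values for every partition function in the current database.
SELECT pf.name AS FunctionName,
       CASE pf.boundary_value_on_right
         WHEN 1 THEN 'RIGHT' ELSE 'LEFT'
       END AS RangeType,
       pf.fanout AS PartitionCount,
       prv.boundary_id AS BoundaryID,
       prv.value AS BoundaryValue
FROM sys.partition_functions AS pf
  LEFT JOIN sys.partition_range_values AS prv
    ON prv.function_id = pf.function_id
ORDER BY pf.name, prv.boundary_id;

-- Filegroup assigned to each partition of every partition scheme.
-- A destination_id greater than the function's partition count
-- is the filegroup currently marked NEXT USED.
SELECT ps.name AS SchemeName,
       pf.name AS FunctionName,
       dds.destination_id AS PartitionNumber,
       fg.name AS FilegroupName
FROM sys.partition_schemes AS ps
  JOIN sys.partition_functions AS pf
    ON pf.function_id = ps.function_id
  JOIN sys.destination_data_spaces AS dds
    ON dds.partition_scheme_id = ps.data_space_id
  JOIN sys.filegroups AS fg
    ON fg.data_space_id = dds.data_space_id
ORDER BY ps.name, dds.destination_id;
```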
Creating the partition table

Once the partition function and partition schemes are in place, actually creating the table is a piece of cake (pun intended). I recommend creating a partition table with a non-clustered primary key. Creating the clustered index on the partition scheme is what partitions the table. The WorkOrder Table Properties page also displays the partition scheme being used by the table.

Partition functions and partition schemes don’t have owners, so when referring to partition schemes or partition functions, you don’t need to use the four-part name or the schema owner in the name.

The following table is similar to the AdventureWorks WorkOrder table in the Production schema:

    CREATE TABLE dbo.WorkOrder (
      -- IDENTITY lets the load script below insert the same source rows
      -- repeatedly; the constraint is named so it can be dropped later.
      WorkOrderID INT IDENTITY NOT NULL
        CONSTRAINT WorkOrderPK PRIMARY KEY NONCLUSTERED,
      ProductID INT NOT NULL,
      OrderQty INT NOT NULL,
      StockedQty INT NOT NULL,
      ScrappedQty INT NOT NULL,
      StartDate DATETIME NOT NULL,
      EndDate DATETIME NOT NULL,
      DueDate DATETIME NOT NULL,
      ScrapReasonID INT NULL,
      ModifiedDate DATETIME NOT NULL
    );

    -- The clustered index placed on the partition scheme partitions the table.
    CREATE CLUSTERED INDEX ix_WorkOrder_DueDate
      ON dbo.WorkOrder (DueDate)
      ON psYearsAll(DueDate);

The next script inserts 7,259,100 rows into the WorkOrder table in 2 minutes and 42 seconds, as confirmed by the database Summary page:

    DECLARE @Counter INT;
    SET @Counter = 0;
    WHILE @Counter < 100
      BEGIN
        SET @Counter = @Counter + 1;
        -- The SELECT list must match the INSERT column list;
        -- WorkOrderID is generated by the identity column.
        INSERT dbo.WorkOrder (ProductID, OrderQty, StockedQty, ScrappedQty,
            StartDate, EndDate, DueDate, ScrapReasonID, ModifiedDate)
          SELECT ProductID, OrderQty, StockedQty, ScrappedQty,
              StartDate, EndDate, DueDate, ScrapReasonID, ModifiedDate
            FROM AdventureWorks.Production.WorkOrder;
      END;

It’s possible for multiple partition schemes to share a single partition function. Architecturally, this might make sense if several tables should be partitioned using the same boundaries, because this improves the consistency of the partitions.
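When a function is shared this way, it helps to know which tables and indexes depend on it. One way to list them, sketched here using only system catalog views (the column aliases are illustrative), is to follow each index's data space to its partition scheme and then to the scheme's function:

```sql
-- Every partitioned heap or index in the current database, with the
-- partition scheme it is stored on and the function behind that scheme.
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
       OBJECT_NAME(i.object_id) AS TableName,
       i.name AS IndexName,          -- NULL for a heap
       i.type_desc AS IndexType,
       ps.name AS SchemeName,
       pf.name AS FunctionName
FROM sys.indexes AS i
  JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
  JOIN sys.partition_functions AS pf
    ON pf.function_id = ps.function_id
ORDER BY pf.name, ps.name, TableName, i.index_id;
```

Indexes stored on an ordinary filegroup do not join to sys.partition_schemes, so non-partitioned indexes on a partitioned table (such as a non-partitioned primary key) are absent from the result.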
To verify which tables use which partition schemes, based on which partition functions, use the Object Dependencies dialog for the partition function or partition scheme. You can find it using the partition function’s context menu.

To see information about how the partitions are being used, look at sys.partitions and sys.dm_db_partition_stats.

Querying partition tables

The nice thing about partition tables is that no special code is required to query either across multiple underlying partition tables or from only one partition table. The Query Optimizer automatically uses the right tables to retrieve the data.

The $PARTITION function returns the integer partition number for a value when used with the partition function. The next code snippet counts the number of rows in each partition:

    SELECT $PARTITION.pfYearsRT(DueDate) AS [Partition],
        COUNT(*) AS [Count]
      FROM dbo.WorkOrder
      GROUP BY $PARTITION.pfYearsRT(DueDate)
      ORDER BY [Partition];

Result:

    Partition   Count
    ---------   -------
    1           703900
    2           1821200
    3           2697100
    4           2036900

The next query selects data for one year, so the data should be located in only one partition. Examining the query execution plan (not shown here) reveals that the Query Optimizer used a high-speed clustered index scan on partition ID PtnIds1005:

    SELECT WorkOrderID, ProductID, OrderQty, StockedQty, ScrappedQty
      FROM dbo.WorkOrder
      WHERE DueDate BETWEEN '1/1/2002' AND '12/31/2002';

Altering partition tables

Partition tables can be modified to keep up with changing data and to enable performance testing of various partition schemes. Even though the commands are simple, modifying the design of a partition table never executes very quickly, as you can imagine.

Merging partitions

Merge and split surgically modify the table partition design.
The ALTER PARTITION FUNCTION ... MERGE RANGE command effectively removes one of the boundaries from the partition function and merges two partitions. For example, to remove the boundary between 2003 and 2004 in the pfYearsRT partition function, and combine the data from 2003 and 2004 into a single partition, use the following ALTER command:

    ALTER PARTITION FUNCTION pfYearsRT()
      MERGE RANGE ('1/1/2004');

Sure enough, following the merge operation, the previous count-rows-per-partition query now returns three partitions, and scripting the partition function from Object Explorer creates a script with three boundaries in the partition function code.

If multiple tables share the same partition scheme and partition function being modified, then multiple tables will be affected by these changes.

Splitting partitions

To split an existing single partition, the first step is to designate the next filegroup to be used by the partition scheme. This is done using the ALTER PARTITION SCHEME ... NEXT USED command. If you specify too many filegroups when creating a scheme, you will get a message that the next filegroup used is the extra filegroup you specified. Then the partition function can be modified using the ALTER PARTITION FUNCTION ... SPLIT RANGE command to insert a new boundary into the partition function. It’s the ALTER PARTITION FUNCTION command that actually performs the work.

This example segments the 2003–2004 work order data into two partitions. The new partition will include only data for July 2004, the last month with data in the AdventureWorks table:

    ALTER PARTITION SCHEME psYearsFiles
      NEXT USED [Primary];

    ALTER PARTITION FUNCTION pfYearsRT()
      SPLIT RANGE ('7/1/2004');

Switching tables

Switching tables is the cool capability to move an entire table into a partition within a partitioned table, or to remove a single partition so that it becomes a stand-alone table.
This is very useful when importing new data, but note a few restrictions:

■ Every index for the partition table must be a partitioned index.
■ The new table must have the same columns (excluding identity columns), indexes, and constraints (including foreign keys) as the partition table, except that the new table cannot be partitioned.
■ The source partition table cannot be the target of a foreign key.
■ Neither table can be published using replication, or have schema-bound views.
■ The new table must have a check constraint restricting the data range to the new partition, so SQL Server doesn’t have to re-verify the data range (and it needs to be validated; there is no point loading the data and then creating the constraint with NOCHECK).
■ Both the stand-alone table and the partition that will receive the stand-alone table must be on the same filegroup.
■ The receiving partition or table must be empty.

In essence, switching a partition is rearranging the database metadata to reassign the existing table as a partition. No data is actually moved, which makes table switching nearly instantaneous regardless of the table’s size.

Prepping the new table

The WorkOrderNEW table will be created to demonstrate switching.
It will hold August 2004 data from AdventureWorks:

    CREATE TABLE dbo.WorkOrderNEW (
      WorkOrderID INT IDENTITY NOT NULL,
      ProductID INT NOT NULL,
      OrderQty INT NOT NULL,
      StockedQty INT NOT NULL,
      ScrappedQty INT NOT NULL,
      StartDate DATETIME NOT NULL,
      EndDate DATETIME NOT NULL,
      DueDate DATETIME NOT NULL,
      ScrapReasonID INT NULL,
      ModifiedDate DATETIME NOT NULL
    )
    ON Part05;

Indexes identical to those on the partitioned table will be created on the new table:

    ALTER TABLE dbo.WorkOrderNEW
      ADD CONSTRAINT WorkOrderNEWPK
      PRIMARY KEY NONCLUSTERED (WorkOrderID, DueDate);
    GO

    CREATE CLUSTERED INDEX ix_WorkOrderNEW_DueDate
      ON dbo.WorkOrderNEW (DueDate);

The following adds the mandatory constraint:

    ALTER TABLE dbo.WorkOrderNEW
      ADD CONSTRAINT WONewPT
      CHECK (DueDate BETWEEN '8/1/2004' AND '8/31/2004');

Now import the new data from AdventureWorks, reusing the January 2004 data:

    INSERT dbo.WorkOrderNEW (ProductID, OrderQty, StockedQty, ScrappedQty,
        StartDate, EndDate, DueDate, ScrapReasonID, ModifiedDate)
      SELECT ProductID, OrderQty, StockedQty, ScrappedQty,
          DATEADD(mm, 7, StartDate), DATEADD(mm, 7, EndDate),
          DATEADD(mm, 7, DueDate), ScrapReasonID,
          DATEADD(mm, 7, ModifiedDate)
        FROM AdventureWorks.Production.WorkOrder
        WHERE DueDate BETWEEN '1/1/2004' AND '1/31/2004';

The new table now has 3,158 rows.

Prepping the partition table

The original partition table, built earlier in this section, has a non-partitioned, non-clustered primary key.
Because one of the rules of switching into a partitioned table is that every index must be partitioned, the first task for this example is to drop and rebuild the WorkOrder table’s primary key so it will be partitioned:

    ALTER TABLE dbo.WorkOrder
      DROP CONSTRAINT WorkOrderPK;

    ALTER TABLE dbo.WorkOrder
      ADD CONSTRAINT WorkOrderPK
      PRIMARY KEY NONCLUSTERED (WorkOrderID, DueDate)
      ON psYearsAll(DueDate);

Next, the partition table needs an empty partition:

    ALTER PARTITION SCHEME psYearsFiles
      NEXT USED [Primary];

    ALTER PARTITION FUNCTION pfYearsRT()
      SPLIT RANGE ('8/1/2004');

Performing the switch

The ALTER TABLE ... SWITCH TO command will move the new table into a specific partition. To determine the empty target partition, select the database Summary page ➪ Disk Usage report:

    ALTER TABLE WorkOrderNEW
      SWITCH TO WorkOrder PARTITION 5;

Switching out

The same technology can be used to switch a partition out of the partition table so that it becomes a stand-alone table. Because no merger is taking place, this is much easier than switching in. The following code takes the first partition out of the WorkOrder partition table and reconfigures the database metadata so it becomes its own table:

    ALTER TABLE WorkOrder
      SWITCH PARTITION 1 TO WorkOrderArchive;

Rolling partitions

With a little imagination, the technology to create and merge existing partitions can be used to create rolling partition designs. Rolling partitions are useful for time-based partition functions such as partitioning a year of data into months. Each month, the rolling partition expands for a new month.

To build a 13-month rolling partition, perform these steps each month:

1. Add a new boundary.
2. Point the boundary to the next used filegroup.
3. Merge the oldest two partitions to keep all the data.
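The monthly steps above can be sketched in T-SQL as follows. This is only an illustration under stated assumptions: the names pfMonthsDemo and psMonthsDemo, the boundary dates, and the choice of the [PRIMARY] filegroup are all hypothetical and not from the text. In the actual commands, the next used filegroup has to be designated before the split that adds the new boundary.

```sql
-- Hypothetical rolling-partition objects; names and dates are assumptions.
CREATE PARTITION FUNCTION pfMonthsDemo (DATETIME)
  AS RANGE RIGHT FOR VALUES ('20040601', '20040701', '20040801');
GO
CREATE PARTITION SCHEME psMonthsDemo
  AS PARTITION pfMonthsDemo
  ALL TO ([PRIMARY]);
GO

-- Monthly maintenance for the month beginning 9/1/2004:
-- designate the filegroup that will hold the new partition...
ALTER PARTITION SCHEME psMonthsDemo
  NEXT USED [PRIMARY];
GO
-- ...add the new month's boundary...
ALTER PARTITION FUNCTION pfMonthsDemo()
  SPLIT RANGE ('20040901');
GO
-- ...and remove the oldest boundary, merging the two oldest partitions
-- so that no data is lost.
ALTER PARTITION FUNCTION pfMonthsDemo()
  MERGE RANGE ('20040601');
GO

-- Clean up the demo objects.
DROP PARTITION SCHEME psMonthsDemo;
DROP PARTITION FUNCTION pfMonthsDemo;
GO
```

In a production job the boundary dates would normally be computed from the current date rather than hard-coded, and a design that spreads months over several filegroups would name a different filegroup in the NEXT USED step.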
Switching tables into and out of partitions can enhance rolling partition designs by switching in fully populated staging tables and switching out the oldest partitions as tables in an archive location.

Indexing partitioned tables

Large tables mean large indexes, so non-clustered indexes can be optionally partitioned.

Creating partitioned indexes

Partitioned non-clustered indexes must include the column used by the partition function in the index, and must be created using the same ON clause as the partitioned clustered index:

    CREATE INDEX WorkOrder_ProductID
      ON WorkOrder (ProductID, DueDate)
      ON psYearsFiles(DueDate);

Maintaining partitioned indexes

One of the advantages of partitioned indexes is that they can be individually maintained. The following example rebuilds the newly added fifth partition:

    ALTER INDEX WorkOrder_ProductID
      ON dbo.WorkOrder
      REBUILD PARTITION = 5;

Removing partitioning

To remove the partitioning of any table, drop the clustered index and add a new clustered index without the partitioning ON clause. When dropping the clustered index, you must add the MOVE TO option to actually consolidate the data onto the specified filegroup, thus removing the partitioning from the table:

    DROP INDEX ix_WorkOrder_DueDate
      ON dbo.WorkOrder
      WITH (MOVE TO [Primary]);

Data-Driven Partitioning

The third method doesn’t involve any Microsoft partitioning technology. Instead, it’s an architectural pattern that I’ve used in large, heavy transaction databases. It’s rather simple, but very fast.

A data-driven partitioning scheme segments the data into different servers based on a partition key. Each server has the same database schema, but stores only the data for its assigned partition key values or ranges. For example, server A could hold accounts 1–999. Server B could hold accounts 1,000–1,999. Server C could hold all accounts greater than or equal to 2,000.
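As a rough sketch of how these account ranges might be recorded and looked up, the following creates a small mapping table and resolves one account number to its server. Every name here (dbo.PartitionMap, its columns, and the server names) is hypothetical and chosen only for illustration; the text does not prescribe a particular table design.

```sql
-- Hypothetical mapping table; all names are assumptions for illustration.
CREATE TABLE dbo.PartitionMap (
  FromAccount INT NOT NULL PRIMARY KEY,
  ToAccount   INT NOT NULL,
  ServerName  SYSNAME NOT NULL,
  CHECK (FromAccount <= ToAccount)
);
GO

-- The three ranges from the example; the open-ended range for
-- server C is closed with the largest INT value.
INSERT dbo.PartitionMap (FromAccount, ToAccount, ServerName)
  VALUES (1, 999, N'ServerA');
INSERT dbo.PartitionMap (FromAccount, ToAccount, ServerName)
  VALUES (1000, 1999, N'ServerB');
INSERT dbo.PartitionMap (FromAccount, ToAccount, ServerName)
  VALUES (2000, 2147483647, N'ServerC');
GO

-- Resolve the server that holds a given account.
DECLARE @AccountID INT;
SET @AccountID = 1500;

SELECT ServerName
  FROM dbo.PartitionMap
  WHERE @AccountID BETWEEN FromAccount AND ToAccount;
```

For account 1500 the lookup returns ServerB. In practice the application tier would read this table once, cache it, and perform the range lookup in memory rather than querying it on every database access.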
A partition mapping table stores the server name for each partition key value or range of values. In the previous example, the partition mapping table would hold the from and to account numbers and the server name. The middle tier reads and caches the partition mapping table, and for every database access it checks the partition mapping table to determine which server holds the needed data.

This method works best when the data is self-contained and the complete query can be solved using only the subset of data. If the servers need to do much cross-server querying to solve the queries, then the benefits are likely lost.

What’s nice about data-driven partitioning is that it’s very easy to scale out. Adding another server only requires moving some data and updating the partition mapping table.

Summary

Not every database will have to scale to higher magnitudes of capacity, but when a project does grow into the terabytes, SQL Server 2008 provides some advanced technologies to tackle the growth. However, even these advanced technologies are no substitute for Smart Database Design.

Key points on partitioning include the following:

■ Partitioned views use a UNION ALL to merge data from several user-created base tables. Each partition table must include the partition key and a constraint.
■ The Query Processor can carefully choose the minimum number of underlying tables when selecting through a partitioned view, but not when updating.
■ Distributed partitioned views add distributed queries to combine data from multiple servers.
■ Partitioned tables are a completely different technology than partitioned views and use a partition function, scheme, and clustered index to partition a single table.
■ Data-driven partitioning is an architectural pattern that involves custom coding, but it delivers the best possible scale-out performance and flexibility.
The next chapter wraps up this part covering optimization with a new feature for SQL Server 2008 Enterprise Edition that’s getting quite a bit of buzz.