Nielsen c17.tex V4 - 07/21/2009 12:57pm Page 412
Part III Beyond Relational

        JOIN Person.Person AS C
          ON C.BusinessEntityID = Emp.BusinessEntityID
        LEFT JOIN Person.Person AS M
          ON Emp.ManagerID = M.BusinessEntityID
      ORDER BY Lv, BusinessEntityID;

Result (abbreviated):

    Lv  BusinessEntityID  Name               JobTitle                      ManagerID  Manager
    1   263               Jean Trenary       Information Services Manager  1          Ken Sánchez
    2   264               Stephanie Conroy   Network Manager               263        Jean Trenary
    2   267               Karen Berg         Application Specialist        263        Jean Trenary
    2   268               Ramesh Meyyappan   Application Specialist        263        Jean Trenary
    2   269               Dan Bacon          Application Specialist        263        Jean Trenary
    2   270               François Ajenstat  Database Administrator        263        Jean Trenary
    2   271               Dan Wilson         Database Administrator        263        Jean Trenary
    2   272               Janaina Bueno      Application Specialist        263        Jean Trenary
    3   265               Ashvini Sharma     Network Administrator         264        Stephanie Conroy
    3   266               Peter Connelly     Network Administrator         264        Stephanie Conroy

A nice feature of the table-valued user-defined function is that it can be called from a CROSS APPLY (new in SQL Server 2005), which executes the function once for every row in the outer query. Here, the CROSS APPLY is used with the function to generate an extensive list of every report under every BusinessEntityID from HumanResources.Employee:

    -- using Cross Apply to report all nodes under everyone
    SELECT E.BusinessEntityID, OT.BusinessEntityID, OT.Lv
      FROM HumanResources.Employee AS E
        CROSS APPLY dbo.OrgTree(E.BusinessEntityID) AS OT;

The next query builds on the previous query, adding a GROUP BY to present a count of the number of reports under every manager.
Because the CROSS APPLY explodes out every manager with its complete subtree, the join produces not 290 rows but 1,308 before the GROUP BY collapses them to one row per manager:

    -- Count of All Reports
    SELECT E.BusinessEntityID,
        COUNT(OT.BusinessEntityID)-1 AS ReportCount
      FROM HumanResources.Employee E
        CROSS APPLY dbo.OrgTree(E.BusinessEntityID) OT
      GROUP BY E.BusinessEntityID
      HAVING COUNT(OT.BusinessEntityID) > 1
      ORDER BY COUNT(OT.BusinessEntityID) DESC;

Result (abbreviated; the columns shown are those of the ungrouped CROSS APPLY query):

    BusinessEntityID  BusinessEntityID  Lv
    1                 1                 1
    1                 2                 2
    1                 16                2
    1                 25                2
    1                 234               2

Traversing Hierarchies 17

Recursive CTE looking up the hierarchy

The previous adjacency list subtree queries all looked down the hierarchical tree. Searching up the tree returns the path from the node in question to the top of the hierarchy. For an organizational chart, it would return the chain of command from the current node to the top of the organizational chart. The technical term for this search is an ancestor search.

The queries to search up the hierarchy are similar to the downward-looking queries; only the direction of the join is modified. The following queries demonstrate the modification.
This query returns François the DBA's chain of command to the CEO using a recursive CTE:

    -- Adjacency list
    -- navigating up the tree
    -- Recursive CTE
    WITH OrgPathUp (BusinessEntityID, ManagerID, lv)
    AS (
      -- Anchor
      SELECT BusinessEntityID, ManagerID, 1
        FROM HumanResources.Employee
        WHERE BusinessEntityID = 270 -- François Ajenstat the DBA
      -- Recursive Call
      UNION ALL
      SELECT E.BusinessEntityID, E.ManagerID, lv + 1
        FROM HumanResources.Employee AS E
          JOIN OrgPathUp
            ON OrgPathUp.ManagerID = E.BusinessEntityID
      )
    SELECT Lv, Emp.BusinessEntityID,
        C.FirstName + ' ' + C.LastName AS [Name],
        Emp.JobTitle
      FROM HumanResources.Employee Emp
        JOIN OrgPathUp
          ON Emp.BusinessEntityID = OrgPathUp.BusinessEntityID
        JOIN Person.Person AS C
          ON C.BusinessEntityID = Emp.BusinessEntityID
        LEFT JOIN Person.Person AS M
          ON Emp.ManagerID = M.BusinessEntityID
      ORDER BY Lv DESC, BusinessEntityID
      OPTION (MAXRECURSION 20);

Result:

    Lv  BusinessEntityID  Name               JobTitle
    3   1                 Ken Sánchez        Chief Executive Officer
    2   263               Jean Trenary       Information Services Manager
    1   270               François Ajenstat  Database Administrator

Searching up the hierarchy with a user-defined function

Modifying the recursive CTE to search up the hierarchy to find the chain of command, instead of searching down the hierarchy to find the subtree, was as simple as changing the join criteria. The same is true for a user-defined function. The next function searches up the hierarchy and is called for François the DBA.
The modified join is the ON clause inside the loop, marked with a comment:

    -- Classic UDF
    CREATE FUNCTION dbo.OrgTreeUp (@BusinessEntityID INT)
    RETURNS @Tree TABLE (BusinessEntityID INT, ManagerID INT, Lv INT)
    AS
    BEGIN
      DECLARE @LC INT = 1
      -- insert the starting level (anchor node)
      INSERT @Tree (BusinessEntityID, ManagerID, Lv)
        SELECT BusinessEntityID, ManagerID, @LC
          FROM HumanResources.Employee AS E -- the employee
          WHERE BusinessEntityID = @BusinessEntityID
      -- Loop through each higher level
      WHILE @@RowCount > 0
        BEGIN
          SET @LC = @LC + 1
          -- insert the next level up (the managers of the current level)
          INSERT @Tree (BusinessEntityID, ManagerID, Lv)
            SELECT NextLevel.BusinessEntityID, NextLevel.ManagerID, @LC
              FROM HumanResources.Employee AS NextLevel
                JOIN @Tree AS CurrentLevel
                  ON NextLevel.BusinessEntityID = CurrentLevel.ManagerID -- modified join
              WHERE CurrentLevel.Lv = @LC - 1
        END
      RETURN
    END;
    go

    -- calling the Function
    -- chain of command up from François
    SELECT * FROM dbo.OrgTreeUp(270); -- François Ajenstat the DBA

Result:

    BusinessEntityID  ManagerID  Lv
    270               263        1
    263               1          2
    1                 NULL       3

Is the node an ancestor?

A common programming task when working with hierarchies is answering the question, "Is node A in node B's subtree?" In practical terms, it's asking, "Does François, the DBA, report to Jean Trenary, the IT manager?"

Answering that question from an adjacency list is somewhat complicated, but it can be done by leveraging the subtree work from the previous section.
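One way to read "leveraging the subtree work" is to expand the would-be ancestor's subtree and look for the node in it. The following is a minimal sketch, assuming the dbo.OrgTree function from the previous section returns a BusinessEntityID column for every node in the subtree; the alias IsInSubtree is a hypothetical name used only for this example:

```sql
-- Is node 270 (François Ajenstat) somewhere in the subtree under node 263 (Jean Trenary)?
-- Assumes dbo.OrgTree(@id) returns one row per node in the subtree of @id.
SELECT CASE
         WHEN EXISTS (SELECT 1
                        FROM dbo.OrgTree(263)
                        WHERE BusinessEntityID = 270)
         THEN 'True'
         ELSE 'False'
       END AS IsInSubtree;
```

This direction expands the entire subtree under node 263, whereas an upward search touches only the chain of command, so the upward search is usually the cheaper of the two.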
Answering the question "Does employee 270 report to node 263?" is the same question as "Is node 263 an ancestor of node 270?" Both questions can be expressed in SQL as, "Is the ancestor node in the current node's ancestor list?" The OrgTreeUp() user-defined function returns all the ancestors of a given node, so reusing this user-defined function is the simplest solution:

    SELECT 'True'
      WHERE 263 -- 263: Jean Trenary
        IN (SELECT BusinessEntityID
              FROM dbo.OrgTreeUp(270)); -- 270: François Ajenstat the DBA

Result:

    True

Determining the node's level

Because each node only knows about itself, there's no inherent way to determine the node's level without scanning up the hierarchy. Determining a node's level requires running either a recursive CTE or a user-defined function to navigate up the hierarchy and return the column representing the level. Once the level is returned by the recursive CTE or user-defined function, it's easy to update a column with the lv value.

Reparenting the adjacency list

As with any data, there are three types of modifications: inserts, updates, and deletes. With a hierarchy, inserting a node at the bottom of the hierarchy is trivial, but inserting into the middle of the hierarchy, updating a node to a different location in the hierarchy, or deleting a node in the middle of the hierarchy can be rather complex.

The term used to describe this issue is reparenting: assigning a new parent to a node or set of nodes. For example, in AdventureWorks2008, IT Manager Jean Trenary reports directly to CEO Ken Sánchez, but what if a reorganization positions the IT dept under Terri Duffy, VP of Engineering? How many of the nodes would need to be modified, and how would they be updated? That's the question of reparenting the hierarchy.

Because each node only knows about itself and its direct parent node, reparenting an adjacency list is trivial.
To move the IT dept under the VP of Engineering, simply update Jean Trenary's ManagerID value:

    UPDATE HumanResources.Employee
      SET ManagerID = 2 -- Terri Duffy, Vice President of Engineering
      WHERE BusinessEntityID = 263; -- Jean Trenary IT Manager

Deleting a node in the middle of the hierarchy is potentially more complex but is limited to modifying n nodes, where n is the number of nodes that have the node being deleted as a parent. Each node under the node to be deleted must be reassigned to another node. By default, that's probably the deleted node's manager (its ManagerID value).

Indexing an adjacency list

Indexing an adjacency list pattern hierarchy is rather straightforward. Create a non-clustered index on the column holding the parent node ID. The current node ID column is probably the primary key and the clustered index and so will be automatically included in the non-clustered index. The parent ID index will gather all the subtree values by parent ID and perform fast index seeks.

The following code was used to index the parent ID column when the adjacency list pattern was restored to AdventureWorks2008 at the beginning of this chapter:

    CREATE INDEX IxParentID
      ON HumanResources.Employee (ManagerID);

If the table is very wide (over 25 columns) and large (millions of rows), then a non-clustered index on the primary key and the parent ID will provide a narrow covering index for navigation up the hierarchy.

Cyclic errors

As mentioned earlier, every node in the adjacency list pattern knows only about itself and its parent node. Therefore, there's no SQL constraint that can possibly test for or prevent a cyclic error.
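Although no declarative constraint covers this case, code that performs the reparenting can check first. The following is a minimal sketch, not a complete solution: it assumes the dbo.OrgTreeUp function defined earlier, assumes the hierarchy is cycle-free before the check runs, and uses the hypothetical variable names @NodeID and @NewManagerID. The move is rejected if the node being moved is the proposed parent itself or one of the proposed parent's ancestors:

```sql
DECLARE @NodeID INT = 263;        -- hypothetical: node being moved (Jean Trenary)
DECLARE @NewManagerID INT = 270;  -- hypothetical: proposed new parent (François Ajenstat)

-- dbo.OrgTreeUp returns the proposed parent plus all of its ancestors.
-- If the node being moved appears in that list, the update would close a loop.
IF EXISTS (SELECT 1
             FROM dbo.OrgTreeUp(@NewManagerID)
             WHERE BusinessEntityID = @NodeID)
  BEGIN
    RAISERROR('Reparenting would create a cyclic error.', 16, 1);
  END
ELSE
  BEGIN
    UPDATE HumanResources.Employee
      SET ManagerID = @NewManagerID
      WHERE BusinessEntityID = @NodeID;
  END
```

With the values shown, node 263 is an ancestor of node 270, so the check raises the error instead of performing the update. The sketch does nothing about updates that bypass it or about concurrent changes.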
If someone plays an April Fools' Day joke on Jean Trenary and sets her ManagerID to, say, 270, so she reports to the DBA (gee, who might have permission to do that?), it would introduce a cyclic error into the hierarchy:

    UPDATE HumanResources.Employee
      SET ManagerID = 270 -- François Ajenstat the DBA
      WHERE BusinessEntityID = 263; -- Jean Trenary IT Manager

The cyclic error will cause the OrgTree function to loop from Jean to François to Jean to François forever, or until the query is stopped. Go ahead and try it:

    SELECT 'True'
      WHERE 270 -- François Ajenstat the DBA
        IN (SELECT BusinessEntityID
              FROM dbo.OrgTree(263));

Now set it back to avoid errors in the next section:

    UPDATE HumanResources.Employee
      SET ManagerID = 1 -- the CEO
      WHERE BusinessEntityID = 263; -- Jean Trenary IT Manager

To locate cyclic errors in the hierarchy, a stored procedure or function must navigate both up and down the subtrees of the node in question and use code to detect and report an out-of-place duplication. Download the latest code to check for cyclic errors from www.sqlserverbible.com.

Adjacency list variations

The basic adjacency list pattern is useful for situations that include only a one-parent-to-multiple-nodes relationship, which is not sufficient for most serious production database hierarchies. Fortunately, the basic data-pair pattern is easily modified to handle more complex hierarchies such as bills of materials, genealogies, and complex organizational charts.

Bills of materials/multiple cardinalities

When there's a many-to-many relationship between current nodes and parent nodes, an associative table is required, similar to how an associative table is used in any other many-to-many cardinality model.
For example, an order may include multiple products, and each product may be on multiple orders, so the order detail table serves as an associative table between the order and the product.

The same type of many-to-many problem commonly exists in manufacturing when designing schemas for bills of materials. For example, part a23 may be used in the manufacturing of multiple other parts, and part a23 itself might have been manufactured from still other parts. In this way, any part may be both a child and parent of multiple other parts.

To build a many-to-many hierarchical bill of materials, the bill of materials serves as the adjacency table between the current part(s) and the parent part(s), both of which are stored in the same Parts table, as shown in Figure 17-5.

The same pattern used to navigate a hierarchy works for a bill of materials system as well; it just requires working through the BillOfMaterials table. The following query is similar to the previous subtree recursive CTE. It finds all the parts used to create a given assembly. In manufacturing this is commonly called a parts explosion report.
In this instance, the query does a parts explosion for Product 777, Adventure Works' popular Mountain-100 bike in Black with a 44" frame:

    WITH PartsExplosion (ProductAssemblyID, ComponentID, lv, Qty)
    AS (
      -- Anchor
      SELECT ProductID, ProductID, 1, CAST(0 AS DECIMAL (8,2))
        FROM Production.Product
        WHERE ProductID = 777 -- Mountain-100 Black, 44
      -- Recursive Call
      UNION ALL
      SELECT BOM.ProductAssemblyID, BOM.ComponentID, lv + 1, PerAssemblyQty
        FROM PartsExplosion CTE
          JOIN (SELECT *
                  FROM Production.BillOfMaterials
                  WHERE EndDate IS NULL
               ) AS BOM
            ON CTE.ComponentID = BOM.ProductAssemblyID
      )
    SELECT lv, PA.NAME AS 'Assembly', PC.NAME AS 'Component',
        CAST(Qty AS INT) as Qty
      FROM PartsExplosion AS PE
        JOIN Production.Product AS PA
          ON PE.ProductAssemblyID = PA.ProductID
        JOIN Production.Product AS PC
          ON PE.ComponentID = PC.ProductID
      ORDER BY Lv, ComponentID ;

FIGURE 17-5 The bill of materials structure in AdventureWorks uses an adjacency table to store which parts (ComponentID) are used to manufacture which other parts (ProductAssemblyID).
The result is a complete list of all the parts required to make a mountain bike:

    lv  Assembly                Component                      Qty
    1   Mountain-100 Black, 44  Mountain-100 Black, 44         0
    2   Mountain-100 Black, 44  HL Mountain Seat Assembly      1
    2   Mountain-100 Black, 44  HL Mountain Frame - Black, 44  1
    2   Mountain-100 Black, 44  HL Headset                     1
    2   Mountain-100 Black, 44  HL Mountain Handlebars         1
    2   Mountain-100 Black, 44  HL Mountain Front Wheel        1
    2   Mountain-100 Black, 44  HL Mountain Rear Wheel         1
    ...
    4   Chain Stays             Metal Sheet 5                  1
    4   Handlebar Tube          Metal Sheet 6                  1
    4   BB Ball Bearing         Cup-Shaped Race                2
    4   BB Ball Bearing         Cone-Shaped Race               2
    4   HL Hub                  HL Spindle/Axle                1
    4   HL Hub                  HL Spindle/Axle                1
    4   HL Hub                  HL Shell                       1
    4   HL Hub                  HL Shell                       1
    4   HL Fork                 Steerer                        1
    5   Fork End                Metal Sheet 2                  1
    5   Blade                   Metal Sheet 5                  1
    5   Fork Crown              Metal Sheet 5                  1
    5   Steerer                 Metal Sheet 6                  1

Adjacency list pros and cons

The adjacency list pattern is common and well understood, with several points in its favor:

■ Reparenting is trivial.
■ It's easy to manually decode and understand.

On the con side, the adjacency list pattern has these concerns:

■ Consistency requires additional care and manual checking for cyclic errors.
■ Performance is reasonable, but not as fast as the materialized-path pattern or hierarchyID when retrieving a subtree. The adjacency list pattern is hindered by the need to build the hierarchy if you need to navigate or query related nodes in a hierarchy. This needs to be done iteratively using either a loop in a user-defined function or a recursive CTE. Returning data through a user-defined function also presents some overhead.

The Materialized-Path Pattern

The materialized-path pattern is another excellent method to store and navigate hierarchical data.
Basically, it stores a denormalized, comma-delimited representation of the list of the current node's complete ancestry, including every generation of parents from the top of the hierarchy down to the current node. A common materialized path is a file path:

    c:\Users\Pn\Documents\SQLServer2009Bible\AuthorReview\Submitted

François Ajenstat, the DBA, has a hierarchy chain of command that flows from Ken Sánchez (ID: 1) the CEO, to Jean Trenary (ID: 263) the IT Manager, and then down to François (ID: 270). Therefore, his materialized path would be as follows:

    1, 263, 270

The following scalar user-defined function generates the materialized path programmatically:

    CREATE FUNCTION dbo.MaterializedPath (@BusinessEntityID INT)
    RETURNS VARCHAR(200)
    AS
    BEGIN
      DECLARE @Path VARCHAR(200)
      SELECT @Path = ''
      -- Loop through Hierarchy
      WHILE @@RowCount > 0
        BEGIN
          SELECT @Path
            = ISNULL(RTRIM(
                CAST(@BusinessEntityID AS VARCHAR(10))) + ',', '')
              + @Path
          SELECT @BusinessEntityID = ManagerID
            FROM Humanresources.Employee
            WHERE BusinessEntityID = @BusinessEntityID
        END
      RETURN @Path
    END;

Executing the function for François Ajenstat (ID: 270) returns his materialized path:

    Select dbo.MaterializedPath(270) as MaterializedPath

Result:

    MaterializedPath
    1,263,270,

Because the materialized path is stored as a string, it may be indexed, searched, and manipulated as a string, which has its pros and cons. These are discussed later in the chapter.
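As a small illustration of treating the path as a string, the following sketch derives a node's level and a simple "is this ID in the path?" test from the function's output. It assumes the dbo.MaterializedPath function above and its trailing-comma format; the variable @Path and the column aliases are hypothetical names used only for this example:

```sql
DECLARE @Path VARCHAR(200) = dbo.MaterializedPath(270);  -- François Ajenstat

SELECT @Path AS MaterializedPath,
       -- every node ID in the path is followed by a comma,
       -- so counting commas counts the generations in the path
       LEN(@Path) - LEN(REPLACE(@Path, ',', '')) AS Lv,
       -- a leading comma is added so that ,263, matches only a whole ID,
       -- never a fragment of a longer ID such as 1263 or 2630
       CASE WHEN ',' + @Path LIKE '%,263,%'
            THEN 'True' ELSE 'False'
       END AS PathContains263;
```

For the path 1,263,270, shown above, the comma count is 3. Note that the LIKE test also matches when 263 is the node itself rather than one of its ancestors, so a strict ancestor check would need to exclude that case.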
Modifying AdventureWorks2008 for Materialized Path

The following script modifies AdventureWorks2008 and builds a materialized path using the previously added ManagerID data and the newly created MaterializedPath user-defined function:

    ALTER TABLE HumanResources.Employee
      ADD MaterializedPath VARCHAR(200);
    Go

    UPDATE HumanResources.Employee
      SET MaterializedPath = dbo.MaterializedPath(BusinessEntityID);

    CREATE INDEX IxMaterializedPath
      ON HumanResources.Employee (MaterializedPath);

    SELECT BusinessEntityID, ManagerID, MaterializedPath
      FROM HumanResources.Employee;

Result (abbreviated):

    BusinessEntityID  ManagerID  MaterializedPath
    1                 NULL       1,
    2                 1          1,2,
    3                 2          1,2,3,
    4                 3          1,2,3,4,
    5                 3          1,2,3,5,
    6                 3          1,2,3,6,
    7                 3          1,2,3,7,
    8                 7          1,2,3,7,8,
    9                 7          1,2,3,7,9,
    ...
    263               1          1,263,
    264               263        1,263,264,
    265               264        1,263,264,265,
    266               264        1,263,264,266,
    267               263        1,263,267,
    268               263        1,263,268,
    269               263        1,263,269,
    270               263        1,263,270,

The way the tasks build on each other for a materialized path is very different from the flow of tasks for an adjacency list; therefore, this section first explains subtree queries. The flow continues with ancestor checks and determining the level, which is required for single-level queries.

Enforcing the structure of the hierarchy (ensuring that every node actually has a parent) is a bit oblique when using the materialized-path method. However, one chap, Simon Sabin (SQL Server MVP in the U.K., all-around good guy, and technical editor for this chapter), has an ingenious method. Instead of explaining it here, I'll direct you to his excellent website:

    http://sqlblogcasts.com/blogs/simons/archive/2009/03/09/Enforcing-parent-child-relationship-with-Path-Hierarchy-model.aspx