Joe Celko's SQL for Smarties - Advanced SQL Programming P67


CHAPTER 28: TREES AND HIERARCHIES IN SQL

[Figure 28.2] [Figure 28.3] [Figure 28.4]

28.3 Nested Set Model of Hierarchies

Computer science majors will recognize this as a modified preorder tree traversal algorithm.

    CREATE TABLE NestTree
    (node CHAR(2) NOT NULL PRIMARY KEY,
     lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
     rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
     CONSTRAINT order_okay CHECK (lft < rgt));

NestTree
node  lft  rgt
==============
'A'     1   12
'B'     2    3
'C'     4   11
'D'     5    6
'E'     7    8
'F'     9   10

Another nice thing is that the name of each node appears once and only once in the table. The path enumeration and adjacency list models used lots of self-references to nodes, which made updating more complex.

28.3.1 The Counting Property

The lft and rgt numbers have a definite meaning and carry information about the location and nature of each subtree. The root is always (lft, rgt) = (1, 2 * (SELECT COUNT(*) FROM NestTree)), and leaf nodes always have (lft + 1 = rgt).

    SELECT node AS root
      FROM NestTree
     WHERE lft = 1;

    SELECT node AS leaf
      FROM NestTree
     WHERE lft = (rgt - 1);

Another very useful result of the counting property is that any node in the tree is the root of a subtree (the leaf nodes are a degenerate case) of size (rgt - lft + 1) / 2.

28.3.2 The Containment Property

In the nested set model table, all the descendants of a node can be found by looking for the nodes whose lft and rgt numbers lie between the lft and rgt values of that node.
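The two properties can be tried out on the sample rows above. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for your SQL product; the helper names make_db, descendants, and subtree_size are hypothetical and not from the text.

```python
import sqlite3

# Sample rows copied from the NestTree table in the text.
ROWS = [("A", 1, 12), ("B", 2, 3), ("C", 4, 11),
        ("D", 5, 6), ("E", 7, 8), ("F", 9, 10)]


def make_db():
    """Build an in-memory copy of NestTree (SQLite is an assumed stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE NestTree ("
        " node CHAR(2) NOT NULL PRIMARY KEY,"
        " lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),"
        " rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),"
        " CONSTRAINT order_okay CHECK (lft < rgt))"
    )
    conn.executemany("INSERT INTO NestTree VALUES (?, ?, ?)", ROWS)
    return conn


def descendants(conn, node):
    """Containment: nodes whose lft lies strictly inside the given node's range."""
    sql = """
        SELECT Sub.node
          FROM NestTree AS Sup, NestTree AS Sub
         WHERE Sup.node = ?
           AND Sub.lft > Sup.lft
           AND Sub.lft < Sup.rgt
         ORDER BY Sub.lft
    """
    return [row[0] for row in conn.execute(sql, (node,))]


def subtree_size(conn, node):
    """Counting: (rgt - lft + 1) / 2 nodes in the subtree, the node itself included."""
    sql = "SELECT (rgt - lft + 1) / 2 FROM NestTree WHERE node = ?"
    return conn.execute(sql, (node,)).fetchone()[0]


conn = make_db()
print(descendants(conn, "C"))   # ['D', 'E', 'F']
print(subtree_size(conn, "C"))  # 4
```

The two results agree: a subtree of size 4 rooted at 'C' is 'C' itself plus its three descendants. That agreement holds only while the numbering has no gaps, since the size formula relies on the counting property.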
For example, to find out all the subordinates of each boss in the corporate hierarchy, you would write:

    SELECT Superiors.node, ' is a boss of ', Subordinates.node
      FROM NestTree AS Superiors, NestTree AS Subordinates
     WHERE Subordinates.lft BETWEEN Superiors.lft AND Superiors.rgt;

This would tell you that everyone is also his own boss, so in some situations you would also add the predicate:

    AND Subordinates.lft <> Superiors.lft

This simple self-JOIN query is the basis for almost everything that follows in the nested set model. The containment property does not depend on the values of lft and rgt having no gaps, but the counting property does.

The level (or depth) of a node in a tree is the number of edges between the node and the root. The larger the depth number, the farther away the node is from the root. A path is a set of edges that directly connect two nodes.

The nested set model uses the fact that each containing set is "wider" (where width = (rgt - lft)) than the sets it contains. Obviously, the root will always be the widest row in the table. The level function is the number of edges between two given nodes; it is fairly easy to calculate. For example, to find the level of each subordinate node, you would use

    SELECT T2.node, (COUNT(T1.node) - 1) AS level
      FROM NestTree AS T1, NestTree AS T2
     WHERE T2.lft BETWEEN T1.lft AND T1.rgt
     GROUP BY T2.node;

The reason for using the expression (COUNT(T1.node) - 1) is to avoid counting the node itself, which the BETWEEN predicate always matches, because a tree starts at level zero. If you prefer to start at one, then drop the extra arithmetic.

28.3.3 Subordinates

The nested set model usually assumes that the subordinates are ranked by age, seniority, or in some other way from left to right among the immediate subordinates of a node. The adjacency list model has no concept of such rankings, so the following queries are not possible in it without extra columns to hold the rankings.
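To make the left-to-right ranking concrete, the sketch below lists the immediate subordinates of one node in lft order. It is an illustration under stated assumptions rather than a query from the text: SQLite via Python's sqlite3 module stands in for your SQL product, and the NOT EXISTS test for "no node sits between parent and child" and the helper name children are choices made here.

```python
import sqlite3

# Sample rows copied from the NestTree table in the text.
ROWS = [("A", 1, 12), ("B", 2, 3), ("C", 4, 11),
        ("D", 5, 6), ("E", 7, 8), ("F", 9, 10)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NestTree ("
    " node CHAR(2) NOT NULL PRIMARY KEY,"
    " lft INTEGER NOT NULL UNIQUE,"
    " rgt INTEGER NOT NULL UNIQUE)"
)
conn.executemany("INSERT INTO NestTree VALUES (?, ?, ?)", ROWS)

# C is an immediate child of P when P contains C and no other node M
# lies inside P while also containing C.
CHILDREN_SQL = """
    SELECT C.node
      FROM NestTree AS P, NestTree AS C
     WHERE P.node = ?
       AND C.lft > P.lft
       AND C.lft < P.rgt
       AND NOT EXISTS
           (SELECT *
              FROM NestTree AS M
             WHERE M.lft > P.lft
               AND M.lft < C.lft
               AND M.rgt > C.rgt)
     ORDER BY C.lft
"""


def children(node):
    """Immediate subordinates of `node`, most senior (leftmost) first."""
    return [row[0] for row in conn.execute(CHILDREN_SQL, (node,))]


print(children("A"))  # ['B', 'C']
print(children("C"))  # ['D', 'E', 'F']
```

The ordering comes for free from the lft column. An adjacency list table would need an extra ranking column to return the same siblings in a defined order.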
The most senior subordinate is found by this query:

    SELECT Subordinates.node, ' is the oldest child of ', :my_node
      FROM NestTree AS Superiors, NestTree AS Subordinates
     WHERE Superiors.node = :my_node
       AND Subordinates.lft - 1 = Superiors.lft; -- leftmost child

Most junior subordinate:

    SELECT Subordinates.node, ' is the youngest child of ', :my_node
      FROM NestTree AS Superiors, NestTree AS Subordinates
     WHERE Superiors.node = :my_node
       AND Subordinates.rgt = Superiors.rgt - 1; -- rightmost child

To convert a nested set model into an adjacency list model with the immediate subordinates, use this query in a VIEW.

    CREATE VIEW AdjTree (parent, child)
    AS
    SELECT B.node, E.node
      FROM NestTree AS E
           LEFT OUTER JOIN
           NestTree AS B
           ON B.lft = (SELECT MAX(lft)
                         FROM NestTree AS S
                        WHERE E.lft > S.lft
                          AND E.lft < S.rgt);

28.3.4 Hierarchical Aggregations

To print the tree as an indented listing, find the level of each node. Technically, you should declare a cursor to go with the ORDER BY clause.

    SELECT COUNT(T2.node) AS indentation, T1.node
      FROM NestTree AS T1, NestTree AS T2
     WHERE T1.lft BETWEEN T2.lft AND T2.rgt
     GROUP BY T1.lft, T1.node
     ORDER BY T1.lft;

This same pattern of grouping will also work with other aggregate functions. Let's assume a second table contains the weight of each of the nodes in the NestTree. A simple hierarchical total of the weights by subtree is a join of the two tables, with NestTree appearing twice.

    SELECT Superiors.node, SUM(W.weight) AS subtree_weight
      FROM NestTree AS Superiors, NestTree AS Subordinates,
           NodeWeights AS W
     WHERE Subordinates.lft BETWEEN Superiors.lft AND Superiors.rgt
       AND W.node = Subordinates.node
     GROUP BY Superiors.node;

28.3.5 Deleting Nodes and Subtrees

Another interesting property of this representation is that the subtrees must fill from left to right. In other tree representations, it is possible for a parent node to have a right child and no left child. This lets you assign some significance to being the leftmost child of a parent.
For example, the node in this position might be the next in line for promotion in a corporate hierarchy.

Deleting a single node in the middle of the tree is conceptually harder than removing whole subtrees. When you remove a node in the middle of the tree, you have to decide how to fill the hole. There are two ways. The first method is to promote one of the children to the original node's position: Dad dies and the oldest son takes over the business. The second method is to connect the children to the parent of the original node: Mom dies and Grandma adopts the kids. This is the default action in a nested set model because of the containment property; the deletion will destroy the counting property, however.

If you wish to close multiple gaps, you can do this by renumbering the nodes, thus:

    UPDATE NestTree
       SET lft = (SELECT COUNT(*)
                    FROM (SELECT lft FROM NestTree
                          UNION ALL
                          SELECT rgt FROM NestTree) AS LftRgt (seq_nbr)
                   WHERE seq_nbr <= lft),
           rgt = (SELECT COUNT(*)
                    FROM (SELECT lft FROM NestTree
                          UNION ALL
                          SELECT rgt FROM NestTree) AS LftRgt (seq_nbr)
                   WHERE seq_nbr <= rgt);

If the derived table LftRgt is a bit slow, you can use a temporary table and index it or use a VIEW that will be materialized.

    CREATE VIEW LftRgt (seq_nbr)
    AS SELECT lft FROM NestTree
       UNION
       SELECT rgt FROM NestTree;

28.3.6 Converting Adjacency List to Nested Set Model

It would be fairly easy to load an adjacency list model table into a host language program, then use a recursive preorder tree traversal program from a college freshman data structures textbook to build the nested set model. Here is a version with an explicit stack in SQL/PSM.
    -- Tree holds the adjacency model
    CREATE TABLE Tree
    (node CHAR(10) NOT NULL,
     parent CHAR(10));

    -- Stack starts empty, will hold the nested set model
    CREATE TABLE Stack
    (stack_top INTEGER NOT NULL,
     node CHAR(10) NOT NULL,
     lft INTEGER,
     rgt INTEGER);

    BEGIN ATOMIC
    DECLARE counter INTEGER;
    DECLARE max_counter INTEGER;
    DECLARE current_top INTEGER;

    SET counter = 2;
    SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
    SET current_top = 1;

    -- clear the stack
    DELETE FROM Stack;

    -- push the root
    INSERT INTO Stack
    SELECT 1, node, 1, max_counter
      FROM Tree
     WHERE parent IS NULL;

    -- delete rows from tree as they are used
    DELETE FROM Tree WHERE parent IS NULL;

    WHILE counter <= max_counter - 1
    DO IF EXISTS (SELECT *
                    FROM Stack AS S1, Tree AS T1
                   WHERE S1.node = T1.parent
                     AND S1.stack_top = current_top)
       THEN
         -- push when top has subordinates and set lft value
         INSERT INTO Stack
         SELECT (current_top + 1), MIN(T1.node), counter, CAST(NULL AS INTEGER)
           FROM Stack AS S1, Tree AS T1
          WHERE S1.node = T1.parent
            AND S1.stack_top = current_top;

         -- delete rows from tree as they are used
         DELETE FROM Tree
          WHERE node = (SELECT node
                          FROM Stack
                         WHERE stack_top = current_top + 1);

         -- housekeeping of stack pointers and counter
         SET counter = counter + 1;
         SET current_top = current_top + 1;
       ELSE
         -- pop the stack and set rgt value
         UPDATE Stack
            SET rgt = counter,
                stack_top = -stack_top -- pops the stack
          WHERE stack_top = current_top;
         SET counter = counter + 1;
         SET current_top = current_top - 1;
       END IF;
    END WHILE;
    END;

    -- the top column is not needed in the final answer
    SELECT node, lft, rgt FROM Stack;

This is not the fastest way to do a conversion, but since conversions are probably not going to be frequent tasks, it might be good enough when translated into your SQL product's procedural language.
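The host-language approach mentioned at the start of this section can be sketched as a short recursive preorder traversal. This is an illustrative version in Python, not the book's code: the function name adjacency_to_nested_set and the (node, parent) input format are assumptions, and children are visited in sorted order to mirror the MIN(T1.node) choice in the SQL/PSM version.

```python
from collections import defaultdict


def adjacency_to_nested_set(edges):
    """Convert (node, parent) pairs into {node: (lft, rgt)}.

    The root is the single pair whose parent is None.
    """
    children = defaultdict(list)
    root = None
    for node, parent in edges:
        if parent is None:
            root = node
        else:
            children[parent].append(node)

    result = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter            # number the node on the way down
        counter += 1
        for child in sorted(children[node]):
            visit(child)
        result[node] = (lft, counter)  # and again on the way back up
        counter += 1

    visit(root)
    return result


# The sample tree from section 28.3, written as an adjacency list.
EDGES = [("A", None), ("B", "A"), ("C", "A"),
         ("D", "C"), ("E", "C"), ("F", "C")]

nested = adjacency_to_nested_set(EDGES)
print(nested["A"])  # (1, 12)
print(nested["C"])  # (4, 11)
```

Run on the sample tree, the sketch reproduces the lft and rgt values of the NestTree table. Python's default recursion limit makes this form unsuitable for very deep trees, which is one reason to prefer an explicit stack as the SQL/PSM version does.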
28.4 Other Models for Trees and Hierarchies

Other models for trees are discussed in a separate book, but these three methods represent the major families of models. You can also use specialized models for specialized trees, such as binary trees. The real point is that you can use SQL for hierarchical structures, but you have to pick the right model for your task. I would classify the choices as:

1. Frequent node changes and infrequent structure changes. Example: organizational charts where personnel come and go, but the organization stays much the same.

2. Infrequent node changes with frequent structure changes. Example: a message board where the e-mails are the nodes that never change, and the structure is simply extended with each new e-mail.

3. Infrequent node changes and infrequent structure changes. Example: historical data in a data warehouse that has a categorical hierarchy in place as a dimension.

4. Both frequent node changes and frequent structure changes. Example: a mapping system that attempts to find the best path from a central dispatch to the currently most critical node through a tree that is also changing. Let's make that a bit clearer with the concrete example of getting a fire truck from the engine house to the worst fire in its service area, based on the traffic at the time.

I am not going to pick any particular tree model for any of those situations. The answer, once again, is "Well, it all depends . . ."

CHAPTER 29

Temporal Queries

Temporal data is the hardest type of data for people to handle conceptually. Perhaps time is difficult because it is dynamic and all other data types are static, or perhaps it is because time allows multiple parallel events. This is an old puzzle that still catches people. If a hen and a half can lay an egg and a half in a day and a half, then how many hens does it take to lay six eggs in six days?
Do not look at the rest of the page; try to answer the question in your head. The answer is a hen and a half, although you might want to round that up to two hens in the real world. People tend to get tripped up on the rate (eggs per hen per day) because they handle time incorrectly. For example, if a cookbook has a recipe that serves one, and you want to serve 100 guests, you multiply the amount of ingredients by 100, but you do not cook it 100 times longer.

The algebra in this problem looks like this, where we want to solve for the rate in terms of "eggs per day," a strange but convenient unit of measurement for summarizing the hen house output:

    1 1/2 hens * 1 1/2 days * rate = 1 1/2 eggs

The first urge is to multiply both sides by 2/3 in an attempt to turn every 1 1/2 into a 1. But what you actually get is:
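The stated answer to the hen puzzle can be checked by solving the rate equation directly. The symbols $r$ (the rate, in eggs per hen per day) and $h$ (the number of hens) are shorthand introduced for this sketch, not the book's notation.

```latex
\begin{align*}
\tfrac{3}{2}\,\text{hens} \times \tfrac{3}{2}\,\text{days} \times r
  &= \tfrac{3}{2}\,\text{eggs}
  &&\Longrightarrow\quad
  r = \frac{3/2}{(3/2)(3/2)} = \tfrac{2}{3}\ \text{eggs per hen per day},\\
h \times 6\,\text{days} \times \tfrac{2}{3}
  &= 6\,\text{eggs}
  &&\Longrightarrow\quad
  h = \frac{6}{6 \cdot 2/3} = \tfrac{3}{2}\ \text{hens}.
\end{align*}
```

Scaling the eggs and the days by the same factor leaves the number of hens unchanged, which is why the answer is still a hen and a half.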

Date posted: 06/07/2014, 09:20
