Joe Celko's SQL for Smarties - Advanced SQL Programming P60

562 CHAPTER 24: REGIONS, RUNS, GAPS, SEQUENCES, AND SERIES

or:

    SELECT MIN(L1.seq_nbr + 1)
      FROM (SELECT seq_nbr FROM List
            UNION ALL
            VALUES (0)) AS L1(seq_nbr)
     WHERE (L1.seq_nbr + 1)
           NOT IN (SELECT seq_nbr FROM List);

Finding entire gaps follows from this pattern, and we get this short piece of code.

    SELECT (s + 1) AS gap_start,
           (e - 1) AS gap_end
      FROM (SELECT L1.seq_nbr, MIN(L2.seq_nbr)
              FROM List AS L1, List AS L2
             WHERE L1.seq_nbr < L2.seq_nbr
             GROUP BY L1.seq_nbr) AS G(s, e)
     WHERE (e - 1) > s;

Without the derived table we get:

    SELECT (L1.seq_nbr + 1) AS gap_start,
           (MIN(L2.seq_nbr) - 1) AS gap_end
      FROM List AS L1, List AS L2
     WHERE L1.seq_nbr < L2.seq_nbr
     GROUP BY L1.seq_nbr
    HAVING (MIN(L2.seq_nbr) - L1.seq_nbr) > 1;

24.6 Summation of a Series

While this topic is a bit more mathematical than most SQL programmers actually have to use in their work, it does demonstrate the power of SQL combined with a little knowledge of basic college math.

The summation of a series builds a running total of the values in a table and shows the cumulative total for each value in the series. Let’s create a table and some sample data.

    CREATE TABLE Series
    (seq_nbr INTEGER NOT NULL PRIMARY KEY,
     val INTEGER NOT NULL,
     answer INTEGER -- null means not computed yet
    );

    Series
    seq_nbr   val   answer
    ======================
    1          6       6
    2         41      47
    3         12      59
    4         51     110
    5         21     131
    6         70     201
    7         79     280
    8         62     342

This simple summation is not a problem.

    UPDATE Series
       SET answer
           = (SELECT SUM(S1.val)
                FROM Series AS S1
               WHERE S1.seq_nbr <= Series.seq_nbr)
     WHERE answer IS NULL;

This is the form we can use for most problems of this type with only one level of summation. But things can be worse. This problem came from Francisco Moreno, and on the surface it sounds easy. First, create the usual table and populate it.
    DROP TABLE Series;

    CREATE TABLE Series
    (seq_nbr INTEGER NOT NULL,
     val REAL NOT NULL,
     answer REAL);

    INSERT INTO Series
    VALUES (0, 6.0, NULL), (1, 6.0, NULL), (2, 10.0, NULL),
           (3, 12.0, NULL), (4, 14.0, NULL);

The goal is to compute the average of the first two terms, then add the third value to the result and average the two of them, and so forth. In this data, we would have:

    seq_nbr   val    answer
    =======================
    0          6.0   NULL
    1          6.0    6.0
    2         10.0    8.0
    3         12.0   10.0
    4         14.0   12.0

The first thing we need to do is get rid of the value where (seq_nbr = 0) by adding it to the value where (seq_nbr = 1), which gives 6.0 + 6.0 = 12.0, and change the table to read:

    seq_nbr   val    answer
    =======================
    1         12.0   NULL
    2         10.0   NULL
    3         12.0   NULL
    4         14.0   NULL

The obvious approach is to do the calculations directly.

    UPDATE Series
       SET answer
           = (Series.val
              + (SELECT S1.answer
                   FROM Series AS S1
                  WHERE S1.seq_nbr = Series.seq_nbr - 1)) / 2.0
     WHERE answer IS NULL;

But there is a problem with this approach. It will only calculate one value at a time. The reason is that this series is much more complex than a simple running total. What we have is actually a double summation, in which the terms are defined by a continued fraction. Let’s work out the first four answers by brute force and see if we can find a pattern.

    answer1 = (12)/2 = 6
    answer2 = ((12)/2 + 10)/2 = 8
    answer3 = (((12)/2 + 10)/2 + 12)/2 = 10
    answer4 = ((((12)/2 + 10)/2 + 12)/2 + 14)/2 = 12

The real trick is to do some algebra and get rid of the nested parentheses.
    answer1 = (12/2) = 6
    answer2 = (12/4) + (10/2) = 8
    answer3 = (12/8) + (10/4) + (12/2) = 10
    answer4 = (12/16) + (10/8) + (12/4) + (14/2) = 12

When we see powers of 2, we know we can replace them with a formula:

    answer1 = (12/(2^1)) = 6
    answer2 = (12/(2^2)) + (10/(2^1)) = 8
    answer3 = (12/(2^3)) + (10/(2^2)) + (12/(2^1)) = 10
    answer4 = (12/(2^4)) + (10/(2^3)) + (12/(2^2)) + (14/(2^1)) = 12

The problem is that you need to “count backwards” from the current row to compute higher powers for the previous terms of the summation. The exponent is simply (current seq_nbr - previous seq_nbr + 1), and each term is divided by that power of 2. Putting it all together, we get this expression:

    UPDATE Series
       SET answer
           = (SELECT SUM(S1.val
                         / POWER(2.0,
                                 CASE WHEN S1.seq_nbr > 0
                                      THEN S2.seq_nbr - S1.seq_nbr + 1
                                      ELSE NULL END))
                FROM Series AS S1, Series AS S2
               WHERE S2.seq_nbr = Series.seq_nbr
                 AND S1.seq_nbr <= S2.seq_nbr);

This assumes that we have a POWER(base, exponent) function in our implementation. The reason for the second copy of Series under the name S2 in the SUM() expression is that an aggregate function cannot have an outer reference.

24.7 Swapping and Sliding Values in a List

You will often want to manipulate a list of values, changing their sequence position numbers. The simplest such operation is to swap two values in your table.
    CREATE PROCEDURE SwapValues
    (IN low_seq_nbr INTEGER, IN high_seq_nbr INTEGER)
    LANGUAGE SQL
    BEGIN
    DECLARE lo INTEGER;
    DECLARE hi INTEGER;
    -- put them in order; local variables keep the second CASE
    -- from reading a parameter that was already overwritten
    SET lo = CASE WHEN low_seq_nbr <= high_seq_nbr
                  THEN low_seq_nbr ELSE high_seq_nbr END;
    SET hi = CASE WHEN low_seq_nbr <= high_seq_nbr
                  THEN high_seq_nbr ELSE low_seq_nbr END;
    UPDATE Runs -- swap
       SET seq_nbr = lo + ABS(seq_nbr - hi)
     WHERE seq_nbr IN (lo, hi);
    END;

Inserting a new value into the table is easy:

    CREATE PROCEDURE InsertValue (IN new_value INTEGER)
    LANGUAGE SQL
    INSERT INTO Runs (seq_nbr, val)
    VALUES ((SELECT MAX(seq_nbr) FROM Runs) + 1, new_value);

A slightly trickier procedure is to move one value to a new position and slide the remaining values either up or down. This mimics the way a physical queue would act. Here is a solution from Dave Portas.

    CREATE PROCEDURE SlideValues
    (IN old_seq_nbr INTEGER, IN new_seq_nbr INTEGER)
    LANGUAGE SQL
    UPDATE Runs
       SET seq_nbr
           = CASE
             WHEN seq_nbr = old_seq_nbr
             THEN new_seq_nbr
             WHEN seq_nbr BETWEEN old_seq_nbr AND new_seq_nbr
             THEN seq_nbr - 1
             WHEN seq_nbr BETWEEN new_seq_nbr AND old_seq_nbr
             THEN seq_nbr + 1
             ELSE seq_nbr END
     WHERE seq_nbr BETWEEN old_seq_nbr AND new_seq_nbr
        OR seq_nbr BETWEEN new_seq_nbr AND old_seq_nbr;

This handles moving a value to a higher or to a lower position in the table. You can see how calls or slight changes to these procedures could do other related operations.

One of the most useful tricks is to have a calendar table with a Julianized date column. Instead of trying to manipulate temporal data, convert the dates to a sequence of integers and treat the queries as regions, runs, gaps, and so forth. The sequence can be made up of calendar days or Julianized business days, which do not include holidays and weekends. There are a lot of possible methods.
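The renumbering done by the SlideValues CASE expression can be traced outside the database. What follows is only a minimal Python sketch, not code from the book: the names `slide` and `ordered` are hypothetical, and a dictionary keyed on seq_nbr stands in for the Runs table.

```python
def slide(runs, old_seq_nbr, new_seq_nbr):
    """Return a new {seq_nbr: val} mapping after moving one row.

    Mirrors the CASE expression in SlideValues: the moved row takes
    new_seq_nbr, and the rows between the two positions shift by one.
    The dict is an assumed stand-in for the Runs table.
    """
    result = {}
    for seq_nbr, val in runs.items():
        if seq_nbr == old_seq_nbr:
            target = new_seq_nbr
        elif old_seq_nbr < seq_nbr <= new_seq_nbr:
            target = seq_nbr - 1  # moving up: rows in between slide down
        elif new_seq_nbr <= seq_nbr < old_seq_nbr:
            target = seq_nbr + 1  # moving down: rows in between slide up
        else:
            target = seq_nbr      # rows outside the range are untouched
        result[target] = val
    return result


def ordered(runs):
    """List the values in seq_nbr order, for easy inspection."""
    return [runs[seq_nbr] for seq_nbr in sorted(runs)]


runs = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
moved_up = ordered(slide(runs, 2, 4))
moved_down = ordered(slide(runs, 4, 2))
```

With five rows a through e, moving position 2 to position 4 leaves the values in the order a, c, d, b, e, and moving position 4 to position 2 gives a, d, b, c, e. Building a new mapping rather than updating in place imitates the set-at-a-time behavior of the UPDATE, where every row sees the old sequence numbers.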
24.8 Condensing a List of Numbers

The goal is to take a list of numbers and condense them into contiguous ranges. Show the high and low values for each range; if the range has one number, then the high and low values will be the same. This answer is due to Steve Kass.

    SELECT MIN(i) AS low, MAX(i) AS high
      FROM (SELECT N1.i, COUNT(N2.i) - N1.i
              FROM Numbers AS N1, Numbers AS N2
             WHERE N2.i <= N1.i
             GROUP BY N1.i) AS N(i, gp)
     GROUP BY gp;

24.9 Folding a List of Numbers

It is possible to use the Sequence table and a little math, instead of self-joins, to assign values to related columns in the same row. For example, given the numbers 1 to (n), you might want to spread them out across (k) columns. Let (k = 3) so we can see the pattern.

    SELECT seq_nbr,
           CASE WHEN MOD((seq_nbr + 1), 3) = 2
                 AND (seq_nbr + 1) <= :n
                THEN (seq_nbr + 1)
                ELSE NULL END AS second,
           CASE WHEN MOD((seq_nbr + 2), 3) = 0
                 AND (seq_nbr + 2) <= :n
                THEN (seq_nbr + 2)
                ELSE NULL END AS third
      FROM Sequence
     WHERE MOD((seq_nbr + 3), 3) = 1
       AND seq_nbr <= :n;

Columns which have no value assigned to them will get a NULL. That is, for (n = 8) the incomplete row will be (7, 8, NULL) and for (n = 7) it would be (7, NULL, NULL). We never get a row with (NULL, NULL, NULL).

Using math can be fancier. In a golf tournament, the players with the lowest and highest scores are matched together for the next round. Then the players with the second lowest and second highest scores are matched together, and so forth. If the number of players is odd, the player with the middle score sits out that round. These pairs can be built with a simple query.

    SELECT seq_nbr AS low_score,
           CASE WHEN seq_nbr <= (:n - seq_nbr)
                THEN (:n - seq_nbr) + 1
                ELSE NULL END AS high_score
      FROM Sequence AS S1
     WHERE S1.seq_nbr
           <= CASE WHEN MOD(:n, 2) = 1
                   THEN FLOOR(:n/2) + 1
                   ELSE (:n/2) END;

If you play around with the basic math functions, you can do quite a bit.
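The arithmetic behind the condensing, folding, and golf-pairing queries is easy to check by hand in a few lines of Python. This is a sketch for experimentation only; the function names `condense`, `fold`, and `golf_pairs` are hypothetical, and plain lists and `None` stand in for the Numbers and Sequence tables and for NULL.

```python
def condense(numbers):
    """Collapse integers into (low, high) ranges of contiguous values.

    Uses the same grouping idea as the condensing query: the rank of a
    value minus the value itself is constant within a contiguous run.
    """
    ranges = {}
    for rank, i in enumerate(sorted(set(numbers)), start=1):
        gp = rank - i  # plays the role of COUNT(N2.i) - N1.i
        low, high = ranges.get(gp, (i, i))
        ranges[gp] = (min(low, i), max(high, i))
    return sorted(ranges.values())


def fold(n, k=3):
    """Spread 1..n across k columns; unfilled cells become None."""
    return [tuple(start + j if start + j <= n else None for j in range(k))
            for start in range(1, n + 1, k)]


def golf_pairs(n):
    """Pair rank i with rank n - i + 1; a middle rank has no partner."""
    pairs = []
    for low in range(1, (n + 1) // 2 + 1):
        high = n - low + 1 if low <= n - low else None
        pairs.append((low, high))
    return pairs


condensed = condense([1, 2, 3, 5, 7, 8])
folded = fold(8)
paired = golf_pairs(5)
```

Here `condense([1, 2, 3, 5, 7, 8])` yields the ranges (1, 3), (5, 5), and (7, 8); `fold(8)` ends with the incomplete row (7, 8, None); and `golf_pairs(5)` pairs 1 with 5 and 2 with 4 while rank 3 sits out.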
24.10 Coverings

Mikito Harakiri proposed the problem of writing the shortest SQL query that would return a minimal cover of a set of intervals. For example, given this table, how do you find the contiguous numbers that are completely covered by the given intervals?

    CREATE TABLE Intervals
    (x INTEGER NOT NULL,
     y INTEGER NOT NULL,
     CHECK (x <= y),
     PRIMARY KEY (x, y));

    INSERT INTO Intervals VALUES (1, 3);
    INSERT INTO Intervals VALUES (2, 5);
    INSERT INTO Intervals VALUES (4, 11);
    INSERT INTO Intervals VALUES (10, 12);
    INSERT INTO Intervals VALUES (20, 21);
    INSERT INTO Intervals VALUES (120, 130);
    INSERT INTO Intervals VALUES (120, 128);
    INSERT INTO Intervals VALUES (120, 122);
    INSERT INTO Intervals VALUES (121, 132);
    INSERT INTO Intervals VALUES (121, 122);
    INSERT INTO Intervals VALUES (121, 124);
    INSERT INTO Intervals VALUES (121, 123);
    INSERT INTO Intervals VALUES (126, 127);

The query should return

    Results
    min_x   MAX(y)
    ================
    1        12
    20       21
    120     132

Dieter Nöth found an answer with OLAP functions:

    SELECT min_x, MAX(y)
      FROM (SELECT x, y,
                   MAX(CASE WHEN x <= max_y
                            THEN NULL ELSE x END)
                   OVER (ORDER BY x, y
                         ROWS UNBOUNDED PRECEDING) AS min_x
              FROM (SELECT x, y,
                           MAX(y)
                           OVER (ORDER BY x, y
                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND 1 PRECEDING) AS max_y
                      FROM Intervals) AS DT) AS DT
     GROUP BY min_x;

Here is a query that takes the same approach with a self-join and a three-level nested correlated subquery.

    SELECT I1.x, MAX(I2.y) AS y
      FROM Intervals AS I1
           INNER JOIN
           Intervals AS I2
           ON I2.y > I1.x
     WHERE NOT EXISTS
           (SELECT *
              FROM Intervals AS I3
             WHERE I1.x - 1 BETWEEN I3.x AND I3.y)
       AND NOT EXISTS
           (SELECT *
              FROM Intervals AS I4
             WHERE I4.y > I1.x
               AND I4.y < I2.y
               AND NOT EXISTS
                   (SELECT *
                      FROM Intervals AS I5
                     WHERE I4.y + 1 BETWEEN I5.x AND I5.y))
     GROUP BY I1.x;

This is essentially the same format, but converted to use left anti-semi-joins instead of subqueries.
I do not think it is shorter, but it might execute better on some platforms, and some people prefer this format to subqueries.

    SELECT I1.x, MAX(I2.y) AS y
      FROM Intervals AS I1
           INNER JOIN
           Intervals AS I2
           ON I2.y > I1.x
           LEFT OUTER JOIN
           Intervals AS I3
           ON I1.x - 1 BETWEEN I3.x AND I3.y
           LEFT OUTER JOIN
           (Intervals AS I4
            LEFT OUTER JOIN
            Intervals AS I5
            ON I4.y + 1 BETWEEN I5.x AND I5.y)
           ON I4.y > I1.x
              AND I4.y < I2.y
              AND I5.x IS NULL
     WHERE I3.x IS NULL
       AND I4.x IS NULL
     GROUP BY I1.x;

If the table is large, the correlated subqueries (version 1) or the quintuple self-join (version 2) will probably make it slow. But we were asked for a short query, not for a quick one.

Tony Andrews came up with this answer.

    SELECT Starts.x, Ends.y
      FROM (SELECT x, ROW_NUMBER() OVER (ORDER BY x) AS rn
              FROM (SELECT x, y,
                           LAG(y) OVER (ORDER BY x) AS prev_y
                      FROM Intervals)
             WHERE prev_y IS NULL
                OR prev_y < x) AS Starts,
           (SELECT y, ROW_NUMBER() OVER (ORDER BY y) AS rn
              FROM (SELECT x, y,
                           LEAD(x) OVER (ORDER BY y) AS next_x
                      FROM Intervals)
             WHERE next_x IS NULL
                OR y < next_x) AS Ends
     WHERE Starts.rn = Ends.rn;

John Gilson decided that using recursion is an interesting take on this problem and made this offering:

    WITH RECURSIVE Cover (x, y, n)
    AS (SELECT x, y, (SELECT COUNT(*) FROM Intervals)
          FROM Intervals
        UNION ALL
        SELECT CASE WHEN I3.x <= I.x THEN I3.x ELSE I.x END,
               CASE WHEN I3.y >= I.y THEN I3.y ELSE I.y END,
               I3.n - 1
          FROM Intervals AS I, Cover AS C
         WHERE I.x <= I3.y
           AND I.y >= I3.x
           AND (I.x <> I3.x OR I.y <> I3.y)
    ...
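The covering problem of section 24.10 can also be stated procedurally, which makes the expected Results table easy to verify. The following Python sketch is not one of the submitted answers: the name `minimal_cover` is hypothetical, and it assumes the same rule as the OLAP version, namely that an interval starts a new range only when its x is greater than every y seen so far.

```python
def minimal_cover(intervals):
    """Merge overlapping (x, y) intervals into a minimal cover.

    Assumption: an interval extends the current range when its x is
    less than or equal to the largest y seen so far, otherwise it
    starts a new range (the x <= max_y test in the OLAP query).
    """
    cover = []
    for x, y in sorted(intervals):
        if cover and x <= cover[-1][1]:
            # overlaps the current range, so stretch its upper bound
            cover[-1][1] = max(cover[-1][1], y)
        else:
            cover.append([x, y])
    return [tuple(r) for r in cover]


intervals = [(1, 3), (2, 5), (4, 11), (10, 12), (20, 21),
             (120, 130), (120, 128), (120, 122), (121, 132),
             (121, 122), (121, 124), (121, 123), (126, 127)]
result = minimal_cover(intervals)
```

For the sample rows inserted into Intervals, this produces (1, 12), (20, 21), and (120, 132), which matches the Results table. Sorting first means a single pass is enough, since only the running maximum of y has to be remembered.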
