Joe Celko's SQL for Smarties - Advanced SQL Programming P13 pptx


CHAPTER 2: NORMALIZATION
2.11 Key Types

Consider this actual problem, which appeared on CompuServe’s ORACLE forum some years ago. A pharmaceutical company has an inventory table and a price-changes table that look like this:

    CREATE TABLE Drugs
    (drug_nbr INTEGER NOT NULL PRIMARY KEY,
     drug_name CHAR(30) NOT NULL,
     drug_qty INTEGER NOT NULL
       CONSTRAINT positive_quantity
       CHECK(drug_qty >= 0));

    CREATE TABLE Prices
    (drug_nbr INTEGER NOT NULL,
     start_date DATE NOT NULL,
     end_date DATE NOT NULL
       CONSTRAINT started_before_ended
       CHECK(start_date <= end_date),
     price DECIMAL(8,2) NOT NULL,
     PRIMARY KEY (drug_nbr, start_date));

Every order has to use the order date to find what the selling price was when the order was placed. The current price will have an end_date of “eternity” (a dummy date set so high that it will not be reached, such as ‘9999-12-31’). The (end_date + INTERVAL '1' DAY) of one price will be equal to the start_date of the next price for the same drug.

While this is normalized, performance was bad. Every report, invoice, or query will have a JOIN between Drugs and Prices. The trick might be to add more columns to the Drugs table, like this:

    CREATE TABLE Drugs
    (drug_nbr INTEGER PRIMARY KEY,
     drug_name CHAR(30) NOT NULL,
     drug_qty INTEGER NOT NULL
       CONSTRAINT positive_quantity
       CHECK(drug_qty >= 0),
     current_start_date DATE NOT NULL,
     current_end_date DATE NOT NULL,
     CONSTRAINT current_start_before_ended
       CHECK(current_start_date <= current_end_date),
     current_price DECIMAL(8,2) NOT NULL,
     prior_start_date DATE NOT NULL,
     prior_end_date DATE NOT NULL,
     CONSTRAINT prior_start_before_ended
       CHECK(prior_start_date <= prior_end_date
             AND current_start_date = prior_end_date + INTERVAL '1' DAY),
     prior_price DECIMAL(8,2) NOT NULL);

This covered more than 95% of the orders in the actual company, because very few orders have more than two price changes before they are taken out of stock. The odd exception was trapped by a procedural routine.
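The order-date lookup against the normalized Drugs and Prices tables can be sketched as follows. This is only an illustration, run through Python's standard sqlite3 module. The helper name price_on, the sample drug, and the sample prices are assumptions, not part of the original problem. SQLite has no DATE type or INTERVAL arithmetic, so dates are stored as ISO-8601 strings, which compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Drugs
(drug_nbr INTEGER NOT NULL PRIMARY KEY,
 drug_name CHAR(30) NOT NULL,
 drug_qty INTEGER NOT NULL
   CONSTRAINT positive_quantity CHECK (drug_qty >= 0));

CREATE TABLE Prices
(drug_nbr INTEGER NOT NULL,
 start_date DATE NOT NULL,
 end_date DATE NOT NULL,
 price DECIMAL(8,2) NOT NULL,
 CONSTRAINT started_before_ended CHECK (start_date <= end_date),
 PRIMARY KEY (drug_nbr, start_date));

-- Hypothetical sample data: one drug with one price change.
INSERT INTO Drugs VALUES (1, 'Aspirin', 100);
INSERT INTO Prices VALUES (1, '2001-01-01', '2001-06-30', 4.50);
INSERT INTO Prices VALUES (1, '2001-07-01', '9999-12-31', 5.25);
""")


def price_on(conn, drug_nbr, order_date):
    """Return the price in effect on order_date, or None if no period covers it."""
    row = conn.execute(
        """SELECT P.price
             FROM Drugs AS D
                  JOIN Prices AS P
                  ON P.drug_nbr = D.drug_nbr
            WHERE D.drug_nbr = ?
              AND ? BETWEEN P.start_date AND P.end_date""",
        (drug_nbr, order_date)).fetchone()
    return None if row is None else row[0]


print(price_on(conn, 1, "2001-03-15"))  # price during the first period
print(price_on(conn, 1, "2024-01-01"))  # current price, open-ended period
```

Every query that needs a price pays for this JOIN and date-range test, which is the cost the denormalized table tries to avoid.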
The other method is to add CHECK() constraints that will enforce the rules destroyed by denormalization. We will discuss this later, but the overhead for insertion, updating, and deleting to the table is huge. In fact, in many cases denormalized tables cannot be changed until a complete set of columns is built outside the table. Furthermore, while one set of queries is improved, all others are damaged.

Today, however, only data warehouses should be denormalized. JOINs are far cheaper than they were, and the overhead of handling exceptions with procedural code is far greater than any extra database overhead.

2.11.5 Row Sorting

On May 27, 2001, Fred Block posted a problem on the SQL Server Newsgroup. I will change the problem slightly, but the idea was that he had a table with five character string columns that had to be sorted alphabetically within each row. This “flatten table” denormalization is a very common one that might involve months of the year as columns, or other things that are acting as repeating groups in violation of 1NF.

Let’s declare the table and dive into the problem:

    CREATE TABLE Foobar
    (key_col INTEGER NOT NULL PRIMARY KEY,
     c1 VARCHAR(20) NOT NULL,
     c2 VARCHAR(20) NOT NULL,
     c3 VARCHAR(20) NOT NULL,
     c4 VARCHAR(20) NOT NULL,
     c5 VARCHAR(20) NOT NULL);

This means that we want this condition to hold:

    CHECK ((c1 <= c2) AND (c2 <= c3)
           AND (c3 <= c4) AND (c4 <= c5))

Obviously, if he had added this constraint to the table in the first place, we would be fine. Of course, that would have pushed the problem to the front end, and I would not have a topic for this section.

What was interesting was how everyone who read this newsgroup posting immediately envisioned a stored procedure that would take the five values, sort them and return them to their original row in the table. The only way to make this approach work for the whole table was to write an update cursor and loop through all the rows of the table.
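For comparison with the set-oriented solutions, the row-at-a-time approach that everyone envisioned can be sketched in a few lines. This is an illustrative stand-in for the update cursor, written in Python with the standard sqlite3 module. The function name sort_rows_procedurally and the sample rows are assumptions. Python's sorted() orders strings by code point, which may not match the collation of a given SQL product.

```python
import sqlite3


def sort_rows_procedurally(conn):
    """Fetch every row, sort its five values in the host language, write it back."""
    rows = conn.execute(
        "SELECT key_col, c1, c2, c3, c4, c5 FROM Foobar").fetchall()
    with conn:  # commit all of the row updates as one transaction
        for key_col, *values in rows:
            conn.execute(
                "UPDATE Foobar "
                "SET c1 = ?, c2 = ?, c3 = ?, c4 = ?, c5 = ? "
                "WHERE key_col = ?",
                (*sorted(values), key_col))


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Foobar
    (key_col INTEGER NOT NULL PRIMARY KEY,
     c1 VARCHAR(20) NOT NULL,
     c2 VARCHAR(20) NOT NULL,
     c3 VARCHAR(20) NOT NULL,
     c4 VARCHAR(20) NOT NULL,
     c5 VARCHAR(20) NOT NULL)""")
# Hypothetical sample rows, deliberately out of order.
conn.executemany(
    "INSERT INTO Foobar VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "echo", "delta", "charlie", "bravo", "alpha"),
     (2, "bravo", "alpha", "echo", "delta", "charlie")])

sort_rows_procedurally(conn)
procedural_result = conn.execute(
    "SELECT c1, c2, c3, c4, c5 FROM Foobar ORDER BY key_col").fetchall()
print(procedural_result)
```

The loop issues one UPDATE per row, so the cost grows with the size of the table.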
Itzik Ben-Gan posted a simple procedure that loaded the values into a temporary table, then pulled them out in sorted order, starting with the minimum value, using a loop. Another trick is the Bose-Nelson sort, which I wrote about in Dr. Dobb’s Journal back in 1985 (“Bose-Nelson Sort,” Dr. Dobb’s Journal, September 1985, pp. 282-296). This sort is a recursive procedure that takes an integer and then generates swap pairs for a vector of that size. A swap pair is a pair of position numbers from 1 to (n) in the vector that need to be exchanged if they are out of order. These swap pairs are also related to Sorting Networks in the literature (see Donald Knuth, Art of Computer Programming, Volume 3: Sorting and Searching, 2nd Edition, April 24, 1998, ISBN: 0-201-89685-0).

You are probably thinking that this method is a bit weak, because the results are only good for sorting a fixed number of items. But a table only has a fixed number of columns, so that is not a problem in denormalized SQL.

You can set up a sorting network that will sort five items, with the minimal number of exchanges, nine swaps, like this:

    Swap(c1, c2);
    Swap(c4, c5);
    Swap(c3, c5);
    Swap(c3, c4);
    Swap(c1, c4);
    Swap(c1, c3);
    Swap(c2, c5);
    Swap(c2, c4);
    Swap(c2, c3);

You might want to deal yourself a hand of five playing cards in one suit to see how it works. Put the cards face down on the table and pick up the pairs, swapping them if required, then turn over the row to see that it is in sorted order when you are done.

In theory, the minimum number of swaps needed to sort (n) items is CEILING (log2 (n!)), and as (n) increases, this approaches O(n*log2(n)). Computer science majors will remember this “Big O” expression as the expected performance of the best sorting algorithms, such as Quicksort. The Bose-Nelson method is very good for small values of (n). If (n < 9) then it is perfect, actually. But as things get bigger, Bose-Nelson approaches O(n ^ 1.585).
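The recursive generation of swap pairs, and the playing-card check, can be tried out with a short Python sketch. It follows the commonly circulated recursive formulation of Bose-Nelson, taking a start position and a run length. The names bose_nelson, apply_network, and sorts_everything are hypothetical, and this is an illustration rather than the author's own listing.

```python
from itertools import permutations


def bose_nelson(n):
    """Return the Bose-Nelson swap pairs (1-based positions) for n items."""
    pairs = []

    def merge(i, x, j, y):
        # Merge the sorted run of length x starting at position i
        # with the sorted run of length y starting at position j.
        if x == 1 and y == 1:
            pairs.append((i, j))
        elif x == 1 and y == 2:
            pairs.append((i, j + 1))
            pairs.append((i, j))
        elif x == 2 and y == 1:
            pairs.append((i, j))
            pairs.append((i + 1, j))
        else:
            a = x // 2
            b = y // 2 if x % 2 else (y + 1) // 2
            merge(i, a, j, b)
            merge(i + a, x - a, j + b, y - b)
            merge(i + a, x - a, j, b)

    def sort(i, m):
        # Sort the m positions starting at i: sort each half, then merge.
        if m > 1:
            a = m // 2
            sort(i, a)
            sort(i + a, m - a)
            merge(i, a, i + a, m - a)

    sort(1, n)
    return pairs


def apply_network(pairs, values):
    """Apply each swap pair in order, exchanging only when out of order."""
    v = list(values)
    for lo, hi in pairs:
        if v[lo - 1] > v[hi - 1]:
            v[lo - 1], v[hi - 1] = v[hi - 1], v[lo - 1]
    return v


def sorts_everything(n):
    """Check the generated network against every permutation of n items."""
    pairs = bose_nelson(n)
    return all(apply_network(pairs, p) == sorted(p)
               for p in permutations(range(n)))


print(bose_nelson(5))
print(sorts_everything(5))
```

For five items the generator yields the same nine pairs, in the same order, as the network listed above. Checking all 120 permutations is the exhaustive version of dealing the five cards.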
In plain English, the Bose-Nelson method is good for a fixed-size list of 16 or fewer items, but it goes to Hell after that.

You can write a version of the Bose-Nelson procedure that will output the SQL code for a given value of (n). The obvious direct way to do a Swap() is to write a chain of UPDATE statements. Remember that in SQL, the SET clause assignments happen in parallel, so you can easily write a SET clause that exchanges the two items when they are out of order. Using the above swap chain, we get this block of code:

    BEGIN ATOMIC
    -- Swap(c1, c2);
    UPDATE Foobar
       SET c1 = c2, c2 = c1
     WHERE c1 > c2;
    -- Swap(c4, c5);
    UPDATE Foobar
       SET c4 = c5, c5 = c4
     WHERE c4 > c5;
    -- Swap(c3, c5);
    UPDATE Foobar
       SET c3 = c5, c5 = c3
     WHERE c3 > c5;
    -- Swap(c3, c4);
    UPDATE Foobar
       SET c3 = c4, c4 = c3
     WHERE c3 > c4;
    -- Swap(c1, c4);
    UPDATE Foobar
       SET c1 = c4, c4 = c1
     WHERE c1 > c4;
    -- Swap(c1, c3);
    UPDATE Foobar
       SET c1 = c3, c3 = c1
     WHERE c1 > c3;
    -- Swap(c2, c5);
    UPDATE Foobar
       SET c2 = c5, c5 = c2
     WHERE c2 > c5;
    -- Swap(c2, c4);
    UPDATE Foobar
       SET c2 = c4, c4 = c2
     WHERE c2 > c4;
    -- Swap(c2, c3);
    UPDATE Foobar
       SET c2 = c3, c3 = c2
     WHERE c2 > c3;
    END;

This is fully portable, Standard SQL code, and it can be machine-generated. But that parallelism is useful. It is worthwhile to combine some of the UPDATE statements. But you have to be careful not to change the effective sequence of the swap operations.

If you look at the first two UPDATE statements, you can see that they do not overlap. This means you could roll them into one statement like this:

    -- Swap(c1, c2) AND Swap(c4, c5);
    UPDATE Foobar
       SET c1 = CASE WHEN c1 <= c2 THEN c1 ELSE c2 END,
           c2 = CASE WHEN c1 <= c2 THEN c2 ELSE c1 END,
           c4 = CASE WHEN c4 <= c5 THEN c4 ELSE c5 END,
           c5 = CASE WHEN c4 <= c5 THEN c5 ELSE c4 END
     WHERE c4 > c5 OR c1 > c2;

The advantage of doing this is that you have to execute only one UPDATE statement, not two.
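Machine-generating the UPDATE chain from the swap pairs is mostly string formatting. The sketch below builds one statement per pair and runs the chain against an in-memory SQLite database through Python's sqlite3 module. The names SWAP_PAIRS, swap_statement, and sort_rows and the sample rows are assumptions made for the illustration. SQLite is used only because it ships with Python. It evaluates the right-hand sides of a SET clause before assigning, so the exchange works there as described.

```python
import sqlite3

# The nine swap pairs from the five-item network, as column positions.
SWAP_PAIRS = [(1, 2), (4, 5), (3, 5), (3, 4), (1, 4),
              (1, 3), (2, 5), (2, 4), (2, 3)]


def swap_statement(table, lo, hi):
    """Build the UPDATE that exchanges columns c<lo> and c<hi> when out of order."""
    a, b = f"c{lo}", f"c{hi}"
    return f"UPDATE {table} SET {a} = {b}, {b} = {a} WHERE {a} > {b};"


def sort_rows(conn, table="Foobar"):
    """Run the whole swap chain inside a single transaction."""
    with conn:
        for lo, hi in SWAP_PAIRS:
            conn.execute(swap_statement(table, lo, hi))


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Foobar
    (key_col INTEGER NOT NULL PRIMARY KEY,
     c1 VARCHAR(20) NOT NULL,
     c2 VARCHAR(20) NOT NULL,
     c3 VARCHAR(20) NOT NULL,
     c4 VARCHAR(20) NOT NULL,
     c5 VARCHAR(20) NOT NULL)""")
# Hypothetical sample rows, including one with duplicate values.
conn.executemany(
    "INSERT INTO Foobar VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "e", "d", "c", "b", "a"),
     (2, "b", "a", "d", "c", "e"),
     (3, "a", "a", "c", "b", "b")])

sort_rows(conn)
sorted_rows = conn.execute(
    "SELECT c1, c2, c3, c4, c5 FROM Foobar ORDER BY key_col").fetchall()
print(swap_statement("Foobar", 1, 2))
print(sorted_rows)
```

Each statement touches only the rows whose pair is out of order, so nine set-oriented passes sort the whole table regardless of how many rows it has.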
Updating a table, even on nonkey columns, usually locks the table and prevents other users from getting to the data. If you could roll the statements into one single UPDATE, you would have the best of all possible worlds, but I doubt that the code would be easy to read.

We can see this same pattern in the pair of statements:

    Swap(c1, c3);
    Swap(c2, c5);

But there are other patterns, so you can write general templates for them. Consider this one:

    Swap(x, y);
    Swap(x, z);

Write out all possible triplets and apply these two operations on them, thus:

    (x, y, z) => (x, y, z)
    (x, z, y) => (x, z, y)
    (y, x, z) => (x, y, z)
    (y, z, x) => (x, z, y)
    (z, x, y) => (x, y, z)
    (z, y, x) => (x, y, z)

The result of this pattern is that x is the lowest of the three values, and y and z either stay in the same relative position to each other or are sorted properly. Properly sorting them would have the advantage of saving exchanges later and also of reducing the size of the subset being operated upon by each UPDATE statement. With a little thought, we can write the following symmetric piece of code.

    -- Swap(x, y) AND Swap(x, z);
    UPDATE Foobar
       SET x = CASE WHEN x BETWEEN y AND z THEN y
                    WHEN z BETWEEN y AND x THEN y
                    WHEN y BETWEEN z AND x THEN z
                    WHEN x BETWEEN z AND y THEN z
                    ELSE x END,
           y = CASE WHEN x BETWEEN y AND z THEN x
                    WHEN x BETWEEN z AND y THEN x
                    WHEN z BETWEEN x AND y THEN z
                    WHEN z BETWEEN y AND x THEN z
                    ELSE y END,
           z = CASE WHEN x BETWEEN z AND y THEN y
                    WHEN z BETWEEN x AND y THEN y
                    WHEN y BETWEEN z AND x THEN x
                    WHEN z BETWEEN y AND x THEN x
                    ELSE z END
     WHERE x > z OR x > y;

While it is very tempting to write more and more of these pattern templates, it might be more trouble than it is worth, because of the increased maintenance burden and reduced readability.

Here is an SQL/PSM program for the Bose-Nelson sort, based on the version given in Frederick Hegeman’s “Sorting Networks” article for The C/C++ User’s Journal (Hegeman 1993).
It assumes that you have a procedure called PRINT() for output to a text file. You can translate it into the programming language of your choice easily, as long as it supports recursion.

    CREATE PROCEDURE BoseSort (IN i INTEGER, IN j INTEGER)
    LANGUAGE SQL
    DETERMINISTIC
    BEGIN
    DECLARE m INTEGER;
    -- sort positions i..j: sort each half, then merge the halves
    IF j > i
    THEN SET m = i + (j-i+1)/2 - 1;
         CALL BoseSort(i, m);
         CALL BoseSort(m+1, j);
         CALL BoseMerge(i, m, m+1, j);
    END IF;
    END;

    CREATE PROCEDURE BoseMerge
    (IN i1 INTEGER, IN i2 INTEGER, IN j1 INTEGER, IN j2 INTEGER)
    LANGUAGE SQL
    DETERMINISTIC
    BEGIN
    DECLARE i_mid INTEGER;
    DECLARE j_mid INTEGER;
    -- merge sorted positions i1..i2 with sorted positions j1..j2
    IF i2 = i1 AND j2 = j1
    THEN CALL PRINT('swap', i1, j1);
    ELSE IF i2 = i1+1 AND j2 = j1
         THEN CALL PRINT('swap', i1, j1);
              CALL PRINT('swap', i2, j1);
         ELSE IF i2 = i1 AND j2 = j1+1
              THEN CALL PRINT('swap', i1, j2);
                   CALL PRINT('swap', i1, j1);
              ELSE SET i_mid = i1 + (i2-i1+1)/2 - 1;
                   -- split the second run; round up when the first run has even length
                   IF MOD((i2-i1+1), 2) = 0
                   THEN SET j_mid = j1 + (j2-j1+2)/2 - 1;
                   ELSE SET j_mid = j1 + (j2-j1+1)/2 - 1;
                   END IF;
                   CALL BoseMerge(i1, i_mid, j1, j_mid);
                   CALL BoseMerge(i_mid+1, i2, j_mid+1, j2);
                   CALL BoseMerge(i_mid+1, i2, j1, j_mid);
              END IF;
         END IF;
    END IF;
    END;

CHAPTER 3
Numeric Data in SQL

SQL is not a computational or procedural language; the arithmetic capability of SQL is weaker than that of any other language you have ever used. But there are some tricks that you need to know when working with numbers in SQL and when passing them to a host program. Much of the arithmetic and the functions are defined by implementations, so you should experiment with your particular product and make notes on the defaults, precision, and tools in the math library of your database.

You should also read Chapter 21, which deals with the related topic of aggregate functions. This chapter deals with the arithmetic that you would use across a row, instead of down a column; they are not quite the same.

3.1 Numeric Types

The SQL Standard has a wide range of numeric types.
The idea is that any host language can find an SQL numeric type that matches one of its own. You will also find some vendor extensions in the numeric data types, the most common of which is MONEY. This is really a DECIMAL or NUMERIC data type, which also accepts and displays currency symbols in input and output.

Date posted: 06/07/2014, 09:20
