CHAPTER 9: HEURISTICS

CREATE TABLE Orders
(order_nbr INTEGER NOT NULL,
 ...);

CREATE TABLE OrderDetails
(order_nbr INTEGER NOT NULL
   REFERENCES Orders (order_nbr)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 sku CHAR(10) NOT NULL
   REFERENCES Inventory (sku)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 description CHAR(20) NOT NULL,
 qty INTEGER NOT NULL
   CHECK (qty > 0),
 unit_price DECIMAL(12,4) NOT NULL,
 ...);

9.1 Put the Specification into a Clear Statement

This might sound obvious, but the operative word is “clear.” You need to ask questions at the start. Let me give some examples from actual problem statements having to do with a schema that models a typical orders and order details database:

1. “I want to see the most expensive item in each order.” How do I handle ties for the most expensive item? Did you mean the highest unit price or the highest extension (quantity × unit price) on each order?

2. “I want to see how many lawn gnomes everyone ordered.” How do I represent someone who never ordered a lawn gnome in the result set? Is that a NULL or a zero? If they returned all of their lawn gnomes, do I show the original order or the net result? Or do I show someone who never ordered as a NULL and someone who returned everything as a zero, to preserve information?

3. “How many orders were over $100?” Did you mean strictly greater than $100 or greater than or equal to $100?

In the “Dance Partner” example, we need to ask:

1. How do we pair the couples?

2. What do we do if there are more boys than girls (or vice versa) in the table?

3. Can someone have more than one partner? If so, how do we assign them?

Writing specs is actually harder than writing code. Given a complete, clear specification, the code can almost write itself.

9.2 Add the Words “Set of All…” in Front of the Nouns

The big leap in SQL programming is thinking in sets and not in process steps that handle one unit of data at a time.
Phrases like “for each x” poison your mental model of the problem. Look for set characteristics and not for individual characteristics. For example, given the task to find all of the orders that ordered exactly the same number of each item, how would you solve it?

One approach is, for each order, to see if there are two values of quantity that are not equal to each other and then reject that order. This leads to either cursors or a self-join. Here is a self-join version; I will not do the cursor version.

SELECT DISTINCT D1.order_nbr
  FROM OrderDetails AS D1
 WHERE NOT EXISTS
       (SELECT *
          FROM OrderDetails AS D2
         WHERE D1.order_nbr = D2.order_nbr
           AND D1.qty <> D2.qty);

Or you can look at each order as a set with these set properties:

SELECT order_nbr
  FROM OrderDetails
 GROUP BY order_nbr
HAVING MIN(qty) = MAX(qty);

9.3 Remove Active Verbs from the Problem Statement

Words like traverse, compute, or other verbs that imply a process will poison your mental model. Try to phrase it as a “state of being” description instead. This is the same idea as in section 9.2, but with a slight twist. Programmers coming from procedural languages think in terms of actions. They add numbers, whereas a declarative programmer looks at a total. They think of process, whereas we think of completed results.

9.4 You Can Still Use Stubs

A famous Sidney Harris cartoon shows the phrase “Then a miracle occurs” in the middle of a blackboard full of equations, and a scientist says to the writer, “I think you should be more explicit here in step 2.”

We used that same trick in procedural programming languages by putting in a stub module when we did not know what to do at that point in a program. For example, if you were writing a payroll program and the company had a complex bonus policy that you did not understand or have specifications for, you would write a stub procedure that always returned a constant value and perhaps sent out a message that it had just executed.
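As a rough illustration of that kind of stub, here is a sketch in SQL/PSM-style syntax. The function name BonusAmount, its parameter, and the constant it returns are all hypothetical, the message-sending part is left out, and the exact CREATE FUNCTION syntax varies from one SQL product to another.

```sql
-- Hypothetical stub: stands in for a bonus policy that has not been specified yet.
-- It ignores its argument and always returns the same constant.
CREATE FUNCTION BonusAmount (in_emp_id INTEGER)
RETURNS DECIMAL(12,4)
LANGUAGE SQL
DETERMINISTIC
RETURN 0.0000;
```

The callers can be written and tested against the constant, and the body is swapped out once the real policy is known.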
This allowed you to continue with the parts of the procedure that you did understand. This is more difficult to do in a declarative language. Procedural language modules can be loosely coupled, whereas the clauses and subqueries of a SELECT statement are a single unit of code. You could set up a “test harness” for procedural language modules; this is more difficult in SQL.

Looking at the “Dance Partner Problem,” I might approach it by saying that I need the boys and the girls in two separate subsets, but I don’t know how to write the code for that yet. So I stub it with some pseudocode in my text editor. Because this is for dance, let’s pick the pseudocode words from a musical. Nobody is going to see this scratch paper work, so why not?

SELECT M1.name AS male, F1.name AS female
  FROM (<miracle for guys>) AS M1 (name, <join thingie for guys>)
       FULL OUTER JOIN
       (<miracle for dolls>) AS F1 (name, <join thingie for dolls>)
       ON M1.<join thingie for guys> ?? F1.<join thingie for dolls>;

The angle-bracketed pseudocode might expand to multiple columns, subqueries, or just about anything later. Right now they are placemarkers. I also have a “??” placemarker for the relationship between my guys and dolls. I can then go to the next level in the nesting and expand the (<miracle for guys>) subquery like this:

(SELECT P1.name, <join thingie for guys>
   FROM People AS P1
  WHERE P1.gender = 1) AS M1 (name, <join thingie for guys>)

The same pattern would hold for the (<miracle for dolls>) subquery. I now need to figure out some way of getting code for <join thingie for guys>. The first place I look is the columns that appear in the People table. The only thing I can find in that table is gender. I have a rule that tells me guys = 1 and dolls = 2, and I am enforcing it in my subqueries already. (Note: The full ISO sex codes are 0 = unknown, 1 = male, 2 = female, and 9 = lawful persons, corporations, etc.)
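The queries in this section assume a People table that the text does not show. A minimal sketch of what it might look like follows; the column sizes, the key on name, and the CHECK constraint are assumptions made for illustration, with the code values taken from the note above.

```sql
-- Hypothetical declaration of the People table used by the Dance Partner queries.
CREATE TABLE People
(name VARCHAR(35) NOT NULL PRIMARY KEY,  -- assumed: names are unique
 gender INTEGER DEFAULT 0 NOT NULL       -- assumed default: 0 = unknown
   CHECK (gender IN (0, 1, 2, 9)));      -- 1 = male, 2 = female, 9 = lawful person
```

With a table like that in mind, gender is the only candidate for the join column besides the name itself.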
I could try this:

SELECT M1.name AS male, F1.name AS female
  FROM (SELECT P1.name, P1.gender
          FROM People AS P1
         WHERE P1.gender = 1) AS M1 (name, gender)
       FULL OUTER JOIN
       (SELECT P1.name, P1.gender
          FROM People AS P1
         WHERE P1.gender = 2) AS F1 (name, gender)
       ON M1.gender = 1
          AND F1.gender = 2;

but it is pretty easy to see that this is a CROSS JOIN in thin disguise. Add something with the names, perhaps?

SELECT M1.name AS male, F1.name AS female
  FROM (SELECT P1.name, P1.gender
          FROM People AS P1
         WHERE P1.gender = 1) AS M1 (name, gender)
       FULL OUTER JOIN
       (SELECT P1.name, P1.gender
          FROM People AS P1
         WHERE P1.gender = 2) AS F1 (name, gender)
       ON M1.gender = 1
          AND F1.gender = 2
          AND M1.name <= F1.name;

There was no help there. It produces a smaller set of pairs, but the same person can still show up in more than one couple on the dance floor. This is where some experience with SQL helps. One of the customary programming tricks is to use a self-join to get a ranking of elements in a set based on their collation sequence. Because this works with any table, we can use it on both the guys and the dolls to get the final query.

SELECT M1.name AS male, F1.name AS female
  FROM (SELECT P1.name, COUNT(P2.name)
          FROM People AS P1, People AS P2
         WHERE P2.name <= P1.name
           AND P1.gender = 1
           AND P2.gender = 1
         GROUP BY P1.name) AS M1 (name, rank)
       FULL OUTER JOIN
       (SELECT P1.name, COUNT(P2.name)
          FROM People AS P1, People AS P2
         WHERE P2.name <= P1.name
           AND P1.gender = 2
           AND P2.gender = 2
         GROUP BY P1.name) AS F1 (name, rank)
       ON M1.rank = F1.rank;

9.5 Do Not Worry about Displaying the Data

In a tiered architecture, display is the job of the front end, not the database. Obviously, you do not do rounding, add leading zeros, change case, or pick a date format in the database. The important thing is to pass the front end all of the data it needs to do its job, but it is more than that.
You can get your dance partner pairs with the query in section 9.4, but if you do not want to see the pairs on the same row, you can write a more compact query like this:

SELECT P1.name, P1.gender, COUNT(P2.name) AS rank
  FROM People AS P1, People AS P2
 WHERE P1.gender = P2.gender
   AND P2.name <= P1.name
 GROUP BY P1.name, P1.gender;
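For comparison only, and not as part of the author's approach: products that support SQL:2003 window functions can produce a similar per-gender numbering without the self-join. This is a sketch under that assumption. The alias dance_seq is hypothetical, chosen to avoid the word rank, which some products reserve.

```sql
-- Swapped-in technique: a window function instead of the COUNT self-join.
-- Numbers the people within each gender in name order, starting at 1.
SELECT P1.name, P1.gender,
       ROW_NUMBER() OVER (PARTITION BY P1.gender
                          ORDER BY P1.name) AS dance_seq
  FROM People AS P1;
```

The two forms agree only when names are unique within a gender. With duplicate names, the self-join collapses them into one row with a higher count, whereas ROW_NUMBER() keeps every row and numbers the duplicates in an unspecified order.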