CHAPTER 10 Thinking in SQL

“It ain’t so much the things we don’t know that get us into trouble. It’s the things we know that just ain’t so.”
—Artemus Ward (Charles Farrar Browne), American humorist (1834–1867)

THE BIGGEST HURDLE in learning SQL is thinking in sets and logic instead of in sequences and processes. I just gave you a list of heuristics in the previous chapter, but let’s take a little time to analyze why mistakes get made. You now have some theory, but can you do diagnostics? I tried to find common errors that new programmers make, but perhaps the most difficult thing to learn is thinking in sets.

Consider the classic puzzle shown in Figure 10.1. The usual mistake people make is trying to count the 1 × 1 × 2 bricks one at a time. This requires building a three-dimensional mental model of the boxes, which is really difficult for most of us. The right approach is to look at the whole block as if it were completely filled in. It is 4 × 5 × 5 = 100 cubic units, and each brick fills 2 units, so a full block holds 50 bricks. The corner that is knocked off is 3 bricks, which we can count individually, so we must have 50 − 3 = 47 bricks in the block. The arrangement inside the block does not matter at all.

All of these examples are based on actual newsgroup postings that have been translated into SQL/PSM to remove proprietary features. In some cases, I have cleaned up the data element names, and in others I have left them. Obviously, I am guessing at the motivation for each example, but I think I can defend my reasoning.

10.1 Bad Programming in SQL and Procedural Languages

As an example of not learning any relational approaches to a problem, consider a posting in the comp.databases.ms-sqlserver newsgroup in January 2005. The title was “How to Find a Hole in Records,” which already tells you that the poster is thinking in terms of a file system and not an RDBMS. The original table declaration had the usual newbie “id” column, without a key or any constraints.
The table modeled a year’s worth of rows identified by a week-within-year number (1 to 53) and a day-of-the-week number (1 to 7). Thus, we started with a table that looked more or less like this, after the names were cleaned up:

    CREATE TABLE WeeklyReport
    (id INTEGER AUTONUMBER NOT NULL, -- not valid SQL!
     week_nbr INTEGER NOT NULL,
     day_nbr INTEGER NOT NULL);

Figure 10.1 Classic block puzzle.

By removing the useless, proprietary id column and adding constraints, we then had the following table:

    CREATE TABLE WeeklyReport
    (week_nbr INTEGER NOT NULL
        CHECK (week_nbr BETWEEN 1 AND 53),
     day_nbr INTEGER NOT NULL
        CHECK (day_nbr BETWEEN 1 AND 7),
     PRIMARY KEY (week_nbr, day_nbr));

Despite giving some constraints in the narrative specification, the poster never bothered to apply them to the table declaration. Newbies think of a table as a file, not as a set. The only criterion for getting data into a file is that it is written to that file; the file cannot validate anything. The proprietary auto-number acts as a stand-in for the nonrelational record number of a sequential file system.

The problem was to find the earliest missing day within each week so that a new row could be inserted there. If some other value or measurement was being recorded for that date, it was not in the specifications.
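For contrast with a record-at-a-time approach, the requirement can be stated as a question about a set: of the seven possible day numbers, which is the smallest one with no row for the given week? The sketch below is not from the original posting. It assumes SQLite, driven through Python’s standard sqlite3 module, in place of an SQL/PSM engine, and the helper names earliest_missing_day and try_insert are hypothetical. It also shows the constrained DDL rejecting rows that a file would have silently accepted.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the SQL/PSM engine.
conn = sqlite3.connect(":memory:")

# The cleaned-up DDL: the constraints, not the application, validate the data.
conn.execute("""
    CREATE TABLE WeeklyReport
    (week_nbr INTEGER NOT NULL CHECK (week_nbr BETWEEN 1 AND 53),
     day_nbr  INTEGER NOT NULL CHECK (day_nbr BETWEEN 1 AND 7),
     PRIMARY KEY (week_nbr, day_nbr))
""")

# One declarative query replaces the loop: the smallest day number 1..7
# that has no row for the given week. MIN() over an empty set is NULL,
# so a full week yields NULL (None in Python).
EARLIEST_MISSING_DAY = """
    WITH Days (day_nbr) AS
         (VALUES (1), (2), (3), (4), (5), (6), (7))
    SELECT MIN(D.day_nbr)
      FROM Days AS D
     WHERE NOT EXISTS
           (SELECT *
              FROM WeeklyReport AS W
             WHERE W.week_nbr = ?
               AND W.day_nbr = D.day_nbr)
"""


def earliest_missing_day(week_nbr):
    """Hypothetical helper: run the set-based query for one week."""
    return conn.execute(EARLIEST_MISSING_DAY, (week_nbr,)).fetchone()[0]


def try_insert(week_nbr, day_nbr):
    """Hypothetical helper: True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO WeeklyReport (week_nbr, day_nbr) VALUES (?, ?)",
            (week_nbr, day_nbr),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Sample data: week 1 has days 1, 2 and 4; week 2 is complete.
conn.executemany(
    "INSERT INTO WeeklyReport (week_nbr, day_nbr) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 4)] + [(2, d) for d in range(1, 8)],
)

print(earliest_missing_day(1))  # 3
print(earliest_missing_day(2))  # None: all seven days present
print(earliest_missing_day(3))  # 1: no rows for week 3
print(try_insert(1, 8))         # False: CHECK rejects day 8
print(try_insert(1, 1))         # False: PRIMARY KEY rejects the duplicate
```

Because MIN() over an empty set returns NULL, a full week yields NULL rather than a day number; the specification still does not say what should happen in that case.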
The poster’s own T-SQL solution, translated into SQL/PSM with some name changes, looked like this:

    CREATE FUNCTION InsertNewWeekDay (IN my_week_nbr INTEGER)
    RETURNS INTEGER
    LANGUAGE SQL
    BEGIN
    DECLARE my_day_nbr INTEGER;
    DECLARE result_day_nbr INTEGER;
    SET my_day_nbr = 1;
    xx: WHILE my_day_nbr < 8
    DO IF NOT EXISTS
          (SELECT *
             FROM WeeklyReport
            WHERE day_nbr = my_day_nbr
              AND week_nbr = my_week_nbr)
       THEN BEGIN
            SET result_day_nbr = my_day_nbr;
            LEAVE xx;
            END;
       ELSE BEGIN
            SET my_day_nbr = my_day_nbr + 1;
            ITERATE xx;
            END;
       END IF;
    END WHILE;
    RETURN result_day_nbr;
    END;

This is a classic imitation of a FOR loop, or counting loop, used in all 3GL programming languages. However, if you look at it for two seconds, you will see that this is bad procedural programming! SQL will not make up for a lack of programming skills. In fact, the bad effects of mimicking 3GL languages in SQL are magnified, because the optimizers and compilers in SQL engines are not designed to look for procedural code optimizations.

By removing the redundant local variables and getting rid of the hidden GOTO statements (LEAVE and ITERATE) in favor of a simple, classic structured design, the poster should have written this:

    CREATE FUNCTION InsertNewWeekDay (IN my_week_nbr INTEGER)
    RETURNS INTEGER
    LANGUAGE SQL
    BEGIN
    DECLARE answer_nbr INTEGER;
    SET answer_nbr = 1;
    WHILE answer_nbr < 8
    DO IF NOT EXISTS
          (SELECT *
             FROM WeeklyReport
            WHERE day_nbr = answer_nbr
              AND week_nbr = my_week_nbr)
       THEN RETURN answer_nbr;
       ELSE SET answer_nbr = answer_nbr + 1;
       END IF;
    END WHILE;
    RETURN CAST (NULL AS INTEGER); -- full week: the NULL will fail NOT NULL on insert
    END;

This points out another weakness in the posting: we were not told how to handle a week that has all seven days represented. In the original table design, any integer value would have been accepted because of the lack of constraints. In the revised DDL, any day number not between 1 and 7 will violate the CHECK constraint, and a NULL will violate the NOT NULL constraint. This is not the best solution,