472 CHAPTER 21: AGGREGATE FUNCTIONS

then the MOD() function determines whether the count was odd or even. I present this version of the query first, because this is how I developed the answer. We can do a much better job with a little algebra and logic:

    SELECT CASE MIN (SIGN (nbr))
           WHEN 1 THEN EXP (SUM (LN (nbr))) -- all positive numbers
           WHEN 0 THEN 0.00 -- some zeros
           WHEN -1 -- some negative numbers
           THEN (EXP (SUM (LN (ABS (nbr))))
                 * (CASE WHEN MOD (SUM (ABS (SIGN (nbr) - 1) / 2), 2) = 1
                         THEN -1.00 ELSE 1.00 END))
           ELSE CAST (NULL AS FLOAT) END AS big_pi
      FROM NumberTable;

For this solution, you will need to have the logarithm, exponential, mod, and sign functions in your SQL product. They are not standard, but they are very common.

You might also have problems with data types. The SIGN() function should return an INTEGER, but might return the same data type as its parameter. The LN() function should cast nbr to FLOAT, but again, beware.

The idea is that there are three special cases: all positive numbers, one or more zeros, and some negative numbers in the set. You can find out what your situation is with a quick test on the SIGN() of the minimum value in the set. Be aware that a set holding both zeros and negative numbers has a minimum SIGN() of -1, so it falls into the third case rather than the second; if that mix can occur, test for a zero separately, for example with MIN (ABS (nbr)) = 0.

Within the case where you have negative numbers, there are two subcases: (1) an even number of negatives, or (2) an odd number of negatives. You then need to apply some high school algebra to determine the sign of the final result. The expression ABS (SIGN (nbr) - 1) / 2 is 1 for a negative number and 0 for a positive one, so its SUM() counts the negatives and MOD() tells you whether that count is odd.

Itzik Ben-Gan had problems implementing this in SQL Server, and these issues are worth passing along in case your SQL product also has them. The query as written returns a domain error in SQL Server. It should not have done so if the result expressions in the CASE expression had been evaluated after the conditional flow had performed a short-circuit evaluation. Examining the execution plan of the above query, it looks like the optimizer evaluates all of the possible result expressions in a step prior to handling the flow of the CASE expression.
This means that in the expression after WHEN 1 the LN() function is also invoked in an intermediate phase for zeros and negative numbers, and in the expression after WHEN -1 the LN(ABS()) is also invoked in an intermediate phase for zeros. This explains the domain error.

To handle this, I had to use the ABS() and NULLIF() functions in the positive numbers WHEN clause, and the NULLIF() function in the negative numbers WHEN clause:

    WHEN 1 THEN EXP(SUM(LN(ABS(NULLIF(result, 0.00)))))

and

    WHEN -1 THEN EXP(SUM(LN(ABS(NULLIF(result, 0.00))))) * CASE ...

If you are sure you will have only positive values in the column being computed, then you can use:

    PRD(<exp>) = EXP(SUM(LN(<exp>)))

or:

    PRD(<exp>) = POWER(CAST(10.00 AS FLOAT), SUM(LOG10(<exp>)))

This depends on your vendor functions. This last version assumes that 10.00 would need to be cast as a FLOAT to work with LOG10(), but you should read the manual to see what the assumed data types are.

As an aside, the book Bypasses: A Simple Approach to Complexity (Melzak 1983) is a short mathematical book on the general principle of conjugacy, the method of using a transform and its inverse to reduce the complexity of a calculation.

21.7 Bitwise Aggregate Functions

A bitwise aggregate function is not a recommended practice, since it will destroy First Normal Form (1NF) by overloading a column with a vector whose components have individual meanings. But it is common enough that I have to mention it.

Instead of giving each attribute in the data model its own column, bad programmers will assign a meaning to each bit in the binary representation of an INTEGER or SMALLINT. Some products will actually expose the physical model for the data types and have proprietary bitwise AND and OR operators. Most products do not implement bitwise aggregate Boolean operators, however.
But I feel like a furniture maker who is telling you which rocks are best for driving screws into wood.

To reiterate, an aggregate function must:

1. Drop all the NULLs
2. Drop all redundant duplicates if DISTINCT is in the parameter list
3. Retain all redundant duplicates if ALL or no other keyword is in the parameter list
4. Perform the required calculation on the remaining values in the expression
5. Return a NULL result for an empty set or for a set of all NULLs [which would be empty after application of (1)]

Notice that rules 2 and 3 make no difference for bitwise operators, since a OR a = a, and a AND a = a.

21.7.1 Bitwise OR Aggregate Function

Let's create a simple table that holds the columns of bits as an integer. The CHECK() constraint prevents negative numbers and bit strings of different lengths.

    CREATE TABLE Foobar
    (bits INTEGER -- nullable for testing
       CHECK (bits BETWEEN 0 AND 15));

What we want is a bitwise OR on the bits column.

    SELECT MAX (CASE WHEN MOD (bits/1, 2) = 1 AND bits IS NOT NULL
                     THEN 1 ELSE 0 END)
         + MAX (CASE WHEN MOD (bits/2, 2) = 1 AND bits IS NOT NULL
                     THEN 2 ELSE 0 END)
         + MAX (CASE WHEN MOD (bits/4, 2) = 1 AND bits IS NOT NULL
                     THEN 4 ELSE 0 END)
         + MAX (CASE WHEN MOD (bits/8, 2) = 1 AND bits IS NOT NULL
                     THEN 8 ELSE 0 END)
      FROM Foobar;

The "bits/1" is redundant, but I used it to show the pattern for the construction of this expression. The hope is that a good optimizer will spot the use of a CASE expression inside the MAX() function. This immediately tells the optimizer that the set of possible answers is limited (in these expressions, limited to {0, 2^n}), so that once any row has returned the highest possible value, the evaluation can stop.

The bad news with this expression is that a bits column that holds only NULLs will return 0000 rather than NULL.
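A small trace may make the pattern easier to follow. The three rows below are illustrative only and are not from the original discussion, and the sketch assumes that your product truncates INTEGER division, which is not true everywhere. The IS NOT NULL test is left out here because a NULL already fails the MOD() comparison and falls through to the ELSE branch.

```sql
-- Illustrative rows only; assumes INTEGER / INTEGER truncates.
DELETE FROM Foobar;
INSERT INTO Foobar (bits) VALUES (1), (4), (5);  -- 0001, 0100, 0101

SELECT MAX(CASE WHEN MOD(bits/1, 2) = 1 THEN 1 ELSE 0 END)  -- some row has bit 1 set
     + MAX(CASE WHEN MOD(bits/2, 2) = 1 THEN 2 ELSE 0 END)  -- no row has bit 2 set
     + MAX(CASE WHEN MOD(bits/4, 2) = 1 THEN 4 ELSE 0 END)  -- some row has bit 4 set
     + MAX(CASE WHEN MOD(bits/8, 2) = 1 THEN 8 ELSE 0 END)  -- no row has bit 8 set
       AS bits_or
  FROM Foobar;
-- The four terms are 1 + 0 + 4 + 0 = 5, that is, 0101.
```

If the three rows are replaced with NULLs, every CASE expression takes its ELSE branch, so the same query returns 0 instead of the NULL that rule 5 calls for.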
This can be corrected by adding:

    SIGN(MAX(bits)) * (<bitwise OR expression>)

If Foobar is all zeros, then the SIGN() function will return a zero and an optimizer can spot this shortcut evaluation. If the table is empty, or the bits column is all NULLs, the SIGN() will get a NULL from MAX(bits) and propagate it. If bits is declared NOT NULL, then do not use this factor.

21.7.2 Bitwise AND Aggregate Function

This code is obvious from the previous discussion. The MAX() now becomes a MIN(), since a single zero can set a bit in the aggregate to zero. The trick with the SIGN() function stays the same as before; the whole sum has to be parenthesized so that the factor applies to every bit.

    SELECT SIGN(MAX(bits))
           * (MIN (CASE WHEN MOD (bits/1, 2) = 1 AND bits IS NOT NULL
                        THEN 1 ELSE 0 END)
            + MIN (CASE WHEN MOD (bits/2, 2) = 1 AND bits IS NOT NULL
                        THEN 2 ELSE 0 END)
            + MIN (CASE WHEN MOD (bits/4, 2) = 1 AND bits IS NOT NULL
                        THEN 4 ELSE 0 END)
            + MIN (CASE WHEN MOD (bits/8, 2) = 1 AND bits IS NOT NULL
                        THEN 8 ELSE 0 END))
      FROM Foobar;

Because a NULL row falls through to ELSE 0, it behaves like 0000 inside each MIN(); add WHERE bits IS NOT NULL if the NULLs are to be dropped, as rule 1 requires.

CHAPTER 22 Auxiliary Tables

Auxiliary tables are a way of building functions and lookup tables that would be difficult, if not impossible, to do with the limited computational power of SQL. What SQL is good at is working with tables. Auxiliary tables are not really a part of the data model, but serve as adjuncts to do queries via joins rather than computations.

Auxiliary tables are usually very static and are constructed from an outside data source. Thus they do not require the same constraint checking that dynamic tables do. As a general statement, they need a declared primary key so that the product will create an index for searching and joining the auxiliary table to other tables in the schema, not to protect the data from redundancy.

The most important auxiliary table is a Calendar, because the Common Era calendar is too irregular for easy computations. Holidays fall on lunar and solar cycles, there are hundreds of fiscal calendars, and so forth.
The discussion of Calendar tables will be given in the section on temporal queries. This chapter will examine various kinds of numeric auxiliary tables.

22.1 The Sequence Table

The Sequence table is a simple list of integers from 1 to (n) that is used in place of looping constructs in a procedural language. Rather than incrementing a counter value, we try to work in parallel with a complete set of values.

Unfortunately, SEQUENCE is a reserved word for a proposed construct in Standard SQL that builds a sequence of numbers, but handles them as if they were a list or file rather than a set. The same reserved word is found in Oracle, but not widely used in other products. This table has the general declaration:

    CREATE TABLE Sequence
    (seq INTEGER NOT NULL PRIMARY KEY
       CONSTRAINT non_negative_nbr
       CHECK (seq > 0),
     cardinal VARCHAR(25) NOT NULL,
     ordinal VARCHAR(25) NOT NULL,
     CONSTRAINT numbers_are_complete
       CHECK ((SELECT COUNT(*) FROM Sequence)
              = (SELECT MAX(seq) FROM Sequence)));

It includes data such as:

    seq  cardinal               ordinal
    ========================================================
    1    'one'                  'first'
    2    'two'                  'second'
    3    'three'                'third'
    101  'One hundred and one'  'One hundred and first'

This table is a list of all the integers from 1 to some value (n). The ordinal and cardinal columns are simply examples of handy things that you might want to do with an integer, such as turn it into English words, which would be difficult in a procedural language or pure SQL.

I have found that it is a bad idea to start with zero, though that seems more natural to computer programmers. The reason for omitting zero is that this auxiliary table is often used to provide row numbering by being CROSS JOINed to another table, and the zero would throw off the one-to-one mapping.
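As a rough illustration of using the table in place of a loop, the sketch below expands a quantity into one row per unit. The Inventory table and its two rows are hypothetical and exist only for this example; it also assumes that Sequence already holds at least the integers 1 through 3.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE Inventory
(item_name VARCHAR(20) NOT NULL PRIMARY KEY,
 qty INTEGER NOT NULL
   CHECK (qty > 0));

INSERT INTO Inventory (item_name, qty)
VALUES ('widget', 3), ('gadget', 1);

-- One row per physical unit: the join to Sequence does the work
-- that a loop from 1 to qty would do in a procedural language.
SELECT I.item_name, S.seq AS unit_nbr
  FROM Inventory AS I
       CROSS JOIN Sequence AS S
 WHERE S.seq <= I.qty
 ORDER BY I.item_name, S.seq;
```

The result has one row for the gadget and three for the widget, numbered from 1. If Sequence also contained a zero, every item would pick up an extra row numbered 0, which is the broken mapping described above.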
The syntax of the sequence constructor I mentioned at the start of this section looks something like this. Each product's syntax will vary, but should have the same parameters:

    CREATE SEQUENCE <seq name> AS <data type>
     START WITH <value>
     INCREMENT BY <value>
     [MAXVALUE <value>]
     [MINVALUE <value>]
     [[NO] CYCLE];

To get a value from it, this expression is used wherever it is a legal data type:

    NEXT VALUE FOR <seq name>

If a sequence needs to be reset, you can use this statement to change the optional clauses or to restart the cycle:

    ALTER SEQUENCE <seq name>
     RESTART WITH <value>; -- begin over

To remove the sequence, use the obvious statement:

    DROP SEQUENCE <seq name>;

Even when this feature becomes widely available, it should be avoided. It is a nonrelational extension that behaves like a sequential file or procedural function, rather than in a set-oriented manner. You can currently find it in Oracle, Postgres, and Mimer products.

22.1.1 Enumerating a List

Suppose a data warehouse holds a report table in which each month's sales amount is a separate column (the monthly amounts have to be NULL-able to hold missing values for future months):

    CREATE TABLE AnnualSales1
    (salesman CHAR(15) NOT NULL PRIMARY KEY,
     jan DECIMAL(5,2),
     feb DECIMAL(5,2),
     mar DECIMAL(5,2),
     apr DECIMAL(5,2),
     may DECIMAL(5,2),
     jun DECIMAL(5,2),
     jul DECIMAL(5,2),
     aug DECIMAL(5,2),
     sep DECIMAL(5,2),
     oct DECIMAL(5,2),
     nov DECIMAL(5,2),
     "dec" DECIMAL(5,2)); -- DEC is a reserved word

The goal is to "flatten" it out so that it looks like this:

    CREATE TABLE AnnualSales2
    (salesman CHAR(15) NOT NULL,
     sales_month CHAR(3) NOT NULL
       CONSTRAINT valid_month_code
       CHECK (sales_month
              IN ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')),
     sales_amt DECIMAL(5,2) NOT NULL,
     PRIMARY KEY (salesman, sales_month));

The trick is to build a VIEW of the original table with a number beside each month.
    CREATE VIEW NumberedSales
    AS SELECT salesman,
              1 AS M01, jan,
              2 AS M02, feb,
              3 AS M03, mar,
              4 AS M04, apr,
              5 AS M05, may,
              6 AS M06, jun,
              7 AS M07, jul,
              8 AS M08, aug,
              9 AS M09, sep,
              10 AS M10, oct,
              11 AS M11, nov,
              12 AS M12, "dec" -- reserved word
         FROM AnnualSales1;

Now you can use the auxiliary table of sequential numbers, or you can use a VALUES table constructor to build one. The flattened VIEW is:

    CREATE VIEW AnnualSales2 (salesman, sales_month, sales_amt)
    AS SELECT S1.salesman,
              (CASE WHEN A.nbr = M01 THEN 'Jan'
                    WHEN A.nbr = M02 THEN 'Feb'
                    WHEN A.nbr = M03 THEN 'Mar'
                    WHEN A.nbr = M04 THEN 'Apr'
                    WHEN A.nbr = M05 THEN 'May'
                    WHEN A.nbr = M06 THEN 'Jun'
                    WHEN A.nbr = M07 THEN 'Jul'
                    WHEN A.nbr = M08 THEN 'Aug'
                    WHEN A.nbr = M09 THEN 'Sep'
                    WHEN A.nbr = M10 THEN 'Oct'
                    WHEN A.nbr = M11 THEN 'Nov'
                    WHEN A.nbr = M12 THEN 'Dec'
                    ELSE NULL END),
              (CASE WHEN A.nbr = M01 THEN jan
                    WHEN A.nbr = M02 THEN feb
                    WHEN A.nbr = M03 THEN mar
                    WHEN A.nbr = M04 THEN apr
                    WHEN A.nbr = M05 THEN may
                    WHEN A.nbr = M06 THEN jun
                    WHEN A.nbr = M07 THEN jul
                    WHEN A.nbr = M08 THEN aug
                    WHEN A.nbr = M09 THEN sep
                    WHEN A.nbr = M10 THEN oct
                    WHEN A.nbr = M11 THEN nov
                    WHEN A.nbr = M12 THEN "dec" -- reserved word
                    ELSE NULL END)
         FROM NumberedSales AS S1
              CROSS JOIN
              (SELECT seq FROM Sequence WHERE seq <= 12) AS A(nbr);

If your SQL product has derived tables, this can be written as a single VIEW query.

22.1.2 Mapping a Sequence into a Cycle

It is sometimes handy to map a sequence of numbers to a cycle. The general formula is: