582 CHAPTER 25: ARRAYS IN SQL

 j INTEGER NOT NULL
   CHECK (j > 0),
 CHECK ((SELECT MAX(i) FROM MyMatrix)
        = (SELECT COUNT(DISTINCT i) FROM MyMatrix)),
 CHECK ((SELECT MAX(j) FROM MyMatrix)
        = (SELECT COUNT(DISTINCT j) FROM MyMatrix)));

The constraints ensure that the subscripts of each element are within the proper range. I am starting my subscripts at one, but a little change in the logic would allow any value.

25.3.1 Matrix Equality

This test for matrix equality is from the article “SQL Matrix Processing” (Mrdalj, Vujovic, and Jovanovic 1996). Two matrices are equal if their cardinalities and the cardinality of their intersection are all equal.

 SELECT COUNT(*) FROM MatrixA
 UNION
 SELECT COUNT(*) FROM MatrixB
 UNION
 SELECT COUNT(*)
   FROM MatrixA AS A, MatrixB AS B
  WHERE A.i = B.i
    AND A.j = B.j
    AND A.element = B.element;

You have to decide how to use this query in your context. Because UNION removes duplicate rows, three equal counts collapse into a single row. If the query returns one row, the matrices are the same; otherwise, they are different.

25.3.2 Matrix Addition

Matrix addition and subtraction are possible only between matrices of the same dimensions. The obvious way to do the addition is simply:

 SELECT A.i, A.j, (A.element + B.element) AS total
   FROM MatrixA AS A, MatrixB AS B
  WHERE A.i = B.i
    AND A.j = B.j;

But properly, you ought to add some checking to be sure the matrices match. We can assume that both start numbering subscripts with either one or zero.

 SELECT A.i, A.j, (A.element + B.element) AS total
   FROM MatrixA AS A, MatrixB AS B
  WHERE A.i = B.i
    AND A.j = B.j
    AND (SELECT COUNT(*) FROM MatrixA)
        = (SELECT COUNT(*) FROM MatrixB)
    AND (SELECT MAX(i) FROM MatrixA)
        = (SELECT MAX(i) FROM MatrixB)
    AND (SELECT MAX(j) FROM MatrixA)
        = (SELECT MAX(j) FROM MatrixB);

Likewise, to make the addition permanent, you can use the same basic query in an UPDATE statement:

 UPDATE MatrixA
    SET element = element
        + (SELECT element
             FROM MatrixB
            WHERE MatrixB.i = MatrixA.i
              AND MatrixB.j = MatrixA.j)
  WHERE (SELECT COUNT(*) FROM MatrixA)
        = (SELECT COUNT(*) FROM MatrixB)
    AND (SELECT MAX(i) FROM MatrixA)
        = (SELECT MAX(i) FROM MatrixB)
    AND (SELECT MAX(j) FROM MatrixA)
        = (SELECT MAX(j) FROM MatrixB);

25.3.3 Matrix Multiplication

Multiplication by a scalar constant is direct and easy:

 UPDATE MyMatrix
    SET element = element * :constant;

Matrix multiplication is not as big a mess as might be expected. Remember that the first matrix must have the same number of columns as the second matrix has rows.
That means each element C[i, j] is the sum over k of A[i, k] * B[k, j], which we can show with an example:

 CREATE TABLE MatrixA
 (i INTEGER NOT NULL
    CHECK (i BETWEEN 1 AND 10), -- pick your own bounds
  k INTEGER NOT NULL
    CHECK (k BETWEEN 1 AND 10), -- must match MatrixB.k range
  element INTEGER NOT NULL,
  PRIMARY KEY (i, k));

 MatrixA
 i  k  element
 =============
 1  1     2
 1  2    -3
 1  3     4
 2  1    -1
 2  2     0
 2  3     2

 CREATE TABLE MatrixB
 (k INTEGER NOT NULL
    CHECK (k BETWEEN 1 AND 10), -- must match MatrixA.k range
  j INTEGER NOT NULL
    CHECK (j BETWEEN 1 AND 4), -- pick your own bounds
  element INTEGER NOT NULL,
  PRIMARY KEY (k, j));

 MatrixB
 k  j  element
 =============
 1  1    -1
 1  2     2
 1  3     3
 2  1     0
 2  2     1
 2  3     7
 3  1     1
 3  2     1
 3  3    -2

 CREATE VIEW MatrixC (i, j, element)
 AS
 SELECT i, j, SUM(MatrixA.element * MatrixB.element)
   FROM MatrixA, MatrixB
  WHERE MatrixA.k = MatrixB.k
  GROUP BY i, j;

This is taken directly from the definition of multiplication.

25.3.4 Other Matrix Operations

The transposition of a matrix is easy to do. With MatrixA in its (i, j, element) form from the addition example:

 CREATE VIEW TransA (i, j, element)
 AS
 SELECT j, i, element
   FROM MatrixA;

Again, you can make the change permanent with an UPDATE statement:

 UPDATE MatrixA
    SET i = j, j = i;

Multiplication by a column or row vector is just a special case of matrix multiplication, but a bit easier. Given the vector V and MatrixA, pair each vector element with the matching column of the matrix and sum within each row:

 SELECT A.i, SUM(A.element * V.element)
   FROM MatrixA AS A, VectorV AS V
  WHERE V.j = A.j
  GROUP BY A.i;

Cross tabulations and other statistical functions traditionally use an array to hold data. But you do not need a matrix for them in SQL.

It is possible to do other matrix operations in SQL, but the code becomes so complex, and the execution time so long, that it is simply not worth the effort. If a reader would like to submit queries for eigenvalues and determinants, I will be happy to put them in future editions of this book.
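As a quick check of the multiplication and vector queries in this section, the sample rows can be loaded and the products read back. This is only a sketch: it assumes a product that accepts multi-row VALUES lists, and the table VectorV with columns (k, element) and its three values are hypothetical, invented here so that the vector lines up with the k subscript of MatrixA.

```sql
-- Sample matrices from Section 25.3.3 (CHECK bounds omitted for brevity).
CREATE TABLE MatrixA
(i INTEGER NOT NULL,
 k INTEGER NOT NULL,
 element INTEGER NOT NULL,
 PRIMARY KEY (i, k));

CREATE TABLE MatrixB
(k INTEGER NOT NULL,
 j INTEGER NOT NULL,
 element INTEGER NOT NULL,
 PRIMARY KEY (k, j));

-- Hypothetical column vector, subscripted by k to match MatrixA.k.
CREATE TABLE VectorV
(k INTEGER NOT NULL PRIMARY KEY,
 element INTEGER NOT NULL);

INSERT INTO MatrixA (i, k, element)
VALUES (1, 1, 2), (1, 2, -3), (1, 3, 4),
       (2, 1, -1), (2, 2, 0), (2, 3, 2);

INSERT INTO MatrixB (k, j, element)
VALUES (1, 1, -1), (1, 2, 2), (1, 3, 3),
       (2, 1, 0), (2, 2, 1), (2, 3, 7),
       (3, 1, 1), (3, 2, 1), (3, 3, -2);

INSERT INTO VectorV (k, element)
VALUES (1, 1), (2, 0), (3, -1);

-- Matrix product: one row per (i, j), summing over the shared subscript k.
-- Rows returned: (1,1,2) (1,2,5) (1,3,-23) (2,1,3) (2,2,0) (2,3,-7)
SELECT A.i, B.j, SUM(A.element * B.element) AS element
  FROM MatrixA AS A, MatrixB AS B
 WHERE A.k = B.k
 GROUP BY A.i, B.j
 ORDER BY A.i, B.j;

-- Matrix-by-vector product: one row per i.
-- Rows returned: (1,-2) (2,-3)
SELECT A.i, SUM(A.element * V.element) AS element
  FROM MatrixA AS A, VectorV AS V
 WHERE A.k = V.k
 GROUP BY A.i
 ORDER BY A.i;
```

The first result is easy to confirm by hand: C[1, 3] is (2 * 3) + (-3 * 7) + (4 * -2) = -23.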
25.4 Flattening a Table into an Array

Reports and data warehouse summary tables often need an array laid out horizontally across a line. The original one element/one column approach to mapping arrays was based on seeing such reports and duplicating that structure in a table. A subscript is often an enumeration, denoting a month or another time period, rather than an integer.

For example, a row in a “Salesmen” table might have a dozen columns, one for each month of the year, each of which holds the total commission earned in a particular month. The year is really an array, subscripted by the month. The subscripts-and-value approach requires more work to produce the same results. It is often easier to explain a technique with an example. Let us imagine a company that collects time cards from its truck drivers, each with the driver’s name, the week within the year (numbered 0 to 51 or 52, depending on the year), and his total hours. We want to produce a report with one line for each driver and six weeks of his time across the page. The Timecards table looks like this:

 CREATE TABLE Timecards
 (driver_name CHAR(25) NOT NULL,
  week_nbr INTEGER NOT NULL
    CONSTRAINT valid_week_nbr
    CHECK (week_nbr BETWEEN 0 AND 52),
  work_hrs INTEGER
    CONSTRAINT zero_or_more_hours
    CHECK (work_hrs >= 0),
  PRIMARY KEY (driver_name, week_nbr));

We need to “flatten out” this table to get the desired rows for the report. First, create a working storage table from which the report can be built:

 CREATE TEMPORARY TABLE TimeReportWork -- working storage
 (driver_name CHAR(25) NOT NULL,
  wk1 INTEGER, -- important that these columns are NULL-able
  wk2 INTEGER,
  wk3 INTEGER,
  wk4 INTEGER,
  wk5 INTEGER,
  wk6 INTEGER);

Notice two important points about this table. First, there is no primary key; second, the weekly data columns are NULL-able.
This table is then filled with time card values:

 INSERT INTO TimeReportWork
        (driver_name, wk1, wk2, wk3, wk4, wk5, wk6)
 SELECT driver_name,
        SUM(CASE WHEN week_nbr = :rpt_week_nbr
                 THEN work_hrs ELSE 0 END) AS wk1,
        SUM(CASE WHEN week_nbr = :rpt_week_nbr - 1
                 THEN work_hrs ELSE 0 END) AS wk2,
        SUM(CASE WHEN week_nbr = :rpt_week_nbr - 2
                 THEN work_hrs ELSE 0 END) AS wk3,
        SUM(CASE WHEN week_nbr = :rpt_week_nbr - 3
                 THEN work_hrs ELSE 0 END) AS wk4,
        SUM(CASE WHEN week_nbr = :rpt_week_nbr - 4
                 THEN work_hrs ELSE 0 END) AS wk5,
        SUM(CASE WHEN week_nbr = :rpt_week_nbr - 5
                 THEN work_hrs ELSE 0 END) AS wk6
   FROM Timecards
  WHERE week_nbr BETWEEN (:rpt_week_nbr - 5) AND :rpt_week_nbr
  GROUP BY driver_name;

The range of weeks in the WHERE clause will vary with the period covered by the report. The parameter :rpt_week_nbr is the “week of the report,” and the query counts backwards from it for the prior five weeks. If a driver did not work in a particular week, the corresponding weekly column gets a zero hour total. However, if the driver has not worked at all in the last six weeks, we could lose him completely (no time cards, no summary). Depending on the nature of the report, you might consider using an OUTER JOIN to a Personnel table to be sure you have all the drivers’ names.

The NULLs are coalesced to zero in this example, but if you drop the ELSE 0 clauses, the SUM() will have to deal with a week of all NULLs and return a NULL. This enables you to tell the difference between a driver who was missing for the reporting period and a driver who worked zero hours but turned in a time card for that period. That difference could be important for computing the payroll.

25.5 Comparing Arrays in Table Format

It is often necessary to compare one array or set of values with another when the data is represented in a table. Remember that comparing a set with a set does not involve ordering the elements, whereas comparing an array with an array does.
For this discussion, let us create two tables, one for employees and one for their dependents. The children are subscripted in the order of their births; that is, 1 is the oldest living child, 2 is the second oldest, and so forth.

 CREATE TABLE Employees
 (emp_id INTEGER PRIMARY KEY,
  emp_name CHAR(15) NOT NULL);

 CREATE TABLE Dependents
 (emp_id INTEGER NOT NULL, -- the parent
  kid CHAR(15) NOT NULL, -- the array element
  birthorder INTEGER NOT NULL, -- the array subscript
  PRIMARY KEY (emp_id, kid));

The query “Find pairs of employees whose children have the same set of names” is very restrictive, but we can make it more so by requiring that the children be named in the same birth order. Both Mr. X and Mr. Y must have exactly the same number of dependents; both sets of names must match. We can assume that no parent has two children with the same name (George Foreman does not work here) or born at the same time (we will order twins). Let us begin by inserting test data into the Dependents table, thus:

 Dependents
 emp_id  kid      birthorder
 ===========================
   1     'Dick'       2
   1     'Harry'      3
   1     'Tom'        1
   2     'Dick'       3
   2     'Harry'      1
   2     'Tom'        2
   3     'Dick'       2
   3     'Harry'      3
   3     'Tom'        1
   4     'Harry'      1
   4     'Tom'        2
   5     'Curly'      2
   5     'Harry'      3
   5     'Moe'        1

In this test data, employees 1, 2, and 3 all have dependents named ‘Tom’, ‘Dick’, and ‘Harry’. The birth order is the same for the children of employees 1 and 3, but not for employee 2.

For testing purposes, you might consider adding an extra child to the family of employee 3, and so forth, to play with this data.

Though there are many ways to solve this query, this approach will give us some flexibility that others would not. Construct a VIEW that gives us the number of dependents for each employee:

 CREATE VIEW Familysize (emp_id, tally)
 AS
 SELECT emp_id, COUNT(*)
   FROM Dependents
  GROUP BY emp_id;

Create a second VIEW that holds pairs of employees who have families of the same size.
(This VIEW is also useful for other statistical work, but that is another topic.)

 CREATE VIEW Samesize (emp_id1, emp_id2, tally)
 AS
 SELECT F1.emp_id, F2.emp_id, F1.tally
   FROM Familysize AS F1, Familysize AS F2
  WHERE F1.tally = F2.tally
    AND F1.emp_id < F2.emp_id;

We will test for set equality by doing a self-JOIN on the dependents of employees with families of the same size. If one set can be mapped onto another with no children left over, and in the same birth order, then the two sets are equal.

 SELECT D1.emp_id, ' named his ', S1.tally,
        ' kids just like ', D2.emp_id
   FROM Dependents AS D1, Dependents AS D2, Samesize AS S1
  WHERE S1.emp_id1 = D1.emp_id
    AND S1.emp_id2 = D2.emp_id
    AND D1.kid = D2.kid
    AND D1.birthorder = D2.birthorder
  GROUP BY D1.emp_id, D2.emp_id, S1.tally
 HAVING COUNT(*) = S1.tally;

If birth order is not important, then drop the predicate D1.birthorder = D2.birthorder from the query. This is a form of exact relational division, with a second column equality test as part of the criteria.

CHAPTER 26
Set Operations

By set operations, I mean union, intersection, and set differences, where the sets in SQL are tables. These are the basic operators used in elementary set theory, which has been taught in the United States public school systems for decades. Since the relational model is based on sets, you would expect that SQL would have had a good variety of set operators from the start. However, this was not the case. Standard SQL has added the basic set operators, but they are still not common in actual products.

There is another problem in SQL that you did not have in high school set theory. SQL tables are multisets (also called bags), which means that, unlike sets, they allow duplicate elements (rows or tuples). Dr. Codd’s relational model is stricter and uses only true sets. SQL handles these duplicate rows with an ALL or DISTINCT modifier in different places in the language; ALL preserves duplicates, and DISTINCT removes them.
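A minimal sketch of the ALL and DISTINCT behavior just described may help. The keyless table Fruits and its rows are hypothetical, made up only for this illustration, and the multi-row VALUES list assumes a product that supports it.

```sql
-- Hypothetical table with no key, so duplicate rows are allowed.
CREATE TABLE Fruits
(fruit_name VARCHAR(10) NOT NULL);

INSERT INTO Fruits (fruit_name)
VALUES ('apple'), ('apple'), ('pear');

-- ALL is the default in a SELECT list and preserves duplicates:
-- three rows ('apple' twice, 'pear' once).
SELECT ALL fruit_name
  FROM Fruits;

-- DISTINCT removes duplicates: two rows ('apple', 'pear').
SELECT DISTINCT fruit_name
  FROM Fruits;

-- In a set operator the default is reversed: a plain UNION removes
-- duplicates (two rows), while UNION ALL keeps every row (six rows).
SELECT fruit_name FROM Fruits
UNION
SELECT fruit_name FROM Fruits;

SELECT fruit_name FROM Fruits
UNION ALL
SELECT fruit_name FROM Fruits;
```

Note that the default differs by context: a SELECT list keeps duplicates unless DISTINCT is written, while UNION removes them unless ALL is written.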
So that we can discuss the result of each operator formally, let R be a row that is a duplicate of some row in TableA, or of some row in TableB, or of both. Let m be the number of duplicates of R in TableA and let n be the number of duplicates of R in TableB, where (m >= 0) and (n >= 0). Informally, the engines will pair off the two tables on a row-per-row basis in set operations. We will see how this works for each operator.