6.10 Avoid UNIONs 127 HAVING MAX(payment_status) = 'F' AND MIN(payment_status) = 'F'; The self-reference and correlation are complicated for both humans and machines. Most optimizers are not smart enough to flatten the first query like this. Exceptions: If you have a problem that is easier to understand with correlated subqueries and your optimizer is good, then don’t be so afraid of them. 6.10 Avoid UNIONs Rationale: UNIONs are usually not well optimized. Because they require that redundant duplicates be discarded, they force most SQL engines to do a sort before presenting the result set to the user. If possible, use UNION ALL instead. You should never have to build a chain of UNIONs from the same base table. That code can be written with OR-ed predicates or CASE expressions. As an example of a horrible misuse of SQL, Chris White posted a procedure that built dynamic SQL that would then build a report. Aside from the obvious violations of basic software engineering, the output was so huge that it exceeded the text size limits of SQL Server. He was attempting to construct an entire report in the database by using UNIONs to get the 12 lines of the report in the right order, by assigning them a letter of the alphabet. The whole thing would take several pages to show, but it is an extraction of the printout lines that were constructed from just the General Ledger. I have not attempted to clean up much of the code, so there are many violations of good coding rules in this snippet. . . . UNION SELECT DISTINCT 'J' AS section, 'NUMBER CHECKS' AS description, '' AS branch, COUNT(DISTINCT GL.source) AS total1, 0 AS total2 FROM GeneralLedger AS GL WHERE GL.period >= :start_period AND GL.period <= :end_period AND GL.year_for_period = :period_yr 128 CHAPTER 6: CODING CHOICES AND GL.account_number IN ('3020') AND GL.journal_id IN ('CD') UNION SELECT DISTINCT 'C' AS section, 'CASH RECEIPTS' AS description, '' AS branch, SUM(GL.amount) * -1 AS total1, 0 AS total2 FROM GeneralLedger AS GL WHERE GL.period >= :start_period AND GL.period <= :end_period AND GL.year_for_period = :period_yr AND GL.account_number = '1050' AND GL.journal_id IN ('CR') UNION SELECT DISTINCT 'D' AS section, 'NUMBER INVOICES' AS description, '' AS branch, COUNT(DISTINCT GL.source) AS total1, 0 AS total2 FROM GeneralLedger AS GL WHERE GL.period >= :start_period AND GL.period <= :end_period AND GL.year_for_period = :period_yr AND GL.account_number IN ('6010', '6090') AND GL.journal_id IN ('SJ') UNION SELECT DISTINCT 'E' AS section, 'VOUCHER TOTAL' AS description, '' AS branch, SUM(GL.amount) * -1 AS total1, 0 AS total2 FROM GeneralLedger AS GL WHERE GL.period >= :start_period AND GL.period <= :end_period AND GL.year_for_period = :period_yr AND GL.account_number = '3020' AND GL.journal_id IN ('PJ', 'TJ') UNION SELECT DISTINCT 'F' AS section, 'CHECKS PRINTED' AS description, '' AS branch, SUM(GL.amount) AS total1, 0 AS total2 FROM GeneralLedger AS GL WHERE GL.period >= :start_period AND GL.period <= :end_period AND GL.year_for_period = :period_yr AND GL.account_number IN ('3020') 6.10 Avoid UNIONs 129 AND GL.journal_id IN ('CD') UNION SELECT DISTINCT 'K' AS section, 'NUMBER VOUCHERS' AS description, '' AS branch, COUNT(DISTINCT GL.source) AS total1, 0 AS total2 FROM GeneralLedger AS GL WHERE GL.period >= :start_period AND GL.period <= :end_period AND GL.year_for_period = :period_yr AND GL.account_number IN ('3020') AND GL.journal_id IN ('PJ', 'TJ'); The last part of the code could be reduced to a single, cohesive procedure. The output of the procedure would then be formatted in the front. Notice that section, description, and branch are all placeholders to give a slot for columns in the other UNIONs not shown here. CREATE PROCEDURE GeneralLedgeSummary (start_period DATE, end_period DATE) SELECT COUNT(DISTINCT CASE WHEN acct_nbr = '3020' AND journal_code = 'CD' THEN source ELSE NULL END), -SUM(CASE WHEN acct_nbr = '1050' AND journal_code ='CR' THEN amount ELSE 0.00 END), COUNT(DISTINCT CASE WHEN acct_nbr IN ('6010', '6090') AND journal_code = 'SJ' THEN source ELSE NULL END), -SUM(CASE WHEN acct_nbr = '1050' AND journal_code = 'CR' THEN amount ELSE 0.00 END), SUM(CASE WHEN acct_nbr = '3020' AND journal_code = 'CD' THEN amount ELSE 0.00 END), COUNT(DISTINCT CASE WHEN acct_nbr = '3020' AND journal_code IN ('PJ', 'TJ') THEN source ELSE NULL END) INTO j_tally, c_total, d_tally, e_total, f_total, k_tally FROM GeneralLedger AS GL WHERE period BETWEEN start_period AND end_period; 130 CHAPTER 6: CODING CHOICES Exceptions: Sometimes the UNION [ALL] is what you actually want. The other set operations in SQL-92, EXCEPT [ALL], and INTERSECT [ALL] are not widely available yet. 6.11 Testing SQL When you are first writing a schema, you will probably generate some test data. If you look in the literature, there is a thing called an Armstrong set, which is the minimal number of rows that will test all of the constraints in a schema. Although it is difficult to automatically create an Armstrong set, you can do a good job with a little effort. 6.11.1 Test All Possible Combinations of NULLs Rationale: NULLs behave strangely, and if there are problems, there is a good chance that a NULL will be involved. Newbies using graphic tools often leave more NULL-able columns in a single table than a professional would in an entire schema for a Fortune 500 company payroll. Exceptions: If the number of combinations is excessive, then look at a redesign rather than a stress test. It means you probably have too many NULL- able columns in the schema. 6.11.2 Inspect and Test All CHECK() Constraints Rationale: You can extract the CHECK() constraint predicates from the DDL and look at them. The first thing is to see if the same data element has the same rules in all of the tables. Some attributes will always have the same CHECK() constraints if the model is correct. For example, the data type, regular expression, and check digit for a UPC code will be the same everywhere in the schema. Some attributes may have different constraints in different tables. For example, it would be reasonable to have “quantity INTEGER DEFAULT 0 NOT NULL CHECK (quantity >= 0)” almost everywhere that the quantity attribute appears. However, you might find that there is also a “CHECK (quantity > 0)” on a table. Is this an error or a situation where a zero quantity is disallowed? You need to look and see. 6.11 Testing SQL 131 Exceptions: None 6.11.3 Beware of Character Columns Rationale: Character columns seldom have enough constraints on them. The result is that they have extra blanks in them, allow mixed-case letters, and will pretty much hold any kind of garbage that a user wishes to put in them. My favorite piece of test data for oversized, unconstrained NVARCHAR(n) columns is a collection of Buddhist sutras in Chinese unicode. At least the users will learn a bit of classic Buddhist thought. Exceptions: None 6.11.4 Test for Size Rationale: One of the problems with small test data sets is that they will run just fine in the development shop, but when the size of the tables grows larger, you can get gradually degraded performance or catastrophe points. A catastrophe point is when there is a sudden change in the performance—the straw that breaks the camel’s back. There is usually a physical component to a catastrophe point, such as excessive paging to a hard drive. Frankly, there is not a lot you can do about it except wait and see if it was a fluke or if it happens again. Gradually degraded performance is the nicer of the two situations. You can monitor the system, see the loss, and take action before anything bad happens. The bad news is that the term gradual can be very short. The query that ran so well on a few thousand rows of test data is a pig when it goes live on several million rows of production data. Try to stress test on a data set that is larger than the current production database. That will let you know you have some margin of error. Exceptions: None . predicates or CASE expressions. As an example of a horrible misuse of SQL, Chris White posted a procedure that built dynamic SQL that would then build a report. Aside from the obvious violations. what you actually want. The other set operations in SQL- 92, EXCEPT [ALL], and INTERSECT [ALL] are not widely available yet. 6.11 Testing SQL When you are first writing a schema, you will probably. of basic software engineering, the output was so huge that it exceeded the text size limits of SQL Server. He was attempting to construct an entire report in the database by using UNIONs to