CHAPTER 2: FONTS, PUNCTUATION, AND SPACING

3. Put a new line or at least a space after a semicolon to separate statements.
4. Put a space between words even when you could crowd them together.

Exceptions: If SQL does not work the same way as English, then you have to follow the SQL syntax rules.

Many of the code-formatting habits people have go back to habits they were taught by programmers who grew up with punchcard data processing. Because we have video terminals and text editors today, a lot of these habits no longer have any basis.

The practice of putting a comma in front of a single variable on a single line goes back to punchcards. It was often difficult for programmers to get to a keypunch machine to create their decks of cards. In this format, you could pull or insert a card to change your code. There is no excuse for this practice since we now have video terminals.

English and other European languages are read left to right and then top to bottom. This scanning pattern is so deeply learned that we arrange schematics, comic books, maps, and other graphics the same way. To see how much changing that order can throw you off, try to read a Japanese or Chinese comic book. The panels are in right-to-left order, and the Chinese word balloons are read top to bottom. This is why typographers have a rule that you do not set long words V E R T T I C A L L Y.

Did you spot the misspelling? About one-third of readers do not. Likewise, it is difficult to locate duplicates and errors in those long vertical lists of names. SQL formatting can use vertical alignment to advantage in other places, but not in things that should be chunked together.

2.4 Use Full Reserved Words

Rationale: SQL allows you to skip some reserved words and to abbreviate others. Try to use the full forms to document the program. This is a good thing in COBOL, and it works in SQL as well.
For example, an alias can be written with or without the AS keyword. That is, “Personnel AS P1” is equivalent to “Personnel P1” in a FROM clause, and “(salary + commission) AS total_pay” is equivalent to “(salary + commission) total_pay” in a SELECT list. But the AS reserved word makes it easier to see that there is an alias and not a missing comma in these situations.

Technically, you can abbreviate INTEGER to INT and DECIMAL to DEC, but the full names are preferred. The abbreviations look like the reserved word “into” or the month “Dec” in English.

Exceptions: The exception is to use the shorter forms of the character data types. That is, CHAR(n) instead of CHARACTER(n), VARCHAR(n) instead of CHARACTER VARYING(n), NCHAR(n) instead of NATIONAL CHARACTER(n), and NVARCHAR(n) instead of NATIONAL CHARACTER VARYING(n). The full names are too long to be comfortable for a reader. Even COBOL, the most verbose programming language on earth, allows some abbreviations.

2.5 Avoid Proprietary Reserved Words if a Standard Keyword Is Available in Your SQL Product

Rationale: Sticking to standards will make your code readable to other SQL programmers who might not know your dialect. It also means that your code can run on other products without being rewritten. Standard code will protect you from failure when the proprietary syntax is dropped or modified.

That unwelcome surprise occurred in several products when the vendors added the Standard SQL versions of OUTER JOINs and deprecated their old proprietary versions. In particular, SQL Server programmers had to unlearn their *= syntax and semantics for outer joins.

The other disadvantage of proprietary features is that they change over time and have no standard behavior. For example, the BIT data type in SQL Server changed its NULL-ability between product releases. Oracle could not tell an empty string from a NULL. There are lots of other examples.
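To make the outer join case concrete, here is a minimal sketch of the old proprietary form next to its Standard SQL equivalent. The Departments table and the column names (emp_name, dept_name, dept_nbr) are hypothetical, invented only for illustration; the old form is shown as a comment because current products may reject it.

```sql
-- Old proprietary T-SQL form (deprecated), shown for comparison only:
--   SELECT P1.emp_name, D1.dept_name
--     FROM Personnel P1, Departments D1
--    WHERE P1.dept_nbr *= D1.dept_nbr;

-- Standard SQL form: every Personnel row is preserved, and
-- dept_name is NULL where no matching department exists.
SELECT P1.emp_name, D1.dept_name
  FROM Personnel AS P1
       LEFT OUTER JOIN
       Departments AS D1
       ON P1.dept_nbr = D1.dept_nbr;
```

The two forms agree for a simple two-table join like this one. Once extra search conditions are added to the WHERE clause, the proprietary operator could behave differently, which is part of what had to be unlearned.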
Because there is no external standard to appeal to, a vendor is free to do anything it wishes.

Exceptions: If your SQL product does not yet support standard syntax for something, then you have no choice. This is true for temporal functions. They were late getting to Standard SQL, so the early vendors made up their own syntax and internal temporal models.

2.6 Avoid Proprietary Statements if a Standard Statement Is Available

Rationale: This rule ought to be obvious. Sticking to standards will make your code readable to other SQL programmers who might not know your dialect. It also means that your code can run on other products without being rewritten. Standard code will protect your code from failure when the proprietary syntax is dropped or modified.

In fact, a vendor can actually give you proprietary features that are unpredictable! In the “Books On Line” interactive manual that comes with Microsoft SQL Server, we get a warning in the REMARKS section about the proprietary “UPDATE FROM” syntax that tells us:

    The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated (in other words, if the UPDATE statement is not deterministic). For example, given the UPDATE statement in the following script, both rows in table S meet the qualifications of the FROM clause in the UPDATE statement, but it is undefined which row from S is used to update the row in table T.

This replaces a prior behavior found in the Sybase and Ingres family, where the UPDATE FROM would do multiple updates, one for each joined row in the second table.

In older versions of Sybase/SQL Server, if a base table row is represented more than once in the embedded query, then that row is operated on multiple times instead of just once.
This is a total violation of relational principles, but it’s easy to do with the underlying physical implementation. Here is a quick example:

    CREATE TABLE T1 (x INTEGER NOT NULL);
    INSERT INTO T1 VALUES (1);
    INSERT INTO T1 VALUES (2);
    INSERT INTO T1 VALUES (3);
    INSERT INTO T1 VALUES (4);

    CREATE TABLE T2 (x INTEGER NOT NULL);
    INSERT INTO T2 VALUES (1);
    INSERT INTO T2 VALUES (1);
    INSERT INTO T2 VALUES (1);
    INSERT INTO T2 VALUES (1);

Now try to update T1 by doubling all the rows that have a match in T2.

    UPDATE T1
       SET T1.x = 2 * T1.x
      FROM T2
     WHERE T1.x = T2.x;

    SELECT * FROM T1;

    original   current
    x          x
    ====       ====
    1          16
    2          2
    3          3
    4          4

The FROM clause gives you a CROSS JOIN, so you get a series of four actions on the same row (1 => 2 => 4 => 8 => 16). These are pretty simple examples, but you get the idea. There are subtle things with self-joins and the diseased mutant T-SQL syntax that can hang you in loops by changing things, or you can have tables that depend on the order of the rows for their results, and so forth.

SQL Server and Sybase used different fixes for this problem in later versions of their products. Sybase did a hidden “SELECT DISTINCT” in the implied query, and SQL Server gets an unpredictable row. Standard SQL is consistent and clear about aliases, views, and derived tables, as well as being a highly orthogonal language.

If the UPDATE clause could take an alias, according to the Standard SQL model, then you would create a copy of the contents of that base table under the alias name, then update that copy, and delete it when the statement was over, in effect doing nothing to the base table.

If the UPDATE clause could take a FROM clause, according to the Standard SQL model, then you would create a result set from the table expression, then update that copy, and delete it when the statement was over, in effect doing nothing to the base tables.
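For comparison, here is a sketch of how the same doubling might be written using only Standard SQL syntax against the T1 and T2 tables from the example. This is one possible formulation, not the only one; it assumes the intent is "double each T1 row that has at least one match in T2."

```sql
-- Each row of T1 is tested exactly once by the EXISTS predicate,
-- so duplicate rows in T2 cannot trigger repeated updates.
UPDATE T1
   SET x = 2 * x
 WHERE EXISTS
       (SELECT *
          FROM T2
         WHERE T2.x = T1.x);

SELECT x FROM T1;
```

With the sample data, only the row where x = 1 qualifies, and it is doubled once, so T1 ends up holding the values 2, 2, 3, and 4. The result does not depend on how many matching rows T2 contains or on their physical order, unlike the proprietary UPDATE FROM form.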
Because the UPDATE FROM syntax is so proprietary, inconsistent with the standard model, and ambiguous, why does it exist? In the original Sybase product, the physical model made this “extension” relatively easy to implement, and there were no standards and no good understanding of the relational model back then. Programmers got used to it, and then it was almost impossible to fix.

When I lived in Indianapolis in the mid-1970s, my neighbor had graduated from General Motors’ private college and gone to work for the company. His first job was investigating industrial accident reports. We were having a beer one night, and he got to telling war stories from the various General Motors plants he had been to for his job. His conclusion after a year on that job was that all industrial accidents are bizarre suicide attempts. People would go to the machine shop and build clever devices to short around the safety features on their equipment so they could work a little faster.

For example, if you make a clamp that holds in one of the two safety switches that operate a small stamping machine, you can push the other button with one hand and work material with your free hand. Well, you can do this until that free hand is crushed just above the wrist and squirts across the back wall of the shop anyway. Trading speed for safety and correctness will eventually catch up with you.

Exceptions: If your SQL product does not yet support standard syntax for something, then you have no choice. For example, Oracle did not support the CASE .