I was amazed to go to a major hospital in Los Angeles in mid-1993 and see the clerk still looking up codes in a dog-eared looseleaf notebook instead of bringing them up on her terminal screen. The hospital was still using an old IBM mainframe system, which had dumb 3270 terminals, rather than a client/server system with workstations. There was not even a help screen available to the clerk.

The translation tables can be downloaded to the workstations in a client/server system to reduce network traffic. They can also be used to build picklists on interactive screens and thereby reduce typographical errors. Changes to the codes are then propagated through the system without anyone having to rewrite application code. If the codes change over time, the table for a code should include a pair of “date effective” fields. This will allow a data warehouse to correctly read and translate old data.

5.4 Multiple Character Sets

Some DBMS products can support ASCII, EBCDIC, and Unicode. You need to be aware of this, so you can set proper collations and normalize your text. The predicate “<string> IS [NOT] NORMALIZED” in SQL-99 determines if a Unicode string is in one of four normal forms (i.e., D, C, KD, and KC). The use of the words “normal form” here is not the same as in a relational context.

In the Unicode model, a single character can be built from several other characters. Accent marks can be put on basic Latin letters. Certain combinations of letters can be displayed as ligatures (ae becomes æ). Some writing systems, such as Hangul (Korean) and Vietnamese, build glyphs by concatenating symbols in two dimensions. Some languages have special forms of one letter that are determined by context, such as the terminal sigma in Greek or accented u in Czech. In short, writing is more complex than putting one letter after another. The Unicode standard defines the order of such constructions in their normal forms.
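As a rough illustration of what the four normal forms do, here is a minimal sketch in Python rather than SQL, using the standard library's unicodedata module. The helper names is_normalized and same_text are hypothetical and exist only for this example; the first one merely imitates the idea behind the IS NORMALIZED predicate and is not how any particular DBMS implements it.

```python
import unicodedata

# Two spellings of the same visible letter "é".
composed = "\u00e9"       # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # base letter followed by COMBINING ACUTE ACCENT

# The same two accents on "a", typed in different orders.
acute_then_dot = "a\u0301\u0323"   # acute accent first, then dot below
dot_then_acute = "a\u0323\u0301"   # dot below first, then acute accent

LIGATURE_FI = "\ufb01"    # LATIN SMALL LIGATURE FI, a compatibility character


def is_normalized(form: str, text: str) -> bool:
    """Hypothetical helper: True when text is already in the given normal
    form ("NFD", "NFC", "NFKD", or "NFKC"), in the spirit of IS NORMALIZED."""
    return unicodedata.normalize(form, text) == text


def same_text(left: str, right: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, left) == unicodedata.normalize(form, right)


if __name__ == "__main__":
    # A raw comparison sees different code points; a normalized one does not.
    print(composed == decomposed, same_text(composed, decomposed))
    # Normalization puts combining marks into one canonical order.
    print(same_text(acute_then_dot, dot_then_acute, "NFD"))
    # The K forms also fold compatibility characters such as ligatures.
    print(unicodedata.normalize("NFKC", LIGATURE_FI))
```

Forms C and D differ only in whether characters are stored composed or decomposed; the KC and KD forms additionally replace compatibility characters, so they can change how the text looks and are usually reserved for searching rather than storage.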
You can still produce the same results with different orderings and sometimes with different combinations of symbols, but it is handy when you are searching such text to know that it is normalized rather than trying to parse each glyph on the fly. You can find details about normalization and links to free software at www.unicode.org.

CHAPTER 6 Coding Choices

“Caesar: Pardon him, Theodotus. He is a barbarian and thinks the customs of his tribe and island are the laws of nature.”
-- Caesar and Cleopatra, by George Bernard Shaw, 1898

THIS CHAPTER DEALS WITH writing good DML statements in Standard SQL. That means they are portable and can be optimized well by most SQL dialects. I define portable to mean one of several things. The code is standard and can be run as-is on other SQL dialects; standard implies portable. Or the code can be converted to another SQL dialect in a simple mechanical fashion, or the feature used is so universal that all or most products have it in some form; portable does not imply standard. You can get some help with this concept from the X/Open SQL Portability Guides.

A major problem in becoming a SQL programmer is that people do not unlearn the procedural or OO programming they had to learn for their first languages. They do not learn how to think in terms of sets and predicates, and so they mimic the solutions they know in their first programming languages. Jerry Weinberg (1978) observed this fact more than 25 years ago in his classic book, The Psychology of Computer Programming. He was teaching PL/I. For younger readers, PL/I was a language from IBM that was a hybrid of FORTRAN, COBOL, and ALGOL and that enjoyed a brief craze.

Weinberg found that he could tell the first programming languages of the students by how they wrote PL/I. My personal experience (1989) was that I could guess the nationality of the students in my C and Pascal programming classes from the way their native spoken language showed up in their code.
Another problem in becoming a SQL programmer is that people tend to become SQL dialect programmers and think that their particular product’s SQL is some kind of standard. In 2004, I had a job interview for a position where I was being asked to evaluate different platforms for a major size increase in the company’s databases. The interviewer kept asking me “general SQL” questions based on the storage architecture of the only product he knew. His product is not intended for Very Large Database (VLDB) applications, and he had no knowledge of Nucleus, Teradata, Model 204, or other products that compete in the VLDB arena. He had spent his career tuning one version of one product and could not make the jump to anything different, even conceptually. His career is about to become endangered.

There is a place for the specialist dialect programmer, but dialect programming should be a last resort in special circumstances and never the first attempt. Think of it as cancer surgery: You do massive surgery when there is a bad tumor that is not treatable by other means; you do not start with it when the patient comes in with acne.

6.1 Pick Standard Constructions over Proprietary Constructions

There is a fact of life in the IT industry called the Code Museum Effect, which works like this: First, each vendor adds a feature to its product. The feature is deemed useful, so it gets into the next version of the standard with slightly different syntax or semantics, but the vendor is stuck with its proprietary syntax. Its users have written code based on it, and they do not want to redo it. The solutions are the following:

1. Never implement the standard and just retain the old syntax. The problem is that you cannot pass a conformance test, which can be required for government and industry contracts. SQL programmers who know the standard from other products cannot read, write, or maintain your code easily. In short, you have the database equivalent of last year’s cell phone.

2. Implement the standard, but retain the old syntax, too. This is the usual solution for a few releases. It gives the users a chance to move to the standard syntax but does not break the existing applications. Everyone is happy for a while.

3. Implement the standard and deprecate the old syntax. The vendor is ready for a major release, which lets it redo major parts of the database engine. Changing to the standard syntax and not supporting the old syntax at this point is a good way to force users to upgrade their software and help pay for that major release.

A professional programmer would be converting his or her old code at step two to avoid being trapped in the Code Museum when step three rolls around. Let’s be honest: massive code conversions do not happen until after step three occurs in most shops, and they are a mess, but you can start to avoid the problems by always writing standard code in a step two situation.

6.1.1 Use Standard OUTER JOIN Syntax

Rationale:
Here is how the standard OUTER JOINs work in SQL-92. Assume you are given:

    Table1        Table2
    a   b         a   c
    ======        ======
    1   w         1   r
    2   x         2   s
    3   y         3   t
    4   z

and the OUTER JOIN expression:

    Table1
    LEFT OUTER JOIN
    Table2
    ON Table1.a = Table2.a     <== join condition
       AND Table2.c = 't';     <== single table condition

We call Table1 the “preserved table” and Table2 the “unpreserved table” in the query. What I am going to give you is a little different but equivalent to the ANSI/ISO standards.
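To see what the example expression returns, here is a minimal sketch that loads the two sample tables into an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite, the column data types, and the key declarations are assumptions made for illustration; the text names no product and gives no DDL. The second query, which moves the single table condition into a WHERE clause, is also an addition for contrast and is not part of the original example.

```python
import sqlite3

# In-memory database holding the two sample tables from the text.
# The data types and PRIMARY KEY declarations are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (a INTEGER NOT NULL PRIMARY KEY, b CHAR(1) NOT NULL);
    CREATE TABLE Table2 (a INTEGER NOT NULL PRIMARY KEY, c CHAR(1) NOT NULL);
    INSERT INTO Table1 (a, b) VALUES (1, 'w'), (2, 'x'), (3, 'y'), (4, 'z');
    INSERT INTO Table2 (a, c) VALUES (1, 'r'), (2, 's'), (3, 't');
""")

# The expression from the text: both conditions sit in the ON clause,
# so every row of the preserved table (Table1) appears in the result.
on_clause_rows = conn.execute("""
    SELECT Table1.a, Table1.b, Table2.a, Table2.c
      FROM Table1
           LEFT OUTER JOIN
           Table2
           ON Table1.a = Table2.a
              AND Table2.c = 't'
     ORDER BY Table1.a
""").fetchall()

# Contrast case (not in the text): the single table condition is moved
# to the WHERE clause, which filters the rows after the join is built.
where_clause_rows = conn.execute("""
    SELECT Table1.a, Table1.b, Table2.a, Table2.c
      FROM Table1
           LEFT OUTER JOIN
           Table2
           ON Table1.a = Table2.a
     WHERE Table2.c = 't'
     ORDER BY Table1.a
""").fetchall()

if __name__ == "__main__":
    for row in on_clause_rows:
        print("ON clause:   ", row)    # NULLs come back as None
    for row in where_clause_rows:
        print("WHERE clause:", row)
```

With the condition in the ON clause, all four Table1 rows survive and only the row with a = 3 picks up values from Table2; with the condition in the WHERE clause, the NULL-padded rows fail the test and are discarded, so the query behaves like an inner join.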