the SQL standard. The second is that Microsoft does not follow the SQL standard. These problems make it difficult for one naming convention to fit everyone.

Exceptions: I will give a simple set of rules based on principles of readability and typography, but there are other possible conventions:

1. Avoid delimited identifiers so you have no problems.
2. IBM uses only uppercase. Unfortunately, this is difficult to read and looks like you are still programming on a punchcard system.
3. Microsoft and Oracle use lowercase except where it would look odd. Unfortunately, the definition of looking odd is not at all precise. Sometimes reserved words are uppercased, sometimes lowercased, and so forth.

1.2 Follow the ISO-11179 Standards Naming Conventions

This is a fairly new ISO standard for metadata, and it is not well understood. Fortunately, the parts that a SQL programmer needs to know are pretty obvious and simple. The real problem is in the many ways that people violate them.
A short summary of the NCITS L8 Metadata Standards Committee rules for data elements can be found at the following sites:

http://pueblo.lbl.gov/~olken/X3L8/drafts/draft.docs.html
http://lists.oasis-open.org/archives/ubl-ndrsc/200111/msg00005.html

Also the pdf file:

www.oasis-open.org/committees/download.php/6233/c002349_ISO_IEC_11179

and the draft:

www.iso.org/iso/en/ittf/PubliclyAvailableStandards/c002349_ISO_IEC_11179-1_1999(E).zip

The ISO-11179 standard is broken down into six sections:

11179-1: Framework for the Specification and Standardization of Data Elements
11179-2: Classification for Data Elements
11179-3: Basic Attributes of Data Elements
11179-4: Rules and Guidelines for the Formulation of Data Definitions
11179-5: Naming and Identification Principles for Data Elements
11179-6: Registration of Data Elements

1.2.1 ISO-11179 for SQL

Rationale: Although the formal standards are good, they are very general. It is handy to have a set of rules aimed at the SQL developer in his or her own language. Some of the interpretations given here are the consensus of experts, as taken from newsgroups and private e-mails. Taking the rules from Section ISO-11179-4, a scalar data element should do the following:

1. Be unique (within any data dictionary in which it appears).
2. Be stated in the singular.
3. State what the concept is, not only what it is not.
4. Be stated as a descriptive phrase or sentence(s).
5. Contain only commonly understood abbreviations.
6. Be expressed without embedding definitions of other data elements or underlying concepts.
7. Tables, sets, and other collections shall be named with a collective, class, or plural name.
8. Procedures shall have a verb in their name.
9. A copy (alias) of a table shall include the base table name as well as the role it is playing at that time.
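As a rough illustration of rules 2, 7, and 9, here is a minimal sketch that runs a small schema through Python's built-in sqlite3 module. The table, the column names, and the alias pattern (role plus base table name) are hypothetical and show only one possible reading of the rules. They are not taken from the standard.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Rule 7: the table gets a collective name (hypothetical: Personnel).
-- Rule 2: each scalar data element is named in the singular.
CREATE TABLE Personnel (
    emp_id      INTEGER NOT NULL PRIMARY KEY,
    emp_name    TEXT NOT NULL,
    boss_emp_id INTEGER REFERENCES Personnel (emp_id)
);
INSERT INTO Personnel (emp_id, emp_name, boss_emp_id) VALUES (1, 'Alice', NULL);
INSERT INTO Personnel (emp_id, emp_name, boss_emp_id) VALUES (2, 'Bob', 1);
""")

# Rule 9 (one reading): each alias carries the base table name plus its role.
rows = conn.execute("""
SELECT Worker_Personnel.emp_name, Manager_Personnel.emp_name
  FROM Personnel AS Worker_Personnel
  JOIN Personnel AS Manager_Personnel
    ON Worker_Personnel.boss_emp_id = Manager_Personnel.emp_id
""").fetchall()

print(rows)
```

With both copies of the table in one query, the aliases tell the reader which copy plays which role without hiding the base table they come from.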
This formalism is nice in theory, but names are subject to constraints imposed by software limitations in the real world, such as maximum name length and character sets. Another problem is that one data element may have many names depending on the context in which it is used. It might be called something in a report and something else in an electronic data interchange (EDI) file, and it might be different from the name in the database. But you want to avoid using multiple names in the same database, and you should be able to detect them with metadata tools. Furthermore, you want to avoid using multiple names in different databases in the same enterprise. Unfortunately, this is much more difficult to detect without very good data dictionary tools. The data dictionary should include the external names and their context.

Exceptions: The curse of legacy databases, legacy file systems, and other traditions can make this very difficult. If there is a common, well-understood name for a data element, then you can use this name instead of a constructed name. For example, “us_postal_code” is formally correct, but “zip_code” is well understood, and you can argue for simply “zip” or “zip4” as a name because it is a familiar term.

1.2.2 Levels of Abstraction

Name development begins at the conceptual level. An object class represents an idea, abstraction, or thing in the real world, such as tree or country. A property is something that describes all objects in the class, such as height or identifier. This lets us form terms such as “tree height” or “country identifier” from the combination of the class and the property.

The next level in the process is the logical level. A complete logical data element must include a form of representation for the values in its data value domain (the set of possible valid values of a data element). The representation term describes the data element’s representation class.
The representation class is equivalent to the class word of the prime/class naming convention with which many data administrators are familiar. This gets us to “tree height measure,” “country identifier name,” and “country identifier code” as possible data elements.

There is a subtle difference between “identifier name” and “identifier code,” and it might be so subtle that we do not want to model it, but we would need a rule to drop the property term in this case. The property would still exist as part of the inheritance structure of the data element, but it would not be part of the data element name.

Some logical data elements can be considered generic elements if they are well defined and are shared across organizations. Country names and country codes are well defined in the ISO 3166 standard, “Codes for the Representation of Names of Countries,” and you might simply reference this document.

Note that this is the highest level at which true data elements, by the definition of ISO-11179, appear: They have an object class, a property, and a representation.

The next level is the application level. This is usually done with a qualifier that applies to the particular application. The qualifier will either subset the data value domain or add more restrictions to the definition so that we work with only those values needed in the application.

For example, assume that we are using ISO-3166 country codes, but we are only interested in Europe. This would be a simple subset of the standard, but it will change slowly over time. However, the subset of countries with more than 20 centimeters of rain this year will vary greatly in a matter of weeks.

Changes in the name to reflect this fact will be accomplished by addition of qualifier terms to the logical name.
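The way a name is assembled across these levels can be sketched in a few lines of Python. The function `build_element_name`, the underscore separator, and the “European” qualifier are assumptions made for illustration. ISO-11179 does not prescribe this function or this exact ordering.

```python
def build_element_name(object_class, representation, prop=None, qualifier=None):
    """Join components as: qualifier, object class, property, representation."""
    components = [qualifier, object_class, prop, representation]
    words = []
    for component in components:
        if component:  # skip a component that was dropped or never supplied
            words.extend(component.lower().split())
    return "_".join(words)


# Logical level: object class + property + representation term.
tree_height = build_element_name("tree", "measure", prop="height")
country_code = build_element_name("country", "code", prop="identifier")

# Logical level with the property term dropped from the name.
short_country_code = build_element_name("country", "code")

# Application level: a qualifier term narrows the data value domain.
european_country_code = build_element_name("country", "code", qualifier="European")

print(tree_height, country_code, short_country_code, european_country_code)
```

Dropping the property changes only the name. The property would still be recorded in the registry as part of the data element's definition.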
For example, if a view were to list all of the countries with which a certain organization had trading agreements, the query data element might be called “trading_partner_country_name” to show its role in the context of the VIEW or query that limits it. The data value domain would consist of a subset of countries listed in ISO-3166.

The physical name is the lowest level. These are the names that actually appear in the database table column headers, file descriptions, EDI transaction file layouts, and so forth. They may be abbreviations or use a limited character set because of software restrictions. However, they might also add information about their origin or format.

In a registry, each of the data element names and name components will always be paired with its context so that we know the source or usage of the name or name component. The goal is to be able to trace each data element from its source to wherever it is used, regardless of the name under which it appears.

1.2.3 Avoid Descriptive Prefixes

Rationale: Another silly convention among newbies is to use prefixes that describe something about the appearance of the data element in the current table. In the old days, when we worked with sequential file systems, the physical location of the file was very important.

The “tbl-” prefix is particularly silly. Before you counter that this prefix answers the question of what something is, remember that SQL has only one data structure. What else could it be? Do you put “n-” in front of every noun you write? Do you think this would make English easier to read? It is like infants announcing that everything is “thingie!” as they grab them.

“To be something is to be something in particular; to be nothing in particular or anything in general is to be nothing.” (Aristotle)

The next worst affix is the <table name>. Why does a data element become something totally different from table to table?
For example, “orders_upc” and “inventory_upc” are both UPC codes no matter where they appear, but by giving them two names, you are saying that they are totally, logically different things in your data model.

A total nightmare is the combination of “id” in a base table (vague name) with a reference in a second table using the base table name as a prefix in the foreign key or non-foreign-key references. The queries fill up with code like “Orders.ID = OrderID,” which quickly becomes a game of looking for the period and trying to figure out what a thousand different “ID” columns mean in the data dictionary.

Affixes like “vw” for views tell you how the virtual table is implemented in the schema, but this has nothing to do with the data model. If I later decide to replace the view with a base table, do I change the name? The bad news is that a table often already exists with the same root name, which makes for more confusion.

Equally silly and dangerous are column names that are prefixed with the data type. This is how it is physically represented and not what it means in the data model. The data dictionary will be trashed, because you have no idea if there are “intorder_nbr,” “strorder_nbr,” and perhaps even “forder_nbr,” all trying to be the simple “order_nbr” at the same time. The user can also look at the data definition language (DDL) and see the data type, defaults, and constraints if he or she does not remember them.

The final affix problem is telling us that something is the primary key with a “PK_” or a foreign key with an “FK_” affix. That is how it is used in that particular table; it is not a part of its fundamental nature. The user can also look at the DDL and see the words “PRIMARY KEY” or “FOREIGN KEY REFERENCES” in the column declarations.

The strangest version of this is a rule on a Web site for a company that specializes in Oracle programming. It advocated “<table name>_CK_<column name>” for CHECK() constraints.
This not only gives you no help in determining the errors that caused the violation, but it also limits you to one and only one constraint per column per table, and it leaves you to ask about constraints that use two or more columns.
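A data dictionary tool could catch many of these affixes mechanically. The following is a minimal sketch, not a complete linter. The function `find_affixed_names`, the prefix list, and the sample names are assumptions chosen to mirror the examples above. A name is flagged only when stripping a known prefix leaves a name that already exists in the data dictionary, so a legitimate name such as “first_name” is not mistaken for a type-prefixed column.

```python
def find_affixed_names(column_names, dictionary_names, table_names):
    """Map each affixed column name to the data element it duplicates."""
    # Hypothetical prefix list: structure, key, and data type prefixes,
    # plus every table name used as a column prefix.
    prefixes = ["tbl_", "tbl", "vw_", "vw", "pk_", "fk_", "int", "str", "f"]
    prefixes += [table.lower() + "_" for table in table_names]

    flagged = {}
    for name in column_names:
        lowered = name.lower()
        for prefix in prefixes:
            remainder = lowered[len(prefix):]
            # Flag only if the stripped name is a known data element.
            if lowered.startswith(prefix) and remainder in dictionary_names:
                flagged[name] = remainder
                break
    return flagged


columns = ["intorder_nbr", "strorder_nbr", "forder_nbr",
           "orders_upc", "inventory_upc", "order_nbr", "first_name"]
dictionary = {"order_nbr", "upc", "first_name"}

flagged = find_affixed_names(columns, dictionary, ["Orders", "Inventory"])
for bad_name, element in flagged.items():
    print(f"{bad_name} duplicates {element}")
```

The check assumes the dictionary holds lowercase names, and it is only as good as that dictionary: an affixed name whose base element was never registered will pass unnoticed.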