Joe Celko s SQL for Smarties - Advanced SQL Programming P14 pps

102 CHAPTER 3: NUMERIC DATA IN SQL Numbers in SQL are classified as either exact or approximate. An exact numeric value has a precision, p , and a scale, s . The precision is a positive integer that determines the number of significant digits in a particular radix. The Standard says the radix can be either binary or decimal, so you need to know what your implementation does. The scale is a nonnegative integer that tells you how many decimal places the number has. Today, there are not that many base-ten platforms, so you probably have a binary machine. However, a number can have one of many binary representations—twos-complement; ones-complement; high-end, or low-end—and various word sizes. The proper mental model of numbers in SQL is not to worry about the “bits and bytes” level of the physical representation, but to think in abstract terms. The data types NUMERIC , DECIMAL , INTEGER , BIGINT , and SMALLINT are exact numeric types. An integer has a scale of zero, but the syntax simply uses the word INTEGER or the abbreviation INT . SMALLINT has a scale of zero, but the range of values it can hold is less than or equal to the range that INTEGER can hold in the implementation. Likewise, BIGINT has a scale of zero, but the range of values it can hold is greater than or equal to the range that INTEGER can hold in the implementation. BIGINT was added in SQL-99, but had been common in products before then. DECIMAL(p,s) can also be written DEC(p,s) . For example, DECIMAL(8,2) could be used to hold the number 123456.78, which has eight significant digits and two decimal places. The difference between NUMERIC and DECIMAL is subtle. NUMERIC specifies the exact precision and scale to be used. DECIMAL specifies the exact scale, but the precision is implementation-defined to be equal to or greater than the specified value. Mainframe COBOL programmers can think of NUMERIC as a COBOL PICTURE numeric type, whereas DECIMAL is like a BCD. Personal computer programmers these days probably have not seen anything like this. You may find that many small-machine SQLs do not support NUMERIC or DECIMAL , because the programmers do not want to have to have COBOL-style math routines that operate on character strings or an internal decimal representation. An approximate numeric value consists of a mantissa and an exponent. The mantissa is a signed numeric value; the exponent is a signed integer that specifies the magnitude of the mantissa. An approximate numeric value has a precision. The precision is a positive integer that specifies the number of significant binary digits in the 3.1 Numeric Types 103 mantissa. The value of an approximate numeric value is the mantissa multiplied by 10 to the exponent. FLOAT(p) , REAL , and DOUBLE PRECISION are the approximate numeric types. There is a subtle difference between FLOAT(p) , which has a binary precision equal to or greater than the value given, and REAL , which has an implementation- defined precision. The IEEE Standard 754 for floating point numbers is the most common representation today. It is binary and uses 32 bits for single precision and 64 bits for double precision, which is just right for Intel- based PCs, Macintoshes, and most UNIX platforms. Its math functions are available burned into processor chips, so they will run faster than a software implementation. The range for single precision numbers is approximately ± 10^-44.85 to 10^38.53, and for double precision, approximately ± 10^-323.3 to 10^308.3. However, there are some special values in the Standard. Zero cannot be directly represented in this format, so it is modeled as a special value denoted with an exponent field of zero and a fraction field of zero. The sign field can make this either − 0 and +0, which are distinct values that compare as equal. If the exponent is all zeroes, but the fraction is nonzero (else it would be interpreted as zero), then the value is a denormalized number, which is not assumed to have a leading 1 before the binary point. Thus, this represents a number (- s * 0. f * 2 − 126), where s is the sign bit and f is the fraction. For double precision, denormalized numbers are of the form (- s * 0. f * 2 − 1022). You can interpret zero as a special type of denormalized number. The two values “+infinity” and “ − infinity” are denoted with an exponent of all ones and a fraction of all zeroes. The sign bit distinguishes between negative infinity and positive infinity. Being able to denote infinity as a specific value is useful, because it allows operations to continue past overflow situations. Operations with infinite values are well defined in the IEEE floating point. The value NaN (Not a Number) is used to represent a bit configuration that does not represent a number. NaNs are represented by a bit pattern with an exponent of all ones and a nonzero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN). A QNaN is a NaN with the most significant fraction bit set. QNaNs propagate freely through most arithmetic operations. These values pop out of an operation when the result is not mathematically defined, like division by zero. 104 CHAPTER 3: NUMERIC DATA IN SQL An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an exception when used in operations. SNaNs can be handy to assign to uninitialized variables to trap premature usage. Semantically, QNaNs denote indeterminate operations, while SNaNs denote invalid operations. SQL has not accepted the IEEE model for mathematics for several reasons. Much of the SQL Standard allows implementation-defined rounding, truncation, and precision to avoid limiting the language to particular hardware platforms. If the IEEE rules for math were allowed in SQL, then we would need type conversion rules for infinite and a way to represent an infinite exact numeric value after the conversion. People have enough trouble with NULL s, so let’s not go there. 3.1.1 BIT, BYTE, and BOOLEAN Data Types Machine-level things like a BIT or BYTE data type have no place in SQL. SQL is a high-level language; it is abstract and defined without regard to physical implementation. This basic principle of data modeling is called data abstraction. Bits and bytes are the lowest units of hardware-specific physical implementation you can get. Are you on a high-end or low-end machine? Does the machine have 8-, 16-, 32-, 64-, or 128-bit words? Twos-complement or ones-complement math? Hey, the SQL Standard allows decimal machines, so bits do not exist at all! What about NULL s in this data type? To be an SQL data type, you have to have NULL s, so what is a NULL bit? By definition, a bit is in one of two states, on or off, and has no NULL . If your vendor adds NULLs to bit, how are the bit-wise operations defined? Oh, what a tangled web we weave when first we mix logical and physical models. What does the implementation of the host languages do with bits? Did you know that +1, +0, − 0 and − 1 are all used for BOOLEAN s, but not consistently? In C#, Boolean values are 0/1 for FALSE / TRUE , while VB.NET has Boolean values of 0/-1 for FALSE / TRUE —and they are proprietary languages from the same vendor. That means all the host languages—present, future, and not-yet-defined—can be different. There are usually two situations in practice. Either the bits are individual attributes, or they are used as a vector to represent a single attribute. In the case of a single attribute, the encoding is limited to two values, which do not port to host languages or other SQLs, cannot be easily understood by an end user, and cannot be expanded. In the second case, what some newbies, who are still thinking in terms of second- and third-generation programming languages or even punch cards, do is build a vector for a series of “yes/no” status codes, 3.2 Numeric Type Conversion 105 failing to see the status vector as a single attribute. Did you ever play the children’s game “20 Questions” when you were young? Imagine you have six components for a loan approval, so you allocate bits in your second-generation model of the world. You have 64 possible vectors, but only 5 of them are valid (i.e., you cannot be rejected for bankruptcy and still have good credit). For your data integrity, you can: 1. Ignore the problem. This is actually what most newbies do. 2. Write elaborate CHECK() constraints with user-defined functions or proprietary bit-level library functions that cannot port and that run like cold glue. Now we add a seventh condition to the vector—which end does it go on? Why? How did you get it in the right place on all the possible hardware that it will ever use? Did all the code that references a bit in a word by its position do it right after the change? You need to sit down and think about how to design an encoding of the data that is high-level, general enough to expand, abstract, and portable. For example, is that loan approval a hierarchical code? Concatenation code? Vector code? Did you provide codes for unknown, missing, and N/A values? It is not easy to design such things! 3.2 Numeric Type Conversion There are a few surprises in converting from one numeric type to another. The SQL Standard left it up to the implementation to answer a lot of basic questions, so the programmer has to know his package. 3.2.1 Rounding and Truncating When an exact or approximate numeric value is assigned to an exact numeric column, it may not fit. SQL says that the database engine will use an approximation that preserves leading significant digits of the original number after rounding or truncating. The choice of whether to truncate or round is implementation-defined, however. This can lead to some surprises when you have to shift data among SQL implementations, or shift storage values from a host language program into an SQL table. It is probably a good idea to create the columns with more decimal places than you think you need. Truncation is defined as truncation toward zero; this means that 1.5 would truncate to 1, and −1.5 would truncate to −1. This is not true for 106 CHAPTER 3: NUMERIC DATA IN SQL all programming languages; everyone agrees on truncation toward zero for the positive numbers, but you will find that negative numbers may truncate away from zero (i.e., −1.5 would truncate to −2). SQL is also indecisive about rounding, leaving the implementation free to determine its method. There are two major types of rounding in programming. The scientific method looks at the digit to be removed. If this digit is 0, 1, 2, 3, or 4, you drop it and leave the higher-order digit to its left unchanged. If the digit is 5, 6, 7, 8, or 9, you drop it and increment the digit to its left. This method works with a small set of numbers and was popular with FORTRAN programmers because it is what engineers use. The commercial method looks at the digit to be removed. If this digit is 0, 1, 2, 3, or 4, you drop it and leave the digit to its left unchanged. If the digit is 6, 7, 8, or 9, you drop it and increment the digit to its left. However, when the digit is 5, you want to have a rule that will round up about half the time. One rule is to look at the digit to the left: if it is odd, then leave it unchanged; if it is even, increment it. There are other versions of the decision rule, but they all try to make the rounding error as small as possible. This method works with a large set of numbers, and it is popular with bankers because it reduces the total rounding error in the system. Another convention is to round to the nearest even number, so that both 1.5 and 2.5 round to 2, and 3.5 and 4.5 both round to 4. This rule keeps commercial rounding symmetric. The following expression uses the MOD() function to determine whether you have an even number or not. ROUND (CAST (amount - .0005 AS DECIMAL (14,4)) - (CAST (MOD (CAST (amount*100.0 + .99 AS INTEGER), 2) AS DECIMAL (14,4))/1000.0), 2); In commercial transactions, you carry money amounts to four or more decimal places, but round them to two decimal places for display. This is a GAAP (Generally Accepted Accounting Practice) in the United States for U.S. dollars and a law in Europe for working with euros. Here is your first programming exercise for the notes you are making on your SQL. Generate a table of 5,000 random numbers, both positive and negative, with four decimal places. Round the test data to two decimal places, and total them using both methods. Notice the difference, and save those results. Now load those same numbers into a table in your SQL, like this: 3.2 Numeric Type Conversion 107 CREATE TABLE RoundTest (original DECIMAL(10,4) NOT NULL, rounded DECIMAL(10,2) NOT NULL); insert the test data INSERT INTO RoundTest (original) VALUES (2134.5678. 0.00); etc. UPDATE RoundTest SET rounded = original; write a program to use both rounding methods compare those results to this query SELECT SUM(original), SUM(rounded) FROM RoundTest; Compare these results to those from the other two tests. Now you know what your particular SQL is doing. Or, if you got a third answer, there might be other things going on, which we will deal with in Chapter 21 on aggregate functions. We will postpone discussion here, but the order of the rows in a SUM() function can make a difference in accumulated floating-point rounding error. 3.2.2 CAST() Function SQL-92 defines the general CAST(<cast operand> AS <data type>) function for all data type conversions, but most implementations use several specific functions of their own for the conversions they support. The SQL-92 CAST() function is not only more general, but it also allows the <cast operand> to be either a <column name>, a <value expression>, or a NULL. For numeric-to-numeric conversion, you can do anything you wish, but you have to watch for the rounding errors. The comparison predicates can hide automatic type conversions, so be careful. SQL implementations will also have formatting options in their conversion functions that are not part of the Standard. These functions either use a PICTURE string, like COBOL or some versions of BASIC, or return their results in a format set in an environment variable. This is very implementation-dependent. 108 CHAPTER 3: NUMERIC DATA IN SQL 3.3 Four-Function Arithmetic SQL is weaker than a pocket calculator. The dyadic arithmetic operators +, −, *, and / stand for addition, subtraction, multiplication, and division, respectively. The multiplication and division operators are of equal precedence and are performed before the dyadic plus and minus operators. In algebra and in some programming languages, the precedence of arithmetic operators is more restricted. They use the “My Dear Aunt Sally” rule; that is, multiplication is done before division, which is done before addition, which is done before subtraction. This practice can lead to subtle errors. For example, consider (largenum + largenum − largenum), where largenum is the maximum value that can be represented in its numeric data type. If you group the expression from left to right, you get ((largenum + largenum) − largenum) = overflow error! However, if you group the expression from right to left, you get (largenum + (largenum − largenum)) = largenum. Because of these differences, an expression that worked one way in the host language may get different results in SQL, and vice versa. SQL could reorder the expressions to optimize them, but in practice, you will find that many implementations will simply parse the expressions from left to right. The best way to be safe is always to make extensive use of parentheses in all expressions, whether they are in the host language or in your SQL. The monadic plus and minus signs are allowed and you can string as many of them in front of a numeric value of variables as you like. The bad news about this decision is that SQL also uses Ada-style comments, which put the text of a comment line between a double dash and a new line-character. This means that the parser has to figure out whether “ ” is two minus signs or the start of a comment. Most versions of SQL also support C-style comment brackets (i.e., /* comment text */). Such brackets have been proposed in the SQL3 discussion papers, because some international data transmission standards do not recognize a new line in a transmission, and the double-dash convention will not work. If both operands are exact numeric, the data type of the result is exact numeric, as you would expect. Likewise, an approximate numeric in a calculation will cast the results to approximate numeric. The kicker is in how the results are assigned in precision and scale. Let S1 and S2 be the scale of the first and second operands, respectively. The precision of the result of addition and subtraction is implementation-defined, and the scale is the maximum of S1 and S2. 3.4 Arithmetic and NULLs 109 The precision of the result of multiplication is implementation-defined, and the scale is (S1 + S2). The precision and scale of the result of division are implementation-defined, and so are some decisions about rounding or truncating results. The ANSI X3H2 committee debated about requiring precision and scales in the standard and finally gave up. This means I can start losing high-order digits, especially with a division operation, where it is perfectly legal to make all results single-digit integers. Nobody does anything that stupid in practice. In the real world, some vendors allow you to adjust the number of decimal places as a system parameter, some default to a few decimal places, and some display as many decimal places as they can, so that you can round off to what you want. You will simply have to learn what your implementation does by experimenting with it. Most vendors have extended this set of operators with other common mathematical functions. The most common additional functions are modulus, absolute value, power, and square root. But it is also possible to find logarithms to different bases, and to perform exponential, trigonometric, and other scientific, statistical, and mathematical functions. Precision and scale are implementation-defined for these functions, of course, but they tend to follow the same design decisions as the arithmetic did. The reason is obvious: they are using the same library routines under the covers as the math package in the database engine. 3.4 Arithmetic and NULLs NULLs are probably one of the most formidable database concepts for the beginner. Chapter 6 is devoted to a detailed study of how NULLs work in SQL, but this section is concerned with how they act in arithmetic expressions. The NULL in SQL is only one way of handling missing values. The usual description of NULLs is that they represent currently unknown values that might be replaced later with real values when we know something. This definition actually covers a lot of territory. The Interim Report 75-02-08 to the ANSI X3 (SPARC Study Group 1975) showed 14 different kinds of incomplete data that could appear as the results of operations or as attribute values. They included such things as arithmetic underflow and overflow, division by zero, string truncation, raising zero to the zeroth power, and other computational errors, as well as missing or unknown values. 110 CHAPTER 3: NUMERIC DATA IN SQL The NULL is a global creature, not belonging to any particular data type but able to replace any of their values. This makes arithmetic a bit easier to define. You have to specifically forbid NULLs in a column by declaring the column with a NOT NULL constraint. But in SQL-92, you can use the CAST function to declare a specific data type for a NULL, such as CAST(NULL AS INTEGER). One reason for this convention is completeness; another is to let you pass information about how to create a column to the database engine. The basic rule for math with NULLs is that they propagate. An arithmetic operation with a NULL will return a NULL. That makes sense; if a NULL is a missing value, then you cannot determine the results of a calculation with it. However, the expression ( NULL / 0) looks strange to people. The first thought is that a division by zero should return an error; if NULL is a true missing value, there is no value to which it can resolve and make that expression valid. However, SQL propagates the NULL, while a non- NULL value divided by zero will cause a runtime error. 3.5 Converting Values to and from NULL Since host languages do not support NULLs, the programmer can elect either to replace them with another value that is expressible in the host language or to use indicator variables to signal the host program to take special actions for them. 3.5.1 NULLIF() Function SQL specifies two functions, NULLIF() and the related COALESCE(), that can be used to replace expressions with NULL, and vice versa. They are part of the CASE expression family. The NULLIF(V1, V2) function has two parameters. It is equivalent to the following CASE expression: NULLIF(V1, V2) := CASE WHEN (V1 = V2) THEN NULL ELSE V1 END; That is, when the first parameter is equal to the second, the function returns a NULL; otherwise, it returns the first parameter’s value. The properties of this function allow you to use it for many purposes. The important properties are these: 3.5 Converting Values to and from NULL 111 1. NULLIF(x, x) will return NULL for all values of x. This includes NULL, since (NULL = NULL) is UNKNOWN, not TRUE. 2. NULLIF(0, (x - x)) will convert all non-NULL values of x into NULL. But it will convert x NULL into x zero, since (NULL - NULL) is NULL, and the equality test will fail. 3. NULLIF(1, (x - x + 1)) will convert all non-NULL values of x into NULL. But it will convert a NULL into a one. This can be general- ized for all numeric data types and values. 3.5.2 COALESCE() Function The COALESCE(<value expression>, , <value expression>) function scans the list of <value expression>s from left to right, determines the highest data type in the list, and returns the first non- NULL value in the list, casting it to the highest data type. If all the <value expression>s are NULL, the result is NULL. The most common use of this function is in a SELECT list, where there are columns that have to be added, but one can be a NULL. For example, to create a report of the total pay for each employee, you might write this query: SELECT emp_nbr, emp_name, (salary + commission) AS totalpay FROM Employees; But salesmen may work on commission only, or on a mix of salary and commission. The office staff is on salary only. This means an employee could have NULLs in his salary or commission column, which would propagate in the addition and produce a NULL result. A better solution would be: SELECT emp_nbr, emp_name (COALESCE(salary, 0) + COALESCE(commission, 0)) AS paycheck FROM Employees; A more elaborate use for the COALESCE() function is with aggregate functions. Consider a table of customers’ purchases with a category code and the amount of each purchase. You are to construct a query that will . the start of a comment. Most versions of SQL also support C-style comment brackets (i.e., /* comment text */). Such brackets have been proposed in the SQL3 discussion papers, because some. bits for double precision, which is just right for Intel- based PCs, Macintoshes, and most UNIX platforms. Its math functions are available burned into processor chips, so they will run faster. implementations will simply parse the expressions from left to right. The best way to be safe is always to make extensive use of parentheses in all expressions, whether they are in the host language

Định dạng
Số trang	10
Dung lượng	149,61 KB