Nielsen c09.tex V4 - 07/21/2009 12:40pm Page 202 Part II Manipulating Data With Select

Case expressions

SQL Server's CASE expression is a flexible and excellent means of building dynamic expressions. If you're a programmer, you have no doubt used a case (or switch) statement in other languages. The SQL CASE expression, however, is different. It's not used for programmatic flow of control, but rather to logically determine the value of an expression based on a condition.

Best Practice

When programmers write procedural code, it's often because part of the formula changes depending on the data. To a procedural mind-set, the best way to handle this is to loop through the rows and use multiple IF statements to branch to the correct formula. However, using a CASE expression to handle the various calculations and executing the entire operation in a single query enables SQL Server to optimize the process and make it dramatically faster.

Because the CASE expression returns a value, it may be used anywhere in a SQL DML statement (SELECT, INSERT, UPDATE, DELETE) where an expression may be used, including column expressions, join conditions, WHERE conditions, HAVING conditions, the ORDER BY clause, or even embedded in a longer expression. A CASE expression can even be used mid-expression to create a dynamic formula, which is very powerful.

The CASE expression has two forms, simple and searched, described in the following sections.

Simple case

With the simple CASE, the test expression (typically a column or variable) is presented first, and then each value to compare it with is listed in a WHEN clause. However, this version of CASE is limited in that it can perform only equality comparisons. The CASE expression checks the WHEN conditions in order and returns the THEN value of the first true WHEN condition. In the following example, based on the OBXKites database, one CustomerType is the default for new customers and is set to true in the IsDefault column.
The CASE expression compares the value in the IsDefault column with each possible bit setting and returns the character string 'default type' or 'possible' based on the bit setting:

USE OBXKites;

SELECT CustomerTypeName,
    CASE IsDefault
      WHEN 1 THEN 'default type'
      WHEN 0 THEN 'possible'
      ELSE '-'
    END AS AssignStatus
  FROM CustomerType;

202 www.getcoolebook.com
Nielsen c09.tex V4 - 07/21/2009 12:40pm Page 203 Data Types, Expressions, and Scalar Functions 9

Result:

CustomerTypeName  AssignStatus
Preferred         possible
Wholesale         possible
Retail            default type

The CASE expression concludes with END and, optionally, a column alias. In this example, the CASE expression evaluates the IsDefault column but produces the AssignStatus column in the SELECT result set.

Be careful if you use NULL in a simple CASE. WHEN NULL translates literally to "= NULL" and not to "IS NULL", so under the default ANSI_NULLS ON setting it never matches, even when the test expression is null. You can get unintended results if you are not careful.

Boolean case

The Boolean form of CASE (called the searched CASE in BOL) is more flexible than the simple form in that each individual case has its own Boolean expression. Therefore, not only can each WHEN condition include comparisons other than =, but the comparisons may also reference different columns:

SELECT
  CASE
    WHEN 1<0 THEN 'Reality is gone.'
    WHEN CAST(CURRENT_TIMESTAMP AS DATE) = '20051130'
      THEN 'David gets his driver''s license.'
    WHEN 1>0 THEN 'Life is normal.'
  END AS RealityCheck;

Because CURRENT_TIMESTAMP includes the time of day, it is cast to DATE (a type new in SQL Server 2008) so that the second condition is true for the entire day rather than only at the stroke of midnight.

Following is the result of the query when executed on David's sixteenth birthday:

RealityCheck
David gets his driver's license.

As with the simple case, the first true WHEN condition halts evaluation of the CASE and returns its THEN value. In this case (a pun!), if 1 is ever less than 0, then the RealityCheck case will accurately report 'Reality is gone.' When my son turns 16, the RealityCheck will again accurately warn us of his legal driving status.
If neither of these conditions is true, and 1 is still greater than 0, then all is well with reality and 'Life is normal.'

The point of the preceding code is that the searched CASE expression offers more flexibility than the simple CASE. This example mixed various comparison operators (<, =, >), and each WHEN clause checked different data.

The Boolean CASE expression can handle complex conditions, including the Boolean AND and OR operators. The following code sample uses a batch to set up the CASE expression (including T-SQL variables, which are explained in Chapter 21, "Programming with T-SQL"), and the CASE includes an AND and a BETWEEN operator:

DECLARE @b INT, @q INT;

SET @b = 2007;
SET @q = 25;

SELECT CASE
    WHEN @b = 2007 AND @q BETWEEN 10 AND 30 THEN 1
    ELSE NULL
  END AS Test;

Result:

Test
1

Working with nulls

The relational database model represents missing data using null. Technically, null means "value absent," and it's commonly understood to mean "unknown." In practice, null can indicate that the data has not yet been entered into the database or that the column does not apply to the particular row.

Because null values are unknown, the result of almost any expression that includes a null is also unknown. If the contents of a bank account are unknown, and its funds are included in a portfolio, then the total value of the portfolio is also unknown. The same concept is true in SQL, as the following code demonstrates. Phil Senn, a database developer, puts it this way: "Nulls zap the life out of any other value."

SELECT 1 + NULL;

Result:

NULL

Because nulls have such a devastating effect on expressions, some developers detest the use of nulls. They develop their databases so that nulls are never permitted, and column defaults supply surrogate nulls (blank, 0, or 'n/a') instead.
Other database developers argue that an unknown value shouldn't be represented by a zero or a blank just to make coding easier. I fall into the latter camp. Nulls are valuable in a database because they provide a consistent method of identifying missing data. And regardless of how missing data is represented in the database, certain types of queries will often produce nulls in the results, so it's worthwhile to write code that checks for nulls and handles them appropriately.

An advantage to using nulls is that SQL Server's AVG() and COUNT(column) aggregate functions automatically exclude nulls from the calculation. If you're using a surrogate null (for example, I've seen IT shops use 0 or -999 to represent missing numeric data), then every aggregate query must filter out the surrogate null or the results will be inaccurate.

Testing for null

Because null represents a missing value, there is no way to know whether a null is equal or unequal to a given value, or even to another null. Returning to the bank account example, if the balance of account 123 is missing and the balance of account 234 is missing, then it's logically impossible to say whether the two accounts have an equal or unequal balance.

Consider this simple test, which shows that null does not equal null (under the default ANSI_NULLS ON setting, the comparison evaluates to unknown, so the ELSE branch runs):

IF NULL = NULL
  SELECT '=';
ELSE
  SELECT '<>';

Result:

<>

Because the = and <> operators can't check for nulls, SQL includes two special operators, IS and IS NOT, to test for equivalence to special values, as follows:

WHERE Expression IS NULL

Repeating the simple test, the IS search operator works as advertised:

IF NULL IS NULL
  SELECT 'Is';
ELSE
  SELECT 'Is Not';

Result:

Is

The IS search condition may be used in the SELECT statement's WHERE clause to locate rows with null values. Most of the Cape Hatteras Adventures customers do not have a nickname in the database.
The following query retrieves only those customers with a null in the Nickname column:

USE CHA2;

SELECT FirstName, LastName, Nickname
  FROM dbo.Customer
  WHERE Nickname IS NULL
  ORDER BY LastName, FirstName;

Result:

FirstName  LastName  Nickname
Debbie     Andrews   NULL
Dave       Bettys    NULL
Jay        Brown     NULL
Lauren     Davis     NULL

The IS operator may be combined with NOT to test for the presence of a value by restricting the result set to those rows where Nickname is not null:

SELECT FirstName, LastName, Nickname
  FROM dbo.Customer
  WHERE Nickname IS NOT NULL
  ORDER BY LastName, FirstName;

Result:

FirstName  LastName  Nickname
Joe        Adams     Slim
Melissa    Anderson  Missy
Frank      Goldberg  Frankie
Raymond    Johnson   Ray

Handling nulls

When you are supplying data to reports, to end users, or to some applications, a null value will be less than welcome. Often a null must be converted to a valid value so that the data may be understood, or so the expression won't fail.

Nulls require special handling when used within expressions, and SQL includes a few functions designed specifically to handle nulls. ISNULL() and COALESCE() convert nulls to usable values, and NULLIF() creates a null if the specified condition is met.

Using the COALESCE() function

COALESCE() is not used as often as it could (some would say should) be, perhaps because it's not well known. It's a very cool function. COALESCE() accepts a list of expressions or columns and returns the first non-null value, as follows:

COALESCE(expression, expression, ...)

The name COALESCE is derived from the Latin coalescere (co + alescere), which means to unite toward a common end, to grow together, or to bring opposing sides together for a common good.
The SQL keyword, however, is derived from the alternate meaning of the term: "to arise from the combination of distinct elements." In a sense, the COALESCE() function brings together multiple, differing values of unknown usefulness, and from them emerges a single valid value.

Functionally, COALESCE() is the same as the following CASE expression:

CASE
  WHEN expression1 IS NOT NULL THEN expression1
  WHEN expression2 IS NOT NULL THEN expression2
  WHEN expression3 IS NOT NULL THEN expression3
  ELSE NULL
END

The following code sample demonstrates the COALESCE() function returning the first non-null value. In this case, it's 1+2:

SELECT COALESCE(NULL, 1+NULL, 1+2, 'abc');

Result:

3

COALESCE() is excellent for merging messy data. For example, when a table has partial data in several columns, the COALESCE() function can help pull the data together. In one project I worked on, the client had collected names and addresses from several databases and applications into a single table. The contact name and company name made it into the proper columns, but some addresses were in Address1, some were in Address2, and some were in Address3. Some rows had the second line of the address in Address2. If the address columns had an address, then the SalesNote was a real note. In many cases, however, the addresses were in the SalesNote column. Here's the code to extract the address from such a mess:

SELECT COALESCE(
    Address1 + CHAR(13) + CHAR(10) + Address2,  -- two-line address joined by carriage return + line feed
    Address1,
    Address2,
    Address3,
    SalesNote) AS NewAddress
  FROM TempSalesContacts;

For each row in the TempSalesContacts table, the COALESCE() function will search through the listed expressions and return the first non-null value. The first expression returns a value only if there's a value in both Address1 and Address2, because a value concatenated with a null produces a null (under the default CONCAT_NULL_YIELDS_NULL setting).
Therefore, if a two-line address exists, then it will be returned. Otherwise, a one-line address in Address1, Address2, or Address3 will be returned. Failing those options, the SalesNote column will be returned. Of course, the result from such a messy source table still needs to be manually scanned and verified.

Using the ISNULL() function

The most common null-handling function is ISNULL(), which is different from the IS NULL search condition. This function accepts a single expression and a substitution value. If the source is not null, then the ISNULL() function passes the value on. However, if the source is null, then the second parameter is substituted for the null, as follows:

ISNULL(source_expression, replacement_value)

Functionally, ISNULL() is similar to the following CASE expression:

CASE
  WHEN source_expression IS NULL THEN replacement_value
  ELSE source_expression
END

The following code sample builds on the preceding queries by substituting the string 'none' for a null for customers without a nickname:

SELECT FirstName, LastName, ISNULL(Nickname, 'none') AS Nickname
  FROM dbo.Customer
  ORDER BY LastName, FirstName;

Result:

FirstName  LastName  Nickname
Joe        Adams     Slim
Melissa    Anderson  Missy
Debbie     Andrews   none
Dave       Bettys    none

If the row has a value in the Nickname column, then that value is passed through the ISNULL() function untouched. However, if the nickname is null for a row, then the null is handled by the ISNULL() function and converted to the value none.

The ISNULL() function is specific to T-SQL, whereas COALESCE() and NULLIF() are ANSI standard SQL.

Using the NULLIF() function

Sometimes a null should be created in place of surrogate null values. If a database is polluted with 'n/a', blank, or '-' values where it should contain nulls, then you can use the NULLIF() function to replace the inconsistent values with nulls and clean the database.
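As a minimal sketch of that kind of cleanup, assume SQL Server 2008 or later and a hypothetical table variable @Contact whose Phone column mixes real values with the surrogate nulls 'n/a', '' (blank), and '-'. The table, column, and sample values are illustrative only. Each NULLIF() call turns one surrogate value into a true null, so nesting three calls cleans all three surrogates in a single UPDATE.

```sql
-- Hypothetical table variable standing in for a column polluted with surrogate nulls
DECLARE @Contact TABLE (
  ContactID INT PRIMARY KEY,
  Phone     VARCHAR(20) NULL
);

-- Multi-row VALUES requires SQL Server 2008 or later
INSERT INTO @Contact (ContactID, Phone)
  VALUES (1, '252-555-0101'),
         (2, 'n/a'),
         (3, ''),
         (4, '-'),
         (5, NULL);

-- Innermost NULLIF() nulls 'n/a', the middle one nulls blanks,
-- the outermost one nulls '-'; real values and existing nulls pass through unchanged
UPDATE @Contact
  SET Phone = NULLIF(NULLIF(NULLIF(Phone, 'n/a'), ''), '-');

SELECT ContactID, Phone
  FROM @Contact
  ORDER BY ContactID;
```

After the UPDATE, only row 1 keeps its phone number; rows 2 through 5 hold null. Because SQL Server ignores trailing spaces when comparing strings with =, a value made up only of spaces also matches '' and becomes null.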
The NULLIF() function accepts two parameters. If they are equal, then it returns a null; otherwise, it returns the first parameter. Functionally, NULLIF() is the same as the following CASE expression:

CASE
  WHEN Expression1 = Expression2 THEN NULL
  ELSE Expression1
END

The following code will convert any blanks in the Nickname column into nulls. The first statement updates one of the rows to a blank for testing purposes:

UPDATE dbo.Customer
  SET Nickname = ''
  WHERE LastName = 'Adams';

SELECT LastName, FirstName,
    CASE Nickname
      WHEN '' THEN 'blank'
      ELSE Nickname
    END AS Nickname,
    NULLIF(Nickname, '') AS NicknameNullIf
  FROM dbo.Customer
  WHERE LastName IN ('Adams', 'Anderson', 'Andrews')
  ORDER BY LastName, FirstName;

Result:

LastName  FirstName  Nickname  NicknameNullIf
Adams     Joe        blank     NULL
Anderson  Melissa    Missy     Missy
Andrews   Debbie     NULL      NULL

The third column uses a CASE expression to expose the blank value as "blank," and indeed the NULLIF() function converts the blank value to a null in the fourth column. As for the other possibilities, Melissa's Nickname was not affected by the NULLIF() function, and Debbie's null Nickname value is still in place.

A common use of NULLIF() prevents divide-by-zero errors. The following expression will generate an error if the variable @b is zero:

@a / @b              -- Error if @b is 0; otherwise a normal division result

However, you can use NULLIF() such that if the value of the @b variable is 0, it will result in a NULL instead of an error, as follows:

@a / NULLIF(@b, 0)   -- NULL result if @b is 0; otherwise a normal division result

Now, with a null as the result instead of an error, COALESCE() can be used to replace it with something more usable if needed.

Scalar Functions

Scalar functions return a single value. They are commonly used in expressions within the SELECT, WHERE, ORDER BY, GROUP BY, and HAVING clauses, or in T-SQL code.
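For instance, the same scalar function can appear in the column list and in the GROUP BY, HAVING, and ORDER BY clauses of a single query. The following is a minimal sketch, assuming SQL Server 2008 or later (for the VALUES row constructor used as a derived table) and a handful of hypothetical names in place of a real table; LEN() returns the number of characters in each string.

```sql
-- Hypothetical inline data; no sample database is required
SELECT LEN(v.Name) AS NameLength,   -- scalar function in the column list
       COUNT(*)    AS NameCount
  FROM (VALUES ('Slim'), ('Missy'), ('Ray'), ('Frankie'), ('Joe')) AS v(Name)
  GROUP BY LEN(v.Name)              -- the same expression in GROUP BY
  HAVING LEN(v.Name) < 7            -- ...in HAVING
  ORDER BY LEN(v.Name);             -- ...and in ORDER BY
```

The query returns three rows: lengths 3, 4, and 5 with counts 2, 1, and 1. Frankie (7 characters) is filtered out by the HAVING clause.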
SQL Server includes dozens of functions. This section describes the functions I find most useful.

Best Practice

Performance is as much a part of the data-schema design as it is a part of the query. Plan to store the data in the way that it will be searched by a WHERE condition, rather than depend on manipulating the data with functions at query time. While using a function in an expression in a result-set column may be unavoidable, using a function in a WHERE condition forces the function to be calculated for every row. In addition, wrapping a column in a function in a WHERE clause generally prevents the Query Optimizer from using an index seek; it has to use a scan instead, resulting in much more I/O.

With SQL Server 2008 you can develop three types of user-defined functions, as explained in Chapter 25, "Building User-Defined Functions."

User information functions

In a client/server environment, it's good to know who the client is. Toward that end, the following four functions are very useful, especially for gathering audit information:

■ USER_NAME(): Returns the name of the current user as he or she is known to the database. When a user is granted access to a database, a username that is different from the server login name may be assigned. The results are affected by an EXECUTE AS command, in which case the username shown is that of the impersonated user.
■ SUSER_SNAME(): Returns the login name by which the user was authenticated to SQL Server. If the user was authenticated as a member of a Windows user group, then this function still returns the user's Windows login name. The results are affected by an EXECUTE AS command, in which case the name shown is that of the impersonated principal.
■ HOST_NAME(): Returns the name of the user's workstation.
■ APP_NAME(): Returns the name of the application connected to SQL Server (if the application sets one).

The following query returns all four values:

SELECT
    USER_NAME() AS 'User',
    SUSER_SNAME() AS 'Login',
    HOST_NAME() AS 'Workstation',
    APP_NAME() AS 'Application';

Result:

User  Login      Workstation  Application
Dbo   NOLI\Paul  CHA2\NOLI    Management Studio

Date and time functions

Databases must often work with date and time data, and SQL Server includes several useful functions for that. SQL Server's datetime and datetime2 types store both the date and the time in a single value. SQL Server 2008 also has types for date only (date), time only (time), and time-zone-aware values (datetimeoffset).

T-SQL includes several functions to return the current date and time:

■ GetDate(): Returns the current server date and time to the nearest 3⅓ milliseconds, rounded to the nearest value
■ CURRENT_TIMESTAMP: The same as GETDATE() except ANSI standard
■ GetUTCDate(): Returns the current server date and time converted to Greenwich Mean Time (also known as UTC time) to the nearest 3⅓ milliseconds. This is extremely useful for companies whose operations span time zones.

New to SQL Server 2008:

■ SysDateTime(): Returns the current server date and time with a precision of a hundred nanoseconds
■ SysUTCDateTime(): Returns the current server date and time converted to Greenwich Mean Time with a precision of a hundred nanoseconds
■ SYSDATETIMEOFFSET(): Returns a DateTimeOffset value that contains the date and time of the computer on which the instance of SQL Server is running. The time zone offset is included.
■ TODATETIMEOFFSET(date and time, time zone offset): Returns a DateTimeOffset value that combines the given date and time with the specified time zone offset (for example, '-05:00')

The following four SQL Server date-time functions handle extracting or working with a specific portion of the date or time stored within a datetime column:

■ DATEADD(date portion, number, date): Returns a new value after adding the number of date-portion units to the date
■ DATEDIFF(date portion, start date, end date): Returns the count of date-portion boundaries crossed between the start date and the end date
■ DateName(date portion, date): Returns the proper name for the selected portion of the datetime value, or its ordinal number if the selected portion has no name (the portions for DateName() and DatePart() are listed in Table 9-2):

SELECT DATENAME(year, CURRENT_TIMESTAMP) AS "Year";

Result:

Year
2009

This code gets the month and weekday name:

SELECT DATENAME(MONTH, CURRENT_TIMESTAMP) AS "Month",
       DATENAME(WEEKDAY, CURRENT_TIMESTAMP) AS "Day";

Result:

Month     Day
February  Tuesday

This code gets the month and weekday name and displays the results in Italian:

SET LANGUAGE Italian;
SELECT DATENAME(MONTH, CURRENT_TIMESTAMP) AS "Month",
       DATENAME(WEEKDAY, CURRENT_TIMESTAMP) AS "Day";
SET LANGUAGE us_english;  -- switch back so later statements use English names and date formats

Result:

Month     Day
Febbraio  Martedi

For more information about datetime, datetime2, and other data types, refer to Chapter 20, "Creating the Physical Database Schema."

The following code example assigns a date of birth to Mr. Frank and then retrieves the proper names of some of the portions of that date of birth using the DateName() function:

UPDATE Guide
  SET DateOfBirth = '19580904'  -- September 4, 1958, in the language-neutral yyyymmdd format
  WHERE LastName = 'Frank';