OrderID UNIQUEIDENTIFIER NOT NULL
  FOREIGN KEY REFERENCES dbo.[Order] ON DELETE CASCADE,
ProductID UNIQUEIDENTIFIER NULL
  FOREIGN KEY REFERENCES dbo.Product,

Chapter 23, "T-SQL Error Handling," shows how to create triggers that handle custom referential integrity and cascading deletes for nonstandard data schemas or cross-database referential integrity.

Creating User-Data Columns

A user-data column stores user data. These columns typically fall into two categories: columns users use to identify a person, place, thing, event, or action, and columns that further describe the person, place, thing, event, or action.

SQL Server tables may have up to 1,024 columns, but well-designed relational-database tables seldom have more than 25, and most have only a handful.

Data columns are created during table creation by listing the columns as parameters to the CREATE TABLE command. The columns are listed within parentheses as column name, data type, and any column attributes such as constraints, nullability, or default value:

CREATE TABLE TableName (
  ColumnName DATATYPE Attributes,
  ColumnName DATATYPE Attributes
);

Data columns can be added to existing tables using the ALTER TABLE ADD ColumnName command:

ALTER TABLE TableName
  ADD ColumnName DATATYPE Attributes;

An existing column may be modified with the ALTER TABLE ALTER COLUMN command:

ALTER TABLE TableName
  ALTER COLUMN ColumnName NEWDATATYPE Attributes;

To list the columns for the current database using code, query the sys.objects and sys.columns catalog views.

Column data types

The column's data type serves two purposes:

■ It enforces the first level of data integrity. Character data won't be accepted into a datetime or numeric column. I have seen databases with every column set to nvarchar to ease data entry. What a waste. The data type is a valuable data-validation tool that should not be overlooked.

■ It determines the amount of disk storage allocated to the column.

Character data types

SQL Server supports several character data types, listed in Table 20-2.

TABLE 20-2 Character Data Types

Data Type     | Description                                                                                   | Size in Bytes
--------------|-----------------------------------------------------------------------------------------------|------------------------
char(n)       | Fixed-length character data up to 8,000 characters long, using the collation's character set   | Defined length * 1 byte
nchar(n)      | Unicode fixed-length character data up to 4,000 characters long                                | Defined length * 2 bytes
varchar(n)    | Variable-length character data up to 8,000 characters long, using the collation's character set | 1 byte per character
varchar(max)  | Variable-length character data up to 2GB in length, using the collation's character set        | 1 byte per character
nvarchar(n)   | Unicode variable-length character data up to 4,000 characters long                             | 2 bytes per character
nvarchar(max) | Unicode variable-length character data up to 2GB in length                                     | 2 bytes per character
text          | Variable-length character data up to 2,147,483,647 characters in length. Warning: deprecated   | 1 byte per character
ntext         | Unicode variable-length character data up to 1,073,741,823 characters in length. Warning: deprecated | 2 bytes per character
sysname       | A Microsoft user-defined data type, equivalent to nvarchar(128), used for table and column names | 2 bytes per character

Unicode data types are very useful for storing multilingual data. The cost, however, is the doubled size. Some developers use nvarchar for all their character-based columns, while others avoid it at all costs. I recommend using Unicode data when the database might use foreign languages; otherwise, use char, varchar, or text.
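To see the doubled size firsthand, here is a minimal sketch comparing the two with the DATALENGTH() function (the variable names and sample literal are illustrative):

-- Each Unicode character consumes 2 bytes; 'Kite' is 4 characters
DECLARE @Ansi VARCHAR(50) = 'Kite';
DECLARE @Uni  NVARCHAR(50) = N'Kite';

SELECT DATALENGTH(@Ansi) AS VarCharBytes,   -- returns 4
       DATALENGTH(@Uni)  AS NVarCharBytes;  -- returns 8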
Numeric data types

SQL Server supports several numeric data types, listed in Table 20-3.

Best Practice

When working with monetary values, be very careful with the data type. Using float or real data types for money will cause rounding errors. The data types money and smallmoney are accurate to one hundredth of a U.S. penny. For some monetary values, the client may request precision only to the penny, in which case decimal is the more appropriate data type.

TABLE 20-3 Numeric Data Types

Data Type          | Description                                                                                      | Size in Bytes
-------------------|--------------------------------------------------------------------------------------------------|---------------
bit                | 1 or 0                                                                                           | 1 bit
tinyint            | Integers from 0 to 255                                                                           | 1 byte
smallint           | Integers from -32,768 to 32,767                                                                  | 2 bytes
int                | Integers from -2,147,483,648 to 2,147,483,647                                                    | 4 bytes
bigint             | Integers from -2^63 to 2^63 - 1                                                                  | 8 bytes
decimal or numeric | Fixed-precision and scale numbers from -10^38 + 1 through 10^38 - 1                              | Varies according to precision
money              | Numbers from -922,337,203,685,477.5808 through 922,337,203,685,477.5807, accurate to one ten-thousandth (.0001) | 8 bytes
smallmoney         | Numbers from -214,748.3648 through 214,748.3647, accurate to one ten-thousandth (.0001)          | 4 bytes
float              | Floating-point numbers ranging from -1.79E+308 through 1.79E+308, depending on the bit precision | 4 or 8 bytes
real               | Float with 24-bit precision                                                                      | 4 bytes

Date/Time data types

Traditionally, SQL Server stores both the date and the time in a single column using the datetime and smalldatetime data types, described in Table 20-4. With SQL Server 2008, Microsoft released several new date/time data types, making life much easier for database developers.

Some programmers (non-DBAs) choose character data types for date columns. This can cause a horrid conversion mess. Use the IsDate() function to sort through the bad data.
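For example, given a hypothetical staging table with dates typed into a varchar column, IsDate() can separate the convertible values from the garbage:

-- Hypothetical table and column names; ISDATE() returns 1 when the
-- value can be converted to a date, 0 when it cannot
SELECT EventDateChar
  FROM dbo.ImportStaging
  WHERE ISDATE(EventDateChar) = 0;  -- lists the rows with bad date data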
TABLE 20-4 Date/Time Data Types

Data Type         | Description                                                                                      | Size in Bytes
------------------|--------------------------------------------------------------------------------------------------|---------------
datetime          | Date and time values from January 1, 1753, through December 31, 9999, accurate to 3.33 milliseconds | 8 bytes
smalldatetime     | Date and time values from January 1, 1900, through June 6, 2079, accurate to one minute          | 4 bytes
datetime2(n)      | Date and time values from January 1, 0001, through December 31, 9999 (Gregorian calendar), with user-defined precision from whole seconds down to 100 nanoseconds | 6-8 bytes depending on precision
date              | Date-only values from January 1, 0001, through December 31, 9999 (Gregorian calendar)            | 3 bytes
time(n)           | Time-only values, with user-defined precision from whole seconds down to 100 nanoseconds         | 3-5 bytes depending on precision
datetimeoffset(n) | Date and time values from January 1, 0001, through December 31, 9999 (Gregorian calendar), with user-defined precision down to 100 nanoseconds and an embedded time zone offset | 8-10 bytes depending on precision

Other data types

Other data types, listed and described in Table 20-5, fulfill the needs created by unique values, binary large objects, and variant data.

TABLE 20-5 Other Data Types

Data Type               | Description                                                                         | Size in Bytes
------------------------|-------------------------------------------------------------------------------------|---------------
timestamp or rowversion | Database-wide unique value automatically generated with every insert or update      | 8 bytes
uniqueidentifier        | System-generated globally unique 16-byte value (GUID)                               | 16 bytes
binary(n)               | Fixed-length binary data up to 8,000 bytes                                          | Defined length
varbinary(n)            | Variable-length binary data up to 8,000 bytes                                       | Bytes used
varbinary(max)          | Variable-length binary data up to 2GB in length                                     | Bytes used
image                   | Variable-length binary data up to 2,147,483,647 bytes. Warning: deprecated          | Bytes used
sql_variant             | Can store values of most base data types, up to 8,000 bytes                         | Depends on data type and length

Calculated columns

A calculated column is powerful in that it presents the results of a predefined expression the way a view (a stored SQL SELECT statement) does, but without the overhead of a view. Calculated columns also improve data integrity by performing the calculation at the table level, rather than trusting that each query developer will get the calculation correct.

By default, a calculated column doesn't actually store any data; instead, the data is calculated when queried. However, since SQL Server 2005, calculated columns may be optionally persisted, in which case they are calculated when entered and then stored as regular, but read-only, row data. They may even be indexed. Personally, I've replaced several old triggers with persisted, indexed, calculated columns with great success. They're easy, and fast.

The syntax simply defines the formula for the calculation in lieu of designating a data type:

ColumnName AS Expression

The OrderDetail table from the OBXKites sample database includes a calculated column for the extended price, as shown in the following abbreviated code:

CREATE TABLE dbo.OrderDetail (
  Quantity NUMERIC(7,2) NOT NULL,
  UnitPrice MONEY NOT NULL,
  ExtendedPrice AS Quantity * UnitPrice PERSISTED
) ON [PRIMARY];
GO
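Because ExtendedPrice is persisted, it can also be indexed like any regular column. A minimal sketch (the index name is illustrative):

-- Index the persisted calculated column to support searches and sorts
CREATE NONCLUSTERED INDEX IxOrderDetailExtendedPrice
  ON dbo.OrderDetail (ExtendedPrice);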
Sparse columns

New for SQL Server 2008, sparse columns use a completely different method for storing data within the page. Normal columns have a predetermined, designated location for the data. If there's no data, then some space is wasted. Even nullable columns use a bit to indicate the presence or absence of a null for the column. Sparse columns, however, store nothing on the page if no data is present for the column for that row.

To accomplish this, SQL Server essentially writes the list of sparse columns that have data into a list for the row (5 bytes + 2-4 bytes for every sparse column with data). If the columns usually hold data, then sparse columns actually require more space than normal columns. However, if the majority of rows are null (I've heard a figure of 50%, but I'd rather go much higher), then the sparse column will save space.

Because sparse columns are intended for columns that infrequently hold data, they can be used for very wide tables — up to 30,000 columns.

To create a sparse column, add the SPARSE keyword to the column definition. The sparse column must be nullable:

CREATE TABLE Foo (
  FooPK INT NOT NULL IDENTITY PRIMARY KEY,
  Name VARCHAR(25) NOT NULL,
  ExtraData VARCHAR(50) SPARSE NULL
);

Worst Practice

Any table design that requires sparse columns is a horrible design. A different pattern, probably a supertype/subtype pattern, should be used instead. Please don't ever implement a table with sparse columns. Anyone who tells you they need to design a database with sparse columns should get a job flipping burgers. Don't let them design your database.

Column constraints and defaults

The database is only as good as the quality of the data. A constraint is a high-speed data-validation check or business-logic check performed at the database-engine level. Besides the data type itself, SQL Server includes five types of constraints:

■ Primary key constraint: Ensures a unique non-null key
■ Foreign key constraint: Ensures that the value points to a valid key
■ Nullability: Indicates whether the column can accept a null value
■ Check constraint: Custom Boolean constraint
■ Unique constraint: Ensures a unique value

SQL Server also includes the following column option:

■ Column default: Supplies a value if none is specified in the INSERT statement

The column default is referred to as a type of constraint on one page of SQL Server Books Online, but is not listed among the constraints on another page. I call it a column option because it does not constrain user data entry, nor does it enforce a data-integrity rule. However, it serves the column as a useful option.
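As a sketch, several of these constraint types can be declared together in one CREATE TABLE statement (the table, column, and constraint names here are hypothetical):

CREATE TABLE dbo.Employee (
  EmployeeID INT NOT NULL PRIMARY KEY,   -- primary key constraint
  Email VARCHAR(100) NOT NULL UNIQUE,    -- unique constraint
  Salary MONEY NOT NULL
    CONSTRAINT ckEmployeeSalary CHECK (Salary >= 0),  -- check constraint
  HireDate DATETIME NOT NULL DEFAULT GETDATE()        -- column default
);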
Column nullability

A null value is an unknown value; typically, it means that the column has not yet had a user entry. Chapter 9, "Data Types, Expressions, and Scalar Functions," explains how to define, detect, and handle nulls.

Whether or not a column will even accept a null value is referred to as the nullability of the column and is configured by the NULL or NOT NULL column attribute. New columns in SQL Server default to NOT NULL, meaning that they do not accept nulls. However, this option is normally overridden by the connection property ANSI_NULL_DFLT_ON. The ANSI standard is to default to NULL, which accepts nulls, in table columns that aren't explicitly created with a NOT NULL option.

Best Practice

Because the default column nullability differs between ANSI SQL and SQL Server, it's best to avoid relying on the default behavior and explicitly declare NULL or NOT NULL when creating tables.

The following code demonstrates the ANSI default nullability versus SQL Server's nullability. The first test uses the SQL Server default by setting the database ANSI NULL option to false, and the ANSI_NULL_DFLT_OFF connection setting to ON:

USE TempDB;
EXEC sp_dboption 'TempDB', ANSI_NULL_DEFAULT, 'false';
SET ANSI_NULL_DFLT_OFF ON;

The NullTest table is created without specifying the nullability:

CREATE TABLE NullTest(
  PK INT IDENTITY,
  One VARCHAR(50)
);

The following code attempts to insert a null:

INSERT NullTest(One)
  VALUES (NULL);

Result:

Server: Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'One', table
'TempDB.dbo.NullTest'; column does not allow nulls. INSERT fails.
The statement has been terminated.

Because the nullability was set to the SQL Server default when the table was created, the column does not accept null values. The second sample rebuilds the table with the ANSI SQL nullability default:

EXEC sp_dboption 'TempDB', ANSI_NULL_DEFAULT, 'true';
SET ANSI_NULL_DFLT_ON ON;
DROP TABLE NullTest;
CREATE TABLE NullTest(
  PK INT IDENTITY,
  One VARCHAR(50)
);

The next example attempts to insert a null:

INSERT NullTest(One)
  VALUES (NULL);

Result:

(1 row(s) affected)
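Per the best practice above, a version of the table with explicit nullability is immune to these database and connection settings. A minimal sketch:

CREATE TABLE NullTest2(
  PK INT IDENTITY NOT NULL,
  One VARCHAR(50) NULL  -- explicit, so the ANSI_NULL_DFLT settings are irrelevant
);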
Managing Optional Data

Databases attempt to model reality. In reality, sometimes there's standard data that for one reason or another doesn't apply to a specific object. Some people don't have a suffix (e.g., Jr. or Sr.). Some addresses don't have a second line. Some orders are custom jobs and don't have part numbers. Sometimes the missing data is only temporarily missing and will be filled in later. A new customer supplies her name and e-mail address, but not her street address. A new order doesn't yet have a closed date, but will have one later. Every employee will eventually have a termination date.

The usual method for handling optional or missing data is with a nullable column. Nulls are controversial at best. Some database modelers use them constantly, while others believe that nulls are evil. Even the meaning of null is debated, with some claiming null means unknown, and others saying null means the absence of data.

When the bits hit the hard drive, there are three possible solutions for representing optional data in a database. Rather than debate the merits of each option, this is an opportunity to apply the database objectives from Chapter 2, "Data Architecture":

■ Nullable columns: These use a consistent bit to represent the fact that the column is missing data.

■ Surrogate nulls: These use a data flag (e.g., "na", "n/a", empty string, -99) to represent the missing data. While popular with data modelers who want to avoid nulls and left outer joins, this solution has several problems. Real data is being used to represent missing data, so every query must filter out the missing data correctly. Using surrogate nulls for date/time columns is particularly messy. Surrogate nulls in a numeric aggregate must be filtered out (nulls handle this automatically). Over time, surrogate nulls tend to become less consistent as more users or developers employ differing values for the surrogate null.

■ Absent rows: This solution removes the optional data column from the main table and places it in a separate supertype/subtype table. If the data does not apply to a given row, that row is not inserted into the subtype table, hence the name absent rows. While this completely eliminates nulls and surrogate nulls from the database and sounds correct in theory, it presents a host of practical difficulties (see the sketch after this sidebar). Queries are now very complex to code correctly. Left outer joins are required to retrieve, or even test for, the presence of data in the optional data column. This can create data integrity issues if developers use the wrong type of join, and it kills performance, as SQL Server has to read from multiple tables and indexes to retrieve data for a single entity. Inserts and updates have to parse out the columns to different tables.
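To make the query-complexity point concrete, here is a sketch of the absent-rows pattern using hypothetical Customer and CustomerAddress tables, where the optional street address lives in a subtype table, so a left outer join is required merely to test for its presence:

-- Hypothetical supertype/subtype tables; Street is null in the result
-- when no CustomerAddress row exists for the customer
SELECT C.CustomerID, C.CustomerName, CA.Street
  FROM dbo.Customer AS C
    LEFT OUTER JOIN dbo.CustomerAddress AS CA
      ON C.CustomerID = CA.CustomerID;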
Creating Indexes

Indexes are the bridge from a query to the data. Without indexes, SQL Server must scan and filter to select specific rows — a dog-slow process at best. With the right indexes, SQL Server screams.

SQL Server uses two types of indexes: clustered indexes, which reflect the logical sort order of the table, and non-clustered indexes, which are additional b-trees typically used to perform rapid searches of non-key columns. The columns by which the index is sorted are referred to as the key columns.

Within Management Studio's Object Explorer, existing indexes for each table are listed under the DatabaseName ➪ Tables ➪ TableName ➪ Indexes node. Every index property for new or existing indexes may be managed using the Index Properties page, shown in Figure 20-7. The page is opened for existing indexes by right-clicking on the index and choosing Properties. New indexes are created from the context menu of the Indexes node under the selected table.

While this chapter covers the syntax and mechanics of creating indexes, Chapter 64, "Indexing Strategies," explores how to design indexes for performance.

Using Management Studio, indexes are visible as nodes under the table in Object Explorer. Use the Indexes context menu and select New Index to open the New Index form, which contains six pages:

■ General index information includes the index name, type, uniqueness, and key columns.
■ Index Options control the behavior of the index. In addition, an index may be disabled or re-enabled.
■ Included Columns are non-key columns used for covering indexes.
■ The Storage page places the index on a selected filegroup.
■ The Spatial page has configuration options specific to indexes for the spatial data type.
■ The Filter page is for SQL Server 2008's new WHERE clause option for indexes.

When opening the properties of an existing index, the Index Properties form also includes two additional pages:

■ The Fragmentation page displays detailed information about the health of the index.
■ Extended Properties are user-defined additional properties.

FIGURE 20-7: Every index option may be set using Management Studio's Index Properties page.

Changes made in the Index Properties page may be executed immediately using the OK button, or scheduled or scripted using the icons at the top of the page.

Indexes are created in code with the CREATE INDEX command. The following command creates a clustered index named IxOrderID on the OrderID foreign key of the OrderDetail table:

CREATE CLUSTERED INDEX IxOrderID
  ON dbo.OrderDetail (OrderID);

To retrieve fascinating index information from T-SQL code, use the sys.indexes, sys.index_columns, sys.stats, and sys.stats_columns catalog views; the sys.dm_db_index_physical_stats and sys.dm_db_index_operational_stats dynamic management functions; and the INDEXKEY_PROPERTY and INDEX_COL functions.
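For example, this sketch joins sys.indexes to sys.dm_db_index_physical_stats to report fragmentation for the OrderDetail table's indexes (any table name could be substituted):

-- LIMITED mode scans the least; avg_fragmentation_in_percent guides
-- the rebuild-versus-reorganize decision
SELECT I.name, PS.index_type_desc, PS.avg_fragmentation_in_percent
  FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.OrderDetail'), NULL, NULL, 'LIMITED') AS PS
    JOIN sys.indexes AS I
      ON I.object_id = PS.object_id
     AND I.index_id = PS.index_id;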