Nielsen c15.tex V4 - 07/21/2009 12:51pm Page 372 Part II Manipulating Data With Select

The first section of the MERGE query identifies the target and source tables and how they relate. Following the table definition, there’s an optional clause for each match combination, as shown in this simplified syntax:

MERGE TargetTable
USING SourceTable
  ON join conditions
[WHEN MATCHED THEN DML]
[WHEN NOT MATCHED BY TARGET THEN DML]
[WHEN NOT MATCHED BY SOURCE THEN DML]

Applying the MERGE command to the airline check-in scenario, there’s an appropriate action for each match combination:

■ If the row is in both FlightPassengers (the target) and CheckIn (the source), then the target is updated with the CheckIn table’s Seat column.
■ If the row is present in CheckIn (the source) but there’s no match in FlightPassengers (the target), then the row from CheckIn is inserted into FlightPassengers. Note that the data from the source table is gathered by the INSERT command using INSERT VALUES.
■ If the row is present in FlightPassengers (the target) but there’s no match in CheckIn (the source), then the row is deleted from FlightPassengers. Note that the DELETE command deletes from the target and does not require a WHERE clause because the rows are filtered by the MERGE command.
Here’s the complete working MERGE command for the scenario:

MERGE FlightPassengers F
USING CheckIn C
  ON C.LastName = F.LastName
  AND C.FirstName = F.FirstName
  AND C.FlightCode = F.FlightCode
  AND C.FlightDate = F.FlightDate
WHEN MATCHED
  THEN UPDATE
    SET F.Seat = C.Seat
WHEN NOT MATCHED BY TARGET
  THEN INSERT (FirstName, LastName, FlightCode, FlightDate, Seat)
    VALUES (FirstName, LastName, FlightCode, FlightDate, Seat)
WHEN NOT MATCHED BY SOURCE
  THEN DELETE
;

The next query looks at the results of the MERGE command, returning the finalized passenger list for SQL Server Airlines flight 2008:

SELECT FlightID, FirstName, LastName, FlightCode, FlightDate, Seat
  FROM FlightPassengers;

Result:

FlightID  FirstName  LastName  FlightCode  FlightDate  Seat
1         Paul       Nielsen   SS2008      2009-03-01  9F
2         Sue        Jenkins   SS2008      2009-03-01  7A
4         Jerry      Nixon     SS2008      2009-03-01  2A
5         Missy      Anderson  SS2008      2009-03-01  4B

MERGE has a few specific rules:
■ It must be terminated by a semicolon.
■ Each target row may match at most one source row. If the join matches a target row to more than one source row, the MERGE fails rather than updating or deleting that target row more than once.
■ The join conditions must be deterministic, meaning they are repeatable.

Returning Modified Data

SQL Server can optionally return the modified data as a data set for further use. This can be useful to perform more work on the modified data, or to return the data to the front-end application to eliminate an extra round-trip to the server.

The OUTPUT clause can access the inserted and deleted virtual tables, as well as any data source referenced in the FROM clause, to select the data to be returned. Normally used only by triggers, the inserted and deleted virtual tables contain the after and before views of the rows affected by the statement: the deleted virtual table stores the old data, and the inserted virtual table stores the newly inserted or updated data.
For more examples of the inserted and deleted tables, turn to Chapter 26, ‘‘Creating DML Triggers.’’

Returning data from an insert

The INSERT command makes the inserted virtual table available. The following example, taken from earlier in this chapter, has been edited to include the OUTPUT clause. The inserted virtual table holds a picture of the new data being inserted, and the OUTPUT clause returns it:

USE CHA2;

INSERT dbo.Guidelist (LastName, FirstName, Qualifications)
  OUTPUT Inserted.*
  VALUES ('Nielsen', 'Paul', 'trainer');

Result:

GuideID  LastName  FirstName  Qualifications  DateOfBirth  DateHire
12       Nielsen   Paul       trainer         NULL         NULL

Best Practice

An excellent application of the OUTPUT clause within an INSERT is returning the values of newly created surrogate keys. The SCOPE_IDENTITY() function returns the last single identity value inserted, but it can’t return a set of new identity values. There is no function to return the GUID value just created by a NEWSEQUENTIALID() default. However, the OUTPUT clause returns sets of new surrogate keys regardless of their data type. You can almost think of the INSERT OUTPUT as a scope_GUID() function or a set-based scope_identity().

Returning data from an update

The OUTPUT clause also works with updates and can return the before and after picture of the data. In this example, the deleted virtual table is being used to grab the original value, while the inserted virtual table stores the new updated value.
Only the Qualifications column is returned:

USE CHA2;

UPDATE dbo.Guide
  SET Qualifications = 'Scuba'
  OUTPUT Deleted.Qualifications AS OldQuals,
         Inserted.Qualifications AS NewQuals
  WHERE GuideID = 3;

Result:

OldQuals  NewQuals
NULL      Scuba

Returning data from a delete

When deleting data, only the deleted table has any useful data to return:

DELETE dbo.Guide
  OUTPUT Deleted.GuideID, Deleted.LastName, Deleted.FirstName
  WHERE GuideID = 5;

Result:

GuideID  LastName  FirstName
5        Wilson    Greg

Returning data from a merge

The MERGE command can return data using the OUTPUT clause as well. A twist is that the MERGE command adds a column, $action, to identify whether the row was inserted, updated, or deleted from the target table. The next query adds the OUTPUT clause to the previous MERGE command:

MERGE FlightPassengers F
USING CheckIn C
  ON C.LastName = F.LastName
  AND C.FirstName = F.FirstName
  AND C.FlightCode = F.FlightCode
  AND C.FlightDate = F.FlightDate
WHEN MATCHED
  THEN UPDATE
    SET F.Seat = C.Seat
WHEN NOT MATCHED BY TARGET
  THEN INSERT (FirstName, LastName, FlightCode, FlightDate, Seat)
    VALUES (FirstName, LastName, FlightCode, FlightDate, Seat)
WHEN NOT MATCHED BY SOURCE
  THEN DELETE
OUTPUT
  deleted.FlightID, deleted.LastName, deleted.Seat,
  $action,
  inserted.FlightID, inserted.LastName, inserted.Seat
;

Result:

FlightID  LastName  Seat  $action  FlightID  LastName  Seat
NULL      NULL      NULL  INSERT   5         Anderson  4B
1         Nielsen   9F    UPDATE   1         Nielsen   9F
2         Jenkins   7A    UPDATE   2         Jenkins   7A
3         Smith     2A    DELETE   NULL      NULL      NULL
4         Nixon     29B   UPDATE   4         Nixon     2A

Returning data into a table

For T-SQL developers, the OUTPUT clause can return the data for use within a batch or stored procedure. The data is received into a user table, temp table, or table variable, which must already have been created. Although the syntax may seem similar to the SELECT INTO syntax, it actually functions very differently.
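The same OUTPUT ... INTO mechanism is what lets the Best Practice on surrogate keys work for multi-row inserts. The following is a minimal sketch rather than an example from the book’s sample databases: the temp table #Customer, its columns, and the sample names are hypothetical, and the key is filled by a NEWSEQUENTIALID() default, a value that SCOPE_IDENTITY() could not report.

```sql
-- Hypothetical demo table: GUID surrogate key filled by a NEWSEQUENTIALID() default
CREATE TABLE #Customer (
  CustomerID   UNIQUEIDENTIFIER NOT NULL
               DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
  CustomerName VARCHAR(50) NOT NULL
);

-- Receiving table variable; it must exist before the INSERT runs
DECLARE @NewKeys TABLE (
  CustomerID   UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
  CustomerName VARCHAR(50) NOT NULL
);

-- One multi-row INSERT; OUTPUT ... INTO captures every generated key as a set
INSERT #Customer (CustomerName)
  OUTPUT Inserted.CustomerID, Inserted.CustomerName
    INTO @NewKeys (CustomerID, CustomerName)
  VALUES ('Acme'), ('Contoso'), ('Fabrikam');

-- The rest of the batch can join to @NewKeys or return it to the client
SELECT CustomerID, CustomerName
  FROM @NewKeys;

DROP TABLE #Customer;
```

The same pattern should work unchanged for identity keys: swap the GUID column for an INT IDENTITY column and the table variable collects every generated integer, which a single SCOPE_IDENTITY() call cannot do for a multi-row insert.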
In the following example, the OUTPUT clause passes the results to a @DeletedGuides table variable:

DECLARE @DeletedGuides TABLE (
  GuideID INT NOT NULL PRIMARY KEY,
  LastName VARCHAR(50) NOT NULL,
  FirstName VARCHAR(50) NOT NULL
);

DELETE dbo.Guide
  OUTPUT Deleted.GuideID, Deleted.LastName, Deleted.FirstName
    INTO @DeletedGuides
  WHERE GuideID = 2;

Interim result:

(1 row(s) affected)

Continuing the batch:

SELECT GuideID, LastName, FirstName
  FROM @DeletedGuides;

Result:

(1 row(s) affected)

GuideID  LastName  FirstName
2        Frank     Ken

An advanced use of the OUTPUT clause, called composable DML, passes the output data to an outer query, which can then be used in an INSERT command. For more details, refer to Chapter 11, ‘‘Including Data with Subqueries and CTEs.’’

Summary

Data retrieval and data modification are primary tasks of a database application. This chapter examined the workhorse INSERT, UPDATE, DELETE, and MERGE DML commands and described how you can use them to manipulate data.

Key points in this chapter include the following:
■ There are multiple formats for the INSERT command depending on the data’s source: INSERT VALUES, INSERT SELECT, INSERT EXEC, and INSERT DEFAULT VALUES.
■ INSERT VALUES now has row constructors to insert multiple rows with a single INSERT.
■ SELECT INTO creates a new table and then inserts the results into the new table.
■ UPDATE always updates only a single table, but it can use an optional FROM clause to reference other data sources.
■ Using DELETE without a WHERE clause is dangerous.
■ The new MERGE command pulls data from a source table and inserts, updates, or deletes in the target table depending on the match conditions.
■ INSERT, UPDATE, DELETE, and MERGE can all include an optional OUTPUT clause that can select data from the query or the virtual inserted and deleted tables.
The result of the OUTPUT clause can be passed to the client, inserted into a table, or passed to an outer query.

This chapter explained data modifications assuming all goes well, but in fact several conditions and situations can conspire to block the INSERT, UPDATE, DELETE, or MERGE. The next chapter looks at the dark side of data modification and what can go wrong.

Modification Obstacles

IN THIS CHAPTER
Avoiding and solving complex data-modification problems
Primary keys, foreign keys, inserts, updates, and deletes
Deleting duplicate rows
Nulls and defaults
Trigger issues
Updating with views

Some newsgroup postings ask about how to perform a task or write a query, but another set of postings ask about troubleshooting the code when there is some problem. Typically, SQL Server is working the way it is supposed to function, but someone is having trouble getting past what’s perceived to be an obstacle.

This chapter surveys several types of potential obstacles and explains how to avoid them. In nearly every case, once the obstacle is understood, it turns out to be a safety feature: SQL Server is protecting the data by blocking the insert, update, or delete.

As Table 16-1 illustrates, INSERT and UPDATE operations face more obstacles than DELETE operations because they are creating new data in the table that must pass multiple validation rules. Because the DELETE operation only removes data, it faces fewer possible obstacles.

Data Type/Length

Column data type/length may affect INSERT and UPDATE operations. One of the first checks the new data must pass is that of data type and data length. Often, a data-type error is caused by missing or extra quotes. SQL Server is particular about implicit, or automatic, data-type conversion.
Conversions that function automatically in other programming languages often fail in SQL Server, as shown in the following example:

USE OBXKites;
DECLARE @ProblemDate DATETIME = '20090301';

INSERT dbo.Price (ProductID, Price, EffectiveDate)
  VALUES ('6D37553D-89B1-4663-91BC-0486842EAD44', @ProblemDate, '20020625');

Result:

Msg 257, Level 16, State 3, Line 3
Implicit conversion from data type datetime to money is not allowed. Use the CONVERT function to run this query.

The problem with the preceding code is that a DATETIME variable is being inserted into a money data type column: @ProblemDate is the second value in the VALUES list, so it maps to the Price column. For most data type conversions, SQL Server handles the conversion implicitly; however, conversion between some data types requires using the cast() or convert() function.

For more details about data types and tables, refer to Chapter 20, ‘‘Creating the Physical Database Schema.’’ Data-type conversion and conversion scalar functions are discussed in Chapter 9, ‘‘Data Types, Expressions, and Scalar Functions.’’

TABLE 16-1 Potential Data Modification Obstacles

Potential Problem                             Insert Operation  Update Operation  Delete Operation
Data Type/Length                              X                 X
Primary Key Constraint and Unique Constraint  X                 X
Duplicate Rows                                                  X                 X
Foreign Key Constraint                        X                 X                 X
Unique Index                                  X                 X
Not Null and No Default                       X                 X
Check Constraint                              X                 X
Instead of Trigger                            X                 X                 X
After Trigger                                 X                 X                 X
Non-Updatable Views                           X                 X                 X
Views with Check Option                       X                 X
Security                                      X                 X                 X

Primary Key Constraint and Unique Constraint

Both primary key constraints and unique constraints may affect INSERT and UPDATE operations. While this section explicitly deals with primary keys, the same is true for unique indexes.

Primary keys, by definition, must be unique. Attempting to insert a primary key that’s already in use will cause an error.
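A minimal sketch of that failure, using a hypothetical temp table #Demo rather than the book’s sample databases. TRY...CATCH is used only so the batch can report the error number and message instead of returning the error to the client.

```sql
-- Hypothetical demo table with a single-column primary key
CREATE TABLE #Demo (
  DemoID   INT NOT NULL PRIMARY KEY,
  DemoName VARCHAR(20) NOT NULL
);

INSERT #Demo (DemoID, DemoName) VALUES (1, 'first');

BEGIN TRY
  -- Reuses DemoID 1, so the primary key constraint rejects the row
  INSERT #Demo (DemoID, DemoName) VALUES (1, 'duplicate');
END TRY
BEGIN CATCH
  -- Expected to report error 2627 (PRIMARY KEY or UNIQUE constraint violation)
  SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;

-- Only the original row remains in the table
SELECT DemoID, DemoName FROM #Demo;

DROP TABLE #Demo;
```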
Technically speaking, updating a primary key to a value already in use also causes an error, but surrogate primary keys (identity columns and GUIDs) should never need to be updated, and a good natural key should rarely need updating. Candidate keys should also be stable enough that they rarely need updating.

Updating a primary key may also break referential integrity, causing the update to fail. In this case, however, it’s not a primary-key constraint that’s the obstacle, but the foreign-key constraint that references the primary key.

For more information about the design of primary keys, foreign keys, and many of the other constraints mentioned in this chapter, refer to Chapter 3, ‘‘Relational Database Design.’’ For details on creating constraints, turn to Chapter 20, ‘‘Creating the Physical Database Schema.’’

One particular issue related to inserting is the creation of surrogate key values for the new rows. SQL Server provides two excellent means of generating surrogate primary keys: identity columns and GUIDs. Each method has its pros and cons, and its rules for safe handling.

Every table should have a primary key. If the primary key is the same data used by humans to identify the item in reality, then it’s a natural key, e.g., SSN, vehicle VIN, aircraft tail number, part serial number. The alternative to the natural key is the surrogate key, surrogate meaning artificial or a stand-in replacement. For databases, a surrogate key means an artificial, computer-generated value is used to uniquely identify the row. SQL Server supports identity columns and globally unique identifiers (GUIDs) as surrogate keys.

Identity columns

SQL Server automatically generates incrementing integers for identity columns at the time of the insert, and an INSERT statement normally can’t interfere with that process by supplying its own value for the identity column.
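To make the surrogate-versus-natural-key distinction and automatic identity generation concrete, here is a minimal sketch using a hypothetical #Aircraft temp table (not part of the book’s sample databases): AircraftID is an identity surrogate key, the tail number is the natural key kept unique by a UNIQUE constraint, and each INSERT omits the identity column so SQL Server supplies the value. SCOPE_IDENTITY(), one of the methods covered later in this section, reads back the last value generated.

```sql
-- Hypothetical table: identity surrogate key plus a natural key kept unique
CREATE TABLE #Aircraft (
  AircraftID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, -- surrogate key
  TailNumber VARCHAR(10) NOT NULL UNIQUE,             -- natural key
  Model      VARCHAR(30) NOT NULL
);

-- The column list omits AircraftID; the engine assigns it at insert time
INSERT #Aircraft (TailNumber, Model) VALUES ('N12345', 'Cessna 172');
INSERT #Aircraft (TailNumber, Model) VALUES ('N67890', 'Piper Cub');

-- Last identity value generated by an insert in this scope
SELECT SCOPE_IDENTITY() AS LastAircraftID;

SELECT AircraftID, TailNumber, Model
  FROM #Aircraft;

DROP TABLE #Aircraft;
```

Keeping the UNIQUE constraint on the natural key means the computer-generated surrogate key cannot quietly admit the same real-world aircraft twice.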
The fact that identity columns refuse to accept inserted integers can be a serious issue if you’re inserting existing data with existing primary key values that must be maintained because they are referenced by secondary tables. The solution is to use the IDENTITY_INSERT session option (SET IDENTITY_INSERT table ON). When set to ON, it temporarily suspends automatic value generation and permits explicit values to be inserted into the identity column. This means that the insert has to explicitly provide the primary-key value. The IDENTITY_INSERT option may be set ON for only one table at a time within a session.

The following SQL batch uses the IDENTITY_INSERT option when supplying the primary key:

USE CHA2;

-- attempt to insert into an identity column
INSERT dbo.Guide (GuideID, FirstName, LastName)
  VALUES (10, 'Bill', 'Fletcher');

Result:

Server: Msg 544, Level 16, State 1, Line 1
Cannot insert explicit value for identity column in table 'Guide' when IDENTITY_INSERT is set to OFF.

The sample database for this book can be downloaded from the book’s website: www.sqlserverbible.com.
The next step in the batch sets the IDENTITY_INSERT option and attempts some more inserts:

SET IDENTITY_INSERT Guide ON;

INSERT Guide (GuideID, FirstName, LastName)
  VALUES (100, 'Bill', 'Mays');

INSERT dbo.Guide (GuideID, FirstName, LastName)
  VALUES (101, 'Sue', 'Atlas');

To see what value the identity column is now assigning, the following code re-enables the identity column, inserts another row, and then selects the new data:

SET IDENTITY_INSERT Guide OFF;

INSERT Guide (FirstName, LastName)
  VALUES ('Arnold', 'Bistier');

SELECT GuideID, FirstName, LastName
  FROM dbo.Guide;

Result:

GuideID  FirstName  LastName
1        Dan        Smith
2        Jeff       Davis
3        Tammie     Commer
4        Lauren     Jones
5        Greg       Wilson
100      Bill       Mays
101      Sue        Atlas
102      Arnold     Bistier

As this code demonstrates, manually inserting a GuideID of ‘‘101’’ sets the identity column’s next value to ‘‘102,’’ because an explicitly inserted value that is higher than the current identity value becomes the new current identity value.

Another potential problem when working with identity columns is determining the value of the identity that was just created. Because the new identity value is created by SQL Server at the time of the insert, the code causing the insert is unaware of the identity value. The insert works fine; the perceived problem occurs when the code inserts a row and then tries to display the row on a user-interface grid within an application, because the code is unaware of the new data’s database-assigned primary key.

SQL Server provides four methods for determining the identity value:
■ @@IDENTITY: This venerable global variable returns the last identity value generated on the current connection, for any table and in any scope. If another insert takes place on the connection between the time of your insert and the time when you check @@IDENTITY (for example, an insert performed by a trigger that your insert fired), @@IDENTITY will return not your insert, but the last insert. For this reason, don’t use @@IDENTITY; it’s only there for backward compatibility.
■ SCOPE_IDENTITY(): This system function, introduced in SQL Server 2000, returns the last generated identity value within the scope of the calling batch or procedure. I recommend using this method, as it is the safest way to determine the identity value you last generated.
■ IDENT_CURRENT(table): This function, also introduced in SQL Server 2000, returns the last identity value per table. While this option seems similar to SCOPE_IDENTITY(), IDENT_CURRENT() returns the identity value for the given table regardless of inserts to any other tables that may have occurred. This prevents another insert, buried deep within a trigger, from affecting the identity value returned by the function. However, it reports the last value generated for that table by any session, so a concurrent insert from another connection can change the result.
■ OUTPUT clause: The INSERT, UPDATE, DELETE, and MERGE commands can include an OUTPUT clause that can select from the inserted and deleted virtual tables. Using this data, any data modification query can return the inserted identity values.

Globally unique identifiers (GUIDs)

Globally unique identifiers (GUIDs) are sometimes, and with great debate, used as primary keys. A GUID can be the best choice when you have to generate unique values at different locations (e.g., in replicated scenarios), but hardly ever otherwise.

With regard to the insertion of new rows, the major difference between identity columns and GUIDs is that GUIDs are generated by the SQL code or by a column default, rather than automatically generated by the engine at the time of the insert. This means that the developer has more control over GUID creation.

There are five ways to generate GUID primary key values when inserting new rows:
■ The NEWID() function can create the GUID in T-SQL code prior to the INSERT.
■ The client application can create the GUID prior to the INSERT.
■ The NEWID() function can create the GUID in an expression in the INSERT command.
■ The NEWID() function can create the GUID in a column default.
■ The NEWSEQUENTIALID() function can create the GUID in a column default. This is the only method that avoids the page-split performance issues with GUIDs.

If you must use a GUID, then I strongly recommend using NEWSEQUENTIALID().

The following sample code demonstrates various methods of generating GUID primary keys during the addition of new rows to the ProductCategory table in the OBXKites database. The first query simply tests the NEWID() function:

USE OBXKites;

SELECT NEWID();

Result:

5CBB2800-5207-4323-A316-E963AACB6081

The next three queries insert a GUID, each using a different method of generating the GUID:

-- GUID from Default (the column's default is NewID())
INSERT dbo.ProductCategory
  (ProductCategoryID, ProductCategoryName)