Nielsen c15.tex V4 - 07/21/2009 12:51pm Page 352

Part II Manipulating Data With Select

    5        Wilson     Greg       Rock Climbing, First Aid

    (5 row(s) affected)

When the data to be inserted is already known, usually as variables sent from the user interface, the INSERT VALUES form is the best insert method. Typically, values drawn from a data source are inserted with INSERT SELECT, but an INSERT VALUES can include a scalar subquery as one of the values.

Inserting a result set from select

Data may be moved and massaged from one result set into a table by means of the INSERT SELECT statement. The real power of this method is that the SELECT command can pull data from nearly anywhere and reshape it to fit the current needs. It's this flexibility that the INSERT SELECT statement exploits. Because SELECT can return any number of rows, this form can insert any number of rows. Of course, the full power of the SELECT can be used to generate rows for the insert. The SELECT can include nearly any clause. An ORDER BY is accepted, but it does not determine the physical order in which the rows are stored. A simplified form of the syntax is as follows:

    INSERT [INTO] schema.Table [(columns, ...)]
      SELECT columns
        FROM data sources
        [WHERE conditions];

As with the INSERT VALUES statement, the data columns must line up and the data types must be valid. If the optional insert columns are omitted, then every table column (except an identity column) must be populated in the table order.

The following code sample uses the OBXKites database. It selects all the guides from the Cape Hatteras Adventures database and inserts them into the OBXKites Contact table.
The name columns are pulled from the Guide table, while the company name is a string literal (note that the Guide table is specified by means of a three-part name, database.schema.table):

    USE OBXKites;
    -- Using a fresh copy of OBXKites without population

    INSERT dbo.Contact (FirstName, LastName, ContactCode, CompanyName)
      SELECT FirstName, LastName, GuideID, 'Cape Hatteras Adv.'
        FROM CHA2.dbo.Guide;

To verify the insert, the following SELECT statement reads the data from the Contact table:

    SELECT FirstName AS First, LastName AS Last, CompanyName
      FROM dbo.Contact;

Result:

    First      Last       CompanyName
    Dan        Smith      Cape Hatteras Adv.
    Jeff       Davis      Cape Hatteras Adv.
    Tammie     Commer     Cape Hatteras Adv.
    Lauren     Jones      Cape Hatteras Adv.
    Greg       Wilson     Cape Hatteras Adv.

    (5 row(s) affected)

The key to using the INSERT/SELECT statement is selecting the correct result set. It's a good idea to run the SELECT statement by itself to test the result set prior to executing the insert. Measure twice, cut once.

Inserting the result set from a stored procedure

The INSERT EXEC form of the INSERT operation pulls data from a stored procedure and inserts it into a table. Behind these inserts are the full capabilities of T-SQL. The basic function is the same as that of the other insert forms. The columns have to line up between the INSERT columns and the stored-procedure result set. Here's the basic syntax of the INSERT EXEC command:

    INSERT [INTO] schema.Table [(Columns)]
      EXEC StoredProcedure Parameters;

Be careful, though, because stored procedures can easily return multiple result sets, in which case the INSERT attempts to pull data from each of the result sets, and the columns from every result set must line up with the insert columns.
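The Parameters portion of the syntax can be sketched as follows. This is a minimal illustration only, assuming a scratch session in tempdb on SQL Server 2008 or later; the procedure dbo.ListByInitial, its @Initial parameter, and the #Names temporary table are hypothetical names that do not exist in the sample databases.

```sql
-- Hypothetical names throughout; intended for a scratch database such as tempdb.
USE tempdb;
GO
-- Returns the names whose last name starts with @Initial.
CREATE PROC dbo.ListByInitial
  @Initial CHAR(1)
AS
  SET NOCOUNT ON;
  SELECT FirstName, LastName
    FROM (VALUES ('Dan',   'Smith'),
                 ('Jeff',  'Davis'),
                 ('Nancy', 'Davolio')) AS N (FirstName, LastName)
    WHERE LastName LIKE @Initial + '%';
GO
-- Receiving table: its columns line up with the procedure's single result set.
CREATE TABLE #Names (
  FirstName VARCHAR(50),
  LastName  VARCHAR(50)
);

-- The parameter is passed exactly as it would be in a plain EXEC.
INSERT #Names (FirstName, LastName)
  EXEC dbo.ListByInitial @Initial = 'D';

SELECT FirstName, LastName
  FROM #Names;

-- Clean up the scratch objects.
DROP TABLE #Names;
DROP PROC dbo.ListByInitial;
```

Because the procedure returns exactly one result set with two columns, the insert column list only has to match that one shape.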
For more about programming stored procedures, refer to Chapter 24, "Developing Stored Procedures."

The following code sample builds a stored procedure that returns the first and last names of all guides from the Cape Hatteras Adventures database and of all employees from Microsoft's Northwind sample database (from SQL Server 2000). Next, the code creates a table as a place to insert the result sets. Once the stored procedure and the receiving table are in place, the sample code performs the INSERT EXEC statement:

    USE CHA2;
    GO
    CREATE PROC ListGuides
    AS
      SET NOCOUNT ON;
      -- result set 1
      SELECT FirstName, LastName
        FROM dbo.Guide;
      -- result set 2
      SELECT FirstName, LastName
        FROM Northwind.dbo.Employees;
      RETURN;

When the ListGuides stored procedure is executed, two result sets should be produced:

    EXEC ListGuides;

Result:

    FirstName  LastName
    Dan        Smith
    Jeff       Davis
    Tammie     Commer
    Lauren     Jones
    Greg       Wilson

    FirstName  LastName
    Nancy      Davolio
    Andrew     Fuller
    Janet      Leverling
    Margaret   Peacock
    Steven     Buchanan
    Michael    Suyama
    Robert     King
    Laura      Callahan
    Anne       Dodsworth

The following DDL command creates a table that matches the structure of the procedure's result sets:

    CREATE TABLE dbo.GuideSample
      (FirstName VARCHAR(50),
       LastName VARCHAR(50),
       CONSTRAINT PK_GuideSample PRIMARY KEY (FirstName, LastName)
      );

With the situation properly set up, here's the INSERT EXEC command:

    INSERT dbo.GuideSample (FirstName, LastName)
      EXEC ListGuides;

A SELECT command can read the data and verify that fourteen rows were inserted:

    SELECT FirstName, LastName
      FROM dbo.GuideSample;

Result:

    FirstName  LastName
    Dan        Smith
    Jeff       Davis
    Tammie     Commer
    Lauren     Jones
    Wilson     Greg
    Nancy      Davolio
    Andrew     Fuller
    Janet      Leverling
    Margaret   Peacock
    Steven     Buchanan
    Michael    Suyama
    Robert     King
    Laura      Callahan
    Anne       Dodsworth

INSERT/EXEC does require more work than INSERT/VALUES or INSERT/SELECT, but because the stored procedure can contain complex logic, it's the most powerful of the three.

SELECT INTO cannot insert data into a table variable. INSERT EXEC was likewise unable to target a table variable in SQL Server 2000, but SQL Server 2005 and later allow it. Table variables are covered in Chapter 21, "Programming with T-SQL."

Creating a default row

SQL includes a special form of the INSERT command that creates a single new row with only default values. The only parameter of the new row is the table name. Data and column names are not required. The syntax is very simple, as shown here:

    INSERT schema.Table DEFAULT VALUES;

I have never used this form of INSERT in any real-world applications. It could be used to create "pigeon hole" rows with only keys and null values, but I don't recommend that design.

Creating a table while inserting data

The last method of inserting data is a variation on the SELECT command. The INTO option of SELECT takes the results of a SELECT statement and creates a new table containing the results. SELECT INTO is often used during data conversions and within utilities that must dynamically work with a variety of source-table structures. The full syntax includes every SELECT option. Here's an abbreviated syntax to highlight the function of the INTO option:

    SELECT Columns
      INTO NewTable
      FROM DataSources
      [WHERE conditions];

The structure of the newly created table may not replicate the original table as exactly as expected, because the new structure is derived from a combination of the original table and the result set of the SELECT statement. String lengths and numerical digit lengths may change. If the SELECT INTO command is pulling data from only one table and the SELECT statement contains no data-type conversion functions, then there's a good chance that the table columns and null settings will remain intact. However, keys, constraints, and indexes will be lost.
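One way to see what survives a SELECT INTO is to compare the catalog metadata of the source and the copy. The following is a rough sketch, not a definitive test: dbo.SourceDemo and dbo.CopyDemo are hypothetical tables created in tempdb purely for illustration, and the queries assume the sys.columns and sys.indexes catalog views of SQL Server 2005 and later.

```sql
-- Hypothetical tables for illustration; intended for a scratch database.
USE tempdb;

-- Source table with a primary key, an identity column, and a check constraint.
CREATE TABLE dbo.SourceDemo (
  SourceID INT IDENTITY NOT NULL PRIMARY KEY,
  Name     VARCHAR(50) NOT NULL,
  Amount   NUMERIC(9,2) NULL CHECK (Amount >= 0)
);

INSERT dbo.SourceDemo (Name, Amount)
  VALUES ('First', 10.50);

-- The CAST is a data-type conversion, so Amount changes type in the copy.
SELECT SourceID, Name, CAST(Amount AS INT) AS Amount
  INTO dbo.CopyDemo
  FROM dbo.SourceDemo;

-- Compare the column definitions of the two tables side by side.
SELECT OBJECT_NAME(c.object_id)  AS TableName,
       c.name                    AS ColumnName,
       TYPE_NAME(c.user_type_id) AS TypeName,
       c.max_length,
       c.is_nullable,
       c.is_identity
  FROM sys.columns AS c
  WHERE c.object_id IN (OBJECT_ID('dbo.SourceDemo'), OBJECT_ID('dbo.CopyDemo'))
  ORDER BY TableName, c.column_id;

-- List the indexes on each table, including the one behind the primary key.
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name                   AS IndexName,
       i.type_desc
  FROM sys.indexes AS i
  WHERE i.object_id IN (OBJECT_ID('dbo.SourceDemo'), OBJECT_ID('dbo.CopyDemo'));

-- Clean up the scratch objects.
DROP TABLE dbo.CopyDemo;
DROP TABLE dbo.SourceDemo;
```

In line with the text above, the copy would be expected to appear as a heap with no primary-key index, while the source keeps its clustered primary key.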
SELECT INTO is a bulk-logged operation, similar to BULK INSERT and BULK COPY. Bulk-logged operations may enable SQL Server to quickly move data into tables by minimally recording the bulk-logged operations to the transaction log (depending on the database's recovery model). Therefore, the database options and recovery model affect SELECT INTO and the other bulk-logged operations.

For more about BULK INSERT and BULK COPY, refer to Chapter 30, "Performing Bulk Operations." For details on recovery models, refer to Chapter 41, "Recovery Planning."

The following code sample demonstrates the SELECT/INTO command as it creates the new table GuideList by extracting data from Guide (some results abridged):

    USE CHA2;

    -- sample code for setting the bulk-logged behavior
    ALTER DATABASE CHA2 SET RECOVERY BULK_LOGGED;

    -- the select/into statement
    SELECT *
      INTO dbo.GuideList
      FROM dbo.Guide
      ORDER BY LastName, FirstName;

The sp_help system stored procedure can display the structure of a table. Here it is being used to verify the structure that was created by the SELECT/INTO command:

    EXEC sp_help GuideList;

Result (some columns abridged):

    Name       Owner  Type        Created_datetime
    GuideList  dbo    user table  2001-08-01 16:30:02.937

    Column_name     Type      Length  Prec  Scale  Nullable
    GuideID         int       4       10    0      no
    LastName        varchar   50                   no
    FirstName       varchar   50                   no
    Qualifications  varchar   2048                 yes
    DateOfBirth     datetime  8                    yes
    DateHire        datetime  8                    yes

    Identity  Seed  Increment  Not For Replication
    GuideID   1     1          0

    RowGuidCol
    No rowguidcol column defined.

    Data_located_on_filegroup
    PRIMARY

    The object does not have any indexes.
    No constraints have been defined for this object.
    No foreign keys reference this table.
    No views with schema binding reference this table.
The following insert adds a new row to test the identity column created by the SELECT/INTO:

    INSERT GuideList (LastName, FirstName, Qualifications)
      VALUES('Nielsen', 'Paul', 'trainer');

To view the data that was inserted using the SELECT/INTO command and the row that was just added with the INSERT/VALUES command, the following SELECT statement extracts data from the GuideList table:

    SELECT GuideID, LastName, FirstName
      FROM dbo.GuideList;

Result:

    GuideID  LastName  FirstName
    12       Nielsen   Paul
    7        Atlas     Sue
    11       Bistier   Arnold
    3        Commer    Tammie
    2        Davis     Jeff
    10       Fletcher  Bill
    5        Greg      Wilson
    4        Jones     Lauren
    1        Smith     Dan

In this case, the SELECT/INTO command retained the column lengths and null settings. The identity column was also carried over to the new table, although this may not always be the case. I recommend that you build tables manually, or at least carefully check the data structures created by SELECT/INTO.

SELECT/INTO can serve many useful functions:

■ If zero rows are selected from a table, then SELECT/INTO will create a new table with only the data schema (though with the limitations listed earlier).
■ If SELECT reorders the columns, or includes the cast() function, then the new table will retain the data within a modified data schema.
■ When combined with a UNION query, SELECT/INTO can combine data from multiple tables vertically. The INTO goes in the first SELECT statement of a UNION query.
■ SELECT/INTO is especially useful for denormalizing tables. The SELECT statement can pull from multiple tables and create a new flat-file table.

Note one caveat concerning SELECT/INTO and development style: The SELECT/INTO statement should not replace the use of joins or views. When the new table is created, it's a snapshot in time, a second copy of the data. Databases containing multiple copies of old data sets are a sure sign of trouble.
If you need to denormalize data for ad hoc analysis, or to pass to a user, then creating a view is likely a better alternative.

Developing a Data Style Guide

There are potential data troubles that go beyond data types, nullability, and check constraints. Just as MS Word's spelling checker and grammar checker can weed out the obvious errors yet still let poor (or libelous) literature through, a database can protect against only gross logical errors.

Publishers use manuals of style and style guides for consistency. For example, should Microsoft be referred to as MS, Microsoft Corp., or Microsoft Corporation in a book or article? The publisher's chosen style manual provides the answer.

Databases can also benefit from a data style guide that details your organization's preferences about how data should be formatted. Do phone numbers include parentheses around the area codes? Are phone extensions indicated by "x." or "ext."?

One way to begin developing a style guide is to spend some time just looking at the data and observing the existing inconsistencies. Then, try to reach a consensus about a common data style. The Chicago Manual of Style is a good source for ideas. There's no magical right or wrong style; the goal is simply data consistency.

Updating Data

SQL's UPDATE command is an incredibly powerful tool. What used to take dozens of lines of code with multiple nested loops now takes a single statement. Even better, SQL is not a true command language; it's a declarative language. The SQL code is only describing to the Query Optimizer what you want to do. The Query Optimizer then develops a cost-based, optimized query execution plan to accomplish the task. It determines which tables to fetch and in which order, how to merge the joins, and which indexes to use. It does this based on several factors, including the current data-population statistics, the indexes available and how they relate to the data population within the table, and table sizes.
The Query Optimizer even considers current CPU performance, memory capacity, and hard-drive performance when designing the plan. Writing code to perform the update row by row could never result in that level of optimization.

Updating a single table

The UPDATE command in SQL is straightforward and simple. It can update one column of one row in a table, or every column in every row in the updated table, but the optional FROM clause enables that table to be part of a complete complex data source with all the power of the SQL SELECT. Here's how the UPDATE command works:

    UPDATE schema.Table
      SET column = expression,
          column = value
      [FROM data sources]
      [WHERE conditions];

The UPDATE command can update multiple rows, but only one table. The SET keyword is used to modify data in any column in the table to a new value. The new value can be a hard-coded string literal, a variable, an expression, or even another column from the data sources listed in the FROM portion of the SQL UPDATE statement.

For a comprehensive list of expression possibilities, see Chapter 9, "Data Types, Expressions, and Scalar Functions."

The WHERE clause is vital to any UPDATE statement. Without it, the entire table is updated. If a WHERE clause is present, then only the rows not filtered out by the WHERE clause are updated. Be sure to check and double-check the WHERE clause. Again, measure twice, cut once.

The following sample UPDATE resembles a typical real-life operation, altering the value of one column for a single row.
The best way to perform a single-row update is to filter the UPDATE operation by referencing the primary key:

    USE CHA2;

    UPDATE dbo.Guide
      SET Qualifications = 'Spelunking, Cave Diving, First Aid, Navigation'
      WHERE GuideID = 6;

The following SELECT statement confirms the preceding UPDATE command:

    SELECT GuideID, LastName, Qualifications
      FROM dbo.Guide
      WHERE GuideID = 6;

Result:

    GuideID  LastName  Qualifications
    6        Bistier   Spelunking, Cave Diving, First Aid, Navigation

Performing global search and replace

Cleaning up bad data is a common database developer task. Fortunately, SQL includes a REPLACE() function, which when combined with the UPDATE command can serve as a global search and replace. I've used this to remove extra tabs from data.

In the following example, which references the Family sample database, every occurrence of "ll" in the LastName column is updated to "qua":

    USE Family;

    UPDATE Person
      SET LastName = REPLACE(LastName, 'll', 'qua');

The following SELECT statement examines the result of the REPLACE() function:

    SELECT LastName
      FROM Person;

Result (abbreviated):

    lastname
    Haquaoway
    Haquaoway
    Miquaer
    Miquaer
    Haquaoway

Referencing multiple tables while updating data

A more powerful function of the SQL UPDATE command is setting a column to an expression that can refer to the same column, other columns, or even other tables.

While expressions are certainly available within a single-table update, expressions often need to reference data outside the updated table. The optional FROM clause enables joins between the table being updated and other data sources. Only one table can be updated, but when the table is joined to the corresponding rows from the joined tables, the data from the other columns is available within the UPDATE expressions.
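To illustrate a SET expression that draws on a joined table, here is a small sketch under stated assumptions: the #Region and #Shipment temporary tables, their columns, and the surcharge rule are all hypothetical, and the multi-row VALUES lists assume SQL Server 2008 or later.

```sql
-- Hypothetical temporary tables for illustration only.
USE tempdb;

CREATE TABLE #Region (
  RegionID  INT NOT NULL PRIMARY KEY,
  Surcharge NUMERIC(5,2) NOT NULL
);

CREATE TABLE #Shipment (
  ShipmentID INT NOT NULL PRIMARY KEY,
  RegionID   INT NOT NULL,
  Cost       NUMERIC(9,2) NOT NULL
);

INSERT #Region (RegionID, Surcharge)
  VALUES (1, 2.00), (2, 5.00);

INSERT #Shipment (ShipmentID, RegionID, Cost)
  VALUES (10, 1, 20.00), (11, 2, 20.00);

-- Only #Shipment is updated; #Region merely supplies the Surcharge value
-- that the SET expression adds to each shipment's existing Cost.
UPDATE S
  SET Cost = S.Cost + R.Surcharge
  FROM #Shipment AS S
    JOIN #Region AS R
      ON S.RegionID = R.RegionID;

SELECT ShipmentID, Cost
  FROM #Shipment
  ORDER BY ShipmentID;
-- ShipmentID 10 now costs 22.00 and ShipmentID 11 costs 25.00

-- Clean up the scratch objects.
DROP TABLE #Shipment;
DROP TABLE #Region;
```

This sketch names the alias (UPDATE S) as the update target, which is one accepted T-SQL style; naming the table itself after UPDATE is the other.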
One way to envision the FROM clause is to picture the joins merging all the tables into a new super-wide result set. Then the rest of the SQL statement sees only that new result set. While that is what's happening in the FROM clause, the actual UPDATE operation is functioning not on the new result set, but only on the declared UPDATE table.

The following query uses the FROM clause to access the Contact and Order tables. The JOIN limits the query to only those contact rows that have placed orders. The UPDATE command updates only the Contact table:

    USE OBXKites;

    UPDATE dbo.Contact
      SET IsCustomer = 1
      FROM dbo.Contact AS C
        JOIN dbo.[Order] AS O
          ON C.ContactID = O.ContactID;

The UPDATE FROM syntax is a T-SQL extension and not standard ANSI SQL 92. If the database will possibly be ported to another database platform in the future, then use a subquery to select the correct rows:

    UPDATE dbo.Contact
      SET IsCustomer = 1
      WHERE ContactID IN (SELECT ContactID FROM dbo.[Order]);

For a real-life example, suppose all employees will soon be granted a generous across-the-board raise (OK, so it's not a real-life example) based on department, length of service in the position, performance rating, and length of time with the company. If the percentage for each department is stored in the Department table, SQL can adjust the salary for every employee with a single UPDATE statement by joining the Employee table with the Department table and pulling the Department raise factor from the joined table.
Assume the formula is as follows:

    2 + (((Years in Company * .1)
          + (Months in Position * .02)
          + ((PerformanceFactor * .5) if over 2))
         * Department RaiseFactor)

The sample code sets up the scenario by creating a couple of tables and populating them with test data:

    USE tempdb;

    CREATE TABLE dbo.Dept (
      DeptID INT IDENTITY NOT NULL PRIMARY KEY,
      DeptName VARCHAR(50) NOT NULL,
      RaiseFactor NUMERIC(4, 2)
    );

    CREATE TABLE dbo.Employee (
      EmployeeID INT IDENTITY NOT NULL PRIMARY KEY,
      DeptID INT FOREIGN KEY REFERENCES Dept,
      LastName VARCHAR(50) NOT NULL,
      FirstName VARCHAR(50) NOT NULL,
      Salary NUMERIC(9,2) NOT NULL,
      PerformanceRating NUMERIC(4,2) NOT NULL,
      DateHire DATE NOT NULL,