CHAPTER 13 Full-text searching

        WHEN 0 THEN 'Newly created and not yet used'
        WHEN 1 THEN 'Being used for insert'
        WHEN 4 THEN 'Closed ready for query'
        WHEN 6 THEN 'Being used for merge input and ready for query'
        WHEN 8 THEN 'Marked for deletion. Will not be used for query and merge source'
        ELSE 'Unknown status code'
      END
    FROM sys.fulltext_index_fragments f
    JOIN sys.tables t on f.table_id = t.object_id;

When this query returns, look for rows whose status is 4, or Closed ready for query. A table will be listed once for each fragment it has. If it turns out that you have a high number of closed fragments, you should consider doing a REORGANIZE (using the ALTER FULLTEXT CATALOG statement on the catalog that holds the index). Note two things: first, you must do a reorganize, as opposed to a rebuild. Second, the exact number of fragments that will cause you issues is somewhat dependent on your hardware. But as a rule of thumb, if it exceeds 50, start planning a reorganize, and if it’s over 100, start planning in a hurry.

The keywords

We’ll close this chapter out by answering one of the most-often-asked questions: how can I find out what words are contained in my full-text index?
New with SQL Server 2008 are a pair of dynamic management functions (DMFs) that can help us answer that very question. The first is sys.dm_fts_index_keywords. To use this function, pass in the database ID and the object ID of the table whose keywords you want to discover. It returns a table with several columns; this query shows you the more useful ones. Note that it also references the sys.columns view in order to get the column name:

    SELECT keyword, display_term, c.name, document_count
    FROM sys.dm_fts_index_keywords(db_id(),
           object_id('Production.ProductDescription')) fik
    JOIN sys.columns c
      ON c.object_id = object_id('Production.ProductDescription')
     AND c.column_id = fik.column_id;

The db_id() function allows us to easily retrieve the database ID. We then use the object_id function to get the ID for the table, passing in the text-based table name. The following table shows a sampling of the results.

Table: Sample of results for query to find keywords

    Keyword                                         Display term  Column       Document count
    0x006C0069006700680074                          light         Description
    0x006C006900670068007400650072                  lighter       Description
    0x006C0069006700680074006500730074              lightest      Description
    0x006C0069006700680074007700650069006700680074  lightweight   Description  11
    0x006C00690067006E0065                          ligne         Description
    0x006C0069006E0065                              line          Description
    0x006C0069006E006B                              link          Description

The Keyword column contains the Unicode version of the keyword in hexadecimal format, and is used as a way to link the Display Term (the real indexed word) to other views. The Column column is obvious; the Document Count indicates how many rows in the table contain this keyword.

One oddity about this particular DMF is that it doesn’t appear in the Object Explorer; at least, the version used in the writing of this chapter doesn’t. But not to worry: the function still works, and it’s documented in Books Online.

To add to the oddities, there’s a second dynamic management function, one that also doesn’t display in the Object Explorer. It’s sys.dm_fts_index_keywords_by_document, and it can also return valuable information about your keywords. Here’s a query that will tell us not only what the keywords are, but what rows they are located on in the source table:

    SELECT keyword, display_term, c.name
         , document_id
         , occurrence_count
    FROM sys.dm_fts_index_keywords_by_document(db_id(),
           object_id('Production.ProductDescription')) fikd
    JOIN sys.columns c
      ON c.object_id = object_id('Production.ProductDescription')
     AND c.column_id = fikd.column_id
    ORDER BY display_term;

Like its sister DMF, you pass in the database ID and the object ID for the table. The following table shows a sampling of the data returned.

Table: Sample of results for query to find keywords and their source row

    Keyword                 Display term  Column name  Document ID  Occurrence count
    0x006C0069006700680074  light         Description  249
    0x006C0069006700680074  light         Description  409
    0x006C0069006700680074  light         Description  457
    0x006C0069006700680074  light         Description  704
    0x006C0069006700680074  light         Description  1183
    0x006C0069006700680074  light         Description  1199
    0x006C0069006700680074  light         Description  1206

Keyword and Display Term are the same as in the previous function, as is the Column Name. The Document ID is the unique key from the source table, and the Occurrence Count is how many times the word appears in the row referenced by the document ID.

Using this information, we can construct a query that combines data from the source table with this function. This will create a valuable tool for debugging indexes as we try to determine why a particular word appears in a result set:

    SELECT d.keyword, d.display_term
         , d.document_id       -- primary key
         , d.occurrence_count, p.Description
    FROM sys.dm_fts_index_keywords_by_document(db_id(),
           object_id('Production.ProductDescription')) d
    JOIN Production.ProductDescription p
      ON p.ProductDescriptionID = d.document_id
    ORDER BY d.display_term;

As you can see from the results shown in table 8, we can pull the description for the row with the keyword we want.

Table 8: Partial results of expanded query combining keywords with source data

    Keyword                 Display term  Document ID  Occurrence count  Description
    0x006C0069006700680074  light         249                            Value-priced bike with many features of our top-of-the-line models. Has the same light, stiff frame, and the quick acceleration we’re famous for.
    0x006C0069006700680074  light         409                            Alluminum-alloy frame provides a light, stiff ride, whether you are racing in the velodrome or on a demanding club ride on country roads.
    0x006C0069006700680074  light         457                            This bike is ridden by race winners. Developed with the AdventureWorks Cycles professional race team, it has a extremely light heat-treated aluminum frame, and steering that allows precision control.
    0x006C0069006700680074  light         704                            A light yet stiff aluminum bar for long-distance riding.
    0x006C0069006700680074  light         1183                           Affordable light for safe night riding; uses AAA batteries.
    0x006C0069006700680074  light         1199                           Light-weight, wind-resistant, packs to fit into a pocket.
    0x006C0069006700680074  light         1206                           Simple and light-weight. Emergency patches stored in handle.

Summary

This concludes our look at full-text searching with SQL Server 2008. We began by creating a catalog to hold our indexes, then proceeded to step two, creating the indexes themselves. Our third step was querying the full-text indexes in a variety of ways. Finally, we looked at some queries that will help us maintain and discover the state of our full-text indexes. Hopefully you’ll find that using full-text searching can be as easy as one-two-three!
About the author

Robert C. Cain is a Microsoft MVP in SQL development, and is a consultant with COMFRAME as a senior business intelligence architect. Prior to his current position, Robert worked for a regional power company, managing, designing, and implementing the SQL Server data warehouse for the nuclear division. He also spent 10 years as a senior consultant, working for a variety of customers in the Birmingham, Alabama, area using Visual Basic and C#. He maintains the popular blog http://arcanecode.com. In his spare time, Robert enjoys spending time with his wife and two daughters, digital photography, and amateur radio, holding the highest amateur license available and operating under the call sign N4IXT.

14 Simil: an algorithm to look for similar strings
Tom van Stiphout

Are you a perfect speller? Is everyone in your company? How about your business partners? Misspellings are a fact of life. There are also legitimate differences in spelling: what Americans call rumors, the British call rumours. Steven A. Ballmer and Steve Ballmer are two different but accurate forms of that man’s name. Your database may contain a lot of legacy values from the days before better validation at the point of data entry. Overall, chances are your database already contains imperfect textual data, which makes it hard to search. Additionally, the user may not know exactly what to look for.

When looking for a number or a date, we could search for a range, but text is more unstructured, so database engines such as SQL Server include a range of tools to find text, including the following:

- EQUALS (=) and LIKE
- SOUNDEX and DIFFERENCE
- CONTAINS and FREETEXT
- Simil

Equals and LIKE search for equality with or without wildcards. SOUNDEX uses a phonetic algorithm based on the sound of the consonants in a string. CONTAINS is optimized for finding inflectional forms and synonyms of strings. Simil is an algorithm that compares two strings and, based on the longest common substrings, computes a similarity between 0 (completely different) and 1 (identical). This is sometimes called fuzzy string matching. Simil isn’t available by default; later in this chapter we’ll discuss how to install it. In this chapter, we take a closer look at these various methods, beginning with the simplest one.

Equals (=) and LIKE

In this section we’ll discuss two simple options for looking up text. Equals (=) is appropriate if you know exactly what you’re looking for, and you know you have perfect data. For example, this statement finds all contacts with a last name of Adams. If you have an index on the column(s) used in the WHERE clause, this lookup is very fast and can’t be beaten by any of the other techniques discussed later in this chapter:

    SELECT FirstName, LastName
    FROM Person.Person
    WHERE (LastName = 'Adams')

NOTE Throughout this chapter, I’m using SQL Server 2008 and the sample database AdventureWorks2008, available at http://www.codeplex.com/SqlServerSamples.

LIKE allows wildcards and patterns. This allows you to find data even if there’s only a partial match. For example, this statement finds all contacts with a last name starting with A:

    SELECT FirstName, LastName
    FROM Person.Person
    WHERE (LastName LIKE 'A%')

The wildcard % is a placeholder for any text (zero or more characters), and _ is a placeholder for any single character. If you omit wildcards altogether, the statement returns the same records as if = were used. If, as in the preceding example, you use LIKE with a wildcard at the end, you have the benefit of a fast indexed lookup if there’s an index on the column you’re searching on. Searches with a leading wildcard, such as WHERE (LastName LIKE '%A'), can’t use an index seek and will as a result be slower.

LIKE also supports patterns indicating which range of characters is allowed. For example, this statement finds last names starting with Aa through Af:

    SELECT FirstName, LastName
    FROM Person.Person
    WHERE (LastName LIKE 'A[a-f]%')

Whether the lookup is case sensitive depends on the collation in effect, which by default is the one selected when the server was installed. A more detailed discussion of case sensitivity isn’t in the scope of this chapter.

But what if you don’t know the exact string you’re looking for? Perhaps you heard a company name on the radio and only know what it sounds like.

SOUNDEX and DIFFERENCE

If you’re looking for words that sound alike, SOUNDEX and DIFFERENCE are the built-in functions to use. They only work for English pronunciation. To get the SOUNDEX value, call the function:

    SELECT FirstName, LastName, SOUNDEX(LastName)
    FROM Person.Person
    WHERE (LastName LIKE 'A%')

SOUNDEX returns a four-character string representing the sound of a given string. The first character is the first letter of the string, and the remaining three are numbers representing the sound of the first consonants of the string. Similar-sounding consonants get the same value; for example, the d in Adams gets a value of 3, just like a t would. After all substitutions, Adams and Atoms have the same SOUNDEX value of A352.

One typical use for SOUNDEX is to store the values in a table, so that you can later run fast indexed lookups using =.

The DIFFERENCE function is used to compare SOUNDEX values in expressions. It converts its two arguments to SOUNDEX equivalents and computes the difference, expressed as a value between 0 (weak or no similarity) and 4 (strong similarity or identical). For example, this statement finds contacts with last names somewhat similar to Adams:

    SELECT FirstName, LastName
    FROM Person.Person
    WHERE (DIFFERENCE(LastName, 'Adams') = 3)

Resulting names from the sample database include Achong, Adina, Ajenstat, and Akers. As you can see, we wouldn’t immediately associate all of them with Adams. That’s one of the limitations of this simple algorithm. Keep reading for more sophisticated options.

CONTAINS and FREETEXT

So far we’ve covered a few fairly simple ways of finding text: by literal value, using literal values and wildcards, and by comparing the sounds of strings. Now we’re going to check out the most powerful text-matching features built into SQL Server.

The keywords CONTAINS and FREETEXT are used in the context of full-text indexes, which are special indexes (one per table) to quickly search words in text. They require the use of a special set of predicates. Let’s look at a few of these powerful statements. The first one looks for all records with the word bike in them:

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, 'bike')

You might think that’s equivalent to the following:

    SELECT ProductDescriptionID, Description
    FROM Production.ProductDescription
    WHERE (Description LIKE '%bike%')

But the two statements aren’t equivalent. The former statement finds records with the word bike, skipping those with bikes, biker, and other forms, whereas the LIKE version matches any text that contains those four letters. Changing the latter statement to LIKE '% bike %' doesn’t work either, if the word is next to punctuation.

The CONTAINS and FREETEXT keywords can also handle certain forms of fuzzy matches, for example:

    SELECT Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, 'FORMSOF (INFLECTIONAL, ride) ')

This statement finds words that are inflectionally similar, such as verb conjugations and singular/plural forms of nouns. So words such as rode and riding-whip are found, but rodeo isn’t.

FREETEXT is similar to CONTAINS, but is much more liberal in finding variations. For example, a CONTAINS INFLECTIONAL search for "two words" would find that term and its inflections, whereas FREETEXT would find the inflections of "two" and "words" separately.

Another aspect of fuzzy matches is using the thesaurus to find similar words. Curiously, the SQL Server thesaurus is empty when first installed. I populated the tsglobal.xml file (there are similar files for specific languages) with the following:

    <expansion>
        <sub>bicycle</sub>
        <sub>bike</sub>
    </expansion>

Then I was able to query for any records containing bike or bicycle:

    SELECT Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, 'FORMSOF (THESAURUS, bike) ')

The thesaurus can also hold misspellings of words along with the proper spelling:

    <replacement>
        <pat>visualbasic</pat>
        <pat>vb</pat>
        <pat>visaul basic</pat>
        <sub>visual basic</sub>
    </replacement>

If I were writing a resume-searching application, this could come in handy.

The last option I want to cover here is the NEAR keyword, which looks for two words in close proximity to each other:

    SELECT Description
    FROM Production.ProductDescription
    WHERE CONTAINS(Description, 'bike NEAR woman')

CONTAINS and FREETEXT have two cousins: CONTAINSTABLE and FREETEXTTABLE. They return KEY and RANK information, which can be used for ranking your results:

    SELECT [key], [rank]
    FROM CONTAINSTABLE(Production.ProductDescription, Description, 'bike')
    ORDER BY [rank] DESC

So far we’ve covered the full range of text-searching features available in T-SQL, and we’ve been able to perform many text-oriented queries. If it’s impractical to build a thesaurus of misspellings and proper spellings, we have to use a more generic routine. Let’s get to the core of this chapter and take a closer look at one answer to this problem.

Simil

As shown earlier, T-SQL allows us to perform a wide range of text searches. Still, a lot remains to be desired, especially with regard to misspellings. If you want to find a set of records even if they have misspellings, or want to prevent misspellings, you need to perform fuzzy string comparisons, and Simil is one algorithm suited for that task.
One use for Simil is in data cleanup. In one example, a company had a table with organic chemistry compounds, and their names were sometimes spelled differently. The application presents the user with the current record and similar records. The user can decide which records are duplicates, and choose the best one. One button click later, all child records are pointed to the chosen record, and the bad records are deleted. Then the user moves to the next record.

Another typical use for Simil is in preventing bad data from entering the database in the first place. Our company has a Sales application with a Companies table. When a salesperson is creating or importing a new company, the application uses Simil to scan for similar company names. If it finds any records, it’ll show a dialog box asking the user if the new company is one of those, or indeed a new company, as shown in the figure.

Figure: A form showing similar database records

Other uses include educational software with open-ended questions. One tantalizing option the original authors mention is to combine Simil with a compiler, which could then auto-correct common mistakes.

Let’s look at Simil in more detail, and learn how we can take advantage of it. In 1988, Dr. Dobb’s Journal published the Ratcliff/Obershelp algorithm for pattern recognition (Ratcliff and Metzener, “Pattern Matching: The Gestalt Approach,” http://www.ddj.com/184407970?pgno=5). This algorithm compares two strings and returns a similarity between 0 (completely different) and 1 (identical). Ratcliff and Obershelp wrote the original version in assembly language for the 8086 processor. In 1999, Steve Grubb published his interpretation in the C language (http://web.archive.org/web/20050213075957/www.gate.net/~ddata/utilities/simil.c). This is the version I used as a starting point for the .NET implementation I’m presenting here.

The purpose of Simil is to calculate a similarity between two strings.

Algorithm

The Simil algorithm looks for the longest common substring, and then looks at the right and left remainders for the longest common substrings, and so on recursively until no more are found. It then returns the similarity as a value between 0 and 1, by dividing the sum of the lengths of the common substrings (counted in both strings) by the combined length of the strings themselves.

The following table shows an example for two spellings of the word Pennsylvania. The algorithm finds the longest common substring, lvan, and then repeats with the remaining strings until there are no further common substrings.

Table: Simil results for Pennsylvania

    Word 1                      Word 2        Common substring  Length
    Pennsylvania                Pencilvaneya  lvan
    Pennsy ia                   Penci eya     Pen
    nsy ia                      ci eya        a
    nsy i                       ci ey         (none)
    Subtotal                                                    16
    Length of original strings                                  24
    Simil = 16/24                                               0.67

The common substrings lvan (4 characters), Pen (3), and a (1) add up to 8 characters; counted in both words, that gives the subtotal of 16. The two words are 12 characters each, 24 in total, so the similarity is 16/24, or about 0.67.

Simil is case sensitive. If you want to ignore case, convert both strings to uppercase or lowercase before calling Simil.

At its core, Simil is a longest common substring or LCS algorithm, and its performance can be expected to be on par with that class of algorithms. Anecdotally, we know that using Simil to test a candidate company name against 20,000 company names takes less than a second.

Simil has good performance and is easy to understand. It also has several weaknesses, including the following:

- The result value is abstract. Therefore it’ll take some trial and error to find a good threshold value above which you’d consider two strings similar enough to take action. For data such as company names, I recommend a starting Simil value of about 0.75. For the organic chemistry names, we found that 0.9 gave us better results.

16 Table-valued parameters
Don Kiely

A major goal of the new Transact-SQL (T-SQL) features in SQL Server 2008 is to reduce the amount of code you need to write for common scenarios. Many new language features simplify code, and table-valued parameters probably do so most dramatically. Such an innocuous name for a radical new feature! It’s the sort of thing that only a geek could love: the ability to pass a table to a procedure. It’s a simple enhancement but will change the way you think about programming SQL Server forever. If you’ve ever passed a comma-delimited or otherwise delimited list of data values to a stored procedure, then split them up and processed them, or bumped up against stored procedure parameter limits, you know the pain that is now forever gone.

In this chapter I’ll explore the syntax and use of this new T-SQL feature, both in SQL Server code and in client code. By the end of the chapter, you’ll wonder how you ever programmed without it!

What’s the problem?

Before SQL Server 2008, there was no easy way to pass data containers (arrays, DataSets, DataTables, and so on) to stored procedures and functions. You could pass single scalar values with no problem, although if you had to pass many parameters you might run into the limit on parameters, which is 2,100. Objects like arrays, in-memory tables, and other constructs are not the kind of set-based objects that T-SQL deals with. Yet sometimes it is necessary to pass data containers to a code module.

Over the years, the ever-resourceful SQL Server community has devised plenty of ways to get around this problem. Some of the more common workarounds have included the following:

- Pass in a delimited string, and parse it using T-SQL’s less-than-robust string-handling features.
- Pass data to the procedure as XML and shred it into a relational form.
- Create long parameter lists with plenty of optional parameters to accommodate varying needs.
- Create a global temporary table or even a permanent Parameters table to store lists of parameter data, which is often linked to a particular user connection.

The problem has plenty of other creative solutions. Unfortunately, making these solutions work requires either convoluted code or a shared resource or both, thereby creating maintenance nightmares.

Another kind of problem with T-SQL’s inability to pass a data container as a parameter occurs when you pass data from a client application to be stored or processed in SQL Server. The canonical problem of this type is creating a new customer order along with one or more order line items. The overall workflow between client application and database server goes something like this:

1. Pass the order header information, including customer ID, order date, and other order details.
2. Get the new order ID back (normally the primary key of the Order table), then make multiple calls to an InsertOrderDetail stored procedure to insert each order detail line item’s data.

Because there has been no easy way to encapsulate all this in a single call to the database server, the application has a chatty relationship with the server, causing many round trips and creating a fragile situation. You had to be careful with transactions, because whenever so much data was flowing back and forth between the application server and the database server, too many things could go wrong. In addition, it’s never a good idea to leave data fragments lying around a database.

Table-valued parameters to the rescue!

Microsoft has felt our pain, and their solution is table-valued parameters (TVPs). This is not a new data type, but rather an object variable of type TABLE, an in-memory collection of rows. TABLE variable types are not new in SQL Server 2008. What’s new is that you can now pass a variable of this type as a parameter to a stored procedure or other code module. What? That’s all that’s new? Yes, indeed!
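For readers who have not worked with them, the TABLE variables mentioned above look roughly like the following. This is a minimal sketch rather than code from the chapter: the variable name @Cities and its columns are hypothetical, and it assumes only the DECLARE ... TABLE syntax that predates SQL Server 2008.

```sql
-- Hypothetical example: a TABLE variable declared inline (no CREATE TYPE needed).
-- The name @Cities and its columns are illustrative assumptions.
DECLARE @Cities TABLE (ID INT PRIMARY KEY, Name NVARCHAR(100));

-- Populate it with ordinary INSERT statements, one row at a time.
INSERT INTO @Cities (ID, Name) VALUES (1, N'Fairbanks');
INSERT INTO @Cities (ID, Name) VALUES (2, N'Juneau');

-- Query it like any other table within the same batch.
SELECT ID, Name
FROM @Cities
ORDER BY ID;
```

A variable declared this way is visible only within the batch or module that declares it, and because its structure is declared inline rather than as a named type, a stored procedure has no way to accept it as a parameter. That is the gap table-valued parameters fill.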
This seemingly simple change has enormous implications for the code you write, vastly simplifying code whenever you need to pass rows of data to a code module. The parameter is a strongly typed database object with a table schema that you define, with all of the normal benefits of such variables.

TVPs are great for passing tabular data around the code modules that make up an application, meeting a need that is surprisingly common in relational databases. Let’s take a look at a simple example of creating and using a TVP. You create and use TVPs in five steps:

1. Create a table type and define its structure.
2. Declare a code module that accepts a parameter of the table type.
3. Declare a variable of the table type and reference it.
4. Fill the table variable with data.
5. Call the code module, passing the variable to it.

The code for this example is in the Simple.sql file. The first step is to create the table type and define its structure. This is a persistent database object that you can reuse all you want within the database. Here, MyTbl has an integer ID field (presumably a primary key) and a string field.

    CREATE TYPE MyTbl AS TABLE
        (ID INT, String NVARCHAR(100))
    GO

You can treat the type almost like any kind of persisted table, such as by defining a primary key, CHECK constraints, default values for fields, and computed columns, and you can even set permissions. One of the few limitations is that you can’t call user-defined functions from types.

The next step is to create a stored procedure that uses a parameter of the MyTbl type. The dkSelectFromTVP procedure takes the TVP as its only parameter and returns the contents of the table using a SELECT statement. As I said, this is a simple example.

    CREATE PROCEDURE dkSelectFromTVP(@TVParam MyTbl READONLY)
    AS
        SET NOCOUNT ON;
        SELECT * FROM @TVParam;
    GO

Next, create an instance of the TABLE type and populate it with data. The code that follows creates the @TVP variable and inserts some data with the names of some cities and locations in Alaska. Notice that you can populate variables of a TABLE type using the same T-SQL statements used to insert data in regular, persisted tables.

    DECLARE @TVP AS MyTbl;
    INSERT INTO @TVP(ID, String) VALUES (1, 'Fairbanks');
    INSERT INTO @TVP(ID, String) VALUES (2, 'Juneau');
    INSERT INTO @TVP(ID, String) VALUES (3, 'Anchorage');
    INSERT INTO @TVP(ID, String) VALUES (4, 'Denali');

@TVP behaves like any other kind of local variable, with a well-defined scope (the code module in which you declare it), and it is cleaned up when it goes out of scope. One benefit is that using a table-valued parameter with a stored procedure generally causes fewer recompilations than using temporary tables.

The final step is to run the stored procedure, passing the @TVP table-valued parameter. Here’s the code:

    EXEC dkSelectFromTVP @TVP;

This produces the result shown in the figure. Amazing! We have a stored procedure that returns the result of a SELECT statement!

Figure: Results of passing a TVP to a stored procedure with list of Alaska place names

Let’s take the example a few steps further. Say that you want to change the data within the stored procedure before returning it. Here is a revised stored procedure that shows some civic pride by adding “Rocks!” to every city name other than Anchorage (which is just a big city that is more like Seattle than anything else in Alaska). [1]

    ALTER PROCEDURE dkSelectFromTVP(@TVParam MyTbl READONLY)
    AS
        SET NOCOUNT ON
        UPDATE @TVParam
            SET String = String + ' Rocks!'
            WHERE String <> 'Anchorage'
        SELECT * FROM @TVParam
    GO

But when you run this ALTER code, it produces an error message:

    Msg 10700, Level 16, State 1, Procedure dkSelectFromTVP, Line
    The table-valued parameter "@TVParam" is READONLY and cannot be modified.

Oops!
We forgot to remove the READONLY keyword in the definition. After making that change, here is the new code:

    ALTER PROCEDURE dkSelectFromTVP(@TVParam MyTbl)
    AS
        SET NOCOUNT ON
        UPDATE @TVParam
            SET String = String + ' Rocks!'
            WHERE String <> 'Anchorage'
        SELECT * FROM @TVParam
    GO

But when you run this code, you still get an error message:

    Msg 352, Level 15, State 1, Procedure dkSelectFromTVP, Line
    The table-valued parameter @TVParam must be declared with the READONLY option.

This shows that a TVP must be declared as READONLY within the code module where you use it as a parameter. This means that you can’t change the contents of the table from within the code module. This is a disappointing limitation of TVPs in SQL Server 2008, but Microsoft is receiving pressure to ease this restriction in an update or future version. For now, you can work around the problem by making changes to the content of the table variable before passing it to a code module. The following code shows an example of how you could do that:

    DECLARE @TVP AS MyTbl
    INSERT INTO @TVP(ID, String) VALUES (1, 'Fairbanks')
    INSERT INTO @TVP(ID, String) VALUES (2, 'Juneau')
    INSERT INTO @TVP(ID, String) VALUES (3, 'Anchorage')
    INSERT INTO @TVP(ID, String) VALUES (4, 'Denali')

    UPDATE @TVP
        SET String = String + ' Rocks!'
        WHERE String <> 'Anchorage'

    EXEC dkSelectFromTVP @TVP

[1] Editor’s note: The author of this chapter is from Fairbanks, Alaska, which is several times smaller than Anchorage. He assures me that there is, in fact, no rivalry between the two cities at all.

An interesting characteristic of a TVP is that it has a non-null default value: an empty recordset. That means that this statement,

    EXEC dkSelectFromTVP

does not cause an error, even though the stored procedure definition does not define a default value for the parameter, nor does it allow nulls. The default value for a TVP is an empty recordset, even if you never initialize the variable. Keep that behavior in mind as you work with TVPs, because it is quite different from other variable types.

Another TVP example

Okay, that wasn’t the most extraordinary use of a TVP. But it shows the steps and syntax that you’ll use with TVPs. Let’s look at another example of how to use TVPs. This example, in the Products.sql file, uses a TABLE type to read product data from the AdventureWorksLT2008 database and insert new rows into another products table along with inventory data and the current date through a stored procedure.

The first step is to create the TABLE type, which contains two fields for product information:

    CREATE TYPE ProductsType AS TABLE
        (ProductName NVARCHAR(50), ProductNumber NVARCHAR(25));
    GO

NOTE Microsoft no longer ships sample databases with SQL Server, but they make them available on CodePlex. You can download the full set of sample SQL Server 2005 and 2008 databases from http://www.codeplex.com/MSFTDBProdSamples.

The Products table will contain the new information. The Name and ProductNumber fields will contain the data from the AdventureWorksLT2008 database, while the ItemsInStock and CreatedDate fields will be populated when the data is inserted into the table.

    CREATE TABLE Products
        (Name NVARCHAR(50),
         ProductNumber NVARCHAR(25),
         ItemsInStock INT,
         CreatedDate DATETIME);
    GO

The next bit of code creates the stored procedure that receives the TVP with the product name and number data. It uses an INSERT statement to add data to the new Products table, setting ItemsInStock to zero and using the GETDATE() function to insert the current date and time.

    CREATE PROCEDURE dkInsertProducts
        @ProductsTVP ProductsType READONLY
    AS
        SET NOCOUNT ON
        INSERT INTO dbo.Products
            (Name, ProductNumber, ItemsInStock, CreatedDate)
        SELECT ProductName, ProductNumber, 0, GETDATE()
        FROM @ProductsTVP;
    GO

The final set of server-side code creates the variable of the TABLE type ProductsType, reads data from the AdventureWorksLT2008.SalesLT.Product table, and passes the TVP to the stored procedure.

    DECLARE @Prods AS ProductsType;

    INSERT INTO @Prods (ProductName, ProductNumber)
        SELECT [Name], ProductNumber
        FROM AdventureWorksLT2008.SalesLT.Product;

    EXEC dkInsertProducts @Prods;
    GO
    SELECT * FROM Products;

The figure shows the results of running the code.

Figure: Results of running TVP-stored procedure

The usefulness of TVPs becomes apparent only with more real-world examples that are called from ADO.NET client applications, so we’ll explore that next.

Using TVPs from client applications

Microsoft has a huge job keeping its various database and development platforms in sync so that developers can take advantage of new SQL Server 2008 features. To make full use of TVPs, you’ll need to use both Visual Studio 2008 and ADO.NET 3.5. Neither has full support yet for all new SQL Server 2008 features, but fortunately the initial releases of both products support TVPs.

From client applications, you can use TVPs to pass multiple rows of data in a single round trip to the server, without any special server-side processing logic or temporary tables to hold data for set operations. The data you pass
must have the correct number of fields to match the TVP definition on the server, but the column names don’t have to match. The data types must correspond to the types defined in the code module in SQL Server, or be implicitly convertible. If you violate these requirements, you’ll get an error message from SQL Server.

You can pass a TVP from ADO.NET code to SQL Server using any of three ADO.NET objects:

- System.Data.DataTable
- System.Data.DbDataReader
- System.Collections.Generic.IList<SqlDataRecord>

ADO.NET and SQL Server collaborate to perform the conversions necessary to create a new TABLE type that is usable by T-SQL. The DataTable object is probably the easiest to use and is familiar to most .NET database application developers. The DbDataReader object is part of ADO.NET’s alternative factory-method-based objects. SqlDataRecord is used within SQL CLR code, which is the .NET code that executes within the SQL Server process.

Most often, you’ll need to identify the parameter as a TVP by setting the SqlDbType property of a SqlParameter object to SqlDbType.Structured and its TypeName property to the name of the TABLE type. The type name you use must match the name of a compatible TABLE type in the database on SQL Server. Otherwise you’ll get an error.

Using a DataTable

The sample client is an ASP.NET website application contained in the Client directory in this chapter’s sample files. The Simple.aspx page shows two methods of using TVPs from ADO.NET, with the DataTable and DbDataReader objects. The first of two buttons on the form uses the dkSelectFromTVP stored procedure created earlier in the chapter to return the list of rows in the TVP passed to it.

The following listing shows the two methods. The BuildDataTable method creates a new DataTable and populates it with data, this time with some cities and villages in the Alaska Interior. The real meat of the code is in the DataTableButton_Click event procedure.

Listing Code to use a DataTable to pass a TVP to a stored procedure

    private DataTable BuildDataTable()
    {
        DataTable dt = new DataTable("AlaskaCities");
        dt.Columns.Add("ID", typeof(System.Int32));
        dt.Columns.Add("String", typeof(System.String));
        dt.Rows.Add(1, "Ester");
        dt.Rows.Add(2, "Chena");
        dt.Rows.Add(3, "North Pole");
        dt.Rows.Add(4, "Chatanika");
        dt.Rows.Add(5, "Fox");
        return dt;
    }

    protected void DataTableButton_Click(object sender, EventArgs e)
    {
        DataTable dt = BuildDataTable();
        using (SqlConnection cnn = new SqlConnection(
            ConfigurationManager.ConnectionStrings["TVPs"].ConnectionString))
        {
            SqlCommand cmd = new SqlCommand("dkSelectFromTVP", cnn);
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@TVParam", dt);
            cnn.Open();
            SqlDataReader dr = cmd.ExecuteReader();
            GridView1.DataSource = dr;
            GridView1.DataBind();
        }
    }

The event procedure starts by creating the DataTable object to be passed to the stored procedure. The rest of the code is fairly typical ADO.NET code that creates a command object, adds the DataTable as a parameter, and calls the ExecuteReader method to return the resultset, which is bound to a GridView control on the web form. The figure shows the results.

Figure Results of passing a DataTable as a TVP

The notable thing about this example is that there isn’t anything notable about it, compared to how you’ve been writing ADO.NET code for years. And because the structure of the DataTable is such a close match for the TABLE type defined earlier, the code doesn’t even have to specify the SqlDbType of the table-valued parameter. Usually specifying the type is not optional, so you should get into the habit of always providing that information. You’ll see how in the next example.

Using a DbDataReader

The second button on the Simple.aspx page uses a DbDataReader object, one of the factory-based objects provided by
ADO.NET. This example reads the ProductID and Name fields from the Product table in the AdventureWorksLT2008 database and populates a DbDataReader object with that data. It then passes the object to the dkSelectFromTVP stored procedure and again binds the results to the GridView control on the web page.

This time the code that defines the parameter is a bit different from our last example, because you have to specify the TypeName and SqlDbType properties of the SqlParameter object. Set the TypeName property to the name of the TABLE type you defined in the database and set the SqlDbType property to the SqlDbType.Structured enumeration value.

    SqlParameter param = cmd.Parameters.AddWithValue("@TVParam", ddr);
    param.TypeName = "dbo.MyTbl";
    param.SqlDbType = SqlDbType.Structured;

The following listing shows the full code for the button’s Click event procedure, and the figure after it shows a partial list of the results.

Listing Using a DbDataReader object as a TVP

    protected void DbDataReaderButton_Click(object sender, EventArgs e)
    {
        DbProviderFactory factory =
            DbProviderFactories.GetFactory("System.Data.SqlClient");
        DbConnection cnnAW = factory.CreateConnection();
        cnnAW.ConnectionString =
            ConfigurationManager.ConnectionStrings["AWLT"].ConnectionString;
        DbCommand dbCmd = cnnAW.CreateCommand();
        dbCmd.CommandText = "SELECT ProductID, [Name] FROM SalesLT.Product";
        dbCmd.CommandType = CommandType.Text;
        cnnAW.Open();
        DbDataReader ddr = dbCmd.ExecuteReader();

        using (SqlConnection cnn = new SqlConnection(
            ConfigurationManager.ConnectionStrings["TVPs"].ConnectionString))
        {
            SqlCommand cmd = new SqlCommand("dkSelectFromTVP", cnn);
            cmd.CommandType = CommandType.StoredProcedure;
            SqlParameter param = cmd.Parameters.AddWithValue("@TVParam", ddr);
            param.TypeName = "dbo.MyTbl";
            param.SqlDbType = SqlDbType.Structured;
            cnn.Open();
            SqlDataReader dr = cmd.ExecuteReader();
            GridView1.DataSource = dr;
            GridView1.DataBind();
        }
    }

Figure Results of passing a DbDataReader as a TVP

Using TVPs to enter orders

The next example shows a more realistic use for TVPs. The code is part of an order entry system, and one of the application requirements is to reduce the number of round trips between the client application and the database server. The code in Orders.sql takes the order header information and line items and inserts them into the appropriate tables in the database in a single round trip.

The listing that follows shows the T-SQL used to create the database objects. The OrderDetailsType is a TABLE type that the client application will pass to the stored procedure with the list of products selected by the user. Pricing and other information will be gathered from other tables in the AdventureWorksLT2008 database. Then the code creates the Orders and OrderDetails tables to hold the data. Finally, the code creates the dkPlaceOrder stored procedure that receives the CustomerID and OrderDetailsType TVP. It gets the next order ID from the Orders table, then inserts the data into the Orders and OrderDetails tables. Note that all the data is either passed to the procedure or gathered from related tables in the AdventureWorksLT2008 database.

NOTE To keep things simple and to focus on TVPs, I didn’t make this code robust enough to handle concurrency issues. For example, the method of getting the next OrderID number will likely result in duplicate IDs in a multiuser environment. So don’t use this code in a production environment without beefing it up to handle your environment!
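The concurrency caveat in the note above can be made concrete. One common way to avoid duplicate IDs is to let SQL Server assign them through an IDENTITY column and read the new value back with SCOPE_IDENTITY(), with both inserts inside one transaction. The sketch below is an illustration under stated assumptions, not the chapter's code: the names OrdersWithIdentity and dkPlaceOrderIdentity are hypothetical, and it reuses the OrderDetailsType type and the OrderDetails table that are created in the listing that follows.

```sql
-- Sketch only. OrdersWithIdentity and dkPlaceOrderIdentity are hypothetical
-- names; OrderDetailsType and dbo.OrderDetails come from the chapter's listing.
CREATE TABLE dbo.OrdersWithIdentity
(
    OrderID    INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    CustomerID INT      NOT NULL,
    OrderDate  DATETIME NOT NULL
);
GO

CREATE PROCEDURE dbo.dkPlaceOrderIdentity
(
    @CustomerID INT,
    @Items      OrderDetailsType READONLY
)
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;  -- a run-time error rolls back the whole transaction

DECLARE @OrderID INT;

BEGIN TRANSACTION;

    -- SQL Server assigns the OrderID, so concurrent callers cannot collide.
    INSERT INTO dbo.OrdersWithIdentity (CustomerID, OrderDate)
    VALUES (@CustomerID, GETDATE());

    -- SCOPE_IDENTITY() returns the identity value generated in this scope.
    SET @OrderID = SCOPE_IDENTITY();

    -- Same set-based insert of the line items as in dkPlaceOrder.
    INSERT INTO dbo.OrderDetails (OrderID, ProductID, Cost, Quantity)
    SELECT @OrderID, tvp.ProductID, prd.StandardCost, tvp.Quantity
    FROM @Items AS tvp
    INNER JOIN AdventureWorksLT2008.SalesLT.Product AS prd
        ON tvp.ProductID = prd.ProductID;

COMMIT TRANSACTION;
GO
```

Because the header and detail inserts share one transaction with XACT_ABORT on, a failure while inserting the line items should also undo the order header row. Whether an IDENTITY column suits a given system is a design decision; this is only one of several possible approaches.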
Listing Code to create database objects to insert orders

    -- Create type to hold user product selections
    CREATE TYPE OrderDetailsType AS TABLE
        (ProductID INT NOT NULL,
         Quantity INT NOT NULL)
    GO

    CREATE TABLE Orders (
        OrderID INT NOT NULL,
        CustomerID INT NOT NULL,
        OrderDate DATETIME NOT NULL)
    GO

    CREATE TABLE OrderDetails (
        OrderID INT NOT NULL,
        ProductID INT NOT NULL,
        Cost MONEY NOT NULL,
        Quantity INT NOT NULL)
    GO

    CREATE PROCEDURE dkPlaceOrder (
        @CustomerID INT,
        @Items OrderDetailsType READONLY)
    AS
    -- Get the next OrderID
    -- CAUTION! Simple, but not robust for concurrent apps
    DECLARE @OrderID INT
    SET @OrderID = (SELECT ISNULL(MAX(OrderID)+1, 1) FROM Orders)
    -- Create the order
    INSERT INTO dbo.Orders VALUES(@OrderID, @CustomerID, GETDATE())
    -- Insert the products
    INSERT dbo.OrderDetails(OrderID, ProductID, Cost, Quantity)
    SELECT @OrderID, tvp.ProductID, prd.StandardCost, tvp.Quantity
    FROM @Items AS tvp
    INNER JOIN AdventureWorksLT2008.SalesLT.Product AS prd
        ON tvp.ProductID = prd.ProductID
    GO

Figure The order entry user interface

Orders.sql also has some code you can use to test the stored procedure by entering an order. The figure shows the user interface, which is the Orders.aspx page in the Client sample website application. The user selects a product category, then selects one of the available products. To order the item, the user enters the quantity and clicks the Add to Cart button. The Shopping Cart section keeps track of the selected items. When the user is done, she clicks the Checkout button to place the order, which inserts the data into the Orders and OrderDetails tables. To maximize profits, there is no way to remove an item from the shopping cart after it is placed there.

Most of the code behind the Orders.aspx page manages the user interface and is not included here; you can explore that code on your own. The shopping cart is managed with a
DataTable that is saved as a session variable as the user interacts with the page and selects products. This makes it easy to pass to the stored procedure when the user is ready to place the order. But there is a twist in the code: to make displaying the contents of the shopping cart convenient, the DataTable includes a ProductName field in addition to the two fields that the stored procedure expects. So before passing the DataTable to the procedure, the code removes that extra field. It does this by creating a copy of the DataTable object and removing the column from the copy instead of the original object. If you tried to pass the DataTable with the extra field, SQL Server would refuse it and raise an error.

The following listing contains the code for the Click event of the Checkout button. The code uses a hardcoded CustomerID of 1, and sets up to pass the dt DataTable object as a TVP.

Listing Checkout code that creates the order in the database

    protected void CheckoutButton_Click(object sender, EventArgs e)
    {
        // Need to get rid of extra column
        DataTable dt = dtCart.Copy();
        dt.Columns.Remove("ProductName");

        using (SqlConnection cnn = new SqlConnection(
            ConfigurationManager.ConnectionStrings["TVPs"].ConnectionString))
        {
            SqlCommand cmd = new SqlCommand("dkPlaceOrder", cnn);
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@CustomerID", 1);
            SqlParameter param = cmd.Parameters.AddWithValue("@Items", dt);
            param.TypeName = "dbo.OrderDetailsType";
            param.SqlDbType = SqlDbType.Structured;
            cnn.Open();
            cmd.ExecuteNonQuery();

            dtCart.Rows.Clear();
            gridCart.DataSource = dtCart;
            gridCart.DataBind();
        }
    }

Summary

The primary benefit of using TVPs is that you can write simpler code than in older versions of SQL Server because you don’t have to deal with the lack of container objects supported in T-SQL. As a result you have more flexibility in
working with sets of data as well as better performance because you can take advantage of T-SQL’s set-based operations.

The variable that contains the TVPs is scoped to the procedure to which they are passed, just as with other types of parameters. They abide by all of the security features in SQL Server, which means that you can assign permissions on the TABLE type object and limit how they are used and by whom. And like other database objects, various catalog views can provide information about the table objects.

TVPs have limitations:

- TVPs are passed by reference (BY REF) for the sake of performance. (This is not the same way that objects are passed by reference in .NET programming languages; see the explanation that follows.)
- The parameter must be marked as READONLY, which means that you can’t modify in any way the data in the TVP from within the procedure you pass it to. This is probably the most significant limitation of TVPs, one that hopefully Microsoft will ease in the future.
- No statistics are created on the TABLE object variable, which means that you may end up with expensive table scans more often than you would with a regular persisted table.

It is important to remember that the default value for a TABLE variable is an empty recordset, not a null. This makes it different from any other data type, so you have to remember this and not expect that a TABLE variable will ever be null.

As mentioned above, one of the limitations of TVPs is that they are passed by reference. The TVP is materialized as a temporary table in the tempdb database, and when you pass it to a code module, you are actually passing a reference to that table. SQL Server manages it all for you, so unless you dig (or read the documentation!)
you’ll never know this. The code samples for this chapter include a TempDB.sql script that lets you explore this behavior, by displaying tempdb’s sys.tables and sys.columns catalog views to list the structure of the TVP from within a stored procedure.

Despite these limitations, TVPs do a remarkable job in reducing the amount of code that you need to write when you need to pass a set of data to a code module. They are probably among the best of the productivity features that Microsoft introduced in SQL Server 2008.

About the author

Don Kiely, MVP, MCSD, is a technology consultant who develops secure desktop and web applications using tools including SQL Server, VB, C#, AJAX, and ASP.NET. He writes regularly for many industry journals, including Visual Studio Magazine, MSDN Magazine, CoDe Magazine, and asp.netPRO. Don trains developers and speaks regularly at industry conferences, including TechEd, SQL PASS, DevConnections, DevTeach, and others, and is a member of the INETA and MSDN Canada speaker bureaus. He writes courseware and records videos for AppDev. In his other life he roams the Alaska wilderness by foot, dog sled, skis, and kayak. Contact him at donkiely@computer.org.

17 Build your own index
Erland Sommarskog

Consider this SQL query:

    SELECT person_id, first_name, last_name, birth_date, email
    FROM persons
    WHERE email LIKE '%' + @word + '%'

You can immediately tell that the only way SQL Server can evaluate this query is to look through every single email address, be that by scanning the table or by scanning an index on the email column. If the table is large, say, ten million rows, you can expect an execution time of at least a minute. Imagine now that there is a requirement that in most cases the response time for a search should be only a few seconds. How could you solve this?
Regular indexes do not help, nor do full-text indexes; to be efficient, both require that there is no leading wildcard in the search string. There is one way out: you have to build your own index. In this chapter I will look at three ways to do this. The first two methods more or less require SQL 2005 or later, whereas the last method is easily implementable on SQL 2000.

The database and the table

In this chapter we will work with the persons table shown in the following listing.

Listing Creating the persons table and index on email

    CREATE TABLE persons (
        person_id  int          NOT NULL,
        first_name nvarchar(50) NULL,
        last_name  nvarchar(50) NOT NULL,
        birth_date datetime     NULL,
        email      varchar(80)  NOT NULL,
        other_data char(73)     NOT NULL DEFAULT ' ',
        CONSTRAINT pk_persons PRIMARY KEY (person_id))
    CREATE INDEX email_ix ON persons (email)

The aim is to implement queries that permit users to look up persons by searching on parts of an email address. The column other_data is a filler to make the table wider; in a real-life table you would instead find other columns, such as address, city, and so on.

If you want to work with the examples in this chapter, you should get the file Ch.17.zip from the download area for this book. This file includes a number of scripts that we will work with. The scripts assume that you have Microsoft SQL Server 2005 or later installed. Note also that some of the scripts require that you have enabled the Common Language Runtime (CLR) on your server; by default the execution of user-written CLR code is disabled. Different file names denote different versions of the scripts for SQL 2005 and SQL 2008. The zip file also includes a BCP file from which the persons table is loaded.

The script 01_build_database.sql creates a database called yourownindex, which is 1.7 GB, of which 700 MB are log files. Before you run the script, run a find-and-replace to match
the path to which you extracted the files. The database collation is Latin1_General_CI_AS. The script loads one million rows into the persons table from the BCP file.

Because including real email addresses in a download file could be sensitive, I have generated the addresses from a list of Slovenian words that I happened to have lying around. Slovenian is a language which is very rich in inflections, and this list has over a million entries. The user section of the addresses is completely random. There are some 20,000 different domains, with a skewed distribution to make some domains more common than others to mimic a real-world situation. The top domains are the normal ones, with the distribution taken from a real-world source. About 58 percent of the addresses end in .com or .net. The email addresses are lowercase only, and I have replaced all accented characters so that only international A through Z characters appear. There are no digits in the data. The data in the columns first_name, last_name, and birth_date are taken from the AdventureWorksDW database.

The build script also creates a few more items that I will present as we encounter them. Expect the script to run for 2–3 minutes; it’s a good idea to run the script at this point.

Plain search and introducing tester_sp

One should never go on a performance quest without a baseline. In our case this is the SELECT statement at the beginning of the chapter. This also gives me the opportunity to introduce the stored procedure tester_sp, created by 01_build_database.sql. Tester_sp expects a single parameter: the name of the procedure to be tested. Tester_sp assumes that the tested procedure takes a single parameter containing a search string, and that the procedure returns the columns person_id, first_name, last_name, birth_date, and email for all persons whose email address contains the search string.
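To illustrate the contract that tester_sp expects, a baseline procedure could simply wrap the SELECT statement from the start of the chapter. This is a minimal sketch under stated assumptions: the procedure name plain_search_sp and the varchar(50) parameter type are hypothetical, and the baseline procedure shipped in the chapter's download may be named and written differently.

```sql
-- Sketch only: plain_search_sp and the parameter length are assumptions.
-- The procedure takes one search string and returns the five columns
-- that tester_sp expects from any procedure it tests.
CREATE PROCEDURE plain_search_sp @word varchar(50) AS
SELECT person_id, first_name, last_name, birth_date, email
FROM   persons
WHERE  email LIKE '%' + @word + '%';
GO

-- tester_sp takes the name of the procedure to test as its single parameter.
EXEC tester_sp 'plain_search_sp';
```

Because of the leading wildcard, this query has to scan every email value, which is what makes it a useful baseline for the methods compared against it.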