Microsoft SQL Server 2008 Learning Guide, Part 55 (PDF)

Nielsen c19.tex V4 - 07/21/2009 1:02pm Page 502 Part III Beyond Relational

The other solution to the multiple-column search problem is to add a column that holds all the text to be searched, and to copy the data from the original columns into that FullTextSearch column, either within an after trigger or by means of a persisted computed column. This solution is not smooth either. It duplicates data and costs performance during inserts and updates. The crux of the decision about how to solve the multiple-column search is the conflict between fast reads and fast writes: OLAP versus OLTP.

Searches with wildcards

Because the full-text search engine has its roots in Windows Index and was not a SQL Server-developed component, its wildcards use the standard DOS conventions (an asterisk as the multi-character wildcard, and double quotes around the term) instead of SQL-style wildcards and SQL single quotes.

The other thing to keep in mind about full-text wildcards is that they work only at the end of a word, not at the beginning. Indexes search from the beginning of strings, as shown here:

    SELECT Title
      FROM Fable
      WHERE CONTAINS (*, ' "Hunt*" ');

Result:

    Title
    --------------------------
    The Hunter and the Woodman
    The Ass in the Lion's Skin
    The Bald Knight

Phrase searches

Full-text search can attempt to locate full phrases if those phrases are surrounded by double quotes. For example, to search for the fable about the boy who cried wolf, searching for "Wolf! Wolf!" does the trick:

    SELECT Title
      FROM Fable
      WHERE CONTAINS (*, ' "Wolf! Wolf!" ');

Result:

    Title
    -------------------------------
    The Shepherd's Boy and the Wolf

Word-proximity searches

When searching large documents, it's nice to be able to specify the proximity of the search words. Full-text search implements a proximity switch by means of the NEAR option.
The relative distance between the words is calculated, and if the words are close enough (within about 30 words, depending on the size of the text), then full-text search returns true for the row.

The story of Androcles, the slave who pulls the thorn from the lion's paw, is one of the longer fables in the sample database, so it's a good test sample. The following query attempts to locate the fable "Androcles" based on the proximity of the words "pardoned" and "forest" in the fable's text:

    SELECT Title
      FROM Fable
      WHERE CONTAINS (*, 'pardoned NEAR forest');

Result:

    Title
    ---------
    Androcles

The proximity switch can handle multiple words. The following query tests the proximity of the words "lion," "paw," and "bleeding":

    SELECT Title
      FROM Fable
      WHERE CONTAINS (*, 'lion NEAR paw NEAR bleeding');

Result:

    Title
    ---------
    Androcles

The proximity feature can be used with CONTAINSTABLE; the RANK indicates relative proximity. The following query ranks the fables that mention the word "life" near the word "death" in order of proximity:

    SELECT Fable.Title, FTS.Rank
      FROM Fable
        INNER JOIN CONTAINSTABLE (Fable, *, 'life NEAR death') AS FTS
          ON Fable.FableID = FTS.[KEY]
      ORDER BY FTS.Rank DESC;

Result:

    Title                          Rank
    -----------------------------  ----
    The Serpent and the Eagle      7
    The Eagle and the Arrow        1
    The Woodman and the Serpent    1

Word-inflection searches

The full-text search engine can perform linguistic analysis and base a search for different words on a common root word. This enables you to search for words without worrying about number or tense. For example, the inflection feature makes it possible for a search for the word "flying" to find a row containing the word "flew." The language you specify for the table is critical in a case like this.
Something else to keep in mind is that the word base will not cross parts of speech, meaning that a search for a noun won't locate a verb form of the same root. The following query demonstrates inflection by locating the fable with the word "flew" in "The Crow and the Pitcher":

    SELECT Title
      FROM Fable
      WHERE CONTAINS (*, 'FORMSOF(INFLECTIONAL,fly)');

Result:

    Title
    ------------------------
    The Crow and the Pitcher
    The Bald Knight

Thesaurus searches

The full-text search engine can perform thesaurus lookups for word replacements as well as synonyms. To configure your own thesaurus options, edit the thesaurus file. The location of the thesaurus file depends on your language and server. The thesaurus file for your language follows the naming convention TSXXX.xml, where XXX is your language code (e.g., ENU for U.S. English, ENG for U.K. English, and so on).

You need to remove the comment lines from your thesaurus file. When you open the file in a text editor, you will see two kinds of sections, or nodes: an expansion node and a replacement node. The expansion node expands a search argument from one term into a set of equivalent terms. For example, in the thesaurus file, you will find the following expansion:

    <expansion>
      <sub>Internet Explorer</sub>
      <sub>IE</sub>
      <sub>IE5</sub>
    </expansion>

This converts any search on "IE" into a search on "IE" or "IE5" or "Internet Explorer."

The replacement node replaces a search argument with another argument. For example, if you want the search argument sex interpreted as gender, you could use the replacement node to do that:

    <replacement>
      <pat>sex</pat>
      <sub>gender</sub>
    </replacement>

The pat element (sex) indicates the pattern you want substituted by the sub element (gender). A FREETEXT query will automatically use the thesaurus file for the language type.
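
Edits to the thesaurus file generally need to be loaded before queries reflect them. The following is a minimal sketch, assuming a SQL Server 2008 instance and the U.S. English thesaurus (LCID 1033); the LCID and the search term are illustrative choices, not values taken from the chapter. It reloads the file with sys.sp_fulltext_load_thesaurus_file and then uses sys.dm_fts_parser to inspect how a thesaurus term is expanded.

```sql
-- Assumption: the file that was edited is the U.S. English thesaurus (LCID 1033).
-- Reload it so that the edited expansion and replacement nodes are picked up.
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Inspect how the parser expands the term.
-- Arguments: query string, LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT display_term, source_term, expansion_type
  FROM sys.dm_fts_parser('FORMSOF(THESAURUS, "IE")', 1033, 0, 0);
```

If the expansion node shown earlier is active, the parser output should list the substituted terms alongside the original one, which is a quick way to confirm the file was read as intended.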
Here is an example of a generational query using the Thesaurus option:

    SELECT *
      FROM TableName
      WHERE CONTAINS (*, 'FORMSOF(Thesaurus,"IE")');

This returns matches to rows containing IE, IE5, and Internet Explorer.

Variable-word-weight searches

In a search for multiple words, relative weight may be assigned, making one word critical to the search and another word much less important. The weights are set on a scale of 0.0 to 1.0.

The ISABOUT option enables weighting, and any hit on a given word allows the row to be returned, so it functions as an implied Boolean OR operator.

The following two queries use the weight option with CONTAINSTABLE to highlight the differences among the words "lion," "brave," and "eagle" as the weighting changes. The queries examine only the FableText column to prevent the results from being skewed by the shorter lengths of the text in the title and moral columns. The first query weights the three words evenly:

    SELECT Fable.Title, FTS.Rank
      FROM Fable
        INNER JOIN CONTAINSTABLE (Fable, FableText,
          'ISABOUT (Lion weight (.5),
                    Brave weight (.5),
                    Eagle weight (.5))') AS FTS
          ON Fable.FableID = FTS.[KEY]
      ORDER BY Rank DESC;

Result:

    Title                          Rank
    -----------------------------  ----
    Androcles                      92
    The Eagle and the Fox          85
    The Hunter and the Woodman     50
    The Serpent and the Eagle      50
    The Dogs and the Fox           32
    The Eagle and the Arrow        21
    The Ass in the Lion's Skin     16

When the relative importance of the word "eagle" is elevated, it's a different story:

    SELECT Fable.Title, FTS.Rank
      FROM Fable
        INNER JOIN CONTAINSTABLE (Fable, FableText,
          'ISABOUT (Lion weight (.2),
                    Brave weight (.2),
                    Eagle weight (.8))') AS FTS
          ON Fable.FableID = FTS.[KEY]
      ORDER BY Rank DESC;

Result:

    Title                          Rank
    -----------------------------  ----
    The Eagle and the Fox          102
    The Serpent and the Eagle      59
    The Eagle and the Arrow        25
    Androcles                      25
    The Hunter and the Woodman     14
    The Dogs and the Fox           9
    The Ass in the Lion's Skin     4

When all the columns participate in the full-text search, the small size of the moral and the title makes the target words seem relatively more important within the text. The next query uses the same weighting as the previous query but includes all columns (*):

    SELECT Fable.Title, FTS.Rank
      FROM Fable
        INNER JOIN CONTAINSTABLE (Fable, *,
          'ISABOUT (Lion weight (.2),
                    Brave weight (.2),
                    Eagle weight (.8))') AS FTS
          ON Fable.FableID = FTS.[KEY]
      ORDER BY Rank DESC;

Result:

    Title                          Rank
    -----------------------------  ----
    The Wolf and the Kid           408
    The Hunter and the Woodman     408
    The Eagle and the Fox          102
    The Eagle and the Arrow        80
    The Serpent and the Eagle      80
    Androcles                      25
    The Ass in the Lion's Skin     23
    The Dogs and the Fox           9

The ranking is relative, and is based on word frequency, word proximity, and the relative importance of a given word within the text. "The Wolf and the Kid" does not contain an eagle or a lion, but two factors favor bravado. First, "brave" is a rarer word than "lion" or "eagle" in both the column and the table. Second, the word "brave" appears in the moral as one of only 10 words. So even though "brave" was weighted less, it rises to the top of the list. It's all based on word frequencies and statistics (and sometimes, I think, the phase of the moon!).

Fuzzy Searches

While the CONTAINS predicate and the CONTAINSTABLE-derived table perform exact word searches, the FREETEXT predicate expands on the CONTAINS functionality to include fuzzy, or approximate, full-text searches from free-form text. Instead of requiring you to search for two or three words and add the options for inflection and weighting yourself, the fuzzy search handles the complexity of building searches that make use of all the full-text search engine options, and tries to solve the problem for you.

Internally, the free-form text is broken down into multiple words and phrases, and the full-text search with inflections and weighting is then performed on the result.
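
To make the idea concrete, the sketch below pairs a FREETEXT search with a hand-written CONTAINS query that approximates part of what the engine does on your behalf. It is only an illustration against the chapter's Fable table: the search words are chosen for the example, and the engine's actual internal rewrite (which also applies the thesaurus and its own weighting) is not shown in this chapter and may well differ.

```sql
-- Fuzzy form: the engine breaks the free-form text into words and expands them.
SELECT Title
  FROM Fable
  WHERE FREETEXT (*, 'tortoise hare race');

-- Rough hand-written approximation: inflectional forms of each word, joined by OR.
-- This is an assumption about the general shape of the expansion, not the
-- engine's real query plan, and it omits thesaurus lookups and weighting.
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, 'FORMSOF(INFLECTIONAL, tortoise) OR FORMSOF(INFLECTIONAL, hare) OR FORMSOF(INFLECTIONAL, race)');
```

The two queries are not guaranteed to return identical rows; the point is that FREETEXT spares you from writing the second form by hand.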
Freetext

FREETEXT works within a WHERE clause just like CONTAINS, but without all the options. The following query uses a fuzzy search to find the fable about the big race:

    SELECT Title
      FROM Fable
      WHERE FREETEXT (*, 'The tortoise beat the hare in the big race');

Result:

    Title
    -------------------------
    The Hare and the Tortoise

FreetextTable

Fuzzy searches benefit from the FREETEXTTABLE-derived table, which returns the ranking in the same way that CONTAINSTABLE does. The two queries shown in this section demonstrate a fuzzy full-text search using the FREETEXTTABLE-derived table. Here is the first query:

    SELECT Fable.Title, FTS.Rank
      FROM Fable
        INNER JOIN FREETEXTTABLE (Fable, *,
          'The brave hunter kills the lion') AS FTS
          ON Fable.FableID = FTS.[KEY]
      ORDER BY Rank DESC;

Result:

    Title                              Rank
    ---------------------------------  ----
    The Hunter and the Woodman         257
    The Ass in the Lion's Skin         202
    The Wolf and the Kid               187
    Androcles                          113
    The Dogs and the Fox               100
    The Goose With the Golden Eggs     72
    The Shepherd's Boy and the Wolf    72

Here is the second query:

    SELECT Fable.Title, FTS.Rank
      FROM Fable
        INNER JOIN FREETEXTTABLE (Fable, *,
          'The eagle was shot by an arrow') AS FTS
          ON Fable.FableID = FTS.[KEY]
      ORDER BY Rank DESC;

Result:

    Title                              Rank
    ---------------------------------  ----
    The Eagle and the Arrow            288
    The Eagle and the Fox              135
    The Serpent and the Eagle          112
    The Hunter and the Woodman         102
    The Father and His Two Daughters   72

Performance

SQL Server 2008's full-text search engine is several orders of magnitude faster than the engine in previous versions of SQL Server. However, you still might want to tune your system for optimal performance.

■ iFTS benefits from a very fast disk subsystem. Place your catalog on its own controller, preferably on its own RAID 10 array. A sweet spot exists for SQL iFTS on eight-way servers.
■ After a full or incremental population, force a master merge, which consolidates all the shadow indexes into a single master index, by issuing the following command:

    ALTER FULLTEXT CATALOG catalog_name REORGANIZE;

■ You can also increase the maximum number of ranges that the gathering process can use. This is an advanced option, so 'show advanced options' must be enabled, and the new value takes effect after RECONFIGURE. To do so, issue the following commands:

    EXEC sp_configure 'max full-text crawl range', 32;
    RECONFIGURE;

Summary

SQL Server indexes are not designed for searching for words in the middle of a column. If the database project requires flexible word searches, then Integrated Full-Text Search (iFTS) is the perfect solution, even though it requires additional development and administrative work.

■ iFTS requires configuring a catalog for each table to be searched.

■ iFTS catalogs are not populated synchronously within the SQL Server transaction. They are populated asynchronously following the transaction. The recommended method is using Change Tracking, which can automatically push changes as they occur.

■ CONTAINS is used within the WHERE clause and performs simple word searches, but it can also perform inflectional, proximity, and thesaurus searches.

■ CONTAINSTABLE functions like CONTAINS, but it returns a data set that can be referenced in a FROM clause.

■ FREETEXT and FREETEXTTABLE essentially turn on every advanced feature of iFTS and perform a fuzzy word search.

As you read through this "Beyond Relational" part of the book, I hope you're getting a sense of the breadth of data SQL Server can manage. The next chapter concludes this part with Filestream, a new way to store large BLOBs with SQL Server.
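
The catalog and Change Tracking points from the summary can be sketched in T-SQL as follows. This is a minimal illustration rather than the chapter's own setup script: the catalog name FableCatalog, the unique key index name PK_Fable, and the Moral column name are assumptions made for the example (the chapter's queries confirm only the Fable table and its FableID, Title, and FableText columns).

```sql
-- Hypothetical catalog name; any valid identifier would do.
CREATE FULLTEXT CATALOG FableCatalog;

-- Assumptions: PK_Fable is the unique, single-column index on FableID,
-- and Moral is the name of the column holding each fable's moral.
CREATE FULLTEXT INDEX ON Fable (Title, Moral, FableText)
  KEY INDEX PK_Fable
  ON FableCatalog
  WITH CHANGE_TRACKING AUTO;  -- changes are pushed to the index as they occur

-- For an index that already exists, Change Tracking can be switched on later.
ALTER FULLTEXT INDEX ON Fable SET CHANGE_TRACKING AUTO;
```

With CHANGE_TRACKING AUTO, population still happens asynchronously after the transaction commits, so a row inserted a moment ago may not be searchable immediately.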
Nielsen p04.tex V4 - 07/21/2009 1:06pm Page 511

Developing with SQL Server

IN THIS PART

    Chapter 20 Creating the Physical Database Schema
    Chapter 21 Programming with T-SQL
    Chapter 22 Kill the Cursor!
    Chapter 23 T-SQL Error Handling
    Chapter 24 Developing Stored Procedures
    Chapter 25 Building User-Defined Functions
    Chapter 26 Creating DML Triggers
    Chapter 27 Creating DDL Triggers
    Chapter 28 Building the Data Abstraction Layer
    Chapter 29 Dynamic SQL and Code Generation

Part II of this book was all about writing set-based queries. Part III extended the select command to data types beyond relational. This part continues to expand on select to provide programmable flow of control for developing server-side solutions. SQL Server offers a large variety of technologies for developing server-side code, from the mature T-SQL language to .NET assemblies hosted within SQL Server.

This part opens with DDL commands (create, alter, and drop), and progresses through 10 chapters of Transact-SQL that build on one another into a crescendo with the data abstraction layer and dynamic SQL. The final chapter fits CLR programming into the picture.

So, unleash the programmer within and have fun. There's a whole world of developer possibilities with SQL Server 2008. If SQL Server is the box, then Part IV is all about thinking inside the box, and moving the processing as close to the data as possible.

Date posted: 04/07/2014, 09:20
