CHAPTER 50 SQL Server Full-Text Search

catalogs are queryable, but no results are returned until you rebuild them); 2, which means the full-text indexes are imported into the database (however, the results may be inconsistent because some of the full-text indexes were generated by the SQL Server 2005 full-text word breakers rather than the SQL Server 2008 word breakers).

Now you know how to build full-text catalogs and indexes and modify them. The next section describes how to get information on the catalogs and indexes you build.

Diagnostics

After you create catalogs and indexes, you occasionally need to get information about your catalogs, tables, and indexes. For this purpose, Microsoft supplies the sp_help_fulltext_tables and sp_help_fulltext_columns stored procedures and the sys.fulltext_catalogs system view, which let you examine the state of your full-text tables, columns, and catalogs. However, Microsoft recommends that you use the OBJECTPROPERTY, COLUMNPROPERTY, and FULLTEXTCATALOGPROPERTY metadata functions instead. Table 50.1 lists the full-text index properties for the OBJECTPROPERTY function, Table 50.2 lists the full-text index properties for the COLUMNPROPERTY function, and Table 50.3 lists the properties for the FULLTEXTCATALOGPROPERTY function.

TABLE 50.1 Full-Text Index Properties for the OBJECTPROPERTY Function

| Property | Description | Values |
|---|---|---|
| TableFulltextBackgroundUpdateIndexOn | Indicates whether background update indexing (automatic change tracking) is enabled. | 1 = true and 0 = false |
| TableFulltextCatalogId | Returns the ID of the full-text catalog that holds the table's full-text index. | Catalog ID, or 0 (table not indexed) |
| TableFulltextChangeTrackingOn | Indicates whether full-text change tracking is enabled. | 1 = true and 0 = false |
| TableFulltextDocsProcessed | Returns the number of rows processed since indexing started. | |
| TableFulltextFailCount | Returns the number of rows that failed to index. | |
| TableFulltextItemCount | Returns the number of rows successfully indexed. | |
| TableFulltextKeyColumn | Returns the column ID of the unique key column used by SQL Server FTS (normally the primary key). | |
| TableFulltextPendingChanges | Returns the number of rows outstanding to be indexed. | |
| TableFulltextPopulateStatus | Returns a number indicating the state of the population. | 0 = idle; 1 = full population is in progress; 2 = incremental population is in progress; 3 = propagation of tracked changes is in progress; 4 = background update index is in progress, such as automatic change tracking; 5 = full-text indexing is throttled or paused |
| TableHasActiveFulltextIndex | Indicates whether the table has an active full-text index. | 1 = true and 0 = false |

TABLE 50.2 Full-Text Index Properties for the COLUMNPROPERTY Function

| Property | Description | Values |
|---|---|---|
| IsFulltextIndexed | Indicates whether a column is full-text indexed. | 1 = true and 0 = false |
| FullTextTypeColumn | Returns the ID of the document type column. | |

TABLE 50.3 Properties for the FULLTEXTCATALOGPROPERTY Function

| Property | Description | Values |
|---|---|---|
| AccentSensitivity | Indicates whether the catalog is accent sensitive. | 1 = true and 0 = false |
| IndexSize | Returns the size of the full-text catalog. | |
| ItemCount | Returns the number of items (rows) indexed in the catalog. | |
| MergeStatus | Indicates whether a master merge is in progress. | 1 = true and 0 = false |
| PopulateCompletionAge | Returns when the last population completed, as the number of seconds since 01/01/1990 00:00:00. | 0 if no population has occurred |
| PopulateStatus | Returns the status of the population. | 0 = idle, 1 = full population in progress, 2 = paused, 3 = throttled, 4 = recovering, 5 = shut down, 6 = incremental population in progress, 7 = building index, 8 = disk is full (paused), 9 = change tracking |
| UniqueKeyCount | Returns the number of unique words indexed. | |
| ResourceUsage | Returns a number indicating how aggressively SQL Server FTS is consolidating the catalog. | 1 to 5 (5 is the most aggressive); 3 is the default |
| IsFulltextInstalled | Indicates whether SQL Server FTS is installed. | 1 = true and 0 = false |
| LoadOSResources | Indicates whether third-party word breakers are loaded. | 1 = true and 0 = false |
| VerifySignature | Determines whether signatures of word breakers and language resources are checked. | 1 = true and 0 = false |

Note that ResourceUsage, IsFulltextInstalled, LoadOSResources, and VerifySignature are server-wide settings; you query them with the FULLTEXTSERVICEPROPERTY function rather than with FULLTEXTCATALOGPROPERTY.

The following examples show how to query the metadata functions using the full-text index properties:

-- Full-text properties of a table (OBJECTPROPERTY)
SELECT OBJECTPROPERTY(OBJECT_ID('Person.Contact'), 'TableFullTextBackgroundUpdateIndexOn')
SELECT OBJECTPROPERTY(OBJECT_ID('Person.Contact'), 'TableFulltextChangeTrackingOn')
SELECT OBJECTPROPERTY(OBJECT_ID('Person.Contact'), 'TableFulltextKeyColumn')
SELECT OBJECTPROPERTY(OBJECT_ID('Person.Contact'), 'TableFulltextPendingChanges')
SELECT OBJECTPROPERTY(OBJECT_ID('Person.Contact'), 'TableFulltextPopulateStatus')
SELECT OBJECTPROPERTY(OBJECT_ID('Person.Contact'), 'TableHasActiveFulltextIndex')

-- Full-text properties of a column (COLUMNPROPERTY);
-- charcol and VarbinaryColumn are placeholder column names
SELECT COLUMNPROPERTY(OBJECT_ID('Person.Contact'), 'charcol', 'IsFulltextIndexed')
SELECT COLUMNPROPERTY(OBJECT_ID('Person.Contact'), 'VarbinaryColumn', 'FullTextTypeColumn')

-- Catalog properties (FULLTEXTCATALOGPROPERTY); MyCatalog is a placeholder catalog name
SELECT FULLTEXTCATALOGPROPERTY('MyCatalog', 'IndexSize')
SELECT FULLTEXTCATALOGPROPERTY('MyCatalog', 'ItemCount')
SELECT FULLTEXTCATALOGPROPERTY('MyCatalog', 'MergeStatus')
SELECT FULLTEXTCATALOGPROPERTY('MyCatalog', 'PopulateCompletionAge')
SELECT FULLTEXTCATALOGPROPERTY('MyCatalog', 'PopulateStatus')

-- Server-wide full-text property (FULLTEXTSERVICEPROPERTY)
SELECT FULLTEXTSERVICEPROPERTY('LoadOSResources')

Using the Full-Text Indexing Wizard to Build Full-Text Indexes and Catalogs

Although the T-SQL full-text commands provide a scriptable interface for creating full-text catalogs and indexes, sometimes it is easier to use the Full-Text Indexing Wizard to create them. To create a full-text index, follow these steps:

1. Connect to SQL Server in SQL Server Management Studio.
2. Expand the Databases folder.
3. Expand the database that contains the tables you want to full-text index.
4. Expand the Tables folder.
5. Right-click the table you want to full-text index (in this example, the Production.Document table).
6. Select Full-Text Index, as shown in Figure 50.1.

FIGURE 50.1 Selecting the Full-Text Index menu in SSMS.

You then click Define Full-Text Index to launch the Full-Text Indexing Wizard. On the Welcome to the SQL Server Full-Text Indexing Wizard splash screen, you click Next to bring up the Select an Index dialog, as shown in Figure 50.2. In the Unique Index drop-down box, you select the unique index you want to use for the full-text index. In this example, the only option is the primary key, PK_Document_DocumentID.

TIP
If there are multiple unique keys to choose from, it is recommended that you choose the smallest of the unique keys.
It is also a good idea to choose a unique key on a static column, one that is unlikely to be modified.

You may get the message "A unique column must be defined on this table/view." In this case, you have to create a unique index or primary key on the table before you can proceed. If a unique index or primary key exists, the Next button is enabled.

FIGURE 50.2 The Full-Text Indexing Wizard Select an Index dialog.

When you click Next, the Select Table Columns dialog appears (see Figure 50.3). In this dialog, you select the columns you want to index and the word breaker (language) to use when indexing the contents of each column. Notice that the Select Table Columns dialog displays only the columns that can be full-text indexed. In this example, the FileName and DocumentSummary columns will be indexed using the server's default full-text language. For the Document column, you select the language (English) from the drop-down box that lists the available languages. You also need to select the document type column (in this case, FileExtension).

FIGURE 50.3 The Full-Text Index Wizard Select Table Columns dialog.

You then click Next and choose the population type in the Select Change Tracking dialog (see Figure 50.4). There are three options in this dialog: Automatically (continuous change tracking), Manually (change tracking with scheduled or manual updates), and Do Not Track Changes. If you specify Do Not Track Changes, the Start Full Population When Index Is Created check box is enabled.

You click Next to advance to the Select a Catalog dialog. This dialog allows you to select an existing catalog or create a new catalog, with options to set the catalog's accent sensitivity and to make it the default catalog. You click Next to define schedules for incremental table and catalog populations. You click Next again to view the summary page and finish creating your full-text indexes and catalogs.
You click Close to complete the wizard.

FIGURE 50.4 The Full-Text Index Wizard Select Change Tracking dialog.

If you are running Service Pack 1, you need to right-click your table one more time, select Full-Text Index, and select Enable Full-Text Index to start change tracking. You are now ready to start querying your full-text indexes.

Full-Text Searches

Four T-SQL predicates and functions allow you to conduct full-text searches on your full-text indexed tables:

- CONTAINS: Performs a strict, exact-match search, with options to make the search more flexible.
- CONTAINSTABLE: Returns a ranked rowset from SQL Server FTS using the CONTAINS matching rules; you must join it against the base table.
- FREETEXT: Performs a stemmed search that returns matches for all generations of the search phrase.
- FREETEXTTABLE: Returns a ranked rowset from SQL Server FTS using the FREETEXT matching rules; you must join it against the base table.

CONTAINS and CONTAINSTABLE

The CONTAINS and CONTAINSTABLE search conditions support the following options:

- Search phrase
- Generation
- Proximity
- Weighted

Search Phrase

The search phrase is the phrase or word that you are looking for in a full-text indexed table. If you are searching for more than one word, you have to wrap your search phrase in double quotation marks, as in this example:

SELECT * FROM Person.Contact WHERE CONTAINS(*, '"search phrase"') -- search all columns

In this query, you are searching all full-text indexed columns. However, you can search a single column, a list of columns, or all columns.
The following example shows how:

SELECT * FROM Person.Contact WHERE CONTAINS(FirstName, '"search phrase"') -- searching 1 column
SELECT * FROM Person.Contact WHERE CONTAINS((FirstName, LastName), '"search phrase"') -- searching 2 columns

You can also use Boolean operators in your search phrase, as in this example:

SELECT * FROM Person.Contact WHERE CONTAINS(*, '"Ford" AND NOT ("Harrison" OR "Betty")')

This example searches on Ford cars: you don't want hits on rows that mention Ford together with Harrison or Betty. CONTAINS supports the Boolean operators AND, OR, and AND NOT, but not OR NOT.

You can also use wildcards in your searches by adding an asterisk (*) to the end of a word in your search phrase; the phrase must be enclosed in double quotation marks for the asterisk to be treated as a wildcard. A wildcard added to one word acts as a wildcard on all words in the search phrase, so a search on "Al Anon*" matches Alcoholics Anonymous, Al Anon, and Alexander Anonuevo.

Generation

The term generation refers to all forms of a word, which could be the word itself, all declensions (that is, singular or plural forms, such as book and books), conjugations of a word (such as book, booked, booking, and books), and thesaurus replacements and substitutions of a word. FREETEXT searches on all generations of a word automatically; with CONTAINS, you use the FORMSOF predicate. The following example shows how to use FORMSOF to search on declensions and conjugations of a word:

SELECT * FROM Person.Contact WHERE CONTAINS(*, 'FORMSOF(INFLECTIONAL, book)')

Generations of a word also include its thesaurus expansions and replacements. An expansion is the word plus its synonyms (for example, book and volume, or car and automobile). An expansion can also include alternate spellings, abbreviations, and nicknames. A replacement is a word that you want replaced in a search. For example, if users search on the word sex and you want it interpreted as gender, you can replace a search on sex with a search on gender.
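To make the difference between an expansion and a replacement concrete, here is a small Python model. It only illustrates the behavior described above and is not how SQL Server FTS implements its thesaurus; the function name thesaurus_terms and the sample word lists are hypothetical, and the model assumes each term appears in at most one thesaurus entry.

```python
def thesaurus_terms(term, expansions, replacements):
    """Return the set of terms actually searched for `term` (a model only).

    expansions   -- list of sets; each set is one <expansion> entry whose
                    members are interchangeable.
    replacements -- dict mapping a <pat> pattern to the set of <sub> terms
                    that replace it.
    Terms are compared in lowercase, and each term is assumed to appear in
    at most one entry.
    """
    key = term.lower()
    if key in replacements:
        # Replacement: the original term is dropped; only the substitutes
        # are searched.
        return set(replacements[key])
    for group in expansions:
        if key in group:
            # Expansion: the original term is kept and its synonyms are added.
            return set(group)
    # No thesaurus entry: the term is searched as typed.
    return {key}


# Hypothetical thesaurus entries based on the examples in the text.
expansions = [{"book", "volume"}, {"car", "automobile"}]
replacements = {"sex": {"gender"}}

print(sorted(thesaurus_terms("Car", expansions, replacements)))   # ['automobile', 'car']
print(sorted(thesaurus_terms("sex", expansions, replacements)))   # ['gender']
print(sorted(thesaurus_terms("boat", expansions, replacements)))  # ['boat']
```

In this model, a search on car also finds automobile (an expansion keeps the original word), whereas a search on sex never looks for sex itself (a replacement discards it).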
To get the thesaurus option to work, you need to edit the thesaurus file for your language. By default, the thesaurus files are in C:\Program Files\Microsoft SQL Server\MSSQL.X\MSSQL\FTData, where X is the instance number. There is a thesaurus file for each full-text supported language; it is named TSXXX.XML, where XXX is a three-letter identifier for the language. There is also a thesaurus file called TSGlobal.XML. Changes made to the TSGlobal thesaurus file are effective in all languages but are overridden by the language-specific thesaurus files. To make a thesaurus file effective, you have to remove the comment marks and then restart MSFTESQL (the Microsoft SQL Server Full-Text Search service). Notice that the thesaurus files contain an XML element called <diacritics_sensitive>. Setting this element to 0 makes the thesaurus insensitive to accents; setting it to 1 makes the thesaurus accent sensitive.

As mentioned previously, the thesaurus file has two sections: an expansion section and a replacement section. The expansion section looks like this:

<expansion>
    <sub>Internet Explorer</sub>
    <sub>IE</sub>
    <sub>IE5</sub>
    <sub>IE6</sub>
</expansion>

The sub nodes refer to substitutes, so a search on Internet Explorer is expanded to additional searches on Internet Explorer, IE, IE5, and IE6. The replacement section looks like this:

<replacement>
    <pat>NT5</pat>
    <pat>W2K</pat>
    <sub>Windows 2000</sub>
</replacement>

Here, searches on the patterns NT5 or W2K are replaced by a search on Windows 2000, so your search will never find rows containing only the words NT5 or W2K. To use the thesaurus option, you need to use the FORMSOF predicate. Here is an example of a FORMSOF query:

SELECT * FROM Person.Contact WHERE CONTAINS(*, 'FORMSOF(THESAURUS, ie)')

Proximity

SQL Server 2008 FTS supports the proximity predicate, which allows you to search on tokens that are close, or near, to each other.
Near is defined as within 50 words. Words separated by more than 50 words do not show up in a CONTAINS or CONTAINSTABLE search. With a FREETEXT or FREETEXTTABLE search, the separation distance can be up to 1,326 words. Here is an example of a proximity-based search:

SELECT * FROM Person.Contact WHERE CONTAINS(*, '"peanut butter" NEAR "jam"')

Weighted

A weighted search allows you to assign different weights to search tokens; you use the ISABOUT predicate to do a weighted search. If you want to search on Gulf of Mexico and Oil, and you want to place more emphasis on Gulf of Mexico than on Oil, you could query like this:

SELECT * FROM Person.Contact WHERE CONTAINS(*, 'ISABOUT("Gulf of Mexico" WEIGHT(0.7), Oil WEIGHT(0.1))')

Weight values range from 0.0 to 1.0. You can use multiple weighted search items in a search, but doing so decreases the search speed.

LANGUAGE

Sometimes you might want to conduct a search in a language other than the default full-text language for your server. For example, say you want to conduct a German-language search on the contents of a column. To do this, you would use the LANGUAGE option like this (1031 is the locale ID for German):

SELECT * FROM Person.Contact WHERE CONTAINS(*, 'volkswagen', LANGUAGE 1031)

In this search, German language rules are applied when searching the index. In this case, the search on Volkswagen is expanded to a search on Volkswagen, wagen, and volk. If you are storing multilingual content in a single column, you should have a column that indicates the language of the content stored in each row. Otherwise, your searches might return unwanted results from content in other languages.

CONTAINSTABLE

CONTAINSTABLE supports all the predicates of CONTAINS but returns a result set containing only the key and rank. It also accepts an optional top_n_by_rank parameter, which returns only the n highest-ranked results.
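The key-and-rank shape of the CONTAINSTABLE result, and the effect of top_n_by_rank, can be modeled in a few lines of Python. This is a conceptual sketch rather than SQL Server code: the function containstable_join, the sample contacts, and the rank values are all made up, and the model assumes the full-text key is the base table's ContactID.

```python
def containstable_join(base_rows, ranked_keys, top_n_by_rank=None):
    """Model of joining a CONTAINSTABLE-style rowset to its base table.

    base_rows     -- dict mapping the full-text key (here, ContactID) to a row dict.
    ranked_keys   -- list of (key, rank) tuples: all that the rowset carries.
    top_n_by_rank -- if given, keep only the n highest-ranked matches. The limit
                     applies to the rowset itself, before the join, so the
                     join can return fewer than n rows.
    """
    # Highest rank first, like ORDER BY rank DESC.
    hits = sorted(ranked_keys, key=lambda pair: pair[1], reverse=True)
    if top_n_by_rank is not None:
        hits = hits[:top_n_by_rank]
    # Inner join on the key; the rank is the only new information added.
    return [
        {**base_rows[key], "rank": rank}
        for key, rank in hits
        if key in base_rows
    ]


# Hypothetical base table and ranks; real ranks are computed by SQL Server FTS.
contacts = {
    1: {"ContactID": 1, "FirstName": "Jon"},
    2: {"ContactID": 2, "FirstName": "Jonathan"},
    3: {"ContactID": 3, "FirstName": "Jonas"},
}
rowset = [(1, 120), (3, 15), (2, 64)]

for row in containstable_join(contacts, rowset, top_n_by_rank=2):
    print(row["ContactID"], row["FirstName"], row["rank"])
# 1 Jon 120
# 2 Jonathan 64
```

Without the join, the rowset alone would tell you only that keys 1 and 2 matched best; every descriptive column has to come from the base table.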
Because CONTAINSTABLE returns only the key value and rank, you have to join it against the base table (or another related table) to get meaningful results. Here is an example:

SELECT *
FROM Person.Contact
JOIN (SELECT [key], rank FROM CONTAINSTABLE(Person.Contact, *, 'test')) AS k
    ON k.[key] = Person.Contact.ContactID

In the following example, Sales.Individual is a child table of the Person.Contact table: Sales.Individual has a foreign key relationship to the Person.Contact table's primary key, ContactID. This query illustrates how you could join the CONTAINSTABLE result set from the Person.Contact table against the Sales.Individual table (it also illustrates the top_n_by_rank option):

SELECT *
FROM Sales.Individual AS s
JOIN (SELECT [key], rank FROM CONTAINSTABLE(Person.Contact, *, 'jon', 100)) AS k
    ON k.[key] = s.ContactID
ORDER BY rank DESC

This query returns, at most, the 100 rows with the highest rank values. Keep in mind that CONTAINS is faster than FREETEXT, but it performs a strict character-by-character match unless you use one of the word-generation searches.

FREETEXT and FREETEXTTABLE

FREETEXT and FREETEXTTABLE incorporate what Microsoft considers to be the natural way to search. For example, if you were searching on book, you would expect to get hits on rows containing the word books (the plural). If you were searching on the word swimming, you would expect results containing the words swimming, swim, swims, swum, and so on. FREETEXT and FREETEXTTABLE queries implicitly search on all generations of a word and include a proximity-based search. However, if you wrap your search in double quotation marks, FREETEXT and FREETEXTTABLE do not do any stemming. FREETEXTTABLE also accepts the top_n_by_rank parameter.
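When FREETEXT queries are issued from application code, the search text is normally passed as a query parameter rather than pasted into the SQL string. The following Python sketch builds such a query and shows the double-quotation-mark rule for turning off stemming. build_freetext_query is a hypothetical helper, the ? placeholder assumes a DB-API driver that uses qmark-style parameters (such as pyodbc), and table and column names are assumed to come from trusted code because identifiers cannot be parameterized.

```python
def build_freetext_query(table, columns, search_text, exact_phrase=False):
    """Build a parameterized FREETEXT query (hypothetical helper).

    table and columns are trusted identifiers (for example, 'Person.Contact');
    search_text is user input and is returned as a parameter, never spliced
    into the SQL text.
    """
    # Column list: all full-text indexed columns, one column, or a
    # parenthesized list of columns.
    if not columns:
        column_spec = "*"
    elif len(columns) == 1:
        column_spec = columns[0]
    else:
        column_spec = "(" + ", ".join(columns) + ")"

    if exact_phrase:
        # Double quotation marks turn off stemming, so FREETEXT performs a
        # phrase match. Embedded double quotes are dropped in this sketch so
        # they cannot end the phrase early.
        search_text = '"' + search_text.replace('"', "") + '"'

    sql = f"SELECT * FROM {table} WHERE FREETEXT({column_spec}, ?)"
    return sql, (search_text,)


sql, params = build_freetext_query("Person.Contact", ["FirstName", "LastName"], "swimming")
print(sql)     # SELECT * FROM Person.Contact WHERE FREETEXT((FirstName, LastName), ?)
print(params)  # ('swimming',)

sql, params = build_freetext_query("Person.Contact", [], "swimming", exact_phrase=True)
print(sql)     # SELECT * FROM Person.Contact WHERE FREETEXT(*, ?)
print(params)  # ('"swimming"',)
```

With pyodbc, for example, the returned pair could be passed to cursor.execute(sql, params). The first query stems swimming (matching swim, swam, and so on), whereas the second performs a phrase match on swimming only.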