
Microsoft Press Microsoft SQL Server 2005, Part 10 (PDF)




Document information: 140 pages, 4.45 MB.


Chapter 21 Creating Full-Text Catalogs

SQL Server, like all database platforms, is built to store and retrieve large amounts of data. The system enables efficient data management by imposing a structure on the data it stores in its tables. However, not all data has a well-defined structure, and not all queries conform to basic true/false rules for retrieving data. To manage this type of data and its associated queries, other platforms rely on third-party tools. But SQL Server’s Full-Text Search component provides a powerful and flexible feature called full-text indexing to manage queries issued against unstructured data.

This chapter provides an overview of full-text search elements and terminology, explains how to create full-text catalogs and indexes, and shows how to populate the indexes and keep them up to date. Then the chapter shows you how to execute full-text queries to search full-text indexed columns for matching words.

Exam objectives in this chapter:
■ Implement a full-text search.
  ❑ Create a catalog.
  ❑ Create an index.
  ❑ Specify a full-text population method.

Lessons in this chapter:
■ Lesson 1: Creating a Full-Text Catalog
■ Lesson 2: Creating a Full-Text Index
■ Lesson 3: Populating a Full-Text Index
■ Lesson 4: Querying Data by Using a Full-Text Index

Before You Begin
To complete the lessons in this chapter, you must have:
■ SQL Server 2005 installed.
■ Full-text indexing installed.
■ A copy of the AdventureWorks sample database installed in the instance.

C2162271X.fm Page 813 Friday, April 29, 2005 8:07 PM

NOTE Full-text search
SQL Server 2005 provides Full-Text Search as a separately installable component.
You can find the option to install full-text functionality under the Database Engine node within the SQL Server 2005 Setup Wizard. If you specify default settings for installing the Database Engine, Full-Text Search is selected and installed. Full-text indexing has its own service, called Microsoft Full-Text Engine for SQL Server (MSFTESQL), for populating and managing full-text catalogs. One instance of full-text indexing is installed for each SQL Server instance, with each instance having its own MSFTESQL service and service account.

MORE INFO Installing full-text search
For complete information about installing full-text search, see the SQL Server 2005 Books Online article “Installing and Upgrading Full-Text Search.” SQL Server 2005 Books Online is installed as part of SQL Server 2005. Updates for SQL Server 2005 Books Online are available for download at www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx.

Real World
Michael Hotek

One of the largest recruiting agencies in the world spent years developing a proprietary application that allowed recruiters to quickly and flexibly search the agency’s database for resumes that matched desired criteria. On any given day, agency employees ran thousands of queries against several hundred thousand resumes to fill thousands of openings spanning every industry and job function.

To its competitors, this company was the model of success. However, this success came at the cost of hundreds of hard-working research assistants who spent 35–40 hours a week parsing resumes into a massive keyword index because the programming team couldn’t keep pace with the industry’s rate of change. Every week, the recruiting agency had to deal with hundreds of new job titles, technology changes, and terminology shifts. The IT team loaded all these changes into the automated parsing routines on which the search system was based.
Then the team executed hundreds of tests to ensure accurate results before releasing the new code base. After the new search code was released, the IT team had to reparse the entire database of resumes, compare it with the previous parsing, and then rebuild the keyword index. When the system was originally deployed, this process took two to three days. Two years later, it was taking four to five weeks and growing longer all the time. The company had to find a solution.

I was called in to help, and after spending about three hours gaining an understanding of the company’s environment, I asked the IT staff if we could run a simple set of tests on a prototype solution. The staff was hesitant because all previous “tests” they had performed with a variety of vendors required days or weeks of effort and yielded mixed results. But after I assured them that the initial tests should be completed by the end of the day, I was able to proceed.

We installed SQL Server 2000’s Full-Text Search component, built a full-text catalog, and added two indexes. The entire process took about an hour on the subset of test data we were using. We then executed hundreds of the IT team’s test queries and compared the results with previous results.

The results weren’t encouraging. Less than 10 percent of the results from the full-text queries matched the results from the proprietary search algorithms. We then looked at the results more closely. It turns out that our full-text queries were picking up thousands of resumes that the proprietary algorithms missed due to misspellings, synonyms, and other factors. The full-text results were also more accurate when dealing with the series of keywords on which recruiters normally searched.

Our simple test turned into a full-blown pilot program. In less than a day, the developers were able to switch the application’s querying capability over to the full-text index.
Three days later, the application was in production with spectacular results. The day the new application went into production, the company shattered all previous records for matching potential candidates to job openings. Over the next two months, the company hit a record for placements, only to break it the following week. The agency no longer needed the position of research assistant, so it moved its research assistants into other roles, with most of them receiving promotions to junior recruiter.

Implementing the full-text feature also let the company eliminate the entire scanning and optical character recognition (OCR) process it previously used. Resumes submitted in plain-text format were loaded into one column. Resumes that were submitted in any other format were converted to Microsoft Word or PDF format and loaded directly into the database. The IT team then used the full-text engine with add-in filters that could break the resumes down into words and index them in native document format without requiring any of the previous time-consuming text conversions.

In one case, one of the agency’s sales representatives was visiting a potential new customer, hoping to sign a contract to manage the customer’s recruiting efforts. The customer decided to give the agency a test on the spot and handed the representative a profile for a new job title that it was creating based on changes in its industry that had occurred just two weeks earlier. The sales rep did not know the customer was considering four other recruiting agencies. After getting a network connection, the sales rep immediately found 15 potential candidates for the new position. She walked out of the meeting with a contract in hand because none of the competitors could even find a reference to the skill set the customer was asking for.
Over the next two years, this company expanded operations to span the globe, recording a corresponding 50× increase in the number of placements. All this success came with very little investment in IT because full-text indexing could adapt itself to any language needed. As we write this book, the recruiting agency is finishing its pilot program for upgrading to SQL Server 2005 and is expecting to reap significant performance improvements.

NOTE Chapter conventions
As with many technologies within SQL Server 2005, you can use SQL Server Management Studio (SSMS) to administer full-text indexing by pointing and clicking your way through administration screens, and you might choose to use SSMS to manage full-text functionality in your organization. However, walking through the screens in the SSMS graphical user interface (GUI) doesn’t explain very much about the functionality you can leverage. Because the SSMS screens and wizards submit Transact-SQL commands to SQL Server to perform the specified tasks, this chapter uses these commands to explain how you can take advantage of full-text indexing in a variety of situations.

Lesson 1: Creating a Full-Text Catalog

Full-Text Search is based on the technology of full-text indexes. Although you create full-text indexes on columns within tables in SQL Server databases, the full-text indexes are maintained in a structure outside of SQL Server called a full-text catalog. A full-text catalog stores one or more full-text indexes. In this lesson, you will see how to use the Transact-SQL CREATE FULLTEXT CATALOG command to create a full-text catalog.

After this lesson, you will be able to:
■ Create a full-text catalog.

Estimated lesson time: 20 minutes

How to Create a Full-Text Catalog
The first step in setting up full-text indexing is to create a full-text catalog to hold the indexes.
You create a catalog by using the CREATE FULLTEXT CATALOG Transact-SQL command, as the following general syntax shows:

CREATE FULLTEXT CATALOG catalog_name
    [ON FILEGROUP filegroup]
    [IN PATH 'rootpath']
    [WITH <catalog_option>]
    [AS DEFAULT]
    [AUTHORIZATION owner_name]

<catalog_option>::=
    ACCENT_SENSITIVITY = {ON|OFF}

After giving the catalog a name, you specify a filegroup for the catalog. This filegroup must be part of the database whose tables the catalog will index. Although you can put the catalog on the default filegroup, it is a good practice to put a catalog on a secondary filegroup and to use this filegroup only for full-text catalogs. This configuration lets you use filegroup backup and restore to back up and restore a full-text catalog independently of the rest of the database.

You use the command’s IN PATH clause to specify the root directory in which the full-text catalog will be stored. For full-text catalogs, the filegroup specification simply associates a full-text catalog with a filegroup for use with backup and restore operations. The actual catalog is stored within a physical directory structure outside the database. When you create a catalog, a directory with the same name as your catalog is created in this root directory. If a directory that uses the same name as your catalog already exists, a suffix is appended to the name to create a unique directory structure. Within this directory structure, as indexes are added to the catalog, subdirectories are created to contain them.

You use the command’s WITH clause to specify accent sensitivity. If you don’t specify an option for this clause, the full-text catalog uses the setting from the database’s collation. Otherwise, you can explicitly specify whether the catalog should be sensitive to accents. If you change this option later, you must rebuild all full-text indexes within the catalog.
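The following batch is a minimal sketch of the WITH clause. It creates a catalog with an explicit accent-sensitivity setting and then changes that setting. The catalog name SampleCatalog is hypothetical. Because the batch omits ON FILEGROUP and IN PATH, it assumes that placing the catalog on the database's default filegroup and in the default full-text directory chosen at setup is acceptable. In SQL Server 2005, ALTER FULLTEXT CATALOG ... REBUILD rebuilds the whole catalog, and its WITH ACCENT_SENSITIVITY option applies the new setting during that rebuild.

```sql
-- Sketch only: SampleCatalog is a hypothetical name; assumes you have
-- permission to create full-text catalogs in AdventureWorks.
USE AdventureWorks;
GO

-- Create a catalog that ignores accents (so "resume" and "résumé" are
-- treated alike), regardless of the database collation's setting.
CREATE FULLTEXT CATALOG SampleCatalog
    WITH ACCENT_SENSITIVITY = OFF;
GO

-- Changing the setting later means rebuilding every index in the catalog.
-- REBUILD deletes the existing catalog contents and builds them again
-- using the new accent-sensitivity setting.
ALTER FULLTEXT CATALOG SampleCatalog
    REBUILD WITH ACCENT_SENSITIVITY = ON;
GO

-- Inspect the catalog's current settings and its directory on disk.
SELECT name, path, is_default, is_accent_sensitivity_on
FROM sys.fulltext_catalogs
WHERE name = N'SampleCatalog';
GO
```

Rebuilding a catalog repopulates all of its indexes, so on a large catalog this operation can take a long time. Choosing the accent-sensitivity setting up front avoids that cost.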
The AS DEFAULT clause serves a purpose similar to that of a default filegroup: when you create full-text indexes without explicitly specifying a catalog, SQL Server creates the indexes within the default catalog. The command’s AUTHORIZATION clause simply specifies the user or role that owns the catalog.

Quick Check
1. What is the purpose of a full-text catalog?
2. Where is a full-text catalog stored?

Quick Check Answers
1. A full-text catalog provides the basic storage container for one or more full-text indexes.
2. Full-text catalogs, along with their associated indexes, are stored in a directory structure that is external to SQL Server.

PRACTICE Create a Full-Text Catalog
In this practice, you create a full-text catalog to use with the AdventureWorks database.
1. Create a directory on the operating system named C:\test.
2. Launch SSMS, connect to your instance, and open a new query window.
3. Add a new filegroup to the AdventureWorks database that you will use for the full-text catalog by executing the following batch:

USE master
GO
ALTER DATABASE AdventureWorks
ADD FILEGROUP FTFG1
GO
ALTER DATABASE AdventureWorks
ADD FILE
( NAME = N'AdventureWorksFT_data',
  FILENAME = N'C:\TEST\AdventureWorksFT_data.ndf',
  SIZE = 2048KB,
  FILEGROWTH = 1024KB )
TO FILEGROUP [FTFG1]
GO

NOTE Filegroup must have primary file
Although full-text catalogs and indexes are stored in a directory structure external to SQL Server, the filegroup on which a full-text catalog is placed must have at least one active file. This file cannot be marked READ ONLY or taken OFFLINE.

4.
Create a full-text catalog on the FTFG1 filegroup by executing the following command:

USE AdventureWorks;
GO
CREATE FULLTEXT CATALOG AWCatalog
ON FILEGROUP FTFG1
IN PATH 'C:\TEST'
AS DEFAULT;
GO

Lesson Summary
■ The first step in setting up full-text indexing is to define a catalog to store one or more full-text indexes that are used to process queries.
■ You use the CREATE FULLTEXT CATALOG Transact-SQL command to create a full-text catalog.
■ Although you must associate a full-text catalog with a filegroup for backup and restore purposes, full-text catalogs are stored in a directory structure external to the database.

Lesson Review
The following questions are intended to reinforce key information presented in this lesson. The questions are also available on the companion CD if you prefer to review them in electronic form.

NOTE Answers
Answers to these questions and explanations of why each answer choice is right or wrong are located in the “Answers” section at the end of the book.

1. Where does the full-text catalog physically exist?
A. Within the database with which it is associated
B. In the msdb database
C. In an external directory structure
D. In a filegroup for the database

Lesson 2: Creating a Full-Text Index
After you have created a full-text catalog, you need to create one or more full-text indexes before you can execute full-text queries. In this lesson, you will review the powerful architecture of full-text indexing and then see how to create an index by using the CREATE FULLTEXT INDEX Transact-SQL command.

After this lesson, you will be able to:
■ Explain the terminology associated with full-text indexing.
■ Create a full-text index.

Estimated lesson time: 20 minutes

Full-Text Index Architecture
You can build full-text indexes on textual data stored in char, nchar, varchar, nvarchar, varchar(max), nvarchar(max), text, ntext, image, varbinary, varbinary(max), and xml columns. However, the image, varbinary, and varbinary(max) columns require special handling if you want to use them for full-text processing.

You use multiple helper services to build a compact and efficient full-text index. These services include word breakers and stemmers, language files, noise word files, filters, and protocol handlers.

Word breakers are routines that find the breaks between words and generate a basic word list for each row within the column or columns that you are indexing. Stemmers generate the inflectional forms of words, for example, by conjugating verbs. Word breakers and stemmers work with language files to understand the words that are in the input stream. Language files, in conjunction with word breakers and stemmers, allow full-text indexing to handle multiple languages without requiring translation routines or specialized processing.

Commonly used words in a language are referred to as noise words. Noise words are basic structural elements of a language that are not useful for search routines, and they are listed in language-specific noise files. Examples of noise words for the English language are “the,” “a,” and “an.” When the word-breaker routine encounters a noise word for the particular language being processed, it ignores the word. Thus, a full-text index does not include all possible words in a column, but only those that are interesting for queries.

NOTE Configuring noise words
SQL Server ships with a default set of noise word files for each language. These files are stored in $SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\. The files are simple text files that you can edit to include noise words specific to your application that you want to exclude. If a word exists in this file, it is not indexed and is excluded from any full-text queries.

At this point, you might be thinking that you can create full-text indexes only on text-based columns. This is not true.
You use protocol handlers and filters when you want to create a full-text index on a varbinary, varbinary(max), or image column. These services let you extract text from Word, Excel, and PowerPoint files, as well as from PDF and other files (with the appropriate add-in filters installed), that are stored in their native format inside SQL Server. For the filters to work, you need to add a column to the table to contain a value that indicates the type of document stored in the indexed column. The filter then loads the binary stream stored in the column, strips all the formatting information, and returns the text within the document to the word-breaker routine.

BEST PRACTICES Filters
By taking advantage of filters, you no longer have to convert files to a text-based format before being able to use full-text indexing on them. You can store files in their native format inside SQL Server while still allowing full-text search capability.

After the word-breaker routine has a list of valid words for a row within a column, the full-text engine calculates tokens to represent the words. A token is simply a compressed form of the original word that saves space and ensures that full-text indexes can be created in as compact a form as possible. The full-text functionality then builds all the tokens in a column into an inverted, stacked, compressed structure within a file that is used for search operations. This unique structure allows ranking and scoring algorithms to efficiently satisfy possible queries.
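The following minimal sketch shows what the type-column requirement looks like in practice. It assumes a hypothetical dbo.ResumeDocs table in which each row stores a document's raw bytes in DocContent and the document's file extension in DocType. The file path and the catalog name ResumeCatalog are placeholders. OPENROWSET(BULK ..., SINGLE_BLOB) is just one way to load a file's bytes, and it requires bulk-operation permission on the server. The final statement uses the TYPE COLUMN clause, which is covered under How to Create a Full-Text Index.

```sql
-- Sketch only: table, file path, and catalog names are hypothetical.
CREATE TABLE dbo.ResumeDocs
(
    ResumeID   int IDENTITY(1,1) NOT NULL
               CONSTRAINT PK_ResumeDocs PRIMARY KEY,  -- unique key for the full-text index
    DocContent varbinary(max) NOT NULL,               -- document stored in native format
    DocType    nvarchar(8)    NOT NULL                -- file extension, e.g. N'.doc'
);
GO

-- Load one Word document as a single binary value. OPENROWSET(BULK ...)
-- with SINGLE_BLOB returns the file contents in a column named BulkColumn.
INSERT INTO dbo.ResumeDocs (DocContent, DocType)
SELECT doc.BulkColumn, N'.doc'
FROM OPENROWSET(BULK N'C:\test\resume1.doc', SINGLE_BLOB) AS doc;
GO

-- TYPE COLUMN tells the full-text engine which filter to load for each
-- row, based on the extension stored in DocType.
CREATE FULLTEXT INDEX ON dbo.ResumeDocs
    (DocContent TYPE COLUMN DocType)
    KEY INDEX PK_ResumeDocs
    ON ResumeCatalog;    -- assumes this catalog already exists
GO
```

Because the extension is stored row by row, the same column can hold Word, Excel, and other document types side by side. Each row is passed to the filter that matches its DocType value.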
How to Create a Full-Text Index
To create a full-text index, you use the CREATE FULLTEXT INDEX Transact-SQL command, as the following generic syntax shows:

CREATE FULLTEXT INDEX ON table_name
    [(column_name [TYPE COLUMN type_column_name]
        [LANGUAGE language_term] [,...n])]
    KEY INDEX index_name
    [ON fulltext_catalog_name]
    [WITH {CHANGE_TRACKING {MANUAL | AUTO | OFF [, NO POPULATION]}}]

The first part of this command specifies the table on which you want to create the full-text index. Although you can index multiple columns in a table, only one full-text index per table is allowed. You then specify the column or columns you want to index. If you specify a column of type varbinary, varbinary(max), or image for indexing, you must also specify the TYPE COLUMN clause. This clause refers to the column discussed earlier that you need to add to the table to designate the format of the column’s data.

NOTE Type columns
A type column is a character column that contains the file extension corresponding to the type of document stored in the column being indexed. For example, a value of .doc indicates a Word document. This value is entered on a row-by-row basis, so multiple different document types can be stored in a single column. This column is used to load the correct filter for the word-breaker routine when the index is built on a varbinary, varbinary(max), or image column.

As you are specifying the column and column type for the index, you can also specify an explicit language for the column. You might need to specify this clause when you are indexing a table in which different columns contain text in different languages, such as a table that stores translations of the same text in separate columns.

The command’s KEY INDEX clause names a unique index on the table. The indexed column uniquely identifies each row in the table so that the full-text index can be correlated to rows in the table.
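KEY INDEX must name a unique index on a single, non-nullable column, so it can help to list the qualifying indexes before you write the statement. The sketch below assumes a hypothetical dbo.JobPostings table and an existing catalog named PostingsCatalog; neither name comes from the text. The sketch queries the SQL Server 2005 catalog views sys.indexes, sys.index_columns, and sys.columns. It then creates the index with an explicit language and with MANUAL change tracking. Under MANUAL tracking, recorded changes reach the index only when you start an update population.

```sql
-- Sketch only: dbo.JobPostings and PostingsCatalog are hypothetical names.
CREATE TABLE dbo.JobPostings
(
    PostingID   int IDENTITY(1,1) NOT NULL
                CONSTRAINT PK_JobPostings PRIMARY KEY,
    PostingText nvarchar(max) NOT NULL
);
GO

-- List unique indexes that have exactly one key column and whose key
-- column is NOT NULL; only these qualify for the KEY INDEX clause.
SELECT i.name AS candidate_key_index
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.JobPostings')
  AND i.is_unique = 1
  AND ic.is_included_column = 0                 -- count key columns only
GROUP BY i.name
HAVING COUNT(*) = 1                             -- single-column key
   AND MAX(CAST(c.is_nullable AS int)) = 0;     -- column must be NOT NULL
GO

-- Index the text as U.S. English (LCID 1033). With MANUAL tracking,
-- changes are recorded but are not applied to the index automatically.
CREATE FULLTEXT INDEX ON dbo.JobPostings
    (PostingText LANGUAGE 1033)
    KEY INDEX PK_JobPostings
    ON PostingsCatalog          -- assumes this catalog already exists
    WITH CHANGE_TRACKING MANUAL;
GO

-- Later, after the data changes, apply the tracked changes
-- (for example, from a scheduled SQL Server Agent job).
ALTER FULLTEXT INDEX ON dbo.JobPostings START UPDATE POPULATION;
GO
```

The CAST in the HAVING clause is required because SQL Server does not allow MAX over a bit column. Setting CHANGE_TRACKING AUTO instead would let the background process apply changes without the final ALTER statement.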
The key must be a single column in the table; compound keys are not allowed.

The next clause, ON, enables you to specify the full-text catalog in which the index is created. The final clause specifies whether changes to the indexed data are tracked. With regular indexes, SQL Server always keeps the index in sync with the underlying data by changing the index at the same time that the referenced data changes. Full-text indexes, however, are separated from normal database transaction processing. Changes to data in full-text indexed columns are propagated into the index by a background process, so the index does not immediately reflect the data changes. When the change-tracking value is set to MANUAL, changes to the data in the columns need to be propagated into the index either manually or by scheduling a job in [...]

[...] 2000 Server, your SQL Server 2005 installation requires Windows 2000 Server SP4.
C Incorrect: If you’re using Windows 2000 Server, your SQL Server 2005 installation requires Windows 2000 Server SP4.
D Correct: If you’re using Windows 2000 Server, your SQL Server 2005 installation requires Windows 2000 Server SP4.
2 Correct Answer: A
A Correct: The minimum service pack level required by SQL Server 2005 [...] Server 2005 for Windows Server 2003 is SP1.
B Incorrect: The minimum service pack level required by SQL Server 2005 for Windows Server 2003 is SP1.
C Incorrect: The minimum service pack level required by SQL Server 2005 for Windows Server 2003 is SP1.
D Incorrect: The minimum service pack level required by SQL Server 2005 for Windows Server 2003 is SP1.
3 Correct Answer: A
A Correct: Express Edition requires [...]
[...] can find them in the $SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\ directory. For information about populating thesaurus files, see the SQL Server 2005 Books Online article “Configuring Thesaurus Files.”

Word proximity is a common way of searching documents for multiple keywords or phrases. This type of query uses the NEAR (~) keyword. The closer words are to [...]

[...] purchase a SQL Server license.
C Incorrect: Developer Edition is not licensed for production use.
D Incorrect: Standard Edition requires users to purchase a SQL Server license.

Chapter 1: Lesson Review Answers
Lesson 2
1 Correct Answer: D
A Incorrect: If you’re using Windows 2000 Server, your SQL Server 2005 installation requires Windows 2000 Server [...]

[...] principals created, stored, and managed in SQL Server.
2 Correct Answers: A and C
A Correct: By using Windows authentication, SQL Server relies on operating system authentication. You can gain access to all the operating system security features and can implement enterprise-wide policies.
B Incorrect: SQL Server 2005 lets you apply the local Windows Password Policy to SQL Server logins.
C Correct: Windows authentication [...]

[...] default instance on a single server.
D Incorrect: There can be only one default instance on a single server.
Lesson 4
1 Correct Answers: A and C
A Correct: You need to install the SQL Server Agent service to use an account.
B Incorrect: This agent for transactional replication operates under the security account of the SQL Server Agent.
C Correct: You need to install the SQL Server service to use an account [...]
[...] authentication mode.

Lesson 5
1 Correct Answer: A
A Correct: An in-place upgrade is installing SQL Server 2005 on top of the current installation.
B Incorrect: SQL Server does not allow the sharing of databases between an older version and a newer version of SQL Server.
C Incorrect: This is a side-by-side installation [...]

[...] database-level objects, not server-level principals.

Chapter 2: Lesson Review Answers
B Correct: Fixed server roles are server principals that let you assign administrative rights to logins.
C Correct: Windows logins are server principals that let you give access to Windows users and groups.
D Correct: SQL Server logins are server principals created, [...]

[...] clients to use SQL Server with different service packs, it requires you to install separate SQL Server servers for each client. In addition, this solution requires you to move a client to a different server if he requires a change in service pack.
2 Correct Answer: A
A Correct: There can be only one default instance on a single server.
B Incorrect: There can be only one default instance on a single server.
C [...]

[...] it essentially makes a guess. This educated guess assigns a value based on the number of rows in the table, up to a maximum value. In SQL Server 2000, the maximum value for a full-text function was 1,000. In SQL Server 2005, this value has been increased to 10,000. Obviously, guessing in the context of such a large [...]

Posted: 07/08/2014, 02:22
