Pro Full-Text Search in
SQL Server 2008
■■■
Michael Coles with
Hilary Cotter
www.it-ebooks.info
Pro Full-Text Search in SQL Server 2008
Copyright © 2009 by Michael Coles and Hilary Cotter
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or by any information storage or retrieval
system, without the prior written permission of the copyright owner and the publisher.
ISBN-13 (pbk): 978-1-4302-1594-3
ISBN-13 (electronic): 978-1-4302-1595-0
Printed and bound in the United States of America 9 8 7 6 5 4 3 2 1
Trademarked names may appear in this book. Rather than use a trademark symbol with every occurrence
of a trademarked name, we use the names only in an editorial fashion and to the benefit of the trademark
owner, with no intention of infringement of the trademark.
Lead Editor: Jonathan Gennick
Technical Reviewer: Steve Jones
Editorial Board: Clay Andres, Steve Anglin, Mark Beckner, Ewan Buckingham, Tony Campbell,
Gary Cornell, Jonathan Gennick, Michelle Lowman, Matthew Moodie, Jeffrey Pepper,
Frank Pohlmann, Ben Renow-Clarke, Dominic Shakeshaft, Matt Wade, Tom Welsh
Project Manager: Denise Santoro Lincoln
Copy Editor: Benjamin Berg
Associate Production Director: Kari Brooks-Copony
Production Editor: Laura Esterman
Compositor/Artist: Octal Publishing, Inc.
Proofreader: Patrick Vincent
Indexer: Broccoli Information Management
Cover Designer: Kurt Krames
Manufacturing Director: Tom Debolski
Distributed to the book trade worldwide by Springer-Verlag New York, Inc., 233 Spring Street, 6th Floor,
New York, NY 10013. Phone 1-800-SPRINGER, fax 201-348-4505, e-mail orders-ny@springer-sbm.com, or
visit http://www.springeronline.com.
For information on translations, please contact Apress directly at 2855 Telegraph Avenue, Suite 600,
Berkeley, CA 94705. Phone 510-549-5930, fax 510-549-5939, e-mail info@apress.com, or visit
http://www.apress.com.
Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use.
eBook versions and licenses are also available for most titles. For more information, reference our Special
Bulk Sales–eBook Licensing web page at http://www.apress.com/info/bulksales.
The information in this book is distributed on an “as is” basis, without warranty. Although every precaution
has been taken in the preparation of this work, neither the author(s) nor Apress shall have any liability to
any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly
by the information contained in this work.
The source code for this book is available to readers at http://www.apress.com.
For Devoné and Rebecca
—Michael
Contents at a Glance
About the Authors
About the Technical Reviewer
Acknowledgments
Introduction
■CHAPTER 1 SQL Server Full-Text Search
■CHAPTER 2 Administration
■CHAPTER 3 Basic and Advanced Queries
■CHAPTER 4 Client Applications
■CHAPTER 5 Multilingual Searching
■CHAPTER 6 Indexing BLOBs
■CHAPTER 7 Stoplists
■CHAPTER 8 Thesauruses
■CHAPTER 9 iFTS Dynamic Management Views and Functions
■CHAPTER 10 Filters
■CHAPTER 11 Advanced Search Techniques
■APPENDIX A Glossary
■APPENDIX B iFTS_Books Database
■APPENDIX C Vector-Space Searches
■INDEX
Contents
About the Authors
About the Technical Reviewer
Acknowledgments
Introduction
■CHAPTER 1 SQL Server Full-Text Search
    Welcome to Full-Text Search
    History of SQL Server FTS
    Goals of Search
    Mechanics of Search
    iFTS Architecture
        Indexing Process
        Query Process
    Search Quality
        Measuring Quality
        Synonymy and Polysemy
    Summary
■CHAPTER 2 Administration
    Initial Setup and Configuration
    Enabling Database Full-Text Support
    Creating Full-Text Catalogs
        The New Full-Text Catalog Wizard
        The CREATE FULLTEXT CATALOG Statement
    Upgrading Full-Text Catalogs
    Creating Full-Text Indexes
        The Full-Text Indexing Wizard
        The DocId Map
        The CREATE FULLTEXT INDEX Statement
    Full-Text Index Population
        Full Population
        Incremental Population
        Update Population
        Additional Index Population Options
        Catalog Rebuild and Reorganization
        Scheduling Populations
    Management
        Backups
        Logs
        SQL Profiler Events
        System Procedures
    Summary
■CHAPTER 3 Basic and Advanced Queries
    iFTS Predicates and Functions
        FREETEXT and FREETEXTTABLE
        Adding a Language Specification
        Returning the Top N by RANK
    CONTAINS
        Phrase Searches
        Boolean Searches
        Prefix Searches
        Generational Searches
        Proximity Searches
        Weighted Searches
        CONTAINSTABLE Searches
    Advanced Search Topics
        Using XQuery contains() Function
        Column Rank-Multiplier Searches
        Taxonomy Search and Text Mining
    Summary
■CHAPTER 4 Client Applications
    Hit Highlighting
        The Procedure
        Calling the Procedure
    Search Engine–Style Search
        Defining a Grammar
        Extended Backus-Naur Form
        Implementing the Grammar with Irony
        Generating the iFTS Query
        Converting a Google-Style Query
        Querying with the New Grammar
    Summary
■CHAPTER 5 Multilingual Searching
    A Brief History of Written Language
    iFTS and Language Complexity
        Writing Symbols and Alphabets
        Bidirectional Writing and Capitalization
        Hyphenation and Compound Words
        Nonalphanumeric Characters and Accent Marks
        Token Position Context
        Generational Forms
        Gender
    Storing Multilingual Data
        Storing Plain Text
        Storing XML
        Storing HTML Documents
        Storing Microsoft Office Documents
        Storing Other Document Types
        Detecting Content Language
    Designing Tables to Store Multilingual Content
    Summary
■CHAPTER 6 Indexing BLOBs
    LOB Data
    Character LOB Data
    XML LOB Data
    Binary LOB Data
    FILESTREAM BLOB Data
        Efficiency Advantages
        FILESTREAM Requirements
        T-SQL Access
        Storage Considerations
    OpenSqlFilestream API
    Summary
■CHAPTER 7 Stoplists
    System Stoplists
    Creating Custom Stoplists
    Managing Stoplists
    Upgrading Noise Word Lists to Stoplists
    Stoplist Behavior
        Stoplists and Indexing
        Stoplists and Queries
    Summary
■CHAPTER 8 Thesauruses
    Thesaurus Files
        Editing and Loading Thesaurus Files
        Expansion Sets
    Replacement Sets
    Global and Local Thesauruses
    A Practical Example
        Translation
        Word Bags
    Additional Considerations
        Accent and Case Sensitivity
        Nonrecursion
        Overlapping Rules
        Stoplists
        General Recommendations
    Summary
■CHAPTER 9 iFTS Dynamic Management Views and Functions
    iFTS and Transparency
    DMVs and DMFs
        Looking Inside the Full-Text Index
        Parsing Text
        Accessing Full-Text Index Entries
        Retrieving Population Information
        Services and Memory Usage
    Catalog Views
        Listing Full-Text Catalogs
        Retrieving Full-Text Index Metadata
        Revealing Stoplists
        Viewing Supported Languages and Document Types
    Summary
■CHAPTER 10 Filters
    Introducing Filters
        Standard Filters
        Third-Party Filters
        Custom Filters
    Custom Filter Development
        Filter Interfaces
        Custom Filter Design
        Filter Class Factory
        Filter Class
        Compiling and Installing the Filter
        Testing the Filter
    Gatherer and Protocol Handler
    Word Breakers and Stemmers
    Summary
■CHAPTER 11 Advanced Search Techniques
    Spelling Suggestion and Correction
        Hamming Distance
        Spelling Suggestion Implementation
    Name Searching
    Phonetic Search
        Soundex
        NYSIIS
    String Similarity Metrics
        Longest Common Subsequence
        Edit Distance
        N-Grams
    Summary
■APPENDIX A Glossary
■APPENDIX B iFTS_Books Database
    Installing the Sample Database
    Installing the Phonetic Samples
    Sample Code
■APPENDIX C Vector-Space Searches
    Documents As Vectors
■INDEX
[...] looking for (Bono Vox, U2)

Full-text search encompasses techniques for searching text-based data and documents. This is an increasingly important function of modern databases. SQL Server has had full-text search capability built into it since SQL Server 7.0. SQL Server 2008 integrated full-text search (iFTS) represents a significant improvement in full-text search functionality, a new level of full-text search ...

... process: The SQL Server process contains both the SQL Server query processor, which compiles and executes SQL queries, and the full-text engine, which compiles and executes full-text queries. This tight integration of the SQL Server and full-text query processors in SQL Server 2008 is a significant improvement over prior versions of SQL Server full-text search, allowing SQL Server to generate far more ...

... adds the indexable words to inverted index fragments. The last step of the indexing process is the master merge, which combines all of the index fragments into a single master full-text index. The indexing process in general and the master merge in particular can be resource- and I/O-intensive. Despite the intensity of the process, the indexing process doesn’t block queries from occurring. Querying a full-text ...

... degree in information technology and multiple Microsoft and other certifications. Michael has published dozens of technical articles online and in print magazines, including SQL Server Central, ASPToday, and SQL Server Standard. Michael is the author of the books Pro SQL Server 2008 XML (Apress, 2008) and Pro T-SQL 2008 Programmer’s Guide (Apress, 2008), and he is a contributor to Accelerated SQL Server 2008 ...
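The master merge mentioned in the indexing excerpt above can be pictured with a small Python sketch. It is an illustration only, not SQL Server's implementation: the fragment layout (a dictionary from word to a set of document IDs) and the function name `master_merge` are assumptions made for this example.

```python
from collections import defaultdict


def master_merge(fragments):
    """Combine several inverted-index fragments into one master index.

    Each fragment maps a word to a set of document IDs. This layout is an
    assumption for illustration; the excerpt does not describe the real format.
    """
    master = defaultdict(set)
    for fragment in fragments:
        for word, doc_ids in fragment.items():
            master[word] |= set(doc_ids)
    # Sort words and postings so the merged index is deterministic.
    return {word: sorted(ids) for word, ids in sorted(master.items())}


fragment_a = {"search": {1, 2}, "index": {1}}
fragment_b = {"search": {3}, "merge": {3}}
print(master_merge([fragment_a, fragment_b]))
# {'index': [1], 'merge': [3], 'search': [1, 2, 3]}
```

Because every posting in every fragment is visited, the cost of the merge grows with the total index size, which is consistent with the excerpt's remark that the master merge can be resource-intensive.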
... seen in FTS’s dependence on components that implement Indexing Service’s programming interfaces. For instance, in SQL Server, document-specific filters are tied to filename extensions. Though powerful for its day, the initial implementations of FTS in SQL Server 7.0 and 2000 proved to have certain limitations, including the following:
• The DBMS itself made storing, manipulating, searching, and retrieving ...

... enterprise-class database management system (DBMS).
History of SQL Server FTS
Full-text search has been a part of SQL Server since version 7.0. The initial design of SQL Server full-text search provided for reuse of Microsoft Indexing Service components. Indexing Service is Microsoft’s core product for indexing and searching files and documents in the file system. The idea was that FTS could easily reuse ...

... interested in. Having all key words in an index returns hits substantially faster than looking through every document you’re storing to find the user’s search phrase. SQL Server uses an inverted index structure to store full-text index data. The inverted index structure is built by breaking searchable content into word-length tokens (a process known as tokenizing) and storing each word with relevant metadata in ...

... of international character sets and multilingual searches. We also provide best practices around multilingual searching.
Chapter 6
SQL Server 2008 provides greater flexibility and more options for storing large object (LOB) data in your databases. Chapter 6 discusses the options available for storing, managing, and indexing LOB data in your database. In this chapter, we take a look at how SQL Server indexes ...
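The inverted index excerpt above (tokenize the content, then store each word with metadata) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the regular-expression word breaker is deliberately naive, and the choice of document ID plus word position as the stored metadata is an assumption, since the excerpt is cut off before it names the metadata.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive word breaker)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_inverted_index(documents):
    """Map each word to {doc_id: [positions]} for a {doc_id: text} dictionary."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text), start=1):
            index[word].setdefault(doc_id, []).append(position)
    return dict(index)


docs = {1: "Full-text search in SQL Server", 2: "Search the full text index"}
index = build_inverted_index(docs)
print(index["search"])  # {1: [3], 2: [1]}
```

Looking up a word is then a single dictionary access rather than a scan of every stored document, which is the speed advantage the excerpt describes.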
This book is intended for SQL Server developers and DBAs who want to get the most out of SQL Server 2008 Integrated Full-Text Search (iFTS). To get the most out of this book, you should have a working knowledge of T-SQL, as most of the sample code in the book is written in SQL Server 2008 T-SQL. Sample code is also provided in C# and C++, where appropriate. Although knowledge of these programming languages ...

... of SQL Server, the full-text index in SQL Server 2008 is stored in the database instead of the file system. We will discuss setup, configuration, and population of full-text indexes in detail in Chapter 2.
• Stoplist: The stoplist is simply a list of stopwords, or words that are considered useless for the purposes of full-text search. The indexer consults the stoplist during the indexing and querying process.
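The stoplist description above says the list is consulted both when indexing and when querying. The Python sketch below shows one way that could look. It is a simplified model, not iFTS itself: the stopword set, the whitespace tokenizer, and the function names are all hypothetical and chosen for illustration.

```python
# Hypothetical stoplist for illustration; it is not one of SQL Server's stoplists.
STOPLIST = {"a", "an", "and", "in", "of", "the"}


def remove_stopwords(text, stoplist=STOPLIST):
    """Return the lowercased words of `text` with stopwords dropped."""
    return [word for word in text.lower().split() if word not in stoplist]


def build_index(documents, stoplist=STOPLIST):
    """Index only non-stopwords: word -> sorted list of document IDs."""
    index = {}
    for doc_id, text in documents.items():
        for word in remove_stopwords(text, stoplist):
            index.setdefault(word, set()).add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query, stoplist=STOPLIST):
    """Return IDs of documents that contain every non-stopword in the query."""
    terms = remove_stopwords(query, stoplist)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {1: "the history of search", 2: "search in the database"}
index = build_index(docs)
print(search(index, "the search"))         # [1, 2]
print(search(index, "history of search"))  # [1]
```

Applying the same stoplist on both sides keeps the index and the query consistent: a word that was never indexed is also never required to match.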
Date posted: 07/03/2014, 18:20