Pro Full-Text Search in SQL Server 2008

Michael Coles with Hilary Cotter

Pro Full-Text Search in SQL Server 2008
Copyright © 2009 by Michael Coles and Hilary Cotter

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

ISBN-13 (pbk): 978-1-4302-1594-3
ISBN-13 (electronic): 978-1-4302-1595-0

Printed and bound in the United States of America 9 8 7 6 5 4 3 2 1

Trademarked names may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, we use the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

Lead Editor: Jonathan Gennick
Technical Reviewer: Steve Jones
Editorial Board: Clay Andres, Steve Anglin, Mark Beckner, Ewan Buckingham, Tony Campbell, Gary Cornell, Jonathan Gennick, Michelle Lowman, Matthew Moodie, Jeffrey Pepper, Frank Pohlmann, Ben Renow-Clarke, Dominic Shakeshaft, Matt Wade, Tom Welsh
Project Manager: Denise Santoro Lincoln
Copy Editor: Benjamin Berg
Associate Production Director: Kari Brooks-Copony
Production Editor: Laura Esterman
Compositor/Artist: Octal Publishing, Inc.
Proofreader: Patrick Vincent
Indexer: Broccoli Information Management
Cover Designer: Kurt Krames
Manufacturing Director: Tom Debolski

Distributed to the book trade worldwide by Springer-Verlag New York, Inc., 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax 201-348-4505, e-mail orders-ny@springer-sbm.com, or visit http://www.springeronline.com.

For information on translations, please contact Apress directly at 2855 Telegraph Avenue, Suite 600, Berkeley, CA 94705. Phone 510-549-5930, fax 510-549-5939, e-mail info@apress.com, or visit http://www.apress.com.
Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk Sales–eBook Licensing web page at http://www.apress.com/info/bulksales.

The information in this book is distributed on an “as is” basis, without warranty. Although every precaution has been taken in the preparation of this work, neither the author(s) nor Apress shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in this work.

The source code for this book is available to readers at http://www.apress.com.

For Devoné and Rebecca
- Michael

Contents at a Glance

About the Authors
About the Technical Reviewer
Acknowledgments
Introduction
CHAPTER 1  SQL Server Full-Text Search
CHAPTER 2  Administration
CHAPTER 3  Basic and Advanced Queries
CHAPTER 4  Client Applications
CHAPTER 5  Multilingual Searching
CHAPTER 6  Indexing BLOBs
CHAPTER 7  Stoplists
CHAPTER 8  Thesauruses
CHAPTER 9  iFTS Dynamic Management Views and Functions
CHAPTER 10  Filters
CHAPTER 11  Advanced Search Techniques
APPENDIX A  Glossary
APPENDIX B  iFTS_Books Database
APPENDIX C  Vector-Space Searches
INDEX

Contents

About the Authors
About the Technical Reviewer
Acknowledgments
Introduction

CHAPTER 1  SQL Server Full-Text Search
  Welcome to Full-Text Search
  History of SQL Server FTS
  Goals of Search
  Mechanics of Search
  iFTS Architecture
    Indexing Process
    Query Process
  Search Quality
    Measuring Quality
    Synonymy and Polysemy
  Summary

CHAPTER 2  Administration
  Initial Setup and Configuration
  Enabling Database Full-Text Support
  Creating Full-Text Catalogs
    The New Full-Text Catalog Wizard
    The CREATE FULLTEXT CATALOG Statement
  Upgrading Full-Text Catalogs
  Creating Full-Text Indexes
    The Full-Text Indexing Wizard
    The DocId Map
    The CREATE FULLTEXT INDEX Statement
  Full-Text Index Population
    Full Population
    Incremental Population
    Update Population
    Additional Index Population Options
    Catalog Rebuild and Reorganization
    Scheduling Populations
  Management
    Backups
    Logs
    SQL Profiler Events
    System Procedures
  Summary

CHAPTER 3  Basic and Advanced Queries
  iFTS Predicates and Functions
    FREETEXT and FREETEXTTABLE
    Adding a Language Specification
    Returning the Top N by RANK
  CONTAINS
    Phrase Searches
    Boolean Searches
    Prefix Searches
    Generational Searches
    Proximity Searches
    Weighted Searches
    CONTAINSTABLE Searches
  Advanced Search Topics
    Using XQuery contains() Function
    Column Rank-Multiplier Searches
    Taxonomy Search and Text Mining
  Summary

CHAPTER 4  Client Applications
  Hit Highlighting
    The Procedure
    Calling the Procedure
  Search Engine–Style Search
    Defining a Grammar
    Extended Backus-Naur Form
    Implementing the Grammar with Irony
    Generating the iFTS Query
    Converting a Google-Style Query
    Querying with the New Grammar
  Summary

CHAPTER 5  Multilingual Searching
  A Brief History of Written Language
  iFTS and Language Complexity
    Writing Symbols and Alphabets
    Bidirectional Writing and Capitalization
    Hyphenation and Compound Words
    Nonalphanumeric Characters and Accent Marks
    Token Position Context
    Generational Forms
    Gender
  Storing Multilingual Data
    Storing Plain Text
    Storing XML
    Storing HTML Documents
    Storing Microsoft Office Documents
    Storing Other Document Types
    Detecting Content Language
  Designing Tables to Store Multilingual Content
  Summary

CHAPTER 6  Indexing BLOBs
  LOB Data
  Character LOB Data
  XML LOB Data
  Binary LOB Data
  FILESTREAM BLOB Data
    Efficiency Advantages
    FILESTREAM Requirements
    T-SQL Access
    Storage Considerations
  OpenSqlFilestream API
  Summary

CHAPTER 7  Stoplists
  System Stoplists
  Creating Custom Stoplists
  Managing Stoplists
  Upgrading Noise Word Lists to Stoplists
  Stoplist Behavior
    Stoplists and Indexing
    Stoplists and Queries
  Summary

CHAPTER 8  Thesauruses
  Thesaurus Files
    Editing and Loading Thesaurus Files
    Expansion Sets
  Replacement Sets
  Global and Local Thesauruses
  A Practical Example
    Translation
    Word Bags
  Additional Considerations
    Accent and Case Sensitivity
    Nonrecursion
    Overlapping Rules
    Stoplists
    General Recommendations
  Summary

CHAPTER 9  iFTS Dynamic Management Views and Functions
  iFTS and Transparency
  DMVs and DMFs
    Looking Inside the Full-Text Index
    Parsing Text
    Accessing Full-Text Index Entries
    Retrieving Population Information
    Services and Memory Usage
  Catalog Views
    Listing Full-Text Catalogs
    Retrieving Full-Text Index Metadata
    Revealing Stoplists
    Viewing Supported Languages and Document Types
  Summary

CHAPTER 10  Filters
  Introducing Filters
    Standard Filters
    Third-Party Filters
    Custom Filters
  Custom Filter Development
    Filter Interfaces
    Custom Filter Design
    Filter Class Factory
    Filter Class
    Compiling and Installing the Filter
    Testing the Filter
  Gatherer and Protocol Handler
  Word Breakers and Stemmers
  Summary

CHAPTER 11  Advanced Search Techniques
  Spelling Suggestion and Correction
    Hamming Distance
    Spelling Suggestion Implementation
  Name Searching
  Phonetic Search
    Soundex
    NYSIIS
  String Similarity Metrics
    Longest Common Subsequence
    Edit Distance
    N-Grams
  Summary

APPENDIX A  Glossary

APPENDIX B  iFTS_Books Database
  Installing the Sample Database
  Installing the Phonetic Samples
  Sample Code

APPENDIX C  Vector-Space Searches
  Documents As Vectors

INDEX

... cover full-text query syntax in detail in Chapter 3.
• SQL Server process: The SQL Server process contains both the SQL Server query processor, which compiles and executes SQL queries, and the full-text engine, which compiles and executes full-text queries. This tight integration of the SQL Server and full-text query processors in SQL Server 2008 is a significant improvement over prior versions of SQL Server. ...

... for searching text-based data and documents. This is an increasingly important function of modern databases. SQL Server has had full-text search capability built into it since SQL Server 7.0. SQL Server 2008 integrated full-text search (iFTS) represents a significant improvement in full-text search functionality, a new level of full-text search integration into the database engine over prior releases. In ...

... degree in information technology and multiple Microsoft and other certifications. Michael has published dozens of technical articles online and in print magazines, including SQL Server Central, ASPToday, and SQL Server Standard. Michael is the author of the books Pro SQL Server 2008 XML (Apress, 2008) and Pro T-SQL 2008 Programmer’s Guide (Apress, 2008), and he is a contributor to Accelerated SQL Server 2008. ...

... can be resource- and I/O-intensive. Despite the intensity of the process, the indexing process doesn’t block queries from occurring. Querying a full-text index during the indexing process, however, can result in partial and incomplete results being returned.

Query Process

The full-text query process uses the same language-specific word breakers that the indexer uses in the indexing process; however, it uses ...

... seen in FTS’s dependence on components that implement Indexing Service’s programming interfaces. For instance, in SQL Server, document-specific filters are tied to filename extensions. Though powerful for its day, the initial implementations of FTS in SQL Server 7.0 and 2000 proved to have certain limitations, including the following:
• The DBMS itself made storing, manipulating, searching, and retrieving ...

... 1-2. Inverted index with stopwords removed

Whenever you perform a full-text search in SQL Server, the full-text query engine tokenizes your input string and consults the inverted index to locate relevant documents. We’ll discuss indexing in detail in Chapter 2 and full-text search queries in Chapter 3.

iFTS Architecture

The iFTS architecture consists of several full-text search components working in cooperation ...

... through every document you’re storing to find the user’s search phrase. SQL Server uses an inverted index structure to store full-text index data. The inverted index structure is built by breaking searchable content into word-length tokens (a process known as tokenizing) and storing each word with relevant metadata in the index. An inverted index for a document containing the phrase Now is the time for ...

... enterprise-class database management system (DBMS).

History of SQL Server FTS

Full-text search has been a part of SQL Server since version 7.0. The initial design of SQL Server full-text search provided for reuse of Microsoft Indexing Service components. Indexing Service is Microsoft’s core product for indexing and searching files and documents in the file system. The idea was that FTS could easily reuse ...

... stored in the database can be indexed), inflectional and thesaurus generational terms, ranking, and elimination of noise words, full-text search provides a powerful set of tools for searching your data. Full-text search functionality is an increasingly important function in modern databases. There are many reasons for this increase in popularity, including the following:
• Databases are increasingly being ...
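
The inverted index idea described in the excerpts above (tokenize the content, skip stopwords, store each word with metadata, then tokenize the query and consult the index) can be sketched in a few lines of Python. This is only an illustrative sketch and not SQL Server's implementation: the stoplist, the regular-expression tokenizer, the sample documents, and the function names `tokenize`, `build_inverted_index`, and `search` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical stoplist for illustration only; it is not a SQL Server system stoplist.
STOPWORDS = {"a", "an", "in", "is", "the", "to"}

def tokenize(text):
    """Split text into lowercase word tokens (a crude stand-in for a word breaker)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each non-stopword token to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, token in enumerate(tokenize(text), start=1):
            if token in STOPWORDS:
                continue  # stopwords are not stored, but positions still advance
            index[token].setdefault(doc_id, []).append(position)
    return index

def search(index, query):
    """Return the ids of documents that contain every non-stopword query token."""
    tokens = [t for t in tokenize(query) if t not in STOPWORDS]
    if not tokens:
        return set()
    postings = [set(index.get(t, {})) for t in tokens]
    return set.intersection(*postings)

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Full-text search finds words in documents",
    2: "An inverted index maps words to documents",
}
index = build_inverted_index(docs)
matches = search(index, "inverted words")
```

Storing word positions next to each document id is one plausible form of the "relevant metadata" the text mentions; in a structure like this, positions are what would allow phrase and proximity matching to be layered on top of simple term lookup.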

... are making best guesses, hoping to get the right answers. In other words, unsophisticated searchers rely on a hit-or-miss approach, blind luck, or serendipity. You can help your users by offering training in corporate environments, providing online help, and instituting other methods of educating them. Good search engineers will institute some form of logging to determine what their users are searching for, ...