... major issues involved in search engine design

Information Retrieval
  Relevance - Effective ranking
  Evaluation - Testing and measuring
  Information needs - User interaction

Search Engines
  Performance - Efficient search and indexing
  Incorporating new data - Coverage and freshness
  Scalability - Growing with data and users
  Adaptability - Tuning for applications
  Specific problems - E.g., spam

Fig 1.1 Search engine design...

1 Search Engines and Information Retrieval

Lemur is an open source toolkit that includes the Indri C++-based search engine. Lemur has primarily been used by information retrieval researchers to compare advanced search techniques. Galago is a Java-based search engine that...

Footnotes:
9 http://en.wikipedia.org/wiki/Information_retrieval
10 http://lucene.apache.org
11 http://www.lemurproject.org
12 http://www.search-engines-book.com

... this definition is still appropriate and accurate. The term "information" is very general, and information retrieval includes work on a wide range of types of information...

Footnote 1: Information retrieval is often abbreviated as IR. In this book, we mostly use the full term. This has nothing to do with the fact that many people think IR means "infrared" or something else.

... retrieval, namely search engines.

1.3 Search Engines

A search engine is the practical application of information retrieval techniques to large-scale text collections. A web search engine is the obvious example, but as has been mentioned, search engines can be found in many different...

Footnote 5: Also known as an evaluation corpus (plural corpora).
Footnote 6: Text REtrieval Conference, http://trec.nist.gov/
Enterprise search engines (for example, Autonomy) must be able to process the large variety of information sources in a company and use company-specific knowledge as part of search and related tasks, such as data mining. Data mining refers to the automatic discovery of interesting structure in data and includes techniques such as clustering. Desktop search engines, such as the Microsoft Vista™ search feature,...

... documents in ranked order. Although searching the World Wide Web (web search) is by far the most common application involving information retrieval, search is also a crucial part of applications in corporations, government, and many other domains. Vertical search is a specialized form of web search where the domain of the search is restricted to a particular topic. Enterprise search involves finding the...

... Salton, a pioneer in information retrieval and one of the leading figures from the 1960s to the 1990s, proposed the following definition in his classic 1968 textbook (Salton, 1968):

    Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.

Despite the huge advances in the understanding and technology of search in the past 40...

... problems - E.g., spam

Fig 1.1 Search engine design and the core information retrieval issues

Based on this discussion of the relationship between information retrieval and search engines, we now consider what roles computer scientists and others play in the design and use of search engines.

1.4 Search Engineers

Information retrieval research involves the development of mathematical models of text and language,...
... of information retrieval

1.2 The Big Issues

Information retrieval researchers have focused on a few key issues that remain just as important in the era of commercial web search engines working with billions of web pages as they were when tests were done in the 1960s on document collections containing about 1.5 megabytes of text. One of these issues is relevance. Relevance is a fundamental concept in information...

... by coming up with easier and faster ways to find the right information. These people, whether they call themselves computer scientists, software engineers, information scientists, search engine optimizers, or something else, are working in the field of Information Retrieval. So, before we launch into a detailed journey through the internals of search engines, we will take a few pages to provide a context...

... STROHMAN

Contents
1 Search Engines and Information Retrieval
1.1 What Is Information Retrieval?
1.2 The Big Issues
1.3 Search Engines
1.4 Search Engineers
Architecture...

... Sydney Hong Kong Seoul Singapore Taipei Tokyo

Search Engines: Information Retrieval in Practice
University of Massachusetts, Amherst
Yahoo! Research
Google Inc.

Editor-in-Chief Michael Hirsch
Acquisitions...

... Web Search
7.6 Machine Learning and Information Retrieval
7.6.1 Learning to Rank
7.6.2 Topic Models and Vocabulary Mismatch
7.7 Application-Based Models
8 Evaluating Search Engines
8.1