
Collective Intelligence in Action, part 1 (PDF)


DOCUMENT INFORMATION

Basic information

Format
Number of pages: 43
File size: 3.91 MB

Content

Collective Intelligence in Action

SATNAM ALAG

MANNING
Greenwich (74° w. long.)

To my dear sons, Ayush and Shray, and my beautiful, loving, and intelligent wife, Alpana

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
Sound View Court 3B
Greenwich, CT 06830
fax: (609) 877-8256
email: orders@manning.com

©2009 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15% recycled and processed without the use of elemental chlorine.

Manning Publications Co.
Sound View Court 3B
Greenwich, CT 06830

Development Editor: Jeff Bleiel
Copyeditor: Benjamin Berg
Typesetter: Gordan Salinovic
Cover designer: Leslie Haimes

ISBN 1933988312
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 13 12 11 10 09 08

brief contents

PART 1 GATHERING DATA FOR INTELLIGENCE
1 Understanding collective intelligence
2 Learning from user interactions
3 Extracting intelligence from tags
4 Extracting intelligence from content
5 Searching the blogosphere
6 Intelligent web crawling

PART 2 DERIVING INTELLIGENCE
7 Data mining: process, toolkits, and standards
8 Building a text analysis toolkit
9 Discovering patterns with clustering
10 Making predictions

PART 3 APPLYING INTELLIGENCE IN YOUR APPLICATION
11 Intelligent search
12 Building a recommendation engine

contents

foreword
preface
acknowledgments
about this book

PART 1 GATHERING DATA FOR INTELLIGENCE

1 Understanding collective intelligence
  1.1 What is collective intelligence?
  1.2 CI in web applications
    Collective intelligence from the ground up: a sample application
    Benefits of collective intelligence
    CI is the core component of Web 2.0
    Harnessing CI to transform from content-centric to user-centric applications
  1.3 Classifying intelligence
    Explicit intelligence
    Implicit intelligence
    Derived intelligence
  1.4 Summary
  1.5 Resources

2 Learning from user interactions
  2.1 Architecture for applying intelligence
    Synchronous and asynchronous services
    Real-time learning in an event-driven system
    Polling services for non–event-driven systems
    Advantages and disadvantages of event-based and non–event-based architectures
  2.2 Basics of algorithms for applying CI
    Users and items
    Representing user information
    Content-based analysis and collaborative filtering
    Representing intelligence from unstructured text
    Computing similarities
    Types of datasets
  2.3 Forms of user interaction
    Rating and voting
    Emailing or forwarding a link
    Bookmarking and saving
    Purchasing items
    Click-stream
    Reviews
  2.4 Converting user interaction into collective intelligence
    Intelligence from ratings via an example
    Intelligence from bookmarking, saving, purchasing items, forwarding, click-stream, and reviews
  2.5 Summary
  2.6 Resources

3 Extracting intelligence from tags
  3.1 Introduction to tagging
    Tag-related metadata for users and items
    Professionally generated tags
    User-generated tags
    Machine-generated tags
    Tips on tagging
    Why do users tag?
  3.2 How to leverage tags
    Building dynamic navigation
    Innovative uses of tag clouds
    Targeted search
    Folksonomies and building a dictionary
  3.3 Extracting intelligence from user tagging: an example
    Items related to other items
    Items of interest for a user
    Relevant users for an item
  3.4 Scalable persistence architecture for tagging
    Reviewing other approaches
    Recommended persistence architecture
  3.5 Building tag clouds
    Persistence design for tag clouds
    Algorithm for building a tag cloud
    Implementing a tag cloud
    Visualizing a tag cloud
  3.6 Finding similar tags
  3.7 Summary
  3.8 Resources

4 Extracting intelligence from content
  4.1 Content types and integration
    Classifying content
    Architecture for integrating content
  4.2 The main CI-related content types
    Blogs
    Wikis
    Groups and message boards
  4.3 Extracting intelligence step by step
    Setting up the example
    Naïve analysis
    Removing common words
    Stemming
    Detecting phrases
  4.4 Simple and composite content types
  4.5 Summary
  4.6 Resources

5 Searching the blogosphere
  5.1 Introducing the blogosphere
    Leveraging the blogosphere
    RSS: the publishing format
    Blog-tracking companies
  5.2 Building a framework to search the blogosphere
    The searcher
    The search parameters
    The query results
    Handling the XML response
    Exception handling
  5.3 Implementing the base classes
    Implementing the search parameters
    Implementing the result objects
    Implementing the searcher
    Parsing XML response
    Extending the framework
  5.4 Integrating Technorati
    Technorati search API overview
    Implementing classes for integrating Technorati
  5.5 Integrating Bloglines
    Bloglines search API overview
    Implementing classes for integrating Bloglines
  5.6 Integrating providers using RSS
    Generalizing the query parameters
    Generalizing the blog searcher
    Building the RSS 2.0 XML parser
  5.7 Summary
  5.8 Resources

6 Intelligent web crawling
  6.1 Introducing web crawling
    Why crawl the Web?
    The crawling process
    Intelligent crawling and focused crawling
    Deep crawling
    Available crawlers
  6.2 Building an intelligent crawler step by step
    Implementing the core algorithm
    Being polite: following the robots.txt file
    Retrieving the content
    Extracting URLs
    Making the crawler intelligent
    Running the crawler
    Extending the crawler
  6.3 Scalable crawling with Nutch
    Setting up Nutch
    Running the Nutch crawler
    Searching with Nutch
    Apache Hadoop, MapReduce, and Dryad
  6.4 Summary
  6.5 Resources

PART 2 DERIVING INTELLIGENCE

7 Data mining: process, toolkits, and standards
  7.1 Core concepts of data mining
    Attributes
    Supervised and unsupervised learning
    Key learning algorithms
    The mining process
  7.2 Using an open source data mining framework: WEKA
    Using the WEKA application: a step-by-step tutorial
    Understanding the WEKA APIs
    Using the WEKA APIs via an example
  7.3 Standard data mining API: Java Data Mining (JDM)
    JDM ...

8 Building a text analysis toolkit
  ...
    ... phrases
    Writing an analyzer to inject synonyms and detect phrases
    Putting our analyzers to work
  8.2 Building the text analysis infrastructure
    Building the tag infrastructure
    Building the term vector infrastructure
    Building the Text Analyzer class
    Applying the text analysis infrastructure ...

9 Discovering patterns with clustering
  ...
  9.2 Leveraging WEKA for clustering
    Creating the learning dataset
    Creating the clusterer
    Evaluating the clustering results
  9.3 Clustering using the JDM APIs
    Key JDM clustering-related classes
    Clustering settings using the JDM APIs
    Creating the clustering task using the JDM APIs
    Executing the clustering task using the JDM APIs
    Retrieving the clustering ...

10 Making predictions
  ...
    ... APIs
    Retrieving the classification model using the JDM APIs
  10.6 Summary
  10.7 Resources

PART 3 APPLYING INTELLIGENCE IN YOUR APPLICATION

11 Intelligent search
  11.1 Search fundamentals
    Search architecture
    Core Lucene classes
    Basic indexing and searching via example
  11.2 Indexing with Lucene
    Understanding the index format
    Modifying the index
    Incremental indexing
    Accessing the term frequency vector
    Optimizing indexing performance
  11.3 Searching with Lucene
    Understanding Lucene scoring
    Querying Lucene
    Sorting search results
    Querying on multiple fields
    Filtering
    Searching multiple indexes
    Using a HitCollector
    Optimizing search performance ...

Understanding collective intelligence

This chapter covers
■ The basics of collective intelligence
■ How collective intelligence manifests itself in web applications
■ Building user-centric applications using collective intelligence
■ The three forms of intelligence: direct, indirect, and derived

Web applications are undergoing a revolution. In this post-dot-com era, the web is transforming ...

... product, as shown in figure 1.1, or it may be more involved: building models to recommend personalized content to a user. This book is focused toward building the more involved models to personalize your application. As shown in figure 1.2, there are three things that need to happen to apply collective intelligence in your application. You need to ...

[Figure 1.1 labels: Intelligence from Mining Data; User. Caption fragment: A user influences others ...]

[Figure 1.2 labels: User; Content; Model; Contribute and interact]
Figure 1.2 Three components to harnessing collective intelligence. 1: Allow users to interact. 2: Learn about your users in aggregate. 3: Personalize content using user interaction data and aggregate data.

1.2.1 Collective intelligence from the ground up: a sample application

In our example, John and Jane are two engineers who gave up their ...

Date posted: 12/08/2014, 10:22
