
Programming Collective Intelligence


Document information

Pages: 360
File size: 3.56 MB

Content

www.it-ebooks.info

Praise for Programming Collective Intelligence

“I review a few books each year, and naturally, I read a fair number during the course of my work. And I have to admit that I have never had quite as much fun reading a preprint of a book as I have in reading this. Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details.”
Dan Russell, Uber Tech Lead, Google

“Toby’s book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be used directly to analyze social interaction across the Web today. If I had this book two years ago, it would have saved me precious time going down some fruitless paths.”
Tim Wolters, CTO, Collective Intellect

“Programming Collective Intelligence is a stellar achievement in providing a comprehensive collection of computational methods for relating vast amounts of data. Specifically, it applies these techniques in context of the Internet, finding value in otherwise isolated data islands. If you develop for the Internet, this book is a must-have.”
Paul Tyma, Senior Software Engineer, Google

Other resources from O’Reilly

Related titles
Web 2.0 Report
Learning Python
Mastering Algorithms with C
AI for Game Developers
Mastering Algorithms with Perl

oreilly.com
oreilly.com is more than a complete catalog of O’Reilly books. You’ll also find links to news, events, articles, weblogs, sample chapters, and code examples.

oreillynet.com is the essential portal for developers interested in open and emerging technologies, including new platforms, programming languages, and operating systems.

Conferences
O’Reilly brings diverse innovators together to nurture the ideas that spark revolutionary industries.
We specialize in documenting the latest tools and systems, translating the innovator’s knowledge into useful skills for those in the trenches. Visit conferences.oreilly.com for our upcoming events.

Safari Bookshelf (safari.oreilly.com) is the premier online reference library for programmers and IT professionals. Conduct searches across more than 1,000 books. Subscribers can zero in on answers to time-critical questions in a matter of seconds. Read the books on your Bookshelf from cover to cover or simply flip to the page you need. Try it today for free.

Programming Collective Intelligence
Building Smart Web 2.0 Applications

Toby Segaran

Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo

Programming Collective Intelligence
by Toby Segaran

Copyright © 2007 Toby Segaran. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Mary Treseler O’Brien
Production Editor: Sarah Schneider
Copyeditor: Amy Thomson
Proofreader: Sarah Schneider
Indexer: Julie Hawks
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrators: Robert Romano and Jessamyn Read

Printing History:
August 2007: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Programming Collective Intelligence, the image of King penguins, and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks.
Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book uses RepKover™, a durable and flexible lay-flat binding.

ISBN-10: 0-596-52932-5
ISBN-13: 978-0-596-52932-1

Table of Contents

Preface

1. Introduction to Collective Intelligence
   What Is Collective Intelligence?
   What Is Machine Learning?
   Limits of Machine Learning
   Real-Life Examples
   Other Uses for Learning Algorithms

2. Making Recommendations
   Collaborative Filtering
   Collecting Preferences
   Finding Similar Users
   Recommending Items
   Matching Products
   Building a del.icio.us Link Recommender
   Item-Based Filtering
   Using the MovieLens Dataset
   User-Based or Item-Based Filtering?
   Exercises

3. Discovering Groups
   Supervised versus Unsupervised Learning
   Word Vectors
   Hierarchical Clustering
   Drawing the Dendrogram
   Column Clustering
   K-Means Clustering
   Clusters of Preferences
   Viewing Data in Two Dimensions
   Other Things to Cluster
   Exercises

4. Searching and Ranking
   What’s in a Search Engine?
   A Simple Crawler
   Building the Index
   Querying
   Content-Based Ranking
   Using Inbound Links
   Learning from Clicks
   Exercises

5. Optimization
   Group Travel
   Representing Solutions
   The Cost Function
   Random Searching
   Hill Climbing
   Simulated Annealing
   Genetic Algorithms
   Real Flight Searches
   Optimizing for Preferences
   Network Visualization
   Other Possibilities
   Exercises

6. Document Filtering
   Filtering Spam
   Documents and Words
   Training the Classifier
   Calculating Probabilities
   A Naïve Classifier
   The Fisher Method
   Persisting the Trained Classifiers
   Filtering Blog Feeds
   Improving Feature Detection
   Using Akismet
   Alternative Methods
   Exercises

7. Modeling with Decision Trees
   Predicting Signups
   Introducing Decision Trees
   Training the Tree
   Choosing the Best Split
   Recursive Tree Building
   Displaying the Tree
   Classifying New Observations
   Pruning the Tree
   Dealing with Missing Data
   Dealing with Numerical Outcomes
   Modeling Home Prices
   Modeling “Hotness”
   When to Use Decision Trees
   Exercises

8. Building Price Models
   Building a Sample Dataset
   k-Nearest Neighbors
   Weighted Neighbors
   Cross-Validation
   Heterogeneous Variables
   Optimizing the Scale
   Uneven Distributions
   Using Real Data—the eBay API
   When to Use k-Nearest Neighbors
   Exercises

9. Advanced Classification: Kernel Methods and SVMs
   Matchmaker Dataset
   Difficulties with the Data
   Basic Linear Classification
   Categorical Features
   Scaling the Data

[...]

... of collective intelligence, and the proliferation of new services means there are new opportunities appearing every day. I believe that understanding machine learning and statistical methods will become ever more important in a wide variety of fields, but particularly in interpreting and organizing the vast amount of information that is being created by people all over the world.

What Is Collective Intelligence?

... Wikipedia has more entries than any other encyclopedia, and despite some manipulation by malicious users, it is generally believed to be accurate on most subjects. This is an example of collective intelligence because each article is maintained by a large group of people and the result is an encyclopedia far larger than any single coordinated ...

... in a search. This is a very different example of collective intelligence. Where Wikipedia explicitly invites users of the site to contribute, Google extracts the important information from what web-content creators do on their own sites and uses it to generate scores for its users. While Wikipedia is a great resource and an impressive example of collective intelligence, it owes its existence much more to ...

... learn about a few machine-learning algorithms, you’ll start seeing places to apply them just about everywhere.

Chapter 2. Making Recommendations

To begin the tour of collective intelligence, I’m going to show you ways to use the preferences of a group of people to make recommendations to other people. There are many applications ...

... into your product’s documentation does require permission. We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Programming Collective Intelligence by Toby Segaran. Copyright 2007 Toby Segaran, 978-0-596-52932-1.” If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact ...

... item multiplied by 10 as the value:

{1:10,2:20,3:30,4:40,5:50,6:60,7:70,8:80,9:90}

Open APIs

The algorithms for synthesizing collective intelligence require data from many users. In addition to machine-learning algorithms, this book discusses a number of Open Web APIs (application programming interfaces). These are ways that companies allow you to freely access data from their web sites by means of a specified ...

... machine-learning algorithms and statistical methods. This combination will allow you to set up collective intelligence methods on data collected from your own applications, and also to collect and experiment with data from other places.

What Is Machine Learning?

Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. What this means, in most ...

... phrase collective intelligence for decades, and it has become increasingly popular and more important with the advent of new communications technologies. Although the expression may bring to mind ideas of group consciousness or supernatural phenomena, when technologists use this phrase they usually mean the combining of behavior, preferences, or ideas of a group of people to create novel insights. Collective ...

... harder without your support and I certainly would have missed out on some of the more entertaining examples.

Chapter 1. Introduction to Collective Intelligence

Netflix is an online DVD rental company that lets people choose movies to be sent to their homes, and makes recommendations based on the movies that customers have previously rented ...

... this book are written in Python, and familiarity with Python programming will help, but I provide explanations of all the algorithms so that programmers of other languages can follow. The Python code will be particularly easy to follow for those who know high-level languages like Ruby or Perl. This book is not intended as a guide for learning programming, so it’s important that you’ve done enough coding ...

Date posted: 31/03/2014, 01:20
