Mining the Social Web, 2nd Edition


DOCUMENT INFORMATION

Structure

  • Copyright

  • Table of Contents

  • Preface

    • README.1st

    • Managing Your Expectations

    • Python-Centric Technology

    • Improvements Specific to the Second Edition

    • Conventions Used in This Book

    • Using Code Examples

    • Safari® Books Online

    • How to Contact Us

    • Acknowledgments for the Second Edition

    • Acknowledgments from the First Edition

  • Part I. A Guided Tour of the Social Web

    • Prelude

    • Chapter 1. Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More

      • 1.1. Overview

      • 1.2. Why Is Twitter All the Rage?

      • 1.3. Exploring Twitter’s API

        • 1.3.1. Fundamental Twitter Terminology

        • 1.3.2. Creating a Twitter API Connection

        • 1.3.3. Exploring Trending Topics

        • 1.3.4. Searching for Tweets

      • 1.4. Analyzing the 140 Characters

        • 1.4.1. Extracting Tweet Entities

        • 1.4.2. Analyzing Tweets and Tweet Entities with Frequency Analysis

        • 1.4.3. Computing the Lexical Diversity of Tweets

        • 1.4.4. Examining Patterns in Retweets

        • 1.4.5. Visualizing Frequency Data with Histograms

      • 1.5. Closing Remarks

      • 1.6. Recommended Exercises

      • 1.7. Online Resources

    • Chapter 2. Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More

      • 2.1. Overview

      • 2.2. Exploring Facebook’s Social Graph API

        • 2.2.1. Understanding the Social Graph API

        • 2.2.2. Understanding the Open Graph Protocol

      • 2.3. Analyzing Social Graph Connections

        • 2.3.1. Analyzing Facebook Pages

        • 2.3.2. Examining Friendships

      • 2.4. Closing Remarks

      • 2.5. Recommended Exercises

      • 2.6. Online Resources

    • Chapter 3. Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More

      • 3.1. Overview

      • 3.2. Exploring the LinkedIn API

        • 3.2.1. Making LinkedIn API Requests

        • 3.2.2. Downloading LinkedIn Connections as a CSV File

      • 3.3. Crash Course on Clustering Data

        • 3.3.1. Clustering Enhances User Experiences

        • 3.3.2. Normalizing Data to Enable Analysis

        • 3.3.3. Measuring Similarity

        • 3.3.4. Clustering Algorithms

      • 3.4. Closing Remarks

      • 3.5. Recommended Exercises

      • 3.6. Online Resources

    • Chapter 4. Mining Google+: Computing Document Similarity, Extracting Collocations, and More

      • 4.1. Overview

      • 4.2. Exploring the Google+ API

        • 4.2.1. Making Google+ API Requests

      • 4.3. A Whiz-Bang Introduction to TF-IDF

        • 4.3.1. Term Frequency

        • 4.3.2. Inverse Document Frequency

        • 4.3.3. TF-IDF

      • 4.4. Querying Human Language Data with TF-IDF

        • 4.4.1. Introducing the Natural Language Toolkit

        • 4.4.2. Applying TF-IDF to Human Language

        • 4.4.3. Finding Similar Documents

        • 4.4.4. Analyzing Bigrams in Human Language

        • 4.4.5. Reflections on Analyzing Human Language Data

      • 4.5. Closing Remarks

      • 4.6. Recommended Exercises

      • 4.7. Online Resources

    • Chapter 5. Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More

      • 5.1. Overview

      • 5.2. Scraping, Parsing, and Crawling the Web

        • 5.2.1. Breadth-First Search in Web Crawling

      • 5.3. Discovering Semantics by Decoding Syntax

        • 5.3.1. Natural Language Processing Illustrated Step-by-Step

        • 5.3.2. Sentence Detection in Human Language Data

        • 5.3.3. Document Summarization

      • 5.4. Entity-Centric Analysis: A Paradigm Shift

        • 5.4.1. Gisting Human Language Data

      • 5.5. Quality of Analytics for Processing Human Language Data

      • 5.6. Closing Remarks

      • 5.7. Recommended Exercises

      • 5.8. Online Resources

    • Chapter 6. Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More

      • 6.1. Overview

      • 6.2. Obtaining and Processing a Mail Corpus

        • 6.2.1. A Primer on Unix Mailboxes

        • 6.2.2. Getting the Enron Data

        • 6.2.3. Converting a Mail Corpus to a Unix Mailbox

        • 6.2.4. Converting Unix Mailboxes to JSON

        • 6.2.5. Importing a JSONified Mail Corpus into MongoDB

        • 6.2.6. Programmatically Accessing MongoDB with Python

      • 6.3. Analyzing the Enron Corpus

        • 6.3.1. Querying by Date/Time Range

        • 6.3.2. Analyzing Patterns in Sender/Recipient Communications

        • 6.3.3. Writing Advanced Queries

        • 6.3.4. Searching Emails by Keywords

      • 6.4. Discovering and Visualizing Time-Series Trends

      • 6.5. Analyzing Your Own Mail Data

        • 6.5.1. Accessing Your Gmail with OAuth

        • 6.5.2. Fetching and Parsing Email Messages with IMAP

        • 6.5.3. Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension

      • 6.6. Closing Remarks

      • 6.7. Recommended Exercises

      • 6.8. Online Resources

    • Chapter 7. Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More

      • 7.1. Overview

      • 7.2. Exploring GitHub’s API

        • 7.2.1. Creating a GitHub API Connection

        • 7.2.2. Making GitHub API Requests

      • 7.3. Modeling Data with Property Graphs

      • 7.4. Analyzing GitHub Interest Graphs

        • 7.4.1. Seeding an Interest Graph

        • 7.4.2. Computing Graph Centrality Measures

        • 7.4.3. Extending the Interest Graph with “Follows” Edges for Users

        • 7.4.4. Using Nodes as Pivots for More Efficient Queries

        • 7.4.5. Visualizing Interest Graphs

      • 7.5. Closing Remarks

      • 7.6. Recommended Exercises

      • 7.7. Online Resources

    • Chapter 8. Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More

      • 8.1. Overview

      • 8.2. Microformats: Easy-to-Implement Metadata

        • 8.2.1. Geocoordinates: A Common Thread for Just About Anything

        • 8.2.2. Using Recipe Data to Improve Online Matchmaking

        • 8.2.3. Accessing LinkedIn’s 200 Million Online Résumés

      • 8.3. From Semantic Markup to Semantic Web: A Brief Interlude

      • 8.4. The Semantic Web: An Evolutionary Revolution

        • 8.4.1. Man Cannot Live on Facts Alone

        • 8.4.2. Inferencing About an Open World

      • 8.5. Closing Remarks

      • 8.6. Recommended Exercises

      • 8.7. Online Resources

  • Part II. Twitter Cookbook

    • Chapter 9. Twitter Cookbook

      • 9.1. Accessing Twitter’s API for Development Purposes

        • 9.1.1. Problem

        • 9.1.2. Solution

        • 9.1.3. Discussion

      • 9.2. Doing the OAuth Dance to Access Twitter’s API for Production Purposes

        • 9.2.1. Problem

        • 9.2.2. Solution

        • 9.2.3. Discussion

      • 9.3. Discovering the Trending Topics

        • 9.3.1. Problem

        • 9.3.2. Solution

        • 9.3.3. Discussion

      • 9.4. Searching for Tweets

        • 9.4.1. Problem

        • 9.4.2. Solution

        • 9.4.3. Discussion

      • 9.5. Constructing Convenient Function Calls

        • 9.5.1. Problem

        • 9.5.2. Solution

        • 9.5.3. Discussion

      • 9.6. Saving and Restoring JSON Data with Text Files

        • 9.6.1. Problem

        • 9.6.2. Solution

        • 9.6.3. Discussion

      • 9.7. Saving and Accessing JSON Data with MongoDB

        • 9.7.1. Problem

        • 9.7.2. Solution

        • 9.7.3. Discussion

      • 9.8. Sampling the Twitter Firehose with the Streaming API

        • 9.8.1. Problem

        • 9.8.2. Solution

        • 9.8.3. Discussion

      • 9.9. Collecting Time-Series Data

        • 9.9.1. Problem

        • 9.9.2. Solution

        • 9.9.3. Discussion

      • 9.10. Extracting Tweet Entities

        • 9.10.1. Problem

        • 9.10.2. Solution

        • 9.10.3. Discussion

      • 9.11. Finding the Most Popular Tweets in a Collection of Tweets

        • 9.11.1. Problem

        • 9.11.2. Solution

        • 9.11.3. Discussion

      • 9.12. Finding the Most Popular Tweet Entities in a Collection of Tweets

        • 9.12.1. Problem

        • 9.12.2. Solution

        • 9.12.3. Discussion

      • 9.13. Tabulating Frequency Analysis

        • 9.13.1. Problem

        • 9.13.2. Solution

        • 9.13.3. Discussion

      • 9.14. Finding Users Who Have Retweeted a Status

        • 9.14.1. Problem

        • 9.14.2. Solution

        • 9.14.3. Discussion

      • 9.15. Extracting a Retweet’s Attribution

        • 9.15.1. Problem

        • 9.15.2. Solution

        • 9.15.3. Discussion

      • 9.16. Making Robust Twitter Requests

        • 9.16.1. Problem

        • 9.16.2. Solution

        • 9.16.3. Discussion

      • 9.17. Resolving User Profile Information

        • 9.17.1. Problem

        • 9.17.2. Solution

        • 9.17.3. Discussion

      • 9.18. Extracting Tweet Entities from Arbitrary Text

        • 9.18.1. Problem

        • 9.18.2. Solution

        • 9.18.3. Discussion

      • 9.19. Getting All Friends or Followers for a User

        • 9.19.1. Problem

        • 9.19.2. Solution

        • 9.19.3. Discussion

      • 9.20. Analyzing a User’s Friends and Followers

        • 9.20.1. Problem

        • 9.20.2. Solution

        • 9.20.3. Discussion

      • 9.21. Harvesting a User’s Tweets

        • 9.21.1. Problem

        • 9.21.2. Solution

        • 9.21.3. Discussion

      • 9.22. Crawling a Friendship Graph

        • 9.22.1. Problem

        • 9.22.2. Solution

        • 9.22.3. Discussion

      • 9.23. Analyzing Tweet Content

        • 9.23.1. Problem

        • 9.23.2. Solution

        • 9.23.3. Discussion

      • 9.24. Summarizing Link Targets

        • 9.24.1. Problem

        • 9.24.2. Solution

        • 9.24.3. Discussion

      • 9.25. Analyzing a User’s Favorite Tweets

        • 9.25.1. Problem

        • 9.25.2. Solution

        • 9.25.3. Discussion

      • 9.26. Closing Remarks

      • 9.27. Recommended Exercises

      • 9.28. Online Resources

  • Part III. Appendixes

    • Appendix A. Information About This Book’s Virtual Machine Experience

    • Appendix B. OAuth Primer

      • Overview

        • OAuth 1.0A

        • OAuth 2.0

    • Appendix C. Python and IPython Notebook Tips & Tricks

  • Index

  • About the Author

Content

SECOND EDITION

Mining the Social Web

Matthew A. Russell

Mining the Social Web, Second Edition
by Matthew A. Russell

Copyright © 2014 Matthew A. Russell. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Mary Treseler
Production Editor: Kristen Brown
Copyeditor: Rachel Monaghan
Proofreader: Rachel Head
Indexer: Lucie Haskins
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Rebecca Demarest

October 2013: Second Edition

Revision History for the Second Edition:
2013-09-25: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449367619 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Mining the Social Web, the image of a groundhog, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-36761-9

[LSI]

If the ax is dull and its edge unsharpened, more strength is needed, but skill will bring success.
Ecclesiastes 10:10

...

... http://bit.ly/MiningTheSocialWeb2E ...

Improvements Specific to the Second Edition

When I began working on this second edition of Mining the Social Web, I don’t ...

... according to the OSS license under which the code is released. An attribution usually includes the title, author, publisher, and ISBN. For example: Mining the Social Web, 2nd Edition, by Matthew A. Russell ...

Posted: 27/03/2019, 14:10
