www.it-ebooks.info

Praise for Mining the Social Web

“Mining the Social Web is a must-read as data is distributed at a dizzying pace. A great primer for API jockeys, social media junkies, and data scientists alike, [Matthew] Russell deftly distills the prodigious opportunity in mining social media data.”
— Nick Ducoff, CEO of Infochimps, Inc.

“This is an essential guide to tapping the new generation of online data sources. Russell has done a great job creating an accessible manual for anyone working with social information on the web, covering both how to access it and simple methods for extracting surprising insights from all that raw data.”
— Pete Warden, Founder of OpenHeatMap.com

“Mining the Social Web is now my go-to book for any project that involves analyzing social data. It contains a multitude of useful examples and is highly recommended for any data mining project you’re considering. Great for beginners and advanced readers alike.”
— Abe Music, Principal, Zaffra

“This book is clearly a labor of love for the author. He has deftly woven together the use of classic text and graph mining libraries with current social media applications. Examples are concrete and concise while providing useful insights that facilitate future development and exploration by the reader. This text is a great primer for those just beginning their forays into extracting understanding from social networks, and also for advanced researchers needing access to the latest social media APIs.”
— Chris Augeri, Senior Research Fellow, University of Nebraska

“This is a phenomenal book for anyone wanting to get started mining social data. It is well-researched and provides plenty of examples to get one going from the very first chapter. It is also very easy to follow and a real pleasure to read.
This book is my first recommendation for anyone interested in the mining, analysis, and visualization of data from the social web.”
— Jeffrey Humphries, PhD; Computer Scientist

“Few things will impact us the way automated understanding of human communication by software will in the coming years. This subject is broad and deep. It has been the subject of thousands of papers and hundreds of dissertations. What Matthew has pulled together is something that has really been missing: an applied introduction to a diverse and deep set of technologies and topics that make the knowledge buried in human communication inside the social web accessible. It is the work of a powerful technologist—someone who can equip capable programmers with new tools that are truly valuable. Read this book. It will open up doors to where software is going in the next decade.”
— Tim Estes, Founder and CEO, Digital Reasoning

“Mining the Social Web is a great resource on how to get the most out of the Twitter API.”
— Raffi Krikorian, Platform Services group, Twitter

“Matthew covers an interesting and eclectic group of data sources, analysis techniques, data management tools, and visualizations that provide a thorough survey of the latest thinking on how to gain insight from the social web. His examples are vivid and serve as great starting points for further exploration. Matthew clearly cares that the reader understands the material; the book is chock full of timely, knowing, and truly helpful hints and advice. Mining the Social Web has me excited to dive further into this rich area of analysis.”
— Roger Magoulas, Director of Market Research, O’Reilly Media

Mining the Social Web
Matthew A.
Russell

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Mining the Social Web
by Matthew A. Russell

Copyright © 2011 Matthew Russell. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Mike Loukides
Production Editor: Adam Zaremba
Copyeditor: Rachel Head
Proofreader: Marlowe Shaeffer
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
January 2011: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Mining the Social Web, the image of a groundhog, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book uses RepKover™, a durable and flexible lay-flat binding.

ISBN: 978-1-449-38834-8
[M]
1294936576

To those seeking knowledge and wisdom:
Use wisdom and understanding to establish your home;
Let good sense fill the rooms with priceless treasures.
Wisdom brings strength, and knowledge gives power.
Battles are won by listening to advice and making a lot of plans.
May you find knowledge and wisdom.

[...]

... tweet of “RT @SocialWebMining Justin Bieber is on SNL 2nite w00t?!?” would indicate that the sender is retweeting information gained via the user @SocialWebMining. An equivalent form of the retweet would be “Justin Bieber is on SNL 2nite w00t?!? Ummm…(via @SocialWebMining)”

Extracting relationships from the tweets

Because the social web is first and foremost about the linkages between people in the real...

    ... SNL 2nite w00t?!? (via @SocialWebMining)"]
    >>> for t in example_tweets:
    ...     rt_patterns.findall(t)
    ...
    [('RT', ' @SocialWebMining')]
    [('via', ' @SocialWebMining')]

In case it’s not obvious, the call to findall returns a list of tuples in which each tuple contains either the matching text or an empty string for each group in the pattern; note that the regex does leave a leading space on the extracted entities,...

... off the Web,† and they are enabling technology to bring out the best (and sometimes the worst) in us. The explosion of social networks is just one of the ways that the gap between the real world and cyberspace is continuing to narrow. Generally speaking, each chapter of this book interlaces slivers of the social web along with data mining, analysis, and visualization techniques to answer the following kinds...

... terms that prohibit the use of their data outside of their platforms, but at the moment, it’s par for the course. Most social networking sites are like walled gardens, but from their standpoint (and the standpoint of their investors) a lot of the value these companies offer currently relies on controlling the platforms and protecting the privacy of their users; it’s a tough balance to maintain and probably...
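The retweet-matching session excerpted above can be reproduced as a small, self-contained sketch. The excerpt elides the definition of rt_patterns, so the regular expression below is an assumption chosen to reproduce the two outputs shown. retweet_sources is a hypothetical helper, not part of the original listing, that removes the leading space and the @ sign the text points out.

```python
import re

# Assumed pattern (the excerpt does not show the original definition):
# group 1 captures the "RT" or "via" marker, group 2 captures one or more
# @mentions that follow it, including any leading non-word characters.
rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)

example_tweets = [
    "RT @SocialWebMining Justin Bieber is on SNL 2nite w00t?!?",
    "Justin Bieber is on SNL 2nite w00t?!? (via @SocialWebMining)",
]


def retweet_sources(tweet):
    """Return the screen names (without '@') credited through RT or via."""
    sources = []
    for _marker, mentions in rt_patterns.findall(tweet):
        # One match may hold several mentions, e.g. "RT @a @b".
        sources.extend(re.findall(r"@(\w+)", mentions))
    return sources


if __name__ == "__main__":
    for t in example_tweets:
        # Raw tuples first, then the cleaned-up screen names.
        print(rt_patterns.findall(t), retweet_sources(t))
```

Because the pattern has two groups, findall yields one (marker, mentions) tuple per match. Pulling the screen names out with a second, narrower pattern keeps the cleanup separate from the matching, which makes it easier to swap in a different retweet convention later.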
... in the following chapters are available for download at GitHub at https://github.com/ptwobrussell/Mining-the-Social-Web, the official code repository for this book. You are encouraged to monitor this repository for the latest bug-fixed code as well as extended examples by the author and the rest of the social coding community.

This book is here to help you get your job done. In general, you may use the...

... terminology rather than web-centric terminology as they simultaneously promote graph-based APIs. In fact, Tim Berners-Lee has suggested that perhaps he should have used the term Giant Global Graph (GGG) instead of World Wide Web (WWW), because the terms “web” and “graph” can be so freely interchanged in the context of defining a topology for the Internet. Whether the fullness of Tim Berners...

* See the opening...

... liberally hyperlinked, and it is assumed that you’d rather look them up online than rely on inevitably stale copies in this printed book. The official GitHub repository that maintains the latest and greatest bug-fixed source code for this book is http://github.com/ptwobrussell/Mining-the-Social-Web. The official Twitter account for this book is @SocialWebMining. This book is also not recommended if you need...

... vision will ever be realized remains to be seen, but the Web as we know it is getting richer and richer with social data all the time. When we look back years from now, it may well seem obvious that the second- and third-level effects created by an inherently social web were necessary enablers for the realization of a truly semantic web. The gap between the two seems to be closing.

Or Not to Read This Book?...
... creation than a technical one. I designed it for a social effect—to help people work together—and not as a technical toy. The ultimate goal of the Web is to support and improve our weblike existence in the world. We clump into families, associations, and companies. We develop trust across the miles and distrust around the corner.
—Tim Berners-Lee, Weaving the Web (Harper)

To Read This Book?

If you have a basic...

... whatsoever about the legal ramifications of what you may decide to do with the data that’s made available to you from social networking sites, although it does sincerely attempt to comply with the letter and spirit of the terms governing the particular sites that are mentioned. It may seem unfortunate that many of the most popular social networking sites have licensing terms that prohibit the use of their data...

Social networks really are changing the way we live our lives on and off the Web,† and they are enabling technology to bring out the best (and sometimes the worst) in us. The explosion of social...