Practical Machine Learning: Innovations in Recommendation

by Ted Dunning and Ellen Friedman

Copyright © 2014 Ted Dunning and Ellen Friedman. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Mike Loukides

January 2014: First Edition

Revision History for the First Edition:

2014-01-22: First release
2014-08-15: Second release

See http://oreilly.com/catalog/errata.csp?isbn=9781491915387 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Practical Machine Learning: Innovations in Recommendation and related trade dress are trademarks of O'Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-491-91538-7

Table of Contents

Practical Machine Learning
    What's a Person To Do?
    Making Recommendation Approachable
Careful Simplification
    Behavior, Co-occurrence, and Text Retrieval
Design of a Simple Recommender
    What I Do, Not What I Say
    Collecting Input Data
Co-occurrence and Recommendation
    How Apache Mahout Builds a Model
    Relevance Score
Deploy the Recommender
    What Is Apache Solr/Lucene?
    Why Use Apache Solr/Lucene to Deploy?
    What's the Connection Between Solr and Co-occurrence Indicators?
    How the Recommender Works
    Two-Part Design
Example: Music Recommender
    Business Goal of the Music Machine
    Data Sources
    Recommendations at Scale
    A Peek Inside the Engine
    Using Search to Make the Recommendations
Making It Better
    Dithering
    Anti-flood
    When More Is More: Multimodal and Cross Recommendation
Lessons Learned
A. Additional Resources

Chapter 1. Practical Machine Learning

A key to one of the most sophisticated and effective approaches in machine learning and recommendation is contained in the observation: "I want a pony." As it turns out, building a simple but powerful recommender is much easier than most people think, and wanting a pony is part of the key.

Machine learning, especially at the scale of huge datasets, can be a daunting task. There is a dizzying array of algorithms from which to choose, and just making the choice between them presupposes that you have a sufficiently advanced mathematical background to understand the alternatives and make a rational choice. The options are also changing, evolving constantly as a result of the work of some very bright, very dedicated researchers who are continually refining existing algorithms and coming up with new ones.

What's a Person To Do?
The good news is that there's a new trend in machine learning and particularly in recommendation: very simple approaches are proving to be very effective in real-world settings. Machine learning is moving from the research arena into the pragmatic world of business. In that world, time to reflect is very expensive, and companies generally can't afford to have systems that require armies of PhDs to run them. Practical machine learning weighs the trade-offs between the most advanced and accurate modeling techniques and the costs in real-world terms: what approaches give the best results in a cost-benefit sense?

Let's focus just on recommendation. As you look around, it's obvious that some very large companies have for some years put machine learning into use at large scale (see Figure 1-1).

Figure 1-1. What does recommendation look like?

As you order items from Amazon, a section lower on the screen suggests other items that might be of interest, whether it be O'Reilly books, toys, or collectible ceramics. The items suggested for you are based on items you've viewed or purchased previously. Similarly, your video-viewing choices on Netflix influence the videos suggested to you for future viewing. Even Google Maps adjusts what you see depending on what you request; for example, if you search for a tech company in a map of Silicon Valley, you'll see that company and other tech companies in the area. If you search in that same area for the location of a restaurant, other restaurants are now marked in the area. (And maybe searching for a big data meetup should give you technology companies plus pizza places.)

But what does machine learning recommendation look like under the covers? Figure 1-2 shows the basics.

Figure 1-2. The math may be scary, but if approached in the right way, the concepts underlying how to build a recommender are easily understood.

If you love matrix algebra, this figure is probably a form of comfort food. If not, you may be among the majority of people looking for solutions to machine-learning problems who want something more approachable. As it turns out, there are some innovations in recommendation that make it much easier and more powerful for people at all levels of expertise.

There are a few ways to deal with the challenge of designing recommendation engines. One is to have your own team of engineers and data scientists, all highly trained in machine learning, to custom design recommenders to meet your needs. Big companies such as Google, Twitter, and Yahoo! are able to take that approach, with some very valuable results.

Other companies, typically smaller ones or startups, hope for success with products that offer drag-and-drop approaches that simply require them to supply a data source, click on an algorithm, and look for easily understandable results to pop out via nice visualization tools. There are lots of new companies trying to design such semi-automated products, and given the widespread desire for a turnkey solution, many of these new products are likely to be financially successful.
But designing really effective recommendation systems requires some careful thinking, especially about the choice of data and how it is handled. This is true even if you have a fairly automated way of selecting and applying an algorithm. Getting a recommendation model to run is one thing; getting it to provide effective recommendations is quite a lot of work. Surprisingly to some, the fancy math and algorithms are only a small part of that effort. Most of the effort required to build a good recommendation system is put into getting the right data to the recommendation engine in the first place.

If you can afford it, a different way to get a recommendation system is to use the services of a high-end machine-learning consultancy. Some of these companies have the technical expertise necessary to supply stunningly fast and effective models, including recommenders. One way they achieve these results is by throwing a huge collection of algorithms at each problem, and—based on extensive experience in analyzing such situations—selecting the algorithm that gives the best outcome. SkyTree is an example of this type of company, with its growing track record of effective machine-learning models built to order for each customer.

Making Recommendation Approachable

A final approach is to do it yourself, even if you or your company lack access to a team of data scientists. In the past, this hands-on approach would have been a poor option for small teams. Now, with new developments in algorithms and architecture, small-scale development teams can build large-scale projects. As machine learning becomes more practical and approachable, and with some of the innovations and suggestions in this paper, the self-built recommendation engine becomes much easier and more effective than you may think.

Why is this happening? Resources for Apache Hadoop–based computing are evolving and rapidly spreading, making projects with very large-scale datasets much more approachable and affordable. And the ability to collect and save more data from web logs, sensor data, social media, etc., means that the size and number of large datasets is also growing.

How is this happening?
Making recommendation practical depends in part on making it simple. But not just any simplification will do, as explained in Chapter 2.

Chapter 2. Careful Simplification

    Make things as simple as possible, but not simpler.
    — Roger Sessions, simplifying Einstein's quote

"Keep it simple" is becoming the mantra for successful work in the big data sphere, especially for Hadoop-based computing. Every step saved in an architectural design not only saves time (and therefore money), but it also prevents problems down the road. Extra steps leave more chances for operational errors to be introduced. In production, having fewer steps makes it easier to focus effort on steps that are essential, which helps keep big projects operating smoothly. Clean, streamlined architectural design, therefore, is a useful goal.

But choosing the right way to simplify isn't all that simple—you need to be able to recognize when and how to simplify for best effect. A major skill in doing so is to be able to answer the question, "How good is good?" In other words, sometimes there is a trade-off between simple designs that produce effective results and designs with additional layers of complexity that may be more accurate on the same data. The added complexity may give a slight improvement, but in the end, is this improvement worth the extra cost? A nominally more accurate but considerably more complex system may fail so often that the net result is lower overall performance. A complex system may also be so difficult to implement that it distracts from other tasks with a higher payoff, and that is very expensive.

This is not to say that complexity is never advantageous. There certainly are systems where the simple solution is not good enough and where complexity pays off. Google's search engine is one such example.

Chapter 7. Making It Better

The two-part design for the basic recommender we've been discussing is a full-scale system capable of producing high-quality recommendations. Like any machine-learning system, success depends in part on repeated cycles of testing, evaluation, and tuning to achieve the desired results. Evaluation is important not only to decide when a recommender is ready to be deployed, but also as an ongoing effort in production. By its nature, the model will change over time as it's exposed to new user histories—in other words, the system learns. A recommender should be evaluated not only on present performance but also on how well it is set up to perform in the future.

As we pointed out in Chapter 2, as the developer or project director, you must also decide how good is good or, more specifically, which criteria define success in your situation—there isn't just one yardstick of quality. Trade-offs are individualized, and goals must be set appropriately for the project. For example, the balance between extreme accuracy in predictions or relevance and the need for quick response or realistic levels of development effort may be quite different for a big e-commerce site when compared to a personalized medicine project. Machine learning is an automated technology, but human insight is required to determine the desired and acceptable results, and thus what constitutes success.

In practical recommendation, it's also important to put your effort where it pays off the most. In addition to the ongoing testing and adjusting to make a recommender better, there are also several add-on capabilities that are important in a real-world deployment of such a system.
These add-ons are, strictly speaking, a bit outside the scope of the recommender itself and have to do with how people interact with a recommender as opposed to how a recommender works in isolation. Even if they are outside the recommender itself, these add-ons can still have a profound effect on the perceived quality of the overall recommendation system.

Dithering

The surprising thing about the technique known as dithering is that its approach is to make things worse in order to make them better. Recall that the order in which items are recommended depends on their relevance score. The basic approach in relevance dithering is to shake things up by intentionally including in a list of the top hits a few items with much smaller (i.e., less desirable) relevance. Why?

The idea is motivated by the observation that users don't generally look beyond the first screenful of results produced by a search or recommendation engine. You can see this if you plot the click-through rate versus result position in the search results (called rank here) for all search or recommendation results. Most likely, you will see something a lot like what's shown in Figure 7-1. Click-through will generally decline as rank increases due to decreasing relevance. At about rank 10, users will have to scroll the screen to see more results, but many won't bother. Then at rank 20, even fewer will click to the next page.

Figure 7-1. Why dithering is useful. Behavior of visitors to a website shows that recommendations that appear on the second or later pages are almost never seen by users and therefore do not provide critical feedback to the recommender.

This behavior can have a profound effect on a recommendation engine, because if users never see the results on the second and later pages, they won't provide the recommendation engine with behavioral feedback on whether these second-page results were actually any good. As a result, the recommendation engine mostly gets feedback on results that it already knows about and gets very little feedback on results at the edge of current knowledge. This limitation causes the recommendation engine to stagnate at or near the initial performance level. It does not continue to learn.

On the other hand, if the result lists are shuffled a bit, then results from the second or even later pages have a chance of appearing on the first page, possibly even above the fold. Although this change slightly dilutes the initial relevance of the top recommendations, in the long run, the system has a chance to discover excellent recommendations that it would otherwise not know about. When that happens, the engine will quickly start incorporating that discovery into mainstream results. Once again, the recommender learns.

Dithering broadens the training data that's fed to the recommendation engine. Even if accuracy is adversely impacted during the first few days of operation, the broader feedback quickly improves the accuracy well above initial levels. In fact, some recommendation-system designers have told us that introducing dithering resulted in a greater improvement in quality than any other single change.

The implementation of dithering is quite simple. One way is to take the result list and generate a score that is the log of the initial rank of each result (r) combined with normally distributed random noise, then sort the results according to that score. This approach leaves the top few results in nearly their original order, but depending on how large the random noise is, results that would otherwise be deeply buried can be lifted onto the first page of results.
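Expressed as a minimal Python sketch (the noise scale epsilon, the fixed seed, and the track-ID items are illustrative assumptions, not details from any production system):

    import math
    import random

    def dither(results, epsilon=0.3, seed=None):
        # Score each result by log(rank) plus Gaussian noise, then
        # re-sort. Smaller scores sort first, so the top few items
        # tend to keep their places while deeply buried items are
        # occasionally lifted onto the first page.
        rng = random.Random(seed)
        scored = [(math.log(rank) + rng.gauss(0.0, epsilon), item)
                  for rank, item in enumerate(results, start=1)]
        scored.sort(key=lambda pair: pair[0])
        return [item for _, item in scored]

    # Example: reorder 30 ranked track IDs. Holding the seed constant
    # keeps the page stable across reloads until the seed is changed.
    page = dither(["track-%d" % i for i in range(1, 31)], seed=42)

How large epsilon should be is a tuning choice: larger values lift deeply buried results more often, at the cost of more disruption to the top of the list.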
Dithering also has the surprising effect of increasing user stickiness. This happens because the recommendation page changes each time the seed for the randomization changes. It is common to keep the seed constant for minutes at a time. The change in the contents of the top few recommendations when the seed does change seems to intrigue users into repeatedly returning to the recommendation page. Paradoxically, users who don't normally click to the second page of results seem to be happy to return to the first page over and over to get additional results.

Anti-flood

Most recommendation algorithms, including the one discussed in this paper, can give you too much of a good thing. Once it zeros in on your favorite book, music, video, or whatever, any recommendation engine that works on individual items is likely to give you seemingly endless variations on the same theme if such variations can be found. It is much better to avoid monotony in the user experience by providing diversity in recommendations, with no more than a few of each kind of result. This approach also protects against having several kinds of results obscured by one popular kind.

It is conceivable to build this preference for diversity into the recommendation engine itself, but our experience has been that it is much easier to ruin an otherwise good recommendation engine than it is to get diverse results out of the engine while maintaining overall quality. As a precaution, it is much easier to simply reorder the recommendations to make the results appear more diverse.

To do this, many working recommendation systems have heuristic rules known collectively as anti-flood measures. The way that these systems work is that they will penalize the rank of any results that appear too similar to higher-ranked results. For instance, the second song by the same artist might not be penalized, but the third song by the same artist might be penalized by 20 result positions. This example of penalizing the same artist is just one way of implementing anti-flood. Many others are plausible, and which ones work best on your data is highly idiosyncratic to your own situation.
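A minimal sketch of such a penalty rule, assuming the artist-based grouping and the 20-position penalty from the example above (the two-songs-allowed threshold is likewise only illustrative):

    def anti_flood(results, group_of, allowed=2, penalty=20):
        # Demote any result whose similarity group (here, the artist)
        # has already appeared `allowed` times higher in the list.
        counts = {}
        adjusted = []
        for rank, item in enumerate(results):
            group = group_of(item)
            counts[group] = counts.get(group, 0) + 1
            offset = penalty if counts[group] > allowed else 0
            adjusted.append((rank + offset, rank, item))
        # Sort by penalized rank; the original rank breaks ties so
        # the reordering stays stable.
        adjusted.sort(key=lambda triple: (triple[0], triple[1]))
        return [item for _, _, item in adjusted]

    # Example: the third and later tracks by one artist drop 20 places.
    tracks = [{"title": "A", "artist": "X"}, {"title": "B", "artist": "X"},
              {"title": "C", "artist": "X"}, {"title": "D", "artist": "Y"}]
    diverse = anti_flood(tracks, group_of=lambda t: t["artist"])

Because the penalty is applied as a reordering step after the recommender runs, it can be tuned or swapped out without touching the engine itself.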
When More Is More: Multimodal and Cross Recommendation

Throughout this discussion, we've talked about the power of simplification, but emphasized smart simplification. We have examined the design and functioning of a simple recommender, one in which a single kind of user interaction with a single kind of item is employed to suggest the same kind of interaction with the same kind of item. For example, we might recommend music tracks for listening based on user histories for tracks to which they and others have previously listened.

But if you have the luxury of going beyond this basic recommendation pattern, you may get much better results with a few simple additions to the design. Here's the basis for the added design elements. People don't just do one thing (like want a pony). They buy a variety of items, listen to music, watch videos, order travel tickets, browse websites, or comment on their lives in email and social media. In all these cases, there are multiple kinds of interactions with multiple kinds of items. Data for a variety of interactions and items is often available when building a recommender, providing a way to greatly enrich the input data for your recommender model and potentially improve the quality of recommendations.

Figure 7-2. Multimodal recommendations can improve results.

The basic idea behind this multimodal approach is depicted in Figure 7-2. The first example shows a simple recommendation pattern in which there is a match between the type of interaction item and the type of recommendation. For example, you could have user-viewing histories as input data to give recommendations for video viewing, such as, "you might like to watch these videos." The triangles in Figure 7-2 illustrate this situation for the first recommendation example.

Multimodal recommendation is shown as the more complicated example in the figure. Here, more than one type of behavior is used as input data to train the recommender. Even the recent event history that triggers realtime recommendation may not be the same type of behavior as what is being recommended. In this example, for instance, book buying or a query represents a new user event. In that case, the system recommends video viewing in response to a book purchase or a query instead of in response to video viewing. Your multimodal system is using a crossover of behavior to strengthen relevance or extend the system based on which new histories are available.

As it turns out, the matrix transformations depicted back in Figure 1-2 as a "look under the covers" for a machine-learning recommender happen to represent a multimodal recommendation. While multimodal or cross recommendations are more complicated than simple recommendations, they still are not out of reach. The good news is that the innovations already described here, such as using search technology like Solr/Lucene to deploy a recommendation system, still apply and make the next-level recommenders also relatively easy to implement.
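To make the cross-recommendation idea concrete, here is a toy sketch of the offline indicator computation in Python. It counts how often buying a given book co-occurs with watching a given video across users and keeps only the anomalously frequent pairs, in the spirit of the log likelihood ratio (LLR) anomaly test that Mahout uses for indicator computation. The in-memory dictionaries, the threshold of 10, and the field name mentioned below are illustrative assumptions; at scale this step would be done with a tool like Apache Mahout, with the resulting indicators indexed into a Solr/Lucene field on each video document.

    import math
    from collections import Counter, defaultdict

    def xlogx(x):
        return x * math.log(x) if x > 0 else 0.0

    def entropy(*counts):
        return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)

    def llr(k11, k12, k21, k22):
        # Dunning's log-likelihood ratio for a 2x2 contingency table:
        # large values mean the co-occurrence is anomalous, not chance.
        row = entropy(k11 + k12, k21 + k22)
        col = entropy(k11 + k21, k12 + k22)
        mat = entropy(k11, k12, k21, k22)
        return max(0.0, 2.0 * (row + col - mat))

    def cross_indicators(bought, watched, threshold=10.0):
        # bought: user -> set of books; watched: user -> set of videos.
        # Returns video -> list of indicator books, suitable for a
        # search field (e.g., "book_indicators") on each video document.
        users = set(bought) & set(watched)
        n = len(users)
        n_book = Counter(b for u in users for b in bought[u])
        n_video = Counter(v for u in users for v in watched[u])
        n_pair = Counter((b, v) for u in users
                         for b in bought[u] for v in watched[u])
        indicators = defaultdict(list)
        for (b, v), k11 in n_pair.items():
            k12 = n_book[b] - k11    # bought the book, skipped the video
            k21 = n_video[v] - k11   # watched the video, not the book
            k22 = n - k11 - k12 - k21
            if llr(k11, k12, k21, k22) > threshold:
                indicators[v].append(b)
        return indicators

    # Example (toy data; a low threshold just to show the mechanics):
    bought = {"u1": {"b1"}, "u2": {"b1"}, "u3": {"b2"}}
    watched = {"u1": {"v1"}, "u2": {"v1"}, "u3": {"v2"}}
    ind = cross_indicators(bought, watched, threshold=1.0)

At query time, the recent event history does the rest: a purchase of book b1 becomes a search query such as book_indicators:b1 against the video index, and the search engine's relevance scoring returns the videos to recommend.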
Chapter 8. Lessons Learned

Real-world projects have real-world budgets for resources and effort. It's important to keep that in mind in the move from cutting-edge academic research in machine learning to practical, deployable recommendation engines that work well in production and provide profitable results. So it matters to recognize which approaches can make the biggest difference for the effort expended.

Simplifications chosen wisely often make a huge difference in the practical approach to recommendation. The behavior of a crowd can provide valuable data to predict the relevance of recommendations to individual users. Interesting co-occurrence can be computed at scale with basic algorithms such as ItemSimilarityJob from the Apache Mahout library, making use of log likelihood ratio anomaly-detection tests. Weighting of the computed indicators improves their ability to predict relevance for recommendations.

One cost-effective simplification is the innovative use of search capabilities, such as those of Apache Solr/Lucene, to deploy a recommender at scale in production. Search-based, item-based recommendation underlies a two-part design for a recommendation engine that has offline learning and realtime online recommendations in response to recent user events. The result is a simple and powerful recommender that is much easier to build than many people would expect.

This two-part design for recommendation at large scale can be made easier and even more cost effective when built on a realtime distributed file system such as the one used by MapR. However, with some extra steps, this design for a recommender can be implemented on any Apache Hadoop-compatible distributed file system.

The performance quality of the recommender can generally be improved through rounds of evaluation and tuning, A/B testing, and adjustments in production, plus dithering and anti-flood tricks to keep the recommendation engine learning and keep the experience fresh for users. Furthermore, additional levels of quality can be gained by taking into account more than one type of behavior as input for the learning model: the so-called multimodal approach to recommendation.

Oh yes…and we still want that pony.

Appendix A. Additional Resources

Slides/Videos

• October 2013 Strata + Hadoop World (New York) talk by Ted Dunning on building a multimodal recommendation engine using search technology: http://slidesha.re/16juGjO
• May 2014 Berlin Buzzwords video of "Multi-modal Recommendation Algorithms" talk by Ted Dunning: http://bit.ly/XXy2bm

Blog

Two related entries from Ted Dunning's blog "Surprise and Coincidence":

• On recommendation, LLR, and a bit of code: http://bit.ly/1dCL5Vk
• Software tutorials for corpus analysis: http://bit.ly/1dZdKyX

Books

• Mahout in Action by Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman (Manning, 2011): http://amzn.to/1eRFSbb
  — Japanese translation: Mahout in Action (O'Reilly Japan): http://bit.ly/14td9DS
  — Korean translation: Mahout in Action (Hanbit Media, Inc.): http://bit.ly/VzZHY9
• Apache Mahout Cookbook by Piero Giacomelli (Packt Publishing, 2013): http://amzn.to/1cCtQNP

Training

One-day technical course, "Machine Learning with Apache Mahout: Introduction to Scalable ML for Developers," developed by the authors for MapR Technologies and co-developed by Tim Seears of Big Data Partnership. For details, see MapR or BDP.

Apache Mahout Open Source Project

For more information, visit the Apache Mahout website or Twitter.

LucidWorks

The LucidWorks website includes tutorials on Apache Solr/LucidWorks.

Elasticsearch

Elasticsearch provides an alternative wrapper for Lucene. The techniques in this book work just as well for Elasticsearch as for Solr.

About the Authors

Ted Dunning is Chief Applications Architect at MapR Technologies and committer and PMC member of the Apache Mahout, ZooKeeper, and Drill projects, and mentor for the Apache Storm, DataFu, Flink, and Optiq projects. He contributed to Mahout clustering, classification, and matrix decomposition algorithms and helped expand the new version of the Mahout Math library. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems, built fraud-detection systems for ID Analytics (LifeLock), and is the inventor of over 24 issued patents to date. Ted has a PhD in computing science from the University of Sheffield. When he's not doing data science, he plays guitar and mandolin. Ted is on Twitter at @ted_dunning.

Ellen Friedman is a consultant and commentator, currently writing mainly about big data topics. She is a committer for the Apache Mahout project and a contributor to the Apache Drill project. With a PhD in biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics, including molecular biology, nontraditional inheritance, and oceanography. Ellen is also co-author of a book of magic-themed cartoons, A Rabbit Under the Hat. Ellen is on Twitter at @Ellen_Friedman.