
Building Web Reputation Systems - P23


message to the Yahoo! Profiles karma model without knowing the address for the dispatcher; it can just send a message to the one for its own framework and know that the message will get relayed to the appropriate servers. Note that a registration service, such as the one described for the dispatch consumer, is required to support this functionality.

There can be many message dispatchers deployed, and this layer is a natural location to provide any context-based security that may be required. Since changes to the reputation database come only by sending messages to the reputation framework, limiting application access to the dispatcher that knows the names and addresses of the context-specific models makes sense. As a concrete example, only Yahoo! Travel and Local had the keys needed to contact, and therefore make changes to, the reputation framework that ran their shared model, but any other company property could read their ratings and reviews using the separate reputation query layer (see “Reputation query interface”).

The Yahoo! Reputation Platform’s dispatcher implementation was optimistic: all application API calls returned immediately without waiting for model execution. The messages were stored with the dispatcher until they could be forwarded to a model execution engine.

The transport services used to move messages to the dispatcher varied by application, but most were proprietary high-performance services. A few models, such as Yahoo! Mail’s Spam IP reputation, accepted inputs on a best-effort basis, which used the fastest available transport service.

The Yahoo! Reputation Platform high-level architectural layer cake shown in Figure A-1 contains all the required elements of a typical reputation framework. New framework designers would do well to start with that design and design/select implementations for each component to meet their requirements.
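The optimistic dispatch behavior described above can be illustrated with a small sketch. This is not the Yahoo! implementation (which was proprietary); the class name, method names, and the "profiles.karma" model name are all hypothetical, and an in-memory queue stands in for whatever durable store a real dispatcher would use. The point is only that send() acknowledges a message before any model code has run.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Handler = Callable[[dict], None]


class OptimisticDispatcher:
    """Hypothetical sketch: accept messages now, run the model later."""

    def __init__(self) -> None:
        self._models: Dict[str, Handler] = {}           # registered model handlers
        self._pending: Deque[Tuple[str, dict]] = deque()  # stored, not yet forwarded

    def register(self, model_name: str, handler: Handler) -> None:
        # Stand-in for the registration service mentioned in the text.
        self._models[model_name] = handler

    def send(self, model_name: str, payload: dict) -> bool:
        # Returns immediately; True means "accepted", not "processed".
        if model_name not in self._models:
            raise KeyError(f"no model registered as {model_name!r}")
        self._pending.append((model_name, payload))
        return True

    def drain(self) -> int:
        # Forward every stored message to its model; returns how many ran.
        forwarded = 0
        while self._pending:
            model_name, payload = self._pending.popleft()
            self._models[model_name](payload)
            forwarded += 1
        return forwarded


# Usage: the application sees no result until the queue is drained.
karma: Dict[str, int] = {}


def profiles_karma(message: dict) -> None:
    user = message["user"]
    karma[user] = karma.get(user, 0) + message["delta"]


dispatcher = OptimisticDispatcher()
dispatcher.register("profiles.karma", profiles_karma)
accepted = dispatcher.send("profiles.karma", {"user": "u1", "delta": 2})
karma_before_drain = dict(karma)   # still empty: the model has not run yet
dispatcher.drain()
```

In a deployed system the drain step would run continuously in separate processes, which is why the application cannot know when, or whether, its input has taken effect.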
Model execution engine. Figure A-3 shows the heart of the reputation framework, the model execution engine, which manages the reputation model processes and their state. Messages from the dispatcher layer are passed into the appropriate model code for immediate execution. The model execution engine reads and writes its state, usually in the form of reputation statements, via the reputation database layer (see “Reputation repository”). Model processes run to completion and, if cross-model execution or optimism is desired, may send messages to the dispatcher for future processing.

The diagram also shows that models may use the external event signaling system to notify applications of changes in state (see “External signaling interface”).

Figure A-3. Yahoo! Reputation Platform model engine.

The platform gets much of its performance from parallel processing: an Engine Proxy routes all incoming message traffic to the engine that is currently running the appropriate model in a concurrent process. This proxy is also in charge of loading and initializing any model that is not currently loaded or executing.

The Yahoo! Reputation Platform implemented models in PHP, with many of the messaging lines within the model diagram implemented as function calls instead of higher-overhead messages. See “Your Mileage May Vary” for a discussion of the rationale. The team chose PHP mostly due to its members’ personal expertise and tastes (there was no particular technical requirement that drove this choice).

External signaling interface. In optimistic systems, such as the Yahoo! Reputation Platform, output happens passively: the application has no idea when a change happened or what the results were of any input event. Some unknown time after the input, a query to the database may or may not reflect a change. In high-volume applications, this is a very good thing because it is just impractical to wait for every side effect of every input to propagate across dozens of servers. But when something important (read: valuable) happens, such as an IP address switching from good-actor to spammer, the application needs to be informed ASAP. This is accomplished by using an external signaling interface.

For smaller systems, this can just be hardcoded calls in the reputation model implementation. But larger environments normally have signaling services in place that typically log signal details and have mechanisms for executing processes that take actions, such as changing user access or contacting supervisory personnel.

Another kind of signaling interface can be used to provide a layer of request-reply semantics to an optimistic system: when the model is about to complete, a signal gets sent to a waiting thread that was created when the input was sent. The thread identifier is sent along as a parameter throughout the model as it executes.

Reputation repository. On the surface, the reputation repository layer looks like any other high-performance, partitioned, and redundant database. The specific features for the repository in the Yahoo! Reputation Platform are:

• Like the other layers, the repositories may themselves be managed by a proxy manager for performance.
• The reputation claim values may be normalized by the repository layer so that those reading the values via the query interface don’t have to know the input scale.

To improve performance, many read-modify-write operations, such as increment and addToSum, are implemented as stored procedures at the database level, instead of being code-level mathematical operations at the model execution layer. This significantly reduces interprocess message time as well as the duration of any lock contention on highly modified reputation statements.
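To make the read-modify-write point concrete, here is a minimal sketch of an addToSum-style operation pushed down into the database. It uses Python's built-in sqlite3 module purely for illustration; SQLite has no stored procedures, so a single UPDATE statement stands in for one. The table layout, function names, and the sample source, claim, and target values are assumptions, not the platform's actual schema.

```python
import sqlite3
from typing import Tuple


def open_repository() -> sqlite3.Connection:
    # In-memory database with one hypothetical table of reputation statements.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statements ("
        " source TEXT NOT NULL, claim TEXT NOT NULL, target TEXT NOT NULL,"
        " value REAL NOT NULL DEFAULT 0, count INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (source, claim, target))"
    )
    return conn


def add_to_sum(conn: sqlite3.Connection, source: str, claim: str,
               target: str, amount: float) -> None:
    # The arithmetic happens inside the database, in one transaction, so the
    # model code never reads the old value and writes back a new one.
    key = (source, claim, target)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO statements (source, claim, target)"
            " VALUES (?, ?, ?)", key)
        conn.execute(
            "UPDATE statements SET value = value + ?, count = count + 1"
            " WHERE source = ? AND claim = ? AND target = ?",
            (amount,) + key)


def read_statement(conn: sqlite3.Connection, source: str, claim: str,
                   target: str) -> Tuple[float, int]:
    # Returns (running sum, number of inputs) for one statement.
    row = conn.execute(
        "SELECT value, count FROM statements"
        " WHERE source = ? AND claim = ? AND target = ?",
        (source, claim, target)).fetchone()
    if row is None:
        raise KeyError((source, claim, target))
    return row[0], row[1]


# Usage: two ratings for the same (hypothetical) target.
repo = open_repository()
add_to_sum(repo, "aggregate", "example.app.rating", "movie.123", 4.0)
add_to_sum(repo, "aggregate", "example.app.rating", "movie.123", 5.0)
total, inputs = read_statement(repo, "aggregate", "example.app.rating", "movie.123")
```

The alternative, reading the value into the model process, adding to it, and writing it back, needs two round trips and holds any lock across both, which is the cost the text says the stored-procedure approach avoids.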
The Yahoo! Reputation Platform also contains features to dynamically scale up by adding new repository partitions (nodes) and cope gracefully with data migrations. Though those solutions are proprietary, we mention them here for completeness and so that anyone contemplating such a framework can consider them.

Reputation query interface. The main purpose for all of this infrastructure is to provide speedy access to the best possible reputation statements for diverse display and other corporate use patterns. The reputation query interface provides this service. It is separated from the repository service because it provides read-only access, and the data access model is presumed to be less restrictive. For example, every Yahoo! application could read user karma scores, even if they could only modify them via their own context-restricted reputation model.

Large-scale database query service architectures are well understood and well documented on the Web and in many books. Framework designers are reminded that the number of reputation queries in most applications is one or two orders of magnitude larger than the number of changes. Our short treatment of the subject here does not reflect the relative scale of the service.

Yahoo! used context-specific entity identifiers (often in the form of database foreign keys) as source and target IDs. So, even though Yahoo! Movies might have permission to ask the reputation query service for a user’s restaurant reviews, it might do them no good without a separate service from Yahoo! Local to map the reviews’ local-specific target ID back to a data record describing the eatery. The format used is context.foreignKeyValue; the reason for the context. is to allow for context-specific wildcard search (described later). There is always at least one special context: user., which holds karma.
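As an illustration of the context.foreignKeyValue convention, the sketch below parses such identifiers and filters a list of them by context. It is a guess at how the convention could be handled, not the platform's code: the function names and sample IDs are hypothetical, and it assumes the context itself contains no dot, so the identifier is split at the first one.

```python
from typing import List, NamedTuple


class EntityId(NamedTuple):
    context: str   # e.g. "user" or "local"
    key: str       # the foreign key value, opaque to the framework


def parse_entity_id(raw: str) -> EntityId:
    # Split at the first dot only (assumption: contexts contain no dots).
    context, dot, key = raw.partition(".")
    if not dot or not context or not key:
        raise ValueError(f"expected context.foreignKeyValue, got {raw!r}")
    return EntityId(context, key)


def matches(pattern: str, raw: str) -> bool:
    # A pattern is "*", a context wildcard such as "local.*", or an exact ID.
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return parse_entity_id(raw).context == pattern[:-2]
    return pattern == raw


def filter_ids(pattern: str, ids: List[str]) -> List[str]:
    # Keep only the identifiers that satisfy the pattern, preserving order.
    return [raw for raw in ids if matches(pattern, raw)]


# Usage with made-up identifiers.
targets = ["user.42", "local.8675309", "movies.1986-aliens", "local.12"]
local_only = filter_ids("local.*", targets)
```

Because the key after the prefix is opaque, a caller outside that context still needs the owning application's help to turn, say, local.12 into an actual restaurant record, as the text notes.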
In practice, there is also a source-only context, roll-up., used for claims that aggregate the input of many sources.

Claim type identifiers are of a specific format: context.application.claim. An example is YMovies.MovieReviews.OverallRating to hold the claim value for a user’s overall rating for a movie.

Queries are of the form: Source: [SourceIDs], Claim: [ClaimIDs], Target: [TargetIDs]. Besides the obvious use of retrieving a specific reputation statement, the identifier design used in this platform supports wildcard queries (*) that return multiple results:

Source: *, Claim: [ClaimID], Target: [TargetID]
Returns all of a specific type of claim for a particular target, e.g., all of the reviews for the movie Aliens.

Source: [SourceID], Claim: context.application.*, Target: *
Returns all of the application-specific reputation statements for any targets by a source, e.g., all of Randy’s ratings, reviews, and helpful votes on other user reviews.

Source: *, Claim: [ClaimID], Target: [TargetID, TargetID, ...]
Returns all reputation statements with a certain claim type for multiple targets. The application is the source of the list of targets, such as a list of otherwise qualified search results, e.g., What have users given as overall ratings for the movies that are currently in theaters near my house?

There are many more query patterns possible, and framework designers will need to predetermine exactly which wildcard searches will be supported, as appropriate indexes may need to be created and/or other optimizations might be required.

Yahoo! supports both RESTful interfaces and JSON protocol requests, but any reliable protocol would do. It also supports returning a paged window of results, reducing interprocess messaging to just the number of statements required.

Yahoo! lessons learned

During the development of the Yahoo! Reputation Platform, the team wandered down many dark alleys and false paths.
Presented next are some of the warning signs and insights gained. They aren’t intended as hard-and-fast rules, just friendly advice:

• It is not practical to use prebuilt code blocks to build reputation models, because every context is different, so every model is also significantly different. Don’t try to create a reputation scripting language. Certainly there are common abstractions, as represented in the graphical grammar, but those should not be confused with actual executing code. To get the desired customization, scale, and performance, the reputation processes should be expressed directly in native code. The Yahoo! Reputation Platform expressed the reputation models directly in PHP. After the first few models were complete, common patterns were packaged and released as code libraries, which decreased the implementation time for each model.
• Focus on building only the core reputation framework itself, and use existing toolkits for messaging, resource management, and databases. No need to reinvent the wheel.
• Go for performance over slavishly copying the diagrams’ inferred modularity. For example, even the Simple Accumulator process is probably best implemented primarily in the database process as a stored procedure. Many of the patterns work out to be read-modify-write, so the main alternatives are stored procedures or deferring the database modifications as long as possible given your reliability requirements.
• Creating a common platform is necessary, but not sufficient, to get applications to share data. In practice, it turned out that the problem of reconciling the entity identifiers between sites was a time-intensive task that often was deprioritized. Often merging two previously existing entity databases was not 100% automatic and required manual supervision. Even when the data was merged, it typically required each sharing application to modify existing user-facing application code, another expense.
This latter problem can be somewhat mitigated in the short term by writing backward-compatible interfaces for legacy application code.

Your Mileage May Vary

Given the number of variations on reputation framework requirements and your application’s technical environment, the two examples just presented represent extremes that don’t exactly apply to your situation. Our advice is to design in favor of adaptability, a constraint we intentionally left off the choice list.

It took three separate tries to implement the Yahoo! Reputation Platform.

Yahoo! first tried to do it on the cheap, with a database vendor creating a request-reply, all database-procedure-based implementation. That attempt surfaced an unacceptable performance/reliability trade-off and was abandoned.

The second attempt taught us about premature reputation model compilation and optimization and that we could loosen the strongly typed and compiled language requirement in order to make reputation model implementation more flexible and accessible to more programmers.

The third platform finally made it to deployment, and the lessons are reflected in the previous section. It is worth noting that though the platform delivers on the original requirements, the sharing requirement (listed as a primary driver for the project) is not yet in extensive use. Despite the repeated assertions by senior product management, the application designers end up requiring orientation in the benefits of sharing their data as well as leveraging the shared reputations of other applications. Presently, only customer care looks at cross-property karma scores to help determine whether an account that might otherwise be automatically suspended should get additional, high-touch support instead.

Recommendations for All Reputation Frameworks

Reputation is a database.
Reputation statements should be stored and indexed separately so that applications can continue to evolve new uses for the claims.

Though it is tempting to mix the reputation process code in with your application, don’t do it! You will be changing the model over time to fix bugs, to achieve the results you were originally looking for, or to mitigate abuse, and this will be all but impossible unless reputation remains a distinct module.

Sources and targets are foreign keys, and generally the reputation framework has little to no specific knowledge of the data objects indexed by those keys. Everything the reputation model needs to compute the claims should be passed in messages or remain directly accessible to each reputation process.

Discipline! The reputation framework manages nothing less than the code that sets the valuation of all user-generated and user-evaluated content in your application. As such, it deserves the effort of regular critical design and code reviews and full testing suites. Log and audit every input that is interesting, especially any claim overrides that are logged during operations. There have been many examples of employees manipulating reputation scores in return for status or favors.

APPENDIX B
Related Resources

There are many readings on the broad topic of reputation systems. We list a few here and encourage readers who have additional resources to contribute or want to read the most up-to-date list to visit this book’s website at http://buildingreputation.com.

Further Reading

The Web contains thousands of white papers and blog postings related to specific reputation issues, such as ratings bias and abusing karma. The list here is a representative sample. We maintain an updated, comprehensive list on our Delicious bookmarks: http://delicious.com/frandallfarmer/reputation and http://delicious.com/soldierant/reputation.

A Framework for Building Reputation Systems, by Phillip J.
Windley, Ph.D., Kevin Tew, and Devlin Daley, Department of Computer Science, Brigham Young University. One of the few papers that proposes a platform approach to reputation systems.

Designing Social Interfaces, by Christian Crumlish and Erin Malone, from O’Reilly and Yahoo! Press. It covers not only the reputation patterns, but social patterns of all types; a definite companion for our book.

“Designing Your Reputation System,” a slideshow presentation by Bryce Glass, initially presented before we started on this book.

“Reputation As Property in Virtual Economies,” by Joseph Blocher, discusses the idea that online reputation may become real-world property.

The Reputation Pattern Library at the Yahoo! Developer Network, where some of our thoughts were first refined into clear patterns.

The Reputation Research Network, a clearinghouse for some older reputation systems research papers.

“Who Is Grady Harp? Amazon’s Top Reviewers and the fate of the literary amateur,” by Garth Risk Hallberg. One of many articles talking about the side effects of having karma associated with commercial gain. See our Delicious bookmarks for similar articles about YouTube, Yelp, SlashDot, and more.

Recommender Systems

Though only briefly mentioned in this book, recommender systems are an important form of web reputations, especially for entities. There are extensive libraries of research papers available on the Web. In particular, you should check out the following resources:

Visit http://presnick.people.si.umich.edu/. The site is maintained by Paul Resnick, professor at the University of Michigan School of Information. He is one of the lead researchers in reputation and recommender systems and is a prolific author of relevant works.

GroupLens is a research lab at the University of Minnesota with a focus on recommender systems.

Robert E. Kraut is another important researcher who focuses on recommender and collaboration systems.
Visit his site at http://www.cs.cmu.edu/~kraut/RKraut.site.files/research/research.html.

The ACM Recommender Systems conference site contains some great links to support materials, including slide decks.

Social Incentives

The “Broken Windows” effect is cited in this book in several chapters. There is some popular debate about its effect on human behavior, highlighted in two popular books:

Gladwell, Malcolm. The Tipping Point: How Little Things Can Make a Big Difference. MA: Back Bay Books, 2002.

Levitt, Steven D., and Stephen J. Dubner. Freakonomics: A Rogue Economist Explores the Hidden Side of Everything. NY: Harper Perennial, 2009.

They focus on the question of the effects (or lack thereof) on crime based on the New York Police Department’s strict enforcement. Though we don’t take a position on that specific example, we want to point out a few additional references that support the broken windows effect in other contexts:

Johnson, Carolyn Y. “Breakthrough on Broken Windows.” The Boston Globe, February 8, 2009.

“The Broken Windows Theory of Crime is Correct.” The Economist, November 20, 2008.

The emerging field of behavioral economics is deeply relevant to using reputation as user incentive. Papers and books are starting to emerge, but we recommend this primer for all readers:

Ariely, Dan. Predictably Irrational. NY: Harper Perennial, 2010.

Howe, Jeff. Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business. NY: Three Rivers Press, 2009. This book provides some useful insight into group motivation.

Patents

Several patent applications were cited in this book, and we’ve gathered their references here for convenience. Contributors to this section are encouraged to include other relevant intellectual property for consideration by their peers.

U.S. Patent Application 11/774,460: Detecting Spam Messages Using Rapid Sender Reputation Feedback Analysis, Miles Libbey, F.
Randall Farmer, Mohammad Mohsenzadeh, Chip Morningstar, Neal Sample

U.S. Patent Application 11/945,911: Real-Time Asynchronous Event Aggregation Systems, F. Randall Farmer, Mohammad Mohsenzadeh, Chip Morningstar, Neal J. Sample

U.S. Patent Application 11/350,981: Interestingness ranking of media objects, Daniel S. Butterfield, Caterina Fake, Callum James Henderson-Begg, Serguei Mourachov

U.S. Patent Application 11/941,009: Trust Based Moderation, Ori Zaltzman and Quy Dinh Le

Date posted: 03/07/2014, 07:20
