Handbook of Multimedia for Digital Entertainment and Arts

Editor
Borko Furht
Department of Computer Science and Engineering
Florida Atlantic University
777 Glades Road, PO Box 3091
Boca Raton, FL 33431, USA
borko@cse.fau.edu

ISBN 978-0-387-89023-4
e-ISBN 978-0-387-89024-1
DOI 10.1007/978-0-387-89024-1
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009926305

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

The advances in computer entertainment, multi-player and online games, technology-enabled art, culture and performance have created a new form of entertainment and art, which attracts and absorbs its participants. The fantastic success of this new field has influenced the development of the new digital entertainment industry and related products and services, which has impacted every aspect of our lives.

This Handbook is a carefully edited book; its authors are 88 worldwide experts in the field of new digital and interactive media and their applications in entertainment and arts. The scope of the book includes leading-edge media technologies
and the latest research applied to digital entertainment and arts, with a focus on interactive and online games, edutainment, e-performance, personal broadcasting, innovative technologies for digital arts, digital visual and auditory media, augmented reality, moving media, and other advanced topics. This Handbook is focused on research issues and gives a wide overview of the literature.

The Handbook comprises five parts containing 33 chapters. The first part, on Digital Entertainment Technologies, includes articles dealing with personalized movie, television-related media, and multimedia content recommendations, digital video quality assessment, various technologies for multi-player games, and collaborative movie annotation. The second part, on Digital Auditory Media, focuses on digital music management and retrieval, music distribution, music search and recommendation, and automated music video generation. The third part, on Digital Visual Media, consists of articles on live broadcasts, digital theater, video browsing, projector-camera systems, creating believable characters, and other aspects of visual media. The fourth part, on Digital Art, comprises articles that discuss topics such as information technology and art, augmented reality and art, the creation process in digital art, the graphical user interface in art, and new tools for creating art. Part V, on Culture of New Media, consists of several articles dealing with interactive narratives, a discussion on combining digital interactive media, natural interaction in intelligent spaces, and social and interactive applications based on sound-track identification.

With the dramatic growth of interactive digital entertainment and art applications, this Handbook can be the definitive resource for persons working in this field as researchers, scientists, programmers, and engineers. The book is intended for a wide variety of people including academicians, animators, artists, designers, developers,
educators, engineers, game designers, media industry professionals, video producers, directors and writers, photographers and videographers, and researchers and graduate students. This book can also be beneficial for business managers, entrepreneurs, and investors. The book has great potential to be adopted as a textbook in current and new courses on Media Entertainment.

The main features of this Handbook can be summarized as follows. The Handbook describes and evaluates the current state of the art in multimedia technologies applied in digital entertainment and art. It also presents future trends and developments in this explosive field. Contributors to the Handbook are leading researchers from academia and practitioners from industry.

I would like to thank the authors for their contributions. Without their expertise and effort this Handbook would never have come to fruition. Springer editors and staff also deserve our sincere recognition for their support throughout the project.

Borko Furht
Editor-in-Chief
Boca Raton, 2009

Contents

Part I DIGITAL ENTERTAINMENT TECHNOLOGIES

1 Personalized Movie Recommendation
George Lekakos, Matina Charami, and Petros Caravelas

2 Cross-category Recommendation for Multimedia Content
Naoki Kamimaeda, Tomohiro Tsunoda, and Masaaki Hoshino

3 Semantic-Based Framework for Integration and Personalization of Television Related Media
Pieter Bellekens, Lora Aroyo, and Geert-Jan Houben

4 Personalization on a Peer-to-Peer Television System
Jun Wang, Johan Pouwelse, Jenneke Fokker, Arjen P. de Vries, and Marcel J.T. Reinders

5 A Target Advertisement System Based on TV Viewer’s Profile Reasoning
Jeongyeon Lim, Munjo Kim, Bumshik Lee, Munchurl Kim, Heekyung Lee, and Han-kyu Lee

6 Digital Video Quality Assessment Algorithms
Anush K. Moorthy, Kalpana Seshadrinathan, and Alan C. Bovik

7 Countermeasures for Time-Cheat Detection in Multiplayer Online Games
Stefano Ferretti

8 Zoning Issues and Area of Interest Management in Massively Multiplayer Online Games
Dewan Tanvir Ahmed and Shervin Shirmohammadi

9 Cross-Modal Approach for Karaoke Artifacts Correction
Wei-Qi Yan and Mohan S. Kankanhalli

10 Dealing Bandwidth to Mobile Clients Using Games
Anastasis A. Sofokleous and Marios C. Angelides

11 Hack-proof Synchronization Protocol for Multi-player Online Games
Yeung Siu Fung and John C.S. Lui

12 Collaborative Movie Annotation
Damon Daylamani Zad and Harry Agius

Part II DIGITAL AUDITORY MEDIA

13 Content Based Digital Music Management and Retrieval
Jie Zhou and Linxing Xiao

14 Incentive Mechanisms for Mobile Music Distribution
Marco Furini and Manuela Montangero

15 Pattern Discovery and Change Detection of Online Music Query Streams
Hua-Fu Li

16 Music Search and Recommendation
Karlheinz Brandenburg, Christian Dittmar, Matthias Gruhne, Jakob Abeßer, Hanna Lukashevich, Peter Dunker, Daniel Gärtner, Kay Wolter, Stefanie Nowak, and Holger Grossmann

17 Automated Music Video Generation Using Multi-level Feature-based Segmentation
Jong-Chul Yoon, In-Kwon Lee, and Siwoo Byun

Part III DIGITAL VISUAL MEDIA

18 Real-Time Content Filtering for Live Broadcasts in TV Terminals
Yong Man Ro and Sung Ho Jin

19 Digital Theater: Dynamic Theatre Spaces
Sara Owsley Sood and Athanasios V. Vasilakos

20 Video Browsing on Handheld Devices
Wolfgang Hürst

21 Projector-Camera Systems in Entertainment and Art
Oliver Bimber and Xubo Yang

22 Believable Characters
Magy Seif El-Nasr, Leslie Bishko, Veronica Zammitto, Michael Nixon, Athanasios V. Vasilakos, and Huaxin Wei

23 Computer Graphics Using Raytracing
Graham Sellers and Rastislav Lukac

24 The 3D Human Motion Control Through Refined Video Gesture Annotation
Yohan Jin, Myunghoon Suk, and B. Prabhakaran

Part IV DIGITAL ART

25 Information Technology and Art: Concepts and State of the Practice
Salah Uddin Ahmed, Cristoforo Camerano, Luigi Fortuna, Mattia Frasca, and Letizia Jaccheri

26 Augmented Reality and
Mobile Art
Ian Gwilt

27 The Creation Process in Digital Art
Adérito Fernandes Marcos, Pedro Sérgio Branco, and Nelson Troca Zagalo

28 Graphical User Interface in Art
Ian Gwilt

29 Storytelling on the Web 2.0 as a New Means of Creating Arts
Ralf Klamma, Yiwei Cao, and Matthias Jarke

Part V CULTURE OF NEW MEDIA

30 A Study of Interactive Narrative from User’s Perspective
David Milam, Magy Seif El-Nasr, and Ron Wakkary

31 SoundScapes/Artabilitation – Evolution of a Hybrid Human Performance Concept, Method & Apparatus Where Digital Interactive Media, The Arts, & Entertainment are Combined
A.L. Brooks

32 Natural Interaction in Intelligent Spaces: Designing for Architecture and Entertainment
Flavia Sparacino

33 Mass Personalization: Social and Interactive Applications Using Sound-Track Identification
Michael Fink, Michele Covell, and Shumeet Baluja

Index

Contributors

Nelson Troca Zagalo, University of Minho, Braga, Portugal
Veronica Zammitto, Simon Fraser University, Vancouver, Canada
Jie Zhou, Tsinghua University, Beijing, China

Part I DIGITAL ENTERTAINMENT TECHNOLOGIES

Chapter 1 Personalized Movie Recommendation
George Lekakos, Matina Charami, and Petros Caravelas

Introduction

The vast amount of information available on the Internet, coupled with the diversity of user information needs, has driven the development of personalized systems that are capable of distinguishing one user from another in order to provide content, services and information tailored to individual users. Recommender Systems (RS) form a special category of such personalized systems and aim to predict a user’s preferences based on her previous behavior.

Recommender systems emerged in the mid-1990s and have since been used and tested with great success in e-commerce, offering businesses active in this field a powerful tool for adding value for their customers. They have enjoyed great success and still continue to be applied efficiently
to numerous domains such as books, movies, TV program guides, music, news articles and so forth.

Tapestry [1], developed at Xerox PARC, was a pioneering implementation in the field of recommender systems and the first to embed human judgment in the process of producing recommendations. Tapestry was an email system capable of managing and distributing electronic documents using the opinions of users who had already read them. Other popular recommender systems that followed are Ringo [2] for music pieces and artists; Last.fm, a personalized internet radio station; Allmusic.com, a metadata database about music genres, similar artists and albums, biographies, reviews, etc.; MovieLens [3] and Bellcore [4] for movies; TV3P [5], pEPG [6] and smart EPG [7] as program guides for digital television (DTV); GroupLens [8, 9] for news articles in Usenet; and Eigentaste, on the Jester database, as a joke recommender system. Nowadays, Amazon.com [10] is the most popular and successful example of applying recommender systems in order to provide personalized promotions for a plethora of goods such as books, CDs, DVDs, toys, etc.

G. Lekakos, M. Charami, and P. Caravelas
ELTRUN, the e-Business Center, Department of Management Science and Technology, Athens University of Economics and Business, Athens, Greece
e-mail: glekakos@aueb.gr; scha@ait.gr; pcaravel@aueb.gr

B. Furht (ed.), Handbook of Multimedia for Digital Entertainment and Arts, DOI 10.1007/978-0-387-89024-1_1, © Springer Science+Business Media, LLC 2009

Now more than ever, users continuously face the need to find and choose items of interest among many choices. To accomplish this, they usually need help to search, explore, or narrow down the available options. Today, there are thousands of websites on the Internet collectively offering an enormous amount of information. Hence, even the easiest task of searching for a movie, a song or a restaurant may be transformed into a
difficult mission. To address this, search engines and other information retrieval systems return all items that satisfy a query, usually ranked by degree of relevance. Thus, the semantics of search engines is characterized by the “matching” between the posted query and the respective results. In contrast, recommender systems are characterized by features such as “personalized” and “interesting”, and hence differ greatly from information retrieval systems and search engines. Recommender systems are therefore intelligent systems that aim to guide potential users personally through the underlying field.

The most popular recommendation methods are collaborative filtering (CF) and content-based filtering (CBF). Collaborative filtering is based on the assumption that users with similar taste can serve as recommenders for each other on unobserved items. Content-based filtering, on the other hand, considers the previous preferences of the user and predicts her future behavior from them. Each method has advantages and shortcomings of its own and is best applied in specific situations. Significant research effort has been devoted to hybrid approaches that use elements of both methods to improve performance and overcome weak points.

The recent advances in digital television and set-top technology, with increased storage and processing capabilities, enable the application of recommendation technologies in the television domain. For example, products currently promoted through broadcast advertisements to unknown recipients may be recommended to specific viewers who are most likely to respond positively to these messages. In this way recommendation technologies provide unprecedented opportunities to marketers and suppliers, with the benefit of promoting goods and services more effectively while reducing the advertising clutter that viewers experience from the large number of irrelevant messages [11]. Moreover, the large number of available digital television
channels increases the effort required to locate content, such as movies and other programs, that is most likely to match viewers’ interests. Digital TV vendors recognize this as a serious problem, and they are now offering personalized electronic program guides (EPGs) to help users navigate this digital maze [12].

This article proposes a movie recommender system, named MoRe, which follows a hybrid approach that combines content-based and collaborative filtering. MoRe’s performance is empirically evaluated on the predictive accuracy of the algorithms as well as other important indicators, such as the percentage of items that the system can actually predict (called prediction coverage) and the time required for generating predictions.

The remainder of this article is organized as follows. The next section is devoted to the fundamental background of recommender systems, describing the main recommendation techniques along with their advantages and limitations. We then give an overview of the MoRe system and, in the following section, describe in detail the algorithms implemented. The empirical evaluation results are then presented, while the final section provides a discussion of conclusions and future research.

Background Theory

Recommender Systems

As previously mentioned, the objective of recommender systems is to identify which of the available information items are really interesting or likable to individual users. The original idea underlying these systems is based on the observation that people very often rely upon opinions and recommendations from friends, family or associates to make selections or purchase decisions. Motivated by this “social” approach, recommender systems produce individual recommendations as an output or have the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options [13]. Hence, recommender systems aim at predicting a user’s future
behavior based on her previous choices and by relying on features that implicitly or explicitly imply preferences.

As shown in Figure 1, the recommendation process usually takes user ratings on observed items and/or item features as input and produces output of the same kind for unobserved items. Many approaches have been designed, implemented and tested for processing the original input data and producing the final outcome. Still, two of them are the most dominant, successful and widely accepted: collaborative filtering and content-based filtering. Collaborative filtering is the technique that makes maximal use of the “social” aspect of recommender systems, as similar users, called neighbors, are used to generate recommendations for the target user. Content-based filtering, on the other hand, analyses the content of the items according to domain-dependent features in order to profile users according to their preferences.

These two fundamental approaches are presented in detail in the following subsections. Next, we describe some alternative techniques used in producing personalized recommendations. We then compare all of the aforementioned techniques, underlining the strengths and shortcomings of each, which motivates combining them into hybrid recommender systems. Hybrids form the last subsection of the recommender systems background theory.

Fig. 1 A high-level representation of a recommender system

Collaborative Filtering

Collaborative filtering is the most popular and widely used approach for generating recommendations [14]. It filters and evaluates items using other people’s tastes and attitudes. It operates upon the assumption that users who have exhibited similar behavior in the past can serve as recommenders for each other on unobserved items. Thus, while the term collaborative filtering has become popular only in the last decade, its algorithmic behavior originates from
something that people have done for centuries: exchanging views and opinions.

According to collaborative filtering, a user’s behavior consists of her preferences for products or services. The idea is to trace relationships or similarities between the target user and the remaining users in the database, aggregate the similar users’ preferences and use them as a prediction for the target user. As a result, users who seem to prefer and choose common items are identified as having similar purchasing behavior and as belonging to a neighborhood. A user in a specific neighborhood may receive recommendations from her neighbors for items that she has not bought, used or experienced in the past, with a high likelihood of satisfaction, since neighbors are characterized by common taste.

Collaborative filtering consists of four fundamental steps:

1. Data collection (input space)
2. Neighbors similarity measurement
3. Neighbors selection
4. Recommendations generation

The first step is an independent one that relates to the alternative ways of collecting the input data for the algorithm, while the remaining three describe the algorithmic approach itself.

Data Collection: Input Space

The input space for collaborative filtering may be summarized in a table, called the user × item matrix, where users form the rows and items form the columns, while each cell C_ij of the matrix holds the degree of satisfaction of the i-th user with the j-th item. The degree of satisfaction is usually expressed in the form of ratings from users to items. Ratings can be continuous or discrete on a specific scale, e.g., real numbers ranging up to 100 or natural numbers over a small range, respectively. They can alternatively be provided in a binary representation, e.g., “thumbs up” or a check mark (✓) implying that the item was liked, and “thumbs down” or a cross (✗) implying that the item was not liked. The rating scale chosen depends on the domain of the application. Nevertheless, the most common rating scale in use is the one that ranges from 1 to 5, with 1 denoting
total dissatisfaction and 5 denoting absolute satisfaction. The empty cells in the user × item matrix imply that the user has not yet evaluated the specific item.

Besides the representation, ratings also vary according to the way they are collected. There exist two different ways of collecting users’ ratings: explicitly and implicitly. Explicit rating refers to a user consciously expressing her preference, usually on a discrete numerical scale. The user evaluates an item and assigns it a rating according to the scale used. Implicit rating, on the other hand, refers to interpreting user behavior or selections to impute a vote or preference. It can be based on browsing data (for example in Web applications), purchase history (for example in online or traditional stores) or other types of information access patterns.

Explicit rating is much more accurate, reflecting each user’s taste more precisely (as long as users provide consistent ratings), but at the same time it is much more difficult to collect from all users and for a large percentage of the offered items. Moreover, most users usually rate items that they liked and avoid dealing with the uninteresting ones. Thus, the user × item matrix is generally filled with positive votes and lacks a sufficient number of negative ones. In contrast, implicit rating may not always reflect reality, since users were not asked directly, and in some cases it can also be misleading (e.g., the interpretation of time spent on a website when the user is idle or has left her computer). Nevertheless, the biggest advantages of implicit rating are that it relieves the user from examining and evaluating an item, it is usually based on positive preferences (thus avoiding the lack of negative ratings), and it continuously collects input data as users interact with the system.

No matter what the nature of the input space is, acquiring users’ ratings for previously experienced objects
constitutes the fundamental initial step for collaborative filtering.

Neighbors Similarity Measurement

Collaborative filtering approaches can be distinguished into two major classes: model-based and memory-based [15]. Model-based methods develop and learn a model, which is applied to the target user’s ratings to make predictions for unobserved items. Two widely used probabilistic models are the Bayesian classifier and the Bayesian network with decision trees. Memory-based methods, on the other hand, operate upon the entire database of users to find the closest neighbors of the target user and weight their recommendations according to their similarities. The fundamental algorithm of the memory-based class is the nearest neighbor (denoted as NN), which is considered one of the most effective collaborative filtering approaches.

Weighting the neighbors’ recommendations implies defining and calculating the distance between the target user and her neighbors. This distance may represent either the correlation or the similarity among users. A typical measure of correlation is the Pearson correlation coefficient, which indicates the degree of linear correlation between two variables. In collaborative filtering, it is applied to the items rated in common by two users. Other popular correlation measures are the Spearman rank correlation, which is similar to Pearson but calculates the correlation between ranked lists instead of ratings, and the mean-squared difference, which emphasizes the larger distances among ratings over the small ones. For further details about correlation measures and examples of systems in which they were applied, refer to [16].

Similarity, on the other hand, is usually calculated using vectors, the so-called similarity vectors. In the field of information retrieval, the similarity between two documents is often measured by treating each document as a vector of word frequencies and computing the cosine of the angle formed by the two frequency
vectors. Adapting this formalism to collaborative filtering, users take the role of the documents, items take the role of words and votes take the role of word frequencies. Note that in this case, observed votes indicate a positive preference, there is no role for negative votes, and unobserved items receive a zero vote. Reference [15] provides an extensive description of similarity vectors.

Neighbors Selection

Having assigned weights to users, the next step is to decide which users will be selected and used in the recommendations generation process for the target user. In other words, we select the users that will form the target user’s neighborhood. Theoretically, we could consider all users as neighbors, with the closest ones contributing more and the more distant ones contributing less to the generation of recommendations. However, real commercial recommender systems deal with thousands to millions of users, and hence considering all users in the neighborhood is infeasible in terms of real-time response. Thus, the system should select a subset of users that best form the neighborhood in order to decrease the computational cost and guarantee acceptable response times.

Two techniques have been employed in recommender systems: threshold-based selection and k nearest neighbors (denoted as k-NN). The former technique selects as neighbors those users whose correlation or similarity to the target user exceeds a certain threshold value. For example, we may require neighbors to be more than 70% similar to the target user (a high percentage in real online systems). The latter technique, in contrast, selects a predefined number k of best neighbors to form the target user’s neighborhood. Thus, for example, we may select the 10 best neighbors of the target user.

The threshold-based approach may form quite reliable neighborhoods, since it can guarantee that the neighbors of the target user will be, for example, over 70% similar to her. However, in real applications, the
diversity of the users is very high. In such cases, there is a high probability of failing to form a neighborhood for some users and, as a result, of generating no recommendations for those users. The k nearest neighbors technique, on the other hand, always guarantees the formation of a neighborhood, at the cost of a possible decrease in the quality of the results, since the distance between the target user and her neighbors may actually be very large.

There is no panacea in choosing between the aforementioned techniques; rather, the selection of the proper one depends each time on the underlying application domain. In both cases, determining appropriate values for the threshold or for the size of the neighborhood is of vital importance. These values depend on the nature and the size of the input data and are determined experimentally.

Recommendations Generation

As soon as we have assigned weights to users and selected the ones that will serve as neighbors, we are ready to create predictions for the target user. The recommendation of a new item to the target user is based on the average of her neighbors’ ratings, weighted by their similarity to the target user. The generated recommendation is normalized so that its rating falls in the same range used for all items in the domain.

Content-based Filtering

Content-based filtering makes predictions upon the assumption that a user’s previous preferences or interests are reliable indicators of her future behavior. This approach requires that the items are described by features, and is typically applied to text-based documents with a predefined format or in domains with structured data, where the extraction of features that uniformly characterize the data is easy [17].

Text documents are semi-structured data, since they do not consist of specific predefined words. The application of content-based filtering to semi-structured data adopts much work from the text information retrieval and the
natural language processing fields, such as representing the documents as vectors and measuring the similarity between those vectors, using attributes and characteristics of the natural language [18].

Applying content-based filtering to unstructured raw data, like multimedia, is a very interesting and useful task, though a very challenging one that requires a lot of research [19]. Therefore, the majority of developed content-based recommender systems target textually described items, like books, articles and TV programs. Even in the case of music and movies, recommendation techniques are mostly applied to extracted textual features such as title, genre, etc., and not to the multimedia itself. Besides, most on-line music databases today, such as Napster and mp3.com, and movie databases, such as IMDb, rely on file names or text labels for searching and indexing, using traditional text processing techniques.

In summary, content-based filtering can be applied in book recommender systems using features such as title, author, theme, or even a short summary or description (if available). In the same sense, it can be applied in a large range of other application domains, such as recommending movies (with features such as title, actors, director, genre, plot description), TV programs (with features such as name, type, presenter, hour), music (with features such as title, artist, album, genre), restaurants (with features such as name, cuisine, service, cost, place), and so forth.

TF-IDF (term frequency times inverse document frequency) and Information Gain are two metrics commonly applied in content-based filtering [20, 21]. They are statistical measures used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the collection. The intuition behind these metrics is that the terms with the highest
weight occur more often in the current document than in the other documents of the collection, and therefore are more central to the topic of this document. On the other hand, terms that are very frequent in all documents of the collection provide no particular descriptive power for the current document.

Creating a model of the user’s preferences from the user history is a form of classification learning. Such algorithms are the key component of content-based recommendation systems, because they learn a function that models each user’s interests. Given a new item and the user history, the function predicts whether the user would be interested in the item. Some of the most popular algorithms are traditional machine learning algorithms designed to work on structured data, while other algorithms are designed to work in high-dimensional spaces and do not require a pre-processing step of feature selection. Reference [22] reviews a number of such well-known classification learning algorithms.

Other Approaches

Apart from the two previously mentioned approaches of collaborative filtering and content-based filtering, there exist some alternative techniques that have been deployed in recommender systems. Some of the most popular ones include the collection of demographic data, the use of a utility function, the creation of a knowledge model, and the use of well-known data mining techniques.

Demographic recommender systems aim to categorize the user based on personal attributes and make recommendations based on demographic classes. In most of these systems, demographic data for user categorization are collected through the interaction of the user with the system using questionnaires, short surveys, dialogue prompts, etc. In other systems, machine learning is used to arrive at a classifier based on demographic data. Demographic techniques form ‘people-to-people’ correlations like collaborative ones, but use different data. The benefit of a demographic approach is that it may
not require a history of the type needed by collaborative and content-based techniques.

Utility-based and knowledge-based recommenders match the user’s need with a set of available options. Utility-based recommenders make suggestions based on a computation of the utility of each object for the user. Of course, the central problem is how to create a utility function for each user. The benefit of utility-based recommendation is that it can factor non-product attributes, such as vendor reliability and product availability, into the utility computation. Knowledge-based recommendation attempts to suggest objects based on inferences about a user’s needs and preferences. Knowledge-based approaches have knowledge about how a particular item meets a particular user need and can therefore reason about the relationship between a need and a possible recommendation.

Data mining has been applied with great success in the field of grocery retailing. It usually refers to the automated extraction of implicit but useful information from large databases. Several data mining techniques, such as classification, clustering and association analysis, have also been used in the field of e-commerce, aiming to extract critical results in order to support fundamental strategic commercial decisions. These techniques can additionally be used in the recommendation process. Association rules and clusters of similar users or similar items may prove very useful approaches [23].

Comparing Recommendation Approaches

All the aforementioned recommendation techniques have their own strengths and shortcomings. Choosing the appropriate approach depends on the application domain, the nature of the corresponding data and the purpose that the recommender system will serve. In this subsection, we compare the previously presented recommendation approaches, underlining their advantages and weaknesses.

The best known problem that recommender systems face is the new-entry or, as it is known from the
literature, ‘ramp-up’ problem [24]. This term actually refers to two distinct but related problems.

New User: Most recommender systems compare the target user with other users based on the commonly experienced items. As a result, a user with few ratings becomes difficult to categorize.

New Item: Similarly, a new item that has not received many ratings cannot be easily recommended. It is also known as the ‘early rater’ problem, since the first person to rate an item gets little benefit from doing so.

Collaborative recommender systems depend on overlap in ratings across users and have difficulty when the space of ratings is sparse, since few users have rated the same items. Sparsity is a significant problem in many domains where there are many items available. In these cases, unless the user base is very large, the chance that another user will share a large number of rated items is small. These problems suggest that pure collaborative techniques are best suited to problems where the density of user interest is relatively high across a small and static space of items. If the set of items changes too rapidly, old ratings will be of little value to new users. If the set of items is large and user interest thinly spread, then the probability of overlap with other users will be small.

Collaborative recommenders work best for a user who has many neighbors of similar taste. The technique does not work well for the so-called ‘gray sheep’, who fall on a border between existing cliques of users. This is also a problem for demographic systems that attempt to categorize users on personal characteristics. On the other hand, demographic recommenders do not have the ‘new user’ problem, because they do not require a list of ratings from the user. Instead they have the problem of gathering the requisite demographic information.

In content-based filtering the features used to describe the content are of primary importance. The more descriptive they are the more accurate the
prediction is. In computational terms, content-based prediction can be performed as long as the user has rated at least one similar item, though prediction accuracy increases with the number of similar items. Content-based techniques also have the problem that they are limited by the features that are explicitly associated with the objects that they recommend. For example, content-based movie recommendation can only be based on written materials about a movie (actors’ names, plot summaries, etc.), because the movie itself is opaque to the system. Of course, multimedia information retrieval tries to analyse the multimedia data itself, thus offering new ways to extract more representative features. Still, this field of science is not yet mature and a lot of research is needed in this direction.

Collaborative systems rely only on user ratings and can be used to recommend items without any descriptive data. Even in the presence of descriptive data, some experiments have found that collaborative recommender systems can be more accurate than content-based ones [24]. The great power of the collaborative approach relative to content-based ones is its cross-genre recommendation ability.

Both content-based and collaborative techniques suffer from the ‘portfolio effect’. An ideal recommender would not suggest a stock that the user already owns or a movie she has already seen. The problem becomes more evident in domains such as news filtering, since stories that look quite similar to those already read will never be recommended to the user.

Utility-based and knowledge-based recommenders do not have ‘ramp-up’ or sparsity problems, since they do not base their recommendations on accumulated statistical evidence. The major benefit of utility-based techniques is that they let the user express all of the considerations that need to go into a recommendation. However, the burden of this approach is that the user must construct a complete preference function and must therefore weight
the significance of each possible feature.

Finally, knowledge-based recommender systems are prone to the drawback of knowledge acquisition. The system should have knowledge about the objects being recommended and their features, knowledge about the users, and the ability to map between the user’s needs and the objects that might satisfy those needs. Despite this drawback, knowledge-based recommendation has some beneficial characteristics: it demands less effort from the user than utility-based recommendation, and it does not involve a start-up period during which its suggestions are of low quality.

Hybrids

Hybrid recommender systems attempt to combine the aforementioned pure techniques in order to overcome their individual shortcomings and increase the overall quality of predictions. The tricky part in designing a hybrid system is that the resulting system risks also inheriting the weaknesses of its component techniques. As already mentioned, collaborative filtering is the most popular and efficient recommender approach. Consequently, most hybrid systems combine collaborative filtering with another technique that solves the ramp-up problem. The vast majority of hybrids utilize combinations of content-based and collaborative filtering [12, 21, 25, 26]. Some of them further extend the two approaches with demographics-based predictions [27], while a few utilize knowledge-based techniques where domain functional knowledge is exploited [24].

A significant part of research in hybrid recommender systems concerns the techniques that can be used to combine the approaches, since they may significantly affect the prediction outcome. Burke [24, 28] classifies hybridization techniques into seven classes: weighted, where each of the recommendation approaches makes predictions which are then combined into a single prediction; switching, where one of the recommendation techniques is selected to make the prediction when certain
criteria are met; mixed, in which predictions from each of the recommendation techniques are presented to the user; feature combination, where a single prediction algorithm is provided with features from different recommendation techniques; cascade, where the output from one recommendation technique is refined by another; feature augmentation, where the output from one recommendation technique is fed to another; and meta-level, in which the entire model produced by one recommendation technique is utilized by another.

Switching, mixed, and weighted hybrids are differentiated from the remaining techniques in Burke’s taxonomy by the fact that each of the base recommendation methods independently produces a prediction, which is then presented to the user either as a single prediction (switching, weighted) or as two independent predictions (mixed). Switching hybrids in particular are low-complexity hybridization methods based on the examination of the conditions that affect the performance of the base algorithms each time a prediction is requested. When such conditions occur, the final prediction is the outcome of the base recommendation approach that is not affected (or is less affected) by these conditions.

MoRe System Overview

It has been empirically shown that collaborative filtering predictions are more accurate than content-based filtering predictions provided that certain criteria are met [29, 30]. As discussed earlier, two criteria with significant effect are the neighborhood size and the number of items rated by the target user. In this paper we present a switching hybrid algorithm whose main prediction approach is collaborative filtering, switched to content-based filtering when the above criteria are not met. Besides predictive accuracy, the proposed hybrid approach also considers two other factors of practical significance: the prediction coverage and the time required to make a real-time prediction.

The MoRe system is a Web-based recommender system
that collects user ratings concerning movies on a one-to-five scale through its graphical user interface. More specifically, as soon as a new user registers with the system, she is asked to provide a number of ratings so that the system can initiate the prediction process (new user problem). The selection of movies that are presented to the user is based on the measure proposed by [31], computed as log(popularity) × entropy. Selecting the most popular movies increases the chance of collecting the respective ratings, since these movies are the most likely to have actually been seen by the new user. The collected ratings are organized in a user-item matrix and, combined with the movies dataset, are loaded into the system.

MoRe’s architectural design, presented in Figure 2, realizes three recommender techniques: pure collaborative filtering, pure content-based filtering, and a hybrid approach that has been implemented in two versions, called switching and substitute. The two versions of the hybrid algorithm are differentiated by the parameter that controls the switch from collaborative filtering to content-based filtering, as will be analyzed in subsequent sections.

The MoRe system utilizes a version of the well-known MovieLens dataset that contains 1,000,000 user ratings provided by 6040 original MovieLens users for about 4000 movies. Each user has rated at least twenty movies on the one-to-five rating scale. The sparsity of the user ratings matrix is 95.86%.

Fig. 2 MoRe system overall design

Since the dataset contains only the name and the production year of each movie, it is necessary to augment the movie description features for the content-based predictor. To accomplish that, we implemented a web crawler that seeks data on the IMDb website. The crawler exploits the search tool of IMDb and collects data about the genre, cast, director, writing credits, producers and plot keywords of each movie. The number of
plot keywords may greatly increase the number of features used to describe the movies, and therefore the system administrator may remove from the movie descriptions those keywords that appear in fewer than a certain number of movies. In addition, the system creates (in an off-line phase) the set of most similar movies for each available movie in order to speed up real-time predictions. The size of the set of most similar movies is also determined by the system administrator, as shown in the figure on feature-threshold selection.

In order to make recommendations, collaborative filtering uses the ratings matrix, while the content-based predictor mainly uses the movie data files. Hybrid techniques use both the content-based and the collaborative engines. Even though the system is able to produce recommendations with more than one technique, only one is applied at any given time. The technique selection, realized in Figure 4, is a task for the administrator of the system. As the corresponding figure illustrates, users receive the recommendations as a ranked list of movies, where the prediction is shown on a “five-star” scale. The users may additionally provide their feedback directly on the recommended movies.

The system described can also be easily used for parameter tuning and experimental evaluation. The system administrator may select the size of the training and test sets as a percentage of the whole dataset and initiate the estimation of the accuracy of the recommendation methods in MoRe.

Fig. Selection of features threshold and pre-computation of similar movies sets
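The switching behaviour described in the MoRe system overview can be illustrated with a minimal Python sketch. It is not the MoRe implementation: the class name `SwitchingHybrid`, the callables it receives, and the thresholds `min_neighbors` and `min_ratings` are all hypothetical, and the chapter does not give the actual threshold values at this point. The sketch assumes that collaborative filtering is used when both criteria named in the text (neighborhood size and number of items rated by the target user) are satisfied, and that content-based filtering is used otherwise. Falling back to the content-based predictor when the collaborative one returns no prediction is an additional assumption, made here to reflect the concern for prediction coverage.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A predictor maps (user_id, movie_id) to a rating, or None if it cannot predict.
Predictor = Callable[[int, int], Optional[float]]


@dataclass
class SwitchingHybrid:
    """Use collaborative filtering when it looks reliable, else content-based."""
    collaborative: Predictor
    content_based: Predictor
    neighborhood_size: Callable[[int], int]  # user_id -> number of neighbors found
    rated_count: Callable[[int], int]        # user_id -> number of movies rated
    min_neighbors: int = 5                   # hypothetical threshold
    min_ratings: int = 20                    # hypothetical threshold

    def predict(self, user_id: int, movie_id: int) -> Tuple[Optional[float], str]:
        # Both criteria from the text must hold for collaborative filtering.
        cf_reliable = (
            self.neighborhood_size(user_id) >= self.min_neighbors
            and self.rated_count(user_id) >= self.min_ratings
        )
        if cf_reliable:
            rating = self.collaborative(user_id, movie_id)
            if rating is not None:
                return rating, "collaborative"
        # Criteria not met, or collaborative filtering had no prediction
        # for this movie (assumed fallback to keep coverage high).
        return self.content_based(user_id, movie_id), "content-based"


# Toy data: user 1 has many neighbors and ratings, user 2 has few neighbors.
neighbors = {1: 12, 2: 2}
ratings = {1: 40, 2: 25}

hybrid = SwitchingHybrid(
    collaborative=lambda u, m: None if m == 99 else 4.0,  # movie 99 is uncovered
    content_based=lambda u, m: 3.0,
    neighborhood_size=lambda u: neighbors[u],
    rated_count=lambda u: ratings[u],
)

for user_id, movie_id in [(1, 10), (2, 10), (1, 99)]:
    print(user_id, movie_id, hybrid.predict(user_id, movie_id))
```

Passing the two base predictors in as callables keeps the switching rule separate from how each prediction is computed, which matches the description of switching hybrids as low-complexity methods that only inspect the conditions affecting the base algorithms.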