Topic: Movie Recommendation System of Netflix


Subject: AI Business Applications

Lecturer: Vu Trong Sinh

FINAL REPORT

GROUP 5 - CityU 8B

Nguyen Thi Phuong Nhi - CA8-146

Le Phuong Anh - CA8-114

Nguyen Thy Ngoc - CA8-153

Le Hoang Linh Chi - CA8-118

Hanoi, 2023


CA8-114 Lê Phương Anh Introduction; Idea bank; Project definition and planning

CA8-118 Lê Hoàng Linh Chi Prototype architecture and technologies; Production testing & conclusion

25%


1 Introduction

1.1 Brief introduction about Netflix

Founded in 1997, Netflix has become a global player in the streaming industry, providing unique and diverse content to subscribers in more than 190 countries. The platform offers a vast library, ranging from original productions to licensed TV shows and movies, that caters to diverse audiences worldwide. With the ongoing trend towards cord-cutting, Netflix has established itself as a key player in the entertainment industry, giving its users a range of streaming options across multiple devices. However, Netflix's users face a dilemma: the recommendation algorithms have focused primarily on accuracy, neglecting other essential qualities such as diversity and serendipity. As a result, the recommendations often exclude long-tail items and have low coverage. In this study, we present a recommendation technique that takes into account diversity, accuracy, and serendipity, so that good content does not go unnoticed.


● Task: Collecting and integrating user data, movie metadata, and other relevant information into the recommendation system's database.

● Responsibility: Handled by automated data collection processes and backend systems (AI).

User Profile Creation:

● Task: Creating user profiles based on their interactions, preferences, and viewing history.

● Responsibility: Automated process using AI algorithms.

● Once user profiles are created and content data is collected, preprocessing techniques are applied to clean and transform the data before further analysis.

● Responsibility: Automated process using AI algorithms.

● The preprocessing step involves several tasks, such as removing duplicate entries, handling missing values, standardizing data formats, and normalizing data.
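The preprocessing tasks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`user_id`, `movie_id`, `rating`, `timestamp` as Unix seconds), the function name `preprocess`, and the choice of min-max normalization are all hypothetical and are not taken from Netflix's actual pipeline.

```python
from datetime import datetime, timezone

REQUIRED = ("user_id", "movie_id", "rating", "timestamp")  # assumed field names


def preprocess(raw_rows):
    """Drop incomplete and duplicate rows, standardize timestamps,
    and min-max normalize ratings to the range [0, 1]."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        # Handle missing values: skip rows lacking any required field.
        if any(row.get(key) in (None, "") for key in REQUIRED):
            continue
        # Remove duplicates: keep the first record per (user, movie) pair.
        pair = (row["user_id"], row["movie_id"])
        if pair in seen:
            continue
        seen.add(pair)
        # Standardize formats: Unix seconds become an ISO 8601 UTC string.
        stamp = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc)
        cleaned.append({
            "user_id": int(row["user_id"]),
            "movie_id": int(row["movie_id"]),
            "rating": float(row["rating"]),
            "timestamp": stamp.isoformat(),
        })
    if cleaned:
        # Normalize: rescale ratings so the lowest is 0 and the highest is 1.
        low = min(r["rating"] for r in cleaned)
        high = max(r["rating"] for r in cleaned)
        span = (high - low) or 1.0  # avoid division by zero
        for r in cleaned:
            r["rating_norm"] = (r["rating"] - low) / span
    return cleaned
```

A production system would more likely keep the first or the most recent duplicate according to a business rule, and might impute missing values instead of dropping rows; the sketch simply picks the simplest option for each task.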

Content Analysis and Feature Extraction:

● Task: Analyzing the content of movies and extracting relevant features such as genres, actors, directors, and other metadata.

● Responsibility: Automated process using AI algorithms.

● Responsibility: Automated process using AI algorithms.

Filtering and Ranking:

● Task: Filtering and ranking the generated recommendations based on additional criteria, such as user feedback, popularity, and business rules.

● Responsibility: A combination of AI algorithms and human curators. AI algorithms handle initial filtering and ranking, while human curators may apply additional criteria or business rules.
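As a rough illustration of this step, the sketch below filters out titles the user has already watched or that a curator has blocked, then ranks the rest by a blend of predicted relevance and popularity. The field names, the `blocked` set standing in for a business rule, and the 0.2 popularity weight are assumptions chosen for the example, not documented Netflix behavior.

```python
def filter_and_rank(candidates, watched, blocked, popularity_weight=0.2, top_n=5):
    """Filter candidate movies, then rank them by a blended score.

    candidates: list of dicts with 'movie_id', 'score' (predicted relevance
    in [0, 1]) and 'popularity' (in [0, 1]); all three keys are assumed.
    """
    # Filtering: drop already-watched titles and titles excluded by a
    # (hypothetical) curator-defined business rule.
    eligible = [
        c for c in candidates
        if c["movie_id"] not in watched and c["movie_id"] not in blocked
    ]

    # Ranking: weighted blend of model relevance and overall popularity.
    def final_score(candidate):
        return ((1 - popularity_weight) * candidate["score"]
                + popularity_weight * candidate["popularity"])

    return sorted(eligible, key=final_score, reverse=True)[:top_n]
```

Lowering `popularity_weight` is one simple lever for reducing the bias towards mainstream titles that the report identifies as a problem.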

Presentation to the User:

● Task: Presenting the final set of movie recommendations to the user through the Netflix interface.

● Responsibility: The recommendation system's AI algorithms handle the presentation and delivery of recommendations to the user.


● Task: Collecting and analyzing user feedback, ratings, and interactions to improve the recommendation system.

● Responsibility: Combination of AI algorithms and human data scientists. AI algorithms collect and analyze user feedback at scale, while human data scientists interpret the feedback for insights and system improvements.

With this model, Netflix faces the problem that the recommendation system can only make recommendations based on existing user interests. In other words, the model has limited ability to extend the user's current interests.

1.3 Set up an idea bank:

1. Incorporate Exploration Recommendations & Enhance Serendipitous Discovery

Netflix can introduce a "Surprise Me" or "Random Pick" feature that suggests content outside the user's typical viewing patterns. This can allow users to discover hidden gems and expand their horizons by providing unexpected recommendations.

2. Collaborative Filtering with Diverse User Profiles

Netflix can enhance its collaborative filtering approach by considering diverse user profiles and preferences. This can involve incorporating input from users with varying tastes and interests to ensure that recommendations are not solely based on popular or mainstream content.

3. Contextual Recommendation Engine

Netflix can develop a contextual recommendation engine that considers not only user preferences and item characteristics but also contextual factors such as time, location, device, and social context.

Firstly, about capital requirement:

Idea 2 has a low capital requirement because it can use existing data: the initial data collection and storage infrastructure are already in place, reducing the need for additional capital investment in data acquisition. In addition, existing algorithms, such as user-based or item-based collaborative filtering, can be used as a starting point and customized to specific requirements. This reduces the need for significant investment in algorithm research and development. Moreover, collaborative filtering, especially in its traditional forms, is relatively straightforward to implement compared to more complex recommendation approaches. This simplicity reduces development complexity, resulting in lower development costs and faster implementation timelines.

Idea 1 can require substantial capital allocation due to the development of new features, data collection and analysis, algorithm development, user experience enhancements, and system integration. These investments are necessary to deliver a differentiated and engaging recommendation experience, but they lead to higher capital expenditure than Idea 2, which builds upon existing algorithmic approaches and data.

Idea 3, developing a contextual recommendation engine, requires significant investment in data collection infrastructure, machine learning models, and real-time processing capabilities. It relies on a wide range of contextual data, such as user location, time, device, browsing behavior, and social interactions. Collecting and processing this contextual data requires robust infrastructure and systems capable of handling large volumes of data. Investing in data collection mechanisms, data storage, data pipelines, and data processing capabilities can incur substantial costs. Building real-time processing capabilities, including scalable and low-latency systems, may also involve investments in infrastructure, cloud services, and specialized technologies.

Secondly, about the risk:

Idea 2, enhancing the collaborative filtering approach by considering diverse user profiles and preferences, carries relatively low risk. It builds upon existing algorithms and data, leveraging user input and preferences to provide more personalized recommendations. The main risks involve data quality and privacy concerns, but these can be mitigated with proper safeguards.

Idea 1 involves introducing a "Surprise Me" or "Random Pick" feature to suggest content outside the user's typical viewing patterns. While there are some risks associated with developing and integrating this feature, such as potential user resistance or technical challenges, the overall risk level is low to medium.

Idea 3, developing a contextual recommendation engine, involves higher risk than the other two ideas. It requires significant investment in data collection infrastructure, machine learning models, and real-time processing capabilities. The complexity of integrating contextual data, ensuring data accuracy, and maintaining real-time processing systems adds to the risk level. Additionally, the need for continuous updates and maintenance further contributes to the overall risk.

After evaluating the ideas in terms of risk and capital requirements, we determined that Idea 2 is the most suitable option to implement.

2 Project definition and planning

2.1 Design Thinking:


watch movies, TV shows, and other content)

Empathy map:

● Explore the Netflix platform to find movies and TV shows.

● Watch content based on personal interests and preferences.

● Rate and review the movies and TV shows they watch.

Feels:

● Users may feel disappointed when they come across a movie that they didn't like.

● Users become impatient when the search process takes longer than expected.

● Users feel overwhelmed by the sheer volume of options and struggle to make a decision.

● Users feel bored when the same movies are recommended repeatedly.

Thinks:

● "Finding a suitable movie is a waste of time."

● "The recommendations are too repetitive sometimes."

● "It's frustrating to see suggestions that are not relevant to me."

Says:

● "I wish I could discover some new movies in other genres that might appeal to me."

● "I wish the system could suggest some interesting movies, not just the trending

The goal is to provide the user with a diverse range of movie recommendations that are not solely focused on the latest or trending films but are based on similar users who share the same tastes.

As a user, I want to explore new and interesting movies that may appeal to me.

The goal is to help users discover new and diverse content that they may not have been aware of or considered before. By offering a wide range of recommendations across various genres and categories, users can explore and expand their viewing options.

As a user, I want to be pleasantly surprised by the recommendation system, discovering hidden gems or movies outside my usual genre preferences, allowing me to expand my viewing horizons.

2.2 Success criteria of the project [16]

Business metric

Click-through rate (CTR): CTR measures the share of displayed recommendations that users click. The assumption is that the more clicks, the more relevant the recommendations. The Netflix recommendation system is considered a success if its CTR is equal to or slightly higher than that of the industry or of other recommendation systems, in this case >= 38% (compared to the Google News or Forbes recommendation systems) [16].

Adoption and conversion: While CTR tells us whether a user clicked a movie, it cannot determine whether that user actually watched the recommended movie. Netflix therefore also uses alternative adoption measures: it counts how many times a movie or series was watched after being recommended ("take rate").
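Both business metrics can be computed from a log of recommendation impressions. The sketch below assumes a hypothetical event format in which each displayed recommendation records whether it was clicked and whether it was subsequently played; here take rate is expressed as plays per impression, which is one possible reading of the definition above.

```python
def recommendation_metrics(events):
    """Compute CTR and take rate from recommendation impressions.

    events: list of dicts, one per displayed recommendation, with boolean
    'clicked' and 'played' keys (an assumed logging format).
    """
    shown = len(events)
    if shown == 0:
        return {"ctr": 0.0, "take_rate": 0.0}
    clicks = sum(1 for e in events if e["clicked"])
    plays = sum(1 for e in events if e["played"])
    return {
        "ctr": clicks / shown,       # clicked impressions / all impressions
        "take_rate": plays / shown,  # watched impressions / all impressions
    }


def meets_ctr_target(metrics, target=0.38):
    """Check the CTR success criterion (>= 38%) stated in this section."""
    return metrics["ctr"] >= target
```

For example, five impressions with two clicks and one play give a CTR of 0.4 and a take rate of 0.2, so the CTR criterion would be met while most clicks still did not lead to viewing.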

User-centric metric:

Novelty: Novelty can be defined as the fraction of unknown items among all items the user liked. It assesses whether the system introduces users to content they have not encountered before, promoting exploration and discovery. High novelty means that the system suggests items that the user may not have considered or been aware of, enhancing the user's viewing experience. An ideal way of measuring it would be a customer survey, but in most cases we are unable to determine whether the user knew the item before.

Determining a specific numeric value for a "good" novelty metric in a Netflix recommendation system is challenging, as it depends on several factors, including the preferences of the user base and the content library. However, as a general guideline, a novelty value between 0.6 and 0.8 (60% to 80%) is often considered favorable for promoting diversity and introducing users to new content.
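The definition above translates directly into a small function. The sketch assumes that the set of items the user already knew is available, for example from a survey or approximated by the user's viewing history; the function and parameter names are illustrative.

```python
def novelty(liked_items, known_before):
    """Fraction of the user's liked items that were unknown to them.

    liked_items: items the user liked among the recommendations.
    known_before: items the user already knew (assumed to be available,
    e.g. from a survey or approximated by viewing history).
    """
    liked = set(liked_items)
    if not liked:
        return 0.0
    unknown = liked - set(known_before)
    return len(unknown) / len(liked)
```

If a user liked five recommended titles and already knew two of them, novelty is 3 / 5 = 0.6, the lower end of the guideline range given above.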

System performance:

Response time: For any recommendation system, response time should be short, indicating a fast and responsive system. A system with slow processing cannot be deployed or be useful to users. We expect the system to respond within about a second: a response time between 200 milliseconds and 1 second is considered acceptable, as users are unlikely to notice the delay [27].

Maximum number of concurrent users: A recommendation system usually serves a large number of users. Hence, the system should be able to handle multiple users' requests simultaneously.

3 Data curation

● The expected dataset for the Netflix recommendation system can include the following examples:

- User Data:

+ User profiles: Each user's profile information, including demographics, viewing history, ratings, and preferences.

+ Viewing history: A record of the movies and TV shows watched by each user, including timestamps.

+ Ratings: User ratings for movies and TV shows, indicating their preferences.

● List all the appropriate data sources:

- Netflix API:

+ Data source: Netflix provides an API that allows access to user data, viewing history, ratings, and other relevant information.

+ Steps to collect data:

(1) To collect data using the Netflix API, you would need to register for API access and obtain an API key.

(2) With the API key, you can make authorized requests to retrieve user data, viewing history, and other relevant information by sending HTTP requests to the API endpoints.

- MovieLens:

+ Data source: The MovieLens dataset provides a collection of movie ratings and related data for research and education purposes. The objective of this dataset is to support the development of movie recommendation systems, data mining algorithms, and related research studies. It includes information such as movie ratings, user demographics, movie metadata, and user-provided tags. The dataset allows researchers to evaluate and compare recommendation algorithms, explore new methods, and contribute to the field of movie recommendation. The ultimate goal is to foster research and innovation in this domain.

+ Steps to collect data:

(1) Access the website: https://grouplens.org/datasets/movielens/ [19]

(2) Choose a dataset: MovieLens Latest Dataset (Small: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. Last updated 9/2018)

(3) Download the dataset (zip file)

(4) Load the file to Drive (.csv)
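Once the archive is downloaded and unpacked, the ratings file can be read with Python's standard `csv` module. The sketch below assumes the header `userId,movieId,rating,timestamp` used by `ratings.csv` in the small MovieLens dataset; the inline sample only stands in for the real file so that the example is self-contained.

```python
import csv
import io


def load_ratings(text_stream):
    """Parse MovieLens-style ratings into (user, movie, rating, timestamp) tuples.

    text_stream: an open text file or file-like object whose header row is
    assumed to be userId,movieId,rating,timestamp.
    """
    reader = csv.DictReader(text_stream)
    return [
        (int(row["userId"]), int(row["movieId"]),
         float(row["rating"]), int(row["timestamp"]))
        for row in reader
    ]


# A two-row stand-in for the real file, so the sketch runs without a download.
SAMPLE = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,964982703\n"
    "1,3,4.0,964981247\n"
)
sample_ratings = load_ratings(SAMPLE)
```

With the real data, the same function would be called on `open("ml-latest-small/ratings.csv", newline="", encoding="utf-8")`, where the folder name is an assumption about how the archive was unpacked.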

● Describe the way to store and organize:

- User Data:

+ User profiles, viewing history, and ratings can be stored in a structured format such as a database, where each user's information is stored as separate records with relevant attributes. This can be implemented using a database management system like MySQL, PostgreSQL, or MongoDB.

+ Database: Users table with columns like user_id, name, age, gender, and preferences.

+ Viewing History: Table or collection with columns like user_id, movie_id, timestamp.

+ Ratings: Table or collection with columns like user_id, movie_id, rating.
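The three tables described above could be declared as follows. The sketch uses SQLite through Python's built-in `sqlite3` module purely for convenience; it is a stand-in for the MySQL, PostgreSQL, or MongoDB options named in the text, and the column types and key constraints are assumptions.

```python
import sqlite3

# Column names follow the text; types and constraints are assumed.
SCHEMA = """
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    name        TEXT,
    age         INTEGER,
    gender      TEXT,
    preferences TEXT
);
CREATE TABLE viewing_history (
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    movie_id  INTEGER NOT NULL,
    timestamp TEXT NOT NULL
);
CREATE TABLE ratings (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    movie_id INTEGER NOT NULL,
    rating   REAL NOT NULL,
    PRIMARY KEY (user_id, movie_id)
);
"""


def create_database(path=":memory:"):
    """Create the three tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The composite primary key on `ratings` allows at most one rating per user and movie, which matches the duplicate-removal step in preprocessing; a viewing history, by contrast, can legitimately contain repeated rows for rewatched titles.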

● Collaborative Filtering: Collaborative filtering techniques can be used to leverage the collective behavior of users to generate recommendations. By analyzing the behavior and preferences of similar users, the system can make predictions and provide recommendations based on patterns observed in the larger user base.
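A minimal sketch of user-based collaborative filtering shows the idea: find the users most similar to the target user, then recommend movies those neighbours rated highly that the target has not seen. Cosine similarity, the neighbourhood size `k`, and the nested-dictionary rating format are illustrative choices, not the report's or Netflix's actual design.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts (movie -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, target, k=2, top_n=3):
    """Recommend unseen movies to `target` from its k most similar users.

    ratings: dict mapping user -> {movie: rating} (an assumed format).
    """
    target_ratings = ratings[target]
    neighbours = sorted(
        ((cosine_similarity(target_ratings, other), user)
         for user, other in ratings.items() if user != target),
        reverse=True,
    )[:k]

    scores, weights = {}, {}
    for similarity, user in neighbours:
        if similarity <= 0:
            continue  # ignore users with no overlap in taste
        for movie, rating in ratings[user].items():
            if movie in target_ratings:
                continue  # only recommend movies the target has not rated
            scores[movie] = scores.get(movie, 0.0) + similarity * rating
            weights[movie] = weights.get(movie, 0.0) + similarity

    # Predicted rating = similarity-weighted average of neighbour ratings.
    predicted = {m: scores[m] / weights[m] for m in scores}
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:top_n]
```

Because the neighbours are chosen only by similarity, this baseline tends to reinforce existing interests; Idea 2 in the idea bank corresponds to deliberately widening the pool of neighbours to include users with more varied tastes.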


4 Prototype building

4.1 Survey of all existing solutions:

Source: Thesis: Diversification in Recommendation System [3]

Approach: Diversify movie range recommendation system (diversification)

Link: https://raiith.iith.ac.in/4104/1/Thesis_Mtech_CS_4104.pdf

Assessment: High computational complexity. This may hinder the widespread adoption of the method in some applications.

Source: Research paper: It Takes Variety to Make a World: Diversification in Recommender Systems [4]

Approach: Algorithm MaxRel, Algorithm MaxDiv, Algorithm Swap, Algorithm Greedy

Assessment: Algorithm MaxRel struggled with diversity and missed the long-tail niche items. The accuracy of Algorithm MaxDiv was not as high as that of Algorithm MaxRel. Algorithm Swap had better accuracy than the other approaches, but its diversity was limited. Algorithm Greedy had better accuracy than Algorithm MaxDiv and Algorithm MaxRel, but it ignored less-popular items with little to no explanation, lowering its coverage. Therefore, none of them is well suited to addressing the current problem of the Netflix recommendation system: diversification.

Source: Research paper: A method for improving the diversity of movie recommendation with knowledge graph [5]

Approach: Combined Collaborative Filtering with Latent Factor Model & Topic Model to form a hybrid approach.

Assessment: The hybrid approach is effective in improving the diversity of movie recommendations without significantly compromising accuracy.
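The trade-off these works discuss, relevance against diversity, can be illustrated with a small re-ranking sketch. It contrasts a pure top-k-by-relevance baseline with a greedy selection that penalizes similarity to items already chosen, in the spirit of maximal marginal relevance. This is a generic illustration and not a reproduction of the algorithms in [3], [4], or [5]; the genre-based Jaccard similarity and the `trade_off` parameter are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of genres."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_by_relevance(items, k):
    """Baseline: pick the k most relevant items, ignoring diversity."""
    return sorted(items, key=lambda item: -item["relevance"])[:k]


def greedy_diverse(items, k, trade_off=0.5):
    """Greedily pick items that are relevant yet unlike those already chosen.

    items: list of dicts with 'title', 'relevance' (0..1) and 'genres';
    this structure is assumed for the example.
    trade_off: 1.0 means relevance only; lower values favour diversity.
    """
    remaining = list(items)
    selected = []
    while remaining and len(selected) < k:
        def marginal_value(item):
            # Penalize similarity to the closest already-selected item.
            closest = max(
                (jaccard(item["genres"], s["genres"]) for s in selected),
                default=0.0,
            )
            return trade_off * item["relevance"] - (1 - trade_off) * closest

        best = max(remaining, key=marginal_value)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two highly relevant action films and one moderately relevant documentary, the baseline returns both action films, while the greedy variant returns one action film and the documentary, which is the kind of long-tail exposure the survey is looking for.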

Netflix likely has its own unique recommendation system that takes cues from a variety of sources. However, some aspects of other companies' systems interest us, and we think Netflix might draw on them:

- Personalization techniques: Like Netflix, HBO Max aims to be as personalized as possible, but in a way that is less isolated. While Netflix's recommendation system focuses mainly on machine learning and algorithms, HBO Max has taken an alternative approach that uses a hybrid of algorithmic and human-curated content, with an emphasis on the human touch. By involving human curation, Netflix could improve the accuracy of its recommendations and offer more personalized suggestions that better suit its users' preferences and interests [6].

- Machine learning and natural language processing: Amazon Prime Video uses NLP techniques to understand user preferences, while HBO Max and Apple TV+ use machine learning algorithms to improve recommendations. Netflix could study how these systems use artificial intelligence techniques to create more sophisticated and nuanced recommendations [6].

Some information on similar companies' recommendation systems:

1. Amazon Prime Video: Amazon uses a hybrid approach for its recommendation system that combines collaborative filtering and content-based filtering. It also uses Natural Language Processing (NLP) techniques to understand user preferences and provide personalized recommendations [7].

+ Pros: The recommendations are highly relevant. Its integration with Amazon's other services, such as Amazon shopping and Alexa, provides a seamless user experience.

+ Cons: Amazon's recommendation system sometimes fails to provide accurate recommendations, and the user interface can be cluttered.

2. HBO Max: HBO Max uses a hybrid approach for its recommendation system, similar to Amazon Prime Video, that combines collaborative filtering and content-based filtering. It also uses machine learning algorithms to analyze user behavior and adjust recommendations accordingly.

+ Pros: The recommendations are highly relevant, as HBO Max introduced an innovative feature, "Choose your adventure", which lets users answer a few questions about their mood and preferences and then recommends a personalized viewing experience based on their answers [8].

+ Cons: HBO Max's recommendation system can be limited in the range of options provided.

3. Disney+: uses a collaborative filtering approach to recommend content based on the user's viewing history, favorites, and content ratings. It also provides curated collections and personalized watchlists based on user preferences [9].

+ Pros: Its extensive library of exclusive content makes it easier for users to discover new shows and movies.

+ Cons: Disney+'s recommendation system can be limited in the range of options provided, and the user interface can be cluttered.

4. Apple TV+: uses a mix of collaborative filtering and machine learning to personalize content recommendations based on the user's viewing habits, preferences, and content ratings. It also offers handpicked content collections based on special events or themes [10].

+ Pros: Its integration with Apple's ecosystem makes it easier for users to find and watch content across their devices.

+ Cons: Some users have reported that Apple TV+'s recommendation system can be slow to update and that the search function can be limited.

● Solutions suggested by the AI community and industry conferences (with supporting evidence from the community, the paper, or the name of the conference)

- A paper presented at the International Joint Conference on Artificial Intelligence (IJCAI) proposed a new approach to recommending diverse items, called diversity-weighted utility maximization (DUM). The paper is "Optimal Greedy Diversity for Recommendation", in Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) [11].

4.2 Prototype architecture and technologies needed:

● System architecture

Date posted: 19/06/2024, 17:04