
Topic: Movie Recommendation System of Netflix


DOCUMENT INFORMATION

Basic information

Title: Movie Recommendation System of Netflix
Authors: Nguyen Thi Phuong Nhi, Le Phuong Anh, Nguyen Thy Ngoc, Le Hoang Linh Chi
Supervisor: Vu Trong Sinh, Lecturer
University: City University of Hong Kong
Subject: AI Business Applications
Type: Final Report
Year: 2023
City: Hanoi
Pages: 24
File size: 2.35 MB


Subject: AI Business Applications

Lecturer: Vu Trong Sinh

FINAL REPORT

GROUP 5 - CityU 8B

Nguyen Thi Phuong Nhi - CA8-146

Le Phuong Anh - CA8-114

Nguyen Thy Ngoc - CA8-153

Le Hoang Linh Chi - CA8-118

Hanoi, 2023


ID | Member | Tasks | Contribution
CA8-114 | Lê Phương Anh | Introduction; idea bank; project definition and planning | 25%
CA8-118 | Lê Hoàng Linh Chi | Prototype architecture and technologies; production testing and conclusion | 25%


1 Introduction

1.1 Brief introduction to Netflix

Founded in 1997, Netflix has become a global player in the streaming industry by providing unique and diverse content to subscribers in more than 190 countries. The platform offers a vast library, ranging from original productions to licensed TV shows and movies, catering to diverse audiences worldwide. With the ongoing trend towards cord-cutting, Netflix has established itself as a key player in the entertainment industry, letting users stream across multiple devices. However, Netflix's users face a problem: the recommendation algorithms have focused primarily on accuracy and neglected other essential qualities such as diversity and serendipity. As a result, the recommendations often exclude long-tail items and have low coverage. In this study, we present a recommendation technique that takes into account diversity, accuracy, and serendipity, a combination intended to ensure that no good content goes unnoticed.

● Task: Collecting and integrating user data, movie metadata, and other relevant information into the recommendation system's database.

● Responsibility: Handled by automated data collection processes and backend systems (AI)

User Profile Creation:

● Task: Creating user profiles based on users' interactions, preferences, and viewing history

● Responsibility: Automated process using AI algorithms

Preprocessing:

● Task: Once user profiles are created and content data is collected, preprocessing techniques are applied to clean and transform the data before further analysis

● Responsibility: Automated process using AI algorithms

● The preprocessing step involves several tasks, such as removing duplicate entries, handling missing values, standardizing data formats, and normalizing data
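The preprocessing tasks listed above can be sketched with pandas. This is a minimal illustration under our own assumptions, not Netflix's pipeline: the column names (`user_id`, `movie_id`, `rating`, `timestamp` in Unix seconds), filling a missing rating with that user's mean, keeping the most recent duplicate, and min-max normalization are all choices made for the example.

```python
import pandas as pd


def preprocess_ratings(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a raw ratings table (assumed columns: user_id, movie_id, rating, timestamp)."""
    # Rows without a user or movie id cannot be used at all.
    out = df.dropna(subset=["user_id", "movie_id"]).copy()

    # Handle missing values: fill a missing rating with that user's mean rating,
    # then drop rows that are still missing (users with no ratings at all).
    user_mean = out.groupby("user_id")["rating"].transform("mean")
    out["rating"] = out["rating"].fillna(user_mean)
    out = out.dropna(subset=["rating"])

    # Standardize formats: integer ids and real datetimes (Unix seconds assumed).
    out["user_id"] = out["user_id"].astype(int)
    out["movie_id"] = out["movie_id"].astype(int)
    out["timestamp"] = pd.to_datetime(out["timestamp"], unit="s")

    # Remove duplicates: keep only the most recent rating per (user, movie) pair.
    out = out.sort_values("timestamp").drop_duplicates(["user_id", "movie_id"], keep="last")

    # Normalize ratings to [0, 1] with min-max scaling.
    lo, hi = out["rating"].min(), out["rating"].max()
    out["rating_norm"] = (out["rating"] - lo) / (hi - lo) if hi > lo else 0.0
    return out.reset_index(drop=True)
```

Keeping the latest duplicate assumes a user's newest rating reflects their current opinion; averaging duplicates would be an equally reasonable choice.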

Content Analysis and Feature Extraction:

● Task: Analyzing the content of movies and extracting relevant features such as genres, actors, directors, and other metadata

● Responsibility: Automated process using AI algorithms
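As a small illustration of feature extraction, the sketch below turns genre metadata into multi-hot vectors and compares movies with cosine similarity, one common basis for content-based filtering. It assumes MovieLens-style pipe-separated genre strings such as `Action|Adventure`; actors and directors could be encoded the same way. The function names are hypothetical.

```python
import numpy as np


def genre_vectors(genre_strings):
    """Multi-hot encode genre strings such as 'Action|Adventure'.

    Returns the (movies x genres) matrix and the sorted genre vocabulary.
    """
    vocab = sorted({g for s in genre_strings for g in s.split("|")})
    col = {g: i for i, g in enumerate(vocab)}
    X = np.zeros((len(genre_strings), len(vocab)))
    for row, s in enumerate(genre_strings):
        for g in s.split("|"):
            X[row, col[g]] = 1.0
    return X, vocab


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between rows of X (all-zero rows get similarity 0)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    return unit @ unit.T
```

Movies with overlapping genres score close to 1 and unrelated movies score 0, so the rows of the similarity matrix can be used to find "more like this" candidates.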

Filtering and Ranking:

● Task: Filtering and ranking the generated recommendations based on additional criteria, such as user feedback, popularity, and business rules

● Responsibility: A combination of AI algorithms and human curators. AI algorithms handle the initial filtering and ranking, while human curators may apply additional criteria or business rules
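One possible shape for the automated part of filtering and ranking is sketched below: hard business rules remove ineligible titles, and the remaining candidates are ordered by a weighted blend of model score, popularity, and user feedback. The `Candidate` fields, the rule set (already watched or blocked titles), and the weights are illustrative assumptions, not values used by Netflix.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    movie_id: int
    model_score: float  # relevance from the recommender, assumed scaled to [0, 1]
    popularity: float   # e.g. normalized view count, assumed in [0, 1]
    feedback: float     # e.g. normalized share of positive ratings, assumed in [0, 1]


def filter_and_rank(candidates, watched, blocked, k=10, weights=(0.6, 0.2, 0.2)):
    """Apply hard business rules, then rank by a weighted score and keep the top k ids."""
    w_model, w_pop, w_fb = weights
    # Business rules (illustrative): drop titles the user already watched and titles
    # that are blocked, e.g. not licensed in the user's region.
    eligible = [c for c in candidates if c.movie_id not in watched and c.movie_id not in blocked]

    def score(c: Candidate) -> float:
        return w_model * c.model_score + w_pop * c.popularity + w_fb * c.feedback

    ranked = sorted(eligible, key=score, reverse=True)
    return [c.movie_id for c in ranked[:k]]
```

Human curators could then adjust the `blocked` set or the weights without touching the model itself.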

Presentation to the User:

● Task: Presenting the final set of movie recommendations to the user through the Netflix interface

● Responsibility: The recommendation system's AI algorithms handle the presentation and delivery of recommendations to the user

● Task: Collecting and analyzing user feedback, ratings, and interactions to improve the recommendation system.

● Responsibility: A combination of AI algorithms and human data scientists. AI algorithms collect and analyze user feedback at scale, while human data scientists interpret the feedback to derive insights and system improvements

With this model, however, Netflix faces a problem: the recommendation system can only make recommendations based on users' existing interests. In other words, the model has a limited ability to extend beyond a user's current interests.

1.3 Set up an idea bank:

1. Incorporate Exploration Recommendations & Enhance Serendipitous Discovery
Netflix can introduce a "Surprise Me" or "Random Pick" feature that suggests content outside the user's typical viewing patterns. This can allow users to discover hidden gems and expand their horizons through unexpected recommendations.

2. Collaborative Filtering with Diverse User Profiles
Netflix can enhance its collaborative filtering approach by considering diverse user profiles and preferences. This can involve incorporating input from users with varying tastes and interests to ensure that recommendations are not based solely on popular or mainstream content.

3. Contextual Recommendation Engine
Netflix can develop a contextual recommendation engine that considers not only user preferences and item characteristics but also contextual factors such as time, location, device, and social context.
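Idea 1's "Surprise Me" feature could start as simply as the sketch below: pick a random title that shares no genre with the genres the user usually watches, and fall back to the whole catalog when no such title exists. The catalog format (movie ID mapped to a set of genres), how "usual genres" are determined, and the function name `surprise_me` are hypothetical.

```python
import random


def surprise_me(catalog: dict, usual_genres: set, rng=None) -> int:
    """Return a random movie ID from outside the user's usual genres.

    catalog: hypothetical mapping movie_id -> set of genre names.
    usual_genres: genres the user watches most (however that is measured).
    """
    rng = rng or random.Random()
    # Titles that share no genre with the user's usual genres.
    outside = [movie for movie, genres in catalog.items() if not genres & usual_genres]
    # Fall back to the full catalog if every title overlaps the usual genres.
    pool = sorted(outside) if outside else sorted(catalog)
    return rng.choice(pool)
```

Passing a seeded `random.Random` makes the pick reproducible for testing; in production the pool would more likely be weighted by quality so that the surprise is still a good one.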

First, capital requirements:

Idea 2 has low capital requirements because it can use existing data: the initial data collection and storage infrastructure is already in place, reducing the need for additional capital investment in data acquisition. In addition, existing algorithms, such as user-based or item-based collaborative filtering, can be used as a starting point and customized to specific requirements, which reduces the need for significant investment in algorithm research and development. Moreover, collaborative filtering, especially in its traditional forms, is relatively straightforward to implement compared to more complex recommendation approaches. This simplicity reduces development complexity, resulting in lower development costs and faster implementation timelines.

Idea 1 can require substantial capital due to the development of new features, data collection and analysis, algorithm development, user experience enhancements, and system integration. These investments are necessary to deliver a differentiated and engaging recommendation experience, but they lead to higher capital expenditure than enhancing collaborative filtering with diverse user profiles, which builds on existing algorithmic approaches and data.

Idea 3: Developing a contextual recommendation engine requires significant investment in data collection infrastructure, machine learning models, and real-time processing capabilities.

It relies on a wide range of contextual data, such as user location, time, device, browsing behavior, and social interactions. Collecting and processing this contextual data requires robust infrastructure capable of handling large volumes of data, and investing in data collection mechanisms, data storage, data pipelines, and data processing capabilities can incur substantial costs. Building real-time processing capabilities, including scalable, low-latency systems, may also require investment in infrastructure, cloud services, and specialized technologies.

Second, risk:

Idea 2, enhancing the collaborative filtering approach by considering diverse user profiles and preferences, carries relatively low risk. It builds on existing algorithms and data, leveraging user input and preferences to provide more personalized recommendations. The main risks involve data quality and privacy, but these can be mitigated with proper safeguards.

Idea 1 involves introducing a "Surprise Me" or "Random Pick" feature to suggest content outside the user's typical viewing patterns. Although developing and integrating this feature carries some risks, such as potential user resistance or technical challenges, the overall risk level is relatively low to medium.

Idea 3, developing a contextual recommendation engine, involves higher risk than the other two ideas. It requires significant investment in data collection infrastructure, machine learning models, and real-time processing capabilities. The complexity of integrating contextual data, ensuring data accuracy, and maintaining real-time processing systems adds to the risk, as does the need for continuous updates and maintenance.

After evaluating the ideas in terms of risk and capital requirements, we determined that Idea 2 is the most suitable option to implement.

2 Project definition and planning

2.1 Design Thinking:


watch movies, TV shows, and other content)

Empathy map:

Does:

● Explore the Netflix platform to find movies and TV shows

● Watch content based on personal interests and preferences

● Rate and review the movies and TV shows they watch

Feels:

● Users may feel disappointed when they come across a movie they didn't like

● Users become impatient when the search process takes longer than expected

● Users feel overwhelmed by the sheer volume of options and struggle to make a decision

● Users feel bored when the same movies are recommended repeatedly

Thinks:

● "Finding a suitable movie is a waste of time."

● "The recommendations are too repetitive sometimes."

● "It's frustrating to see suggestions that are not relevant to me."

Says:

● "I wish I could discover some new movies in other genres that might appeal to me."

● "I wish the system could suggest some interesting movies to me, not just the trending ones."

The goal is to provide the user with a diverse range of movie recommendations that are not focused solely on the latest or trending films but are based on similar users who share the same tastes.

As a user, I want to explore new and interesting movies that may appeal to me.

The goal is to help users discover new and diverse content that they may not have been aware of or considered before. By offering a wide range of recommendations across various genres and categories, users can explore and expand their viewing options.

As a user, I want to be pleasantly surprised by the recommendation system, discovering hidden gems or movies outside my usual genre preferences, so that I can expand my viewing horizons.

2.2 Success criteria of the project [16]

Business metric

Click-through rate (CTR): CTR measures how many clicks the recommendations receive. The assumption is that the more clicks, the more relevant the recommendations.

The Netflix recommendation system is considered a success if its CTR is equal to or slightly higher than the CTR of the industry or of other recommendation systems. In this case, >= 38% (compared to Google News or Forbes

or series was watched after being recommended (“Take rate”)
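CTR is commonly computed as clicks divided by impressions, where we assume an impression is one recommendation shown to a user. The sketch below applies the 38% success threshold stated above; how clicks and impressions are logged is left outside the example, and the function names are hypothetical.

```python
CTR_TARGET = 0.38  # success threshold stated in the report


def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions, defined as 0.0 when nothing was shown."""
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    return clicks / impressions if impressions else 0.0


def meets_ctr_target(clicks: int, impressions: int, target: float = CTR_TARGET) -> bool:
    """True if the observed CTR reaches the success threshold."""
    return click_through_rate(clicks, impressions) >= target
```

For example, 40 clicks on 100 shown recommendations gives a CTR of 0.4, which meets the target, while 30 clicks (0.3) does not.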

User-centric metric:

Novelty: Novelty can be defined as the fraction of unknown items among all items the user liked. It assesses whether the system introduces users to content they have not encountered before, promoting exploration and discovery. High novelty means that the system suggests items the user may not have considered or been aware of, enhancing the user's viewing experience. Ideally, novelty would be measured with a customer survey, but in most cases we cannot determine whether the user knew an item beforehand.

Determining a specific numeric value for a "good" novelty score in a Netflix recommendation system is challenging, as it depends on several factors, including the preferences of the user base and the content library. However, as a general guideline, a novelty score between 0.6 and 0.8 (60% to 80%) is often considered favorable for promoting diversity and introducing users to new content.
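Following the definition above, novelty for one user is the share of liked items that were unknown to the user beforehand. Since we usually cannot observe what a user already knew, the sketch assumes a proxy: titles already in the user's viewing history before the recommendations were shown count as known. The 0.6 to 0.8 band is the guideline quoted above; the function names are hypothetical.

```python
def novelty(liked_items: set, known_before: set) -> float:
    """Fraction of the user's liked items that were unknown to them beforehand.

    known_before is a proxy (assumption): e.g. titles already in the user's
    viewing history before the recommendations were shown.
    """
    if not liked_items:
        return 0.0
    return len(liked_items - known_before) / len(liked_items)


def mean_novelty(users) -> float:
    """Average novelty over (liked_items, known_before) pairs, one pair per user."""
    users = list(users)
    return sum(novelty(liked, known) for liked, known in users) / len(users) if users else 0.0


def in_target_band(score: float, low: float = 0.6, high: float = 0.8) -> bool:
    """Check a novelty score against the 0.6 to 0.8 guideline."""
    return low <= score <= high
```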

System performance:

Response time: For any recommendation system, response time should be short, since it indicates a fast and responsive system. A system with slow processing times cannot be deployed or be useful to users. We expect the system to respond to users within seconds.

A response time between 200 milliseconds and 1 second is considered acceptable, as users are still unlikely to notice the delay [27].

Maximum number of concurrent users: A recommendation system is usually built to serve a large number of users. Hence, the system should be able to handle many users' requests simultaneously.
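Both system-performance criteria can be checked with a simple load test. The sketch below sends many requests concurrently through a thread pool and reports median and 95th-percentile latency, which can be compared against the 1-second upper bound above. `fake_recommend` is a hypothetical stand-in for a real recommendation call, and the measured latency covers each call only, not time spent queued; against a deployed service a dedicated load-testing tool would be more realistic.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_BUDGET_S = 1.0  # upper end of the acceptable range quoted in the report


def fake_recommend(user_id: int) -> list:
    """Hypothetical stand-in for a recommendation call (about 10 ms of work)."""
    time.sleep(0.01)
    return [user_id % 7, user_id % 11]


def timed_call(fn, arg) -> float:
    """Return the wall-clock latency of one call in seconds."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def load_test(fn, user_ids, max_workers: int = 32) -> dict:
    """Issue requests concurrently and summarize per-call latency."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        latencies = list(pool.map(lambda uid: timed_call(fn, uid), user_ids))
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    p95 = statistics.quantiles(latencies, n=20)[18]
    return {
        "requests": len(latencies),
        "p50_s": statistics.median(latencies),
        "p95_s": p95,
        "within_budget": p95 <= LATENCY_BUDGET_S,
    }
```

Raising `max_workers` step by step until `within_budget` turns false gives a rough estimate of how many concurrent users the service can handle.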

3 Data curation

● The expected dataset for the Netflix recommendation system can include the following:

- User Data:

+ User profiles: Each user's profile information, including demographics, viewing history, ratings, and preferences.

+ Viewing history: A record of the movies and TV shows watched by each user, including timestamps.

+ Ratings: User ratings for movies and TV shows, indicating their preferences.

● Appropriate data sources:

- Netflix API:

+ Data source: Netflix once offered a public API that allowed authorized applications to access user data, viewing history, ratings, and other relevant information. The public API was retired in 2014, so in practice this data is now available only inside Netflix.

+ Steps to collect data:

(1) To collect data using the Netflix API, you would need to register for API access and obtain an API key.

(2) With the API key, you could retrieve user data, viewing history, and other relevant information by sending authorized HTTP requests to the API endpoints.

- MovieLens:

+ Data source: The MovieLens dataset provides a collection of movie ratings and related data for research and education purposes. Its objective is to support the development of movie recommendation systems, data mining algorithms, and related research. It includes information such as movie ratings, user demographics, movie metadata, and user-provided tags. The dataset allows researchers to evaluate and compare recommendation algorithms, explore new methods, and contribute to the field of movie recommendation, with the ultimate goal of fostering research and innovation in this domain.

+ Steps to collect data:

(1) Go to the website: https://grouplens.org/datasets/movielens/ [19]

(2) Choose a dataset: MovieLens Latest Dataset (Small: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users; last updated 9/2018)

(3) Download the dataset (zip file)

(4) Upload the extracted .csv files to Drive
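Once the files are on Drive, they can be loaded and joined with pandas. The sketch assumes the standard `ml-latest-small` files `ratings.csv` (columns `userId, movieId, rating, timestamp`) and `movies.csv` (columns `movieId, title, genres`); the helper names are hypothetical.

```python
import pandas as pd


def load_movielens(ratings_src, movies_src) -> pd.DataFrame:
    """Join MovieLens ratings with movie titles and genres.

    ratings_src / movies_src: paths (or file-like objects) for ratings.csv and movies.csv.
    """
    ratings = pd.read_csv(ratings_src)  # userId, movieId, rating, timestamp
    movies = pd.read_csv(movies_src)    # movieId, title, genres
    # Left join keeps every rating even if a movie row were missing.
    return ratings.merge(movies, on="movieId", how="left")


def describe(df: pd.DataFrame) -> dict:
    """Basic size check: distinct users, distinct movies, and number of ratings."""
    return {
        "users": int(df["userId"].nunique()),
        "movies": int(df["movieId"].nunique()),
        "ratings": len(df),
    }
```

In a Colab notebook with Drive mounted, the call might look like `load_movielens("/content/drive/MyDrive/ml-latest-small/ratings.csv", "/content/drive/MyDrive/ml-latest-small/movies.csv")`, where the folder location is hypothetical; `describe` on the result should report roughly the 600 users, 9,000 movies, and 100,000 ratings quoted above.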

● Data storage and organization:

- User Data:

+ User profiles, viewing history, and ratings can be stored in a structured format such as a database, where each user's information is stored as separate records with relevant attributes. This can be implemented using a database management system such as MySQL, PostgreSQL, or MongoDB.

+ Database: a Users table with columns such as user_id, name, age, gender, and preferences

+ Viewing History: a table or collection with columns such as user_id, movie_id, and timestamp

+ Ratings: a table or collection with columns such as user_id, movie_id, and rating
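The three tables described above can be sketched as a relational schema. SQLite, from Python's standard library, stands in here for MySQL or PostgreSQL so the example runs anywhere; the column types, the keys, and the 0.5 to 5.0 rating range (the MovieLens scale) are our assumptions, and `preferences` is stored as free text for simplicity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id     INTEGER PRIMARY KEY,
    name        TEXT,
    age         INTEGER,
    gender      TEXT,
    preferences TEXT                 -- e.g. a JSON list of favourite genres
);
CREATE TABLE IF NOT EXISTS viewing_history (
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    movie_id  INTEGER NOT NULL,
    timestamp INTEGER NOT NULL,      -- Unix seconds
    PRIMARY KEY (user_id, movie_id, timestamp)
);
CREATE TABLE IF NOT EXISTS ratings (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    movie_id INTEGER NOT NULL,
    rating   REAL NOT NULL CHECK (rating BETWEEN 0.5 AND 5.0),
    PRIMARY KEY (user_id, movie_id)  -- one rating per user and movie
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    # SQLite enforces REFERENCES constraints only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The composite primary key on `ratings` enforces one rating per user and movie, which matches the duplicate-removal step in preprocessing.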

● Collaborative Filtering: Collaborative filtering techniques leverage the collective behavior of users to generate recommendations. By analyzing the behavior and preferences of similar users, the system can make predictions and provide recommendations based on patterns observed in the larger user base.
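A minimal user-based collaborative filtering predictor is sketched below with NumPy. It mean-centers each user's ratings, measures user-to-user similarity with cosine similarity on the centered vectors, and predicts a missing rating from the k most similar users who rated the item. The dense matrix with 0 meaning "unrated" and the default `k` are simplifying assumptions; a real system would use sparse data and a tuned neighborhood size.

```python
import numpy as np


def predict_user_based(ratings, user: int, item: int, k: int = 20) -> float:
    """Predict ratings[user, item] from the k most similar users who rated the item.

    ratings: dense user x item matrix where 0 means "not rated" (an assumption).
    """
    R = np.asarray(ratings, dtype=float)
    rated = R > 0

    # Mean-center each user's ratings over the items they actually rated.
    counts = rated.sum(axis=1)
    means = np.divide(R.sum(axis=1), counts, out=np.zeros(R.shape[0]), where=counts > 0)
    centered = np.where(rated, R - means[:, None], 0.0)

    # Cosine similarity between the target user and every user on centered vectors.
    norms = np.linalg.norm(centered, axis=1)
    denom = norms * norms[user]
    sims = np.divide(centered @ centered[user], denom, out=np.zeros(R.shape[0]), where=denom > 0)

    # Candidate neighbours: other users who rated the item, top-k by similarity.
    neighbours = np.flatnonzero(rated[:, item])
    neighbours = neighbours[neighbours != user]
    if neighbours.size == 0:
        return float(means[user])
    top = neighbours[np.argsort(sims[neighbours])[::-1][:k]]

    weights = sims[top]
    total = np.abs(weights).sum()
    if total == 0:
        return float(means[user])
    # Prediction = user's mean + similarity-weighted average of neighbours' deviations.
    return float(means[user] + weights @ centered[top, item] / total)
```

For Idea 2, the same predictor could be fed neighbourhoods that deliberately include users with different tastes, rather than only the closest matches.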


4 Prototype building

4.1 Survey of all existing solutions:

s_Mtech_CS_4104.pdf

High computational complexity: this may hinder widespread adoption of the method in some applications.

https://openproceedings.org/2009/conf/edbt/YuLA09.pdf

Algorithm MaxRel struggled with diversity and missed long-tail niche items. The accuracy of MaxDiv was not as high as that of MaxRel. Algorithm Swap had better accuracy than the other approaches, but its diversity was limited. Algorithm Greedy had better accuracy than MaxDiv and MaxRel, but it ignored less-popular items with little to no explanation, lowering its coverage. Therefore, none of them is well suited to solving the current problem of the Netflix recommendation system: diversification.

https://isciia2022.bit.edu.cn/docs/2023-01/03ebb47fe7c04a639f409023e3a72715.pdf

A hybrid approach is effective in improving the diversity of movie recommendations without significantly compromising accuracy.
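The survey findings above point to two ingredients: blending collaborative and content-based signals (the hybrid approach) and greedily trading relevance against diversity. The sketch below combines them. The re-ranker uses a maximal-marginal-relevance (MMR) style rule, a well-known generic technique, and is not the exact MaxRel, MaxDiv, Swap, or Greedy algorithm from the cited paper; the weights `alpha` and `lam` and the item-similarity matrix are assumptions.

```python
import numpy as np


def minmax(x):
    """Scale scores to [0, 1]; a constant vector maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def hybrid_relevance(cf_scores, content_scores, alpha=0.7):
    """Weighted hybrid: alpha * collaborative + (1 - alpha) * content-based."""
    return alpha * minmax(cf_scores) + (1 - alpha) * minmax(content_scores)


def mmr_score(i, relevance, similarity, selected, lam):
    """Relevance of item i minus a penalty for resembling items already selected."""
    max_sim = max((similarity[i][j] for j in selected), default=0.0)
    return lam * relevance[i] - (1 - lam) * max_sim


def greedy_diversify(relevance, similarity, k, lam=0.7):
    """Greedily pick k item indices, weighting relevance by lam and diversity by 1 - lam."""
    selected, remaining = [], list(range(len(relevance)))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda i: mmr_score(i, relevance, similarity, selected, lam))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the re-ranker reduces to plain relevance ordering; lowering `lam` pushes near-duplicates of already chosen titles down the list, which is the trade-off the surveyed papers measure.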

Netflix likely has its own unique recommendation system that takes cues from a variety of sources. However, there are some aspects of other companies' systems that interest us and that we think Netflix might learn from:

- Personalization techniques: Like Netflix, HBO Max aims to be as personalized as possible, but in a way that is not so isolated. While Netflix's recommendation system focuses mainly on machine learning and algorithms, HBO Max has taken an alternative approach that uses a hybrid of algorithms and human-curated content, with a focus on the human touch. By involving human curation, Netflix could improve the accuracy of its recommendations and offer more personalized suggestions that better suit its users' preferences and interests [6].

- Machine learning and natural language processing: Amazon Prime Video uses NLP techniques to understand user preferences, while HBO Max and Apple TV+ use machine learning algorithms to improve recommendations. Netflix could study how these systems use AI techniques to create more sophisticated and nuanced recommendations [6].

Recommendation systems of similar companies:

1 Amazon Prime Video: Amazon uses a hybrid recommendation approach that combines collaborative filtering and content-based filtering. It also uses natural language processing (NLP) techniques to understand user preferences and provide personalized recommendations [7].

+ Pros: The recommendations are highly relevant, and integration with Amazon's other services, such as Amazon shopping and Alexa, provides a seamless user experience.

+ Cons: Amazon's recommendation system sometimes fails to provide accurate recommendations, and the user interface can be cluttered.

2 HBO Max: Like Amazon Prime Video, HBO Max uses a hybrid approach that combines collaborative filtering and content-based filtering. It also uses machine learning algorithms to analyze user behavior and adjust recommendations accordingly.

+ Pros: The recommendations are highly relevant because HBO Max introduced an innovative feature, "Choose your adventure", which lets users answer a few questions about their mood and preferences and then recommends a personalized viewing experience based on their answers [8].

+ Cons: HBO Max's recommendation system can be limited in the range of options it provides.

3 Disney+: Disney+ uses a collaborative filtering approach to recommend content based on the user's viewing history, favorites, and content ratings. It also provides curated collections and personalized watchlists based on user preferences [9].

+ Pros: Its extensive library of exclusive content makes it easier for users to discover new shows and movies.

+ Cons: Disney+'s recommendation system can be limited in the range of options it provides, and the user interface can be cluttered.

4 Apple TV+: Apple TV+ uses a mix of collaborative filtering and machine learning to personalize content recommendations based on the user's viewing habits, preferences, and content ratings. It also offers handpicked content collections based on special events or themes [10].

+ Pros: Its integration with Apple's ecosystem makes it easier for users to find and watch content across their devices.

+ Cons: Some users have reported that Apple TV+'s recommendation system can be slow to update and that the search function can be limited.

● Solutions suggested by the AI community and industry conferences:

- A paper presented at the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), "Optimal Greedy Diversity for Recommendation", proposed a new approach to recommending diverse items called diversity-weighted utility maximization (DUM) [11].

4.2 Prototype architecture and technologies needed:

● System architecture


References
1. TowardDataScience. (2018, October 31). Recommendation system with matrix factorization. Retrieved from https://towardsdatascience.com/recommendation-system-with-matrix-factorization-ebc4736869e4
2. SmartStudios. (2019, July 17). Serving 200 Million Online Subscribers: The Netflix Way. Retrieved May 24, 2021, from https://smartstudios.io/blog/serving-200-million-online-subscribers-the-netflix-way/
3. Bagul, S. S. (2016). Diversified Recommender Systems: A survey and future directions. Retrieved from https://raiith.iith.ac.in/4104/1/Thesis_Mtech_CS_4104.pdf
6. Green, J. E. (2021, August 5). Algorithms in Streaming Services: Developing Personalized Recommendations. Arts Management and Technology Laboratory. https://amt-lab.org/blog/2021/8/algorithms-in-streaming-services#:~:text=developing%20personalized%20recommendations.-,HBO%20Max,focus%20on%20that%20human%20touch
7. Gavira, M. (2018, December 13). Amazon's Recommendation Engine: The Secret Sauce. LinkedIn. https://www.linkedin.com/pulse/amazons-recommendation-engine-secret-sauce-mario-gavira
8. Stolze, E. (n.d.). HBO Max. Retrieved May 10, 2021, from http://www.ericstolze.com/#/hbo-max
10. Carrasco, M. (2021, January 15). How AppleTV+'s Recommendation Engine Works. Medium. https://towardsdatascience.com/how-appletvs-recommendation-engine-works-7d3e0ba4ebc4
12. Microservices architecture at Netflix. https://smartstudios.io/blog/serving-200-million-online-subscribers-the-netflix-way/
13. Netflix recommendation system architecture. (n.d.). [Blog post]. Georgetown University. http://xiaomanchen.georgetown.domains/blog/algorithm/
14. Collaborative Filtering Vs Content-Based Filtering for Recommender Systems. (2020, September 10). Analytics India Magazine. https://analyticsindiamag.com/collaborative-filtering-vs-content-based-filtering-for-recommender-systems/
15. TowardDataScience. (2018, October 31). Recommendation system with matrix factorization. Retrieved from https://towardsdatascience.com/recommendation-system-with-matrix-factorization-ebc4736869e4
16. Neptune.ai. (2021, May 28). Recommender systems metrics: Precision, Recall, and NDCG explained. Retrieved from https://neptune.ai/blog/recommender-systems-metrics
17. GeeksforGeeks. (n.d.). ML-Content Based Recommender System. Retrieved from https://www.geeksforgeeks.org/ml-content-based-recommender-system/
18. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, (8), 30–37.
20. Movielens Movie Recommendation System. https://www.kaggle.com/code/sachinsarkar/movielens-movie-recommendation-system
21. Movie Recommendation System with Collaborative Filtering. (n.d.). [Video]. YouTube. https://www.youtube.com/watch?v=3ecNC-So0r4&t=1889s
22. Building a movie recommendation system. (n.d.). [Video]. YouTube. https://www.youtube.com/watch?v=XoTwndOgXBM
23. Movie Recommendation System. (n.d.). [Video]. YouTube. https://www.youtube.com/watch?v=A_78fGgQMjM
24. Netflix recommendation system architecture. (n.d.). [Blog post]. Georgetown University. http://xiaomanchen.georgetown.domains/blog/algorithm/
25. A Guide to Content-Based Filtering In Recommender Systems. (2021, May 7). Turing. https://www.turing.com/kb/content-based-filtering-in-recommender-systems
27. What is response time & how to reduce it. (2023, February 10). Sematext. https://sematext.com/glossary/response-time
