
Secure Recommender System (Hệ khuyến nghị bảo mật)


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

PHẠM HỒNG THÁI

SECURE RECOMMENDER SYSTEM (HỆ KHUYẾN NGHỊ BẢO MẬT)

Major: Computer Science
Major code: 8.48.01.01

MASTER'S THESIS

Ho Chi Minh City, 2023

THIS WORK WAS COMPLETED AT HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY, VNU-HCM

Scientific supervisor: Assoc. Prof. Dr. Đặng Trần Khánh
Examiner 1: Dr. Đặng Trần Trí
Examiner 2: Dr. Nguyễn Thị Ái Thảo

The master's thesis was defended at Ho Chi Minh City University of Technology, VNU-HCM, on 07 February 2023. The thesis examination committee consisted of (full names, academic ranks, and degrees of the committee members):

1. Chair: Assoc. Prof. Dr. Trần Minh Quang
2. Secretary: Dr. Phan Trọng Nhân
3. Reviewer 1: Dr. Đặng Trần Trí
4. Reviewer 2: Dr. Nguyễn Thị Ái Thảo
5. Member: Assoc. Prof. Dr. Nguyễn Tuấn Đăng

Confirmation of the Chair of the thesis examination committee and the Dean of the faculty managing the major, after the thesis has been revised (if any).

CHAIR OF THE COMMITTEE / DEAN OF THE FACULTY OF COMPUTER SCIENCE AND ENGINEERING

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY, HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
SOCIALIST REPUBLIC OF VIETNAM, Independence - Freedom - Happiness

MASTER'S THESIS ASSIGNMENT

Student's full name: Phạm Hồng Thái. Student ID: 1970521.
Date of birth: 13/08/1996. Place of birth: Ho Chi Minh City.
Major: Computer Science. Major code: 8480101.

I. THESIS TITLE: Hệ khuyến nghị bảo mật / Secure Recommender System

II. TASKS AND CONTENTS:
- Study recommender systems and ways of building a secure recommender system specialized for the ranking task.
- Evaluate the existing methods and, based on that, propose a method for building a secure recommender system specialized for the ranking task on top of federated learning, with an improved level of security compared with the existing methods.
- Implement, test, and evaluate the system that is built.

III. DATE OF ASSIGNMENT: 20/01/2022
IV. DATE OF COMPLETION: 07/02/2023
V. SUPERVISOR: Assoc. Prof. Dr. Đặng Trần Khánh

Ho Chi Minh City, 2023
SUPERVISOR (full name and signature) / PROGRAM COMMITTEE (full name and signature) / DEAN OF THE FACULTY OF COMPUTER SCIENCE AND ENGINEERING (full name and signature)

Acknowledgements

First of all, I would like to express my deep gratitude to Assoc. Prof. Dr. Đặng Trần Khánh, who supervised me throughout the preparation of the thesis proposal and the thesis itself. Thanks to his guidance, comments, and support for my research, I was able to complete this thesis well and to present a paper at the ACOMPA 2022 conference. I would also like to thank my research group, especially Nguyễn Khánh Nam and Phún Vỹ Hòa, who helped me a great deal during my studies and research. In addition, I would like to thank the lecturers of the Faculty of Computer Science and Engineering for the valuable knowledge and experience they have passed on to me over the past two years. Finally, I would like to send my sincere thanks to my family and friends, who have always encouraged and supported me throughout my graduate studies.

Ho Chi Minh City, February 2023
Phạm Hồng Thái

Tóm tắt (Abstract)

The aim of this thesis is to research and build a secure recommender system that focuses on personalizing the ranking of items for each user, with the goal of protecting information about user interactions. Specifically, the thesis presents an algorithm for training a recommender system based on the Collaborative Filtering technique through a state-of-the-art matrix factorization method called Neural Collaborative Filtering, combined with the popular pairwise approach called Bayesian Personalized Ranking. The recommender system is trained in a distributed fashion using Federated Learning, in combination with a secure aggregation algorithm that uses elliptic-curve cryptography to raise the level of security.

Abstract

The main purpose of this thesis is to research and develop a secure recommendation system specialized for the ranking task. Specifically, in this thesis, I propose an algorithm used to train a recommender system based on the Collaborative Filtering technique through one of today's state-of-the-art matrix factorization methods, called Neural Collaborative Filtering, with the most popular pairwise approach, called Bayesian Personalized Ranking. The recommender system is trained in a Federated Learning setting with a secure aggregation protocol that uses cryptography based on the elliptic curve to improve security.
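Since both abstracts compress the training objective into a single sentence, a small illustration may help fix ideas before the main text. The sketch below shows one way an NCF-style scorer (a GMF branch plus an MLP branch) can be trained with the pairwise BPR loss; it is a minimal PyTorch sketch under assumed sizes (embedding dimension, layer widths, toy batch), not the architecture or code used in the thesis experiments.

```python
# Minimal NCF-style scorer (GMF branch + MLP branch) trained with a BPR pairwise loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCF(nn.Module):
    def __init__(self, num_users, num_items, dim=16):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim // 2), nn.ReLU())
        self.out = nn.Linear(dim + dim // 2, 1)   # fuses the GMF and MLP branches

    def forward(self, users, items):
        u, i = self.user_emb(users), self.item_emb(items)
        gmf = u * i                                   # element-wise (GMF) branch
        mlp = self.mlp(torch.cat([u, i], dim=-1))     # non-linear (MLP) branch
        return self.out(torch.cat([gmf, mlp], dim=-1)).squeeze(-1)  # estimated score r_ui

def bpr_loss(model, users, pos_items, neg_items):
    """-log sigmoid(r_ui - r_uj) for (user, seen item i, unseen item j) triples."""
    diff = model(users, pos_items) - model(users, neg_items)
    return -F.logsigmoid(diff).mean()

# One illustrative training step on a toy batch of two pairwise records.
model = NCF(num_users=100, num_items=500)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
users = torch.tensor([3, 7]); pos = torch.tensor([10, 42]); neg = torch.tensor([99, 5])
loss = bpr_loss(model, users, pos, neg)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```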
Declaration (Lời cam đoan)

I, Phạm Hồng Thái, am a graduate student of the Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, student ID 1970521. I declare that the master's thesis "Hệ khuyến nghị bảo mật" (Secure Recommender System) is the result of my own independent study and research. In particular, I declare that: the thesis was carried out for the purpose of study and research at the master's level; the works and papers consulted in building this thesis are cited and referenced; all cited materials come from journals and published research works; the tools and software used in the course of the thesis are open-source software; and figures and numerical data are quoted with clear references.

Ho Chi Minh City, 07 February 2023
Student
Phạm Hồng Thái

Table of Contents (Mục lục)

1 INTRODUCTION
  1.1 Overview
  1.2 Objectives
  1.3 Thesis outline
2 BACKGROUND
  2.1 Recommender systems
    2.1.1 General introduction
    2.1.2 User-item interaction matrix
    2.1.3 Content-based Filtering
    2.1.4 Collaborative Filtering
    2.1.5 Hybrid techniques
  2.2 Matrix Factorization
  2.3 Neural Collaborative Filtering
  2.4 Bayesian Personalized Ranking (BPR)
  2.5 Private Set Union (PSU)
  2.6 Federated Learning
  2.7 Secure Aggregation Protocol
  2.8 The Diffie–Hellman key exchange protocol
  2.9 Elliptic-curve Diffie–Hellman
3 RELATED APPROACHES AND WORK
  3.1 Related approaches and work
    3.1.1 Centralized training
    3.1.2 Decentralized training (Federated Learning)
    3.1.3 Security analysis
    3.1.4 Communication cost analysis
4 PROPOSED METHOD
  4.1 Solution for training the model
  4.2 Solution for protecting user information
    4.2.1 How the model is trained
5 EXPERIMENTS
  5.1 Dataset
  5.2 Metrics
  5.3 Experimental settings
  5.4 Results
6 CONCLUSION
  6.1 Conclusion
  6.2 Advantages
  6.3 Limitations
  6.4 Future work
List of publications
References

List of Figures (Danh sách hình vẽ)

2.1 Recommender system
2.2 User-item interaction matrix
2.3 Two classes of techniques for building a recommender system
2.4 Matrix Factorization
2.5 Utility matrix
2.6 (a) Latent vectors for users; (b) latent vectors for items
2.7 Neural Collaborative Filtering architecture
2.8 Example of training data for the Bayesian Personalized Ranking method
2.9 Illustration of the PSU algorithm
2.10 Federated Learning
2.11 Secure Aggregation Protocol [1]
2.12 Diffie–Hellman key exchange [2]
2.13 Elliptic-curve Diffie–Hellman key exchange, where G is a publicly known point on the elliptic curve (analogous to the number g in classical Diffie–Hellman), QA/QB are the public keys of Alice/Bob, and dA/dB are the private keys of Alice/Bob [3]
3.1 The FedAvg algorithm [4]
3.2 The FMF algorithm [5]
3.3 Update frequency of a user's item embeddings over 500 iterations (training rounds), where in each round the user randomly samples seen items and then pairs them with other unseen items to perform that round's update (k = 8)
3.4 Update frequency of a user's item embeddings over 500 iterations, with the same random sampling as in Figure 3.3 but with a random process [6] used to balance the update frequency of seen and unseen items
3.5 Random Response [6] used to address the privacy issue
5.1 Training results on the MovieLens 100k dataset
5.2 Training results on the MovieLens 1M dataset
5.3 Update frequency of item embeddings for a user who has seen many items, using the proposed algorithm on the MovieLens 100k dataset
List of Publications (Danh mục công trình khoa học)

Secure Recommender System based on Neural Collaborative Filtering and Federated Learning

Hong Thai Pham, Khanh Nam Nguyen
Ho Chi Minh City University of Technology, VNU-HCM
268 Ly Thuong Kiet Street, District 10, Ho Chi Minh City, Vietnam
{phthai.sdh19, nknam.sdh19}@hcmut.edu.vn

Vy Hoa Phun
Zalo ZMB DTS - VNG Corporation
13 Tan Thuan Dong, District 7, Ho Chi Minh City, Vietnam
hoapv3@vng.com.vn

Tran Khanh Dang (*)
Ho Chi Minh City University of Food Industry
140 Le Trong Tan Street, Tay Thanh Ward, Tan Phu District, Ho Chi Minh City, Vietnam
khanh@hufi.edu.vn
(*) Corresponding author

Abstract—A recommender system aims to suggest the most relevant items to users based on their personal data. However, data privacy is a growing concern for everyone. A secure recommender system is a research direction that preserves user privacy while maintaining performance that is as high as possible. The most recent strategy is to use Federated Learning, a machine learning technique for privacy-preserving distributed training. In Federated Learning, a subset of users is selected to train the model using data on their local systems; the server securely aggregates the computation results of the local models to generate a global model, and finally that model gives recommendations to the users. In this paper, we present a novel algorithm to train a Collaborative Filtering recommender system specialized for the ranking task in the Federated Learning setting, where the goal is to protect user interaction information (i.e., implicit feedback). Specifically, with the help of the algorithm, the recommender system is trained by Neural Collaborative Filtering, one of the state-of-the-art matrix factorization methods, and Bayesian Personalized Ranking, the most common pairwise approach. In contrast to existing approaches, which protect user privacy by requiring users to download/upload the information associated with all items they could possibly interact with in order to perform training, the algorithm can protect user privacy at low communication cost, where users only need to obtain/transfer the information related to a small number of interactions per training iteration. Above all, through extensive experiments, the algorithm has been demonstrated to utilize user data more efficiently than the most recent related research, called FedeRank, while ensuring that user privacy is still preserved.

Index Terms—recommender system, federated learning, security and privacy

I. INTRODUCTION

Unlike an offline store, an e-commerce store has a wide variety of products to offer to customers. Nonetheless, unlimited choice can cause problems for users: it is difficult for a user to find the most appropriate item. Plenty of recommender systems have been proposed to solve this issue. A recommender system utilizes users' data and preferences to filter the items recommended to them.

While building a recommender system, the notion of "secure" has not been carefully considered. Traditional recommender systems must collect users' historical interactions and store them in a centralized database. This setting leads to the following problems:
• It violates data regulations (e.g., the General Data Protection Regulation) by collecting users' data without their permission. However, without the users' data, it is difficult for a traditional recommender system to make good recommendations [1].
• A privacy breach can occur in a non-secure recommender system when users' past interactions (e.g., interactions between users and items, or user ratings) and/or sensitive user data (e.g., ethnicity and gender) are gathered and used to infer confidential or personal information about a user [2]–[4].

For these reasons, a secure recommender system is needed to provide good recommendations as well as good protection against privacy risks. In 2016, Google proposed a new machine learning technique for privacy-preserving distributed training called Federated Learning (FL). In the FL setting, users train their local models using their data on their local systems (e.g., user devices) and only send the local models' information (e.g., gradients, weights, or changes) to the server to update the global model. The global model can be updated by using a standard method called Federated Averaging (FedAvg) [5], which securely takes the average of the updated weights or changes of the local models to update the global model.
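As a concrete reading of the FedAvg step just described, the following sketch shows the server-side aggregation rule: each participating client reports its locally trained weights together with its local sample count, and the server forms the sample-weighted average. This is a generic NumPy illustration of FedAvg [5] under assumed toy shapes, not code from the paper; the local training routine is left abstract.

```python
# Sketch of Federated Averaging (FedAvg): the server aggregates client weights
# as a weighted average, where the weight is the client's local sample count.
import numpy as np

def fedavg_round(global_weights, client_updates):
    """client_updates: list of (local_weights, num_local_samples) tuples,
    where local_weights has the same list-of-arrays layout as global_weights."""
    total = sum(n for _, n in client_updates)
    new_weights = [np.zeros_like(w) for w in global_weights]
    for local_weights, n in client_updates:
        for layer, local_layer in zip(new_weights, local_weights):
            layer += (n / total) * local_layer
    return new_weights

# Toy usage: two clients, a 2x2 weight matrix and a bias vector.
global_w = [np.zeros((2, 2)), np.zeros(2)]
updates = [
    ([np.ones((2, 2)), np.ones(2)], 30),           # client A trained on 30 samples
    ([3 * np.ones((2, 2)), 3 * np.ones(2)], 10),   # client B trained on 10 samples
]
print(fedavg_round(global_w, updates))  # every entry becomes 1.5 = (30*1 + 10*3) / 40
```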
Among the various recommender paradigms, the Collaborative Filtering (CF) recommender system is the most widely applied in real-world use cases due to its superior performance and its ability to cope with a large number of users and items in a system [6], [7]. The strength of a CF recommender system is that users who have shared similar tastes in the past are likely to do so again in the future.

To overcome the aforementioned restrictions, we propose a novel algorithm that deploys a deep-learning-based, generalized version of the Funk MF method to train a CF recommender system using the Bayesian Personalized Ranking (BPR) loss [8] in the FL setting, which can both improve the recommendation performance and resolve the privacy breach at low communication cost. To be specific, the Neural Collaborative Filtering (NCF) method [9] will be used to train the CF system in this paper. In summary, our contributions are the following:
• We propose a novel solution to train a CF model specialized for ranking tasks via the NCF method in the FL setting, where the solution supports the pairwise BPR loss for training.
• We derive an algorithm to protect user privacy at low communication cost, where the user does not need to download/upload the full item-embedding matrix for each update in order to hide implicit feedback information. Moreover, the algorithm also helps users utilize their data more efficiently than one of the most recent research works, FedeRank [10], while ensuring that user privacy is preserved.

The remainder of this paper is organized as follows. In Section II, we discuss the related work on protecting user privacy when building a recommender system, from the past to the present, in more detail, together with the essential background knowledge. The algorithm is proposed in Section III, with a detailed explanation of how it is able to fix the problems in the existing works. Moving to Section IV, we evaluate the performance of our algorithm through extensive experiments. Finally, the algorithm's advantages and limitations are summarised in Section V.

II. RELATED WORK

In this section, we review some recommender systems and methods for protecting user privacy. In the context of traditional recommender systems, users' data has been collected and stored in an organized database to enhance the system performance. Since businesses can use users' data for other purposes without their consent, user privacy may be violated. Privacy invasion can be treated by generalizing the data in the database using anonymity techniques. These techniques also guarantee that useful information remains available to organizations and businesses even when the data is anonymized. Three prominent anonymization techniques, namely k-anonymity [11], l-diversity [12], and t-closeness [13], make each instance indistinguishable from other instances while meaningful information is still accessible. The generalized attributes, called "quasi-identifiers", are often used to determine the identity of an instance or a record in a dataset. These techniques can be exploited by combining the anonymized dataset with external datasets [14], [15], and they are also prone to other attacking strategies such as the Skewness attack and Similarity attack [13] or inference attacks [16].

To overcome such privacy issues, a prominent design called Federated Collaborative Filtering (FCF) has been proposed to deploy a standard method for training a CF recommender system in a decentralized setting [17]. The standard matrix-factorization method is known as Funk MF, which simply decomposes the user-item interaction matrix of the data into a set of user and item embeddings associated with the sets of users and items, to build the CF system [18]. Notwithstanding, the FCF algorithm is only specialized for the Funk MF method for training the CF system, and the data can still be reconstructed via a reverse-engineering attack even when the data is kept private [19]. Furthermore, since the information exchanged between the users and the server during the training process is mathematically derived from the Alternating Least Squares (ALS) [20] loss, the FCF algorithm is only dedicated to protecting data if the CF model is optimized with the ALS loss function. It cannot be used to protect user data if the model is trained with another, better loss function for the ranking task (e.g., the BPR loss function).

In addition, FedeRank [10] has demonstrated a solution to train the CF model via the Funk MF method with the BPR loss in the FL setting, while using the conventional randomized response technique [21] to protect user privacy. Specifically, a user selects a subset of seen and unseen items to perform local training, which results in non-zero gradients for the item embeddings associated with the selected items during the training process. Then a random portion of the non-zero item-embedding gradients is discarded (i.e., set to 0) to protect the user's interactions. Even so, this technique is not secure, which we will elaborate on in Section III-A.
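To make the FedeRank-style protection discussed above more tangible, the fragment below sketches the step of discarding a random portion of the non-zero item-embedding gradients before upload. It is an illustrative reading of that mechanism rather than FedeRank's actual code; the gradient matrix shape, the keep probability p_keep, and the row-wise granularity are assumptions.

```python
# Sketch: randomized-response-style masking of item-embedding gradients before
# upload. Zero rows correspond to items the client never touched in this update;
# non-zero rows are kept only with probability p_keep, hiding part of the feedback.
import numpy as np

def mask_item_gradients(item_grads, p_keep, rng):
    """item_grads: (num_items, dim) gradient matrix from one local update."""
    masked = item_grads.copy()
    nonzero_rows = np.flatnonzero(np.abs(item_grads).sum(axis=1) > 0)
    drop = nonzero_rows[rng.random(nonzero_rows.size) > p_keep]
    masked[drop] = 0.0      # discarded gradients are uploaded as zeros
    return masked

rng = np.random.default_rng(0)
grads = np.zeros((6, 3))
grads[[1, 4]] = rng.normal(size=(2, 3))       # only items 1 and 4 were updated locally
print(mask_item_gradients(grads, p_keep=0.5, rng=rng))
```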
More significantly, many studies [10], [17], [22] require users to download/upload the information of the full item-embedding matrix, which is problematic due to the limited resources of user devices and network infrastructures. Another solution to train recommender systems in the FL setting considers the communication overhead issue while preserving user privacy [23]. In detail, a union item set is composed by gathering many small random subsets of items from a group of selected users. Then, each selected user selects only some items from that union set to perform local training. Finally, the gradients of the updated items are sent to the server instead of the gradients of all items. Furthermore, to tackle the privacy concern at the same time, [23] lets each user use the Randomized Aggregatable Privacy-Preserving Ordinal Response (RAPPOR) technique [24] to randomly select items inside the union item set for local training, which guarantees that the privacy of any user is bounded by a specific privacy budget irrespective of the number of updates. What is more, because only item embeddings associated with items inside the union set can get updated in a communication round, this solution gives any selected item a high chance of being requested by multiple users, and the server cannot infer which user a target item belongs to if the corresponding item-embedding gradient was submitted by two or more users and aggregated via the Secure Aggregation Protocol [25].
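The Secure Aggregation Protocol mentioned at the end of the previous paragraph relies on pairwise masks that cancel in the server's sum. The sketch below illustrates only that core idea, with the pairwise secrets derived via elliptic-curve Diffie-Hellman in line with the thesis's security building blocks. It omits the double-masking and dropout-recovery parts of the full protocol [25]; the pyca/cryptography package, the SECP256R1 curve, and the toy 4-dimensional updates are assumptions made for illustration.

```python
# Sketch: pairwise-mask secure aggregation where the masks are derived from ECDH
# shared secrets, so that the per-pair masks cancel out in the server's sum.
import numpy as np
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

DIM = 4  # toy update size; a real model update would be much larger

class Client:
    def __init__(self, uid, update):
        self.uid = uid
        self.update = np.asarray(update, dtype=np.float64)
        self.key = ec.generate_private_key(ec.SECP256R1())  # assumed curve choice

    def public_key(self):
        return self.key.public_key()

    def masked_update(self, peers):
        """Add one pseudorandom mask per peer; the sign depends on the id order,
        so each pairwise mask appears once with + and once with - across the pair."""
        masked = self.update.copy()
        for peer_id, peer_pub in peers.items():
            if peer_id == self.uid:
                continue
            shared = self.key.exchange(ec.ECDH(), peer_pub)      # same bytes on both sides
            seed_bytes = HKDF(algorithm=hashes.SHA256(), length=4,
                              salt=None, info=b"pairwise mask").derive(shared)
            seed = int.from_bytes(seed_bytes, "big")
            mask = np.random.RandomState(seed).normal(size=DIM)  # identical for both peers
            masked += mask if self.uid < peer_id else -mask
        return masked

# Usage: three clients; the server only ever sees masked vectors, yet the sum is exact.
clients = [Client(uid, np.full(DIM, uid + 1.0)) for uid in range(3)]
pubs = {c.uid: c.public_key() for c in clients}
server_sum = sum(c.masked_update(pubs) for c in clients)   # masks cancel pairwise
print(server_sum)                          # approximately [6. 6. 6. 6.] = 1 + 2 + 3
print(sum(c.update for c in clients))      # same value, computed in the clear
```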
A. Neural Collaborative Filtering (NCF)

Neural Collaborative Filtering is a generalized version of the Funk MF method and can be regarded as the foundation of the state-of-the-art matrix-factorization methods [9]. The notation used in this section is the following (Table I):

TABLE I. NOTATIONS AND DESCRIPTIONS FOR THIS SECTION
- W: model parameters of the CF model; W[N] denotes the non-embedding weights of the model, while W[u] and W[i] represent the user embedding of user u and the item embedding of item i, respectively.
- U: set of all users.
- I: set of all items; I+_u is the set of all seen/positive items of a user u, and I-_u is the set of all unseen/negative items of a user u.
- L(W, u, i, j): loss function of a single pairwise record, which takes a user index u, a seen item ID i, and an unseen item ID j.
- L: total number of hidden layers of the CF model trained by the NCF method.
- h_l(.): the output function at the l-th hidden layer, where 1 <= l <= L.
- ĥ_l: the evaluated output at the l-th hidden layer, where 1 <= l <= L.
- r̂_ui: estimated likeliness/rating score that the user u gives to the item i, which is the output of the CF model.
- r̂_uij: the difference between the estimated likeliness/rating scores that the user u gives to the seen item i and to the unseen item j.
- E_U: user embedding matrix.
- E_I: item embedding matrix.
- e_u: one-hot representation of the user u ∈ U.
- e_i: one-hot representation of the item i ∈ I.

Similar to the Funk MF method, the NCF method also aims at mapping a typical, sparse representation of user embeddings, which is a one-hot encoding vector (i.e., only a single entry contains 1 and the other entries are 0), onto a dense representation of user embeddings (i.e., most of the entries are non-zero) in order to gain more efficiency for many machine-learning downstream tasks such as regression and classification, and the same applies to the item embeddings of all items in the given dataset. By using a combination of both the linear GMF (Generalized Matrix Factorization) and the non-linear MLP (Multi-Layer Perceptron), it can produce higher-quality recommendations with faster convergence.

B. Bayesian Personalized Ranking (BPR)

BPR is a pairwise approach to train a recommender system specialized for the ranking task [8], where a training record contains a pair of positive and negative items (i.e., a pair of seen and unseen items), and the goal is to make the system prefer positive items over negative items by maximizing the logarithm of the posterior probability of the model W preferring positive items over negative items. Specifically, the posterior probability can be interpreted as how likely the model is to prefer positive items over negative items, where the likeliness is defined by the difference of the estimate rating/likeliness scores
between a pair of positive and negative items, and the estimate likeliness score of an item given by a user is typically the output of the model [26] C Private Set Union (PSU) Private Set Union protocol presented by [23] is a method that allows users to securely send their items to a server to create a union set without revealing any of the individual sets A set of items to be submitted by a user will be represented as a Bloom filter [27] This is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set, Bloom filter consists of an array of m binary elements (each element’s value is or 1), and k hash functions designed so that the output is an integer in the range [1 m] To form the union set given many item sets of users, one can simply add all the filters up and then check the existence of items in the aggregated filter to finally reconstruct the desired union item set, which is the union of item sets submitted by users The process of constructing the union item set is described in Algorithm Algorithm Private Set Union (PSU) [23] Require: Sets of items {Iu }u∈U− from a subset of users U− ⊂ U 1: for each user u ∈ U− ▷ This process can be done parallelly 2: Compose the Bloom filter Bu for the item set Iu 3: Randomize the non-zero entries in the filter Bu by ′ some random integers from ZR to form Bu 4: Construct the hint vector Hu by using Algorithm ′ 5: Securely submit both Bu and Hu to the server 6: end for ′ ˆ =P 7: The server obtains the aggregated filter B u∈U− Bu P ˆ = ˆ and the aggregated hint H u∈U− Hu then uses B ˆ to reconstruct the union set of items Iˆ and sends and H Iˆ to each user u ∈ U− However, the server will know how many users submitted the same item by counting the value of the filters To avoid this issue, users can randomize the non-zero entries in their filters by some random integers before submitting the filters to the server, which is the reason behind Step in Algorithm After the server receives filters, it infers the members inside the aggregated filter, the naive way is to check the existence of every possible item inside the filter, which can be timeconsuming if the number of items in a system is very large To reduce the time complexity of inferring items, users can give hints to the server by telling the server to only look at a particular region of items In detail, Algorithm explicitly describes how users can give hints to the server Algorithm Hint for item set reconstruction [23] Require: The number of partitions/regions K and the set of items to be submitted Iu from a user u ∈ U 1: Partition all item IDs I into K equal regions 2: Hu ← ▷ Initialize the hint vector Hu containing K entries to 3: for each item i ∈ Iu 4: Hu [k] ← iff item i is in the range of item IDs specified for the k th entry in Hu , where k = 1, , K 5: end for 6: return Hu D Secure Aggregation Protocol Secure Aggregation Protocol [25] is a class of Secure MultiParty Computation algorithms used to compute multiparty sum of information from individual users (it could be model parameters or Bloom filters) in a secure manner without revealing to one another any information The protocol assumes that any pair of users will share a common secret seed to generate noise with the same magnitude but opposite direction Then, the real information will be perturbed by the noise before submitting to the server As a result, all the noise will be cancelled out once the perturbed information is aggregated III P ROPOSED A PPROACH In this section, we propose our 
algorithm to train CF model for ranking tasks in FL setting via NCF method and BPR loss A Security Analysis There are two issues related to the privacy breach when using the pairwise approach and BRP to train CF model The first one is the imbalance of the update frequency between seen items and unseen items when the pairwise approach is used to train the CF model Suppose that the number of items that the user u has interacted with is fewer − than the number of unseen items, i.e., |I+ u | zk or n1 ≈ zk then u u u u + 3: Compute p− u and pu via nu , mu , zu , and k with the help of the optimization problem 4: else ▷ When n1u < zku − + 5: Compute pu and pu via nu , mu , zu , and k with the help of the optimization problem 6: end if 7: for each i ∈ Iu 8: if i ∈ I+ ▷ If i is a seen item of user u u then 9: Insert i to I˜u with probability p+ u 10: else 11: Insert i to I˜u with probability p− u 12: end if 13: end for 14: return I˜u Above all, the most critical question is how to adjust these probabilities in such a way that the frequency of seen items and unseen items in the training samples is comparable In detail, the idea here is to balance the probability that a seen item will exist in a training batch, i.e., a set of training samples composed by the user in communication, with the probability that an unseen item will exist in the training batch, − and note that, these probabilities are different from p+ u and pu Therefore, it remains to define those probabilities based on the sampling process described from in 4, 9-10 in Algorithm and − then set p+ u and pu in a way that minimizes the difference in those probabilities Mathematically, minimizing the difference between the probability related to the frequency of an unseen item and that of a seen item in a training batch can be expressed as follows: k k k − ∗ p− ∗ (k ∗ p+ ∗ p+ u + u + k ∗ pu ) − u mu zu nu {z } | {z } | (1) ⇔ k k p− + ) − p+ − ) u ∗( u ∗( mu zu nu zu (2) (3) Optimization problem provides an initiative to find optimal + p− u and pu to balance the frequency of items when the user u has only interacted with few items, which happens when nu zku where, (1) represent the upper-bound probability that an unseen item will exist in a training batch (2) represents the probability that a seen item will exist in a training batch For convenience, we will regard the frequency of items in a training batch or the update/request frequency of items with respect to a user through the lens of the server as the frequency of items for the rest part in this paper + In the optimization problem 3, p− u has to be greater than pu , and the number of participants |U− | in a communication must be less than the number of seen items nu to ensure (1) > (2), which are simply the conditions to make the frequency of unseen items higher than that of seen items given the user u has only interacted with few items More specifically, a user u will be considered to interact with very few items if n1u > zku Otherwise, if p− u is set k to less than or equal to p+ given > , the frequency u nu zu of seen items is noticeably higher than that of unseen items, which is the case when the privacy breach occurs In fact, it is not sensible to find a solution that can make the frequency of seen and unseen items equally balance, since the trivial + solution will be setting p− u and pu equal to to make the whole expression in become 0, which only happens when + p− u = pu Therefore, the remaining option to balance the frequency of items is to enforce the frequency of unseen items 
higher than the frequency of seen items in which the difference should be negligible Nevertheless, the optimization problem is not well defined due to the inequality constraint between + p− u and pu is vague To make the inequality constraint be + − ratio more specific, we reinterpret p− ∗ p+ u > pu as pu ≥ pu u, ratio ratio > 1, and thus where pu is a predefined constant and pu the optimization problem now becomes a linear optimization problem However, minimizing the gap difference between + (1) and (2) will generate very small p− u and pu , which will undesirably cause Algorithm to select very few items to compose a training batch, and thus the user can only perform very few updates in a large number of communications To tackle this issue, instead of minimizing the gap difference according to the optimization problem 3, we fix the gap difference by a predefined quantity tu > and then include − the relation between tu , p+ u and pu in the constraint part of the problem To be clear, fixing the gap between the upper-bound probability of an unseen item will exist in a training batch and the probability associated with a seen item to a small value is the reasonable approach, because this will result in a smaller gap between the actual probability that an unseen item will exist in a batch and the probability associated with a seen item Afterward, we change the goal of the optimization problem to maximizing p+ u , because this corresponds to increasing the number of valid samples in a batch, which is equivalent to increasing the data utility of the user to update the global model Intuitively, the goal is to maximize the frequency of seen items in a training batch while ensuring that the discrepancy between the frequency of seen and unseen items is indistinguishable, which can be controlled by adjusting tu and pratio Mathematically, the finalized optimization problem u − used to compute p+ u and pu can be described as follows: max p+ u s.t p− u = (3) ≥ pratio ∗ u − ≤ pu ≤ ≤ p+ u ≤1 p− u 0 k − zu + tu + n1u p k k u mu + zu mu + zu | {z } | {z } (4) (4) p+ u Optimization problem defines the way to compute optimal k + p− u and pu given mu >> nu and nu > zu where, The first constraint is derived by equating the right-handside objective function in to tu and then rearranging it tu ∈ (0, m1u + zku ) to ensure this optimization problem has a sensible solution Otherwise if tu > m1u + zku , it is infeasible to obtain a solution, since (3) > and (4) is also positive quantity, which is equivalent to p− u > Moreover, + when tu = m1u + zku , the solution for p− u and pu will be and respectively, which is undesirable, because Algorithm will ignore all valid samples, and there is no update in the training process About the case when a user u still has fewer seen items than unseen items, but the number of seen items is sufficiently larger than the previous case, i.e., n1u < zku ⇔ n1u − zku < 0, the probability of an unseen item will exist in a batch is still higher than that of a seen item, which means that (1) > (2) k k + still holds Therefore, p− u ∗ ( mu + zu ) > pu ∗ ( nu − zu ) even − when p+ u ≥ pu , since the right hand side of the inequality is a negative quantity To make use of this property to find + the optimal p− u and pu when the user u has interacted with a decent number of items, we define another optimization problem as follows: max p+ u s.t p− u = tu mu + k zu + p− u ≤ pu + nu mu − + k zu k zu p+ u (5) ≤ p− u ≤1 ≤ p+ u ≤1 Optimization problem defines the way to compute optimal k + p− u and pu 
given mu > nu and nu < zu where, The equality constraint is identical to that of the optimization problem | − k | t ∈ (0, m1u + zku + ( n1u + zku )2 ], where the domain of tu mu zu is extended by an additional term having the absolute sign in the numerator to account for the negative term in the equality constraint What is more, when the total number of items that the user u has interacted with is moderate (i.e., n1u ≈ zku ), the user can simply take the absolute of the difference between n1u and zku in the equality constraint of the optimization problem and − use that optimization problem to estimate p+ u and pu Another worth mentioning point is that, both the optimization problems require the number of participants |U− | in a communication must be less than the total number of seen items nu with respect to the user u to ensure the probability of an unseen item will exist in the batch is greater than the probability associated with a seen item C Overall solution Algorithm describes our solution to train CF model via NCF method in FL setting Regarding the procedure to evaluate the global model performance, the server shall check the availability of users to gather as many users as possible for the evaluation process, since more users should better approximate how good the global model is In particular, the server will make pairs between users, and each pair will make use of PSU algorithm to construct a common set of items then request all items in that set The reason for joining users into many pairs instead of a group of more than two users is because each user will need to request many item embeddings to judge whether the global model truly understands her preference, and therefore to reduce the communication overhead as much as possible while ensuring the user privacy is preserved, individuals will be paired together and request the same set of item embeddings In detail, the evaluation process is fully described by Algorithm Algorithm Overall solution 1: for each communication t = 1, , T 2: Select a subset of users U− ⊂ U to participate the training process 3: for each user u ∈ U− ▷ Union item set construction 4: Compose a list of items Iu by randomly sampling k seen items and k unseen items with replacement 5: Use PSU algorithm (Algorithm 1) to submit the duplicate-free version of Iu , which is Iˆu , to the server to construct the union item set Iˆ 6: end for 7: Construct Iˆ by using Algorithms and then delivers it to the selected users U− 8: for each user u ∈ U− ▷ Local training process 9: Run Algorithm to select items in Iu to form a set of positive items I˜u 10: Compose the training subdataset Du− by randomly sampling k unseen/negative items in Iˆ to pair with each positive item in I˜u 11: Request W[i] ∀i ∈ Du− and W[N] from the server 12: Perform a single update on the local model based on valid samples in Du− using BPR loss and Adam optimizer 13: for each item i ∈ Du− 14: if item i does not exist in the valid samples of Du− then 15: wiu ← 16: ∇Wu [i] ← 17: end if 18: end for 19: Send wiu ∗ ∇Wu [i] & wiu ∀i ∈ Du− and NDu− ∗ ∇Wu [N] & NDu− to the server via Secure Aggregation Protocol (Section II-D) 20: end for 21: Aggregate the given information sent from the users U− to update the global model via Adam update rule [29] 22: Evaluate the global model performance after a certain number of communications by Algorithm 23: end for IV E XPERIMENTS A Dataset Our experimental studies are conducted using MovieLens [28] 100k datasets which has been widely used as a benchmark dataset to 
evaluate the performance of CF model by different methods B Metric and Test Data We use types of metrics to evaluate the ranking performance of CF model on implicit feedback, including Hit Ratio (HR), and Normalized Discounted Cumulative Gain (NDCG) For Hit Ratio and NDCG metrics, we only evaluate if user’s Algorithm Model evaluation ˜ ⊂U 1: Get alive users U 2: Make pairs between alive users P = {(u, v)} ∀u, v ∈ ˜ and u ̸= v U 3: for each (u, v) ∈ P ▷ For each pair of users u and v 4: Compose the respective sets of item IDs I˙u and I˙v needed for their evaluation 5: Use PSU algorithm (Algorithm 1) to take the union of I˙u and I˙v to obtain I˙ = I˙u ∪ I˙v 6: Request W[i] ∀i ∈ I˙ and W[N] from the server 7: Evaluate their models 8: Send the evaluation results to the server via Secure Aggregation Protocol (Section II-D) 9: end for 10: Aggregate the evaluation results to obtain the overall model performance items of interest is in the top 10 recommendations against a list of other unseen items (HR@10 and NDCG@10) About the test dataset, for all users, we take their latest rated items to compose the test set C Baselines and Other Specification In this experiment, we train CF model based on the following baselines: (1) Train CF model via NCF method and BPR loss in the centralized setting, where the training process is identical to the previously mentioned baseline (2) Train CF model using Algorithm 4, where users will fully utilize their data by accepting all valid samples (i.e., set p+ u = ∀u ∈ U) To protect user privacy simultaneously, each user will request the item embeddings of all items in the union item set and later upload nonzero gradients of the item embeddings involved in the training as well as zero gradients for those not involved in the training Basically, this method will fully utilize user data at the expense of communication cost, while the user privacy is still preserved (3) Train CF model using Algorithm in which the item + selection probabilities p− u and pu are optimized based on the solution introduced in Section III-B for each user u ∈ U Since this method also requires tu and pratio u − parameters to compute p− u and pu , we set tu = 1e − and pratio = if n1u > zku or | n1u − zku | < 2e − u then use the optimization problem to solve for p+ u and k p− u ; otherwise, we define tu = mu + zu and use the − optimization problem to solve for p+ u and pu Regarding the model specification, embedding dimension is set to 128 The model contains hidden layers, where the output size of each layer decreases by a factor of relatives to its input size (e.g if the input size of a layer is 128, the output size will be 64), and the output at each layer is followed by Rectified Linear Unit (ReLU) activation function except for the output of the final layer In training process, 16 random users will participate in a training iteration, which is less than the least number of items that a user has interacted with in this dataset in order to ensure the condition to protect user privacy mentioned in Section III-B satisfied Moreover, the hyper parameter specified for sampling items, k, is set to 8, and the learning rate is set to 1e-3 by default for all methods D Result In terms of CF model performance, Table IV-D summarizes the evaluation scores of different baselines after 3000 iterations with respect to HR@10 and NDCG@10 More specifically, Figure depicts the evaluation scores of those methods throughout different iterations/communications and metrics It is clear that, with HR@10 or NDCG@10, method 
(1) outperforms other methods significantly in all metrics Even though the aim of method (2) is to utilize user data in FL setting fully, the CF model trained by this method cannot achieve comparable performance to the CF model trained by method (1) The result is even worse when user privacy and communication cost are optimized using method (3) Although the accuracy of method (2) and (3) is slightly worse than method (1) but it is still better than Federated Neural Collaborative Filtering (FedNCF) [30] which is the most recent NCF with federated setting One of the reasons why using Federated Learning cannot make the CF model achieve the same performance as in the centralized setting is that user correlation cannot be sufficiently learned in FL setting [30] One of the outstanding advantages of this method over FedeRank solution is to protect user privacy is that the item embeddings of seen items have more chance to get updated Figure and illustrates the update frequency related to each item ID of a user in 500 iterations when using FedeRank and our solution to protect user privacy respectively, where the user has interacted with a decent number of items It is clear that, the update frequency of seen items when using FedeRank solution is only around 10 to 25, which is much less than the statistics of the novel solution in which the update frequency of seen items is around 20 up to 40 The reason for this is because the probability of updating seen item embeddings when using FedeRank is only around 58%, but the probability is 100% when using the novel solution, since the novel solution also defines around 20% chance to create invalid training samples related to unseen item embeddings Another example of how these solutions balance the update frequency of item embeddings when a user only interacts with very few number of items are shown in Figures and It is evident that the update frequency of seen item embeddings when using the novel solution is again higher than that of the FedeRank solution, while the frequency of seen item embeddings is still intertwined with the frequency of unseen item embeddings V D ISCUSSION AND C ONCLUSION We have presented a new and efficient approach to train Collaborative Filtering model based on Neural Collaborative Filtering method with a pairwise approach in Federated Learning settings for the purpose of protecting user privacy TABLE II S UMMARY OF CF MODEL PERFORMANCE ON DIFFERENT METHODS Methods (1) (2) (3) FedNCF [30] HR@10 0.68 0.65 0.62 0.61 NDCG@10 0.43 0.38 0.35 0.34 Security Level No Moderate Highest Moderate Fig Update frequency of item embeddings for a user interacted with a decent number of items when using FedeRank algorithm Fig CF model performance on different methods with respect to HR@10 and NDCG@10 with only a small communication cost required This solution provides a more efficient approach to utilizing the data in FL setting without disclosing user information than one of the most recent research projects called FedeRank In addition, it also has higher accuracy than FedNCF To a greater extent, this novel solution could be applied on another framework called Neural Collaborative Ranking (NCR) [31], which also leverages the pairwise approach for the ranking task A limitation of our work is the trade-off between privacy and accuracy Although the accuracy is slightly reduced, the federated models generally provide acceptable recommendation quality Further work should be to use adaptive sampling methodology to adjust the sampling weights associated 
with the items in the union item set, where the probability of sampling common items should be decreased over time, and for some items that are hardly ever appeared in the union item set, they should have high probability to be selected Fig Update frequency of item embeddings for a user interacted with a decent number of items when using Algorithm Fig Update frequency of item embeddings for a user interacted with very few number of items when using Algorithm Fig Update frequency of item embeddings for a user interacted with very few number of items when using FedeRank algorithm R EFERENCES [1] Z Lin, W Pan, Q Yang, and Z Ming, “A generic federated recommendation framework via fake marks and secret sharing,” ACM Trans Inf Syst., jun 2022, just Accepted [Online] Available: https://doi.org/10.1145/3548456 [2] U Weinsberg, S Bhagat, S Ioannidis, and N Taft, “Blurme: Inferring and obfuscating user gender based on ratings,” in Proceedings of the Sixth ACM Conference on Recommender Systems, ser RecSys ’12 New York, NY, USA: Association for Computing Machinery, 2012, p 195–202 [Online] Available: https://doi.org/10.1145/2365952.2365989 [3] M Kosinski, D Stillwell, and T Graepel, “Private traits and attributes are predictable from digital records of human behavior,” Proceedings of the National Academy of Sciences, vol 110, no 15, pp 5802–5805, 2013 [Online] Available: https://www.pnas.org/doi/abs/10.1073/pnas.1218772110 [4] T Ha, T K Dang, H Le, and T A Truong, “Security and privacy issues in deep learning: A brief review,” SN Computer Science, vol 1, no 5, p 253, Aug 2020 [5] B McMahan, E Moore, D Ramage, S Hampson, and B A y Arcas, “Communication-Efficient Learning of Deep Networks from Decentralized Data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, ser Proceedings of Machine Learning Research, A Singh and J Zhu, Eds., vol 54 PMLR, 20–22 Apr 2017, pp 1273–1282 [Online] Available: https://proceedings.mlr.press/v54/mcmahan17a.html [6] B McFee, L Barrington, and G Lanckriet, “Learning content similarity for music recommendation,” IEEE Transactions on Audio, Speech, and Language Processing, vol 20, no 8, pp 2207–2218, 2012 [7] J Yuan, W Shalaby, M Korayem, D Lin, K AlJadda, and J Luo, “Solving cold-start problem in large-scale recommendation engines: A deep learning approach,” in 2016 IEEE International Conference on Big Data (Big Data) Los Alamitos, CA, USA: IEEE Computer Society, dec 2016, pp 1901–1910 [Online] Available: https://doi.ieeecomputersociety.org/10.1109/BigData.2016.7840810 [8] S Rendle, C Freudenthaler, Z Gantner, and L Schmidt-Thieme, “Bpr: Bayesian personalized ranking from implicit feedback,” in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, ser UAI ’09 Arlington, Virginia, USA: AUAI Press, 2009, p 452–461 [9] X He, L Liao, H Zhang, L Nie, X Hu, and T.-S Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web, ser WWW ’17 Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee, 2017, p 173–182 [Online] Available: https://doi.org/10.1145/3038912.3052569 [10] V W Anelli, Y Deldjoo, T Di Noia, A Ferrara, and F Narducci, “Federank: User controlled feedback with federated recommender systems,” in Advances in Information Retrieval, D Hiemstra, M.-F Moens, J Mothe, R Perego, M Potthast, and F Sebastiani, Eds Cham: Springer International Publishing, 2021, pp 32–47 [11] L Sweeney, “K-anonymity: A model for 
protecting privacy,” Int J Uncertain Fuzziness Knowl.-Based Syst., vol 10, no 5, p 557–570, oct 2002 [Online] Available: https://doi.org/10.1142/S0218488502001648 [12] A Machanavajjhala, J Gehrke, D Kifer, and M Venkitasubramaniam, “L-diversity: privacy beyond k-anonymity,” in 22nd International Conference on Data Engineering (ICDE’06), 2006, pp 24–24 [13] N Li, T Li, and S Venkatasubramanian, “t-closeness: Privacy beyond kanonymity and l-diversity,” in 2007 IEEE 23rd International Conference on Data Engineering, 2007, pp 106–115 [14] A Evfimievski, J Gehrke, and R Srikant, “Limiting privacy breaches in privacy preserving data mining,” in Proceedings of the TwentySecond ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, ser PODS ’03 New York, NY, USA: Association for Computing Machinery, 2003, p 211–222 [Online] Available: https://doi.org/10.1145/773153.773174 [15] P Golle, “Revisiting the uniqueness of simple demographics in the us population,” in Proceedings of the 5th ACM Workshop on Privacy in Electronic Society, ser WPES ’06 New York, NY, USA: Association for Computing Machinery, 2006, p 77–80 [Online] Available: https://doi.org/10.1145/1179601.1179615 [16] T Ha, T K Dang, and N Nguyen-Tan, “Comprehensive analysis of privacy in black-box and white-box inference attacks against generative adversarial network,” in Future Data and Security Engineering, T K Dang, J Kăung, T M Chung, and M Takizawa, Eds Cham: Springer International Publishing, 2021, pp 323–337 [17] M Ammad-ud-din, E Ivannikova, S A Khan, W Oyomno, Q Fu, K E Tan, and A Flanagan, “Federated collaborative filtering for privacy-preserving personalized recommendation system,” CoRR, vol abs/1901.09888, 2019 [Online] Available: http://arxiv.org/abs/1901.09888 [18] Y Koren, R Bell, and C Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol 42, no 8, pp 30–37, 2009 [19] D Chai, L Wang, K Chen, and Q Yang, “Secure federated matrix factorization,” IEEE Intelligent Systems, vol 36, no 05, pp 11–20, sep 2021 [20] Y Hu, Y Koren, and C Volinsky, “Collaborative filtering for implicit feedback datasets,” in 2008 Eighth IEEE International Conference on Data Mining, 2008, pp 263–272 [21] S L Warner, “Randomized response: A survey technique for eliminating evasive answer bias,” Journal of the American Statistical Association, vol 60, no 309, pp 63–69, 1965 [Online] Available: http://www.jstor.org/stable/2283137 [22] L Minto, M Haller, B Livshits, and H Haddadi, “Stronger privacy for federated collaborative filtering with implicit feedback,” in Fifteenth ACM Conference on Recommender Systems, ser RecSys ’21 New York, NY, USA: Association for Computing Machinery, 2021, p 342–350 [Online] Available: https://doi.org/10.1145/3460231.3474262 [23] C Niu, F Wu, S Tang, L Hua, R Jia, C Lv, Z Wu, and G Chen, “Billion-scale federated learning on mobile clients: A submodel design with tunable privacy,” in Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, ser MobiCom ’20 New York, NY, USA: Association for Computing Machinery, 2020 [Online] Available: https://doi.org/10.1145/3372224.3419188 ´ Erlingsson, V Pihur, and A Korolova, “Rappor: Randomized [24] Ulfar aggregatable privacy-preserving ordinal response,” in In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security ACM, 2014 [25] K Bonawitz, V Ivanov, B Kreuter, A Marcedone, H B McMahan, S Patel, D Ramage, A Segal, and K Seth, “Practical secure aggregation for privacy-preserving 
machine learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser CCS ’17 New York, NY, USA: Association for Computing Machinery, 2017, p 1175–1191 [Online] Available: https://doi.org/10.1145/3133956.3133982 [26] T K Dang, Q P Nguyen, and V S Nguyen, “Evaluating session-based recommendation approaches on datasets from different domains,” in Future Data and Security Engineering, T K Dang, J Kăung, M Takizawa, and S H Bui, Eds Cham: Springer International Publishing, 2019, pp 577–592 [27] B H Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol 13, no 7, pp 422–426, 1970 [28] F M Harper and J A Konstan, “The movielens datasets: History and context,” Acm transactions on interactive intelligent systems (tiis), vol 5, no 4, pp 1–19, 2015 [29] D P Kingma and J Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014 [30] V Perifanis and P S Efraimidis, “Federated neural collaborative filtering,” Know.-Based Syst., vol 242, no C, apr 2022 [Online] Available: https://doi.org/10.1016/j.knosys.2022.108441 [31] B Song, X Yang, Y Cao, and C Xu, “Neural collaborative ranking,” in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, ser CIKM ’18 New York, NY, USA: Association for Computing Machinery, 2018, p 1353–1362 [Online] Available: https://doi.org/10.1145/3269206.3271715 Tài liệu tham khảo [1] K Bonawitz, V Ivanov, B Kreuter, A Marcedone, H B McMahan, S Patel, D Ramage, A Segal, and K Seth, “Practical secure aggregation for privacypreserving machine learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser CCS ’17 New York, NY, USA: Association for Computing Machinery, 2017, p 1175–1191 [Online] Available: https://doi.org/10.1145/3133956.3133982 [2] A MacLaine, “Introduction to public key cryptography,” 2022 [Online] Available: https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/DiffieHellman_Key_Exchange.svg/1280px-Diffie-Hellman_Key_Exchange.svg.png [3] W J Buchanan, “Elliptic curve diffie hellman (ecdh) with differing elliptic curves.” 2013, https://asecuritysite.com/encryption/ecdh3 [Online] Available: https://asecuritysite.com/encryption/ecdh3 [4] B McMahan, E Moore, D Ramage, S Hampson, and B A y Arcas, “Communication-Efficient Learning of Deep Networks from Decentralized Data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, ser Proceedings of Machine Learning Research, A Singh and J Zhu, Eds., vol 54 PMLR, 20–22 Apr 2017, pp 1273–1282 [Online] Available: https://proceedings.mlr.press/v54/mcmahan17a.html [5] M Ammad-ud-din, E Ivannikova, S A Khan, W Oyomno, Q Fu, K E Tan, and A Flanagan, “Federated collaborative filtering for privacy-preserving personalized recommendation system,” CoRR, vol abs/1901.09888, 2019 [Online] Available: http://arxiv.org/abs/1901.09888 81 TÀI LIỆU THAM KHẢO [6] V W Anelli, Y Deldjoo, T Di Noia, A Ferrara, and F Narducci, “Federank: User controlled feedback with federated recommender systems,” in Advances in Information Retrieval, D Hiemstra, M.-F Moens, J Mothe, R Perego, M Potthast, and F Sebastiani, Eds Cham: Springer International Publishing, 2021, pp 32–47 [7] K Hjerppe, J Ruohonen, and V Leppanen, “The general data protection regulation: Requirements, architectures, and constraints,” in 2019 IEEE 27th International Requirements Engineering Conference (RE) IEEE, sep 2019 [Online] Available: 
https://doi.org/10.1109%2Fre.2019.00036 [8] N F Awad and M S Krishnan, “The personalization privacy paradox: An empirical evaluation of information transparency and the willingness to be profiled online for personalization,” MIS Q., vol 30, pp 13–28, 2006 [9] X He, L Liao, H Zhang, L Nie, X Hu, and T.-S Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web, ser WWW ’17 Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee, 2017, p 173–182 [Online] Available: https://doi.org/10.1145/3038912.3052569 [10] S Rendle, C Freudenthaler, Z Gantner, and L Schmidt-Thieme, “Bpr: Bayesian personalized ranking from implicit feedback,” in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, ser UAI ’09 Arlington, Virginia, USA: AUAI Press, 2009, p 452–461 [11] C Niu, F Wu, S Tang, L Hua, R Jia, C Lv, Z Wu, and G Chen, “Billion-scale federated learning on mobile clients: A submodel design with tunable privacy,” in Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, ser MobiCom ’20 New York, NY, USA: Association for Computing Machinery, 2020 [Online] Available: https://doi.org/10.1145/3372224.3419188 [12] B H Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol 13, no 7, pp 422–426, 1970 [13] B McMahan and D Ramage, “Federated learning: Collaborative machine learning without centralized training data,” Apr 2017 [Online] Available: https://ai.googleblog.com/2017/04/federated-learning-collaborative.html 82 TÀI LIỆU THAM KHẢO [14] L Zhu, Z Liu, and S Han, “Deep leakage from gradients,” in Advances in Neural Information Processing Systems, H Wallach, H Larochelle, A Beygelzimer, F d'Alché-Buc, E Fox, and R Garnett, Eds., vol 32 Curran Associates, Inc., 2019 [Online] Available: https://proceedings.neurips.cc/paper/2019/file/ 60a6c4002cc7b29142def8871531281a-Paper.pdf [15] W Diffie and M Hellman, “New directions in cryptography,” IEEE Transactions on Information Theory, vol 22, no 6, pp 644–654, 1976 [16] K A Bonawitz, V Ivanov, B Kreuter, A Marcedone, H B McMahan, S Patel, D Ramage, A Segal, and K Seth, “Practical secure aggregation for federated learning on user-held data,” CoRR, vol abs/1611.04482, 2016 [Online] Available: http://arxiv.org/abs/1611.04482 [17] D Adrian, K Bhargavan, Z Durumeric, P Gaudry, M Green, J A Halderman, N Heninger, D Springall, E Thomé, L Valenta, B VanderSloot, E Wustrow, S Zanella-Béguelin, and P Zimmermann, “Imperfect forward secrecy: How diffie-hellman fails in practice,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ser CCS ’15 New York, NY, USA: Association for Computing Machinery, 2015, p 5–17 [Online] Available: https://doi.org/10.1145/2810103.2813707 [18] A K Lenstra, T Kleinjung, and E Thomé, “Universal security; from bits and mips to pools, lakes – and beyond,” Cryptology ePrint Archive, Paper 2013/635, 2013, https://eprint.iacr.org/2013/635 [Online] Available: https://eprint.iacr.org/2013/635 [19] D J Bernstein, N Duif, T Lange, P Schwabe, and B.-Y Yang, “High-speed high-security signatures,” Cryptology ePrint Archive, Paper 2011/368, 2011, https: //eprint.iacr.org/2011/368 [Online] Available: https://eprint.iacr.org/2011/368 [20] J W Bos, J A Halderman, N Heninger, J Moore, M Naehrig, and E Wustrow, “Elliptic curve cryptography in practice,” in Financial Cryptography and Data Security, N Christin and R 
Safavi-Naini, Eds Berlin, Heidelberg: Springer Berlin Heidelberg, 2014, pp 157–175 83
