Preserving privacy for publishing time series data with differential privacy



VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

LẠI TRUNG MINH ĐỨC

PRESERVING PRIVACY FOR PUBLISHING TIME-SERIES DATA WITH DIFFERENTIAL PRIVACY

Major: COMPUTER SCIENCE
Major code: 8480101

MASTER'S THESIS

HO CHI MINH CITY, July 2023

THIS THESIS IS COMPLETED AT HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY - VNU-HCM

Supervisor 1: Assoc. Professor DANG TRAN KHANH
Supervisor 2: PhD LE LAM SON
Examiner 1: Assoc. Professor TRAN TUAN DANG
Examiner 2: PhD DANG TRAN TRI

This master's thesis is defended at HCM City University of Technology, VNU-HCM City on 10 July 2023.

Master's Thesis Committee:
Chairman - Assoc. Professor TRAN MINH QUANG
Secretary - PhD NGUYEN THI AI THAO
Examiner - Assoc. Professor TRAN TUAN DANG
Examiner - PhD DANG TRAN TRI
Commissioner - PhD TRUONG THI AI MINH

Approval of the Chairman of the Master's Thesis Committee and the Dean of the Faculty of Computer Science and Engineering after the thesis has been corrected (if any).

CHAIRMAN OF THESIS COMMITTEE
HEAD OF FACULTY OF COMPUTER SCIENCE AND ENGINEERING

VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

THE TASK SHEET OF MASTER'S THESIS

Full name: LẠI TRUNG MINH ĐỨC
Student ID: 2070686
Date of birth: 24 May 1996
Place of birth: Ho Chi Minh City
Major: Computer Science
Major ID: 8480101

I THESIS TITLE: Preserving Privacy for Publishing Time-series Data with Differential Privacy

II TASKS AND CONTENTS:

W1 to W2 (2 weeks)
- Conduct the literature review and define the methodology for the study
- Define the scope of work for the main research of the thesis
- Write up the report

W3 to W4 (2 weeks)
- Research related works and projects on Differential Privacy and time-series privacy
- Write up the report (cont.)

W5 to W14 (10 weeks)
- Implement the Differential Privacy algorithms on time-series data
- Compare those algorithms using data utility metrics
- Find the data characteristics that determine the best algorithm
- Write up the report (cont.)

W15 to W16 (2 weeks)
- Finalize the solution package
- Finalize the document
- Prepare the presentation

Thesis Submission - June 2023

III THESIS START DAY (according to the decision on assignment of the Master's thesis): 05 September 2022

IV THESIS COMPLETION DAY (according to the decision on assignment of the Master's thesis): 12 June 2023

V SUPERVISOR (please fill in the supervisor's full name and academic rank):
Assoc. Prof. DANG TRAN KHANH (PGS. TS. ĐẶNG TRẦN KHÁNH)
PhD LE LAM SON (TS. LÊ LAM SƠN)

Ho Chi Minh City, 09 June 2023

SUPERVISOR (full name and signature): Assoc. Prof. DANG TRAN KHANH
SUPERVISOR (full name and signature): PhD LE LAM SON
CHAIR OF PROGRAM COMMITTEE (full name and signature)
DEAN OF FACULTY OF COMPUTER SCIENCE AND ENGINEERING (full name and signature)

Note: Student must pin this task sheet as the first page of the Master's Thesis booklet.

ACKNOWLEDGEMENT

I would like to express my profound gratitude to all those who have supported me throughout the journey of completing this master's thesis.

First and foremost, I extend my heartfelt thanks to my supervisors, Assoc. Prof. DANG TRAN KHANH and PhD LE LAM SON, for your invaluable guidance, patience, and expertise. Working under your supervision has been an honor, and your insightful feedback and encouragement have been instrumental in shaping the outcome of this research.

I would also like to extend my sincere appreciation to the team at Unilever Vietnam, particularly Mr ERIC FRANCIS CHEN, Head of UniOps and Data & Analytics SEA&I; Mr IAN LOW, my line manager; and the awesome Unilever Data & Analytics Vietnam team. Your willingness to assist and your insightful discussions have significantly contributed to the successful completion of this thesis.

Furthermore, I want
to acknowledge my friends, classmates, doctors, and psychologists for your mental support and contributions. Your motivation, discussions, and treatments, with their diverse perspectives, have helped me to live positively. Special appreciation goes to Ms HOA NINH, Ms LINH PHAM, Ms XUAN NGUYEN, and Mr RYAN NGUYEN for your huge support and encouragement throughout this journey.

Lastly, I am indebted to my family, my mom THAI LINH PHAM and my sister MARY PHUC LAI, for your unconditional love and constant encouragement. Your sacrifices and understanding have been the bedrock of my achievements. I am profoundly grateful for your presence and support.

To all those mentioned above, as well as those who have contributed in immeasurable ways, I offer my sincerest thanks. Your efforts, belief, and contributions have made this thesis possible.

ABSTRACT

This thesis explores the crucial domain of data privacy, encompassing the rights of individuals to maintain control over the collection, usage, sharing, and storage of their personal data. Within the realm of personal data, time-series data poses distinct challenges and sensitivities when it comes to privacy protection. Time-series data comprises information with temporal attributes that can unveil patterns, trends, and behaviors of individuals or groups over time, and so carries inherent risks of privacy breaches.

The primary objectives of this thesis are as follows: first, to review traditional methods of privacy-preserving data publishing, with a specific focus on efforts made to protect time-series data; second, to gain a comprehensive understanding of the theories and principles of Differential Privacy, a promising approach for privacy preservation; third, to explore notable mechanisms within Differential Privacy that are applicable to time-series data; fourth, to investigate and address privacy challenges in data partnerships through the integration of Differential Privacy and other relevant techniques; and finally, to develop a process for
the application of privacy techniques within the context of business collaborations.

The contribution of this thesis is twofold. Firstly, it aims to make the concept of Differential Privacy more accessible and comprehensible, particularly for non-academic and corporate audiences who may not have a deep technical background. By presenting Differential Privacy in a clear and straightforward manner, this research facilitates its adoption and implementation in real-world scenarios. Secondly, this thesis proposes a guideline for the practical application and evaluation of Differential Privacy on time-series data, specifically within the context of data collaboration among multiple parties. The guideline serves as a valuable resource for organizations seeking to protect privacy while engaging in collaborative data initiatives.

THESIS SUMMARY

This thesis studies the important field of data privacy, which covers the right of individuals to retain control over the collection, use, sharing, and storage of their personal data. Within personal data, time-series data poses its own sensitive challenges for privacy protection. Time-series data contains information with temporal attributes that can reveal the patterns, trends, and behaviors of individuals or groups over time, and so carries a risk of privacy violations.

The objectives of the thesis are: first, to review traditional methods of privacy-preserving data publishing, with particular attention to efforts to protect time-series data; second, to understand the theory and principles of Differential Privacy, a promising approach to privacy protection; third, to explore notable Differential Privacy mechanisms that can be applied to time-series data; fourth, to investigate and address privacy challenges in data partnerships by integrating Differential Privacy with other related techniques; and finally, to develop a process for applying privacy-protection techniques in the context of business collaboration.

The contribution of the thesis is twofold. First, it aims to make the concept of Differential Privacy accessible and easy to understand, especially for non-academic and business audiences without a deep technical background. By presenting Differential Privacy clearly and simply, the research supports its adoption in real-world situations. Second, the thesis proposes a guideline for the practical application and evaluation of Differential Privacy on time-series data, particularly in the context of multi-party data collaboration.

THE COMMITMENT OF THESIS' AUTHOR

I hereby declare that this master's thesis is my own original work and has not been submitted before to any institution for assessment purposes. Further, I have acknowledged all sources used and have cited these in the reference section.

……………………………
LAI TRUNG MINH DUC
………………………
Date

TABLE OF CONTENTS

CHAPTER 1: OVERVIEW OF THE THESIS
1 Background and Context
2 Data Publishing and Privacy Preserving Data Publishing
3 Challenges of Privacy Preserving Data Publishing (PPDP) for Time-series data
4 Differential Privacy as a powerful player
5 Thesis objectives
6 Thesis contributions
7 Thesis structure

CHAPTER 2: PRIVACY MODELS RESEARCH
1 Attack models and notable privacy models
1.1 Record linkage attack and k-Anonymity privacy model
1.2 Attribute linkage attack and l-diversity and t-closeness privacy model
1.3 Table linkage and δ-presence privacy model
1.4 Probabilistic linkage and Differential Privacy model
2 Summary

CHAPTER 3: THE INVESTIGATION ON DIFFERENTIAL PRIVACY
1 The need for Differential Privacy principle
1.1 No need to model the attack model in detail
1.2 Quantifiable privacy loss
1.3 Multiple mechanisms composition
2 The promise (and what is not promised) of Differential Privacy
2.1 The promise
2.2 What is not promised
2.3 Conclusion
3 Formal definition of Differential Privacy
3.1 Terms and notations
3.2 Randomized algorithm
3.3 ε-differential privacy
3.4 (ε, δ) differential privacy
4 Important concepts of Differential Privacy
4.1 The sensitivity
4.2 Privacy composition
4.3 Post processing
5 Foundation mechanisms of Differential Privacy
5.1 Local Differential Privacy and Global Differential Privacy
5.2 Laplace mechanism
5.3 Exponential mechanism
6 Notable mechanisms for Time-series data
6.1 Laplace mechanism (LPA - Laplace Perturbation Algorithm)
6.2 Discrete Fourier Transform (DFT) with Laplace mechanism (FPA - Fourier Perturbation Algorithm)
6.3 Temporal perturbation mechanism
6.4 STL-DP - Perturbed time-series by applying DFT with Laplace mechanism on trends and seasonality

CHAPTER 4: EXPERIMENT DESIGNS

[11] J. P. Near and C. Abuah, Programming Differential Privacy. [Online]. Available: https://uvm-plaid.github.io/programming-dp/, 2022.
[12] F. Natasha, "Differential Privacy for Metric Spaces: Information-Theoretic Models for Privacy and utility with new applications to metrics domains," Ph.D. dissertation, Institut Polytechnique de Paris, Macquarie University, Sydney, Australia, 2021.
[13] A. Narayanan and V. Shmatikov, "Robust De-anonymization of Large Sparse Datasets," 2008 IEEE Symposium on Security and Privacy (sp 2008), Oakland, CA, USA, 2008, pp. 111-125, doi: 10.1109/SP.2008.33.
[14] A. Machanavajjhala, J. Gehrke, D. Kifer and M. Venkitasubramaniam, "L-diversity: privacy beyond k-anonymity," 22nd International Conference on Data Engineering (ICDE'06), Atlanta, GA, USA, 2006, pp. 24-24, doi: 10.1109/ICDE.2006.1.
[15] N. Li, T. Li and S. Venkatasubramanian, "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity," 2007 IEEE 23rd International Conference on Data Engineering, Istanbul, Turkey, 2007, pp. 106-115, doi: 10.1109/ICDE.2007.367856.
[16] B. C. M. Fung, K. Wang, R. Chen and P. S. Yu, "Privacy-Preserving Data Publishing: A Survey of Recent Developments," ACM Computing Surveys, vol. 42, no. 4, pp. 1-53, 2010.
[17] F. Fioretto and P. V. Hentenryck, "OptStream: Releasing Time Series Privately," Journal of Artificial Intelligence Research, vol. 65, pp. 423-456, 2019.
[18] L. Fan and L. Xiong, "Real-time aggregate
monitoring with differential privacy," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, USA, pp. 2169-2173, 2012.
[19] L. Fan and L. Xiong, "Adaptively Sharing Time-Series with Differential Privacy," ArXiv, vol. abs/1202.3461, 2012.
[20] C. Dwork and A. Roth, "The Algorithmic Foundations of Differential Privacy," Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211-407, 2014.
[21] C. Dwork, A. Smith and J. Ullman, "Exposed! A Survey of Attacks on Private Data," Annual Review of Statistics and Its Application, vol. 4, pp. 61-84, Annual Reviews, 2016.
[22] C. Dwork, "Differential Privacy: A Survey of Results," in Theory and Applications of Models of Computation, TAMC 2008, Lecture Notes in Computer Science, vol. 4978, 2008.
[23] C. Dwork, F. McSherry, K. Nissim and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Proceedings of the Third Conference on Theory of Cryptography, New York, USA, 2006, pp. 265-284.
[24] C. Dwork, G. N. Rothblum and S. Vadhan, "Boosting and Differential Privacy," 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, Las Vegas, NV, USA, 2010, pp. 51-60, doi: 10.1109/FOCS.2010.12.
[25] F. Boenisch, "Differential Privacy: General Survey and Analysis of Practicability in the Context of Machine Learning," M.Sc. thesis, Freie Universität Berlin, 2019.
[26] H. H. Arcolezi, J.-F. Couchot, D. Renaud et al., "Differentially Private Multivariate Time Series Forecasting of Aggregated Human Mobility With Deep Learning: Input or Gradient Perturbation?," Neural Computing and Applications, vol. 34, pp. 13355-13369, Springer, 2022.
[27] K. Kim, M. Kim and S. Woo, "STL-DP: Differentially Private Time Series Exploring Decomposition and Compression Methods," presented in ACM CIKM22-PAS: The 1st International Workshop on Privacy Algorithms in Systems, Georgia, USA, 2022.
[28] European Parliament and Council of the European Union, "General Data Protection Regulation (GDPR),"
2018. [Online]. Available: https://gdpr.eu/tag/gdpr/ [Accessed 06 2023].
[29] G. Wright, "TechTarget - RFM analysis (recency, frequency, monetary)." [Online]. Available: https://www.techtarget.com/searchdatamanagement/definition/RFM-analysis [Accessed 06 2023].

APPENDIX

Table: Descriptive statistics of the data perturbation output (values in parentheses are negative)

COLUMNS | MEAN | STD | MIN | MEDIAN | MAX
Quantity | 31 | 24 |  | 24 | 99
Quantity_FPA_Q2_10 | 45 | 29 |  | 39 | 261
Quantity_tFPA_Q2_10 | 44 | 32 | (63) | 38 | 267
Quantity_sFPA_Q2_10 | 64 | 32 | (49) | 60 | 312
Quantity_FPA_Q3_10 | 63 | 39 |  | 56 | 419
Quantity_tFPA_Q3_10 | 62 | 42 | (74) | 55 | 430
Quantity_sFPA_Q3_10 | 86 | 43 | (67) | 79 | 460
Quantity_FPA_Q2_5 | 69 | 44 |  | 61 | 441
Quantity_tFPA_Q2_5 | 69 | 47 | (59) | 61 | 480
Quantity_sFPA_Q2_5 | 93 | 48 | (58) | 85 | 520
Quantity_FPA_Q3_5 | 111 | 71 |  | 97 | 779
Quantity_tFPA_Q3_5 | 111 | 73 | (80) | 98 | 836
Quantity_FPA_Q3+IQR_10 | 116 | 73 |  | 102 | 897
Quantity_sFPA_Q3_5 | 137 | 74 | (62) | 124 | 903
Quantity_tFPA_Q3+IQR_10 | 116 | 76 | (69) | 103 | 835
Quantity_sFPA_Q3+IQR_10 | 142 | 77 | (101) | 129 | 830
Quantity_FPA_Q3+IQR_5 | 223 | 144 |  | 195 | 1,682
Quantity_tFPA_Q3+IQR_5 | 224 | 145 | (68) | 196 | 1,645
Quantity_sFPA_Q3+IQR_5 | 252 | 146 | (39) | 224 | 1,799
Quantity_FPA_Q2_1.0986122886681098 | 275 | 182 |  | 237 | 2,262
Quantity_tFPA_Q2_1.0986122886681098 | 276 | 183 | (108) | 238 | 2,027
Quantity_sFPA_Q2_1.0986122886681098 | 304 | 185 | (52) | 266 | 2,050
Quantity_LPA_Q2_10 | 31 | 197 | (4,016) | 28 | 4,085
Quantity_FPA_Q2_0.6931471805599453 | 433 | 288 |  | 372 | 3,263
Quantity_tFPA_Q2_0.6931471805599453 | 435 | 289 | (53) | 374 | 3,664
Quantity_sFPA_Q2_0.6931471805599453 | 463 | 291 | (41) | 402 | 3,532
Quantity_FPA_Q3_1.0986122886681098 | 479 | 312 |  | 415 | 3,369
Quantity_tFPA_Q3_1.0986122886681098 | 480 | 313 | (51) | 417 | 3,887
Quantity_sFPA_Q3_1.0986122886681098 | 509 | 314 | (38) | 446 | 3,539
Quantity_LPA_Q3_10 | 31 | 343 | (6,570) | 28 | 8,340
Quantity_LPA_Q2_5 | 31 | 391 | (7,389) | 28 | 7,362
Quantity_FPA_Q3_0.6931471805599453 | 757 | 494 | - | 657 | 6,674
Quantity_tFPA_Q3_0.6931471805599453 | 758 | 495 | (83) | 658 | 6,736
Quantity_sFPA_Q3_0.6931471805599453 | 787 | 496 | (41) | 687 | 7,398
Quantity_FPA_Q3+IQR_1.0986122886681098 | 1,003 | 651 | - | 873 | 7,736
Quantity_sFPA_Q3+IQR_1.0986122886681098 | 1,033 | 651 | (28) | 903 | 7,876
Quantity_tFPA_Q3+IQR_1.0986122886681098 | 1,005 | 652 | (36) | 873 | 7,446
Quantity_LPA_Q3_5 | 32 | 684 | (14,330) | 28 | 17,497
Quantity_LPA_Q3+IQR_10 | 32 | 719 | (14,951) | 28 | 19,944
Quantity_FPA_Q3+IQR_0.6931471805599453 | 1,586 | 1,028 | - | 1,382 | 13,151
Quantity_tFPA_Q3+IQR_0.6931471805599453 | 1,590 | 1,030 | (52) | 1,383 | 12,496
Quantity_sFPA_Q3+IQR_0.6931471805599453 | 1,618 | 1,033 | (19) | 1,411 | 11,965
Quantity_LPA_Q3+IQR_5 | 30 | 1,438 | (34,831) | 28 | 35,461
Quantity_LPA_Q2_1.0986122886681098 | 33 | 1,771 | (34,483) | 29 | 37,630
Quantity_sFPA_Q2_0.1 | 2,999 | 1,997 | (2) | 2,576 | 26,323
Quantity_tFPA_Q2_0.1 | 2,972 | 1,998 | (27) | 2,548 | 22,724
Quantity_FPA_Q2_0.1 | 2,972 | 1,998 |  | 2,548 | 23,103
Quantity_LPA_Q2_0.6931471805599453 | 32 | 2,804 | (61,659) | 28 | 51,866
Quantity_LPA_Q3_1.0986122886681098 | 31 | 3,110 | (71,024) | 27 | 52,263
Quantity_tFPA_Q3_0.1 | 5,193 | 3,423 | (44) | 4,498 | 39,640
Quantity_FPA_Q3_0.1 | 5,195 | 3,426 |  | 4,497 | 40,192
Quantity_sFPA_Q3_0.1 | 5,227 | 3,434 | (11) | 4,531 | 37,356
Quantity_LPA_Q3_0.6931471805599453 | 26 | 4,922 | (98,152) | 26 | 82,814
Quantity_LPA_Q3+IQR_1.0986122886681098 | 25 | 6,526 | (170,174) | 25 | 116,757

Quantity_tFPA_Q3+IQR_0.1 Quantity_sFPA_Q3+IQR_0.1 Quantity_FPA_Q3+IQR_0.1 (16) 9,489 80,045 9,523 81,451 9,479 89,869 10,91 7,153 10,95 7,159 10,92 7,160
Quantity_LPA_Q3+IQR_0.6931471805599453 27 Quantity_LPA_Q2_0.1 27 Quantity_FPA_Q2_0.01 Quantity_tFPA_Q2_0.01 Quantity_sFPA_Q2_0.01 Quantity_FPA_Q3_0.01 Quantity_sFPA_Q3_0.01 Quantity_tFPA_Q3_0.01 Quantity_LPA_Q3_0.1 Quantity_sFPA_Q3+IQR_0.01 Quantity_FPA_Q3+IQR_0.01 Quantity_tFPA_Q3+IQR_0.01 27,89 27,86 27,95 48,86 48,93 48,88 102,6 05 102,6 64 102,6 88 Quantity_LPA_Q3+IQR_0.1 36 Quantity_LPA_Q2_0.01 (398) Quantity_LPA_Q3_0.01 141 Quantity_LPA_Q3+IQR_0.01 1,420 10,36 (166,663 ) 19,39 (604,816 ) 19,73 19,75 (21) 19,78 34,13 34,13 34,14 34,18 (1,002,7 90) 71,08 71,16 71,26 71,49 (1,355,9 76) 188,7 (4,008,7 60 25) 330,7 (7,035,5 31 19) 695,7 (20,273, 61 571) 28 196,373 25 503,293 23,666 249,360 23,609 238,876 23,703 207,434 41,811 417,693 41,844 378,603 41,782 366,735 14 729,898 88,201 1,053,5 39 88,136 850,740 88,132 810,135 59 24 30 20 1,398,1 27 4,210,5 42 6,354,6 59 25,137, 865

Table: Accuracy result of RFM analysis between data perturbations and original data

METHOD | TOTAL USER | COUNT TRUE | ACCURACY
Quantity_sFPA_Q2_10 | 4914 | 3826 | 77.9%
Quantity_FPA_Q2_10 | 4914 | 3719 | 75.7%
Quantity_sFPA_Q2_5 | 4914 | 3562 | 72.5%
Quantity_sFPA_Q3_10 | 4914 | 3560 | 72.4%
Quantity_tFPA_Q2_10 | 4914 | 3540 | 72.0%
Quantity_sFPA_Q3_5 | 4914 | 3379 | 68.8%
Quantity_sFPA_Q3+IQR_10 | 4914 | 3360 | 68.4%
Quantity_FPA_Q3_10 | 4914 | 3315 | 67.5%
Quantity_tFPA_Q2_5 | 4914 | 3310 | 67.4%
Quantity_FPA_Q3_5 | 4914 | 3273 | 66.6%
Quantity_FPA_Q2_5 | 4914 | 3250 | 66.1%
Quantity_sFPA_Q2_0.6931471805599453 | 4914 | 3238 | 65.9%
Quantity_tFPA_Q3_10 | 4914 | 3237 | 65.9%
Quantity_sFPA_Q3+IQR_5 | 4914 | 3221 | 65.5%
Quantity_tFPA_Q3+IQR_10 | 4914 | 3212 | 65.4%
Quantity_FPA_Q3+IQR_10 | 4914 | 3197 | 65.1%
Quantity_sFPA_Q2_1.0986122886681098 | 4914 | 3191 | 64.9%
Quantity_tFPA_Q2_1.0986122886681098 | 4914 | 3181 | 64.7%
Quantity_tFPA_Q3_5 | 4914 | 3179 | 64.7%
Quantity_tFPA_Q3+IQR_5 | 4914 | 3178 | 64.7%
Quantity_sFPA_Q3+IQR_1.0986122886681098 | 4914 | 3169 | 64.5%
Quantity_FPA_Q3_1.0986122886681098 | 4914 | 3157 | 64.2%
Quantity_sFPA_Q2_0.1 | 4914 | 3157 | 64.2%
Quantity_FPA_Q2_0.6931471805599453 | 4914 | 3152 | 64.1%
Quantity_tFPA_Q3+IQR_0.1 | 4914 | 3148 | 64.1%
Quantity_sFPA_Q3_1.0986122886681098 | 4914 | 3147 | 64.0%
Quantity_tFPA_Q3_0.6931471805599453 | 4914 | 3144 | 64.0%
Quantity_FPA_Q2_1.0986122886681098 | 4914 | 3140 | 63.9%
Quantity_sFPA_Q3_0.6931471805599453 | 4914 | 3136 | 63.8%
Quantity_tFPA_Q3+IQR_0.6931471805599453 | 4914 | 3122 | 63.5%
Quantity_sFPA_Q3_0.1 | 4914 | 3121 | 63.5%
Quantity_FPA_Q3_0.6931471805599453 | 4914 | 3121 | 63.5%
Quantity_FPA_Q3+IQR_0.1 | 4914 | 3119 | 63.5%
Quantity_tFPA_Q3_1.0986122886681098 | 4914 | 3117 | 63.4%
Quantity_FPA_Q3+IQR_1.0986122886681098 | 4914 | 3113 | 63.3%
Quantity_tFPA_Q2_0.6931471805599453 | 4914 | 3112 | 63.3%
Quantity_FPA_Q3_0.1 | 4914 | 3108 | 63.2%
Quantity_tFPA_Q3_0.1 | 4914 | 3108 | 63.2%
Quantity_sFPA_Q3+IQR_0.1 | 4914 | 3105 | 63.2%
Quantity_tFPA_Q2_0.1 | 4914 | 3092 | 62.9%
Quantity_tFPA_Q3+IQR_1.0986122886681098 | 4914 | 3085 | 62.8%
Quantity_FPA_Q3+IQR_5 | 4914 | 3085 | 62.8%
Quantity_tFPA_Q3_0.01 | 4914 | 3083 | 62.7%
Quantity_FPA_Q3_0.01 | 4914 | 3081 | 62.7%
Quantity_sFPA_Q3_0.01 | 4914 | 3074 | 62.6%
Quantity_FPA_Q3+IQR_0.6931471805599453 | 4914 | 3073 | 62.5%
Quantity_sFPA_Q3+IQR_0.6931471805599453 | 4914 | 3071 | 62.5%
Quantity_tFPA_Q3+IQR_0.01 | 4914 | 3041 | 61.9%
Quantity_sFPA_Q3+IQR_0.01 | 4914 | 3030 | 61.7%
Quantity_FPA_Q2_0.01 | 4914 | 3027 | 61.6%
Quantity_FPA_Q3+IQR_0.01 | 4914 | 3017 | 61.4%
Quantity_tFPA_Q2_0.01 | 4914 | 3005 | 61.2%
Quantity_sFPA_Q2_0.01 | 4914 | 2991 | 60.9%
Quantity_FPA_Q2_0.1 | 4914 | 2975 | 60.5%
Quantity_LPA_Q2_10 | 4914 | 1802 | 36.7%
Quantity_LPA_Q3_10 | 4914 | 1779 | 36.2%
Quantity_LPA_Q3+IQR_5 | 4914 | 1599 | 32.5%
Quantity_LPA_Q3+IQR_0.01 | 4914 | 1413 | 28.8%
Quantity_LPA_Q3_1.0986122886681098 | 4914 | 786 | 16.0%
Quantity_LPA_Q2_5 | 4914 | 603 | 12.3%
Quantity_LPA_Q3_0.6931471805599453 | 4914 | 603 | 12.3%
Quantity_LPA_Q2_0.1 | 4914 | 388 | 7.9%
Quantity_LPA_Q3_0.1 | 4914 | 328 | 6.7%
Quantity_LPA_Q3+IQR_1.0986122886681098 | 4914 | 324 | 6.6%
Quantity_LPA_Q3+IQR_0.1 | 4914 | 260 | 5.3%
Quantity_LPA_Q2_1.0986122886681098 | 4914 | 235 | 4.8%
Quantity_LPA_Q3+IQR_0.6931471805599453 | 4914 | 156 | 3.2%
Quantity_LPA_Q3_5 | 4914 | 154 | 3.1%
Quantity_LPA_Q3+IQR_10 | 4914 | 119 | 2.4%
Quantity_LPA_Q2_0.01 | 4914 | 105 | 2.1%
Quantity_LPA_Q3_0.01 | 4914 | 57 | 1.2%
Quantity_LPA_Q2_0.6931471805599453 | 4914 | 36 | 0.7%

Table: Accuracy (RMSE) result from the forecast of data perturbations and original data (a trailing … marks a value whose final digit is missing)

METHOD | RMSE_SUM | RMSE_MEAN | RANK_PREDICTION | RANK_RFM
Quantity_tFPA_Q2_10 | 8207 | 21 | 1 | 5
Quantity_FPA_Q2_10 | 8261 | 21 | 2 | 2
Quantity_FPA_Q3_10 | 14703 | 39 | 3 | 8
Quantity_sFPA_Q2_10 | 14705 | 38 | 4 | 1
Quantity_tFPA_Q3_10 | 15443 | 41 | 5 | 13
Quantity_tFPA_Q2_5 | 17815 | 47 | 6 | 9
Quantity_FPA_Q2_5 | 18782 | 50 | 7 | 11
Quantity_sFPA_Q3_10 | 22095 | 58 | 8 | 4
Quantity_sFPA_Q2_5 | 27137 | 71 | 9 | 3
Quantity_FPA_Q3_5 | 34333 | 91 | 10 | 10
Quantity_tFPA_Q3_5 | 35297 | 95 | 11 | 19
Quantity_FPA_Q3+IQR_10 | 37174 | 99 | 12 | 16
Quantity_tFPA_Q3+IQR_10 | 38485 | 103 | 13 | 15
Quantity_sFPA_Q3_5 | 46826 | 124 | 14 | 6
Quantity_sFPA_Q3+IQR_10 | 48041 | 127 | 15 | 7
Quantity_LPA_Q2_10 | 63143 | 180 | 16 | 55
Quantity_LPA_Q2_5 | 79198 | 341 | 17 | 60
Quantity_tFPA_Q3+IQR_5 | 85361 | 230 | 18 | 20
Quantity_FPA_Q3+IQR_5 | 85503 | 230 | 19 | 42
Quantity_sFPA_Q3+IQR_5 | 99783 | 268 | 20 | 14
Quantity_FPA_Q2_1.0986122886681098 | 109801 | 295 | 21 | 28
Quantity_tFPA_Q2_1.0986122886681098 | 113680 | 303 | 22 | 18
Quantity_sFPA_Q2_1.0986122886681098 | 125813 | 339 | 23 | 17
Quantity_LPA_Q3_10 | 157349 | 424 | 24 | 56
Quantity_FPA_Q2_0.6931471805599453 | 180710 | 487 | 25 | 24
Quantity_tFPA_Q2_0.6931471805599453 | 185744 | 500 | 26 | 36
Quantity_LPA_Q3+IQR_10 | 187696 | 1421 | 27 | 69
Quantity_sFPA_Q2_0.6931471805599453 | 193371 | 521 | 28 | 12
Quantity_FPA_Q3_1.0986122886681098 | 193837 | 522 | 29 | 22
Quantity_tFPA_Q3_1.0986122886681098 | 205912 | 558 | 30 | 34
Quantity_sFPA_Q3_1.0986122886681098 | 211897 | 571 | 31 | 26
Quantity_LPA_Q3_5 | 266047 | 1503 | 32 | 68
Quantity_tFPA_Q3_0.6931471805599453 | 329841 | 889 | 33 | 27
Quantity_FPA_Q3_0.6931471805599453 | 335632 | 904 | 34 | 32
Quantity_sFPA_Q3_0.6931471805599453 | 338795 | 913 | 35 | 29
Quantity_LPA_Q2_0.6931471805599453 | 360744 | 8798 | 36 | 72
Quantity_LPA_Q3+IQR_5 | 388685 | 1278 | 37 | 57
Quantity_FPA_Q3+IQR_1.0986122886681098 | 427599 | 1152 | 38 | 35
Quantity_sFPA_Q3+IQR_1.0986122886681098 | 432107 | 1155 | 39 | 21
Quantity_tFPA_Q3+IQR_1.0986122886681098 | 432751 | 1166 | 40 | 41
Quantity_LPA_Q2_1.0986122886681098 | 675443 | 2911 | 41 | 66
Quantity_tFPA_Q3+IQR_0.6931471805599453 | 701058 | 1889 | 42 | 30
Quantity_FPA_Q3+IQR_0.6931471805599453 | 711372 | 1922 | 43 | 46
Quantity_sFPA_Q3+IQR_0.6931471805599453 | 722261 | 1946 | 44 | 47
Quantity_LPA_Q3_1.0986122886681098 | 851131 | 3175 | 45 | 59
Quantity_FPA_Q2_0.1 | 1282272 | 3474 | 46 | 54
Quantity_sFPA_Q2_0.1 | 1361263 | 3639 | 47 | 23
Quantity_tFPA_Q2_0.1 | 1376267 | 3709 | 48 | 40
Quantity_LPA_Q3_0.6931471805599453 | 1578508 | 5637 | 49 | 61
Quantity_LPA_Q3+IQR_1.0986122886681098 | 1967164 | 8479 | 50 | 64
Quantity_LPA_Q3+IQR_0.6931471805599453 | 2024416 | 19847 | 51 | 67
Quantity_FPA_Q3_0.1 | 2341340 | 6327 | 52 | 37
Quantity_sFPA_Q3_0.1 | 2431525 | 6553 | 53 | 31
Quantity_tFPA_Q3_0.1 | 2436939 | 6568 | 54 | 38
Quantity_tFPA_Q3+IQR_0.1 | 4904793 | 13220 | 55 | 25
Quantity_sFPA_Q3+IQR_0.1 | 5039232 | 13582 | 56 | 39
Quantity_FPA_Q3+IQR_0.1 | 5039631 | 13583 | 57 | 33
Quantity_LPA_Q2_0.1 | 7000154 | 26119 | 58 | 62
Quantity_LPA_Q3_0.1 | 1007802… | 43439 | 59 | 63
Quantity_tFPA_Q2_0.01 | 1290266… | 35061 | 60 | 52
Quantity_sFPA_Q2_0.01 | 1311403… | 35347 | 61 | 53
Quantity_FPA_Q2_0.01 | 1378668… | 37261 | 62 | 50
Quantity_FPA_Q3_0.01 | 2214131… | 60003 | 63 | 44
Quantity_tFPA_Q3_0.01 | 2307074… | 62185 | 64 | 43
Quantity_sFPA_Q3_0.01 | 2319636… | 62523 | 65 | 45
Quantity_LPA_Q3+IQR_0.1 | 2507054… | 10533… | 66 | 65
Quantity_LPA_Q3_0.01 | 3593369… | 81667… | 67 | 71
Quantity_tFPA_Q3+IQR_0.01 | 4718123… | 12751… | 68 | 48
Quantity_FPA_Q3+IQR_0.01 | 4779886… | 12883… | 69 | 51
Quantity_sFPA_Q3+IQR_0.01 | 4790585… | 12912… | 70 | 49
Quantity_LPA_Q2_0.01 | 5093408… | 50934… | 71 | 70
Quantity_LPA_Q3+IQR_0.01 | 224446662 | 80159… | 72 | 58

Table: Accuracy (RMSE) result of the adjusted forecast version (a trailing … marks a value whose final digit is missing)

METHOD | NEW_RMSE_SUM | NEW_RMSE_MEAN | RANK_NEW_PREDICTION | RANK_PREDICTION | RANK_RFM
Quantity_FPA_Q2_10 | 6300 | 16 | 1 | 2 | 2
Quantity_tFPA_Q2_10 | 6335 | 16 | 2 | 1 | 5
Quantity_FPA_Q3_10 | 12969 | 34 | 3 | 3 | 8
Quantity_tFPA_Q3_10 | 13107 | 34 | 4 | 5 | 13
Quantity_sFPA_Q2_10 | 13400 | 35 | 5 | 4 | 1
Quantity_tFPA_Q2_5 | 14935 | 39 | 6 | 6 | 9
Quantity_FPA_Q2_5 | 15815 | 41 | 7 | 7 | 11
Quantity_sFPA_Q3_10 | 21170 | 56 | 8 | 8 | 4
Quantity_sFPA_Q2_5 | 24191 | 63 | 9 | 9 | 3
Quantity_FPA_Q3_5 | 30379 | 80 | 10 | 10 | 10
Quantity_tFPA_Q3_5 | 30301 | 80 | 11 | 11 | 19
Quantity_FPA_Q3+IQR_10 | 32255 | 85 | 12 | 12 | 16
Quantity_LPA_Q2_10 | 32732 | 86 | 13 | 16 | 55
Quantity_tFPA_Q3+IQR_10 | 33572 | 88 | 14 | 13 | 15
Quantity_sFPA_Q3_5 | 41452 | 109 | 15 | 14 | 6
Quantity_sFPA_Q3+IQR_10 | 44836 | 118 | 16 | 15 | 7
Quantity_LPA_Q3_10 | 56894 | 150 | 17 | 24 | 56
Quantity_LPA_Q2_5 | 64694 | 171 | 18 | 17 | 60
Quantity_tFPA_Q3+IQR_5 | 73107 | 193 | 19 | 18 | 20
Quantity_FPA_Q3+IQR_5 | 75538 | 199 | 20 | 19 | 42
Quantity_sFPA_Q3+IQR_5 | 88993 | 235 | 21 | 20 | 14
Quantity_FPA_Q2_1.0986122886681098 | 94305 | 249 | 22 | 21 | 28
Quantity_tFPA_Q2_1.0986122886681098 | 96434 | 255 | 23 | 22 | 18
Quantity_sFPA_Q2_1.0986122886681098 | 110250 | 291 | 24 | 23 | 17
Quantity_LPA_Q3_5 | 118288 | 312 | 25 | 32 | 68
Quantity_LPA_Q3+IQR_10 | 124621 | 329 | 26 | 27 | 69
Quantity_tFPA_Q2_0.6931471805599453 | 159687 | 422 | 27 | 26 | 36
Quantity_FPA_Q2_0.6931471805599453 | 161981 | 428 | 28 | 25 | 24
Quantity_sFPA_Q2_0.6931471805599453 | 171467 | 453 | 29 | 28 | 12
Quantity_FPA_Q3_1.0986122886681098 | 173392 | 458 | 30 | 29 | 22
Quantity_tFPA_Q3_1.0986122886681098 | 184380 | 487 | 31 | 30 | 34
Quantity_sFPA_Q3_1.0986122886681098 | 190758 | 504 | 32 | 31 | 26
Quantity_LPA_Q3+IQR_5 | 233690 | 618 | 33 | 37 | 57
Quantity_LPA_Q2_1.0986122886681098 | 282065 | 746 | 34 | 41 | 66
Quantity_tFPA_Q3_0.6931471805599453 | 283265 | 749 | 35 | 33 | 27
Quantity_FPA_Q3_0.6931471805599453 | 293870 | 777 | 36 | 34 | 32
Quantity_sFPA_Q3_0.6931471805599453 | 301062 | 796 | 37 | 35 | 29
Quantity_FPA_Q3+IQR_1.0986122886681098 | 372295 | 984 | 38 | 38 | 35
Quantity_sFPA_Q3+IQR_1.0986122886681098 | 390442 | 1032 | 39 | 39 | 21
Quantity_tFPA_Q3+IQR_1.0986122886681098 | 396307 | 1048 | 40 | 40 | 41
Quantity_LPA_Q2_0.6931471805599453 | 484617 | 1282 | 41 | 36 | 72
Quantity_LPA_Q3_1.0986122886681098 | 501644 | 1327 | 42 | 45 | 59
Quantity_tFPA_Q3+IQR_0.6931471805599453 | 607302 | 1606 | 43 | 42 | 30
Quantity_sFPA_Q3+IQR_0.6931471805599453 | 635602 | 1681 | 44 | 44 | 47
Quantity_FPA_Q3+IQR_0.6931471805599453 | 644181 | 1704 | 45 | 43 | 46
Quantity_LPA_Q3_0.6931471805599453 | 779578 | 2062 | 46 | 49 | 61
Quantity_LPA_Q3+IQR_1.0986122886681098 | 1141599 | 3020 | 47 | 50 | 64
Quantity_tFPA_Q2_0.1 | 1165076 | 3082 | 48 | 48 | 40
Quantity_FPA_Q2_0.1 | 1167740 | 3089 | 49 | 46 | 54
Quantity_sFPA_Q2_0.1 | 1172283 | 3101 | 50 | 47 | 23
Quantity_LPA_Q3+IQR_0.6931471805599453 | 1666132 | 4407 | 51 | 51 | 67
Quantity_sFPA_Q3_0.1 | 2052540 | 5430 | 52 | 53 | 31
Quantity_FPA_Q3_0.1 | 2057484 | 5443 | 53 | 52 | 37
Quantity_tFPA_Q3_0.1 | 2063080 | 5457 | 54 | 54 | 38
Quantity_LPA_Q2_0.1 | 3100171 | 8201 | 55 | 58 | 62
Quantity_sFPA_Q3+IQR_0.1 | 4340845 | 11483 | 56 | 56 | 39
Quantity_FPA_Q3+IQR_0.1 | 4369844 | 11560 | 57 | 57 | 33
Quantity_tFPA_Q3+IQR_0.1 | 4411216 | 11669 | 58 | 55 | 25
Quantity_LPA_Q3_0.1 | 5558968 | 14706 | 59 | 59 | 63
Quantity_tFPA_Q2_0.01 | 1112083… | 29420 | 60 | 60 | 52
Quantity_sFPA_Q2_0.01 | 1125215… | 29767 | 61 | 61 | 53
Quantity_LPA_Q3+IQR_0.1 | 1146432… | 30328 | 62 | 66 | 65
Quantity_FPA_Q2_0.01 | 1163545… | 30781 | 63 | 62 | 50
Quantity_sFPA_Q3_0.01 | 1993040… | 52725 | 64 | 65 | 45
Quantity_tFPA_Q3_0.01 | 2006469… | 53081 | 65 | 64 | 43
Quantity_FPA_Q3_0.01 | 2010269… | 53181 | 66 | 63 | 44
Quantity_LPA_Q2_0.01 | 3173047… | 83943 | 67 | 71 | 70
Quantity_tFPA_Q3+IQR_0.01 | 4045681… | 107028 | 68 | 68 | 48
Quantity_sFPA_Q3+IQR_0.01 | 4096919… | 108384 | 69 | 70 | 49
Quantity_FPA_Q3+IQR_0.01 | 4250846… | 112456 | 70 | 69 | 51
Quantity_LPA_Q3_0.01 | 5754476… | 152234 | 71 | 67 | 71
Quantity_LPA_Q3+IQR_0.01 | 112297164 | 297082 | 72 | 72 | 58

VITA

Full name: LAI TRUNG MINH DUC
Date of Birth: 24 May 1996
Place of Birth: HO CHI MINH CITY
Address: 156 NGUYEN LUONG BANG, DISTRICT 7, HCMC
Website: https://www.linkedin.com/in/henryduclai/
2019 - Bachelor of Cyber Security - FPT University
2019 - 2023 - Data Science Manager @ Unilever Vietnam
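The appendix column names (for example Quantity_LPA_Q2_10) appear to combine a mechanism, a sensitivity threshold, and an epsilon value; that reading is an assumption, since the chapters defining them are not part of this preview. Under that assumption, the sketch below shows how a Laplace-perturbed series and its RMSE against the original could be computed with the Python standard library alone. It is an illustration, not the thesis implementation: the sample series, the sensitivity of 24, and the function names laplace_perturb and rmse are all hypothetical, and the way the thesis derives sensitivity or spreads the privacy budget over time steps is not shown here.

```python
import math
import random


def laplace_perturb(series, sensitivity, epsilon, seed=None):
    """Add Laplace(0, sensitivity / epsilon) noise to each point (hypothetical helper)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    # The difference of two independent Exp(1) draws follows Laplace(0, 1),
    # so multiplying by `scale` gives Laplace(0, scale) noise.
    return [
        x + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
        for x in series
    ]


def rmse(original, perturbed):
    """Root mean squared error between two equally long series."""
    if len(original) != len(perturbed):
        raise ValueError("series must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(original, perturbed))
    return math.sqrt(squared / len(original))


# Hypothetical data: a short quantity series and an assumed sensitivity bound.
quantity = [12.0, 30.0, 24.0, 18.0, 45.0, 27.0, 33.0, 21.0]
assumed_sensitivity = 24.0

for epsilon in (10.0, 1.0, 0.1):
    noisy = laplace_perturb(quantity, assumed_sensitivity, epsilon, seed=0)
    print(f"epsilon={epsilon}: RMSE={rmse(quantity, noisy):.2f}")
```

With a fixed seed, lowering epsilon scales the same noise draws up, so the RMSE grows as epsilon shrinks; this is the same direction of change seen across the epsilon values in the tables above.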

Date posted: 25/10/2023, 22:15
