MINISTRY OF EDUCATION AND TRAINING
MINISTRY OF NATIONAL DEFENCE
ACADEMY OF MILITARY SCIENCE AND TECHNOLOGY

LƯƠNG THẾ DŨNG

DISTRIBUTED SOLUTIONS IN PRIVACY PRESERVING DATA MINING
(Research on developing solutions to ensure information security in the data mining process)

Specialization: Mathematical foundations for computers and computing systems
Code: 62 46 35 01

DOCTORAL THESIS IN MATHEMATICS

Scientific supervisors:
Prof. Dr.Sc. HỒ TÚ BẢO
Assoc. Prof. Dr. BẠCH NHẬT HỒNG

Hanoi - 2011

Pledge

I promise that this thesis is a presentation of my original research work. Where content draws on other work, it is based on reliable references such as papers published in distinguished international conferences and journals and books from widely known publishers. The results and discussions of the thesis are new and have not previously been published by any other author.

Contents

1 INTRODUCTION
  1.1 Privacy-preserving data mining: An overview
  1.2 Objectives and contributions
  1.3 Related works
  1.4 Organization of thesis
2 METHODS FOR SECURE MULTI-PARTY COMPUTATION
  2.1 Definitions
    2.1.1 Computational indistinguishability
    2.1.2 Secure multi-party computation
  2.2 Secure computation
    2.2.1 Secret sharing
    2.2.2 Secure sum computation
    2.2.3 Probabilistic public key cryptosystems
    2.2.4 Variant ElGamal cryptosystem
    2.2.5 Oblivious polynomial evaluation
    2.2.6 Secure scalar product computation
    2.2.7 Privately computing ln x
3 PRIVACY PRESERVING FREQUENCY-BASED LEARNING IN 2PFD SETTING
  3.1 Introduction
  3.2 Privacy preserving frequency mining in 2PFD setting
    3.2.1 Problem formulation
    3.2.2 Definition of privacy
    3.2.3 Frequency mining protocol
    3.2.4 Correctness analysis
    3.2.5 Privacy analysis
    3.2.6 Efficiency of frequency mining protocol
  3.3 Privacy preserving frequency-based learning in 2PFD setting
    3.3.1 Naive Bayes learning problem in 2PFD setting
    3.3.2 Naive Bayes learning protocol
    3.3.3 Correctness and privacy analysis
    3.3.4 Efficiency of naive Bayes learning protocol
  3.4 An improvement of frequency mining protocol
    3.4.1 Improved frequency mining protocol
    3.4.2 Protocol analysis
  3.5 Conclusion
4 ENHANCING PRIVACY FOR FREQUENT ITEMSET MINING IN VERTICALLY
  4.1 Introduction
  4.2 Problem formulation
    4.2.1 Association rules and frequent itemset
    4.2.2 Frequent itemset identifying in vertically distributed data
  4.3 Computational and privacy model
  4.4 Support count preserving protocol
    4.4.1 Overview
    4.4.2 Protocol design
    4.4.3 Correctness analysis
    4.4.4 Privacy analysis
    4.4.5 Performance analysis
  4.5 Support count computation-based protocol
    4.5.1 Overview
    4.5.2 Protocol design
    4.5.3 Correctness analysis
    4.5.4 Privacy analysis
    4.5.5 Performance analysis
  4.6 Using binary tree communication structure
  4.7 Privacy-preserving distributed Apriori algorithm
  4.8 Conclusion
5 PRIVACY PRESERVING CLUSTERING
  5.1 Introduction
  5.2 Problem statement
  5.3 Privacy preserving clustering for the multi-party distributed data
    5.3.1 Overview
    5.3.2 Private multi-party mean computation
    5.3.3 Privacy preserving multi-party clustering protocol
  5.4 Privacy preserving clustering without disclosing cluster centers
    5.4.1 Overview
    5.4.2 Privacy preserving two-party clustering protocol
    5.4.3 Secure mean sharing
  5.5 Conclusion
6 PRIVACY PRESERVING OUTLIER DETECTION
  6.1 Introduction
  6.2 Technical preliminaries
    6.2.1 Problem statement
    6.2.2 Linear transformation
    6.2.3 Privacy model
    6.2.4 Private matrix product sharing
  6.3 Protocols for the horizontally distributed data
    6.3.1 Two-party protocol
    6.3.2 Multi-party protocol
  6.4 Protocol for two-party vertically distributed data
  6.5 Experiments
  6.6 Conclusions
SUMMARY
Publication List
Bibliography

List of Phrases

Abbreviation    Full name
PPDM            Privacy Preserving Data Mining
k-NN            k-nearest neighbor
EM              Expectation-maximization
SMC             Secure Multiparty Computation
DDH             Decisional Diffie-Hellman
PMPS            Private Matrix Product Sharing
SSP             Secure Scalar Product
OPE             Oblivious Polynomial Evaluation
ICA             Independent Component Analysis
2PFD            2-part fully distributed setting
FD              Fully distributed setting
$\stackrel{c}{\equiv}$    Computational indistinguishability

List of Tables

4.1 The communication cost
4.2 The complexity of the support count preserving protocol
4.3 The parties' time for the support count preserving protocol
4.4 The communication cost
4.5 The complexity of the support count computation protocol
4.6 The parties' time for the support count computation protocol
6.1 The parties' computational time for the horizontally distributed data
6.2 The parties' computational time for the vertically distributed data

List of Figures

3.1 Frequency mining protocol
3.2 The time used by the miner for computing the frequency f
3.3 Privacy preserving protocol of naive Bayes learning
3.4 The computational time for the first phase and the third phase
3.5 The time for computing the key values in the first phase
3.6 The time for computing the frequency f in the third phase
3.7 Improved frequency mining protocol
4.1 Support count preserving protocol
4.2 The support count computation protocol
4.3 Privacy-preserving distributed Apriori protocol
5.1 Privacy preserving multi-party mean computation
5.2 Privacy preserving multi-party clustering protocol
5.3 Privacy preserving two-party clustering
5.4 Secure mean sharing
6.1 Private matrix product sharing (PMPS)
6.2 Protocol for two-party horizontally distributed data
6.3 Protocol for multi-party horizontally distributed data
6.4 Protocol for two-party vertically distributed data

Chapter 1
INTRODUCTION

1.1 Privacy-preserving data mining: An overview

Data mining plays an important role in today's world and provides a powerful tool for efficiently discovering valuable information in large databases [25]. However, the process of mining data can result in a violation of privacy; therefore, privacy preservation in data mining is receiving more and more attention from the community [52]. As a result, a large number of studies have been produced on the topic of privacy-preserving data mining (PPDM) [72]. These studies deal with the problem of learning data mining models from databases while protecting data privacy at the level of individual records or at the level of organizations.

Basically, there are three major problems in PPDM [8]. First, organizations such as government agencies wish to publish their data for researchers and even for the wider community. However, they want to preserve data privacy, for example for highly sensitive financial and health data. Second, a group of organizations (or parties) wishes to jointly obtain a mining result on their joint data without disclosing each party's private information. Third, a miner wishes to collect data or obtain data mining models from individual users while preserving the privacy of each user. Consequently, PPDM can be divided into the following three areas, depending on the model of information sharing.

Privacy-preserving data publishing: The model in this line of research consists of a single organization, which is the trusted data holder. This organization wishes to publish its data to the miner or the research community in such a way that the anonymized data remain useful for data mining applications. For example, some hospitals collect records from their patients for some required ...

SUMMARY

This thesis has proposed four solutions for four problems in privacy-preserving data mining on distributed data. In each solution, we provided analysis to prove privacy and correctness
based on the semi-honest security model and secure multi-party computation methods. We also estimated the communication cost and the computational complexity of each solution. In addition, we provided experimental results to show how efficient and practical the solutions are.

In the first work, we proposed a solution for privacy-preserving data mining in a new scenario, the so-called 2PFD setting. In this setting, each record is owned by two different users: one user knows only the values of a subset of the attributes, and the other knows the values of the remaining attributes. The proposed solution allows a miner to learn frequency-based data mining models in the 2PFD setting, such as naive Bayes learning, decision tree learning, association rule mining, and Pearson correlation analysis, while preserving each user's privacy. The crucial step in the solution is the privacy-preserving computation of the frequency of a tuple of values in the users' data. We illustrated the applicability of the solution by using it to build a privacy-preserving protocol for naive Bayes classifier learning. Experimental results show that our protocol is efficient. We also presented an improvement that uses the Shamir secret sharing scheme to allow the miner to obtain the frequency without requiring the full participation of all user pairs.

In the second work, we considered a scenario in which data are collected and maintained in a vertically distributed model by different parties. We proposed protocols that allow the parties to cooperate in frequent itemset mining on their joint data while preserving the privacy of the participants. The important security feature of our protocols, and their advantage over previous protocols, is that they achieve full privacy protection for the parties. That is, we do not assume the existence of any kind of trusted party. Moreover, no collusion of parties can cause a privacy breach unless all parties collude together, a situation that does not arise in practice.

In the third work, we considered a data set that is horizontally partitioned among several parties. We proposed a solution that allows the parties to cluster the joint data set using the EM algorithm without revealing anything except the final results. Each party thus learns the cluster to which each of its data objects belongs, but learns nothing else. We gave two protocols for privacy-preserving EM-based clustering: one for multi-party distributed data and one for two-party distributed data. Unlike the existing protocol, the multi-party protocol does not reveal the sums of the numerator and denominator in the secure computation of the parameters of the EM algorithm; it is therefore more secure, and it allows an arbitrary number of participating parties. For the two-party case, we proposed a protocol that computes the covariance matrices and the final results without revealing the private information or the means.

In the fourth work, we proposed a solution for privacy-preserving multivariate outlier detection on both vertically and horizontally distributed data models. The solution is based on four techniques: linear transformation, private matrix product sharing, secure mean sharing, and secure sum. The privacy of the protocols in the solution is based on both the semi-honest and the expansion security models. In addition, we provided experiments showing that the computational complexity of the protocols is linear in the number of data attributes and the size of the database. We also showed that the protocols are very efficient on horizontally distributed data, where the cost depends mainly on the number of data attributes. The proposed solution is useful when multiple parties wish to cooperate in outlier detection on their joint data sets while keeping their data private. For example, two companies may need to share their network log data to build an intrusion detection system, or several banks may need to share their customers' data to find fraud cases.

Although the proposed solutions are technically mature enough to be used in several applications of privacy-preserving data mining, some issues remain that can be improved in the future.

A general question for the proposed solutions is how they can be extended to participants who behave maliciously. Although some available techniques could be integrated into our solutions to defend against malicious adversaries, these techniques may be quite expensive for practical applications. Thus, finding efficient solutions that defend against malicious adversaries is an open problem.

Another interesting problem is to develop a general solution for privacy-preserving distributed data mining, such that any privacy-preserving data mining problem can be solved with it. Finding such a general solution may be impossible, but a simpler idea is to design building blocks for primitive problems such as frequency mining, mean computation, and so on. These building blocks can then be composed to design solutions for more complex problems and exposed through a general programming interface for future use. Thus, an open problem is to find general primitives for the various data mining algorithms and to design the building blocks for these primitives. In addition, the compositions of primitives must meet the privacy requirements and remain efficient for many data mining applications.

For the 2PFD setting, the main problem we have solved so far is privacy-preserving frequency mining, which can be the key component of privacy-preserving protocols for several data mining tasks such as naive Bayes learning, decision tree learning, association rule mining, and Pearson correlation analysis. There may be many other privacy-preserving data mining tasks in the 2PFD setting, such as multivariate regression analysis, that would be of interest for future work. In addition, in the proposed frequency mining solution in 2PFD, half of the users need two interactions with the miner, so a natural question is whether we can design a method in which each user needs only one interaction with the miner.

For the fourth work, although the linear transformation technique meets the privacy model, ICA-based attacks can cause privacy breaches against our transformation approach. Thus, an open problem is to carefully select the transformation matrix for each particular data distribution so that the chosen perturbation is more resilient to ICA-based attacks. This problem should be investigated in the future.

PUBLICATION LIST

[1] “Privacy Preserving Frequency Mining in 2-Part Fully Distributed Setting”, IEICE Trans. Information Systems, Vol. E93-D, No. 10, 2701-2708, October 2010.
[2] “Enhancing Privacy in Distributed Data Clustering”, Journal of Computer Science and Cybernetics, Vol. 26, No. 2, 1-15, 2010.
[3] “Enhancing Privacy in Frequent Itemset Mining”, submitted to the journal “Expert Systems with Applications”, Elsevier.
[4] “Privacy Preserving Classification in Two-Dimension Distributed Data”, International Conference on Knowledge and Systems Engineering (KSE 2010), 7-9 October, Hanoi, 96-103, 2010.
[5] “Privacy preserving EM-based Clustering”, IEEE RIVF International Conference on Computing and Communication Technologies (RIVF09), 13-17 July 2009, Da Nang, 111-117, IEEE Press, 2009.
[6] “Privacy Preserving for Multivariate Outlier Detection”, Third International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2008), December 22-23, Hanoi, 7-16, 2008.

Bibliography

[1] Dakshi Agrawal and Charu C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 247–255. ACM, 2001.
[2] Rakesh Agrawal and Ramakrishnan Srikant. Privacy-preserving data mining. SIGMOD Rec., 29(2):439–450, 2000.
[3] Rakesh
Agrawal, Ramakrishnan Srikant, and Dilys Thomas. Privacy preserving OLAP. In SIGMOD ’05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 251–262. ACM, 2005.
[4] Shipra Agrawal and Jayant R. Haritsa. A framework for high-accuracy privacy-preserving mining. In ICDE ’05: Proceedings of the 21st International Conference on Data Engineering, pages 193–204. IEEE Computer Society, 2005.
[5] Fatemah A. Alqallaf, Kjell P. Konis, R. Douglas Martin, and Ruben H. Zamar. Scalable robust covariance and correlation estimates for data mining. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’02, pages 14–23. ACM, 2002.
[6] Roberto J. Bayardo and Rakesh Agrawal. Data privacy through optimal k-anonymization. In ICDE ’05: Proceedings of the 21st International Conference on Data Engineering, pages 217–228. IEEE Computer Society, 2005.
[7] Dan Boneh. The decision Diffie-Hellman problem. In Proceedings of the Third International Symposium on Algorithmic Number Theory, pages 48–63. Springer-Verlag, 1998.
[8] Charu C. Aggarwal and Philip S. Yu. Privacy-Preserving Data Mining: Models and Algorithms. ASPVU, Boston, MA, United States, 2008.
[9] David W. Cheung, Jiawei Han, Vincent T. Ng, Ada W. Fu, and Yongjian Fu. A fast distributed algorithm for mining association rules. In DIS ’96: Proceedings of the fourth international conference on Parallel and distributed information systems, pages 31–43. IEEE Computer Society, 1996.
[10] Chris Clifton, Murat Kantarcioglu, Jaideep Vaidya, Xiaodong Lin, and Michael Y. Zhu. Tools for privacy preserving distributed data mining. SIGKDD Explor. Newsl., 4(2):28–34, 2002.
[11] Chris Clifton, Murat Kantarcioglu, Jaideep Vaidya, Xiaodong Lin, and Michael Y. Zhu. Tools for privacy preserving distributed data mining. ACM SIGKDD Explorations, 4:2003, 2003.
[12] Ivan Damgård and Mads Jurik. A generalisation, a simplification and some applications of Paillier's probabilistic public-key system. In Proceedings of PKC 01, LNCS series, pages 119–136. Springer-Verlag, 2001.
[13] Elena Dasseni, Vassilios S. Verykios, Ahmed K. Elmagarmid, and Elisa Bertino. Hiding association rules by using confidence and support. In IHW ’01: Proceedings of the 4th International Workshop on Information Hiding, pages 369–383, London, UK, 2001. Springer-Verlag.
[14] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
[15] Pedro Domingos and Michael Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn., 29(2-3):103–130, 1997.
[16] Jim Dowd, Shouhuai Xu, and Weining Zhang. Privacy-preserving decision tree mining based on random substitutions. In International Conference on Emerging Trends in Information and Communication Security, pages 145–159. Springer Berlin / Heidelberg, 2005.
[17] Wenliang Du, Shigang Chen, and Yunghsiang S. Han. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In Proceedings of the 4th SIAM International Conference on Data Mining, pages 222–233, 2004.
[18] Wenliang Du and Zhijun Zhan. Building decision tree classifier on private data. In Proceedings of the IEEE international conference on Privacy, security and data mining, pages 1–8. Australian Computer Society, Inc., 2002.
[19] Wenliang Du and Zhijun Zhan. Using randomized response techniques for privacy-preserving data mining. In KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 505–510. ACM, 2003.
[20] Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 211–222. ACM, 2003.
[21] Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke. Privacy preserving mining of association rules. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 217–228. ACM, 2002.
[22] Michael Freedman, Kobbi Nissim, and Benny Pinkas. Efficient private matching and set intersection. In EUROCRYPT 2004, pages 1–19. Springer-Verlag, 2004.
[23] Bart Goethals, Sven Laur, Helger Lipmaa, and Taneli Mielikäinen. On private scalar product computation for privacy-preserving data mining. In Proceedings of the 7th Annual International Conference in Information Security and Cryptology, pages 104–120. Springer-Verlag, 2004.
[24] Oded Goldreich. The Foundations of Cryptography, volume 2, chapter 7: General Cryptographic Protocols. Cambridge University Press, 2nd edition, 2004.
[25] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques, 2nd ed. (The Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers, 2006.
[26] Shuguo Han and Wee Keong Ng. Multi-party privacy-preserving decision trees for arbitrarily partitioned data. International Journal of Intelligent Control and Systems, 12(4):351–358, 2007.
[27] Martin Hirt and Kazue Sako. Efficient receipt-free voting based on homomorphic encryption. In Proceedings of EuroCrypt 2000, LNCS series, pages 539–556. Springer-Verlag, 2000.
[28] Victoria Hodge and Jim Austin. A survey of outlier detection methodologies. Artif. Intell. Rev., 22:85–126, 2004.
[29] Ali Inan, Selim V. Kaya, Yücel Saygin, Erkay Savas, Ayça A. Hintoglu, and Albert Levi. Privacy preserving clustering on horizontally partitioned data. Data Knowl. Eng., 63(3):646–666, 2007.
[30] Bjarne K. Jacobsen and Dag S. Thelle. The Tromsø heart study: The relationship between food habits and the body mass index. Journal of Chronic Diseases, 40(8):795–800, 1987.
[31] G. Joachim. The relationship between habits of food consumption and reported reactions to food in people with inflammatory bowel disease – testing the limits. J. Nutrition and Health, 13(2):69–83, 1999.
[32] Murat Kantarcioglu. Privacy-preserving
distributed data mining and processing on horizontally partitioned data. PhD thesis, Purdue University, West Lafayette, IN, USA, 2005. Major Professor: Christopher W. Clifton.
[33] Murat Kantarcioglu and Chris Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Trans. on Knowl. and Data Eng., 16(9):1026–1037, 2004.
[34] Murat Kantarcioglu and Jaideep Vaidya. Privacy preserving naive Bayes classifier for horizontally partitioned data. In IEEE ICDM Workshop on Privacy Preserving Data Mining, 2003.
[35] V. Kapoor, P. Poncelet, F. Trousset, and M. Teisseire. Privacy preserving sequential pattern mining in distributed databases. In CIKM ’06: Proceedings of the 15th ACM international conference on Information and knowledge management, pages 758–767. ACM, 2006.
[36] Hillol Kargupta, Souptik Datta, Qi Wang, and Krishnamoorthy Sivakumar. On the privacy preserving properties of random data perturbation techniques. In ICDM ’03: Proceedings of the Third IEEE International Conference on Data Mining, page 99. IEEE Computer Society, 2003.
[37] Keke Chen and Ling Liu. Privacy preserving data classification with rotation perturbation. In Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM ’05, pages 589–592. IEEE Computer Society, 2005.
[38] Yu-Hwan Kim, Shang-Yoon Hahn, and Byoung-Tak Zhang. Text filtering by boosting naive Bayes classifiers. In SIGIR ’00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 168–175. ACM, 2000.
[39] Yu-Chiang Li, Jieh-Shan Yeh, and Chin-Chen Chang. MICF: An effective sanitization algorithm for hiding sensitive patterns on data mining. Adv. Eng. Inform., 21(3):269–280, 2007.
[40] Xiaodong Lin, Chris Clifton, and Michael Zhu. Privacy-preserving clustering with distributed EM mixture modeling. Knowledge and Information Systems, pages 68–81, 2005.
[41] Yehuda Lindell and Benny Pinkas. Privacy preserving data mining. In Advances in Cryptology (CRYPTO’00), pages 36–53, 2000.
[42] Markus Michels and Patrick Horster. Some remarks on a receipt-free and universally verifiable mix-type voting scheme. In Proceedings of the International Conference on the Theory and Applications of Cryptology and Information Security: Advances in Cryptology, pages 125–132. Springer-Verlag, 1996.
[43] G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York, 1988.
[44] Moni Naor and Benny Pinkas. Oblivious transfer and polynomial evaluation. In STOC ’99: Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 245–254. ACM, 1999.
[45] Moni Naor and Benny Pinkas. Efficient oblivious transfer protocols. In SODA ’01: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 448–457. Society for Industrial and Applied Mathematics, 2001.
[46] Andreas Noack and Stefan Spitz. Dynamic threshold cryptosystem without group manager. Network Protocols and Algorithms, 1(1):108–121, 2009.
[47] Stanley R. M. Oliveira and Osmar R. Zaïane. Achieving privacy preservation when sharing data for clustering. In Proc. of the Workshop on Secure Data Management in a Connected World (SDM04) in conjunction with VLDB2004, pages 67–82. Toronto, Canada, 2004.
[48] Stanley R. M. Oliveira and Osmar R. Zaïane. Privacy preserving frequent itemset mining. In CRPIT ’14: Proceedings of the IEEE international conference on Privacy, security and data mining, pages 43–54. Australian Computer Society, Inc., 2002.
[49] Stanley R. M. Oliveira and Osmar R. Zaïane. Privacy preserving clustering by data transformation. In Proc. of the 18th Brazilian Symposium on Databases, pages 304–318, 2003.
[50] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology, EUROCRYPT 1999, pages 223–238. Springer-Verlag, 1999.
[51] Hyoungmin Park and Kyuseok Shim. Approximate algorithms for k-anonymity. In SIGMOD ’07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 67–78, 2007.
[52] The European Parliament. EU Directive 95/46/EC of the European Parliament and of the Council on the protection of individuals with regard to the processing of personal data and on the free movement of such data. Official J. European Communities, 40:31, 1995.
[53] Huseyin Polat and Wenliang Du. Privacy-preserving collaborative filtering using randomized perturbation techniques. In ICDM ’03: Proceedings of the Third IEEE International Conference on Data Mining, page 625. IEEE Computer Society, 2003.
[54] R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21:120–126, 1978.
[55] Shariq J. Rizvi and Jayant R. Haritsa. Maintaining data privacy in association rule mining. In VLDB ’02: Proceedings of the 28th international conference on Very Large Data Bases, pages 682–693. VLDB Endowment, 2002.
[56] Adi Shamir. How to share a secret. Commun. ACM, 22(11):612–613, 1979.
[57] Madhusudana V. S. Shashanka and Paris Smaragdis. Secure sound classification: Gaussian mixture models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE Computer Society, 2006.
[58] Shyue Liang Wang, Yu Huei Lee, S. Billis, and A. Jafari. Hiding sensitive items in privacy preserving association rule mining. In Systems, Man and Cybernetics, 2004 IEEE International Conference on, pages 3239–3244. IEEE Computer Society, 2005.
[59] Somesh Jha, Luis Kruger, and Patrick McDaniel. Privacy preserving clustering. In Proc. of the 10th European Symposium on Research in Computer Security, pages 397–417, 2005.
[60] J. Stevens. Applied Multivariate Statistics for the Social Sciences. Lawrence Erlbaum Associates, 1986.
[61] Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10:557–570,
2002.
[62] Yiannis Tsiounis and Moti Yung. On the security of ElGamal based encryption. In Proceedings of the First International Workshop on Practice and Theory in Public Key Cryptography, pages 117–134. Springer-Verlag, 1998.
[63] Jaideep Vaidya and Chris Clifton. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2002.
[64] Jaideep Vaidya and Chris Clifton. Privacy preserving association rule mining in vertically partitioned data. In KDD ’02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 639–644. ACM, 2002.
[65] Jaideep Vaidya and Chris Clifton. Leveraging the “multi” in secure multi-party computation. In WPES ’03: Proceedings of the 2003 ACM workshop on Privacy in the electronic society, pages 53–59. ACM, 2003.
[66] Jaideep Vaidya and Chris Clifton. Privacy-preserving k-means clustering over vertically partitioned data. In KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 206–215. ACM, 2003.
[67] Jaideep Vaidya and Chris Clifton. Privacy-preserving outlier detection. In Proceedings of the Fourth IEEE International Conference on Data Mining, ICDM ’04, pages 233–240. IEEE Computer Society, 2004.
[68] Jaideep Vaidya and Chris Clifton. Secure set intersection cardinality with application to association rule mining. J. Comput. Secur., 13(4):593–622, 2005.
[69] Jaideep Vaidya, Chris Clifton, Murat Kantarcioglu, and A. Scott Patterson. Privacy-preserving decision trees over vertically partitioned data. ACM Trans. Knowl. Discov. Data, 2(3):1–27, 2008.
[70] Jaideep Vaidya, Murat Kantarcioglu, and Chris Clifton. Privacy-preserving naive Bayes classification. The VLDB Journal, 17(4):879–898, 2008.
[71] Jaideep Shrikant Vaidya. Privacy preserving data mining over vertically distributed partitioned data. PhD thesis, Purdue University, 2004.
[72] Vassilios S. Verykios, Elisa Bertino, Igor Nai Fovino, Loredana Parasiliti Provenza, Yucel Saygin, and Yannis Theodoridis. State-of-the-art in privacy preserving data mining. SIGMOD Rec., 33(1):50–57, 2004.
[73] Uwe Wolfradt, Susanne Hempel, and Jeremy N. V. Miles. Perceived parenting styles, depersonalisation, anxiety and coping behaviour in adolescents. Journal of Adolescence, 34(3):521–532, 2003.
[74] Fan Wu, Jiqiang Liu, and Sheng Zhong. An efficient protocol for private and accurate mining of support counts. Pattern Recogn. Lett., 30(1):80–86, 2009.
[75] Yi-Hung Wu, Chia-Ming Chiang, and Arbee L. P. Chen. Hiding sensitive association rules with limited side effects. IEEE Trans. on Knowl. and Data Eng., 19(1):29–42, 2007.
[76] Anrong Xue, Xiqiang Duan, Handa Ma, Weihe Chen, and Shiguang Ju. Privacy preserving spatial outlier detection. In Proceedings of the 2008 The 9th International Conference for Young Computer Scientists, pages 714–719. IEEE Computer Society, 2008.
[77] Zhiqiang Yang, Sheng Zhong, and Rebecca N. Wright. Privacy-preserving classification of customer data without loss of accuracy. In SIAM SDM, pages 21–23, 2005.
[78] Andrew Chi-Chih Yao. How to generate and exchange secrets. In SFCS ’86: Proceedings of the 27th Annual Symposium on Foundations of Computer Science, pages 162–167. IEEE Computer Society, 1986.
[79] Xun Yi and Yanchun Zhang. Privacy-preserving distributed association rule mining via semi-trusted mixer. Data Knowl. Eng., 63(2):550–567, 2007.
[80] Sheng Zhong. Privacy-preserving algorithms for distributed mining of frequent itemsets. Information Sciences, 177(2):490–503, 2007.
[81] Sheng Zhong, Zhiqiang Yang, and Tingting Chen. k-anonymous data collection. Inf. Sci., 179(17):2948–2963, 2009.
[82] Sheng Zhong, Zhiqiang Yang, and Rebecca N. Wright. Privacy-enhancing k-anonymization of customer data. In PODS, pages 139–147. ACM Press, 2005.
[83] Sheng Zhong, Zhiqiang Yang, and Rebecca N. Wright. Privacy-enhancing k-anonymization of customer data. In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 139–147. ACM, 2005.

... Furthermore, we can see that the 2PFD setting is quite popular in practice, and that privacy preserving frequency mining protocols in 2PFD are significant and can be applied to many other similar ...

... executing the protocol, we generate three pairs of keys for each user, with the size of p and q set at 1024 bits and 160 bits, and compute values X and Y. Note that generating these keys and parameters ...
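
Illustrative sketch. The Summary mentions an improvement that uses Shamir's secret sharing scheme [56] so that the miner can obtain a frequency without every user pair taking part. The thesis protocol itself is not reproduced in this excerpt, so the following is only a minimal sketch of generic (t, n) threshold sharing, not the author's method. The field modulus PRIME and the function names make_shares and recover_secret are assumptions chosen for illustration.

```python
import secrets

# Assumption: a demo field modulus (the Mersenne prime 2^127 - 1).
# The excerpt does not say which field the thesis uses.
PRIME = 2**127 - 1


def make_shares(secret, threshold, num_shares, prime=PRIME):
    """Split `secret` into `num_shares` shares; any `threshold` of them recover it."""
    if not 0 <= secret < prime:
        raise ValueError("secret must lie in [0, prime)")
    if not 1 <= threshold <= num_shares < prime:
        raise ValueError("need 1 <= threshold <= num_shares < prime")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 over the shares supplied."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % prime
                den = (den * (xi - xj)) % prime
        # Modular inverse via Fermat's little theorem (prime modulus).
        inv_den = pow(den, prime - 2, prime)
        secret = (secret + yi * num * inv_den) % prime
    return secret
```

In this sketch, any `threshold` distinct shares reconstruct the secret, which is the property that lets a computation tolerate absent participants. The scheme is also additive: adding two parties' shares point by point gives shares of the sum of their secrets, which is one reason it suits counting tasks such as frequency computation.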