Privacy Preservation in Data Mining for Location-Based Data (LBS)

THIS WORK WAS COMPLETED AT HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY (TRƯỜNG ĐẠI HỌC BÁCH KHOA), VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

Scientific supervisor: Dr. Đặng Trần Khánh
Reviewer 1: Dr. Nguyễn Đức Cường
Reviewer 2: Dr. Nguyễn Thanh Bình

The master's thesis was defended before the Master's Thesis Defense Committee of Ho Chi Minh City University of Technology on the 19th of ___ 2010.

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING

SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

Ho Chi Minh City, 21 January 2010

MASTER'S THESIS ASSIGNMENT

Student name: TRƯƠNG TUẤN ANH
Gender: Male
Date of birth: 29-09-1985
Place of birth: Quảng Trị
Major: Computer Science
Student ID (MSHV): 00708185

1- THESIS TITLE: PRIVACY PRESERVATION IN DATA MINING FOR LOCATION-BASED DATA (LBS)

2- THESIS TASKS:
- Study the theory of privacy preservation, location-based services, and data mining.
- Analyze the strengths and weaknesses of the solutions that have been proposed and select a suitable one.
- Propose a solution to preserve privacy for location-based data.

3- ASSIGNMENT DATE: 21/01/2010

4- COMPLETION DATE: 02/07/2010

5- SUPERVISOR: Dr. ĐẶNG TRẦN KHÁNH

The content and outline of the master's thesis have been approved by the Specialized Committee.

Signatures (full name and signature): Supervisor; Head of the department managing the major; Faculty managing the major.

DECLARATION

I declare that, except for results taken from other works and clearly cited in the thesis, the work presented in this thesis was carried out by me, and that no part of it has been submitted to obtain a degree at this or any other institution.

Trương Tuấn Anh

ACKNOWLEDGMENTS

My two years of study at Ho Chi Minh City University of Technology have passed, and this thesis is the final result, summing up the knowledge and effort of the student. For this result, I would like to express my deep gratitude to all the teachers of Ho Chi Minh City University of Technology, and especially to those of the Faculty of Computer Science and Engineering. The experience and valuable knowledge they passed on helped me solve many problems on the way to the final result.

I would like to send my thanks and deep gratitude to Dr. Đặng Trần Khánh, who guided and helped me throughout the work on this thesis. His comments and suggestions for solving problems contributed greatly to its success.

I would like to thank my colleagues and friends, who helped me and offered valuable opinions while I completed the thesis.

I thank my parents and family, who always shared with me, encouraged me, and lifted my spirits, helping me overcome difficulties during my studies and while I worked on the thesis.

Thank you all.

Ho Chi Minh City, 2010
Trương Tuấn Anh

SUMMARY

Today, with the strong growth of information technology and wireless networks, mobile information applications have developed rapidly and laid the groundwork for many research directions. Among them, research on location-based services has become a main direction in the field. As location-based services develop, location information is collected by individuals and organizations, and useful information can be extracted from it through data mining. The problem is that mining location data can disclose "sensitive" information about users. Users therefore do not want their location information revealed to others, which is reasonable because the information can be used for malicious purposes that the users do not want. Data mining applications, however, want the information to be accurate so that the results are truly effective. These conflicting requirements pose a major challenge for researchers in the field. An effective algorithm or framework is needed that ensures user privacy while still giving data mining applications the information they need to produce effective results. This thesis focuses on protecting the privacy of users when they use the services. It proposes frameworks and algorithms that protect user privacy while balancing privacy protection against the effectiveness of data mining.

ABSTRACT

With the rapid development of information technology and wireless networks, mobile services have been
developed quickly and opened many research directions. Among them, research on location-based services is one of the main directions. As location-based services develop, organizations and individuals can collect the location information of users and, through data mining, infer valuable information from it. However, this process can expose the "sensitive" information of users, so users do not want to publish their location information. In contrast, data mining needs accurate input data so that its output can be trusted. This conflict calls for a framework or algorithm that protects the user's privacy and, at the same time, provides the data mining process with the information it needs. This thesis focuses on protecting the privacy of users when they use location services. It also proposes frameworks and algorithms that trade off privacy protection against the effectiveness of the data mining process.

TABLE OF CONTENTS

Chapter I. Introduction
  I. Overview
  II. Research subjects
  III. Urgency of the topic
  IV. Position of the topic
  V. Related work
  VI. Feasibility of the topic
  VII. Significance of the topic

Chapter II. Overview of privacy preservation in data mining
  I. Privacy preservation for data before mining
    - Randomization method
    - K-Anonymity method
    - L-Diversity method
    - T-Closeness
    - Query Auditing
  II. Privacy preservation during data mining
  III. Modifying data mining results to preserve privacy
    - Association Rule Hiding
    - Reducing classifier effectiveness
    - Inference Control and Query Auditing
  IV. Comparison of the privacy preservation approaches
  V. Privacy preservation in mining distributed data

Chapter III. Privacy preservation in location-based services
  I. Overview
  II. Privacy preservation methods in LBS
    - Non-Cooperative Architecture
    - Centralized Trusted Party Architecture
    - Peer to Peer Cooperative Architecture

Chapter IV. Privacy preservation in data mining for location-based services
  I. Weaknesses of applying k-anonymity to privacy preservation in location-based data mining
  II. Spatio-Temporal Anonymization
  III. Grid-Based Anonymization
    - Concepts and characteristics
    - System architecture
    - Algorithms supporting data mining
    - Evaluation of the weaknesses of the grid-based solution

Chapter V. Privacy preservation in data mining for location-based services using the adaptive grid approach
  I. Problem to be solved
  II. Adaptive grid solution
    - Definitions
    - Architecture
    - Overlapping problem with the adaptive grid approach
    - Algorithm
    - Quality evaluation
  III. Evaluation of the adaptive grid solution
    - Method
    - Dataset
    - Results

Chapter VI. Ensuring k-anonymity for location data tables
  I. Introduction
  II. Protection methods
  III. Grid-based approach to ensuring k-anonymity for location data
    - Definitions
    - Algorithm
  IV. K-anonymity for spatio-temporal data
    - Discussion
    - Algorithm
    - Evaluation

Chapter VII. An approach to ensuring k-anonymity for location data tables that takes association rules into account
  I. Introduction
  II. Concepts
  III. Computing the values
  IV. Algorithm
  V. Evaluation of the method for ensuring k-anonymity for location data that takes association rules into account
    - Method
    - Data
    - Results

Chapter VIII. Conclusion

Chapter IX. References

Proceedings of MoMM2009, MoMM 2009 Full Papers

Figure 12. True and false overlapped area.

Since we want to measure the true overlapped area when the user calls services many times, we place the user in a particular area and then simulate several service calls from the user. To create the area where the user stays, we create a path (from home to work), assuming that the user usually
travels this way. We use a path-shaped area because it makes the overlapping problems easy to illustrate. To create the dataset of paths, we use Thomas Brinkhoff's framework for generating network-based moving objects [9]. The network used in the experiment is Oldenburg, which covers about 102.96 km² [23].

First, we define the grid corresponding to the city of Oldenburg. The area of each cell reflects the minimum security level: the larger the cell, the more security is provided, but the lower the quality of the service. We examine the algorithm with grid sizes of 40x40, 50x50, and 60x60 cells. In each case, we use 20 objects, each with its path over 20 time units. We run this dataset 15 times with the memorizing algorithm and 15 times with the random algorithm. For each run, we calculate the average true overlapped area (in cells) of the 20 objects.

Figures 13 and 14. The average of overlapped cells with a grid size of 40x40, each at a given privacy level.

We also apply the algorithm to the grid with different privacy levels, including 2 and 3, to diversify the results. The charts show the relationship between the average overlapped area of the 20 objects and the number of service calls. In each chart, the blue line with diamond markers represents the average values of the memorizing algorithm, while the red line with rectangle markers represents the random algorithm.

Because we examine each object over 20 time units, corresponding to 20 positions, there are at most 20 overlapped cells per object. Moreover, for a predefined grid (grid size and number of cells), a specific privacy level, and a particular dataset, we can calculate the maximum number of overlapped cells per object under the random algorithm. Therefore, in each experiment, the averages of the true overlapped area produced by the random algorithm are bounded by this maximum value. Moreover, this maximum value must be
less than 20 for the reason above (each object is tested with 20 positions in our experiment).

With a grid size of 40x40 cells, the maximum of the average true overlapped area produced by the random algorithm is 11.2 cells. The average increases over the first few service calls and then reaches the maximum value. By contrast, the average true overlapped area produced by the memorizing algorithm stays roughly constant at a low value (see Figures 13, 14, and 15).

Figure 15. The average of overlapped cells with a grid size of 40x40 at a given privacy level.

Similarly, with a grid size of 50x50 cells, the maximum of the average true overlapped area produced by the random algorithm is 13.1 cells. The average again increases over the first few service calls before reaching the maximum value of 13.1 cells, while the average produced by the memorizing algorithm stays roughly constant at a low value (see Figures 16, 17, and 18). In the experiment with a grid size of 60x60 cells, the results are similar to the previous ones. The maximum of the average true overlapped area produced by the random algorithm is 14.6 cells (see Figures 19, 20, and 21).

Figures 16 and 17. The average of overlapped cells with a grid size of 50x50, each at a given privacy level.

Figures 20 and 21. The average of overlapped cells with a grid size of 60x60, each at a given privacy level.

The results show that the average true overlapped area produced by our algorithm remains constant, while the average produced by the random algorithm increases over the first few calls and then stays constant. Moreover, the average true overlapped area produced by our algorithm is about 2.98 times smaller than that produced by the random algorithm. These results confirm that our approach is reasonable in
reducing the number of overlapped areas. In general, these experimental values can change slightly depending on the dataset, but the overall conclusion remains valid.

Figure 18. The average of overlapped cells with a grid size of 50x50 at a given privacy level.

Figure 19. The average of overlapped cells with a grid size of 60x60 at a given privacy level.

CONCLUSION AND FUTURE WORKS

In this paper, we proposed a grid-based solution and a memorizing algorithm that the trusted middleware can use to anonymize the location of the user. This solution solves the overlapping problem that occurs when the user uses the services many times at a specific location. The paper also proposes a solution for the case where the user wants to change his desired privacy level. However, when the user moves along a trajectory, the area that contains the location of the user may be narrowed down, although this area is never smaller than a grid cell, as discussed above. To the best of our knowledge, our approach is new, and the memorizing algorithm over the grid-like partitioned space is among the first that try to minimize the chance that the user's location can be discovered.

In the future, we will investigate an adaptive grid that can resize its grid cells. Such a grid allows the trusted middleware to create a grid according to the user's purpose in order to preserve privacy. Another piece of future work is the design of a data structure that can store the rectangles. This structure can help the trusted server find a rectangle quickly, and it also helps the middleware process the user's requests quickly.

REFERENCES

[1] Andreas, G. 2006. Coordinate Transformation - A Solution for the Privacy Problem of Location Based Service. IEEE, 2006.
[2] Ardagna, C.A., Cremonini, M., Vimercati, S.D.C., Samarati, P. 2008. Privacy-enhanced Location-based Access Control. In: Michael, G., Sushil, J. (Eds.) Handbook of Database Security: Applications and Trends. Springer, pp. 531-552.
[3] Atluri, V., Shin, H. 2008. Efficiently Enforcing the Security and Privacy Policies in a Mobile Environment. In: Michael, G., Sushil, J. (Eds.) Handbook of Database Security: Applications and Trends. Springer, pp. 553-573.
[4] Bellavista, P., Corradi, A., Giannelli, C. 2005. Efficiently Managing Location Information with Privacy Requirements in Wi-Fi Networks, a Middleware Approach. Wireless Communication Systems, 2nd International Symposium on, pp. 91-95.
[5] Beresford, A.R., Stajano, F. 2003. Location privacy in pervasive computing. IEEE Pervasive Computing (2003), pp. 46-55.
[6] Beresford, A.R., Stajano, F. 2004. Mix zones: User privacy in location-aware services. In: Proc. of the 2nd IEEE Annual Conference on Pervasive Computing and Communications Workshops.
[7] Bettini, C., Wang, X., Jajodia, S. 2005. Protecting privacy against location-based personal identification. In: Proc. of the 2nd VLDB Workshop on Secure Data Management, LNCS 3674. Springer-Verlag.
[8] Bettini, C., Mascetti, S., Wang, X.S. 2008. Privacy Protection through Anonymity in Location-based Services. In: Michael, G., Sushil, J. (Eds.) Handbook of Database Security: Applications and Trends. Springer, pp. 509-530.
[9] Brinkhoff, T. 2002. A framework for generating network-based moving objects. GeoInformatica, pp. 153-180.
[10] Bugra, G., Ling, L. 2005. A Customizable k-Anonymity Model for Protecting Location Privacy. In: Proc. of IEEE International Conference on Distributed Computing Systems, pp. 620-629.
[11] Bugra, G., Ling, L. 2008. Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms. IEEE Transactions on Mobile Computing, vol. 7, no. 1.
[12] Cuellar, J.R. 2002. Location Information Privacy. In: B. Srikaya (Ed.) Geographic Location in the Internet. Kluwer Academic Publishers, pp. 179-208.
[13] Dieter, S., Marco, C.M., Siani, P. 2008. PRIME Architecture Version (July 2008). URL: http://www.prime-project.eu
[14] Gidófalvi, G., Huang, X., Pedersen, T.B. 2007. Privacy-Preserving Data Mining on Moving Object Trajectories. In: Proc. of the 8th International Conference on Mobile Data Management. Germany.
[15] Gruteser, M., Grunwald, D. 2003. Anonymous usage of location-based services through spatial and temporal cloaking. In: Proc. of the 1st International Conference on Mobile Systems, Applications, and Services.
[16] Kupper, A. 2005. Location-based Services: Fundamentals and Operation. John Wiley & Sons Ltd., England.
[17] Langheinrich, M. 2002. A Privacy Awareness System for Ubiquitous Computing Environments. In: Proc. of the 4th International Conference on Ubiquitous Computing. Springer-Verlag, pp. 237-245.
[18] Marco, G., Xuan, L. 2004. Protecting Privacy in Continuous Location-Tracking Applications. IEEE Computer Society.
[19] Mohamed, F.M. 2007. Privacy in Location-based Services: State-of-the-art and Research Directions. IEEE International Conference on Mobile Data Management.
[20] Myles, G., Friday, A., Davies, N. 2003. Preserving Privacy in Environments with Location-Based Applications. IEEE Pervasive Computing, Vol. 2, pp. 56-64.
[21] Panos, K., Gabriel, G., Kyriakos, M., Dimitris, P. 2007. Preventing Location-Based Identity Inference in Anonymous Spatial Queries. IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 12.
[22] Schiller, J., Voisard, A. 2004. Location-Based Services. Morgan Kaufmann.
[23] Wikipedia: http://en.wikipedia.org/wiki/Oldenburg (July 20, 2009)

An Adaptive Grid-Based Approach to Location Privacy Preservation

Anh Tuan Truong, Quynh Chi Truong, and Tran Khanh Dang
Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Vietnam
{anhtt,tqchi,khanh}@cse.hcmut.edu.vn

Abstract. Location privacy protection is a key factor in the development of location-based services. Location privacy relates to the protection of a user's identity, position, and path. In a grid-based approach, the user's position is obfuscated in a number of cells. However, this grid does not
allow users to adjust the cell size, which relates to a minimum privacy level. Therefore, it is hard to fit the various privacy requirements of different users. This paper proposes a flexible-grid-based approach, together with an algorithm, to protect the user's location privacy. With this approach, the user can conveniently customize his grid according to his privacy requirement. The overlap-area problem is also taken into account in the algorithm. By investigating our solution in depth, we also discuss open research issues to make the solution feasible in practice.

Keywords: Location-based Services, Location Privacy, Privacy Preserving, Adaptive Grid, Privacy Attack Models.

N.T. Nguyen et al. (Eds.): Adv. in Intelligent Inform. and Database Systems, SCI 283, pp. 133-144. springerlink.com © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

The rapid development of location-based services (LBS) brings both opportunities and challenges for users and service providers. LBS are services that make use of the location information of users [2, 8]. The opportunities are that users benefit from the services while service providers can earn more profit. However, by using the services, users face privacy problems because their private data is attractive to attackers. Location privacy can be defined as the right of individuals, groups, and institutions to determine for themselves how, when, to whom, and for which purposes their location information is used [3, 2, 5]. Service providers, for their part, bear more responsibility for protecting the user's private information, especially location privacy. Therefore, location privacy protection is an emerging topic that interests many researchers [7].

In [1], we proposed an approach that preserves location privacy by cloaking the user's location in a grid-based map. However, we only designed the solution with a fixed grid, meaning that the size of each cell in the grid is predefined. This is not convenient for users, as each user has a different privacy requirement. Thus, in this paper we improve the previous solution by proposing an algorithm that works on a flexible grid, that is, a grid that allows users to change the cell size.

The rest of this paper is organized as follows. In Section 2, we briefly summarize the related work and the problems of the fixed grid. Next, Section 3 presents our improved approach for preserving privacy in LBS with an adaptive grid. We raise various open issues with our solution in Section 4. Finally, Section 5 presents concluding remarks as well as our future work.

2 Related Works

2.1 Anonymity-Based Technique

Location privacy is classified into three categories: identity privacy, position privacy, and path privacy [2]. Identity privacy protects users' identities from disclosure to attackers, position privacy hides the true position of users from attackers, and path privacy protects the information related to users' motions.

For identity privacy, there are solutions such as the anonymity-based technique and the grouping technique. In the anonymity-based technique [5, 6], the user uses a false identity to remain anonymous when he calls the services. In the grouping technique, users gather in a group and one of them acts as a deputy who sends the request to the service provider. In this way, it is hard to identify who really issued the request.

For the second category, position privacy, the main approach is to obfuscate the user's true position. In other words, the true position of the user is blurred to decrease its accuracy. There are numerous techniques to obfuscate a position, such as enlarging, shifting, and reducing. For more details, refer to [2].

The last category, path privacy, can be violated if an attacker finds the user's path by monitoring the user's requests over a period. However, path privacy can be protected by applying both identity and position privacy preserving techniques, which make it hard for the attacker to infer the path by linking the requests.
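As a rough illustration of the three obfuscation operations named in Section 2.1, the sketch below models a reported position as a circular area and applies enlarging, shifting, and reducing to it. The circular model, the `Area` class, and the function names are assumptions made here for illustration only; the precise definitions of these techniques are given in [2].

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Hypothetical circular location area: centre (x, y) and radius r."""
    x: float
    y: float
    r: float


def enlarge_radius(area: Area, factor: float) -> Area:
    """Enlarging: blur the position by growing the radius (factor > 1)."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    return Area(area.x, area.y, area.r * factor)


def reduce_radius(area: Area, factor: float) -> Area:
    """Reducing: report a smaller circle (0 < factor < 1)."""
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1")
    return Area(area.x, area.y, area.r * factor)


def shift_center(area: Area, distance: float, angle_rad: float) -> Area:
    """Shifting: move the centre by `distance` in direction `angle_rad`."""
    return Area(
        area.x + distance * math.cos(angle_rad),
        area.y + distance * math.sin(angle_rad),
        area.r,
    )


def contains(area: Area, px: float, py: float) -> bool:
    """True if point (px, py) lies inside or on the border of the area."""
    return math.hypot(px - area.x, py - area.y) <= area.r
```

In this simplified model, enlarging keeps the true position inside the reported area, whereas shifting or reducing may leave it outside, which is the accuracy cost that obfuscation trades for privacy.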
2.2 Fixed Grid-Based Solution and Problems

The user may want to use the service many times. Each time he uses the service, he sends his true location to the service server. This location should be hidden in a region to protect the user's privacy. Attackers may wait and capture this region. Clearly, the more regions attackers capture, the more easily they can find the user's location. In Figure 1, the user uses the service three times and regions R1, R2, and R3 are created. Attackers can capture the three regions and narrow the area that contains the user's location down to the colored area.

Fig. 1. Randomization approach problem

In [1], a fixed grid-based approach for the trusted party architecture was introduced to solve the above problem. With this approach, the user's location is hidden in an area made up of cells of a grid, called the anonymization rectangle. With this grid, it is simple to satisfy the privacy level required by the user. When the user requires a higher level, the anonymization rectangle is extended, and when the user requires a lower level, the rectangle becomes smaller.

However, with this approach the middleware uses the same grid to anonymize every user's location, while each user has a different required level of privacy. If the cell size is too small, it is not enough to protect the location of the user. Conversely, the service quality is insufficient if the cell is too big. It is therefore difficult to decide how big a cell should be. To solve this problem, we can combine cells to form a bigger cell or split a cell to form new, smaller cells. For example, if the user wants a small cell, the middleware splits the default cell into several new cells. Otherwise, if the user wants a big cell, the middleware combines several default cells into a new cell. In this paper, we propose another solution: a grid whose cells can be resized. Each time the user
wants to use the service, the grid is redesigned to meet the user's requirement. In the next section, we discuss this solution in detail; we call it the adaptive grid-based solution.

3 Adaptive Grid-Based Solution

In the previous section, we showed the problems of the fixed grid-based solution. In this section, we introduce an adaptive grid-based approach, in which the grid cell size can change. The size of the grid cell is set according to the user's requirement. The trusted middleware chooses an anonymization area made up of these grid cells, and the location of the user lies in one of them. First, we consider the definition of this grid.

3.1 Definitions

As defined in [1], a grid G divides a map into cells. A cell is not necessarily square; it can be a rectangle, but together the cells cover the whole space. Unlike in [1], the grid cell size in this solution is variable. To divide a map into cells, a starting point S is needed. As in [1], an anonymization area is a square that contains some cells, and the location of the user (point U) lies in this area. Figure 2 shows a grid and an anonymization area, where h and w are the height and the width of the grid cell. In this paper, h and w are equal for simplicity.

Fig. 2. Grid (a) and Anonymization area (b)

From the starting point, the middleware creates the grid with the given height and width of the grid cell. For example, with height and width w1, the grid is as in Figure 3a, and with height and width w2, the grid is as in Figure 3b.

Fig. 3. Two grids with the starting point S

3.2 Architecture

As in [1], a middleware architecture with a trusted middleware is used to implement this solution. Figure 4 shows this architecture.

Fig. 4. Trusted Middleware Architecture

In this solution, the grid is kept in the middleware. When the user wants to use services, he sends his requirement information to the middleware. The requirement
information includes the user's location, the required level of privacy, and the required cell size. According to the required cell size, the middleware creates a grid for this user. Then, the middleware chooses cells from this grid to form an anonymization area according to the user's required privacy. The middleware sends this anonymization area to the service server and receives the results. Finally, the middleware filters out the relevant results and returns them to the user.

In Figure 5a, the user sends his requirement information, which includes his true location U, a required level of privacy of 9 cells, and a required cell size of w1. The middleware creates the grid as in Figure 5a, chooses cells to form a 3*3-anonymization area, and sends this area to the service server. The anonymization area is coloured in Figure 5a. At another time, the user uses the service again, but the required cell size is w2. The grid in Figure 5b is created, and the anonymization area is coloured in Figure 5b. We see that the square in Figure 5a is not equal to the square in Figure 5b, although the required level of privacy is the same.

Fig. 5. Two anonymization areas with different requirement information

Additionally, the middleware should have a default grid, so that the user does not need to send the required cell size if he wants to use the default grid. This case is the one described in [1]; in other words, the case in [1] is a particular case of the adaptive grid-based solution.

3.3 Overlapping Problems with Adaptive Grid-Based Solution

Because the fixed grid is a special case of the adaptive grid, the adaptive grid also has the problems of the fixed grid-based solution. These problems were described in [1] and have been solved by the memorizing algorithm. Moreover, with the adaptive grid-based solution, the grid can be redesigned each time the user uses services, so the anonymization area can be narrowed down. See the example in
Figure 6: at time t1, the user uses the service and defines the grid cell with size w1. He also specifies a required privacy level as a number of cells. The anonymization area is R1. Later, the user uses the service again but redefines the grid cell with size w2. The required privacy level is unchanged. The middleware creates the anonymization area R2.

Fig. 6. Overlapping problem

We see that the overlap narrows the anonymization area down to R3, so attackers can easily determine the area that contains the location of the user. As in [1], the smallest area to which the location can be narrowed down in the default grid is a grid cell. In the adaptive grid, however, we cannot determine this smallest area, because the grid is redesigned whenever the user uses services. Clearly, the more often the user uses the service, the smaller the overlapping area can become. In the next section, we introduce an algorithm to solve these problems. Like the memorizing algorithm, this algorithm requires the middleware to have a database in which it saves the anonymization area.

3.4 Algorithm

We first review the problems from the previous section. The first time the user uses the service, the middleware chooses the anonymization area. From the second time on, the grid cell size can be changed, so the anonymization area may change too. Because the grid cell can be resized, the anonymization areas may not overlap totally. This partial overlap is the cause of the problems: the smaller the overlap, the more easily attackers can find the location of the user. To solve these problems, the middleware combines information from the user's previous uses of the service to create the anonymization area for the current time. The mechanism of this algorithm is as follows:

- The user sends his requirement information to the middleware when he uses the service. As discussed before, the requirement information
includes his true location, the required grid cell size, and the required level of privacy.
- The middleware receives the user's requirement information. From the starting point, it creates the grid according to the required grid cell size. Then, it queries the database to check whether this is the first time the user uses this service:
  • If this is the first time, the middleware chooses the anonymization area according to the location of the user and the required level of privacy, and saves it to the database for future reference.
  • If not, the middleware finds all information from previous uses and combines it with the current information to choose the appropriate anonymization area. Then, it saves the current information to the database.
- The middleware sends the anonymization area that has just been created to the service server.
- The middleware receives the returned results, chooses the acceptable results, and returns them to the user.

Clearly, the anonymization area for the current time should totally overlap with the previous areas. A total overlap helps defend against attackers who mine the information to find the user's location. In reality, if an anonymization area created the second time or later does not totally overlap the first anonymization area, attackers can narrow down the area that contains the true location of the user. We discussed this problem in Section 3.3. In the case shown in Figure 7a, the anonymization area R1 overlaps only partially with the anonymization area R2, so attackers can narrow the area that contains the user's location down to R3. In this case, the total overlap shown in Figure 7b is better.

Fig. 7. Partial overlap area (a) and Total overlap area (b)

However, a total overlap cannot always be achieved. As we discussed before, the grid cell size may change, so the bigger area may not cover all the space of the smaller one. For more details, we will
consider the example in Figure 8a: the first time, a required privacy level is given and anonymization area R1 is created; the second time, the required privacy level is 16 cells, grid G2 is created, and anonymization area R2 is chosen.

Fig. 8. Example for overlap area (a) and Maximal overlap area (b)

We see that we cannot choose the anonymization area R2 so that it overlaps R1 totally. In this case, the algorithm should choose the anonymization area R2 so that the overlap between R2 and R1 is maximal. The maximal overlap area is R3 in Figure 8b.

As we saw in the mechanism, when the anonymization area is created and sent to the service server, the middleware also saves this anonymization area to its database for future reference. But what information needs to be saved? To choose the anonymization area, the middleware considers all previous anonymization areas and chooses the area for the current time so that the overlap is maximal. This suggests that the middleware should save all previous anonymization areas. Intuitively, however, only the last overlap area is needed. So, the middleware chooses a new anonymization area such that the overlap between this anonymization area and the last overlap area is maximal.

Fig. 9. Very small overlap area

Clearly, a partial overlap narrows down the space that contains the user's location. In the examples above, the maximal overlap is acceptable because the space that contains the user's location is big enough. However, not every maximal overlap area is acceptable. Consider the example in Figure 9: anonymization area R1 is created the first time and R2 is created the second time the user uses the service. In this case, the maximal overlap area, the intersection of R1 and R2, is R3. The user wants the space that contains his true location to be R2 the second time. However, attackers can find out that the true location of the
user is in R3. Clearly, R3 is very small compared with R2.

Intuitively, to solve this problem we can define a minimal anonymization area. When the maximal overlap area at the current time is smaller than the minimal anonymization area, the middleware chooses the previous maximal overlap area and sends it to the service server. However, it is difficult to decide the size of this area, because we cannot know how big a minimal anonymization area must be to be sufficient.

We also propose another approach to this problem: a roving starting point. The idea is as follows:

- When the user uses the service for the first time, the middleware saves the vertices of the anonymization area to its database. In Figure 10, they are vertices A, B, C and D.
- From the second time on, the middleware chooses one of the four vertices as the starting point. It creates a new grid from the new starting point and returns the anonymization area.

Fig. 10. Roving starting point

An Adaptive Grid-Based Approach to Location Privacy Preservation

As shown in Figure 10a, vertex A is chosen as the new starting point. The new grid is created and the new anonymization area (R2) totally overlaps the previous anonymization area (R1). In Figure 10b, vertex C is chosen as the starting point and the anonymization area is R2; it also overlaps R1 totally. The details of this approach are left as future work.

In short, the algorithm can be described in pseudocode as follows:

Create the grid according to the user's requirement information;
if (this is the first time the user uses the service) {
    Get a random anonymization area which contains the true location of the user;
    Save this anonymization area;
    Send this area to the service server;
} else {
    Query the last maximal overlap area of the user;
    Perform the overlap_area_getting function;
    Save the anonymization area found by the overlap_area_getting function;
    Save the maximal overlap area;
    Send this area to the service server;
}

In this algorithm, the overlap_area_getting() function is the key step. Its goal is to find the new anonymization area whose overlap with the last maximal overlap area is maximal. Its mechanism is as follows:

- Query the last maximal overlap area from the middleware's database.
- Based on the grid that has just been created and the user's required privacy level, enumerate all anonymization areas that satisfy the required privacy level. These areas must contain the true location of the user.
- Choose the anonymization area whose overlap with the last maximal overlap area is the largest.
- Return the anonymization area that has just been found and the new maximal overlap area.

To limit the number of candidate anonymization areas, note that every candidate must contain the user's location. So we start at the cell that contains the user's location and move outward from it in four directions, as in Figure 11. In each direction, we choose the cells that are "the most suitable".

Fig. 11. Roving starting point

Consider the example in Figure 11a, where we want a 2*2 anonymization area. Along the width, the two cells adjacent to the starting cell are considered, and we choose the one whose overlap with the last maximal overlap area is larger. The height is handled in the same way as the width. The anonymization area made up of cells 1, 2, 3, […] is the best 2*2 anonymization area.

Another example is in Figure 11b; in this case, we want a 3*3 anonymization area. At step 1, as in Figure 11a, two candidate cells are considered and one of them is chosen. At the next step, two further cells are considered and one of them is chosen. The process for the width stops because three cells have been chosen. At the next step, the
process for the height starts, and it is similar to the width process.

Finally, an efficient data structure is important. Over time, the number of anonymization areas stored in the database increases, so an efficient data structure for saving these areas is needed.

3.5 Measures of Quality

The main requirements for location cloaking are Accuracy, Quality, Efficiency and Flexibility, as shown in [14]:

- Accuracy: the system must satisfy the user's requirement as accurately as possible.
- Quality: the attacker cannot find out the true location of the user.
- Efficiency: the computation for location cloaking should be simple.
- Flexibility: the user can change his privacy requirement at any time.

However, these criteria must be traded off against each other. For example, requiring the best quality increases the complexity of the computation.

In our approach, the user can specify the level of privacy that protects his location. The middleware chooses the anonymization area that hides the user's true location according to that privacy level. Furthermore, the user can define the smallest area (the cell size) or use the default cell. He can also change his required level of privacy each time he uses the service.

Indeed, the approach can easily satisfy the user's privacy requirement. When the user wants a high level of privacy, the middleware expands the anonymization area that contains the user's true location. Conversely, the anonymization area is smaller if a lower level of privacy is required.

Because the true location of the user is embedded in an area, it is difficult to find the true location. When the anonymization area is big enough, the attacker must make more effort to find out the true location of the user.

Moreover, the overlap_area_getting() function is the main function and takes considerable time to finish. This function finds the anonymization area so that
the overlap area between this anonymization area and the last overlap area is the largest. The function uses two loops, one to find cells in the vertical direction and another for the horizontal direction, so the complexity of this function is O(n).

Besides, the complexity of the algorithm also depends on database access. So, as discussed before, a suitable data structure for saving anonymization areas is needed to decrease the complexity of the algorithm.

4 Open Research Issues

In the previous sections, we introduced a new approach that applies an adaptive grid to the middleware architecture to preserve the privacy of the user. This approach also opens up new research issues.

As discussed before, an adaptive grid with a fixed starting point results in some problems. Therefore, designing an adaptive grid with a roving starting point would let the middleware protect the user's privacy more adequately.

In some cases, the chosen anonymization area will not be "big" enough to hide the location of the user. For example, suppose the anonymization area consists of four cells, three of which are regions where the user cannot be, such as a lake or a swamp. Attackers can then narrow the area that contains the user's location down to the one remaining cell. A new algorithm or method that eliminates anonymization areas containing "dead" regions should be investigated. The probability P of a chosen anonymity area should be:

$$P = \sum_{i=1}^{\text{required\_privacy\_level}} P_i + R \qquad (1)$$

P is the probability that an anonymity area does not contain "dead" regions. P_i is the probability that cell i is not a "dead" region. R is the priority of this anonymity area; R should depend on the overlap area between this anonymization area and the anonymization areas of previous uses. When the middleware wants to choose an
anonymity area, it chooses the anonymity area that has the largest value of P. Combining the grid approach with an algorithm that helps to find P_i and R would increase the efficiency of protecting the user's privacy.

Again, the time to carry out the algorithm also depends on the database structure, so an efficient data structure is needed. It would let the middleware store anonymization areas efficiently and reduce the time needed to retrieve them from the database. Designing such a data structure is another direction to consider.

5 Conclusions

In this paper, we proposed a flexible grid and an algorithm working on this grid to anonymize the location of the user. This solution gives the user the right to adjust the size of a cell, which corresponds to the minimum privacy level, to meet the user's requirement. In the algorithm, we covered all situations that can occur when users resize the cell. Moreover, we also proposed a solution for the overlap-area problems.

This approach can be applied in many fields such as health, work and personal life. In these services, users do not use services directly; they send their requests to a trusted middleware, which is provided by a trusted third-party organization. The middleware is responsible for protecting the user's location according to the user's requirement.

In future, we will investigate all the research directions discussed in the previous section to make our solution more applicable in real life.

References

1. Truong, Q.C., Truong, T.A., Dang, T.K.: Privacy Preserving through A Memorizing Algorithm in Location-Based Services. In: 7th International Conference on Advances in Mobile Computing & Multimedia (2009)
2. Ardagna, C.A., Cremonini, M., Vimercati, S.D.C., Samarati, P.: Privacy-enhanced Location-based Access Control. In: Michael, G., Sushil, J. (eds.) Handbook of Database Security - Applications and Trends, pp. 531-552. Springer, Heidelberg (2008)
3. Beresford, A.R., Stajano, F.: Location privacy in pervasive computing. IEEE Pervasive Computing, 46-55 (2003)
4. Beresford, A.R., Stajano, F.: Mix zones: User privacy in location-aware services. In: 2nd IEEE Annual Conference on Pervasive Computing and Communications Workshops (2004)
5. Bettini, C., Wang, X., Jajodia, S.: Protecting privacy against location-based personal identification. In: 2nd VLDB Workshop on Secure Data Management (2005)
6. Bettini, C., Mascetti, S., Wang, X.S.: Privacy Protection through Anonymity in Location-based Services. In: Michael, G., Sushil, J. (eds.) Handbook of Database Security - Applications and Trends, pp. 509-530. Springer, Heidelberg (2008)
7. Bugra, G., Ling, L.: Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms. IEEE Transactions on Mobile Computing (2008)
8. Cuellar, J.R.: Location Information Privacy. In: Srikaya, B. (ed.) Geographic Location in the Internet, pp. 179-208. Kluwer Academic Publishers, Dordrecht (2002)
9. Gidófalvi, G., Huang, X., Pedersen, T.B.: Privacy-Preserving Data Mining on Moving Object Trajectories. In: 8th International Conference on Mobile Data Management (2007)
10. Gruteser, M., Grunwald, D.: Anonymous usage of location-based services through spatial and temporal cloaking. In: 1st International Conference on Mobile Systems, Applications, and Services (2003)
11. Kupper, A.: Location-based Services - Fundamentals and Operation. John Wiley & Sons, Chichester (2005)
12. Langheinrich, M.: A Privacy Awareness System for Ubiquitous Computing Environments. In: 4th International Conference on Ubiquitous Computing, pp. 237-245 (2002)
13. Marco, G., Xuan, L.: Protecting Privacy in Continuous Location-Tracking Applications. IEEE Computer Society, Los Alamitos (2004)
14. Mohamed, F.M.: Privacy in Location-based Services: State-of-the-art and Research Directions. In: IEEE International Conference on Mobile Data Management (2007)
15. Myles, G., Friday, A., Davies, N.: Preserving Privacy in Environments with Location-Based Applications. IEEE Pervasive Computing, 56-64 (2003)
16. Panos, K., Gabriel, G., Kyriakos, M., Dimitris, P.: Preventing Location-Based Identity Inference in Anonymous Spatial Queries. IEEE Transactions on Knowledge and Data Engineering (2007)
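
The overlap_area_getting() mechanism described in the paper (enumerate the anonymization areas that contain the user's location, then keep the one whose overlap with the last maximal overlap area is largest) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Rect` class, the parameter names, the square cells and the unbounded grid are all assumptions made for illustration, and the sketch enumerates every candidate window rather than using the direction-by-direction cell selection of Figure 11.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with x2 > x1 and y2 > y1 (hypothetical helper)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return (self.x2 - self.x1) * (self.y2 - self.y1)

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        """Return the overlapping rectangle, or None if the two do not overlap."""
        x1, y1 = max(self.x1, other.x1), max(self.y1, other.y1)
        x2, y2 = min(self.x2, other.x2), min(self.y2, other.y2)
        if x1 >= x2 or y1 >= y2:
            return None
        return Rect(x1, y1, x2, y2)


def overlap_area_getting(
    origin: Tuple[float, float],   # grid starting point (assumed name)
    cell: float,                   # side length of one square cell (assumption)
    cols: int,                     # width of the anonymization area in cells
    rows: int,                     # height of the anonymization area in cells
    user: Tuple[float, float],     # true location of the user
    last_overlap: Rect,            # last maximal overlap area from the database
) -> Tuple[Rect, Optional[Rect]]:
    """Return (anonymization area, new maximal overlap area)."""
    ox, oy = origin
    # Index of the grid cell that contains the user's true location.
    cx = int((user[0] - ox) // cell)
    cy = int((user[1] - oy) // cell)

    best_size = -1.0
    best_area: Optional[Rect] = None
    best_overlap: Optional[Rect] = None
    # Try every cols x rows window of cells that still contains the user's cell.
    for i in range(cx - cols + 1, cx + 1):
        for j in range(cy - rows + 1, cy + 1):
            cand = Rect(ox + i * cell, oy + j * cell,
                        ox + (i + cols) * cell, oy + (j + rows) * cell)
            inter = cand.intersect(last_overlap)
            size = inter.area() if inter is not None else 0.0
            if size > best_size:
                best_size, best_area, best_overlap = size, cand, inter
    assert best_area is not None  # holds whenever cols >= 1 and rows >= 1
    return best_area, best_overlap
```

For a 2*2 area there are only four candidate windows, so the exhaustive search is cheap; for larger areas the greedy per-direction selection described in the paper would examine fewer candidates, at the cost of not comparing every window.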

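
Equation (1) from the open research issues can likewise be turned into a small scoring sketch. The paper leaves open how P_i and R are obtained, so the function names, the candidate tuples and every numeric value below are hypothetical and serve only to show how the area with the largest P would be selected.

```python
from typing import List, Sequence, Tuple

# (label, per-cell probabilities P_i of not being "dead", priority R)
Candidate = Tuple[str, Sequence[float], float]


def area_score(cell_probs: Sequence[float], priority: float) -> float:
    """Equation (1): P is the sum of P_i over the cells of the area, plus R."""
    return sum(cell_probs) + priority


def choose_area(candidates: List[Candidate]) -> Candidate:
    """Pick the candidate anonymity area with the largest value of P."""
    return max(candidates, key=lambda c: area_score(c[1], c[2]))


# Made-up example: area "A" overlaps previous areas well (higher R) but has
# two dead cells; area "B" has no overlap bonus but almost no dead cells.
candidates: List[Candidate] = [
    ("A", [1.0, 1.0, 0.0, 0.0], 0.5),
    ("B", [1.0, 1.0, 1.0, 0.5], 0.0),
]
chosen = choose_area(candidates)
```

With these illustrative numbers, area "B" scores higher than area "A", so a large overlap bonus does not automatically outweigh dead regions; how the two terms should be scaled against each other is part of the open question.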