
Privacy Protection for Location-Based Services at the Database Level



VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF TECHNOLOGY

TÔ QUỐC CƯỜNG

PRIVACY PROTECTION FOR LOCATION-BASED SERVICES AT THE DATABASE LEVEL

Major: Computer Science

MASTER'S THESIS

HO CHI MINH CITY, 2011

THIS WORK WAS COMPLETED AT THE UNIVERSITY OF TECHNOLOGY, VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

Scientific supervisor: Assoc. Prof. Dr. Đặng Trần Khánh
Reviewer: Dr. Trần Văn Hoài
Reviewer: Dr. Nguyễn Chánh Thành

The master's thesis was defended at the University of Technology, VNU-HCM, on the 27th of […] 2011.

The thesis examination committee consisted of:
Dr. Thoại Nam
Dr. Nguyễn Chánh Thành
Assoc. Prof. Dr. Đặng Trần Khánh
Dr. Nguyễn Tuấn Đăng

Confirmation by the Chair of the thesis examination committee and the Head of the faculty managing the major, after the thesis has been revised (if applicable).

Chair of the thesis examination committee          Department managing the major

SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

MASTER'S THESIS ASSIGNMENT

Student's full name: Tô Quốc Cường          Gender: Male
Date of birth: 20/01/1986          Place of birth: Cần Thơ
Major: Computer Science          Student ID: 09070958

I. THESIS TITLE: PRIVACY PROTECTION FOR LOCATION-BASED SERVICES AT THE DATABASE LEVEL

II. TASKS AND CONTENT:
- Study the fundamentals of location-based services.
- Study privacy protection issues in location-based services and index structures for spatio-temporal data.
- Research and propose a privacy protection solution at the database level.

III. DATE OF ASSIGNMENT (as stated in the topic assignment decision): 7/2010

IV. DATE OF COMPLETION (as stated in the topic assignment decision): 7/2011

V. SUPERVISOR (academic rank, degree, full name): Assoc. Prof. Dr. Đặng Trần Khánh

Ho Chi Minh City, 14 […] 2011

SUPERVISOR          HEAD OF THE DEPARTMENT MANAGING THE MAJOR          FACULTY MANAGING THE MAJOR

ACKNOWLEDGEMENTS

I would like to express my deep gratitude to my supervisor, Đặng Trần Khánh, for his dedicated guidance and help in carrying out this research.

I would also like to thank the ASIS Lab group for supporting me and providing the conditions for my research throughout this time.

Finally, I sincerely thank my family and friends, who have always stood by me, encouraged me, and helped me.

THESIS SUMMARY

Today, with the strong development of positioning technology (GPS), location-based services (LBS), which provide utilities to users based on their location, are growing steadily. However, to use LBS, users unintentionally or deliberately accept the disclosure of some private information, such as their location and related details like name, age, and interests. Disclosing this private information can significantly affect users' private lives and even their safety. Researchers have proposed many solutions to this privacy protection problem. However, all of these solutions share one trait: the privacy-preserving algorithm is separated from the database that stores the users' location information, which lengthens query processing. In response, this thesis studies and proposes a solution that integrates the privacy-preserving algorithm into the database level in order to shorten query processing time.

ABSTRACT

Nowadays, with the development of GPS, location-based services (LBS), which supply users with added value based on their positions, are emerging. However, in order to use LBS, users have to reveal personal information such as their positions and related information such as name, age, and hobbies. This disclosure can affect users' lives and safety. To solve this problem, many methods have been proposed. These methods have one thing in common: the privacy-preserving algorithms are separated from the databases that store users' position information, which makes query processing longer. This thesis researches and proposes a solution that integrates the privacy-preserving algorithms into the database level in order to shorten query processing time.

DECLARATION

I declare that this thesis is my own research work, carried out on the basis of theoretical study and experiments. The data and results presented in the thesis are truthful and have not been published by anyone else in any other work.

TABLE OF CONTENTS

Chapter 1. Introduction
1.1 Problem statement
1.2 Overview of the thesis topic
1.2.1 Title
1.2.2 Scope
1.2.3 Objectives
1.2.4 Scientific and practical significance
1.3 Work plan
Chapter 2. Theoretical background
2.1 Global Positioning System (GPS)
2.1.1 Definition
2.1.2 Components of GPS
2.1.3 How GPS works
2.2 Geographic Information System (GIS)
2.2.1 Definition
2.2.2 Components of GIS
2.2.3 How GIS works
2.3 Location-based services (LBS)
2.3.1 Definition
2.3.2 Classification
2.4 Privacy in LBS
2.4.1 Definition
2.4.2 Classification
2.4.3 Privacy policies
2.5 Index structures for spatio-temporal data
Chapter 3. Related work
3.1 Architectures using a trusted intermediary component
3.1.1 Mix Zones
3.1.2 Cloaking sensitive areas using the k-area algorithm [MaX04]
3.1.3 Quadtree Spatial Cloaking [Moh07]
3.1.4 The CliqueCloak cloaking algorithm, based on an undirected graph
3.1.5 Nearest Neighbor Cloaking (NNC) [PGK+07]
3.1.6 Hilbert Cloaking
3.1.7 Reducing location accuracy (obfuscation)
3.2 Remarks and evaluation of the groups of methods [And06]
3.3 Index structures for spatio-temporal data
3.3.1 Index structures for past spatio-temporal data (Indexing the Past)
3.3.2 Index structures for present and future spatio-temporal data (Indexing the Now and the Future)
Chapter 4. Approach and implementation
4.1 Query processing
4.2 Deployment architecture
4.3 Obfuscation
4.3.1 Spatial obfuscation [ACV+09, ACD+07]
4.3.2 Spatio-temporal obfuscation
4.3.3 Geography-aware spatial obfuscation
4.4 Authorization policies
4.4.1 Authorization policies for spatial data
4.4.2 Authorization policies for spatio-temporal data
4.5 Index structures
4.5.1 OST-tree
4.5.2 Rules for placing authorization information on nodes and searching the OST-tree
4.6 Bob-tree
4.7 Security analysis of the solution
Chapter 5. Evaluation
5.1 Evaluation method
5.2 Datasets
5.2.1 Spatial data
5.2.2 Spatio-temporal data
5.3 Evaluation results
5.4 Evaluation on a random dataset
Chapter 6. Conclusion
6.1 Summary
6.2 Future work

[…] it can affect the quality of location-based services. It is therefore the user's responsibility to decide which degree of location accuracy to reveal to which service providers. Motivated by this, Dang et al. developed a general architecture [7] that classifies LBS service providers according to the user's trust. This architecture inherits the labeling property of mandatory access control, so that users reveal their locations only at a level appropriate to the label assigned to each service provider. However, the index structure in this architecture treats temporal data only at a very abstract level. It is thus necessary to concretize this structure with a suitable spatio-temporal index, which is discussed in the next section.

B. Spatio-Temporal Structures for Indexing the Present and Future Positions of Moving Objects

Several recent studies focus on indexing the present and future positions of moving objects [12], and the most popular category is parametric spatial access. Two popular access methods in this category are the PR-tree and the TPR-tree. The PR-tree [14], however, is only suitable for objects with spatial extent, so it is not the best solution for applications concerning a user's position, which is a spatial point by nature. The TPR-tree [5] inherits the idea of bounding rectangles from the R-tree [15] to create time-parameterized bounding rectangles (tpbr). Since the tpbrs are organized hierarchically in space, the TPR-tree is chosen as the base of our proposed structure, so that the obfuscated data can easily be overlaid on the TPR-tree's nodes hierarchically. In the TPR-tree, the position x(t) of a moving object at a future time t (t >= tc) is found by applying the linear function representing its
location to the current time: x(t) = x(t0) + v(t − t0), where t0 is the initial time, tc the current time, x(t0) the initial position, and v the velocity. The tpbr is also a function of time: the lower (upper) bound of a tpbr is set to move with the minimum (maximum) speed of all enclosed objects. Despite the existence of several indexing techniques for present and future positions, no moving-object index reported in the literature achieves the goal of obfuscating the user's position.

C. Access Methods for Privacy Preservation

Several index structures have been proposed to manage both profiles and moving-object data. The SSTP-tree [6] is constructed similarly to the TPR-tree, but each node carries an additional profile bounding vector to support profile conditions. Each node of the SSTP-tree therefore includes both a tpbr, to support the spatio-temporal attributes, and a profile bounding vector, to support profile conditions. The limitation of this access method is that it only allows or denies the access requests of subjects and does not obfuscate the spatio-temporal data. In other words, the evaluation of an access request has only two outcomes: reject or accept. By adding information about obfuscating users' spatio-temporal data, our proposed index structure instead yields a multi-level result when evaluating an access request, depending on the user's trust in the LBS service provider.

In [13], a unified index for location and profile data is proposed. This index clusters customers by profile using a categorical clustering algorithm and then constructs a TPR-tree for each cluster. A query is first processed in the profile database to retrieve the target clusters, which are then traversed to retrieve the customers who satisfy the criteria. This unified index, however, serves marketing purposes (retrieving groups of customers of interest) and does not obfuscate the customer's location.

It is evident from the above discussion that no existing spatio-temporal index structure effectively handles spatio-temporal obfuscation. Towards this goal, in this paper we propose the OST-tree, a structure originally motivated by the TPR-tree but with several modifications to support spatio-temporal obfuscation.

III. TEMPORAL OBFUSCATION

Much research has been done on spatial obfuscation [3,4,10,11,16] but, to the best of our knowledge, no mature proposal exists for obfuscating users' temporal data, so we focus on this issue in this section. Similar to spatial obfuscation, temporal obfuscation degrades the exact time value t0 to a vague temporal interval $[t_l, t_u]$, where $t_l < t_0 < t_u$. For example, instead of saying that "the user will be at location (x0, y0) in the next 15 minutes", we can obfuscate the time value by saying that "the user will be at location (x0, y0) in the next 13 to 16 minutes".

By combining the spatial and temporal dimensions, a spatio-temporal value is obtained by obfuscating both the spatial and the temporal value. Continuing the example above, we can say: "The user's position is somewhere in an area of 1.2 square kilometers that includes the location (x0, y0), within the next 13 to 16 minutes".

Definition 1 (Temporal obfuscation). The obfuscated value of a timestamp t0 is the temporal interval $[t_l, t_u]$ that includes the real timestamp t0 with probability:

$$ P(t_0 \in [t_l, t_u]) = 1 \qquad (1) $$

Definition 2 (Spatio-temporal obfuscation). The obfuscated value of a user's exact position (xu, yu) at a timestamp t0 is a rectangular area (xc, yc, w, h), centered on the geographical coordinates (xc, yc) with width w and height h, together with a temporal interval $[t_l, t_u]$, such that the area includes the user's exact position (xu, yu) and the interval includes the real timestamp t0 with probability:

$$ P\big((x_u, y_u) \in \mathrm{Rectangle}(x_c, y_c, w, h) \ \text{AND} \ t_0 \in [t_l, t_u]\big) = 1 \qquad (2) $$
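The two obfuscation definitions above only require that the returned interval and rectangle always contain the true timestamp and position, as stated in (1) and (2). The following Python sketch shows one way to satisfy that requirement. It is an illustration under assumptions, not the paper's algorithm: the names `ObfuscatedValue`, `obfuscate_time`, and `obfuscate`, the random placement of the true value inside the region, and the 0.49 safety margin are all hypothetical choices.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ObfuscatedValue:
    xc: float      # rectangle center, x coordinate
    yc: float      # rectangle center, y coordinate
    w: float       # rectangle width
    h: float       # rectangle height
    t_low: float   # lower bound of the temporal interval
    t_high: float  # upper bound of the temporal interval

    def contains(self, x, y, t):
        """True if (x, y) lies in the rectangle and t lies in the interval."""
        in_rect = (abs(x - self.xc) <= self.w / 2
                   and abs(y - self.yc) <= self.h / 2)
        return in_rect and self.t_low <= t <= self.t_high


def obfuscate_time(t0, delta_t, rng):
    """Return an interval of length delta_t that contains t0."""
    offset = rng.uniform(0.0, delta_t)        # how far t0 sits from the lower bound
    return t0 - offset, t0 + (delta_t - offset)


def obfuscate(x, y, t0, w, h, delta_t, rng=None):
    """Return a w-by-h rectangle and a delta_t-long interval around (x, y, t0)."""
    rng = rng or random.Random()
    # Shift the center so the true position is not always in the middle.
    # The 0.49 factor (instead of 0.5) is an assumed margin against rounding.
    xc = x + rng.uniform(-0.49, 0.49) * w
    yc = y + rng.uniform(-0.49, 0.49) * h
    t_low, t_high = obfuscate_time(t0, delta_t, rng)
    return ObfuscatedValue(xc, yc, w, h, t_low, t_high)
```

With the paper's example of a timestamp 15 minutes ahead, `obfuscate(x0, y0, 15.0, w, h, 3.0)` returns some 3-minute interval around 15, of which "13 to 16 minutes" is one possible outcome. Randomizing the offset matters: a region always centered on the true value would let an adversary recover it by taking the midpoint.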
In our work, we make the same assumption as [10]: the probability distribution of a user's position within an area is uniform. Formally, the joint probability density function $f_r(x, y)$ of a region r is:

$$ f_r(x, y) = \begin{cases} \dfrac{1}{s(r)} & \text{if } (x, y) \in r \\ 0 & \text{otherwise} \end{cases} \qquad (3) $$

where s(r) represents the area of r. Similarly, the probability density of a user's position within an area r at a time t0 within an interval $[t_l, t_u]$ (the lower and upper bounds of the obfuscated time) is:

$$ f_{r,t}(x, y) = \begin{cases} \dfrac{1}{s(r)\,(t_u - t_l)} & \text{if } (x, y) \in r,\ t_0 \in [t_l, t_u] \\ 0 & \text{otherwise} \end{cases} \qquad (4) $$

In the hierarchical index described in Section IV, a service provider with a higher level of trust from a user has its identity placed on a node nearer to the leaf level, and vice versa. For instance, the service provider with the identity #S103 has the highest level of trust from the user with the identity #U232, and so it can obtain the user's exact position (∆s = 0). This service provider's identity is therefore placed on the leaf node.

Definition 3 (Authorization). An authorization α is a 4-tuple ⟨idsp, iduser, ∆s, ∆t⟩, where idsp is the identity of the service provider, iduser is the identity of the user, and ∆s and ∆t are the degrees of accuracy of the user's position (spatial data) and time, respectively.

An authorization means that the user with the identity iduser allows only the service provider with the identity idsp to access his or her sensitive position and time information, with degrees of accuracy ∆s and ∆t, respectively. For example, a user with the identity #U232 is willing to reveal his position in the next 10 minutes to the advertising service with the identity #S101, with an accuracy of 600 square meters for position and […] minutes for time. This authorization can be expressed as α1 = […]. If the user's exact position in the next 10 minutes is at a coordinate […], the result returned to the service provider is, for time, the interval from the next […] to 12 minutes and, for position, a rectangle that has an area of 600 square meters and contains that coordinate.

IV. INDEX STRUCTURE
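Since the index stores authorizations on its nodes, it helps to fix a concrete shape for the authorization 4-tuple first. The Python sketch below models it as a small record with a lookup for the accuracy degrees granted to a provider. It rests on several assumptions: the names `Authorization` and `find_authorization` are hypothetical, ∆s is taken in square meters and ∆t in minutes, the ∆t values in the sample policies are invented for illustration, and a provider without a matching authorization simply gets no result.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Authorization:
    id_sp: str      # identity of the service provider
    id_user: str    # identity of the user
    delta_s: float  # spatial accuracy: area of the returned rectangle (m^2, assumed unit)
    delta_t: float  # temporal accuracy: length of the returned interval (minutes, assumed unit)


def find_authorization(auths: Iterable[Authorization],
                       id_sp: str, id_user: str) -> Optional[Authorization]:
    """Return the authorization that id_user granted to id_sp, or None."""
    for a in auths:
        if a.id_sp == id_sp and a.id_user == id_user:
            return a
    return None


# Sample policies for user #U232. The delta_t values are invented for the sketch.
policies = [
    Authorization("#S101", "#U232", delta_s=600.0, delta_t=2.0),  # advertising service
    Authorization("#S103", "#U232", delta_s=0.0, delta_t=0.0),    # most trusted: exact position
]
```

A linear scan is enough to show the semantics. In the index itself the same tuples are attached to tree nodes, so the lookup happens as part of the tree traversal rather than over a flat list.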
The base structure of the OST-tree is that of the TPR-tree for indexing spatio-temporal data. However, in order to specify the authorization and the degree of accuracy of the user's position and time, the node structure is modified to carry more information. Specifically, in addition to the tpbr, each node contains a pointer p to a list of entries. Each entry is a 4-tuple ⟨idsp, iduser, ∆s, ∆t⟩, indicating that the service provider with the identity idsp can access the sensitive information of the user with the identity iduser at the degrees of accuracy for position and time given by ∆s and ∆t, respectively. Fig. 1 illustrates the structure of the OST-tree. For illustration purposes, the values of the authorizations αi (i = 1, …, 5) in this figure are α1 = […], α2 = […], α3 = […], α4 = […], and α5 = […].

Our goal is to develop an index structure that incorporates the degree of accuracy of the user's position, so this accuracy parameter must take a hierarchical form. The OST-tree achieves this hierarchy well. Since the tpbrs in a TPR-tree are already organized hierarchically, the OST-tree inherits this property to organize, hierarchically, the bounding rectangles that contain the user's exact position and that are returned to the service providers. More specifically, when traversing from the root node to a leaf node of the OST-tree, the degree of accuracy of the user's position increases because the area of the bounding rectangle shrinks, and vice versa. For example, along the traversal path N1-N5-N14 (see Fig. 1), the areas of the returned rectangles shrink from 1000 m² to 500 m² and 0 m², corresponding to α1, α3, and α5. This means that the degree of accuracy of the user's position increases.

Figure 1. OST-tree structure

A. Privacy Information Overlaying and Insertion

The overlaying of privacy information and the insertion process happen in parallel. We traverse the OST-tree from the root node down to a leaf node to place the new object in a suitable leaf (by applying the insertion algorithm shown in [5,8]) and, at the same time, recursively compare the degree of accuracy of the user's position (∆s) with the spatial extent (N■) of each node N on the insertion path, to find the appropriate node on which to overlay the privacy information. This comparison has two possible outcomes:

• Case 1: If N is the appropriate subtree and ∆s ≥ N■, we overlay α on N and continue the insertion process.
• Case 2: If ∆s < N■, there are two scenarios depending on the level of N. If N is a non-leaf node, we choose an appropriate subtree rooted at N (following the ChooseSubtree algorithm of R*-trees [8]) and continue the overlaying process. If N is a leaf node, we overlay α and insert the new object into this node.

If a moving object already exists in the index structure and the user wants to add new policies, we find the appropriate node on the insertion path on which to overlay the privacy information.

Since an authorization is placed as high as possible in the OST-tree, the search process can stop at an internal node as soon as a match occurs. Thus, we do not always have to traverse to the leaves to find a user's exact position, as algorithms separated from the database level must. For example, if the service provider #S101 wants to obtain the position of user #U135, the search process stops at the internal node N6-N7 and returns the result. With an algorithm separated from the database level, in contrast, we have to traverse to the leaf node N15, where the position of #U135 is stored, retrieve the user's exact position, and then obfuscate it.

B. Privacy Analysis

Adversary model: the adversary tries to manipulate the obfuscated region to infer the user's exact location.

For obfuscation techniques, the relevance [11] is used to measure location privacy protection. The lower the relevance, the higher the location privacy protection, and thus the lower the probability that an adversary can infer the user's exact location. So, in order to compare the location privacy protection of our proposed
approach with that of the approach that separates the algorithm from the database level, we compare the relevance values of the two approaches.

For the approach that separates the algorithm from the database level, the relevance is:

$$ R_s = \frac{(A_i \cap A_f)^2}{A_i \cdot A_f} \qquad (5) $$

where Ai is the location measurement [11], which depends entirely on the positioning technology, and Af is the obfuscated region created by the privacy-preserving algorithm. To calculate the relevance of our proposed approach, we can simply replace Af by ∆s in (5). However, the concept of relevance in [11] only covers spatial privacy protection. By taking the temporal element into account, we extend the relevance concept to cover both spatial and temporal privacy protection as follows:

$$ R_{st} = \frac{(A_i \cap \Delta_s)^2}{A_i \cdot \Delta_s \cdot \Delta_t} \qquad (6) $$

where ∆s and ∆t are the degrees of accuracy of the user's position and time, respectively. From (5) and (6), we can see that Rs ≥ Rst (since Af = ∆s and ∆t ≥ 1), meaning that the degree of privacy protection of our proposed approach is higher than that of the approach separating the algorithm from the database level. More specifically, incorporating the temporal dimension into the relevance concept reduces the probability that an adversary can infer the user's exact location, because the adversary has to guess not only where the user is but also when.

C. Performance Analysis

In this section, we compare the performance of the TPR-tree and the OST-tree in terms of the number of disk accesses. For the analysis, let n be the number of moving users; t the size of each tpbr; d the disk block size; M the maximum number of tree pointers in one node; Pb the size of a block pointer pointing to a subtree; Pa the size of the authorization pointer pointing to the list of authorizations; a the average number of authorizations placed in each node; and Sa the size of each authorization α.

For the TPR-tree, an internal node contains only time-parameterized bounding rectangles and block pointers; these must fit into a single block:

$$ M (P_b + t) \le d \ \Rightarrow\ M \le \frac{d}{P_b + t} $$

So, the number of disk accesses is:

$$ \Omega\!\left(\log_{\frac{d}{P_b + t}} n\right) \qquad (7) $$

For the OST-tree, there are two cases: the list of authorizations is either referenced by a pointer or embedded directly in the nodes. In the first case, an internal node of the OST-tree contains the time-parameterized bounding rectangles, the block pointers, and one authorization pointer; these must fit into a single block:

$$ M (P_b + t) + P_a \le d \ \Rightarrow\ M \le \frac{d - P_a}{P_b + t} $$

So, the number of disk accesses is:

$$ \Omega\!\left(\log_{\frac{d - P_a}{P_b + t}} n\right) \qquad (8) $$

In the second case, an internal node has the same number of time-parameterized bounding rectangles and block pointers, but the space taken by authorizations depends on the number of authorizations placed in each node. Hence, the order M can be calculated as follows:

$$ M (P_b + t) + a S_a \le d \ \Rightarrow\ M \le \frac{d - a S_a}{P_b + t} $$

So, the number of disk accesses is:

$$ \Omega\!\left(\log_{\frac{d - a S_a}{P_b + t}} n\right) \qquad (9) $$

In the OST-tree, embedding the authorizations directly in the node requires fewer disk accesses than the first case, where the authorizations are referenced by a pointer. The above analysis also shows that, when both indexes must be traversed down to the leaf nodes, the TPR-tree has the lower height and requires fewer disk accesses than the OST-tree, since the OST-tree has to reserve space in each node to store the authorizations. In most cases, however, we do not have to traverse to the OST-tree leaves to retrieve the result: if the pair of values in the query's authorization matches that of some internal node, we stop at this node and return the result without traversing the OST-tree any further. Hence, the OST-tree requires fewer disk accesses than the TPR-tree. Only in the worst case, where users are willing to reveal their exact position to service providers, do we have to traverse to the leaf node to retrieve the exact result. For example, if the service provider #S101 wants to get the position of the user #U134, the
query needs to visit only two nodes (the root node and N1) instead of three, thereby reducing the number of disk accesses compared to the TPR-tree.

V. PERFORMANCE EXPERIMENTS

To conduct the experiments, we use the open source implementation of TPR-trees called SaIL [9]. Both the TPR-tree and the OST-tree are implemented in C++, and all experiments are conducted on a Core Duo personal computer with […] GB of memory. In the experiments, we use uniform data, where object positions are randomly generated and speeds are chosen at random from the range 0.25 to 1.66. The effective fill factor is usually close to 70%. The fan-out of internal and leaf nodes is 20 with a 4K page size. The maximum update interval is 20. The number of queries is 35 and the horizon time is 20.

The two insert-cost figures compare the TPR-tree and the OST-tree in terms of CPU time and number of I/O operations, respectively. The insert cost of the OST-tree is higher than that of the TPR-tree, since extra time (or extra I/O operations) is spent, beyond the insertion itself, on finding the appropriate node on which to overlay the authorization.

Given the mobility of users, the update cost of the OST-tree, shown in the update-cost figure, is higher than that of the TPR-tree, because the OST-tree incurs the additional cost of updating the authorization (moving α from the current node to another node corresponding to the newly updated position of a user).

Figure: Insert cost (CPU time)
Figure: Insert cost (I/Os)
Figure: Update cost
Figure: Point location query cost

The query-cost figure compares the two indexes in terms of the number of I/O operations. The queries in this case are point location queries, which retrieve the rectangle containing the location of a user. In general, the query cost of the OST-tree is better than that of the TPR-tree, since we do not have to traverse to the leaf node to get the result in the OST-tree. Only in some cases, where users want to reveal their exact locations to service providers, is the OST-tree no better than the TPR-tree, because we have to traverse to the leaf nodes of the OST-tree. Hence, the OST-tree is better than the TPR-tree in cases where users want to reveal only a low degree of accuracy of their locations to service providers.

Since the nodes of the OST-tree contain authorizations, the number of records in each OST-tree node is smaller than in the TPR-tree. So, the OST-tree requires more nodes to contain the same number of moving objects (cf. Fig. 2).

Figure: Storage cost
Figure: Relevance

By incorporating time into the privacy model, the average relevance of our proposed approach (Rst) is smaller than that of the obfuscation algorithm (Rs) (cf. Fig. 3).

VI. CONCLUSION AND FUTURE WORK

In this work, we have introduced the OST-tree, which is capable of obfuscating the spatio-temporal data of users. Although the OST-tree requires more storage space and update overhead, it achieves a lower querying cost and higher privacy protection than the TPR-tree.

Future work will extend the probability distribution of the user's position so that the probability that a user's position (x, y) belongs to a region is not uniformly distributed. In real life, the region where a user is located depends on many factors related to geography, so it is easy for an adversary to infer a user's exact position in the obfuscated area if the probability distribution of the user's position is assumed to be uniform.

REFERENCES

[1] M. Gruteser, D. Grunwald: "Anonymous Usage of Location-Based Services Through Spatial and Temporal Cloaking". MOBISYS, 2003.
[2] B. Gedik, L. Liu: "Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms". IEEE TMC, 7(1):1–18, 2008.
[3] C.A. Ardagna, M. Cremonini, E. Damiani, S.D.C. Vimercati, P. Samarati: "Location-Privacy Protection through Obfuscation-based Techniques". DBSEC, 2007.
[4] M.F. Mokbel: "Privacy in Location-based Services: State-of-the-art and Research Directions". MDM, 2007.
[5] S. Saltenis, C.S. Jensen, S.T. Leutenegger, M.A. Lopez: "Indexing the Positions of Continuously Moving Objects". ACM SIGMOD, pp. 331–342, 2000.
[6] V. Atluri, H. Shin: "Efficient Security Policy Enforcement in a Location Based Service Environment". DBSEC, 2007.
[7] T.K. Dang, Q.C. To: "An Extensible and Pragmatic Hybrid Indexing Scheme for MAC-based LBS Privacy-Preserving in Commercial DBMSs". ACOMP, pp. 58–67, 2010.
[8] N. Beckmann, H.-P. Kriegel, R. Schneider, B. Seeger: "The R*-tree: An Efficient and Robust Access Method for Points and Rectangles". ACM SIGMOD, pp. 322–331, 1990.
[9] M. Hadjieleftheriou, E. Hoel, V.J. Tsotras: "SaIL: A Spatial Index Library for Efficient Application Integration". Geoinformatica, 9(4):367–389, 2005.
[10] J.H. Jafarian, M. Amini, R. Jalili: "Protecting Location Privacy through a Graph-based Location Representation and a Robust Obfuscation Technique". ICISC, 2008.
[11] C.A. Ardagna, M. Cremonini, S.D.C. Vimercati, P. Samarati: "An Obfuscation-Based Approach for Protecting Location Privacy". TDSC, 8(1):13–27, 2011.
[12] M.F. Mokbel, T.M. Ghanem, W.G. Aref: "Spatio-temporal access methods". IEEE Data Engineering Bulletin, 26(2):40–49, 2003.
[13] V. Atluri, N.R. Adam, M. Youssef: "Towards a unified index scheme for mobile data and customer profiles in a location-based service environment". NG2I, 2003.
[14] M. Cai, P.Z. Revesz: "Parametric R-Tree: An Index Structure for Moving Objects". COMAD, 2000.
[15] A. Guttman: "R-trees: A Dynamic Index Structure for Spatial Searching". SIGMOD, pp. 47–57, 1984.
[16] T.T. Anh, T.Q. Chi, T.K. Dang: "An Adaptive Grid-Based Approach to Location Privacy Preservation". ACIIDS, pp. 133–144, 2010.

Conference paper, ACIIDS 2011 (Asian Conference on Intelligent Information and Database Systems), Daegu, South Korea, 20-22 […] 2011

Bob-Tree: An Efficient B+-Tree Based Index Structure for Geographic-Aware Obfuscation

Quoc Cuong To (1), Tran Khanh Dang (1), and Josef Küng (2)

(1) Faculty of Computer Science & Engineering, HCM University of Technology, Vietnam
{qcuong, khanh}@cse.hcmut.edu.vn
(2) FAW Institute, Johannes Kepler University Linz, Austria/Europe
josef.kueng@faw.jku.at

Abstract. The
privacy protection of personal location information increasingly gains special attention in the field of location-based services, and obfuscation is the most popular technique aiming at protecting this sensitive information However, all of the conventional obfuscation techniques are geometry-based and separated from the database level Thus, the query processing has two timeconsuming phases due to the number of disk accesses required to retrieve the user’s exact location, and the location obfuscation Also, since these techniques are geometry-based, they cannot assure location privacy when the adversary has knowledge about the geography of the obfuscated region We address these problems by proposing Bob-tree, an index structure that is based on Bdual-tree and contains geographic-aware information on its nodes Experiments show that Bob-tree provides a significant improvement over the algorithm separated from the database level for query processing time and location privacy protection Keywords: LBS, obfuscation, privacy-preserving, spatio-temporal indexing Introduction With the rapid development of mobile technologies, there are more than 4.5 billion mobile users by the year 2009 and the number is expected to increase more Among various services for mobile phone, the location-based service (LBS) is the most promising one since it supplies users with many value-added services In order to benefit from these services, users, however, have to reveal their sensitive information such as their current location Such novel services pose many challenges because users are not willing to reveal their sensitive information but still want to benefit from these useful services We consider location privacy as an enabling technology for the proliferation of LBS, and so must balance the privacy and service quality To solve this privacy-preserving problem, many techniques have been suggested and the most popular one is obfuscation [1,2,3,4] The general idea of this technique is to 
degrade the quality of user’s location information but still allow them to use services with acceptable quality However, this technique has two major limitations First, all of the proposed obfuscation algorithms are geometry-based In other words, they not consider the geographic feature constituting the obfuscated region (e.g., a lake in an obfuscated area) Based on the knowledge of the region geography, an N.T Nguyen, C.-G Kim, and A Janiak (Eds.): ACIIDS 2011, LNAI 6591, pp 109–118, 2011 © Springer-Verlag Berlin Heidelberg 2011 110 Q.C To, T.K Dang, and J Küng adversary can increase the inference probability of a user’s exact location Second, these algorithms are separated from the database level, making the algorithms go through two time-consuming phases due to the number of disk access required to (1) retrieve user’s exact location on the database level, and (2) obfuscate this information on the algorithm level Also, it is prone to privacy violation and more deployment complexity because both phases, together with the communication channel between them, must be trusted Motivated by these reasons, in this work, we create a new geographic-aware obfuscation technique and propose Bob-tree, a new spatio-temporal index structure based on Bdual-tree [8] By taking into account the geographic feature inside the obfuscated region, our new technique ensures a higher privacy protection degree than that of the geometry-based obfuscation techniques in [1,2,3,4] Furthermore, because Bob-tree embeds geographic-aware region information on its nodes, the process of calculating the obfuscated region can be done in only one phase: traversing the index structure to retrieve the appropriate obfuscated region that contains a user’s exact location This one-phase process can reduce the processing time considerably comparing to the two-phase process mentioned above The rest of this paper is organized as follows Section briefly reviews related work Section presents the new 
geographic-aware obfuscation technique. Section 4 introduces Bob-tree. Section 5 gives privacy and performance analyses. Section 6 presents experimental results, and Section 7 gives concluding remarks.

2 Related Work

2.1 Location Obfuscation

Among the most popular techniques to protect the user's location privacy, obfuscation-based techniques have gained much interest because they are intuitive and simple to implement. Location obfuscation aims at hiding the user's exact location by decreasing the quality of the user's location information. In [1], Ardagna et al. propose obfuscation techniques that enlarge the area containing the user's real location. However, these techniques deal only with the geometry of the obfuscated region and do not consider what is included inside it (i.e., the geographic features). More recently, the semantic-aware obfuscation technique introduced in [13,14] considers sensitive feature types inside an obfuscated region. But this technique is not concerned with how big the area of the obfuscated region is. It focuses only on the probability that a user is located in a sensitive place. In various LBS, however, an indispensable requirement is that the area of any obfuscated region must be big enough to protect the user's location privacy.

With obfuscation techniques, the bigger the area, the harder it is for an attacker to infer the user's exact location. If the area is too big, however, it can affect the quality of location-based services. So, it is the responsibility of users to decide what degree of location accuracy is revealed to which service providers. Inspired by this, in [9], Dang et al. introduce an architecture to classify the service providers depending on the user's trust. This architecture inherits the property of mandatory access control to label each service provider, so that users reveal their locations only at an appropriate level based on the labels assigned to the service providers. Similar to this idea, our proposed approach
classifies service providers such that the more reliable a service provider is, the smaller the area of the obfuscated region it can obtain.

2.2 Spatio-temporal Structures for Indexing Moving Objects

A number of recent studies focus on indexing the present and future positions of moving objects [5], and the two dominant methods are parametric spatial access and space-filling-curve transformation. With the former, the main idea is that the bounding rectangle is a function of time and can thus enclose moving objects. The most popular access method in this category, TPR-tree [6], inherits the idea of bounding rectangles in R-tree [15] to create time-parameterized bounding rectangles (TPBR). However, the TPBR bear two crucial limitations that dramatically affect the performance of TPR-tree: overlapping and high storage cost.

The latter overcomes these two limitations by using space-filling curves (e.g., Peano/z-order, Hilbert) to transform object locations from a multi-dimensional to a one-dimensional space. These one-dimensional values are then indexed by a B+-tree, which is the typical one-dimensional index. The two most popular access methods in this category are Bx-tree [7] and Bdual-tree [8]. The Bx-tree outperforms the TPR-tree by factors of as much as 10, but it fails to consider object velocity, and thus query processing with Bx-tree retrieves a large number of false hits, which seriously affects its performance. Bdual-tree overcomes this limitation by also capturing the velocity information. By using a partitioning grid that divides the data space into cells, Bdual-tree can effectively answer progressive spatio-temporal queries, which are poorly supported by Bx-tree.

Despite the existence of several indexing techniques for present and future positions, to the best of our knowledge, no moving-object index has yet been reported in the literature that achieves the goal of obfuscating the geographic-aware region.

2.3 Access Methods for
Privacy-Preserving

All existing privacy-preserving algorithms are separated from the database level [1,2,3,4,13]. This separation, as mentioned above, makes the two-phase query processing time-consuming. Motivated by this, Atluri et al. [10] create SSTP-tree, a unified index structure that embeds users' profile vectors directly into its nodes to support profile conditions. The limitation of this access method is that it only allows or denies the access request of subjects and is not concerned with obfuscating the user's location. In other words, the access request evaluation has only two possible results: reject or accept. Our proposed index structure, by contrast, produces a multi-level result when evaluating an access request, based on the user's trust in service providers.

Very recently, the OST-tree [11] embeds the user's privacy policy into its nodes and obfuscates spatio-temporal data. But since OST-tree is based on TPR-tree and is concerned only with geometry-based obfuscation, it has high storage cost and quite low privacy protection.

It is evident from the above discussion that there currently does not exist any spatio-temporal index structure that can effectively handle geographic-aware obfuscation. Moreover, all of them are based on TPR-tree, which is much less efficient than Bdual-tree in terms of storage cost and query processing time [8]. Towards this goal, in this paper we propose the Bob-tree, a structure originally based on Bdual-tree but with essential modifications to support geographic-aware obfuscation.

3 Geographic-Aware Obfuscation

As discussed above, although there exists a variety of research activities in spatial obfuscation, none of the proposed techniques considers geographic features. This can leave a privacy backdoor open when the adversary knows the geography of the obfuscated region. To address this problem, in this section we present a new geographic-aware obfuscation technique that takes into account
both the area of and the geographic features inside the obfuscated region. This newly proposed technique not only ensures the same quality of service as the techniques in [1,2,3,4] (because the obfuscated regions produced by these techniques have the same area), but also provides better protection of the user's location privacy (as shown in Section 5.1).

Fig. 1. Example of unapproachable region

Fig. 2. (a) A part of an internal node of Bob-tree and (b) its projection on the coordinate space

In our proposed technique, the region is divided into two kinds of geographic features: approachable and unapproachable parts. The unapproachable parts represent places that users, for some reason, cannot enter. In contrast, users can enter approachable parts. For example, the lake and the mountain are the unapproachable parts of the region in Fig. 1, because no boats are allowed on the lake and users cannot climb the mountain. Our proposed obfuscation technique returns to service providers obfuscated regions that not only have the same area as those of geometry-based techniques, but also contain only approachable features.

Adversary model: Using external knowledge of the geographic features inside the obfuscated region, the adversary tries to eliminate the unapproachable parts of the obfuscated region, leaving only the approachable parts. In this way, the area of the original obfuscated region created by the previously proposed algorithms, e.g., those in [1,2,3,4], will be reduced, since the returned region in this case includes both approachable and unapproachable parts. As a result, the probability that an adversary, with this external knowledge, can infer the user's exact location within the obfuscated region is higher. The adversary, however, cannot reduce the area of the region created by our newly proposed geographic-aware obfuscation technique, because this returned region includes only approachable parts. Thus, for the two techniques, although the areas of the two regions are the same, the region
created by our proposed technique achieves better location privacy protection. For example, in Fig. 1, since the region r1 contains two unapproachable parts (a mountain and a lake), the adversary can reduce r1 to a smaller region by eliminating the intersection of r1 with the lake and the mountain. The region r2, however, does not intersect with the lake or the mountain, and so it is impossible for the adversary to reduce this region.

4 Index Structure

The base structure of the Bob-tree originates from that of the B+-tree, which indexes one-dimensional values. Similar to Bdual-tree [8], a d-dimensional moving point o in our index structure with a reference timestamp o.tref, d coordinates o[1],…,o[d], and d velocities o.v[1],…,o.v[d] has its dual in the 2d-dimensional vector as follows:

odual = (o[1](Tref),…,o[d](Tref), o.v[1],…,o.v[d]),

where o[i](Tref) is the i-th coordinate of o at time Tref and is given by:

o[i](Tref) = o[i] + o.v[i]*(Tref - o.tref)

This 2d-dimensional point in the dual space is mapped to a one-dimensional value using the Hilbert curve, and this value is then indexed by a B+-tree. However, in order to specify the geographic-aware region, the node structure is modified to attach this information. Specifically, besides the one-dimensional Hilbert value transformed from the corresponding multi-dimensional point, each internal node contains the area of the approachable regions corresponding to the Hilbert range of the node.

Fig. 3. Bob-tree

Fig. 3 illustrates the structure of the Bob-tree. Each internal node stores tree pointers Pi, search key values Ki, and areas Si, where Si is the area of the approachable regions associated with the Hilbert interval [Ki-1,Ki]. Each leaf node stores entries (Ki,Pri), where Pri is a data pointer, together with a pointer Pnext to the next leaf node of the Bob-tree. In Bdual-tree, each internal node e is implicitly accompanied by an interval [e.hl, e.hu], where e.hl and e.hu are Hilbert values of the
starting and ending cells of the region represented by node e, and thus e is associated with e.hu-e.hl cells. The area of the region associated with each internal node is then calculated by multiplying the total number of cells of the internal node by the area of the projection of each cell into the coordinate space. However, the region associated with e.hu-e.hl cells includes both approachable and unapproachable regions. Thus, in order to increase the privacy degree of the region, we must filter out all unapproachable portions of this region.

Assume that the projection of the region associated with an internal node e (associated with e.hu-e.hl cells in the 2d-dimensional space) into the coordinate space is the region consisting of e1.hu-e1.hl cells (in the 1d-dimensional space), and that there are x unapproachable cells within this 1d-dimensional region. The number of approachable cells associated with e is then e1.hu-e1.hl-x. Thus, the area of the approachable regions associated with e is (e1.hu-e1.hl-x)Sc, where Sc is the area of each cell in the 1d-dimensional space. For example, Fig. 2a shows an internal node and its associated Hilbert value transformed from the 4-dimensional space. Fig. 2b is the projection of this node into the 2-dimensional coordinate space. The five gray cells 27, 28, 35, 36, and 38 are unapproachable. Assuming that the area of each cell is 100 m2, the area of the approachable regions associated with this internal node is (45-23-5)x100 = 1700 m2, where the two values 23 and 45 are the Hilbert values of the projection into the 2-dimensional space of the two cells 20 and 46 in the 4-dimensional space.

The authorization α used in our approach is a 3-tuple (idsp, iduser, ∆s), where idsp is the identity of the service provider, iduser is the user's identity, and ∆s is the area of the approachable region. The meaning of an authorization is that a user iduser only allows the service provider idsp to access his/her sensitive personal location information with an accuracy
degree of ∆s. For example, the user #U232 is willing to reveal his position in an approachable region with an area of 600 m2 to the advertising service #S101. This authorization can be expressed as α1 = (#S101, #U232, 600). If the user's exact position is at a given coordinate, the result returned to the service provider is an approachable region of 600 m2 containing that coordinate.

The area of the obfuscated region associated with each node in a Bob-tree is hierarchical, because the interval [e1.hl,e1.hu] becomes smaller when traversing from the root to the leaf nodes. Therefore, when traversing from the root down in a Bob-tree, the accuracy of the user's position increases because the area of the obfuscated region becomes smaller, and vice versa. Based on this basic property, if a service provider that has a low trust level from a user wants to retrieve the user's location, the search process can stop at an internal node that may be close to the root.

Search, Insertion, Deletion and Update with Bob-tree

The following algorithm outlines the procedure to search for a record in Bob-tree.

Algorithm Search
Input: a dual vector odual of a moving point, area S of an obfuscated region
Output: the region whose area equals S and that contains a moving point with the dual vector odual
  Transform odual into the Hilbert value h
  n ← the root node of the Bob-tree
  while (n is not a leaf node)
    Search node n for an entry i such that Ki-1 < h ≤ Ki
    if S = Si then return the region corresponding to the Hilbert interval [Ki-1;Ki]
    else if S > Si then return ExtendCell(Ki-1,Ki,S)
    else n ← n.Pi //the i-th tree pointer in node n
  Search leaf node n for an entry (Ki,Pri) with h=Ki
  if found then retrieve the user's exact location
  else the search value h is not in the database

The algorithm ExtendCell(Ki,Kj,S) extends the region corresponding to the Hilbert interval [Ki,Kj] by adding more approachable cells until the area of the extended region equals S. This ensures that the obfuscated region produced by our
technique has the same area as that of the geometry-based techniques and achieves better location privacy protection because it contains only approachable features.

In this search algorithm, if the area S is big (e.g., the service provider gets a low trust level from the user, and thus can only obtain a big region containing the user's exact location), the search process can stop at an internal node near the root. In this case, the number of disk accesses is reduced significantly. So, we do not have to traverse to the leaf node to find the user's exact position as in the previously proposed algorithms, which are separated from the database level. With our approach, the search process must traverse to the leaf nodes only when users are willing to reveal their exact location to service providers.

The insert, delete and update operations of Bob-tree are similar to those of B+-tree. However, since these operations change the key values in each node, the area of the obfuscated region associated with each node needs to be re-calculated.

5 Privacy and Performance Analyses

5.1 Privacy Analysis

For obfuscation techniques, the relevance [1] is used to measure location privacy protection. The lower the relevance, the higher the location privacy protection, and thus the lower the probability that an adversary can infer the user's real location:

$$R_s = \frac{(A_i \cap A_f)^2}{A_i \cdot A_f} \tag{1}$$

where Ai is the location measurement [1], which depends completely on the positioning technology used, and Af is the obfuscated region area. With the algorithms separated from the database level, because Af includes both the accessible and inaccessible regions, an adversary can eliminate the inaccessible part of Af (cf. the adversary model in Section 3). We call Afa the accessible region of Af after eliminating the inaccessible part (i.e., Afa ≤ Af). The relevance of Afa is calculated as follows:

$$R_{sa} = \frac{(A_i \cap A_{fa})^2}{A_i \cdot A_{fa}} \tag{2}$$

In Bob-tree, since Af includes only the accessible region, an adversary cannot reduce Af to a smaller region. Thus, the
relevance of our proposed approach is still Rs. Since Afa ≤ Af, from (1) and (2) we have Rsa ≥ Rs. This means that the location privacy protection of our proposed approach is better than that of the algorithms introduced above. More specifically, by considering the geographic features inside the obfuscated region, we reduce the probability that an adversary can infer the user's exact location.

Similar to TPR-tree, in the relevance of OST-tree the adversary can eliminate the inaccessible parts of Af. However, the temporal obfuscation [11] in this relevance compensates for the inaccessible part of Af as follows:

$$R_{st} = \frac{(A_i \cap A_f)^2}{A_i \cdot A_f \cdot \Delta t} \tag{3}$$

Therefore, if the area of the inaccessible parts of Af or the temporal obfuscation is small, the relevance of Bob-tree is still smaller than that of OST-tree (i.e., Rs ≤ Rst).

5.2 Performance Analysis

In this section, we compare the performance of the TPR-tree, OST-tree and Bob-tree in terms of the tree height and the number of disk accesses in query processing. Let m, m'; q, q'; and r, r' denote the average number of entries (i.e., the fan-out) at the root, internal nodes, and leaf nodes, respectively; let R, R' be the total number of records being indexed; and let d, d' be the depth of the Bob-tree and TPR-tree, respectively. Then we have:

$$R = (m+1)(q+1)^{d-1}\, r \;\Rightarrow\; d = \log_{q+1}\!\left(\frac{R}{(m+1)\, r}\right) + 1 \tag{4}$$

Similarly, we have:

$$d' = \log_{q'+1}\!\left(\frac{R'}{(m'+1)\, r'}\right) + 1 \tag{5}$$

In Bob-tree, since each node contains only the integers representing the search key values and the areas of the approachable regions, the storage cost for each entry is low, and thus the node fan-out is high. On the contrary, in TPR-tree, each node contains the TPBR, which require high storage cost; hence the fan-out is low. In other words, on average each internal node of TPR-tree contains fewer entries than Bob-tree (m’

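The approachable-area formula and the search procedure of Section 4 can be illustrated with a small in-memory sketch in Python. This is not the authors' implementation: the classes `Entry` and `Node`, the helpers `make_entry`, `cells`, and `extend_cells`, the 16-cell curve, and the unapproachable cells 6 and 7 are all assumptions made for illustration. Cells are identified by their one-dimensional Hilbert values, each entry covers the half-open interval (K_{i-1}, K_i], and `extend_cells` is only one plausible reading of ExtendCell (it grows the interval along the curve, upward first and then downward).

```python
from dataclasses import dataclass
from typing import List, Optional, Set

CELL_AREA = 100  # m^2 per cell, the value used in the Fig. 2 example


def approachable_area(h_lo: int, h_hi: int, x: int, cell_area: int = CELL_AREA) -> int:
    """Area (hu - hl - x) * Sc of the approachable cells in the interval (h_lo, h_hi]."""
    return (h_hi - h_lo - x) * cell_area


@dataclass
class Entry:
    lo: int                          # K_{i-1}
    hi: int                          # K_i
    area: int                        # S_i, approachable area of (lo, hi]
    child: Optional["Node"] = None   # P_i; None below the last internal level


@dataclass
class Node:
    entries: List[Entry]


def make_entry(lo: int, hi: int, blocked: Set[int], child: Optional[Node] = None) -> Entry:
    """Hypothetical helper: build an entry and compute S_i from the blocked cells."""
    x = sum(1 for c in blocked if lo < c <= hi)
    return Entry(lo, hi, approachable_area(lo, hi, x), child)


def cells(lo: int, hi: int, blocked: Set[int]) -> List[int]:
    """Approachable cells of the interval (lo, hi]."""
    return [c for c in range(lo + 1, hi + 1) if c not in blocked]


def extend_cells(lo: int, hi: int, area: int, target: int,
                 blocked: Set[int], max_h: int) -> List[int]:
    """Assumed stand-in for ExtendCell: add neighbouring cells until area reaches target."""
    while area < target and (hi < max_h or lo > 0):
        if hi < max_h:
            hi += 1
            new_cell = hi
        else:
            new_cell = lo   # cell lo joins the interval once lo is decremented
            lo -= 1
        if new_cell not in blocked:
            area += CELL_AREA
    return cells(lo, hi, blocked)


def search(root: Node, h: int, target: int,
           blocked: Set[int], max_h: int) -> Optional[List[int]]:
    """Descend from the root and stop as soon as an entry matches the requested area."""
    node: Optional[Node] = root
    while node is not None:
        entry = next((e for e in node.entries if e.lo < h <= e.hi), None)
        if entry is None:
            return None                      # h is not covered by the index
        if target == entry.area:
            return cells(entry.lo, entry.hi, blocked)
        if target > entry.area:
            return extend_cells(entry.lo, entry.hi, entry.area, target, blocked, max_h)
        node = entry.child                   # region still too large: go one level down
    return [h]                               # simplification: the leaf level yields the exact cell


# Toy index over cells 1..16 in which cells 6 and 7 are unapproachable (assumed data).
BLOCKED = {6, 7}
LEFT = Node([make_entry(0, 4, BLOCKED), make_entry(4, 8, BLOCKED)])
RIGHT = Node([make_entry(8, 12, BLOCKED), make_entry(12, 16, BLOCKED)])
ROOT = Node([make_entry(0, 8, BLOCKED, LEFT), make_entry(8, 16, BLOCKED, RIGHT)])
```

Under these assumptions, `approachable_area(23, 45, 5)` reproduces the 1700 m2 of the Fig. 2 example. A request for 600 m2 around cell 3 is answered at the root without descending, while a request for 300 m2 around cell 5 descends one level and extends the 200 m2 interval (4, 8] by one approachable cell. In both cases the unapproachable cells 6 and 7 are excluded from the returned region.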
Date posted: 03/02/2021, 22:56
