Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống
1
/ 93 trang
THÔNG TIN TÀI LIỆU
Thông tin cơ bản
Định dạng
Số trang
93
Dung lượng
1,95 MB
Nội dung
ĐẠI HỌC QUỐC GIA THÀNH PHỐ HỒ CHÍ MINH TRƢỜNG ĐẠI HỌC BÁCH KHOA NGUYỄN VIỆT HÙNG NHỮNG VẤN ĐỀ BẢO MẬT KHI TRUY VẤN CƠ SỞ DỮ LIỆU XML ĐỘNG ĐƢỢC “OUTSOURCED” CHUYÊN NGÀNH: CÔNG NGHỆ THÔNG TIN MÃ SỐ NGÀNH: 60.48.01 LUẬN VĂN THẠC SĨ – TP Hồ Chí Minh 02/2007 – Trang 2/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases CƠNG TRÌNH ĐƯỢC HOÀN THÀNH TẠI TRƯỜNG ĐẠI HỌC BÁCH KHOA ĐẠI HỌC QUỐC GIA THÀNH PHỐ HỒ CHÍ MINH Cán hướng dẫn khoa học: Tiến sĩ ĐẶNG TRẦN KHÁNH Cán chấm nhận xét 1: Tiến sĩ NGUYỄN ĐỨC CƯỜNG Cán chấm nhận xét 2: Tiến sĩ TRẦN VĂN HOÀI Luận văn thạc sĩ bảo vệ HỘI ĐỒNG CHẤM BẢO VỆ LUẬN VĂN THẠC SĨ TRƯỜNG ĐẠI HỌC BÁCH KHOA, ngày 03 tháng 02 năm 2007 SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh TRƯỜNG ĐẠI HỌC BÁCH KHOA PHÒNG ĐÀO TẠO SĐH CỘNG HÒA XÃ HỘI CHỦ NGHĨA VIỆT NAM ĐỘC LẬP – TỰ DO – HẠNH PHÚC Tp HCM, ngày tháng năm 200 NHIỆM VỤ LUẬN VĂN THẠC SĨ Họ tên học viên: Nguyễn Việt Hùng Ngày, tháng, năm sinh: 14 tháng 01 năm 1981 Chuyên ngành: Công nghệ thông tin Phái: Nam Nơi sinh: Kiên Giang MSHV: 00703170 I- TÊN ĐỀ TÀI: Các vấn đề bảo mật việc truy vấn CSDL XML động outsourced II- NHIỆM VỤ VÀ NỘI DUNG: - Tìm hiểu tổng quan vấn đề liên quan bảo mật CSDL outsourced - Tìm hiểu nghiên cứu liên quan khía cạnh Query Assurance - Đề xuất giải pháp kiểm tra query assurance cho CSDL XML outsourced - Xây dựng chương trình thực giải pháp, đo đạc đánh giá giải pháp đề III- NGÀY GIAO NHIỆM VỤ : IV- NGÀY HOÀN THÀNH NHIỆM VỤ: V- CÁN BỘ HƢỚNG DẪN: Tiến sĩ Đặng Trần Khánh CÁN BỘ HƢỚNG DẪN (Học hàm, học vị, họ tên chữ ký) CN BỘ MÔN QL CHUYÊN NGÀNH Nội dung đề cương luận văn thạc sĩ Hội đồng chun ngành thơng qua TRƢỞNG PHỊNG ĐT – SĐH Ngày tháng năm 2006 TRƢỞNG KHOA QL NGÀNH Trang 4/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases ACKNOWLEDGEMENT I would like to express my gratefulness To my mom and dad who has brought me up and done everything for my life; To my advisor, Dr DangTran Khanh, who has advised me with all his heart; To my friends who are always in my side, and especially, to my colleagues who are willing to help me complete some parts of the work SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Trang 5/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases ABSTRACT With the impressive improvement of the network technologies, database outsourcing is emerging as an important trend beside the “application-as-a-service” In this model, data owners ship their data to external service providers Service providers data management tasks and offer their clients a mechanism to manipulate outsourced database Since a service provider is not always fully trusted, security and privacy of outsourced data are important issues These problems are referred as data confidentiality, user privacy, data privacy and query assurance Among them, query assurance takes a crucial role to the success of the database outsourcing model To the best of our knowledge, however, query assurance, especially for outsourced XML database, has not been concerned reasonably in any previous work In this paper, we propose a novel index structure, Nested Merkle B+ Tree, combining the advantages of B+ tree and Merkle Hash Tree to completely deal with three issues of query assurance known as correctness, completeness and freshness in outsourced XML database Experimental results with real dataset prove the effeciency of our proposed solution SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Trang 6/93 TÓM TẮT Với phát triển vượt bậc lĩnh vực công nghệ mạng cho đời nhiều dịch vụ từ xa, đặc biệt đời dịch vụ “application as a service” Dịch vụ giúp cho người tiếp cận cách hợp pháp với phần mềm với chi phí thấp Thời gian gần đây, xuất xu cho phép làm giảm chi phí quản lý liệu qua dịch vụ gọi “database outsourcing” Với dịch vụ này, đơn vị, tổ chức lưu trữ thơng tin, liệu máy chủ nhà cung cấp dịch vụ Các nhà cung cấp dịch vụ đảm nhận cơng tác bảo trì máy chủ, bảo trì phần mềm DBMS bảo trì CSDL khách hàng Bên cạnh đó, họ cung cấp chế cho phép đơn vị, tổ chức thao tác CSDL Tuy nhiên, thông tin vốn tài sản q báu, nên đơn vị hồn tồn khơng thể tin cậy nhà cung cấp dịch vụ việc đảm bảo an tồn cho CSDL Do phát sinh yêu cầu bảo mật CSDL outsourced Các vấn đề tóm gọn bốn yêu cầu bảo mật, bao gồm: data confidentiality, data privacy, user privacy query assurance Ngoài phần giới thiệu tổng quan kết đạt lĩnh vực data outsourcing, tài liệu đưa cấu trúc mục cho liệu XML Dựa cấu trúc này, tài liệu trình bày phương pháp đảm bảo truy vấn cho CSDL XML outsourced số kết thực nghiệm thực cho phương pháp SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Trang 7/93 MỤC LỤC ACKNOWLEDGEMENT ABSTRACT Chƣơng GIỚI THIỆU 1.1 Data Confidentiality 12 1.2 User Privacy Data Privacy 13 1.3 Query Assurance 17 1.4 Nhận xét 19 Chƣơng CÁC NGHIÊN CỨU LIÊN QUAN 22 2.1 Khái niệm 22 2.2 Hướng tiếp cận dùng chữ ký điện tử 23 2.3 Hướng tiếp cận sử dụng cấu trúc liệu đặc biệt 25 2.4 Hướng tiếp cận Challenge – Response 28 2.5 Hướng tiếp cận dựa vào đặc thù toán 30 2.6 Bảo đảm truy vấn cho liệu dạng 31 2.7 Nhận xét 33 Chƣơng DỮ LIỆU XML 35 3.1 Mơ hình lưu trữ 35 3.2 Chỉ mục cho tài liệu XML 40 Chƣơng ĐẢM BẢO TRUY VẤN 42 4.1 Phương pháp 42 4.2 Nested B+ Tree 43 4.3 Tác vụ chọn 46 4.4 Các tác vụ cập nhật liệu 49 Chƣơng PHÂN TÍCH 51 Chƣơng THỰC NGHIỆM 58 Chƣơng KẾT LUẬN 63 Chƣơng PHỤ LỤC 67 8.1 Cấu trúc lưu trữ XML 67 8.2 Giải thuật gán nhãn (labeling) 67 8.3 Chương trình thử nghiệm 68 8.4 Lược đồ tài liệu mondial.xml 71 8.5 Kế hoạch thực thi truy vấn 72 8.6 Tóm lược nghiên cứu liên quan 73 8.7 Bài báo liên quan 83 SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Trang 8/93 Chương GIỚI THIỆU Thông tin nguồn tài nguyên quan trọng tổ chức Quản lý xử lý thông tin hiệu tập trung quan tâm người Với đời máy tính điện tử (eclectronic computer) máy tính cá nhân (personal computer – PC), ngành khoa học máy tính mang đến kỷ nguyên mới, kỷ nguyên thông tin, tác động mạnh mẽ đến lĩnh vực đời sống Dữ liệu lưu trữ thành các sở liệu (CSDL), thông thường, đặt nội tổ chức (in-house database) Điều đòi hỏi tổ chức phải đầu tư khoản chi phí cho việc quản lý hệ thống CSDL, bao gồm: thiết bị phần cứng (máy móc, hệ thống mạng), phần mềm (hệ quản trị CSDL – DBMS, chương trình ứng dụng cụ thể,…), nhân (nhân viên quản trị mạng, nhân viên quản trị CSDL,…) Cùng với phát triển xã hội nói chung tổ chức nói riêng, nhu cầu lưu trữ xử lý ngày gia tăng phức tạp Những yêu cầu làm tăng tổng chi phí quản lý Mặc dù, giá thành phần cứng giảm nhiều, chi phí quyền phần mềm, chi phí cho đội ngũ nhân viên quản trị có trình độ cao để quản lý hệ thống thông tin ngày phức tạp thật vấn đề đáng quan tâm tổng chi phí sở hữu (total cost of ownership) tổ chức Điều đặc biệt quan trọng tổ chức vừa nhỏ, tổ chức phi lợi nhuận,… Trong năm gần đây, tiến vượt bậc công nghệ mạng truyền thông cho đời hệ thống mạng tốc độ cao, băng thông rộng, khai sinh khái niệm “application as a service” Người dùng cần phải trả khoản phí nhỏ cho nhà cung cấp dịch vụ sử dụng phần mềm mà không cần phải quan tâm đến chi phí quyền, chi phí cài đặt bảo trì hệ thống SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Trang 9/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Bên cạnh đó, dịch vụ khác dần hình thành, “database as a service”, cung cấp cho người dùng nơi lưu trữ truy xuất liệu với chi phí thấp, mà khơng cần phải mua sắm thiết bị, địi hỏi phải có đội ngũ chun trách Điều giúp giảm đáng kể chi phí quản lý thông tin cho tổ chức Outsourced Databases Data Owner Services Provider Submit Queries Clients Result Data Hình 1.1 Mơ hình “Database as a Service” Trong mơ hình “database as a service”, người sở hữu liệu (data owner – DO) đặt CSDL nhà cung cấp dịch vụ (service provider – SP) cho khách hàng (clients, queriers – C, Q) thực tác vụ CSDL select, insert update Mơ hình cịn gọi “outsourced database services” (ODBS) Thông tin tài sản quan trọng tổ chức Việc đặt CSDL lưu trữ thông tin nơi không tin cậy bên tổ chức (nhà cung cấp dịch vụ) làm nảy sinh vấn đề bảo mật Chính vấn đề định tính khả thi Dịch vụ CSDL outsource (outsourced database services – ODBS) Các CSDL outsourced phải đảm bảo an toàn, ngăn cấm truy cập tổ chức/cá nhân thẩm quyền, kể nhà cung cấp dịch vụ Khi đó, nhà cung cấp dịch vụ trở thành đối tượng nguy hiểm việc đảm bảo bảo mật liệu Do xâm nhập từ bên ngoài, cao nhất, đạt khả truy cập hệ thống nhà cung cấp dịch vụ Vì nghiên cứu chủ yếu tập trung vào việc ngăn chặn hành vi xâm nhập nhà cung cấp dịch vụ (service provider – SP) Về mặt bản, vấn đề bảo mật CSDL SP chia thành bốn lĩnh vực sau [1]: SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Trang 10/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Data confidentiality tính nội liệu Chủ sở hữu liệu (data owner – DO) không muốn người khác khơng có thẩm quyền có khả truy cập CSDL mình, kể SP User privacy tính riêng tư người dùng Thơng tin hàng hóa Do bán cho công ty khác Các công ty khách hàng không muốn để lộ thông tin mà họ khai thác, kể DO SP Data privacy tính bảo mật liệu DO khơng muốn khách hàng khai thác nhiều thông tin mà họ phép khai thác Query Assurance tính bảo đảm truy vấn Khách hàng (Client) phải đảm bảo liệu mà nhận xác, đầy đủ từ CSDL nguyên thủy DO cung cấp, mà không bị thay đổi ý muốn Bảng 1.1 Các vấn đề bảo mật ODBS Song song với việc đảm bảo yêu cầu bảo mật, ta cần phải quan tâm đến hiệu thực truy vấn (performance) khả mở rộng CSDL (scalability, usability) Để đảm bảo data confidentiality, liệu mã hóa trước “outsourced” Tuy nhiên điều làm tăng tính phức tạp việc xử lý truy vấn liệu mã hóa mà phải đảm bảo yêu cầu bảo mật khác SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Trang 79/93 Out Source Database Data Confidentiality Thực thi câu query liệu mã hóa: Hacigümüş Paper: #5 Nội dung: - Tách câu query thành phần (1 thực thi server, xử lý client) - Bổ sung thêm số thông tin cho report Paper: #5 Các lổ hổng bảo mật Paper: #6 Khơng thỏa mãn tính semantically secure Paper: #6 Attrib: Cons Hàm băm khơng đảm bảo an tồn miền giá trị rời rạc nhỏ Paper: #6 Attrib: Cons Khơng đảm bảo tính xác kết trả câu query Paper: #6 Attrib: Cons Chưa che dấu câu query cách hồn thiện: server biết loại câu query select, insert hay update Paper: #6 Attrib: Cons Thực mã hóa theo dịng, đó, người dùng biết tất thuộc tính cịn lại dịng cho dù họ phép biết thuộc tính Paper: #6 Attrib: Cons Tìm kiếm liệu mã hóa XML Paper: #9 SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Trang 80/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Out Source Database Data Confidentiality Thực thi câu query liệu mã hóa: Hacigümüş Paper: #5 Tìm kiếm liệu mã hóa XML Paper: #9 Chiến lược tìm kiếm tuyến tính Paper: #9 Chiến lược tìm kiếm Paper: #9 Chỉ áp dụng so sánh tag XML Paper: #9 Cải tiến giải thuật tuyến tính ứng dụng chuyên cho XML Paper: #9 SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Trang 81/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Out Source Database Authentication and Integrity Query authentication and integrity Đề xuất Einar Mykletun Paper: #7 Chỉ xem xét cho mô hình unifiied client Paper: #7 Do mịn xác thực: mức record Sử dụng chữ ký điện tự để đảm báo integrity Paper: #7 Các Overhead factors: Mức độ tính tốn querier Dung lượng đường truyền querier Mức độ tính tốn server Mức độ tính tốn data owner Khơng gian lưu trữ server Paper: #7 Có thể sử dụng cho mơ hình multi clients Paper: #7 Khơng đảm bảo DP kiểm tra mức dòng Attrib: Cons Sử dụng Merkle Hash Trees Paper: #7 Là công cụ hữu ích để thực ODB integrity Tăng chi phí (4) có update/delete Chi phí (2) cao dùng Condensed-RSA Completeness SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Trang 82/93 Out Source Database Authentication and Integrity Query authentication and integrity Completeness Radu Sion mở rộng ringer để có đạt tính completeness Paper: #8 Chỉ sử dụng cho trường hợp batch-query Paper: #8 Sử dụng fake chanlege token để tăng tính bảo mật Paper: #8 Chỉ khảo sát mơ hình Unified Querier Paper: #8 Chưa xem xét rõ trường hợp update Paper: #8 SV: Nguyễn Việt Hùng HD: TS Đặng Trần Khánh Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases 8.7 Trang 83/93 Bài báo liên quan Query Assurance Verification for Dynamic Outsourced XML Databases Viet Hung Nguyen, Tran Khanh Dang, Nguyen Thanh Son Faculty of Computer Science and Engineering University of Technology, Ho Chi Minh City, Vietnam {hungnv, dtkhanh, sonsys}@cse.hcmut.edu.vn Abstract With rapid developments of the network technologies, database outsourcing is emerging as an important new trend beside the “application-as-a-service” In this model, data owners ship their data to external service providers Service providers data management tasks and offer their clients a mechanism to manipulate outsourced databases Since a service provider is not always fully trusted, security and privacy of outsourced data are significant issues These problems are referred to as data confidentiality, user privacy, data privacy and query assurance Among them, query assurance takes a crucial role to the success of the database outsourcing model To the best of our knowledge, however, query assurance, especially for outsourced XML databases, has not been concerned reasonably in any previous work In this paper, we propose a novel index structure, Nested Merkle B+-Tree, combining the advantages of B+-tree and Merkle Hash Tree to completely deal with three issues of query assurance known as correctness, completeness and freshness in dynamic outsourced XML databases Experimental results with real datasets prove the effeciency of our proposed solution Introduction With the impressive improvement of the network technologies, database outsourcing is emerging as an important trend beside the “application-as-a-service” In the outsourced database service model (ODBS), data owners ship their data to external service providers Service providers data management tasks and offer their clients a mechanism to manipulate outsourced database This helps organizations cut down the cost of hardware, software investment and maintainance Specially, this model gives organizations a cost-effective means of employing outside professional staff that is increasingly expensive Thank to the ongoing SV: Nguyễn Việt Hùng Josef Küng FAW Institute Johannes Kepler University of Linz, Austria jkueng@faw.uni-linz.ac.at development of the chip industry, computer hardware is getting cheaper and more powerful than before Nevertheless, we easily realize that that cost of software and experienced staff are increasing rapidly By outsourcing, organizations could take the benefits of new software and have their system maintained, upgraded professionally with a feasible price In the digital age, information is a valuable asset Since a service provider is not always fully trusted, organizations need to be guaranteed that their sensitive data is protected This kind of security requirement is different from that of traditional in-house databases The question is “how is the client‟s data protected against sophisticated attackers?” [1] Attackers here mean both intruders and server operators Because a server operator has rights to execute all database operations, traditional security barriers are useless with these malicious insiders This big challenge of the model is the source of numerous researches in the community These problems are stated as data confidentiality, user privacy, data privacy and query assurance [1, 16] - Data confidentiality: data owners not want anyone, except valid clients, to be able to see their outsourced data, even server operators - User privacy: server and even data owners should not know about the clients‟ queries and returned results - Data privacy: clients are not allowed to get more information than what they are querying from the server - Query assurance: server has to prove that the returned results are original, complete and up-todate Figure [1] brings out the meaning of some words in the previous definition In ODBS model, data owners store their data at a server managed by an external company called service provider Data owners manipulate their data through a secure communication Clients, who pay HD: TS Đặng Trần Khánh Trang 84/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases for data access permission, if needed, could request data directly from service providers Naturally, data owners themselves are clients To ensure data confidentiality, data is encrypted before outsourcing Many encryption algorithms could be employed These algorithms often fall into symmetric (DES, TripleDES, Rijndael) or asymmetric (RSA) group Commonly, symmetrical algorithms are prefered because of their balance between security and performance Clients Secure Communication External Servers (Outsourced Database Service Provider) Store/Retrieve Data Data Owners Service Providers Pay for data riev Ret ed ata Clients Figure ODBS models The next two requirements, user privacy and data privacy, are well done in many researches for both relational and tree-structured data A detailed discussion of these two issues are out of the scope of this paper More information can be found in [1, 2, 3, 4, 5, 6, 22, 23] In this paper, we focus on the last one, query assurance, for outsourcing XML data In outsourcing scenario, clients not fully trust servers Data owners have to give clients the ability to authenticate the results they received from servers In that respect, query authentication has three important issues: correctness, completeness and freshness Correctness means clients must be able to validate returned records to ensure that they have not been tampered with Completeness guarantees that there is no matched record excluded from the answers Finally, freshness shows that the answers are based on the latest version of the dynamic outsourced database Previous work often assumes that the database is readonly and the results are always fresh Hence, only correctness and completeness are considered In addition, most of these research activities have been done theoretically [7, 11, 14] Recently, in [15, 16], the authors mentioned about freshness, and carried out the evaluations with real datasets However, none of them says about query assurance for outsourced XML database Our contributions In this work, we propose a novel authenticated structure indexing all elements of a XML document, which includes both xml elements and xml attributes Basing on this structure, we introduce a solution to the three mentioned issues for the outsourced XML database SV: Nguyễn Việt Hùng The rest of the paper is organized as follows Section gives some background information Section briefly reviews related work Section discusses XML data storage format at server Section is about our novel structure and how to use this structure to support query assurance Section presents analysis and evaluation of the proposed solution Finally, section concludes the paper Preliminaries Basically, the idea of existing solutions mainly falls into two approaches: - The first approach relies totally upon digital signature scheme By signing on each record of each relation, clients can verify the correctness of the data Signature could be embedded with other information to give a proof of completeness, this kind of information will be discussed later However, no research in this approach has mentioned about freshness - The second approach depends on a complex structure, also known as Authenticated Data Structure (AuthDS) Servers use this structure to build a verification object VO, then pass it along with the answers After that, clients use the VO to verify the returned results Besides, there are some researches in this area not belong to any approaches above A brief discussion will be presented in the next section 2.1 Concept Collision-free hash function [15] For our purpose, a hash function H is an efficiently computable function that takes a variable-length input x and returns a fixed-length output y = H(x) Collision-free states that it is unable to find two inputs x1 ≠ x2, such that H(x1) = H(x2), in a computational time Public-key digital signature schemes [15] A public-key signature is a tool for authenticating the integrity and ownership of signed messages Digital signature scheme is the combination of asymmetric encryption and collision-free hash function Hashing the message, senders have its digest, and then apply an asymmetric encryption on that digest to produce the signature Signature is sent along with the original message When receivers receive the message, they hash the message for the digest, and then verify the accompanied signature If the verification is successful, receivers could be assured that what they received is originated from the sender The most popular digital signature scheme is RSA Aggregating digital signature schemes [15] The cost of digital signature computation and verification is quite expensive, especially, when receivers have to deal with HD: TS Đặng Trần Khánh Trang 85/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases thousands of signatures A new technique has been developed to combine these signatures into a single aggregated one Thereby, instead of sending thousands of signatures and making corresponding thousands of verifications, a condensed signature is employed and a single verification is done to ensure correctness of all the messages The Condensed-RSA and the BGLS aggregated signature are two popular samples of this scheme [10, 21] The Merkle Hash Tree [15] The Merkle hash tree (MHT) (see figure 2), first proposed by R.C Merkle, is one of the most popular authentic data structure It is a binary tree, where each leaf contains the hash of the data value, and each internal node contains the hash of its two children The hash function, in this situation, is usually collision-free The verification is done based on the fact that the hash value of the root is authentically published (authenticity can be established by a digital signature) Beside the returned value, the sender also attaches several additional hash values of some nodes so that receiver could reconstruct hash value of the root If this value equals the published value, the data is authenticated By this mechanism, the Merkle hash tree provides an authenticity for simple point queries root = h(h12||h34) h12 = h(h1||h2) h1 h34 = h(h3||h4) h2 h3 h4 Figure Example of the Merkle hash tree Let‟s see an illustrative example in figure [15] The answer here is {5}, then the server returns additional hash values {h1, h34} to clients to verify correctness 2.2 Methodology Correctness Correctness means data has not been tampered with as compared to the origin Digital signature is the most favourite solution If the data is signed by owners‟ private keys and their public keys are well known, clients could easily verify that returned results are truely from the owner and have not been altered Completeness There are a lot of methods to give proof of completeness Most of the researches concentrate on range query and point query In general, point query is the query that returns records whose attributes equal the conditional value and range query returns records whose attributes are within two given boundary values Because SV: Nguyễn Việt Hùng point query is a specific case of range query, if we achieve completeness for range query, then we have it for point query in the same way Now, suppose that a range query Q on relation R returns a set of records S, we have: S = {R | R R, R.x ≥ LB, R.x ≤ UB} (1) In which, LB and UB are two given boundary values and R is a tuple in relation R To give a proof, the server includes two additional tuples of relation R in result set as follows Sb = {RL | Ri R , RL.x = max(Ri.x), Ri.x < LB, i} {RU | Rj R , RU.x = min(Rj.x), Rj.x > UB, j} (2) If this attribute orders the relation R and RL, RU are proven to be satisfied formula (2), there is no record falling into RL and min(S), RU and max(S); and so the completeness proof is achieved Freshness Outsourced data is embedded with unique time information called timestamp [15, 16] Timestamp is widely published to all clients When data owners change the data, they update the timestamp and announce it again Freshness is achieved if clients could extract timestamp from the answers and this value equals the published value Related Work There are several researches in this area, and we concisely summarize major work here As presented in section 2.2, to achieve correctness, each tuple of a relation is signed by a private-key with a digital signature algorithm [7, 10, 16, 21] The signature is stored along with the tuple Returned tuple itself contains a signature; clients recompute the hash and then verify the signature to ensure correctness Another interest is the granularity that relates to which level of relation should be signed There are three levels: whole relation, tuple or attribute [7] Signing the whole table requires the server to return all tuples in the answers, which is unfeasible If signature is at row-level, the answers will include all attributes of the rows so the data privacy is violated In case of column-level, the computation and storage cost at both server and clients are overhead due to expensive largeinteger calculations for a signature However, by employing aggregated signature technique, several signatures are combined into the only one passed to clients Clients a single verification for all returned data, column-level granularity could be taken into account A different approach uses an extra data structure called authenticated data structure (AuthDS), proposed in [11, 14, 15], by extending the idea of MHT This approach provides both correctness and completeness [11, 14, 15] as well as freshness [15] In this scheme, data is sorted at leaves of a tree Leaves also contain hash of data; internal nodes contain hash of their children that calculated by HD: TS Đặng Trần Khánh Trang 86/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases hashing the combination of children‟s hashes These calculations ensure the order of data in the sorted list Therefore, as mentioned in section 2.2, they could prove both correctness and completeness Additionally, the root contains a well-known timestamp value [15] Clients will extract the timestamp from the answers; compare it with this value to know how much up-to-date the answers are AuthDS is usually large and complex It takes an extra cost for storing and maintaining these structures In [10, 21], the authors introduced a new signature-based method called Digital signature aggregation and chaining (DSAC), which gives a concise proof of correctness and completeness with a smaller-impossible storage cost The main idea of DSAC is also the same as that of AuthDS, which was detailed in section 2.2 R6 R5 R7 A1 R2 R5 R6 A2 R7 R5 R12 A3 Figure Signature Chain The relation is sorted by all searchable attributes to have ordered lists Each tuple has some neighbors (left and right) in lists For example, in figure [10], left neighbors of R5 are R6, R2, R7 and right ones are R7, R6, R12 in three searchable dimensions based on attribute A1, A2 and A3 respectively We call left neighbors immediate predecessor The signature of each tuple contains hashes of its entire immediate predecessor (IPR) Sign(r) = h(h(r)||h(IPR1(r))|| … h(IPRl(r)))SK (3) Return to our example, we has signature of R5 computed as: Sign(R5) = h(h(R5)||h(R6)||h(R2)||h(R7))SK With signatures chained in the above fashion, the server answers a range query by releasing all matching tuples; the boundary tuples which are just beyond the query range (to provide a proof of completeness) as well as the aggregated signature corresponding to the result set The signature chain proves to the querier that the server has indeed returned all tuples in the query range Specifically, the querier verifies that the values in the boundary tuples are just beyond the range posed in the query At the same time, the querier verifies that there are no other tuple between the boundary tuples and the satisfied ones This is because boundary values are linked to the first and the last tuple Therefore, the querier obtains a concise proof of completeness [10, 21] The above paragraphes discuss two main approaches in query authenticating: the signature based approach and the AuthDS approach However, these two approaches concentrate only on range queries and not address aggregated ones Moreover both of them are queryunderstanding that means these solutions have to analyze the query syntax, requiring some extra information SV: Nguyễn Việt Hùng (signature or AuthDS) for the proof, and some specific tasks in query processing process This causes a high cost with respect to complex query processing To overcome these problems, a new approach extending the ringer protocol of distributed computing has been developed [8] In this approach, query authenticating are proven for all types of query, regardless their forms However, the probability of successfully cheating at servers is too high, approximate 33% [8] Xml Databases An XML database consists of semi-structured tree data in a human readable text format Nowadays, XML is a defactor standard for information interchange because of its platform-independent and extensible characteristics The XML databases become more famous and widely used over the internet Therefore, the need of outsourcing XML data is quite natural Some research work done recently could be applied to outsourced XML databases in order to obtain privacy and confidentiality [2, 4, 9] None of them, however, is about query authenticating for XML database although there are a lot of efforts for the relational database To cope with security problems of outsourced XML databases, we could create an algorithm to transform an XML database to a traditional relational database in which security problems are quite well-solved Alternatively, we need a special method dealing with security issues in outsourced XML databases Hence we should first answer the question “how we store XML data?” The answer strongly influences to the way we deal with this problem Table based Each XML document has a schema, we call it schema tree that defines the relation between nodes and their attributes (parent-child relationship) A schema tree consists of two node types: element node and attribute node Hereafter, we name element node as tnode and attribute node as a-node With the schema tree, we could easily change an XML document into relational tables as follows - Label all t-nodes with integers incrementally so that their labels are unique in the whole XML document - For each t-node, create a table named by suffixing t-node‟s name with its label The table should have one more column, called ID, that is its primary key - If a t-node has a parent, we append a column named PNodeID to point to its parent‟s corresponding table - Finally, for each child node a-node, append to the table a column named as the a-node‟s name HD: TS Đặng Trần Khánh Trang 87/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Figure Tranform an XML document into tables Figure illustrates a sample From a given XML document (A), we extract its schema tree (B) After labeling, we transform it into relational tables (C) Once an XML document is transformed into traditional tables, we could apply prior work to achieve query authenticating However, an XML database schema is changed easily That causes the table schema to change also, and sometime forces the data to be re-outsourced Node based XML document is structured as a tree with t-nodes and a-nodes We could save t-nodes and a-nodes as tuples in a single table Each node has enough information to reconstruct the XML text We propose the following schemas for t-node and a-node t-node(nodeid, xtype, datatype, nameid, pnodeid, lmaid, value) a-node(nodeid, xtype, datatype, nameid, pnodeid, sibid, value) Where: - Xtype is used to distinguish t-node and a-node among the records - Datatype determines type of the value (text, numeric, date time, etc.) - NameID is the node‟s unique identity (using the labeling in the same manner of table based) - PnodeID refers to parent node‟s tuple - lmaID refers to the left-most attribute of the node - sibID refers to the right sibling of the attribute - Value is the value of the node/attribute For security reasons, this information is serialized into an encrypted binary string before outsourcing Storing XML documents under such a node-based format conforms to tree-structured data Changing an XML document schema does not dramatically affect the outsourced data because it just appends the modified node to the database However, each element is stored by a lot of records for the tag name and attributes Thus outsourced database storage cost may become overhead Furthermore, it‟s difficult to utilize the existing indices This problem extremely affects the overall performance Query Assurance SV: Nguyễn Việt Hùng An important factor for a feasible solution to query assurance of outsourced XML databases depends on index structures, which are employed to manage the storage and retrieval of the data By embedding extra information into this structure, we could achieve query assurance Most related work about XML indexing could be found in [17, 18] However, these structures are not suitable for query authenticity purpose because we expect an ordered list of all XML elements (both xml-tags and xml-attributes) for giving completeness proof (cf section 2.2) Additionally, because of the nature of XML query languages (e.g, XPath, XQuery), most of the queries require join-operations that take an expensive cost for proving the query completeness A simple way of minimizing these costs is to sort XML elements by their parent As mentioned in section 2.2, in order to obtain an effective proof for the completeness, we want to sort all XML elements by two criteria: (path, value) and (path, parent, value), where path is the path from the tree‟s root to a given node However, existing data structure could not help this In the next sections, we will introduce a novel index structure, called Nested Merkle B +-Tree, to facilitate the storage and retrieval as well as to ensure the query correctness, completeness and freshness for outsourced XML databases 5.1 Nested B+-Tree As mentioned above, we desire an index structure that can sort all elements by their corresponding combinations (path, value) and (path, parent, value); where path is the path from the root to a given node Here, we list out all possible paths in the XML document, then we associate each path with an unique integer We call this number nameid To construct our desired structure, we first employ a B+-Tree structure with the search key is nameid (equivalent to path) We name it as NameTree At each leaves‟ entry of NameTree, instead of record‟s links, we store two references to two new B+-Trees having value and (parent, value) as their search keys, respectively We call these trees ValueTree and ParentTree (see figure 4) At all leaves of ValueTree and ParentTree, we store links to a-node or t-node records (hereafter, refered to as data records) Note, if many data records have the same key then they are in a same entry The B+-Tree can solve this situation by employing an additional structure called bucket A bucket stores references to all same data records Finally, an entry of the leaf points to this bucket Combining these trees, we have our desired structure named Nested B+-Tree (NBT) Let us see how this structure fulfills our needs With NameTree, we have data records sorted by nameid (or path) ValueTree sorts data records by value Because ValueTree is at leaf of NameTree, we have data records sorted by (nameid, value) In the same manner, data HD: TS Đặng Trần Khánh Trang 88/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases (n am eid ) records are sorted by (nameid, parent, value) with ParentTree (p (v no de alu id, e) va lue ) NameTree ParentTree ValueTree Figure Nested B+-Tree 5.2 Nested Merkle B+-Tree Basing on the idea of the Mekle Hash Tree, we attach some information to NBT for giving proof of query assurance Each node of NBT has some extra hashes of its children The hash values are calculated as follows a-node record : Ha-node = h(nodeid||xtype||…||value) (4) t-node record : Ht-node = h(h(nodeid||xtype||…||value)|| i Hattr) (5) Leaf of ValueTree and ParentTree: HL = h( i Hdata-record) (6) Internal node: HI = h( i Hchild-node) (7) Leaf node of NameTree: HL-N = h(Hvtree || Hptree) (8) Root node of NameTree: HR = h( || i Hchild-node) (9) Where, Hattr is hash value of an a-node of a given t-node Hdata-record is either Ha-node or Ht-node that associated to the link Hchild-node is one of HL, HI or HL-N Hptree and Hvtree are hash values of the roots of corresponding ParentTree and ValueTree denotes a timestamp value and h() is a oneway non-invertible hash function (e.g, SHA-1, MD5) Additionally, we sign the root of NameTree by privatekey of a digital signature scheme (such as RSA, DSA) The corresponding public key and timestamp are broadcasted to all clients 5.3 Providing query assurance The method of achieving query assurance is the same as that described in section 2.2 Data records are sorted in leaves of ValueTree and ParentTree Let us see how a query is authenticated with this structure SV: Nguyễn Việt Hùng Correctness and Completeness Assume that the result set consists of leaf entries in ValueTree that contain links to records fallen into a given range (range query) As mentioned in section 2.2, two additional boundary records are also included in the result set These entries occupy some adjacent leaves, say {Li, Li+1,…, Lj} The server returns data records and hash values of not-in-result entries in Li and Lj so that clients could recompute hash values of these leaves Similarly, the series {Li, Li+1,…, Lj} occupies entries in some internal nodes, say {Ix, Ix+1,…, Iz} In addition, hash values of the remained entries in internal nodes are returned Recursively, the server returns the root with its signature and timestamp These not-in-result entries (in both leaves and internal nodes) are called co-path Actual results and co-path are packaged in a structure called VO (verification object) and sent to clients The clients will recalculate hash value of the root, then verify it with the signature to assure both correctness and completeness of the result set More detail of co-path and VO could be found in [11, 14] Freshness Along with the result set, the server returns timestamp value of the root After verifying the root‟s signature clients compare returned timestamp to the well-known timestamp broadcasted by the data owner before If two values are equal, clients are guaranteed about freshness of the result 5.4 Select operation We have seen how to use NMB+-Tree to achieve query authenticity for the simplest situation that returns a single and contiguous record set It is just a single step among many ones that have to be done To answer a query, the server has to perform some other tasks in a specific order called execution plan Next, let‟s see some examples of XPath to find out a general method for query processing Customer Name Address State Order City Item Employee Name Code 11 12 Amount Name 13 Price 14 Date 10 Qty 15 Figure An example of labeled XML schema tree As illustrated in figure 6, the round rectangular nodes stand for elements and the sharp corner ones are attributes The numbers next to node are their nameid we mentioned above Example The first question is “Find out all sold items named „TV‟?” The corresponding query in XPath should be expressed like /Customer/Order/Item[@name= HD: TS Đặng Trần Khánh Trang 89/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases ”TV”] The server scans NameTree with nameid = 13 to get the ValueTree Then, the server scans on the found ValueTree with value = „TV‟ to list out all satisfied attributes name_13 and builds a VO for authenticity With the pnodeid field in each found name_13, the server reads and appends these Item_8 into the VO Because there is only one Item_8 for each name_13, the server needs not to provide any information to prove query assurance For each Item_8, the server returns two remain attributes These steps could be rewritten in a semi-structured language as follows STEP#1 IndexMethod : Vtree, nameID=13 STEP#3 IndexMethod :Ptree,nameID=3, pid=ParentStepValue Result level : Retrieval : node only StepValue : ID [For : not included Retrieval : node only StepValue : PNODEID [For each matched item, perform] STEP#2 IndexMethod :DirectIDAccess, id=ParentStepValue Result level : Retrieval : node and all its attributes Example The second example deals with a question “List out all items bought by Marry?” and its query is /Customer[@name=”Marry”]/Order/Item The server does some similar jobs to obtain satisfied Customer_1 records For each Customer_1, the server scans on ParentTree3 with pnodeID equals id of Customer_1 The ParentTree3 is the ParentTree located in the leaf entry that nameid = of the NameTree A VO is built to give proof of query assurance for these records In the same manner, the server returns expected Item_8 records and their VOs The execution plan is described as follows STEP#1 IndexMethod : Vtree, nameID=4 Condition : equal to [Marry] Result level : not included Retrieval : node only StepValue : PNODEID [For each matched item, perform] STEP#2 IndexMethod :DirectIDAccess, value=ParentStepValue Result level : not included Retrieval : node only StepValue : ID [For each matched item, perform] SV: Nguyễn Việt Hùng each matched items, perform] STEP#4 IndexMethod :Ptree,nameID=8, pid=ParentStepValue Condition : equal to [TV] Result level not included Result level : Retrieval : node and all its attributes Unifying VOs For each matched Customer_1 record, the server returns an Order_3 record set that have pnodeid point to the Customer_1 record and, similarly, an Item_8 record set for each found Order_3 Therefore, the server returns a lot of record sets and each of them requires a VO Thus, the number of VOs is linear with the product of matched records in Customer_1 and Order_3 This could dominate communication and computation cost at clients Hence, instead of building multiple VOs, the server only builds a single VO for all returned record sets by making a slight modification on the co-path generating algorithm Thus, the clients only have to carry out one verification for all returned record sets The above shows the way we deal with simple XPath queries However, by applying this idea, we could answer more complex questions, but we not detail it here due to space limitation 5.5 Update operations Since the database could be changed even if it has been outsourced, feasible solutions should have an acceptable cost for update operations Update operations refer to insertion, update, and deletion Data owners could make changes to a local copy of the database then outsource it again Here, we refer to another scheme that data owners not have any local copy and/or re-outsourcing is impossible The most important issue of an update operation is to recalculate embeded security information In this scheme, we should recalculate the data nodes relevant and their associated index nodes The basic idea for this protocol is shown in figure In general, an update could be treated as a series of delete and insert operations We summarize them as follows HD: TS Đặng Trần Khánh Trang 90/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases Figure Protocol to perform update operation Insertion To insert a new element into database, the owner first serializes it into t-node and a-node Thesenodes are sent to server to insert into the database and update the index structure This will change the hash value of a leaf, then the change is propagated to its ancestors (up to the root) The server recalculates these hash values and returns the new hash of the root to the data owner The data owner generates a new timestamp value , combines it with the hash value then signs on it with the private key The signature and are sent to the server to update the root Finally, the owner announces the new timestamp to all clients If many nodes are inserted at the same time, batch operation could be considered to minimize the root resigning round Deletion Performing a delete is quite similar to an insert operation The only difference is the first round of the protocol Instead of inserting, server has to locate deleted node in the index tree then removes it from the index and database The remains are the same as that of the insertion process Analysis and Experiments Until now, no research work about query assurance of outsourced XML databases has been carried out Hence, we will theoretically analyze our proposed solution with respect to storage cost and VO size The table summarizes the notions used in this section n Total number of data item (element and attribute) s Number item of result set f fanout parameter NMB+-Tree T h min|max Min/max height of a tree T LTmin|max Min/max number of leaves in a tree T NTmin|max Min/max of total nodes in a tree T |sign| Size of a hash value (20 bytes for SHA-1) encrypted database Moreover, we use a sample 69,846 items dataset [19], representing about world geographic database integrated from the CIA World Factbook, for all the tests Storage cost Storage cost here is refered to as the cost for additional security information In our scheme, this is the cost of storing the NMB+-Tree The NMB+-Tree consists of one NameTree, several ValueTrees and ParentTrees Because the number of nameid is small compared to that of elements, the storage cost for ValueTrees and ParentTrees dominates the overall storage cost We, however, could not formulate the number of ValueTrees and ParentTrees in a database because it depends on the XML document structure Therefore, we suppose that only one type of node in the XML document, hence there is only one nameid in the whole database This means we have a one-element NameTree, one ValueTree and one ParentTree Thus the storage cost is cost of storing n-elements ValueTree and ParentTree In addition, we assume that these nodes are distinguished by their values We could easily find that the minimal and maximal number of leaves is calculated as formula (10) Here we can get the minimum when all leaves are full (f entries) and the maximum when all leaves are half-full (f/2 entries) n , LVTree max f Then we have: LVTree VTree hmin log f LVTree VTree hmax log f LVTree max Table Formula notions The experiments are conducted on a P4-2.8GHz Windows XP SP-2 with 512MB of memory Benchmarking program is written in VB.NET 2005, NET Framework 2.0 We employ the managed Rijndael and RSA-1024 for data encryption and digital signature We use Microsoft SQL Server 2005 Express Edition for storing the SV: Nguyễn Việt Hùng n f (10) (11) (12) 1 N mVTree in f Lm in hm in 1 f HD: TS Đặng Trần Khánh (13) Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases N VTree m ax Lm ax hm ax f f (14) Hence, the overall storage cost is calculated as follows (note that, proofs of formulas (11) to (14) are shown in appendix) storage Cmin N NTree VTree N PTree N storage Cmax N NTree VTree N max PTree N max (15) In fact, there are many elements that may have the same (nameid, value) or (nameid, parentID, value) These elements will occupy only one entry in the index tree So , before applying these formulas, we should determine a value n‟ that is the number of distinct elements by mentioned conditions Then, we replace any occurance of n in formular (10) by n‟ The actual storage cost is achieved by counting number of index nodes in the database Figure and table show detailed information about the storage cost for our sample dataset Storage cost node 12K 9.885 10K 8K 6.894 5.264 6K 3.518 4K 2K 8.703 7.920 1.821 0K 10K 20K 30K Min nodes 40K 50K Actual nodes 60K 70K size Max nodes Figure Number of actual nodes compared to min-, max-one VO size Similarly, we only concern about extra information returned to clients to give query authenticity The formular (16) is used to calculated the size of a single VO CVO =|sign| CVOi , i = 1,hNMBTree Number of nodes at a depth s f Lh/f h Lh h-1 … Lh-1 = … Lh-i = Lh-i+1/f h-i (16) Additional elements in co-path CVOh = f.Lh - s + CVOh-1 = f.Lh-1 – Lh … CVOh-i = f.Lh-i – Lhi+1 Overall cost The overall cost includes I/O cost, communication cost, CPU cost at both server and client SV: Nguyễn Việt Hùng Trang 91/93 Our experiments are performed in a local environment therefore, communication cost is omitted But this is directly proportional with the VO size We simulate the overall cost by measuring query processing time since a query is submitted to server until a completed XML result text is reconstructed from the returned data To know the I/O cost, for each query, we measure number of nodes that have been loaded from the database into memory during the query processing process Figure describes six phases of the query processing process The two gray phases, Parse and Plan, are ignored in our benchmarking program To make a comparision, we import a same XML document into SQL Server and execute the same query on it Experimental results are shown in the next paragraphes Experimental results Table presents the storage cost in our tests The first line, Distinct items, is the number of value-distinguished items, and the fourth one is that of items distinguished by combinations (parentID, value) The Min Nodes and Max Nodes are theoretic nodes calculated by formula (13) and (14) In table 3, we show the execution time and I/O cost over the result size and database size In which, result size is number of items in the result set Executing time is the overall time, including time for fetching data from the database into memory (fetch data); time for building VO (build VO); time for verifying the VO at client (verify VO); and time for reconstructing XML text (generate XML) SQL Time is the executing time SQL Server used to execute the query IO Cost is number of nodes being loaded into memory during the process Moreover, table and figure 10 show the increase percentage of the four mentioned criteria Result Size, Executing Time, SQL Time and IO cost Conclusions and Future Work This work explored the problem of query assurance of query replies in outsourced database In particular, we developed a novel index structure called Nested Merkle B+-Tree by which we could completely achieve query assurance for dynamic outsourced XML data, ensuring the query correctness, completeness and freshness Our proposed solution is among the first efforts in this area We also implemented a benchmarking program for experiments and carried out evaluations with real datasets to show the efficiency of the proposed solution In the future, we will futher investigate other complex forms of XPath/XQuery, especially aggregated function ones Moreover, another approach could be taken in mind is to employ multi-dimensional access methods (MAMs) [24] for the storage and retrieval management of outsourced XML data Although MAMs bring in many advantages for the indexed data, they introduce HD: TS Đặng Trần Khánh Trang 92/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases challenging issues related to security, especially for outsourced databases Thus, further research in this Database Size Value Tree Parent Tree 10K Distinct items 20K direction will be interesting 30K 40K 50K 60K 70K 5,415 10,743 16,128 20,907 22,279 22,499 24,913 Min Nodes 603 1,196 1,794 2,325 2,477 2,501 2,770 Max Nodes 679 1,345 2,018 2,615 2,786 2,814 3,116 Distinct items 9,126 18,108 27,181 36,376 43,612 50,379 57,497 Min Nodes 1,015 2,014 3,022 4,043 4,848 5,599 6,390 Max Nodes 1,143 2,265 3,400 4,549 5,454 6,299 7,189 Actual nodes 1,821 3,518 5,264 6,894 7,920 8,703 9,885 Table Storage cost from sample dataset DataSize Result size Executing time (s) 10K 20K 30K 40K 50K 60K 70K 77 340 570 743 743 743 743 0.220952 0.5181948 0.776308 0.932495 0.946947 0.965588 0.971315 Fetch data 0.063093 0.3028116 0.553021 0.765792 0.783896 0.784914 0.788293 Build VO 0.000781 0.0021049 0.003099 0.004648 0.00549 0.005479 0.010001 Verify VO 0.154281 0.2063051 0.213926 0.153889 0.150011 0.167531 0.163361 Generate XML 0.001558 0.0031142 0.004486 0.005616 0.005662 0.005688 0.006184 0.038746 0.0878642 0.13144 0.177424 0.188258 0.194569 0.20183 108 402 685 871 873 873 885 SQL Time (s) IOCost (nodes) Table Execution time, I/O cost over result size and database size DataSize 10K 20K 30K 40K 50K 60K 70K Result(%) 341.6% 640.3% 864.9% 864.9% 864.9% 864.9% Time (%) 134.5% 251.3% 322.0% 328.6% 337.0% 339.6% SQL Time(%) 126.8% 239.2% 357.9% 385.9% 402.2% 420.9% IOCost(%) 272.2% 534.3% 706.5% 708.3% 708.3% 719.4% Table The increase of overall cost over result size and database size 1000% Parse Plan 865% 900% 865% 865% 865% 386% 402% 421% Build VO Fetch data server side 800% 700% 640% 600% 500% 400% 300% Verify VO 239% client side 200% Generate XML 127% 100% 0% Figure Six phases of query processing progress 322% 329% 337% 340% 251% 135% 0% 0% 10K 20K Result(%) SV: Nguyễn Việt Hùng 358% 342% 30K Time (%) 40K 50K IOCost(%) 60K 70K SQL Time(%) Figure 10 The increase of overall cost over result size and database size HD: TS Đặng Trần Khánh Trang 93/93 Đề tài: Security Issues in Querying Dynamic Outsourced XML Databases References [1] T.K.Dang, “Security Protocols for Outsourcing Database Services”, Information and Security: An International Journal, ProCon Ltd., Sofia, Bulgaria, 18, 85-108, 2006 [2] P Lin and K.S Candan, “Hiding Tree-Structured Data and Queries from Untrusted Data Stores”, In Proc 2nd Intl Workshop on Security In Information Systems, Porto, Portugal, April 2004 [3] T.K.Dang, “A Practical Solution to Supporting Oblivious Basic Operations on Dynamic Outsourced Search Trees”, In Proc Intl Workshop on Privacy Data Management, in conjunction with ICDE05, IEEE Computer Society, Tokyo, Japan, April 2005 [4] P.Lin and K.S Candan, “Secure and Privacy Preserving Outsourcing of Tree Structured Data”, Secure Data Management, VLDB 2004 Workshop, Toronto, Canada, August, 2004 [5] H Hacigümüş, B Iyer, C Li and S Mehrotra, “Executing SQL over Encrypted Data in the Database-ServiceProvider Model”, In Proc ACM SIGMOD Intl Conf on Management of data, February 2002 [6] K.C.K.Fong, “Potential Security Holes in Hacigümüş‟ Scheme of Executing SQL over Encrypted Data”, 2003, http://www.cs.siu.edu/~kfong/research/database.pdf [7] E.Mykletun, M.Narasimha and G.Tsudik, “Authentication and Integrity in Outsourced Databases”, In Proc ISOC Symp on Network and Distributed System Security, 2004 [8] R.Sion, “Query Executing Assurance for Outsourced Databases”, In Proc 31st VLDB Conf., Trondheim, Norway, 2005 [9] R Brinkman, L Feng, J Doumen, P.H Hartel, and W Jonker, “Efficient Tree Search in Encrypted Data”, Information System Security Journal, 13, 14-21, 2004 [10] M.Narasimha and G.Tsudik, “DSAC: An Approach to Ensure Integrity of Outsourced Databases using Signature Aggregation and Chaining”, ACM Conf on Information and Knowledge Management, November, 2005 [11] E.Mykletun, M.Narasimha and G.Tsudik, “Providing Authentication and Integrity in Outsourced Databased using Merkle Hash Tree‟s”, UCI-SCONCE Technical Report, 2003, http://sconce.ics.uci.edu/das/ MerkleODB.pdf [12] R.Sion and B.Carbunar, “Conjunctive Keyword Search on Encrypted Data with Completeness and Computational Privacy”, Cryptology ePrint Archive, Report, 2005 [13] P.Golle and I.Mironov, “Uncheatable distributed computations”, In Proc Conf on Topics in Cryptology, section 2.2, 2001 [14] P.Devanbu, M.Gertz, C.Martel and S.G.Stubblebine, “Authentic Thrid-party Data Publication”, In Proc IFIP Workshop on Database Security, 101–112, 2000 [15] F.Li, M.Hadjieleftheriou, SV: Nguyễn Việt Hùng G.Kollios and “Dynamic Authenticated Index Structures for Outsourced Databases”, SIGMOD 2006, Chicago, USA, June, 2006 [16] T.K.Dang and N.T.Son, “Providing Query Assurance for Outsourced Tree-Indexed Data”, In Proc Intl Conf on High Performance Scientific Computing, Hanoi, Vietnam, March, 2006 [17] H.Wang, S.Park, W.Fan and P.S.Yu, “ViST: A Dynamic Index Method for Querying XML Data by Tree Structures”, SIGMOD 2003, San Diego, CA, June, 2003 [18] T.Shimizu and M.Yoshikawa, “An XML Index on B+ Tree for Content and Structural Search”, 2005, http://dl.itc.nagoya-u.ac.jp/~shimizu/papers/ shimizu_dews2005.pdf [19] Sample dataset World geographic database, http://www.cs.washington.edu/research/xmldatasets/data/ mondial/mondial-3.0.xml [20] Sample dataset Index of articles from SIGMOD Record, http://www.cs.washington.edu/research/xmldatasets/data/s igmod-record/SigmodRecord.xml [21] M Narasimha and G Tsudik, “Authentication of Outsourced Databases using Signature Aggregation and Chaining”, In Proc Intl Conf on Database Systems for Advanced Applications, April 2006 [22] T.K.Dang, “A Practical Solution to Supporting Oblivious Basic Operations on Dynamic Outsourced Search Trees”, Special Issue of International Journal of Computer Systems Science and Engineering (CSSE), CRL Publishing Ltd, UK, 21(1), 53-64, Jan, 2006 [23] T.K.Dang, “Oblivious Search and Updates for Outsourced Tree-Structured Data on Untrusted Servers”, International Journal of Computer Science and Applications (IJCSA), 2(2), 67-84, June 2005 [24] T.K Dang, “Semantic Based Similarity Searches in Database Systems (Multidimensional Access Methods, Similarity Search Algorithms)”, PhD thesis, FAWInstitute, University of Linz, Austria, May 2003 Appendix Formulas (11) to (14) are proven as follows Minimal nodes at a depth h h-1 h-2 … h-(h-1) Maximal nodes at a depth Lmin Lmin/f Lmin/f2 … Lmin/fh-1 = VTree hmin Lmax 2Lmax/f 22Lmax/f2 … 2h-1Lmax/fh-1 = log f LVTree N Lmin (1 1/ f 1/ f N max Lmax (1 2/ f 22 / f VTree , hmax .) ) log f / LVTree max hmin Lmin (1 / f ) Lmax 2/ f 1 1/ f hmax 1 2/ f L.Reyzin, HD: TS Đặng Trần Khánh 1 ... Giang MSHV: 00703170 I- TÊN ĐỀ TÀI: Các vấn đề bảo mật việc truy vấn CSDL XML động outsourced II- NHIỆM VỤ VÀ NỘI DUNG: - Tìm hiểu tổng quan vấn đề liên quan bảo mật CSDL outsourced - Tìm hiểu nghiên... (SP) thực truy vấn lưu trữ thông qua đường truy? ??n mạng bảo mật Clients trả chi phí cho data owner để có quyền truy cập liệu, thực truy cập liệu trực tiếp từ SP thông qua đường truy? ??n bảo mật Trong... attribute element tài liệu XML theo hai thứ tự: (nameid, value) (nameid, pnodeid, value), đảm bảo cho việc truy vấn nhanh chóng khả đảm bảo truy vấn tài liệu XML Để đảm bảo yêu cầu trên, sử dụng