Journal of Information & Computational Science 9: (2012) 619–633
Available at http://www.joics.com

A Privacy-preserving Query on Outsourced Database with B-tree ⋆

Sha Ma a,∗, Bo Yang b, Kangshun Li a, Feng Xia a
a Department of Informatics, South China Agricultural University, Guangzhou 510642, China
b School of Computer Science, Shaanxi Normal University, Shaanxi 710062, China

Abstract

In an outsourced database, once the data is encrypted, query processing becomes more difficult than in a traditional plaintext database. Providing a query service that preserves privacy is an essential concern in such a framework. This paper proposes a novel method for privacy-preserving queries on an outsourced database with a B-tree: the B-tree is searched with PIR, and the query results are then obtained with PIR again. We describe a scheme that enables a user to access an encrypted database accurately, retrieve information privately, and obtain only the query results without leaking other information. Our contributions include a set of security notions for such a system as well as a construction that is secure under the newly introduced security notions.

Keywords: Outsourced Database; Database Security; Private Information Retrieval; B-tree; Order-preserving Symmetric Encryption

1 Introduction

The proliferation of a new breed of data management applications that store and process data at remote locations has led to the emergence of data outsourcing, or database as a service, as an important research problem [1, 2, 3]. In a typical setting of the problem, data is stored at the remote location in an encrypted form. A query generated at the client side is transformed into a representation such that it can be evaluated directly on encrypted data at the remote location. The results might be processed by the client after decryption to determine the final answers. The nature of data processing starts to change when the level of trust in the service provider itself begins to decrease from complete to partial to (perhaps)
none at all! Such a varying trust scenario necessitates the use of various security-enhancing techniques in the context of outsourced databases [4, 5, 6, 7, 8, 9]. Our motivation is to preserve query privacy in the passive adversary (e.g., the database administrator or the user) model.

⋆ This work is supported by the National Natural Science Foundation of China under Grants 60973134, 61173164 and 70971043, and the Natural Science Foundation of Guangdong Province under Grant 10351806001000000.
∗ Corresponding author. Email address: martin deng@163.com (Sha Ma).
1548–7741 / Copyright © 2012 Binary Information Press, March 2012

Why do we care about the privacy of a database query? Consider the following typical real-life scenario. An outsourced database server contains diagnosis information about various diseases. Alice thinks that she may have some disease, so she wants to investigate it further. After Alice sends the description of the disease to the outsourced database, it tells Alice the corresponding diagnosis information. If Alice's query is found in the database, the server immediately knows that Alice may have such a disease; even worse, after receiving Alice's description of the disease, it can derive much else about Alice, such as other health problems that she might have. If the server is not trustworthy, it could disclose the information about Alice to other parties, and Alice might have difficulty getting employment, insurance, credit, etc. But even if Alice trusts the server, and it has no intention of disclosing Alice's private information, the server itself might prefer that Alice's query be kept private out of liability concerns: if the server knows Alice's disease information, and that information is accidentally disclosed (perhaps through an external system intrusion), the server might face an expensive lawsuit from Alice. From this perspective, a trusted server will actually
prefer not to know either Alice's query or its response.

The known Private Information Retrieval (PIR) techniques, which have been widely studied [10, 11, 12, 13], are the most closely related to this problem. The PIR problem consists of devising a protocol involving a user and a database server, each having a secret input. The database's secret input is called the data string, an n-bit string $B = b_1, b_2, \cdots, b_n$. The user's secret input is an integer $i$ between 1 and $n$. The protocol should enable the user to learn $b_i$ in a communication-efficient way and at the same time hide $i$ from the database.

However, PIR cannot be used directly in an outsourced database, for three main reasons. The first is that the user does not know the physical address, e.g., $i = 2$, in the outsourced database. The user usually sends a SQL statement containing predicates, e.g., attribute op constant (op includes =, <, >, ≤, ≥, etc.), to the database server, and the database server retrieves the correct results according to the predicates. The second is that most PIR research focuses on user privacy without regard for data privacy. In other words, besides a single physical bit of $x$, the user may obtain other physical bits of the data (i.e., $x_j$ for $j \neq i$) or other information, such as the exclusive-or of certain subsets of the bits of $x$. Although the Oblivious Transfer (OT) protocol in cryptography can meet this requirement, it is not a good solution for an outsourced database because of its significant communication complexity. The last is that data confidentiality cannot be guaranteed by PIR, since the data remain in plaintext. In an outsourced database, however, the data stored at the service provider should be encrypted, because they are outside the data owner's control and may be stolen by a malicious adversary. A new protocol is therefore required to realize a privacy-preserving query on an outsourced database.

A trivial solution for a privacy-preserving query on an outsourced
database is to send all encrypted data to the client, which can then decrypt it and execute the query on the plaintext data. Obviously, this weakens the advantage of an outsourced database because of the drastic increase in the user's computational cost. In addition, data privacy is no longer guaranteed, because the client obtains all information after decryption. While much research focuses on how to query encrypted data efficiently [14, 2, 15], research on privacy preservation in this scenario has become an interesting direction [16, 17, 18, 19].

This paper addresses a special way of querying an outsourced database with the PIR technique using a B-tree index. Since the search on a B-tree index needs specified nodes, e.g., the root node of the tree, which have stable physical addresses as long as the tree's structure is not changed, we can use the PIR technique to realize a privacy-preserving query. However, a problem we face is that when the user receives specified nodes of the index tree, decrypting the nodes may still disclose other information that should be hidden from the user, because the decrypted data may lie outside the query results, which violates data privacy. Our solution is to apply order-preserving encryption to the search keys of nodes in the B-tree index.

The rest of this paper is organized as follows. Section 2 describes the preliminaries used in our construction. In Section 3, we present a general framework of privacy-preserving query on outsourced database with B-tree and security definitions. Section 4 describes our construction and proves that it is secure under the introduced security notions. Finally, Section 5 concludes.

2 Preliminaries

2.1 B-tree

To speed up data access, the B-tree index structure is very popular in modern database applications (in this paper, we refer to all variants of the B-tree, e.g., the B+tree, as B-tree).
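The idea stated above, searching a B-tree node whose search keys are order-preserving ciphertexts, can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: `make_toy_ope` and `child_slot` are hypothetical helpers introduced here, the order-preserving map is a toy with no security whatsoever (it is not the OPE scheme used in the paper), and the descent convention (a query less than or equal to a key goes to the child on that key's left) is the usual textbook one rather than something the paper fixes.

```python
import bisect
import random


def make_toy_ope(key, domain_size, max_gap=1000):
    """Return a strictly increasing lookup table: plaintext p -> table[p].

    Toy stand-in for OPE (hypothetical, NOT secure): it only preserves order.
    """
    rng = random.Random(key)          # the secret key seeds the PRNG
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, max_gap)   # step >= 1 keeps the order strict
        table.append(current)
    return table


def child_slot(sorted_keys, query):
    """Index of the child pointer to follow inside one B-tree node.

    Assumed convention: a query <= sorted_keys[i] descends into child i;
    a query larger than every key descends into the last child.
    """
    return bisect.bisect_left(sorted_keys, query)


table = make_toy_ope(key=2012, domain_size=100)
plain_keys = [10, 20, 30]                       # search keys of one node
cipher_keys = [table[k] for k in plain_keys]    # what the index would store

# The same child is chosen whether plaintexts or ciphertexts are compared.
for v in (5, 20, 25, 99):
    assert child_slot(plain_keys, v) == child_slot(cipher_keys, table[v])
```

Because the mapping is strictly increasing, every comparison on ciphertexts agrees with the comparison on plaintexts, so the descent through a node never requires the plaintext search keys.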
In [15], the author chooses to encrypt each tree node as a whole, because protecting a tree-based index by encrypting each of its fields would disclose the ordering relationship between the index values. The original tree is then stored as a table with two attributes: the node ID, automatically assigned by the system on insertion, and an encrypted value representing the node content. The advantage of this solution is that the content of the B-tree nodes is not visible to the untrusted DBMS. The drawback, however, is that user privacy and data privacy are not protected during the query process. Intuitively, to execute an interval query, the front end has to perform a sequence of queries that retrieve tree nodes at progressively deeper levels. The user's access pattern may be disclosed, since the information collected during the retrieval of tree nodes discloses the construction of the whole tree.

Fig. 1 shows an example of the B-tree on attribute Customer with sample values. Assume the front end produces a sequence of queries that access nodes 0, 1, and 4 in sequence; then the server knows that the user was accessing nodes 0, 1, and 4, and that node 0 is the root, node 1 is an internal node, and node 4 is a leaf node of the tree. Using such information collected gradually, together with statistical methods, the server can rebuild the whole tree and infer sensitive information from the encrypted database. To solve this problem, we use the PIR technique to access the nodes at each level of the B-tree, which provides user privacy.

In addition, during the query, the user learns extra information through the decryption of nodes 0 and 1, namely that there are at least two other customers named Jane and Donna in the database, so data privacy is not satisfied. To solve this problem, we encrypt each node of the B-tree twice, first with an OPE algorithm and then with a general encryption algorithm. Our solution stems from a simple idea: preserving the typical structure of the B-tree through encryption
of each of its fields by OPE, while breaking the correlation between the data and its corresponding index items through different identifying information and different encryption.

2.2 PIR

A Private Information Retrieval (PIR) scheme allows a user to retrieve information from a database while keeping the query private from the database managers. In this model, the database is viewed as an n-bit string $x$ out of which the user retrieves the i-th bit $x_i$, while giving the database no information about the index $i$.

Fig. 1: (a) B-tree and (b) Plaintext table and encrypted table for B-tree

The main cost measure for such a scheme is its communication complexity. The notion of PIR was introduced in Ref. [10], where it was shown that if there is only one copy of the database available then $n$ bits of communication are needed (for information-theoretic user privacy). However, if there are $k \ge 2$ non-communicating copies of the database, then there are solutions with much better communication complexity. Gertner first introduced a model of Symmetrically-private Information Retrieval (SPIR) [20], where the privacy of the data, as well as the privacy of the user, is guaranteed. That is, in every invocation of a SPIR protocol, the user learns only a single physical bit of $x$ and no other information about the data. SPIR is realized based on $k$ databases ($k \ge 2$) as the first implementation of a distributed version of 1-out-of-n oblivious transfer. A single-database private information retrieval scheme, which is a non-trivial PIR protocol, was proposed by Giovanni Di Crescenzo at EUROCRYPT 2000 [13]. At the end of the execution of the protocol, the following two properties must hold: (1) after applying the reconstruction function, the user obtains the i-th data bit $x_i$; and (2) the distributions on the query sent to the database are computationally indistinguishable for any two indices $i, i'$.

Definition 1 (PIR) Let (D, U) be an
interactive protocol, and let $R$ be a polynomial-time algorithm. $\mathrm{Prob}[R_1; \cdots; R_n : E]$ denotes the probability of event $E$ after the execution of random processes $R_1; \cdots; R_n$. The notation $t_{A,B}(x, r_A, y, r_B)$ denotes the transcript of an execution of an interactive protocol $(A, B)$ with input $x$ for $A$ and $y$ for $B$, and with random string $r_A$ for $A$ and $r_B$ for $B$; $(r_A, r_B, t) \leftarrow t_{A,B}(x, \cdot, y, \cdot)$ denotes the case where the random strings for both $A$ and $B$ are chosen uniformly at random. We say that $(D, U, R)$ is a private information retrieval (PIR) scheme if:

(Correctness) For each $n \in \mathbb{N}$, each $i \in \{1, \ldots, n\}$, each $x \in \{0,1\}^n$, where $x = x_1 \circ \cdots \circ x_n$ and $x_l \in \{0,1\}$ for $l = 1, \ldots, n$, for all constants $c$, and all sufficiently large $k$,

$$\mathrm{Prob}\big[(r_D, r_U, t) \leftarrow t_{D,U}((1^k, x), \cdot, (1^k, n, i), \cdot) : R(1^k, n, i, r_U, t) = x_i\big] \ge 1 - k^{-c}.$$

(User Privacy) For each $n \in \mathbb{N}$, each $i, j \in \{1, \ldots, n\}$, each $x \in \{0,1\}^n$, where $x = x_1 \circ \cdots \circ x_n$ and $x_l \in \{0,1\}$ for $l = 1, \ldots, n$, for each polynomial-time $D'$, for all constants $c$, and all sufficiently large $k$, it holds that $|p_i - p_j| \le k^{-c}$, where

$$p_i = \mathrm{Prob}\big[(r_{D'}, r_U, t) \leftarrow t_{D',U}((1^k, x), \cdot, (1^k, n, i), \cdot) : D'(1^k, x, r_{D'}, t) = 1\big],$$
$$p_j = \mathrm{Prob}\big[(r_{D'}, r_U, t) \leftarrow t_{D',U}((1^k, x), \cdot, (1^k, n, j), \cdot) : D'(1^k, x, r_{D'}, t) = 1\big].$$

2.3 OPE

Order-preserving Symmetric Encryption (OPE) is a deterministic encryption scheme whose encryption function preserves the numerical ordering of the plaintexts. Let us define what we mean by this. For $A, B \subseteq \mathbb{N}$ with $|A| \le |B|$, a function $f : A \to B$ is order-preserving if for all $i, j \in A$, $f(i) > f(j)$ iff $i > j$. OPE has a long history in the form of one-part codes, which are lists of plaintexts and the corresponding ciphertexts, both arranged in alphabetical or numerical order so that only a single copy is required for efficient encryption and decryption. Agrawal et al. first suggested a primitive of OPE for allowing
efficient range queries on encrypted data in the database community [21]. However, the construction is rather ad hoc and has certain limitations; namely, its encryption algorithm must take as input all the plaintexts in the database. It is not always practical to assume that users know all these plaintexts in advance, so a stateless scheme whose encryption algorithm can process single plaintexts on the fly is preferable. Moreover, the work in [21] neither defines security nor provides any formal security analysis. Alexandra Boldyreva et al. proposed an efficient OPE scheme and proved its security based on the pseudorandomness of an underlying block cipher [22]. Their construction is based on a natural relation between a random order-preserving function and the hypergeometric probability distribution. In this paper, OPE is applied to each field of the B-tree so that query processing can be done exactly as efficiently as for unencrypted data. The user can locate the desired ciphertext in the nodes without obtaining more information, which satisfies data privacy.

Definition 2 (OPE) Let $SE = (K, Enc, Dec)$ be an order-preserving encryption scheme with plaintext space $[M]$ and ciphertext space $[N]$ for $M, N \in \mathbb{N}$ such that $2^{k-1} \le N < 2^k$ for some $k \in \mathbb{N}$. Then there exists an IND-OCPA (indistinguishability under ordered chosen-plaintext attack) adversary $A$ against $SE$ such that

$$\mathrm{Adv}^{\mathrm{ind\text{-}ocpa}}_{SE}(A) \ge 1 - \frac{2k}{M-1}.$$

So, $k$ in the theorem should be almost as large as $M$ for $A$'s advantage to be small.

3 Model and Definitions

3.1 Model

Fig. 2 illustrates the four primary entities of the DAS model: Data Owner (DO), user (U), trusted front end (F) and Database Service Provider (DSP). We assume that DO stores the encrypted database (DB) at the DSP and that the outsourced data allows a certain amount of query processing for U to occur at the DSP without jeopardizing privacy. Below, we propose a protocol that ensures the security requirements of this DAS model with the help of F. Our
assumption for this protocol is that F will not collude with DO, U or DSP under any circumstances. Furthermore, F is usually the deputy of the DO and is responsible for query transformation. It can send queries to the server on behalf of DO when allowed, since the user has registered to use the data owner's service. We briefly describe the properties of our protocol below.

Fig. 2: Our model (entities: Data owner, DSP, DB, F, User; links labelled Trusted or Untrusted)

In our model, DO outsources information to the DSP and charges U for using the data. The outsourced information is valuable, so all of it should be encrypted to prevent analysis by the DSP and other intruders, which we call data confidentiality. Meanwhile, the outsourced information is important, and the user is not allowed to obtain any information other than what she is querying on DB, which we call data privacy. In addition, whenever the user accesses DB, she does not want DO and DSP to know exactly what she is concerned about, i.e., both the query and its result, which we call user privacy.

3.2 Adversarial Model

There are three types of adversaries in our model:

A naive player (U or DSP), who gets a copy of the encrypted data stored in the outsourced database and wants to infer some information.
A curious service provider, who wants to infer some information from a query or the response to a query.
A curious user, who wants to infer some information from the response to a query.

3.3 Storage Model

To illustrate the storage model, we give a simple example in this section. DSP uses a table for storing and maintaining data entries. The table stores encrypted data entries, each associated with a unique id. For example, consider a regular table that has attributes such as name, age and salary. The encrypted table contains two columns, tid and etuple, where tid is the unique number of a tuple, which is usually numbered sequentially, and etuple is the encrypted value of the plaintext tuple. In addition, the encrypted
table for storing the B-tree consists of 2n + 2 attributes: nid, n search keys and n + 1 pointers. Here nid is the unique number of a node in the B-tree, which is generated differently from tid since it is a random number, and n is a parameter associated with each B-tree index that determines the layout of all blocks of the B-tree. See Fig. 3. In more detail, plaintext data entries are encrypted by a general encryption algorithm, one tuple per row of the table, because encrypting by row is preferable to encrypting by field for queries from the TPC-H benchmark. The encrypted table for the B-tree is used to support search functions by which the user can obtain the exact results without leakage of other information. Specifically, all pointers are encrypted once with a key different from the one used to encrypt the plaintext entries, and all search keys are encrypted twice: first with OPE, and then with the same encryption algorithm used for the pointers.

Fig. 3: (a) Plaintext table and B-tree and (b) Encrypted table for data entries and B-tree

3.4 Operations

We now provide an informal description of our solution. Consider a database system D, a data owner DO, the database service provider DSP, a user U, and the trusted party F. Suppose the database $D$ consists of $m$ records $\{d_1, \cdots, d_m\}$, each of which contains $n$ attributes $\{a_1, \cdots, a_n\}$. For a record $d_i$, we use $id(d_i)$ to denote the identifying information that is uniquely associated with $d_i$, such as the value of the primary key. The DSP hosts not only the encrypted version of $D$, denoted by $D' = \{d'_1, \cdots, d'_m\}$, where $d'_i = \langle id(d_i), E(d_i) \rangle$ ($E(d_i)$ is an encryption of $d_i$), but also an encrypted version of the B-tree, denoted by $BTree'$, which is constructed on each attribute $a_j$ ($j \in \{1, \cdots, n\}$). $Pre$ is the predicate expression of the query, whose value is TRUE or FALSE, representing the satisfaction of the predicates or the converse, respectively.

Definition 3 A privacy-preserving query on outsourced database with B-tree consists of the following probabilistic polynomial-time algorithms and protocols:

$KeyGen(1^s)$ outputs public and private keys: $(A_{public}, A_{private})$ for the encryption and decryption of data entries, $(B_{public}, B_{private})$ for the encryption and decryption of the B-tree, and $C_{private}$ for the OPE algorithm.

$Store_{DO,DSP,F}(D, BTree, A_{public}, B_{public}, C_{private})$ is a protocol that allows DO to send $D'$, the encryption of $D$ under $A_{public}$, to DSP, and also to associate with each attribute a $BTree'$, which is the encryption of the B-tree under $B_{public}$ and $C_{private}$. $C_{private}$ is held only by F.

$Query_{U,DSP,F}(Pre, A_{private}, B_{private})$ is a protocol that retrieves all records satisfying $Pre$ for U. $Pre$, $A_{private}$, and $B_{private}$ are held only by U.

3.5 Security Properties

First, we define the security of database encryption.

Definition 4 (Security of database encryption [4]) An encryption scheme (Gen, Enc, Dec) for database tables, which consists of a key generation scheme Gen, an encryption function Enc, and a decryption function Dec, has indistinguishable encryptions if for every polynomial-size circuit family $\{C_n\}$, every polynomial $p$, all sufficiently large $n$, and every pair of databases $R_1, R_2 \in \{0,1\}^{poly(n)}$ with the same schema and the same number of tuples (i.e., $|R_1| = |R_2|$):

$$\big|\Pr\{C_n(Enc_{Gen(1^n)}(R_1)) = 1\} - \Pr\{C_n(Enc_{Gen(1^n)}(R_2)) = 1\}\big| < \frac{1}{p(n)}.$$

The probability in the above terms is over the internal coin tosses of Gen and Enc.

Next, we describe correctness and privacy for such a system.

Definition 5 (Query Correctness) Let $A_{public}, A_{private}, B_{public}, B_{private} \leftarrow KeyGen(1^s)$. Fix a finite sequence of messages and indexes: $\{\{d'_i\}_{i=1}^{m}, \{BTree'_j\}_{j=1}^{n}\}$. Suppose that, for all $i \in [m]$ and $j \in [n]$, the protocol $Store_{DO,DSP,F}(D, BTree, A_{public}, B_{public}, C_{private})$ is executed by DO, DSP and F. Denote by $R_{Pre}$ the results that U
receives after the execution of $Query_{U,DSP,F}(Pre, A_{private}, B_{private})$. Then, a privacy-preserving query on outsourced database with B-tree is said to be correct on the sequence $\{\{d'_i\}_{i=1}^{m}, \{BTree'_j\}_{j=1}^{n}\}$ if

$$\Pr\big[R_{Pre(a_w)} = \{d_i \mid Pre(d_i.a_w) = \mathrm{TRUE}\}\big] > 1 - neg(1^s)$$

for each predicate, where the probability is taken over all internal randomness used in the protocols Store and Query. A privacy-preserving query on outsourced database with B-tree is said to be correct if it is correct on all such finite sequences.

DO's privacy is twofold: first, all stored data should be indistinguishable to the DSP; second, the user cannot learn any information besides the results of the user's query.

Definition 6 For DO's privacy to DSP, consider the following game between an adversary $A$ and a challenger $C$. $A$ plays the role of DSP and $C$ plays the role of DO. The game consists of the following steps:

1. $KeyGen(1^s)$ is executed by $C$, who sends the output $A_{public}$ and $B_{public}$ to $A$.
2. $A$ asks queries of the form $(D, BTree)$, where $D$ is the plaintext database and $BTree$ is the plaintext index on $D$; $C$ answers by executing the protocol $Store(D, BTree, A_{public}, B_{public}, C_{private})$.
3. $A$ chooses two pairs $(D_0, BTree_0)$, $(D_1, BTree_1)$ to be sent to $C$, where $D_0$ and $D_1$ are of equal size, and $BTree_0$ and $BTree_1$ are of equal size.
4. $C$ picks a random bit $b \in_R \{0, 1\}$ and executes $Store(D_b, BTree_b, A_{public}, B_{public}, C_{private})$ with $A$.
5. $A$ asks more queries of the form $(D, BTree)$ and $C$ responds by executing the protocol $Store(D_b, BTree_b, A_{public}, B_{public}, C_{private})$ with $A$.
6. $A$ outputs a bit $b' \in \{0, 1\}$.

We define the adversary's advantage as $Adv_A(1^s) = |\Pr[b = b'] - \frac{1}{2}|$. We say that a privacy-preserving query on outsourced database has DO's privacy to DSP if, for all $A \in$ PPT, $Adv_A(1^s)$ is a negligible function.

Definition 7 For DO's privacy to U,
consider the following game between an adversary $A$ and a challenger $C$. $A$ plays the role of U and $C$ plays the role of DO. The game consists of the following steps:

1. $KeyGen(1^s)$ is executed by $C$, who sends the output $A_{public}$, $B_{public}$ to $A$.
2. $A$ asks queries of the form $(D, BTree)$, where $D$ is the plaintext database and $BTree$ is the plaintext index on $D$; $C$ answers by executing the protocol $Store(D, BTree, A_{public}, B_{public}, C_{private})$.
3. $A$ chooses two pairs $(D_0, BTree_0)$, $(D_1, BTree_1)$ and sends them to $C$, where the databases and the B-trees are of equal size, respectively.
4. $C$ picks a random bit $b \in_R \{0, 1\}$ and executes $Store(D_b, BTree_b, A_{public}, B_{public})$ with $A$.
5. $A$ asks queries of the form $Pre$, where the predicate is on a certain attribute; $C$ answers by executing the protocol $Query(Pre, A_{private}, B_{private})$ with $A$.
6. $A$ asks more queries $Pre$ and $C$ responds by executing the protocol $Query(Pre, A_{private}, B_{private})$ with $A$.
7. $A$ outputs a bit $b' \in \{0, 1\}$.

We define the adversary's advantage as $Adv_A(1^s) = |\Pr[b = b'] - \frac{1}{2}|$. We say that a privacy-preserving query on outsourced database has DO's privacy to U if, for all $A \in$ PPT, $Adv_A(1^s)$ is a negligible function.

Definition 8 For query privacy, consider the following game between an adversary $A$ and a challenger $C$. $A$ plays the role of DSP, and $C$ plays the role of U. The game proceeds as follows:

1. $KeyGen(1^s)$ is executed by $C$, who sends the output $A_{public}$, $B_{public}$ to $A$.
2. $A$ asks queries of the form $Pre$, where the predicate is on a certain attribute; $C$ answers by executing the protocol $Query(Pre, A_{private}, B_{private})$ with $A$.
3. $A$ chooses two predicates $Pre(a_0)$, $Pre(a_1)$ and sends them to $C$; $a_0$ and $a_1$ are both attributes of $D$.
4. $C$ picks a random bit $b \in_R \{0, 1\}$ and executes the protocol $Query(Pre(a_b), A_{private}, B_{private})$ with $A$.
5. $A$ asks more queries $Pre$ and $C$ responds by executing the protocol $Query(Pre, A_{private}, B_{private})$ with $A$.
6. $A$
outputs a bit $b' \in \{0, 1\}$.

We define the adversary's advantage as $Adv_A(1^s) = |\Pr[b = b'] - \frac{1}{2}|$. We say that a privacy-preserving query on outsourced database has query privacy if, for all $A \in$ PPT, $Adv_A(1^s)$ is a negligible function.

4 Our Construction

We present a construction of a privacy-preserving query on outsourced database in a "semi-honest" model. In our context, the term "semi-honest" refers to a party that correctly executes the protocol but may collect information during the protocol's execution. Correctness and privacy will be proved under a computational assumption. We assume the outsourced data are encrypted by a semantically secure public-key encryption scheme satisfying Definition 4. The key generation and encryption algorithms are denoted by $K$ and $E$, respectively. We define the required algorithms below.

First, let us restate our assumptions about the parties involved: DO, U, DSP and F. In general, there could be many data owners but, for the purpose of describing the protocol, we need only name one. DO is assumed to hold the data, the B-tree and the public key. U holds the private key and submits queries to the database. DSP stores the encrypted data and B-tree and provides the search service. F is the deputy of DO and assists in the execution of the user's query.

$KeyGen(s)$: Run $K(1^s)$, the key generation algorithm of the underlying cryptosystem, to create public and private keys: $(A_{public}, A_{private})$ for the encryption and decryption of data entries, $(B_{public}, B_{private})$ for the encryption and decryption of the B-tree, and $C_{private}$ for the search keys of nodes in the B-tree. Private and public parameters for a PIR scheme are also generated by this algorithm.

$Store_{DO,DSP,F}(D, BTree, A_{public}, B_{public}, C_{private})$: DO sends the encrypted database and indexes to the DSP. The protocol consists of the following steps:
(a) DO sends the encrypted version of the database $D' = \{(id_i, E_{A_{public}}(d_i))\}_{i=1}^{m}$ and its $BTrees' = \{BTrees'_j\}_{j=1}^{n}$ to DSP. Specifically, all
pointers are encrypted once using $B_{public}$, and all search keys are encrypted twice: first using $C_{private}$ for OPE, and then using $B_{public}$ for a general encryption algorithm.
(b) DSP receives and stores $D'$ and $BTrees'$.
(c) DSP sends the address $\rho$ of the root node of each encrypted B-tree back to DO and F.

$Query_{U,DSP,F}(Pre, A_{private}, B_{private})$: U wishes to retrieve all records satisfying $Pre$ from DSP. Suppose $Pre$ is defined on the attribute $a_i$. The protocol proceeds as follows:
(a) U sends $Pre$ to F, which generates an encrypted $Pre'$ (e.g., $E_{OPE}(a_i)$ op $E_{OPE}(v)$ for ($a_i$ op $v$), where $E_{OPE}(x)$ denotes the encryption of $x$ by the OPE algorithm).
(b) U receives the address $\rho$ of $BTree'_i$ from F.
(c) U executes an efficient PIR protocol to get the encrypted root node of $BTree'_i$: $NODE_{level_0}$.
(d) U decrypts the answers to the PIR queries using the key $B_{private}$ to obtain $E_{OPE}(NODE_{level_0}\ \text{value})$.
(e) U compares $E_{OPE}(NODE_{level_0}\ \text{value})$ with $E_{OPE}(v)$. If the result is ">", U decrypts only the encrypted right pointer to get the address of the next-level node. If the result is "≤", U decrypts only the encrypted left pointer to get the address of the next-level node.
(f) U executes steps (c) to (e) again on the next level of NODE.
(g) U repeats step (f) until reaching a leaf node.
(h) U decrypts the pointers according to the ids of the final results using the private key $B_{private}$.
(i) If there are $k$ records in the final results, U executes $k$ efficient PIR protocols to get the records from the encrypted table.
(j) U decrypts the encrypted results using the private key $A_{private}$.

Theorem 1 The privacy-preserving query on outsourced database with B-tree is correct according to Definition 5.

Proof The correctness of the protocol is straightforward. Suppose that a record $d'_i$ is generated by DO, where $Pre(d_i) = \mathrm{TRUE}$. Note that the correctness of the protocol includes the correctness of the traversal on the
B-tree and the correctness of getting data based on the results of the B-tree traversal. OPE allows index processing to be done exactly as for unencrypted data on B-trees, while PIR guarantees that U correctly retrieves the required items from the database.

Theorem 2 Assuming security of the underlying cryptosystem, the privacy-preserving query on outsourced database has DO's privacy to DSP, according to Definition 6.

Proof Suppose that there exists an adversary $A \in$ PPT that can succeed in breaking the security game from Definition 6 with some non-negligible advantage. Under those conditions, $A$ can distinguish the distribution of $Store(D_0, BTree_0, A_{public}, B_{public}, C_{private})$ from the distribution of $Store(D_1, BTree_1, A_{public}, B_{public}, C_{private})$, where the word "distribution" refers to the distribution of the transcript of the interaction between the parties. A transcript of $Store(D, BTree, A_{public}, B_{public}, C_{private})$ essentially consists of just $E_{A_{public}}(D)$ and $E_{B_{public}}(BTree)$.

We assumed that there exists an adversary $A$ that can distinguish these two distributions. Hence, the encrypted tables or the encrypted B-trees cannot be computationally indistinguishable. So there exists an adversary $A' \in$ PPT that can distinguish $E_{A_{public}}(D)$, $E_{B_{public}}(BTree)$, or the correlation between $E_{A_{public}}(D)$ and $E_{B_{public}}(BTree)$. If $A'$ distinguishes the encrypted tables, it has distinguished $E_{A_{public}}(D_0)$ from $E_{A_{public}}(D_1)$, which violates our assumption of Definition 4. If $A'$ distinguishes the encrypted B-trees, it has distinguished $E_{B_{public}}(BTree_0)$ from $E_{B_{public}}(BTree_1)$, which violates our assumptions of Definition 2 and Definition 4. Finally, suppose $A'$ distinguishes the correlation between $E_{A_{public}}(D)$ and $E_{B_{public}}(BTrees)$. Obviously, there is no correlation between $E_{A_{public}}(D)$ and $E_{B_{public}}(BTrees)$, because the data itself and the corresponding index for the
same element are independent. So we conclude that no such A exists in the first place, and hence the system is secure according to Definition 6.

Theorem 3. Assuming security of the underlying cryptosystem, the privacy-preserving query on outsourced database satisfies DO’s privacy to U, according to Definition 7.

Proof. Suppose that there exists an adversary A ∈ PPT that succeeds in breaking the security game of Definition 7 with some non-negligible advantage. Under those conditions, A can distinguish $E_{A_{public}}(D_0)$ from $E_{A_{public}}(D_1)$ given the transcript of the interaction between the parties. A transcript of the Query protocol consists of a value encrypted using OPE, the address of the B-tree, and a sequence of PIR protocols that occur in Query, denoted by $\{\mathrm{PIR}(\mathrm{NODE}_{level_i})\}_{i}$ and $\{\mathrm{PIR}(x_j)\}_{j=1}^{\ell}$, where $i$ ranges over the levels of the B-tree and $\ell$ is the number of query results.

We assumed that there exists an adversary A that can distinguish $E_{A_{public}}(D_0)$ from $E_{A_{public}}(D_1)$. Since $E_{A_{public}}(D_0)$ and $E_{A_{public}}(D_1)$ are computationally distinguishable, there exists an adversary A′ ∈ PPT that obtains more information than the query results. If A′ obtains more information than the query results, consider the following transcript:

$$E_{C_{private}}(v),\ \rho,\ \mathrm{PIR}(\mathrm{NODE}_{level_0}),\ \mathrm{PIR}(\mathrm{NODE}_{level_1}),\ \ldots,\ \mathrm{PIR}(x_1),\ \ldots,\ \mathrm{PIR}(x_\ell)$$

Since the first and the second items are unrelated to DO’s data, A′ must infer the additional information from $\{\mathrm{PIR}(\mathrm{NODE}_{level_i})\}_{i}$ or from $\{\mathrm{PIR}(x_j)\}_{j=1}^{\ell}$. If A′ infers it from $\{\mathrm{PIR}(\mathrm{NODE}_{level_i})\}_{i}$, then A′ obtains more information from the nodes $\mathrm{NODE}_{level_i}$ by decrypting them with $B_{private}$. In our model, the construction of the B-tree is described in Section 3.3. Each node of the B-tree consists of the following items: $E_{B_{public}}(p_0), E_{B_{public}}(E_{C_{private}}(v_1)), E_{B_{public}}(p_1), \ldots, E_{B_{public}}(E_{C_{private}}(v_m)), E_{B_{public}}(p_m)$. Although A′ can obtain the pointers of each node by decryption using $B_{private}$, the pointers are unrelated to DO’s data itself. So A′
can infer from $\{E_{C_{private}}(v_i)\}_{i=1}^{m}$, which violates the corresponding security definition. If A′ ∈ PPT infers more information than the query results from $\{\mathrm{PIR}(x_j)\}_{j=1}^{\ell}$, it can infer it from some $\mathrm{PIR}(x_j)$, which violates our assumption of Definition 1. So we conclude that no such A exists in the first place, and hence the system is secure according to Definition 7.

Theorem 4. Assuming security of the underlying PIR primitive, the privacy-preserving query on outsourced database with PIR satisfies query privacy, according to Definition 8.

Proof. Suppose that there exists an adversary A ∈ PPT that succeeds in breaking the security game of Definition 8 with some non-negligible advantage. Then A can distinguish $\mathrm{Query}_{U,DSP,F}(Pre_0, A_{private}, B_{private})$ from $\mathrm{Query}_{U,DSP,F}(Pre_1, A_{private}, B_{private})$ with non-negligible advantage. The transcript of a Query protocol consists of a value encrypted using OPE, the address of the B-tree, and a sequence of PIR protocols that occur in Query, denoted by $\{\mathrm{PIR}(\mathrm{NODE}_{level_i})\}_{i}$ and $\{\mathrm{PIR}(x_j)\}_{j=1}^{\ell}$. The first item is the same; the second item is indistinguishable, since the address of the B-tree is random. Suppose that the databases, the B-trees and the results have the same size respectively; then there is an equal number of PIR queries regardless of the predicate $Pre$. Moreover, the number of PIR queries on the B-tree depends on the number of levels of the B-tree, and the number of PIR queries on the data itself depends on the number of results. Consider the following sequence of distributions:

$$E_{C_{private}}(v),\ \rho_0,\ \mathrm{PIR}_0(\mathrm{NODE}_{level_0}),\ \mathrm{PIR}_0(\mathrm{NODE}_{level_1}),\ \ldots,\ \mathrm{PIR}_0(x_1),\ \ldots,\ \mathrm{PIR}_0(x_\ell)$$

$$E_{C_{private}}(v),\ \rho_1,\ \mathrm{PIR}_1(\mathrm{NODE}_{level_0}),\ \mathrm{PIR}_1(\mathrm{NODE}_{level_1}),\ \ldots,\ \mathrm{PIR}_1(x_1),\ \ldots,\ \mathrm{PIR}_1(x_\ell)$$

The first line is the transcript distribution of Query on $D_0$ and the second line is the transcript distribution of Query on $D_1$. Since there exists A ∈ PPT that can distinguish the first distribution from the second distribution,
then there must exist an adversary A′ ∈ PPT that can distinguish a pair of corresponding PIR queries. Therefore, for some level $i$ and some $j \in \{1, \ldots, \ell\}$, A′ can distinguish $\mathrm{PIR}_0(\mathrm{NODE}_{level_i})$ from $\mathrm{PIR}_1(\mathrm{NODE}_{level_i})$ on the B-tree, or $\mathrm{PIR}_0(x_j)$ from $\mathrm{PIR}_1(x_j)$ on the data entries. Both cases contradict our initial assumption according to Definition 1. Therefore, no such A ∈ PPT exists, and hence our construction is secure according to Definition 8.

Theorem 5 (Communication Complexity). The privacy-preserving query on outsourced database from the preceding construction has sub-linear communication complexity in $n$, the number of records held by the DSP.

Proof. Suppose $n$ is the maximum number of tuples to be stored, so that the depth of the B-tree is $O(\log n)$. There are two further parameters: $\tau$, the ratio of the size of a search key to the size of a tuple, and $\omega$, the ratio of the number of results to the total number $n$. The total storage size of the B-tree is therefore $O(\tau \cdot n)$. The value in $Pre$ encrypted using OPE and the B-tree address $\rho$ are independent of $n$ (i.e., they do not change as $n$ grows). Therefore the total communication of the protocol is $O(\log n \cdot \mathrm{polylog}(\tau \cdot n)) + O(\omega \cdot \mathrm{polylog}(n))$ using any $\mathrm{polylog}(n)$ PIR protocol, e.g. [11, 12]. Since $\tau$ and $\omega$ are values between 0 and 1, the communication complexity is $O(\log n \cdot \mathrm{polylog}(n))$.

Conclusion

This paper proposes a novel method of privacy-preserving query on outsourced database with PIR. We first perform a range query using an encrypted B-tree index with PIR, and then obtain the encrypted records using PIR again, based on the results of the search on the B-tree. We address two main problems. The first is how to search on an encrypted B-tree: we use an OPE algorithm to support searching on encrypted data. The second is how to preserve the privacy of database queries. We propose the formal security
definitions of DO’s privacy to DSP, DO’s privacy to U and query privacy, and then give proofs for our construction.

References

[1] W. Lehner, K. U. Sattler, Database as a service (dbaas), in 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), 1-6 March 2010, Piscataway, NJ, USA, 2010, 1216-1217.
[2] H. Hacigumus, B. Iyer, C. Li, S. Mehrotra, Executing SQL over encrypted data in the database-service-provider model, Proceedings of the ACM SIGMOD International Conference on Management of Data, June 3-6, 2002, Madison, WI, United States, 2002, 216-227.
[3] D. Agrawal, A. El Abbadi, F. Emekci, A. Metwally, Database management as a service: Challenges and opportunities, in 2009 IEEE 25th International Conference on Data Engineering (ICDE 2009), 29 March - 2 April 2009, Piscataway, NJ, USA, 2009, 1709-1716.
[4] M. Kantarcioglu, C. Clifton, Security issues in querying encrypted data, Data and Applications Security 19, 2005, 325.
[5] G. Amanatidis, A. Boldyreva, A. O’Neill, Provably-secure schemes for basic query support in outsourced databases, Data and Applications Security XXI, 2007, 14-30.
[6] J. Li, E. Omiecinski, Efficiency and security trade-off in supporting range queries on encrypted databases, Data and Applications Security XIX, 2005, 69-83.
[7] M. Xie, H. Wang, J. Yin, X. Meng, Integrity auditing of outsourced data, Proc. VLDB Endow., 2007, 782-793.
[8] D. X. Song, D. Wagner, A. Perrig, Practical techniques for searches on encrypted data, Proceedings of the IEEE Computer Society Symposium on Research in Security and Privacy, Berkeley, CA, USA: IEEE, 2000, 44-55.
[9] H. Pang, J. Zhang, K. Mouratidis, Scalable verification for outsourced dynamic databases, Proc. VLDB Endow., vol. 2, no. 1, 2009, 802-813.
[10] B. Chor, O. Goldreich, E. Kushilevitz, M. Sudan, Private information retrieval, in Proceedings of the 1995 IEEE 36th Annual Symposium on Foundations of Computer Science, Milwaukee, WI, USA, 1995, 41-50.
[11] C. Cachin, S. Micali, M. Stadler, Computationally private information retrieval with polylogarithmic communication, in Advances in Cryptology - Eurocrypt’99, Lecture Notes in Computer Science, vol. 1592, 1999, 402-414.
[12] Y. Chang, Single database private information retrieval with logarithmic communication, in ACISP, Springer, 2004, 50-61.
[13] G. Di Crescenzo, T. Malkin, R. Ostrovsky, Single database private information retrieval implies oblivious transfer, in Proceedings of Advances in Cryptology - Eurocrypt 2000, 14-18 May 2000, Berlin, Germany, 2000, 122-138.
[14] F. Li, M. Hadjieleftheriou, G. Kollios, L. Reyzin, Dynamic authenticated index structures for outsourced databases, in 2006 ACM SIGMOD International Conference on Management of Data, June 27-29, 2006, Chicago, IL, United States, 2006, 121-132.
[15] E. Damiani, S. D. C. D. Vimercati, S. Jajodia, S. Paraboschi, P. Samarati, Balancing confidentiality and efficiency in untrusted relational DBMSs, in Proceedings of the 10th ACM Conference on Computer and Communications Security, CCS 2003, October 27-31, 2003, Washington, DC, United States, 2003, 93-102.
[16] F. Bao, R. H. Deng, X. Ding, Y. Yang, Private query on encrypted data in multi-user settings, in 4th Information Security Practice and Experience Conference, ISPEC 2008, April 21-23, 2008, Lecture Notes in Computer Science, vol. 4991, 2008, 71-85.
[17] B. Thompson, S. Haber, W. Horne, T. Sander, D. Yao, Privacy-preserving computation and verification of aggregate queries on outsourced databases, in Privacy Enhancing Technologies, 2009, 185-201.
[18] Z. Yang, S. Zhong, R. Wright, Privacy-preserving queries on encrypted data, Proceedings of the 11th European Symposium on Research in Computer Security (ESORICS 2006), 2006, 479-495.
[19] B. Carbunar, R. Sion, Joining privately on outsourced data, in 7th VLDB Workshop on Secure Data Management, SDM 2010, Lecture Notes in Computer Science, vol. 6358, 2010, 70-86.
[20] Y. Gertner, Y. Ishai, E. Kushilevitz, T. Malkin, Protecting data privacy in private information retrieval schemes, in Proceedings of 13th Annual ACM Symposium on Theory of Computing (STOC’98), 23-26 May 1998, vol. 60, USA, 2000, 592-629.
[21] R. Agrawal, J. Kiernan, R. Srikant, Y. Xu, Order preserving encryption for numeric data, in Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2004, Jun 13-18 2004, Paris, France, 563-574.
[22] A. Boldyreva, N. Chenette, Y. Lee, A. O’Neill, Order-preserving symmetric encryption, Advances in Cryptology - Eurocrypt 2009, 2009, 224-241.
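The step in the proof of Theorem 4 that goes from "A distinguishes the two transcripts" to "some A′ distinguishes a pair of corresponding PIR queries" is a standard hybrid argument. The sketch below spells it out under assumptions that are not stated in the paper: $N$ denotes the total number of PIR queries in one transcript (node queries plus the $\ell$ result queries), $H_k$ are hypothetical hybrid transcripts introduced only for this illustration, and $\varepsilon$ is the advantage of A. The OPE value and the address $\rho$ are treated as identically distributed in both transcripts, as the proof argues.

```latex
% Hybrid H_k: the first k PIR queries come from the second transcript,
% the remaining N - k from the first. H_0 and H_N are the two transcripts.
\begin{align*}
\varepsilon
  &= \bigl|\Pr[A(H_0)=1] - \Pr[A(H_N)=1]\bigr| \\
  &\le \sum_{k=1}^{N} \bigl|\Pr[A(H_{k-1})=1] - \Pr[A(H_k)=1]\bigr| ,
\end{align*}
% so at least one index k satisfies
\[
\bigl|\Pr[A(H_{k-1})=1] - \Pr[A(H_k)=1]\bigr| \;\ge\; \frac{\varepsilon}{N} .
\]
```

Because $H_{k-1}$ and $H_k$ differ only in the $k$-th PIR query and $N$ is polynomial in the security parameter, $\varepsilon / N$ is non-negligible whenever $\varepsilon$ is, which is what gives an adversary against a single PIR query.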

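The bound in the proof of Theorem 5 can be made concrete with a small numeric sketch. It simply evaluates the expression log n · polylog(τ·n) + ω · polylog(n) from the proof. Several things here are assumptions and not part of the paper: polylog(x) is modelled as (log2 x)^c with a hypothetical exponent c = 2, all hidden constants are set to 1, and the function names are illustrative only.

```python
import math


def polylog(x: float, c: int = 2) -> float:
    """Model polylog(x) as (log2 x)**c; the exponent c is an assumption."""
    return math.log2(x) ** c


def communication_cost(n: int, tau: float, omega: float, c: int = 2) -> float:
    """Evaluate log n * polylog(tau * n) + omega * polylog(n) with unit constants.

    n     -- number of tuples held by the DSP
    tau   -- ratio of search-key size to tuple size, in (0, 1]
    omega -- ratio of the number of results to n, in [0, 1]
    """
    if n < 2 or not (0 < tau <= 1) or not (0 <= omega <= 1) or tau * n < 1:
        raise ValueError("need n >= 2, 0 < tau <= 1, 0 <= omega <= 1, tau * n >= 1")
    index_term = math.log2(n) * polylog(tau * n, c)  # PIR queries along the B-tree path
    result_term = omega * polylog(n, c)              # PIR term for the result records
    return index_term + result_term


if __name__ == "__main__":
    for exponent in (10, 20, 30):
        n = 2 ** exponent
        cost = communication_cost(n, tau=0.1, omega=0.01)
        # A shrinking cost / n ratio is what sub-linear growth looks like here.
        print(f"n = 2^{exponent}: cost ~ {cost:.0f}, cost / n = {cost / n:.2e}")
```

Under these assumed values the ratio of cost to n keeps falling as n grows, which is the sub-linear behaviour the theorem claims; a different PIR scheme would change the exponent c but not the shape of the comparison.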
Date posted: 30/12/2015, 18:18
