Research on the development of methods of graph theory and automata in steganography and searchable encryption (English summary)

MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Nguyen Huy Truong

RESEARCH ON DEVELOPMENT OF METHODS OF GRAPH THEORY AND AUTOMATA IN STEGANOGRAPHY AND SEARCHABLE ENCRYPTION

Major: Mathematics and Informatics
Major code: 9460117

ABSTRACT OF DOCTORAL DISSERTATION IN MATHEMATICS AND INFORMATICS

Hanoi - 2020

The dissertation is completed at: Hanoi University of Science and Technology

Supervisors:
Assoc. Prof. Dr. Sc. Phan Thi Ha Duong
Dr. Vu Thanh Nam

Reviewer 1:
Reviewer 2:
Reviewer 3:

The dissertation will be defended before the approval committee at Hanoi University of Science and Technology
Time , date month year

The dissertation can be found at:
Ta Quang Buu Library, Hanoi University of Science and Technology
Vietnam National Library

LIST OF PUBLICATIONS

[T1] N. H. Truong (2019), “A New Digital Image Steganography Approach Based on The Galois Field $GF(p^m)$ Using Graph and Automata”, KSII Transactions on Internet and Information Systems, 13(9), pp. 4788-4813 (ISI).
[T2] N. H. Truong (2019), “A New Approach to Exact Pattern Matching”, Journal of Computer Science and Cybernetics, 35(3), pp. 197-216.
[T3] N. H. Truong (2019), “Automata Technique for The LCS Problem”, Journal of Computer Science and Cybernetics, 35(1), pp. 21-37.
[T4] N. H. Truong (2019), “A Novel Cryptosystem Based on Steganography and Automata Technique for Searchable Encryption”, KSII Transactions on Internet and Information Systems (revised) (ISI).

INTRODUCTION

As the use of computers and the Internet becomes more and more essential, digital data (information) can be copied and accessed illegally. As a result, information security becomes increasingly important. There are two popular methods of providing security: cryptography and data hiding. Cryptography encrypts data in order to make them unreadable by a third party. Data hiding embeds data in digital media. Based on the purpose of the application, data hiding is generally divided into steganography
that hides the existence of data to protect the embedded data, and watermarking that protects the copyright ownership and authentication of the digital media carrying the embedded data. Steganography can be used as an alternative to cryptography. However, steganography becomes weak if attackers detect the existence of hidden data. Hence, integrating cryptography with steganography is a third choice for data security.

With the rapid development of applications based on Internet infrastructure, cloud computing has become one of the hottest topics in information technology. It is an Internet-based computing system that provides on-demand services ranging from application and system software to storage and data processing. For example, when cloud users use the storage service, they can upload information to the servers and then access it online. Meanwhile, enterprises need not spend large sums on owning and maintaining a system of hardware and software. Although cloud computing brings many benefits to individuals and organizations, cloud security is still an open problem, since cloud providers can abuse users' information and cloud users lose control of it. Thus, guaranteeing the privacy of tenants’ information without negating the benefits of cloud computing seems necessary.

In order to protect cloud users’ privacy, sensitive data need to be encrypted before being outsourced to servers. Unfortunately, encryption makes searching on ciphertext much more difficult for the servers than searching on plaintext. To solve this problem, many searchable encryption techniques have been presented since 2000. Searchable encryption not only stores users’ encrypted data securely but also allows information search over ciphertext.

Searchable encryption for exact pattern matching is a new class of searchable encryption techniques. The solutions for this class have been based on algorithms for, or approaches to, exact pattern matching.

As in retrieving information from
plaintexts, the development of searchable encryption with approximate string matching capability is necessary, where the search string can be a keyword that is determined, encrypted and stored in cloud servers, or an arbitrary pattern.

Motivated by the above problems, by the methods using graph theory and automata proposed by P. T. Huy et al. for solving the problems of exact pattern matching (2002), longest common subsequence (2002) and steganography (2011, 2012 and 2013), and by their potential applications in steganography and searchable encryption, and under the direction of the supervisors, the assigned dissertation title is research on development of methods of graph theory and automata in steganography and searchable encryption.

The purpose of the dissertation is to research the development of new, high-quality solutions using graph theory and automata, to suggest their applications in steganography and searchable encryption, and to apply them to these two fields.

Based on the results and suggestions introduced by P. T. Huy et al., the dissertation focuses on the following four problems in steganography and searchable encryption:
- Digital image steganography;
- Exact pattern matching;
- Longest common subsequence;
- Searchable encryption.

For the first three problems, the dissertation’s work is to find new and efficient solutions using graph theory and automata. These solutions are then applied to solve the last problem.

The dissertation is structured as follows. Apart from the Introduction at the beginning and the Conclusion at the end, the main content is divided into five chapters.
Chapter 1. Preliminaries
Chapter 2. Digital image steganography based on the Galois field using graph theory and automata
Chapter 3. An automata approach to exact pattern matching
Chapter 4. Automata technique for the longest common subsequence problem
Chapter 5. Cryptography based on steganography and automata methods for searchable encryption

The contents of the dissertation are written based
on the paper [T1] published in, and the revised manuscript [T4] submitted to, KSII Transactions on Internet and Information Systems (ISI), and on the papers [T2, T3] published in the Journal of Computer Science and Cybernetics in 2019.

The main results of the dissertation have been presented at:
- Seminar on Mathematical Foundations for Computer Science at the Institute of Mathematics, Vietnam Academy of Science and Technology;
- The 9th Vietnam Mathematical Congress, Nha Trang, August 14-18, 2018;
- Seminar at the School of Applied Mathematics and Informatics, Hanoi University of Science and Technology.

CHAPTER 1. PRELIMINARIES

1.1 Basic Structures

1.1.1 Strings

In this dissertation, secret data are considered as strings, so some terms related to strings are recalled here.

1.1.2 Graph

Besides some basic concepts in graph theory, this subsection recalls the representation of a graph by adjacency lists and breadth first search. These are used in Chapter 2.

1.1.3 Deterministic Finite Automata

Studying the construction and use of deterministic finite automata is one of the objectives of the dissertation. Hence, this subsection clarifies this model of computation.

1.1.4 The Galois Field $GF(p^m)$

This subsection recalls how to construct a finite field with $p^m$ elements, called the Galois field $GF(p^m)$, where $p$ is prime and $m \ge 1$ is an integer. This algebraic structure is used in Chapter 2.

1.2 Digital Image Steganography

The problem of interest in Chapter 2 is digital image steganography. This section recalls the concept of digital images, the basic model of digital image steganography and some parameters that determine the efficiency of digital image steganography. Lastly, it recalls results that are developed further and used in Chapter 2, such as the fastest optimal parity assignment (FOPA) method, the module method and the concept of the maximal secret data ratio (MSDR).

The basic model of digital image steganography is shown in Figure 1.4.
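The sender and receiver roles in this basic model can be illustrated with a minimal sketch. It assumes a simple keyed least-significant-bit rule as a stand-in for the embedding and extracting steps. This rule, the helper names `pixel_order`, `embed` and `extract`, and the toy 8-pixel gray image are illustrative assumptions and are not the schemes developed in the dissertation.

```python
import random


def pixel_order(n_pixels, key):
    """Key-dependent visiting order of the pixels (hypothetical helper)."""
    order = list(range(n_pixels))
    random.Random(key).shuffle(order)  # same key gives the same order
    return order


def embed(cover, bits, key):
    """Sender side: hide `bits` in the least significant bits of `cover`."""
    if len(bits) > len(cover):
        raise ValueError("secret data longer than cover image")
    stego = list(cover)  # the cover image itself is left untouched
    for bit, pos in zip(bits, pixel_order(len(cover), key)):
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def extract(stego, n_bits, key):
    """Receiver side: read the hidden bits back with the shared secret key."""
    order = pixel_order(len(stego), key)
    return [stego[pos] & 1 for pos in order[:n_bits]]


cover = [120, 37, 255, 0, 64, 199, 18, 230]  # toy gray levels, one per pixel
secret = [1, 0, 1, 1]
stego = embed(cover, secret, key=2020)
recovered = extract(stego, len(secret), key=2020)
```

In this sketch each pixel value changes by at most one gray level, and the receiver needs only the stego image, the secret key and the number of embedded bits.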
Figure 1.4 The basic diagram of digital image steganography (diagram labels: Sender, Secret Data, Cover Image, Secret Key, Embed, Stego Image, Communication Channel, Receiver, Extract)

Definition 1.4 (P. T. Huy et al., 2011) $MSDR_k(N)$ is the largest number of embedded bits of secret data in an image block of $N$ pixels by changing colours of at most $k$ pixels in the image block, where $k$, $N$ are positive integers.

Given a positive integer $q_{colour}$, call $q_{colour}$ the number of different ways to change the colour of each pixel in an arbitrary image block of $N$ pixels. Then

$MSDR_k(N) = \lfloor \log_2 (1 + q_{colour} C_N^1 + q_{colour}^2 C_N^2 + \cdots + q_{colour}^k C_N^k) \rfloor$ (1.3)

1.3 Exact Pattern Matching

This section restates the exact pattern matching problem and recalls the concept of the degree of fuzziness (appearance) used in Chapter 3.

Definition 1.5 Let $p$ be a pattern of length $m$ and $x$ be a text of length $n$ over the alphabet $\Sigma$. Then the exact pattern matching problem is to find all occurrences of the pattern $p$ in $x$.

Definition 1.6 (P. T. Huy et al., 2002) Let $p$ be a pattern and $x$ be a text of length $n$ over the alphabet $\Sigma$. Then for each $1 \le i \le n$, the degree of appearance of $p$ in $x$ at position $i$ is the length of a longest substring of $x$ such that this substring is a prefix of $p$ and its right end letter is $x[i]$.

1.4 Longest Common Subsequence

This section recalls the longest common subsequence (LCS) problem and the Knapsack Shaking approach to the problem, which is developed further in Chapter 4.

Denote an arbitrary longest common subsequence of $p$ and $x$ by $LCS(p, x)$. The length of a $LCS(p, x)$ is denoted by $lcs(p, x)$.

Let $p$ and $x$ be two strings of lengths $m$ and $n$ over the alphabet $\Sigma$, $m \le n$. The longest common subsequence problem for two strings (LCS problem) can be stated in the two following forms.
Problem 1 Find a $LCS(p, x)$.
Problem 2 Compute the $lcs(p, x)$.

1.5 Searchable Encryption

This section clarifies the term searchable encryption (SE) and recalls the definition of a
cryptosystem. They will be studied and used in Chapter 5.

SE is a system consisting of two main components: a cryptosystem used to encrypt and decrypt on the client side, and algorithms for searching on encrypted data that run on the cloud provider side.

In cryptography, SE can be either searchable symmetric encryption (SSE) or searchable asymmetric encryption (SAE). In SSE, only private key holders can create encrypted data and produce trapdoors for search. In SAE, users who have the public key can make ciphertexts, but only private key holders can generate trapdoors.

CHAPTER 2. DIGITAL IMAGE STEGANOGRAPHY BASED ON THE GALOIS FIELD USING GRAPH THEORY AND AUTOMATA

This chapter first proposes the concepts of optimal and near optimal secret data hiding schemes. The chapter then proposes a new digital image steganography approach based on the Galois field $GF(p^m)$ using graph and automata to design the data hiding scheme of the general form $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ for binary, gray and palette images with the given assumptions, where $k$, $m$, $n$, $N$ are positive integers and $p$ is prime. It also shows sufficient conditions for the existence of, and proves the existence of, some optimal and near optimal secret data hiding schemes. These results are derived from the concept of the maximal secret data ratio of embedded bits, the module method and the FOPA method proposed by P. T. Huy et al. in 2011, 2012 and 2013, recalled in Section 1.2 of Chapter 1. An application of the schemes to the process of hiding a finite sequence of secret data in an image is also considered. Security analyses and experimental results confirm that the proposed approach can create steganographic schemes which achieve high efficiency in embedding capacity, visual quality, speed and security, which are key properties of steganography.

The results of Chapter 2 have been published in [T1].

2.1 Introduction

2.2 The Digital Image Steganography Problem

Definition 2.1 A block based secure data hiding scheme in digital images (for short, called a data
hiding scheme) is a five tuple $(\mathcal{I}, \mathcal{M}, \mathcal{K}, Em, Ex)$, where the following conditions are satisfied:
- $\mathcal{I}$ is a set of all image blocks with the same size and image type;
- $\mathcal{M}$ is a finite set of secret elements;
- $\mathcal{K}$ is a finite set of secret keys;
- $Em$ is an embedding function to embed a secret element in an image block, $Em : \mathcal{I} \times \mathcal{M} \times \mathcal{K} \to \mathcal{I}$;
- $Ex$ is an extracting function to extract an embedded secret element from an image block, $Ex : \mathcal{I} \times \mathcal{K} \to \mathcal{M}$;
- $Ex(Em(I, M, K), K) = M$, $\forall (I, M, K) \in \mathcal{I} \times \mathcal{M} \times \mathcal{K}$.

Definition 2.2 A data hiding scheme $(\mathcal{I}, \mathcal{M}, \mathcal{K}, Em, Ex)$ is called a data hiding scheme $(k, N, r)$, where $k$, $N$, $r$ are positive integers, if each image block in $\mathcal{I}$ has $N$ pixels and the embedding function $Em$ can embed $r$ bits of secret data in an arbitrary image block by changing colours of at most $k$ pixels in the image block.

Definition 2.3 For a given $q_{colour}$, a data hiding scheme $(k, N, r)$ is called an optimal data hiding scheme if $r = MSDR_k(N)$ and there is no $N' < N$ with $r = MSDR_k(N')$. Then $N$ is denoted by $N_{optimum}$.

Definition 2.4 For a given $q_{colour}$, a data hiding scheme $(k, N, r)$ is called a near optimal data hiding scheme if $r = MSDR_k(N)$ and $N > N_{optimum}$.

The chapter’s digital image steganography problem: design optimal or near optimal data hiding schemes $(k, N, r)$ for digital images (binary, gray and palette images).

2.3 A New Digital Image Steganography Approach

2.3.1 Mathematical Basis based on The Galois Field

Let $GF^n(p^m) = \{(x_1, x_2, \ldots, x_n) \mid x_i \in GF(p^m), \forall i = 1, \ldots, n\}$, where $n$ is a positive integer, with the two operations of vector addition $+$ and scalar multiplication $\cdot$ defined as follows:

$x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$,
$ax = (ax_1, ax_2, \ldots, ax_n)$, $a \in GF(p^m)$,

where $x, y \in GF^n(p^m)$ and $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$.

Recall that $(GF^n(p^m), +, \cdot)$ is a vector space over the field $GF(p^m)$.

Definition 2.5 The class of an element $x \in GF^n(p^m)$, denoted by $[x]$, is given by $[x] = \{ax \mid a \in GF(p^m) \setminus \{0\}\}$.

Theorem 2.2 Suppose that a k-Generators $S$ for the vector space $GF^n(p^m)$ is found and a flip graph $G$ is built. Then there exists the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$, where $N = |S|$.

Security analysis of the proposed data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$: assume that the parameters $k$, $N$, $Em$, $Ex$, the vector space $GF^n(p^m)$ and the flip graph $G$ in the data hiding scheme $(k, N, \lfloor \log_2 p^{mn} \rfloor)$ are published. The analysis yields

$c \, (p^m - 1)^N \, N! \, p^{mN} \, C_{p^{mn}}^{2^{\lfloor \log_2 p^{mn} \rfloor}} \, 2^{\lfloor \log_2 p^{mn} \rfloor}!$ (2.12)

Theorem 2.3 Suppose that a flip graph $G$ is built. Then there exists the optimal data hiding scheme $\left(1, \frac{p^{mn} - 1}{p^m - 1}, \lfloor \log_2 p^{mn} \rfloor\right)$ for $q_{colour} = p^m - 1$.

Proposition 2.6 For any positive integer $n$, there exists the optimal data hiding scheme $(1, 2^n - 1, n)$ for binary, gray and palette images with $q_{colour} = 1$.

Notice that if we set $N = 2^n - 1$, then the data hiding scheme $(1, 2^n - 1, n)$ becomes the data hiding scheme $(1, N, \lfloor \log_2 (N + 1) \rfloor)$. Recall that for any positive integer $N$, the data hiding scheme $(1, N, \lfloor \log_2 (N + 1) \rfloor)$ for binary images with $q_{colour} = 1$ is the data hiding scheme CTL (Chang et al., 2005). So, Proposition 2.6 shows that the data hiding scheme CTL reaches an optimal data hiding scheme for $N = 2^n - 1$, where $n$ is a positive integer.

Theorem 2.4 Suppose that a 2-Generators $S$ for the vector space $GF^n(p^m)$ with

$|S| = \left\lceil \dfrac{\frac{p^m - 3}{2} + \sqrt{\frac{(p^m - 3)^2}{4} + 2\left(2^{\lfloor \log_2 p^{mn} \rfloor} - 1\right)}}{p^m - 1} \right\rceil$

is found and a flip graph $G$ is built. Then there exists the optimal data hiding scheme $(2, |S|, \lfloor \log_2 p^{mn} \rfloor)$ for $q_{colour} = p^m - 1$.

2.4 The Near Optimal and Optimal Data Hiding Schemes for Gray and Palette Images

Consider the case $k = p = m = 2$ and $n = 4$. The data hiding scheme $(2, N, 8)$ exists if the hypothesis of Theorem 2.2 is satisfied, that is, if a 2-Generators $S$ for the vector space $GF^4(2^2)$ with $|S| = N$ is found and a flip graph $G$ over the Galois field $GF(2^2)$ is built.

Theorem 2.5 There exists the near optimal data hiding scheme $(2, 9, 8)$ for gray and palette images with $q_{colour} = 3$.

Security analysis of the near optimal data hiding scheme $(2, 9, 8)$:

$c \cdot 3^{9} \cdot 9! \cdot 2^{18} \cdot 2^{8}!$
(2.45)

Corollary 2.1 There exists the optimal data hiding scheme $(1, 5, 4)$ for gray and palette images with $q_{colour} = 3$.

Security analysis of the optimal data hiding scheme $(1, 5, 4)$:

$3^{5} \cdot 5! \cdot 2^{10} \cdot 2^{4}!$ (2.47)

2.5 Experimental Results

This section carries out a number of experiments to evaluate the efficiency of the proposed data hiding schemes and approach.

2.6 Conclusions

An interesting question arises as to whether there exists the optimal data hiding scheme $(2, 8, 8)$ for 8-bit gray images with $q_{colour} = 3$. To increase the data security of the proposed data hiding schemes, Chapter 5 studies the problem of combining cryptography and steganography for SE.

CHAPTER 3. AN AUTOMATA APPROACH TO EXACT PATTERN MATCHING

This chapter proposes a flexible approach using automata to design an algorithm for exact pattern matching that is effective in practice, and compares it with some of the most efficient algorithms, such as AOSO, EBOM, FJS, FSBNDM, HASHq, LBNDM, SA, BMH-SBNDM, SBNDMq and TVSBS. These results are based on the concept of the degree of appearance introduced by P. T. Huy et al. in 2002, recalled in Section 1.3 of Chapter 1. Theoretical analyses and experimental results show that in practice the proposed algorithm is faster than the above mentioned algorithms in most of the given cases of patterns and alphabets.

The results of Chapter 3 have been published in [T2].

3.1 Introduction

This chapter focuses only on the exact pattern matching problem, recalled in Section 1.3 of Chapter 1. Research on applying the chapter's solution of this problem to SE will be introduced in Chapter 5.

3.2 The New Algorithm - The MRc Algorithm

Given a positive integer $c$, a string of length $c$ is called a c_block. A c_block is said to be in $p$ (resp. not in $p$), denoted by $c\_block \in p$ (resp. $c\_block \notin p$), if the c_block is (resp. is not) a substring of $p$. For a given positive integer $c$, $1 \le c \le m$, and $c \le i \le m$, the substring $p[i - c + 1..i]$ is called the c_block of $p$ at position $i$, denoted by $c\_block_i^p$. In particular, if $c = 1$,
then a c_block is only a letter.

Definition 3.3 Let $p$ be a pattern and $z$ be a c_block of $p$, where $c$ is a positive integer with $1 \le c \le |p|$. Let $i$ be some position in $p$ with $c \le i \le |p|$. Then $i$ is called the last position of appearance of $z$ in $p$, denoted by $Pos_p(z)$, if $z = c\_block_i^p$ and $\forall j > i$, $j \le |p|$, $z \ne c\_block_j^p$.

Based on the automaton $M_p$ and the two concepts of the breaking point and $Pos_p$, the basic idea of the proposed approach to exact pattern matching is shown in Figure 3.2.

Figure 3.2 The basic idea of the proposed approach (diagram labels: text x, pattern p, window, a c_block, a backtracking position, the breaking point, slide the window, the next test jump)

From the above approach, this section constructs a new exact pattern matching algorithm, called the MRc algorithm. The correctness of the MRc algorithm is guaranteed by the following theorem.

Theorem 3.3 For any given pattern $p$ and text $x$, the MRc algorithm finds all occurrences of the pattern $p$ in $x$.

3.3 Analysis of The MRc Algorithm

Proposition 3.2 Let $p$ be a pattern of length $m$ and $x$ be a text of length $n$ over the alphabet $\Sigma$. Let $c$ be a positive integer constant such that $1 \le c \le m$. Then the MRc algorithm requires $n + 2c$ letters of $x$ to be accessed in the worst case.

Denote the probability of an arbitrary event by $P$.

Proposition 3.3 Let $p$ be a pattern of length $m$ over the alphabet $\Sigma$. If $|\Sigma| \ge$ […] and […] $\le m \le 2048$, then there exists $c$, […] $\le c \le$ […], such that for an arbitrary c_block $z$ over the alphabet $\Sigma$, $P(z \in x) \le 2^{-5}$ with a uniform distribution over the alphabet $\Sigma$.

Theorem 3.4 Let $p$ be a pattern of length $m$ and $x$ be a text of length $n$ over the alphabet $\Sigma$. Let $T(n)$ be the number of all letters of $x$ accessed by the MRc algorithm. If $|\Sigma| \ge 4$, $16 \le m \le 2048$ or $|\Sigma| \ge 32$, […] $\le m \le 2048$, then there exists $c$, […] $\le c \le$ […], such that the two following conditions are satisfied with a uniform distribution over $\Sigma$:
(a) $T(n) < n$,
(b) $P(z \in x) \le 2^{-5}$, where $z$ is an arbitrary c_block over the alphabet $\Sigma$.

3.4 Experimental Results

This section carries out a number of experiments to compare the MRc algorithm with the other algorithms.

3.5 Conclusions

The appearance of a part of the pattern is immediately reflected and updated at any position being scanned in the text, so the chapter’s approach can be applied to SE. This issue will be presented in Chapter 5.

CHAPTER 4. AUTOMATA TECHNIQUE FOR THE LONGEST COMMON SUBSEQUENCE PROBLEM

This chapter proposes two algorithms, efficient in practice, for computing the length of a longest common subsequence of two strings, using automata technique, in sequential and parallel ways. For two input strings of lengths $m$ and $n$ with $m \le n$, the parallel algorithm uses $k$ processors ($k \le m$) and has time complexity $O(n)$ in the worst case, where $k$ is an upper estimate of the length of a longest common subsequence of the two strings. These results are based on the Knapsack Shaking approach introduced by P. T. Huy et al. in 2002, recalled in Section 1.4 of Chapter 1. Experimental results show that for the alphabet of size 256, the proposed sequential and parallel algorithms are about 65.85 and 3.41m times faster than the classical dynamic programming algorithm proposed by Wagner and Fischer in 1974, respectively.

The results of Chapter 4 have been published in [T3].

4.1 Introduction

The chapter’s work is only concerned with the problem of computing the length of a longest common subsequence of two strings of lengths $m$ and $n$, which is Problem 2 restated in Section 1.4 of Chapter 1. Further, studying the application of the results of this chapter to SE for approximate pattern matching is one of the main objectives of Chapter 5.

4.2 Mathematical Basis

In fact, when applying the LCS problem to the approximate pattern matching problem, we only need to find a common subsequence of two strings such that the length of this common subsequence is equal to a given constant. So, in the general case, Theorem 1.1 is replaced with the following theorem.

Theorem 4.1 Let $p$ and $x$ be two strings of lengths $m$ and $n$ over the alphabet $\Sigma$, $m$
$\le n$. Let $c$ be a positive integer constant, $1 \le c \le m$, and let $A_p^c = (\Sigma, Q, q_0, \varphi, F)$ corresponding to $p$ be an automaton over the alphabet $\Sigma$, where
• The set of states $Q = Config(p)$,
• The initial state $q_0 = C_0$,
• The transition function $\varphi$ is given as in Definition 1.12,
• The set of final states $F = \{C_f \mid C_f \in Config(p), C_f = \{x_1, x_2, \ldots, x_c\}$ or $C_f = \varphi(C_0, x)\}$.
Suppose $C_f = \{x_1, x_2, \ldots, x_t\}$ is a final state for $1 \le t \le m$. Then there exists a substring $u$ of $x$ such that a $LCS(p, u)$ equals $x_t$.

Theorem 4.2 Let $p$ be a string of length $m$ on the alphabet $\Sigma$, $C \in Config(p)$ and $s \in \Sigma^*$. Then $\delta(W_p(C), s) = W_p(\varphi(C, s))$, where $\delta$ and $\varphi$ are given as in Definitions 4.5 and 1.12, respectively.

4.3 Automata Models for Solving The LCS Problem

Theorem 4.3 Let $p$ and $x$ be two strings of lengths $m$ and $n$ over the alphabet $\Sigma$, $m \le n$. Let $c$ be a positive integer constant, $1 \le c \le m$, and let $AS_p^c = (\Sigma, Q, q_0, \delta_{Step}, F)$ corresponding to $p$ be an automaton over the alphabet $\Sigma$, where
• The set of states $Q = WConfig(p)$,
• The initial state $q_0 = W_0$,
• The transition function $\delta_{Step}$ is given as in Definition 4.8,
• The set of final states $F = \{W_f \mid W_f \in WConfig(p), |W_f| = c$ or $W_f = \delta_{Step}(W_0, x)\}$.
Suppose $W_f$ is a final state. Then there exists a substring $u$ of $x$ such that $lcs(p, u) = |W_f|$.

Theorem 4.4 Let $p$ and $x$ be two strings of lengths $m$ and $n$ over the alphabet $\Sigma$, $m \le n$. Let $c$ be a positive integer constant, $1 \le c \le m$, and let $AP_p^c = (\Sigma, Q_p, q_0, \delta_p, F_p)$ corresponding to $p$ be an automaton over the alphabet $\Sigma$, where
• The set of states $Q_p = WConfig(p)$,
• The initial state $q_0 = W_0$,
• The transition function $\delta_p$ is given as in Definition 4.9,
• The set of final states $F_p = \{W_f \mid W_f \in WConfig(p), |W_f| = c$ or $W_f = \delta_p(W_0, x)\}$.
Suppose $W_f$ is a final state. Then there exists a substring $u$ of $x$ such that $lcs(p, u) = |W_f|$.

Based on Theorem 4.4 with $c = |p|$, a parallel algorithm for solving Problem 2 is constructed as follows.

Algorithm (the parallel algorithm):
Input: Two strings p and x, |p| ≤ |x|
Output: The lcs(p, x)
    q = W_0;  // Set up the initial state of the automaton AP_p^c
    For i = 1 to |x| Do
    {
        q = δ_p(q, x[i]);
        If (|q| = |p|) Break;
    }
    lcs(p, x) = |q|;

Proposition 4.3 Let $p$ and $x$ be two strings of lengths $m$ and $n$ over the alphabet $\Sigma$, $m \le n$. Suppose the parallel algorithm uses $k$ processors ($k \le m$), where $k$ is an upper estimate of the length of a longest common subsequence of the two strings. Then the time complexity of the parallel algorithm is $O(n)$ in the worst case.

4.4 Experimental Results

This section carries out a number of experiments to compare the two proposed algorithms with the Algorithm WF.

4.5 Conclusions

With the automata technique, the $lcs(p, x)$ is always reflected and updated at every location being scanned in the string $x$. Hence the chapter’s technique for the LCS problem can be exploited in SE. This application will be studied in Chapter 5.

CHAPTER 5. CRYPTOGRAPHY BASED ON STEGANOGRAPHY AND AUTOMATA METHODS FOR SEARCHABLE ENCRYPTION

This chapter first proposes a novel cryptosystem with high security based on the data hiding scheme $(2, 9, 8)$, a new result presented in Section 2.4 of Chapter 2. In this cryptosystem, encrypting and hiding are done at once, and the ciphertext does not depend on the input image size, unlike in existing hybrid techniques of cryptography and steganography. The results of Chapters 3 and 4, which use automata technique, are then applied to design two algorithms for exact and approximate pattern matching on secret data encrypted by the proposed cryptosystem. Theoretical analyses show that both algorithms have $O(n)$ time complexity in the worst case, where the approximate algorithm is assumed to use $(1 - \epsilon)m$ processors, and $\epsilon$, $m$ and $n$ are the error of the string similarity measure newly defined in Formula (5.11) and the lengths of the pattern and the secret data, respectively. In searchable encryption, the cryptosystem can be used by users to encrypt and decrypt secret data, and the pattern matching algorithms can be used by servers to perform pattern search.

The results of Chapter 5 have been published
in [T4].

5.1 Introduction

The goal of this chapter is to propose a novel symmetric cryptosystem that is used on the user side, and algorithms for exact and approximate pattern matching on ciphertexts that are used on the cloud server side. These are essential components of SSE.

5.2 A Novel Cryptosystem Based on The Data Hiding Scheme (2, 9, 8)

The function $Em'$ (computing the flip information $q$):

    q = q_0;
    For i = 1 to N Do q = δ(q, I_i);
    q = δ(q, M);

The function $Ex'$ (extracting $M$ from $I$):

    I' = I;
    For each (i_t, a_t) in q Do I'_{i_t} = Adjacent(I'_{i_t}, a_t);
    q = q_0;
    For i = 1 to N Do q = δ(q, I'_i);
    M = q;

Remark 5.1 With the two functions $Em'$ and $Ex'$ defined as above, the image blocks $I$ used are never changed.

Consider $\Sigma$ to be an alphabet of size 256. Set $\mathcal{P} = \Sigma$. By the decimal representation of the vector space $GF^4(2^2)$ over the field $GF(2^2)$ in Section 2.4, $|\mathcal{P}| = |GF^4(2^2)| = 256$, hence there exists a bijective function $f$ from $\mathcal{P}$ to $GF^4(2^2)$. Denote the inverse function of $f$ by $f^{-1}$ and let $\mathcal{F}$ be the set of all such $f$.

From the function $\delta_2$, the state $q$ of the automaton $A(I, M, K)$ computed by Statement (2.4) is a set. Suppose $B$ is a binary string of length 12 representing an arbitrary state $q$.

Let $Q$ be the set of all possible states $q$, and let $\mathcal{C}$ be the set of all 12-bit strings $B$ representing $q$, $q \in Q$. Consider a function $h : Q \to \mathcal{C}$, $h(q) = B$, where $q$ is represented by $B$. Obviously, $h$ is a bijective function. Denote the inverse function of $h$ by $h^{-1}$.

Let $\mathcal{K}' = \{(f, K, I) \mid f \in \mathcal{F}, K \in \mathcal{K}, I \in \mathcal{I}\}$ be a finite set of secret keys. For $k \in \mathcal{K}'$, $k = (f, K, I)$, the functions $e_k$ and $d_k$ are defined as follows:

$e_k : \mathcal{P} \to \mathcal{C}$, $e_k(x) = h(Em'(I, f(x), K))$ for $x \in \mathcal{P}$,
$d_k : \mathcal{C} \to \mathcal{P}$, $d_k(y) = f^{-1}(Ex'(h^{-1}(y), I, K))$ for $y \in \mathcal{C}$.

Set $\mathcal{E} = \{e_k \mid k \in \mathcal{K}'\}$, $\mathcal{D} = \{d_k \mid k \in \mathcal{K}'\}$.

By Definition 1.13, the correctness of the cryptosystem $(\mathcal{P}, \mathcal{C}, \mathcal{K}', \mathcal{E}, \mathcal{D})$ is guaranteed by the following theorem.

Theorem 5.1 Let $x \in \mathcal{P}$, $k \in \mathcal{K}'$, $e_k \in \mathcal{E}$ and $d_k \in \mathcal{D}$ be arbitrary. Then $d_k(e_k(x)) = x$.

Security analysis of the cryptosystem $(\mathcal{P}, \mathcal{C}, \mathcal{K}', \mathcal{E}, \mathcal{D})$: assume that the following parameters are published: the flip graph $G$, $Em'$,
$Ex'$, $GF^4(2^2)$ and $h$ in the cryptosystem $(\mathcal{P}, \mathcal{C}, \mathcal{K}', \mathcal{E}, \mathcal{D})$. The analysis yields

$c \cdot 3^{9} \cdot 9! \cdot 2^{18} \cdot 2^{8}! \cdot 256^{9} = c \cdot 3^{9} \cdot 9! \cdot 2^{90} \cdot 2^{8}!$ for gray images, (5.3)

or

$c \cdot 3^{9} \cdot 9! \cdot 2^{18} \cdot 2^{9t} \cdot 2^{8}! = c \cdot 3^{9} \cdot 9! \cdot 2^{18+9t} \cdot 2^{8}!$ for palette images. (5.4)

Remark 5.2 By Remark 5.1, no pair of functions $(e_k, d_k)$ in the cryptosystem $(\mathcal{P}, \mathcal{C}, \mathcal{K}', \mathcal{E}, \mathcal{D})$ changes the image blocks $I$, for all $k \in \mathcal{K}'$, $k = (f, K, I)$. In addition, encrypting and hiding are done at the same time.

Consider an arbitrary subset of image blocks $F$ as an input image, $F \subset \mathcal{I}$, $F = \{F_1, F_2, \ldots, F_{t_2}\}$, where $t_2$ is the number of image blocks. Next, Section 5.2 gives a way of applying the cryptosystem $(\mathcal{P}, \mathcal{C}, \mathcal{K}', \mathcal{E}, \mathcal{D})$ to the process of encrypting and decrypting secret data over an insecure channel.

By Remark 5.2, we can use a subset of secret keys $\mathbf{K}$ instead of one secret key $k$, where $\mathbf{K} = \{(f, K, I) \mid K \in \mathcal{K}_0, I \in F\} \subset \mathcal{K}'$ for a fixed $f \in \mathcal{F}$ and $\mathcal{K}_0 = \{K^1, K^2, \ldots, K^{t_1}\}$.

Suppose that the secret data is a string $x = x_1 x_2 \ldots x_{t_3}$ with $x_i \in \mathcal{P}$, $\forall i = 1, \ldots, t_3$, $t_3 \ge 1$.

The encrypting algorithm $e_{\mathbf{K}}$ used to encrypt $x$ is given as follows.

    i_K = 1; i_F = 1;
    For i = 1 to t_3 Do
    {
        k_i = (f, K^{i_K}, F_{i_F});
        y_i = e_{k_i}(x_i);
        i_K = (i_K mod t_1) + 1;  // move cyclically to the next key
        i_F = (i_F mod t_2) + 1;  // move cyclically to the next image block
    }
    y = y_1 y_2 ... y_{t_3};

The decrypting algorithm $d_{\mathbf{K}}$ used to decrypt $y$ is given as follows.

    i_K = 1; i_F = 1;
    For i = 1 to t_3 Do
    {
        k_i = (f, K^{i_K}, F_{i_F});
        x_i = d_{k_i}(y_i);
        i_K = (i_K mod t_1) + 1;
        i_F = (i_F mod t_2) + 1;
    }
    x = x_1 x_2 ... x_{t_3};

Remark 5.3 In the two algorithms $e_{\mathbf{K}}$ and $d_{\mathbf{K}}$ given above, an arbitrary image block $I$ in the input image $F$ can be used many times in the process of encrypting and decrypting the secret data. So, for a given input image $F$, the amount of secret data encrypted is not limited by the size of the input image $F$.

5.3 Automata Technique for Exact Pattern Matching on Encrypted Data

Suppose that Alice has secret data and prefers to outsource them to a cloud provider, Bob. As the provider is semi-trusted, Alice needs to encrypt her plaintext and wishes to store only ciphertext in the cloud. Assume that Alice uses the cryptosystem $(\mathcal{P}, \mathcal{C}, \mathcal{K}', \mathcal{E}, \mathcal{D})$ proposed in
Section 5.2 to encrypt data with a pair of secret parameters $(S, k)$ in the cryptosystem, where $S$ is a 2-Generators for $GF^4(2^2)$ with 9 elements and $k = (f, K, I) \in \mathcal{K}'$. Because of limited storage space and computing ability, instead of downloading the ciphertext, decrypting it and searching locally, Alice may ask Bob to perform pattern matching tasks directly on the ciphertext with a trapdoor of the pattern received from her.

Consider $\Sigma$ to be an alphabet of size 256. Suppose that the secret data is a string over $\Sigma$, $x = x_1 x_2 \ldots x_{t_3}$ with $x_i \in \mathcal{P}$, $\forall i = 1, \ldots, t_3$, where $\mathcal{P} = \Sigma$, $t_3 \ge 1$ and $t_3$ is often a large natural number. Before uploading the secret data $x$ to Bob, Alice uses the encrypting function $e_k \in \mathcal{E}$ to encrypt each $x_i$. Alice computes $y_i = e_k(x_i)$, $\forall i = 1, \ldots, t_3$, and the encrypted secret data, which is sent to Bob, is a string $y = y_1 y_2 \ldots y_{t_3}$ over $\Sigma'$, where $\Sigma'$ is the alphabet $\Sigma' = \{a' \mid a' = e_k(a), a \in \Sigma\}$.

In the general case, let $x$ be any string over the alphabet $\Sigma$ and let the string $y$ be obtained from $x$ in the above way. Then we write $y = e_k(x)$ for short, and $y$ is a string over the alphabet $\Sigma'$.

Remark 5.4 When only one pair of secret parameters $(S, k)$ is used, the security of the process of encrypting and decrypting the secret data $x$ is given by Formula (5.3) (for gray images) or (5.4) (for palette images).

Suppose that Bob needs to perform exact pattern matching of an arbitrary pattern $p$ on the encrypted data $y$. Based on the results introduced in Chapter 3, this section continues to use automata technique to meet the requirement.

Theorem 5.2 Let $p$ be a pattern over the alphabet $\Sigma$. Let the two automata $M_p = (\Sigma, Q_p, q_0, \delta_p, F_p)$ and $M_{p'} = (\Sigma', Q_{p'}, q_0', \delta_{p'}, F_{p'})$ be determined as in Theorem 3.2. Then $Q_{p'} = Q_p$, $F_{p'} = F_p$, and $\forall q \in Q_p$, $\forall a' \in \Sigma'$, $a = d_k(a')$, $\delta_{p'}(q, a') = \delta_p(q, a)$, where $p' = e_k(p)$.

Remark 5.5 In practice, the meaning of Theorem 5.2 is that $\delta_{p'}$ can be computed from $\delta_p$.

Let a pattern $p$ and a text (secret data) $x$ be two strings over the same alphabet $\Sigma$ and assume $|p| \le |x|$. Assuming that we have
only the encrypted secret data $y$, which is not decrypted to the secret data $x$, then from Propositions 5.2, 5.3 and 5.4 and Theorem 5.2, based on the MRc algorithm for $c = 1$, using the type a breaking point and the concept of $Pos_p$ in Chapter 3, and using the automaton $M_{p'}$ given as in Theorem 3.2, we immediately obtain the following exact pattern matching algorithm, which finds all occurrences of the pattern $p$ in $x$. Note that the trapdoor for the search pattern $p$ is computed from $p$ and includes the length of $p$, the functions $Sign$ and $Pos_{p'}$, and the automaton $M_{p'}$.

    jump = |p|;
    While (jump ≤ |y|)
    {
        If (Sign(y_jump) == 1)
        {
            q = 0;
            i = jump − Pos_{p'}(y_jump) + 1;
            Do
            {
                q = δ_{p'}(q, y_i);
                If (q == |p|) Mark an occurrence of p at i − |p| + 1 in x;
                i++;
            } While (q ≠ 0 and i ≤ |y|);
            jump = i − 1;
        }
        jump = jump + |p|;
    }

Remark 5.6 In the worst case, this algorithm’s time complexity is $O(n)$.

5.4 Automata Technique for Approximate Pattern Matching on Encrypted Data

Suppose that Bob wants to perform approximate pattern matching of any pattern $p$ on the ciphertext $y$. Building on the results proposed in Chapter 4, automata technique is again applied to meet the requirement.

Theorem 5.3 Let $p$ be a pattern on $\Sigma$ and $c$ be a positive integer constant with $1 \le c \le |p|$. Let the two automata $AP_p^c = (\Sigma, Q_p, q_0, \delta_p, F_p)$ and $AP_{p'}^c = (\Sigma', Q_{p'}, q_0', \delta_{p'}, F_{p'})$ be determined as in Theorem 4.4. Then $Q_{p'} = Q_p$, $F_{p'} = F_p$, and $\forall q \in Q_p$, $\forall a' \in \Sigma'$, $a = d_k(a')$, $\delta_{p'}(q, a') = \delta_p(q, a)$, where $p' = e_k(p)$.

Remark 5.7 In practice, the meaning of Theorem 5.3 is that $\delta_{p'}$ can be computed from $\delta_p$.

Definition 5.1 Let $p$ and $x$ be two strings over $\Sigma$, let $d$ be a string similarity measure and let $\epsilon > 0$ be an error. Then $p$ appears in $x$ with the error $\epsilon$ if there exists a substring $u$ of $x$ such that $d(p, u) \le \epsilon$.

To construct the approximate pattern matching algorithm, we need a function that measures string similarity. This section defines a new measure of similarity between two strings,

$d(p, u) = 1 - \dfrac{lcs(p, u)}{\min\{|p|, |u|\}}$, (5.11)

where $p$ is a pattern and $u$ is a substring of $x$. The
constant $c$ in Theorem 4.4 is determined by $c = (1 - \varepsilon)|p|$.

Without decrypting $y$, based on Theorem 5.3, Definition 5.1 and Formula (5.11), and using the automaton $AP_{p'}^c$ given as in Theorem 4.4, we immediately obtain the following approximate pattern matching algorithm, which determines whether or not $p$ appears in $x$ with the error $\varepsilon$. Here the trapdoor corresponding to the pattern $p$ is determined from $p'$ and $\varepsilon$, and consists of the constant $c$ and the automaton $AP_{p'}^c$.

    app = 0; q = W_0; // The initial state of the automaton AP_{p'}^c is W_0
    For i = 1 to |y| Do {
        q = δ_{p'}(q, y_i);
        If (|q| = c) { app = 1; Break; }
    }
    If (app = 1) Announce the appearance of the pattern p in x with the error ε;
    Else Announce that p does not appear in x with the error ε;

Remark 5.8. In the worst case, the time complexity of the algorithm is O(n) when it uses $(1 - \varepsilon)|p|$ processors.

5.5 Conclusions

With the proposed automata approach to pattern matching algorithms, the automata are constructed from the search patterns only. The algorithms are therefore advantageous when a given pattern must be searched for in a very large set of ciphertexts stored in the cloud. Future work will continue to study this technique for application in SE.

CONCLUSION

Under the supervision of Assoc Prof Dr Sc Phan Thi Ha Duong and Dr Vu Thanh Nam, and building on results using graph theory and automata technique proposed by P T Huy et al in steganography and searchable encryption, the new contributions of the dissertation in these fields can be summarized as follows:

A general approach based on the Galois field $GF(p^m)$ using graph theory and automata to design optimal and near optimal secret data hiding schemes for binary, gray and palette images (Chapter 2); based on this approach, Chapter 2 shows that the data hiding scheme CTL is an optimal data hiding scheme with $N = 2^n - 1$, where $n$ is a positive integer; from the approach, Chapter 2 proposes data hiding schemes consisting of the optimal data hiding scheme $(1, 2^n - 1, n)$ for binary, gray and palette images
with $q_{colour} = 1$, where $n$ is a positive integer, the near optimal data hiding scheme $(2, 9, 8)$ and the optimal data hiding scheme $(1, 5, 4)$ for gray and palette images with $q_{colour} = 3$;

A flexible automata approach to constructing an efficient algorithm for the exact pattern matching problem in practice (Chapter 3);

A mathematical basis for the development of automata technique for computing $lcs(p, x)$ (Chapter 4); on this basis, Chapter 4 proposes two efficient algorithms, one sequential and one parallel, for computing $lcs(p, x)$ in practice. The parallel algorithm takes O(n) time in the worst case if it uses $k$ processors, where $k$ is an upper estimate of the length of a longest common subsequence of the two strings $p$ and $x$;

Based on the results proposed in Chapters 2, 3 and 4, Chapter 5 proposes two major components of SSE:

a) A novel cryptosystem with high security, based on the data hiding scheme $(2, 9, 8)$ and used by users. This method allows encrypting and hiding to be done at once, and the ciphertext does not depend on the input image size, unlike in existing hybrid techniques of cryptography and steganography;

b) Exact and approximate pattern matching algorithms that use automata technique to search for any pattern directly in ciphertexts, performed by cloud servers. Assuming that the approximate algorithm uses $(1 - \varepsilon)m$ processors, the time complexities of both algorithms are O(n) in the worst case, where $\varepsilon$, $m$ and $n$ are the error of the proposed measure of similarity between two strings, the length of the pattern and the length of the secret data, respectively.

Because the development of methods of graph theory and automata in steganography and searchable encryption is topical, the following problems of interest can be considered in the future:

Whether there exists an optimal data hiding scheme $(2, 8, 8)$ for 8-bit gray images with $q_{colour} = 3$;

Improving the quality of stego images generated by the proposed data hiding schemes for palette images;

The problem of steganalysis attacks;
Development of automata technique in SE.
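
The relation stated in Theorem 5.2 of Section 5.3, $\delta_{p'}(q, a') = \delta_p(q, a)$ with $a' = e_k(a)$, can be illustrated with a minimal sketch. It reproduces neither the automaton of Theorem 3.2 nor the cryptosystem of Section 5.2. As stand-ins it assumes the classic string-matching automaton (a state is the length of the longest pattern prefix matched so far) and a random byte permutation in place of $e_k$. The names `build_dfa`, `relabel` and `search` are hypothetical helpers introduced only for this illustration.

```python
import random


def build_dfa(pattern: bytes, alphabet) -> list:
    """Classic string-matching automaton (a stand-in for Theorem 3.2).

    State q is the length of the longest prefix of `pattern` that is a
    suffix of the text read so far; state len(pattern) is accepting.
    """
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            read = pattern[:q] + bytes([a])
            k = min(m, q + 1)
            # Shrink k until pattern[:k] is a suffix of what was read.
            while k > 0 and not read.endswith(pattern[:k]):
                k -= 1
            delta[q][a] = k
    return delta


def relabel(delta: list, enc: dict) -> list:
    """Transfer transitions to the ciphertext alphabet:
    delta'(q, e_k(a)) = delta(q, a), with states left unchanged."""
    return [{enc[a]: nxt for a, nxt in row.items()} for row in delta]


def search(delta: list, m: int, text: bytes) -> list:
    """Return the 0-based start positions of all occurrences."""
    q, hits = 0, []
    for i, a in enumerate(text):
        q = delta[q][a]
        if q == m:
            hits.append(i - m + 1)
    return hits


# Assumption: a fixed random permutation of the 256 byte values plays the
# role of e_k. It is NOT the dissertation's cryptosystem and is not secure.
rng = random.Random(0)
perm = list(range(256))
rng.shuffle(perm)
enc = dict(enumerate(perm))

x = b"abracadabra abracadabra"      # secret data (Alice's side)
p = b"abra"                         # search pattern
y = bytes(enc[b] for b in x)        # ciphertext stored by Bob

delta_p = build_dfa(p, range(256))
delta_p_enc = relabel(delta_p, enc)  # trapdoor automaton sent to Bob

hits_plain = search(delta_p, len(p), x)
hits_cipher = search(delta_p_enc, len(p), y)
print(hits_plain, hits_cipher)
```

In this sketch Bob only ever sees `y` and `delta_p_enc`, and the positions he reports coincide with the occurrences of `p` in `x`. Building the automaton directly from the encrypted pattern gives the same transition table as relabelling, which mirrors the statement $Q_{p'} = Q_p$, $F_{p'} = F_p$ for this stand-in automaton.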
