String processing and information retrieval 23rd international symposium, SPIRE 2016

LNCS 9954

Shunsuke Inenaga · Kunihiko Sadakane · Tetsuya Sakai (Eds.)

String Processing and Information Retrieval
23rd International Symposium, SPIRE 2016
Beppu, Japan, October 18–20, 2016
Proceedings

Lecture Notes in Computer Science 9954
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zurich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7407
Editors
Shunsuke Inenaga, Informatics, Kyushu University, Fukuoka, Japan
Tetsuya Sakai, Computer Science and Engineering, Waseda University, Tokyo, Japan
Kunihiko Sadakane, Mathematical Informatics, University of Tokyo, Tokyo, Japan

ISSN 0302-9743, ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-46048-2, ISBN 978-3-319-46049-9 (eBook)
DOI 10.1007/978-3-319-46049-9
Library of Congress Control Number: 2016950414
LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© Springer International Publishing AG 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper.
This Springer imprint is published by Springer Nature. The registered company is Springer International Publishing AG. The registered company address is: Gewerbestrasse 11, 6330 Cham,
Switzerland

Preface

This volume contains the papers presented at SPIRE 2016, the 23rd International Symposium on String Processing and Information Retrieval, held October 18–20, 2016 in Beppu, Japan. Following the tradition from previous years, the focus of SPIRE this year was on fundamental studies on string processing and information retrieval, as well as application areas such as bioinformatics, Web mining, and so on.

The call for papers resulted in 46 submissions. Each submitted paper was reviewed by at least three Program Committee members. Based on the thorough reviews and discussions by the Program Committee members and additional subreviewers, the Program Committee decided to accept 25 papers.

The main conference featured three keynote speeches by Kunsoo Park (Seoul National University), Koji Tsuda (University of Tokyo), and David Hawking (Microsoft & Australian National University), together with presentations by authors of the 25 accepted papers. Prior to the main conference, two satellite workshops were held: String Masters in Fukuoka, held October 12–14, 2016 in Fukuoka, and the 11th Workshop on Compression, Text, and Algorithms (WCTA 2016), held on October 17, 2016 in Beppu. String Masters was coordinated by Hideo Bannai, and WCTA was coordinated by Simon J. Puglisi and Yasuo Tabei. WCTA this year featured two keynote speeches by Juha Kärkkäinen (University of Helsinki) and Yoshitaka Yamamoto (University of Yamanashi).

We would like to thank the SPIRE Steering Committee for giving us the opportunity to host this wonderful event. Also, many thanks go to the Program Committee members and the additional subreviewers, for their valuable contribution ensuring the high quality of this conference. We appreciate Springer for their professional publishing work and for sponsoring the Best Paper Award for SPIRE 2016. We finally thank the Local Organizing Team (led by Hideo Bannai) for their effort to run the event smoothly.

October 2016
Shunsuke Inenaga
Kunihiko Sadakane
Tetsuya Sakai

Organization

Program Committee

Leif Azzopardi, University of Glasgow, UK
Philip Bille, Technical University of Denmark, Denmark
Praveen Chandar, University of Delaware, USA
Raphael Clifford, University of Bristol, UK
Shane Culpepper, RMIT University, Australia
Zhicheng Dou, Renmin University of China, China
Hui Fang, University of Delaware, USA
Simone Faro, University of Catania, Italy
Johannes Fischer, TU Dortmund, Germany
Sumio Fujita, Yahoo! Japan Research, Japan
Travis Gagie, University of Helsinki, Finland
Pawel Gawrychowski, University of Wroclaw, Poland and University of Haifa, Israel
Simon Gog, Karlsruhe Institute of Technology, Germany
Roberto Grossi, Università di Pisa, Italy
Ankur Gupta, Butler University, USA
Wing-Kai Hon, National Tsing Hua University, Taiwan
Shunsuke Inenaga, Kyushu University, Japan
Makoto P. Kato, Kyoto University, Japan
Gregory Kucherov, CNRS/LIGM, France
Moshe Lewenstein, Bar Ilan University, Israel
Yiqun Liu, Tsinghua University, China
Mihai Lupu, Vienna University of Technology, Austria
Florin Manea, Christian-Albrechts-Universität zu Kiel, Germany
Gonzalo Navarro, University of Chile, Chile
Yakov Nekrich, University of Waterloo, Canada
Tadashi Nomoto, National Institute of Japanese Literature, Japan
Iadh Ounis, University of Glasgow, UK
Simon Puglisi, University of Helsinki, Finland
Kunihiko Sadakane, University of Tokyo, Japan
Tetsuya Sakai, Waseda University, Japan
Hiroshi Sakamoto, Kyushu Institute of Technology, Japan
Leena Salmela, University of Helsinki, Finland
Srinivasa Rao Satti, Seoul National University, South Korea
Ruihua Song, Microsoft Research Asia, China
Young-In Song, Wider Planet, South Korea
Kazunari Sugiyama, National University of Singapore, Singapore
Aixin Sun, Nanyang Technological University, Singapore
Wing-Kin Sung, National University of Singapore, Singapore
Julián Urbano, University Carlos III of Madrid, Spain
Sebastiano Vigna, Università degli Studi di Milano, Italy
Takehiro Yamamoto, Kyoto University, Japan
Additional Reviewers

Bingmann, Timo; Bouvel, Mathilde; Chikhi, Rayan; Cicalese, Ferdinando; Conte, Alessio; Farach-Colton, Martin; Fici, Gabriele; Fontaine, Allyx; Frith, Martin; Ganguly, Arnab; I, Tomohiro; Jo, Seungbum; Kempa, Dominik; Kosolobov, Dmitry; Lee, Joo-Young; Liu, Xitong; Mercas, Robert; Ordóñez Pereira, Alberto; Pisanti, Nadia; Rosone, Giovanna; Schmid, Markus L.; Starikovskaya, Tatiana; Thankachan, Sharma V.; Välimäki, Niko

Keynote Speeches

Indexes for Highly Similar Sequences

Kunsoo Park
Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
kpark@theory.snu.ac.kr

The 1000 Genomes Project aims at building a database of a thousand individual human genome sequences using a cheap and fast sequencing technique, called next generation sequencing, and the sequencing of 1092 genomes was announced in 2012. To sequence an individual genome using next generation sequencing, the individual genome is divided into short segments called reads, and they are aligned to the human reference genome. This is possible because an individual genome is more than 99% identical to the reference genome. This similarity also enables us to store individual genome sequences efficiently.

Recently many indexes have been developed which not only store highly similar sequences efficiently but also support efficient pattern search. To exploit the similarity of the given sequences, most of these indexes use classical compression schemes such as run-length encoding and Lempel-Ziv compression.

We introduce a new index for highly similar sequences, called the FM index of alignment. We start by finding common regions and non-common regions of highly similar sequences. We need not find a multiple alignment of non-common regions. Finding common and non-common regions is much easier and simpler than finding a multiple alignment, especially in next generation sequencing. Then we make a transformed alignment of the given sequences, where gaps in a non-common region are put together into one gap.
We define a suffix array of alignment on the transformed alignment, and the FM index of alignment is an FM index of this suffix array of alignment. The FM index of alignment supports the LF mapping and backward search, the key functionalities of the FM index. The FM index of alignment takes less space than other indexes and its pattern search is also fast.

This research was supported by the Bio & Medical Technology Development Program of the NRF funded by the Korean government, MSIP (NRF-2014M3C9A3063541).

Longest Common Abelian Factors and Large Alphabets

would also be welcome. In another direction, we may be able to use rounding techniques described by Cicalese et al. [4] to trade off accuracy for time. We are currently working on sampling techniques that we hope can be combined with rounding to yield even faster algorithms.

Finally, we note that achieving $O(n^2)$ time and $O(n)$ space is possible if we are happy with answers that are sometimes incorrect. More precisely, we can use Karp-Rabin hashing in place of Mehlhorn et al.'s data structure in our algorithm (which is effectively acting as a rolling hash function). This gives a Monte Carlo algorithm that correctly computes the LCAF with high probability, and it can be made Las Vegas fairly easily by applying techniques from [5]. We defer the details to the full version of this paper.

References

1. Alatabbi, A., Iliopoulos, C.S., Langiu, A., Rahman, M.S.: Algorithms for longest common abelian factors. arXiv:1503.00049 (2015)
2. Amir, A., Chan, T.M., Lewenstein, M., Lewenstein, N.: On hardness of jumbled indexing. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) ICALP 2014. LNCS, vol. 8572, pp. 114–125. Springer, Heidelberg (2014)
3. Apostolico, A., Crochemore, M., Farach-Colton, M., Galil, Z., Muthukrishnan, S.: 40 years of suffix trees. Commun. ACM 59(4), 66–73 (2016)
4. Cicalese, F., Gagie, T., Giaquinta, E., Laber, E.S., Lipták, Z., Rizzi, R., Tomescu, A.I.: Indexes for jumbled pattern matching in strings, trees and graphs. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 56–63. Springer, Heidelberg (2013)
5. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Pardo, A., Viola, A. (eds.) LATIN 2014. LNCS, vol. 8392, pp. 731–742. Springer, Heidelberg (2014)
6. Hui, L.C.K.: Color set size problem with applications to string matching. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds.) CPM 1992. LNCS, vol. 644, pp. 230–243. Springer, Heidelberg (1992). doi:10.1007/3-540-56024-6_19
7. Kociumaka, T., Starikovskaya, T., Vildhøj, H.W.: Sublinear space algorithms for the longest common substring problem. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737, pp. 605–617. Springer, Heidelberg (2014)
8. Mehlhorn, K., Sundar, R., Uhrig, C.: Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17(2), 183–198 (1997)

Pattern Matching for Separable Permutations

Both Emerite Neou¹ (corresponding author), Romeo Rizzi², and Stéphane Vialette¹

¹ Université Paris-Est, LIGM (UMR 8049), CNRS, UPEM, ESIEE Paris, ENPC, 77454 Marne-la-Vallée, France
{neou,vialette}@univ-mlv.fr
² Department of Computer Science, Università degli Studi di Verona, Verona, Italy
romeo.rizzi@univr.it

Abstract. Given a permutation π (called the text) of size n and another permutation σ (called the pattern) of size k, the NP-complete permutation pattern matching problem asks whether σ occurs in π as an order-isomorphic subsequence. In this paper, we focus on separable permutations (those permutations that avoid both 2413 and 3142, or, equivalently, that admit a separating tree). The main contributions presented in this paper are as follows.
– We simplify the algorithm of Ibarra (Finding pattern matchings for permutations, Information Processing Letters 61 (1997), no. 6) to detect an occurrence of a separable permutation in a permutation and show how to reduce the space complexity from $O(n^3 k)$ to $O(n^3 \log k)$.
– In case both the text and the pattern are separable permutations, we give a more practicable $O(n^2 k)$ time and $O(nk)$ space algorithm. Furthermore, we show how to use this approach to decide in $O(nk^3\ell^2)$ time whether a separable permutation of size n is a disjoint union of two given permutations of size k and $\ell$.
– Given a permutation of size n and a separable permutation of size k, we propose an $O(n^6 k)$ time and $O(n^4 \log k)$ space algorithm to compute the largest common separable permutation that occurs in the two input permutations. This improves upon the existing $O(n^8)$ time algorithm by Rossin and Bouvel (The longest common pattern problem for two permutations, Pure Mathematics and Applications 17 (2006)).
– Finally, we give an $O(n^6 k)$
time and space algorithm to detect an occurrence of a bivincular separable permutation in a permutation. (Bivincular patterns generalize classical permutations by requiring that positions and values involved in an occurrence may be forced to be adjacent.)

© Springer International Publishing AG 2016. S. Inenaga et al. (Eds.): SPIRE 2016, LNCS 9954, pp. 260–272, 2016. DOI: 10.1007/978-3-319-46049-9_25

1 Introduction

A permutation σ is said to occur in π, in symbols $\sigma \preceq \pi$, if there exists a subsequence of entries of π that has the same relative order as σ, and in this case σ is said to be a pattern of π. Otherwise, π is said to avoid the permutation σ. For example, the permutation π = 391867452 contains the pattern σ = 51342, as can be seen in the subsequence 91675 of π (or 91674, or 91672). However, since the permutation π = 391867452 contains no increasing subsequence of length four, π avoids 1234. During the last decade, the study of permutation pattern matching has become a very active area of research: an annual conference (Permutation Patterns) is devoted to the subject of patterns in permutations, and a database¹ of permutation pattern avoidance is maintained by Bridget Tenner.

We consider here the so-called permutation pattern matching problem (also sometimes referred to as the pattern involvement or pattern containment problem): given two permutations σ and π, this problem is to decide whether $\sigma \preceq \pi$ (the problem is ascribed to Wilf in [5]). The permutation pattern matching problem is NP-hard [5]. It is, however, polynomial time solvable by brute-force enumeration if σ has bounded size. Improvements to this algorithm were presented in [1,2], the latter describing a nice $O(|\pi|^{0.47k+o(|\sigma|)})$ time algorithm. Bruner and Lackner [8] gave a fixed-parameter algorithm solving the permutation pattern matching problem with an exponential worst-case runtime of $O(1.79^{\mathrm{run}(\pi)})$, where run(π)
denotes the number of alternating runs of π. (This is an improvement upon the $O(k\,n^k)$ runtime required by brute-force search without imposing restrictions on σ and π.) Of particular importance, it has been proved in [11] that the permutation pattern matching problem is fixed-parameter tractable for parameter k.

A few particular cases of the permutation pattern matching problem have been attacked successfully. Of particular interest in our context, the permutation pattern matching problem is solvable in polynomial time for separable patterns. Separable permutations are those permutations in which the patterns 2413 and 3142 do not occur. The permutation pattern matching problem is solvable in $O(kn^4)$ time and $O(kn^3)$ space for separable patterns [12] (see also [5]), where k is the size of the pattern and n is the size of the text. Notice that there are numerous characterizations of separable permutations. To mention just a few examples, they are the permutations whose permutation graphs are cographs (i.e. $P_4$-free graphs); equivalently, a separable permutation is a permutation that can be obtained from the trivial permutation by direct sums and skew sums [16]. While the term separable permutation dates only to the work of Bose, Buss, and Lubiw [5], these permutations first arose in Avis and Newborn's work on pop stacks [3].

There exist many generalisations of patterns that are worth considering in the context of algorithmic issues in pattern involvement (see [14] for an up-to-date survey). Vincular patterns, also called generalized patterns, resemble (classical) patterns, with the constraint that some of the letters in an occurrence must be consecutive. Of particular importance in our context, Bruner and Lackner [8] proved that deciding whether a vincular pattern σ of size k occurs in a permutation π of size n is W[1]-complete for parameter k. Bivincular patterns generalize classical patterns even further than vincular patterns. Indeed, in bivincular patterns, not only positions but
also values of elements involved in an occurrence may be forced to be adjacent.

¹ http://math.depaul.edu/bridget/patterns.html

The paper is organized as follows. Section 2 is devoted to presenting the needed material. In Sect. 3, we revisit the polynomial-time algorithm of Ibarra [12] and we propose a simpler dynamic programming approach, and in Sect. 4 we focus on the case where both the pattern and the target permutation are separable. Section 5 is devoted to presenting related problems. Subsection 5.1 is concerned with presenting an algorithm to test whether a separable permutation is the disjoint union of two given (necessarily separable) permutations. In Subsect. 5.2, we revisit the classical problem of computing a longest common separable pattern as introduced by Rossin and Bouvel [15] and propose a slightly faster, yet still not practicable, algorithm. Finally, in Subsect. 5.3, we prove that the pattern matching problem is polynomial-time solvable for vincular separable patterns. To the best of our knowledge, this is the first time the pattern matching problem is proved to be tractable for a generalization of separable patterns. Due to space constraints, most proofs are omitted and deferred to the full version of this paper.

2 Definitions

A permutation of size n is a one-to-one function from an n-element set to itself. We write permutations as words $\pi = \pi_1 \pi_2 \ldots \pi_n$, whose letters are distinct and usually consist of the integers 1, 2, …, n. We designate its i-th element by π[i], and for any i, j ∈ [n] with i ≤ j, we let π[i : j] stand for the sequence $\pi_i \pi_{i+1} \ldots \pi_j$. We let $S_n$ denote the set of all permutations of size n. We shall also represent a permutation π by its plot, consisting of the set of points at coordinates (i, π[i]) drawn in the plane. According to this representation, we say that an element π[i] is on the left (resp. right) of another element π[j] if i < j (resp. i > j). Furthermore, we say that an element π[i] is above (resp. below) another element π[j] if π[j] <
π[i] (resp. π[i] < π[j]).

The reduced form of a permutation π on a set $\{j_1, j_2, \ldots, j_k\}$, where $j_1 < j_2 < \cdots < j_k$, is the permutation π′ obtained by renaming the letters of π so that $j_i$ is renamed i for all 1 ≤ i ≤ k. We let red(π) denote the reduced form of π. For example, red(5826) = 2413. If red(u) = red(w), we say that u and w are order-isomorphic. A permutation $\sigma \in S_k$ is said to occur within a permutation $\pi \in S_n$ if there is some k-tuple $1 \le i_1 < i_2 < \cdots < i_k \le n$ such that $\mathrm{red}(\pi_{i_1} \pi_{i_2} \ldots \pi_{i_k}) = \sigma$ (i.e., π has a subsequence of size k that is order-isomorphic to σ). The subsequence $\pi_{i_1} \pi_{i_2} \ldots \pi_{i_k}$ is called an occurrence of σ in π. If σ does not occur in π, then π is said to avoid σ.

For two permutations $\pi_1$ of size $n_1$ and $\pi_2$ of size $n_2$, the direct sum of $\pi_1$ and $\pi_2$ is defined by

$$\pi_1 \oplus \pi_2 = \pi_1[1]\,\pi_1[2] \ldots \pi_1[n_1]\,(\pi_2[1]+n_1)\,(\pi_2[2]+n_1) \ldots (\pi_2[n_2]+n_1) \quad [16].$$

The direct sum operation reduces to putting the elements of $\pi_2$ to the right of and above the elements of $\pi_1$. See Fig. 1 for an example of a direct sum. Similarly, we define the skew sum of $\pi_1$ and $\pi_2$ by

$$\pi_1 \ominus \pi_2 = (\pi_1[1]+n_2)\,(\pi_1[2]+n_2) \ldots (\pi_1[n_1]+n_2)\,\pi_2[1]\,\pi_2[2] \ldots \pi_2[n_2] \quad [16].$$

The skew sum operation reduces to putting the elements of $\pi_1$ to the left of and above the elements of $\pi_2$. See Fig. 2 for an example of a skew sum.

Fig. 1. 312 ⊕ 3214 = 3126547

Fig. 2. 312 ⊖ 3214 = 7563214

Separable permutations may be characterized by the forbidden permutation patterns 2413 and 3142. Equivalently, Bose, Buss, and Lubiw [5] define a separable permutation to be a permutation that has a separating tree (note that there may be more than one tree for a given permutation): a rooted binary tree in which the elements of the permutation appear (in permutation order) at the leaves of the tree, and in which the descendants of each tree node form a contiguous subset of these elements. Each interior node of the tree is either a positive node, in which all descendants of the left child are smaller than all descendants of the right node, or a negative node
in which all descendants of the left node are greater than all descendants of the right node. See Fig. 3 for an illustration. Let $\sigma \in S_k$ be a separable permutation, and $T_\sigma$ be the corresponding separating tree. For every node v of $T_\sigma$, we let σ(v) stand for the sequence of elements of σ stored at the leaves of the subtree rooted at v.

Equivalently, a permutation is separable if and only if it is the permutation with a single element or it can be written as a direct sum or skew sum of two smaller separable permutations. The tree representation and the decomposition with direct sum or skew sum are strongly related: if $\sigma = \sigma_1 \oplus \sigma_2$ (resp. $\sigma = \sigma_1 \ominus \sigma_2$) then there exists a separating tree with a positive (resp. negative) root whose left child is the separating tree of $\sigma_1$ and whose right child is the separating tree of $\sigma_2$.

Fig. 3. On the left, a separating tree $T_\pi$ for the permutation π = 342561 together with the corresponding σ(v) sequences and, on the right, the decomposition of the root of this tree and of its left child: 342561 = red(34256) ⊖ red(1) = (red(342) ⊕ red(56)) ⊖ red(1). (Color figure online)

An occurrence of a bivincular permutation pattern σ = (σ, X, Y) in π is an occurrence of σ in π such that if $(e_1, e_2) \in X$ then the elements matching $e_1$ and $e_2$ must be consecutive in index, and if $(e_1, e_2) \in Y$ then the elements matching $e_1$ and $e_2$ must be consecutive in value. Moreover, if $(e_1, e_2) \in X$ (resp. $(e_1, e_2) \in Y$) and $e_1 \notin \sigma$ then $e_2$ must be matched to the leftmost (resp. bottommost) element of π, and if $(e_1, e_2) \in X$ (resp. $(e_1, e_2) \in Y$) and $e_2 \notin \sigma$ then $e_1$ must be matched to the rightmost (resp. topmost) element of π. Note that we only consider "realisable" bivincular permutation patterns, which means that σ occurs in σ (by adding elements in X or Y this may not be the case, such as (0, 2) ∈ Y), and "clean" bivincular permutation patterns, which means that there is no redundancy in the elements of X and Y. For example, given σ = (2143, {(0, 2), (4, 3)}, {(1, 4), (4, 5)})
3217845 is an occurrence of σ but 3217845 is not is not the leftmost element. Note that this definition differs from Definition 1.4.1 in [14], but it is more suited for our algorithm.

3 Improved Algorithm to Detect a Separable Pattern

Let $\pi \in S_n$ and $\sigma \in S_k$, and assume that σ is a separable permutation. Ibarra [12] gave a nice $O(kn^4)$ time and $O(kn^3)$ space algorithm to detect an occurrence of σ in π. We revisit the approach of Ibarra and propose a simpler algorithm. Since σ is a separable permutation, we can assume that we are given in addition a separating tree $T_\sigma$ for σ (constructing a separating tree of a separable permutation takes linear time and space [5]).

Let S be a sequence of elements in [n] with no repetitions. An occurrence of a node v of $T_\sigma$ into S is an occurrence of red(σ(v)) into red(S). The bottom point ↓(s) of an occurrence s of σ(v) into S is the minimum value of the sequence s. Similarly, the upmost point ↑(s) is the maximum value of s. In the following, since all numbers in [n] are positive, we adopt the convention that the maximum value occurring in an empty subset of [n] is 0.

We consider the following family of subproblems, first introduced by Ibarra [12]: for every node v of $T_\sigma$, every two i, j ∈ [n] with i ≤ j, and every upper bound ub ∈ [n], we have the subproblem $\widehat{\downarrow}_{v,i,j}[ub]$, where the semantics is the following:

$$\widehat{\downarrow}_{v,i,j}[ub] \overset{\Delta}{=} \max\{\downarrow(s) : s \text{ is an occurrence of } \sigma(v) \text{ into } \pi[i:j] \text{ with } \uparrow(s) \le ub\}.$$

We first observe that this family of problems is already closed under induction (we do not need to introduce the family H as in [12]). These subproblems can be solved by the following equations:
– Base: If v is a leaf of $T_\sigma$ then
  $\widehat{\downarrow}_{v,i,j}[ub] := \max\{\pi[\iota] : \pi[\iota] \le ub,\ i \le \iota \le j\}$.
– Step: Let $v_L$ and $v_R$ be the left and right children of v.
  • If v is a positive node of $T_\sigma$ (i.e., all elements in the interval associated to $v_R$ are larger than all elements in the interval associated to $v_L$), then
    $\widehat{\downarrow}_{v,i,j}[ub] := \max\{\widehat{\downarrow}_{v_L,i,\iota-1}[\widehat{\downarrow}_{v_R,\iota,j}[ub]] : i < \iota \le j\}$.
  • If v is a negative node of $T_\sigma$ (i.e., all elements in the interval associated to $v_R$ are smaller than all elements in the interval associated to $v_L$), then
    $\widehat{\downarrow}_{v,i,j}[ub] := \max\{\widehat{\downarrow}_{v_R,\iota,j}[\widehat{\downarrow}_{v_L,i,\iota-1}[ub]] : i < \iota \le j\}$.

These relations imply an $O(kn^4)$ time and $O(kn^3)$ space algorithm for detecting an occurrence of a separable permutation of size k in a permutation of size n, as obtained by Ibarra in [12], only simplified.

Proposition. One can reduce the memory consumption of the algorithm above to $O(n^3 \log k)$.

Proof. Observe first that for computing all the entries $\widehat{\downarrow}_{v,\cdot,\cdot}[\cdot]$ for a certain node v with left and right children $v_L$ and $v_R$, we only need the entries $\widehat{\downarrow}_{v_L,\cdot,\cdot}[\cdot]$ and $\widehat{\downarrow}_{v_R,\cdot,\cdot}[\cdot]$. The main idea for achieving the memory saving is the following.
– All problems for the same node v are solved together and their solutions are kept in memory until the problems for the parent of v have also been solved. At that point the memory used for node v is released.
– We use a modified DFS traversal on $T_\sigma$: for every node v which has two children, we first process its largest child (in terms of the number of nodes in the subtree rooted at that child), then the other child, and finally v itself.

We claim that the above procedure yields an $O(n^3 \log k)$ space algorithm. We first expand our DFS algorithm to what is known as the White-Gray-Black DFS [9]. First, we colour all vertices white. When we call DFS(u), we colour u gray. Finally, when DFS(u) returns, we colour u black. Thanks to this colour scheme, at each step of the modified DFS, we may partition $T_\sigma$ into a white-gray subtree (all nodes are either white or gray) and a forest of maximal black subtrees (all nodes are black and the parent of the root, if it exists, is either white or gray). Our space complexity claim reduces to proving that, at every step of the algorithm, the forest contains O(log k) maximal black subtrees. Let hσ be the height of Tσ , and
consider any partition of $T_\sigma$ into a white-gray subtree and a non-empty forest $T^b$ of maximal black subtrees. The following property easily follows from the (standard) DFS colour scheme.

Property. For every height i up to $h_\sigma$, there exist at most two maximal black subtrees in $T^b$ whose roots are at height i in $T_\sigma$. Furthermore, if there are two maximal black subtrees in $T^b$ whose roots are at height i in $T_\sigma$ (they must have the same parent), then $T^b$ contains no maximal black subtree whose root is at height j > i in $T_\sigma$.

According to this property, and aiming at maximising $|T^b|$, we may focus on the case where $T^b$ contains one maximal black subtree rooted at each height i below $h_\sigma$, starting from the height of the children of the root (if $T^b$ contains a maximal black subtree rooted at the root of $T_\sigma$ then $|T^b| = 1$), and $T^b$ contains two maximal black subtrees whose roots are at height $h_\sigma$ in $T_\sigma$ (these two maximal black subtrees reduce to size-1 subtrees). The claimed space complexity for the dynamic programming algorithm (i.e., $|T^b| = O(\log k)$) now follows from the fact that we are using a modified DFS algorithm where we branch on the largest subtree first after having marked a vertex gray. Indeed, the maximal black subtree rooted at a child of the root of $T_\sigma$ contains at least half of the nodes of $T_\sigma$, and the same argument applies for subsequent maximal black subtrees in the forest $T^b$.

4 Both π and σ Are Separable Permutations

When both π and σ are separable permutations we can strive for more efficient solutions, since we can construct in linear time the two separating trees $T_\pi$ and $T_\sigma$. It turns out, however, that the standard (i.e. binary) separating trees are not well-suited to handle this task. We use here the notion of compact separating tree (also known as decomposition tree [16]). Informally, in a compact separating tree, we strive for every node to have as many children as possible (so that the compact separating tree of the identity permutation has only the root as its, positive, internal node). A simple linear time
post-processing can be used to produce the decomposition tree out of the binary separating tree. We will adopt the convention that a compact separating tree of a separating tree $T_\pi$ is denoted $\tilde{T}_\pi$. The compact tree can be understood with direct/skew sums as the largest decomposition in direct/skew sums: if $\pi = \pi_1 \oplus \cdots \oplus \pi_\ell$ then the (unique) compact separating tree of π is the tree with a positive root and with the compact separating tree of $\pi_1$ as first child, the compact separating tree of $\pi_i$ as i-th child and the compact separating tree of $\pi_\ell$ as $\ell$-th child. See Figs. 4 and 5 for examples. Note that when π is decomposed into direct (resp. skew) sums it forms a stair up (resp. down) of rectangles.

Fig. 4. A separating tree $T_\pi$ for the permutation π = 453126987, the corresponding compact separating tree $\tilde{T}_\pi$ and the decomposition 453126987 = red(45312) ⊕ red(6) ⊕ red(987) = (red(45) ⊖ red(3) ⊖ red(12)) ⊕ red(6) ⊕ (red(9) ⊖ red(8) ⊖ red(7)). (Color figure online)

Now, recall that the tree inclusion problem for ordered and labeled trees is defined as follows: given two ordered and labeled trees T and T′, can T be obtained from T′ by deleting nodes? (Deleting a node v entails removing all edges incident to v and, if v has a parent u, replacing the edge from u to v by edges from u to the children of v; see Fig. 6.)
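The deletion-based definition of tree inclusion can be checked directly on small instances. The following Python sketch is a naive memoised recursion on forests, not the algorithm of [13] or [4], and it makes no attempt at efficiency. It assumes that a tree is encoded as a nested tuple `(label, children)`; the names `included`, `_forest_included` and `leaf` are hypothetical helpers introduced here for illustration only.

```python
from functools import lru_cache

# Assumed encoding: a tree is a pair (label, children), where children
# is a tuple of trees; a forest is a tuple of trees (left to right).


def leaf(label):
    """Build a tree made of a single node."""
    return (label, ())


@lru_cache(maxsize=None)
def _forest_included(pattern, target):
    """True if the ordered forest `pattern` can be obtained from the
    ordered forest `target` by deleting nodes."""
    if not pattern:
        return True   # delete every remaining node of the target
    if not target:
        return False  # nothing left to build the pattern from
    t_label, t_children = target[-1]
    # Case 1: delete the root of the rightmost target tree; its
    # children take its place, in order, in the forest.
    if _forest_included(pattern, target[:-1] + t_children):
        return True
    # Case 2: keep that root; it must then correspond to the root of
    # the rightmost pattern tree, and the two remainders must match.
    p_label, p_children = pattern[-1]
    return (p_label == t_label
            and _forest_included(p_children, t_children)
            and _forest_included(pattern[:-1], target[:-1]))


def included(pattern_tree, target_tree):
    """True if `pattern_tree` is included in `target_tree`."""
    return _forest_included((pattern_tree,), (target_tree,))


target = ("a", (("b", (leaf("c"), leaf("d"))), leaf("e")))
print(included(("a", (leaf("c"), leaf("e"))), target))  # delete b and d
print(included(("a", (leaf("e"), leaf("c"))), target))  # wrong left-to-right order
```

In this usage example the first call prints `True` (deleting the nodes labelled b and d leaves the pattern) and the second prints `False`, since deletions preserve the left-to-right order of the remaining nodes.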
This problem has been recognized as an important query primitive in XML databases The rationale for considering compact separating trees stems from the following property Property Let π and σ be two separable permutations We have σ π if and only if the compact separating tree T˜σ is included into the compact separating tree T Kilpelă ainen and Manilla [13] presented the first polynomial time algorithm using quadratic time and space for the tree inclusion problem Since then, several improved results have been obtained for special cases when T and T have a small number of leaves or small depth However, in the worst case these algorithms still use quadratic time and space The best algorithm is by Bille and Gørtz [4] who gave an O(nT ) space and ⎧ ⎫⎞ ⎛ ⎨ lT nT ⎬ O ⎝min lT lT log log nT + nT ⎠ ⎩ nT nT + n log n ⎭ T T log nT 268 B.E Neou et al Fig On the left the permutation π = π1 ⊕ ⊕ πi ⊕ ⊕ π and on the right its corresponding compact separating tree Fig The effect of removing a node from a tree time algorithm, where nT (resp nT ) denotes the number of node of T (resp T ) and lT (resp lT ) denotes the number of leaves of T (resp T ) However, all efficient solutions developed so far for the tree inclusion problem result in very complicated and hard-to-implement algorithms For example, the main idea in the efficient algorithm presented in [4] is to construct a data structure on T supporting a small number of procedures, called the set procedures, on subsets of nodes of T We propose a dynamic programming based approach for solving this problem Proposition There exits an O(n2 k) time and O(nk) space algorithm to find an occurrence of a separable pattern of size k in a separable permutation of size n Related Problems Some related problems (deciding the union of a separable permutation, finding a maximum size separable subpermutation and pattern matching issues for bivincular separable patterns) are gathered in this section All algorithms rely on dynamic programming 5.1 
Deciding the Union of Separable Permutations

This subsection is devoted to shuffling permutations. Given three permutations π, σ and τ, the problem is to decide whether π is the disjoint union of two patterns that are order-isomorphic to σ and τ, respectively. For example, 937654812 is the disjoint union of two subsequences that are order-isomorphic to 2431 and 53241: the subsequence 3742 is order-isomorphic to 2431, and the remaining elements 96581 are order-isomorphic to 53241. This problem is of interest since it is strongly related to two other combinatorial problems that naturally arise in the context of patterns in permutations. The first is to decide whether the permutation pattern matching problem for parameter n − k is fixed-parameter tractable (FPT). (Recall that the permutation pattern matching problem for parameter k is fixed-parameter tractable [11].) The second is to decide whether a permutation is a square: given a permutation π, does there exist a permutation σ such that π is the disjoint union of two subsequences that are both order-isomorphic to σ?
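To make the disjoint-union question concrete, the following sketch checks it by exhaustive search over all ways of choosing the positions assigned to σ. This is a brute-force illustration that takes exponential time, not the dynamic programming algorithm of this section, and the names `pattern_of` and `find_disjoint_union` are hypothetical.

```python
from itertools import combinations


def pattern_of(seq):
    """Return the permutation of 1..len(seq) that is order-isomorphic to seq."""
    rank = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(rank[v] for v in seq)


def find_disjoint_union(pi, sigma, tau):
    """Return (left, right), two disjoint subsequences of pi that together
    cover pi and are order-isomorphic to sigma and tau, or None if no such
    split exists. Tries every choice of len(sigma) positions."""
    n, k = len(pi), len(sigma)
    if n != k + len(tau):
        return None
    sigma, tau = tuple(sigma), tuple(tau)
    for chosen in combinations(range(n), k):
        chosen_set = set(chosen)
        left = [pi[i] for i in chosen]
        right = [pi[i] for i in range(n) if i not in chosen_set]
        if pattern_of(left) == sigma and pattern_of(right) == tau:
            return left, right
    return None
```

Calling `find_disjoint_union([9, 3, 7, 6, 5, 4, 8, 1, 2], [2, 4, 3, 1], [5, 3, 2, 4, 1])` finds a valid split for the example above. Passing the same pattern twice, as in `find_disjoint_union(pi, sigma, sigma)`, tests whether π is a square with respect to one given σ.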
The square recognition problem has recently been proved to be NP-complete [10] for general permutations.

Proposition 3. Given three separable permutations π of size n, σ of size k and τ of size ℓ, there exists an O(nk³ℓ²) time and O(nk²ℓ²) space algorithm to decide whether π is the disjoint union of two patterns that are order-isomorphic to σ and τ, respectively.

Note that the complexity of the problem is still open if we do not restrict the input permutations to be separable [10].

5.2 Finding a Maximum Size Separable Subpermutation

The longest common pattern problem for permutations is, given a set of permutations, to find the largest permutation that occurs in each input permutation. The problem is intended to be the natural counterpart to the classical longest common subsequence problem. Rossin and Bouvel [15] gave an O(n⁸) time algorithm for computing the largest common separable permutation that occurs in two permutations of size (at most) n, one of these two permutations being separable. This problem was further generalised in [6], where it is shown that the problem of computing the largest separable permutation that occurs in k permutations of size (at most) n is solvable in O(n^{6k+1}) time and O(n^{4k+1}) space. Notice that this latter problem is NP-complete for unbounded k, even if all input permutations are actually separable. The following proposition improves upon the algorithm of Rossin and Bouvel [15].

Proposition 4. Given a permutation of size n and a separable permutation of size k, one can compute in O(n⁶k) time and O(n⁴ log k) space the largest common separable permutation that occurs in the two input permutations.

5.3 Vincular and Bivincular Separable Patterns

We prove here that detecting a vincular or a bivincular separable pattern in a permutation is solvable in polynomial time. Since a vincular pattern is a special case of a bivincular pattern (when Y = ∅), we focus on bivincular patterns. Note that the pattern matching algorithm presented earlier cannot be used to find an occurrence of a bivincular pattern as
we do not have any control over the positions and the values of the matched elements.

Let σ be a separable bivincular pattern (this is a shortcut for σ being separable) of size k and let π be a permutation of size n. We can represent bivincular patterns (as well as occurrences of bivincular patterns in permutations) by their plots. Such a plot consists of the set of points at coordinates (i, σ[i]) drawn in the plane, together with forbidden regions denoting adjacency constraints (similarly to what is done with mesh patterns, see [7]). A vertical forbidden region between two points denotes the fact that the occurrences of these two points must be consecutive in position. Similarly, a horizontal forbidden region between two points denotes the fact that the occurrences of these two points must be consecutive in value. Now, given a permutation π and a bivincular pattern σ, the pattern σ occurs in π if there exists a set of points in the plot of π that is order-isomorphic to σ and such that the forbidden regions do not contain any point (see Fig. 7).

Fig. 7. From left to right: the bivincular pattern σ = (2143, {(0, 2), (4, 3)}, {(1, 4), (4, 5)}), an occurrence of σ in 3216745, and an occurrence of the underlying permutation 2143 in 3216745 that is not an occurrence of σ, because the points (1, 3) and (5, 7) lie in the forbidden areas.

Proposition 5. Given a permutation π of size n and a bivincular separable pattern σ of size k, there exists an O(n⁶k) time and space algorithm to decide whether σ occurs in π.

Before explaining the main idea of the algorithm, we need the notion of a rectangle in a permutation. Given a permutation π, a rectangle R with bottom-left corner (i, lb) and top-right corner (j, ub) is the pattern π[i : j] from which all entries greater than ub or smaller than lb are removed. We say that a rectangle R contains an occurrence of σ if and only if there exists a subsequence of R which is order-isomorphic to σ. The following lemma is the key element for proving Proposition 5.

Lemma. Let σ = σL ⊕ σR (resp.
σ = σL ⊖ σR). Then σ occurs in π if and only if there exist rectangles RL and RR in π such that RL is left below RR (resp. RL is left above RR), RL contains an occurrence of σL, and RR contains an occurrence of σR.

Given a positive (resp. negative) node v of σ with left child vL and right child vR, and a rectangle R of π, deciding whether σ(v) occurs in R reduces to deciding whether there exists a split of the rectangle R into two rectangles RL and RR such that RL is left below RR (resp. RL is left above RR), RL contains an occurrence of σ(vL), and RR contains an occurrence of σ(vR). This recursive algorithm solves the permutation pattern matching problem, but not the bivincular case, as we have no control over the values and the positions of the elements in the occurrence.

Notice now that, given two rectangles that are consecutive horizontally, say R1 = ((∗, ∗), (j, ∗)) and R2 = ((j + 1, ∗), (∗, ∗)), if the rightmost element of the occurrence in R1 is on the right edge of R1 and the leftmost element of the occurrence in R2 is on the left edge of R2, then those two elements are consecutive in position. In the same way, given two rectangles that are consecutive vertically, say R1 = ((∗, ∗), (∗, ub)) and R2 = ((∗, ub + 1), (∗, ∗)), if the topmost element of the occurrence in R1 is on the top edge of R1 and the bottommost element of the occurrence in R2 is on the bottom edge of R2, then those two elements are consecutive in value.

The proposed algorithm implements the above idea to ensure that two elements are consecutive in position or in value in the sought occurrence: it splits the rectangle R into RL and RR such that RL and RR are always consecutive horizontally and vertically. If v is a positive node, then R is split into RL = ((∗, ∗), (j, ub)) and RR = ((j + 1, ub + 1), (∗, ∗)); otherwise (if v is a negative node), R is split into RL = ((∗, lb), (j, ∗)) and RR = ((j + 1, ∗), (∗, lb − 1)).

Acknowledgments. We thank the
anonymous reviewers whose comments and suggestions helped improve and clarify this manuscript.

References

1. Ahal, S., Rabinovich, Y.: On complexity of the subpattern problem. SIAM J. Discrete Math. 22(2), 629–649 (2008)
2. Albert, M.H., Aldred, R.E.L., Atkinson, M.D., Holton, D.A.: Algorithms for pattern involvement in permutations. In: Eades, P., Takaoka, T. (eds.) ISAAC 2001. LNCS, vol. 2223, pp. 355–366. Springer, Heidelberg (2001)
3. Avis, D., Newborn, M.: On pop-stacks in series. Utilitas Math. 19, 129–140 (1981)
4. Bille, P., Gørtz, I.L.: The tree inclusion problem: in linear space and faster. ACM Trans. Algorithms 7(3), 38 (2011)
5. Bose, P., Buss, J.F., Lubiw, A.: Pattern matching for permutations. Inf. Process. Lett. 65(5), 277–283 (1998)
6. Bouvel, M., Rossin, D., Vialette, S.: Longest common separable pattern among permutations. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 316–327. Springer, Heidelberg (2007)
7. Brändén, P., Claesson, A.: Mesh patterns and the expansion of permutation statistics as sums of permutation patterns. ArXiv e-prints (2011)
8. Bruner, M.-L., Lackner, M.: A fast algorithm for permutation pattern matching based on alternating runs. In: Fomin, F.V., Kaski, P. (eds.) SWAT 2012. LNCS, vol. 7357, pp. 261–270. Springer, Heidelberg (2012)
9. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)
10. Giraudo, S., Vialette, S.: Unshuffling permutations. In: Kranakis, E., et al. (eds.) LATIN 2016. LNCS, vol. 9644, pp. 509–521. Springer, Heidelberg (2016). doi:10.1007/978-3-662-49529-2_38
11. Guillemot, S., Marx, D.: Finding small patterns in permutations in linear time. In: Chekuri, C. (ed.)
Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM 2014, Portland, Oregon, USA, pp. 82–101 (2014)
12. Ibarra, L.: Finding pattern matchings for permutations. Inf. Process. Lett. 61(6), 293–295 (1997)
13. Kilpeläinen, P., Mannila, H.: Ordered and unordered tree inclusion. SIAM J. Comput. 24(2), 340–356 (1995)
14. Kitaev, S.: Patterns in Permutations and Words. Springer, Heidelberg (2013)
15. Rossin, D., Bouvel, M.: The longest common pattern problem for two permutations. Pure Math. Appl. 17, 55–69 (2006)
16. Vatter, V.: Permutation classes. In: Bóna, M. (ed.) Handbook of Enumerative Combinatorics, pp. 753–818. Chapman and Hall/CRC (2015)
