Formal Methods

John Fitzgerald, Constance Heitmeyer, Stefania Gnesi, Anna Philippou (Eds.)

LNCS 9995

FM 2016: Formal Methods
21st International Symposium
Limassol, Cyprus, November 9–11, 2016
Proceedings

Lecture Notes in Computer Science 9995
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, UK
Josef Kittler, UK
Friedemann Mattern, Switzerland
Moni Naor, Israel
Bernhard Steffen, Germany
Doug Tygar, USA
Takeo Kanade, USA
Jon M. Kleinberg, USA
John C. Mitchell, USA
C. Pandu Rangan, India
Demetri Terzopoulos, USA
Gerhard Weikum, Germany

Formal Methods
Subline of Lecture Notes in Computer Science

Subline Series Editors
Ana Cavalcanti, University of York, UK
Marie-Claude Gaudel, Université de Paris-Sud, France

Subline Advisory Board
Manfred Broy, TU Munich, Germany
Annabelle McIver, Macquarie University, Sydney, NSW, Australia
Peter Müller, ETH Zurich, Switzerland
Erik de Vink, Eindhoven University of Technology, The Netherlands
Pamela Zave, AT&T Laboratories Research, Bedminster, NJ, USA

More information about this series at http://www.springer.com/series/7408

Editors
John Fitzgerald, Newcastle University, Newcastle upon Tyne, UK
Stefania Gnesi, ISTI-CNR, Pisa, Italy
Constance Heitmeyer, US Naval Research Laboratory, Washington, DC, USA
Anna Philippou, University of Cyprus, Nicosia, Cyprus

ISSN 0302-9743, ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-48988-9, ISBN 978-3-319-48989-6 (eBook)
DOI 10.1007/978-3-319-48989-6
Library of Congress Control Number: 2016956000
LNCS Sublibrary: SL2 – Programming and Software Engineering

© Springer International Publishing AG 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature. The registered company is Springer International Publishing AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Over nearly three
decades since its foundation in 1987, the “FM” Symposium has become a central part of the intellectual and social life of the Formal Methods community. We are therefore delighted to present the proceedings of FM 2016, the 21st symposium in the series, held in Limassol, Cyprus, during November 9–11, 2016.

Throughout these years, Springer has supported the symposium through its Lecture Notes in Computer Science (LNCS) series. It is therefore with particular pleasure that we present this year’s proceedings as the first volume in the new LNCS subline on Formal Methods. The creation of this subline reflects the maturity and growing significance of the discipline.

The 2016 symposium received 162 submissions to the main track – the largest number of contributions to a regular symposium in the FM series to date. Review of each submission by at least three Program Committee members, followed by a discussion phase, led to the selection of 43 papers – an acceptance rate of 0.265. These proceedings also contain six papers selected by the Program Committee of the Industry Track chaired by Georgia Kapitsaki (University of Cyprus), Tiziana Margaria (University of Limerick and Lero, Ireland), and Marcel Verhoef (European Space Agency, The Netherlands).

We were honored that three of the most creative and respected members of our community – Manfred Broy (Technical University of Munich), Peter O’Hearn (University College London, and Facebook), and Jan Peleska (University of Bremen and Verified Software International) – accepted our invitation to give keynote presentations at the symposium.

Also scheduled during FM 2016 were four workshops selected by the Workshop Chairs, Nearchos Paspallis (University of Central Lancashire in Cyprus) and Martin Steffen (University of Oslo); eight tutorials selected by the Tutorial Chairs, Dimitrios Kouzapas (Glasgow University) and Oleg Sokolsky (University of Pennsylvania); and eight papers to be presented at a Doctoral Symposium organized by Andrew
Butterfield (Trinity College Dublin) and Matteo Rossi (Politecnico di Milano). The resulting FM 2016 program reflects the breadth and vibrancy of both research and practice in formal methods today.

As in previous years, FM 2016 attracted submissions from all over the world: 299 authors from 22 European countries, 126 authors from eight Asian countries, 64 authors from North America, 24 authors from five countries in South America, 16 authors from Australia and New Zealand, and five authors from two African countries, Algeria and Tunisia. The largest number of authors from a single country came from China (58), followed by France (56), the UK (53), and the USA (45).

Last year, the FM community mourned the passing of Prof. Peter Lucas, a former chair of the FME Association and a founding figure of the formal methods discipline. This year, as a symposium highlight, we celebrated Peter’s achievements by presenting the first Lucas Award for a highly influential paper in formal methods.

We are grateful to all involved in FM 2016, particularly the Program Committee members, subreviewers, and other committee chairs. The excellent local organization and publicity groups, chaired by Yannis Dimopoulos, Chryssis Georgiou, and George Papadopoulos (University of Cyprus), deserve special thanks. Much of the symposium’s activity would be impossible without the support of our sponsors. We gratefully acknowledge the support of Springer, the Cyprus Tourism Organization, the University of Cyprus, and DiffBlue.

September 2016

John S. Fitzgerald
Stefania Gnesi
Constance Heitmeyer
Program Co-chairs

Anna Philippou
General Chair

Organization

Program Committee

Erika Abraham             RWTH Aachen University, Germany
Bernhard K. Aichernig     TU Graz, Austria
Myla Archer               Naval Research Laboratory, USA
Gilles Barthe             IMDEA Software Institute, Spain
Nikolaj Bjorner           Microsoft Research, USA
Michael Butler            University of Southampton, UK
Andrew Butterfield        Trinity College, University of Dublin, Ireland
Ana Cavalcanti            University of York, UK
David Clark               UCL, UK
Frank De Boer             CWI, The Netherlands
Ewen Denney               SGT/NASA Ames, USA
Jin Song Dong             National University of Singapore, Singapore
Javier Esparza            Technical University of Munich, Germany
John Fitzgerald           Newcastle University, UK
Vijay Ganesh              University of Waterloo, Canada
Diego Garbervetsky        Universidad de Buenos Aires, Argentina
Dimitra Giannakopoulou    NASA Ames, USA
Stefania Gnesi            ISTI-CNR, Italy
Wolfgang Grieskamp        Google, USA
Arie Gurfinkel            University of Waterloo, Canada
Anne E. Haxthausen        Technical University of Denmark, Denmark
Ian Hayes                 University of Queensland, Australia
Constance Heitmeyer       Naval Research Laboratory, USA
Thai-Son Hoang            University of Southampton, UK
Jozef Hooman              TNO-ESI and Radboud University Nijmegen, The Netherlands
Laura Humphrey            Air Force Research Laboratory, USA
Ralf Huuck                UNSW/SYNOPSYS, Australia
Fuyuki Ishikawa           National Institute of Informatics, Japan
Einar Broch Johnsen       University of Oslo, Norway
Cliff Jones               Newcastle University, UK
Georgia Kapitsaki         University of Cyprus, Cyprus
Joost-Pieter Katoen       RWTH Aachen University, Germany
Gerwin Klein              NICTA and UNSW, Australia
Laura Kovacs              Vienna University of Technology, Austria
Thomas Kropf              Bosch, Germany
Peter Gorm Larsen         Aarhus University, Denmark
Thierry Lecomte           ClearSy, France
Yves Ledru                Université Grenoble Alpes, France
Rustan Leino              Microsoft Research, USA
Elizabeth Leonard         Naval Research Laboratory, USA
Martin Leucker            University of Lübeck, Germany
Michael Leuschel          University of Düsseldorf, Germany
Zhiming Liu               Southwest University, China
Tiziana Margaria          University of Limerick and Lero, Ireland
Mieke Massink             CNR-ISTI, Italy
Annabelle McIver          Macquarie University, Australia
Dominique Mery            Université de Lorraine, LORIA, France
Peter Müller              ETH Zürich, Switzerland
Tobias Nipkow             TU München, Germany
Jose Oliveira             Universidade Minho, Portugal
Olaf Owe                  University of Oslo, Norway
Sam Owre                  SRI International, USA
Anna Philippou            University of Cyprus, Cyprus
Nico Plat                 Thanos and West IT Solutions, The Netherlands
Elvinia Riccobene         University of Milan, Italy
Judi Romijn               Movares, The Netherlands
Grigore Rosu              University of Illinois at Urbana-Champaign, USA
Andreas Roth              SAP Research, Germany
Augusto Sampaio           Federal University of Pernambuco, Brazil
Gerardo Schneider         Chalmers University of Gothenburg, Sweden
Natasha Sharygina         University of Lugano, Switzerland
Marjan Sirjani            Reykjavik University, Iceland
Ana Sokolova              University of Salzburg, Austria
Jun Sun                   Singapore University of Technology and Design, Singapore
Kenji Taguchi             AIST, Japan
Stefano Tonetta           FBK-irst, Italy
Marcel Verhoef            European Space Agency, The Netherlands
Aneta Vulgarakis          Ericsson, Sweden
Alan Wassyng              McMaster University, Canada
Heike Wehrheim            University of Paderborn, Germany
Michael Whalen            University of Minnesota, USA
Jim Woodcock              University of York, UK
Fatiha Zaidi              University of Paris-Sud, France
Gianluigi Zavattaro       University of Bologna, Italy
Jian Zhang                Chinese Academy of Sciences, China
Lijun Zhang               Chinese Academy of Sciences, China

Additional Reviewers

Aestasuain, Fernando; Aguirre, Nazareno; Ait Ameur, Yamine; Almeida, José Bacelar; Alt, Leonardo; Ambrona, Miguel; Andronick, June; Antignac, Thibaud; Arcaini, Paolo; Arming, Sebastian; Asadi, Sepideh; Azadbakht, Keyvan; Bagheri, Maryam; Bai, Guangdong; Bak, Stanley; Bandur, Victor; Bartocci, Ezio; Basile, Davide; Bertrand, Nathalie; Berzish, Murphy; Bonacina, Maria Paola; Bornat, Richard; Bourke, Timothy; Braghin, Chiara; Bravetti, Mario; Bright, Curtis; Bubel, Richard; Calinescu, Radu; Carvalho, Gustavo; Cassez, Franck; Castaño, Rodrigo; Chawdhary, Aziem; Chen, Xiaohong; Chen, Xin; Ciancia, Vincenzo; Ciriani, Valentina; Colom, José Manuel; Colvin, Robert; Cremers, Cas; Dalvandi, Mohammadsadegh; Dang, Thao; Decker, Normann; Dehnert, Christian; Delzanno, Giorgio; Demasi, Ramiro; Dghaym, Dana;
Dimovski, Aleksandar S.; Dobrikov, Ivaylo; Dodds, Mike; Donat-Bouillud, Pierre; Dong, Naipeng; Dutertre, Bruno; Díaz, Gregorio; Engelmann, Björn; Fantechi, Alessandro; Fedyukovich, Grigory; Fokkink, Wan; Foster, Simon; Fox, Anthony; Freitas, Leo; Ghassabani, Elaheh; Habli, Ibrahim; Herbelin, Hugo; Heunen, Chris; Holzer, Andreas; Huisman, Marieke; Hyvärinen, Antti; Höfner, Peter; Immler, Fabian; Inoue, Jun; Jacob, Jeremy; Jafari, Ali; Jakobs, Marie-Christine; Jansen, Nils; Jegoure, Cyrille; Johansen, Christian; Junges, Sebastian; Katis, Andreas; Khamespanah, Ehsan; Kotelnikov, Evgenii; Kremer, Gereon; Kretinsky, Jan; Krämer, Julia Désirée; Kumar, Ramana; Laarman, Alfons; Lallali, Mounir

B. Luteberget et al.

Fig. Structured comments attached to a rule expressing violation of a regulation

Fig. Counter-example presentation within the RailCOMPLETE CAD tool

closures), and uses (2) negation with negation-as-failure semantics (stratified negation). Finally, and going beyond pure Datalog, it uses (3) arithmetic, to model aspects such as distances. Our prototype implementation uses XSB Prolog, which does conventional top-down Prolog search combined with tabling of recursive predicates, ensuring the Datalog properties of termination and polynomial running time. The structured-comment figure shows an example rule input corresponding to a railway property, whereas the counter-example figure shows the graphical representation indicating to the engineer which regulation is violated.

The tight integration into the CAD program and, as such, into the engineer’s design process creates the demand for fast re-evaluation of all conclusions upon small changes to the railway designs. The performance studies of [8] show that the current implementation is acceptable for “one-shot” validation even for realistic designs, with running times in the range of seconds (the tool is applied to a real train station currently under construction). However, it is not fast enough to be integrated smoothly and transparently such that it can automatically rerun the complete verification
for each small change.

Incremental Verification for On-the-Fly Performance

An alternative approach that promises to be more efficient is incremental verification: instead of solving logic programs from scratch for each verification run, it materializes all consequences of the base facts and then maintains this view under fact updates. The existing literature on incremental materialization of Datalog programs gives various strategies for doing this efficiently. We briefly survey methods for incremental evaluation of Datalog programs, also known in the deductive database literature as the view maintenance problem [5], [1, Chap. 22]. We also survey relevant tools and compare their features (e.g., availability, industry-quality, performance) in the context of our verification tool. A more thorough evaluation appears in a long version of this work [9].

Rule-Based Incremental Verification Tools Applied to Railway Designs

Datalog systems use rules to derive a set of consequences (intensional facts) from a given set of base facts (extensional facts). Typically, Datalog systems use a bottom-up (or forward-chaining) evaluation strategy, where all possible consequences are materialized [15, Chap. 3], [1, Chap. 13]. This reduces query answering to looking up values in the materialization tables. Any change to the base facts, however, will invalidate the materialization. Several approaches have been suggested to reduce the work required to find a new materialization after changing the base facts.

First, if considering only addition of facts to positive Datalog programs, i.e., without negation, then the standard semi-naive algorithm [15, Chap. 3], [1, Chap. 13] is already an efficient approach. The real challenge is non-monotonic changes, i.e., removing facts appearing positively in rules or adding facts appearing negatively in rules. Non-monotonicity is essential in our railway infrastructure verification rules. Graph reachability is prominent in many of the regulations for
railway signalling, so efficiently maintaining rules involving transitivity is also essential.

Some algorithms, such as truth maintenance systems [3], work by storing more information (in addition to the logical consequences) about the supporting facts for derived facts, so that removal of supporting facts may or may not remove a derived fact. This allows efficient removal of facts, at the cost of requiring more time and memory for normal derivations.

Another class of algorithms, working without additional “bookkeeping”, can be more efficient if the re-evaluation of sets of facts is relatively easy compared to re-materializing all facts. The Propagation-Filtering algorithm [7] works on each removed fact separately, propagating it through to all rules which depend on it. In contrast, the Delete-Rederive (DRed) algorithm [6] is rule-oriented and works on sets of facts, first over-approximating all possible deletions that may result from a change in base facts, then re-deriving any still-supported facts from the over-deleted state, before finally continuing semi-naive materialization on newly added facts. Recently, the Forward/Backward/Forward (FBF) algorithm [10] used in RDFox improved on the DRed algorithm in most cases by searching for alternative support (and caching the results) for each potentially deleted fact before proceeding to the next fact. Notably, this method performs better on rules involving transitivity, as deletions do not propagate further than necessary.

Datalog Tools for Incremental Verification

Our procedure uses rule-based modelling and verification techniques in the style of Datalog. Consequently, we survey Datalog-based and related tools. The logic programs for our verification make use of recursive predicates, stratified negation, and arithmetic. Therefore, we pay particular attention to tools that at least satisfy these needs. In addition, we are looking for high performance on relatively small (in-memory) data sets, so light-weight
library-style logic engines are preferred. High-performance distributed “big data” tools have less value in this context.

XSB Prolog, continuously developed since 1990, has constantly been pushing the state of the art in high-performance Prolog. XSB is especially known for its tabling support [14], which allows fast Datalog-like evaluation of logic programs without restricting ISO Prolog. The tabling support was extended to allow incremental evaluation [12], and these features have been under continued development and seem to have reached a mature state [13]. For some applications, however, the additional memory usage for incremental tabling can lead to a significant increase in the total memory needed.

RDFox is a multicore-scalable in-memory RDF triple store with Datalog reasoning. It reads semantic web formats (RDF/OWL) and stores RDF triples, but also includes a Datalog-like input language which can describe SWRL rules. This rule language has been extended to include stratified negation and arithmetic. The RDFox system also implements a new algorithm called FBF for incremental evaluation [10]. Internally, RDFox stores only triples, as in RDF, which in Datalog corresponds to using only unary and binary predicates. A method of reifying the rules for higher-arity Datalog predicates into binary predicates allows RDFox to calculate any-arity Datalog programs. However, this requires separate rules for each component of the predicate, and when doing incremental evaluation, the FBF algorithm’s backward chaining step then examines all combinations of components potentially involved. Because of this problem, using RDFox incrementally did not improve running times in our case study.

LogicBlox is a programming platform [2] for combining transactions with analytics in enterprise application areas including web-based retail planning and insurance. It uses a typed, Datalog-based custom language, LogiQL, and has a comprehensive development framework. It claims
support for incremental verification, but we could not evaluate it on our railway example due to the absence of freely downloadable distributions.

Dyna is a promising new Datalog-like language for modern statistical AI systems [4]. It is currently not mature enough for our application, but its techniques are promising, and we hope to see it more fully developed in the future.

Many other Datalog tools are available (around 30), few of them supporting incremental evaluation. An overview and our brief evaluation of them can be found in the technical report [9]. We hope to include these findings also in the Wikipedia page for Datalog (https://en.wikipedia.org/wiki/Datalog#Systems_implementing_Datalog).

Efficiency Gains, Shortcomings, and Possible Ways Forward

The table below compares the running time and memory usage for the verification on Arna station, used as a reference station in RailCOMPLETE. The railway signalling design project for this station is currently in progress by Norconsult AS.

Table. Case study size and running times on a standard laptop

                                Testing station   Arna phase A   Arna phase B
Relevant components                    15              152            231
Interlocking routes                                     23             42
Datalog input facts                    85             8283           9159
XSB:
  Non-incremental verif.:
    Running time (s)                 0.015            2.31           4.59
    Memory (MB)                        20              104            190
  Incremental verif. baseline:
    Running time (s)                 0.016            5.87          12.25
    Memory (MB)                        21             1110           2195
  Incr. single object update:
    Running time (s)                 0.014            0.54           0.61
    Memory (MB)                        22             1165           2267

The extra bookkeeping required in XSB to prepare for incremental evaluation requires more time and memory than non-incremental evaluation, so we include both non-incremental and from-scratch incremental evaluation in the table for comparison. We show how updates can be calculated faster than from-scratch evaluation by moving a single object (an axle counter) in and out of a disallowed area near another object (regulations require at least 21.0 m separation between train detectors).
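The single-object update described above can be illustrated outside Datalog. The following Python sketch is not the authors' rule base, nor XSB's incremental tabling. It assumes, purely for illustration, that detectors sit at one-dimensional positions along a track, and the names `violations_from_scratch` and `move_detector` are hypothetical. It contrasts recomputing all violating pairs with updating only the facts that depend on the moved detector, using the 21.0 m threshold quoted in the text.

```python
from itertools import combinations

MIN_SEPARATION_M = 21.0  # threshold quoted in the text


def violations_from_scratch(positions):
    """Recompute every violating pair of detectors (quadratic in their number)."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(positions), 2)
        if abs(positions[a] - positions[b]) < MIN_SEPARATION_M
    }


def move_detector(positions, violations, name, new_position):
    """Maintain the violation set after moving one detector (linear work)."""
    positions[name] = new_position
    # Delete every derived fact that depended on the old position.
    violations.difference_update({pair for pair in violations if name in pair})
    # Re-derive only the facts that involve the moved detector.
    for other, pos in positions.items():
        if other != name and abs(pos - new_position) < MIN_SEPARATION_M:
            violations.add(frozenset((name, other)))
    return violations


# Hypothetical layout: three axle counters, positions in metres.
positions = {"ac1": 0.0, "ac2": 50.0, "ac3": 120.0}
violations = violations_from_scratch(positions)    # nothing violated initially
move_detector(positions, violations, "ac2", 10.0)  # into the disallowed area
moved_in = set(violations)
move_detector(positions, violations, "ac2", 50.0)  # and back out again
moved_out = set(violations)
```

Because this rule is non-recursive, deletion and re-derivation are confined to pairs containing the moved detector. Recursive rules such as reachability need the more careful strategies (DRed, FBF) surveyed earlier.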
Without using abstraction methods, the case study verification uses over a gigabyte of memory. So, for any hope of handling larger stations on a standard laptop or workstation, this must be reduced. We were not able to reduce memory usage in this case study using the abstraction methods in XSB (version 3.6.0).

Currently, none of the tools seems to satisfy all the conditions we hoped for in our integration, notably efficiency, but also maturity and stability. It should be noted, however, that the need for incremental evaluation has been identified by the community not only as theoretically interesting, but also as of practical importance. The RDFox developers aim to support incremental updates of higher-arity predicates in a later version. The XSB project has made efforts to improve its abstraction mechanisms, so future versions might become feasible for our use. If reducing the memory usage would require adapting a Datalog algorithm (such as DRed), then XSB’s unrestricted Prolog might be a challenge. A different approach would be to extend another efficient Datalog tool, such as Soufflé, to incremental evaluation, which could require a significant effort.

References

1. Abiteboul, S., Hull, R., Vianu, V. (eds.): Foundations of Databases, 1st edn. Addison-Wesley Longman Publishing Co., Boston (1995)
2. Aref, M., ten Cate, B., Green, T.J., Kimelfeld, B., Olteanu, D., Pasalic, E., Veldhuizen, T.L., Washburn, G.: Design and implementation of the LogicBlox system. In: SIGMOD International Conference on Management of Data, pp. 1371–1382. ACM (2015)
3. Doyle, J.: A truth maintenance system. Artif. Intell. 12(3), 231–272 (1979)
4. Eisner, J., Filardo, N.W.: Dyna: extending datalog for modern AI. In: Moor, O., Gottlob, G., Furche, T., Sellers, A. (eds.)
Datalog 2.0 2010. LNCS, vol. 6702, pp. 181–220. Springer, Heidelberg (2011). doi:10.1007/978-3-642-24206-9_11
5. Gupta, A., Mumick, I.S., et al.: Maintenance of materialized views: problems, techniques, and applications. IEEE Data Eng. Bull. 18(2), 3–18 (1995)
6. Gupta, A., Mumick, I.S., Subrahmanian, V.S.: Maintaining views incrementally. In: SIGMOD International Conference on Management of Data, pp. 157–166. ACM (1993)
7. Harrison, J.V., Dietrich, S.W.: Maintenance of materialized views in a deductive database: an update propagation approach. In: Workshop on Deductive Databases, pp. 56–65 (1992)
8. Luteberget, B., Johansen, C., Steffen, M.: Rule-based consistency checking of railway infrastructure designs. In: Ábrahám, E., Huisman, M. (eds.) IFM 2016. LNCS, vol. 9681, pp. 491–507. Springer, Heidelberg (2016). doi:10.1007/978-3-319-33693-0_31
9. Luteberget, B., Johansen, C., Steffen, M.: Rule-based consistency checking of railway infrastructure designs (long version). Technical report 450, University of Oslo (IFI) (2016)
10. Motik, B., Nenov, Y., Piro, R.E.F., Horrocks, I.: Incremental update of datalog materialisation: the backward/forward algorithm. In: Proceedings of AAAI 2015. AAAI Press (2015)
11. Nash, A., Huerlimann, D., Schütte, J., Krauss, V.P.: RailML – a standard data interface for railroad applications, pp. 233–240. WIT Press (2004)
12. Saha, D., Ramakrishnan, C.R.: Incremental evaluation of tabled logic programs. In: Palamidessi, C. (ed.)
ICLP 2003. LNCS, vol. 2916, pp. 392–406. Springer, Heidelberg (2003). doi:10.1007/978-3-540-24599-5_27
13. Swift, T.: Incremental tabling in support of knowledge representation and reasoning. Theory Pract. Log. Program. 14(4–5), 553–567 (2014)
14. Swift, T., Warren, D.S.: XSB: extending Prolog with tabled logic programming. Theory Pract. Log. Program. 12(1–2), 157–187 (2012)
15. Ullman, J.D.: Principles of Database and Knowledge-base Systems, vol. I & II. Computer Society Press (1988)

RIVER: A Binary Analysis Framework Using Symbolic Execution and Reversible x86 Instructions

Teodor Stoenescu (1), Alin Stefanescu (2, corresponding author), Sorina Predut (2), and Florentin Ipate (2)

(1) Bitdefender, Bucharest, Romania
(2) University of Bucharest, Bucharest, Romania
alin@fmi.unibuc.ro

Abstract. We present a binary analysis framework based on symbolic execution with the distinguishing capability to execute stepwise forward and also backward through the execution tree. It was developed internally at Bitdefender and code-named RIVER. The framework provides components such as a taint engine, a dynamic symbolic execution engine, and integration with Z3 for constraint solving.

Introduction

Given today’s extreme interconnectivity between multiple systems, networks and (big) data pools, cybersecurity is a vitally important field in which concentrated efforts and resources are invested. To mention only two recent examples in this direction, the European Union has just launched a new public-private partnership on cybersecurity, which is expected to trigger €1.8 billion of investment by 2020 [1] in advanced research and cooperation to improve the defence against the myriad of security attacks, and the US is currently organising, through DARPA, a cybersecurity grand challenge (CGC) [2], where teams compete to analyse and fix a benchmark of binary files using a combination of dynamic and static analysis, concolic, and fuzz testing.

Almost all the tools on the security market which aim to detect vulnerabilities of source or
binary code employ static analysis or, more rarely, dynamic analysis through random values, a technique called fuzz testing. This may be more efficient than the alternative of symbolic execution that we explore here, but can miss many deeper or more insidious security issues. Symbolic execution is a promising approach whose foundational principles were laid thirty years ago [3], but which only recently started to regain attention from the research community due to advances in constraint solving, various combinations of concrete and symbolic execution, and more computing power to fight the usual state explosion problem [4]. The basic idea of symbolic execution is to mark (some of) the program variables as symbolic rather than concrete and execute the program symbolically by accumulating constraints on those variables along the different paths explored in the execution tree.

© Springer International Publishing AG 2016. J. Fitzgerald et al. (Eds.): FM 2016, LNCS 9995, pp. 779–785, 2016. DOI: 10.1007/978-3-319-48989-6_50

Most of the symbolic execution tools work on source code or bytecode [5–7] rather than binary code [8–10]. Indeed, binary code analysis is a very difficult task due to its complexity and lower-level constructs. On the other hand, it is better to run the analysis directly at binary level, because this is the code which is executed by the operating system. Moreover, in cybersecurity, usually only the binary file is available, so recent research efforts are invested into dynamic analysis of binary files [2], with companies such as Bitdefender joining the trend.

Bitdefender is a Romanian software security company and the creator of one of the world’s fastest and most effective lines of internationally certified security software and award-winning protection since 2001 [11]. Today, Bitdefender secures the digital experience of 500 million home and corporate users across the globe and, for that, Bitdefender is constantly performing research
activities in the software security area. The RIVER framework is an example of such an internal research effort, with person-years invested in the project so far.

Contributions: The main differentiator of RIVER is the design and implementation of a set of extended reversible x86 instructions, which allow efficient control of the execution, and their integration into a symbolic execution framework. For that, the following artifacts were created: the RIVER intermediate representation, which adds necessary and sufficient information to the x86 set of instructions in order to efficiently “undo” the operations when needed or to track certain variables as tainted; dedicated taint analysis and symbolic execution engines based on the above; and, as a byproduct, a debugger at binary level with forward and backward step execution capabilities. A technical report on RIVER is online at: http://tinyurl.com/river-tr-2016

Description of the Framework

This section details the overall design of the RIVER framework, which is shown in the RIVER architecture figure. RIVER (spelled backwards) stands for the “REVersible Intermediate Representation”. RIVER has a fixed-length extended x86 instruction set and was designed to be efficiently translated to and from normal (“forward”) x86 instructions. Its main novelty is the introduction of reverse (“backward”) instructions. Also, specific tracking instructions were added to enable the taint analysis. This intermediate representation is depicted on the left-hand side of that figure.

The RIVER code is obtained from input x86 native binary code (see bottom-left corner of Fig. 1) through the dynamic binary instrumentation component, by means of disassembly. Then, the modified code is used by the components for on-the-fly reversible execution and taint analysis. All these are used by the symbolic execution engine, which also uses a state-of-the-art SMT solver, Z3 [12], for dealing with the constraints for the symbolic variables (see top of Fig. 1), as well as on-demand snapshots to save certain
memory states. All these and various other aspects are discussed in this section.

RIVER Intermediate Language

We now describe the RIVER intermediate language (IL) by presenting a couple of design choices. More details and an example are given in the RIVER technical report mentioned above. First of all, RIVER code is obtained automatically from the input native x86 code through the dynamic binary instrumentation component (DBI), which is plugged into the reversible execution component (see bottom-left corner of Fig. 1). Thus, RIVER augments translated code in order to make it reversible. It uses a shadow stack to save instruction operands that are about to be destroyed. The original instructions are prefixed with operand-saving ones. DBI also generates code for reversing the execution, so that the destroyed values can be restored from the shadow stack.

Fig. RIVER architecture

The RIVER instructions include modifiers, specifiers, operator codes and types, as well as flags and a special field for the family of the instruction. This additional information is used to identify the prefixes, operand types and registers of the original instructions, and helps the data flow analysis. The RIVER DBI component also contains its own disassembler, which augments the code with the following properties:
(a) implicit operands: some instructions implicitly modify registers and memory locations; these are added to the instruction as implicit operands;
(b) register versioning: in order to simplify data flow analysis, the disassembler versions every register use;
(c) meta operations: since the x86 instruction set is not orthogonal, some instructions may be split into several sub-operations; and
(d) absolute jump addresses: relative jump operations are augmented with an additional operand containing the original instruction address, which makes it easier to compute the jump destination.

Reversible Execution Component

The reversible execution engine
(see middle of Fig. 1) enables forward and backward control of the RIVER IL code that was translated from the native x86 code through the DBI component. It operates at the basic block level (a basic block being a sequence of instructions terminated by a jump), replacing the jump instruction in order to maintain execution control. To implement reversibility, the RIVER translator inserts RIVER-specific instructions in the translated code. Then, the RIVER translator generates a second basic block for reversing the effects of the first block.

Based on the above, we developed a forward and backward binary debugger (see bottom-right of Fig. 1). We created it for the software developers and security experts at Bitdefender, who need to dynamically examine certain behaviours of binary files with fine-grained control. It operates at the basic block level and has a web front-end using JavaScript bindings for RIVER. Moreover, it offers the possibility to set not only breakpoints but also so-called “waypoints”, which are similar to breakpoints but refer to points in the past of the execution.

Taint Analysis Component

This component records the spread of taint through a program that uses tainted values. We implement classic taint spreading algorithms, but adapt them to our RIVER IL to also take the reversibility feature into account. Technically, we added tracking instructions to RIVER IL, which DBI uses to determine the locations (both memory and registers) that have been directly influenced by the input values. Initially, all input locations are marked as tainted and everything else is untainted. At runtime, any instruction having a tainted operand produces tainted results (with some exceptions). There are two ways of tracking locations: using simple boolean values, or binding custom values (pointers to symbolic expressions) to memory locations. We use the former for simple taint analysis (if used standalone) and the latter for symbolic execution.

Symbolic Execution Engine

In order to perform various types of analysis and testing using dynamic symbolic execution, the program has to exercise a large set of paths through its execution tree. The more paths are explored, the higher the coverage of examined behaviours. However, since the enumeration of paths is computationally expensive, several approaches have been proposed to minimize its footprint [4]. Our symbolic execution engine (see right of Fig. 1) aims to tackle the path explosion problem through its distinctive feature of reversibility. More precisely, instead of re-executing paths from the beginning each time, we generate them through backtracking (using, e.g., a depth-first search strategy) and use reversibility to keep the memory usage low for the backtracking steps. Moreover, we keep only the current path in memory rather than a whole set of paths and snapshots. Thus, we try to exploit temporal and spatial data locality, since most execution paths have many common subsequences.

We do the above by keeping track of two things in parallel: a concrete stack for the current path plus, only when needed, snapshots. We optimise the latter during reverse execution by using many implicit micro-snapshots, as opposed to the (expensive) macro-snapshots usually used by current symbolic execution approaches. The micro-snapshots keep only the modified memory locations, so we can easily restore the previous snapshot at each program point. It is a high priority for us to keep the snapshots to a minimum and to use them only on demand, i.e., when we cannot reverse the execution of specific instructions, such as system calls, processor exceptions, or interrupts (e.g., “0x2e”). The fact that these are quite uncommon also helps our performance. Furthermore, we also try to avoid the snapshots associated with system calls: we have started a detailed analysis of the reversibility of these problematic functions, by systematically
examining the Windows Native API (NTDLL) functions and implementing their inverse functions, whenever possible.

Regarding the symbolic execution engine, we do not implement “pure” symbolic execution, but use concolic execution, i.e., mixing concrete and symbolic execution at the binary level. Thus, instead of being only symbolic, the inputs have a concrete value that is a representative of the symbolic domain. In addition, the taint analysis component tracks the symbolic values.

Other Technical Aspects

The RIVER framework is written in C++, with 14 KLOC in the current stable version, but it is still under development, with more components, optimisations, and types of analyses to be added soon. RIVER IL currently covers about 87% of the integer x86 instruction set, which is the core of x86. This percentage is high enough to run most binary programs in RIVER reversible mode (including specific debugging) and for taint analysis. However, we cannot yet compare the performance of RIVER with other frameworks using symbolic execution on binaries, because the SMT solver integration does not have high enough coverage to run on existing benchmarks. Also, the symbolic execution engine implements only a straightforward depth-first exploration of paths using the C APIs of the RIVER components in the middle of Fig. 1, but we are now adapting several advanced features available in other state-of-the-art symbolic execution frameworks [2,3,6,9,10,13,14]. The dynamic symbolic execution research community has made great advances and has been increasingly active over the last decade [3,4].

Moreover, we are now developing an integration of our concolic execution with a parallel fuzz testing module, in order to increase the path coverage. We designed a distributed processing framework based on Apache Spark and Hadoop to apply fuzz testing on several parallel machines and obtain a first test suite with good coverage. Then, we apply symbolic execution by tweaking certain paths to increase coverage, as done also by
others [15,16]. This is still work in progress. Also, RIVER IL increases the size of the original x86 code sixfold. To lower this overhead, we are currently implementing some classic code optimisation methods such as instruction reordering. After first experiments, we estimate that we can reduce the size of RIVER code to only double the size of the original code, which should be an acceptable trade-off.

Conclusions

In this paper, we presented RIVER, a new binary analysis framework built from scratch with the idea of the reversible basic block at its core. RIVER has all the components needed to perform dynamic symbolic execution, including dynamic binary instrumentation and reversible execution, which enabled the construction of a dedicated debugger, as well as taint analysis and SMT solver integration, which enabled a lightweight symbolic execution engine with a minimized footprint. This architecture is based on a novel intermediate representation, RIVER IL.

We plan to use RIVER internally at Bitdefender both to extensively test our commercial products and to find security vulnerabilities in external binary files, which is Bitdefender’s core business. To reach this level, we need to implement the several improvements mentioned before and then tune the framework for certain types of vulnerabilities. This will be our focus for the next months. Moreover, we want to experiment with cross-pollination of ideas between RIVER and related tools in both directions, i.e., to implement in RIVER heuristics that proved efficient in other frameworks, and, vice versa, to investigate whether our concept of reversibility may improve the performance of existing tools (see how KLEE benefited from such a transfer of optimization ideas in [14]).

Acknowledgements

We thank Sorin Baltateanu and Traian Serbanuta for fruitful discussions and acknowledge partial support from the MuVeT and MEASURE projects (PN-II-ID-PCE-2011-3-0688 and PN-III-P3-3.5-EUK-2016-0020).

References

1. European-Commission:
Commission signs agreement with industry on cybersecurity and steps up efforts to tackle cyber-threats. http://europa.eu/rapid/press-release_IP-16-2321_en.htm. Accessed July 2016
2. DARPA-US: Cyber grand challenge (2016). http://cgc.darpa.mil
3. Cadar, C., Sen, K.: Symbolic execution for software testing: three decades later. Commun. ACM 56(2), 82–90 (2013)
4. Pasareanu, C.S., Visser, W.: A survey of new trends in symbolic execution for software testing and analysis. STTT 11(4), 339–353 (2009)
5. Sen, K., Marinov, D., Agha, G.: CUTE: a concolic unit testing engine for C. In: Proceedings of ESEC/FSE, pp. 263–272. ACM (2005)
6. Cadar, C., et al.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of OSDI, pp. 209–224. USENIX (2008)
7. Luckow, K.S., Pasareanu, C.S.: Symbolic PathFinder v7. ACM SIGSOFT Softw. Eng. Notes 39(1), 1–5 (2014)
8. Song, D., et al.: BitBlaze: a new approach to computer security via binary analysis. In: Sekar, R., Pujari, A.K. (eds.) ICISS 2008. LNCS, vol. 5352, pp. 1–25. Springer, Heidelberg (2008). doi:10.1007/978-3-540-89862-7
9. Cha, S.K., Avgerinos, T., Rebert, A., Brumley, D.: Unleashing Mayhem on binary code. In: Proceedings of SP 2012, pp. 380–394. IEEE (2012)
10. Salwan, J., Saudel, F.: Triton: a dynamic symbolic execution framework. In: Proceedings of SSTIC, pp. 31–54 (2015). http://triton.quarkslab.com
11. Bitdefender (2016). http://www.bitdefender.com/business/awards.html
12. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). doi:10.1007/978-3-540-78800-3_24
13. Chipounov, V., Kuznetsov, V., Candea, G.: The S2E platform: design, implementation, and applications. ACM Trans. Comput. Syst. 30(1), (2012)
14. Rizzi, E.F., et al.: On the techniques we create, the tools we build, and their misalignments: a study of KLEE. In: Proceedings of ICSE 2016, pp. 132–143. ACM (2016)
15. Ciortea, L., Zamfir, C., Bucur, S., Chipounov, V., Candea, G.: Cloud9: a software testing service. Oper. Syst. Rev. 43(4), 5–10 (2009)
16. Stephens, N., et al.: Driller: augmenting fuzzing through selective symbolic execution. In: Proceedings of NDSS 2016, pp. 1–16. The Internet Society (2016)

Author Index

Abdulla, Parosh Aziz 25
Antonino, Pedro 43
Aştefănoaei, Lacramioara 60
Atig, Mohamed Faouzi 25
Bardin, Sébastien 235
Becker, Hanno 69
Beneš, Nikola 85
Bensalem, Saddek 60, 199
Biondi, Fabrizio 406
Bisgaard, Morten 559
Böhm, Stanislav 102
Bošnački, Dragan 694
Bozga, Marius 60, 199
Brabrand, Claus 217
Brim, Luboš 85
Cavada, Roberto 741
Chand, Saksham 119
Chen, Mingshuai 137
Chen, Xiaohong 460
Chen, Xin 721
Chen, Yuqi 155
Cheng, Chih-Hong 60
Chifflier, Pierre 496
Chin, Wei-Ngan 659
Cimatti, Alessandro 164, 741
Colvin, Robert J. 352
Combaz, Jacques 199
Crema, Luigi 741
Crespo, Juan Manuel 69
David, Cristina 182
Day, Nancy A. 677
Dellabani, Mahieddine 199
Demko, Martin 85
Diep, Bui Phi 25
Dimovski, Aleksandar S. 217
Djoudi, Adel 235
Dong, Jin Song 513
Feyling, Claus 772
Filipovikj, Predrag 748
Flores-Montoya, Antonio 254
Fränzle, Martin 137, 577
Galowicz, Jacek 69
Gerhardt, David 559
Ghezzi, Carlo 531
Ghorbal, Khalil 628
Giannakopoulou, Dimitra 274
Giantamidis, Georgios 291
Gibson-Robinson, Thomas 43
Gomes, Victor B.F. 310
Gotsman, Alexey 426
Goubault, Éric 235
Griesmayer, Andreas 551
Grov, Gudmund 326
Grumberg, Orna 593
Gu, Ming 757, 764
Guck, Dennis 274
Hansen, Michael R. 577
Hasanagić, Miran 344
Hayes, Ian J. 352
Hensel, Ulrich 69
Hermanns, Holger 559
Hiet, Guillaume 496
Hirai, Yoichi 69
Hoa, Koh Chuen 388
Hofmann, Martin 612
Holzer, Andreas 370
Honiden, Shinichi 444
Hou, Zhe 388
Huang, Chao 721
Huang, Wen-ling
Ipate, Florentin 779
Ishikawa, Fuyuki 444
Jančar, Petr 102
Jiang, Yu 757, 764
Jiao, Li 702
Johansen, Christian 772
Johnson, Taylor T. 628
Joshi, Saurabh 551
Kawamoto, Yusuke 406
Kesseli, Pascal 182
Khoo, Siau-Cheng 659
Khyzha, Artem 426
Kobayashi, Tsutomu 444
Kong, Hui 757
Kong, Pingfan 460
Krčál, Jan 559
Kroening, Daniel 182, 551
Kunz, César 69
Lahav, Ori 479
Larsen, Peter Gorm 344
Lausdahl, Kenneth 344
Le, Ton Chanh 659
Legay, Axel 406
Letan, Thomas 496
Lewis, Matt 182
Li, Li 513
Li, Yangjia 137, 702
Li, Yi 460
Lin, Wang 721
Lin, Yuhui 326
Liu, Han 757, 764
Liu, Yang 388
Liu, Yanhong A. 119
Liu, Zhiming 721
Ljungkrantz, Oscar 748
Lönn, Henrik 748
Luteberget, Bjørnar 772
Mahmud, Nesredin 748
Marinescu, Raluca 748
Meca, Ondřej 102
Meinicke, Larissa A. 352
Melham, Tom 551
Menghi, Claudio 531
Morin, Benjamin 496
Mosaad, Peter N. 137
Mover, Sergio 164
Mukherjee, Rajdeep 551
Nakata, Keiko 69
Neele, Thomas 694
Néron, Pierre 496
Nies, Gilles 559
Ody, Heinrich 577
Parkinson, Matthew 426
Pastva, Samuel 85
Peleska, Jan
Poskitt, Christopher M. 155
Predut, Sorina 779
Roccabruna, Mattia 741
Roscoe, A.W. 43
Rothenberg, Bat-Chen 593
Ruess, Harald 60
Sacchini, Jorge Luis 69
Šafránek, David 85
Sanan, David 388
Schumann, Johann 274
Schwartz-Narbonne, Daniel 370
Seceleanu, Cristina 748
Senjak, Christoph-Simon 612
Sessa, Mirko 164
Sha, Lui 757
Sogokon, Andrew 628
Song, Houbing 757
Spoletini, Paola 531
Stefanescu, Alin 779
Steffen, Martin 772
Stenger, Marvin 559
Stoenescu, Teodor 779
Stoller, Scott D. 119
Strichman, Ofer 645
Struth, Georg 310
Sun, Jiaguang 757, 764
Sun, Jun 155, 460, 513
Sun, Meng 460
Ta, Quang-Trung 659
Tabaei Befrouei, Mitra 370
Tews, Hendrik 69
Tiu, Alwen 388
Tonetta, Stefano 741
Tran-Jørgensen, Peter W.V. 344
Tripakis, Stavros 291
Tuerk, Thomas 69
Tumas, Vytautas 326
Vafeiadis, Viktor 479
Vakili, Amirhossein 677
Veitsman, Maor 645
Velykis, Andrius 352
Wang, Jingyi 460
Wang, Shuling 702
Wąsowski, Andrzej 217
Weissenbacher, Georg 370
Wies, Thomas 370
Wijs, Anton 694
Winter, Kirsten 352
Yan, Gaogao 702
Yang, Zhengfeng 721
Zhan, Naijun 137, 702
Zhang, Huafeng 764