bia chinh LVan VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY BUI PHI DIEP AVOIDING STATE SPACE EXPLOSION IN MODEL CHECKER MASTER THESIS OF INFORMATION TECHNOLOGY Hanoi 20[.]
VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY BUI PHI DIEP AVOIDING STATE-SPACE EXPLOSION IN MODEL-CHECKER MASTER THESIS OF INFORMATION TECHNOLOGY Hanoi - 2014 VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY BUI PHI DIEP AVOIDING STATE-SPACE EXPLOSION IN MODEL-CHECKER Major: Computer science Code: 60480101 MASTER THESIS OF INFORMATION TECHNOLOGY SUPERVISOR: Assoc Prof Nguyen Viet Ha Dr Mohamed Faouzi Atig Hanoi - 2014 Declaration of Authorship I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged Signed: Date: i ABSTRACT Model-checking is a well-known technique for the program verification problem (i.e., checking that the program satisfies a given property) However, Model-checking su↵ers the state-space explosion problem This is more visible in the the case of concurrent / parallel programs Therefore, developing new efficient techniques to address the statespace explosion problem (such as slicing) is a crucial and difficult challenge in Modelchecking In this thesis, we present a new slicing method for handling the standard state explosion problem Our slicing method consists of three steps: (1) creating an abstraction with respect to a subset of program variables, which leads to an over-approximation of the input program, (2) reconstructing a program with another subset of variables from a counterexample of abstracted program, and (3) refining the abstraction if the counterexample is a spurious one The process stops when either there are no more counterexamples remaining or there exists a counterexample after investigating all variables In the former case, the program is correct; in the latter case, the program contains an error We have implemented a prototype tool and run it successfully on standard benchmarks, together with several challenging examples The experimental results show the efficiency of our method ii Acknowledgements First and foremost, I would like to express my deepest gratitude to my supervisor, Assoc.Prof Nguyen Viet Ha, for his patient guidance and continuous support throughout the years I would like to give my honest appreciation to my co-supervisor Dr Mohamed Faouzi Atig and Prof Parosh Aziz Abdulla for their great support They always appear when I need help, and respond to queries so helpfully and promptly The encouragement from my family, my friends in UET-VNUH and Uppsala University, and my girl friend, Diu Cap, is also very important for me When reading this thesis, if you find any mistakes, sending to me at diepbp@vnu.edu.vn is appreciated iii Contents Declaration of Authorship i Acknowledgements iii Contents iv List of Figures vi List of Tables vii Introduction 1.1 Motivation 1.2 Related work 1.2.1 Program Slicing 1.2.2 Predicate Abstraction 1.3 Thesis structure 1 4 Slicing Sequential Programs 2.1 Program Syntax 2.2 Statements, Variables 2.3 Program Control-Flow Graph 2.4 Program Transition System 2.5 Variable Slicing for Sequential Programs 2.5.0.1 Creating an initial abstraction 2.5.1 Reconstructing Counterexample 2.5.2 Refining the abstraction 8 10 11 13 15 15 19 21 22 22 22 23 23 Slicing Concurrent Programs 3.1 Program Syntax 3.2 Statements 3.3 Program Control-Flow Graph 3.4 Variable Slicing for Concurrent Programs 3.4.1 Reconstructing the counterexample Experiment 27 4.1 Sequential Programs 27 4.2 Concurrent Programs 29 iv Contents v Conclusions and Future Work 33 Bibliography 34 List of Figures 1.1 1.2 1.3 Structure of SPIN Our method CEGAR framework 2.1 2.2 2.3 An example of program and its control flow graph An example of program abstraction, counterexample and program Refined abstraction 3.1 3.2 3.3 CFG of concurrent program and their abstraction 21 A concurrent counterexample 24 Simulating the concurrent counterexample 25 4.1 A program with its initial abstraction 28 vi 11 reconstructed 18 20 List of Tables 2.1 2.2 The syntax of program Conditions on state transitions hv1 , ⌦1 i !P hv2 , ⌦2 i for each vertex type 12 4.1 Experimental results of verifying concurrent programs in comparison with SPIN 30 Column Information 30 Experimental results of verifying concurrent programs in comparison with tools in SV-COMP 32 4.2 4.3 vii Chapter Introduction In this chapter, we describe the motivation of our work, and it is important Initially, we start with the necessaries of program verification and Model-Checking Then we state the state explosion problem, which are unavoidable by applying Model-Checking Next, we summarize our solutions and results Finally, we describe related work and thesis’s structure 1.1 Motivation Software is everywhere The appearance of software is in diverse areas namely education, healthcare system, transportation Besides, the development of multicore architecture makes software to be designed in many cores It helps software run faster, but unfortunately software becomes larger and more complex Software developers now need more e↵ort to not only make software but also guarantee that it works correctly Errors in software are difficult to find and fix Therefore, we need efficient techniques, for example program verification and testing, to help us handle complex program errors Program verification is a technique that considers a program with given properties of interest It then concludes that whether the properties are satisfied in the program The considered properties of program can be valuations of variables at a program location, or no out of memory errors The result of program is either the properties are hold, which means all executions of program not violate the properties, or the properties are not hold because of a specific execution It is notable that program verification is undecidable There are no a program verification tool which can handle every type of program with every type of properties In practice, each program verification tool only focuses on a small types of program with a restricted set of properties Chapter Introduction Program testing is another technique to find program errors Program testing considers a program with a pair of input and output, called test case If the program executes with the given input and returns the given output, the program is definitely correct in that case Otherwise, the program has an error related to the given input and output There are a number of limitations of program testing Initially, program testing requires a large number of test cases to cover all possible program execution More importantly, program testing cannot handle nondeterministic programs, for example a program with many threads and interleaving between threads Such bugs like Heisenbugs [1] are difficult to figure out by program testing So in the scope of this thesis, we only focus on using program verification Model checking is a popular verification technique The key idea of model checking is that it represents the program by a model, i.e Kripke structure [2], and verifies the model instead of the original program The work in [3] shows a number of advantages of model checking in comparison with other verification techniques, i.e automated theorem proving, such as faster, providing counterexample - which is one of the biggest advantage of model checking, and using temporal logics to describe properties of program SPIN [4] is an efficient and a famous model checker SPIN provides an intuitive notation for design specification, and a concise notation for correctness claims, and the consistency between the notations In particular, the design specifications are written in Promela language [5], and correctness claims are written by Linear Temporal Logic [6] The structure of SPIN is shown in Figure 1.1 It takes input from the front end XSPIN The input is the program specification, including the program design and its correctness claims, both are written in Promela language If the specification has no syntax errors, it then generates a verifier, optimizes and executes the verifier A counterexample of program is detected, it is then sent back to the simulation to to inspect the error in detail However, SPIN faces the state explosion problem Ideally, SPIN visits program states and stores the valuation each state The state spaces that SPIN travels may contain billions of reachable states Therefore, SPIN requires much time when verifying large programs, especially concurrent programs One of efficient method to handle the state explosion program is slicing [7] There are a number of slicing methods that have been developed, i.e static slicing [7], dynamic slicing [8] and conditional slicing [9] In general, instead of considering the whole program, slicing focus on a number of statements in the program with respect to the slicing criterion The slicing criterion often is a pair of a program location and a subset of program variables Slicing is proved to be useful in testing [10], debugging [8] Chapter Introduction XSPIN Front -End Syntax Error Reports Promela Parser LTL Parser and Translator Verifier Generator Simulation Optimization Counterexample Executable On-The-Fly Verifier Figure 1.1: Structure of SPIN We now propose a new slicing method to handle some basic cases of the explosion problem The di↵erence of our method in comparison with other slicing methods that is we not concentrate on any specified subset of program variables All variables in the program are considered For each step in our method, a subset of program variables is selected A new executable program, which consists of statements in the original with respect to selected variables, is generated and verified The Figure 1.2 describes our method Our method aims to check the program safety The process of our method is describes as follows • Our method takes a program as input It then creates an initial abstraction of program with respect to a subset of program variables The abstraction is an over-approximation of the original program • The abstraction then is verified by a model checker If the program is safe, the process stops, the program is concluded to be safe Otherwise, a counterexample is generated The counterexample execution trace shows why there exists a bug in the program The counterexample is used to reconstruct a simulation program In general, the simulation program is another abstraction of the original program, which not only follows statements in counterexample execution trace but also contains statements with respect to a new subset of variables of the original program The model checker then verifies the new abstract program Chapter Introduction P Create Initial Abstraction P1 Model Check Cex Refine No counterexample Reconstruct Counterexample P2 No counterexample Correct Cex, V ars(P2 ) 6= ; Model Check Cex, V ars(P2 ) = ; Error Figure 1.2: Our method • If the reconstructed program is safe, that means the counterexample is spurious Our method then refines the initial abstraction and runs model checker again Otherwise, a counterexample is generated and our method reconstructs it again with another subset of variables • Our method stops when either there are no more variables remaining in the program or no more counterexamples are detected In the former case, the program is error In the latter case, the program is safe 1.2 Related work In this section we describe related work, including work about program slicing and predicate abstraction 1.2.1 Program Slicing Program slicing is a static analysis technique proposed by Weiser [7] The technique uses control and data flow to find a subset of statements in the program which are suitable for a particular criterion That criterion defines how the given program is sliced is called Chapter Introduction slicing criterion The applications of program slicing are in several tasks, including testing [10], debugging [8], maintenance [11], etc The slicing method proposed by Weiser is called static slicing Slicing criterion of static slicing is a tuple (l, V ) where l is a program location and V is a subset of program variables A slice is any subset of program that maintains the e↵ect of program on the slicing criterion In particular, a static slice is obtained by removing a number of statement in the original program and it preserves the program behavior of the original program with respect to the slicing criterion The main issue of static slicing is that a static slice may contain many statements that not a↵ect the value of variables of interest To overcome the issue, dynamic slicing [8] is proposed to reduce the number of considered statements In particular, dynamic slicing preservers the behavior of program for a specific input So that instead of involving all potential statements a↵ecting the slicing criterion, dynamic slicing reduces the search space Therefore, slicing criterion of dynamic slicing is a triple (l, V, I) where l, V are program location and a subset of variables in the program respectively, and I is the program input Conditioned slicing [9] is another slicing method that bridges the gap between static slicing and dynamic slicing While static slicing does not care about input, dynamic slicing specifies input in detail, that triggers a large number of input needed to cover the whole program Conditioned slicing provides information about input without being so specific as to give the precise values, i.e., using boolean expression to relate the possible value of inputs Slicing criterion of dynamic slicing is a triple (l, V, F (V )) where l, V are program location and a subset of variables in the program respectively, and F (V ) is the first order logic formula on variables in V 1.2.2 Predicate Abstraction Predicate abstraction is a technique proposed by Graf and Saidi [12] and Colon and Uribe [13] The technique is widely applied in software verification Instead of tracking the specific value of data of variables in the program, it tracks predicates of the data When verifying a program, an abstracted program is created to represent it, using Existential Abstraction [14] Variables in the abstracted program are Boolean variables, which represents predicates So each abstract state in the abstracted program models a number of state in the original program There exists a transition from an abstract state if at least one corresponding concrete state in the original program has the transition, that is over-approximation The main challenge of predicate abstraction that is finding a set of predicates for a program Chapter Introduction P, ' Create Initial Abstraction A(P ), ' Model Check A(P ) ' A(P ) ✏ ' Generate Counterexample Refine Cex is spurious Cex Stop Cex is not spurious Check Spurious Counterexample Figure 1.3: CEGAR framework Counterexample guide abstract refinement (CEGAR) is a technique that automates searching for predicates The technique consists of four basic steps as shown in Figure 1.3 In the figure, we use some notations including P, A(P ), ', Cex to denote the original program, abstracted program, program specification and counterexample, respectively • Create Initial Abstraction: Construct abstracted program of the original program by over-approximation the program behavior of the original one The abstracted program is a finite-state program whose variables are Boolean • Verify abstraction: The abstracted program is finite state so a model checker for programs with Boolean variables is used for verifying whether the abstracted program satisfies a given specification ' If the model checker returns the abstracted program is safe then the process finishes Otherwise, the program does not satisfy the specification and then a counterexample is returned • Check the counterexample: The counterexample of abstracted program may be a spurious counterexample because of over-approximation, that means the counterexample is not corresponds a valid execution of the original program If the counterexample turns out to be an actual counterexample, the original program does not satisfy the specification Otherwise, the abstracted program needs a refinement • Refine the abstraction: Because of the existence of the spurious counterexample, it is necessary to eliminate the counterexample from the abstracted program by adding additional predicates, which represents behaviors in the abstracted program Chapter Introduction which does not correspond to original program The model checker then resumes with the refined abstraction 1.3 Thesis structure The organization of this thesis is as follows Chapter describes our slicing method for sequential programs It also includes the program syntax, transition system of sequential programs Chapter defines our slicing method for concurrent programs We only focus on the di↵erence of our method compared with the method for sequential programs Experimental results contains many tests with both sequential and concurrent programs in Chapter Most of them are well-known algorithms, for example Dekker [15], Lamport [16], Peterson [17], etc Finally, conclusions and future work are in Chapter Chapter Slicing Sequential Programs In this chapter, we describe our slicing method for sequential programs We aim to check the program safety The form of safety properties are assertions that specify invariants of the program In Promela, assertions are assert statements: assert(e), where e is an expression If the expression e evaluates to at run time, the program aborts Safety checking is so reduced to check whether assert statements are reachable We first consider Promela language with the standard syntax, statements, variables in a Promela program as well as the transition system of Promela We then describes our method in details for sequential Promela programs Definition (Directed Graph) A directed graph is a tuple G = (V, E) where V is a set of vertices and E ✓ V ⇥ V is a set of edges, together with functions start and end that associate a start vertex and an end vertex with each edge A path of length k from node n to node m is a sequence of edges he1 , , ek i such that end(ei ) = start(ei+1 ) for i = 1, , k 2.1 Program Syntax The program syntax is presented in Table 2.1 Each program P declares a set of variables Vars and a procedure init Variable type is either integer or boolean All variables are global Variables are statically scoped as in C The variable identifier is a C-style identifier Procedure init is constructed by a sequence of statements Statements may be labelled and are built inductively by composition with control-flow statement Expressions are built in the usual way from the constants, variables and the standard logical connectives There are three statements that can a↵ect the control flow of a Chapter Slicing Sequential Programs Table 2.1: The syntax of program Syntax prog ::= decl⇤ decl btype ::= ::= btype id+ ; int id | ::= bool [a-zA-Z ][a-zA-Z0-9 ] prog ::= init() body body ::= { sseq } prog ⇤ ::= ::= | stmt ::= | | | | | option ::= lstmt+ stmt id : stmt skip ; goto id ; id = expr+ ; if option+ fi ; option+ od ; assert expr ; :: expr -> sseq expr expr binop expr ( expr ) id const &| == | > | < sseq lstmt binop ::= | | | ::= Description A program consists of a list of global variable declarations followed by a list of procedure definitions Global variable declarations Variables may has either int or bool type An identifier is a regular Cstyle identifier Procedure definition, followed by program body Program body is a sequence of statements Labelled statement Assignment statement Conditional statement Loop statement Assert statement Option in either conditional or loop statement Logical connectives program, including: if, while and assert The statement if and are a nondeterministic, guarded choice For each if and do, there are possibly two or more options When an option is selected to execute, all other options are discarded 2.2 Statements, Variables The term statement denotes and instance that can be derived from the nonterminal stmt in Table 2.1 Let P be a program with n statements, Stmt be the set of statements in P Let T ype : Stmt ! {Skip, Goto, Assignment, Condition, Assertion} be the function indicating the type of statements in Stmt Chapter Slicing Sequential Programs 10 Because the sequential program only has one procedure, we assume that all variables and statement labels are globally unique Let V ars(P ) be the set of variables in P , V ars(si ) be the set of variables appearing in statement si If si is an assignment statement, let V arsr (si ), V arsl (si ) be the set of variables in the right side and left side of si respectively, V ars(si ) = V arsr (si ) [ V arsl (si ) 2.3 Program Control-Flow Graph This section defines the control-flow graph of a program of sequential programs The program has a procedure, which is modeled by a directed graph The control flow graph of a sequential program P is a directed graph GP = (VP , EP ) where VP is a set of vertices, and EP ✓ VP ⇥ VP is the set of edges The set VP contains one vertex for each statement in P and one additional vertex Exit for the program, which indicates where the program stops Furthermore, VP has one vertex Err to indicate the failure of assert statements Each edge connects two vertices, for example an edge from v1 to v2 , denoted by hv1 , v2 i Edges are constructed by the successor function SuccP : VP ! VP The successor function is defined in term of function N extP : VP ! VP N extP has a recursive definition based on the program syntax in the Table 2.1 In the table, each statement has an sseq node as its parent If statement si is not the last statement in its parent sseq, N extP (si ) is the statement immediately following si in the sequence Otherwise, let a be the closest ancestor of si in the syntax tree such that a is a stmt node and a is the not last statement in its sequence If a exists, N extP (si ) is the statement immediately following a Otherwise, N extP (si ) is Exit So we define SuccP basing on N ext function: • If si is goto L, SuccP (si ) = {sj } where sj is the statement labelled L • If si is either an if or a statement, then SuccP (si ) = {sj1 , , sjn } where sjk , k n are options of the if or respectively • If si is the last statement in an option of statement, then SuccP (si ) = {sj1 , , sjn } where sjk , k n are options of the • If si is an assertstatement, then SuccP (si ) = {N extP (si ), Exit} Example Consider the example in Figure 2.1 The left figure is the program source The program has one function named init The control flow graph of init is shown in the right figure There is an if statement at line with two options x2 and x2 > Chapter Slicing Sequential Programs 11 x1 = x2 = x1 + x2 > x2