
Automated software test data generation using a genetic algorithm




Automated software test data generation using a GA, 6/2/94

Min Pei
Professor, Beijing Union University
Visiting Professor, Michigan State University
Visiting Professor, Beijing University of Aeronautics and Astronautics

Erik D. Goodman
Professor and Director, Case Center for Computer-Aided Engineering & Manufacturing, Michigan State University
Visiting Professor, Beijing University of Aeronautics and Astronautics

Zongyi Gao
Professor, Computer Science Department, Beijing University of Aeronautics and Astronautics

Kaixiang Zhong
Graduate Research Assistant, Computer Science Department, Beijing University of Aeronautics and Astronautics

Contact: Case Center for Computer-Aided Engineering & Manufacturing, Michigan State University, East Lansing, MI 48824-1226, pei@egr.msu.edu

Abstract

Software testing is an important way of improving software quality. It accounts for approximately half of all software engineering cost. The critical point is to increase the degree of automation of testing and of its test data generation. In the 1980s most test data generation used symbolic evaluation to derive test data. Because of the complexity of solving the set of predicate equations, this approach could not be put into practice. Recent methods use actual program execution and function minimization search. The local nature of these methods needs to be improved. In this paper we propose a new approach: automated test data generation using a genetic algorithm (GA). Compared with traditional search algorithms, its efficiency and efficacy are much better. We introduce the new method and apply it to a typical sample program. The results of the test data generation experiment using a GA, and their analysis, are presented in this paper. Further work is needed to extend the method to dynamic data structures.

Key words: software testing, test data generation, path selection, genetic algorithm, fitness

Introduction

Software testing is, and for the foreseeable future will remain, a principal means of improving the quality and reliability of software. It is complex, labor-intensive, and time-consuming work, accounting for approximately 50% of the cost of developing a software system [1], [2]. Increasing the degree of automation and the efficiency of software testing can lower the cost of software design, shorten development time, and significantly improve software quality. The critical issue in automating software testing is automated test data generation.

Test data generation in software testing is the process of identifying a set of program input data that satisfies a given testing criterion. A great deal of research has addressed this difficult problem. In the 1980s, most test data generation methods [3], [4], [5], [6] used symbolic evaluation to derive test data: predicate equations are established under static conditions and then solved. The complexity of solving the set of predicate equations, the indeterminacy of loop counts and array indices, and the resulting explosion problem made these methods hard to put into practice.

From the end of the 1980s onward, methods based on actual dynamic execution of the program under test have been presented. These methods use function minimization search algorithms to automatically locate values of input variables for which the function becomes negative and the selected path is traversed. In general, they require regularity and continuity of the branch function and the existence of derivatives. An alternative method has been proposed that uses direct-search methods during actual program execution [7]. Because it progresses toward the minimum by comparing branch function values only, it avoids these requirements on the branch function. It still cannot compute and control the variation of the branch function over the whole path, however, and can only work on subpaths step by step. Every step is a one-dimensional search consisting of two major phases, an “exploratory search” and a “pattern search”. The method is essentially a kind of hill climbing, so it is slow.

In this paper we present a new approach to test data generation using a Genetic Algorithm (GA). Genetic algorithms are search algorithms based on the mechanics of natural selection and natural genetics. They follow Darwin's evolutionary principle of “struggle for life and survival of the fittest”. Compared with traditional optimization techniques, genetic algorithms are known for their robustness. The function minimization algorithm applied in some approaches to test data generation is based on direct search methods [8]. One problem with this method is that it can find only a local minimum. In many cases this can prevent subgoals from being solved, especially for branch functions with several local minima [7].

The genetic algorithm manipulates the coding of the input value set at the string level, not the input value set itself, and it searches from a population of input sets, not from a single input set. These features let the search exploit similarities among high-performance strings (each representing a certain input set), so that the results can finally cover the program path we set. The GA uses only the degree to which the ideal path is covered as its fitness function, and it efficiently exploits historical information to speculate on new search points with expected improved performance. The rules of transition from one set to the next are probabilistic rather than deterministic. These differences make it easier for the new method to find a nearly global optimum, that is, an input set covering the ideal required path. The genetic algorithm also works on the whole path: it does not need to divide the path into subpaths solved step by step, and it can often ignore knowledge of the branch predicates and functions.

In the rest of the paper, we introduce the new method of test data generation using a GA through a simple sample problem. For comparison and for the sake of simplicity, we focus on pathwise test data generation and use roughly the same concepts and the same example as [7].

Test data generation problem

Software test data generation is the task of finding a set of test data T which is reliable. A set of test data T is called reliable for software S if it reveals that S contains an error whenever S is incorrect. If T is reliable and S produces the correct output for each element of T, then S is correct. Although absolutely complete software testing is achievable only by executing the software on all possible program inputs, this is obviously impossible.

There are three sources of information for constructing test data: the program to be tested, its specification, and the programmer's knowledge of commonly occurring programming errors. Correspondingly, there are three types of test data generation: pathwise test data generation [3], [4], [5], [6], data specification generation [9], [10], and selected and random test data generation [11].

Most researchers agree that a natural criterion of program testing completeness is the pathwise testing strategy, which is the execution of all program branches (in other words, traversing all exits of statements) for a finite number of cases [5]. A method for analyzing the reliability of path testing was introduced in [12]. This analysis means that if data for testing those programs are selected using the path testing strategy, we will be “almost certain” of detecting 65 percent of errors [13]. Following this idea, our new method focuses on pathwise test data generation. Pathwise software testing consists of three basic steps: program control flow graph construction, path selection, and test data generation with dynamic program execution.

Program control flow graph

A control flow graph of program Q is a directed graph G = (N, A, s, e) where 1) N is a set of nodes, 2) A is a binary relation on N (a subset of N × N), referred to as a set of edges, and 3) s and e are, respectively, the unique entry and unique exit nodes, s, e ∈ N. A node in N corresponds to the smallest single-entry, single-exit executable part of a statement in Q that cannot be further decomposed; such a part is referred to as an instruction. A single instruction corresponds to an assignment statement, an input or output statement, or the expression part of an if-then-else or while statement, in which case it is called a test instruction.

An edge (ni, nj) ∈ A corresponds to a possible transfer of control from instruction ni to nj. For instance, (n2, n3), (n5, n6), and (n5, n7) are edges in the program of Fig. 1. An edge (ni, nj) is called a branch if ni is a test instruction. Each branch in the control flow graph can be labeled by a predicate, referred to as a branch predicate, describing the condition under which the branch will be traversed. For example, in the program of Fig. 1 branch (n4, n5) is labeled “i < high”, branch (n5, n6) is labeled “max < a[i]”, and branch (n5, n7) is labeled “max ≥ a[i]” [7].

An input variable of a program Q is a variable which appears in an input statement or is an input parameter of a function or procedure. Input variables may be of different types, e.g., integer, real, or boolean. Let I = (x1, x2, ⋅⋅⋅, xn) be a vector of input variables of program Q. The domain Dri of input variable xi is the set of all values which xi can hold. By the domain D of the program we mean the cross product D = Dr1 × Dr2 × ⋅⋅⋅ × Drn, where each Dri is the domain of input variable xi. A single point x in the n-dimensional input space D, x ∈ D, is referred to as a program input [6].

A path Pk in a control flow graph is a sequence Pk = < nk0, nk1, ⋅⋅⋅, nkq > of instructions such that nk0 = s and, for all i, 0 ≤ i < q, (nki, nk(i+1)) ∈ A. Suppose Pi is a path through a program Q. Then the path domain Di = D(Pi) for Pi is the subset of the input domain which causes Pi to be executed. The path computation Ci = C(Pi) for Pi is the function which is computed by the sequence of computations in Pi [12]. A path is feasible if there exists a program input x for which the path is traversed during program execution; otherwise the path is infeasible.

    #include <stdio.h>

    int fat(int a[128], int low, int high, int step)
    {                                          /* n0: entry */
        int min, max, i;

        min = a[low];                          /* n1 */
        max = a[low];                          /* n2 */
        i = low + step;                        /* n3 */
        while (i < high) {                     /* n4 */
            if (max < a[i]) max = a[i];        /* n5, n6 */
            if (min > a[i]) min = a[i];        /* n7, n8 */
            i += step;                         /* n9 */
        }
        printf("Max=%d, Min=%d", max, min);    /* n10 */
        return 0;
    }

Fig. 1. A sample program

[Fig. 2. The reduced control flow graph of the sample program, with nodes n0 through n10 and exit node e.]

In a reduced flow graph of a program we merge the edges of sequential nodes into a short subpath, and we treat the different branches of an if-then-else or while statement as separate, independent subpaths. Each subpath in the flow graph can be labeled with a number. A path in a control flow graph is then a sequence of such subpaths, and we can identify the path by its sequence of labels. The reduced flow graph of the sample program of Fig. 1 is shown in Fig. 2. For example, one label represents the subpath < n1, n2, n3, n4 >, another the subpath < n5, n6 >, and another the subpath < n5, n7 >.

Path selection

In practice, a program Q may have an infinite number of paths. Any practical pathwise testing must involve a procedure for selecting a subset of the total paths. In the analysis, the potential reliability of pathwise testing is examined by considering the degree of reliability that could be obtained. A natural criterion of reliability of program testing is the execution of every branch in the program for a finite number of cases. In some systems the user selects program paths and the computer generates the description of the data which cause the paths to be followed. In other systems, under static analysis, the program is automatically decomposed into classes of paths and one path is selected from each class [12]. All of these systems result in the generation of a sequence of sets {Ti}, i = 1, ⋅⋅⋅, n, which correspond to path domains or to unions of path domains.

To concentrate on automated test data generation, we select the paths of our sample program manually for now. We restrict the selection to paths with at most two loop iterations. All the possible paths are shown in the table below.

    No.  Path                    No.  Path                    No.  Path
     1   0-7                      8   0-1-2-4-6-1-3-4-6-7     15   0-1-3-4-6-1-2-5-6-7
     2   0-1-2-4-6-7              9   0-1-2-4-6-1-3-5-6-7     16   0-1-3-4-6-1-3-4-6-7
     3   0-1-2-5-6-7             10   0-1-2-5-6-1-2-4-6-7     17   0-1-3-4-6-1-3-5-6-7
     4   0-1-3-4-6-7             11   0-1-2-5-6-1-2-5-6-7     18   0-1-3-5-6-1-2-4-6-7
     5   0-1-3-5-6-7             12   0-1-2-5-6-1-3-4-6-7     19   0-1-3-5-6-1-2-5-6-7
     6   0-1-2-4-6-1-2-4-6-7     13   0-1-2-5-6-1-3-5-6-7     20   0-1-3-5-6-1-3-4-6-7
     7   0-1-2-4-6-1-2-5-6-7     14   0-1-3-4-6-1-2-4-6-7     21   0-1-3-5-6-1-3-5-6-7

Table. Selected paths

Test data generation and dynamic program testing

In this section we compare the method we propose with the method of [7]. In [7], test data generation is solved by transforming each branch predicate E1 op E2 into an equivalent real-valued function F, referred to as a branch function, and by using a function minimization search technique together with dynamic data flow analysis. This method has several shortcomings:

1. The test data generation problem has to be divided into a sequence of subgoals that are solved step by step.
2. The search for the minimum of the branch function proceeds through the input variables one at a time. All input variables x1, x2, ⋅⋅⋅, xn are explored in turn until a solution is found or the search fails.
3. The search procedure consists of two major phases, an “exploratory search” and a “pattern search”. In effect this is trial and error.
4. It is a one-dimensional search procedure based on the direct search method. It can find only a local minimum, so when the function has several local minima it may fail to find a solution.

Automated software test data generation using a GA is an entirely new approach that can overcome most of these shortcomings. The problem is solved by an evolutionary procedure that works directly on the whole path and searches over all input variables together. It is a directed, randomized technique based on evolution and genetics, not trial and error. The new approach is a high-dimensional search procedure, and it can find a nearly optimal solution. We explain the new approach briefly in the next section and use it to solve the sample program of Fig. 1.

Test data generation and software testing using a GA

Outline of test data generation using a GA

For a genetic algorithm, the test data problem is represented in an abstract form as a chromosome, directly analogous to a chromosome in a living organism. The chromosome is composed of genes, each of which may assume one of a number of possible values, or alleles. Whereas in an organism a gene may represent sex or eye color, a gene in the test data generation representation is one of the input variables. The genetic algorithm manipulates the coding of the set of gene values making up the chromosome at the binary string level. This is equivalent to operating on the set of values of all input variables, and it is one of the basic distinctions from traditional methods such as direct optimization. The GA operates on a population of sets of input values rather than on a single set of input values.

The overall suitability of a chromosome, that is, the degree of match between the path actually executed and the ideal required path we set, is termed its fitness. The fitness value of a chromosome reflects how well the path executed by the program on the input values represented by the chromosome complies with the user-selected path.

The coding system here is quite simple: we take the binary string of all input variable values as a chromosome. For the sample problem of Fig. 1, the chromosome is the coding of the values of an array and three variables. In total there are 131 input variables, of which 128 are the array elements and the rest are high, low, and step. We encode each array element as a 10-bit binary string and each of the remaining variables as a binary string of […] bits.

We use three basic operations in the GA evolution procedure: reproduction, crossover, and mutation.

• Reproduction, or selection, is based on spinning a weighted roulette wheel in which each current chromosome (binary string) in the population has a slot sized in proportion to its fitness. Spinning the weighted wheel creates more offspring of highly fit strings in the succeeding generation.
• Crossover takes members of the newly reproduced strings in the mating pool and mates them. An integer position k along the string is chosen at random between 1 and the string length less one, [1, l-1], and all bits between k+1 and l inclusive are swapped. This is the main operator.
• Mutation randomly walks through the string space and occasionally changes a bit value from 0 to 1 or from 1 to 0. It prevents the loss of useful genetic material and plays a secondary role.

We do not describe these operators in detail; please refer to the book and related GA papers [14], [15]. The general flowchart of test data generation using a GA is shown in the figure below. We use the GA system GAucsd 1.4 to implement the algorithm applied to the sample problem [16].

[Figure: General flowchart of test data generation using a GA. Initialize population (generate random sets of input values) and select ideal path Pi; evaluation (dynamic execution for ideal path Pi); satisfied? If no: GA operation (selection, crossover, mutation) and evaluate again. If yes: optimum chromosome, giving the path domain (input values) for ideal path Pi.]

[Table of branch functions, garbled in the source. Recoverable entries: for branch predicates of the form E1 op E2 (including E1 = E2, E1 ≠ E2, E1 < E2, and E1 ≤ E2), the branch function F is defined piecewise from E1 - E2, E2 - E1, or abs(E1 - E2), according to whether that expression is greater than or less than 0.]

The fitness function can be the sum of the branch functions along the ideal required path:

    Fitness = F1 + F2 + ⋅⋅⋅ + Fn        (2)

This fitness function (2) may be more sensitive than the simpler one (1) above. The GA search procedure that minimizes the fitness function can obtain the optimal result, i.e. test data which execute the selected path.

Program listing fragment:

    path[tmp] = 0; tmp++;
    while (i < high[0]) {
        if (step[0] == 0) { printf("\n********step == 0\n"); break; }
        loop_counter++;
        if (loop_counter ...
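The path fitness of equation (2), a sum of branch functions along the ideal required path, can be sketched in C for the sample program. This is a minimal sketch under stated assumptions, not the authors' implementation: the branch-function table is not legible in this copy, so the distance used here (0 when a predicate holds, otherwise a positive gap) is an assumed stand-in, and the names `dist_lt`, `dist_ge`, `dist_branch`, `IdealPath`, `path_fitness`, and `PENALTY` are hypothetical. The sketch also assumes that the program state is advanced along the ideal path while the distances are summed, so the result is 0 exactly when the input really traverses that path.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN 128
#define PENALTY 1000000LL   /* assumed cost for indexing outside the array */

/* Assumed branch function for "e1 < e2": 0 if it holds, else a positive gap. */
static long long dist_lt(long long e1, long long e2)
{
    return e1 < e2 ? 0 : e1 - e2 + 1;
}

/* Assumed branch function for "e1 >= e2", the false exit of "e1 < e2". */
static long long dist_ge(long long e1, long long e2)
{
    return e1 >= e2 ? 0 : e2 - e1;
}

/* Distance to taking (want != 0) or skipping (want == 0) the branch "e1 < e2". */
static long long dist_branch(long long e1, long long e2, int want)
{
    return want ? dist_lt(e1, e2) : dist_ge(e1, e2);
}

/* Hypothetical description of an ideal path through the sample program. */
typedef struct {
    int iterations;        /* number of loop iterations the path requires */
    const int *take_max;   /* per iteration: 1 if "max < a[i]" must hold */
    const int *take_min;   /* per iteration: 1 if "min > a[i]" must hold */
} IdealPath;

/* Sum of branch distances along the ideal path; 0 means the path is executed. */
long long path_fitness(const int a[ARRAY_LEN], int low, int high, int step,
                       const IdealPath *p)
{
    assert(p != NULL && p->iterations >= 0);
    if (low < 0 || low >= ARRAY_LEN)
        return PENALTY;

    long long fitness = 0;
    int min = a[low], max = a[low];
    long long i = (long long)low + step;

    for (int k = 0; k < p->iterations; k++) {
        fitness += dist_branch(i, high, 1);              /* "i < high" must hold */
        if (i < 0 || i >= ARRAY_LEN)
            return fitness + PENALTY;
        fitness += dist_branch(max, a[i], p->take_max[k]);
        if (p->take_max[k]) max = a[i];                  /* follow the ideal path */
        fitness += dist_branch(a[i], min, p->take_min[k]); /* min > a[i] is a[i] < min */
        if (p->take_min[k]) min = a[i];
        i += step;
    }
    fitness += dist_branch(i, high, 0);                  /* loop must now exit */
    return fitness;
}
```

A GA would call `path_fitness` once per decoded chromosome and minimize the returned value. With one required iteration that must update `max` but not `min`, the input a[0] = 5, a[1] = 9, low = 0, high = 2, step = 1 scores 0, while changing a[1] to 3 leaves a positive distance that tells the search how far the input is from the path.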

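The three GA operators described in the paper (roulette-wheel reproduction, one-point crossover, and bit mutation) can be sketched in C on chromosomes stored as arrays of 0/1 bytes. This is an illustrative sketch, not the GAucsd 1.4 code the authors used: the names `rand_unit`, `roulette_select`, `crossover_at`, `crossover`, and `mutate` are hypothetical, and `rand()` is used only for brevity. The roulette wheel assumes weights where larger is better, so a fitness that is minimized would first have to be converted, for example to 1 / (1 + Fitness), which is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Uniform random double in [0, 1). */
static double rand_unit(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Roulette-wheel selection: index i is returned with probability
   weight[i] / sum(weight). Weights must be non-negative with a positive sum. */
size_t roulette_select(const double *weight, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += weight[i];
    assert(n > 0 && total > 0.0);

    double spin = rand_unit() * total;
    for (size_t i = 0; i < n; i++) {
        spin -= weight[i];
        if (spin < 0.0)
            return i;
    }
    return n - 1;   /* reached only through floating-point round-off */
}

/* One-point crossover at cut point k in [1, len-1]: swap bits k+1..len
   (1-based), which are array positions k..len-1. */
void crossover_at(unsigned char *x, unsigned char *y, size_t len, size_t k)
{
    assert(len >= 2 && k >= 1 && k <= len - 1);
    for (size_t j = k; j < len; j++) {
        unsigned char t = x[j];
        x[j] = y[j];
        y[j] = t;
    }
}

/* One-point crossover with a cut point drawn uniformly from [1, len-1]. */
void crossover(unsigned char *x, unsigned char *y, size_t len)
{
    assert(len >= 2);
    size_t k = 1 + (size_t)(rand_unit() * (double)(len - 1));
    crossover_at(x, y, len, k);
}

/* Mutation: flip each bit independently with probability rate. */
void mutate(unsigned char *bits, size_t len, double rate)
{
    for (size_t j = 0; j < len; j++)
        if (rand_unit() < rate)
            bits[j] = !bits[j];
}
```

One generation would then select two parents with `roulette_select`, copy them, apply `crossover` to the copies, and apply `mutate` with a small rate, repeating until the new population is full. Splitting out `crossover_at` keeps the cut point testable without depending on the random number generator.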
Posted: 08/11/2017, 23:49
