A Novel Ant Based Algorithm for Multiple Graph Alignment



Trần Ngọc Hà, Thai Nguyen University of Education, hatn84@gmail.com
Đỗ Đức Đông, Vietnam National University-Hanoi, dongdoduc@vnu.edu.vn
Hoàng Xuân Huấn, Vietnam National University-Hanoi, huanhx@vnu.edu.vn

Abstract: Multiple graph alignment (MGA) is a new approach to analyzing protein structures in order to explore their functional similarity. In this article, we propose a two-stage memetic algorithm for the MGA problem, named ACO-MGA2, based on the ant colony optimization metaheuristic. A local search procedure is applied only in the second stage of the algorithm to save runtime. Experimental results show that ACO-MGA2 outperforms state-of-the-art algorithms while producing alignments of better quality.

Keywords: Multiple Graph Alignment, Ant Colony Optimization, local search, memetic algorithm, SMMAS pheromone update rule

I. INTRODUCTION

Multiple sequence alignment is a useful approach for analyzing evolutionary homology among DNA sequences or proteins. However, it is not well suited to determining functional similarities among molecules, because functional similarity relates more closely to structural features than to sequential ones [6,12,15,18,19]. Recently, a number of authors [1, 2, 10-12, 22-24] have proposed using graphical models to represent the three-dimensional structures of proteins and using graph alignment techniques to infer functional similarities from structural analysis. These methods mainly rely on exact pairwise graph matching. They produce meaningful results when studying the functional evolution of non-homologous molecules, but they are difficult to apply to the discovery of biologically meaningful patterns that are only approximately conserved.

In 2007, Weskamp et al. [21] were the first to introduce the concept of multiple graph alignment (MGA) and to use it to analyze protein active sites. They proposed a heuristic algorithm based on a greedy strategy. The graphs approximately describe binding pockets in Cavbase [8,14]. In this approach, each binding pocket is modeled as a connected graph G(V, E), and the MGA problem is stated as follows. Given a set G = {G1(V1,E1),…,Gn(Vn,En)} of connected, node-labeled, edge-weighted graphs, three edit operations are allowed on each graph: deletion or insertion of a vertex, change of the label of a node, and change of the weight of an edge. The goal of the MGA problem is to find an alignment of the vertices of the graphs in G that optimizes a predefined objective function. MGA is an NP-hard problem (see [6, 21]).

Heuristic algorithms of this kind are only suitable for small problems and hence not for real applications. Fober et al. [30] extended the use of this problem to the structural analysis of biomolecules and proposed an evolutionary algorithm called GAVEO. Experiments show that GAVEO is more effective than the greedy algorithm, although it is more time consuming. In [20] the authors proposed the ACO-MGA algorithm, which uses a simple ant colony optimization scheme to solve the multiple graph alignment problem. Experiments show that this algorithm gives better results than GAVEO; however, its running time is long and it does not perform well on large data sets.

This paper introduces a two-stage memetic algorithm based on ant colony optimization, called ACO-MGA2, as an improvement of ACO-MGA for aligning multiple graphs. We keep the construction graph of ACO-MGA but improve the heuristic information and the local search procedures. To reduce the running time, the algorithm is split into two stages. The local search is applied only in the second stage of the memetic scheme [13].
The local search consists of two procedures: 1) rearranging differently labeled vertices in the alignment vectors to improve the compatibility of the vertices, and 2) swapping identically labeled vertices within each graph to improve the agreement of the edge weights. The improvements of ACO-MGA2 in both runtime and effectiveness are demonstrated empirically by comparison with GAVEO and Greedy.

The rest of this paper is organized as follows. Section 2 states the multiple graph alignment problem mathematically and summarizes related work. Section 3 introduces the newly proposed algorithm. Section 4 presents the experimental results. The last section gives several conclusions.

II. MULTIPLE GRAPH ALIGNMENT PROBLEM AND RELATED WORKS

A. Multiple graph alignment problem

The multiple graph alignment problem was introduced by Weskamp et al. [21] for the purpose of studying protein characteristics. Fober et al. [6] extended it to analyze the structure of molecules, including the chemical composition and the protein binding site. The problem statement follows (for more details see [6, 20]).

Definition 1. (Multigraph) A multigraph is a set of graphs G = {G1(V1, E1),…, Gn(Vn, En)}, where each Gi(Vi, Ei) is a connected graph, each vertex carries a label from a given set L, and the edge weights represent the Euclidean distances between the vertices.

Definition 2. (Edit operations) The following edit operations account for the differences between a graph G(V, E) and another graph:
i) Insertion or deletion of a node: a node v ∈ V and the edges associated with it can be deleted or inserted.
ii) Change of the label of a node: the label l(v) of a node v ∈ V can be replaced by another label in L.
iii) Change of the weight of an edge: the weight w(e) of an edge e can be changed based on the conformation.

Definition 3. (Multiple Graph Alignment) Let G = {G1(V1,E1),…,Gn(Vn,En)} be a multigraph, and add to each vertex set Vi a dummy node (denoted ⊥) that is not connected to the other nodes. Then A ⊆ (V1 ∪ {⊥}) × ... × (Vn ∪ {⊥}) is an alignment of the multigraph G if and only if:
i) For all i = 1,…,n and for each v ∈ Vi, there exists exactly one a = (a1,…,an) ∈ A such that v = ai.
ii) For each a = (a1,…,an) ∈ A, there exists at least one i, 1 ≤ i ≤ n, such that ai ≠ ⊥.
Each a = (a1,…,an) ∈ A is called a column vector of the alignment, and each v ∈ Vi is called a real node. For ease of reading, we keep the notation G = {G1(V1, E1),…,Gn(Vn, En)} for the multigraph in which a dummy node has been added to each graph Gi.

Definition 4. (Scoring function) The score s of a given alignment A = (a^1,…, a^n) is defined as in Equation 1:

\[ s(A) = \sum_{i=1}^{n} ns(a^{i}) + \sum_{1 \le i < j \le n} es(a^{i}, a^{j}) \tag{1} \]

where ns is the fitness score of the corresponding column and is calculated by Equation 2:

ns((a1, …, an)^T) = ∑_{1 ≤ j ... (terms nsm, nsmm) ...

...
    initialize pheromone trail matrix and m ants;
    while (stop conditions not satisfied)
        for each a ∈ A
            ant a builds a multiple graph alignment;
        local search; // run only at the second stage
        search by changing...

... initializing parameters and m artificial ants (agents). ACO-MGA2 repeatedly performs two stages as in Algorithm

The first stage (applied for the first 70% of iterations): In each iteration, each ant builds... vertices

Random walk procedure to construct an alignment

In each iteration, each ant repeats the following process to build vectors a = (a1,…, an) for an alignment A. The ant randomly chooses an
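Conditions i) and ii) of Definition 3 can be checked mechanically for any candidate alignment. The following Python sketch is only an illustration and is not taken from the paper: the function name is_alignment, the use of None for the dummy node ⊥, and the toy vertex sets are all assumptions made for this example.

```python
from typing import Hashable, Optional, Sequence

# Assumption: the dummy node (⊥ in Definition 3) is represented by None,
# so None must not be used as the name of a real vertex.
DUMMY = None

Column = Sequence[Optional[Hashable]]


def is_alignment(vertex_sets: Sequence[set], columns: Sequence[Column]) -> bool:
    """Check conditions i) and ii) of Definition 3 for a candidate alignment."""
    n = len(vertex_sets)
    seen = [set() for _ in range(n)]  # real nodes of each graph used so far
    for column in columns:
        if len(column) != n:
            return False  # every column needs exactly one entry per graph
        if all(entry is DUMMY for entry in column):
            return False  # condition ii): a column of dummies only is invalid
        for i, entry in enumerate(column):
            if entry is DUMMY:
                continue
            if entry not in vertex_sets[i] or entry in seen[i]:
                return False  # unknown vertex, or a vertex placed twice
            seen[i].add(entry)
    # condition i): every real node of every graph appears in some column
    # (the check above already rules out appearing more than once)
    return all(seen[i] == vertex_sets[i] for i in range(n))


# Hypothetical toy data: three graphs with vertex sets of different sizes.
vertex_sets = [{"a", "b"}, {"x", "y", "z"}, {"p"}]
valid = [("a", "x", "p"), ("b", "y", None), (None, "z", None)]
missing_z = [("a", "x", "p"), ("b", "y", None)]
all_dummy = valid + [(None, None, None)]

print(is_alignment(vertex_sets, valid))      # True
print(is_alignment(vertex_sets, missing_z))  # False
print(is_alignment(vertex_sets, all_dummy))  # False
```

The check ignores labels and edge weights on purpose: those affect only the score of an alignment (Definition 4), not whether it is a valid alignment.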

Date posted: 14/10/2015, 15:23
