
The 2014 International Conference on Advanced Technologies for Communications (ATC'14)

A Novel Ant Based Algorithm for Multiple Graph Alignment

Tran Ngoc Ha, Thai Nguyen University of Education, hatn84@gmail.com
Do Duc Dong, Vietnam National University-Hanoi, dongdoduc@vnu.edu.vn
Hoang Xuan Huan, Vietnam National University-Hanoi, huanhx@vnu.edu.vn

978-1-4799-6956-2/14/$31.00 ©2014 IEEE

Abstract: Multiple graph alignment (MGA) is a new approach to analyzing protein structures in order to explore their functional similarity. In this article, we propose a two-stage memetic algorithm for the MGA problem, named ACO-MGA2, based on the ant colony optimization metaheuristic. To save runtime, a local search procedure is applied only in the second stage of the algorithm. Experimental results show that ACO-MGA2 outperforms state-of-the-art algorithms, producing alignments of better quality.

Keywords: Multiple Graph Alignment, Ant Colony Optimization, local search, memetic algorithm, SMMAS pheromone update rule

I. INTRODUCTION

Multiple sequence alignment is a useful approach for analyzing evolutionary homology among DNA or protein sequences. However, this method is not suitable for determining functional similarities among molecules, because functional similarity relates more closely to structural features than to sequential ones [6,12,15,18,19]. Recently, a number of authors [1, 2, 10-12, 22-24] have proposed using graphical models to represent the three-dimensional structures of proteins and using graph alignment techniques to infer functional similarities from structural analysis. These methods mainly use exact pairwise graph matching. They produce meaningful results when studying the functional evolution of non-homologous molecules; however, it is difficult to use them to discover biologically meaningful samples from approximately conserved ones.

Weskamp et al. [21] were the first (in 2007) to introduce the concept of multiple graph alignment (MGA) and to use it to analyze protein active sites. They proposed a heuristic algorithm based on a greedy strategy. The graphs are used to approximately describe binding pockets in Cavbase [8,14]. In this approach, each binding pocket is modeled as a connected graph $G(V, E)$, and the MGA problem is stated as follows. Given a set $G = \{G_1(V_1,E_1), \ldots, G_n(V_n,E_n)\}$ of connected, node-labeled, edge-weighted graphs, three edit operations are allowed on each graph: deletion or insertion of a vertex, change of the label of a node, and change of the weight of an edge. The task of the MGA problem is to find an alignment of the vertices of the graphs in $G$ that optimizes a predefined objective function.

MGA is an NP-hard problem (see [6, 21]). Heuristic algorithms are only suitable for small problems and hence are not suitable for real applications. Fober et al. [6] extended the use of this problem to the structural analysis of biomolecules and proposed an evolutionary algorithm called GAVEO. Experiments show that this algorithm is more efficient than the greedy algorithm, although it is more time-consuming. In [20], the authors proposed the ACO-MGA algorithm, which uses a simple ant colony optimization scheme to solve the multiple graph alignment problem. Experiments show that this algorithm achieves better results than GAVEO; however, its running time is long and it is not efficient on large data sets.

This paper introduces ACO-MGA2, a two-stage memetic algorithm based on ant colony optimization, as an improvement of ACO-MGA for aligning multiple graphs. We keep the construction graph of ACO-MGA but improve the heuristic information and the local search procedures. To reduce the running time, the algorithm is split into two stages, and local search is applied only in the second stage of the memetic scheme [13]. The local search consists of two procedures:
1) Rearranging differently labeled vertices in the alignment vectors to improve the compatibility of the vertices;
2) Swapping identically labeled vertices within each graph to improve the fit of the edge weights.

Improvements of ACO-MGA2 in both runtime and efficiency are demonstrated empirically by comparison with GAVEO and the greedy algorithm.

The rest of this paper is organized as follows. Section II gives the mathematical statement of the multiple graph alignment problem and summarizes related work. Section III introduces the proposed algorithm. Experimental results are presented in Section IV. Conclusions are given in the last section.

II. MULTIPLE GRAPH ALIGNMENT PROBLEM AND RELATED WORKS

A. Multiple graph alignment problem

The multiple graph alignment problem was introduced by Weskamp et al. [21] for studying protein characteristics. Fober et al. [6] extended it to the structural analysis of molecules, including their chemical composition and protein binding sites. The problem is stated as follows (for more details, see [6, 20]).

Definition 1 (Multigraph). A multigraph is a set of graphs $G = \{G_1(V_1, E_1), \ldots, G_n(V_n, E_n)\}$, where each $G_i(V_i, E_i)$ is a connected graph, each vertex is labeled from a given set $L$, and the edge weights represent the Euclidean distances between the vertices.

A solution of an MGA problem is an alignment that maximizes the scoring function $s(A)$. This is an NP-hard problem (see [6, 21]). If one uses exhaustive search to solve it, the complexity is $O((V_{\max}!)^{n})$, where $V_{\max}$ is the number of vertices of the graph with the most vertices and $n$ is the number of graphs.

Definition 2 (Edit operations). The following edit operations distinguish a graph $G(V, E)$ from another graph:
i) Insertion or deletion of a node: a node $v \in V$ and its incident edges can be deleted or inserted.
ii) Change of the label of a node: the label $l(v)$ of a node $v \in V$ can be replaced by another label in $L$.
iii) Change of the weight of an edge: the weight $w(e)$ of an
edge $e$ can be changed based on the conformation.

B. Related works

Weskamp et al. [21] proposed applying multiple graph alignment to study protein characteristics, where graphs are used to approximately describe binding pockets.

Definition 3 (Multiple Graph Alignment). Let $G = \{G_1(V_1,E_1), \ldots, G_n(V_n,E_n)\}$ be a multigraph, and add to each vertex set $V_i$ a dummy node (denoted $\bot$) that is not connected to any other node. Then $A \subseteq (V_1 \cup \{\bot\}) \times \cdots \times (V_n \cup \{\bot\})$ is an alignment of the multigraph $G$ if and only if:
i) for all $i = 1, \ldots, n$ and for each $v \in V_i$, there exists exactly one $a = (a_1, \ldots, a_n) \in A$ such that $v = a_i$;
ii) for each $a = (a_1, \ldots, a_n) \in A$, there exists at least one $1 \le i \le n$ such that $a_i \neq \bot$.

Each $a = (a_1, \ldots, a_n) \in A$ is called a column vector of the alignment, and each $v \in V_i$ is called a real node. For ease of reading, we keep the notation $G = \{G_1(V_1, E_1), \ldots, G_n(V_n, E_n)\}$ to refer to the multigraph in which a dummy node has been added to each graph $G_i$.

Definition 4 (Scoring function). The score $s$ of a given alignment $A = (a_1, \ldots, a_n)$ is defined by Equation (1):

$$s(A) = \sum_{i=1}^{n} ns(a_i) + \sum_{1 \le i < j \le n} es(a_i, a_j) \qquad (1)$$

where $ns$ is the score measuring the fitness of the corresponding column, computed by Equation (2):

$$ns(a_i) = \sum_{1 \le j \ldots} \begin{cases} nsm & \ldots \\ nsmm & \ldots \\ \vdots & \end{cases} \qquad (2)$$
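To make conditions i) and ii) of the alignment definition concrete, here is a minimal Python sketch. It assumes a hypothetical encoding that is not taken from the paper: the vertices of each graph $G_i$ are numbered $0, \ldots, |V_i|-1$ (graph index 0 in the code corresponds to $G_1$), an alignment is a list of column vectors given as tuples of length $n$, and `None` stands for the dummy node $\bot$. The function name `is_valid_alignment` is illustrative only.

```python
from typing import Optional, Sequence

# Hypothetical encoding (not from the paper): entry i of a column is a vertex
# index of graph i, or None for the dummy node ⊥.
Column = tuple[Optional[int], ...]


def is_valid_alignment(vertex_counts: Sequence[int], columns: Sequence[Column]) -> bool:
    """Check conditions i) and ii) of the multiple graph alignment definition."""
    n = len(vertex_counts)
    # seen[i][v] counts how many columns place real vertex v of graph i.
    seen = [[0] * k for k in vertex_counts]
    for col in columns:
        if len(col) != n:
            return False
        # Condition ii): every column must contain at least one real node.
        if all(a is None for a in col):
            return False
        for i, a in enumerate(col):
            if a is not None:
                if not 0 <= a < vertex_counts[i]:
                    return False  # not a vertex of graph i
                seen[i][a] += 1
    # Condition i): every real vertex of every graph lies in exactly one column.
    return all(c == 1 for counts in seen for c in counts)


# Two graphs with 3 and 2 vertices; vertex 2 of the first graph is aligned with ⊥.
example = [(0, 1), (1, 0), (2, None)]
print(is_valid_alignment([3, 2], example))                   # True
print(is_valid_alignment([3, 2], example + [(None, None)]))  # False: all-dummy column
print(is_valid_alignment([3, 2], [(0, 1), (1, 0)]))          # False: a vertex is missing
```

Because position $i$ of a column can only hold a vertex of $G_i$ or $\bot$, counting occurrences per graph is enough to check condition i).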

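The structure of the scoring function $s(A)$, a sum of column scores $ns$ plus a sum of pairwise column scores $es$ over $i < j$, can be sketched in Python as below. This is not the authors' implementation: `ns` and `es` are passed in as functions because their full definitions are not reproduced in this excerpt. The helper `make_toy_ns` is a hypothetical pairwise node score that reads `nsm` as a label-match reward and `nsmm` as a label-mismatch penalty (an assumed interpretation), plus an assumed `nsdummy` penalty for pairing a real node with $\bot$; the numeric values are arbitrary. Columns here carry only labels, so a real `es` would additionally need vertex identities and edge weights.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence

# Hypothetical column type: entry i is the label of the vertex of graph i placed
# in this column, or None for the dummy node ⊥.
Column = tuple[Optional[str], ...]


def alignment_score(
    columns: Sequence[Column],
    ns: Callable[[Column], float],
    es: Callable[[Column, Column], float],
) -> float:
    """s(A): node scores of all columns plus edge scores of all column pairs."""
    node_part = sum(ns(col) for col in columns)
    # combinations yields each unordered pair (a_i, a_j) with i < j exactly once.
    edge_part = sum(es(a, b) for a, b in combinations(columns, 2))
    return node_part + edge_part


def make_toy_ns(nsm: float, nsmm: float, nsdummy: float) -> Callable[[Column], float]:
    """Hypothetical column score: sums a pair score over entry pairs j < k."""
    def ns(col: Column) -> float:
        total = 0.0
        for x, y in combinations(col, 2):
            if x is None and y is None:
                continue            # two dummies contribute nothing (assumption)
            if x is None or y is None:
                total += nsdummy    # real node paired with ⊥ (assumption)
            elif x == y:
                total += nsm        # labels match
            else:
                total += nsmm       # labels differ
        return total
    return ns


def no_edge_score(a: Column, b: Column) -> float:
    """Placeholder es: the edge score is not defined in this excerpt."""
    return 0.0


ns = make_toy_ns(nsm=1.0, nsmm=-5.0, nsdummy=-2.5)  # arbitrary illustrative values
A = [("C", "C", "C"), ("N", "O", None)]
print(alignment_score(A, ns, no_edge_score))  # -7.0
```

With these toy values the first column scores $3 \times 1.0 = 3.0$ and the second scores $-5.0 - 2.5 - 2.5 = -10.0$, giving $-7.0$. Keeping `ns` and `es` as parameters lets the same skeleton be reused once the exact column and edge scores are plugged in.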

TỪ KHÓA LIÊN QUAN

TÀI LIỆU CÙNG NGƯỜI DÙNG

TÀI LIỆU LIÊN QUAN