2010 Second International Conference on Knowledge and Systems Engineering

EM-Coffee: An Improvement of M-Coffee

Nguyen Ha Anh Tuan, Ha Tuan Cuong, Nguyen Hoang Dung, and Le Sy Vinh
University of Engineering and Technology, Vietnam National University, Hanoi
Email: vinhls@vnu.edu.vn

Tu Minh Phuong
Faculty of Information Technology, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
Email: phuongtm@ptit.edu.vn

Abstract—Multiple sequence alignment is a basic task of sequence analysis. In the development of multiple sequence alignment (MSA) approaches, M-Coffee [1] was proposed as a meta-method that assembles the outputs of different individual multiple aligners into one single MSA to boost accuracy. Its authors showed that M-Coffee outperformed individual alignment methods. In this paper, we propose an improvement of M-Coffee, called EM-Coffee, which introduces a new weighting scheme for combining input alignments. Experiments with benchmark datasets showed that EM-Coffee produced better results than M-Coffee, T-Coffee, Muscle, and some other widely used methods. Thus, we provide an alternative option for researchers to align sequences.

Keywords—Multiple sequence alignment; protein; DNA; T-Coffee; M-Coffee; Muscle

Table I. An example of aligning nucleotide sequences. Homologous nucleotides are aligned into the same columns by inserting gaps, which represent insertions/deletions. [The table shows four input nucleotide sequences s1–s4 and the corresponding aligned output sequences s1–s4.]

I. INTRODUCTION

According to Darwin's theory [2], all organisms have evolved from a common ancestor. Sequences that have evolved from the same ancestor are called homologous sequences. In the course of evolution, nucleotide mutations have occurred, i.e., nucleotide substitutions, nucleotide deletions, and nucleotide insertions. Since we cannot distinguish an insertion from a deletion, both are called 'indels' and are represented by a gap ('-'). These mutations result in differences among homologous sequences in content as well as in length. The core task in molecular sequence analysis is aligning homologous sequences such that homologous nucleotides/amino acids are placed into the same columns. An example is given in Table I. Multiple alignments are inputs for essential sequence analysis tasks such as phylogenetic inference and protein structure/function prediction.

The sequence alignment problem has been studied since the time of Waterman [3]. While pairwise alignment (alignment of two sequences) can be solved by dynamic programming techniques [3], multiple sequence alignment (alignment of multiple sequences) is an NP-complete problem [4]. A number of approaches, such as ClustalW [5], T-Coffee [6], Muscle [7], ProbCons [8], and M-Coffee [1], have been proposed [9, and references therein].

The core components of a multiple sequence alignment method are its objective function and its optimization procedure. Different objective functions to assess the quality of multiple alignments have been proposed [5], [6], [10], [11]. The basic scheme of the optimization procedure was reviewed and illustrated in [9] (see Figure 1). It includes four steps: (1) distance matrix calculation, (2) guide tree construction, (3) progressive alignment, and (4) alignment refinement.

Distance matrix calculation: Distances between all pairs of sequences are computed. The distance between two sequences represents the similarity between them. It can be simply calculated as the percentage of identical positions between the two sequences, or more precisely using nucleotide/amino acid substitution models.
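Before moving on to the remaining steps, the simple percent-identity distance just described can be made concrete with a short sketch. The Python code below is only an illustration, not code from any of the cited aligners; the function names and the convention of comparing positions only up to the length of the shorter sequence are our own simplifications.

```python
from itertools import combinations

def pid_distance(seq_a, seq_b):
    """Distance = 1 - fraction of identical positions.

    Positions are compared up to the length of the shorter sequence,
    a simplification of what real aligners do."""
    n = min(len(seq_a), len(seq_b))
    if n == 0:
        return 1.0
    identical = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 1.0 - identical / n

def distance_matrix(seqs):
    """Pairwise distance matrix for a list of unaligned sequences."""
    m = len(seqs)
    d = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        d[i][j] = d[j][i] = pid_distance(seqs[i], seqs[j])
    return d

if __name__ == "__main__":
    seqs = ["GGCGGGAT", "GTCCGGGG", "TAGTTCCA"]  # toy nucleotide sequences
    for row in distance_matrix(seqs):
        print(["%.2f" % v for v in row])
```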
Guide tree construction: A guide tree (phylogenetic tree), which represents the evolutionary relationships among the sequences, is constructed from the distance matrix. This can be done by distance-based algorithms [12], [13], [14, and references therein].

Progressive alignment: Following the guide tree, sequences (or groups of sequences) are progressively aligned. The output of this step is a multiple sequence alignment.

Alignment refinement: Iterative refinement is used to improve the multiple sequence alignment obtained from the progressive alignment step. In this step, the guide tree may be reconstructed and the sequences realigned with the new tree. The step terminates when no further improvement is found.

Figure 1. The general scheme of modern multiple sequence alignment methods (refer to [9] for more details).

Many multiple sequence alignment methods with different levels of reliability have been proposed [5], [7], [11], [6], [1], [9, and references therein]. Notredame and colleagues proposed M-Coffee [1], an extension of T-Coffee [6] that assembles the outputs of individual methods to create more accurate alignments. They showed that M-Coffee is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs [1]. M-Coffee outperformed all individual methods on some benchmark datasets. In M-Coffee, the authors experimented with four different ways to weight the outputs of the individual methods: Variance/Covariance weights, Altschul-Carrillo-Lipman weights, Thompson-Higgins-Gibson weights, and Accuracy weights. However, experiments showed that none of these weighting schemes significantly outperforms the simple combination of all methods, i.e., 'no weight' [1].

In this paper, we introduce an extension of M-Coffee, called EM-Coffee. We use Mumsa [15] as the measure of alignment accuracy to derive a new weighting scheme. We verify the performance of the extension on the standard BAliBASE 2.01 and BAliBASE 3.0 benchmark datasets.

II. MULTIPLE ALIGNMENT METHODS

In this section, we give an overview of widely used multiple alignment methods.

A. ClustalW

ClustalW [5] is one of the most popular multiple sequence alignment programs. It has been developed by Thompson, Higgins, and Gibson since 1994 and has become a common model for other algorithms. Its objective function is to optimize the total penalty score of substitutions and indels. ClustalW consists of three main steps: distance matrix calculation, guide tree construction, and progressive alignment, as described above. Although ClustalW is still widely used, recent studies have shown that it is outperformed by newer methods in both runtime and accuracy [7], [8], [11].

B. Muscle

Robert Edgar introduced a new method, Muscle, to efficiently align multiple sequences [7]. Elements of the algorithm include fast distance estimation using k-mer counting, progressive alignment using a new profile function called the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of Muscle were compared with T-Coffee, MAFFT, and ClustalW on four sets of reference alignments: BAliBASE, SABmark, SMART, and a new benchmark, PREFAB. Muscle achieved the highest, or joint highest, rank in terms of accuracy.
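As a rough illustration of the k-mer counting idea behind Muscle's fast distance estimation, the sketch below compares the 3-mer content of two unaligned sequences without aligning them. This is a hedged approximation of the general technique rather than Muscle's implementation: Muscle additionally uses compressed amino acid alphabets and its own distance transform, and the choice of k and the exact similarity formula here are our own.

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Count every k-mer occurring in an unaligned sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_similarity(seq_a, seq_b, k=3):
    """Fraction of shared k-mers: sum of the smaller count of each common
    k-mer, divided by the number of k-mers in the shorter sequence."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    shared = sum(min(ca[w], cb[w]) for w in ca if w in cb)
    denom = min(len(seq_a), len(seq_b)) - k + 1
    return shared / denom if denom > 0 else 0.0

def kmer_distance(seq_a, seq_b, k=3):
    """A crude evolutionary distance: 1 - shared-k-mer fraction."""
    return 1.0 - kmer_similarity(seq_a, seq_b, k)

if __name__ == "__main__":
    print(round(kmer_distance("MKVLATTLLAG", "MKVLSTTLLAG"), 3))
```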
C. ProbCons

ProbCons [8] is a highly accurate protein sequence aligner based on two main ideas: (1) a pairwise alignment step using pair-HMMs with a maximum expected accuracy objective function, and (2) a probabilistic consistency transformation applied before progressive alignment. Experiments with benchmark datasets showed that ProbCons is one of the most accurate methods. However, it is only applicable to small datasets because of its computational burden.

D. MAFFT

The main assumption of MAFFT [11] is that physicochemical property differences between amino acids, especially in terms of volume and polarity, have a strong effect on amino acid substitution frequencies [16], [17]. In this method, the relationship between two sequences is identified using the fast Fourier transform, which reduces the complexity of the algorithm. MAFFT offers different options, e.g., FFT-Ns1, FFT-Ns2, FFT-Nsi, Li-Nsi, Ei-Nsi, and Gi-Nsi:
• FFT-Ns1: the progressive alignment process is conducted only once.
• FFT-Ns2: the progressive alignment step is conducted one more time.
• FFT-Nsi: the alignment refinement step is repeated until no improvement is found or the number of iterations exceeds a threshold.
• Li-Nsi: this option is used for aligning 'local homology' data.
• Gi-Nsi: this option is used for aligning 'global homology' data.
• Ei-Nsi: this option is used for aligning data with 'long internal gaps'.

E. T-Coffee

T-Coffee [6] is the first alignment method that introduced the notion of consistency across pairwise alignments. Given libraries of global and local pairwise alignments, T-Coffee proceeds in five steps (generating the library, weighting the library, combining the library, extending the library, and progressive alignment) to construct the final alignment. The flowchart of the T-Coffee algorithm is presented in the figure below.

Figure: The T-Coffee algorithm flowchart [6].

F. M-Coffee

M-Coffee [1] is an extension of T-Coffee. The main idea of M-Coffee is to assemble the information from different individual methods to create a result that contains most of the correct information. To assess the M-Coffee method, the authors selected the following individual methods: ClustalW [5], FFT-Nsi [11], POA-Global [18], DialignT [19], PCMA [20], Muscle, T-Coffee, and ProbCons. Each method can be weighted when its output is combined into the primary T-Coffee library. The four weighting models are:
• Variance/Covariance weights: weights are calculated from the inverse variance/covariance matrix introduced by Altschul [21].
• Altschul-Carrillo-Lipman weights: a tree-based weighting method.
• Thompson-Higgins-Gibson weights: a weighting scheme published by Thompson [22] and also used by ClustalW [5].
• Accuracy weights: weights obtained by normalizing the results of the individual methods over the HOMSTRAD database.

The authors showed that M-Coffee outperformed the individual alignment methods. Since the four weighting models failed to prove stable over the testing datasets, M-Coffee also provides a 'no weight' model, in which the outputs of the individual methods are treated equally. This model produced the most stable and highest accuracy compared with the other models and is used as the default option of M-Coffee.
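To make the assembly idea of M-Coffee more concrete before describing EM-Coffee, the toy sketch below turns several input MSAs into a collection of weighted residue-pair constraints, the general kind of primary library that T-Coffee-style combination works from. It is a conceptual illustration under our own simplifications (sequences are indexed by their ungapped positions, and no consistency extension is performed); it does not reproduce the actual T-Coffee library format.

```python
from collections import defaultdict

def alignment_to_pairs(msa):
    """Yield (seq_i, pos_i, seq_j, pos_j) for residues placed in the same
    column of one MSA; msa is a list of equal-length gapped strings and
    positions are 0-based indices into the ungapped sequences."""
    pos = [-1] * len(msa)          # running ungapped position per sequence
    for c in range(len(msa[0])):
        col = []
        for s, row in enumerate(msa):
            if row[c] != '-':
                pos[s] += 1
                col.append((s, pos[s]))
        for a in range(len(col)):
            for b in range(a + 1, len(col)):
                yield col[a] + col[b]

def build_library(msas, weights):
    """Accumulate a weighted vote for every residue pair proposed by the
    input alignments; pairs supported by several (or highly weighted)
    aligners receive larger scores."""
    lib = defaultdict(float)
    for msa, w in zip(msas, weights):
        for pair in alignment_to_pairs(msa):
            lib[pair] += w
    return lib

if __name__ == "__main__":
    msa1 = ["GA-TC", "GACTC"]      # toy outputs of two aligners
    msa2 = ["GAT-C", "GACTC"]
    for pair, score in sorted(build_library([msa1, msa2], [1.2, 0.8]).items()):
        print(pair, round(score, 2))
```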
III. EM-COFFEE

We present a new weighting scheme in which the weights are determined by the Mumsa approach [15]. The EM-Coffee algorithm, shown in the flowchart figure below, includes three steps: (1) generating alignments, (2) weighting alignment libraries, and (3) assembling alignment libraries.

Figure: The flowchart of the EM-Coffee algorithm. It consists of three steps: generating alignments, weighting alignments, and assembling alignments.

A. Generating alignments

A reference is the output of an individual method. Here we use three widely used methods to generate references: MUSCLE [7], MAFFT [11], and ProbCons [8]. With MAFFT, we use three options: FFT-nsi, Ei-nsi, and Li-nsi. The selection is based on the performance of these algorithms on the BAliBASE datasets [23], [24].

B. Weighting alignments

We propose a new scheme to weight the alignments using the output of Mumsa [15]. Mumsa evaluates the reliability of an MSA in the absence of reference data, which is very helpful for researchers in choosing the optimal result. Mumsa defines the multiple overlap score (MOS) of each alignment as its reliability. MOS lies between 0.0 and 1.0; the higher the MOS, the more reliable the result. The weight of an alignment generated by an individual method is defined as the ratio of its MOS to the average MOS, averageMOS, calculated as

    averageMOS = (1/5) Σ_X MOS(X)    (1)

where X ∈ {FFT-nsi, Ei-nsi, Li-nsi, ProbCons, MUSCLE} and MOS(X) is the MOS score of the alignment generated by X. The corresponding weight is then

    weight_X = MOS(X) / averageMOS    (2)

C. Assembling alignments

The alignments generated by the individual methods are used as alignment libraries for T-Coffee. Each alignment is weighted according to the weighting step above. The T-Coffee algorithm then creates the final multiple alignment by assembling the alignment libraries. T-Coffee was used with its default options.
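The weighting scheme of equations (1) and (2) reduces to a few lines of code. The sketch below assumes that the MOS values have already been obtained by running Mumsa on the five input alignments; the dictionary layout, the function name, and the example MOS values are our own illustrative choices.

```python
def mumsa_weights(mos_scores):
    """Weight each aligner by its MOS relative to the average MOS, following
    equations (1) and (2); mos_scores maps aligner name -> MOS value."""
    average_mos = sum(mos_scores.values()) / len(mos_scores)
    return {name: mos / average_mos for name, mos in mos_scores.items()}

if __name__ == "__main__":
    # hypothetical MOS values for one dataset (not taken from the paper)
    mos = {"FFT-nsi": 0.82, "Ei-nsi": 0.84, "Li-nsi": 0.85,
           "ProbCons": 0.88, "MUSCLE": 0.79}
    for name, weight in mumsa_weights(mos).items():
        print(f"{name}: {weight:.3f}")
```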
IV. RESULTS

We used the benchmark datasets BAliBASE 2.01 and BAliBASE 3.0 to assess the performance of EM-Coffee in comparison with other algorithms: M-Coffee, Muscle, MAFFT, ProbCons, T-Coffee, and TD-Coffee (EM-Coffee with the 'no weight' model).

A. Data

The BAliBASE datasets are the most popular benchmark datasets for assessing the performance of multiple sequence alignment methods. We used both BAliBASE 2.01 and BAliBASE 3.0 to test EM-Coffee and the other methods. BAliBASE 2.01 contains 141 tests divided into five subsets:
• Ref1: 82 tests (all sequences in each test have similar length)
• Ref2: 23 tests (highly divergent 'orphan' sequences)
• Ref3: 12 tests (subgroups with less than 25% residue identity between groups)
• Ref4: 12 tests (N/C-terminal extensions)
• Ref5: 12 tests (internal insertions)

BAliBASE 3.0 has many more tests than BAliBASE 2.01 (about 217 tests). It is also separated into five subsets for testing multiple sequence alignment programs: Ref1, Ref2, Ref3, Ref4, and Ref5. As with BAliBASE 2.01, each subset has certain properties; for example, Ref3 contains divergent subfamilies, and Ref4 includes large extensions between sequences. BAliBASE 3.0 can also be divided into two separate categories: a full-length alignment category and a homologous alignment category (homologous sequences only).

B. Scoring method

As usual, we used two scoring systems to determine the accuracy of the algorithms: the total-column score (TC) and the sum-of-pair score (SP). The total-column score is simply the percentage of columns that are identical between the output of an MSA method and the reference (correct) alignment. The sum-of-pair score is more complicated. Suppose a and b are two symbols (amino acids or gaps); s(a, b) is the score obtained when a and b are aligned together:

    s(a, b) =  1   if a and b match,
               0   if both a and b are gaps,
              -1   if a and b do not match,
              -2   in the other cases.

Let S_i be the score of the i-th column of a multiple alignment M:

    S_i = Σ s(a, b), summed over all pairs of symbols a, b in the i-th column.

Let SP(M) be the score of the alignment M:

    SP(M) = Σ_i S_i, summed over all columns i of M.

For each alignment M, let R be its reference (the correct multiple alignment); SP(R) is calculated in the same way as SP(M). The SP ratio, representing the accuracy of M, is defined as

    SP ratio = SP(M) / SP(R) × 100.
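To make the scoring definitions concrete, the sketch below scores a toy alignment column by column and forms the SP ratio against a reference. It only mirrors the formulas above; BAliBASE's own evaluation programs additionally restrict scoring to annotated core blocks, which this sketch ignores, and the toy alignments at the bottom are invented.

```python
def s(a, b):
    """Pair score as defined above."""
    if a == '-' and b == '-':
        return 0                 # both gaps
    if a == '-' or b == '-':
        return -2                # residue aligned with a gap ("other cases")
    return 1 if a == b else -1   # match / mismatch

def sp_score(msa):
    """SP(M): sum the pair scores over every pair of symbols in every column."""
    total = 0
    for col in zip(*msa):                      # columns of equal-length rows
        total += sum(s(col[i], col[j])
                     for i in range(len(col))
                     for j in range(i + 1, len(col)))
    return total

def sp_ratio(test, ref):
    """SP ratio = SP(test) / SP(reference) * 100."""
    return 100.0 * sp_score(test) / sp_score(ref)

if __name__ == "__main__":
    ref  = ["GA-TC", "GACTC", "GA-TC"]         # invented reference alignment
    test = ["GAT-C", "GACTC", "GA-TC"]         # invented test alignment
    print(round(sp_ratio(test, ref), 2))       # -> 12.5 for this toy example
```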
C. EM-Coffee vs. TD-Coffee and M-Coffee

First, we note that EM-Coffee and TD-Coffee require the same running time, since the alignment weighting step is fast. Tables II and III show the performance of the tested methods on the BAliBASE 2.01 dataset. EM-Coffee is superior to TD-Coffee on the RV12, RV20, RV40, and RV50 subsets, and it is better than M-Coffee on most subsets. On average, EM-Coffee outperforms both TD-Coffee and M-Coffee.

Tables IV and V show the performance of the tested methods on the BAliBASE 3.0 dataset. Because of its computational burden, M-Coffee was not tested on this dataset. As can be seen in Table IV, EM-Coffee is better than TD-Coffee in most cases. On average, it is also better than TD-Coffee; however, the difference is not as large as on BAliBASE 2.01. The results on the BAliBASE 3.0 dataset with homologous sequences only repeat the same pattern as the tests on BAliBASE 3.0 with full-length sequences: EM-Coffee is better than TD-Coffee, especially in the TC score for RVS30 and RVS50, and it is better than TD-Coffee on average.

D. EM-Coffee vs. others

On BAliBASE 2.01, ProbCons is the best method in almost all cases (see Tables II and III). MUSCLE and FFT-nsi are the two fastest programs, requiring only 42 and 39 seconds, respectively. EM-Coffee is superior to the other algorithms on the RV12 and RV50 tests, and its overall results are close to those of ProbCons and better than those of MUSCLE, FFT-nsi, Li-nsi, and Ei-nsi.

Tests on the full-length alignments of BAliBASE 3.0 show that EM-Coffee is the best method with respect to the average SP score (see Table IV). Li-nsi exceeds ProbCons and has the highest average TC score (~59.27%). In a nutshell, EM-Coffee, Ei-nsi, Li-nsi, and ProbCons are the best methods; Muscle and FFT-nsi show the lowest scores, but they have the advantage in running time. Table V shows the superior performance of ProbCons on the BAliBASE 3.0 dataset with homologous sequences only; EM-Coffee is the second best method there. In conclusion, the experiments emphasize the ability of EM-Coffee to collect and assemble important information from the alignments of different individual methods. Although EM-Coffee is not always the best method, it is stable over all the subsets and databases.

Table II. Sum-of-pair ratio (%) for BAliBASE 2.01.

Method     RV11   RV12   RV13   RV20   RV30   RV40   RV50   Average  Time(s)
Muscle     84.82  88.25  88.53  92.47  80.44  90.05  97.37  88.60      42
FFT-nsi    83.11  87.33  87.52  92.00  80.77  91.83  96.44  87.92      39
ProbCons   88.36  90.26  90.65  93.26  82.93  92.63  97.38  90.65     465
Li-nsi     83.37  89.76  88.34  92.66  79.42  92.94  97.77  88.80      70
Ei-nsi     83.37  89.58  88.27  92.64  79.21  95.41  97.84  88.95      76
EM-Coffee  87.08  91.03  89.68  93.24  83.05  92.56  98.49  90.45     182
TD-Coffee  87.30  90.44  89.86  93.12  83.97  91.51  97.70  90.32     182
M-Coffee   85.94  90.17  88.79  93.07  83.06  88.53  96.13  89.32    1060

Table III. Total-column ratio (%) for BAliBASE 2.01.

Method     RV11   RV12   RV13   RV20   RV30   RV40   RV50   Average  Time(s)
Muscle     77.86  80.51  82.98  56.14  53.07  67.38  88.72  73.76      42
FFT-nsi    75.49  81.15  81.86  56.86  56.78  75.52  84.97  74.02      39
ProbCons   83.43  84.30  85.47  60.50  62.10  78.74  89.32  78.55     465
Li-nsi     77.00  82.99  82.22  52.96  55.56  76.15  92.89  74.72      70
Ei-nsi     77.00  82.73  82.19  52.77  54.85  83.56  91.88  75.12      76
EM-Coffee  81.15  85.01  83.79  60.33  60.67  78.61  93.78  78.13     182
TD-Coffee  81.84  83.74  84.74  59.42  63.57  75.22  91.72  77.85     182
M-Coffee   78.69  83.24  81.69  59.08  62.06  66.10  87.58  75.23    1060

Table IV. Results on BAliBASE 3.0 with full-length sequences (SP/TC scores, %).

Method     RV11         RV12         RV20         RV30         RV40         RV50         Average      Time(s)
MUSCLE     58.92/34.76  91.99/81.66  88.99/36.34  81.44/38.40  87.15/45.73  83.87/44.88  82.53/48.23     1735
FFT-nsi    61.44/39.45  90.82/79.57  90.83/37.51  83.30/49.00  89.87/54.61  86.44/52.88  84.13/52.89      690
ProbCons   66.97/41.68  94.11/85.52  91.67/40.49  84.60/54.30  90.24/52.86  89.17/56.69  86.38/55.66    30092
Li-nsi     66.19/43.79  93.46/83.39  92.70/45.10  86.79/59.33  92.61/61.51  90.25/59.06  87.22/59.27     3335
Ei-nsi     66.02/43.74  93.46/83.39  92.52/44.61  86.81/59.43  92.26/60.53  89.86/59.62  87.05/59.00     3680
EM-Coffee  68.72/44.87  93.71/84.07  92.28/43.95  86.00/58.50  91.86/58.10  89.96/60.44  87.33/58.60     6404
TD-Coffee  68.66/44.61  93.82/84.75  91.63/43.78  86.17/56.87  91.41/58.04  89.57/59.69  87.12/58.15     6404

Table V. Results on BAliBASE 3.0 with homologous sequences only (SP/TC scores, %).

Method     RVS11        RVS12        RVS20        RVS30        RVS40  RVS50        Average      Time(s)
MUSCLE     74.08/52.50  93.10/82.70  94.62/53.44  86.93/55.93  N/A    87.46/49.20  87.56/60.89     279
FFT-nsi    71.63/50.61  91.86/81.82  94.00/51.22  87.84/59.80  N/A    87.18/51.07  86.67/60.56     141
ProbCons   81.03/63.08  95.04/87.07  95.72/60.39  90.69/64.97  N/A    90.91/60.53  90.89/68.77    7437
Li-nsi     72.17/52.21  93.90/84.52  94.52/51.66  88.83/63.50  N/A    90.03/56.53  87.90/62.90     673
Ei-nsi     71.98/51.87  93.92/84.52  94.57/52.22  88.83/63.33  N/A    89.93/58.60  87.86/63.12     823
EM-Coffee  77.31/57.39  94.43/85.55  95.15/56.56  89.40/63.63  N/A    90.31/59.27  89.47/65.81    4668
TD-Coffee  77.12/56.66  94.49/85.95  95.11/56.24  89.49/61.57  N/A    89.83/57.60  89.41/65.14    4668

V. DISCUSSION

We have introduced EM-Coffee for the multiple sequence alignment problem. It improves M-Coffee by using a new weighting scheme together with a selection of the best individual algorithms. Experiments showed that EM-Coffee outperforms TD-Coffee (EM-Coffee in which the outputs of the individual methods have equal weights) and M-Coffee in most cases, and it obtains quite significant results on almost all subsets of BAliBASE 2.01 and BAliBASE 3.0. In addition, these results allow us to confirm the reliability of Mumsa, whose scores generated the weights. However, to execute EM-Coffee, users still need to run five other aligners, MUSCLE, FFT-nsi, Li-nsi, Ei-nsi, and ProbCons, to obtain the references; thus the running speed of EM-Coffee is slow. According to our assessment, EM-Coffee is most useful when applied to small or medium datasets.
REFERENCES

[1] I. M. Wallace, O. O'Sullivan, D. G. Higgins, and C. Notredame, "M-Coffee: combining multiple sequence alignment methods with T-Coffee," Nucleic Acids Research, vol. 34, no. 6, pp. 1692–1699, March 2006.
[2] C. R. Darwin, On the Origin of Species, 6th edn. London: John Murray, 1872.
[3] T. F. Smith and M. S. Waterman, "Identification of common molecular subsequences," J. Mol. Biol., vol. 147, pp. 195–197, 1981.
[4] L. Wang and T. Jiang, "On the complexity of multiple sequence alignment," J. Comput. Biol., vol. 1, pp. 337–348, 1994.
[5] J. Thompson, D. Higgins, and T. Gibson, "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice," Nucleic Acids Research, vol. 22, no. 22, pp. 4673–4680, 1994.
[6] C. Notredame, D. G. Higgins, and J. Heringa, "T-Coffee: a novel method for fast and accurate multiple sequence alignment," J. Mol. Biol., vol. 302, pp. 205–217, 2000.
[7] R. Edgar, "MUSCLE: multiple sequence alignment with high accuracy and high throughput," Nucleic Acids Research, vol. 32, no. 5, pp. 1792–1797, 2004.
[8] C. Do, M. Mahabhashyam, M. Brudno, and S. Batzoglou, "ProbCons: probabilistic consistency-based multiple sequence alignment," Genome Research, vol. 15, pp. 330–340, 2005.
[9] C. Do and K. Katoh, "Protein multiple sequence alignment," Methods in Molecular Biology, vol. 484, pp. 676–682, 2008.
[10] M. Vingron and P. Argos, "A fast and sensitive multiple sequence alignment algorithm," Comput. Appl. Biosci., vol. 5, pp. 115–121, 1989.
[11] K. Katoh, K. Misawa, K.-i. Kuma, and T. Miyata, "MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform," Nucleic Acids Research, vol. 30, no. 14, pp. 3059–3066, July 2002.
[12] N. Saitou and M. Nei, "The neighbor-joining method: a new method for reconstructing phylogenetic trees," Mol. Biol. Evol., vol. 4, no. 4, pp. 406–425, 1987.
[13] R. Desper and O. Gascuel, "Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle," J. Comput. Biol., vol. 9, pp. 687–706, 2002.
[14] S. V. Le and A. von Haeseler, "Shortest triplet clustering: reconstructing large phylogenies using representative sets," BMC Bioinformatics, vol. 6, p. 92, 2005.
[15] T. Lassmann and E. Sonnhammer, "Automatic assessment of alignment quality," Nucleic Acids Research, vol. 33, no. 22, pp. 7120–7128, 2005.
[16] T. Miyata, S. Miyazawa, and T. Yasunaga, "Two types of amino acid substitutions in protein evolution," Journal of Molecular Evolution, vol. 12, no. 3, pp. 219–236, 1979.
[17] M. Kimura, The Neutral Theory of Molecular Evolution. Cambridge University Press, 1983.
[18] C. Lee, C. Grasso, and M. F. Sharlow, "Multiple sequence alignment using partial order graphs," Bioinformatics, vol. 18, no. 3, pp. 452–464, 2002.
[19] B. Morgenstern, "DIALIGN: multiple DNA and protein sequence alignment at BiBiServ," Nucleic Acids Research, vol. 32 (Web Server issue), pp. W33–W36, 2004.
[20] J. Pei, R. Sadreyev, and N. V. Grishin, "PCMA: fast and accurate multiple sequence alignment based on profile consistency," Bioinformatics, vol. 19, no. 3, pp. 427–428, 2003.
[21] S. F. Altschul, R. J. Carroll, and D. J. Lipman, "Weights for data related by a tree," J. Mol. Biol., vol. 207, no. 4, pp. 647–653, 1989.
[22] J. D. Thompson, D. G. Higgins, and T. J. Gibson, "Improved sensitivity of profile searches through the use of sequence weights and gap excision," Comput. Appl. Biosci., vol. 10, no. 1, pp. 19–29, 1994.
[23] A. Bahr, J. Thompson, J.-C. Thierry, and O. Poch, "BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations," Nucleic Acids Research, vol. 29, no. 1, pp. 323–326, 2001.
[24] J. Thompson, P. Koehl, R. Ripp, and O. Poch, "BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark," Proteins: Structure, Function, and Bioinformatics, vol. 61, pp. 127–136, 2005.