Amino acid substitution model for rotavirus


VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

Nguyen Duc Canh

AMINO ACID SUBSTITUTION MODEL FOR ROTAVIRUS

MASTER THESIS
Major: Computer Science
Supervisor: Assoc. Prof. Le Sy Vinh

HA NOI - 2019

Abstract

Modeling protein evolution has been a major field of research in bioinformatics for decades. One popular method to approximate the evolution of proteins is to use an amino acid substitution model, which describes the instantaneous rate at which one amino acid is replaced by another. This kind of model is useful in many different ways and has become a main component in a variety of bioinformatic systems. Many models, such as JTT, WAG and LG, have been estimated using data from various species. Recent research showed that these models might be inappropriate for the analysis of some specific species. Meanwhile, the world has witnessed a series of emerging epidemics caused by viruses, notably rotavirus, a contagious virus that can cause gastroenteritis. These epidemics raise a need for modeling the evolution of these emerging viruses. In this thesis, using data from the Viral Genome Resource at the National Center for Biotechnology Information (NCBI), we propose the ROTA model, which has been specifically estimated for modeling the evolution of rotavirus. Analysis revealed significant differences between ROTA and existing models in amino acid frequencies, exchangeability coefficients and inferred phylogenies. Experiments showed that ROTA characterizes the evolutionary patterns of rotavirus better than other models and should be useful in most systems that require an accurate description of rotavirus evolution.

Acknowledgements

I would like to express my sincere gratitude to my advisor, Assoc. Prof. Le Sy Vinh, for his continuous support of my study and research, and for his patience, motivation, enthusiasm and immense knowledge. His guidance helped me throughout the research and writing of this thesis. I could not have imagined having a better advisor and mentor for my Master's study.

Besides my advisor, I would like to thank Dr. Dang Cao Cuong and MSc. Le Kim Thu for giving devoted explanations to my questions and guiding me to solve various problems that I had to face. I also thank my friends Can Duy Cat, Nguyen Minh Trang and Le Hai Nam for their continuous motivation, without which I would never have been able to complete this thesis.

My sincere thanks also go to the Information Faculty of the University of Engineering and Technology, Vietnam National University, Hanoi, for providing me with the necessary facilities to conduct experiments.

Last but not least, I would like to thank my parents for giving birth to me in the first place and supporting me spiritually throughout my life.

Declaration

I hereby declare that this thesis was entirely my own work and that any additional sources of information have been properly cited. I certify that, to the best of my knowledge, my thesis does not infringe upon anyone's copyright nor violate any proprietary rights, and that any ideas, techniques or any other material from the work of other people included in my thesis, published or otherwise, are fully acknowledged in accordance with the standard referencing practices. I declare that this thesis has not been submitted for a higher degree to any other University or Institution.

Table of Contents

Abstract
Acknowledgements
Declaration
Table of Contents
Acronyms
List of Figures
List of Tables
Introduction
1 Background
  1.1 Sequence evolution
    1.1.1 Heredity materials
    1.1.2 Evolution and homologous sequences
  1.2 Modeling sequence evolution
    1.2.1 Sequence alignment
    1.2.2 General time-reversible substitution model
    1.2.3 Model of rate heterogeneity
    1.2.4 Available amino acid substitution models
  1.3 Phylogenetic trees
    1.3.1 Overview
    1.3.2 Phylogenetic tree reconstruction
    1.3.3 Robinson-Foulds distance
    1.3.4 Phylogenetic hypothesis testing
2 Method
  2.1 Modeling method
  2.2 Model estimation process
3 Results and discussion
  3.1 Data preparation
  3.2 Model analysis
  3.3 Performance on testing alignments
  3.4 Tree topology analysis
    3.4.1 Robinson-Foulds distance
    3.4.2 Shimodaira-Hasegawa test
  3.5 Protein-specific models
Conclusions
References
A ROTA model

Acronyms

GTR   General time-reversible
ME    Minimum evolution
ML    Maximum likelihood
MP    Maximum parsimony
MSA   Multiple sequence alignment
NCBI  National Center for Biotechnology Information
RF    Robinson-Foulds

List of Figures

1.1 A sample phylogenetic tree of sequences
1.2 Two trees T1 and T2 describe the same set of sequences {1, 2, 3, 4, 5} but have different topologies. Tree T1 has two bipartitions, {1, 2}|{3, 4, 5} and {1, 2, 3}|{4, 5}, while tree T2 has two bipartitions, {1, 3}|{2, 4, 5} and {1, 2, 3}|{4, 5}. Bipartition {1, 2}|{3, 4, 5} is present only in T1, whereas bipartition {1, 3}|{2, 4, 5} is present only in T2. Bipartition {1, 2, 3}|{4, 5} occurs in both T1 and T2. The standardized Robinson-Foulds distance between T1 and T2 is 2/4.
3.1 The exchangeability coefficients in ROTA, FLU and JTT models. The black (gray or white) bubble at the intersection of row X and column Y represents the exchange rate between amino acid X and amino acid Y in ROTA (FLU or JTT).
3.2 The relative differences between exchangeability coefficients in ROTA and the other two models. The size of the bubble corresponds to the value (ROTA_XY − M_XY)/(ROTA_XY + M_XY), where X and Y are among the 20 amino acids and M is FLU for subfigure a) or JTT for subfigure b). A black bubble with value 2/3 (1/3) indicates that the coefficient in ROTA is 5 (2) times larger than the corresponding one in M, whereas a white bubble with value 2/3 (1/3) indicates that the coefficient in ROTA is 5 (2) times smaller than the corresponding one in M.
3.3 Amino acid frequencies of ROTA, FLU and JTT models

List of Tables

1.1 Twenty different amino acids
3.1 Number of sequences grouped by protein types in training and testing dataset
3.2 The Pearson's correlations between ROTA and 10 widely used models
3.3 Comparisons of ROTA and 10 other models in constructing maximum likelihood trees
3.4 Comparisons of ROTA and 10 other models in constructing maximum likelihood trees
3.5 The normalized Robinson-Foulds distance between trees inferred using ROTA versus FLU and JTT models for 12 testing multiple alignments
3.6 The number of test alignments for which trees inferred from existing models are significantly worse than those from ROTA, for the 11 testing alignments for which ROTA is the best-fit model
3.7 Comparison of log-likelihood per site between ROTA and protein-specific models. For each alignment of protein P, ROTA_P is the model estimated using only sequences of protein P from the training dataset.
A.1 Amino acid exchangeability matrix (the first 20 rows) and frequency vector (the last row) of ROTA

Figure 3.1: The exchangeability coefficients in ROTA, FLU and JTT models. The black (gray or white) bubble at the intersection of row X and column Y represents the exchange rate between amino acid X and amino acid Y in ROTA (FLU or JTT).

... with the highest average log-likelihood score (−29.90). Its score is greater than that of the second-best model by 0.41. In a nutshell, the ROTA model is more suitable for inferring the phylogeny of rotavirus species than other existing models.

3.4 Tree topology analysis

3.4.1 Robinson-Foulds distance

To investigate the topology of phylogenetic trees inferred from ROTA and other models, the Robinson-Foulds (RF) metric is used to measure the distance between two phylogenetic trees. The RF distance is the number of bipartitions which are present in one tree but not in the other. The smaller this value is, the more similar the topologies of the trees are. Table 3.5 uses the normalized RF distance, which is the RF distance divided by the total number of possible bipartitions, and demonstrates a great difference in the tree topologies of these models. The average normalized RF distances between ROTA and FLU and between ROTA and JTT are 0.39 and 0.38, respectively.

Figure 3.2: The relative differences between exchangeability coefficients in ROTA and the other two models. The size of the bubble corresponds to the value (ROTA_XY − M_XY)/(ROTA_XY + M_XY), where X and Y are among the 20 amino acids and M is FLU for subfigure a) or JTT for subfigure b). A black bubble with value 2/3 (1/3) indicates that the coefficient in ROTA is 5 (2) times larger than the corresponding one in M, whereas a white bubble with value 2/3 (1/3) indicates that the coefficient in ROTA is 5 (2) times smaller than the corresponding one in M.

3.4.2 Shimodaira-Hasegawa test

We also used the S-H test to assess the statistical significance of the difference between ROTA and other models. The S-H test helps to confirm whether the improvement in likelihood indeed comes from the new model. ROTA proved to be the best-fit model for 11 out of 12 testing alignments. At the significance level α = 0.05, most existing models constructed significantly worse trees than ROTA did for these 11 alignments, as shown in Table 3.6. Only three models (FLU, HIVb, JTT) could infer acceptable phylogenies (p-values > 0.05) for some of these 11 testing alignments. The HIVb model was the best-fit model for the alignment NSP6, but not significantly better than ROTA (the p-values for HIVb and ROTA for NSP6 are 0.93 and 0.82, respectively). This evidence illustrates that existing models are not suitable for inferring phylogenies from rotavirus sequences.

Figure 3.3: Amino acid frequencies of ROTA, FLU and JTT models

Table 3.3: Comparisons of ROTA and 10 other models in constructing maximum likelihood trees

model 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th
ROTA 11 0 0 0 0
HIVw 1 0
FLU 1 0 0 0
HIVb 3 0 0 0
JTT 0 0 0
WAG 0 0 3 0
LG 0 0 0
VT 0 0 1 0
Dayhoff 0 0 0 0 6
rtREV 0 0 0 0
BLOSUM62 0 0 0 0 11

Table 3.4: Comparisons of ROTA and 10 other models in constructing maximum likelihood trees

model       logL/site
ROTA        −29.90
FLU         −30.31
HIVb        −30.38
JTT         −30.40
HIVw        −30.47
LG          −31.00
WAG         −31.00
VT          −31.04
Dayhoff     −31.30
rtREV       −31.33
BLOSUM62    −31.59

Table 3.5: The normalized Robinson-Foulds distance between trees inferred using ROTA versus FLU and JTT models for 12 testing multiple alignments

Alignment   ROTA vs FLU   ROTA vs JTT
NSP1        0.30          0.26
NSP2        0.36          0.34
NSP3        0.40          0.44
NSP4        0.55          0.52
NSP5        0.43          0.46
NSP6        0.55          0.42
VP1         0.33          0.30
VP2         0.21          0.25
VP3         0.21          0.21
VP4         0.26          0.29
VP6         0.59          0.57
VP7         0.49          0.48

3.5 Protein-specific models

To go a step further, we estimated a substitution model for each rotavirus protein and then compared the ability to infer phylogenies between ROTA and these new models. Because rotavirus consists of 12 proteins, 12 protein-specific models were estimated using the same method as for ROTA, but with the training data divided into 12 subsets according to the 12 proteins. These models are denoted as ROTA_P, where P is the name of the protein. The testing set remains unchanged. For the testing alignment of protein P, the trees inferred by ROTA and by ROTA_P were compared against each other.

As expected, ROTA_P constructed better likelihood trees than ROTA for all alignments in the testing set, as shown in Table 3.7. Among the testing alignments, ROTA differs most from ROTA_NSP6, with a difference of 1.13 in log-likelihood per site, while ROTA performs almost as well as ROTA_VP1 and ROTA_VP2. When included in the S-H test, trees inferred from protein-specific models are always the maximum likelihood trees. At the significance level of 0.05, the S-H test rejected the ROTA trees for NSP5, NSP6 and VP6, while none of the other species-specific and general models has a tree with p-value > 0.05. This confirms that other models are inappropriate for studying the evolution of rotavirus, and suggests that ROTA, as a species-level model, is acceptable for most studies of rotavirus.

In conclusion, the ROTA model has proved to be different from existing models and better in terms of constructing maximum likelihood trees for rotavirus data.

Table 3.6: The number of test alignments for which trees inferred from existing models are significantly worse than those from ROTA, for the 11 testing alignments for which ROTA is the best-fit model

model       p-value < 0.05
FLU
HIVb
JTT
HIVw        11
LG          11
VT          11
WAG         11
rtREV       11
Dayhoff     11
BLOSUM62    11

Table 3.7: Comparison of log-likelihood per site between ROTA and protein-specific models. For each alignment of protein P, ROTA_P is the model estimated using only sequences of protein P from the training dataset.

Alignment   ROTA_P    ROTA      ROTA_P − ROTA
NSP1        −45.27    −45.43    0.16
NSP2        −24.78    −25.06    0.28
NSP3        −27.22    −27.47    0.25
NSP4        −39.42    −39.56    0.14
NSP5        −22.23    −22.62    0.39
NSP6        −11.61    −12.74    1.13
VP1         −16.80    −16.85    0.05
VP2         −13.46    −13.52    0.06
VP3         −29.01    −29.12    0.11
VP4         −41.91    −42.17    0.26
VP6         −16.32    −16.61    0.28
VP7         −67.51    −67.63    0.12

Conclusions

In this thesis, we propose the ROTA model, which has been specifically estimated for modeling the evolution of rotavirus. Analyses revealed significant differences between ROTA and existing models in both the amino acid frequency vector and the exchangeability coefficient matrix. Experiments showed that ROTA characterizes the evolutionary patterns of rotavirus better than other models. The testing section confirmed that ROTA is better than existing models at constructing maximum likelihood trees: ROTA proved significantly better than other models for a majority of the tested alignments. Although protein-specific models for rotavirus can infer better likelihood trees, ROTA proved to be useful in most situations.

In this study, amino acid sequences were aligned by MUSCLE to produce the alignments that serve as inputs for estimating ROTA. Phylogenetic tree construction and model estimation were performed with the IQ-TREE program.

We encourage the use of the ROTA model in any rotavirus protein analysis system that needs an accurate description of the amino acid substitution process. It should enhance our understanding of the evolution and infection process of rotavirus. Future work on protein sequences of other types of emerging
viruses could be done to understand the similarities as well as the differences between the physical, chemical and biological characteristics of these viruses.

References

[1] J. C. Setubal and J. Meidanis, Introduction to Computational Molecular Biology. Boston: PWS Pub., 1997.
[2] J. Felsenstein, Inferring Phylogenies. Sunderland, MA: Sinauer Associates, 2004.
[3] M. Salemi and A.-M. Vandamme, “The phylogenetic handbook: a practical approach to DNA and protein phylogeny,” 2003.
[4] S. Q. Le and O. Gascuel, “An improved general amino acid replacement matrix,” Molecular Biology and Evolution, vol. 25, no. 7, pp. 1307–1320, 2008.
[5] M. Dayhoff, R. Schwartz, and B. Orcutt, “A model of evolutionary change in proteins,” in Atlas of Protein Sequence and Structure, vol. 5, pp. 345–352, Silver Spring: National Biomedical Research Foundation, 1978.
[6] D. T. Jones, W. R. Taylor, and J. M. Thornton, “The rapid generation of mutation data matrices from protein sequences,” Bioinformatics, vol. 8, no. 3, pp. 275–282, 1992.
[7] J. Adachi and M. Hasegawa, “Model of amino acid substitution in proteins encoded by mitochondrial DNA,” Journal of Molecular Evolution, vol. 42, no. 4, pp. 459–468, 1996.
[8] S. Whelan and N. Goldman, “A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach,” Molecular Biology and Evolution, vol. 18, no. 5, pp. 691–699, 2001.
[9] M. W. Dimmic, J. S. Rest, D. P. Mindell, and R. A. Goldstein, “rtREV: an amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny,” Journal of Molecular Evolution, vol. 55, no. 1, pp. 65–73, 2002.
[10] D. C. Nickle, L. Heath, M. A. Jensen, P. B. Gilbert, J. I. Mullins, and S. L. K. Pond, “HIV-specific probabilistic models of protein evolution,” PLoS One, vol. 2, no. 6, p. e503, 2007.
[11] C. C. Dang, Q. S. Le, O. Gascuel, and V. S. Le, “FLU, an amino acid substitution model for influenza proteins,” BMC Evolutionary Biology, vol. 10, no. 1, p. 99, 2010.
[12] J. R. Brister, D. Ako-Adjei, Y. Bao, and O. Blinkova, “NCBI viral genomes resource,” Nucleic Acids Research, vol. 43, no. D1, pp. D571–D577, 2014.
[13] L. Pray, “Discovery of DNA structure and function: Watson and Crick,” Nature Education, vol. 1, no. 1, p. 100, 2008.
[14] C. Darwin, On the Origin of Species, 1859. Routledge, 2004.
[15] S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of Molecular Biology, vol. 48, pp. 443–453, Mar. 1970.
[16] M. S. Waterman, Introduction to Computational Biology: Maps, Sequences and Genomes. Chapman and Hall/CRC, 2018.
[17] J. D. Thompson, D. G. Higgins, and T. J. Gibson, “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Research, vol. 22, no. 22, pp. 4673–4680, 1994.
[18] R. C. Edgar, “MUSCLE: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Research, vol. 32, no. 5, pp. 1792–1797, 2004.
[19] K. Strimmer, A. von Haeseler, A.-M. Salemi, et al., “Nucleotide substitution models,” in The Phylogenetic Handbook: A Practical Approach to DNA and Protein Phylogeny, pp. 72–100, Cambridge University Press, 2003.
[20] J. Keilson, Markov Chain Models - Rarity and Exponentiality, vol. 28. Springer Science & Business Media, 2012.
[21] M. Nei and S. Kumar, Molecular Evolution and Phylogenetics. Oxford University Press, 2000.
[22] W. M. Fitch and E. Margoliash, “A method for estimating the number of invariant amino acid coding positions in a gene using cytochrome c as a model case,” Biochemical Genetics, vol. 1, no. 1, pp. 65–71, 1967.
[23] J. Wakeley, “Substitution rate variation among sites in hypervariable region of human mitochondrial DNA,” Journal of Molecular Evolution, vol. 37, no. 6, pp. 613–623, 1993.
[24] Z. Yang, “Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods,” Journal of Molecular Evolution, vol. 39, no. 3, pp. 306–314, 1994.
[25] S. Guindon and O. Gascuel, “A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood,” Systematic Biology, vol. 52, no. 5, pp. 696–704, 2003.
[26] L.-T. Nguyen, H. A. Schmidt, A. von Haeseler, and B. Q. Minh, “IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies,” Molecular Biology and Evolution, vol. 32, no. 1, pp. 268–274, 2014.
[27] S. Henikoff and J. G. Henikoff, “Amino acid substitution matrices from protein blocks,” Proceedings of the National Academy of Sciences, vol. 89, pp. 10915–10919, Nov. 1992.
[28] T. Müller and M. Vingron, “Modeling amino acid replacement,” Journal of Computational Biology, vol. 7, no. 6, pp. 761–776, 2000.
[29] “Theoretical foundation of the minimum-evolution method of phylogenetic inference,” Molecular Biology and Evolution, Sept. 1993.
[30] N. Saitou and M. Nei, “The neighbor-joining method: a new method for reconstructing phylogenetic trees,” Molecular Biology and Evolution, vol. 4, no. 4, pp. 406–425, 1987.
[31] A. G. Kluge and J. S. Farris, “Quantitative phyletics and the evolution of anurans,” Systematic Biology, vol. 18, no. 1, pp. 1–32, 1969.
[32] W. M. Fitch, “Toward defining the course of evolution: minimum change for a specific tree topology,” Systematic Biology, vol. 20, no. 4, pp. 406–416, 1971.
[33] D. Sankoff, “Minimal mutation trees of sequences,” SIAM Journal on Applied Mathematics, vol. 28, no. 1, pp. 35–42, 1975.
[34] R. Graham and L. Foulds, “Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computational time,” Mathematical Biosciences, vol. 60, no. 2, pp. 133–142, 1982.
[35] D. L. Quicke, J. Taylor, and A. Purvis, “Changing the landscape: a new strategy for estimating large phylogenies,” Systematic Biology, vol. 50, no. 1, pp. 60–66, 2001.
[36] B. Chor and T. Tuller, “Maximum likelihood of evolutionary trees: hardness and approximation,” Bioinformatics, vol. 21, no. suppl. 1, pp. i97–i106, 2005.
[37] D. F. Robinson and L. R. Foulds, “Comparison of phylogenetic trees,” Mathematical Biosciences, vol. 53, no. 1-2, pp. 131–147, 1981.
[38] H. Kishino and M. Hasegawa, “Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea,” Journal of Molecular Evolution, vol. 29, no. 2, pp. 170–179, 1989.
[39] H. Shimodaira and M. Hasegawa, “Multiple comparisons of log-likelihoods with applications to phylogenetic inference,” Molecular Biology and Evolution, vol. 16, no. 8, pp. 1114–1116, 1999.
[40] S. Kalyaanamoorthy, B. Q. Minh, T. K. Wong, A. von Haeseler, and L. S. Jermiin, “ModelFinder: fast model selection for accurate phylogenetic estimates,” Nature Methods, vol. 14, no. 6, p. 587, 2017.
[41] O. Chernomor, A. von Haeseler, and B. Q. Minh, “Terrace aware data structure for phylogenomic inference from supermatrices,” Systematic Biology, vol. 65, no. 6, pp. 997–1008, 2016.
[42] E. L. Hatcher, S. A. Zhdanov, Y. Bao, O. Blinkova, E. P. Nawrocki, Y. Ostapchuck, A. A. Schäffer, and J. R. Brister, “Virus variation resource: improved response to emergent viral outbreaks,” Nucleic Acids Research, vol. 45, no. D1, pp. D482–D490, 2016.
[43] C. C. Dang, V. S. Le, O. Gascuel, B. Hazes, and Q. S. Le, “FastMG: a simple, fast, and accurate maximum likelihood procedure to estimate amino acid replacement rate matrices from large data sets,” BMC Bioinformatics, vol. 15, no. 1, p. 341, 2014.

Appendix A

ROTA model

Table A.1: Amino acid exchangeability matrix (the first 20 rows) and frequency vector (the last row) of ROTA

0.017655 N 0.033955 0.033613 R 0.043634 0.519848 0.507852 1.429963 1.699821 0.087850 0.000020 0.112036 0.129972 0.263421 0.057852 1.704456 5.787463 15.419971 0.000020 0.000020 A Q E G H I L K M F P S T W Y V 11.372369 0.046926 C 0.935637 0.614200 1.194592 0.559276 0.000020 0.380398 19.585033 0.248346 0.283006 6.433540 3.785543 0.176993 5.494254 2.315043 0.070026 0.628335 0.021035 3.167864 8.922568 0.308269 0.024733 0.149600 2.553908 0.003553 0.492042 5.783073 0.940857 0.149082 0.581558 0.147064 1.451156 10.931862 0.148206 D 0.003400 0.191170 R
0.202678 0.434330 0.086768 0.236217 2.148162 0.352650 0.033173 0.000020 0.673287 0.153960 C Q E G 0.060515 0.016000 0.042191 0.052791 0.033779 D 0.268857 0.233979 0.127571 0.436352 0.995101 0.015198 H 0.067878 0.534853 5.783924 0.000020 0.035295 0.052579 16.629042 0.000020 4.013829 0.039984 0.000020 1.581158 0.027618 0.023212 0.157407 0.138677 0.000020 0.251714 1.295347 0.517256 0.234603 4.105469 0.483193 0.294882 3.530081 0.323093 0.133142 0.030246 1.473956 0.026451 0.000020 0.009953 0.000020 0.188674 0.208604 0.054240 0.080187 0.000020 0.000020 2.058908 3.680558 0.090828 0.019107 0.312758 0.595025 0.049614 0.040070 0.041166 0.067096 0.015967 0.067942 0.028322 1.745871 1.978981 8.724333 0.078767 0.050429 3.588184 1.548469 0.028392 5.433810 7.462225 0.046146 2.028448 0.144062 0.000020 0.110184 0.067958 0.084543 0.065252 0.072265 3.639708 0.588113 0.709723 0.053928 0.408323 0.679940 0.000020 0.135584 0.314569 0.408975 0.408334 0.000020 0.079158 1.108362 3.826330 0.021942 2.630527 2.509082 1.547062 0.146611 0.028420 1.792501 6.467714 1.811778 0.277569 0.486617 0.549857 4.989183 0.034321 0.092325 0.080787 I K M F P S T W Y V 0.092445 0.066738 0.032043 0.039880 0.031233 0.079733 0.070195 0.012981 0.049242 0.063664 L 27.705830 1.700456 0.033947 4.498474 0.867021 0.432057 0.048381 0.732619 0.037521 0.234116 0.197118 0.000020 6.026262 0.334842 0.781370 0.991239 12.007209 3.067874 0.583010 0.254183 3.211234 N A

... therefore, substitution models for amino acids are called empirical substitution models. Different methods have been proposed to estimate amino acid substitution models. The Dayhoff model [5] was the first model ...

... estimate amino acid substitution models, which reveal the instantaneous rates at which amino acids change into other amino acids. This kind of model has become crucial for a wide ...

... comparison to 208 parameters for models of amino acid substitutions. Thus, amino acid substitution models are typically estimated from large datasets. The first model is the Dayhoff model [5], using highly similar ...
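
The figure of 208 parameters quoted in the excerpt above can be checked by counting the free entries of a general time-reversible model over 20 states. The sketch below assumes the usual convention: the symmetric exchangeability matrix has n(n−1)/2 off-diagonal coefficients, one of which is fixed by rate normalization, and the frequency vector has n−1 free entries because it sums to one. The function name `gtr_free_parameters` is hypothetical and chosen only for this illustration.

```python
def gtr_free_parameters(n_states: int) -> int:
    """Count free parameters of a general time-reversible model.

    Assumed convention (not taken from the thesis text): one
    exchangeability is fixed by normalization and the frequencies
    are constrained to sum to one.
    """
    # Symmetric matrix: n(n-1)/2 off-diagonal coefficients, minus one fixed.
    exchangeabilities = n_states * (n_states - 1) // 2 - 1
    # Frequency vector: n entries, minus one for the sum-to-one constraint.
    frequencies = n_states - 1
    return exchangeabilities + frequencies


print(gtr_free_parameters(20))  # amino acid models
print(gtr_free_parameters(4))   # nucleotide models, for comparison
```

Under this convention the count grows quadratically with the alphabet size, which is consistent with the excerpt's remark that amino acid models are typically estimated from large datasets.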

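Section 3.4.1 defines the RF distance as the number of bipartitions present in one tree but not in the other, and the caption of Figure 1.2 works through two five-sequence trees whose standardized distance is 2/4. A minimal sketch of that calculation follows. It assumes each tree is already given as a set of its non-trivial bipartitions and that normalization divides by the total number of bipartitions in both trees, which matches the 2/4 in the caption. The helper names `bipartition` and `normalized_rf` are hypothetical, and this is not the routine used by IQ-TREE.

```python
from typing import FrozenSet, Iterable, Set

# A bipartition is an unordered pair of complementary label sets.
Bipartition = FrozenSet[FrozenSet[int]]


def bipartition(side_a: Iterable[int], side_b: Iterable[int]) -> Bipartition:
    """Build an order-independent, hashable bipartition."""
    return frozenset({frozenset(side_a), frozenset(side_b)})


def normalized_rf(tree1: Set[Bipartition], tree2: Set[Bipartition]) -> float:
    """Bipartitions found in exactly one tree, divided by the total
    number of bipartitions in both trees (assumed normalization)."""
    total = len(tree1) + len(tree2)
    if total == 0:
        return 0.0
    differing = len(tree1 ^ tree2)  # symmetric difference of the two sets
    return differing / total


# Trees T1 and T2 as described in the caption of Figure 1.2.
t1 = {bipartition({1, 2}, {3, 4, 5}), bipartition({1, 2, 3}, {4, 5})}
t2 = {bipartition({1, 3}, {2, 4, 5}), bipartition({1, 2, 3}, {4, 5})}

print(normalized_rf(t1, t2))  # 2 differing bipartitions out of 4 -> 0.5
```

Storing each bipartition as a frozenset of frozensets makes {1, 2}|{3, 4, 5} and {3, 4, 5}|{1, 2} compare equal, so the symmetric difference counts only genuine topological disagreements.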
Posted: 16/03/2021, 09:41
