Abstract
Amino acid substitution models describe the rates at which amino acids replace one another during evolution. These models play an important role in the analysis of protein sequences, especially in phylogenetic inference. The rapid evolution of flaviviruses makes them an expanding threat to public health. A number of models have been estimated for other viruses; however, they do not properly represent the amino acid substitution patterns of flaviviruses. In this study, we collected protein sequences from the genus Flavivirus to estimate an amino acid substitution model specific to flaviviruses, called FLAVI. Experiments showed that the collected dataset was sufficient to estimate a stable model. More importantly, the FLAVI model was remarkably better than existing models at analyzing flavivirus protein sequences. We recommend that researchers use the FLAVI model when studying protein sequences of flaviviruses or closely related viruses.
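To make the notion of a substitution model concrete, the sketch below builds a small rate matrix under the standard time-reversible parameterization, in which each off-diagonal rate is the product of a symmetric exchangeability and the equilibrium frequency of the target state. The abstract does not state how FLAVI is parameterized, so this form is an assumption; the function name, the 3-letter alphabet, and all numbers are hypothetical and are not FLAVI parameters.

```python
# Toy illustration with a 3-letter alphabet; real amino acid models use 20 states.
# All values below are made up for illustration and are NOT FLAVI parameters.

def build_rate_matrix(exchangeabilities, frequencies):
    """Return a normalized, time-reversible rate matrix Q.

    exchangeabilities: symmetric matrix r[i][j] (diagonal ignored)
    frequencies: equilibrium frequencies pi[i], summing to 1
    """
    n = len(frequencies)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Rate of replacing state i by state j.
                q[i][j] = exchangeabilities[i][j] * frequencies[j]
        # Diagonal is set so that each row sums to zero.
        q[i][i] = -sum(q[i])
    # Scale so that the expected rate is one substitution per unit time.
    rate = -sum(frequencies[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


r = [[0.0, 2.0, 0.5],
     [2.0, 0.0, 1.0],
     [0.5, 1.0, 0.0]]
pi = [0.5, 0.3, 0.2]
Q = build_rate_matrix(r, pi)
```

Under this assumed form, a model for 20 amino acids is specified by 190 exchangeabilities and 20 frequencies, which is why a sufficiently large sequence collection is needed to estimate it stably.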
Data Availability
The FLAVI data is available at https://github.com/thulekm/flavi
Acknowledgements
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01.2019.06.
Author information
Contributions
VLS designed the study and experiments. LKT performed the experiments. Both authors analyzed the experimental results and wrote and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Handling Editor: Keith Crandall.
About this article
Cite this article
Le, T.K., Vinh, L.S. FLAVI: An Amino Acid Substitution Model for Flaviviruses. J Mol Evol 88, 445–452 (2020). https://doi.org/10.1007/s00239-020-09943-3
DOI: https://doi.org/10.1007/s00239-020-09943-3