The Adaptive Evolution Database (TAED): A New Release of a Database of Phylogenetically Indexed Gene Families from Chordates

  • Russell A. Hermansen
  • Benjamin P. Oswald
  • Stormy Knight
  • Stephen D. Shank
  • David Northover
  • Katharine L. Korunes
  • Stephen N. Michel
  • David A. Liberles
Original Article

Abstract

With large collections of gene and genome sequences now available, there is a need for curated comparative genomic databases that enable interpretation of results in an evolutionary context. Such resources can facilitate an understanding of the co-evolution of genes in the context of a genome mapped onto a phylogeny, of a protein structure, and of interactions within a pathway. A phylogenetically indexed gene family database, The Adaptive Evolution Database (TAED), is presented that organizes gene families and their evolutionary histories in a species tree context. For each gene family, the database provides alignments, phylogenetic trees, lineage-specific dN/dS ratios, reconciliation with the species tree to enable both the mapping and the identification of duplication events, mapping of gene families onto pathways, and mapping of amino acid substitutions onto protein structures. Beyond the organization of the data, new phylogenetic visualization tools, including TreeThrasher and TAED Tree Viewer, have been developed to aid interpretation and are also available. This new resource of gene families organized by species and taxonomic lineage promises to be a valuable comparative genomics database for molecular biologists, evolutionary biologists, and ecologists. The new visualization tools and database framework will be of interest to both evolutionary biologists and bioinformaticians.

Keywords

  • Molecular evolution
  • Species diversification
  • Computational comparative genomics

Abbreviations

dN

The normalized rate of nonsynonymous substitutions

dS

The normalized rate of synonymous substitutions

TAED

The Adaptive Evolution Database

PAM

Point accepted mutations, a measure of evolutionary distance

MPI

Message Passing Interface

MSA

Multiple sequence alignment

NHX

The New Hampshire X format

NCBI

National Center for Biotechnology Information

BLAST

Basic Local Alignment Search Tool

PAML

Phylogenetic Analysis by Maximum Likelihood, software to estimate dN/dS

PDB

Protein Data Bank

KEGG

Kyoto Encyclopedia of Genes and Genomes

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Russell A. Hermansen (1, 2)
  • Benjamin P. Oswald (2, 3)
  • Stormy Knight (2, 4)
  • Stephen D. Shank (1)
  • David Northover (1)
  • Katharine L. Korunes (2, 5)
  • Stephen N. Michel (2, 6)
  • David A. Liberles (1, 2)

  1. Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, USA
  2. Department of Molecular Biology, University of Wyoming, Laramie, USA
  3. IBEST Computational Resources Core, University of Idaho, Moscow, USA
  4. Supercomputer Systems Group, National Center for Atmospheric Research, Cheyenne, USA
  5. Department of Biology, Duke University, Durham, USA
  6. College of Biological Sciences, University of Minnesota, Saint Paul, USA
