The Adaptive Evolution Database (TAED): A New Release of a Database of Phylogenetically Indexed Gene Families from Chordates

Hermansen, Russell A.; Oswald, Benjamin P.; Knight, Stormy; Shank, Stephen D.; Northover, David; Korunes, Katharine L.; Michel, Stephen N.; Liberles, David A.

doi:10.1007/s00239-017-9806-8

Original Article
Published: 09 August 2017

Journal of Molecular Evolution, Volume 85, pages 46–56 (2017)

Russell A. Hermansen^1,2,
Benjamin P. Oswald^2,
Stormy Knight^2,
Stephen D. Shank^1,
David Northover^1,
Katharine L. Korunes^2,
Stephen N. Michel^2 &
David A. Liberles^1,2 (ORCID: orcid.org/0000-0003-3487-8826)

Abstract

With large collections of gene and genome sequences now available, there is a need for curated comparative genomic databases that enable interpretation of results in an evolutionary context. Such resources can facilitate an understanding of the co-evolution of genes in the context of a genome mapped onto a phylogeny, of a protein structure, and of interactions within a pathway. A phylogenetically indexed gene family database, The Adaptive Evolution Database (TAED), is presented that organizes gene families and their evolutionary histories in a species tree context. Gene families include alignments, phylogenetic trees, lineage-specific dN/dS ratios, reconciliation with the species tree to enable both the mapping and the identification of duplication events, mapping of gene families onto pathways, and mapping of amino acid substitutions onto protein structures. In addition to organizing the data, new phylogenetic visualization tools, including TreeThrasher and TAED Tree Viewer, have been developed and made available to aid in interpreting the data. This new resource of gene families organized by species and taxonomic lineage promises to be a valuable comparative genomics database for molecular biologists, evolutionary biologists, and ecologists. The new visualization tools and database framework will be of interest to both evolutionary biologists and bioinformaticians.

Abbreviations

dN:: The normalized rate of nonsynonymous substitutions
dS:: The normalized rate of synonymous substitutions
TAED:: The Adaptive Evolution Database
PAM:: Point accepted mutations, a measure of evolutionary distance
MPI:: Message Passing Interface
MSA:: Multiple sequence alignment
NHX:: New Hampshire X format
NCBI:: National Center for Biotechnology Information
BLAST:: Basic Local Alignment Search Tool
PAML:: Phylogenetic Analysis by Maximum Likelihood, software to estimate dN/dS
PDB:: Protein Data Bank
KEGG:: Kyoto Encyclopedia of Genes and Genomes
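
As an illustration of how the terms above fit together, the following is a minimal sketch of reading per-branch annotations from an NHX-annotated gene tree. It is not TAED's pipeline or API. The `[&&NHX:key=value]` comment syntax and the `S` (species) and `D` (duplication) tags belong to the NHX format, but the `DNDS` tag, the example tree, the numeric values, and the function names are assumptions made purely for illustration.

```python
import re
from typing import Dict, List

# Matches one NHX comment block, e.g. [&&NHX:S=Homo_sapiens:D=Y]
NHX_BLOCK = re.compile(r"\[&&NHX:([^\]]*)\]")


def parse_nhx_tags(newick: str) -> List[Dict[str, str]]:
    """Return one dict of key/value tags per NHX block, in order of appearance."""
    records = []
    for match in NHX_BLOCK.finditer(newick):
        tags = {}
        for field in match.group(1).split(":"):
            if "=" in field:
                key, value = field.split("=", 1)
                tags[key] = value
        records.append(tags)
    return records


def elevated_branches(records: List[Dict[str, str]],
                      tag: str = "DNDS",  # hypothetical tag name, not part of NHX
                      threshold: float = 1.0) -> List[Dict[str, str]]:
    """Select branches whose dN/dS ratio (dN divided by dS) exceeds the threshold."""
    return [r for r in records if tag in r and float(r[tag]) > threshold]


# Hypothetical three-gene tree; all values are made up for the example.
EXAMPLE_TREE = (
    "((geneA:0.1[&&NHX:S=Homo_sapiens:DNDS=0.21],"
    "geneB:0.2[&&NHX:S=Mus_musculus:DNDS=1.35]):0.05[&&NHX:D=Y:DNDS=0.48],"
    "geneC:0.3[&&NHX:S=Danio_rerio:DNDS=0.10]);"
)

records = parse_nhx_tags(EXAMPLE_TREE)
duplication_nodes = [r for r in records if r.get("D") == "Y"]
candidates = elevated_branches(records)
```

Here `candidates` holds the branches with dN/dS above 1, the conventional signal that a lineage may merit a closer look for positive selection, and `duplication_nodes` holds the nodes flagged as duplication events. A regular expression is enough for this flat extraction; recovering the tree topology would need a proper Newick parser.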

Acknowledgements

The authors would like to thank Jason Davies and Ken-ichi Ueda for generating the base code for the D3 tree visualization software. The authors would also like to thank the Advanced Research Computing Center (ARCC) at the University of Wyoming for technical assistance in implementing the TAED pipeline. The authors would additionally like to thank Jessica Siltberg-Liberles and Dietlind Gerloff for feedback on data quality during database development. Lastly, the authors would like to thank National Science Foundation Grants DBI-0743374 and DBI-1355846 for support as well as the University of Wyoming INBRE Award P20 RR016474.

Author information

Benjamin P. Oswald
Present address: IBEST Computational Resources Core, University of Idaho, Moscow, ID, 83844, USA
Stormy Knight
Present address: Supercomputer Systems Group, National Center for Atmospheric Research, Cheyenne, WY, 82009, USA
Katharine L. Korunes
Present address: Department of Biology, Duke University, Durham, NC, 27708, USA
Stephen N. Michel
Present address: College of Biological Sciences, University of Minnesota, Saint Paul, MN, 55108, USA

Authors and Affiliations

Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
Russell A. Hermansen, Stephen D. Shank, David Northover & David A. Liberles
Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
Russell A. Hermansen, Benjamin P. Oswald, Stormy Knight, Katharine L. Korunes, Stephen N. Michel & David A. Liberles

Contributions

RAH and DAL conceived of the study. RAH built the pipeline to generate the database, built the database, executed most of the analysis, and wrote the first draft of the manuscript. DAL supervised the project and contributed significantly to the writing of the manuscript. BPO wrote TAED Tree Viewer, adapted OneZoom for TAED use, and also contributed to the pipeline and data analysis. SK wrote TreeThrasher. SDS contributed to database construction and function, including the protein viewer. DN wrote the API. KLK and SM contributed to running the pipeline and database analysis. All authors contributed to final writing of the manuscript.

Corresponding author

Correspondence to David A. Liberles.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

About this article

Cite this article

Hermansen, R.A., Oswald, B.P., Knight, S. et al. The Adaptive Evolution Database (TAED): A New Release of a Database of Phylogenetically Indexed Gene Families from Chordates. J Mol Evol 85, 46–56 (2017). https://doi.org/10.1007/s00239-017-9806-8

Received: 19 July 2017
Accepted: 03 August 2017
Published: 09 August 2017
Issue Date: August 2017
DOI: https://doi.org/10.1007/s00239-017-9806-8
