Phylogenetic Analysis That Models Compositional Heterogeneity over the Tree

Protocol in: Environmental Microbial Evolution

Part of the book series: Methods in Molecular Biology (MIMB, volume 2569)

Abstract

Molecular sequences in a phylogenetic analysis can differ in composition, which indicates that the process of evolution can change over time. However, the models of evolution in common use are homogeneous over the tree, and when they are applied to datasets that are compositionally heterogeneous over the tree, they can recover incorrect trees. The Node-Discrete Compositional Heterogeneity (NDCH) model can accommodate such data by allowing composition to differ over the tree. The usage, problems, and limitations of this model are discussed, and a modification, the NDCH2 model, is described that can ameliorate some of these problems and limitations. Using these models can greatly improve the fit of the model to the data and can recover better tree topologies. The models and various statistical tests are illustrated using a bacterial SSU rRNA dataset. Both models are implemented in the software P4, and the files for the analyses described here are made available.
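To make "composition differing over the tree" concrete, here is a minimal sketch. It is not P4 code and not the chapter's own method. It assumes the simple F81 model, whose transition probabilities have the closed form P_ij(t) = exp(-βt)·δ_ij + (1 - exp(-βt))·π_j, so the expected composition after a branch of length t is exp(-βt)·p0 + (1 - exp(-βt))·π, where p0 is the composition at the top of the branch and π is that branch's own composition vector. Loosely following the NDCH idea, each non-root node is assigned one of a small, shared set of composition vectors that governs the branch leading to it. The tree, branch lengths, composition vectors, and helper names such as `node_compositions` are hypothetical illustrations, and the results are expected compositions rather than simulated sequences.

```python
import math


def f81_beta(pi):
    """Rate scaling so one unit of branch length is one expected substitution per site."""
    return 1.0 / (1.0 - sum(p * p for p in pi))


def evolve_composition(p0, pi, t):
    """Expected base composition after a branch of length t under F81.

    p0 is the composition at the top of the branch and pi is the branch's own
    equilibrium composition. Because P_ij(t) = e^{-beta t} delta_ij
    + (1 - e^{-beta t}) pi_j, the result is a weighted average of p0 and pi
    that moves toward pi as t grows.
    """
    decay = math.exp(-f81_beta(pi) * t)
    return [decay * a + (1.0 - decay) * b for a, b in zip(p0, pi)]


def gc_content(p):
    """Combined frequency of C and G, with bases in ACGT order."""
    return p[1] + p[2]


# Hypothetical tree ((A,B)X,C): node -> (parent, branch length, composition label).
# Labels are shared among branches: a few composition vectors, many branches.
TREE = {
    "X": ("root", 0.1, "c0"),
    "A": ("X", 0.4, "c1"),
    "B": ("X", 0.4, "c1"),
    "C": ("root", 0.5, "c2"),
}

# Hypothetical composition vectors, ACGT order.
COMPS = {
    "c0": [0.25, 0.25, 0.25, 0.25],
    "c1": [0.35, 0.15, 0.15, 0.35],  # AT-rich
    "c2": [0.15, 0.35, 0.35, 0.15],  # GC-rich
}


def node_compositions(tree, comps, root_comp):
    """Propagate expected compositions from the root toward the tips."""
    out = {"root": list(root_comp)}
    pending = dict(tree)
    while pending:
        # Nodes whose parent composition is already known can be computed now.
        ready = [n for n, (parent, _, _) in pending.items() if parent in out]
        if not ready:
            raise ValueError("tree has a node whose parent is never reached")
        for node in ready:
            parent, t, label = pending.pop(node)
            out[node] = evolve_composition(out[parent], comps[label], t)
    return out


if __name__ == "__main__":
    result = node_compositions(TREE, COMPS, COMPS["c0"])
    for tip in ("A", "B", "C"):
        print(tip, f"GC = {gc_content(result[tip]):.3f}")
```

Running it prints a lower GC content for tips A and B, which share the AT-rich vector, than for tip C, even though all three descend from an unbiased root. A homogeneous model gives every branch the same composition and so cannot represent this pattern, which is the situation the NDCH models are meant to address.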

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Foster, P.G. (2022). Phylogenetic Analysis That Models Compositional Heterogeneity over the Tree. In: Luo, H. (ed.) Environmental Microbial Evolution. Methods in Molecular Biology, vol 2569. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2691-7_6

  • DOI: https://doi.org/10.1007/978-1-0716-2691-7_6

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2690-0

  • Online ISBN: 978-1-0716-2691-7

  • eBook Packages: Springer Protocols