Bioinformatics pp 421-432

Scaling Up the Phylogenetic Detection of Lateral Gene Transfer Events

  • Cheong Xin Chan
  • Robert G. Beiko
  • Mark A. Ragan (corresponding author)
Part of the Methods in Molecular Biology book series (MIMB, volume 1525)


Lateral genetic transfer (LGT) is the process by which genetic material moves between organisms (and viruses) in the biosphere. Among the many approaches developed for the inference of LGT events from DNA sequence data, methods based on the comparison of phylogenetic trees remain the gold standard for many types of problem. Identifying LGT events from sequenced genomes typically involves a series of steps in which homologous sequences are identified and aligned, phylogenetic trees are inferred, and their topologies are compared to identify unexpected or conflicting relationships. These types of approach have been used to elucidate the nature and extent of LGT and its physiological and ecological consequences throughout the Tree of Life. Advances in DNA sequencing technology have led to enormous increases in the number of sequenced genomes, including ultra-deep sampling of specific taxonomic groups and single-cell sequencing of unculturable "microbial dark matter." Environmental shotgun sequencing enables the study of LGT among organisms that share the same habitat.
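The last step of this workflow, comparing tree topologies to find conflicting relationships, can be illustrated with a minimal sketch. It is not the chapter's own procedure; it assumes that trees are written as nested Python tuples, that the gene tree and the reference tree contain exactly the same taxa, and that a gene-tree bipartition (split) counts as "conflicting" when it is incompatible with at least one split of the reference tree. The function names `leaves`, `splits`, `compatible` and `conflicting_splits` are hypothetical.

```python
from itertools import chain


def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed anchor
    taxon, so the same unrooted split always gets the same key.
    """
    taxa = leaves(tree)
    anchor = min(taxa)  # fixed taxon used only to normalise split keys
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if anchor in clade else clade
        # Non-trivial splits have at least two taxa on each side.
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return clade

    walk(tree)
    return found


def compatible(a, b, taxa):
    """Splits a|taxa-a and b|taxa-b can coexist on one tree iff at least
    one of the four pairwise intersections of their sides is empty."""
    a2, b2 = taxa - a, taxa - b
    return not (a & b) or not (a & b2) or not (a2 & b) or not (a2 & b2)


def conflicting_splits(gene_tree, reference_tree):
    """Gene-tree splits that are incompatible with some reference split."""
    taxa = leaves(gene_tree)
    ref = splits(reference_tree)
    return {s for s in splits(gene_tree)
            if any(not compatible(s, r, taxa) for r in ref)}


reference = ((("A", "B"), "C"), ("D", "E"))
gene = ((("A", "B"), "D"), ("C", "E"))  # C and D have swapped places
print([sorted(s) for s in conflicting_splits(gene, reference)])  # [['C', 'E']]
```

Swapping C and D produces the gene-tree split {A, B, D} | {C, E}, which cannot coexist with the reference split {A, B, C} | {D, E}. A conflict of this kind is a candidate signal of LGT rather than proof of it, since tree-inference error can produce similar patterns.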

This abundance of genomic data offers new opportunities for scientific discovery, but poses two key problems. First, as ever more genomes are generated, the assembly and annotation of each individual genome receives less scrutiny. Second, with so many genomes available it is tempting to include them all in a single analysis, yet thousands of genomes and millions of genes can overwhelm key algorithms in the analysis pipeline. Identifying LGT events of interest therefore depends on choosing the right dataset, and on algorithms that appropriately balance speed and accuracy given the size and composition of the chosen set of genomes.
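To make the scaling problem concrete, here is a back-of-the-envelope sketch; it is an illustration, not an analysis from the chapter. It assumes that homologues are found by an all-versus-all similarity search, which needs m(m-1)/2 pairwise comparisons for m genes, and uses the standard count of (2n-5)!! distinct unrooted binary topologies for n taxa, the space a tree-inference program must search. The function names are hypothetical.

```python
from math import prod


def pairwise_comparisons(m: int) -> int:
    """Unordered pairs among m sequences, i.e. one all-versus-all search."""
    return m * (m - 1) // 2


def unrooted_topologies(n: int) -> int:
    """Distinct unrooted binary trees on n >= 3 labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("at least 3 taxa are required")
    # Double factorial 1 * 3 * 5 * ... * (2n - 5)
    return prod(range(1, 2 * n - 4, 2))


if __name__ == "__main__":
    for m in (1_000, 1_000_000):
        print(f"{m:,} genes: {pairwise_comparisons(m):,} pairwise comparisons")
    for n in (10, 50):
        print(f"{n} taxa: {unrooted_topologies(n):.2e} unrooted topologies")
```

Going from a thousand to a million genes raises the number of comparisons from 499,500 to 499,999,500,000, roughly a million-fold increase, while the number of candidate topologies grows faster than exponentially with the number of taxa.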

Key words

Lateral genetic transfer, Horizontal genetic transfer, Phylogenetic analysis, Phylogenomics, Multiple sequence alignment, Orthology



The authors acknowledge the collaboration of Robert Charlebois, Aaron Darling, Tim Harlow, Elizabeth Skippington, Chris Whidden, Simon Wong, and Norbert Zeh. CXC is supported by a University of Queensland Early Career Researcher Grant. RGB acknowledges the support of the Canada Research Chairs program. The SPR supertree work was supported by the Canadian Natural Sciences and Engineering Research Council, the Dalhousie Killam Trusts, and the Canadian Institutes for Health Research. MAR acknowledges support from the Australian Research Council, the James S. McDonnell Foundation, and The University of Queensland.


  1. 1.
    Fleischmann RD, Adams MD, White O et al (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512CrossRefPubMedGoogle Scholar
  2. 2.
    Welch RA, Burland V, Plunkett G et al (2002) Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A 99:17020–17024CrossRefPubMedPubMedCentralGoogle Scholar
  3. 3.
    Gogarten JP, Townsend JP (2005) Horizontal gene transfer, genome innovation and evolution. Nat Rev Microbiol 3:679–687CrossRefPubMedGoogle Scholar
  4. 4.
    Keeling PJ, Palmer JD (2008) Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet 9:605–618CrossRefPubMedGoogle Scholar
  5. 5.
    Ochman H, Lawrence JG, Groisman EA (2000) Lateral gene transfer and the nature of bacterial innovation. Nature 405:299–304CrossRefPubMedGoogle Scholar
  6. 6.
    Chan CX, Beiko RG, Ragan MA (2011) Lateral transfer of genes and gene fragments in Staphylococcus extends beyond mobile elements. J Bacteriol 193:3964–3977CrossRefPubMedPubMedCentralGoogle Scholar
  7. 7.
    Young BC, Golubchik T, Batty EM et al (2012) Evolutionary dynamics of Staphylococcus aureus during progression from carriage to disease. Proc Natl Acad Sci U S A 109:4550–4555CrossRefPubMedPubMedCentralGoogle Scholar
  8. 8.
    Beiko RG, Harlow TJ, Ragan MA (2005) Highways of gene sharing in prokaryotes. Proc Natl Acad Sci U S A 102:14332–14337CrossRefPubMedPubMedCentralGoogle Scholar
  9. 9.
    Puigbò P, Wolf YI, Koonin EV (2010) The tree and net components of prokaryote evolution. Genome Biol Evol 2:745–756CrossRefPubMedPubMedCentralGoogle Scholar
  10. 10.
    Beiko RG (2011) Telling the whole story in a 10,000-genome world. Biol Direct 6:34CrossRefPubMedPubMedCentralGoogle Scholar
  11. 11.
    Yutin N, Puigbò P, Koonin EV et al (2012) Phylogenomics of prokaryotic ribosomal proteins. PLoS One 7:e36972CrossRefPubMedPubMedCentralGoogle Scholar
  12. 12.
    Smillie CS, Smith MB, Friedman J et al (2011) Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480:241–244CrossRefPubMedGoogle Scholar
  13. 13.
    Ehrlich GD, Ahmed A, Earl J et al (2010) The distributed genome hypothesis as a rubric for understanding evolution in situ during chronic bacterial biofilm infectious processes. FEMS Immunol Med Microbiol 59:269–279CrossRefPubMedPubMedCentralGoogle Scholar
  14. 14.
    Stecher B, Denzler R, Maier L et al (2012) Gut inflammation can boost horizontal gene transfer between pathogenic and commensal Enterobacteriaceae. Proc Natl Acad Sci U S A 109:1269–1274CrossRefPubMedPubMedCentralGoogle Scholar
  15. 15.
    Chan CX, Beiko RG, Darling AE et al (2009) Lateral transfer of genes and gene fragments in prokaryotes. Genome Biol Evol 1:429–438CrossRefPubMedPubMedCentralGoogle Scholar
  16. 16.
    Chan CX, Darling AE, Beiko RG et al (2009) Are protein domains modules of lateral genetic transfer? PLoS One 4:e4524CrossRefPubMedPubMedCentralGoogle Scholar
  17. 17.
    Lawrence JG, Ochman H (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44:383–397CrossRefPubMedGoogle Scholar
  18. 18.
    Ragan MA, Harlow TJ, Beiko RG (2006) Do different surrogate methods detect lateral genetic transfer events of different relative ages? Trends Microbiol 14:4–8CrossRefPubMedGoogle Scholar
  19. 19.
    Stein L (2001) Genome annotation: from sequence to biology. Nat Rev Genet 2:493–503CrossRefPubMedGoogle Scholar
  20. 20.
    El-Metwally S, Hamza T, Zakaria M et al (2013) Next-generation sequence assembly: four stages of data processing and computational challenges. PLoS Comput Biol 9:e1003345CrossRefPubMedPubMedCentralGoogle Scholar
  21. 21.
    Richardson EJ, Watson M (2013) The automatic annotation of bacterial genomes. Brief Bioinform 14:1–12CrossRefPubMedGoogle Scholar
  22. 22.
    Benson DA, Cavanaugh M, Clark K et al (2013) GenBank. Nucleic Acids Res 41:D36–D42CrossRefPubMedGoogle Scholar
  23. 23.
    Wattam AR, Abraham D, Dalay O et al (2014) PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res 42:D581–D591CrossRefPubMedGoogle Scholar
  24. 24.
    Pagani I, Liolios K, Jansson J et al (2012) The Genomes OnLine Database (GOLD) v. 4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 40:D571–D579CrossRefPubMedGoogle Scholar
  25. 25.
    Camacho C, Coulouris G, Avagyan V et al (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421CrossRefPubMedPubMedCentralGoogle Scholar
  26. 26.
    Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461CrossRefPubMedGoogle Scholar
  27. 27.
    Zhao Y, Tang H, Ye Y (2012) RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics 28:125–126CrossRefPubMedGoogle Scholar
  28. 28.
    Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30:1575–1584CrossRefPubMedPubMedCentralGoogle Scholar
  29. 29.
    Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797CrossRefPubMedPubMedCentralGoogle Scholar
  30. 30.
    Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780CrossRefPubMedPubMedCentralGoogle Scholar
  31. 31.
    Criscuolo A, Gribaldo S (2010) BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 10:210CrossRefPubMedPubMedCentralGoogle Scholar
  32. 32.
    Talavera G, Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56:564–577CrossRefPubMedGoogle Scholar
  33. 33.
    Ronquist F, Teslenko M, van der Mark P et al (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61:539–542CrossRefPubMedPubMedCentralGoogle Scholar
  34. 34.
    Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313CrossRefPubMedPubMedCentralGoogle Scholar
  35. 35.
    Creevey CJ, McInerney JO (2005) CLANN: investigating phylogenetic information through supertree analyses. Bioinformatics 21:390–392CrossRefPubMedGoogle Scholar
  36. 36.
    Whidden C, Zeh N, Beiko RG (2014) Supertrees based on the subtree prune-and-regraft distance. Syst Biol 63:566–581CrossRefPubMedPubMedCentralGoogle Scholar
  37. 37.
    Revell LJ (2012) phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 3:217–223CrossRefGoogle Scholar
  38. 38.
    Harlow TJ, Gogarten JP, Ragan MA (2004) A hybrid clustering approach to recognition of protein families in 114 microbial genomes. BMC Bioinformatics 5:45CrossRefPubMedPubMedCentralGoogle Scholar
  39. 39.
    Skippington E, Ragan MA (2011) Within-species lateral genetic transfer and the evolution of transcriptional regulation in Escherichia coli and Shigella. BMC Genomics 12:532CrossRefPubMedPubMedCentralGoogle Scholar
  40. 40.
    Beiko RG, Ragan MA (2008) Detecting lateral genetic transfer: a phylogenetic approach. Methods Mol Biol 452:457–469CrossRefPubMedGoogle Scholar
  41. 41.
    Yang Z (1994) Estimating the pattern of nucleotide substitution. J Mol Evol 39:105–111PubMedGoogle Scholar
  42. 42.
    Whelan S, Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18:691–699CrossRefPubMedGoogle Scholar
  43. 43.
    Reinert G, Chew D, Sun F et al (2009) Alignment-free sequence comparison (I): statistics and power. J Comput Biol 16:1615–1634CrossRefPubMedPubMedCentralGoogle Scholar
  44. 44.
    Wan L, Reinert G, Sun F et al (2010) Alignment-free sequence comparison (II): theoretical power of comparison statistics. J Comput Biol 17:1467–1490CrossRefPubMedPubMedCentralGoogle Scholar
  45. 45.
    Ulitsky I, Burstein D, Tuller T et al (2006) The average common substring approach to phylogenomic reconstruction. J Comput Biol 13:336–350CrossRefPubMedGoogle Scholar
  46. 46.
    Domazet-Lošo M, Haubold B (2009) Efficient estimation of pairwise distances between genomes. Bioinformatics 25:3221–3227CrossRefPubMedGoogle Scholar
  47. 47.
    Chan CX, Bernard G, Poirion O et al (2014) Inferring phylogenies of evolving sequences without multiple sequence alignment. Sci Rep 4:6504CrossRefPubMedPubMedCentralGoogle Scholar
  48. 48.
    Bonham-Carter O, Steele J, Bastola D (2013) Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis. Brief Bioinform 15:890–905Google Scholar
  49. 49.
    Haubold B (2014) Alignment-free phylogenetics and population genetics. Brief Bioinform 15:407–418CrossRefPubMedGoogle Scholar
  50. 50.
    Ragan MA, Bernard G, Chan CX (2014) Molecular phylogenetics before sequences: oligonucleotide catalogs as k-mer spectra. RNA Biol 11:176–185CrossRefPubMedPubMedCentralGoogle Scholar
  51. 51.
    Chan CX, Ragan MA (2013) Next-generation phylogenomics. Biol Direct 8:3CrossRefPubMedPubMedCentralGoogle Scholar
  52. 52.
    Baum BR (1992) Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees. Taxon 41:3–10CrossRefGoogle Scholar
  53. 53.
    Ragan MA (1992) Phylogenetic inference based on matrix representation of trees. Mol Phylogenet Evol 1:53–58CrossRefPubMedGoogle Scholar
  54. 54.
    Beiko RG, Hamilton N (2006) Phylogenetic identification of lateral genetic transfer events. BMC Evol Biol 6:15CrossRefPubMedPubMedCentralGoogle Scholar
  55. 55.
    Whidden C, Beiko R, Zeh N (2013) Fixed-parameter algorithms for maximum agreement forests. SIAM J Comput 42:1431–1466CrossRefGoogle Scholar
  56. 56.
    Skippington E, Ragan MA (2011) Lateral genetic transfer and the construction of genetic exchange communities. FEMS Microbiol Rev 35:707–735CrossRefPubMedGoogle Scholar
  57. 57.
    Aberer AJ, Kobert K, Stamatakis A (2014) ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Mol Biol Evol 31(10):2553–2556CrossRefPubMedPubMedCentralGoogle Scholar
  58. 58.
    Drummond AJ, Suchard MA, Xie D et al (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol 29:1969–1973CrossRefPubMedPubMedCentralGoogle Scholar
  59. 59.
    Guindon S, Dufayard JF, Lefort V et al (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321CrossRefPubMedGoogle Scholar
  60. 60.
    Price MN, Dehal PS, Arkin AP (2010) Fast- / Tree 2—approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490Google Scholar

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Cheong Xin Chan (1)
  • Robert G. Beiko (2)
  • Mark A. Ragan (1), corresponding author

  1. Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
  2. Faculty of Computer Science, Dalhousie University, Halifax, Canada
