Scaling Up the Phylogenetic Detection of Lateral Gene Transfer Events
Lateral genetic transfer (LGT) is the process by which genetic material moves between organisms (and viruses) in the biosphere. Among the many approaches developed to infer LGT events from DNA sequence data, methods based on the comparison of phylogenetic trees remain the gold standard for many types of problem. Identifying LGT events from sequenced genomes typically involves a series of steps: homologous sequences are identified and aligned, phylogenetic trees are inferred, and their topologies are compared to identify unexpected or conflicting relationships. Approaches of this kind have been used to elucidate the nature and extent of LGT, and its physiological and ecological consequences, throughout the Tree of Life. Advances in DNA sequencing technology have led to enormous increases in the number of sequenced genomes, including ultra-deep sampling of specific taxonomic groups and single-cell sequencing of unculturable “microbial dark matter.” Environmental shotgun sequencing further enables the study of LGT among organisms that share the same habitat.
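The topology-comparison step described above can be illustrated with a minimal sketch. It assumes rooted trees written as nested Python tuples of taxon names and flags every gene-tree clade that is incompatible with at least one clade of a reference species tree (two clades are compatible if they are nested or disjoint). The function names and the toy trees are hypothetical. This is a simplified clade-conflict check, not the authors' pipeline; real analyses typically also weigh branch support and may use unrooted bipartitions or tree distances such as SPR.

```python
def leaves(tree):
    """Return the leaf names of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return the leaf set below each internal node, excluding the root."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    root = walk(tree)
    found.discard(root)  # the root clade contains every leaf, so it is uninformative
    return found


def compatible(a, b):
    """Two clades can coexist in one tree only if they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def conflicting_clades(gene_tree, species_tree):
    """Gene-tree clades that contradict at least one species-tree clade."""
    taxa = leaves(gene_tree) & leaves(species_tree)

    def restrict(cs):
        # Keep only informative clades (2 or more taxa, not all taxa) on shared taxa.
        kept = set()
        for c in cs:
            r = c & taxa
            if 1 < len(r) < len(taxa):
                kept.add(r)
        return kept

    reference = restrict(clusters(species_tree))
    return {c for c in restrict(clusters(gene_tree))
            if any(not compatible(c, s) for s in reference)}


# Hypothetical toy example: the gene from taxon A groups with D instead of B.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = (("B", "C"), (("A", "D"), "E"))

print(sorted(sorted(c) for c in conflicting_clades(gene_tree, species_tree)))
# [['A', 'D'], ['A', 'D', 'E'], ['B', 'C']]
print(conflicting_clades(species_tree, species_tree))
# set()
```

A single misplaced taxon produces several conflicting clades, which is one reason methods that count the minimum number of subtree moves (such as SPR-based approaches) are attractive. Conflict alone is not proof of LGT: it can also arise from paralogy, incomplete lineage sorting, or tree-reconstruction error.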
This abundance of genomic data offers new opportunities for scientific discovery but poses two key problems. First, as ever more genomes are generated, the assembly and annotation of each individual genome receive less scrutiny. Second, with so many genomes available it is tempting to include them all in a single analysis, but thousands of genomes and millions of genes can overwhelm key algorithms in the analysis pipeline. Identifying LGT events of interest therefore depends on choosing the right dataset, and on algorithms that appropriately balance speed and accuracy given the size and composition of the chosen set of genomes.
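To give a rough sense of why millions of genes can overwhelm an analysis, the sketch below counts the pairwise comparisons needed if, purely as an illustrative assumption, the homology-search step compared every sequence with every other one (an all-versus-all search). The function name is hypothetical, and real pipelines may use heuristics that avoid a full all-versus-all comparison.

```python
def all_vs_all_pairs(n_sequences: int) -> int:
    """Number of unordered pairwise comparisons among n sequences: n(n-1)/2."""
    return n_sequences * (n_sequences - 1) // 2


for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} sequences -> {all_vs_all_pairs(n):,} pairwise comparisons")
```

Because the count grows quadratically, a thousandfold increase in the number of sequences (from 1,000 to 1,000,000) raises the number of comparisons roughly a millionfold, from 499,500 to 499,999,500,000.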
Key words: Lateral genetic transfer, Horizontal genetic transfer, Phylogenetic analysis, Phylogenomics, Multiple sequence alignment, Orthology
The authors acknowledge the collaboration of Robert Charlebois, Aaron Darling, Tim Harlow, Elizabeth Skippington, Chris Whidden, Simon Wong, and Norbert Zeh. CXC is supported by a University of Queensland Early Career Researcher Grant. RGB acknowledges the support of the Canada Research Chairs program. The SPR (subtree prune-and-regraft) supertree work was supported by the Canadian Natural Sciences and Engineering Research Council, the Dalhousie Killam Trusts, and the Canadian Institutes for Health Research. MAR acknowledges support from the Australian Research Council, the James S. McDonnell Foundation, and The University of Queensland.