Testing for treeness: lateral gene transfer, phylogenetic inference, and model selection
A phylogeny that allows for lateral gene transfer (LGT) can be thought of as a strictly branching tree (all of whose branches are vertical) to which lateral branches have been added. Given that the goal of phylogenetics is to depict evolutionary history, we should look for the best-supported phylogenetic network and not restrict ourselves to considering trees. However, the obvious extensions of popular tree-based methods such as maximum parsimony and maximum likelihood face a serious problem: if we judge networks by fit to data alone, networks that have lateral branches will always fit the data at least as well as any network that restricts itself to vertical branches. This is analogous to the well-studied problem of overfitting in curve fitting. Analogous problems often have analogous solutions, and we propose to treat network inference as a case of model selection and to use the Akaike Information Criterion (AIC). Strictly tree-like networks are more parsimonious than those that postulate lateral as well as vertical branches. This leads to the conclusion that we should not infer LGT events whenever doing so would improve fit to data, but only when the improvement in fit is larger than the penalty for adding extra lateral branches.
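The trade-off between fit and the penalty for extra branches can be illustrated with the standard AIC formula, AIC = 2k − 2 ln L, where k is the number of adjustable parameters and L is the maximized likelihood. The following is a minimal sketch, not the authors' procedure. The function names and the log-likelihood values are hypothetical, and treating each lateral branch as one additional parameter is an assumption made purely for illustration.

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """Standard AIC score: 2k - 2 ln L. Lower scores are preferred."""
    return 2 * num_params - 2 * log_likelihood


def prefer_network(lnl_tree: float, k_tree: int,
                   lnl_network: float, k_network: int) -> bool:
    """Return True only if the network's AIC is strictly lower than the tree's."""
    return aic(lnl_network, k_network) < aic(lnl_tree, k_tree)


# Hypothetical maximized log-likelihoods, for illustration only.
# Assumption: one lateral branch adds exactly one parameter (k goes 10 -> 11).
LNL_TREE, K_TREE = -1000.0, 10

# Small gain in fit: the improvement does not cover the penalty.
small_gain = prefer_network(LNL_TREE, K_TREE, -999.5, 11)

# Larger gain in fit: the improvement exceeds the penalty.
large_gain = prefer_network(LNL_TREE, K_TREE, -995.0, 11)

print(small_gain, large_gain)
```

Under these made-up numbers, the lateral branch is rejected in the first comparison and accepted in the second, even though the network fits at least as well as the tree in both. How many parameters a lateral branch actually contributes depends on the model, so `k_network` is left as an input here.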
Keywords: Akaike Information Criterion; Lateral gene transfer; Model selection; Parsimony; Phylogenetic networks
We thank David Baum, Rob Beiko, Matt Haber, Ehud Lamm, Bret Larget, Luay Nakhleh, Mike Steel, and an anonymous referee for helpful discussion. This paper was first presented at the workshop, Perspectives on the Tree of Life, sponsored by the Leverhulme Trust and held in Halifax, Nova Scotia, July, 2009.