Biology & Philosophy, Volume 25, Issue 4, pp 675–687

Testing for treeness: lateral gene transfer, phylogenetic inference, and model selection

  • Joel D. Velasco
  • Elliott Sober


A phylogeny that allows for lateral gene transfer (LGT) can be thought of as a strictly branching tree (all of whose branches are vertical) to which lateral branches have been added. Given that the goal of phylogenetics is to depict evolutionary history, we should look for the best supported phylogenetic network and not restrict ourselves to considering trees. However, the obvious extensions of popular tree-based methods such as maximum parsimony and maximum likelihood face a serious problem—if we judge networks by fit to data alone, networks that have lateral branches will always fit the data at least as well as any network that restricts itself to vertical branches. This is analogous to the well-studied problem of overfitting data in the curve-fitting problem. Analogous problems often have analogous solutions and we propose to treat network inference as a case of model selection and use the Akaike Information Criterion (AIC). Strictly tree-like networks are more parsimonious than those that postulate lateral as well as vertical branches. This leads to the conclusion that we should not always infer LGT events whenever it would improve our fit-to-data, but should do so only when the improved fit is larger than the penalty for adding extra lateral branches.
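The trade-off described above can be illustrated with a minimal sketch of an AIC comparison between a strictly tree-like model and a network with one added lateral branch. It uses the standard definition AIC = 2k − 2 ln L, where lower scores are better. The function names, the log-likelihood values, and the assumption that one lateral branch costs exactly one extra adjustable parameter are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """Standard Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def prefer_network(lnl_tree: float, k_tree: int,
                   lnl_net: float, k_net: int) -> bool:
    """Return True only if the network's AIC score beats the tree's.

    The network always fits at least as well (lnl_net >= lnl_tree), so the
    decision turns on whether the gain in fit outweighs the penalty for the
    extra parameters.
    """
    return aic(lnl_net, k_net) < aic(lnl_tree, k_tree)


# Hypothetical numbers: a tree with 10 parameters, and a network that adds
# one lateral branch (assumed here to cost one extra parameter).
small_gain = prefer_network(-1000.0, 10, -999.5, 11)  # fit improves by 0.5
large_gain = prefer_network(-1000.0, 10, -995.0, 11)  # fit improves by 5.0

print("Infer LGT with small gain in fit:", small_gain)
print("Infer LGT with large gain in fit:", large_gain)
```

Under these assumed values, the lateral branch is inferred only in the second case, where the improvement in log-likelihood exceeds the penalty for the added parameter.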


Keywords: Akaike Information Criterion; Lateral gene transfer; Model selection; Parsimony; Phylogenetic networks



We thank David Baum, Rob Beiko, Matt Haber, Ehud Lamm, Bret Larget, Luay Nakhleh, Mike Steel, and an anonymous referee for helpful discussion. This paper was first presented at the workshop, Perspectives on the Tree of Life, sponsored by the Leverhulme Trust and held in Halifax, Nova Scotia, July, 2009.



Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Department of Philosophy, Cornell University, Ithaca, USA
  2. Department of Philosophy, University of Wisconsin, Madison, Madison, USA
