Abstract
The three main approaches to statistical inference (classical statistics, Bayesian inference and likelihood) are all in current use in phylogeny research. This essay discusses and compares the three approaches, with particular emphasis on theoretical properties illustrated by simple thought-experiments. Each is problematic: classical statistics on axiomatic grounds, Bayesian inference on extra-mathematical grounds relating to the use of a prior, and likelihood on practical grounds. The essay aims to increase understanding of these limits among those with an interest in phylogeny.
Acknowledgments
I thank Maria Dornelas, Heleen Plaisier and Graeme Ruxton for their comments on an earlier version of the manuscript. Discussions at the University of St Andrews, particularly at the Harold Mitchell Building’s Lab Chat series organised by Mike Ritchie’s group and the Centre for Biological Diversity’s Quantitative Biology Discussion Group organised by Mike Morrisey, have also been helpful. I further thank Heleen Plaisier for pointing out the truth about librarians and farmers.
Cite this article
Barker, D. Seeing the wood for the trees: philosophical aspects of classical, Bayesian and likelihood approaches in statistical inference and some implications for phylogenetic analysis. Biol Philos 30, 505–525 (2015). https://doi.org/10.1007/s10539-014-9455-x
DOI: https://doi.org/10.1007/s10539-014-9455-x