Seeing the wood for the trees: philosophical aspects of classical, Bayesian and likelihood approaches in statistical inference and some implications for phylogenetic analysis

Biology & Philosophy

Abstract

The three main approaches in statistical inference—classical statistics, Bayesian and likelihood—are in current use in phylogeny research. The three approaches are discussed and compared, with particular emphasis on theoretical properties illustrated by simple thought-experiments. The methods are problematic on axiomatic grounds (classical statistics), extra-mathematical grounds relating to the use of a prior (Bayesian inference) or practical grounds (likelihood). This essay aims to increase understanding of these limits among those with an interest in phylogeny.

Acknowledgments

I thank Maria Dornelas, Heleen Plaisier and Graeme Ruxton for their comments on an earlier version of the manuscript. Discussions at the University of St Andrews, particularly at the Harold Mitchell Building’s Lab Chat series organised by Mike Ritchie’s group and the Centre for Biological Diversity’s Quantitative Biology Discussion Group organised by Mike Morrisey, have also been helpful. I further thank Heleen Plaisier for pointing out the truth about librarians and farmers.

Author information

Corresponding author

Correspondence to Daniel Barker.

About this article

Cite this article

Barker, D. Seeing the wood for the trees: philosophical aspects of classical, Bayesian and likelihood approaches in statistical inference and some implications for phylogenetic analysis. Biol Philos 30, 505–525 (2015). https://doi.org/10.1007/s10539-014-9455-x

  • DOI: https://doi.org/10.1007/s10539-014-9455-x
