Seeing the wood for the trees: philosophical aspects of classical, Bayesian and likelihood approaches in statistical inference and some implications for phylogenetic analysis

Barker, Daniel

doi:10.1007/s10539-014-9455-x

Published: 19 July 2014

Volume 30, pages 505–525, (2015)

Biology & Philosophy

Abstract

The three main approaches in statistical inference—classical statistics, Bayesian and likelihood—are in current use in phylogeny research. The three approaches are discussed and compared, with particular emphasis on theoretical properties illustrated by simple thought-experiments. The methods are problematic on axiomatic grounds (classical statistics), extra-mathematical grounds relating to the use of a prior (Bayesian inference) or practical grounds (likelihood). This essay aims to increase understanding of these limits among those with an interest in phylogeny.
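The contrast between the three approaches can be made concrete with a toy coin-tossing experiment. The sketch below is an illustration only and is not taken from the article: the data (9 heads, 3 tails), the function names, and the choice of Beta priors are all assumptions made for the example. It computes a classical one-sided p-value under two different stopping rules, a likelihood ratio, and a Bayesian posterior probability under two different priors.

```python
from math import comb

# Hypothetical data, chosen for illustration: 9 heads and 3 tails.
HEADS, TAILS = 9, 3


def binomial_p_value(heads, tails, p0=0.5):
    """Classical one-sided p-value if the design was 'toss n times'."""
    n = heads + tails
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k)
               for k in range(heads, n + 1))


def negative_binomial_p_value(heads, tails, p0=0.5):
    """Classical one-sided p-value if the design was 'toss until the
    tails-th tail'. At least `heads` heads occur before that tail exactly
    when the first heads + tails - 1 tosses hold fewer than `tails` tails."""
    m = heads + tails - 1
    return sum(comb(m, j) * (1 - p0) ** j * p0 ** (m - j)
               for j in range(tails))


def likelihood_ratio(heads, tails, p1, p0=0.5):
    """Likelihood of p1 relative to p0; identical under both designs,
    because the design-dependent constant cancels."""
    return (p1**heads * (1 - p1) ** tails) / (p0**heads * (1 - p0) ** tails)


def posterior_prob_biased(heads, tails, a=1, b=1):
    """Posterior P(p > 1/2) under an assumed Beta(a, b) prior with integer
    a and b. Uses the identity P(Beta(A, B) <= x) = P(Bin(A+B-1, x) >= A)."""
    big_a, big_b = a + heads, b + tails
    n = big_a + big_b - 1
    below_half = sum(comb(n, k) for k in range(big_a, n + 1)) / 2**n
    return 1 - below_half


if __name__ == "__main__":
    print(f"p-value, fixed n:          {binomial_p_value(HEADS, TAILS):.4f}")
    print(f"p-value, stop at 3 tails:  {negative_binomial_p_value(HEADS, TAILS):.4f}")
    print(f"LR, p=0.75 vs p=0.5:       {likelihood_ratio(HEADS, TAILS, 0.75):.3f}")
    print(f"P(p>0.5), Beta(1,1) prior: {posterior_prob_biased(HEADS, TAILS):.4f}")
    print(f"P(p>0.5), Beta(5,5) prior: {posterior_prob_biased(HEADS, TAILS, 5, 5):.4f}")
```

With these assumed data, the two p-values are about 0.073 and 0.033, so the same observations fall on either side of a conventional 0.05 threshold depending on the stopping rule. The likelihood ratio is unaffected by the stopping rule, and the posterior probability shifts from about 0.95 to about 0.91 when the assumed prior changes from Beta(1,1) to Beta(5,5).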

Acknowledgments

I thank Maria Dornelas, Heleen Plaisier and Graeme Ruxton for their comments on an earlier version of the manuscript. Discussions at the University of St Andrews, particularly at the Harold Mitchell Building’s Lab Chat series organised by Mike Ritchie’s group and the Centre for Biological Diversity’s Quantitative Biology Discussion Group organised by Mike Morrissey, have also been helpful. I further thank Heleen Plaisier for pointing out the truth about librarians and farmers.

Author information

Authors and Affiliations

Sir Harold Mitchell Building, School of Biology, University of St Andrews, St Andrews, Fife, KY16 9TH, UK
Daniel Barker

Corresponding author

Correspondence to Daniel Barker.

About this article

Cite this article

Barker, D. Seeing the wood for the trees: philosophical aspects of classical, Bayesian and likelihood approaches in statistical inference and some implications for phylogenetic analysis. Biol Philos 30, 505–525 (2015). https://doi.org/10.1007/s10539-014-9455-x

Received: 02 May 2014
Accepted: 22 June 2014
Published: 19 July 2014
Issue Date: July 2015
DOI: https://doi.org/10.1007/s10539-014-9455-x
