Abstract
Bernoulli’s 1713 golden theorem is viewed retrospectively in the context of modern model-based frequentist inference, which revolves around the concept of a prespecified statistical model \(\mathcal{M}_{\boldsymbol{\theta}}(\mathbf{x})\) defining the inductive premises of inference. It is argued that several widely accepted claims relating to the golden theorem and frequentist inference are either misleading or erroneous: (a) Bernoulli solved the problem of inference ‘from probability to frequency’, and thus (b) the golden theorem cannot justify an approximate Confidence Interval (CI) for the unknown parameter \(\theta\); (c) Bernoulli identified the probability \(P(A)\) with the relative frequency \(\frac{1}{n}\sum_{k=1}^{n} x_{k}\) of event A as a result of conflating \(f(\mathbf{x}_{0}|\theta)\) with \(f(\theta|\mathbf{x}_{0})\), where \(\mathbf{x}_{0}\) denotes the observed data; and (d) the same ‘swindle’ is currently perpetrated by the p-value testers. In interrogating claims (a)–(d), the paper raises several foundational issues that are particularly relevant for statistical induction as it relates to the current discussions on the replication crises and the trustworthiness of empirical evidence, arguing that: [i] The alleged Bernoulli swindle is grounded in the unwarranted claim \(\hat{\theta}_{n}(\mathbf{x}_{0}) \simeq \theta^{*}\) for a large enough n, where \(\hat{\theta}_{n}(\mathbf{X})\) is an optimal estimator of the true value \(\theta^{*}\) of \(\theta\). [ii] Frequentist error probabilities are not conditional on hypotheses (\(H_{0}\) and \(H_{1}\)) framed in terms of an unknown parameter \(\theta\), since \(\theta\) is neither a random variable nor an event. [iii] The direct versus inverse inference problem is a contrived and misplaced charge, since neither conditional distribution, \(f(\mathbf{x}_{0}|\theta)\) nor \(f(\theta|\mathbf{x}_{0})\), exists (formally or logically) in model-based (\(\mathcal{M}_{\boldsymbol{\theta}}(\mathbf{x})\)) frequentist inference.
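As an illustrative aside that is not taken from the paper, the following Python sketch simulates the setting the abstract refers to: n Bernoulli trials with a fixed \(\theta\), the relative frequency \(\frac{1}{n}\sum_{k=1}^{n} x_{k}\) as the estimate, and an approximate CI. The function names, the normal-approximation (Wald) interval, the value 1.96, and the choice \(\theta = 0.3\) are all assumptions made for illustration. The point it is meant to convey is that the coverage probability is a property of the interval procedure over repeated samples, not of any single observed \(\mathbf{x}_{0}\).

```python
import math
import random


def relative_frequency(n, theta, rng):
    """Relative frequency of successes in n simulated Bernoulli(theta) trials."""
    return sum(rng.random() < theta for _ in range(n)) / n


def approx_ci(xbar, n, z=1.96):
    """Normal-approximation (Wald) interval around xbar; z=1.96 is an assumed
    nominal 95% level, chosen here only for illustration."""
    half_width = z * math.sqrt(xbar * (1.0 - xbar) / n)
    return xbar - half_width, xbar + half_width


def coverage(n, theta, reps, seed=0):
    """Fraction of `reps` simulated samples whose interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lower, upper = approx_ci(relative_frequency(n, theta, rng), n)
        hits += lower <= theta <= upper
    return hits / reps


if __name__ == "__main__":
    # theta = 0.3 is a hypothetical 'true' value, known only to the simulation.
    for n in (50, 500, 2000):
        print(n, coverage(n, theta=0.3, reps=1000))
```

Any single simulated interval either contains the assumed \(\theta\) or it does not; only the long-run proportion printed by `coverage` is expected to lie near the nominal level.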
References
Bayes, T. (1764). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53, 370–402.
Berger, J. O., & Wolpert, R. W. (1988). The likelihood principle. Lecture notes—Monograph series (2nd ed., Vol. 6). Institute of Mathematical Statistics.
Bernoulli, J. (1713/2006). The art of conjecturing. JHU Press.
Billingsley, P. (1995). Probability and measure (4th ed.). Wiley.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
De Moivre, A. (1738). The doctrine of chances: Or a method of calculating the probability of events in play. W. Pearson.
Dempster, A. P. (1966). New methods for reasoning towards posterior distributions based on sample data. The Annals of Mathematical Statistics, 37(2), 355–374.
Diaconis, P., & Skyrms, B. (2018). Ten great ideas about chance. Princeton University Press.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Chapman & Hall.
Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press.
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A, 222, 309–368.
Fisher, R. A. (1925). Statistical methods for research workers. Oliver and Boyd.
Gelman, A., Carlin, J. B., & Rubin, D. B. (2004). Bayesian data analysis (2nd ed.). Chapman & Hall.
Ghosh, J. K., Delampady, M., & Samanta, T. (2006). Introduction to Bayesian analysis. Springer.
Gorroochurn, P. (2012). Classic problems of probability. Wiley.
Hacking, I. (1965). Salmon’s vindication of induction. The Journal of Philosophy, 62(10), 260–266.
Hacking, I. (1980). The theory of probable inference: Neyman, Peirce and Braithwaite. In D. Mellor (Ed.), Science, belief and behavior: Essays in honour of Richard B (pp. 141–160). Cambridge University Press, Cambridge.
Hald, A. (1998). A history of mathematical statistics from 1750 to 1930. Wiley.
Hald, A. (2007). A history of parametric statistical inference from Bernoulli to Fisher, 1713–1935. Springer.
Henderson, L. (2020). The problem of induction. The Stanford Encyclopedia of Philosophy. Edward N. Zalta (ed.). https://plato.stanford.edu/archives/spr2020/entries/induction-problem/.
Howson, C., & Urbach, P. (2006). Scientific reasoning: The Bayesian approach (3rd ed.). Open Court.
Hume, D. (1748). An Enquiry Concerning Human Understanding. Oxford: Oxford University Press.
Kolmogorov, A. N. (1933). Foundations of the theory of Probability, 2nd English edition, NY: Chelsea Publishing Co.
Laplace, P. S. (1812). Théorie analytique des Probabilités (Vol. 2). Courcier Imprimeur.
Le Cam, L. (1977). A note on metastatistics or ‘an essay toward stating a problem in the doctrine of chances’. Synthese, 36, 133–160.
Le Cam, L. (1986). Asymptotic methods in statistical decision theory. Springer.
Lehmann, E. L., & Romano, J. P. (2005). Testing statistical hypotheses. Springer.
Lindley, D. V. (1965). Introduction to probability and statistics from the Bayesian viewpoint (Vol. 1). Cambridge University Press.
Mayo, D. G., & Spanos, A. (2006). Severe testing as a basic concept in a Neyman–Pearson philosophy of induction. British Journal for the Philosophy of Science, 57, 323–357.
Mayo, D. G., & Spanos, A. (2011). Error statistics. Philosophy of statistics. In D. Gabbay, P. Thagard, & J. Woods (Eds.), The handbook of philosophy of science (Vol. 7, pp. 151–196). Amsterdam: Elsevier.
Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London, A, 236, 333–380.
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, A, 231, 289–337.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301.
Nosek, B. A., & Lakens, D. E. (2014). A method to increase the credibility of published results. Social Psychology, 45, 137–141.
O’Hagan, A. (1994). Bayesian inference. Edward Arnold.
Renyi, A. (1970). Foundations of probability. Holden-Day.
Robert, C. P. (2007). The Bayesian choice: From decision-theoretic foundations to computational implementation (2nd ed.). Springer.
Salmon, W. C. (1967). The foundations of scientific inference. University of Pittsburgh Press.
Shiryaev, A. N. (2016). Probability-1 (2nd ed.). Springer.
Sober, E. (2008). Evidence and evolution: The logic behind the science. Cambridge University Press.
Spanos, A. (2006). Where do statistical models come from? Revisiting the problem of specification. In J. Rojo (Ed.), Optimality: The Second Erich L. Lehmann Symposium. Lecture Notes-Monograph Series, (Vol. 49, pp. 98–119). Institute of Mathematical Statistics.
Spanos, A. (2010). Is frequentist testing vulnerable to the base-rate fallacy? Philosophy of Science, 77, 565–583.
Spanos, A. (2013a). A frequentist interpretation of probability for model-based inductive inference. Synthese, 190, 1555–1585.
Spanos, A. (2013b). Who should be afraid of the Jeffreys–Lindley paradox? Philosophy of Science, 80, 73–93.
Spanos, A. (2017). Why the decision-theoretic perspective misrepresents frequentist inference. In: Advances in statistical methodologies and their applications to real problems (pp. 3–28). ISBN 978-953-51-4962-0.
Spanos, A. (2018). Mis-specification testing in retrospect. Journal of Economic Surveys, 32(2), 541–577.
Spanos, A. (2019). Probability theory and statistical inference: Empirical modeling with observational data. Cambridge University Press.
Spanos, A. (2021). Revisiting noncentrality-based confidence intervals, error probabilities and estimation-based effect sizes. Journal of Mathematical Psychology, 104, 102580.
Spanos, A., & McGuirk, A. (2001). The model specification problem from a probabilistic reduction perspective. American Journal of Agricultural Economics, 83, 1168–1176.
Spanos, A., & Mayo, D. G. (2015). Error statistical modeling and inference: Where methodology meets ontology. Synthese, 192, 3533–3555.
Von Mises, R. (1928). Probability, statistics and truth (2nd ed.). Dover.
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond p < .05. The American Statistician, 73(Suppl. 1), 1–19.
Williams, D. (2001). Weighing the odds: A course in probability and statistics. Cambridge University Press.
Acknowledgements
Thanks are due to Prakash Gorroochurn and two anonymous reviewers for several useful comments and suggestions that helped improve the discussion in the paper appreciably.
Additional information
This article belongs to the topical collection “Recent Issues in Philosophy of Statistics: Evidence, Testing, and Applications”, edited by Sorin Bangu, Emiliano Ippoliti, and Marianna Antonutti.
Cite this article
Spanos, A. Bernoulli’s golden theorem in retrospect: error probabilities and trustworthy evidence. Synthese 199, 13949–13976 (2021). https://doi.org/10.1007/s11229-021-03405-z
DOI: https://doi.org/10.1007/s11229-021-03405-z