European Journal for Philosophy of Science, Volume 7, Issue 3, pp 411–433

The philosophical significance of Stein’s paradox

  • Olav Vassend
  • Elliott Sober
  • Branden Fitelson
Original paper in Philosophy of Probability


Charles Stein discovered a paradox in 1955 that many statisticians think is of fundamental importance. Here we explore its philosophical implications. We outline the nature of Stein’s result and of subsequent work on shrinkage estimators; then we describe how these results are related to Bayesianism and to model selection criteria like AIC. We also discuss their bearing on scientific realism and instrumentalism. We argue that results concerning shrinkage estimators underwrite a surprising form of holistic pragmatism.


Stein’s paradox; Shrinkage estimators; Bayesianism; Frequentism; Statistical decision theory



We thank Marty Barrett, Larry Brown, Jan-Willem Romeijn, Teddy Seidenfeld, Mike Steel, Reuben Stern, and the anonymous referees for very useful comments. This paper is dedicated to the memory of Charles Stein (1920–2016).

Compliance with ethical standards


No funding to declare.

Conflict of interest

We declare we have no conflicts of interest.



Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Philosophy Department, University of Wisconsin, Madison, USA
  2. Philosophy Department, Northeastern University, Boston, USA
