Regression explanation and statistical autonomy

Abstract

The phenomenon of regression toward the mean is notoriously liable to be overlooked or misunderstood; regression fallacies are easy to commit. But even when regression phenomena are duly recognized, it remains perplexing how they can feature in explanations. This article develops a philosophical account of regression explanations as “statistically autonomous” explanations that cannot be deepened by adducing details about causal histories, even if the explananda as such are embedded in the causal structure of the world. That regression explanations have statistical autonomy was first suggested by Ian Hacking and has recently been defended and elaborated by André Ariew, Yasha Rohwer, and Collin Rice. However, I will argue that these analyses fail to capture what regression’s statistical autonomy consists in and how it sets regression explanations apart from other kinds of explanation. The alternative account I develop also shows what is amiss with a recent denial of regression’s statistical autonomy. Marc Lange has argued that facts that can be explained as regression phenomena can in principle also be explained by citing a conjunction of causal histories. The account of regression explanation developed here shows that his argument is based on a misunderstanding of the nature of statistical autonomy.

Notes

  1.

    Note that this is not to say that the instructor must have been wrong about the effects of punishment and praise on fighter pilots. Perhaps his feedback did have the hypothesized effect on performance over and above the effect of regression toward the mean. The point is that his observations provide no evidence for this.

  2.

    Peter Lipton (2004, 2009) also noted that an appeal to regression toward the mean can be explanatory and stated that regression explanations are statistical explanations. Yet he did not develop this point beyond citing the example from Kahneman.

  3.

    Although Ariew et al. (2017) present this as the final step, Rice et al. (in press) suggest that Galton’s explanatory schema included a third step: the interpretation of the modeled result as being applicable to the biological phenomenon by “justify[ing] the application of results obtained from highly idealized statistical models to real-world systems.” It is unclear what makes this final interpretation/justification step necessary, since the success of the first step already depended on the justification for the interpretation of the biological problem as a statistical one.

  4.

    Letter from Francis Galton to George Darwin, 12 January 1877, Galton Papers, University College London (GALTON/3/3/7). The illustration of Galton’s two-stage quincunx that Ariew et al. (2017) include in their paper is taken from Stigler (1986), who reproduced it from Galton’s letter. ARR suggest that they are following Stigler’s analysis of how Galton explained intergenerational stability using this device. However, Stigler only asserts that Galton used it “to provide an analogue proof that a normal mixture of normal distributions was itself a normal” (Stigler 1986, pp. 280–281).

  5.

    Although ARR briefly discuss this experiment, they fail to appreciate its import. They take the outcome of the experiment to be supported by the simulation of the two-stage quincunx, rather than the balancing quincunx. Immediately following their discussion of the two-stage quincunx, they write: “This is the same result seen in the sweet pea breeding experiment,” and “The sweet pea experiment acted exactly in the way that the [two-stage] quincunx predicts” (Ariew et al. 2017).

  6.

    If the degree of deviation of offspring character values from the parental mean had varied with the parental character value, the uniform quincunxal pattern would not have been a good model of the action of family variability. Galton recognized this and reported that “if it had been otherwise, I cannot imagine, from theoretical considerations, how the typical problem could be solved” (Galton 1877, p. 291).

  7.

    Galton noted that the order in which he modeled these two processes was arbitrary. It was only for modeling purposes that he needed to present the two processes as acting sequentially. Hence, the distribution in the middle is an artifact of the material simulation and has no real-world referent. It should not be mistaken for an intermediate ‘generation’.

  8.

    Galton argued that the height of the mother needed to be multiplied by a factor of 1.08 before taking the average. The details of this calculation and Galton’s defense of the mid-parent concept need not concern us here.
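
    The arithmetic behind the mid-parent value described in note 8 can be sketched as follows. This is only an illustration under stated assumptions: the function name `mid_parent_height` and the example heights are hypothetical and do not come from Galton or from the article; only the factor 1.08 is taken from the note.

```python
# Minimal sketch of the mid-parent calculation described in note 8.
# The helper name and the example heights are hypothetical illustrations.

FEMALE_SCALING = 1.08  # factor applied to the mother's height (from note 8)


def mid_parent_height(father: float, mother: float,
                      factor: float = FEMALE_SCALING) -> float:
    """Return the average of the father's height and the scaled mother's height."""
    return (father + factor * mother) / 2.0


if __name__ == "__main__":
    # Illustrative heights in inches; not data from the article.
    print(round(mid_parent_height(70.0, 64.0), 2))
```

    With a factor of 1.0 the function reduces to the plain average of the two heights, which makes the effect of the scaling step easy to inspect.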

References

  1. Ariew A, Rice C, Rohwer Y (2015) Autonomous-statistical explanations and natural selection. Br J Philos Sci 66(3):635–658

  2. Ariew A, Rohwer Y, Rice C (2017) Galton, reversion and the quincunx: the rise of statistical explanation. Stud Hist Philos Sci Part C Stud Hist Philos Biol Biomed Sci 66:63–72

  3. Galton F (1872) On blood-relationship. Proc R Soc 20:394–402

  4. Galton F (1877) Typical laws of heredity. Proc R Inst 8:282–301

  5. Galton F (1886) Presidential address, section H, anthropology. Rep Br Assoc Adv Sci 55:1206–1214

  6. Galton F (1889) Natural inheritance. Macmillan and Co, London

  7. Hacking I (1983) The autonomy of statistical law. In: Rescher N (ed) Scientific explanation and understanding. University Press of America, Lanham, pp 3–19

  8. Hacking I (1990) The taming of chance. Cambridge University Press, Cambridge

  9. Hacking I (1992) Statistical language, statistical truth and statistical reason: the self-authentification of a style of scientific reason. In: McMullin E (ed) The social dimension of science. University of Notre Dame Press, Notre Dame, pp 130–157

  10. Hempel C (1965) Aspects of scientific explanation and other essays in the philosophy of science. Free Press, New York

  11. Hotelling H (1933) Review of the triumph of mediocrity in business, by Horace Secrist. J Am Stat Assoc 28(184):463–465

  12. Kahneman D (2012) Thinking, fast and slow. Penguin Books, London

  13. Lange M (2013) Really statistical explanations and genetic drift. Philos Sci 80(2):169–188

  14. Lange M (2017) Because without cause. Oxford University Press, Oxford

  15. Lipton P (2004) Inference to the best explanation. Routledge, London

  16. Lipton P (2009) Causation and explanation. In: Beebee H, Hitchcock C, Menzies P (eds) The Oxford handbook of causation. Oxford University Press, Oxford

  17. Morton V, Torgerson DJ (2003) Effect of regression to the mean on decision making in health care. BMJ 326(7398):1083–1084

  18. Nesselroade JR, Stigler SM, Baltes PB (1980) Regression toward the mean and the study of change. Psychol Bull 88(3):622–637

  19. Rice C, Rohwer Y, Ariew A Explanatory schema and the process of model building. Synthese (in press)

  20. Salmon W (1971) Statistical explanation. In: Salmon W (ed) Statistical explanation and statistical relevance. University of Pittsburgh Press, Pittsburgh, pp 29–87

  21. Schall T, Smith G (2000) Do baseball players regress toward the mean? Am Stat 54(4):231

  22. Senn S (1997) Editorial—regression to the mean. Stat Methods Med Res 6(2):99–104

  23. Senn SJ, Collie GS (1988) Accident blackspots and the bivariate negative binomial. Traffic Eng Control 29(3):168–169

  24. Smith G (2018) What the luck? Bloomsbury Publishing Plc, London

  25. Stigler SM (1986) The history of statistics: the measurement of uncertainty before 1900. Harvard University Press, Cambridge

  26. Stigler SM (1999) Statistics on the table. Harvard University Press, Cambridge

  27. Stigler SM (2010) Darwin, Galton and the statistical enlightenment. J R Stat Soc Ser A Stat Soc 173(3):469–482

Acknowledgements

I thank the audience of the Videnskabsteori Seminar at the Niels Bohr Institute and my colleagues in the Section for History and Philosophy of Science for helpful comments and suggestions. This work was supported by a Veni research grant from the Netherlands Organisation for Scientific Research (NWO), grant number 275-20-060.

Author information

Corresponding author

Correspondence to Joeri Witteveen.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

About this article

Cite this article

Witteveen, J. Regression explanation and statistical autonomy. Biol Philos 34, 51 (2019). https://doi.org/10.1007/s10539-019-9705-z

Keywords

  • Regression toward the mean
  • Regression explanation
  • Statistical autonomy
  • Statistical explanation
  • Regression fallacy
  • Reversion
  • Heredity
  • Francis Galton