Learning from Non-Causal Models

  • Original Research
  • Erkenntnis


This paper defends the thesis of learning from non-causal models: viz., that the study of a model can prompt justified changes in one’s confidence in empirical hypotheses about a real-world target in the absence of any known or predicted similarity between model and target with regard to their causal features. Recognizing that we can learn from non-causal models matters not only to our understanding of past scientific achievements, but also to contemporary debates in the philosophy of science. At one end of the philosophical spectrum, my thesis undermines the views of those who, like Cartwright (Erkenntnis 70:45–58, 2009), follow Hesse (Models and Analogies in Science, Notre Dame, University of Notre Dame Press, 1963) in restricting the possibility of learning from models to those situations where a model identifies some causal factors present in the target. At the other end of the spectrum, my thesis also helps undermine some extremely permissive positions, e.g., Grüne-Yanoff’s (Erkenntnis 70(1):81–99, 2009; Philos Sci 80(5):850–861, 2013) claim that learning from a model is possible even in the absence of any similarity at all between model and target. The thesis that we can learn from non-causal models offers a cautious middle ground between these two extremes.

Fig. 1
Fig. 2


  1. Cf. Grüne-Yanoff (2013): “Learning from a model M… is constituted by a change in confidence in certain hypotheses, justified by reference to M” (2). In what follows, I will use ‘confidence’ and ‘degree of credence’ interchangeably. When a model justifies assigning some additional degree of credence to an empirical hypothesis, I will sometimes use the equivalent expression ‘provides inductive support’.

  2. The expression ‘non-causal’ is borrowed from discussions of non-causal explanations in science (e.g., Lange 2016). While this paper is concerned with confirmation and not explanation, there are suggestive parallels between the two topics, some of which are discussed in footnote 23 (cf. also Reutlinger et al. 2017).

  3. The specific metaphysical profile of Cartwright’s “capacities”, intended as ‘powers’ that ground the causal laws, does not affect the arguments below. Cf. also Cartwright’s discussion on pp. 53–4 of her (2009).

  4. Mäki (1992) is another proponent of the view of models as isolators. In light of Sugden’s (2000) criticism, however, Mäki (2009) takes a less extreme position on the epistemology of models, closer to Sugden’s.

  5. Cf. Sugden (2009) on Banerjee’s (1992) herd model: “The effect of herding in the model is similar to that of herding in the real world. From the similarity in effects, we are invited to infer […] similar causes” (10).

  6. I am skipping over some of the subtleties regarding Hesse’s notion of ‘inductive support’, which is afforded by the use of models. On her view, the generic notion of inductive support that is at stake (before, as it were, a probabilistic theory of confirmation is invoked) is a comparative notion of one hypothesis (based on a model) being more reasonable than another (not based on a model). The probabilistic rendering in terms of ‘justifying additional credence’ to a hypothesis is discussed on pp. 112–3 in her chapter ‘The Logic of Analogy’ (1963).

  7. Among other things, this condition purports to rule out that similarities in gerrymandered respects, such as those of the grue/bleen variety made famous by Goodman (1955), count as evidence for other, merely predicted similarities between a model and a target. For Hesse, those respects of similarity are illegitimate in inductive arguments since they do not belong to the accepted vocabulary of any current or past science.

  8. Bartha (2009, 43) misinterprets Hesse’s (1963) causal condition when he raises as a problem for her account Franklin’s argument that lightning is attracted by pointed metal rods, based on the fact that the ‘electrical fluid’ is so attracted and that “electrical fluid agrees with lightning in these particulars: Giving light. Color of the light. Crooked direction. Swift motion. Being conducted by metals…” (1941, 334). Granted, when proposing this analogy, Franklin had little knowledge of the causal features of either electricity or lightning. Yet Hesse’s causal condition does not require knowledge: “the use of analogical argument does not presuppose that the actual causal relation is known” (1963, 84). Hesse’s causal condition only requires that there be some antecedent reason for expecting causal relations to underwrite the correlations observed in the source. When interpreted in this way, it is far from obvious that Franklin’s case poses a problem for her causal condition.

  9. One of Hesse’s (1963) favorite examples of this class of models is “the formal analogy between elliptic membranes and the acrobat’s equilibrium, both of which can be described by Mathieu’s Equation” (69).

  10. Note that the appeal to ‘causal laws’ to spell out the notion of a causal connection is not only consistent with the accounts by, e.g., Hesse (1963) and Cartwright (2009), but (as discussed on pp. 18–19 and in fn. 18 and 26) is to my knowledge one of the broadest possible formulations of the causal condition on learning that does not trivialize its content. I thus expect defenders of the causal condition to welcome my proposed explication.

  11. Today this law is known as the ‘central limit theorem’. I will continue using Galton’s terminology to avoid confusion when quoting him. The same applies to ‘reversion towards mediocrity’ that will be mentioned momentarily: in contemporary statistics, this phenomenon is known as ‘regression towards the mean’.

  12. Here I disagree with Ariew et al. (2017), according to whom the quincunx of Fig. 1 “provides justification for his statistical assumption [viz. “that hereditary characters approximate the normal distribution”]” (70). As will be discussed below, it is the resemblance of Quetelet’s data to the more exceptional outcome of the machine of Fig. 2 (absent from their reconstruction) that yields confirmation of the statistical assumption.

  13. If more than one process of heredity is at play, but they all conform to the law of deviation, then their combined result must necessarily conform to the law of deviation as well. Galton calls this statistical theorem (that the sum of independent normal variates is itself normal) the “law of the sum of two fallible measures” (1877, 533).

  14. Its approximately linear character, whereby reversion is proportional to the magnitude of variation, results instead from the fact that the distribution of the parent generation is bell-shaped. This is in turn to be explained in the same way, that is, by the fact that the heredity processes leading to that generation’s distribution obeyed the law of deviation. Galton’s account is therefore ‘recursive’: assuming one generation’s distribution is normal, a story is given about what enables “successive generations to maintain statistical identity” (1877, 493).

  15. It may be objected that in 1877, when “Typical Laws of Heredity” was published, Galton took himself to have singled out a causal law of reversion towards mediocrity. According to Stigler (2016), Galton realized that the reversion law does not describe a “pull” towards the ancestral type, but is rather a purely statistical effect, only when he was able to compare (a few years after 1877) the data concerning the heights of children in relation to their parents with the data about the height of brothers in the same family. However, first, it doesn’t follow from the law of reversion being understood as a causal law that Galton’s model satisfies the causal condition, since the law of deviation, which is also needed for the derivation of statistical identity, was by Galton’s lights in 1877 “purely numerical” (495). At most, the case-study is a hybrid. Second, it seems to me disingenuous for a defender of the causal condition on learning to rely so much on these historical contingencies. Regardless of Galton’s oversights in 1877, his example makes it plausible that some non-causal models can prompt learning.

  16. I am including in the ‘heuristic’ category the use of models for “conceptual exploration” discussed in Hausman (1992). His idea is that some models are studied in order to learn facts about the models themselves—whether or not this information is in turn projectable onto real-world targets. I am also including in the broadly ‘heuristic’ category the use of non-causal models to support “weak conclusions” (i.e., plausibility claims) about the target and that of motivating “adherence to a research program” discussed in Pincock (2012).

  17. Note that the distinction between causal and non-causal laws that I am drawing is orthogonal to the distinction between deterministic and probabilistic laws. Thus, the fact that ‘randomness’ is involved in Galton’s laws but none is involved in Newton’s law of gravity plays no role in my argument. Priestley’s case-study would still qualify as ‘causal’, on my account, even if Newton’s law made use of probabilities.

  18. I take it to be understood that entailments mediated by mathematical theorems do not count as ‘causal connections’ on any remotely plausible, non-trivial conception of the latter notion. See also fn. 26.

  19. This argument targets ‘formal models’ in Hesse’s specific sense: i.e., purely structural isomorphisms without any ‘observable’ similarities. Note, however, that the triviality objection is still plausible under Bartha’s (2009, p. 195) slightly more liberal construal of the notion of a formal model.

  20. At the very least, defenders of the thesis of learning from non-causal models do not face the burden of having to show what (if anything) is wrong with the triviality objection just mentioned.

  21. Other authors (e.g. Batterman 2002; Batterman and Rice 2014) use ‘minimal models’ in a less specific way to refer to ‘highly idealized’ models. Here I will use ‘minimal models’ in Grüne-Yanoff’s specific sense.

  22. Another seemingly related thesis (which I will not discuss here) is Nguyen’s (2019) view that some models can “adequately represent” a real-world target without being “similar” (2) to it. Unfortunately, the author does not provide an explication of ‘similarity’ that would clarify the content and distinctiveness of his thesis.

  23. While in this paper I have deliberately avoided the topic of scientific explanation, I should note here that there is an interesting parallel between Grüne-Yanoff’s claim that minimal models can engender learning and Batterman and Rice’s (2014) claim that there exist ‘minimal model explanations’ in science. On their view, some scientific models can be used to explain properties of real-world systems without having any features in common with those systems. For a critical discussion, whose negative upshot nicely matches the one I will advance below with regard to Grüne-Yanoff’s thesis, cf. Lange’s (2014) reply to Batterman and Rice.

  24. I am not committed to claiming that all of Grüne-Yanoff’s case-studies are non-causal models. Indeed, some of them may even be ‘ordinary’ causal models—so long, of course, as they are not minimal models.

  25. Note that, given any real-world target, there exists an arbitrary one-to-one isomorphism with the properties of the model that makes the latter (at least) a ‘formal’ model, i.e., a model bearing merely formal similarities with the target. Here, however, I am concerned with the allegation that there are no ‘observable’ similarities between Schelling’s checkerboard model and actual cities. Grüne-Yanoff’s (2009) argument that modelers often do not specify their intended targets is insufficient to establish the absence of such similarities since, as Fumagalli (2016, p. 440) notes, the intended targets may be left implicit by the modelers.

  26. Of course, we can speak as if dimes and pennies have ‘preferences’ not to be a minority in their neighborhood, and as if those preferences ‘cause’ the segregated patterns in the checkerboard (Sugden 2000 sometimes speaks in this way). However, that would not be enough to show that the checkerboard resembles actual cities in their causal features. This is because we would not be inclined to accept that dimes and pennies have ‘preferences’ before the analogy with actual cities was introduced. Hence whatever similarities result from adopting talk of ‘preferences’ for dimes and pennies would be purely ‘formal’ and not ‘material’.

  27. Cf. Sugden (2000): “suppose we read Schelling as claiming that if people lived in checkerboard cities, and if people came in just two colours, […] and if…, and if… (going on to list all the properties of the model), then cities would be racially segregated. That is not an empirical claim at all: it is a theorem.” (17).


  • Ainslie, G. (2001). Breakdown of the will. Cambridge: Cambridge University Press.

  • Ariew, A., Rohwer, Y., & Rice, C. (2017). Galton, reversion and the quincunx. Studies in History and Philosophy of Science Part C, 66, 63–72.

  • Banerjee, A. (1992). A simple model of herd behavior. The Quarterly Journal of Economics, 107(3), 797–817.

  • Bartha, P. (2009). By parallel reasoning. New York: Oxford University Press.

  • Batterman, R. (2002). Asymptotics and the role of minimal models. The British Journal for the Philosophy of Science, 53, 21–38.

  • Batterman, R., & Rice, C. (2014). Minimal model explanation. Philosophy of Science, 81(3), 349–376.

  • Cartwright, N. (1989). Nature’s capacities and their measurement. Oxford: Clarendon Press.

  • Cartwright, N. (2009). If no capacities then no credible worlds. But can models reveal capacities? Erkenntnis, 70, 45–58.

  • Franklin, B. (1941). Benjamin Franklin’s experiments. Cambridge, MA: Harvard University Press.

  • Fraser, D. (forthcoming). The non-miraculous success of formal analogies in quantum theories. In French, S., & Saatsi, J. (eds.), Scientific realism and the quantum. New York: Oxford University Press.

  • Fumagalli, R. (2015). No learning from minimal models. Philosophy of science. In Proceedings of the 24th Biennial meeting of the philosophy of science association.

  • Fumagalli, R. (2016). Why we cannot learn from minimal models. Erkenntnis, 81(3), 433–455.

  • Galton, F. (1877). Typical laws of heredity. Nature, 15(5), 492–533.

  • Giere, R. (1988). Explaining science: A cognitive approach. Chicago: University of Chicago Press.

  • Goodman, N. (1955). Fact, fiction and forecast. Cambridge, MA: Harvard University Press.

  • Grüne-Yanoff, T. (2009). Learning from minimal economic models. Erkenntnis, 70(1), 81–99.

  • Grüne-Yanoff, T. (2013). Appraising models non-representationally. Philosophy of Science, 80(5), 850–861.

  • Güth, W. (1995). An evolutionary approach to explaining cooperative behavior by reciprocal incentives. International Journal of Game Theory, 24, 323–344.

  • Hausman, D. (1992). The inexact and separate science of economics. Cambridge: Cambridge University Press.

  • Hesse, M. (1963). Models and analogies in science. Notre Dame: University of Notre Dame Press.

  • Knuutila, T. (2009). Isolating representations vs. credible constructions? Economic modelling in theory and practice. Erkenntnis, 70, 59–80.

  • Lange, M. (2014). On ‘minimal model explanations’: A reply to Batterman and Rice. Philosophy of Science, 82(2), 292–305.

  • Lange, M. (2016). Because without cause. New York: Oxford University Press.

  • Mäki, U. (1992). On the method of isolation in economics. Poznan Studies in the Philosophy of the Sciences and the Humanities, 26, 316–351.

  • Mäki, U. (2009). MISSing the world. Models as isolations and credible surrogate systems. Erkenntnis, 70(1), 29–43.

  • Nguyen, J. (2019). It’s not a game: Accurate representation with toy models. The British Journal for the Philosophy of Science, 99, 225.

  • Norton, J. (forthcoming). The material theory of induction. (available online).

  • Pietsch, W. (2019). A causal approach to analogy. Journal for General Philosophy of Science, 50(4), 489–520.

  • Pincock, C. (2012). Mathematical models of biological patterns. Lessons from Hamilton’s selfish herd. Biology and Philosophy, 27, 481–496.

  • Priestley, J. (1767). History and present state of electricity. London: Dodsley, Johnson & Cadell.

  • Reutlinger, A., Hangleiter, D., & Hartmann, S. (2017). Understanding (with) toy models. The British Journal for the Philosophy of Science, 69(4), 1069–1099.

  • Schelling, T. (1978). Micromotives and macrobehavior. New York: Norton.

  • Sober, E. (2001). Venetian sea levels, British bread prices, and the principle of common cause. The British Journal for the Philosophy of Science, 52, 331–346.

  • Steel, D. (2007). Across the boundaries: Extrapolation in biology and social science. New York: Oxford University Press.

  • Stigler, S. (2016). The seven pillars of statistical wisdom. Cambridge, MA: Harvard University Press.

  • Sugden, R. (2000). Credible worlds: The status of theoretical models in economics. Journal of Economic Methodology, 7, 1–31.

  • Sugden, R. (2009). Credible worlds, capacities and mechanisms. Erkenntnis, 70, 3–27.


I am grateful to Marc Lange, Matt Kotzen, Alan Nelson, Samantha Wakil and two anonymous referees of this Journal for their extremely helpful comments on earlier drafts of this paper.

Author information

Corresponding author

Correspondence to Francesco Nappo.

Ethics declarations

Conflict of interest

The author declares that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Nappo, F. Learning from Non-Causal Models. Erkenn 87, 2419–2439 (2022).
