Contrast classes and agreement in climate modeling

  • Paper in Philosophy of the Natural Sciences
  • Published in European Journal for Philosophy of Science

Abstract

In an influential paper, Wendy Parker argues that agreement across climate models isn’t a reliable marker of confirmation in the context of cutting-edge climate science. In this paper, I argue that while Parker’s conclusion is generally correct, there is an important class of exceptions. Broadly speaking, agreement is not a reliable marker of confirmation when the hypotheses under consideration are mutually consistent—when, e.g., we’re concerned with overlapping ranges. Since many cutting-edge questions in climate modeling require making distinctions between mutually consistent hypotheses, agreement across models will be generally unreliable in this domain. In cases where we are only concerned with mutually exclusive hypotheses, by contrast, agreement across climate models is plausibly a reliable marker of confirmation.

Notes

  1. For a sampling of the discussion, see Baumberger et al. (2017), Frigg et al. (2015), Gluck (2023), Harris (2021), Harris and Frigg (2023), Justus (2012), Lloyd (2015b), O’Loughlin (2021), and Winsberg (2021).

  2. Lloyd (2015a) makes a similar point on very different grounds.

  3. Why focus on probability-raising here? Because some of the other desiderata for a good hypothesis—such as informativeness or decision-relevance—trade off with posterior probability. For instance: the hypothesis that ECS is in the range 1.0-5.0\(^{\circ }\)C will always be more probable but less informative than the hypothesis that it is in the range 1.5-4.5\(^{\circ }\)C. What we want from our evidence is information about how much more probable—that is, we want it to raise the probability of one hypothesis relative to the other (see §4 for further discussion).
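
     To make the tradeoff concrete: because the narrower range entails the wider one, the probability calculus guarantees that the wider hypothesis is at least as probable,
     \[ (\text{ECS} \in [1.5, 4.5]) \vDash (\text{ECS} \in [1.0, 5.0]) \quad \Rightarrow \quad Pr(\text{ECS} \in [1.0, 5.0]) \ge Pr(\text{ECS} \in [1.5, 4.5]). \]
     What evidence can contribute is a change in the ratio between such hypotheses, not a verdict in favor of whichever is logically weakest.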

  4. There are a variety of different views in this vicinity, each with their own caveats and alternatives. See, e.g., Carrier and Lenhard (2019), Jebeile and Barberousse (2021), Katzav et al. (2021), Parker and Risbey (2015), Stainforth et al. (2007), and Winsberg (2018). For a contrasting view, see Dethier (2022, 2023a).

  5. This treatment of agreement should be distinguished from those in which agreement (or a related concept like “robustness”) is interpreted as a particular probabilistic condition (see, e.g., Dethier, forthcoming; Schupbach, 2018). Since I’m following Parker closely here, I’m adopting her approach, but in other contexts an alternative understanding of agreement may be preferable.

  6. We can model Parker’s concept of agreement as follows. Let \(s_i\) represent what I’m calling the estimate and what Parker calls the “statement” of the \(i^{\text {th}}\) model. Then two models agree on \(h_1\) iff \(s_1 \vDash h_1\) and \(s_2 \vDash h_1\). Note that the evidence (\(e_i\)) in this situation is not \(s_i\) but the fact that the \(i^{\text {th}}\) model “says” that \(s_i\). After all, within this idealized setting, we can assume that we know what the model says, but we can’t update on the truth of that content because we don’t know whether “what the model says” is true. Agreement is then a reliable marker of confirmation to the extent that agreement (\(s_1 \vDash h_1\) and \(s_2 \vDash h_1\)) makes confirmation (\( Pr(e_1 \& e_2|h_1) > Pr(e_1 \& e_2|h_2)\)) likely.
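
     To see how this condition might be probed, here is a minimal Monte Carlo sketch. The distributional assumptions (Gaussian model errors with a shared bias term, a flat prior, the particular range for \(h_1\)) are mine rather than Parker's, and the quantity computed, how often \(h_1\) is actually true among agreement cases, is only a crude proxy for the confirmation condition above:

```python
import random

# Monte Carlo sketch of the idealized setting in note 6 (the distributional
# assumptions are mine): two models report noisy estimates of ECS that share
# a common bias term; they "agree on h1" when both estimates fall in h1's
# range. We then ask how often h1 is actually true among agreement cases, a
# crude proxy for agreement being a reliable marker of confirmation.

random.seed(0)

H1 = (2.0, 4.0)   # h1: ECS lies in this range (deg C)
N = 100_000

def in_h1(x):
    return H1[0] <= x <= H1[1]

agree = true_given_agree = 0
for _ in range(N):
    truth = random.uniform(1.0, 5.0)   # flat prior over a 1.0-5.0 deg C range
    bias = random.gauss(0.0, 0.3)      # shared bias: the models err alike
    s1 = truth + bias + random.gauss(0.0, 0.3)
    s2 = truth + bias + random.gauss(0.0, 0.3)
    if in_h1(s1) and in_h1(s2):        # the models agree on h1
        agree += 1
        true_given_agree += in_h1(truth)

print(f"Pr(h1 | agreement on h1) ~ {true_given_agree / agree:.2f}")
```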

  7. Of course, if we view the formalism here itself as a model of the testing situation, we might reasonably only draw qualitative conclusions about “real” degree of confirmation. My point here is that formal measures of confirmation track the ratio between the two V terms.

  8. Dethier (forthcoming), Myrvold (1996), and Wheeler (2012) all make this point explicitly using variations on the same formalism adopted here. The point holds in other settings, however, as illustrated by the work of Bovens and Hartmann (2003) and Schupbach (2018).

  9. Of course, putting the question in these terms is potentially misleading in one respect: the ranges for ECS are not put forward ahead of time as hypotheses; instead, they’re inferred—typically using statistics—from the outputs of models and other evidence. My discussion here should be understood as representing the formal support relationships between the evidence and the space of hypotheses that we could draw inferences about on the basis of that evidence and not as capturing anything like the temporal order involved in the practice of drawing these inferences.
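
     For concreteness, the kind of statistical step gestured at here might look like the following toy sketch; the interval method and the numbers are purely illustrative assumptions, not a reconstruction of how any particular assessment proceeds:

```python
import statistics

# Toy sketch of note 9 (method and numbers are mine, purely illustrative):
# the range hypothesis isn't fixed in advance but read off the ensemble,
# here as a central ~90% interval under a normal approximation.

ensemble_ecs = [2.1, 2.8, 3.0, 3.3, 3.7, 4.1, 4.7]   # hypothetical estimates

mean = statistics.mean(ensemble_ecs)
sd = statistics.stdev(ensemble_ecs)
low, high = mean - 1.645 * sd, mean + 1.645 * sd     # +/- 1.645 sd ~ 90%

print(f"inferred hypothesis: ECS in {low:.1f}-{high:.1f} deg C")
```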

  10. Though that doesn’t mean that they’re inaccurate in the same way. On the contrary, there’s a known correlation between error and spread: when the models get a prediction wrong, they tend to get it wrong in a bunch of different ways (see Knutti et al., 2010).

  11. My point here is not that we should add error bars around \(h_1\) in this case, but that the mere presence of idealizations doesn’t undermine the convergence condition except in special cases; see the next paragraph.

  12. There are cases like this in regional climate modeling; see, e.g., Boé et al. (2020) and Schwingshackl et al. (2019).

  13. As one reviewer stressed, one can imagine extreme scenarios—where, e.g., \(h_1\) specifies ECS to five significant digits—in which we should expect that some idealization explains the agreement. I agree, but think the explanation for our intuition here is that the prior for the relevant \(h_1\) is extremely low and the observed agreement is extremely surprising whether or not \(h_1\) is true—it’s not that \(\lnot h_1\) offers any better explanation of it than \(h_1\) does. Indeed, our confidence in \(h_1\) in this scenario should increase! Just not enough to overcome the low prior.
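
     The arithmetic behind the last two sentences, with hypothetical numbers: suppose \(Pr(h_1) = 10^{-4}\) and the observed agreement favors \(h_1\) by a likelihood ratio of 10. Then
     \[ Pr(h_1 \mid e) = \frac{Pr(e \mid h_1) Pr(h_1)}{Pr(e \mid h_1) Pr(h_1) + Pr(e \mid \lnot h_1) Pr(\lnot h_1)} = \frac{10 \times 10^{-4}}{10 \times 10^{-4} + (1 - 10^{-4})} \approx 10^{-3}, \]
     a tenfold increase in confidence that still leaves \(h_1\) overwhelmingly improbable.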

  14. The claim here is: where climate scientists haven’t found these interactions, we don’t have sufficient reason to expect them to exist. The restriction is important. In other contexts, we might expect that such interactions are common enough that even experts miss them with a high degree of frequency. I see no reason to postulate that that’s the case here, however.

  15. Reasonably, Parker doesn’t define “significant,” but she does note that it depends on contextual factors (Parker, 2018, 281).

  16. Of course, there’s no guarantee that the models will agree on all of the mutually consistent hypotheses that we’re interested in. In that case, by the reasoning of the last section, there will be some confirmation of the set of hypotheses that the models do agree on. But notice: if the models all fall within the range 1.0-5.0\(^{\circ }\)C, my arguments in the last section strictly only warrant increased confidence that ECS falls within that range relative to the hypothesis that it doesn’t. Plausibly, my arguments could be extended to confirmation relative to a fully excluded range, like 6.0-10.0\(^{\circ }\)C. But they don’t extend to confirmation relative to 1.0-10.0\(^{\circ }\)C. As we’ll see in the next section, these kinds of overlapping hypotheses with what are effectively different “error bars” are one of the major cases that Parker is concerned with. Or, in other words, in the cases at issue, our prior probability is often sufficiently concentrated that at least some results fall within the scope of each of the interesting hypotheses. Where that isn’t true, however, I’m happy to acknowledge that we have confirmation of some subset of the mutually consistent hypotheses.
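
     Schematically, writing \(e\) for the agreement evidence: the arguments deliver \(Pr(e \mid \text{ECS} \in [1.0, 5.0]) \gg Pr(e \mid \text{ECS} \in [6.0, 10.0])\), but \(Pr(e \mid \text{ECS} \in [1.0, 10.0])\) will be nearly as high as \(Pr(e \mid \text{ECS} \in [1.0, 5.0])\) whenever the prior concentrates the wider hypothesis’s mass in the overlap, so \(e\) barely discriminates between the overlapping pair.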

  17. The use of “consequent” here can be made more appropriate by reframing the discussion in terms of an agreement conditional and a confirmation conditional à la Joyce (1999), but I think that’s unnecessary for communicating the point.

  18. The following discussion is simplified, but a close evaluation of Perrin’s work vindicates the general picture while complicating our understanding of how well Perrin himself succeeded at the task (see Smith & Seth, 2020, chapter 6).

  19. Though, as O’Loughlin (2021) and Winsberg (2018) stress, there are specific cases in which we can further constrain model outputs using knowledge from other sources of evidence.

  20. Why not? Agreement ignores questions of distribution: if we’re asking whether the set of estimates agrees on \(h\), we’re putting aside variation in how much the different estimates support \(h\) or where the different estimates fall within the range captured by \(h\). In Bayesian terms, we’re not updating on the total evidence. One way of viewing cases in which agreement is a good heuristic is that they’re cases in which the distribution makes little difference and so updating only on agreement approximates the results that we would get if we updated on the total evidence. As one reviewer stressed to me, at face value this point supports the argument by Lloyd (2015b) and others that we should adopt a different understanding of “robustness.”
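
     A toy illustration of the difference (all numbers and distributional assumptions are mine): compare the posterior obtained by updating on the full set of estimates with the posterior obtained by updating only on the fact that every estimate falls within \(h\)’s range.

```python
import math

# Toy illustration of note 20 (all numbers are mine): updating on the bare
# fact of agreement vs. updating on the full set of estimates. For simplicity,
# h and not-h are point hypotheses (ECS = 3.0 vs. ECS = 5.0 deg C) and each
# model's estimate is Gaussian noise (sd = 1.0) around the true value.

def gauss_pdf(x, mu, sd=1.0):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def gauss_cdf(x, mu, sd=1.0):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

estimates = [2.2, 2.3, 2.4]   # hypothetical model estimates of ECS (deg C)
mu_h, mu_not, prior_h = 3.0, 5.0, 0.5
lo, hi = 2.0, 4.0             # "agreement" = every estimate falls in [lo, hi]

# (1) total evidence: likelihood of the exact estimates under each hypothesis
like_h = math.prod(gauss_pdf(s, mu_h) for s in estimates)
like_n = math.prod(gauss_pdf(s, mu_not) for s in estimates)
post_total = like_h * prior_h / (like_h * prior_h + like_n * (1 - prior_h))

# (2) agreement only: probability that all estimates land in [lo, hi]
agree_h = (gauss_cdf(hi, mu_h) - gauss_cdf(lo, mu_h)) ** len(estimates)
agree_n = (gauss_cdf(hi, mu_not) - gauss_cdf(lo, mu_not)) ** len(estimates)
post_agree = agree_h * prior_h / (agree_h * prior_h + agree_n * (1 - prior_h))

print(f"posterior on h given total evidence:  {post_total:.4f}")
print(f"posterior on h given agreement alone: {post_agree:.4f}")
```

     Because the agreement-only update discards where the estimates cluster within the range, the two posteriors come apart; when that distribution makes little difference, they approximately coincide.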

  21. I’m aware of two extended treatments of these sorts of probability claims: Dethier (2023b) and Winsberg (2018). Both explicitly build this point into their account.

  22. Of course, the real question here is “what is the value of ECS?” which is plausibly a question where the answers are mutually exclusive. But the models don’t agree on that question—they deliver a range of different answers—and so the agreement heuristic cannot help us there.

  23. Notably, the IPCC doesn’t appeal to model agreement in answering this question; by all appearances, their reasoning is purely statistical.

References

  • Baumberger, C., Knutti, R., & Hirsch Hadorn, G. (2017). Building confidence in climate model projections: An analysis of inferences from fit. Wiley Interdisciplinary Reviews: Climate Change, 8(3), e454.

  • Boé, J., Somot, S., Corre, L., & Nabat, P. (2020). Large discrepancies in summer climate change over Europe as projected by global and regional climate models: Causes and consequences. Climate Dynamics, 54, 2981–3002.

  • Bovens, L., & Hartmann, S. (2003). Bayesian Epistemology. Oxford University Press.

  • Carrier, M., & Lenhard, J. (2019). Climate models: How to assess their reliability. International Studies in the Philosophy of Science, 32(2), 81–100.

  • Chandler, J. (2007). Solving the tacking problem with contrast classes. British Journal for the Philosophy of Science, 58(3), 489–502.

  • Chandler, J. (2013). Contrastive confirmation: Some competing accounts. Synthese, 190(1), 129–138.

  • Dai, A., Huang, D., Rose, B. E. J., Zhu, J., & Tian, X. (2020). Improved methods for estimating equilibrium climate sensitivity from transient warming simulations. Climate Dynamics, 54(11–12), 4515–4543.

  • Dethier, C. (forthcoming). The unity of robustness: Why agreement across model reports is just as valuable as agreement among experiments. Erkenntnis.

  • Dethier, C. (2022). When is an ensemble like a sample? “Model-based” inferences in climate modeling. Synthese, 200(52), 1–20.

  • Dethier, C. (2023a). Against “Possibilist” interpretations of climate models. Philosophy of Science, 90(5), 1417–1426.

  • Dethier, C. (2023b). Interpreting the probabilistic language in IPCC reports. Ergo, 10(8), 203–225.

  • Dong, Y., Armour, K. C., Zelinka, M. D., Proistosescu, C., Battisti, D. S., Zhou, C., & Andrews, T. (2020). Intermodel spread in the pattern effect and its contribution to climate sensitivity in CMIP5 and CMIP6 models. Journal of Climate, 33(18), 7755–7775.

  • Eyring, V., Bock, L., Lauer, A., Righi, M., Schlund, M., Andela, B., Arnone, E., Bellprat, O., Broetz, B., Caron, L.-P., Carvalhais, N., Cionni, I., Cortesi, N., Crezee, B., Davin, E. L., Davini, P., Debeire, K., de Mora, L., Deser, C., . . . Zimmermann, K. (2020). Earth system model evaluation tool (ESMValTool) v2.0 – an extended set of large-scale diagnostics for quasi-operational and comprehensive evaluation of Earth system models in CMIP. Geoscientific Model Development, 13(7), 3383–3438.

  • Frigg, R., Thompson, E., & Werndl, C. (2015). Philosophy of climate science part II: Modeling climate change. Philosophy Compass, 10(12), 965–977.

  • Gluck, S. (2023). Robustness of climate models. Philosophy of Science, 90(5), 1407–1416.

  • Gregory, J. M., Andrews, T., Ceppi, P., Mauritsen, T., & Webb, M. J. (2020). How accurately can the climate sensitivity to CO\(_2\) be estimated from historical climate change? Climate Dynamics, 54(1–2), 129–157.

  • Harris, M., & Frigg, R. (2023). Climate models and robustness analysis – part II: The justificatory challenge. In G. Pellegrino, & M. Di Paola (Eds.), Handbook of the philosophy of climate change (pp. 1–22). Springer.

  • Harris, M. (2021). The epistemic value of independent lies: False analogies and equivocations. Synthese, 199, 14577–14597.

  • IPCC (2021). Climate change 2021: The physical science basis. In V. Masson-Delmotte, P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, & B. Zhou (Eds.), Sixth assessment report of the intergovernmental panel on climate change. Cambridge University Press.

  • IPCC (2013). Climate change 2013: The physical science basis. In T. F. Stocker, D. Qin, G.-K. Plattner, M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex, & P.M. Midgley (Eds.), Fifth assessment report of the intergovernmental panel on climate change. Cambridge University Press.

  • Jebeile, J., & Barberousse, A. (2021). Model spread and progress in climate modelling. European Journal for Philosophy of Science, 11(66).

  • Joyce, J. M. (1999). The foundations of causal decision theory. Cambridge University Press.

  • Justus, J. (2012). The elusive basis of inferential robustness. Philosophy of Science, 79(5), 795–807.

  • Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.

  • Katzav, J., Thompson, E. L., Risbey, J., Stainforth, D. A., Bradley, S., & Frisch, M. (2021). On the appropriate and inappropriate uses of probability distributions in climate projections, and some alternatives. Climatic Change, 169(15), 1–20.

  • Knutti, R., Furrer, R., Tebaldi, C., Cermak, J., & Meehl, G. A. (2010). Challenges in combining projections from multiple climate models. Journal of Climate, 23(10), 2739–2758.

  • Lloyd, E. (2015a). Adaptationism and the logic of research questions. Biological Theory, 10, 343–362.

  • Lloyd, E. (2015b). Model robustness as a confirmatory virtue: The case of climate science. Studies in History and Philosophy of Science Part A, 49, 58–68.

  • Myrvold, W. (1996). Bayesianism and diverse evidence: A reply to Andrew Wayne. Philosophy of Science, 63(4), 661–665.

  • Myrvold, W. (2017). On the evidential import of unification. Philosophy of Science, 84(1), 92–114.

  • O’Loughlin, R. (2021). Robustness reasoning in climate model comparisons. Studies in History and Philosophy of Science Part A, 85, 34–43.

  • O’Loughlin, R. (2023). Diagnosing errors in climate model intercomparisons. European Journal for Philosophy of Science, 13(20), 1–29.

  • Parker, W. S. (2018). The significance of robust climate projections. In E. A. Lloyd, & E. Winsberg (Eds.), Climate modeling: Philosophical and conceptual issues (pp. 273–296). Palgrave Macmillan.

  • Parker, W. S. (2010). Whose probabilities? Predicting climate change with ensembles of models. Philosophy of Science, 77(5), 985–997.

  • Parker, W. S. (2011). When climate models agree: The significance of robust model predictions. Philosophy of Science, 78(4), 579–600.

  • Parker, W. S., & Risbey, J. S. (2015). False precision, surprise and improved uncertainty assessment. Philosophical Transactions of the Royal Society A, 373, 20140453.

  • Perrin, J. (1916). Atoms (D. L. Hammick, Trans.). D. Van Nostrand.

  • Schlosshauer, M., & Wheeler, G. (2011). Focused correlation, confirmation, and the jigsaw puzzle of variable evidence. Philosophy of Science, 78(3), 376–392.

  • Schupbach, J. (2018). Robustness analysis as explanatory reasoning. British Journal for the Philosophy of Science, 69(1), 275–300.

  • Schwingshackl, C., Davin, E. L., Hirschi, M., Sørland, S. L., Wartenburger, R., & Seneviratne, S. I. (2019). Regional climate model projections underestimate future warming due to missing plant physiological CO\(_2\) response. Environmental Research Letters, 14(11), 1–11.

  • Sherwood, S. C., Webb, M. J., Annan, J. D., Armour, K. C., Forster, P. M., Hargreaves, J. C., Hegerl, G., Klein, S. A., Marvel, K. D., Rohling, E. J., Watanabe, M., Andrews, T., Braconnot, P., Bretherton, C. S., Foster, G. L., Hausfather, Z., von der Heydt, A. S., Knutti, R., Mauritsen, T., . . . Zelinka, M. D. (2020). An assessment of Earth’s climate sensitivity using multiple lines of evidence. Reviews of Geophysics, 58(4), e2019RG000678.

  • Smith, G. E., & Seth, R. (2020). Brownian motion and molecular reality: A study in theory-mediated measurement. Oxford University Press.

  • Stainforth, D. A., Downing, T. E., Washington, R., Lopez, A., & New, M. (2007). Issues in the interpretation of climate model ensembles to inform decisions. Philosophical Transactions of the Royal Society A, 365(1857), 2163–2177.

  • Stegenga, J., & Menon, T. (2017). Robustness and independent evidence. Philosophy of Science, 84(3), 414–435.

  • Wheeler, G. (2012). Explaining the limits of Olsson’s impossibility result. Southern Journal of Philosophy, 50(1), 136–150.

  • Wheeler, G., & Scheines, R. (2013). Coherence and confirmation through causation. Mind, 122(485), 135–170.

  • Winsberg, E. (2018). Philosophy and climate science. Cambridge University Press.

  • Winsberg, E. (2021). What does robustness teach us in climate science: A re-appraisal. Synthese, 198, 5099–5122.

Acknowledgements

My thanks to Wendy Parker, who provided thorough comments on earlier drafts of this paper on two separate occasions, and to Morgan Thomson and a number of anonymous referees for comments on earlier versions.

Funding

Funding for this paper was provided in part by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project 254954344/GRK2073 and in part by the National Science Foundation under Grant No. 2042366.

Author information

Corresponding author

Correspondence to Corey Dethier.

Cite this article

Dethier, C. Contrast classes and agreement in climate modeling. Euro Jnl Phil Sci 14, 14 (2024). https://doi.org/10.1007/s13194-024-00577-6
