All analyses were conducted in R, using the packages ggplot2 (Wickham and Winston 2019), lmerTest (Kuznetsova et al. 2017) and lme4 (Bates et al. 2022).
Reported familiarity, clarity, usefulness, and frequency of use of the IPCC guidance note for assigning confidence levels
To answer our first research question, we examined experts’ reported familiarity, clarity, usefulness, and frequency of use of the IPCC guidance note for assigning confidence levels (Fig. 2; Mastrandrea et al. 2010). We found that the physical scientists were most familiar with the IPCC guidance note: 87% said they were “very familiar” or “familiar” with it, compared to 73% of the earth scientists, 66% of the life scientists, 64% of the social scientists, and 64% of the engineers (Fig. 2; Table A.3 in Appendix). The clarity and usefulness of the IPCC guidance note about confidence (Mastrandrea et al. 2010) were rated similarly by experts from all disciplines. Yet, 81% of the physical scientists indicated using the IPCC guidance note “always” or “very frequently,” compared to 69% of the earth scientists, 52% of the life scientists, 53% of the social scientists, and 50% of the engineers (Fig. 2). Thus, experts from different disciplines may have different needs for guidelines on communicating confidence about climate evidence (Swart et al. 2009).
Confidence levels assigned to evidence and agreement
To answer our second research question about how experts’ assignment of confidence levels varied with the presented levels of evidence (“low,” “medium,” “robust”) and scientific agreement (“low,” “medium,” “high”) from the IPCC guidance note (Mastrandrea et al. 2010; Fig. 1, and in “Materials and methods”), we computed multilevel linear regressions to predict experts’ assigned confidence levels from the presented level of evidence, agreement, their interaction, and experts’ type and number of disciplines. In all models that included type of discipline, earth scientists were used as the reference group because they were the largest group. All models also controlled for demographic variables, such as years since PhD and gender. In line with the IPCC guidance note (Mastrandrea et al. 2010), experts assigned higher confidence levels when presented with statements declaring more robust evidence and higher agreement (Fig. 3). Relationships between assigned confidence levels and presented levels of agreement were stronger when the evidence was more robust (evidence [low vs. medium vs. robust] × agreement [low vs. medium vs. high]: b = 0.15, SE = 0.02, 95% CI [0.11, 0.19]; Fig. 4, Table A.4A in Appendix). We also examined whether experts from different disciplines varied in their assigned confidence levels, given presented levels of evidence and agreement. We found a stronger positive relationship between the level of evidence and assigned confidence levels among life scientists compared to earth scientists (evidence × life sciences: b = 0.13, SE = 0.07, 95% CI [−0.01, 0.27]), and among engineers compared to earth scientists (evidence × engineering: b = 0.14, SE = 0.07, 95% CI [0.01, 0.27]; Table A.4B in Appendix).
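For illustration, a model of the kind described above could be specified in R with lmerTest::lmer roughly as follows; the data frame and variable names, the numeric coding of evidence and agreement, and the random-intercept structure are assumptions made for this sketch, not the study’s actual analysis code.

```r
library(lme4)
library(lmerTest)

# d: one row per expert and per presented evidence/agreement combination (assumed layout)
# confidence: assigned confidence level (treated as numeric)
# evidence:   presented evidence level, coded low < medium < robust
# agreement:  presented agreement level, coded low < medium < high
d$discipline <- relevel(factor(d$discipline), ref = "earth")  # earth scientists as reference group

m_confidence <- lmer(
  confidence ~ evidence * agreement +    # main effects and their interaction
    discipline + n_disciplines +         # type and number of disciplines
    years_since_phd + gender +           # demographic controls
    (1 | expert_id),                     # random intercept per expert
  data = d
)
summary(m_confidence)  # coefficients, SEs, and lmerTest p values
```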
We also assessed how many experts refrained from assigning a confidence level to each combination of evidence and agreement. Only 3% refrained from assigning a confidence level for “robust evidence/high agreement,” 6% refrained for “robust evidence/medium agreement” and for “medium evidence/high agreement,” and 4% refrained for “medium evidence/medium agreement.” For combinations where evidence and agreement appeared more contradictory, experts were less likely to assign a confidence level: 14% assigned no confidence level for “robust evidence/low agreement,” 12% assigned none for “low evidence/high agreement,” and 12% assigned none for “low evidence/low agreement.” We computed multilevel linear regressions to predict the experts’ likelihood of refraining from assigning a confidence level, including evidence, agreement, their interaction, and experts’ type and number of disciplines. In all models that included type of discipline, earth scientists were used as the reference group because they were the largest group. All models also controlled for demographic variables, such as years since PhD and gender. Overall, experts were less likely to assign a confidence level when the evidence was less robust and the agreement was lower (evidence × agreement: b = −0.67, SE = 0.14, 95% CI [−0.95, −0.40]; Table A.5A in Appendix). This pattern was similar across the type and number of disciplines (all p > 0.13; Table A.5B in Appendix).
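A similar sketch applies to the refraining analysis: the outcome would be an indicator of whether the expert gave no confidence level for a given combination (an assumed 0/1 coding), modelled with the same predictors as above; a logistic multilevel model (lme4::glmer with family = binomial) would be a common alternative specification.

```r
# refrained: 1 if no confidence level was assigned for the combination, else 0 (assumed coding)
m_refrain <- lmer(
  refrained ~ evidence * agreement +
    discipline + n_disciplines +
    years_since_phd + gender +
    (1 | expert_id),
  data = d
)
```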
Estimated probability intervals for combinations of likelihood terms and confidence levels, when presented out of context or in context
Our third research question concerned the experts’ estimated probability intervals in response to combinations of likelihood terms (e.g., “likely,” “very unlikely”) and confidence levels (e.g., “high confidence,” “medium confidence”), which were presented out of context or in the context of IPCC report sentences. We analyzed descriptive statistics of the lower and upper bounds of the estimated probability intervals. For all likelihood terms, the lower and upper bounds moved toward 50% when experts were presented with “medium confidence” rather than “high confidence” (Fig. 4).
We also calculated probability intervals by subtracting the experts’ lower bound estimates from their upper bound estimates (Harris et al. 2017). We assessed whether probability intervals differed with the presented likelihood terms and confidence levels, and with presentation out of context or in the context of IPCC report sentences (Fig. 4). We computed multilevel linear regressions to predict the experts’ probability intervals, including the likelihood term, confidence level, out-of- vs. in-context presentation, their interactions, and the experts’ type and number of disciplines. In all linear models with type of discipline, the earth scientists were used as the reference group because they were the largest group. All models also controlled for demographic variables, such as years since PhD and gender. Overall, probability intervals were wider in response to the term “likely” (M = 24.53, SD = 16.30) than to the term “very unlikely” (M = 16.29, SD = 14.90; likelihood term: b = −8.12, SE = 0.87, 95% CI [−9.82, −6.42]; Table A.7A in Appendix). Furthermore, probability intervals were wider when likelihood terms were combined with “medium confidence” compared to “high confidence” (confidence level: b = 3.43, SE = 0.87, 95% CI [1.72, 5.13]; Table A.7A in Appendix). The intervals were also, on average, wider for out-of-context than in-context presentations (M = 21.44, SD = 16.79 vs. M = 19.36, SD = 15.29; context: b = −1.98, SE = 0.62, 95% CI [−3.18, −0.77]; Table A.7A in Appendix; note that the medians shown in Fig. 4 differ slightly from what the models indicated). Probability intervals were slightly wider for earth scientists compared to life scientists (life scientists: b = −6.25, SE = 3.45, 95% CI [−12.93, 0.43]; Table A.7B in Appendix).
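As a sketch of this step, the interval width is simply the difference between each expert’s upper and lower bound estimates, and the model includes the three design factors, their interactions, and the discipline variables; again, the data frame and variable names are assumptions.

```r
# d_prob: one row per expert and per presented likelihood/confidence/context combination (assumed layout)
d_prob$interval_width <- d_prob$upper_bound - d_prob$lower_bound  # width of estimated probability interval

m_width <- lmer(
  interval_width ~ likelihood_term * confidence_level * context +  # design factors and their interactions
    discipline + n_disciplines +
    years_since_phd + gender +
    (1 | expert_id),
  data = d_prob
)
```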
We also examined the accuracy of the experts’ estimated probability intervals. Intervals were coded as accurate when they fell within the intervals specified in the IPCC guidance note (Mastrandrea et al. 2010; Fig. 1). We computed multilevel linear regressions to predict the accuracy of the experts’ estimated probability intervals, including the likelihood term, confidence level, context, their interactions, and the type and number of disciplines. In all linear models with type of discipline, the earth scientists were used as the reference group because they were the largest group. All models also controlled for demographic variables, such as years since PhD and gender. Fifty percent of the intervals for “likely and high confidence” fell within the interval specified in the IPCC guidance note, compared with 18% of the intervals for “likely and medium confidence,” 42% for “very unlikely and high confidence,” and 20% for “very unlikely and medium confidence” (likelihood term × confidence level: b = 1.27, SE = 0.31, 95% CI [0.66, 1.89]). Probability intervals in response to “medium confidence” were more likely to be accurate when presented in context (confidence level × context: b = 0.68, SE = 0.31, 95% CI [0.07, 1.29]; Table A.9A in Appendix). Earth scientists were more likely than social scientists to estimate probability intervals accurately (social sciences: b = −0.75, SE = 0.38, 95% CI [−1.49, −0.01]). Estimates by experts who self-identified with several disciplines were more accurate than those by experts who self-identified with one discipline (number of disciplines [one vs. several]: b = 0.87, SE = 0.36, 95% CI [0.18, 1.57]; Table A.9B in Appendix). Figure 5 displays cumulative proportions of accurate probability-interval estimates across the eight combinations, separately for experts with one vs. several disciplines (see also Fig. A.1B in Appendix).
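The accuracy coding could be sketched as follows, assuming the guidance-note ranges of 66–100% for “likely” and 0–10% for “very unlikely” as the reference intervals; the resulting 0/1 accuracy variable would then be modelled with the same multilevel structure as above.

```r
# Reference ranges per likelihood term (assumed numeric cut-offs from the guidance note)
guidance <- data.frame(
  likelihood_term = c("likely", "very unlikely"),
  ipcc_lower      = c(66, 0),
  ipcc_upper      = c(100, 10)
)

d_prob <- merge(d_prob, guidance, by = "likelihood_term")
d_prob$accurate <- as.integer(
  d_prob$lower_bound >= d_prob$ipcc_lower &
  d_prob$upper_bound <= d_prob$ipcc_upper
)  # 1 if the expert's interval falls within the guidance-note range, else 0
```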