Improving the usability of climate indicator visualizations through diagnostic design principles

Visual climate indicators have become a popular way to communicate trends in important climate phenomena. Producing accessible visualizations for a general audience is challenging, especially when many are based on graphics designed for scientists, present complex and abstract concepts, and utilize suboptimal design choices. This study tests whether diagnostic visualization guidelines can be used to identify communication shortcomings for climate indicators and to specify effective design modifications. Design guidelines were used to diagnose problems in three hard-to-understand indicators, and to create three improved modifications per indicator. Using online surveys, the efficacy of the modifications was tested in a control versus treatment setup that measured the degree to which respondents understood, found accessible, liked, and trusted the graphics. Furthermore, we assessed whether respondents’ numeracy, climate attitudes, and political party affiliation affected the impact of design improvements. Results showed that simplifying modifications had a large positive effect on understanding, ease of understanding, and liking, but not trust. Better designs improved understanding similarly for people with different degrees of numerical capacity. Moreover, while climate skepticism was associated with less positive subjective responses and greater mistrust toward climate communication, design modification improved understanding equally for people across the climate attitude and ideological spectrum. These findings point to diagnostic design guidelines as a useful tool for creating more accessible, engaging climate graphics for the public.


Introduction
Indicators and climate visualization are one approach to answer ongoing calls for increasing the accessibility of climate data and information for a wider range of users, including nonscientists (Overpeck et al. 2011). They are frequently used to communicate status, changes, and trends in climate-related phenomena to audiences with a range of technical expertise (IPCC 2014; U.S. Environmental Protection Agency 2016; Melillo et al. 2014). Unlike data visualizations, indicators are regularly updated and change is explicitly linked to a baseline (Kenney et al. 2016, 2017). When well-designed, they can effectively translate complex scientific concepts, increasing their usability in decision-making (Dilling and Lemos 2011). As a result, efforts to produce clear and effective indicators have become widespread.
Indicator design consists of two components: (i) selection of the metric to measure change and (ii) choice of appropriate visualization to effectively and efficiently communicate an indicator's key messages (Kenney et al. 2016, 2017). While it might seem that selecting the metric is more of a challenge, designing visualizations that are accessible to diverse audiences can pose significant hurdles because design choices must account for cognitive experience and capacity (Harold et al. 2016). For example, McMahon et al. (2015) demonstrated that graphics that effectively communicate to domain experts are misinterpreted by well-educated nonexpert audiences. Even highly educated people struggle with understanding basic probability and risk information (Lipkus et al. 2001), highlighting that lower numeracy skills may be a potential contributing reason for misinterpretation of climate graphics among public audiences. Such graphic choices and key messages also need to account for how the public interprets scientific uncertainty (Budescu et al. 2014; Hollin et al. 2015; Morgan et al. 2009; Shackley et al. 1996). Additionally, when creating indicators, it is important for the design to be clear as to whether the information presented is descriptive, empirical, predictive, or normative, or a combination of approaches, to clearly delineate data from interpretations of risk (Mahony et al. 2012).
The combination of these factors highlights an opportunity to be more thoughtful and evidence-based in the design and communication of climate graphics, especially those designed to be public-facing (Daron et al. 2015). Many indicators have, as their starting point, graphics designed for scientific audiences, whose goal is data exploration instead of the succinct summarization general audiences expect to see. This can impede understanding of the key messages of an indicator, even for highly educated audiences. Given that many graphics are not sufficiently accessible and interpretable, a systematic approach to climate graphic improvements is needed. And, while the understandability of indicators is often limited by visualization design choices that need to be improved, it is important to note that the complexity of underlying concepts and patterns the indicators visualize is often also problematic and warrants attention.
With these questions in mind, we conducted a pilot study of the initial 14 indicators that were part of the US Global Change Research Program, National Climate Assessment pilot-phase indicator system released in May 2015 (Kenney et al. 2016, 2017) to gauge the understandability of these indicators. We focused on key messages both to understand how to design more effective visualizations for assessment purposes and because the effectiveness of this metadata for reuse was evaluated in Wiggins et al. (2018). Metadata that conforms to FAIR principles (Wilkinson et al., 2016) is the mechanism provided for decision-makers to contextualize, recreate, or develop indicators that are meaningful for their own purposes. Study participants were asked to both interpret and draw inferences about the indicators. We found that indicators that were visually complex and represented abstract concepts were particularly challenging for participants to understand.
Based on this result, we selected a small subset of indicator visualizations to use in testing the main question of this study: whether making design modifications can improve a layperson's understanding of challenging scientific graphics. Specifically, the current study utilized a performance assessment of static climate indicator visualizations to test the effectiveness of design changes (similar to Gerst et al., 2020) in increasing the understandability of three indicators: Annual Greenhouse Gas Index (AGGI), Heating and Cooling Degree Days (HCDD), and Spatial Temperature Change (Temp). Our approach was to take existing climate indicator visualizations, diagnose them for design problems, and use the diagnoses to create design modifications for comparative assessment. The specific aspect of performance that we tested was understandability: whether a user understood the prescribed key message of a visualization. In addition, we explored respondents' subjective reactions to the graphics (perceived ease of understanding, liking, and trust toward the graphics), as well as potential moderating effects of numeracy, climate attitudes, and party affiliation.

Diagnostic design guidelines
One key aspect of this study is that, as opposed to testing different design choices for a specific predetermined element (e.g., rainbow versus other colormaps), it instead diagnoses, modifies, and tests assemblages of design changes based on diagnostic guidance. Diagnostic design guidelines have a long history in the visualization field (Tufte 2001). However, only recently have significant efforts been made to synthesize guideline insights produced through practice, which evolved from experience and intuition, with those produced in a scientific setting, which are informed by theory and experimentation. Beyond the improvements from combining diverse streams of evidence, the benefits of this synthesis are guidelines that have the ease of use of practice-based approaches and the linkages to cognition provided by science-based approaches.
For this study, we used a recent guideline of scientific visualization problems developed by Dasgupta et al. (2015), which focuses on improving the extent to which design choices match the intent of the visualization. For example, the intent of visualizations in the current study is the communication of a single key message related to a climate trend or pattern. Design problems that may cause a mismatch between intent and design choices can have five consequences: misinterpretation, inaccuracy, lack of expressiveness, inefficiency, and lack of emphasis. Misinterpretation is the most severe consequence. It occurs when a user makes an incorrect inference from an image and is often a sign of poor design choices. Inaccuracy, while serious, is less severe, and occurs when a user makes a conceptually correct but numerically inaccurate inference, for example, being able to perceive that one quantity is larger than another, but inaccurately inferring the exact relative difference.
Even if a user makes a correct and accurate inference from a visualization, other design problems can hinder effectiveness. First, a lack of expressiveness can occur if the intent of the visualization is not easily perceived. This frequently happens when less effective chart types or visual variables are chosen, directing user attention to unimportant trends or patterns. Second, inefficiency problems can occur if a user must spend more time than necessary to make a correct and accurate inference, which often happens when a design is overly cluttered or complicated. Finally, a visualization with a lack of emphasis does not include auxiliary elements, such as legends, grids, and annotations, that highlight important details. To correct for the many potential design problems, modifications of the visualization are implemented to improve these five key qualities of scientific graphics.
As shown in Table 1, the diagnostic guidelines are hierarchical. At the top-level problem stage, the distinction is made between encoding and decoding, and specific problem types are identified for each. Encoding relates to fundamental design choices that map data to visual variables: chart type, visual variable type, level of detail, and color map. Decoding encompasses how the design interacts with a user's perception and cognitive abilities. Decoding problems occur when there is visual clutter, scale or projection distortion, an overly complex visual comparison task, and ineffective auxiliary items that lead to a communication gap. For each problem type, specific problem causes are identified. For example, one cause of comparison complexity is superposition overload, which occurs when too many visual variables are superimposed on the same image. While the design might be technically appropriate, it may be too complex to enable an effective comparison of trends.
The problem causes listed in Table 1 can be used to review existing graphics for potential issues that can undermine understandability, and suggest beneficial modifications. For example, Gerst et al. (2020) used the guidelines to identify that a widely distributed weather forecasting map used the same color for two variables. This problem, called visual variable ambiguity, can lead to misinterpretation. Based on this diagnosis, the study created a modification that employed two different colors and tested its efficacy against the original; results showed a large improvement in understanding. The current study utilizes the guidelines in the same way: to identify potential problem causes, generate modifications to address these, and test whether the modifications are effective in increasing understanding.

Subjective responses to indicators
While understanding is key, it is only part of engagement with science communication. People's subjective responses, including the valence of affective reactions, play an important role in the likelihood of utilizing and sharing the material. Moreover, people respond positively to a sense of ease and fluency of interpretation, or familiarity with the material. If the participants struggle to make sense of the information, that can negatively impact their affective reactions, and lead to disengagement and dismissal. To account for these processes, we examined how positively participants felt about the indicators (assessed as liking), and their experience of accessibility in interpreting the graphic (assessed as ease of understanding).
Another key factor in how people engage with science communication is whether it is perceived as credible, salient, and legitimate (Cash et al. 2003). The absence of these factors can diminish trust, and a lack of trust toward the communication is likely to decrease engagement and uptake (Priest et al. 2003). Trust toward the content of information is also impacted by trust toward the sources and the channel of communication (Weingart & Guenther 2016), especially for politically controversial topics such as climate change (as discussed in the subsequent section). As such, we assess trust as an important metric of indicator communication success.

The audience: considerations in indicator effectiveness
In addition to challenges with correctly interpreting the graphics, understandability, subjective response, and trust may also be related to the capacities and attitudes of the viewers. The first relevant audience characteristic for interpreting indicators is the capacity to understand and work with numbers, known as numeracy. As mentioned earlier, a lack of basic numerical capacity, such as understanding probabilities and trends (Lipkus et al. 2001), can pose a considerable barrier to understanding scientific graphics. It is important to assess what levels of numeracy are needed to understand climate indicators, and whether the simplifications offered in the modifications help to close the numeracy gap. We also wanted to ascertain whether audiences who are most likely to use the indicators in decision-making, such as public planners or administrators, are able to interpret the information provided. We therefore focused our sample on participants who reported having some higher education.
The next important consideration is the attitudes and beliefs of the viewers toward the topic being addressed in the indicator. Specifically, climate change has become a highly polarized topic in the USA and other developed countries, with a willingness to accept the reality of anthropogenic climate change sharply split along ideological and political party lines (Dunlap et al. 2016; McCright et al. 2016). Increasing evidence of the problem, its causes, and consequences has not shifted beliefs and attitudes about climate change toward acknowledgment and resolve to take ameliorative action. Rather, a concerted and well-funded campaign to undermine trust in climate science, political leadership that actively denies the problem, and media messaging disparaging the issue and undermining resolve to implement solutions (Oreskes & Conway 2010) have effectively turned it from a question of science to one of identity and group membership (Kahan et al. 2011). People's desire to protect existing systems and their established way of life has led to ongoing unwillingness to engage with climate change and the risk it poses to themselves and society as a whole (Feygina et al. 2010).
For these reasons, accessible communication about climate change is particularly important. Yet, the underlying ideological polarization and lack of trust may impede understanding and acceptance through motivated cognition processes (Hennes et al. 2016). As such, it is important to examine whether improving visual climate change indicator design is effective in increasing understanding for people who are more skeptical of climate change, and across the ideological spectrum, to ensure that they are reaching diverse audiences. It is also important to examine how the differences in climate attitudes and in political affiliation are related to subjective responses to and trust toward climate indicators, and whether these responses are impacted by design improvements.

Research questions
In sum, our study aims to (i) empirically test the effectiveness of diagnostic design principles in identifying and guiding improvement in understandability and subjective responses to climate indicator graphics and (ii) explore whether the effectiveness of these modifications is impacted by people's numerical capacity, as well as prior attitudes toward climate change and political identities. Specifically, we set out the following research questions:
RQ1. Do modifications based on diagnostic principles lead to improved understanding of the indicators? What types of modifications are most effective?
RQ2. Do modifications based on diagnostic guidelines improve subjective response to the indicators (perceived ease of understanding and liking) and trust?
RQ3. Does the effectiveness of design modifications in improving understanding and subjective response differ based on individuals' numerical capacity?
RQ4. Are climate attitudes related to understanding and subjective response to indicators? Do climate attitudes affect/impede any improvements gained from design modifications?
RQ5. Is political party affiliation related to understanding and subjective response to indicators? Does political party affiliation affect/impede any improvements gained from design modifications?

Participant recruitment
Participants were recruited through the online survey company ROI Rocket, which maintains a verified database of potential respondents, and were paid for their time. The survey was distributed by ROI Rocket in June 2017. A total of 817 surveys were completed, and 79 were dropped due to excessively short completion time, indicating incomplete attention. The sample size was designed to detect a 10-15 percentage point change in understandability. We oversampled for participants who completed at least some higher education.
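The sample-size reasoning above can be illustrated with a normal-approximation calculation for comparing two proportions. This is a minimal sketch, not the authors' power analysis: the 50% baseline rate, the two-sided 5% significance level, the 80% power target, and the function name `n_per_group` are all assumptions introduced here for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate respondents needed per arm for a two-sided
    two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control)
                        + p_treatment * (1 - p_treatment))
    ) ** 2
    return ceil(numerator / (p_treatment - p_control) ** 2)


# Surveys retained after dropping the excessively short completions (from the text).
retained = 817 - 79

# Assumed 50% baseline; gains of 10 and 15 percentage points.
for gain in (0.10, 0.15):
    print(f"gain={gain:.2f}: about {n_per_group(0.50, 0.50 + gain)} per group")
```

Smaller detectable differences require sharply larger groups, which is why the target effect size is usually stated alongside the sample size.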

Experimental protocol and materials
Participants completed the experimental survey online through a web browser of their choice. The survey consisted of four blocks of questions: (i) numeracy assessment; (ii) indicator understandability and subjective responses; (iii) demographics and political ideology; and (iv) climate change attitudes (see Online Resource 2 for survey).
First, we assessed numeracy using a two-part 10-item scale (Lipkus et al. 2001) that tests for the capacity to understand probabilities and basic mathematical concepts, using items such as: "If the chance of getting a disease is 10%, how many people would be expected to get the disease out of 1000?" In our analysis, we used the first part of the scale, which consists of three items, and is the most validated and widely used measure of numeracy.
Next, participants were shown three indicator visualizations. They saw either the original version of the indicator or one of the modifications designed by the authors for each indicator.
The following section describes in detail the creation of the modified indicators via the application of diagnostic guidelines. Participants were randomly assigned to visualization design conditions for each indicator, and the presentation order was randomized to counteract learning and carry-over effects.
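The assignment procedure described above can be sketched as follows. This is an illustrative sketch only: the survey platform's actual randomization mechanism is not described in the text, so simple independent random assignment is assumed, and the seed and function name are hypothetical.

```python
import random

# Condition labels follow the text: one original plus three modifications per indicator.
CONDITIONS = {
    "AGGI": ["original", "AGGI 1", "AGGI 2", "AGGI 3"],
    "HCDD": ["original", "HCDD 1", "HCDD 2", "HCDD 3"],
    "Temp": ["original", "Temp 1", "Temp 2", "Temp 3"],
}


def assign_participant(rng):
    """Return (indicator, condition) pairs in randomized presentation order."""
    order = list(CONDITIONS)
    rng.shuffle(order)  # randomize which indicator is shown first, second, third
    # Draw one design condition per indicator, independently of the others.
    return [(indicator, rng.choice(CONDITIONS[indicator])) for indicator in order]


rng = random.Random(2017)  # hypothetical seed, only to make the sketch reproducible
plan = assign_participant(rng)
print(plan)
```

Because the condition is drawn separately for each indicator, a participant may see the original version of one indicator and a modified version of another.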
Participants completed a series of questions after seeing each indicator. We assessed understanding using a three-answer multiple choice question that tested comprehension of the visualization's trends, patterns, and overall meaning. We then assessed subjective responses to the indicator with two items (alpha = 0.85): "How easy was it for you to understand the presentation of the data in the graphic?" (ease of understanding, from 1 to 5 stars) and "How much do you like the graphic?" (liking, from 1 to 5 stars). We also asked: "How much trust do you have in this indicator?" (trust, from 1 = high level of trust to 4 = no trust).
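The reliability coefficient reported for the two subjective-response items can be computed as Cronbach's alpha. The sketch below shows the standard formula; the ratings are hypothetical values invented for illustration and are not study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of ratings
    with one entry per respondent (same respondent order in every item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum score
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical 1-5 star ratings from six respondents (NOT study data).
ease = [5, 4, 2, 3, 1, 4]
liking = [4, 4, 2, 3, 2, 5]
alpha = cronbach_alpha([ease, liking])
print(round(alpha, 2))
```

Values near 1 indicate that the items move together closely enough to be treated as measuring a shared construct.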
Climate attitudes were assessed using the Six Americas 15-item questionnaire (Maibach et al. 2011), which asks about acceptance of anthropogenic climate change, worry about its impacts to themselves and their communities, its importance to themselves and friends, and support for personal and collective mitigation efforts. These questions were administered after indicator assessments to prevent carry-over or biasing effects. Similar to the original Six Americas study, we used latent class analysis (poLCA package) to group the respondents into clusters.
Demographic measures included gender, ethnicity, state of residence, combined household income, number of people in the household, highest level of education, and area of study. Ideology measures included political party identification, and rating on a liberal-conservative spectrum with respect to political orientation, social and cultural issues, and economic issues.

Indicator design modifications
Based on our pilot study (Kenney et al. 2017), and in line with reasoning reviewed above, we chose three visualizations that contain significant design problems that could potentially be improved through design changes: AGGI (Fig. 1), HCDD (Fig. 2), and Temp (Fig. 3). The authors categorized all design problems in the original image, decided which problems are modifiable, and then used diagnostic guidelines to create three modified treatment images per visualization.

Annual Greenhouse Gas Index
The AGGI indicator aims to communicate (as described by the USGCRP indicators website for this and other indicators) that the warming influence of all atmospheric greenhouse gas emissions has increased by 34% from 1990 to 2013. There are several problems in the original visualization (Fig. 1a) which may obscure this message. The level of detail may be too granular: the stacked bars show the contribution of individual GHGs, which is unnecessary for extracting the key message. The chart type may also be inappropriate: the graph has two axes with different units, which are unnecessary for understanding the key message. There is also a communication gap: the annotation specifying that AGGI increased from 1.00 to 1.34 between 1990 and 2013 is technically correct, but requires a user to know that a change in an index can be converted to a percentage change. Additional problems are that the graph starts in 1979 whereas the key message references 1990, and that it uses an index whereas the key message is about percentage change.
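The index-to-percentage conversion that the annotation leaves implicit is a one-line calculation. The sketch below uses only the figures given in the text (an index of 1.00 in 1990 and 1.34 in 2013); the function name is introduced here for illustration.

```python
def index_to_percent_change(index_value, baseline=1.0):
    """Percent change implied by an index value relative to its baseline."""
    return (index_value - baseline) / baseline * 100


# AGGI is set to 1.00 in 1990 and reached 1.34 in 2013 (figures from the text).
change = index_to_percent_change(1.34)
print(f"{change:.0f}% increase since 1990")
```

A reader who skips this step can easily read 1.34 as "1.34%", so stating the percentage directly removes one inference from the viewer's task.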
Addressing these problems is constrained by climate subject matter expert conventions for displaying the warming influence of GHGs, and with AGGI in particular, which require that the graph start at the beginning of increased data availability (1979), use the AGGI index (rather than an alternative), and have an AGGI value of one in 1990, representing a 100% increase since 1750 (Butler & Montzka 2020). Given these constraints, we focused design modifications on problems associated with the use of two axes and stacked bars. Modification AGGI 2 (Fig. 1c) removed the radiative forcing axis and moved the AGGI axis from right to left, yielding a one axis graph. Modification AGGI 1 (Fig. 1b) aggregated the stacked bars mapped to individual GHGs into a single bar representing total GHG contribution to warming. In modification AGGI 3 (Fig. 1d), both changes were used in order to test their combined effect.

Heating and Cooling Degree Days
For HCDD, the intended key message is that from 1979 to 2013 the number of heating degree days (HDD) decreased while cooling degree days (CDD) increased. We hypothesized that the primary design problem of the original HCDD visualization (Fig. 2a) is that the trends in HDD and CDD are not easily perceived, and this is exacerbated by having both time series on the same graph. This is known as superposition overload, which is a comparison complexity problem. Two options exist to fix this problem. One is simply to put each time series in its own graph (HCDD 3, Fig. 2d-e). The other is to treat the problem as a communication gap that can be addressed by annotation: adding either trendlines (HCDD 2, Fig. 2c) or a descriptive title (HCDD 1, Fig. 2b). Much like AGGI, the HCDD measure itself is likely difficult to understand, but changing the measure is outside the scope of visualization design.

Temperature (Temp)
The key message of the temperature map is that all regions in the USA have experienced an increase in average temperatures, with the Northeast showing the greatest increase and the Southeast showing the smallest increase. The original visualization (Fig. 3a) contains both temporal and spatial problems that make extracting the key message challenging. First, the decadal time series plots are difficult to use for gauging relative regional long-term trends and are an example of chart mismatch. Further, the map contains a level of detail that impedes spatial averaging needed to compare regions, and regional-level average changes in temperature are not annotated.
To address the graph mismatch and communication gap problems, modifications Temp 1 (Fig. 3b) and Temp 3 (Fig. 3d) remove the time series bar graphs and replace them with annotations indicating the amount of regional temperature change. To reduce the extent of detail, modifications Temp 2 (Fig. 3c) and Temp 3 apply the same high-resolution color map to the region level, aggregating lower-level detail. However, Temp 2 retains the time series graphs and has no additional annotation.
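The effect of aggregating to the region level can be sketched with a simple unweighted average. The cell values below are hypothetical and are not taken from the indicator; a real regional average would also typically weight cells by area, which this sketch omits.

```python
from statistics import mean

# Hypothetical temperature changes (degrees F) for map cells; NOT indicator data.
cells = [
    ("Southeast", 0.6), ("Southeast", -0.3), ("Southeast", 0.4), ("Southeast", 0.5),
    ("Northeast", 1.9), ("Northeast", 2.1), ("Northeast", 2.0),
]


def regional_means(cells):
    """Average cell-level changes within each region (unweighted)."""
    by_region = {}
    for region, change in cells:
        by_region.setdefault(region, []).append(change)
    return {region: mean(values) for region, values in by_region.items()}


regional = regional_means(cells)
for region, value in regional.items():
    print(f"{region}: {value:+.2f}")
```

In this made-up example one Southeast cell cools, yet the regional mean is positive, so a region-level map shows a single warming value where the detailed map shows a mixed pattern.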

Data analytic plan
To address our research questions, we used the R statistical computing platform to test the effect of visualization modifications on understanding, subjective response, and trust toward the indicators, and to examine the moderating role of numeracy, climate attitudes, and political party affiliation on that effect. Separate regression models were calculated for each outcome variable, with design modifications as the main predictor and moderating variables entered as interaction terms (see Tables S4-S6 in Online Resource 1). For understandability, which is a binary variable, a logistic regression model was used, fit with the glm function from the stats package (Table S4). All understandability effects are reported as odds ratios along with 95% confidence intervals (CI) and p-values. For subjective response (Table S5) and trust (Table S6) variables, linear regression models were used. Coefficients were fit with the lm function from the stats package and heteroscedasticity-corrected covariance matrices were estimated using the hccm function from the car package. Climate attitudes and political party affiliation are treated as categorical variables, with respondents with very accepting climate attitudes and Democrat affiliation coded as the base comparison group in the regression models. Comparisons in the text that describe changes from a different base group (e.g., Republicans relative to Independents) are noted.
Fig. 3 Temp indicator visualizations: a top-left, original (control) image; b top-right, Temp 1, remove regional time series and replace with average change annotations; c bottom-left, Temp 2, replace high spatial resolution with regional average; d bottom-right, Temp 3, remove regional time series and replace with average change annotations and replace high spatial resolution with regional average
Direct and moderating effects are estimated by ANOVA, using a likelihood ratio test for understandability and F-tests for subjective response and trust. The resulting χ² or F statistics for discussed effects are reported in the text along with p-values. All ANOVA results are listed in Table S3 of Online Resource 1. As the regression models contain multiple categorical variables and interaction terms, some post-estimation analysis was necessary to clearly report the model results. Numeracy main effects were estimated from the fit models using average marginal effects estimation from the margins package. The emmeans package, which implements least squares means and Tukey or Dunnett comparisons, was used to estimate linear contrasts among modifications and moderators as well as to estimate main effects from the saturated regression models (Tables S4-S6 in Online Resource 1).
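To make the odds-ratio reporting concrete, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2x2 table. For a logistic regression with a single binary predictor, the exponentiated coefficient equals this 2x2 odds ratio; the authors' models additionally include moderators and interaction terms and were fit in R, so this Python sketch is a simplification. The 73% and 39% rates echo the Temp figures reported later in the text, but the group size of 100 is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(correct_treat, n_treat, correct_ctrl, n_ctrl, level=0.95):
    """Odds ratio of a correct answer (treatment vs control) with a Wald interval."""
    a, b = correct_treat, n_treat - correct_treat   # treatment: correct, incorrect
    c, d = correct_ctrl, n_ctrl - correct_ctrl      # control: correct, incorrect
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)        # standard error of the log odds ratio
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return exp(log_or), exp(log_or - z * se), exp(log_or + z * se)


# Hypothetical group sizes (100 per arm); rates of 73% and 39% taken from the text.
or_est, lower, upper = odds_ratio_ci(73, 100, 39, 100)
print(f"OR = {or_est:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes 1 corresponds to a difference in odds that is distinguishable from no effect at the chosen confidence level.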
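The likelihood ratio test used for the binary understandability outcome can be illustrated in its simplest form: comparing a model with one proportion per group against a model with a single pooled proportion. This is a sketch under strong simplifying assumptions (two groups, no moderators, 1 degree of freedom), not a reproduction of the authors' ANOVA tables; the counts are hypothetical.

```python
from math import erfc, log, sqrt


def binom_loglik(successes, n):
    """Maximized Bernoulli log-likelihood (requires 0 < successes < n)."""
    p = successes / n
    return successes * log(p) + (n - successes) * log(1 - p)


def lr_test_two_groups(correct_a, n_a, correct_b, n_b):
    """Likelihood ratio chi-square (1 df): separate group rates vs one pooled rate."""
    ll_full = binom_loglik(correct_a, n_a) + binom_loglik(correct_b, n_b)
    ll_null = binom_loglik(correct_a + correct_b, n_a + n_b)
    stat = max(0.0, 2 * (ll_full - ll_null))  # guard against tiny negative rounding
    p_value = erfc(sqrt(stat / 2))            # chi-square survival function for 1 df
    return stat, p_value


# Hypothetical counts (NOT study data): 73 of 100 correct vs 39 of 100 correct.
stat, p_value = lr_test_two_groups(73, 100, 39, 100)
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

With more than two conditions or with interaction terms, the same idea applies but the degrees of freedom and the reference distribution change accordingly.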

Demographics and variable distribution
The respondents in our sample were 52% female and 48% male (none chose "other"). The ethnic breakdown was 83% White, 3% Latino, 6% Black, 6% Asian, 0.5% Native American, and 1% "other." For party affiliation, participants identified as 41% Democrat, 28% Independent, and 29% Republican. The two most prevalent education levels were Bachelor's degree (54%) and Master's degree (27%). The most common household income was $50,000-75,000 and the most common household size was two people. A complete distribution of demographics is in Tables S1 and S2 of Online Resource 1.
We tested for imbalances in demographics and moderators across the control and treatments to ensure randomization was effective. There was a slight imbalance in party affiliation distribution for AGGI (p = 0.04) and climate attitudes for Temp (p = 0.03). We used propensity scores to create balancing weights under two scenarios: (1) using the imbalanced moderator and (2) with all moderators. Fitting the regression models with either set of weights did not change the findings.
Overall, the original versions of each indicator visualization scored relatively poorly on understandability, with 45%, 54%, and 39% of respondents correctly identifying the key message of the AGGI, HCDD, and Temp indicators, respectively (Fig. 4). The best improvement in understandability for each indicator, in percentage points, was an increase of 17% (p = 0.003), 13% (p = 0.04), and 34% (p < 0.001) for AGGI, HCDD, and Temp, respectively. The increase for the Temp indicator was especially dramatic because its original visualization had the lowest understandability of the three indicators, while its best modification had the highest understandability (73%) of the modifications across all the indicators.

AGGI indicator
For AGGI, the key message (answer 1) is that the warming influence of all greenhouse gas emissions increased by 34% from 1990 to 2013. Answer 2 stated that warming influence had increased by 34% between 1979 and 2013, incorrectly identifying the start year, and answer 3 stated that warming influence has increased by 1.34%, incorrectly interpreting AGGI as a percent increase instead of an index centered at one.
The most frequent incorrect answer for the original AGGI visualization was answer 3. Removing only the stacked bars (AGGI 1) led to slight decreases in respondents choosing answers 2 and 3, resulting in an overall increase of understandability of 7% (p = 0.40). The limited improvement may be due to the key AGGI metric still being specified on the secondary vertical axis. Keeping the stacked bars but specifying only one axis (AGGI 2) led to a much larger reduction in answer 3 and an increase in understandability of 13% (p = 0.04). Finally, both removing the stacked bars and specifying only one axis (AGGI 3) led to an increase in understandability of 17% (p = 0.003).
Overall, while the modifications led to significant increases in understanding, there appears to be room for further improvement, for example by altering the key message to match the visualization's start date, changing the start year on the graph, or directly specifying percent on the graph as opposed to an index.

HCDD indicator
For HCDD, the key message (answer 1) is that from 1979 to 2013 heating degree days decreased while cooling degree days increased. The incorrect answer choices were heating degree days increased while cooling degree days decreased (answer 2) and heating degree days and cooling degree days both increased (answer 3). For the original visualization, the majority of incorrect answers were answer 3. Adding a title (HCDD 1) or trendlines (HCDD 2) did not improve this misinterpretation. Only separating each time series into a separate graph (HCDD 3) led to significant improvements in understandability (13%, p = 0.04). However, the reduction in misinterpreting the visualization as showing increases in both heating and cooling degree days (answer 3) was slightly offset by a fraction of respondents interpreting the visualization as having opposite trends to those shown (answer 2). This may be due to the relatively confusing definition of degree days, which links hot and cold in ways that are opposite of climate change: hotter summers = increased cooling degree days because of air conditioning use, and less cold winters = decreased heating degree days because less building heating is needed. When this definition was highlighted by adding a title that emphasizes the climate change implications of the indicator (HCDD 1), "Warming Climate Leads to Milder Winters and Hotter Summers," the confusion appeared to be exacerbated. In this modification, selection of answer 2 increased dramatically, leading to an overall lower understandability than the original visualization (decrease of 12%, p = 0.06). This suggests that further improvement in the understandability of HCDD might require redefining the indicator measure.
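The counterintuitive direction of the degree-day measures can be seen in a short calculation. This is a sketch under stated assumptions: the 65 degrees F base temperature is a common convention but is not given in the text, and the daily temperatures are hypothetical.

```python
BASE_TEMP_F = 65.0  # assumed base temperature (common convention; not stated in the text)


def degree_days(daily_mean_temps_f, base=BASE_TEMP_F):
    """Return (heating, cooling) degree days summed over daily mean temperatures."""
    hdd = sum(max(0.0, base - t) for t in daily_mean_temps_f)  # demand for heating
    cdd = sum(max(0.0, t - base) for t in daily_mean_temps_f)  # demand for cooling
    return hdd, cdd


# Hypothetical daily means (degrees F): two winter days and two summer days.
baseline = [30.0, 40.0, 80.0, 85.0]
warmer = [t + 2.0 for t in baseline]  # uniform 2-degree warming, for illustration

hdd_before, cdd_before = degree_days(baseline)
hdd_after, cdd_after = degree_days(warmer)
print(f"HDD: {hdd_before} -> {hdd_after}")
print(f"CDD: {cdd_before} -> {cdd_after}")
```

Under warming, the "heating" measure falls and the "cooling" measure rises, which is the reverse of what the words suggest at first glance and may help explain the pattern of answer 2 selections.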

Temperature indicator
The key message for Temp (answer 1) was that all regions in the USA have experienced an increase in average temperatures, with the Northeast showing the greatest increase and the Southeast showing the smallest increase. The first incorrect answer stated that all regions except for the Southeast have increased average temperatures (answer 2), and the second was the same as answer 1 but identified the Great Plains North, instead of the Northeast, as having the greatest increase in average temperature (answer 3). The original visualization performed poorly because it is difficult to estimate if the temperature in the Southeast has increased or decreased, and it is not clear which region has experienced the greatest increase. A similar percentage of respondents chose answers 2 and 3, indicating the spatial averaging problems are equivalent in difficulty. An additional problem is that the time series graphs are a distraction because their decade-level resolution is not necessary to extract the key message.
Removing the time series graphs and including annotation indicating regional average temperature change (Temp 1) led to a large drop in mistaking the Great Plains North as experiencing the greatest increase in average temperature, but no change in misinterpreting the Southeast as experiencing decreasing average temperature, resulting in overall improved understandability (15%, p = 0.01). Removing the fine spatial map resolution and aggregating by region (Temp 2) led to a large drop in misinterpreting the trend in the Southeast, because the small regions of cooling in Arkansas, Mississippi, and Alabama were not visible at the regional level of aggregation. However, regional average changes were not annotated and only indicated by a coarse color scale and the time series graphs. As a result, respondents still had difficulties identifying the region with the greatest increase in average temperature, and did not show improved understandability (5% increase, p = 0.72).
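Why regional aggregation can hide local cooling is easy to see in a minimal sketch. The grid-cell changes and areas below are made up for illustration; only the mechanism, an area-weighted regional mean that stays positive even though one cell cooled, reflects the text.

```python
def area_weighted_mean(changes, areas):
    """Area-weighted average of per-cell temperature changes."""
    total_area = sum(areas)
    return sum(c * a for c, a in zip(changes, areas)) / total_area


# Hypothetical temperature changes (deg F) for four grid cells in one region,
# with hypothetical relative areas. One cell shows slight cooling.
cell_changes = [0.5, 0.25, -0.25, 0.75]
cell_areas = [2.0, 1.0, 1.0, 4.0]

regional_mean = area_weighted_mean(cell_changes, cell_areas)
has_cooling_cell = any(c < 0 for c in cell_changes)

print(regional_mean)     # 0.5
print(has_cooling_cell)  # True
```

A map drawn at the regional level would show only the single positive value, so a viewer no longer sees the cooling cell that could suggest the whole region is cooling.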
Combining the modifications of simplifying temporal and spatial change (Temp 3) led to a dramatic increase in understanding (34%, p < 0.001). Here, it is much easier for respondents to see that the Southeast is warming as opposed to cooling and to identify the Northeast as having the greatest increase in average temperature.
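As a rough illustration of how a control versus treatment difference in the share of correct answers can be tested, the sketch below uses a pooled two-proportion z-test. This is a simpler stand-in, not the logistic regression model used in the study, and the counts are hypothetical.

```python
import math


def two_proportion_z_test(correct_control, n_control, correct_treatment, n_treatment):
    """Return (difference in proportions, z statistic, two-sided p-value)."""
    p_control = correct_control / n_control
    p_treatment = correct_treatment / n_treatment
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (correct_control + correct_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_treatment - p_control, z, p_value


# Hypothetical counts: 40 of 100 correct for the original graphic,
# 60 of 100 correct for a modified graphic.
diff, z, p_value = two_proportion_z_test(40, 100, 60, 100)
print(round(diff, 2), round(z, 2), round(p_value, 4))
```

A regression model, as used in the study, additionally allows covariates such as numeracy or climate attitudes to be included, which a plain two-group test does not.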

Efficacy of simplifying and annotating modifications
These findings provide evidence that simplification is an effective means to increase understanding, and can be particularly impactful when combined with annotation that conveys the key message. While not all partial simplifying modifications led to significant improvements (e.g., AGGI 1 and Temp 2), the best modifications for the AGGI and HCDD indicators implemented simplifications without any additional annotation. None of the modifications that added annotation alone improved understandability. However, the two best-performing modifications for the Temp indicator used both simplification and annotation.

Trust
Overall, participants expressed modest trust for the indicators: M = 2.86 for AGGI, M = 2.66 for HCDD, and M = 2.83 for Temp. We did not see improvements in trust in response to indicator modification. As with understanding and subjective response, the HCDD 1 modification, which added a title without simplifying the graphics and may have led to confusion, resulted in a 0.33 decrease (M = 2.33, p = 0.002) in trust.

Does the effectiveness of design modifications in improving understanding and subjective responses differ based on individuals' numerical capacity? (RQ3)
For the numeracy measure, participants most frequently got 2 out of a total of 3 questions correct, and the distribution showed a left skew, with only 15% of respondents answering no questions correctly. For Sections 3.4-3.6, ANOVA results are shown in Table S3 of Online Resource 1. Odds ratios and confidence intervals from the understandability logistic regression model are listed in Table S4, while coefficients from the subjective response and trust linear regression models are in Tables S5 and S6, respectively.
These findings show that, overall, the indicators were more accessible to those with greater levels of numerical capacity, and that modifications to improve understandability were not more effective for those with lower (or greater) numerical ability.

Subjective response
Numeracy had weak direct relationships with subjective response for AGGI and Temp, and a moderating role for HCDD (F(3, 692) = 5.47, p = 0.001). A one-point increase on the numeracy scale was related to a 0.10 (p = 0.05) and 0.08 (p = 0.06) increase in subjective response for AGGI and Temp, respectively. The HCDD interaction showed that for the HCDD 1 modification, a respondent with low numeracy had no change in subjective response compared to the original, while a respondent with high numeracy had a decrease in subjective response of 0.35 (p = 0.003). Note that the HCDD 1 modification led to a large drop in understanding, so those with higher numeracy correctly perceived the graphic's difficulty.

Trust
Numeracy did not have a direct relationship with trust, but did exhibit moderating relationships for AGGI (F(3, 692) = 4.17, p = 0.006) and Temp (F(3, 692) = 3.65, p = 0.01). For the AGGI 1 modification, compared to the original, respondents with low numeracy increased their trust of the indicator by 0.47 (p = 0.06), while those with high numeracy decreased their trust by 0.26 (p = 0.21). For the Temp 1 modification, compared to the original, respondents with low numeracy experienced an increase in trust of 0.46 (p = 0.02), while those with high numeracy showed no significant change in trust.
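A moderating relationship means that the effect of a modification depends on the respondent's numeracy. The sketch below assumes a linear model with a modification-by-numeracy interaction term and a 0 to 3 numeracy score (from the three-question measure). The coefficient values are hypothetical, chosen only to mimic the sign pattern described above; they are not the study's estimates, and the study's definition of "low" and "high" numeracy may differ.

```python
def conditional_effect(b_modification, b_interaction, numeracy):
    """Effect of the modification on trust at a given numeracy score.

    Assumed model:
    trust = b0 + b_modification*mod + b_numeracy*num + b_interaction*mod*num
    """
    return b_modification + b_interaction * numeracy


# Hypothetical coefficients (not taken from the study).
B_MODIFICATION = 0.5   # effect of the modification at numeracy = 0
B_INTERACTION = -0.25  # change in that effect per additional numeracy point

effects = {n: conditional_effect(B_MODIFICATION, B_INTERACTION, n) for n in range(4)}
print(effects)  # {0: 0.5, 1: 0.25, 2: 0.0, 3: -0.25}
```

With these illustrative values, the modification raises trust at the lowest numeracy score and slightly lowers it at the highest, which is the kind of crossover a significant interaction term can indicate.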
In sum, numeracy was strongly related to better understanding of the indicators. Improving the designs did not close the gaps between those with different numerical ability in correctly identifying the key messages of the indicators. Numeracy was also weakly related to more positive subjective response. Interestingly, participants with less, but not more, numeracy found some modifications more trustworthy, suggesting that improved clarity can contribute to trust for those who have less capacity to interpret complex scientific graphics.

3.5 Are climate attitudes related to understanding and subjective response to the indicators? Do climate attitudes affect/impede any improvements gained from design modifications? (RQ4)
To assess the role of climate attitudes, we clustered participants into groups that share similar climate attitudes, in line with the Six Americas approach (Maibach et al. 2011). While the original study found that six clusters best fit the data (Maibach et al. 2011), our survey results yielded four clusters. From most to least accepting climate attitudes, we labeled the groups very accepting (29.7%), mostly accepting (32.7%), mostly skeptical (21.1%), and very skeptical (16.5%). The key difference between the groups was the degree to which they reported accepting or skeptical climate change attitudes, and were concerned about its implications. The percent of each group that accepted that climate change is happening and mostly caused by humans was 95%, 75%, 40%, and 2%, respectively. For worry, 73% of very skeptical respondents were not at all worried about climate change, as compared to only 6% of mostly skeptical respondents (most of this group were a little worried). Similarly, 79% of very accepting respondents were very worried about climate change as compared to only 13% of mostly accepting respondents (most of this group were somewhat worried).

Understandability
For HCDD and Temp indicators, there were no significant direct or moderating relationships between climate acceptance and understanding of the indicators. However, for AGGI, we observed a weak direct relationship (χ²(3, 720) = 6.94, p = 0.07). Compared to respondents with very accepting climate attitudes, less accepting respondents had lower odds of understanding the indicator by a factor of 0.82 (CI: 0.50-1.34, p = 0.64) for mostly accepting, 0.55 (CI: 0.30-1.00, p = 0.05) for mostly skeptical, and 0.48 (CI: 0.24-0.96, p = 0.03) for very skeptical respondents.
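Odds ratios of this kind are commonly obtained by exponentiating a logistic regression coefficient, with a confidence interval from exponentiating the Wald interval around it. The sketch below shows that conversion and what an odds ratio implies for the probability of a correct answer. The coefficient, standard error, and reference probability are hypothetical, and the study may have computed its intervals differently.

```python
import math


def odds_ratio_with_ci(coef, std_err, z_crit=1.96):
    """Return (odds ratio, lower bound, upper bound) for a logistic coefficient."""
    return (
        math.exp(coef),
        math.exp(coef - z_crit * std_err),
        math.exp(coef + z_crit * std_err),
    )


def apply_odds_ratio(prob_reference, odds_ratio):
    """Probability in the comparison group implied by a reference probability."""
    odds = prob_reference / (1 - prob_reference) * odds_ratio
    return odds / (1 + odds)


# Hypothetical coefficient corresponding to an odds ratio of 0.5.
or_est, ci_low, ci_high = odds_ratio_with_ci(math.log(0.5), 0.3)
# If the reference group answers correctly 50% of the time (hypothetical),
# an odds ratio of 0.5 implies roughly a one-in-three chance in the other group.
prob_comparison = apply_odds_ratio(0.5, or_est)

print(round(or_est, 2), round(ci_low, 2), round(ci_high, 2))
print(round(prob_comparison, 3))
```

An interval that excludes 1, as in this hypothetical case, corresponds to a difference in odds that is significant at the 5% level.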
In summary, climate acceptance was related to greater understanding of the climate visualizations for AGGI, but not for HCDD or Temp. At the same time, subjective reactions to all the indicators and their modifications were related to climate acceptance. Less accepting (more skeptical) respondents reported more negative subjective responses to, and less trust of, the visualizations.

3.6 Is party affiliation related to understanding and subjective response to indicators? Does party affiliation affect/impede any improvements gained from design modifications? (RQ5)
For party affiliation, participants identified as 42% Democrat, 29% Independent, and 30% Republican.

Subjective response
The relationship of party affiliation with subjective response echoed that of understanding, with a direct effect (F(2, 692) = 8.11, p < 0.001) for AGGI, a weak moderation for HCDD (F(6, 692) = 1.90, p = 0.08), and a direct relationship for Temp (F(2, 692) = 3.86, p = 0.02). For AGGI, both Democrats and Independents had lower subjective responses than Republicans by 0.38 (p = 0.02) and 0.52 (p < 0.001), respectively. For HCDD original, Independents had the lowest subjective response, 0.70 (p = 0.05) lower than Republicans. For HCDD 3, Independents improved relative to Democrats by 1.02 (p = 0.01) compared to the original. As a result, there were no differences in subjective response among party affiliations for HCDD modifications. For Temp, Republicans had a higher subjective response (0.31, p = 0.02) than Independents, while differences between Democrats and Independents or Republicans were not significant.

Trust
Party affiliation showed no direct or moderating relationships for AGGI and HCDD, but we did observe a weak moderating relationship (F(6, 692) = 1.71, p = 0.12) for Temp. In the original and Temp 2, Independents had lower levels of trust relative to Democrats (by 0.27, p = 0.19, and 0.34, p = 0.07, respectively). However, for Temp 3, trust among party affiliations was not significantly different. Compared to the original, this was an improvement in trust for Independents of 0.53 (p = 0.02) relative to Democrats.
In summary, party affiliation had stronger relationships with subjective response than understanding. In particular, Independents consistently trusted the graphics less than Democrats. However, for the AGGI original graphic, Republicans had a much higher understanding and positive subjective response than Democrats. For the AGGI modifications, Democrats increased in understanding and subjective response at a greater rate, erasing differences among political party groups for the modifications.

Summary of findings
Overall, we found that modifications based on diagnostic design guidelines improved understandability of climate indicators. Results demonstrate that, across all three indicators tested, simplifications were effective in helping participants understand the key messages contained in the indicators. However, we found that updating annotation was only effective for one of three indicators. Moreover, in one case, our annotation modification HCDD 1, which added a title to the indicator, may have interacted with the confusing nature of the graphic and led to significant reductions in understandability and subjective responses.
People's subjective responses to the indicators showed a pattern similar to that of understandability. Modifications that led to improvements in understandability also resulted in greater perceived ease of understanding and liking of the indicator. However, design improvements had less impact on trust, which changed in response to only one modification.
We found that the three complex indicators that we examined required relatively high numeracy for correct understanding. We had oversampled for people with more education in order to approximate the typical audience for climate indicators, such as public administrators and decision-makers. Even among this group, understanding varied and required considerable levels of numerical ability. Presumably, the problem is exacerbated among those with less education and exposure to statistics and science communication. And while our modifications improved understanding overall, they did not improve the accessibility of the graphics to a greater extent for those with lower numeracy.
In contrast, we found that understanding the indicators was mostly not related to attitudes toward climate change or political party affiliation. Climate attitudes were, however, strongly related to subjective reactions to the indicators. Those with stronger acceptance of climate change were more likely to find the indicators likable and easy to understand, and to feel greater trust toward them. Importantly, though, none of these attitude and identity-related variables moderated the effect of indicator modifications in improving understanding and subjective reactions, showing consistent effectiveness for people with different attitudes toward climate change and across the political spectrum.

Lessons learned from successful and unsuccessful modifications
While this study provides an important foray into the potential of diagnostic design guidelines for improving scientific graphics, it opens many new questions. For example, although we observed consistent improvements in understanding in response to simplifying the graphics, this was not the case for annotation. Simplification aims to remove complex or abstract elements of the representation, and thereby makes it more accessible, which appears to be a reliably effective improvement. Annotation provides textual support that either summarizes the information in the graphic or draws out the implications of the indicator. These modifications appear to behave differently. In particular, a few of our annotation modifications that spelled out indicator implications resulted in decreased actual and perceived understanding, liking, and (in one case, for HCDD) even trust. This least successful modification provided an annotation of the graphic's implications but did not address the inaccessibility of the complex visual design.
We suspect that people may have experienced a contradiction between their (possibly incorrect) interpretation of the complex graphic and the annotation which focused on key messages. This finding suggests that the most effective approach is to simultaneously simplify the graphic to remove any confusing elements, and to provide annotation that summarizes the key points and, if appropriate, helps guide the viewer toward the takeaways. This possibility needs to be explored in future research. More specific testing of annotation is needed to determine its effects independent of other potential modifications, as well as in combination and interaction with them. Furthermore, these types of modifications may be more or less effective for different kinds of indicators and design issues, and future research is needed to understand which offer the most benefits under which circumstances.

Trust
One interesting finding was that the sense of trust participants felt after viewing the indicators did not change much in response to the modifications we implemented, even for changes that improved understanding, perceived ease of understanding, and liking. While there is not enough evidence in the current study to interpret this finding, some clues point to future research questions. They may suggest, echoing the points above, that helping to increase the correspondence between viewers' own interpretations of the data shown and the annotated summary of the takeaways can create a sense of cohesion and thereby increase trust. Future studies need to examine the tradeoffs embedded in choices around simplification of graphical complexity and the annotations intended to support viewers toward key messages, and identify combinations that allow people to reach their own conclusions while offering guidance. On a broader scale, our findings reinforce the importance of building a foundation of trust in climate indicators through continual engagement by users and members of the public once indicators have been released and as they continue to be refined (Kenney et al. 2016).

Climate attitudes and political party affiliation
As can be expected from extensive prior research, people who were more skeptical of climate change had more negative personal reactions to climate indicators: they reported them as harder to understand and less likable, as well as less trustworthy. However, they were no worse at understanding the graphics (and even better for one of the indicators). Most important was our finding that these ideology-related variables did not interfere with the improvements brought about by indicator design modifications in understanding, as well as subjective reactions. This is an encouraging finding which suggests that, despite deeply held beliefs rooted in identities and ideologies, people are able to engage with the visual communication offered by climate indicators.
However, given evidence of motivated cognition processes which undermine the encoding and recall of climate information that does not align with people's attitudes and positions (Hennes et al. 2016), future studies are needed to understand the longevity of improvements in interpretation of and reactions to indicator modifications, and their impact on understanding of and engagement with the issue of climate change as a whole. Moreover, an important future question is whether improved understanding leads to updates in acceptance of climate change or attitudes toward it.

Limitations
The findings of this study are encouraging, but one limitation is the small set of indicators used. We examined three indicators that we chose based on their problematic graphic design and low understandability scores. While they offered an opportunity to test the effectiveness of modifications, they are a small sample that does not represent the breadth of the many types of science graphics and the specific design features these employ. Future studies are needed to look at graphics from different areas of science that address different types of phenomena, and employ different visual elements.
In addition, our sample was not representative of the general public, insofar as we focused on participants whose education level resembles that of more frequent users of climate graphics. To the extent that indicators are an important part of informing and educating the broader public about the progression and impacts of climate change, future studies should examine accessibility and improvement of indicator design among nationally representative samples.

Summary
Diagnostic design guidelines are useful for identifying shortcomings in climate indicators and developing modifications that improve objective understanding as well as the valence of subjective reactions. While helpful overall, such modifications may not make up for the inherent complexity of the material presented in the indicators, which continue to require a fairly high numerical facility for correct interpretation. Importantly, design modification is effective for improving understandability, as well as subjective reactions, independent of people's prior beliefs about climate change or political affiliation. These findings suggest that the design of climate indicators can benefit from applying diagnostic design guidelines and carries some promise of reaching beyond the ideological divide around this vital and pressing issue.
Availability of data and material Survey materials are included as Online Resource 2 and data is available upon request.
Code availability Not applicable.
Author contributions All authors contributed to the study conception, design, material preparation, and data collection. Analysis was performed by Gerst and Feygina. All authors contributed to writing and editing the manuscript. All authors read and approved the final manuscript.

Declarations
Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.