Introduction

Modeling coral bleaching: heat stress accumulation

Hundreds of millions of people (Burke et al. 2011) and more than a quarter of all marine species (Reaka-Kudla 1997) rely on tropical coral reefs, a habitat covering less than 1% of the ocean floor (Rivera et al. 2020). Reefs provide coastline protection (van Zanten et al. 2014) and island-forming sand (East et al. 2018) along with the complex habitat (Darling et al. 2017) and nutrient cycling (Hatcher 1990) necessary to support one of the most diverse and productive ecosystems on the planet.

Though vital to island cultures and economies worldwide, corals are in global decline as a result of human activities (Carpenter et al. 2003), with anthropogenic climate change among the dominant threats. In the past forty years alone, the average intensity of marine heatwaves has increased by more than 25%, and their frequency has escalated sixfold as a result of human carbon emissions (Laufkötter et al. 2020). Reef-building corals have a fairly narrow thermal tolerance, and sufficiently elevated sea surface temperatures (SST) cause corals to expel their algal endosymbionts (Symbiodiniaceae) as part of a physiological stress response (Brown 1997). This process, referred to as coral bleaching, robs the coral of its color and its primary energy source, compromising fitness and function. If stressful conditions persist, the coral will die. Climate change is therefore a major concern for coral reef conservation.

Accurately linking stressor (ocean warming) to stress response (coral bleaching) requires accounting for both the magnitude and the duration of a heating event, so managers and researchers frequently use time-integrated SST anomalies to quantify heat stress accumulation. Degree-heating weeks (DHW) are one such metric (Gleeson & Strong 1995). The National Oceanic and Atmospheric Administration (NOAA) defines DHW as a rolling sum of weekly temperature anomalies (“HotSpots”) above the maximum monthly mean (MMM), in which only anomalies of at least 1 °C are counted and accumulation spans the preceding 12 weeks (Liu et al. 2005, Liu et al. 2013; Fig. 1). Programs like NOAA’s Coral Reef Watch (CRW) track regions most at risk for coral bleaching according to a tiered DHW threshold-based warning model. Such knowledge can inform management actions (such as the placement of a marine protected area; Randazzo-Eisemann et al. 2021) or facilitate our understanding of how mass bleaching events develop (Liu et al. 2005, 2013; Little et al. 2022). Moreover, accurate bleaching forecasts enable rapid allocation of surveyors to affected reefs (Maynard et al. 2009), and in several cases, detecting bleaching in advance with the DHW toolkit allowed researchers to monitor the full progression of coral bleaching events in situ (Heron et al. 2016a; Jones et al. 2021).

Fig. 1

Elements of the degree-heating week (DHW) definition used to calculate heat stress accumulation from local sea surface temperatures (SST). Data shown are from the NOAA CRW CoralTemp dataset for Key West, Florida, during the summer of 2015. The maximum monthly mean (MMM; the long-term average temperature for an area’s warmest month; solid line) is the climatological average from 1985 to 2012 and represents the area’s typical summertime high. MMM serves as the baseline for defining temperature anomalies. Only anomalies above a certain threshold (dotted line) are accumulated as heat stress (shaded red) within the rolling accumulation window (gray box). In this example, the accumulation window is 12 weeks long, and the illustrated DHW value represents heat stress accumulation as of September 23, 2015 (the rightmost edge of the window)
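The accumulation shown in Fig. 1 can be sketched in a few lines. This is a minimal illustration, not NOAA's implementation: it assumes weekly mean SST values and a single MMM value for one grid cell, whereas the CoralTemp-based products are computed from daily data. The function name and arguments are hypothetical; `cutoff` and `window` correspond to the anomaly cutoff and accumulation window varied later in this study.

```python
from typing import List


def degree_heating_weeks(weekly_sst: List[float], mmm: float,
                         cutoff: float = 1.0, window: int = 12) -> List[float]:
    """Return a rolling DHW value (°C-weeks) for every week in the series.

    Hypothetical helper (weekly resolution assumed). Each weekly HotSpot is
    SST minus the MMM baseline; only positive HotSpots of at least `cutoff`
    °C are kept, and kept values are summed over the most recent `window`
    weeks (fewer weeks at the very start of the series).
    """
    kept = []
    for sst in weekly_sst:
        hotspot = sst - mmm
        # Anomalies below the cutoff (or at/below the baseline) add nothing.
        kept.append(hotspot if hotspot > 0 and hotspot >= cutoff else 0.0)

    dhw = []
    for t in range(len(kept)):
        start = max(0, t - window + 1)  # rolling accumulation window
        dhw.append(sum(kept[start:t + 1]))
    return dhw


if __name__ == "__main__":
    sst = [29.5, 30.2, 30.5, 29.8, 31.0]
    print(degree_heating_weeks(sst, mmm=29.0))              # 1 °C cutoff, 12 weeks
    print(degree_heating_weeks(sst, mmm=29.0, cutoff=0.0))  # no cutoff
```

With the standard 1 °C cutoff, the 0.5 °C and 0.8 °C anomalies in this toy series contribute nothing; removing the cutoff adds them to the running total.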

Though numerous studies validate the relationship between DHW and coral reef stress events (Eakin et al. 2010; Heron et al. 2016a; Kayanne 2017), many others find the accepted metric limited (McClanahan et al. 2007; van Hooidonk & Huber 2009; DeCarlo 2020; Mason et al. 2020; Ainsworth et al. 2021) or inaccurate (McClanahan et al. 2019; Eladawy et al. 2022) as a bleaching predictor. Coral bleaching dynamics vary widely on both spatial (Guest et al. 2012; Ainsworth et al. 2021; Bowdler 2021; Raj et al. 2021) and temporal (Thompson & van Woesik 2009; Palumbi et al. 2014; Ainsworth et al. 2016; DeCarlo et al. 2019) scales, yet the current DHW definition is rarely modified to account for regional variability or new data. Proposed modifications generally stop at adjusting the bleaching threshold (van Hooidonk & Huber 2009) or, in some cases, the baseline for tolerable SST (Burt et al. 2019). Analyses also seldom address how these stress-defining parameters may be changing over time—a key element in adaptive management.

The grimmest long-term projections suggest that reef-building corals are teetering on the brink of extinction, with annual bleaching expected within several decades if the bleaching threshold is assumed to remain constant over time (e.g., Sheppard 2003; van Hooidonk et al. 2016; Donner et al. 2018). Yet several lines of research suggest that under the right circumstances, coral reefs may develop some degree of resistance to warming-induced bleaching (Thompson & van Woesik 2009; Guest et al. 2012; Cantin & Lough 2014; Palumbi et al. 2014; Ainsworth et al. 2016; DeCarlo & Harrison 2019). These findings imply that under a warming climate, we would expect to observe an increase in the bleaching threshold (Donner 2009) as corals acclimate to rising water temperatures, or as community composition shifts toward more robust individuals and species when the reef’s more sensitive constituents die off (Hughes et al. 2018b). Overall, our understanding of coral bleaching trajectories remains highly uncertain, and assessing potential trends in bleaching thresholds relative to the rate of climate change will help ensure that predictive models evolve appropriately.

Optimizing the DHW model: meteorological skill scores

There are four parameters we can modify in the DHW-based bleaching forecast model to better account for spatiotemporal variation in coral stress responses (Table 1, Fig. 1). Different combinations of (1) thermal baseline, (2) anomaly cutoff, and (3) accumulation window size will yield different heat stress accumulation (DHW) values, and we can adjust (4) the bleaching threshold defining how much heat stress must accumulate before corals are expected to bleach. DeCarlo (2020) refined the DHW calculation with a skill-based approach borrowed from another predictive discipline, meteorology, altering the anomaly inclusion criteria and accumulation window size to improve bleaching forecast performance. Here, we build upon these analyses using higher-resolution SST measurements and a larger bleaching dataset for forecast validation.

Table 1 Elements of the degree-heating week (DHW) definition and the DHW bleaching forecast model

Models used to forecast binary (i.e., presence/absence) weather occurrences, such as rain or tornado formation, undergo continuous refinement via a suite of skill tests that assess the correspondence between predicted and observed outcomes (Table 2). Together, six skill scores describe all relevant facets of presence/absence forecast performance (Table 3). Models that maximize hits (H) and correct negatives (CN) while minimizing misses (M) and false alarms (FA) generally demonstrate greater skill, as reflected in higher accuracy, a high probability of detection (PoD), a small false alarm ratio (FAR), and a low probability of false detection (PoFD). For forecasts where only event occurrences are important, accuracy (which rewards hits and correct negatives equally) is less useful than the threat score (which ignores correct negatives and thus emphasizes events over non-events). However, these scores can be misleading, particularly when the relative frequencies of presence and absence diverge substantially from a 50/50 split. For a rare event, a forecast model might achieve high accuracy simply by predicting that the event never occurs. At the other end of the spectrum, frequent events are more likely to be predicted correctly by random chance, so higher-probability events achieve greater threat scores without necessarily demonstrating greater skill.

Table 2 Contingency table representing the possible outcomes of each hindcast
Table 3 Skill tests used to assess the performance of binary forecast models
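Because only the captions of Tables 2 and 3 appear here, the sketch below assumes the standard definitions of the scores named in the text, each computed from the four contingency-table outcomes. Function names are illustrative. Undefined ratios (zero denominators) return NaN rather than raising an error.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Safe division: NaN when a score is undefined (zero denominator)."""
    return num / den if den else math.nan


def skill_scores(h: int, m: int, fa: int, cn: int) -> Dict[str, float]:
    """Binary-forecast skill scores from hits, misses, false alarms,
    and correct negatives (standard definitions assumed)."""
    n = h + m + fa + cn
    return {
        "accuracy": _ratio(h + cn, n),     # fraction of all forecasts correct
        "pod": _ratio(h, h + m),           # probability of detection
        "far": _ratio(fa, h + fa),         # false alarm ratio
        "pofd": _ratio(fa, fa + cn),       # probability of false detection
        "threat": _ratio(h, h + m + fa),   # threat score (ignores correct negatives)
        "bias": _ratio(h + fa, h + m),     # frequency bias: forecast vs observed events
    }


if __name__ == "__main__":
    # A rare event (5 of 100 cases) forecast as "never": high accuracy, no detection.
    print(skill_scores(h=0, m=5, fa=0, cn=95))
```

The example reproduces the pitfall described above: predicting that a rare event never occurs scores 95% accuracy while detecting none of the events (PoD = 0, threat score = 0).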

The equitable threat score (ETS) is so named because it penalizes the forecast model for hits expected from random chance, allowing it to approximate statistical equitability at sufficiently large sample sizes (Hogan et al. 2010). In theory, this penalty permits fair comparisons between any two forecast models, since it dampens the effect of event rarity and assigns low skill to forecasts biased in either direction. For the same reason, ETS is usually reported with the forecast’s frequency bias, a value representing the agreement between model and reality on the rarity of the predicted event. An ideal bias score is close to 1; a smaller value suggests that the model underpredicts the event, and a larger value suggests that it overpredicts. A model that is prohibitively discriminating (always returning an “absence” forecast) or wholly indiscriminate (always returning a “presence” forecast) scores an ETS of exactly 0. For our bleaching prediction models, we could produce either extreme by setting the bleaching threshold higher than all observed DHW values or, at the other end of the spectrum, by reducing the bleaching threshold to 0. Between these two extremes, ETS reaches a maximum value, making it an ideal metric for optimizing forecast model parameters (see DeCarlo 2020 for an illustrative example).
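Assuming Table 3 uses the standard form of the score (also known as the Gilbert skill score), the ETS subtracts the hits expected by chance, H_r = (H + M)(H + FA)/N, from both the numerator and the denominator of the threat score: ETS = (H − H_r)/(H + M + FA − H_r). The minimal sketch below, with a hypothetical function name, also checks the property described above: a forecast that always says "absence" or always says "presence" scores exactly 0.

```python
import math


def equitable_threat_score(h: int, m: int, fa: int, cn: int) -> float:
    """ETS from a 2x2 contingency table (standard form assumed).

    Returns NaN when the score is undefined (no observed or forecast events).
    """
    n = h + m + fa + cn
    hits_random = (h + m) * (h + fa) / n  # hits expected from chance alone
    denom = h + m + fa - hits_random
    return (h - hits_random) / denom if denom else math.nan


if __name__ == "__main__":
    # 10 observed events among 100 cases, forecast at the two extremes:
    print(equitable_threat_score(h=0, m=10, fa=0, cn=90))    # always "absence": 0.0
    print(equitable_threat_score(h=10, m=0, fa=90, cn=0))    # always "presence": 0.0
    # A forecast with some skill (40 observed events among 100 cases):
    print(equitable_threat_score(h=30, m=10, fa=20, cn=40))  # 0.25
```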

As a comparison tool, relative ETS values are more meaningful than absolute values, which are not intuitive. To put the ETS values reported in this study in context with a more mundane example, next-day forecasts for rain tend to score between 0.30 and 0.40 by this metric, and scores < 0.10 are generally regarded as a sign of little to no skill. A model predicting moderate rainfall for the southeastern United States with a more realistic 3.5-day lead time produces an ETS ranging from 0.07 in the summer to 0.19 in the winter, when precipitation is more frequent and more consistent (Baxter et al. 2014). In this example, differences in ETS reflect how predictable rainfall is at a given lead time and season: the more predictable the event, the higher the forecast skill.

For our analysis, we cross-tested variations in temperature anomaly cutoff, accumulation window size, and bleaching threshold (Table 1), producing a global array of annual bleaching predictions for each combination. After validating these model predictions against observed bleaching events spanning the tropical oceans from 1998 to 2019, we propose modifications to the DHW bleaching forecast model in order to optimize its forecast skill globally, regionally, and over time. We also assess the performance of different thermal baselines (climatologies; Supplement 1) and, finally, co-opt the skill-testing approach as a means of testing for shifts in the bleaching threshold under rising ocean temperatures.

Methods

Data selection

Prior analyses using the DHW metric have cited the coarse spatial resolution of satellite SST measurements as a barrier to modeling reef-scale processes like coral bleaching (Eakin et al. 2010; McClanahan et al. 2019; Mason et al. 2020; Ainsworth et al. 2021). Therefore, to calculate thermal baselines and DHW values, we chose NOAA’s CoralTemp dataset for its high spatiotemporal resolution and coverage (5 km × 5 km daily global values from 1985 to present; NOAA Coral Reef Watch 2018; Skirving et al. 2020). Since its update in 2018, this near-real-time satellite dataset has been among the best available (Little et al. 2022), and because it forms the basis for all of NOAA’s CRW products (including their own DHW calculations and coral bleaching forecasts), we can draw direct comparisons between the current DHW-based model and the variations tested here.

The bleaching data used in this analysis were compiled by NOAA (included in Virgen-Urcelay & Donner 2023) and by van Woesik & Kratochwill (2022a, b). We converted bleaching values as reported in each database (percentage of total coral cover bleached) to presence/absence observations using thresholds of 1% (following Hughes et al. 2018a), 5%, 10%, 35%, and 50% for bleaching presence. We report results for the 5% threshold in the main text; see Supplement 6 for additional details. Collectively, these ~ 35,000 observations span over 10,000 reef sites globally and constitute two decades of reef monitoring (Fig. 2). We assigned each bleaching observation to the nearest 5 km × 5 km cell in a global mesh grid to match SST values to every reef in the dataset. Bleaching observations assigned to the same grid cell experience the same calculated heat stress but were assessed as separate data points; thus, if a grid cell included neighboring reefs, one of which bleached in a given year while the other did not, they were counted individually as a presence and an absence. This approach retained any variability in bleaching response that may exist in the dataset at scales smaller than 5 km or within a year.
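The site-to-grid matching and presence/absence conversion could be sketched as follows. This assumes a regular 0.05° latitude–longitude grid (roughly 5 km at the equator) indexed from the south-west corner, and treats a survey as a "presence" when the percentage of coral cover bleached meets or exceeds the chosen threshold (5% in the main text); the indexing scheme, the inclusive comparison, and all names are assumptions. Surveys that fall in the same cell keep their own outcomes.

```python
import math
from typing import List, NamedTuple, Tuple


class Survey(NamedTuple):
    lat: float
    lon: float
    year: int
    percent_bleached: float  # % of total coral cover bleached


def grid_cell(lat: float, lon: float, res_deg: float = 0.05) -> Tuple[int, int]:
    """(row, col) of the cell containing a point on an assumed regular grid."""
    n_rows, n_cols = round(180 / res_deg), round(360 / res_deg)
    row = min(math.floor((lat + 90.0) / res_deg), n_rows - 1)
    col = min(math.floor((lon + 180.0) / res_deg), n_cols - 1)
    return row, col


def to_observations(surveys: List[Survey], threshold_pct: float = 5.0):
    """One (cell, year, presence) record per survey; no merging within a cell."""
    return [(grid_cell(s.lat, s.lon), s.year, s.percent_bleached >= threshold_pct)
            for s in surveys]


if __name__ == "__main__":
    # Two neighboring reefs in the same cell: one bleached, one did not.
    surveys = [Survey(24.551, -81.810, 2015, 12.0),
               Survey(24.552, -81.812, 2015, 0.0)]
    for record in to_observations(surveys):
        print(record)
```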

Fig. 2

Annual distribution of bleaching occurrence (5% bleaching or greater) data used to validate bleaching forecast models in this study. ~ 20,000 observations from 1998 to 2019 were compiled by van Woesik & Kratochwill (2022a, b) and the remaining ~ 15,000, spanning 2014–2017 only, were compiled by NOAA (Virgen-Urcelay & Donner 2023)

We omitted redundant reports between the two databases and “colony-level” bleaching reports, but otherwise used all observations recorded. Some areas are more densely represented than others throughout the time series.

Variables assessed

Temperature anomalies are defined relative to a historical baseline, usually a climatological (interannual) average like the maximum monthly mean (MMM): the temperature of the site’s warmest summer month every year averaged from 1985 to 2012. We calculated DHW according to four different thermal baselines: MMM over the standard three-decade period (1985–2012), two shorter MMM climatologies (15 years from 1997 to 2012 and 10 years from 2002 to 2012), and a “maxima” climatology that included only the 10 warmest years from 1985 to 2012. The latter was included to test the assumption that in areas with high interannual SST variability (such as those affected by El Niño Southern Oscillation, ENSO), multiannual rather than annual temperature maxima determine the upper limit of coral heat tolerance. Consequently, we performed a second baseline comparison that included the Niño SST Index regions (1 + 2, 3, 3.4, 4) only. Following preliminary testing, only the best-performing baseline (the 1985–2012 standard for both the globe and ENSO regions; Supplement 1) was used in subsequent analyses.
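The thermal baselines can be sketched from a table of monthly mean SSTs for one grid cell. Following the wording above, the hypothetical `mmm_baseline` averages each year's warmest monthly mean over the baseline years; passing a subset of years gives the shorter climatologies, and `warmest_years` gives the maxima climatology, assuming (our assumption) that years are ranked by their warmest month. NOAA CRW's operational MMM instead takes the highest of the 12 calendar-month climatological means, which can never exceed the value computed here; that alternative is included for comparison.

```python
from statistics import mean
from typing import Dict, List, Tuple

MonthlySST = Dict[Tuple[int, int], float]  # (year, month) -> monthly mean SST, °C


def mmm_baseline(sst: MonthlySST, years: List[int]) -> float:
    """Average, over `years`, of each year's warmest monthly mean."""
    return mean(max(sst[(y, m)] for m in range(1, 13)) for y in years)


def mmm_calendar_climatology(sst: MonthlySST, years: List[int]) -> float:
    """Alternative: highest of the 12 calendar-month climatological means."""
    return max(mean(sst[(y, m)] for y in years) for m in range(1, 13))


def warmest_years(sst: MonthlySST, years: List[int], k: int) -> List[int]:
    """The k years with the highest single-month mean (ranking rule assumed)."""
    peak = {y: max(sst[(y, m)] for m in range(1, 13)) for y in years}
    return sorted(years, key=lambda y: peak[y], reverse=True)[:k]


if __name__ == "__main__":
    # Toy record: August is warmest, and every month warms 0.1 °C per year.
    years = list(range(2000, 2004))
    sst = {(y, m): 26.0 + (1.0 if m == 8 else 0.0) + 0.1 * (y - 2000)
           for y in years for m in range(1, 13)}
    print(mmm_baseline(sst, years))                           # full baseline
    print(mmm_baseline(sst, warmest_years(sst, years, k=2)))  # "maxima" variant
```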

Since DeCarlo (2020) found that eliminating the current definition’s 1 °C anomaly cutoff improved bleaching forecast skill, we assessed intermediate cutoff temperatures for counting SST anomalies to find an optimum: 0 °C (no cutoff) to 1.0 °C in increments of 0.2 °C. The same study experimented with shrinking the accumulation window (< 12 weeks), but we tested a broader range of values (1 to 15 weeks) for the rolling window. Fifteen weeks is about the duration of a typical summer season, at least in off-equatorial regions (Heron et al. 2016b).

NOAA CRW’s bleaching alert system uses coral bleaching thresholds of 4 DHW (Alert Level 1: bleaching expected) and 8 DHW (Alert Level 2: widespread bleaching and mortality expected), values that are frequently cited in the literature (e.g., van Hooidonk & Huber 2009; Heron et al. 2016a). We tested bleaching threshold values from 1 to 12 DHW in increments of 1 DHW.
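The parameter space described in the two paragraphs above can be enumerated directly. This sketch (names are illustrative) confirms that 6 cutoffs × 15 window sizes × 12 thresholds gives the 1,080 permutations stated in the next paragraph, one of which is the current definition. Cutoffs are generated from integers to avoid floating-point drift from repeated 0.2 °C steps.

```python
from itertools import product

# Anomaly cutoffs 0.0-1.0 °C in 0.2 °C steps, built from integers.
CUTOFFS_C = [round(0.2 * i, 1) for i in range(6)]
WINDOWS_WEEKS = list(range(1, 16))      # 1-15 week accumulation windows
THRESHOLDS_DHW = list(range(1, 13))     # 1-12 DHW bleaching thresholds

PARAMETER_GRID = list(product(CUTOFFS_C, WINDOWS_WEEKS, THRESHOLDS_DHW))

if __name__ == "__main__":
    print(len(PARAMETER_GRID))              # 1080
    print((1.0, 12, 4) in PARAMETER_GRID)   # True: the current NOAA definition
```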

Altogether, we assessed 1,080 permutations of anomaly cutoff, window size, and bleaching threshold in the DHW metric as a threshold-based coral bleaching forecast model using a series of iterative loops in MATLAB. For each combination, if a grid cell’s maximum DHW value for a given year met or exceeded the bleaching threshold being tested, the event was reported as a positive bleaching prediction. Thus, each cell–year pair in the hindcast constituted a single predicted event representing that area’s bleaching status during peak warming. On the other hand, the timing of observed bleaching events is dependent on survey dates, so they may not necessarily align with the calculated DHW peaks. Our forecast validation assumed that any observed bleaching during a given year resulted from peak DHW thermal stress; more precise alignment was not possible. Reef surveys rarely occur during peak DHW—most surveys included in these datasets were conducted during or after warmer summer seasons—and bleaching is often still evident weeks to months after its onset (Hueerkamp et al. 2001; Schoepf et al. 2015). Unless coral populations have visibly recovered to below the 5% bleaching threshold between a thermal stress event and the subsequent bleaching survey, the event is still flagged as a bleaching occurrence. Though some misalignment of observations and peak DHW is possible, we validate the current DHW definition and all of our variations against the same bleaching dataset, so all analyses will be subject to the same misalignment errors, most of which will result in frequency biases closer to 0. As a result, we may still be confident in any identified skill improvement between models—particularly when these improvements ameliorate a tendency to underpredict bleaching.
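For a single (cutoff, window) pair, the hindcast and threshold sweep described above could look like the minimal sketch below; an outer loop over cutoff and window (not shown) would recompute the annual maxima for each pair. It assumes the annual maximum DHW has already been computed for every grid cell–year and that each survey is a (cell–year, observed) record, so reefs sharing a cell are scored separately. Function names and cell labels are hypothetical, and ties in ETS are resolved in favor of the lowest threshold, which is our choice rather than a documented rule.

```python
from collections.abc import Hashable
from typing import Dict, List, Tuple

Record = Tuple[Hashable, bool]  # (cell-year key, bleaching observed?)


def tally(annual_max_dhw: Dict[Hashable, float], records: List[Record],
          threshold: float) -> Tuple[int, int, int, int]:
    """Hits, misses, false alarms, correct negatives for one threshold."""
    h = m = fa = cn = 0
    for key, observed in records:
        predicted = annual_max_dhw[key] >= threshold  # peak DHW meets threshold
        if predicted and observed:
            h += 1
        elif observed:
            m += 1
        elif predicted:
            fa += 1
        else:
            cn += 1
    return h, m, fa, cn


def ets(h: int, m: int, fa: int, cn: int) -> float:
    hits_random = (h + m) * (h + fa) / (h + m + fa + cn)
    denom = h + m + fa - hits_random
    return (h - hits_random) / denom if denom else 0.0  # undefined -> no skill


def best_threshold(annual_max_dhw, records, thresholds=range(1, 13)):
    """Threshold with the highest ETS (first one wins on ties)."""
    return max(thresholds, key=lambda t: ets(*tally(annual_max_dhw, records, t)))


if __name__ == "__main__":
    peak = {("A", 2015): 6.0, ("B", 2015): 2.5, ("C", 2015): 0.5, ("D", 2015): 4.0}
    surveys = [(("A", 2015), True), (("A", 2015), False),  # two reefs, one cell
               (("B", 2015), True), (("C", 2015), False), (("D", 2015), True)]
    print(tally(peak, surveys, 4))   # current 4 DHW threshold
    print(best_threshold(peak, surveys))
```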

We performed a train–test split analysis to evaluate the robustness of the derived DHW parameters post hoc. In each of 100 iterations per optimization, the best-performing (highest ETS) model according to a random partition of 80% of the validation data (training set) was applied to the remaining 20% of the observations (test set). These repeated splits indicate how robust our results are to variations in the underlying data (see Supplement 7).
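The split procedure can be sketched as below, assuming each candidate model's bleaching predictions are already aligned with the validation records (one boolean per survey). The fixed random seed, the `models` dictionary, and the function names are illustrative assumptions; ETS ties go to the first model listed.

```python
import random
from typing import Dict, List, Sequence, Tuple


def ets_subset(pred: Sequence[bool], obs: Sequence[bool], idx: Sequence[int]) -> float:
    """ETS computed only over the records whose indices are in `idx`."""
    h = m = fa = cn = 0
    for i in idx:
        if pred[i] and obs[i]:
            h += 1
        elif obs[i]:
            m += 1
        elif pred[i]:
            fa += 1
        else:
            cn += 1
    n = h + m + fa + cn
    if n == 0:
        return 0.0
    hits_random = (h + m) * (h + fa) / n
    denom = h + m + fa - hits_random
    return (h - hits_random) / denom if denom else 0.0


def train_test_splits(models: Dict[str, List[bool]], obs: List[bool],
                      n_splits: int = 100, train_frac: float = 0.8,
                      seed: int = 0) -> List[Tuple[str, float]]:
    """For each random 80/20 split: pick the best model on the training set,
    then report its ETS on the held-out test set."""
    rng = random.Random(seed)
    idx = list(range(len(obs)))
    results = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        best = max(models, key=lambda k: ets_subset(models[k], obs, train))
        results.append((best, ets_subset(models[best], obs, test)))
    return results


if __name__ == "__main__":
    obs = [i % 2 == 0 for i in range(100)]  # toy: half the surveys bleached
    models = {"never": [False] * 100, "perfect": list(obs)}
    results = train_test_splits(models, obs)
    print(sum(name == "perfect" for name, _ in results), "of", len(results))
```

Counting how often the training-set winner matches a given definition across splits mirrors the kind of robustness summary reported in the results (e.g., the share of splits that select the globally optimized definition).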

Global, ocean-specific, and regional forecast optimization

After optimizing the DHW bleaching forecast model on a global basis (all events), we also calculated forecast skill for specific subsets of the dataset by only including outcomes (hits, misses, false alarms, and correct negatives) that fall within certain bounds, such as the coordinates for an ocean basin or a specific calendar year range. We divided the dataset by ocean basin (Atlantic, Pacific, Indian) and region (Supplement 2, Fig. 3) and selected the best-performing DHW definition for each geographic subset based on ETS. With the exception of the Coral Triangle, regions were defined by clustering nearby reef sites rather than adhering to an established convention, but our definitions do align with prior work (Maragos & Williams 2011). After optimization, we determined the forecast skill for each level of specificity (globe, basin, region) by validating every bleaching event according to the DHW definition optimized for its geographic location. Since every skill test is based on the same four outcomes (Table 2) and applied to the entire global dataset (even as different DHW definitions may be applied to each region), the scores for every model are comparable, allowing us to assess how increasing the geographic specificity of each bleaching model affected forecast skill.
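The synthesized geographic score can be illustrated with a minimal sketch: each survey is scored with the parameters optimized for its own region, and all outcomes are then pooled into a single global contingency table, so the resulting ETS is directly comparable with that of a single global model. For brevity, only the bleaching threshold varies by region here, and each record carries one annual-maximum DHW value; in the full analysis the cutoff and window also vary, which changes the DHW itself. Region thresholds and the function name are made up for illustration.

```python
from typing import Dict, List, Tuple

RegionalSurvey = Tuple[str, float, bool]  # (region, annual max DHW, bleaching observed?)


def pooled_ets(surveys: List[RegionalSurvey], thresholds: Dict[str, float]) -> float:
    """ETS of one global contingency table in which each survey is forecast
    with the threshold assigned to its region."""
    h = m = fa = cn = 0
    for region, peak_dhw, observed in surveys:
        predicted = peak_dhw >= thresholds[region]  # region-specific threshold
        if predicted and observed:
            h += 1
        elif observed:
            m += 1
        elif predicted:
            fa += 1
        else:
            cn += 1
    hits_random = (h + m) * (h + fa) / (h + m + fa + cn)
    denom = h + m + fa - hits_random
    return (h - hits_random) / denom if denom else 0.0


if __name__ == "__main__":
    surveys = [("Atlantic", 2.5, True), ("Atlantic", 1.0, False),
               ("Pacific", 2.5, False), ("Pacific", 7.0, True)]
    print(pooled_ets(surveys, {"Atlantic": 3, "Pacific": 3}))  # one global threshold
    print(pooled_ets(surveys, {"Atlantic": 2, "Pacific": 6}))  # region-specific
```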

Fig. 3

Geographic distribution of reef sites used in this analysis; red dots represent discrete bleaching observations subset into three ocean basins (left to right: Indian, Pacific, Atlantic) and 16 regions (white polygons): 1 = Red Sea, 2 = Persian Gulf, 3 = Arabian Sea, 4 = West Indian Ocean, 5 = Central Indian Ocean, 6 = Mainland Southeast Asia, 7 = Coral Triangle (Political; includes all waters around countries with reefs in the scientifically defined Coral Triangle region) & Environs including Cocos-Keeling, 8 = East Asia including the Mariana Islands, 9 = Western Australia, 10 = Great Barrier Reef & Coral Sea, 11 = Micronesia & Equatorial Polynesia, 12 = Hawaiian Archipelago & Environs, 13 = Southern Polynesia, 14 = Eastern Pacific, 15 = Caribbean & Bermuda, 16 = Equatorial Atlantic

Bleaching threshold temporal analysis

To explore the possibility of a global shift in bleaching threshold, we held anomaly cutoff and window size constant at values from the global optimization analysis, and then optimized DHW for annual subsets of the data by allowing only bleaching threshold to fluctuate. As in the optimization analysis, we ran models for the globe, each ocean basin, and select regions where observations were consistently available (Supplement 3). We plotted the best-performing bleaching thresholds by year in order to reveal any temporal trends via linear regression from 1998 to 2019. Analyses violating the regression assumption of normality were performed using the nonparametric Mann–Kendall Tau trend test for time series instead (MATLAB function written by Burkey 2006). We included half-increment thresholds to improve continuity: 21 values every 0.5 DHW from 0 to 10 DHW. In some cases, the forecast model for a specific year exhibited no skill (highest ETS was 0); these were omitted from analysis since an ETS of 0 returns two different bleaching thresholds, one at each end of the spectrum (omitted values and further explanation can be found in Supplement 3). To compare the rate of change observed in the optimal bleaching threshold through time against the rate of ocean warming, we also averaged the DHW maxima (using the same constant values for anomaly cutoff and window size) for each grid cell in the analysis on an annual basis and performed a time series regression.
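The per-year trend analysis could be sketched as follows, assuming each year's best threshold and its ETS have already been found from the 21 candidate thresholds. Years whose best ETS is 0 are dropped, then a least-squares slope (DHW per year) and a Mann–Kendall tau are computed. This tau is the simple S / [n(n − 1)/2] form without tie corrections or a p-value, so it is not guaranteed to match the Burkey (2006) MATLAB function; all names and the toy values are hypothetical.

```python
from typing import List, Tuple

CANDIDATE_THRESHOLDS = [0.5 * i for i in range(21)]  # 0, 0.5, ..., 10 DHW


def drop_no_skill(rows: List[Tuple[int, float, float]]) -> Tuple[List[int], List[float]]:
    """rows: (year, best threshold, best ETS). Keep only years with ETS > 0."""
    kept = [(year, thr) for year, thr, best_ets in rows if best_ets > 0]
    return [y for y, _ in kept], [t for _, t in kept]


def ols_slope(x: List[float], y: List[float]) -> float:
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def mann_kendall_tau(y: List[float]) -> float:
    """tau = S / (n(n-1)/2), where S sums sign(y_j - y_i) over all i < j."""
    n = len(y)
    s = sum((y[j] > y[i]) - (y[j] < y[i])
            for i in range(n - 1) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)


if __name__ == "__main__":
    rows = [(1998, 2.0, 0.20), (1999, 2.5, 0.15), (2000, 0.0, 0.0),
            (2001, 2.5, 0.18), (2002, 3.5, 0.22)]
    years, best = drop_no_skill(rows)  # 2000 is omitted (no skill)
    print(ols_slope(years, best), mann_kendall_tau(best))
```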

Results and discussion

Refining the DHW definition

In this study, we performed a series of coral bleaching hindcasts using variations of the DHW definition characterized by different underlying climatologies and 1,080 combinations of anomaly cutoff temperature, accumulation window size, and bleaching threshold. Our assessment determined which combination of variables produced forecast models with the highest skill, optimizing the DHW definition first for the globe, then for each ocean basin, and finally for sixteen regions of the tropical ocean (Fig. 3). Global optimization validated the use of the 1985–2012 maximum monthly mean (MMM) as the best-performing baseline for reef thermal tolerance (Supplement 1), so all further skill testing applied this climatology. Our spotlight analysis on regions affected by El Niño Southern Oscillation (ENSO) also found that the 1985–2012 climatology outperformed the maxima climatology (10 warmest years).

Global optimization

Overall, the current DHW definition (1 °C cutoff, 12-week window, 4 DHW bleaching threshold) achieved an equitable threat score (ETS) of 0.098. Optimizing the accumulation window duration, anomaly cutoff, and bleaching threshold on a global basis increased the ETS to 0.167, a skill improvement of 70% (Table 4, Fig. 4). The adjusted definition also raised forecast bias from ~ 0.27 (current) to ~ 1.27 (global optimization), indicating that the current definition consistently underpredicted bleaching occurrences from 1998 to 2019 (Table 4). The updated model made positive predictions more likely by using a smaller cutoff of 0.4 °C and a lower bleaching threshold of 3 DHW with an 11-week accumulation window (Table 4, Fig. 4, Supplement 4). The current definition and the globally optimized definition achieved similar accuracies of 65% and 64%, respectively, but most of the current model’s accuracy score was due to correct negatives rather than hits: the forecast using the current definition detected only 20% of bleaching occurrences (Table 4). Although the globally optimized model has about twice the false alarm ratio (FAR = 0.45) of the current model (FAR = 0.23), the payoff is a nearly fourfold increase in detection rate (PoD = 70%) over NOAA’s standard for Bleaching Alert Level 1. We should note that the current DHW definition predicts extreme bleaching events more accurately than mild events, but it was still outperformed in all cases by a definition with a smaller accumulation window and anomaly cutoff (see Supplement 6 for the full analysis).

Table 4 Bleaching forecast skill for globally and geographically optimized variations of the degree-heating week (DHW) definition. The synthesized geographic optimization scores (highlighted and in bold) were calculated from a global model in which each observation was compared to a prediction made using the DHW definition optimized for its ocean basin or ocean region, respectively. Ocean basins and regions are listed in order of decreasing ETS
Fig. 4

Results of adjusting DHW definition and bleaching forecast model parameters to optimize forecast skill (equitable threat score, ETS) when a: the anomaly cutoff was held constant at 0.4 °C, and b: the accumulation window was held constant at 11 weeks. In both panels, the thick black line represents the performance of the current definition using a 1 °C anomaly cutoff and a 12-week accumulation window

The train–test split validated our definition (88% of splits; Supplement 7), though it also suggested that a smaller window and lower bleaching threshold may perform equally well. This is in line with prior DHW optimization studies, which favored further shrinking the accumulation window to 8 or 9 weeks when the 1 °C anomaly cutoff was omitted (DeCarlo 2020; Lachs et al. 2021). Though DeCarlo (2020) used the same optimization framework as the current study and produced a global model of comparable skill (ETS = 0.169, bias = 1.27), differences in SST product and coral bleaching dataset could explain some divergence in results. Comparing scores between studies also requires considering the relationship between data resolution and ETS. Compared to CoralTemp (the high-resolution 5-km SST product used in this study and in Lachs et al. 2021), Little et al. (2022) found that OI-SSTv2 (the lower-resolution 25-km SST product used in DeCarlo 2020) overpredicted mass bleaching events and thus required more stringent prediction criteria. Higher-resolution forecast grids tend to artificially reduce ETS because errors (misses, false alarms) are less likely to be blurred out (Weygandt et al. 2004), and if our verification “area” (the spatiotemporal coverage of our observations) is small relative to the scale of the forecasted event, the model is over-penalized for hits due to random chance (Wang 2014). Marine heatwaves generally affect regions containing many reef sites, so we cannot assume that bleaching observations on each reef are independent of one another. Yet because the random-chance correction term is based on simple probability (Table 3), the model is penalized as if occurrences were random, resulting in a lower score (reductions on the order of ~ 0.01 to ~ 0.1 depending on the ratio of event area to verification area; Wang 2014).

Using a larger bleaching dataset and assessing bleaching reports individually (by site) in the present analysis may have had a similar effect: the Hughes et al. (2018a) dataset used in DeCarlo (2020) defined bleaching observations by each of 100 geographic regions ranging in size from 2 km² to 9319 km², such that if bleaching were observed anywhere within those confines, the entire area would be considered a “presence” event for that year. Our method here ensured a larger verification sample size at the cost of greater noise. We should note that Lachs et al. (2021), like the current study, used a large bleaching dataset (n = 10,380), but limited qualifying events to observations of 20% bleaching extent or greater (this study: 5%). A higher event threshold dismisses minor bleaching occurrences and thus produces a model that is less sensitive to mild warming events, which would need a longer accumulation window to detect. This greatly improves accuracy with regard to significant bleaching events at the cost of reducing overall detection rate (see Supplement 6), but as Lachs et al. (2021) used a different (and more computationally costly) method to optimize their forecast models, we can only comment on general trends.

Ocean basin and regional optimization

Testing forecast skill using geographic subsets of the bleaching data allowed us to further optimize the DHW definition on an ocean and regional basis by accounting for the ways in which coral stress responses vary broadly across space. Table 4 reports the improvements in ETS and the other skill scores for all models, along with the associated optimized parameters. A global analysis in which each grid cell uses the bleaching forecast model optimized for its ocean basin (Atlantic, Pacific, or Indian) achieved an overall ETS of 0.287, nearly double the performance of the globally optimized model (ETS = 0.167) and nearly three times the performance of the current model (ETS = 0.098). Further subdividing each ocean basin into 16 smaller geographic regions continued to improve skill (ETS = 0.337). These higher scores reflect increased accuracy (from 64% for the globally optimized model to 74% for the ocean-specific model and 76% for the regionally optimized model), and although the ocean-specific model slightly underpredicts bleaching (bias = 0.854), the region-specific model achieves a near-perfect frequency bias of 0.960. Interestingly, the regionally optimized model’s probability of detection is closer to that of the globally optimized model (70%) than is the basin-optimized model’s (61%). For both the regional and basin-optimized models, the smaller false alarm ratios (0.275 and 0.287; Table 4) suggest that their positive predictions were more likely than those of the global model to correspond with observed bleaching occurrences.

As with the global optimization, the train–test split analysis validated our regional definitions; in many cases, the results were unanimous across all 100 splits, and for all regions, the values reported in Table 4 fall within 1 SD of the corresponding test mean (Supplement 7). All models significantly outperform the current definition at the 95% confidence level.

Reef stress is governed by a complex constellation of interactive factors that (1) determine the environmental conditions a reef experiences and (2) define the circumstances under which such conditions would be stressful, but (3) are challenging to observe, predict, and model (Mason et al. 2020). Fortunately, factors that drive coral bleaching response variability often share a critical commonality: a strong spatial component. Consequently, incorporating location into even a simple threshold-based bleaching model accounts for a substantial portion of variability, vastly improving forecast skill. The literature agrees: geographic location was the single strongest predictor of coral bleaching in McClanahan et al.’s (2019) analysis. After assessing seventeen different variables, their best-performing model focused on the interaction between geography and temperature. Additionally, Heron et al. (2016a) found that their bleaching models only performed satisfactorily (r² > 0.4) when bleaching sites were grouped geographically or by site type. Prior applications of meteorological skill testing to coral bleaching forecasts confirmed that the optimal bleaching threshold varies substantially from region to region, and accounting for this specificity drastically improves forecast skill (van Hooidonk & Huber 2009; DeCarlo 2020); we simply took this a step further and allowed all forecast parameters to vary regionally, with a corresponding increase in forecast skill. Our results validate the power of addressing geographic variation in coral stress responses even without fully understanding the specific variables at play.

The broad regional variation that we found across optimized DHW definition parameters (Table 4) reflects the variability in bleaching responses observed in the literature across individual reefs as well as on the geographic scale detectable by remote sensing tools (Guest et al. 2012; Bowdler 2021). Adjacent basins and regions (which are more likely to share some of the traits enumerated above) tended toward similar parameters during DHW optimization. The best-performing forecast model for the Atlantic Ocean, for example, varied substantially from those produced for the Indian and Pacific Oceans, which were more alike: the Atlantic’s optimal definition used a 0.2 °C cutoff, 4-week accumulation window, and 2 DHW bleaching threshold, while the Pacific and Indian Oceans favored a longer 9-week accumulation window (more than twice the duration) and much higher bleaching thresholds (triple the allowed accumulation; Table 4). In practice, these values imply that Atlantic reefs during this period were more susceptible to coral bleaching than their counterparts in the Indo-Pacific, a well-established pattern in the literature (e.g., Cramer et al. 2021). The Atlantic’s biogeographic isolation from the Pacific and Indian Oceans produces differences in species composition and oceanographic conditions. Moreover, high levels of human impact, historical loss of coral cover and complexity due to storms and disease, and overall lower biodiversity have had devastating effects on Caribbean reef resilience (Roff & Mumby 2012).

Similarly, the two Central Pacific regions encompassing much of Micronesia and Polynesia (regions 11 and 13 in Fig. 3) favored similar DHW definitions, with a low anomaly cutoff (0.2–0.4 °C), a moderate accumulation window (8–10 weeks), and a relatively high bleaching threshold (5–6 DHW). Reefs in these regions experience broadly similar oceanographic and climate processes by virtue of their proximity to one another in the middle of the planet’s largest ocean basin. Further, a common biogeographic history ensures some degree of taxonomic overlap and shared assemblage traits relative to more distant regions (high local endemism notwithstanding; Maragos & Williams 2011). Of all their shared characteristics, their stress history is the most relevant here. The reefs of the central and eastern Pacific are among the most heavily affected by El Niño–Southern Oscillation (ENSO) events, and corals accustomed to high interannual SST variability may be better able to weather mass bleaching events (Donner & Carilli 2019; Fox et al. 2021), hence the higher bleaching threshold. Seasonal SST variability tends to predict coral bleaching resilience better than diel or interannual variability, but evidence supports an adaptive benefit from all three (Carilli et al. 2012; Palumbi et al. 2014; Safaie et al. 2018; Barshis et al. 2018; Brown et al. 2023). Additionally, although the maxima climatology baseline did not perform as well as the standard 1985–2012 climatology, the standard climatology may slightly underestimate the true thermal tolerance of reefs in these regions; this is evident in the difference in optimal bleaching thresholds between the two baselines (1 DHW and 3 DHW, respectively; see Supplement 1). Given that the maxima climatology-based model tended to underpredict bleaching in ENSO-affected regions relative to the same model using the standard climatology, however, we believe the standard climatology remains the better choice.

Changes in bleaching threshold over time

To identify a potential temporal trend in the optimal bleaching threshold, we held the anomaly cutoff and window size constant and optimized DHW for annual subsets of the data, allowing only the bleaching threshold to vary. This identified the best-performing bleaching threshold for each year, which we plotted over time (Fig. 5) and tested for a temporal trend (Table 5). Using the globally optimized DHW definition with a 0.4 °C cutoff and an 11-week accumulation window, the optimal bleaching threshold increased significantly from 1998 to 2019 (Mann–Kendall τ = 0.312, p = 0.044); the least-squares line (Fig. 5a, Supplement 3) puts the global rate of increase at approximately 0.19 DHW/year. This ocean-wide shift in average bleaching resistance matches and potentially outpaces the global mean rate of rising DHW (0.14 DHW/year, p < 0.01, adjusted R² = 0.436; Table 5). However, our modeled trajectory carries large uncertainties, apparent both in the confidence envelope around the best-fit line and in the variance between iterations of the train–test splitting analysis, which was high in some data-poor years (Fig. 5, Supplement 7). Many drivers may be contributing to the rising bleaching threshold, and their implications range from optimistic to grim.
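The trend test described above can be sketched without external dependencies. This is a minimal illustration, assuming the standard two-sided Mann–Kendall test with a tie-corrected variance and a normal approximation, alongside an ordinary least-squares slope; the annual thresholds below are hypothetical placeholders rather than the study’s data, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def mann_kendall(values: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Mann-Kendall trend test. Returns (S, tau, p_value)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    # S: number of increasing pairs minus number of decreasing pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the no-trend hypothesis, corrected for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s == 0 or var_s == 0:
        z = 0.0
    else:
        z = (s - math.copysign(1, s)) / math.sqrt(var_s)  # continuity correction
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    tau = s / (n * (n - 1) / 2)  # Kendall's tau-a; tie-adjusted variants differ
    return float(s), tau, p_value


def least_squares_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical annually optimized bleaching thresholds (DHW), not the study's values.
years = list(range(1998, 2008))
thresholds = [2, 3, 2, 3, 4, 3, 4, 5, 4, 5]
s, tau, p = mann_kendall(thresholds)
slope = least_squares_slope(years, thresholds)  # DHW per year
```

Optimized thresholds on a coarse grid produce many ties, which is why the variance correction matters; software packages also differ in whether τ is tie-adjusted, so τ values may not match exactly across tools. As the Table 5 caption describes, a normality check would decide whether the regression or the Mann–Kendall statistic is reported.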

Fig. 5

Changes in optimal bleaching threshold over time relative to peak heat stress. Accumulation window size and anomaly cutoff were held constant at the optimal values for each locale. Annually optimized bleaching thresholds are plotted for (a) the globe (black), (b) the Pacific (blue), and (c) the Indian Ocean (purple). Open red circles in each panel are the annual DHW maxima (using the same constant values for anomaly cutoff and window size) averaged across reef sites globally and represent the background rate of ocean warming. Shaded envelopes around lines of best fit represent 95% confidence intervals; R² values are adjusted; Mann–Kendall τ is provided for the nonparametric trend test; bolded p-values indicate a trend significantly different from 0.

Table 5 Linear trends in bleaching threshold over time. For analyses that violated assumptions of normality, a nonparametric Mann–Kendall τ trend test for time series was performed in lieu of linear regression, and the appropriate statistic is reported for each. Significant p-values are bolded and marked with an asterisk (*). “Peak DHW” is the annual DHW maximum averaged across all grid cells in the analysis and represents the temporal trend in reef heat stress from 1998 to 2019.

Evidence from coral cores, field surveys, and both field and laboratory experiments suggests that we can expect some degree of coral acclimation as environmental conditions shift to extremes (Thompson & van Woesik 2009; Guest et al. 2012; Cantin & Lough 2014; Palumbi et al. 2014; DeCarlo et al. 2019), although concern remains that climate change may be progressing too rapidly for corals to keep pace (Ainsworth et al. 2016). In the short term, corals may modify their behavior, symbiont community, or gene expression to cope with stressful conditions more effectively (e.g., Turnham et al. 2023). However, population resilience develops only through adaptation, wherein coral species and individuals predisposed to surviving extreme temperatures outlive and replace their less resilient neighbors, and this process is slow. Still, acclimation and adaptation are essential drivers of community development and population maintenance, but not all ecosystem shifts are equally desirable. Low-resilience species and individuals, many of which are major habitat builders essential for the reef’s diversity-supporting structural complexity, are disproportionately killed during mass bleaching events severe enough to reduce coral cover (Hughes et al. 2018b; Darling et al. 2019). The surviving assemblage typically comprises slow-growing, heat-tolerant reef builders and a contingent of opportunistic species: weedy corals providing limited structural complexity, as well as algae, sponges, and other competitive space occupiers that may interfere with coral recovery (Hughes et al. 2018b; Darling et al. 2019). This shift toward hardier, less diverse, and less abundant assemblages also reduces the reef’s susceptibility to bleaching, but at a price: an associated reduction in function, coral cover, biodiversity, and resilience in the face of non-bleaching stressors.

Regardless of mechanism, the bleaching threshold trend offers two insights. First, based on these tens of thousands of bleaching observations over several decades, we may be observing an increase in bleaching threshold over time. Second, the optimal ETS did not change appreciably over the course of the bleaching threshold analysis, and the annually optimized model consistently outperformed an otherwise identical model that used a static bleaching threshold (Supplement 5). The latter implies that bleaching forecast skill can be maintained over time simply by adjusting the bleaching threshold to compensate for environmental shifts and corresponding changes in the coral stress response. Although this trend may not apply to every reef or region, future bleaching projections should account for the possibility.
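The threshold-only optimization described earlier in this subsection, and the contrast with a static threshold drawn here, can be illustrated with a minimal sketch. It assumes that peak DHW per site has already been computed with the anomaly cutoff and window held fixed, that ETS is the equitable threat score, and that candidate thresholds are searched on a simple integer grid; the data, grid, and function names are hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def ets(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Equitable threat score of binary forecasts against binary observations."""
    hits = sum(p and o for p, o in zip(predicted, observed))
    misses = sum((not p) and o for p, o in zip(predicted, observed))
    false_alarms = sum(p and (not o) for p, o in zip(predicted, observed))
    hits_random = (hits + misses) * (hits + false_alarms) / len(observed)
    denominator = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denominator if denominator else float("nan")


def best_threshold(
    peak_dhw: Sequence[float], observed: Sequence[bool], candidates: Iterable[float]
) -> Tuple[float, float]:
    """Return (threshold, ETS) for the candidate bleaching threshold with the highest ETS."""
    best: Optional[Tuple[float, float]] = None
    for threshold in candidates:
        score = ets([d >= threshold for d in peak_dhw], observed)
        if math.isnan(score):
            continue  # skip thresholds where ETS is undefined
        if best is None or score > best[1]:
            best = (threshold, score)
    if best is None:
        raise ValueError("no candidate threshold produced a defined ETS")
    return best


# Hypothetical peak DHW per site (cutoff and window already fixed) and observed bleaching.
year_a = ([1, 2, 3, 4, 5, 6], [False, False, True, True, True, True])
year_b = ([2, 3, 4, 5, 6, 7], [False, False, False, True, True, True])
candidates = range(1, 8)

for label, (peaks, bleached) in [("year A", year_a), ("year B", year_b)]:
    threshold, score = best_threshold(peaks, bleached, candidates)
    static_score = ets([d >= 3 for d in peaks], bleached)  # static 3 DHW threshold
    print(f"{label}: optimal threshold {threshold} DHW (ETS {score:.2f}); static 3 DHW ETS {static_score:.2f}")
# year A: optimal threshold 3 DHW (ETS 1.00); static 3 DHW ETS 1.00
# year B: optimal threshold 5 DHW (ETS 1.00); static 3 DHW ETS 0.20
```

In this toy example the heat stress associated with bleaching rises from year A to year B, so a fixed 3 DHW threshold loses skill while re-optimizing the threshold each year retains it.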

Two examples from the region-specific threshold analyses illustrate how bleaching threshold trends may differ from place to place. Regions 11, 13, and 14, which experience periodic ENSO-related temperature extremes that almost certainly moderate reef stress tolerance (Spillman et al. 2021), exhibited no significant change in bleaching threshold over the past two decades (Table 5); the most appropriate bleaching thresholds for the central and eastern Pacific were already historically higher than those of other regions (Table 3). In contrast, when considering the Pacific as a whole, we found that the forecast-optimized bleaching threshold has increased significantly since 1998 (slope 0.17 DHW/year, p < 0.05, adjusted R² = 0.176; Fig. 5b). The Indian Ocean does not appear to follow this pattern: its bleaching threshold shows a positive but nonsignificant trend (slope 0.08 DHW/year, p > 0.05, adjusted R² = 0.019; Fig. 5c).

Whatever processes produce the increasing bleaching thresholds, they should compel us to re-examine our prediction models, particularly given the Indo-Pacific’s prevalence of reef systems at both extremes: isolated, highly biodiverse climate refugia and ecologically vulnerable, high-risk reefs projected to fare poorly under climate change (Sheppard 2003; Obura et al. 2022; Shlesinger & van Woesik 2023). In fact, studies assuming a constant bleaching threshold into the future (commonly 4 DHW, the threshold for Alert Level 1) warn that coral reefs in the Indian Ocean may reach an unrecoverable tipping point, a climate in which bleaching-level conditions can be expected annually, as early as 2030 (van Hooidonk et al. 2016). Yet a static bleaching threshold implies that these coral assemblages have reached their adaptation limit and can adapt no further, or at least no faster, and there is evidence that this may not always be the case (Donner 2009; Thompson & van Woesik 2009; Guest et al. 2012; Cantin & Lough 2014; Palumbi et al. 2014; Ainsworth et al. 2016; DeCarlo et al. 2019). At the very least, our findings draw attention to areas in which bleaching projections might be improved, perhaps by implementing a graded bleaching threshold in models developed for reef systems where this value seems to be changing over time. With each successive year, the conditions that coral reefs must contend with for their continued survival and function grow more extreme. Because we have observed increasing bleaching thresholds over the past two decades, regardless of mechanism, projections of future bleaching should consider that present-day tolerance may underrepresent future tolerance, even if future reefs are composed of different coral assemblages.

Holding window size and anomaly cutoff constant is necessary to assess changes in bleaching threshold over time, because these model parameters covary and thus cannot be interpreted in isolation (Fig. 4, Supplement 4). Similarly, adjusting only one element of the DHW definition at a time (holding the other two constant) can reveal their relative contributions to forecast skill and show which adjustments our forecast models are most sensitive to (Supplement 4). This sensitivity refers to model fit rather than to an inherent ecological trait. That is, a higher anomaly cutoff alone does not necessarily indicate a robust, heat-tolerant coral assemblage; rather, it reflects some combination of underlying processes that allow us to predict bleaching more accurately when we are stricter about which temperature anomalies are considered stressful.

Conclusions

Our results highlight the benefits of continually reassessing our heat stress metrics and coral bleaching forecast models. Based on global bleaching observations from 1998 to 2019, we could forecast coral bleaching events at a reef-sized scale (5 km × 5 km) more effectively by lowering the anomaly cutoff temperature to 0.4 °C, shortening the accumulation window to 11 weeks, and setting a slightly lower bleaching threshold of 3 DHW. Together, these modifications yield a 70% increase in forecast skill relative to the current standard DHW model and reduce its bias toward underpredicting coral bleaching occurrences. Optimizing the DHW definition on a geographic basis further doubled forecast performance and corrected the global model’s compensatory tendency to overpredict bleaching events. Optimizing the forecast model on a temporal basis revealed a global increase in coral bleaching threshold that keeps pace with the rate of ocean warming.

For all the complexities governing coral stress responses, even a simple threshold-based model like the DHW metric optimized in this study can remain a powerful monitoring and forecasting tool by factoring in the effect of geographic location. Many of the complex environmental and ecological factors that drive, exacerbate, and mitigate coral bleaching have a strong spatial component, so accounting for regional variability is a prerequisite for competent bleaching models. The utility of a spatial proxy for applications such as real-time coral bleaching risk assessment does not preclude the need for nuanced, explanatory multivariate models, however; both have their place. In the effort to preserve the world’s coral reefs, success hinges on our ability to allocate limited resources (monitoring, mitigation, management) to areas that are particularly valuable, vulnerable, or versatile players in their respective ecosystems. This, in turn, requires an intimate and accurate understanding of (1) the current state of our reefs and (2) how we expect them to evolve in a rapidly changing environment. Our long-term projections and short-term forecasts should anticipate change, and revisiting and refining our existing toolkit is a necessary step. Our study provides new definitions of DHW that can be applied now and suggests that projections of future bleaching events should reconsider the use of static heat stress sensitivities.