1 Introduction

Hurricanes threaten human lives and livelihoods, inflict severe damage to property, and incur billions of dollars in economic losses and recovery efforts. From 1992 through 2011, these events alone caused 42% of the catastrophe-insured losses in the United States (King et al. 2013). The disruption from these extreme weather events will likely increase with rising coastal populations and increasing value of infrastructure in coastal areas (Knutson et al. 2010). Furthermore, anthropogenic climate change is already worsening and expected to further increase average sea surface temperatures (SSTs) and sea level (IPCC 2018; Kossin et al. 2020). There is growing interest in determining if modifications to the incoming flux of solar radiation could potentially offset key impacts expected to occur from rising global temperature (Msadek et al. 2013). Whether or not such strategies are pursued, it is critical to understand the relationship between hurricane statistics and climate responses to past radiative forcings to help characterize the full range of plausible future anthropogenic influences on hurricane activity.

The underlying relationship between hurricanes, radiative forcing, and climate change remains an area of active inquiry (Ting et al. 2015; Elsner 2006; IPCC 2014). Several modeling studies have suggested that, in general, future storms may pose more severe threats to human well-being, infrastructure, and the economy (IPCC 2014). For example, Villarini and Vecchi (2013) and Emanuel (2013) used data from the Coupled Model Intercomparison Project 5 (CMIP5) ensemble to evaluate storm intensity under climate change. Such studies suggest that the number and intensity of the largest storms (e.g., category 4 and 5 hurricanes) will increase in a warmer climate due primarily to an increase in SST. Recent evidence (Kossin et al. 2020) demonstrates that TCs have significantly increased in intensity, though this may be due to changing observational methodologies (Vecchi et al. 2021). Furthermore, Wang et al. (2018) and Strauss et al. (2021) have shown clear evidence for anthropogenic influence on hurricane precipitation and flooding, respectively.

While we are interested in the relationship between volcanic eruptions and hurricanes for its own sake, eruptions of the recent past and last millennium may also provide a glimpse of the risks associated with “solar geoengineering” (SG), which has received increasing attention as a possible strategy for slowing down the rate of global warming from greenhouse gas emissions (Govindasamy and Caldeira 2000; Caldeira and Wood 2008; Kravitz et al. 2014; MacMartin et al. 2019). That is, if global greenhouse gas reduction efforts are insufficient in the coming decades, some researchers argue that interventions to the climate system may be preferable to allowing global temperatures to increase (Govindasamy and Caldeira 2000; Caldeira and Wood 2008; Kravitz et al. 2014; MacMartin et al. 2019). While there are numerous SG strategies, stratospheric aerosol injection (SAI) in particular mimics large volcanic eruptions (Jones et al. 2017; Crutzen 2006). Fundamentally, SAI decreases the total amount of sunlight reaching the surface, which shares some similarity with the effect of stratospheric aerosol injections from volcanic eruptions. In the stratosphere, chemical and micro-physical processes convert SO2 into sub-micrometer sulfate particles. The process has been observed in volcanic eruptions (Wilson et al. 1993; Bluth et al. 1992). Although the particle sizes of artificial aerosols are not necessarily the same as volcanic aerosols, the injection rate of artificial aerosols can be designed to roughly match volcanic radiative forcing (Lacis and Mischenko 1995). In addition to the micro-physical differences between SAI and volcanic eruptions, differences in climate response can result from whether SAI is pulsed or sustained and also from injection location. Volcanic eruptions more closely match a pulsed SAI forcing, and the effects of a sustained forcing can be significantly different (Duan et al. 2019). Thus caution should be taken when extrapolating climate responses of volcanic eruptions to climate responses of SAI in general. The similarities between SAI and eruptions mentioned here only extend as far as the SAI regime mimics the location and duration of the eruptions.

Volcanic eruptions result in an increased reflection of solar radiation, which can strongly impact global temperatures, circulation patterns, and water cycles (Mass and Portman 1989; Robock 2000; Swingedouw et al. 2017). Changes in tropical sea surface temperature from SAI and volcanic eruptions would also similarly alter the position of the Inter Tropical Convergence Zone (ITCZ). For example, asymmetric volcanic forcings (e.g., volcanic aerosols being concentrated in the stratosphere of one hemisphere) alter the position of the ITCZ for at least one year following an eruption (Yang et al. 2019; Pauseta et al. 2019; Jones et al. 2017). These effects could potentially be even longer lasting if coupled interactions between the ocean and atmosphere are engaged (Colose et al. 2016; Raible et al. 2016; Schurer et al. 2013, 2014). Given that hurricanes are sensitive to the regions where moisture convergence occurs, it follows that such radiative effects on the global circulation could influence hurricane statistics.

The historical record only provides a few clues about the effect of volcanic eruptions on hurricanes or tropical cyclones (TCs). Nevertheless, modeling studies suggest that a reduction in TC accumulated energy, TC duration, and lifetime maximum intensity occurs following a volcanic eruption due to a decrease in SST and increase in upper tropospheric/lower stratospheric temperature (Evan 2012), all of which decreases TC thermodynamic efficiency estimates (Emanuel et al. 2013). The asymmetric increase in stratospheric aerosols occurs in the hemisphere in which the eruption took place, modifying the sea surface temperature gradient (Haywood et al. 2013). This gradient shifts the location of the Inter-tropical Convergence Zone (ITCZ) to the opposite hemisphere of the eruption (Haywood et al. 2013), which hinders hurricane development in the volcano’s own hemisphere due to a decrease in convection and increase in wind shear. In fact, after the northern hemisphere eruptions of Mount Pinatubo (1991) and El Chichón (1982), North Atlantic TC activity decreased, while TC activity increased following the southern hemisphere eruption of Agung (1964) (Evan 2012; Guevara-Murua et al. 2015). More recently, Camargo and Polvani (2019) found no robust reduction of North Atlantic tropical cyclone activity in recent observations or reanalysis. A reduction in potential intensity (PI) is observed in CMIP5 and CESM large-ensemble historical simulations after volcanic eruptions, though the decrease might be overestimated due to model bias.

Comparisons between all-forcing and single-forcing last millennium model simulations, as well as multiproxy paleoclimate constructions, have shown that large volcanic eruptions were the dominant forcing during the pre-industrial record (Schurer et al. 2013, 2014). Studies have also shown that large tropical volcanic eruptions may have long-lasting influences on the Atlantic multi-decadal oscillation and lead to El Niño-like warming in the cold season after the eruption (Otto-Bliesner et al. 2016). Such impacts could, in turn, affect hurricane statistics because these large-scale modes help govern, in part, the frequency and intensity of storms occurring in any given year.

Attempts to use paleoclimate indicators of past events (i.e., paleotempestology) are limited by the paucity of appropriate archives and proxies (Liu et al. 2008; Mann et al. 2009; Donnelly et al. 2015). Importantly, most paleorecords are commonly constrained to certain geographic areas and currently provide limited information from other regions also commonly affected by TCs, such as the Caribbean (Oliva et al. 2018). The use of paleotempestological records is further limited because it is not possible to fully reconstruct the tracks and lifetime of past TCs, most of which occur over the ocean (Emanuel 2005). These limitations preclude the extensive use of such data to evaluate the effects of major volcanic eruptions on TC activity (Oliva et al. 2018; Yan et al. 2015; Korty et al. 2012).

Hurricanes are mesoscale features of the tropical circulation, and as such they depend critically on quantities that are typically unresolvable in the coarse resolution grid of general circulation models (GCMs), which typically have nominal horizontal resolutions on the order of 50–200 km. For example, the resolution of the CESM simulation used here is 1.9° × 2.5° which is roughly 211 km by 278 km at the equator. In order to overcome this limitation, authors typically follow one of three following approaches. The first approach is relatively simple, and entails calculating thermodynamic metrics like PI at the native (coarse) resolution of GCM output, which shows the maximum possible TC intensity given a vertical sounding (Wang et al. 2018; Emanuel and Nolan 2004; Tang and Camargo 2014; Bister and Emanuel 2002). A recent study (Yan et al. 2018) used the last millennium ensemble (LME) to determine the theoretical effects of volcanic eruptions during the last 1000 years on PI. The authors found a significant relationship lasting up to 3 years post-eruption, but also “divergent” responses at the mid and high latitudes to the volcanic forcing. While this approach is computationally efficient, it does not explicitly simulate TCs.

A second approach uses a statistical method, coupling synthetic tracks with an axisymmetric hurricane model to downscale GCM output (Emanuel 2006; Korty et al. 2017). This approach is computationally lightweight, allowing for an investigation of long-term variability in fully coupled GCM simulations of the last millennium (Kozar et al. 2013). While this method has a high spatial resolution in the vicinity of the eyewall, the model is limited due to being axisymmetric and a hurricane being unable to feedback on environmental conditions supplied to it.

The third approach employs a regional model to dynamically downscale GCM output (Knutson et al. 2008). Dynamical downscaling typically requires high performance computing infrastructure as well as boundary conditions from the “parent” GCM at six hourly temporal resolution. It is therefore much more computationally expensive than the other two methods, but it provides greater insight into the storms that would have occurred in each GCM framework if it were run with sufficiently high spatial resolution. Dynamical downscaling has been widely used to evaluate hurricane statistics during the twentieth and twenty-first century (Emanuel 2013), but it has not been widely adapted to the last millennium modeling context.

In this study, we use WRF to dynamically downscale output from the LME to analyze the effect of volcanic eruptions on hurricane intensity and lifetime. While not a perfect analog to SAI, eruptions are important for understanding physical mechanisms and validating models used in projecting the climate response to a possible SAI regime. Understanding the effects of volcanic eruptions on hurricane dynamics during the last millennium can therefore yield some insight into the possible (unintended) effects of SAI on these hazards.

2 Data and methods

We dynamically downscale 234 total years from two members of the “Last Millennium Ensemble” (Otto-Bliesner et al. 2016) with high resolution temporal output. The LME consists of over two dozen fully forced and single forcing experiments from the record spanning 850 CE to 2005. The LME was used because it provided an extensive readily available dataset which could be easily downscaled. While monthly data was archived for most of the members of the LME, two simulations produced sufficiently high temporal output to allow for high resolution dynamical downscaling using a regional model. One of these runs was a fully forced last millennium simulation, and the other was a long control simulation with time invariant boundary conditions. We further evaluate the strengths and limitations of our methodology by comparing downscaled reanalysis data to an historical database of hurricane tracks and intensities. The details of this approach are described below.

2.1 Data

2.1.1 Last millennium ensemble

Output from only two members of the LME were archived at sufficiently high temporal resolution to be used as boundary conditions for WRF: a fully-forced simulation with time varying boundary conditions (LMEforced) and a pre-industrial control simulation with time invariant boundary conditions (LMEcontrol). Both simulations were run from 850 to 2005 CE using the Community Earth System Model (CESM) version 1.1, with the Community Atmosphere Model (CAM) version 5. The resolution of the atmosphere and land grids are nominally ∼2° and ∼1° for ocean and sea ice grids. While both runs were spun up for 200 years under control conditions prior to 850 CE, LMEforced was forced with the transient evolution of solar intensity, volcanic emissions, greenhouse gases, aerosols, and land-use conditions, as well as insolation changes from planetary orbit and tilt. In the LMEcontrol the boundary conditions were simply held fixed at their pre-industrial levels, thus providing an unforced baseline for evaluating changes in hurricane statistics following large volcanic eruptions.

2.1.2 ERA-I and IBTrACS

The native 2° resolution of CAM5 in the LME simulations would make it impossible to resolve hurricanes, hence we cannot evaluate the reliability of our downscaling methodology (see Sect. 2.2.1) with LME data alone. We therefore also downscaled the ERA-Interim (ERA-I) (Dee et al. 2011) reanalysis data to retrospectively predict historical hurricanes, then compared those predictions against the International Best Track Archive for Climate Stewardship (IBTrACS) (Knapp et al. 2010) database. ERA-I comprises a reanalysis dataset starting in 1979 and is available until August 2019. It uses four-dimensional variational data assimilation (4DVAR), yielding a significant advantage over reanalysis products using 3DVAR. This improves asynoptic data handling and allows for the influence of an observation to be more strongly controlled by model dynamics (Schenkel and Hart 2011). This data assimilation method is coupled with the ECMWF Integrated Forecast Model (IFS) to extrapolate fields between observations.

2.2 Methods

2.2.1 Dynamical downscaling

We used WRF version 3.9 (WRFV3.9) (Skamarock et al. 2008) to dynamically downscale archived data from LME simulations with the physics schemes shown in Table 1. We also turned on heat and moisture surfaces fluxes (isfflx = 1) and modified exchange coefficients Cd and Ck according to surface winds (isftcflx = 1). The physics schemes were selected to satisfy a 15% difference threshold imposed between downscaled ERA-I and IBTrACS, as quantified by our diagnostics suite.

Table 1 WRF physics parameterizations used for downscaling

WRF was run for a total of 234 simulation years over a domain spanning the North American sector from 130°W to 15°E and the equator to 55°N, allowing us to identify and track TCs in both the Atlantic and Eastern Pacific, even after making landfall in North America. In comparing LMEcontrol and LMEforced we focused on the effect of aerosol forcing from volcanic eruptions. The aerosol mass signals from volcanic eruptions from 1100 to 1850 CE are shown in Fig. 1, which is described in detail in Gao et al. (2008). We selected the top 42 eruptions based on peak aerosol mass signal from LMEforced and ran WRF for two years after each of these eruptions (84 simulation years total). LMEcontrol was run using WRF for 150 years from 1000 to 1150 to provide a sufficient sample of non-volcanic natural variability. The two year window after eruptions was selected because the residence time of stratospheric aerosols is around one to two years (Crutzen 2006), and individual large eruptions produce global or hemispheric cooling for an average of two or three years (Robock 2015). The LMEcontrol run was ensured to have sufficient length by analyzing the SST signal in frequency space (Fig. 2). This analysis shows that using 100 years of control data is sufficient to ensure we are not missing any low frequency variability.

Fig. 1
figure 1

Aerosol mass signals for volcanic eruptions from 1150 to 1850 CE in the Northern Hemisphere. The minimum, 10th largest, and 5th largest signals are shown with dashed lines. All signals were selected as “all NH eruptions” years for this study

Fig. 2
figure 2

Aerosol mass signals for volcanic eruptions from 1150 to 1850 CE in the Southern Hemisphere. The minimum, 10th largest, and 5th largest signals are shown with dashed lines. All signals were selected as “all SH eruptions” years for this study

We elected to use a horizontal grid spacing (∆X) of 30 km. The 30 km spacing represents a compromise between our competing requirements for high resolution output and a large sample size; each downscaled year used approximately 2000 core hours on the Cheyenne supercomputer totaling about 500,000 core hours for all years. While the 30 km resolution is somewhat coarse for resolving certain features of hurricanes, decreasing the resolution further to 10 km was much too expensive. The threefold increase in spatial resolution would have translated into more than a tenfold increase in core hours, or a tenfold reduction in the number of years simulated. All data from the LME were prepared for WRF using the procedure and code described in (Bruyere et al. 2015).

In our work, 6-hourly ERA-I data was also downscaled in WRF (using the same domain and 30 km resolution as the downscaled LME simulations) from 1995 to 2005 and compared to the IBTrACS database for the same record. The comparison was made using the suite of diagnostics described in Sect. 2.2.3.

2.2.2 Tracking tropical cyclones with TSTORMS

We applied the TSTORMS (Zhao et al. 2009) tracking software, developed and supported by GFDL, to analyze the results of downscaling. This routine uses minimum pressure and maximum vorticity criteria to identify cyclones. Events are stored as “storms” if they satisfy the following conditions for at least a preset number of days (ndays): (1) That the maximum vorticity location is within a threshold radius (rcrit) of the minimum pressure location, (2) that the core temperature of the cyclone is higher than outside of the core by a threshold difference (twccrit) and (3) the difference in vertical distance between pressure levels at 200 hPa and 1000 hPa outside and inside the core exceeds a threshold value (thickcrit).

As described in Walsh et al. (2015) and Zhao et al. (2009), tracking results are sensitive to the details of the tracking scheme that is employed and especially the threshold values selected for identifying storms (Horn et al. 2014). To identify sensitivity to threshold values, we conducted a limited parameter sweep to determine optimal threshold values. We calculated the difference between ERA-I downscaled output and IBTrACS data, for each set of parameters, using the diagnostics described in Sect. 2.2.3. We used the set of parameters that achieved the minimum difference of ∼13.5%. This parameter set was rcrit = 1.5°, twccrit = 1.0 °C, thickcrit = 50 m, and ndays = 2.

2.2.3 Diagnostics

Once hurricanes were identified in our downscaled LME data using TSTORMS, we calculated 19 diagnostic metrics to evaluate differences in the statistics of storms in LMEcontrol and those occurring after large eruptions in LMEforced. The diagnostics consist of 10 storm distributions and 9 calculations of the fraction of the total number of storms with a certain range of values (e.g., fraction of storms with minimum pressure between 1020 and 980 hPa). We calculated the 10 distribution diagnostics as follows: the fraction of storms vs (1) month, (2) year, (3) latitude, (4) longitude, (5) maximum wind speed, (6) minimum pressure, (7) decay time from maximum wind speed, (8) decay time from minimum pressure, (9) power dissipation index (PDI), and (10) accumulated cyclone energy (ACE). For example, (5) is a plot of the fraction of storms with a specific maximum wind speed. The final 9 diagnostics include the fraction of storms within (11) May to November, (12) 0–25 N latitude, (13) 100–50 W longitude, (14) 1020−980 hPa pressure, (15) 0−40 m/s maximum wind speed, (16) 0−100 h decay time from maximum wind speed, (17) 0−100 h decay time from minimum pressure, (18) 1.12*109–21.6*109m3/s2 PDI, and (19) 0.25–3.75m2/s2 ACE. Mean values and quantile values (expressed as percentages) were used to calculate fractional differences between LMEcontrol and LMEforced, and these differences were averaged over all diagnostics for a composite percentage difference. We refer to the mean difference of diagnostics 1–10 as the total “average difference” and the mean difference of diagnostics 11–19 as the total “percentage difference.” We also computed these metrics from our downscaled ERA-I data to compare them to the IBTrACS database as a test of our methodology, as described in Sects. 2.2.1 and 2.2.2.

The diagnostics described above were used as test statistics to evaluate whether volcanic eruptions have a measurable effect on hurricane behavior. These diagnostics were selected in order to assess hurricane behavior across a broad range of characteristics. The diagnostics not only quantify hurricane behavior across the temporal and spatial domain, but also assess more fundamental physical characteristics. In addition, the diagnostics can be used with limited data consisting only of time, location, wind speed, and surface pressure. This presents a versatile and efficient approach to capture both mean climatology and structured hurricane behavior.

To determine whether volcanic eruptions affect hurricane statistics, we performed two-sample KS-tests for distributions of each of the diagnostics. The two samples tested for each diagnostic came from downscaled LMEcontrol and LMEforced data. Since LMEcontrol does not include volcanic eruptions, agreement with LMEcontrol is confirmation of the null hypothesis. We also performed two-sample Anderson–Darling tests to account for long tail effects to which KS-tests are insensitive.

3 Results

We first show comparisons in TC tracks between downscaled output from ERA-I and IBTrACS (Fig. 3) as well as between LMEforced and LMEcontrol (Fig. 4). The ERA-I vs. IBTrACS comparison provides a baseline of whether WRF downscales TCs correctly, while the comparison between LMEforced and LMEcontrol is focused specifically on the effect of aerosol forcing from volcanic eruptions. There is agreement in the location of TC tracks for both comparisons, however the downscaled ERA-I underestimates TC intensity. This underestimation is likely related to the downscaling resolution of 30 km and the representation of storms in ERA-I (Lui et al. 2021). There is also a notable lack of TC tracks in the Gulf of Mexico for the downscaled ERA-I compared to the IBTrACS (Fig. 5).

Fig. 3
figure 3

LMEcontrol SST power spectrum. 100 years of control data is sufficient in capturing low frequency content

Fig. 4
figure 4

Downscaled ERA-I hurricane tracks using TSTORMS (A) compared to the IBTrACS database (B) for 1995 through 2005

Fig. 5
figure 5

Downscaled LMEforced hurricane storm tracks using TSTORMS (A) compared to downscaled LMEcontrol using TSTORMS (B)

3.1 ERA-I vs. IBTrACS

As shown in Table 2, using our suite of diagnostics, we found an overall agreement between ERA-I and IBTrACS of ∼86.5%, or a composite difference of ∼13.5%. 6-hourly ERA-I data downscaled in WRF was compared to IBTrACS for the same time period (1995−2005). Notably, most hurricanes (97%) from downscaled ERA-I remain below a maximum wind speed of 40 m/s, whereas 30% of storms from IBTrACS cross the 40 m/s maximum wind speed. Diagnostic distributions for both ERA-I and IBTrACS are shown in Fig. 6. It is worth noting that the truncation of the domain in our ERA-I WRF simulations contributes to the differences in latitude and longitude peaks seen in Fig. 6. For example, according to the IBTrACS database, some storms track beyond the WRF domain, while our downscaled ERA-I simulations cut-off where the WRF domain ends. In Hodges et al. (2017), the authors assess how well TCs are represented in reanalysis products. This work used two TC-track matching approaches, referred to as (1) “direct matching” and (2) “objective matching”. The authors further used several diagnostics in order to compare reanalysis TC tracks to those found in IBTrACS. The objective matching approach, which employs a tracking algorithm similar to TSTORMS, found an agreement of ∼60% with ERA-I in the Northern Hemisphere. A simple “direct matching” implementation of our own achieved similar agreement. The physics schemes in Sect. 2.2.1 and threshold values in Sect. 2.2.2 were used to satisfy a self-selected 15% difference threshold imposed between ERA-I and IBTrACS, as quantified by our diagnostics suite.

Table 2 ERA-I vs. IBTrACS statistics for certain variables
Fig. 6
figure 6

Average number of storms by time of year (month), year, latitude, longitude, average wind speed, average pressure, lifetime based on decay from maximum wind to average wind, lifetime based on decay from minimum pressure to average pressure, accumulated cyclone energy (ACE), and power dissipation index (PDI), between downscaled ERA-I and IBTrACS for 1995 through 2005. Bin widths are 1°, 3°, 1 m/s, 1mb for latitude, 6 h, and 6 h for longitude, wind speed, pressure, wind life, and pressure life respectively. Distributions are normalized by the total number of storms

3.2 Effect of eruptions on hurricane statistics

3.2.1 Average effect of eruptions

Distributions of the diagnostics for LMEcontrol and LMEforced, with all 42 eruptions included, are shown in Fig. 7. The same diagnostic comparison for all northern hemisphere eruptions is shown in Fig. 8. All southern hemisphere eruptions are included in the diagnostic comparison shown in Fig. 9. A diagnostic comparison for the top 5 northern hemisphere eruptions is shown in Fig. 10. The top 5 southern hemisphere eruptions are included in Fig. 11. Performing two sample KS-tests and two-sample Anderson–Darling tests on the distributions, along with fractional significance tests on the difference of mean values, shows that the overall effect of all 42 eruptions is consistent with the null hypothesis. That is, the effect of all 42 eruptions is consistent with the natural climate variability seen in LMEcontrol (Table 3). However, when separating northern and southern hemisphere eruptions, we observe a moderate effect on the frequency, lifetime, and intensity of hurricanes (Tables 4, 6). The KS-tests show a maximum difference between the two samples (D-value) and a probability that the two samples are drawn from the same distribution (P value). The Anderson–Darling tests give the level at which the null hypothesis can be rejected. For example, in the max wind row of Table 6, an Anderson–Darling test significance value of 0.001 shows that the null hypothesis can be rejected at the 0.1% level. For all northern hemisphere eruptions (Tables 4, 5) we can reject the null hypothesis by the Anderson–Darling test for intensity, PDI, and ACE at a maximum level of 13%. For both PDI and ACE we can reject the null hypothesis at around the 2% level. We see similar results for all southern hemisphere eruptions in Tables 6, 7. The fractional significance tests (Tables 8, 9, 10) show the fraction of TCs in LMEcontrol which exceed the LMEforced mean. For example, in the max wind row of Table 8, 53.2% of the TCs in LMEcontrol had stronger maximum wind speed than the mean maximum wind speed of TCs in LMEforced. For all northern hemisphere eruptions, we see in Table 9 that TCs have a shorter lifetime, lower maximum intensity, and lower latitude than 55–70% of TCs in LMEcontrol. We also find that TCs following northern hemisphere eruptions have a lower PDI and ACE than 65% of TCs in the control. In contrast, we observe the opposite effect with southern hemisphere eruptions. As shown in Table 10, TCs following southern hemisphere eruptions have longer lifetimes, larger maximum intensity, increased latitudes, and larger PDI/ACE than nearly 60% of TCs in the control.

Fig. 7
figure 7

Same as Fig. 6 except for a comparison between LMEforced and LMEcontrol for all eruptions in both hemispheres

Fig. 8
figure 8

Same as Fig. 6 except for a comparison between LMEforced and LMEcontrol for all eruptions in the northern hemisphere

Fig. 9
figure 9

Same as Fig. 6 except for a comparison between LMEforced and LMEcontrol for all eruptions in the southern hemisphere

Fig. 10
figure 10

Same as Fig. 6 except for only the five strongest eruptions in the northern hemisphere

Fig. 11
figure 11

Same as Fig. 6 except for only the five strongest eruptions in the southern hemisphere

Table 3 KS-tests show a maximum difference between the two samples (D value) and a probability that the two samples are drawn from the same distribution
Table 4 KS-tests (D- and P-values) and Anderson–Darling tests (Significance) for all eruptions. (Northern Hemisphere)
Table 5 KS-tests (D- and P values) and Anderson–Darling tests (significance) for 5 strongest eruptions (Northern Hemisphere)
Table 6 KS-tests (D- and P values) and Anderson–Darling tests (significance) for all eruptions (southern hemisphere)
Table 7 KS-tests (D- and P values) and Anderson–Darling tests (Significance) for 5 strongest eruptions (Southern Hemisphere)
Table 8 Fractional significance tests for all eruptions
Table 9 Fractional significance tests for all eruptions, the 10 strongest eruptions, the 5 largest eruptions, and the largest eruption (1258)
Table 10 Fractional significance tests for all eruptions, the 10 strongest eruptions, the 5 largest eruptions, and the largest eruption (1452)

We also calculated Pearson correlation coefficients on eruption strength and diagnostic changes. To determine the significance of these correlation coefficients, we calculated the 90%, 85%, and 80% confidence intervals. If the confidence interval for a particular diagnostic (e.g., maximum wind speed) does not include zero correlation we can say with at least 90%, 85%, or 80% confidence that there is a correlation between eruption strength and that particular diagnostic. These confidence intervals are affected by the number of eruptions in our analysis. The correlation coefficients for all eruptions in both hemispheres are listed in Table 11. Here, we see with at least 80% confidence that there is a correlation between eruption strength and reduction in yearly hurricane number, intensity, and lifetime. The correlation coefficients for all northern hemisphere eruptions are shown in Table 12. Again we see with at least 80% confidence that there is a correlation between eruption strength and reduction in yearly hurricane number, maximum wind speed, and lifetime. The correlation coefficients for all southern hemisphere eruptions are shown in Table 13. Here we again see with at least 80% confidence there is a correlation between eruption strength and reduction in intensity. We also see with at least 80% confidence a correlation between eruption strength and storms occurring earlier in the year. However, for the southern hemisphere we do not see the same correlation between eruption strength and reduction in yearly number or lifetime. For the southern hemisphere we see with at least 80% confidence a correlation between eruption strength and storm occurring later in the year. The opposite effect is seen for northern hemisphere eruptions.

Table 11 Correlations between eruption strength and diagnostics for all eruptions in both hemispheres
Table 12 Correlations between eruption strength and diagnostics for all eruptions in the Northern Hemisphere
Table 13 Correlations between eruption strength and diagnostics for all eruptions in the Southern Hemisphere

3.2.2 Effect of strongest eruptions

Distributions of diagnostics with only the 5 strongest northern hemisphere eruptions are shown in Fig. 10. The same diagnostics for the 5 strongest southern hemisphere eruptions are shown in Fig. 11. The results of running KS-tests and Anderson–Darling tests for the 5 strongest northern hemisphere eruptions are shown in Table 5. The results for the 5 strongest southern hemisphere eruptions are shown in Table 7. For the 5 largest eruptions, in both the northern and southern hemispheres we can reject the null hypothesis by the Anderson–Darling test for nearly every diagnostic at a maximum level of 1%. The fractional significance tests on the strongest northern hemisphere and southern hemisphere eruptions are in Tables 9 and 10, respectively. We see in Table 9 that for the 5 strongest northern hemisphere eruptions 70–80% of TCs have a lower maximum intensity, shorter lifetime, and are found at a lower latitude than TCs in LMEcontrol. Following the strongest eruption (1258) nearly 80–90% of TCs exhibit this reduced lifetime, maximum intensity, and latitude. We see in Table 10 that for the 5 strongest southern hemisphere eruptions there is the opposite effect to that of northern hemisphere eruptions. In this case nearly 60% of TCs following eruptions have an increased lifetime, maximum intensity, and latitude. However, we still observe a similar reduction in power dissipation index and accumulated cyclone energy.

4 Discussion and conclusion

In this work we have explored the effect of volcanic eruptions in the past millennium on hurricane climatology. To do this we first validated our approach of downscaling CESM data with WRF by comparing results of ERA-I downscaling with IBTrACS data. ERA-I downscaling was only able to partially reproduce IBTrACS observations however, underestimating observational intensities. On the other hand, frequency, limetime, and location were faithfully reproduced in the ERA-I downscaling. The underestimation of intensity could limit measurable effects in our intensity-based diagnostics. We also performed a parameter search for our cyclone tracking algorithm in order to achieve high accuracy. We then compared the results of downscaling our control data from CESM with forced data from CESM, where we focused on the years in the forced data with volcanic eruptions.

Our results suggest that the overall effect of all eruptions on hurricane statistics is small and not significant as compared to the control simulation (i.e., the null hypothesis). However, we see a moderate reduction in frequency, lifetime, and maximum intensity for hurricanes following northern hemisphere eruptions and the opposite for southern hemisphere eruptions. Sufficiently strong northern hemisphere eruptions do result in lower annual hurricane count, reduced intensity, and shorter lifetimes, significant at the 70-80th percentile. This evidence is in the form of KS, Anderson–Darling, and fractional significance tests on diagnostic distributions, as well as correlations between strength and changes in the mean values of these diagnostics. The moderate increase in North Atlantic TC activity following southern hemisphere eruptions substantiates previous research suggesting moisture convergence increases in the hemisphere opposite of a volcanic eruption (e.g., Haywood et al. 2013).

Comparing the downscaled ERA-I results to hurricane tracks and intensities from IBTrACS allowed us to evaluate the efficacy of our approach. Due to the inherently chaotic nature of hurricane genesis, exact agreement between ERA-I and IBTrACS was not expected, especially given the coarse 30 km resolution of our downscaled simulations. Assessing our approach was the primary objective in comparing ERA-I with IBTrACS. We expected ERA-I to capture the observational record for mean climate and to provide good agreement between downscaled results and overall hurricane statistics seen in IBTrACS. We avoided using metrics like PI, since caution is urged in using PI to draw strong conclusions about tropical cyclone projections as it fails to capture features seen in high-resolution climate models (Wehner et al. 2015). Dynamical downscaling provides far greater detail in both the spatial and temporal domain (Emanuel 2013). This is evident in Figs. 4 and 5 and the extensive suite of diagnostics used to analyze downscaled results.

The results in this study have significant implications for hurricane development in a potential future climate under an SAI regime. Although we analyzed the effects of an increase in stratospheric aerosols from volcanic eruptions, the results are relevant to what may occur under an SAI regime. Although analyses of impacts were once limited by historical observation and coarser resolution, we were able to evaluate the direct influence of many volcanic eruptions on individual hurricanes. For example, northern hemisphere eruptions in the downscaled LMEforced experiment produced a slight reduction in hurricane frequency, intensity, and lifetime. These impacts could be similarly felt if SAI is designed to mimic the injection from the northern hemisphere eruptions, removing some uncertainty associated with regional changes in tropical cyclone development for the Northern Atlantic Ocean. According to our results, hurricanes would likely decrease in frequency, lifetime, and intensity under a relatively strong SAI regime with aerosol injection in the northern hemisphere.

Although our results show moderate correlation between eruption strength and certain diagnostic measures, it is not necessarily true that stronger eruptions have a larger effect on hurricane statistics. Additionally, research has shown large uncertainties in volcanic reconstructions and seasonality of volcanic eruptions (Schmidt et al. 2012; Stevenson et al. 2017; Raible et al. 2016), which presents a direction for further investigation. In this vein, one could look at an ensemble of higher resolution GCM simulations on one or two of the strongest volcanic eruptions. This eruption profile could be simulated both in the climate conditions during the historical eruption as well as under future climate change conditions. An ensemble average of simulations with perturbed initial conditions will allow us to home in on the sole effect of aerosol forcing. This will also allow us to explore the question of whether downscaling introduced any unknown biases. An ensemble under future climate change conditions will allow us to explore the interplay of large aerosol forcing and strong anthropogenic forcing.