1 Introduction

Since the 1970s, it has been recognized that climate models are capable of simulating features reminiscent of tropical cyclones (Manabe et al. 1970). These so-called hurricane-type (Bengtsson et al. 1982) or tropical cyclone-like (Walsh and Watterson 1997) vortices tend to appear in climate simulations over the basins where cyclogenesis is commonly observed and usually at the right time of the year, but these systems are generally weaker and much larger than the storms observed in the real world. The increase in climate model resolution that has accompanied the increase in computing power over the last few decades has led to significant improvements in the realism of this simulated tropical cyclone activity (Walsh et al. 2010; Caron et al. 2010; Manganello et al. 2012; Camargo and Wing 2016): recent simulations performed with a global grid spacing of a few tens of km produce relatively realistic tropical cyclone activity and even major (category 3–5) hurricanes (Zarzycki and Jablonowski 2014; Wehner et al. 2015; Murakami et al. 2016; Scoccimarro et al. 2017), but such simulations are computationally expensive and currently available to only a few modeling groups. Furthermore, such grid spacing is still insufficient to resolve the inner core of tropical cyclones, which requires a resolution of a few kilometers (Chen et al. 2007).

One way to increase the realism of simulated tropical cyclone activity is through the use of finer-scale limited-area models, which are embedded in a coarser resolution climate model (or reanalysis data) (Walsh and Ryan 2000; Walsh and Katzey 2000; Knutson et al. 2007; Caron and Jones 2011; Knutson et al. 2013), or through the use of variable-resolution models (Chauvin et al. 2006; Daloz et al. 2012; Caron et al. 2012; Zarzycki et al. 2017), which focus the available computing power over a specific region. With these two techniques, one can significantly increase the resolution such that the different characteristics of tropical cyclone activity (e.g. number of storms, geographical distribution) are usually well simulated in the region of interest. However, with either of those techniques, computational constraints currently prevent long climate simulations from being run at the resolution required to resolve the inner-core dynamics of tropical cyclones and to produce storms with a size and intensity distribution comparable to what is observed.

An alternative downscaling approach to analyze tropical cyclone activity, which avoids the limitation linked to climate model resolution, was developed by Emanuel et al. (2006) and Emanuel et al. (2008). With this technique, a large dataset of synthetic tropical cyclone tracks is created following a three-step process. First, the climate state, derived either from a reanalysis or a climate model, is seeded with a large number of weak disturbances. These disturbances are then allowed to propagate based on the large-scale general circulation of the atmosphere and, finally, the intensity of the storms along each track is computed using a deterministic coupled-atmosphere tropical cyclone model which uses the atmosphere and near-surface ocean thermodynamic conditions. When the technique is applied to current climate conditions, the disturbances which survive and develop into full-grown tropical cyclones have been shown to have fairly realistic physical characteristics (Emanuel et al. 2008), and direct comparisons with climate model outputs have shown that tropical cyclone activity produced using this downscaling approach generally compares favorably to the cyclone activity explicitly simulated by climate models, in particular over the Atlantic (Daloz et al. 2015; Emanuel et al. 2010). This downscaling technique has been applied extensively to study a range of problems, including, but not limited to, tropical cyclone activity during the Pliocene epoch (Fedorov et al. 2010), hurricane-related precipitation risk over Texas (Zhu et al. 2013; Emanuel 2017), poleward migration of tropical cyclone activity (Kossin et al. 2016), storm surge threat to New York City (Lin et al. 2010, 2012; Reed et al. 2015) as well as other potentially vulnerable locations (Lin and Emanuel 2015), medicanes (Romero and Emanuel 2013, 2017), polar lows (Romero and Emanuel 2017) and, finally, projected change in global hurricane activity (Emanuel 2013), hurricane-related damage (Emanuel 2011; Mendelsohn et al. 2012) and tropical cyclone season length (Dwyer et al. 2015).

In this manuscript, we analyze and compare a series of simulations produced using four different reanalysis datasets. More specifically, we analyze the ability of the technique to reproduce different characteristics of observed Atlantic hurricane activity and compare the impact of changing the reanalysis boundary conditions on the simulated hurricane activity. The manuscript is organized as follows: Sect. 2 describes the data and the hurricane model, Sect. 3 compares the geographical distribution of the simulated hurricane activity while Sect. 4 analyzes both the decadal and interannual hurricane variability. We conclude with our final remarks in Sect. 5.

2 Tropical cyclone data

Observed tropical cyclone tracks used as reference for this study are taken from the International Best Track Archive for Climate Stewardship (IBTrACS) (Knapp et al. 2010). The dataset provides the 6-hourly tropical cyclone position, the 1-min maximum sustained wind (MSW) and the minimum in surface pressure for all the storms ranging from 1851 to the present in the Atlantic basin, which is the region considered here. In order to have a meaningful comparison, we excluded tracks that were archived as subtropical or extra-tropical. We also excluded storms that did not reach tropical cyclone status.

The method used to construct the synthetic track datasets has been described in detail in Emanuel et al. (2006) and Emanuel et al. (2008) and is briefly summarized here. First, the starting points of the potential tracks are generated randomly in space and time and then, for each of these points, a trajectory is constructed using a beta and advection model (Marks 1992), which combines the vertical average of the deep tropospheric winds (estimated here as a weighted average of the ambient flow at 850 and 250 hPa) and a constant beta-drift correction (Holland 1983) to account for the environmental advection of potential vorticity by the storm. The winds at both 850 and 250 hPa vary randomly in time, but are constructed such that the mean, variance and covariances among the two scalar wind components at the two levels match the individual monthly means in each reanalysis dataset. Furthermore, both wind time series are constrained to have a power spectrum that decreases with the cube of the frequency. Each of these tracks is then extended to very high latitudes (cut off at 75°N) such that the storms usually dissipate before reaching their end point.
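As a minimal sketch of the beta and advection idea summarized above, the following Python functions advance a storm centre by one time step. The weight given to the 850 hPa flow, the beta-drift vector and all function names are illustrative assumptions; they are not values or code taken from this study.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius in metres


def steering_flow(wind_850, wind_250, weight_850=0.8):
    """Weighted average of the (u, v) winds at 850 and 250 hPa, in m/s.

    weight_850 is an illustrative assumption, not a value from this study.
    """
    w = weight_850
    u = w * wind_850[0] + (1.0 - w) * wind_250[0]
    v = w * wind_850[1] + (1.0 - w) * wind_250[1]
    return u, v


def advance_position(lat, lon, wind_850, wind_250,
                     beta_drift=(0.0, 2.5), dt_s=7200.0):
    """Move a storm centre forward by one time step (default 2 h).

    beta_drift is a constant (u, v) correction in m/s; the default is a
    placeholder chosen for illustration only.
    """
    u, v = steering_flow(wind_850, wind_250)
    u += beta_drift[0]
    v += beta_drift[1]
    # Convert the displacement in metres to degrees of latitude/longitude.
    dlat = math.degrees(v * dt_s / EARTH_RADIUS_M)
    dlon = math.degrees(
        u * dt_s / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return lat + dlat, lon + dlon
```

Calling advance_position repeatedly with a synthetic wind time series would trace out one candidate track; generating those wind series with the prescribed statistics is outside the scope of this sketch.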

Once all the tracks have been generated, the intensity of the storms is computed using an axisymmetric balance model coupled to a simple one-dimensional ocean model, the so-called Coupled Hurricane Intensity Prediction System (CHIPS) model (Emanuel et al. 2004), which is initialized with a weak warm-core vortex, with surface winds of 25 kt, and integrated forward in time along each of those tracks using each year’s monthly mean atmosphere and near-surface ocean thermodynamic conditions. The synthetic winds at both 850 and 250 hPa are also used as input in order to capture the impact of vertical wind shear on the storm intensification. In effect, most disturbances dissipate very quickly due to the unfavorable conditions in which they were initially seeded (e.g. dry atmosphere, cold SST, high vertical wind shear).

One major advantage of using this approach is that the CHIPS model is phrased in angular momentum coordinates and as such reaches very high resolution (order of 1 km) in the center of the storm, thus resolving the inner core. The intensity model returns a radius of maximum winds and a maximum circular wind speed, to which a fraction (60%) of the linear translational speed is added to get the true maximum surface wind speed. Bathymetry and topography are also included and landfall is represented by a reduction in the surface enthalpy exchange coefficient. For each dataset, the number of tropical cyclones produced from one year to the next is kept constant (this constant is simulation dependent and is provided in the last column of Table 1) and the relative annual frequency is given by the proportion of randomly seeded disturbances reaching the tropical cyclone threshold of 35 knots.
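The two bookkeeping rules in the paragraph above can be illustrated with a short Python sketch. The function names and the dictionary layout are hypothetical and only serve to make the arithmetic explicit.

```python
def max_surface_wind(circular_wind_kt, translation_speed_kt, fraction=0.6):
    """Add a fraction (60% in the text) of the translation speed to the
    maximum circular wind returned by the intensity model."""
    return circular_wind_kt + fraction * translation_speed_kt


def relative_annual_frequency(seeds_per_year, storms_per_year):
    """Proportion of seeded disturbances reaching tropical cyclone strength.

    seeds_per_year maps year -> number of disturbances that had to be
    seeded before the fixed target of storms_per_year was obtained
    (hypothetical input layout).
    """
    return {year: storms_per_year / n for year, n in seeds_per_year.items()}
```

A year in which fewer seeds are needed to reach the fixed storm count therefore receives a higher relative frequency.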

Table 1 Information on the reanalysis dataset used in this study

The technique described above was used to produce four tropical cyclone datasets, each using a different set of boundary conditions but all derived from reanalysis products: ERA-Interim (ERA-I) (Dee et al. 2011) and NCEP (Kalnay et al. 1996) for the period 1980–2010, MERRA-1 (Rienecker et al. 2011) between 1979 and 2010 and the 20th Century Reanalysis (20CR) (Compo et al. 2011) for the 1891–2008 period. This information is summarized in Table 1.

The synthetic track dataset includes, like the observational dataset, the position of the storm’s center and maximum surface wind speed, but at 2-hourly intervals. In this dataset, we excluded the few storms that reached tropical status only once they had propagated into the Eastern Pacific basin. For both observations and simulations, we further computed the number of hurricanes (64 knots), major hurricanes (93 knots) and the number of hurricanes making landfall over the US. The latter was computed by checking whether there was at least one center over the US territory with maximum surface wind speed above the 64 knots threshold.
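The counting described above can be sketched as follows. The thresholds are those quoted in the text; the track layout (a list of latitude, longitude, wind tuples), the use of "greater than or equal to" at the thresholds and the `over_us` land-mask function are assumptions made for illustration.

```python
HURRICANE_KT = 64        # hurricane threshold quoted in the text
MAJOR_HURRICANE_KT = 93  # major hurricane threshold quoted in the text


def lifetime_max_wind(track):
    """Maximum wind (kt) of a track given as (lat, lon, wind_kt) fixes."""
    return max(wind for _, _, wind in track)


def count_storms(tracks, over_us):
    """Count hurricanes, major hurricanes and US landfalling hurricanes.

    over_us(lat, lon) -> bool is a hypothetical land-mask lookup that the
    caller must supply; it is not part of the original study.
    """
    counts = {"hurricanes": 0, "major_hurricanes": 0, "us_landfalls": 0}
    for track in tracks:
        vmax = lifetime_max_wind(track)
        if vmax >= HURRICANE_KT:
            counts["hurricanes"] += 1
        if vmax >= MAJOR_HURRICANE_KT:
            counts["major_hurricanes"] += 1
        # Landfall: at least one centre over US territory at hurricane strength.
        if any(over_us(lat, lon) and wind >= HURRICANE_KT
               for lat, lon, wind in track):
            counts["us_landfalls"] += 1
    return counts
```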

In order to evaluate intra-basin activity, we divided the tropical cyclone tracks into four different clusters based on their cyclogenesis location, which is defined as the location where each storm first reaches tropical cyclone intensity, that is to say the first time that its maximum surface wind speed reaches at least 34 knots. Each cluster corresponds to a different region of the North Atlantic ocean: (1) the Gulf of Mexico, (2) the Caribbean Sea west of 65°W, (3) the tropical North Atlantic, between 0° and 20°N and east of 65°W, and (4) the Atlantic north of 20°N. The black contours in Fig. 2a show the boundaries of the four different regions. The clusters have been chosen so that each one encompasses a maximum in cyclogenesis density. Although constructed differently, these clusters are fairly similar to those constructed using a more advanced clustering technique (Gaffney 2004) to study Atlantic hurricane variability (Kossin et al. 2010; Kozar et al. 2012; Boudreault et al. 2017).
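A rough Python sketch of this two-stage assignment is given below. The 34-knot genesis rule follows the text, but the bounding box used for the Gulf of Mexico and the latitude limit of the Caribbean Sea are placeholders: the actual boundaries are those drawn in Fig. 2a and are not reproduced here.

```python
TC_THRESHOLD_KT = 34  # genesis threshold quoted in the text


def genesis_location(track):
    """Return (lat, lon) of the first fix with wind >= 34 kt, else None.

    track is a list of (lat, lon, wind_kt) fixes in chronological order.
    """
    for lat, lon, wind in track:
        if wind >= TC_THRESHOLD_KT:
            return lat, lon
    return None


def assign_cluster(lat, lon):
    """Map a genesis point to one of the four clusters.

    Longitudes are in degrees east (negative over the Atlantic). The
    Gulf of Mexico box below is a placeholder assumption.
    """
    if 20.0 <= lat <= 31.0 and -98.0 <= lon <= -81.0:
        return "Gulf of Mexico"
    if lat >= 20.0:
        return "North Atlantic"
    if lon < -65.0:
        return "Caribbean Sea"
    if lat >= 0.0:
        return "Tropical North Atlantic"
    return None  # south of the equator: outside the four clusters
```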

Data generated from observations and reanalyses will also be examined using statistical tools. The methodology will only be described in the paper when appropriate, either in the text or in a table/figure caption.

3 Climatology

In Fig. 1, we compare the observed tropical cyclone tracks with the synthetic tracks produced using the different reanalyses. As shown in previous studies (Emanuel et al. 2006, 2010), the latter are fairly realistic, with many storms forming off the West African coast and over what is considered the main development region (MDR) (the so-called Cape Verde storms) and generally propagating westward towards Central and North America before either making landfall or re-curving towards the northern North Atlantic. A few minor differences can be observed, such as anomalously high activity in the 0–10°N band over the Atlantic and anomalously high landfall rates in the northern part of South America (Venezuela, Suriname and the two Guianas) in some cases. Synthetic storms also appear to cross more frequently into the Eastern Pacific compared to observations. These features are present regardless of which 472 synthetic tracks were selected to produce Fig. 1 and thus are not the result of an unrepresentative sample. Interestingly, the MDR in the ERA-I simulation appears more realistic than when TCs are tracked directly in the reanalysis dataset [Figure 1 in Murakami (2014)].

Fig. 1

Observed (a) and synthetic (b–e) tropical cyclone tracks. The different track colors correspond to the intensity of the tropical cyclones based on the Saffir-Simpson scale. Observed tropical cyclones are plotted for the full 1980–2010 period, for a total of 472 tropical cyclones. For consistency, the same number of tropical cyclones has been randomly sampled for each set of downscaled simulations and plotted alongside observed tracks

Figure 2 compares the cyclogenesis density of the different synthetic tracks with that of the observed tracks while Table 2 compares the proportion of storms forming in each of the four sub-basins. Again, we notice that for MERRA, ERA-Interim and NCEP, the cyclogenesis distribution is fairly realistic: the three sets of simulations show higher cyclogenesis density, and in many cases a local maximum, over the four regions where a local maximum in cyclogenesis density is actually observed (tropical Atlantic, Caribbean Sea, Gulf of Mexico and western extra-tropical Atlantic). Tropical cyclone (TC) activity is underestimated in the Gulf of Mexico [both ERA-I and MERRA show a significant shift in activity from the Gulf of Mexico to the Atlantic (Table 2)], but is much more realistic than what is typically detected in high-resolution global and regional climate models, which tend to produce very little hurricane activity over that part of the basin (Scoccimarro et al. 2017; Camargo and Wing 2016; Camp et al. 2015; Mei et al. 2014; Vecchi et al. 2014; Strazzo et al. 2013a).

Fig. 2

Cyclogenesis density for observations (a, b) and for each synthetic dataset (c–f). The color represents the annual mean number of cyclogenesis events within a 400 km radius. A contour is drawn every 0.2 cyclogenesis events. The thick black lines in panel a correspond to the boundaries of the four different cyclone clusters. Only the synthetic tracks of the period 1980–2010 are considered, except for 20CR, where the two last years are excluded (1980–2008)
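The density plotted in Fig. 2 can be approximated with the short Python sketch below, which counts genesis events within 400 km of each grid point and divides by the number of years. The haversine distance, the grid layout and the function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance (km) between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def genesis_density(grid_points, genesis_points, n_years, radius_km=400.0):
    """Annual mean number of genesis events within radius_km of each
    grid point. Both inputs are lists of (lat, lon) pairs."""
    density = []
    for glat, glon in grid_points:
        n = sum(1 for lat, lon in genesis_points
                if great_circle_km(glat, glon, lat, lon) <= radius_km)
        density.append(n / n_years)
    return density
```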

Table 2 Proportion (%) of tropical cyclones (TC), hurricanes (HR) and major hurricanes (MHR) forming in each of the four sub-basins

Over the tropical Atlantic, cyclogenesis occurs from the Lesser Antilles to the Cape Verde Islands, but while the main development region (MDR) is still visible to various degrees, it generally extends further south, to around 7°N (Fig. 2c–f) and cyclogenesis is detected, in most cases, nearly as far south as the equator (random seeding is performed as far south as 3°N). Furthermore, the simulations show a maximum over the western part of the MDR as opposed to its eastern part. This westward shift combined with the southward expansion of the MDR probably explains why the simulated cyclones can reach more southern latitudes and strike unusual places like South America (Fig. 1b–e). While climate models can have difficulty producing realistic tropical cyclone activity over the Atlantic basin in general and the MDR in particular, even at high resolution, Atlantic tropical cyclones explicitly simulated by global climate models (GCMs) and regional climate models (RCMs) usually do not hit the northern part of the South American continent and are usually constrained further north than the storms that form here (Ibid.).

This positive westward-southward bias in cyclogenesis over the tropics in the NCEP-driven simulation has been pointed out previously by Emanuel et al. (2008) and again by Strazzo et al. (2013b), but appears here as a robust feature of this technique, regardless of boundary conditions. One will notice that the simulated pattern in tropical cyclone formation over the tropical Atlantic is more reminiscent of the longer climatological average (Fig. 2b) than that observed during the period covered by the simulations (Fig. 2a). Although arguments have been made that hurricane activity has been shifting eastward over the recent past (Holland and Webster 2007), there are reasons to believe that this apparent eastward shift in observed cyclogenesis is, at least in part, artificially induced by better observational coverage (Landsea 2007). However, the westward bias in simulated cyclogenesis is likely caused by the different seeding approach used in the simulations compared to the real world. In the latter, a large fraction of North Atlantic hurricanes originates from African easterly waves (AEWs) (Avila 1990; Avila and Pasch 1995; Thorncroft and Hodges 2001), the latitude of which is determined by the latitude of the African Easterly Jet along which they propagate. In climate simulations, AEWs fulfill a similar role, seeding disturbances with spatial and temporal constraints. In the simulations analysed here, on the other hand, the origins of the disturbances are randomly distributed throughout the Atlantic basin, including a few degrees to the south of where AEWs generally propagate. There are generally favorable thermodynamical conditions for cyclone activity in this area, but very few cyclogenesis events have ever been observed.
This strongly suggests that constraining the distributions of the seeds in this downscaling exercise to reflect the precursor role played by AEWs on Atlantic tropical cyclone activity would likely lead to improvements in the geographical distribution of hurricane activity. Adjusting the distribution to reflect the influence of quasi-baroclinic systems off the southeast US may provide a more realistic local maximum over the extra-tropical Atlantic as well. Finally, this bias in the climatological distribution, in particular the southward expansion of the MDR, might also explain why simulated MDR storms are more likely to become hurricanes compared to observed tropical cyclones (Table 4): by forming further south, they can spend more time over the warmer tropical ocean and thus have more time to intensify.

One set of simulations which stands out with respect to the other three is the one produced using 20CR, with one anomalously high maximum located over the Caribbean Sea (Fig. 2f). Because disturbances are seeded each year until a fixed number of tropical cyclones are formed, the tropical cyclones which form in that region are produced at the expense of the storms forming in the other three sub-basins (especially the Northern Atlantic and the tropical Atlantic, perhaps not surprisingly since they are the largest sub-basins). To help explain this feature, we show in Fig. 3 the difference in seasonal vertical wind shear between 20CR and MERRA: a strong negative anomaly is detected over the Caribbean Sea, the region with anomalously high cyclogenesis, in 20CR compared to MERRA. The result is similar whether we compare 20CR to MERRA, ERA-Interim or NCEP (not shown). In the case of MERRA, seasonal average vertical wind shear over that region is \(\sim\)11 \(m s^{-1}\) whereas it is \(\sim\)\(m s^{-1}\) in 20CR. Vertical wind shear is known to negatively affect tropical cyclone formation and intensification and was shown to affect the storm evolution in the downscaling approach studied here (Emanuel 2006; Emanuel et al. 2008). This negative wind shear anomaly over the Caribbean Sea in 20CR is due to a weaker subtropical jet in 20CR compared to the other reanalysis products as well as a shift in its position, from south-west of the Caribbean Sea to north-east of the Caribbean Islands (not shown). The fact that the simulation performed with 20CR stands out compared to the other three datasets is not entirely surprising: because upper-atmospheric data are not assimilated in this reanalysis dataset, the model used to produce the reanalysis is not constrained in the upper atmosphere, resulting in the model biases being transferred to the final reanalysis product, biases which can then impact the downscaled hurricane activity.

Fig. 3

Difference in mean seasonal wind shear between 20CR and MERRA, for the period 1980–2008. The wind shear is defined as the difference in wind (in m/s) between 250 and 850 hPa, for the months July to October. Blue corresponds to more favorable conditions for cyclone formation in 20CR. Black dots represent values which are not statistically significant (at the 5% level)
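For readers who wish to reproduce a shear field of this kind, a minimal Python sketch follows. It assumes that the shear is the magnitude of the vector wind difference between the two levels and that monthly-mean winds are available per grid point; both choices, like the function names, are assumptions rather than details given in the text.

```python
import math


def vertical_wind_shear(u250, v250, u850, v850):
    """Magnitude (m/s) of the vector wind difference between 250 and
    850 hPa (assumed definition)."""
    return math.hypot(u250 - u850, v250 - v850)


def seasonal_mean_shear(monthly_winds, months=(7, 8, 9, 10)):
    """Average the shear over the July to October season.

    monthly_winds maps month number -> (u250, v250, u850, v850) at one
    grid point (hypothetical input layout).
    """
    values = [vertical_wind_shear(*monthly_winds[m]) for m in months]
    return sum(values) / len(values)
```

The difference map would then be obtained by evaluating seasonal_mean_shear for each reanalysis at every grid point and subtracting the two fields.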

Having first compared the geographical distribution of the storms, we now compare the number of the most intense storms between the different datasets. Table 3 presents the ratio of tropical cyclones intensifying to hurricanes, major hurricanes and category 5 hurricanes. The number of tropical storms which intensify into hurricanes and major hurricanes is generally well estimated in these simulations, with the notable exception of the dataset produced using NCEP reanalysis, in which case the proportion of storms intensifying to hurricanes, major hurricanes and category 5 hurricanes is always underestimated. We suspect that this negative bias in the number of intense storms is linked, in part, to the relatively large positive upper-tropospheric temperature bias in NCEP compared to the other reanalyses (Randel et al. 2000). The maximum intensity of a mature tropical cyclone is known to depend not only on the surface enthalpy fluxes (which depend on the temperature of the ocean surface), but also on the difference in temperature between the surface and the outflow temperature, which for intense cyclones is located near the tropopause (Bister and Emanuel 1998). In fact, a decrease in upper-tropospheric temperatures over the last decades has been associated, in part, with the observed increase in Atlantic hurricane activity during the same period (Vecchi et al. 2013; Emanuel et al. 2013; Wing et al. 2015). As can be seen from Fig. 4a, the difference in temperature between NCEP and the other reanalyses at 100 hPa over the MDR, especially during the first half of the simulated period, is comparable to the change in temperature observed in both MERRA and ERA-Interim during the 1980–2010 period (~2–3 K). However, even in the later years, when the upper tropospheric temperatures are in relative agreement, the potential intensity in NCEP remains systematically lower than in ERA-I and MERRA (not shown), which is consistent with a larger proportion of storms reaching hurricane status in ERA-I/MERRA compared to NCEP, even in the later period (Fig. 4b). Whether this bias in potential intensity can account for the entire difference exhibited by the NCEP simulation with respect to the other simulations is not clear, however.
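The role of the outflow temperature can be made explicit with the potential intensity expression commonly associated with Bister and Emanuel (1998). The symbols below are introduced here for illustration and the expression is given as a sketch, not as the exact formulation used by the intensity model:

```latex
\[
V_{\max}^{2} \;=\; \frac{C_k}{C_D}\,\frac{T_s - T_o}{T_o}\,\left(k_0^{*} - k\right)
\]
```

Here \(V_{\max}\) is the potential intensity, \(C_k\) and \(C_D\) are the surface exchange coefficients for enthalpy and momentum, \(T_s\) is the sea surface temperature, \(T_o\) is the outflow temperature, \(k_0^{*}\) is the saturation enthalpy at the sea surface and \(k\) is the enthalpy of the boundary-layer air. At fixed \(T_s\), a warm bias in \(T_o\) reduces the factor \((T_s - T_o)/T_o\) and therefore lowers \(V_{\max}\).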

Table 3 Fraction of tropical cyclones (%) that intensify to hurricanes (HR/TC), major hurricanes (MHR/TC) and category 5 hurricanes (MHR5/TC) as well as the proportion of hurricanes making landfall over the US (USLF/HR)
Fig. 4

a Time series of air temperature at 100 hPa over the MDR for the reanalysis datasets of the study (averaged over August to October). b Time series of the proportion of tropical cyclones intensifying into hurricanes for the entire Atlantic basin in each of the simulations

3.1 US landfalling hurricanes

Table 3 shows that, despite the difference in cyclogenesis locations, the simulations are relatively successful at capturing the proportion of hurricanes making landfall over the US, with the simulation driven by 20CR producing the largest, but not statistically significant overestimation. However, Table 4 shows that there are differences at the sub-basin scale, in particular for the Caribbean Sea and the North Atlantic regions, where, in both cases, the proportion of landfalling hurricanes is overestimated. Note that the difference is only statistically significant in the latter region for ERA-I and NCEP.

Figure 5 shows the average observed and simulated hurricane tracks and US landfalling tracks for two clusters. These average tracks are computed by resampling each track, with cubic splines, to the same number of points. This allows the calculation of the mean position for tracks with different lifetimes. Some interesting discrepancies between observations and simulations can be seen, especially for the Northern Atlantic cluster (Fig. 5a). In this domain, maximum cyclogenesis is observed very close to the US coast, around 30°N and 75°W (Fig. 2), but is also observed to occur as far east as 20°W. In comparison, simulated tropical cyclones form more homogeneously across the extra-tropical Atlantic, but their formation does not extend eastward of \(\sim\)35°W. These compensating differences result in the mean observed and simulated cyclogenesis locations (first TC strength occurrence) being very close to each other (not shown). On the other hand, Fig. 5a shows that the mean initial position of a hurricane in the simulations is located further to the south-west (closer to land) than in observations and that simulated hurricanes tend to propagate, at least initially, towards the north-west. Because hurricanes form, on average, closer to land and tend to propagate towards the US, we should expect a higher ratio of hurricanes in that sub-basin to make landfall in the simulations compared to observations. For both NCEP and ERA-I, which produce the storms that form on average closest to the US, this leads to significant differences compared to observations (Table 4).
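The resampling step behind these mean tracks can be sketched in a few lines of Python. Note that this sketch swaps in linear interpolation for the cubic splines used in the study, purely to keep the example short and dependency-free; the function names and the choice of 20 points are also assumptions.

```python
def resample_track(track, n_points):
    """Resample a list of (lat, lon) fixes to n_points (>= 2) positions
    equally spaced along the track index.

    Linear interpolation is used here for brevity; the study itself
    uses cubic splines.
    """
    if len(track) == 1:
        return [track[0]] * n_points
    last = len(track) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into the track
        j = min(int(pos), last - 1)      # left neighbour of that index
        frac = pos - j
        lat = track[j][0] + frac * (track[j + 1][0] - track[j][0])
        lon = track[j][1] + frac * (track[j + 1][1] - track[j][1])
        out.append((lat, lon))
    return out


def mean_track(tracks, n_points=20):
    """Point-by-point mean position of tracks with different lengths."""
    resampled = [resample_track(t, n_points) for t in tracks]
    mean = []
    for i in range(n_points):
        lat = sum(r[i][0] for r in resampled) / len(resampled)
        lon = sum(r[i][1] for r in resampled) / len(resampled)
        mean.append((lat, lon))
    return mean
```

Because every track is reduced to the same number of points, the i-th point represents the same fraction of each storm's lifetime, which is what makes the averaging meaningful.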

Table 4 Same as Table 3 but for the different clusters
Fig. 5

Mean hurricane tracks (full lines) and mean US landfalling hurricane tracks (dashed lines) for a the North Atlantic cluster and b the Caribbean Sea cluster

Although the differences are not statistically significant, Table 4 shows that hurricanes forming over the Caribbean Sea are more likely to make landfall in the simulations than in observations. In this case, Fig. 5b shows the observed tracks are more likely to recurve and propagate south of Florida compared to the synthetic tracks, which propagate directly towards the mainland. We also note that there is a large difference in the observed landfall rate of the storms forming in that sub-basin between the two periods considered here (1980–2008 vs 1891–2008). In this case, the difference does not come from a shift in the cyclogenesis region or changes in the direction of propagation, but from a tendency of the storms to dissipate sooner in the more recent period compared to the historical average (full black and gray lines in Fig. 5).

Finally, Fig. 1 suggests that synthetic storms making landfall tend to penetrate further inland with hurricane intensity winds than observed TCs. Comparing the average intensity of the landfalling storms (Table 5) shows that simulated tropical cyclones in three out of four datasets tend to make landfall at slightly higher intensity. Simulations also tend to underestimate the decrease in intensity after landfall, except in the 20CR-driven simulations, where it is overestimated. This is likely due to the fact that the intensity of landfalling storms in that dataset is significantly higher at landfall compared to the other three datasets. Finally, Table 5 also shows that the simulations tend to underestimate the decrease in intensity typically observed prior to landfall, which is consistent with a tendency to overestimate intensity at landfall.

Table 5 Mean intensity of US landfalling tropical cyclones and mean change in intensity (both in knots) before and after landfall
Table 6 Basin-wide hurricane activity and 6-hourly intensification rate near the US coast between the most recent quiescent and active periods

4 Variability of hurricane activity

4.1 Decadal variability

The data record of Atlantic hurricanes extends back to the late 19th century and during that period, hurricane activity has been observed to oscillate between prolonged periods (lasting a few decades) of higher and lower activity. This decadal time scale variability in Atlantic hurricane activity is generally attributed to a succession of positive and negative SST anomalies in the North Atlantic Ocean dubbed the Atlantic Multi-Decadal Variability (AMV) [also referred to as the Atlantic Multi-Decadal Oscillation (AMO)] (McCarthy et al. 2015; Knight et al. 2006; Zhang and Delworth 2006; Goldenberg et al. 2001). The origin of the AMV has been linked to both aerosols and internal ocean circulation, although which one of the two is the prime driver is still being debated (Vecchi et al. 2017). The last quiescent period (negative phase of the AMV) was observed to extend from the late 1960s to the early 1990s and the most recent active period (positive phase of the AMO/AMV) is generally considered to have begun around the mid-1990s. It has been suggested that we may have entered a new era of low hurricane activity (Klotzbach et al. 2015), but such claims have also been made in the past (Molinari 2003) and, barring some new developments in the field of decadal forecasting (Caron et al. 2018), it is quite possible that we will not know that we have entered a new quiescent phase until we are well into it. In this section, we evaluate whether the difference between the two periods is reproduced by this technique.

Table 6 compares the average number of hurricanes and major hurricanes between the last quiescent period (1980–1992) and the following active period (1993–2010) in the observations and the different simulations. The simulations driven by ERA-Interim, MERRA and NCEP all return a larger number of hurricanes and major hurricanes in the active period. However, that increase is much smaller than what was observed during that same period. There are also large discrepancies between the various simulations themselves, with the NCEP dataset returning an increase in activity which is about twice as large as the increase in the ERA-Interim dataset.

Figure 6 shows the difference in cyclogenesis density between the active and the quiescent phase for both observations and simulations. We notice an increase in tropical cyclone activity in most of the basin during the active phase, except for the region north of \(20^{\circ }\)N, which shows large areas with more cyclogenesis events during the quiescent phase. This opposite change in TC activity between the tropics and the extra-tropics has been highlighted previously by Kossin et al. (2010) and Elsner et al. (2000). The simulation driven by NCEP is arguably the only simulation which manages to reproduce the observed pattern (although at much reduced amplitude), but most simulations manage to somewhat reproduce the three observed maxima over the tropical Atlantic (also with smaller amplitude). However, almost all the simulations suggest a decrease in activity over the Gulf of Mexico during the active phase and both 20CR and ERA-Interim simulations erroneously display an area of decreasing activity over the tropical Atlantic during the active phase.

Fig. 6

Difference in cyclogenesis density (cyclogenesis occurrence per year at less than 400 km) between the positive and negative phase of the AMV/AMO for observations (a) and for each synthetic dataset (b–e). Non-significant differences are stippled

Table 7 Trends in the number of storms

Because the simulations generally fail to capture the large basin-wide activity difference between the two phases of the AMV, the increasing trend in Atlantic TC activity observed over the 1980–2010 period is also severely underestimated (Table 7). The trends in Atlantic hurricane activity, as simulated by the downscaling approach evaluated here, were discussed at length in Emanuel et al. (2013) and Wing et al. (2015) and as such will not be analyzed here. We will simply point out that the upward trends in the NCEP dataset are the largest of the four datasets (Table 7), but none of these trends match the ones observed during the 30-year period and at least part of these trends are spurious. As mentioned previously, there is a warm bias in the upper troposphere in the NCEP reanalysis during the 1980s and 1990s (Fig. 4). Such a bias would act to artificially reduce hurricane intensification in the early part of the simulations by reducing the temperature difference between the surface and the outflow, and thus the potential intensity. Reducing or eliminating this bias over the course of the 1980–2010 period would induce an artificial increase in TC activity in the 30-year simulation. We also note that the MERRA simulation shows a significant upward trend in activity while the ERA-I simulation does not. This difference was attributed to larger trends in vertical wind shear and moist entropy deficit in the middle atmosphere over the MDR in MERRA compared to ERA-I (Emanuel et al. 2013).

4.1.1 Landfalling hurricanes

A recent study by Kossin (2017) suggests that, while conditions generally become more favorable to hurricanes in the Atlantic basin as a whole during the positive phase of the AMV, conditions become less favorable near the US coast. Kossin (2017) showed that during the more active period, vertical wind shear tends to increase compared to the quiescent period. So, while there are fewer hurricanes and major hurricanes in the quiescent phase, the storms that do manage to reach the US coast in the quiescent phase are more likely to intensify into stronger storms than in the more active period. Kossin (2017) also detected larger variance in the intensification rates during the quiescent period than during the more active periods. We investigate whether these differences in intensification mean and variance near the coast are reproduced in the series of simulations.

Figure 7 compares the probability distribution of the intensification rate of major hurricanes near the US coast (see footnote 2) between the quiescent period and the more active period, while columns 5 and 6 in Table 6 compare the average intensification rate between the two periods for both hurricanes and major hurricanes. We note that because the period considered here is shorter than in Kossin (2017) (1980–2010 vs 1970–2015), the differences measured in observations are no longer statistically significant, but they remain consistent with the results of that study.

Fig. 7

Distribution of intensity changes (in kt per 6 hours) for major hurricanes near the US coast for both the positive and negative phases of the AMV/AMO

Although the signal is generally weaker in the simulations than in the observations, there is an increase in the mean intensification rate of the storms during the quiescent phase compared to the active phase, and that difference is generally larger for major hurricanes than for hurricanes. For hurricanes, this difference appears as a decrease in the positive intensification rate of the storms nearing the coast, while for major hurricanes it appears as an increase in the de-intensification rate. This result is fairly consistent among the different datasets, except for 20CR. For the latter, the results are less consistent with the study of Kossin (2017), but this is not entirely surprising given the large biases present in the geographical distribution of TCs in this dataset and the fact that the 20CR reanalysis does not assimilate atmospheric data.

Finally, we also evaluated whether the simulation could detect the increase in volatility in intensification rate (larger variance) during the quiescent period. As pointed out in Kossin (2017), higher volatility increases the challenge in forecasting the intensity as the storms are about to make landfall. Results show that the simulations do not capture this phenomenon (two rightmost columns of Table 6): the simulations always return an increase (sometimes significant and sometimes not) in variance during the active phase compared to the inactive phase, but never a decrease.
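A comparison of intensification-rate variance between the two phases could be set up along the following lines. This is a sketch under stated assumptions: the text does not say which significance test was applied, so a two-sided permutation test on mean-centred samples is used here purely as an example, and all function names are hypothetical.

```python
import random
import statistics


def variance_difference(quiescent, active):
    """Sample variance in the quiescent phase minus that in the active phase."""
    return statistics.variance(quiescent) - statistics.variance(active)


def permutation_pvalue(quiescent, active, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in variance.

    Each sample is centred on its own mean first, so that a difference in
    mean intensification rate is not mistaken for a difference in variance.
    """
    rng = random.Random(seed)
    q_mean = statistics.fmean(quiescent)
    a_mean = statistics.fmean(active)
    q = [v - q_mean for v in quiescent]
    a = [v - a_mean for v in active]
    observed = abs(variance_difference(q, a))
    pooled = q + a
    n_q = len(q)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(variance_difference(pooled[:n_q], pooled[n_q:])) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

The inputs would be the 6-hourly intensity changes (in kt) of storms near the coast, one list per phase. A positive `variance_difference` corresponds to the higher volatility in the quiescent phase reported by Kossin (2017), and a negative value to the behaviour of the simulations described above.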

4.2 Interannual variability

Besides interdecadal variability, Atlantic hurricane activity is also known to display large interannual variability. This variability is driven in large part by the El Niño-Southern Oscillation (ENSO) (Gray 1984; Goldenberg and Shapiro 1996; Pielke and Landsea 1999; Klotzbach 2011a, b), but a whole range of other factors have also been associated with that variability (Caron et al. 2015). High-resolution atmospheric GCMs and RCMs driven by reanalyses, when supplied with observed sea surface temperatures (SST), have shown a remarkable ability to capture this variability over the recent past (Zhao et al. 2009; Knutson et al. 2007). In this section, we evaluate the ability of the downscaling technique to capture the observed interannual variability and compare the impact of changing reanalysis boundary conditions on that variability.

Figure 8 shows the time series for tropical cyclones (TCs), hurricanes (HRs), major hurricanes (MHRs, category at least 3) and US landfalling hurricanes (USlf) derived both from observations and from all four downscaling exercises, while Table 8 shows the correlation coefficients for each of those time series between the observations and each of the simulations. We note that, since we expect the correlation coefficients to be influenced by the sample size, it is possible that the relative performance of the different simulations would be affected if the number of seeds was kept constant for each dataset. However, we have no reason to believe that the overall conclusion derived from these results would be impacted. The synthetic datasets generally capture a large portion of the observed variability in TCs, HRs and MHRs, especially when using reanalyses that assimilate atmospheric data (ERA-Interim, NCEP and MERRA). Interestingly, the level of correlation obtained with ERA-I and MERRA (0.72, 0.78) is only slightly lower than what was obtained by Murakami (2014) when tracking the TCs directly in these two reanalysis datasets (0.86 and 0.85, respectively) for a similar period (1979–2012). Furthermore, this level of correlation is comparable to simulations performed using regional climate models driven by reanalyses (Knutson et al. 2007; Caron and Jones 2011). As expected, the correlations are lower with 20CR and the differences in correlation between the latter and the other three datasets are broadly similar to the differences in correlation detected in an RCM when a GCM is substituted for reanalyses as lateral boundary conditions (Caron and Jones 2011). The simulations have more difficulty capturing the variability of the US landfalling hurricanes, although simulations performed with NCEP and MERRA return significant correlation coefficients for this time series. Interestingly, while the results have generally been relatively similar between the MERRA and ERA-Interim datasets, in this case the simulation driven by MERRA manages to capture a certain level of the variability in US landfalling hurricanes while the simulation performed with ERA-Interim does not.
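The kind of comparison summarised in Table 8 can be sketched as below. The sketch assumes a Pearson correlation between annual counts and a Student's t statistic with n - 2 degrees of freedom for significance; the text does not state which estimator or test was used, and the function names are hypothetical.

```python
import math


def pearson_r(observed, simulated):
    """Pearson correlation between two equally long annual time series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / math.sqrt(var_o * var_s)


def t_statistic(r, n):
    """t statistic for testing r against zero with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

For a 31-year series, the resulting statistic would be compared against a t distribution with 29 degrees of freedom. This simple test ignores any serial correlation in the annual counts.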

Fig. 8

Time series of tropical cyclones (top-left), hurricanes (top-right), major hurricanes (bottom-left) and US landfalling hurricanes (bottom-right) in observations and for the various sets of synthetic tracks

Table 8 Correlation coefficients between observed and simulated time series of tropical cyclones, hurricanes, major hurricanes and US landfalling hurricanes

Although the variability is somewhat lower in the simulations, exceptionally active and inactive years are generally captured as well: the high activity of 1995 and the low activity of 1997, for example, are quite well represented. Similarly, the record-breaking 2005 season is also a record or near-record year in all the samples except the ERA-Interim dataset. In fact, the simulations performed with reanalyses which assimilate atmospheric data capture 3 or 4 of the 5 years (period 1980–2010) with the most major hurricanes and, similarly, 3 or 4 of the 5 years with the fewest major hurricanes (not shown). Over the similar period 1980–2008, the simulation performed with 20CR reproduces two of these most and least active years (again, as measured by the number of major hurricanes).
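The count of matching extreme years can be illustrated with a short sketch. It assumes that years are ranked by their number of major hurricanes and that ties are broken by calendar year, a rule the text does not specify; the function names are hypothetical.

```python
def extreme_years(counts_by_year, k=5, most=True):
    """Return the set of k years with the most (or fewest) storms.

    Ties are broken by calendar year, which is an arbitrary choice
    made only for this illustration.
    """
    ranked = sorted(
        counts_by_year,
        key=lambda year: (counts_by_year[year], year),
        reverse=most,
    )
    return set(ranked[:k])


def shared_extremes(observed, simulated, k=5, most=True):
    """Number of the k most (or least) active years common to both series."""
    return len(extreme_years(observed, k, most) & extreme_years(simulated, k, most))
```

With observed and simulated annual major hurricane counts stored as `{year: count}` dictionaries, `shared_extremes(obs, sim)` would give the kind of "3 or 4 of the 5 years" score quoted above.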

We further analyze the simulated variability by dividing the storms into two groups: a first group constructed using the storms from the Gulf of Mexico and North Atlantic clusters (henceforth referred to as the northern cluster) and a second group composed of the storms from the Caribbean Sea and tropical Atlantic clusters (henceforth referred to as the southern cluster). The correlation coefficients between each of these two clusters and observed activity are also given in Table 8. The difference between the two groups is striking: while the simulations generally manage to reproduce the observed variability to a significant level for the southern cluster, the variability of the northern cluster is not captured, with a few rare exceptions.

This difference likely reflects the varying factors modulating TC activity in different parts of the North Atlantic basin. Kossin et al. (2010) showed that TC activity in the deep tropics (the equivalent of our southern cluster) is primarily modulated by ENSO and by the Atlantic Meridional Mode (AMM), the influence of which should be captured by the downscaling method. On the other hand, Kossin et al. (2010) linked TC variability over the Gulf of Mexico primarily to the Madden-Julian Oscillation (MJO) and, over the extra-tropics, to the May–June North Atlantic Oscillation (NAO). The MJO is a sub-seasonal mode of variability and, given the seeding method adopted here and the fact that we are looking at annual mean data, we would not expect to capture its influence. The link between the NAO and Atlantic TC activity was postulated to occur through changes in the strength and location of the Atlantic subtropical high (Elsner and Kocher 2000; Elsner 2003), but could also be related to changes in Atlantic SST (Knaff 1997). Given that the physical mechanism linking the NAO and Atlantic hurricanes is not entirely resolved and that Boudreault et al. (2017) showed that the predictive power of the NAO for seasonal activity drops significantly after 1980, it is not clear whether we should expect the hurricane variability in that sub-basin to be captured or not.

4.2.1 ENSO

As mentioned previously, ENSO is one of the main drivers of Atlantic hurricane variability. The influence of ENSO has been found to occur primarily through the modulation of upper-tropospheric zonal winds, and consequently of vertical wind shear, over the northern tropical Atlantic (Goldenberg and Shapiro 1996; Bell and Chelliah 2006), but changes in tropospheric humidity (Camargo et al. 2007) and upper-tropospheric temperatures have also been implicated (Sobel et al. 2002). The presence of El Niño (La Niña) conditions in the tropical Pacific is usually associated with conditions more detrimental (favorable) to cyclogenesis and with a general decrease (increase) in Atlantic tropical cyclone activity.

To understand whether the influence of ENSO is captured by the simulated TC activity, we use regression analysis with the Niño 3.4 index as the predictor. A Poisson regression was used for the observed number of TCs whereas standard linear regression was used for the reanalysis-driven outputs. The Niño 3.4 index is defined as the series of SST anomalies, relative to a 25-year sliding climatology, over the Niño 3.4 region (bounded by 5°N–5°S and 120°–170°W) during the months of August–October (ASO), the peak of the hurricane season in the Atlantic. For each simulation, the time series is computed using the corresponding reanalysis SST product: OIv2 (Reynolds et al. 2007) for MERRA and NCEP, HadISST (Rayner et al. 2006) for 20CR and a combination of different products for ERA-Interim.
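The two regression models can be contrasted with a minimal sketch. It assumes a single predictor (the ASO Niño 3.4 index), a log link for the Poisson model fitted by Newton-Raphson, and ordinary least squares for the linear model; these details and the function names are assumptions made for illustration, since the text only names the two regression types.

```python
import math


def fit_poisson(x, y, n_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; returns (b0, b1).

    x: predictor values (e.g. Nino 3.4 index), y: annual storm counts.
    Requires a positive mean count.
    """
    b0 = math.log(sum(y) / len(y))
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def fit_linear(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

In both cases a negative slope would indicate fewer storms during El Niño conditions. The significance levels shown in the figures would additionally require standard errors for the coefficients, which are not computed in this sketch.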

Figure 9 shows the sign of the regression coefficients of the relationship between the various hurricane time series and the Niño 3.4 index. Results for the entire basin and for the northern and southern clusters are computed separately. The color of the shading indicates the sign of the relationship (blue is negative, red is positive) whereas the tone indicates the significance of the relationship as measured by the p-value of a two-tailed test (10% level (lightest shade, very weak relationship), 5% level (intermediate shade), or a 1% level (darkest shade, strong relationship)).

Fig. 9

Relationship between the TC activity and ENSO computed with observations and reanalyses over different parts of the basin between 1980 and 2010. The heatmaps illustrate the sign and significance of the relationship for each pair of predictands (bottom) and predictors (left). Blue color refers to a negative coefficient. Tone of shading indicates significance of the relationship as measured by the p value of a two-tailed test on the coefficient

As expected, we note a significant influence of ENSO on basin-wide activity and a strong contrast in the influence of ENSO between the two clusters: while ENSO is a good predictor of TC activity for the southern cluster, it is not the case for the northern cluster. This is relatively well captured in the simulations, with the regression coefficients being generally significant for the southern cluster, but not for the northern cluster. We note that this is consistent with results shown in Table 8, which showed that the simulations were generally able to capture the variability of the southern cluster but were less successful for the northern cluster.

For the 1980–2010 period, all the simulations show a link between landfalling hurricanes from the southern cluster and the Niño 3.4 index. This is not the case in the observations, but the regression coefficient becomes highly significant when a longer period is considered (Fig. 10). This is not entirely surprising, given that the sample of landfalling hurricanes is much smaller than that of basin-wide hurricanes. Compared to the basin-wide level, there are many additional layers of randomness (e.g. intensity at a certain point, direction of propagation, survival time) which make it more difficult to detect a statistically significant signal from ENSO with only 30 years of data. Adding more than 80 years of data helps to strengthen and detect that signal, as shown in Fig. 10. On the other hand, the models are forced to generate between 100 and 200 tropical cyclones every year. Compared to the average of 12 cyclones forming annually in the Atlantic, we should expect the simulations to have a larger signal-to-noise ratio over 30 years than the observations, which in this case is sufficient to detect the influence of ENSO.
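The signal-to-noise argument can be made concrete under a simple assumption introduced here for illustration only: if annual counts behave like a Poisson variable with mean \(N\), the relative sampling noise scales as

\[ \frac{\sigma_N}{N} = \frac{\sqrt{N}}{N} = \frac{1}{\sqrt{N}}, \qquad \frac{1}{\sqrt{12}} \approx 0.29, \quad \frac{1}{\sqrt{100}} = 0.10, \quad \frac{1}{\sqrt{200}} \approx 0.07 . \]

Under that assumption, the year-to-year sampling noise in the synthetic sets is roughly three to four times smaller, relative to the mean, than in an observed record averaging 12 storms per year. The same scaling applies to any fixed fraction of the storms, such as those making landfall.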

Fig. 10

Same as Fig. 9 but for the 20CR reanalysis, with the HadISST dataset used to compute the predictor, over the period 1891–2008

We further our investigation into the different responses of the simulated TC activity to ENSO by comparing how these responses vary with the flavour of El Niño during the period 1980–2008: Kim et al. (2009) showed that while a standard El Niño event (warming in the Niño 3 region) generally leads to a decrease in hurricane activity across the entire basin, a Modoki El Niño (Ashok and Yamagata 2009; Kulkarni and Siingh 2016), where the warming is more pronounced in the central tropical Pacific (Niño 4 region), usually leads to a decrease in activity in the eastern part of the basin, but an increase in the western part of the basin (first column of Fig. 11). This asymmetric response between the two flavours of El Niño was suggested to be driven by the different response of the upper-level winds and vertical wind shear in the western part of the basin (Kim et al. 2009).

Figure 11 shows the composite differences in track density between El Niño (East Pacific warming, EPW), Modoki El Niño (Central Pacific warming, CPW) and La Niña (East Pacific cooling, EPC) for the 1980–2010 period. The different responses between La Niña, El Niño and Modoki El Niño highlighted by Kim et al. (2009) are clearly visible (first column), although in this case, the increase in activity during Modoki El Niño is significantly reduced compared to what is shown in Kim et al. (2009). We note that although the period in Kim et al. (2009) is longer (1951–2006) than what is considered here, most Modoki events have occurred over the more recent past and the two periods have the same number of Modoki El Niño events (5): the year 1969 (12 hurricanes) has been substituted by 2009 (3 hurricanes). All simulations show a general decrease in activity during EPW (top row) and a general increase in activity during EPC (bottom row). On the other hand, the simulations completely fail to capture the observed response to Modoki El Niño: no increase in activity in the western part of the basin is detected and the simulated responses to EPW and CPW are generally very similar to one another in all the simulations. This simulated response is very similar to what had previously been reported in an ensemble of RCM simulations (Caron et al. 2010), suggesting that the apparent TC response to Modoki El Niño might be due to the small observational sample. We also notice differences in the ENSO responses between the different simulations (e.g. the NCEP simulation shows a stronger response over the western part of the basin than the MERRA or ERA-I simulations), but these differences are largely consistent with the difference in cyclogenesis density shown in Fig. 2. Finally, the impact of ENSO in 20CR is maximum over the Caribbean Sea, which coincides with the large maximum in activity in that simulation.

Fig. 11

Composites of track density anomalies during typical El Niño (first row), Modoki El Niño (second row) and La Niña events (third row). El Niño track anomalies were constructed using the years 1982, 1987 and 1997 (31 observed TCs). Modoki El Niño track anomalies were constructed using the years 1991, 1994, 2002, 2004 and 2009 (65 observed TCs). La Niña track anomalies were constructed using the years 1988, 1998, 1999, 2007 and 2010 (86 observed TCs)
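The composites described in the caption above can be sketched as follows, using the event years listed there. The sketch assumes that anomalies are taken relative to the mean over all available years, which the caption does not state explicitly; the data layout (one list of grid-cell densities per year) and the function name are hypothetical.

```python
# Event years as listed in the Fig. 11 caption.
EPW_YEARS = (1982, 1987, 1997)              # typical El Nino
CPW_YEARS = (1991, 1994, 2002, 2004, 2009)  # Modoki El Nino
EPC_YEARS = (1988, 1998, 1999, 2007, 2010)  # La Nina


def composite_anomaly(density_by_year, event_years):
    """Mean track density over event_years minus the all-year mean.

    density_by_year maps each year to a list of track densities,
    one value per grid cell, in the same cell order for every year.
    """
    years = list(density_by_year)
    n_cells = len(density_by_year[years[0]])
    climatology = [
        sum(density_by_year[y][i] for y in years) / len(years)
        for i in range(n_cells)
    ]
    composite = [
        sum(density_by_year[y][i] for y in event_years) / len(event_years)
        for i in range(n_cells)
    ]
    return [c - m for c, m in zip(composite, climatology)]
```

Calling `composite_anomaly(density, CPW_YEARS)` and `composite_anomaly(density, EPW_YEARS)` on the same gridded field would give the two maps whose similarity in the simulations is discussed above.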

5 Concluding remarks

In this manuscript, we have evaluated the ability of a downscaling method to reproduce observed Atlantic hurricane activity over the recent past and investigated the sensitivity of the results to the choice of boundary conditions. As previous studies had pointed out, the geographical distributions of cyclogenesis and cyclone tracks were quite realistic, but many of the biases that had been identified previously (e.g. the southward bias in cyclogenesis; Emanuel et al. 2008) appeared to be independent of the choice of the reanalysis dataset used to drive these simulations.

On the other hand, the choice of reanalysis boundary conditions did have an impact on certain characteristics of the simulated tropical cyclone activity. For one, TC activity was systematically weaker in the NCEP-driven simulations compared to ERA-Interim- and MERRA-driven simulations. This bias in TC intensity was linked in part to a well-known warm bias in the upper-troposphere in the NCEP reanalysis. The gradual reduction of this temperature bias over the 30-year period did reduce the gap in intensity (compared to the other datasets), but did not eliminate it completely and also induced an artificial upward trend in TC activity in this particular dataset. With this artificial component, the upward trend in TC activity in the NCEP-driven dataset was the largest of the four datasets, but even so, the simulation did not capture the trend that was observed during the study period. In the simulation which was driven by the reanalysis dataset that did not assimilate atmospheric data (20CR), we noticed an unrealistically large maximum in cyclogenesis over the Caribbean Sea. The high number of TCs over the region had a significant impact on the overall geographical distribution, as this high number of TCs came at the expense of TCs forming in the other regions of the Atlantic basin.

Despite the biases, the proportion of storms making landfall over the continental US was generally well captured, but two datasets (NCEP and ERA-I) had a tendency to shift cyclogenesis westward over the North Atlantic, which led to an overestimate of landfalling storms along the eastern seaboard. Intensity at landfall was generally slightly overestimated, which was the result of underestimating the decrease in intensity observed in TCs approaching land.

Finally, the technique evaluated here was also capable of detecting differences in the level of hurricane activity between the active and the quiescent phases, although these differences were smaller than what has been observed. This holds true both for the basin-wide level of activity and for the difference in intensification of landfalling hurricanes near the US coast (the response to the so-called “protective barrier”; Kossin 2017). Similarly, the influence of ENSO was generally well reproduced, with the TC response concentrated over the tropics. On the other hand, the simulations did not manage to reproduce the unusual track anomalies observed during Modoki El Niño events. Whether this is a failure of the technique, possibly linked to some of its biases, or whether it is due to a peculiar signal arising from the small number of observed Modoki El Niño events is not clear at this time and is beyond the scope of this study, but certainly warrants further attention.

By focusing largely on the intensity of the storms in this study, we implicitly focused on wind-related damage. However, as Hurricane Harvey clearly demonstrated, rain can also be a major source of damage, in particular for slow-moving storms. As such, the downscaling technique is currently being extended to include precipitation originating from tropical cyclones and evaluated against radar observations. Once this is completed, it will offer a more complete view of TC-related risk.