1 Introduction

Extremes of extratropical geostrophic wind speed (geo-wind) derived from long-term historical sub-daily surface pressure observations have been used to infer historical storminess conditions and trends over the northeast Atlantic-European region (Wang et al. 2009, 2011, referred to as W09 and W11 hereafter; Alexandersson et al. 1998, 2000; Schmidt et al. 1998; WASA 1998) and over southeast Australia (Alexander et al. 2011). This is because surface pressure observations are more reliable and more temporally homogeneous than surface wind observations, and because extratropical geo-wind extremes derived from surface pressure observations approximate surface wind speed extremes well (W09).

Recently, Wang et al. (2012), referred to as W12 hereafter, applied an objective cyclone tracking algorithm to analyze trends and low frequency variability of extra-tropical cyclone activity in the ensemble of Twentieth Century Reanalyses (20CR; Compo et al. 2011). On the basis of comparing linear trends estimated from a seasonal cyclone activity index (CAI) time series and seasonal 95th geo-wind percentiles in winter (JFM), they concluded that, for the North Atlantic-European region, the 20CR cyclone trends are in agreement with trends in geo-wind extremes derived from in-situ surface pressure observations. This conclusion has been challenged by Krueger et al. (2013b) because a recent study by the same lead author (Krueger et al. 2013a) comparing annual 95th geo-wind percentiles derived from surface pressure observations with those from the 20CR found that “20CR-geostrophic storminess deviates to a large extent from the observation-based curve” in the period prior to 1950.

In this reply, after briefly describing the data and methodology in Sect. 2, we corroborate in Sect. 3 that our conclusion is valid; and we clarify in Sect. 4 that several factors contribute to the apparent inconsistencies between the 20CR and observation-based geo-wind extremes reported by Krueger et al. (2013a, b). Finally, we summarize our conclusions in Sect. 5.

2 Data and methodology

The 10 triangles analyzed in both W09 and W11 are shown in Fig. 1. We also show station Armagh because we use the data from this station to help verify suspect values in some of the other stations (see “Appendix” and Figs. 6, 7). The 20CR gridpoints used to approximate the stations, also shown in Fig. 1 (red diamonds), are exactly the same as in Krueger et al. (2013a, b). In addition, the 50-km EASE (Equal Area SSM/I Earth) gridpoints used to obtain the North Sea regional mean CAI values in W12 are shown in Fig. 1 (thin blue crosses). In W12, CAI values in 5 × 5 arrays of 50-km EASE-gridpoints were aggregated to represent the CAI at the 250-km EASE grid scale (the center point in the 5 × 5 array of 50-km EASE-gridpoints; see Fig. 1, thick black crosses), obtaining 11 CAI time series (one for each of the 11 250-km EASE-gridpoints). Each of these CAI time series was standardized before being averaged to obtain the regional mean series. There was a minor error in calculating the regional mean CAI values in W12: only 9 CAI time series (cyan dots in Fig. 1) were used to obtain the regional mean. This error has an effect on the 11-year Gaussian filtered CAI curve reported in W12 but no discernible effect on the trend estimate (see the magenta and blue curves in Fig. 2a; they share the same trend estimate).

Fig. 1

The pressure triangles that were analyzed in Wang et al. (2009). All triangles with a dotted line are supplementary triangles (see Sect. 2 and Table 2 of Wang et al. 2009). The first year of geo-wind data is also shown in each of the 10 triangles. The red diamonds represent the 20CR gridpoints that are used to approximate the stations for calculating geo-winds from the 20CR data. The thin blue crosses indicate the set of 50-km EASE-gridpoints over which the cyclone activity index (CAI) was averaged to obtain the North Sea regional mean CAI values for comparison with the regional mean geo-wind extremes. The thick black crosses represent the 250-km EASE-gridpoints; CAI values were aggregated for the 5 × 5 arrays of 50-km EASE-gridpoints centered at each of these 250-km EASE-gridpoints (see W12). The cyan dots show the set of 250-km EASE-gridpoints over which the regional mean CAI curve shown in Fig. 12a of W12 was obtained

Fig. 2

The Gaussian filtered series of the North Sea regional averages of standardized cyclone activity index (CAI), and of standardized seasonal P95 geo-winds (black and red curves), (a) in winter and (b) in all four seasons consecutively. The red and black curves are based on the 3-hourly or 6-hourly (as indicated) geo-winds derived from the Obs and NewObs data (see Sect. 2), respectively. The blue (magenta) curve represents averages of CAI over the 11 points shown as thick black crosses (the 9 points shown as cyan dots) in Fig. 1. The numbers in parentheses are the trend estimates for the corresponding time series. The cyan hatching represents the 95 % confidence interval of the blue trend line, and the grey shading that of the black trend line. The red (magenta) curve in panel a is a copy of the black (dashed) curve in Fig. 12a of W12. As in W12, the standardization is relative to the mean and standard deviation of the period 1961–1990, and the seasons are defined as JFM, AMJ, JAS, and OND

Before proceeding to any analysis, we screen the sea level pressure (SLP) observations from the stations shown in Fig. 1 for large errors (errors greater than 20 hPa) using the procedure detailed in the “Appendix”, and correct or exclude the identified erroneous values, obtaining a “new” observational data set (NewObs). As a result of the screening for large errors, we found 108 segments (short periods within a data record) with erroneous SLP values (Figs. 6, 7). The Aberdeen and Torshavn records contain the most identified errors. For the entire collection, almost all of these errors occur in the pre-1948 period and appear to have been introduced primarily during the digitization of paper records or from other post-measurement processing procedures. The errors are usually on the order of tens of hPa (Figs. 6, 7) and, as would be expected, have notable effects on the observation-based geo-wind extremes. These erroneous values were neither identified nor corrected or excluded in any of the previous studies using the surface pressure data of the WASA (Waves and Storms in the North Atlantic) project (Alexandersson et al. 1998, 2000; Schmidt et al. 1998; WASA 1998; W09; W11; Krueger et al. 2013a; and other pressure tendency studies using the WASA data), although W09 had already identified and excluded 49 random errors in their analysis. These errors, and those identified in W09, are all in the WASA pressure data set (Schmidt et al. 1997; WASA 1998). The WASA data set has been incorporated into the International Surface Pressure Databank (ISPD, Yin et al. 2008), which was assimilated into the 20CR and also used in W09 and W11. Thus, all these errors are apparently present in Krueger et al. (2013a, b) and Alexandersson et al. (1998, 2000); most are also present in W09 and W11 (except the 49 identified in W09). These post-measurement digitization and processing errors are much larger than measurement errors and represent a major source of uncertainty in the observations. The error model of Krueger et al. (2013a) considers only a 1 hPa standard deviation measurement error, and thus does not fully represent the uncertainty in the observations.

The observed sub-daily SLP time series (typically with two or three values daily in the early decades, and 3-hourly or hourly in the recent decades) were interpolated to a 3-hourly data series using the same procedure as in W09 and W11 (natural spline interpolation; as explained in W09, interpolation is necessary because the hours of observations vary from station to station, and also over time). Since the available 20CR data are 6-hourly, we sample the resulting 3-hourly observations at the same 6-hourly time steps (0000, 0600, 1200, 1800 hours) as in 20CR, and we exclude (set to missing) the 20CR time steps where the observations in the NewObs data set are missing, obtaining the 20CR_NewObs data set. Thus, both the 20CR and observations (20CR_NewObs and NewObs) have exactly the same number of non-missing geo-wind data points, at the same sequence of 6-hourly time steps. Also, exactly the same method was used to calculate geo-winds from the observed and 20CR 6-hourly SLP data, and the same methods were used to derive the annual and seasonal 95th percentiles (P95). For comparison purposes, we also consider annual and seasonal P95 values based on geo-winds from 3-hourly observational SLP data. These are indicated with “_3hly” hereafter.
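The sampling and masking step described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names are hypothetical, the series are plain lists with missing values stored as NaN, and the 3-hourly series is assumed to start at 0000 hours so that every second value falls on a 6-hourly time step.

```python
import math

def sample_6hourly(series_3h):
    """Keep every second value of a 3-hourly series.

    Assumes the first value is at 0000 hours, so the kept values fall on
    0000, 0600, 1200 and 1800 hours.
    """
    return series_3h[::2]

def mask_like(reference, target):
    """Set target values to NaN wherever the reference series is missing."""
    return [math.nan if math.isnan(r) else t
            for r, t in zip(reference, target)]

def count_valid(series):
    """Number of non-missing values in a series."""
    return sum(1 for v in series if not math.isnan(v))

# Illustrative values only (hPa); one day of 3-hourly observations.
obs_3h = [1000.0, 1001.0, math.nan, 1003.0, 1004.0, 1005.0, 1006.0, 1007.0]
obs_6h = sample_6hourly(obs_3h)
reanalysis_6h = [999.0, 998.0, 997.0, 996.0]
reanalysis_masked = mask_like(obs_6h, reanalysis_6h)
```

After masking, both series have the same number of non-missing values at the same time steps, which is the property the comparison relies on.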

3 Observed and 20CR trends

First, we show that our conclusion that “For the North Atlantic-European region and southeast Australia, the 20CR cyclone trends are in agreement with trends in geostrophic wind extremes derived from in-situ surface pressure observations” is valid. To this end, we obtain annual P95 values conventionally (i.e., as the 95th percentile considering all sub-daily geo-winds in each calendar year), and we also obtain consecutive seasonal P95 values using the method of W11. Specifically, we first calculate the seasonal P95 of all sub-daily geo-winds in moving 91-day windows, obtaining a daily time series of moving season P95 values. Then, a 91-day moving average procedure is applied to the daily series of moving season P95 values, obtaining a daily time series of 91-day moving averaged values of moving season P95. This latter daily series is sampled seasonally, at four mid-season days of each year, to obtain the consecutive seasonal P95 values analyzed in this study. The power spectrum of such a consecutive seasonal P95 series is presented by the green curve in Fig. 1 of W11, indicating that this series contains little aliasing effect (W11).
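The three steps of the consecutive seasonal P95 procedure (moving-window percentile, moving average, seasonal sampling) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the W11 implementation: the windows are assumed to be centred on each day, the percentile uses linear interpolation between order statistics, missing data are not handled, and all function names are hypothetical.

```python
import math

def percentile(values, q):
    """Percentile (q in 0..100) by linear interpolation; an assumed definition."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def moving_p95(daily_groups, window=91):
    """Daily series of P95 over centred windows of `window` days.

    daily_groups[d] is the list of sub-daily geo-winds on day d.
    Days too close to either end of the record are returned as None.
    """
    half = window // 2
    out = [None] * len(daily_groups)
    for d in range(half, len(daily_groups) - half):
        pooled = [v for day in daily_groups[d - half:d + half + 1] for v in day]
        out[d] = percentile(pooled, 95)
    return out

def moving_mean(series, window=91):
    """Centred moving average; None where the window is incomplete."""
    half = window // 2
    out = [None] * len(series)
    for d in range(half, len(series) - half):
        chunk = series[d - half:d + half + 1]
        if all(v is not None for v in chunk):
            out[d] = sum(chunk) / window
    return out

def seasonal_p95(daily_groups, mid_season_days, window=91):
    """Sample the smoothed moving-season P95 series at the mid-season days."""
    smoothed = moving_mean(moving_p95(daily_groups, window), window)
    return [smoothed[d] for d in mid_season_days]
```

Here `mid_season_days` is a list of day indices into the record, four per year. Because two successive 91-day windows are applied, the first and last 90 days of a record yield no value in this sketch.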

For the North Sea region (the area of the 5 triangles: APTB, BAPV, DAPV, APVD, and VTAP; see Fig. 1), as shown in Fig. 2, the agreement between the linear trend estimates for the 20CR CAI time series and for seasonal P95 geo-wind time series is even better than reported in W12 after the correction or exclusion of the newly identified data errors. The linear trends are estimated using the method detailed in Wang and Swail (2001), which is based on Kendall’s tau (Sen 1968; Kendall 1955; Mann 1945) and also accounts for lag-1 autocorrelation. The 95 % confidence interval for the trend is estimated using the variance of the corresponding residual series (von Storch and Zwiers 1999).
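For readers unfamiliar with Kendall-based trend estimation, the following Python sketch shows the basic building blocks: the Sen (1968) slope, i.e. the median of all pairwise slopes, and the Mann-Kendall S statistic. It is a simplified illustration only. It assumes equally spaced time steps with no missing values, and it omits the treatment of lag-1 autocorrelation that is part of the Wang and Swail (2001) method used here. The function names are hypothetical.

```python
from statistics import median

def sen_slope(y):
    """Median of all pairwise slopes (y[j] - y[i]) / (j - i) for j > i."""
    n = len(y)
    slopes = [(y[j] - y[i]) / (j - i)
              for i in range(n) for j in range(i + 1, n)]
    return median(slopes)

def mann_kendall_s(y):
    """Mann-Kendall S: number of increasing pairs minus decreasing pairs."""
    n = len(y)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += (y[j] > y[i]) - (y[j] < y[i])
    return s

# A single large outlier barely moves the slope estimate.
clean = [0, 1, 2, 3, 4, 5, 6]
with_outlier = [0, 1, 2, 30, 4, 5, 6]
```

In this toy example `sen_slope(clean)` and `sen_slope(with_outlier)` both return 1.0, which illustrates why a rank-based estimator is a reasonable choice for series that may contain erroneous extremes.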

In winter (JFM), the linear trend estimate for the 20CR CAI time series is closer to that for the NewObs_3hly P95 geo-wind time series than to that for the Obs_3hly P95 geo-wind time series (see the numbers in parentheses in Fig. 2a). In other words, the agreement between the blue and black trend lines in Fig. 2a is better than that between the blue and red trend lines (the latter is what was shown in Fig. 12a of W12). This is also true for the consecutive seasonal time series, as shown in Fig. 2b. Also, the 95 % confidence interval of the consecutive seasonal CAI trend estimate (cyan hatching) overlaps substantially with that of the NewObs consecutive seasonal P95 geo-wind trend estimate (grey shading in Fig. 2b). Note that the Obs_3hly and NewObs_3hly seasonal P95 geo-winds in Fig. 2, and in Fig. 12a of W12, are derived conventionally from all 3-hourly geo-winds in each season of each year (i.e., the same as in W09). In contrast, the NewObs (also labeled as “NewObs_6hly” in Fig. 2b) seasonal P95 geo-winds are derived using the method of W11 and thus contain little aliasing effect. Both the CAI values and the NewObs seasonal P95 geo-winds in Fig. 2b are from 6-hourly data and thus are more comparable.

The difference between the pair of blue and black trend lines in Fig. 2a or b is statistically insignificant, because the linear trend estimated for the time series of the differences between the 20CR CAI time series and the NewObs_3hly or NewObs P95 geo-wind time series is insignificant. The 95 % confidence interval for the trend estimated from the difference time series is (−0.00482, 0.00692) for Fig. 2a, and (−0.00832, 0.00071) for 2b, both indicating that the trend in the differences is insignificant.

Note that we discussed both linear trends and low-frequency variability in W12, as is clearly specified even in the title of the study, and that we concluded only that the linear trend estimates are consistent with each other. We computed linear trends because they are a common summary of one aspect of a time series and are of broad interest to users, and because low-frequency variability can exist with or without a long-term linear trend.

4 Factors contributing to the reported inconsistencies

Next, we show that the inconsistencies between the 20CR and observation-based geo-wind extremes reported by Krueger et al. (2013a, b) are, to a large extent, an artifact of using annual percentiles to represent extremes. The curves in Fig. 3 represent 45-season (11.25-year) Gaussian filtered series of the consecutive standardized seasonal P95 values, derived from the uncorrected observations (red), from the newly corrected observations (black), and from 20CR data with missing values in the newly corrected observations being excluded (blue). The symbols in Fig. 3 indicate the unfiltered consecutive seasonal P95 values. Note that in Fig. 3 the seasons are defined as DJF, MAM, JJA, and SON, to be consistent with W09 and W11.
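As a rough illustration of what a 45-season Gaussian filter does to a series, the sketch below applies a normalized Gaussian-weighted moving average. The filter width parameter is not stated in this reply, so the value used here (`sigma = window / 6`) is purely an assumption, as are the function names and the choice to leave a value undefined whenever its window contains missing data or runs past the end of the record.

```python
import math

def gaussian_weights(window=45, sigma=None):
    """Normalized Gaussian weights for an odd window length."""
    half = window // 2
    if sigma is None:
        sigma = window / 6.0  # assumed width; not taken from the paper
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def gaussian_filter(series, window=45, sigma=None):
    """Gaussian-weighted moving average; None where the window is incomplete."""
    w = gaussian_weights(window, sigma)
    half = window // 2
    out = [None] * len(series)
    for i in range(half, len(series) - half):
        chunk = series[i - half:i + half + 1]
        if any(v is None for v in chunk):
            continue  # leave a gap, as in the discontinuous curves
        out[i] = sum(wk * v for wk, v in zip(w, chunk))
    return out

# One outlier of 6.5 standard deviations in an otherwise flat series.
series = [0.0] * 100
series[50] = 6.5
filtered = gaussian_filter(series)
affected = sum(1 for v in filtered if v)  # filtered values that are non-zero
```

In this toy example a single outlier changes 45 consecutive filtered values, which shows how one erroneous extreme can influence more than a decade of a low-pass filtered seasonal curve.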

Fig. 3

North Atlantic regional averages of standardized consecutive seasonal P95 geo-winds (symbols) and the corresponding 45-season Gaussian filtered series (curves). The seasonal P95 values are obtained using the method of W11 to diminish aliasing effects. The NewObs and Obs indicate that the geo-winds are calculated from SLP data with and without correction (or exclusion in some cases) of the newly identified errors, respectively. The 20CR_NewObs is the 20CR in which time steps corresponding to missing values in the NewObs data set have been masked off (set to missing). The grey shading represents the 20CR_NewObs ensemble spread. Discontinuities in the curves represent periods of missing geo-winds. The correlations between the black and blue curves (filtered series) are reported without parentheses on the graph, and those between the dots and crosses (unfiltered series) are reported in parentheses

In general, the 20CR and observation-based consecutive seasonal P95 series are in good agreement, especially over the period since 1893 (Fig. 3, blue and black curves). The correlation between the unfiltered series (dots and crosses in Fig. 3) is 0.815, 0.900, and 0.946 for the whole period (1874–2007), the period from 1892 to 2007, and the period from 1950 to 2007, respectively (Fig. 3, the numbers in parentheses). All correlations are highly significant (the 99.99 % critical value for sample correlations is 0.160 for sample size N = 134 × 4 = 536). The slightly lower correlation for the whole period is due to the deviation in the pre-1893 period. Note that this deviation is substantially smaller than that of the annual P95 time series shown in Fig. 4 and in Krueger et al. (2013a, b). The correlation between the filtered series (black and blue curves in Fig. 3) is lower than that between the unfiltered series (dots and crosses), particularly for the whole period due to the pre-1893 discrepancy between the observational P95 values and those from 20CR.
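The critical values quoted for the sample correlations can be approximated with a short calculation. The sketch below uses the Fisher z-transformation with a one-sided normal quantile. Both choices are assumptions made for illustration, since the exact procedure used to obtain the quoted values is not described here, and the function name is hypothetical. The calculation treats the N values as independent and therefore ignores any serial correlation.

```python
import math
from statistics import NormalDist

def critical_r(n, level=0.9999):
    """Approximate one-sided critical value for a sample correlation.

    Uses the Fisher z-transformation: atanh(r) is roughly normal with
    standard deviation 1 / sqrt(n - 3) when the true correlation is zero.
    """
    z = NormalDist().inv_cdf(level)
    return math.tanh(z / math.sqrt(n - 3))

r_seasonal = critical_r(134 * 4)  # consecutive seasonal series, N = 536
r_annual = critical_r(134)        # annual series, N = 134
```

Under these assumptions the approximation gives values close to 0.160 for N = 536 and close to 0.316 for N = 134; small differences from the quoted values are expected because this is a large-sample approximation.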

Fig. 4

a, b The 11-year Gaussian filtered series, and c the unfiltered series, of regional averages of standardized annual P95 geo-winds over the indicated regions. The annual P95 values are determined conventionally by using all 6-hourly geo-winds in each calendar year, except for the Obs_3hly curve, which is derived from all 3-hourly geo-winds (it is a copy of the annual P95 curve shown in Fig. 2 of W09). The NewObs and Obs indicate that the geo-winds are calculated from SLP data with and without correction/exclusion of the newly identified errors, respectively. The 20CR_NewObs is the 20CR in which time steps corresponding to missing values in the NewObs data set have been masked off (set to missing). The grey shading represents the 20CR_NewObs ensemble spread (from the minimum to the maximum values among the 56 members). Discontinuities in the curves represent periods of missing geo-winds. The correlations (Corr) reported on each panel are those between the black and blue curves for the indicated periods

The differences between the P95 values in the filtered Obs and NewObs (red and black curves, respectively), and between the unfiltered Obs and NewObs (pink circles and black dots, respectively), shown in Fig. 3 are purely due to the effect of the newly identified observational errors that were included in the Obs data set, but are either corrected or excluded in the NewObs data set. These errors, especially the long run of very large errors (greater than 30 hPa) in the Aberdeen record (Fig. 6, top panel), result in a few very large outliers (up to about 6.5 standard deviations in 1879) in the observation-based geo-wind extremes (see Fig. 3). Their effects are particularly notable in the first decade (compare Fig. 3, red and black curves). This is because only two of the 10 triangles (APTB and BAPV) have any geo-wind data for the pre-1893 period. Both of these triangles include the erroneous Aberdeen record, and one also includes the erroneous Torshavn record (Fig. 1). Since there were only two geo-wind triangles available in the pre-1893 period, the uncertainty in the observationally estimated curve is expected to be substantially larger in this early period.

It is important to note that 20CR assimilated marine observations and other station data in the early period, in addition to the few stations that were used in the geo-wind calculation. For each pressure triangle, the geo-winds derived from the 20CR data indirectly involve observations in the vicinity of the triangle and farther afield. That is, several types of observational information were used in the 20CR, which enables more comprehensive quality control (QC) of the observational data, potentially resulting in geo-wind estimates that are less affected by observational errors than the geo-winds derived from observational pressure triangles. For example, 143 out of the 146 newly identified erroneous values in the Aberdeen record for the period 1871–1921 were rejected by the 20CR QC system (see Compo et al. 2011 “Appendix B” for a detailed description of the 20CR QC system). For the Aberdeen record for year 1879, the 20CR rejected 98 values, including the long run of large errors (38 erroneous values) in October 1879 that were identified in this study. Some of the errors identified by the 20CR are probably smaller than 20 hPa, so that the procedure of screening for large errors conducted in this study (see “Appendix”) cannot detect them. This may be an additional reason for the deviation between the 20CR and observed geo-winds in the pre-1893 period (Fig. 3, blue and black curves, respectively). Further in-depth analysis of the marine and other station data collectively is necessary to find the causes behind the remaining deviation; we plan to undertake this time-consuming task in the near future. We believe that the uncertainty also requires further investigation in this early period, both for the observations and 20CR. More in-depth quality assurance of the pressure data and digitization of more observed data in the early period, such as being coordinated by the Atmospheric Circulation Reconstructions over the Earth initiative (Allan et al. 2011), will help reduce uncertainty.

Next, despite the known aliasing issues, we reconsider the conventional annual P95 geo-winds as in previous studies (e.g., Alexandersson et al. 1998, 2000; Krueger et al. 2013a). Figures 4a, b show the 11-year Gaussian filtered regional averages of the standardized annual P95 geo-winds for both the North Sea and the North Atlantic regions (see Fig. 1). For the NewObs and 20CR_NewObs data sets, the unfiltered series of standardized annual P95 geo-winds are also shown in Fig. 4c. The annual P95 values are derived conventionally, as in W09, Alexandersson et al. (1998, 2000), and Krueger et al. (2013a). The Obs_3hly (green) curve is the annual P95 curve shown in Fig. 2 of W09 (also in Fig. 1 of Krueger et al. 2013b), which was based on 3-hourly geo-winds derived from the SLP data without the correction or exclusion of the errors shown in Figs. 6, 7.

The difference between the green and red (Obs_3hly and Obs) curves in Fig. 4a, b arises solely from the sampling interval, i.e., 6-hourly (Obs) versus 3-hourly (Obs_3hly) geo-winds. For the North Atlantic averages, sampling from 3-hourly geo-winds gives higher annual P95 values in the early decades (and lower values in the 1960s) than sampling from 6-hourly geo-winds (Fig. 4b, green and red curves). For the North Sea area, the differences are smaller but still noticeable in the early decades (Fig. 4a, green and red curves). In Krueger et al. (2013a, b), the observed geo-winds were derived from 3-hourly data, but the 20CR geo-winds were 6-hourly (because the available 20CR data are 6-hourly). This contributes modestly to the deviation between the 20CR and observed low-pass filtered annual P95 time series, particularly for the full domain (Fig. 4b; compare blue and green curves vs. blue and red curves).

The difference between the red and black (Obs and NewObs) curves in Fig. 4a, b is purely due to the correction or exclusion of the newly identified erroneous SLP values shown in Figs. 6, 7. The effect of this is particularly notable in the pre-1900 period. Since there are only two geo-wind triangles available in the pre-1893 period and both triangles are included in the North Sea area, the differences in the regional averages between the North Sea and the North Atlantic regions (compare the same colour curves in Fig. 4a, b) in this early period are small; they are purely due to the standardization that is based on the mean and variance of the whole period for each curve.

As can be seen from the blue and black curves in Figs. 3 and 4b, the deviations between the 20CR and observation-based consecutive standardized seasonal P95 series are much smaller than those between the corresponding annual P95 series. The inconsistencies between 20CR and observation-based geo-wind extremes reported by Krueger et al. (2013a) are mainly in the annual P95 time series and are, to a large extent, due to the annual sampling (other contributors are described above). The annual sampling convolves the very different storminess regimes in different seasons. The resulting annual P95 time series suffers from aliasing between the effects of low-frequency variability and the annual cycle. For example, differences in the annual cycle between the 20CR and station-data based P95 geo-winds (see Fig. 5) would be aliased and shown as differences in the low-frequency variability. In contrast, the annual cycle in both the mean and variance of geo-wind extremes is effectively removed from our consecutive standardized seasonal P95 time series. This is because our seasonal P95 values are derived from 4 distributions (one for each season) for each year and are then standardized in each of the four seasons of the year separately (i.e., the standardization is with respect to the mean and variance in each season).
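The season-by-season standardization can be illustrated with a small Python sketch. It assumes a complete series of consecutive seasonal P95 values ordered season by season within each year, standardizes with respect to the whole record (a different reference period could equally be used), and uses the population standard deviation. These choices and the function name are illustrative assumptions, not a description of the code used for this study.

```python
from statistics import mean, pstdev

def standardize_by_season(values, seasons_per_year=4):
    """Standardize each season separately using its own mean and variance.

    values: consecutive seasonal P95 values ordered as
            season 0, 1, 2, 3 of year 1, then season 0, 1, 2, 3 of year 2, ...
    Each season needs some variability (non-zero standard deviation).
    """
    out = [0.0] * len(values)
    for s in range(seasons_per_year):
        sub = values[s::seasons_per_year]      # all years of season s
        m, sd = mean(sub), pstdev(sub)
        for k, v in enumerate(sub):
            out[s + k * seasons_per_year] = (v - m) / sd
    return out

# Two years with a strong annual cycle and a uniform increase of 2 m/s.
p95 = [20.0, 10.0, 5.0, 15.0, 22.0, 12.0, 7.0, 17.0]
standardized = standardize_by_season(p95)
```

In this toy example the standardized values are -1.0 for every season of the first year and 1.0 for every season of the second year: the annual cycle disappears and only the year-to-year change remains.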

Fig. 5

The annual cycle in the mean and variance of the 20CR and station-data based P95 geo-winds for each of the 10 triangles (see the horizontal axis and Fig. 1)

Differences could exist in the annual cycle of both the mean and variance, as shown in Fig. 5, because of (1) the small differences between the 20CR and station-based triangles, (2) the fact that 20CR used more observational information (marine and other station data) than the station-data based geo-winds, and (3) limitations of the 20CR model resolution. In particular, 20CR shows lower variance of P95 geo-winds over triangles DAPV, VTAP, and APVD, but higher mean P95 values with higher variability over triangle BBV, than the station-data based counterpart (Fig. 5).

Nevertheless, the correlations between the annual P95 time series of the NewObs and 20CR_NewObs data sets (see the numbers in Fig. 4c) are highly significant statistically. Even the lowest value (0.766), which is obtained for the whole 134-year period (1874–2007) for the North Atlantic region (Fig. 4c), is highly significant (for sample size N = 134, the 99.99 % critical value for sample correlations is 0.316). The correlations between the filtered annual P95 series (black and blue curves in Fig. 4a, b) are much lower, which may be partly due to the “discrepancy-spreading effect” of the Gaussian filter.

5 Conclusions

In this reply, we have provided further evidence to show that the conclusion comparing linear trends in 20CR storminess and observation-based geo-wind extremes in W12 is valid. We have also clarified that several factors contribute to the apparent inconsistencies between the 20CR and observation-based geo-wind extremes reported by Krueger et al. (2013a, b). These include the choice of index that is used to represent time variation in extremes (e.g., annual vs. seasonal percentiles), the use of different sampling intervals (6-hourly vs. 3-hourly), and the presence of very large errors in the observations (i.e., the WASA pressure data set; Schmidt et al. 1997) that were neither identified nor corrected or excluded in any of the previous studies of observation-based geo-wind extremes (Alexandersson et al. 1998, 2000; Schmidt et al. 1998; WASA 1998; W09; W11; Krueger et al. 2013a, b).

We have shown that the time series of consecutive seasonal P95 geo-winds derived from the observations and from 20CR are in good agreement starting in 1893, with some deviation in the pre-1893 period for which the observations (especially digitized data) remain limited and are more uncertain. The correlation between the 20CR and observation-based geo-wind extremes (P95) time series for the full 134-year record is highly significant statistically, with and without the correction or exclusion of the newly identified erroneous SLP values. The agreement between 20CR and observations is further improved after the correction or exclusion of the newly identified erroneous SLP values.