1 Introduction

The forecast skill of the dynamic forecast system is primarily determined by the initialization and realism of the model formulations (Koster et al. 2004; Vitart 2014). Further, forecast skill of the individual forecast event depends on the instability of the dynamical system (Lorenz 1969). Generally, the forecast skills are inversely proportional to the instability of the system, however, several well-known unstable climate modes are now well predicted with a recent advance of the dynamical models based on the improvement of the community’s scientific understanding. In this respect, various sources of subseasonal climate predictability have been found (Newman et al. 2003; Vitart 2014; Mariotti et al. 2020).

The excitation of the well-known large-scale climate variability, such as the Madden–Julian Oscillation (MJO), Atlantic Oscillation (AO), circumglobal teleconnection (CGT) pattern, and Pacific-Japan (PJ) pattern contributed to improving the forecast skill over East Asia (Kosaka et al. 2012; Kim et al. 2014; Lee et al. 2015, 2020; Liu et al. 2021; Feng et al. 2021). That means, through the atmospheric teleconnections, impacts of the aforementioned well-known climate variability are expected to reach East Asia, therefore, it contributes to increase the forecast skill. For example, as the phase and amplitude of the MJO can be predicted more than 20 days in advance (Kim et al. 2018), the East Asian summer rainfall variations resulting from the MJO are also expected to be predictable with a similar lead time, which is systematically longer than its overall forecast skill (Park et al. 2017; Liang and Lin 2018). In this respect, the forecast skill of the climate variability in any region would depend on whether the predictable large-scale variability is excited. However, the contribution of various well-known dominant climate variability to the forecast skill of the mid-latitude climate variability has never been quantified. In other words, it is still unclear what the large-scale variability that is most closely linked to the mid-latitude forecast quality are.

To reveal the large-scale climate variability that guarantee the high forecast skill, it must be preceded to evaluate the forecast quality of an individual case, then a case or a group with high forecast skill should be selected. The temporal correlation between the forecasted and the observed anomalies is one of the most common metric to evaluate the forecast skill (Peng et al. 2002; Vitart 2014; Robertson et al. 2015; Liang and Lin 2018; de Andrade et al. 2019); however, it has an explicit limitation for this purpose. Since the correlation skill is defined by the second-moment statistics (i.e., covariance and variance), the correlation skill can be calculated only for a group, not an individual case. In other words, correlation skill might not be proper for evaluating the forecast skill of an individual forecast case. One may select a case or categorize a group using absolute or squared error-based metrics. However, those metrics are dependent on the forecasted and observed anomaly amplitudes. Therefore, it may be an inappropriate indicator of the degree of association between two variables.

To assess the forecast skill of an individual forecast case independent to the anomaly amplitude, we adopted a strategy to apply a Taylor expansion to the temporal correlation (Geng et al. 2018), and used it as a metric to evaluate the skill of the individual forecast. As this forecast metric mainly depends on the degree of association between the forecasted and observed anomalies while it is less dependent on its amplitude, it is referred to as the “association strength (AS) skill”. Using the AS skill, an approximated temporal anomaly correlation skill can be calculated for each forecast case.

The data and methods used in this study are described in Sect. 2. The derivation of the AS skill from the formulation of the temporal correlation is given in detail in Sect. 3. In Sects. 4 and 5, the application results of the AS skill to define a group of high correlation skills over the East Asian summer rainfall in hindcast experiments are provided. Finally, we summarize the main findings and discuss further implications of this research in Sect. 6.

2 Data and methods

2.1 Observational and hindcast data

We analyzed the hindcast dataset of the UK Met Office’s ensemble Global Seasonal forecast system version 5 (GloSea5, Maclachlan et al. 2015). To evaluate the GloSea5 prediction skill over East Asia on a subseasonal time scale, we used the daily hindcast output of the GloSea5 forecast system in May–June–July–August (MJJA) 1991–2010, with global coverage at 0.833° \(\times\) 0.556°, which were analyzed after being regridded to 2.5° \(\times\) 2.5° using box averaging with simple bilinear interpolation. The prediction began on days 1, 9, 17, and 25 of each month and was integrated for 60 days; consequently, three ensemble members were produced, and they were averaged to make ensemble mean in this study. Therefore, the number of forecast cases was 320 for the period of interest [i.e., 4 cases/month \(\times\) 4 months (from May to August) \(\times\) 20 years]. The analyzed variables were daily geopotential height field at 500 hPa; meridional and zonal winds at 850, 500, and 200 hPa; precipitation. To focus on the subseasonal variability, the interannual variability was removed by subtracting the average value for the previous 120 days from all data after subtracting the daily climatology (Wheeler and Hendon 2004). For example, for 10-days-lead forecasts, the average for the 10 days of predicted values and 110 days of the most recent observed values was subtracted to remove the interannual variability. Note that this method can be applied for real-time forecasting because we only used observed values before the forecast start date (Wheeler and Hendon 2004). All data were pentad averaged before analysis.

For reference, the ERA-INTERIM daily reanalysis data from the European Centre for Medium Range Weather Forecasts (ECMWF) was used with global coverage on 2.5° \(\times\) 2.5° grids (Dee et al. 2011). The daily geopotential height field of 500 hPa, meridional and zonal winds of 850, 500, and 200 hPa, and precipitation were used. For all variables, the average value for the previous 120 days was subtracted from all data after the daily climatology for MJJA 1991–2010 was subtracted.

2.2 Climate indices

To evaluate the forecast skill of East Asia precipitation anomalies, we used the East Asian Summer Rainfall anomaly (EASRA) index, which is defined as the area-averaged precipitation at 30°–50° N, 115°–150° E (Lee et al. 2005). The circimglobal teleconnection (CGT) pattern is defined by the 200 hPa geopotential height anomalies averaged over 35°–45° N, 55°–75° E (Ding and Wang 2007). The Pacific-Japan (PJ) pattern is defined by an 850 hPa geopotential height from 155°E, 35°N minus 125° E, 22.5° N, multiplied by 2 (Nitta 1987). The boreal summer intraseasonal oscillation (BSISO) index was calculated by Kikuchi (2020). The Madden–Julian Oscillation (MJO) index was acquired from https://psl.noaa.gov/mjo/mjoindex/omi.1x.txt, and Summer North Atlantic Oscillation (SNAO) index from https://ftp.cpc.ncep.noaa.gov/cwlinks/norm.daily.nao.index.b500101.current.ascii. Each mode was acquired by compositing the observed pentad anomalies with respect to climate indices.

2.3 Wave-activity flux

To determine the wave energy flux associated with the EASRA index, the zonal and meridional components of the wave-activity flux (WAF) were calculated as follows (Takaya and Nakamura 2001).

$${\mathbf{W}} = \frac{p\cos \phi }{{2\left| {\mathbf{U}} \right|}}\left( {\begin{array}{*{20}c} {\frac{U}{{a^{2} \cos^{2} \phi }}\left[ {\left( {\frac{{\partial \psi^{\prime}}}{\partial \lambda }} \right)^{2} - \psi^{{^{\prime}\frac{{\partial^{2} \psi^{\prime}}}{{\partial \lambda^{2} }}}} } \right] + \frac{V}{{a^{2} \cos \phi }}\left[ {\frac{{\partial \psi^{\prime}}}{\partial \lambda }\frac{{\partial \psi^{\prime}}}{\partial \phi } - \psi^{{^{\prime}\frac{{\partial^{2} \psi^{\prime}}}{\partial \lambda \partial \phi }}} } \right]} \\ {\frac{U}{{a^{2} \cos \phi }}\left[ {\frac{{\partial \psi^{\prime}}}{\partial \lambda }\frac{{\partial \psi^{\prime}}}{\partial \phi } - \psi^{{^{\prime}\frac{{\partial^{2} \psi^{\prime}}}{\partial \lambda \partial \phi }}} } \right] + \frac{V}{{a^{2} }}\left[ {\left( {\frac{{\partial \psi^{\prime}}}{\partial \phi }} \right)^{2} - \psi^{{^{\prime}\frac{{\partial^{2} \psi^{\prime}}}{{\partial \phi^{2} }}}} } \right]} \\ \end{array} } \right)$$
(1)

where W represents the WAF and \(\psi ^{\prime}\) is the streamfunction anomaly regressed onto the EASRA index. U and V are the climatological zonal and meridional fields, respectively, in MJJA; \(\lambda \mathrm\,\text{and}\,\phi\) are the longitude and latitude, respectively; \(a\) is the earth radius; \(p\) is the pressure normalized by 1000 hPa; \({f}_{0}\) is the Coriolis force; and \({N}^{2}\) is the buoyancy frequency.

3 The association strength (AS) skill

3.1 Mathematical derivations

The AS skill is derived from the temporal anomaly correlation. First, the correlation skill for all forecast cases \(\overline{{r} }\) at lth lead day can be written as follows (Peng et al. 2002; Vitart et al. 2012; Robertson et al. 2015; Liang and Lin 2018; de Andrade et al. 2019).

$$\overline{{r(l)}} = \frac{{\overline{{XY(l)}} }}{{\sqrt {\overline{{X^{2} (l)}} \times \overline{{Y^{2} (l)}} } }}$$
(2)

Each terms in Eq. (2) are calculated as follows,

$$\overline{{XY(l)}} = \frac{1}{{n_{t} }}\sum\limits_{{n = 1}}^{{n_{t} }} {\{ X\left( {n,l} \right) - \overline{{X(l)}} \} } \bullet \{ Y(n,l) - \overline{{Y(l)}} \}$$
(3)
$$\overline{{X^{2} (l)}} = \frac{1}{{n_{t} }}\sum\limits_{{n = 1}}^{{n_{t} }} {\{ X\left( {n,l} \right) - \overline{{X(l)}} \} ^{2} }$$
(4)
$$\overline{{Y^{2} (l)}} = \frac{1}{{n_{t} }}\sum\limits_{{n = 1}}^{{n_{t} }} {\{ Y\left( {n,l} \right) - \overline{{Y(l)}} \} ^{2} }$$
(5)

\(\overline{{XY }({l})}\) denotes the covariance between the predicted value X(l) and observed value Y(l). \(\overline{{{X} }^{2}({l})}\), \(\overline{{{Y} }^{2}({l})}\) denote the variance of the predicted and observed values at the lth lead day, respectively. \({X}\left({n},{l}\right),{Y}\left({n},{l}\right),\) and \({{n}}_{{t}}\) is the X, Y at lth lead day in the nth forecast case, and the number of forecast cases, respectively. \(\overline{{X }({l})},\overline{{Y }({l})}\) is the forecast-case-averaged values of X and Y at lth lead day (i.e., \(\overline{{X}({l})}=\frac{1}{{{n}}_{{t}}}\sum_{{n}=1}^{{{n}}_{{t}}}{X}({n},{l}),\overline{{Y}({l})}=\frac{1}{{{n}}_{{t}}}\sum_{{n}=1}^{{{n}}_{{t}}}{Y}({n},{l})\)).

To evaluate the prediction skill for a subset of forecast cases \({{n}}_{{s}}\) (i.e., \(1<{{n}}_{{s}}<{{n}}_{{t}}\)), the following are defined.

$$\Delta XY\left(l\right)=\widetilde{{XY(l)}}-\overline{{XY(l)}}$$
(6)
$$\Delta {{X}}^{2}\left({ l }\right)=\widetilde{{{X}}^{2}\left({l}\right)}-\overline{{{X} }^{2}({l})}$$
(7)
$$\Delta {{Y}}^{2}\left({l}\right)=\widetilde{{{Y}}^{2}({l})}-\overline{{{Y} }^{2}({l})}$$
(8)

\(\widetilde{XY(l)}\), \(\widetilde{{X}^{2}\left(l\right)}\), \(\widetilde{{Y}^{2}\left(l\right)}\) denote the covariance between X and Y, the variance of the X, and Y at the lth lead day for a subset of forecast cases (\({n}_{s}\)). \(\Delta XY\left(l\right)\), \({\Delta X}^{2}\left(l\right)\), \(\Delta {Y}^{2}\left(l\right)\) denote the deviation of the \(\widetilde{XY\left(l\right)}\), \(\widetilde{{X}^{2}\left(l\right)}\), \(\widetilde{{Y}^{2}\left(l\right)}\) from the corresponding values with all cases at the lth lead day, respectively. Then, correlation skill for a subset of forecast case can be defined as follows.

$$\widetilde{{r(l)}} = \frac{{\overline{{XY(l)}} + \Delta XY(l)}}{{\sqrt {\left( {\overline{{X^{2} (l)}} + \Delta X^{2} \left( l \right)} \right)\left( {\overline{{Y^{2} (l)}} + \Delta Y^{2} \left( l \right)} \right)} }}$$
(9)

Here, \(\widetilde{r(l)}\) refers to the correlation skill for a subset of forecast cases at the lth lead day. The AS skill can be obtained by Taylor series expansion of Eq. (9), where only the first order is retained, as follows (Geng et al. 2018).

$$\widetilde{{r(l)}} \approx \overline{{r(l)}} + \overline{{r(l)}} \frac{{\Delta XY\left( l \right)}}{{\overline{{XY(l)}} }} - \frac{{\overline{{r(l)}} }}{2}\{ \frac{{\Delta X^{2} \left( l \right)}}{{\overline{{X^{2} \left( l \right)}} }} + \frac{{\Delta Y^{2} \left( l \right)}}{{\overline{{Y^{2} \left( l \right)}} }}\}$$
(10)

The first term on the right-hand side is the correlation skill over all prediction cases as defined in Eq. (2). If we normalize the predicted and observed values by the standard deviation as follows,

$$x\left( {n,l} \right) = \frac{{X\left( {n,l} \right) - \overline{{X(l)}} }}{{\sqrt {\overline{{X^{2} (l)}} } }}$$
(11)
$$y\left( {n,l} \right) = \frac{{Y\left( {n,l} \right) - \overline{{Y(l)}} }}{{\sqrt {\overline{{Y^{2} (l)}} } }}$$
(12)

then \(\overline{{x }^{2}(l)}=\overline{{y }^{2}(l)}=1\), and \(\overline{{xy(l)}} = \overline{{r(l)}}\). Finally, to define the AS skill to evaluate the forecast skill of an individual forecast case, we defined \(\Delta xy\), \(\Delta {x}^{2}\), and \(\Delta {y}^{2}\) terms from a single forecast case (i.e., \({n}_{s} =1\)) as follows.

$$\Delta xy\left({n,l}\right)=x\left({n,l} \right) \bullet y(n,l)-\overline{{xy(l)}}$$
(13)
$$\Delta {{x}}^{2}\left({n},{l}\right) ={{x}}^{2}\left({n},{l}\right) - \overline{{{x} }^{2}({l})}$$
(14)
$$\Delta {{y}}^{2}\left({n},{l}\right)={{y}}^{2}\left({n},{l}\right)-\overline{{{y} }^{2}\left({l}\right)}$$
(15)

The AS skill can be obtained as follows by modifying Eq. (10).

$$AS\left( {n,l} \right) = \overline{{r(l)}} + \Delta xy\left( {n,l} \right) - \frac{{\overline{{r(l)}} }}{2}\{ \Delta x^{2} \left( {n,l} \right) + \Delta y^{2} \left( {n,l} \right)\}$$
(16)

As the denominator in Eq. (16) is a constant, the AS skill can be considered a linearized version of Eq. (9). Therefore, the correlation skill and the case-averaged AS skill in any group are expected to be equivalent for any group. However, it should be emphasized that, with the AS skill, the forecast skill can be calculated for the individual forecast case. In contrast, the correlation skill for the individual case is hardly reasonable.

3.2 Interpretation

The advantages of the AS skill are demonstrated by comparing its characteristics with the nondimensional squared error defined as follows.

$$Nondim\_sqerr(n,l)={\left\{x\left(n,l\right)-y\left(n,l\right)\right\}}^{2}$$
(17)

To fairly compare the nondimensional squared error to the AS skill, we first modified Eq. (17) by applying Eqs. (13)–(15) as follows.

$$ \begin{aligned}{Nondim}\_{sqerr}\left( {{n},{l}} \right) &= \,\overline{{{x}^{2} \left( {l} \right)}} + \Delta {x}^{2} \left( {{n},{l}} \right) + \overline{{{y}^{2} \left( {l} \right)}} \\ &\quad+ \Delta {y}^{2} \left( {{n},{l}} \right) - 2\overline{{{xy}\left( {l} \right)}} - \Delta 2{xy}\left( {{n},{l}} \right)\end{aligned} $$
(18)

Finally, Eq. (19) can be obtained as follows.

$$ \begin{aligned}1 - \frac{{{Nondim}\_{sqerr}\left( {{n},{l}} \right)}}{2} & =\, \overline{{{r}\left( {l} \right)}} + \Delta {xy}\left( {{n},{l}} \right) \\ &\quad- \frac{1}{2}\left\{ {\Delta {x}^{2} \left( {{n},{l}} \right) + \Delta {y}^{2} \left( {{n},{l}} \right)} \right\}\end{aligned} $$
(19)

Note that this derivation is possible with the assumption \(\overline{{x(l)}^{2}}=\,\overline{{y(l)}^{2}}=1\), \(\overline{x(l)}=\overline{y(l)}=0\), which is true for the entire set, while it is normally not the case for any subset. A left-hand term of Eq. (19) (i.e., \(1-Nondim\_sqerr(n,l)/2\)) will be referred to as a ‘transformed squared error.’

Both AS skill and the transformed squared error are calculated as the covariance between x and y (i.e., \(\Delta {x}{y}({n},{l})\)) diminished by the summed variance terms of x and y (i.e., \(-\frac{1}{2}\{\Delta {{x}}^{2}\left({n},{l}\right)+{\Delta {y}}^{2}\left({n},{l}\right)\}\)). The covariance term exhibits a large value once the sign of the forecasted value is matched to the observed. In other words, with the fixed summed variance terms, the covariance term is maximized when the values of x and y are the same. If only either x or y has a large value while the other is kept the same, it increases the variance terms to a greater degree than the covariance term. This means that both AS skill and transformed squared error tends to decrease once the forecasted value is apart from the observed value.

While the contribution of the covariance term is the same between AS skill and the transformed squared error, the contribution of the variance terms is systematically different. The variance terms are multiplied by the fixed amplitude of the 0.5 in transformed squared error, the amplitude of the variance terms in AS skill is proportional to \(\frac{\overline{{r}}}{2 }\). In AS skill, the both the covariance term and summed variance terms can affect the AS skill with a similar degree as the overall amplitude difference between the covariance and the variances are matched by multiplying \(\frac{\overline{{r}}}{2 }\). On the other hand, in transformed squared error, the summed variance terms affect much strongly than covariance as the variances are larger than the covariance (i.e., \(\frac{1}{2}({{x}}^{2}+{{y}}^{2})\ge {x}{y}\)).

In other words, the most distinct feature of the AS skill is that it is less affected by the amplitude of the observed and forecasted values than transformed sqaured error. The increase in the amplitude of x and y tends to increase the covariance term, however, it is canceled out to some extent by the variance terms. That is, the AS skill is less dependent on the amplitude of the forecasted/observed anomalies.

4 Application of the AS skill to evaluate the forecast skill of East Asian summer precipitation

To verify the similarity between the correlation skill and the case-averaged AS skill for any group, the all-case-averaged AS skill of the EASRA index was firstly compared with the correlation skill using all the forecast cases. The all-case-averaged AS skill of the EASRA index between the observations and forecasts are 0.84, 0.60, and 0.44 at 1-, 2-, and 3-pentad leads, respectively. Those AS skill are identical to the correlation skill, which is 0.84, 0.60, and 0.44 at 1-, 2-, and 3-pentad leads, respectively. This shows that the AS skill can be used to evaluate the correlation skill of any group.

The forecast skill of the EASRA index is roughly comparable to the often-used criteria for a reliable forecast (i.e., correlation skill > 0.5) at 3-pentad leads (Park et al. 2017; Liang and Lin 2018). Thus, we focus on the forecast skill at 3-pentad leads as it is a key measure for determining the period of a reliable forecast.

As aforementioned, the advantage of using the AS skill is that the forecast skill of the individual forecast cases can be evaluated. To demonstrate this point, Fig. 1 shows the AS skill of the EASRA index at 3-pentad lead (black line). The x-axis represents the case number of the GloSea5 hindcast. For example, the leftmost, and the second value denotes the forecast initiated on May 1, and 9, 1991, and the value denotes the AS skill at the third pentad after the forecast began (i.e., average from May 11–15, and 19–23, 1991), respectively. Most values vary within the magnitude of 2, and the minimum, and maximum values reached up to about – 6, and + 4, respectively.

Fig. 1
figure 1

Association Strength (AS) skill of EASRA index (precipitation averaged over 30–50°N, 115–150°E) between GloSea5 and observations at 3-pentad lead (black line) in MJJA 1991–2010. The highest AS cases (red dots) are defined as those where the AS skill subtracted from the all-case-average value is greater than 0.5 times the standard deviation. The lowest AS cases (blue dots) are those where the AS skill subtracted from the all-case-average value is smaller than − 0.5 times the standard deviation

To examine the impact of each term on the AS skill, we calculated the anomalous AS skill, covariance term, and the sum of variance term for individual forecast cases in Fig. 2. Anomalous AS skill and covariance terms have a linear relationship, implying that the AS skill tends to increase as the predicted and observed value commonly increase (Fig. 2a). An asymmetric feature is shown because variance terms consistently contribute to lower the AS skill. For example, when the signs of x and y are the same, the covariance is positive, the variance term partially cancels out the covariance. On the other hand, when the signs of x and y are opposite, the covariance term is negative, and the variance term contribute to amplify the negative AS skill. (Fig. 2b).

Fig. 2
figure 2

Scatter plot between the anomalous AS skill (x-axis) and a the covariance term (i.e., second term on the right-hand side of Eq. (16), y-axis) and b the variance terms (i.e., sum of third and fourth term on the right-hand side of Eq. (16), y-axis) for individual forecast case

Based on the skill evaluation of the individual forecasts using the AS skill, we identified two groups according to the AS skill. The first group is the highest AS cases, which are defined as those where the AS skill subtracted from the all-case-averaged value is greater than 0.5 standard deviation; represented as red dots in Fig. 1. The second group is the lowest AS cases, which are defined as those where the AS skill subtracted from the all-case-averaged value is smaller than − 0.5 standard deviation; represented as blue dots in Fig. 1. The number of the highest, and lowest AS cases is 69, and 70, respectively.

To verify that these two groups are well separated according to the forecast quality of the EASRA index, we evaluated the characteristics of the EASRA index for each group (Fig. 3). Figure 3a shows the regression coefficient of the 5-days-moving-averaged observed and predicted EASRA indices at lead times of 1 to 5 pentads, with respect to the observed EASRA index at 3-pentad lead. In the highest AS cases (red lines in Fig. 3), the observed and predicted EASRA indices show similar time evolutions; both the observed and predicted EASRA indices begin to develop at 2-pentad lead and show a peak phase at 3-pentad lead. This indicates that the GloSea5 successfully simulated the observed evolution of the EASRA index in the highest AS cases. In the lowest AS cases (blue lines in Fig. 3), the observed EASRA coefficient at 3-pentad lead is positive, whereas the predicted EASRA coefficient is negative. This is consistent with the fact that the forecasts of the EASRA index in the lowest AS cases are systematically erroneous.

Fig. 3
figure 3

a Lead/lag-regression coefficients between the observed EASRA index at 3-pentad lead and the observed (solid line) and forecasted (dashed line) EASRA index in the highest AS cases (red) and lowest AS cases (blue) cases. b PDF of anomalous EASRA index for highest AS cases (red line), lowest AS cases (blue line), and remaining (gray bars) cases

Figure 3b shows the probability density function (PDF) of the anomalous EASRA index in the highest AS cases (red line), lowest AS cases (blue line), and remaining cases (gray bars, 181 cases). In the remaining cases, the PDF of the precipitation anomalies had the largest value for the small-amplitude anomalies. In contrast, for both the highest and lowest AS cases, the probability is relatively high at relatively strong precipitation anomalies. This feature is much more prominent for the highest AS case; this finding is consistent with previous studies, which found that the forecast skill tends to be better when the target amplitude is strong (van den Dool and Toth 1991; Kim et al. 2014; Mason et al. 2021).

The next question is whether the forecast cases in the highest AS cases actually exhibit the highest correlation skill. To examine this point, we randomly selected a same number of cases as for the highest AS cases (i.e., 69 cases) among total forecast cases without replacement, then, calculated the case-averaged AS skill and the correlation skill. This procedure was repeated 1,000 times. Note that the AS skill was pre-calculated for individual forecast events, then, a simple average for the selected cases was taken, while the correlation skill was calculated after the cases were selected. The scatter plot between the randomly selected case-averaged AS skill and the correlation skill exhibited a strong linear positive relationship (Fig. 4a). As the AS skill successfully measured the correlation skill for any particular group, the cases with high AS skill tended to exhibit a high correlation skill (Fig. 4b).

Fig. 4
figure 4

a Scatter plot between the correlation skill (x-axis) and case-averaged AS skill (y-axis) of a group with 69 randomly selected forecast cases. b Scatter plot between the rank in the forecast skill evaluated by the correlation skill (x-axis) and case-averaged AS skill (y-axis). Note that we selected the same number of forecast cases (i.e., 69 cases) to the cases for the highest AS cases, and the random selection of the forecast cases is repeated 1000 times. The red dots represent the skill and rank of the highest AS cases

One can wonder about the accuracy of the AS skill with the zeroth-order approximation (i.e., only with the covariance term, the second term on the right-hand side in Eq. (16)). We checked the root-mean-square-difference between the zeroth-order or first-order AS skill and the correlation skill for the randomly selected cases; it was 0.06 and 0.021 in the zeroth- and first-order approximation, respectively. This clearly demonstrates that the AS skill becomes more accurate with the first-order approximations as in Eq. (16) rather than the formulation only with the zeroth-order (i.e., only with the covariance term).

Interestingly, the highest AS cases exhibited the highest correlation skill (red dot in Fig. 4). Both the AS and the correlation skills in the highest AS cases were clearly beyond the range of the skill in the randomly defined groups. This means that the evaluation of the individual forecast skills based on the AS skill extracted the most well-predicted cases that guaranteed the highest correlation skill, which could hardly be achieved by random selection of the cases.

For comparison, Fig. 5 shows the scatter plot of the correlation skill and the case-averaged squared errors, transformed squared error for the randomly selected cases. There is an inverse relationship between the case-averaged squared errors and the correlation skill, and this relationship is not strong as those between AS skill and the correlation skill. In transformed squared error, even though squared error transformed to the correlation dimension, variance terms affect forecast skill much more strongly than covariance. For these reasons, the forecast skill measured based on squared error or its non-dimensional transformed form still exhibited less agreement with the correlation skill than the AS skill does (Fig. 5b). Therefore, the cases with low squared errors do not always guarantee high correlation skill (Fig. 5c). This supports our notion that the previous individual forecast skill evaluation metric, such as the squared errors, might not be optimal for evaluating the degree of association between the forecasted and the observed anomalies, even though it might still be useful when evaluating the simulation quality considering up to anomaly amplitude.

Fig. 5
figure 5

Same as Fig. 4, but for the case-averaged a squared error, b transformed squared error (y-axis) and the correlation skill (x-axis), c ranks for the squared error, transformed squared error (y-axis), and the correlation skill (x-axis). The red dots represent the skill and rank of the highest AS cases

5 Dominant climate variability in the highest AS cases

5.1 Spatial distribution of the EASRA-related anomalies

In this section, we examined a dominant large-scale subseasonal variability associated with the EASRA index in highest AS cases. To consider the time-lags that the remote large-scale climate variability results in East Asian precipitation variations, we exhibited large-scale climate anomalies averaged for 2–3 pentad leads. Figure 6 shows the 10-day averaged (for 2–3 pentad leads) precipitation anomalies regressed onto the observed EASRA index at 3-pentad lead in highest AS cases. The observed and predicted precipitation anomalies related to the EASRA index show a positive precipitation anomaly over eastern China-Korean peninsula-southern Japan and a negative precipitation anomaly over the subtropical western Pacific. The spatial pattern of the precipitation anomalies resembles a typical rainband pattern over East Asia during boreal summer (Kang et al. 1999; Chen et al. 2004; Seo et al. 2011).

Fig. 6
figure 6

Regression map of 10-days averaged precipitation anomalies in observation (a) and GloSea5 (b) onto observed EASRA index at 3-pentad lead in highest AS cases

Figure 7 shows the geopotential height anomalies at 500 hPa (Z500) and the WAF, zonal wind at 200 hPa (U200), the streamfunction at 850 hPa (Strm850), and horizontal wind anomalies at 850 hPa (UV850) regressed onto the observed EASRA index anomaly in the highest AS cases. The regressed Z500 anomalies in the observations exhibited low-pressure anomalies over the northern Korean peninsula, and high-pressure anomalies over the southern Japan. This dipole pattern of the upper-level geopotential height anomalies amplifies the climatological land-sea pressure gradient to enhance the summer monsoonal structure (Lee et al. 2017). Interestingly, the regressed Z500 anomalies also show low-pressure anomalies over the Norwegian Sea and high-pressure anomalies over the mid-latitude North Atlantic, western Russia (Fig. 7a), indicating that the low-pressure anomalies over the northern Korean peninsula are part of the wave train patterns originating from the North Atlantic. The regressed WAF anomalies (vectors in Fig. 7a), which flow from the North Atlantic to East Asia, support this notion. The overall spatial distribution of Z500 anomalies is well simulated by GloSea5 (Fig. 7d).

Fig. 7
figure 7

Regression map of a, d 10-days averaged geopotential height anomalies at 500 hPa [shading, unit: m (mm day−1)−1] and WAF [green vectors, unit: m2 s−2 (mm day−1)−1] at 500 hPa, b, e zonal wind anomalies at 200 hPa [shading, unit: m s−1 (mm day−1)−1], c, f streamfunction anomalies at 850 hPa [shading, unit: kg (ms)−1 (mm day−1)−1] and anomalous wind field at 850 hPa [green vectors, unit: m s−1 (mm day−1)−1] in observations (left panel) and GloSea5 (right panel) onto observed EASRA index at 3-pentad lead in highest AS cases

The spatial distribution of the Z500 anomalies is dynamically coupled to the upper-level zonal wind, which also contributes to the increase in rainfall anomalies over East Asia. The regressed U200 in observation and GloSea5 (Figs. 7b, e) show prominent westerlies over northern China and Japan; they are located between the negative Z500 anomalies over northern Korean peninsula and positive Z500 anomalies over southern Japan (Fig. 7a). Over the south of the jet stream entrance, the upward motion is induced as a secondary circulation (Shapiro 1981; Sanders and Hoskins 1990); therefore, the anomalous upward motion contributes to the increased precipitation over Korea.

The low-tropospheric responses associated with the EASRA index were examined (Fig. 7c, f). The regressed Strm850 shows a dipole pattern consisting of negative streamfunction anomalies over Korea and positive streamfunction anomalies over the Kuroshio Extension region in both the observations and GloSea5. The positive streamfunction anomalies over the Kuroshio Extension region (140\(^\circ\)–160\(^\circ\) E, 25\(^\circ\)–35\(^\circ\) N), which coincide with the in situ anticyclonic anomalies (Fig. 7a), resulting in an anomalous southerly flow and lead to an intrusion of warm humid air from the lower latitudes (Ham et al. 2019, 2021). This intrusion contributes to leading an upward motion to balance the moist energy budget and eventually increases precipitation over Korea (Ham et al. 2007).

To examine whether the dominant climate variability associated with the EASRA index is linked to one of the well-known atmospheric teleconnection patterns during boreal summer, we calculated pattern correlation coefficients (PCCs) between the EASRA-regressed anomalies in the highest AS cases with respect to the typical pattern of the SNAO, CGT, PJ, MJO, or BSISO (Table 1). Note that we used the student's t-test for the significance test, whose degree of freedom depends on the number of grid points to calculate the PCCs.

Table 1 The pattern correlation coefficients between the highest AS cases and each mode

The PCCs between the EASRA-regressed Z500 anomalies in highest AS cases (Fig. 7a) and the Z500 anomalies during the positive (negative) phase of SNAO above 10° S–90° N, 40° W–160° E is only 0.00 (+ 0.04). The PCCs is − 0.36 (+ 0.45) with the Z500 anomalies during the positive (negative) phase of the CGT. Additionally, the PCCs between EASRA-regressed Strm850 anomalies in highest AS cases (Fig. 7c) and the positive (negative) phase of the PJ pattern above 0°–55° N, 80°–190° E is + 0.35 (− 0.40), and the magnitude of the absolute PCCs between U200 anomalies in highest AS cases (Fig. 7b) and each phase of the MJO or the BSISO over 10S°–50° N, 40°–160° E are less than 0.4. Even though the degree of similarity between the EASRA-regressed pattern in highest AS cases and the CGT pattern is significant, however, the determination coefficient calculated by a squared pattern correlation coefficient r2 is not high enough (i.e., 13 and 20% for positive and negative CGT patterns, respectively). The results suggest that the dominant EASRA-related anomalies in the highest AS cases are not explained by any single previously-known climate variability.

Figure 8 shows the regressed atmospheric variables with respect to the observed EASRA index anomaly at 3-pentad lead for all forecast cases (Figs. 8a–c), and lowest AS cases (Figs. 8d–f) in the observations. Although the regressed-Z500, U200, and Strm850 anomalies in all cases are similar to those in the highest AS cases over East Asia, the large-scale wave train patterns emanating from the North Atlantic is unclear. For example, while the negative Z500 anomalies were clearly shown over the western Europe in the highest AS cases, they are not shown in all cases. This indicates that the remote atmospheric teleconnection can be a key feature to improve the forecast skill over East Asia.

Fig. 8
figure 8

Same as Fig. 7, but for all cases (a, b, c) and the lowest AS cases (d, e, f) in the observation

In the lowest AS cases, the EASRA-regressed anomalies also exhibited a systematic difference from those in the highest AS cases. The EASRA-regressed anomalies in the lowest AS cases show a positive Arctic Oscillation (AO)-like Z500 pattern, and the direction of the WAF is distinct from those in the highest AS cases. The wavy structure is shown in U200 anomalies with weaker amplitude, even though the dipole structure of the strm850 is still clear in the lowest AS cases. It demonstrates a clear difference in the remote large-scale climate variability associated with the EASRA between the highest and the lowest AS cases, implying that the forecast skill of the mid-latitude climate variability can depend on the remote large-scale climate variability.

Lastly, Fig. 9 shows the regressed atmospheric variables for the observed EASRA index anomaly at 3-pentad lead for all forecast cases and lowest AS cases in GloSea5. In lowest AS cases, the regressed Z500 anomalies exhibit negative over the Korean peninsula but are more zonally elongated and weaker than observed. In addition, the negative Z500 anomalies are manifested over the Arctic as in observation, but the WAF flows from the Arctic to East Asia are weaker than observation. In the upper-level, the spatial distribution of regressed U200 anomalies over the western Pacific has the opposite sign to the highest AS cases (Fig. 7b, e). The westerlies are exhibited over southern Japan, located between low-pressure anomalies over Korea and high-pressure anomalies over the subtropical western Pacific. The downward motion is induced over the north of the jet stream entrance as a secondary circulation, contributing to the decreased precipitation over Korea. In the low-tropospheric, the regressed Strm850 anomalies show negative streamfunction anomalies over Japan and Kuroshio Extension region, which coincide with the in situ cyclonic anomalies (Fig. 9d), resulting in northerly wind anomalies over Korea. The northerly wind anomalies bring relatively dry and cold air from the higher latitudes, leading to less northward transport of moisture to East Asia and inducing anomalous downward motion during boreal summer (Wu 2002; Yihui and Chan 2005; Zhang and Zhou 2015). These features tend to contribute to a decrease the precipitation over East Asia.

Fig. 9
figure 9

Same as Fig. 8, but for all cases (a, b, c) and the lowest AS cases (d, e, f) in GloSea5

These spatial distributions of the lowest AS cases in GloSea5 are quite different from observations, while all cases are relatively well simulated in GloSea5. Since lowest AS case is a selection of cases in which covariance between EASRA of observation and GloSea5 tends to be opposite, the climate patterns of the lowest AS cases in GloSea5 relate to negative precipitation anomalies over East Asia. It clearly demonstrates that AS skill can successfully categorize not only the highest AS cases but also the lowest AS cases.

5.2 Possible implication for the real-time forecasts

The dominant modes revealed in the highest AS cases can be used to predict the reliability of real-time forecasts in advance. As a simple example, one can suppose that the forecasts that exhibited a similar pattern shown in the highest AS cases would be more reliable. To support this idea, Fig. 10 shows the prediction skill of the EASRA index at 3-pentad lead in groups categorized based on the degree of spatial similarity between each individual forecast and the EASRA-related Z500 in the highest AS cases (Fig. 7d) at 40\(^\circ \mathrm{ W}\)–170 \(^\circ \mathrm{E}\), 10\(^\circ\) S–90 \(^\circ \mathrm{N}\). The correlation skill of the EASRA index at the 3-pentad lead in 57 cases that the amplitude of the pattern correlation between the predicted Z500 anomalies and EASRA-regressed Z500 anomalies in the highest AS cases exceeds 0.4 is 0.65. In contrast, the correlation skill of the EASRA index at the 3-pentad lead is only 0.38 when the predicted Z500 anomalies are not similar to the EASRA-related Z500 in the highest AS cases (i.e., |pattern correlation|< 0.4). As its 95% confidence range is 0.19 to 0.57, the forecast skill of 0.65 for the group with a predicted Z500, which is similar to the EASRA-related Z500 in the highest AS case, is significantly higher than that of the other group. This type of approach can increase the utilization of dynamic forecast results without further post-processing.

Fig. 10
figure 10

Correlation skill of EASRA index at 3-pentad lead in groups categorized by amplitude of pattern correlation coefficient between individual forecast and spatial distribution of EASRA-related Z500 in highest AS cases (i.e., Fig. 7d at 40° W–170° E, 10° S–90° N). Bars represent groups with a degree of similarity > 0.4 (red bar, 57 cases) and < 0.4 (gray bar). Note that the gray bar represents the average skill for random samples of 57 cases without replacement from all forecast cases except for cases corresponding to a degree of similarity > 0.4, which were obtained from 1000 samplings. The error bar on the gray bar represents the range of the top 2.5% and to the bottom 2.5% of correlation skill in random samples

6 Summary and discussion

In this study, we formulated the AS skill to measure the forecast skill of the individual case. As the AS skill can be defined for the individual case, one can select the cases which exhibit the highest correlation skill in a linearized form. For the cases with the highest AS, the correlation skill is superior to any randomly chosen group. Therefore, with AS skill, one can group the well-predicted (or poorly-predicted) forecasts to examine their characteristics. Furthermore, once the group with highest AS cases is defined, one might expect that the corresponding group exhibits the highest correlation skill, as the correlation skill and the case-averaged AS skill in any group are almost equivalent by definition of the AS skill.

In the highest AS cases, the Rossby wave trains emanating from the North Atlantic to East Asia are prominent to enhance the precipitation amount over the Korean peninsula (Ham et al. 2018; Hu et al. 2020). Those EASRA-related atmospheric teleconnection patterns in the highest AS cases cannot be solely explained by any of the previously known atmospheric variability of SNAO, CGT, PJ, MJO, or BSISO (Madden and Julian 1971; Nitta 1987; Ding and Wang 2007; Bollasina and Messori 2018; Kikuchi 2020; Liu et al. 2020). Additionally, those EASRA-related large-scale teleconnection features in the highest AS cases are not shown in the lowest AS case or all cases. For example, in the lowest AS cases, the EASRA-related wave trains resemble the Arctic Oscillation (He et al. 2017), and wave-trains propagating from the North Atlantic are not shown. Those results support our notion that the dominant climate variability in the most skillful forecast is clearly different from the previously known climate variability.

This study utilized a methodology to evaluate the forecast skill of the individual cases based on the association between the observed and the predicted values. AS skill is a successful approximation of correlation skills because the AS skill is a linearized form of correlation skill. The most distinct feature of the AS skill from the squared error-based metrics is that the AS skill is less affected by the amplitude of the observed and forecasted values as the correlation skill is.

As the AS skill is a successful approximation of the correlation skill, the physical meaning of the AS skill with a certain amplitude is similar to that of the correlation skill. For example, once one defines a criterion for the skillful forecast as the correlation skill of 0.5, the corresponding AS skill also exhibits a similar amplitude. To confirm this point, we arranged the forecast cases of the EASRA index in descending orders of the AS skill, then correlation skill with 11-cases-moving window is calculated. The correlation skill for each subset with 11 cases and the

AS skill of an individual forecast case whose AS skill is at 6th largest is compared. The AS skills generally exhibits similar values with the correlation skill, even though AS skill tends to be slightly higher than the corresponding correlation skill. The value of 0.5, 0.6 for the correlation skill corresponds to approximately 0.59, 0.65 for the AS skill, respectively.

Even though various advantages of the AS skill, it should be noted that it requires an assumption to expand the Taylor series in Eq. (9) that delta terms should be sufficiently smaller than the total mean. In other words, AS skill requires the similarity of moment between the subset and total: the mean and variance of the subset should be similar to that of the total set, which is 0 and 1 in this study, respectively.

The AS skill would be utilized to detect a linkage of the forecast skill between different regions. For example, the spatial distribution of the difference in the AS skill of the 10-days averaged precipitation between highest and lowest AS cases shows an apparent positive AS skill not only over Korea, but also over the eastern Eurasia and Barents-Kara sea (Figure not shown). This implies that AS skill can be a useful tool to examine the covariations in the forecast skill between East Asia and remote regions as large-scale climate variability determines the forecast skill of precipitation over East Asia.