1 Introduction

Seasonal Climate Forecasts (SCFs) provide a medium- to long-range outlook of changes in the Earth system over periods of a few weeks to several months, exploiting predictable changes in some of the slowly varying components of the system, such as ocean temperatures (Johnson et al. 2019). Skilful seasonal precipitation forecasts provide invaluable support to a wide range of sectors, including agriculture, construction, mining, hydrology, and water resources management (Falamarzi et al. 2023; Jin et al. 2022; Merryfield et al. 2020; the Centre for International Economics 2014; Tian et al. 2021). For example, site-specific information on monthly rainfall for the growing season, typically up to three months ahead, can help farmers make informed decisions about which crop types or varieties to plant (Weisheimer and Palmer 2014). For Australia alone, the Centre for International Economics (2014) estimated that the potential annual value added from SCFs is around A$1.6 billion (or 7.31%) for the agricultural sector. Data-driven forecast models have limited forecast skill for precipitation beyond one month (Vivas et al. 2023), so this study focuses on SCFs from process-based Global Circulation Models (GCMs). A single deterministic SCF cannot reflect the inherently chaotic nature of the climate system; climate service centres around the world therefore routinely provide ensemble SCFs with multiple forecast trajectories, generated from GCMs with perturbed initial conditions and/or model specifications, or from multiple GCMs (Hudson et al. 2017; Johnson et al. 2019; Kirtman et al. 2014; MacLachlan et al. 2015; Merryfield et al. 2020). Ensemble forecasts, which are also used in many other earth sciences such as hydrology (Troin et al. 2021; Tyralis and Papacharalampous 2021; Yilmaz et al. 2023), serve to enhance forecast skill.

Due to imperfect model structure and parameterisation, such as atmosphere–land and atmosphere–ocean coupling processes, GCMs usually do not simulate site-specific conditions well, and the predictability from atmospheric initial conditions is lost after a few weeks (Merryfield et al. 2020). Cost-effective statistical post-processing has therefore become standard practice to remove bias and improve forecast skill, providing fit-for-purpose information for localised decision-making (Monhart et al. 2018). Various post-processing models have been developed for long-term climate projections, SCFs, and in particular Numerical Weather Prediction (NWP) (Maraun and Widmann 2018; Vannitsem et al. 2021).

In contrast to continuous variables like temperature, post-processing precipitation poses greater challenges due to its sporadic nature, characterised by numerous zero values and occasional extreme values. There are two main categories of models for post-processing precipitation forecasts. The first category is nonparametric and does not assume a parametric distribution for precipitation. For example, a relatively simple but popular distribution-free method is Quantile Mapping (QM), which calibrates a forecast by matching the empirical (or parametric) distributions of forecasts and observations in a reference period (Cannon 2018; Michelangeli et al. 2009; Piani et al. 2010). To exploit the relationship between synoptic meteorology and local weather, analogue methods sample daily precipitation from historical days with similar atmospheric patterns (Shao and Li 2013; Vannitsem et al. 2021). Quantile-based methods focus on providing discretised quantile estimates instead of a whole distribution (Kokic et al. 2013). In a recent model, Extended Copula Post-Processing (ECPP), the daily precipitation total is treated as a left-censored variable, and its dependence structure with GCM precipitation and other possible atmospheric predictors is modelled through copulas (Li and Jin 2020). ECPP forecasts are used for predicting early-season yields in Australia (Jin et al. 2022). Nonparametric methods perform well when plenty of training data are available. Two representative nonparametric methods, QM and ECPP, are included in the performance comparison later in this paper: QM has been used for operational downscaling in Australia (Griffiths et al. 2023), and ECPP has demonstrated superior performance in site-specific post-processing in recent studies (Jin et al. 2023b; Li and Jin 2020; Li et al. 2020). The second category is parametric and assumes a parametric distribution for precipitation. Various distributions have been applied successfully, especially for NWP, including the censored Gaussian distribution (Schepen et al. 2018; Schlosser et al. 2019), a mixture of a point mass at zero and Gaussian distributions (Yumnam et al. 2022), a hurdle distribution with a probability mass at zero and a gamma distribution for positive precipitation amounts, as well as its mixtures (Fraley et al. 2010; Sloughter et al. 2007), the censored and shifted gamma (CSG) distribution (Baran and Nemoda 2016; Scheuerer and Hamill 2015), the censored logistic distribution (Wilks 2009), and censored generalised extreme value distributions (Scheuerer 2014). The parameters of these distributions, such as location, scale and/or shape, are linked with predictors from GCMs through various functional dependencies (Schepen et al. 2018; Schlosser et al. 2019; Vannitsem et al. 2021). Parametric approaches are stable with small samples, such as the limited retrospective forecast years available for SCFs, but if the assumed distribution is inappropriate, this category of post-processing models may degrade forecast skill. In this study, we use a mixture of parametric distributions, following Fraley et al. (2010) and Sloughter et al. (2007), to be more flexible for monthly precipitation forecasts at the seasonal scale.

Artificial intelligence, especially deep learning, has recently shown promise in downscaling long-range climate forecasts (Jin et al. 2023b; Vitart et al. 2022). These models often struggle to surpass the SCF benchmark, climatology (Vitart et al. 2022), or to outperform site-specific downscaling methods like ECPP for short forecast lead times (Jin et al. 2023b). Furthermore, they typically demand significant training resources, including data and computational time, even for a small number of locations (Jin et al. 2023a). This study does not aim to compare our model with them directly, as such a comparison lies outside the scope of our work. We also restrict the predictors in this study to precipitation forecasts from a single GCM, as most operational SCF systems are based on one GCM (e.g., Hudson et al. 2017; Johnson et al. 2019; MacLachlan et al. 2015), with a few exceptions (Kirtman et al. 2014). Additionally, other GCM predictors, such as sea surface temperature, may contribute little to precipitation forecasts in the context of SCFs (e.g., Li et al. 2020).

When post-processing an ensemble of precipitation forecasts, ensemble means or medians are commonly used (e.g., Li et al. 2020; Schepen et al. 2018; Scheuerer et al. 2020; Wang et al. 2019; Wilks 2009), as they extract reliable information from all the members. Sometimes the number of zero-precipitation members or the ensemble dispersion can improve prediction skill (Scheuerer 2014; Scheuerer et al. 2020). When the ensemble members from a GCM are distinguishable, such as when they are initialised from different sources in NWP (Baran and Nemoda 2016), all the ensemble members can be used in post-processing, as in Ensemble Model Output Statistics (EMOS) (Baran and Nemoda 2016). In addition, if only some ensemble members are distinguishable, the non-distinguishable members can be treated equally, as in Bayesian Model Averaging (BMA) for NWP, where they share the same parametric conditional distribution (Fraley et al. 2010). Most operational SCF systems, including the three tested in this study, generate ensemble members through perturbations of the initial conditions of a single GCM (Hudson et al. 2017; MacLachlan et al. 2015). These members are not climatologically distinguishable. Thus, most precipitation post-processing techniques for SCFs with a single GCM rely solely on ensemble medians (or means) (Li and Jin 2020; Li et al. 2020; Schepen et al. 2018), excluding valuable information from individual ensemble members that could help address precipitation forecast challenges.

In this study, our objective is to harness ensemble precipitation forecasts more effectively to enhance precipitation forecast skill. Instead of relying solely on ensemble medians, we use the entire ensemble to create a forecast distribution and generate quantiles from this distribution to serve as pseudo-ensemble members. Across forecasts made at different initialisation times for a location, these quantile-based pseudo-ensemble members are more distinguishable, each representing a unique perspective within the ensembles. Like ensemble medians for typical forecasts, quantiles such as 0.025 (or 0.975) offer reasonable estimates of the low (or high) end of the ensemble forecasts. We observe reasonably linear relationships between these quantiles and precipitation observations. Consequently, parametric distributions can be learned from these pseudo-ensemble members as well. We use a hurdle distribution with a point mass at zero for dry months and a gamma distribution for power-transformed positive precipitation amounts (Sloughter et al. 2007). These conditional distributions then compete, forming a composite predictive Probability Density Function (PDF) through BMA. Their weights in the mixture distribution are determined by posterior probabilities, which are proportional to the pseudo members' historical forecast performance in the training period. This mixture distribution is subsequently employed for forecasting based on new pseudo-ensemble members; we refer to this model as Quantile Ensemble BMA (QEBMA).

To ensure a fair comparison among post-processing methods and avoid the need to specify quantile estimation methods and the number of quantiles, this study demonstrates the performance of QEBMA using the same ensemble size as the raw forecasts; the pseudo-ensemble members are then obtained simply by sorting the raw ensemble members. As described in Sect. 2, QEBMA is verified and compared with five post-processing models on the retrospective forecast data sets of three GCMs at 32 locations from two case study regions in northeastern Australia. The locations span four different climate zones, and forecast lead times of 0 to 2 months are considered. The selected GCMs are GloSea5 from the United Kingdom, ECMWF from Europe and ACCESS-S1 from Australia, which differ in the number of ensemble members as well as spatial resolution. After the model development in Sect. 3, leave-one-month-out cross-validation results are given in Sect. 4. These results demonstrate that QEBMA can improve forecast skill in terms of relative bias, mean absolute error, reliability and overall ensemble forecast skill, measured by the Continuous Ranked Probability Score (CRPS), in comparison with raw forecasts and several existing post-processing techniques, including QM and ECPP using ensemble medians or pseudo-ensemble members. The improvements are often statistically significant for the three GCMs. Among these post-processing models, only QEBMA consistently outperforms the seasonal precipitation forecast benchmark, climatology. These comparison results demonstrate the potential of QEBMA, especially after the further development discussed in Sect. 5.

2 Case study regions and data

2.1 Case study regions

We use two regions, mainly in Queensland, Australia, for this study. As shown in Fig. 1, each region has 16 grid locations with integer values of latitude and longitude and covers about 160,000 square kilometres. The bottom-right region covers South-Eastern Queensland (see the top-right inset in Fig. 1) and about one-fourth of South Queensland; it includes both temperate and subtropical climate zone areas. The top-left region, central-west Queensland, mostly overlaps with Outback Queensland and includes both grassland and desert climate zone areas. We selected these two regions due to their economic significance in agriculture and mining, along with the presence of quality weather station observations necessary for accurate performance validation. Additionally, these locations encompass four distinct climate zones, with a broad spectrum of dry month proportions ranging from 1.3% to 48.7%, as depicted in Fig. 1.

Fig. 1

The 32 grid points, indicated by "+", for the two case study regions, mainly in Queensland, Australia. A grid point is referred to by its coordinates, e.g., Lat-24Lon145 for the location at latitude −24 and longitude 145 degrees, in later discussion and visualisation. The percentage of dry months for each location is shown above each "+". Dry months are defined as months with an average rainfall of less than 0.1 mm/day

We use SILO gridded data to validate the performance of our precipitation forecasts. The SILO data are provided on a regular grid covering a geographical area and are available free of charge from the Queensland Government, licensed under Creative Commons Attribution 4.0, at https://legacy.longpaddock.qld.gov.au/silo. SILO gridded daily data are constructed from in situ rainfall station records and are infilled to create a temporally complete record for all grid points at 0.05° resolution (Jeffrey et al. 2001). The monthly rainfall observations are aggregated from these daily data for each grid location. As mentioned above, the two study regions were chosen because they have quality rainfall stations, so the monthly gridded data are also of reasonable quality.

2.2 Monthly retrospective forecast data

We use retrospective forecast data of seasonal precipitation from three GCMs: GloSea5, ECMWF, and a calibrated version of ACCESS-S1 (ACCESSc for short hereafter). Each is based on a single GCM. The metadata of these three GCMs are listed in Table 1 and briefly described below.

Table 1 Three GCMs with different features used in the test and comparison

2.2.1 GloSea5 from UK Met Office

The UK retrospective forecast data are derived from the Met Office's Global Seasonal Forecast system version 5 (GloSea5) (MacLachlan et al. 2015). GloSea5 is an ensemble forecast system centred on the high-resolution UK Met Office climate prediction model, the HadGEM3 family atmosphere–ocean coupled climate model. Its notable improvements over version 4 include enhanced year-to-year prediction accuracy for major climate variability patterns. These enhancements are attributed to increased horizontal resolutions in both the atmosphere (N216, approximately 0.7°) and the ocean (0.25°), as well as the implementation of a 3D-Var assimilation system for ocean and sea-ice conditions. GloSea5 became operational in July 2013. GloSea5's monthly retrospective forecasts at 1-degree atmospheric resolution were downloaded from the Copernicus Climate Change Service (C3S) Climate Data Store (CDS), which provides public access to both re-forecast and forecast data.

The retrospective forecast period is from Feb 1993 to Dec 2016, as listed in Table 1. The GloSea5 data consist of a seven-member ensemble for each of four start dates every month (the 1st, 9th, 17th, and 25th), forming a 28-member ensemble for monthly forecasts. This ensemble approach can enhance the skill of GloSea5's retrospective raw forecasts and is still suitable for demonstrating the performance improvements achieved through post-processing methods.

2.2.2 ECMWF from Europe

The European Centre for Medium-Range Weather Forecasts (ECMWF) data are sourced from SEAS5, its fifth-generation seasonal forecast system, which became operational in Nov 2017 (Johnson et al. 2019). This state-of-the-art SCF system features several notable upgrades over its predecessor, SEAS4, including an improved ocean model (NEMO v3.4.1), a higher-resolution atmosphere model (cycle 43r1), and a new interactive sea-ice model (Johnson et al. 2019). The retrospective forecasts from ECMWF commence on the first day of each month, covering the years 1981 to 2015 and comprising 25 ensemble members (in contrast to the 51 members used in operational forecasts). Retrospective monthly precipitation forecasts were also downloaded from the C3S Climate Data Store (CDS) at a spatial resolution of 1 degree. ECMWF forecast products are typically adjusted to account for mean biases within the forecast system (MacLachlan et al. 2015).

The minimum value of the precipitation forecasts from ECMWF is −0.00253 mm/day. The ecCodes GRIB packing discretisation procedure introduces packing errors that increase with the range of values, and these packing errors may lead to negative values. To overcome this, the strategy used by ECMWF is to set all values in an accumulated field computed by subtraction that are less than a positive threshold to zero. A threshold such as 0.04 is recommended, as it allows for the multiplication of forecast values by up to ~25 in rare long-dry-spell cases. In our study, to reduce the possible influence of this strategy, we set the threshold to 0.003 mm/day.
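As a minimal illustration of this cleaning step, a Python sketch is given below (the study's implementation is in R; the function and array names are illustrative, and only the 0.003 mm/day threshold comes from the text):

```python
import numpy as np

def clean_ecmwf_precip(precip_mm_day, threshold=0.003):
    """Set values below the positive threshold (including the small negative
    values caused by GRIB packing errors) to exactly zero."""
    precip = np.asarray(precip_mm_day, dtype=float)
    return np.where(precip < threshold, 0.0, precip)

print(clean_ecmwf_precip([-0.00253, 0.001, 0.8]))  # -> [0.  0.  0.8]
```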

2.2.3 ACCESS-S1 with calibration

Monthly precipitation retrospective forecasts are obtained from the seasonal prediction version of the Australian Community Climate and Earth-System Simulator (ACCESS), version 1 (ACCESS-S1) (Hudson et al. 2017). ACCESS-S1 is a coupled general circulation model developed and tested by the Australian Bureau of Meteorology (BoM) in close collaboration with the UK Met Office, based on GloSea5. ACCESS-S1 produces daily forecasts of various atmospheric quantities, including precipitation, on grid points at a resolution of 0.6 degrees. To increase the variety of the testing environment, we use a calibrated version of ACCESS-S1 (termed ACCESSc hereafter) in this study. The calibration is carried out at the daily level with QM against the gridded Australian Water Availability Project (AWAP) climate datasets. The gridded AWAP data have a spatial resolution of 0.05 degrees and are obtained by interpolating rainfall station observations (Jones et al. 2009). The calibrated daily data are then averaged to the monthly level (Griffiths et al. 2023). The ACCESSc data have 11 ensemble members with lead times of 0–6 months, and the retrospective forecasts are available from 1990 to 2012. There are 48 initialisation dates for each year, of which we only use 12, i.e., the 1st day of each calendar month.

The retrospective forecast data windows in Table 1 provide representative re-forecast data for demonstrating the performance of the proposed model in this paper. The latest UK Met Office seasonal climate model, GloSea6, has the same re-forecast window, 1993 to 2016, as its predecessor GloSea5. While some seasonal climate forecast models, such as ACCESS-S2 (Griffiths et al. 2023), have re-forecast data for a few more recent years, up to 2018, those data are not yet publicly accessible. Similarly, the re-forecast data for ECMWF SEAS6, the successor of the current operational SEAS5, are anticipated to become available only in late 2024, according to its development plan.

3 Method

After a brief description of Bayesian Model Averaging (BMA), we develop a new post-processing model, Quantile Ensemble Bayesian Model Averaging (QEBMA), in this section.

3.1 Bayesian model averaging (BMA)

BMA was originally introduced as a mechanism to integrate predictions from multiple models while accounting for model uncertainty, resulting in posterior distributions for both model parameters and the models themselves (Fragoso et al. 2018). To produce weather predictions with less bias and higher skill, BMA has been extended to ensemble forecasts from multiple GCMs (Fraley et al. 2010; Sloughter et al. 2007).

In BMA for ensemble forecasting, each ensemble forecast member \(f_{k}\) \((k = 1, \cdots ,K)\), often from a GCM, is associated with a conditional PDF \(h_{k} \left( {y \mid f_{k} ,\theta_{k} } \right)\). It describes the distribution of precipitation \(y\) conditional on \(f_{k}\) being the best forecast among the ensemble members, with parameters \(\theta_{k}\). If there are multiple GCMs and each GCM has a set of forecasts, the ensemble median or mean from each GCM is normally used as an ensemble member \(f_{k}\). The BMA predictive PDF for the \(K\) ensemble members is expressed as

$$p\left( {y\;\left| {f_{1} ,\; \ldots ,\;f_{K} } \right.} \right) = \sum\limits_{k = 1}^{K} \; w_{k} \;h_{k} \left( {y\;\;\left| f \right._{k} ,\theta_{k} } \right)$$
(1)

where \(w_{k}\) is the posterior probability of forecast \(k\) being the most appropriate, based on forecast \(k\)'s relative forecast performance over a training data set. As probabilities, the \(w_{k}\)'s are nonnegative and sum to 1, i.e., \(\sum\limits_{k = 1}^{K} \; w_{k} = 1\). Before detailing a conditional distribution for monthly precipitation in Sect. 3.3, we first introduce QEBMA.

3.2 QEBMA based on pseudo ensemble forecast members

As discussed above, BMA assumes that the ensemble members are distinguishable from each other. Most climate centres in the world, including the three listed in Table 1, maintain and run only one GCM for seasonal forecasts. Their seasonal ensemble forecasts are generated mainly from perturbations to the initialisation and/or model parameterisation (Johnson et al. 2019) and cannot be regarded as distinguishable (as also illustrated by our results in Sect. 4).

Motivated by the use of ensemble medians, we extend to other quantiles derived from all forecast ensemble members to form a new pseudo-ensemble forecast. These pseudo-ensemble members are more distinguishable because they maintain a partial order across forecasts made on different initialisation dates (Johnson et al. 2019). For example, the pseudo member corresponding to the 0.90 quantile is never greater than that corresponding to the 0.95 quantile in any ensemble forecast. These quantiles can also reflect ensemble dispersion, which is often informative. In addition, the pseudo members have different relationships with observations. Relating such quantile members to positive precipitation observations, we can expect observations to vary more per unit change in a quantile corresponding to a smaller probability than in one corresponding to a larger probability (see, e.g., Fig. 2). In other words, a quantile member corresponding to a smaller probability would have a larger slope if observations were regressed against it. The dry months, indicated by red diamonds in Fig. 2, have the narrowest value range for the 0.034 quantile (the first pseudo-ensemble member for GloSea5): its median and maximum values are \(8.43 \times 10^{ - 3}\) and \(9.22 \times 10^{ - 2}\), respectively. The corresponding median (or maximum) values for the 0.517 and 0.966 quantiles are 0.169 (or 1.15) and 1.95 (or 7.01), respectively.

Fig. 2

Scatter plots of observations against four different pseudo-ensemble members, corresponding to four quantiles from the ensemble forecast distributions of GloSea5 for location Lat-24Lon145, near Blackall, Queensland. Black dots and red diamonds denote positive and zero observations, respectively. For easy comparison, the value ranges of the x-axis and y-axis are fixed across all four subfigures. Linear fits are shown for points with observations ≥ 0.1 mm/day. For ensemble size K = N = 28, these four quantiles correspond to pseudo-ensemble members 1, 15, 21, and 28, respectively, when sorted in ascending order

From the \(K\) members of an ensemble forecast from a GCM, we can generate a chosen number \(N\) of pseudo members for a given list of probabilities. \(N\) may or may not equal \(K\); the latter is useful when the operational forecasts have a different ensemble size from the re-forecasts, as for ECMWF (Johnson et al. 2019). We use \(N\) equally spaced probabilities to form \(N\) quantiles, denoted \(q_{1} ,q_{2} , \cdots ,q_{N}\). Taking these pseudo forecasts for BMA, we still assume that each pseudo forecast member \(q_{i}\) \((i = 1, \cdots ,N)\) from a seasonal climate model is associated with a conditional PDF \(h_{i} \left( {y \mid q_{i} ,\theta_{i} } \right)\), which models precipitation \(y\) conditional on \(q_{i}\) being the best forecast among the pseudo-ensemble members, with parameters \(\theta_{i}\). For simplicity and easy comparison, we use the same form of conditional PDF \(h_{i} \left( \cdot \right)\) for both BMA and QEBMA. The predictive PDF of QEBMA for the \(N\) pseudo members is expressed as

$$p\left( {y\;\left| {q_{1} ,\; \ldots ,\;q_{N} } \right.} \right) = \sum\limits_{i = 1}^{N} \; w_{i} h_{i} \left( {y\;\;\left| {q_{i} } \right.,\theta_{i} } \right)$$
(2)

where the posterior probabilities \(w_{i} \triangleq p\left( {h_{i} \left| {q_{1} ,\; \ldots ,\;q_{N} } \right.} \right)\;\) are nonnegative and add up to 1, i.e., \(\sum\limits_{i = 1}^{N} \; w_{i} = 1\). The contribution from each pseudo member \(q_{i}\), \(w_{i} h_{i} \left( {y\;\;\left| {q_{i} } \right.,\theta_{i} } \right)\), forms a component prediction. These symbols are listed and explained briefly in Table S1.
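To make the construction of the pseudo-ensemble members concrete, the following Python sketch forms the \(N\) quantiles from the raw members (the study's implementation is in R; the probability choice \(p_i = i/(N+1)\) is an assumption consistent with the 0.034, 0.517 and 0.966 quantiles quoted for \(K = N = 28\) in Fig. 2):

```python
import numpy as np

def pseudo_members(raw_ensemble, n_pseudo=None):
    """Turn raw ensemble forecasts (T months x K members) into quantile-based
    pseudo-ensemble members (T months x N members)."""
    raw_ensemble = np.asarray(raw_ensemble, dtype=float)
    T, K = raw_ensemble.shape
    N = K if n_pseudo is None else n_pseudo
    if N == K:
        # Same ensemble size: pseudo members are simply the sorted raw members.
        return np.sort(raw_ensemble, axis=1)
    probs = np.arange(1, N + 1) / (N + 1)          # N equally spaced probabilities
    return np.quantile(raw_ensemble, probs, axis=1).T

# Example: a 28-member GloSea5-like ensemble for 3 forecast months.
rng = np.random.default_rng(0)
q = pseudo_members(rng.gamma(2.0, 1.5, size=(3, 28)))
print(q.shape)                                     # (3, 28), with q_1 <= ... <= q_28
```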

3.3 Component probability distribution function for monthly precipitation

To eliminate the influence of the varying number of days in each month, we model the monthly average precipitation (in mm/day), i.e., the monthly total divided by the number of days in the month. For brevity, we still refer to it as monthly precipitation below.

For our SCF post-processing application, the dry-month proportion (months with average precipitation < 0.1 mm/day) ranges from almost 0 to as high as 48% (see Fig. 1), and can be even higher in some seasons or other areas of Australia. Left-censored PDF candidates such as the censored Gaussian and censored gamma would be less flexible over such a large range of dry-month proportions. As illustrated in Fig. 2, zero precipitation observations correspond to quite different values of the pseudo members, so left-censored PDFs might have difficulties in accumulating useful information from these quantile ensemble members for forecasting zero precipitation. We therefore model zero precipitation separately from positive precipitation amounts, as Sloughter et al. (2007) did for NWP, using the pseudo-ensemble members. Besides the issues related to dry months, the histograms of ensemble forecast medians for wet months (average precipitation ≥ 0.1 mm/day) are quite skewed (see examples in Figure S1). Thus, a flexible and skewed distribution such as the gamma distribution (see examples in Fig. 3 and Figure S3) is used in our component PDF.

Fig. 3

Predictive probability density functions (PDFs) and confidence intervals (CIs) from QEBMA at grid point Lat-24Lon145 for Aug 2010, with a 0-month forecast lead time, for a GloSea5, b ECMWF, c ACCESSc, and d the legend for these subfigures. Dashed orange vertical lines mark the observed value of 0.92 mm/day for August 2010. Bold black curves depict the predictive PDFs generated by QEBMA. Solid grey vertical lines denote forecast medians, while dotted grey vertical lines represent 80% forecast CIs. Beneath each predictive PDF curve, dotted colour curves show the component predictions from the pseudo-ensemble members. Notably, several component predictions for each GCM have nearly zero weights, aligning closely with the x-axis

To address heteroscedasticity in the positive precipitation values, we fit the gamma distribution to power-transformed precipitation instead of fitting it directly to the observations (Hamill et al. 2004; Sloughter et al. 2007). In an initial investigation, in which we examined power values ranging from 0.2 to 1 at distinct locations corresponding to the four climatic zones, a power of \(\frac{1}{3}\) often resulted in a better gamma fit (see comparison examples in Figure S2 and Figure S3). In addition, the cube-root transformation makes precipitation observations more homoscedastic, as illustrated in Fig. 2. With this transformation, monthly precipitation observations show an approximately linear relationship with the pseudo-ensemble forecasts, and these linear relationships have different intercepts and slopes for different pseudo members (see examples in Fig. 2). Thus, we work with the cube root of the monthly precipitation observations, \(z = y^{\frac{1}{3}}\). We have the following conditional PDF of monthly precipitation for Eq. (2), given that the pseudo forecast member \(q_{i}\) is the best forecast:

$$h_{i} \left( {z{\mid }q_{i} ,\theta_{i} } \right) = P\left( {z = 0{\mid }q_{i} } \right)I[z = 0] + P\left( {z > 0{\mid }q_{i} } \right)g_{i} \left( {z{\mid }q_{i} } \right)I[z > 0]$$
(3)

where the indicator function \(I\left[ \cdot \right]\) equals 1 if the condition in brackets is true and 0 otherwise. Equation (3) is a hurdle distribution with a point mass at zero for months without precipitation and a gamma distribution for positive amounts, similar to the daily precipitation model in Sloughter et al. (2007). Its two parts, the dry-month probability \(P\left( {z = 0{\mid }q_{i} } \right)\) and the distribution of positive precipitation \(g_{i} \left( {z{\mid }q_{i} } \right)\), are given below. Similar to Hamill et al. (2004) and Sloughter et al. (2007) for daily precipitation, the conditional dry-month probability for the binary event \(z = 0\) given \(q_{i}\) is modelled by logistic regression,

$${\text{logit}}\left( {P\left( {z = 0{\mid }q_{i} } \right)} \right) = \log \frac{{P\left( {z = 0{\mid }q_{i} } \right)}}{{P\left( {z > 0{\mid }q_{i} } \right)}} = a_{0i} + a_{1i} q_{i}^{\frac{1}{3}} + a_{2i} I\left[ {q_{i} = 0} \right]$$
(4)

The second predictor, also derived from the pseudo forecast member \(q_{i}\), indicates whether \(q_{i}\) is zero, which helps smooth the regression as \(q_{i}\) moves from non-zero to zero. Because a large pseudo-ensemble member is expected to correspond to a lower probability of zero precipitation, the parameter \(a_{1i}\) is expected to be non-positive. When dry months in the training data are either very rare or dominant, e.g., their proportion \(p_{0} < 0.025\) (or \(> 0.975\)), we impose \(P\left( {z = 0{\mid }q_{i} } \right) \equiv p_{0}\) so that the forecast dry-month probability matches that of the training period. This approach is akin to the QM method and mitigates the logistic regression fitting issues arising from unbalanced training data.
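A minimal Python sketch of this dry-month model for one pseudo member follows (the study's implementation is in R; statsmodels is used here purely for illustration, and the 0.025/0.975 override follows the rule above):

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import logit, expit

def fit_dry_month_model(z, q_i, p_lo=0.025, p_hi=0.975):
    """Fit Eq. (4) for one pseudo member: logistic regression of the dry-month
    indicator I[z = 0] on q_i**(1/3) and I[q_i = 0], with the constant-p0
    override when dry months are very rare or dominant in training."""
    z, q_i = np.asarray(z, float), np.asarray(q_i, float)
    dry = (z == 0).astype(float)
    p0 = dry.mean()
    if p0 < p_lo or p0 > p_hi:                  # forecast the training frequency
        return dict(a0=logit(np.clip(p0, 1e-6, 1 - 1e-6)), a1=0.0, a2=0.0)
    has_zero = bool(np.any(q_i == 0))
    cols = [q_i ** (1 / 3)] + ([(q_i == 0).astype(float)] if has_zero else [])
    fit = sm.Logit(dry, sm.add_constant(np.column_stack(cols))).fit(disp=0)
    a = fit.params
    return dict(a0=a[0], a1=a[1], a2=a[2] if has_zero else 0.0)

def dry_month_prob(m, q_new):
    """P(z = 0 | q_i) of Eq. (4) for new pseudo-member values."""
    q_new = np.asarray(q_new, float)
    return expit(m["a0"] + m["a1"] * q_new ** (1 / 3) + m["a2"] * (q_new == 0))
```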

The positive monthly precipitation amount is modelled by a gamma distribution with shape parameter \(\alpha_{i}\) and scale parameter \(\beta_{i}\),

$$g_{i} \left( {z{\mid }q_{i} } \right) = \frac{1}{{\beta_{i}^{{\alpha_{i} }} \Gamma \left( {\alpha_{i} } \right)}}z^{{\alpha_{i} - 1}} \exp \left( { - \frac{z}{{\beta_{i} }}} \right).$$
(5)

Since approximately linear relationships are observed between the cube-root observations and the cube-root pseudo forecast members (e.g., Fig. 2), we link the distribution's mean \(\mu_{i} = \alpha_{i} \beta_{i}\) and variance \(\sigma_{i}^{2} = \alpha_{i} \beta_{i}^{2}\), rather than the shape and scale parameters directly, with \(q_{i}^{1/3}\) and \(q_{i}\), respectively. We assume an approximately linear relationship between the expectation of the cube root of monthly precipitation and the cube root of the forecast \(q_{i}\),

$$\mu_{i} = b_{0i} + b_{1i} q_{i}^{1/3}$$
(6)

and between the variance of the precipitation distribution and the forecast \(q_{i}\),

$$\sigma_{i}^{2} = c_{0} + c_{1} q_{i}$$
(7)

We allow the intercepts \(b_{0i}\) and slopes \(b_{1i}\) in Eq. (6) to vary among the pseudo-ensemble members \(q_{i}\) because smaller ensemble members typically have smaller intercepts and larger slopes (e.g., see Fig. 2). The regression coefficients in Eq. (7) are kept the same across all pseudo-ensemble members, assuming similar variance relationships among them, as is often done in the literature (e.g., Chakraborty et al. 2015; Sloughter et al. 2007). Keeping these coefficients constant reduces the number of model parameters and prevents overfitting, which matters given the typically small monthly-scale training datasets resulting from the relatively short retrospective forecast periods of seasonal forecast systems.
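For illustration, a short Python sketch of the gamma component defined by Eqs. (5)–(7) is given below (the study's implementation is in R; the wet-month-only regression for \(b_{0i}\) and \(b_{1i}\) follows the fits shown in Fig. 2, and all coefficient names are placeholders for fitted values):

```python
import numpy as np
from scipy import stats

def fit_mean_link(z, q_i):
    """Eq. (6): regress cube-root observations on cube-root forecasts,
    using wet months only (z is already the cube root of precipitation)."""
    wet = z > 0
    b1i, b0i = np.polyfit(q_i[wet] ** (1 / 3), z[wet], 1)
    return b0i, b1i

def gamma_component_pdf(z_pos, q_i, b0i, b1i, c0, c1):
    """Eq. (5) evaluated at positive cube-root precipitation z_pos, with the
    mean and variance linked to q_i via Eqs. (6)-(7)."""
    q_i = np.asarray(q_i, float)
    mu = np.maximum(b0i + b1i * q_i ** (1 / 3), 1e-8)   # mu_i = alpha_i * beta_i
    var = np.maximum(c0 + c1 * q_i, 1e-8)               # sigma_i^2 = alpha_i * beta_i^2
    return stats.gamma.pdf(z_pos, a=mu ** 2 / var, scale=var / mu)
```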

3.4 Parameter estimation and forecasts

We use training data \(\left\{ {z_{t} ,q_{1t} , \cdots ,q_{Nt} } \right\}_{t \in T}\), where \(T\) is the retrospective forecast period of a GCM excluding the month used for model testing, to determine the parameters \(\left\{ {a_{0i} ,a_{1i} ,a_{2i} ,b_{0i} ,b_{1i} ,c_{0} ,c_{1} ,w_{i} } \right\}_{i = 1, \cdots ,N}\) in the mixture model

$$p\left( {z\;\left| {q_{1} ,\; \ldots ,\;q_{N} } \right.} \right) = \sum\limits_{i = 1}^{N} \; w_{i} \;h_{i} \left( {z\;\;\left| {q_{i} } \right.,\theta_{i} } \right)$$
(8)

via maximum likelihood estimation. It involves three procedures. First, the coefficients for the dry-month probability, \(a_{0i}\), \(a_{1i}\) and \(a_{2i}\), are determined from the pseudo forecast member \(i\) and the training observations. When the dry-month frequency \(p_{0}\) in the training data is outside the interval [0.025, 0.975], to avoid unbalanced training data, we set \(a_{0i} = {\text{logit}}\left( {p_{0} } \right)\) and \(a_{1i} = a_{2i} = 0\) so that \(P\left( {z = 0{\mid }q_{i} } \right) = p_{0}\), i.e., the forecast dry-month probability is identical to that in the training period. Otherwise, similar to BMA for daily precipitation prediction (Sloughter et al. 2007), these coefficients are determined separately for each pseudo-ensemble member via logistic regression with precipitation/no precipitation as the dependent variable and the cube root of the forecast, \(q_{i}^{1/3}\), together with \(I\left[ {q_{i} = 0} \right]\), as the two predictors in Eq. (4). Secondly, the parameters \(b_{0i}\) and \(b_{1i}\) are member-specific and are determined through a regression of the cube root of the observations against the cube root of the pseudo-ensemble forecasts. Thirdly, for the remaining parameters \(w_{i}\), \(c_{0}\) and \(c_{1}\), because the weights are constrained by \(\sum\limits_{i = 1}^{N} \; w_{i} = 1\) in the mixture model, we use an iterative Expectation–Maximisation (EM) procedure. The EM algorithm starts with initial guesses \(w_{i}^{(0)} = \frac{1}{N}\), \(c_{0}^{(0)} = {\text{var}} \left( {z_{t \in T} } \right)\), and \(c_{1}^{(0)} = 0\). In the expectation step of the \((j + 1)\)th iteration, we estimate the posterior probability that pseudo member \(i\) is the best forecast for month \(t\), given the parameter estimates of the \(j\)th iteration; we denote it by \(v_{it}^{(j + 1)}\). It is straightforward that \(\widehat{v}_{it}^{(j + 1)} = \frac{{w_{i}^{(j)} \;h_{i} \left( {z_{t} \left| {q_{it} } \right.,\theta_{i}^{(j)} } \right)}}{{\sum\limits_{n = 1}^{N} \; w_{n}^{(j)} \;h_{n} \left( {z_{t} \left| {q_{nt} } \right.,\theta_{n}^{(j)} } \right)}}\).

The maximisation step estimates \(w_{i}^{(j + 1)}\), \(c_{0}^{(j + 1)}\), and \(c_{1}^{(j + 1)}\) using the current estimates \(v_{it}^{(j + 1)}\), i.e., the alignment of \(q_{it}\) with pseudo member \(i\). The weight \(w_{i}^{(j + 1)}\) is the average across the training period, \(w_{i}^{(j + 1)} = \frac{{\sum\limits_{t \in T}^{{}} {\widehat{v}_{it}^{(j + 1)} } }}{\left| T \right|}\). There are no analytic solutions for the maximum likelihood estimates of \(c_{0}\) and \(c_{1}\), so they are estimated numerically by optimising the likelihood \(L\left( {w_{1} ,\; \ldots ,\;w_{N} ,c_{0} ,c_{1} } \right) = \prod\limits_{t \in T}^{{}} {p\left( {z_{t} \;\left| {q_{1t} ,\; \ldots ,\;q_{Nt} } \right.} \right)}\), using the current best estimates \(w_{i}^{(j + 1)}\), \(c_{0}^{(j)}\), and \(c_{1}^{(j)}\). The EM algorithm guarantees that the likelihood does not decrease after each iteration (Peel and McLachlan 2000) and consequently converges. In practice, the iteration terminates when the change in the likelihood or the parameters is smaller than a given tolerance, for example, a relative change smaller than \(10^{ - 5}\). It is worth noting that the EM algorithm is not guaranteed to converge to a global maximum, and our parameter estimation is sensitive to the starting values, which is a subject for future research.
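The EM procedure can be sketched in Python as follows (the study's implementation is in R; the component density assumes the dry-month probabilities and Eq. (6) coefficients were already estimated in the first two procedures, and details such as the Nelder–Mead optimiser are illustrative choices):

```python
import numpy as np
from scipy import stats, optimize

def component_density(z, q, p0, b0, b1, c0, c1):
    """h_i(z | q_i) of Eq. (3): point mass p0 at z = 0, gamma density for z > 0
    (z is cube-root precipitation; p0, b0, b1 are pre-fitted per member)."""
    mu = np.maximum(b0 + b1 * q ** (1 / 3), 1e-8)
    var = np.maximum(c0 + c1 * q, 1e-8)
    gam = stats.gamma.pdf(np.maximum(z, 1e-12), a=mu ** 2 / var, scale=var / mu)
    return np.where(z == 0, p0, (1.0 - p0) * gam)

def em_weights(z, Q, P0, B0, B1, tol=1e-5, max_iter=200):
    """EM for the weights w_i and shared variance coefficients c0, c1.
    z: (T,) cube-root observations; Q: (T, N) pseudo members;
    P0, B0, B1: per-member dry probabilities and Eq. (6) coefficients, shape (N,)."""
    T, N = Q.shape
    w = np.full(N, 1.0 / N)                       # w_i^(0) = 1/N
    c0, c1 = float(np.var(z)), 0.0                # c0^(0) = var(z), c1^(0) = 0
    prev_ll = -np.inf
    for _ in range(max_iter):
        dens = component_density(z[:, None], Q, P0, B0, B1, c0, c1)   # (T, N)
        mix = np.maximum(dens @ w, 1e-300)
        ll = np.log(mix).sum()
        if np.isfinite(prev_ll) and abs(ll - prev_ll) < tol * abs(prev_ll):
            break
        prev_ll = ll
        v = w * dens / mix[:, None]                                   # E-step
        w = v.mean(axis=0)                                            # M-step: weights
        def neg_ll(c):                                                # M-step: c0, c1
            d = component_density(z[:, None], Q, P0, B0, B1, c[0], c[1])
            return -np.log(np.maximum(d @ w, 1e-300)).sum()
        c0, c1 = optimize.minimize(neg_ll, x0=[c0, c1], method="Nelder-Mead").x
    return w, c0, c1
```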

After all the parameters are estimated, we have a predictive distribution, Eq. (8), for a test month \(m\) conditional on its pseudo forecast members. Three forecast distribution examples from QEBMA for the test month of Aug 2010 are illustrated in Fig. 3. We use these predictive PDFs to generate samples of \(z\), the cube root of monthly precipitation. From these samples, we can easily produce probability statements in terms of the original precipitation amount, such as medians and confidence intervals, as illustrated in Fig. 3.
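For illustration, the forecast summaries shown in Fig. 3 can be obtained by Monte Carlo sampling from Eq. (8); a Python sketch follows (the study's implementation is in R, and all parameter arrays below are placeholders for fitted values):

```python
import numpy as np
from scipy import stats

def sample_qebma(q_new, w, P0, B0, B1, c0, c1, n_samples=5000, seed=None):
    """Draw precipitation samples (mm/day) from the QEBMA predictive PDF of
    Eq. (8) for one forecast month with pseudo members q_new (length N)."""
    rng = np.random.default_rng(seed)
    P0, B0, B1 = (np.asarray(a, float) for a in (P0, B0, B1))
    comp = rng.choice(len(q_new), size=n_samples, p=w)   # pick components by weight w_i
    q = np.asarray(q_new, float)[comp]
    mu = np.maximum(B0[comp] + B1[comp] * q ** (1 / 3), 1e-8)
    var = np.maximum(c0 + c1 * q, 1e-8)
    z = stats.gamma.rvs(a=mu ** 2 / var, scale=var / mu, random_state=rng)
    z = np.where(rng.random(n_samples) < P0[comp], 0.0, z)   # hurdle: dry months
    return z ** 3                                            # back-transform the cube root

# Summaries as plotted in Fig. 3 (values depend on the fitted parameters):
# y = sample_qebma(q_new, w, P0, B0, B1, c0, c1)
# print(np.median(y), np.quantile(y, [0.1, 0.9]))            # median and 80% interval
```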

4 Verification and comparison results

4.1 Comparison models and implementation

We use leave-one-month-out cross-validation for performance evaluation and comparison. When training the models for a location and lead time, we exclude the test month and use the retrospective forecasts and observations from all other months. The procedure is repeated for every month.

The proposed post-processing model QEBMA is compared with several other models. The first comparison illustrates the importance of the quantile ensemble members by contrasting QEBMA with BMA applied to the original ensemble forecast members \(f_{k}\) from the GCMs. This BMA is based on Eq. (1) and replaces the quantile ensemble members \(q_{i}\) with \(f_{k}\) in the component PDFs of Eq. (3). In other words, in BMA the original ensemble forecast members are regarded as "distinguishable", which is often not true in SCFs. To facilitate the comparison between QEBMA and BMA and the other counterparts, we use the same number of ensemble members, i.e., \(N = K\). As a result, both BMA and QEBMA have the same number, \(6K + 1\), of model parameters. Hence, the main difference between QEBMA and BMA is that QEBMA operates on the pseudo-ensemble members at any forecast time point.

QM, which is used as the operational post-processing method for seasonal forecasts in Australia (Griffiths et al. 2023), maps a raw forecast from a GCM to the corresponding quantile of historical observations. It adjusts the forecast mean as well as the ensemble spread (Wood et al. 2002). For a univariate climate variable like precipitation \(Y\), we obtain a raw forecast \(f_{k}\) from a GCM. We denote the Cumulative Distribution Functions (CDFs) of all the raw forecast ensemble members and of the observations in the reference period by \(F_{f}\) and \(F_{o}\), respectively. The QM post-processed forecast \(y_{k}^{(QM)}\) can be formulated as \(y_{k}^{(QM)} = F_{o}^{ - 1} \left( {F_{f} \left( {f_{k} } \right)} \right)\), where \(F_{o}^{ - 1}\) is the inverse function of \(F_{o}\). We use the empirical distributions of the raw forecasts and observations over the training period as the estimates of \(F_{f}\) and \(F_{o}\). Only training data from zero or one month away from the target month are used; for example, for August 2008, the training pairs include data from July to September, excluding the year 2008. QM is carried out at the level of individual ensemble members and keeps the same ensemble size as the raw forecasts. To check whether the relatively more distinguishable pseudo-ensemble forecasts improve post-processing, we further restrict the empirical distribution estimation to each quantile member, instead of all the members; we refer to this version as QMq in the comparison.
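A minimal Python sketch of this empirical mapping is given below (the operational counterpart uses the R package qmap; tie-handling and interpolation choices here are illustrative):

```python
import numpy as np

def quantile_map(f_new, f_train, y_train):
    """Empirical quantile mapping y = F_o^{-1}(F_f(f)): map each raw member to
    the observation quantile matching its non-exceedance probability under the
    training forecast distribution."""
    f_sorted = np.sort(np.ravel(f_train))      # empirical F_f (all members pooled)
    y_sorted = np.sort(np.ravel(y_train))      # empirical F_o (observations)
    p = np.searchsorted(f_sorted, np.asarray(f_new, float), side="right") / (f_sorted.size + 1)
    return np.quantile(y_sorted, np.clip(p, 0.0, 1.0))   # F_o^{-1}(p)

# Applied member by member, so the post-processed ensemble keeps the raw size;
# QMq restricts f_train to one sorted (pseudo) member instead of pooling all members.
```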

For comparison, we also use a recent post-processing model, ECPP (Li and Jin 2020), on monthly precipitation, as it has performed the best among site-specific nonparametric post-processing methods (Jin et al. 2023b; Li et al. 2020). ECPP uses copulas, a powerful statistical tool for modelling the dependence structure among random variables, to model precipitation observations \(Y\) and ensemble forecast medians \(f_{M}\). Copulas conveniently separate the dependence structure of random variables from their marginal distributions without data transformation (Nelsen 2006). To make use of the efficient computation of classical parametric copulas, we treat monthly precipitation as left-censored at 0. Under the left-censoring assumption, the underlying observation variable \(X_{o}\) and forecast variable \(X_{f}\) are continuous and can take values less than 0, such that \(Y = \max (X_{o} ,0)\) and \(f_{M} = \max (X_{f} ,0)\). When \(X_{o}\) or \(X_{f}\) is less than 0, we only observe the value 0 and do not know its exact value. The joint distribution of \(X_{o}\) and \(X_{f}\) is formulated through a bivariate copula function \(C\) with parameter \(\theta\) as \(F(X_{o} ,X_{f} ) = C\left\{ {F_{o} \left( {X_{o} } \right),F_{f} \left( {X_{f} } \right);\theta } \right\}\). The conditional distribution \(F(X_{o} |X_{f} )\), used to generate post-processed forecasts conditional on the raw forecast \(X_{f}\), is \(F(X_{o} |X_{f} ;\theta ) = \frac{{\partial C\left( {F_{o} \left( {X_{o} } \right),F_{f} \left( {X_{f} } \right);\theta } \right)}}{{\partial F_{f} (X_{f} )}}\). We use empirical distributions \(F_{o}\) and \(F_{f}\) based on the historical observations and retrospective forecast data, as empirical distributions conveniently handle possible multimodal distributions and zero precipitation observations. For more technical details, please refer to Li and Jin (2020). We estimate the copula parameter \(\theta\) via maximum likelihood estimation for each combination of location, month, and forecast lead time. Like QM, we include the three nearest months to increase the training data size. Once the estimation is complete, we generate an ensemble forecast from the conditional forecast distribution, \(\widehat{Y}_{s} = F_{o}^{ - 1} \left( {F^{ - 1} \left( {u|v;\theta } \right)} \right)\), where \(u\) is a random number from the uniform distribution \(U[0,1]\), \(F_{o}^{ - 1}\) is the quantile function of the precipitation observations, and \(F^{ - 1} ( \cdot | \cdot )\) is the inverse of the conditional forecast distribution. \(v\) equals \(F_{f} \left( {f_{M} } \right)\) when \(f_{M} > 0\), and is a random number from the uniform distribution \(U\left[ {0,F_{f} \left( 0 \right)} \right]\) when \(f_{M} = 0\). For a GCM with \(K\) ensemble members, we repeat the simulation procedure \(10 \times K\) times to generate a post-processed ensemble forecast, as suggested in Li and Jin (2020).
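To illustrate the conditional simulation step, the Python sketch below substitutes a Gaussian copula with correlation parameter rho for the maximum-likelihood-fitted parametric copulas used by ECPP (the actual implementation uses the R package VineCopula; the empirical marginals follow the description above):

```python
import numpy as np
from scipy import stats

def ecpp_sample(f_med, f_train, y_train, rho, n_sim=10, seed=None):
    """Simulate post-processed precipitation conditional on the ensemble median
    f_med, assuming a Gaussian copula between the (left-censored) observation
    and forecast variables; empirical marginals F_f, F_o come from training data."""
    rng = np.random.default_rng(seed)
    f_sorted, y_sorted = np.sort(np.ravel(f_train)), np.sort(np.ravel(y_train))
    if f_med > 0:
        v = np.searchsorted(f_sorted, f_med, side="right") / (f_sorted.size + 1)
    else:
        # Left-censored forecast: v ~ U[0, F_f(0)], as described above.
        v = rng.uniform(0.0, max((f_sorted <= 0.0).mean(), 1e-6))
    zv = stats.norm.ppf(np.clip(v, 1e-6, 1 - 1e-6))
    # Conditional Gaussian copula draw of u = F_o(X_o) given V = v.
    u = stats.norm.cdf(rho * zv + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_sim))
    return np.quantile(y_sorted, u)            # F_o^{-1}(u); censored values map to 0
```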

To check how the pseudo-ensemble forecasts help ECPP, we also apply ECPP to each pseudo member \(q_{i}\) \((i = 1, \cdots ,K)\) separately, instead of the ensemble median \(f_{M}\), generating 10 simulations independently for each member. These \(10 \times K\) forecasts together form a post-processed ensemble forecast. We name this model ECPPq.

To provide a benchmark, researchers often compare forecast techniques with a naïve climatology forecast (Jin et al. 2022; Li and Jin 2020; Li et al. 2020; Schepen et al. 2018), which uses historical observations (except data from the test time window) to form an ensemble forecast. The reference period used in this study is from 1980 to 2018, a total of 39 years. For example, to generate the climatological reference forecast for Jan 2000, we use the historical January observations other than those from 2000 (i.e., from 1980–1999 and 2001–2018) for the leave-one-month-out cross-validation. That is, the reference climatology forecast has 38 ensemble members.
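The reference forecast amounts to a few lines of Python (a sketch; the dictionary input is hypothetical):

```python
import numpy as np

def climatology_forecast(obs_by_year, test_year):
    """Leave-one-year-out climatology ensemble for one calendar month:
    all historical observations of that month except the test year."""
    return np.array([obs for yr, obs in obs_by_year.items() if yr != test_year])

# For Jan 2000 with January observations for 1980-2018, this yields the
# 38-member reference ensemble described above.
```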

We implemented QEBMA and its counterpart models in R, using or modifying packages such as scoringutils, qmap, VineCopula, and ensembleBMA (Fraley et al. 2018; R Core Team 2022; Schepsmeier et al. 2015). We used a relatively short common period, from Feb 1993 to Dec 2010, to enable comparison among the three GCMs.

4.2 Forecast verification metrics and skill scores

To assess the forecast models, cross-validation is conducted for both deterministic and probabilistic forecasts, with ensemble medians treated as deterministic forecasts. To facilitate the comparison of the post-processing models (e.g., QM, ECPP and BMA) across five metrics and three GCMs, we calculate a skill score for each metric relative to the reference climatology forecast, with values ranging from \(-\infty\) to 1. A positive skill score, often expressed as a percentage, indicates a skilful post-processing model, and higher scores correspond to better forecasts (Li et al. 2020). A skill score of 1 (100%) represents a perfect forecast, where all forecasts match their target observations, while a score of 0 indicates performance equivalent to climatology for that metric. Differences in skill scores are regarded as indicators of improvement.

4.2.1 Relative bias

Bias, i.e., the difference between a deterministic forecast and an observation, is often used as a metric. Post-processing models such as QM, ECPP and BMA normally correct systematic biases; for example, the biases of the medians of the post-processing models are close to zero for different lead times, locations, and GCMs, as illustrated in Figure S4 and Figure S5. The mean biases of QEBMA are normally less than 0.6 mm/day. Given the skewness of monthly precipitation observations, as illustrated in Figure S1, a bias of 0.6 mm/day on average is less concerning to end users for larger observations (e.g., 10 mm/day) than for smaller ones (e.g., 1 mm/day). Therefore, we focus on comparing relative biases, defined as the differences between the ensemble medians and observations normalised by the observations (Khajehei and Moradkhani 2017; Li et al. 2020). With such a relative bias we can check whether a forecast tends to over- or under-estimate (indicated by a positive or negative relative bias) and by how much the forecast median deviates from the observations. To avoid possible division by zero, we add a constant \(c_{e} = 0.8\) mm/day to the observation. Thus, the relative bias of \(y_{f,t}\) with respect to \(y_{o,t}\) at time \(t\) is \(E_{t} = \frac{{y_{f,t} - y_{o,t} }}{{y_{o,t} + c_{e} }}\). To facilitate comparisons among post-processing models across GCMs, we further calculate the relative bias skill score relative to the reference forecast, climatology. For each of the 12 calendar months, we calculate the average of the absolute relative biases, \(\overline{{\left| {E_{t} } \right|}}^{(M)}\), for post-processing model \(M\). The relative bias skill score is \(\left( {1 - \frac{{\overline{{\left| {E_{t} } \right|}}^{(M)} }}{{\overline{{\left| {E_{t} } \right|}}^{(ref)} }}} \right) \times 100\%\). A zero skill score indicates the same relative bias as the cross-validation version of the climatology forecast. The average relative bias skill score across the 12 months is the final relative bias skill score for a given location and forecast lead time.
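A Python sketch of this skill score is given below (deterministic forecasts are the ensemble medians, as stated above; array names are illustrative):

```python
import numpy as np

def relative_bias_skill(y_fcst_med, y_ref_med, y_obs, month_of, c_e=0.8):
    """Relative-bias skill score (%) of a forecast against the climatology
    reference, averaged over the 12 calendar months as described above.
    All inputs are aligned 1-D numpy arrays over the test months."""
    e_f = np.abs((y_fcst_med - y_obs) / (y_obs + c_e))
    e_r = np.abs((y_ref_med - y_obs) / (y_obs + c_e))
    scores = [(1.0 - e_f[month_of == m].mean() / e_r[month_of == m].mean()) * 100.0
              for m in np.unique(month_of)]
    return float(np.mean(scores))
```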

4.2.2 Mean absolute error (MAE)

MAE measures deterministic forecast accuracy, \(MAE_{t} = \left| {y_{f,t} - y_{o,t} } \right|\). As for the relative bias skill score above, the MAE is first averaged across years for each month. Against the average MAE of the reference forecast for each month, the MAE skill score is \(1 - \frac{{\overline{{MAE_{t} }}^{(M)} }}{{\overline{{MAE_{t} }}^{(ref)} }}\). The 12-month average is the final MAE skill score for a given location and lead time.

4.2.3 Forecast coverage and reliability

For the calibration of a probabilistic forecast from model \(M\), it is useful to check its coverage \({\text{cov}}_{t}^{(M)}\) of the \((1 - \alpha_{c} ) \times 100\%\) central prediction interval for a given \(\alpha_{c} \in (0,1)\), i.e., the proportion of validating observations located between the \(\frac{\alpha_{c}}{2}\) and \(\left( 1 - \frac{\alpha_{c}}{2} \right)\) quantiles of the predictive distribution or ensemble forecast. Considering that the minimum ensemble size of the three GCMs is 11, we set \(\alpha_{c} = 0.2\) to allow direct comparisons with the raw ensembles. The three examples in Fig. 3 illustrate that the observation is located within the 80% confidence intervals of the probabilistic forecasts generated by QEBMA for all three GCMs. As an average coverage closer to \(1 - \alpha_{c}\) is better, to simplify comparison, the coverage skill score relative to the reference forecast is defined as \(\frac{{\overline{{\left| {{\text{cov}}_{t}^{(ref)} - (1 - \alpha_{c} )} \right|}} - \overline{{\left| {{\text{cov}}_{t}^{(M)} - (1 - \alpha_{c} )} \right|}} }}{{\overline{{\left| {{\text{cov}}_{t}^{(ref)} - (1 - \alpha_{c} )} \right|}} + 0.2}}\), where 0.2 is added to the denominator to avoid division by zero.

To eliminate the influence of the choice of \(\alpha_{c}\), another metric, reliability, is also used. Reliability characterises the difference between the observed and forecast frequencies of an event over a forecast period. We measure forecast reliability by the α-index (Renard et al. 2010), defined as

$$\alpha = 1 - \frac{2}{n}\sum\limits_{t = 1}^{n} {\left| {p_{t}^{*} - \frac{t}{n + 1}} \right|}$$
(9)

where \(p_{t}^{*}\) denotes the sorted values of the forecast probability integral transform of the rainfall observations, \(p_{t} = F_{ens,t} \left( {y_{o,t} } \right)\), with \(F_{ens,t}\) the distribution of the ensemble forecast at time \(t\). If all \(n\) forecasts are reliable, \(p_{t}^{*}\) is expected to be uniformly distributed, as illustrated by the histograms in Figure S6. The α-index ranges from 0 (worst reliability) to 1 (best). For location Lat-24Lon145, QEBMA has the highest reliability value of 0.956 (Figure S6). Comparing the α-index of a model \(M\) with that of the reference model, the reliability skill score is calculated as \(\frac{{\alpha^{(M)} - \alpha^{(ref)} }}{{\alpha^{(ref)} }}\).
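A Python sketch of the α-index computation from ensemble forecasts follows (the empirical non-exceedance fraction is used as the PIT value here; handling of ties and censoring at zero is simplified):

```python
import numpy as np

def alpha_index(ens_forecasts, y_obs):
    """Reliability alpha-index of Eq. (9) for n forecasts with m ensemble members.
    ens_forecasts: (n, m) array; y_obs: (n,) observations."""
    ens = np.asarray(ens_forecasts, float)
    y_obs = np.asarray(y_obs, float)
    n = ens.shape[0]
    pit = (ens <= y_obs[:, None]).mean(axis=1)        # p_t = F_ens,t(y_o,t)
    ranks = np.arange(1, n + 1) / (n + 1)             # t / (n + 1)
    return 1.0 - (2.0 / n) * np.sum(np.abs(np.sort(pit) - ranks))
```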

4.2.4 Continuous ranked probability score

The Continuous Ranked Probability Score (CRPS), a summary measure of forecast bias, reliability, sharpness and efficiency, is used to evaluate the overall forecast skill (Hersbach 2000). It is a popular proper score widely used in evaluating probabilistic forecasts (Fraley et al. 2010; Jin et al. 2022; Li et al. 2020; Schepen et al. 2018; Sloughter et al. 2007). For a probabilistic forecast distribution \(F_{ens,t} (y)\) and the observation \(y_{o,t}\) at time \(t\), it is a quadratic measure of the difference between \(F_{ens,t} (y)\) and the empirical distribution function of the observation, \(CRPS_{t} = \int {\left( {F_{ens,t} (y) - I\left[ {y \le y_{o,t} } \right]} \right)^{2} dy}\). We standardise the average CRPS of model \(M\) with respect to the reference forecast and report it as the CRPS skill score

$$CRPS{\text{ Skill Score}} = \left( {1 - \frac{{\overline{{CRPS^{(M)} }} }}{{\overline{{CRPS^{(ref)} }} }}} \right) \times 100\%$$
(10)

where \(CRPS^{(ref)}\) is the CRPS calculated from the climatology forecast. When the CRPS skill score is greater than 0, the forecast has positive skill. The maximum CRPS skill score is 1 (100%), corresponding to a perfect forecast in which all ensemble members are identical to their target observations.
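For an ensemble forecast, the integral above can be computed with the standard ensemble estimator; a Python sketch, reusing the Fig. 3a numbers as a check, is:

```python
import numpy as np

def crps_ensemble(members, y_obs):
    """CRPS of one ensemble forecast for one observation, via the standard
    ensemble form  mean|x_i - y| - 0.5 * mean|x_i - x_j|  of the integral above."""
    x = np.asarray(members, float)
    return np.abs(x - y_obs).mean() - 0.5 * np.abs(x[:, None] - x[None, :]).mean()

def crps_skill_score(crps_model, crps_ref):
    """Eq. (10): average CRPS of model M standardised by the reference, in %."""
    return (1.0 - np.mean(crps_model) / np.mean(crps_ref)) * 100.0

print(round(crps_skill_score(0.279, 0.458), 1))   # Fig. 3a example: 39.1 (i.e. 0.391)
```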

Three post-processed examples from QEBMA are illustrated in Fig. 3 for grid point Lat-24Lon145 for Aug 2010. As depicted by the dotted colour curves beneath the bold black curve in Fig. 3a, the majority of component predictions contributed by pseudo-ensemble members have minimal weights, with their curves closely hugging the x-axis. Only six of the 28 pseudo-ensemble members have weights > 1% for GloSea5. They are, in decreasing order of their weights, \(q_{21}\) (with weight \(w_{21} = 28.9\%\)), \(q_{7}\) (24.6%), \(q_{13}\) (24.6%), \(q_{3}\) (12.1%), \(q_{11}\) (8.52%), and \(q_{26}\) (1.19%). As the observation is close to the PDF median and within the forecast confidence interval, the forecast has a CRPS of 0.279. Compared with the CRPS of 0.458 for the reference climatology forecast, its CRPS skill score is 0.391.

For ECMWF, QEBMA has six pseudo members with weights > 1% (Fig. 3b): \(q_{10}\) (\(w_{10} = 33.9\%\)), \(q_{22}\) (29.2%), \(q_{6}\) (16.2%), \(q_{25}\) (8.28%), \(q_{3}\) (7.8%) and \(q_{9}\) (4.15%). Its CRPS is 0.292, and its CRPS skill score is 0.361.

For ACCESSc (Fig. 3c), QEBMA has three pseudo members with weights higher than 1%: \(q_{2}\) (\(w_{2} = 49.5\%\)), \(q_{10}\) (41.5%), and \(q_{1}\) (9.0%). Its CRPS is 0.497, higher than that of climatology, resulting in a negative CRPS skill score of −0.087.

For each location and lead time combination, we calculate the five skill scores for each post-processing model and the raw GCM forecasts; these are reported for the three GCMs in the following subsections.

4.3 Results on GloSea5

Table 2 summarises the skill scores of the five metrics over three lead times and 32 locations from the 215 months for GloSea5. QEBMA performs the best in terms of relative bias, MAE, and CRPS, for which the improvement is statistically significant at the 0.05 level (as indicated by '*' in Table 2) over the raw GloSea5 forecasts and the other five post-processing models. For the coverage of observations within the 80% central prediction intervals, QEBMA is better than all the other forecasts except BMA. For reliability, QEBMA is better than the raw GloSea5 forecasts, QMq, ECPPq, and BMA, comparable with QM, and slightly worse than ECPP. Compared with the raw GloSea5 forecasts, QEBMA performs statistically significantly better in terms of all five skill scores. All models other than QEBMA have one or more negative scores on the five metrics, meaning that only QEBMA is better overall than the reference climatology on all five metrics.

Table 2 Average skill scores (%, higher is better) over three lead times and 32 grid points on GloSea5

The boxplots of skill scores over the 32 locations for 0- to 2-month forecast lead times are given in Fig. 4. QEBMA generally has higher skill scores than the raw forecasts and the other post-processing models, especially ECPPq and BMA, across the five metrics and three lead times. Skill scores generally decrease with forecast lead time, except for coverage and reliability, as the forecast difficulty increases. We further examine the 0-month lead time forecasts, typically made on the first day of a given month. Compared with QM, QEBMA has comparable MAE scores but generally higher relative bias scores, indicating that QM may be worse at forecasting months with large precipitation amounts. Such differences are also reflected in the relatively lower CRPS skill scores of QM. Compared with ECPP, QEBMA has comparable skill scores on MAE and reliability, and higher scores on relative bias, coverage and CRPS.

Fig. 4

Boxplots of skill scores (%, higher is better) of raw GloSea5 and five post-processing models over 32 locations for three different lead times (0, 1, and 2 months). QMq is not included for a better visual comparison of CRPS skill scores

We further examine the ensemble forecast skill scores for 0-month lead time forecasts across all 32 locations. As illustrated in Fig. 5a, QEBMA has stable skill scores over the 32 locations in the four different climate zones. For 31 of the 32 locations, QEBMA improves on the raw GloSea5 forecasts, with a mean CRPS skill score improvement of 3.54% (Fig. 5b). QEBMA has higher scores than QM at 31 of the 32 locations, with an average CRPS skill score improvement of 3.25%. It outperforms ECPP at 28 of the 32 locations, with an average improvement of 3.35%, outperforms ECPPq at all 32 locations with an average improvement of 7.90%, and outperforms BMA at 29 of the 32 locations with an average improvement of 3.27%.

Fig. 5

Spatial view of CRPS skill score (in %) of QEBMA and its differences from raw GloSea5, and four post-processing models QM, ECPP, ECPPq, and BMA with a lead time of 0 months. Each subplot has its own colour bar for easy comparison. + (or −) indicates the positive (or negative) values

For the forecasts made for Dec, Jan and Feb with a 0-month lead time, QEBMA has positive skill in general, with CRPS skill scores of up to 19.0%. It generally has higher skill than the raw GloSea5 forecasts and the other post-processing models, as illustrated in Fig. 6. For 0-month lead time forecasts made for Jun, Jul, and Aug, QEBMA has higher average skill scores at most locations (Fig. 7). Compared with the raw GloSea5 forecasts, the improvement of QEBMA in the Outback case study region is not as large as that in the South-East Queensland case study region.

Fig. 6 Spatial view and comparison of CRPS skill score (in %) of QEBMA over 32 locations with a lead time of 0 months for GloSea5 made for all the summers (Dec, Jan and Feb). Each subplot has its own colour bar for easy comparison. + (or −) indicates the positive (or negative) values

Fig. 7 Spatial view and comparison of CRPS skill scores (in %) of QEBMA over 32 locations with a lead time of 0 months for GloSea5 made for all the winters (Jun, Jul and Aug). Each subplot has its own colour bar for easy comparison. + (or −) indicates the positive (or negative) values

4.4 Results on ECMWF

The skill scores of the five metrics for ECMWF are summarised in Table 3. On relative bias, QEBMA has an average score of 1.75%, the only positive score and the best among the seven forecast models. On the second deterministic metric, MAE, QEBMA has the highest score of 7.13%, which is statistically significantly better than those of the other six models. On coverage, QEBMA is better than all the other models except BMA. QEBMA has the highest average reliability skill score of 1.73%, which is statistically significantly better than all the other models except ECPP. QEBMA has an average CRPS skill score of 11.64%, which is statistically significantly better than the other six models. Compared with the raw ECMWF forecasts, only one post-processing model, QEBMA, has higher average skill scores on all five metrics. Furthermore, only QEBMA outperforms climatology on all five metrics on average, as it is the only model with five positive skill scores.

Table 3 Average skill scores (%, higher is better) over three lead times and 32 grid points on ECMWF

As illustrated in Fig. 8, the raw ECMWF forecasts deteriorate with lead time on all five metrics, as do the forecast skill scores of all the post-processing models. BMA has inferior performance on CRPS and MAE. ECPPq does not improve CRPS and MAE in general, and fails to improve reliability even for the 0-month lead time forecasts. QM and ECPP do not improve the ECMWF forecast performance at any of the three lead times on relative bias, MAE or CRPS; both raise the coverage and reliability skill scores to around 0, i.e., comparable with climatology. For all three forecast lead times, QEBMA has generally higher average scores than the raw ECMWF forecasts on the five metrics. Given the clear improvements for the 0-month lead time forecasts in Fig. 8, we next discuss its performance on the 1-month lead time forecasts, focusing on two key metrics: MAE for deterministic forecasts and CRPS for ensemble forecasts.

Fig. 8 Boxplots of skill scores (%, higher is better) of raw ECMWF and five post-processed models over 32 locations for three different lead times (0, 1, and 2 months). QMq is not included for a better visual comparison of CRPS skill scores

As illustrated in Fig. 9 for 1-month lead time forecasts, the MAE skill scores of QEBMA range from −2.55% to 7.39% with a mean of 3.79%. QEBMA has positive scores at 29 out of 32 locations, compared with 32 out of 32 for the 0-month lead time forecasts. Compared with the raw ECMWF forecasts, QEBMA has higher scores at 21 locations, with a slightly higher (by 0.81%) mean score. Compared with QM, QEBMA has higher MAE skill scores at 22 locations with a 0.77% higher mean. Compared with ECPP, QEBMA has higher scores at 23 out of 32 locations with a 2.50% higher mean, and compared with ECPPq, at 23 out of 32 locations with a 1.47% higher mean. QEBMA's average MAE skill score differences from ECPP and QM decrease with forecast lead time: 4.33%, 2.50%, and 1.40% against ECPP, and 2.89%, 0.77%, and −0.51% against QM.

Fig. 9 Spatial view of MAE skill scores (%, higher is better) of QEBMA and its differences from raw and four post-processed ECMWF with 1-month forecast lead time. Each subplot has its own colour bar for easy comparison. + (or −) indicates the positive (or negative) values

From Fig. 10a, QEBMA shows stable overall ensemble forecast skill, with a mean CRPS skill score of 8.68%, over the 32 locations for the 1-month lead time ECMWF forecasts. QEBMA has higher CRPS skill scores than the other models at no fewer than 30 out of 32 locations, except against ECPP, where QEBMA has higher scores at 25 locations. In addition, QEBMA's average CRPS skill score differences from its two closest post-processing competitors, ECPP and QM, decrease with forecast lead time: 4.50%, 2.45%, and 1.86% against ECPP, and 5.06%, 3.51%, and 3.40% against QM.

Fig. 10 Spatial view of CRPS skill scores of QEBMA and its differences from raw and four post-processed ECMWF with a lead time of 1 month

4.5 Results on ACCESSc

The skill scores of the five metrics for ACCESSc and the six post-processing models are summarised in Table 4. QEBMA has statistically significantly higher scores than the raw ACCESSc forecasts, QM, and QMq on all five metrics. Compared with ECPP, QEBMA has much higher scores on relative bias and coverage, and comparable scores on MAE, reliability and CRPS. Compared with ECPPq, QEBMA has statistically significantly higher scores on relative bias, reliability and CRPS, and comparable scores on MAE and coverage. QEBMA outperforms BMA on all metrics except coverage.

Table 4 Average skill scores (%, higher is better) of ACCESSc and six post-processing models over three lead times and 32 grid points

Figure 11 illustrates the skill scores of the five metrics for different forecast lead times for ACCESSc. QEBMA clearly outperforms all its counterparts on the 0-month lead time forecasts, but its advantage decreases with forecast lead time. Compared with its closest competitor, ECPP, its average MAE skill score differences decrease from 2.26% to −0.50% and −0.46% for 0- to 2-month lead times, and its average CRPS skill score differences from ECPP decrease from 2.22% to −0.92% and −0.25%.

Fig. 11 Boxplots of skill scores (%, higher is better) of ACCESSc and five post-processed models over 32 locations for three different lead times (0, 1, and 2 months). QMq is not included for a better visual comparison of CRPS skill scores

4.6 Discussions

Compared with the raw forecasts of the three GCMs, it is evident from Tables 2, 3 and 4 that the post-processing models exhibit varying degrees of skill improvement. Specifically, for ECMWF, both QM and ECPP deteriorate forecast performance in terms of relative bias, MAE and CRPS averaged across the 32 locations. Notably, among the six post-processing models, only QEBMA consistently outperforms the raw forecasts on these metrics for all three GCMs.

As skill scores are calculated against the same reference model, we can average the improvements of these post-processing models over the raw forecasts of the three GCMs. Table 5 shows that, on average, QEBMA exhibits the highest skill improvement on four of the five metrics, trailing slightly behind BMA only on coverage. Notably, only QEBMA and ECPP among the six post-processing models demonstrate positive skill scores on all five metrics, suggesting that the forecasts generated by QEBMA and ECPP generally outperform the raw GCM forecasts. Additionally, QEBMA outperforms ECPP on all five metrics.

Table 5 Average skill scores (%) improved from the raw forecasts of three GCMs
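
Because every skill score in Tables 2, 3 and 4 uses the same climatology reference, the Table 5-style numbers can be formed by differencing skill scores and averaging across GCMs. The sketch below only illustrates that bookkeeping; all values in it are placeholders, not results from the paper.

```python
# Minimal sketch: average improvement of a post-processing model over raw GCM
# forecasts for one metric, using placeholder skill scores (%).
import numpy as np

skill = {
    "raw":   {"GloSea5": -1.0, "ECMWF": 0.5, "ACCESSc": -2.0},
    "QEBMA": {"GloSea5":  3.5, "ECMWF": 4.0, "ACCESSc":  1.0},
}
improvement = np.mean([skill["QEBMA"][g] - skill["raw"][g] for g in skill["raw"]])
print(f"Average improvement over raw forecasts: {improvement:.2f}%")
```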

QEBMA's skill improvement depends on all the ensemble members of the raw GCM forecasts. For poor raw forecasts, such as the 2-month lead time forecasts from ACCESSc, QEBMA's skill can be inferior to the reference model climatology, e.g., MAE in Fig. 11. The performance of QEBMA for longer forecast lead times, such as 3 or more months, will be examined in future work, especially once the forecast skill of GCMs is further enhanced.

In this section, QEBMA uses the same ensemble size as the raw forecasts from a GCM. Thus, the pseudo-ensemble members are obtained by simply sorting the raw ensemble members, and the only difference between QEBMA and BMA is that QEBMA conditions on the raw ensemble members after sorting (see the sketch below). From Tables 2, 3 and 4, QEBMA outperforms BMA on four out of five metrics for each of the three GCMs. Similarly, as listed in the last two columns of Table 5, QEBMA has a higher average improvement over the raw forecasts of the three GCMs than BMA on all five metrics except coverage. On both MAE and CRPS, QEBMA is superior to the raw forecasts of the three GCMs, whereas BMA is inferior. These comparisons illustrate the extra information QEBMA extracts from the quantiles of the entire set of raw ensemble members.
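
A minimal sketch of forming pseudo-ensemble members as described above: when the pseudo-ensemble size equals the raw ensemble size, sorting the raw members suffices; for a different size, equally spaced quantiles of the raw ensemble could be used instead. The specific quantile convention in the else-branch is an assumption for illustration, not the paper's exact choice.

```python
# Minimal sketch: pseudo-ensemble members from a raw GCM ensemble.
import numpy as np

def pseudo_members(raw_members, size=None):
    """Return sorted raw members, or `size` equally spaced quantiles of them."""
    raw_members = np.sort(np.asarray(raw_members, dtype=float))
    if size is None or size == raw_members.size:
        return raw_members                       # same size: just sort
    probs = (np.arange(1, size + 1) - 0.5) / size
    return np.quantile(raw_members, probs)       # assumed quantile convention

raw = [3.2, 0.0, 18.5, 7.1, 42.0, 11.3]          # toy raw ensemble (mm/month)
print(pseudo_members(raw))                       # sorted raw members
print(pseudo_members(raw, size=4))               # smaller pseudo-ensemble
```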

QMq and ECPPq do not perform better than their counterparts QM and ECPP, especially on the two key probabilistic forecast metrics, reliability and CRPS, because they treat the pseudo-ensemble forecast members equally. When these pseudo-ensemble members compete with each other, as in QEBMA, more than two-thirds of them receive quite small weights based on their historical performance (see examples in Fig. 3) and contribute little to the post-processed forecasts. This indicates that the pseudo members should not be treated equally.

The improvement in forecast performance with QEBMA stems from multiple factors. Like conventional post-processing techniques, QEBMA utilises historical observational data and GCM re-forecasts to refine GCM forecasts. Additionally, QEBMA makes better use of ensemble information by transforming individual members into pseudo-ensemble members, and a competitive weighting mechanism differentiates the influence of individual pseudo members, including the ensemble median. This brings the forecast ensembles into better agreement with the relationship between re-forecasts and historical observations, consequently enhancing the accuracy and skill of monthly precipitation forecasts.
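
To make the mixture-and-weighting idea concrete, the sketch below evaluates a QEBMA-style predictive CDF as a weighted mixture of hurdle components, one per pseudo member, each with a point mass at zero and a gamma distribution for positive amounts (as described in the conclusions). It is an illustration only, not the authors' parameterisation: how the zero probability and gamma parameters are linked to each (cube-root-transformed) pseudo member, and the weights themselves, are assumed values here.

```python
# Illustrative sketch: predictive CDF of a weighted hurdle-gamma mixture.
import numpy as np
from scipy import stats

def component_cdf(y, p_zero, shape, scale):
    """CDF of one hurdle component: point mass p_zero at 0, gamma above 0."""
    if y < 0:
        return 0.0
    return p_zero + (1.0 - p_zero) * stats.gamma.cdf(y, a=shape, scale=scale)

def mixture_cdf(y, weights, params):
    """Weighted mixture over pseudo-member components; weights sum to one."""
    return sum(w * component_cdf(y, *p) for w, p in zip(weights, params))

# Toy usage: three pseudo members with assumed (p_zero, shape, scale) and weights
params = [(0.30, 1.5, 10.0), (0.10, 2.0, 20.0), (0.05, 2.5, 30.0)]
weights = [0.5, 0.3, 0.2]
print(mixture_cdf(25.0, weights, params))   # P(monthly precipitation <= 25 mm)
```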

The proposed QEBMA method has certain limitations. Its performance is closely tied to the raw forecast performance of a GCM, in line with most forecast post-processing techniques. Its parameter estimation procedure aims for a local optimum and is relatively slow, particularly when dealing with a large number of pseudo-ensemble members. QEBMA assigns the pseudo-ensemble the same size as the GCM ensemble, a choice that may be suboptimal: a significant proportion of pseudo-ensemble members contribute minimally, given that their weights are close to zero, which may diminish the effectiveness of the method.

This paper endeavours to improve the accuracy and skill of ensemble seasonal precipitation forecasts at a monthly scale by introducing a parametric post-processing technique, QEBMA, that leverages individual ensemble members. The evaluation spans 32 locations and three distinct GCMs. In the realm of seasonal forecasts, prevalent non-parametric (Griffiths et al. 2023; Monhart et al. 2018; Shao and Li 2013) or deep-learning-based models (Jin et al. 2023a, 2023b; Vitart et al. 2022) typically treat all ensemble members equally. Conversely, most parametric post-processing techniques for seasonal precipitation forecasts rely predominantly on ensemble medians (or means) (Li and Jin 2020; Li et al. 2020; Schepen et al. 2018; Wang et al. 2019), thereby potentially neglecting valuable forecast information in individual ensemble members. These studies span diverse temporal scales, encompassing daily accumulated precipitation (Griffiths et al. 2023; Jin et al. 2023a, 2023b; Li and Jin 2020; Li et al. 2021; Monhart et al. 2018; Schepen et al. 2018; Shao and Li 2013), weekly precipitation (Monhart et al. 2018), and fortnightly precipitation (Vitart et al. 2022). Although originally designed for these temporal scales, they are potentially applicable to monthly or seasonal precipitation forecasting, e.g., (Li et al. 2020; Wang et al. 2019). In addition to GCM precipitation forecasts, certain post-processing studies incorporate supplementary predictors (Jin et al. 2023b; Li et al. 2020; Scheuerer et al. 2020; Vitart et al. 2022), without a consistent conclusion. Most other studies, to our knowledge, present forecast assessment results for a single GCM only; in contrast, this study provides performance assessments for three distinct GCMs.

5 Conclusions

To make better use of all the ensemble members in probabilistic monthly precipitation forecasts, we have proposed the Quantile Ensemble Bayesian Model Averaging (QEBMA) model for post-processing Seasonal Climate Forecasts (SCFs). It takes an ensemble forecast as a whole and uses its different quantiles to form pseudo-ensemble forecast members that establish consistent connections across different forecast initialisation times. To capture the approximately linear relationship we observed between a pseudo-ensemble member and observations after cube root transformation, a hurdle distribution, with a point mass at zero for dry months and a gamma distribution for positive precipitation amounts, is used conditional on each pseudo member. These distributions are then mixed to form a flexible predictive probability distribution, with weights proportional to the historical forecast performance of the pseudo members. The evaluation over 32 locations in Australia and three seasonal forecast systems demonstrates that QEBMA often statistically significantly outperforms the raw forecasts, several existing post-processing models, and the seasonal forecast benchmark climatology in terms of five forecast metrics: relative bias, mean absolute error, coverage, reliability and continuous ranked probability score. As only quantiles of ensemble forecasts are used, QEBMA is also suitable for SCFs whose ensemble size in the retrospective forecast period differs from that of the operational setting, such as ECMWF (Johnson et al. 2019).

For a fair comparison, we have used the same number of quantiles as in the original ensemble forecasts. Considering that many pseudo-ensemble members contribute little to the final forecast distribution, it would be interesting to examine whether a different, likely smaller, number of quantiles would perform better. We have focused on post-processing at a single location, without considering the spatio-temporal correlation in the forecasts; incorporating spatial and temporal structure into post-processed seasonal precipitation forecasts is a possible direction for future work. The idea of using pseudo-ensemble members could also improve model averaging across multiple ensemble forecasts, such as combining GloSea5, ECMWF, and ACCESS-S1, and we plan to test how it can be extended to handle multiple GCMs. For the cross-model comparisons and analyses in this paper, a relatively short common period of retrospective forecast data from the three GCMs was used for performance verification and comparison. Subsequent research will comprehensively assess the proposed QEBMA method and its variants, exploring longer retrospective forecast windows, extended lead times, operational seasonal forecasts, seasonal scales, and related aspects.