1 Introduction

Cool season (April to October) rainfall dominates the annual average over the Australian state of Victoria (CSIRO 2012; Hope et al. 2017), and is very important for the environment, agriculture and for replenishing reservoirs (Delage and Power 2020; Rauniyar and Power 2020). Paleoclimate proxy records and instrumental observations from a range of sources have shown that cool season Victorian rainfall exhibits large variability, with numerous flooding and drought episodes through the observational records, on interannual through to multidecadal timescales (e.g., Power et al. 1999a, b; Gallant and Gergis 2011; Gergis et al. 2012; Hope et al. 2017). While Victorian climate variability is high, it is also changing in response to anthropogenic forcing (CSIRO 2012; Timbal et al. 2016; Hope et al. 2017; Rauniyar et al. 2019; DELWP et al. 2020). For example, Victoria experienced its warmest period over the past few decades and unusually low cool season rainfall since the beginning of the Millennium Drought (MD) in 1997 (CSIRO 2012; Timbal et al. 2016; Hope et al. 2017; Rauniyar et al. 2019; Rauniyar and Power 2020). Research undertaken during the South-Eastern Australia Climate Initiative (SEACI) Phase 1 (CSIRO 2010; 2006–2009) and Phase 2 (CSIRO 2012; 2009–2012), during the Victorian Climate Initiative (VicCI; 2013–2016; Hope et al. 2017) and during the Victoria Water and Climate Initiative (VicWaCI: 2017–2020; DELWP et al. 2020) showed that the MD was the most severe protracted drought (1997–2009) in the instrumental record (Kiem and Verdon-Kidd 2010; Grant et al. 2013; Timbal and Fawcett 2013; Cai et al. 2014; DELWP 2016a; Dey et al. 2019). Drying during the MD primarily occurred during late autumn (April–May) and early winter (June-July) (CSIRO 2012; Timbal and Fawcett 2013; Timbal et al. 2016; Hope et al. 2017). 
The MD was "broken" by widespread flooding across the region during the spring/summer of 2010–2011 and 2011–2012; however, the drying trend in cool season rainfall has continued since the MD (Hope et al. 2017; Kirono et al. 2017). Previous studies concluded that the decline in rainfall since 1997 is largely dominated by internal climate variability, with climate change being only a partial contributor (Cai et al. 2014; Delworth and Zeng 2014; Rauniyar and Power 2020).

In contrast, anthropogenic forcing is projected to strongly influence future rainfall in Victoria (e.g., CSIRO 2010, 2012; Timbal et al. 2010; Cai et al. 2014; Delworth and Zeng 2014; Grose et al. 2015; Rauniyar and Power 2020). For example, Rauniyar and Power (2020) examined cool season rainfall changes over Victoria in the global climate models from phase 5 of the Coupled Model Intercomparison Project (CMIP5) and concluded that, towards the end of the twenty-first century, the median rainfall change will be approximately −12% (relative to the 1900–1959 period), with an interquartile range (IQR) of approximately −26 to −6%, under a high greenhouse gas emissions scenario called Representative Concentration Pathway 8.5 (RCP8.5; van Vuuren et al. 2011). The median drying is smaller under the low (RCP2.6) and medium (RCP4.5) emissions scenarios, at around −3% (IQR −9 to −1%) and −8% (IQR −12 to −5%), respectively. Other studies that analysed the rainfall changes in the CMIP5 models have also reported similar changes in cool season rainfall over Victoria (e.g., Grose et al. 2015; Hope et al. 2015, 2017; Timbal et al. 2015). For the near-term future (2018–2037), Rauniyar and Power (2020) concluded that there is only a ~12% chance that the externally-forced drying could be completely offset by internal natural rainfall variability, regardless of scenario. A reliable estimate of future water availability is therefore needed for better management and planning of already scarce water resources across Victoria in the face of ongoing climate change.

Prior to the research undertaken over the recent decade (e.g., CSIRO 2010, 2012; Hope et al. 2017; Rauniyar et al. 2019), some stakeholders were using the full observed climate records (since 1900) as the historical reference period (baseline climate) for future planning and management of water resources, with an assumption of a stationary climate (i.e., an unchanging average that continues into the future). Victorian water planners use the term “baseline” to refer to a period over which the statistical properties of rainfall and streamflow will approximate “current conditions” (DELWP 2016b, 2020). “Current conditions” are also used by water managers to provide an estimate of the range of possible conditions that will be experienced in the near-future. This terminology differs from what we would use in climate science. Climate scientists would describe the baseline as, e.g., “recent conditions”, and conditions for coming seasons and years as, e.g., “near-future conditions”. Climate scientists would also recognize that the water planners are using the statistics of recent conditions as a proxy for near-future conditions. Consequently, the key requirement for a baseline from a water planner’s perspective, in light of the climate research over the past 15 years that highlights the importance of anthropogenically-forced changes, is that it should be of sufficient duration to encompass the range of natural climate variability (e.g., severe droughts or cool seasons), but short enough to represent the current level of anthropogenic forcing of the climate (DELWP 2016b, 2020; Potter et al. 2016).
With results from VicCI and VicWaCI in mind, the current "Guidelines for Assessing the Impact of Climate Change on Water Supplies in Victoria" (i.e., DELWP 2016a, 2020) developed by the Victorian Government’s Department of Environment, Land, Water and Planning (DELWP) recommends using July 1975 to near-present as the historical reference climate ("baseline") for water resources planning and management across Victoria. The rationale behind the post-1975 climate reference period is that it is long enough to incorporate a wide range of natural variability, and it is consistent with the recent research findings (e.g., SEACI, VicCI, and VicWaCI), in particular that key climate variables changed after the 1970s (DELWP 2016b, 2020). The post-1975 baseline is mainly used to estimate water availability under historic greenhouse gas concentrations and to generate the current and future climate change scenarios by combining it with Global Climate Model (GCM)-derived projections.

In this paper we will use similar methods to those used in the past by Mpelasoka and Chiew (2009) and Potter et al. (2016) to estimate future rainfall. These methods are based on calculating so-called Scaling Factors (SFs) from the climate models, which measure projected changes in rainfall, and applying these to the observations. In previous studies, the SFs were applied directly to Relative Frequency Distributions (RFDs) of past rainfall. The model-based SFs are estimates of the impact of external forcing, often based on a number of different models and simulations so as to greatly reduce the impact of internal variability. These SFs are typically then applied to the observational record to obtain an estimate of future rainfall. Unlike the models, however, we only have one "realization" of the observations. The record will, of course, reflect both internal variability and the impact of external forcing. In some decades internal variability will produce higher rainfall than average, and in other decades lower. This means, for example, that if the observational period occurred when internal variability produced a transient but large shift in the average amount of rainfall, the SF approach would lead to a biased estimate of future rainfall.

This issue is of particular relevance in the context of Victorian rainfall, because Rauniyar and Power (2020) estimated that approximately 80% of the observed decline in rainfall over Victoria for the 1997–2018 period (relative to the 1900–1959 period average) was due to internal variability. Thus, if one applied model-based SFs to observed rainfall over this period then one would expect to obtain a biased-low (i.e., overly pessimistic) estimate of future rainfall. We introduce a new method to help circumvent this problem and produce what we regard as a more reliable estimate of future rainfall. We do this by removing the estimated contribution of external forcing from the observed record before applying relevant SFs based on differences between simulations of both future and the early historical period to the modified observational record. We estimate the impact of this revised approach. We will also quantify the degree to which distributions based on observational records over various periods (e.g., post-1975 and post-1997) approximate what we regard as our best estimates of future rainfall distributions.

To facilitate the communication of our results we will define three Categories of methods to estimate future rainfall:

  • Category 1 (no future climate change): RFDs of observed rainfall are used as a guide to the future

  • Category 2 (basic SF): apply model derived SFs to the observed rainfall

  • Category 3 (adjusted SF): as for Category 2 but with the estimated impact of external forcing removed from the observational data before the SFs are applied.

A more complete description of the methods and the data used is given in the following section. RFDs using Category 1 methods (i.e., based on past observations only) are presented in Sect. 3. In Sect. 4, we present the results using Category 2 and examine the robustness of the SFs derived using the RCP8.5 scenario against the SFs computed using pre-industrial simulations, and the inefficacy of using shorter periods in estimating future rainfall. In Sect. 5, we present the results using Category 3 and compare the future estimates of rainfall using all three Categories. In Sect. 6, we provide the estimate of future rainfall variability and assess the confidence in the methods outlined. Finally, Sect. 7 provides a summary of the results and suggestions for future research.

2 Data and methods

2.1 Observed and model data

We use daily rainfall data gridded at 0.05° × 0.05° spatial resolution for the period 1900–2018 from the Australian Bureau of Meteorology Australian Water Availability Project (AWAP; Jones et al. 2009). This is the same period used by Rauniyar and Power (2020), which enables us to directly use some of their results. These daily data are converted to monthly resolution and re-gridded to 1.5° × 1.5° spatial resolution, which is closer to the resolution of the climate models, by a conservative interpolation method (Jones 1999; Rauniyar et al. 2017). A 119-year-long time-series of Victorian average cool season (April to October inclusive) rainfall is then calculated using area-averaging for each year from 1900 to 2018. This is done to develop a new method (see Sect. 2.2.3) that could exploit the research of Rauniyar and Power (2020) and may alleviate the limitations of existing scaling methods (see Sect. 2.2.2). Note that the new method is applicable to grids at any resolution (e.g., downscaled products); however, it is hard to quantify the climate change signal at individual grid cells, as the signal may not be robust. Furthermore, our confidence in regional projections diminishes as the spatial scale gets smaller, given the coarse resolution of the climate models.

We also use monthly rainfall simulations from the CMIP5 climate models (Taylor et al. 2012). We use the historical rainfall simulations under time-dependent, observed forcings of atmospheric composition (i.e., “historical all forcing” runs) for the 1900–2005 period and future rainfall projections under the highest emission scenario (RCP8.5) for the 2006–2100 period from 40 CMIP5 climate models (see Supplementary Table 1 for a list of models that are common across historical and RCP8.5 simulations). Rainfall from these models is used to estimate the SFs for three different future periods. In addition, we use the long-term model simulations under the preindustrial control (hereafter piCTL) runs from the subset of 31 CMIP5 models which have at least 200 years of piCTL rainfall simulations (Supplementary Table 1). The piCTL simulations are used to estimate the modelled range of internal climate variability statistics (Delworth and Zeng 2014; Rauniyar and Power 2020), as the various forcing agents (the atmospheric concentrations of GHGs, aerosols, ozone, and solar irradiance) are prescribed (fixed) at the preindustrial level (year 1850). The piCTL runs are used to compute the distribution of SFs that can arise from internal variability alone, and to analyse how internal variability affects the dependence of the results on the choice of reference period. Only the first run (r1i1p1) of each CMIP5 model is used in this study. Before computing the Victorian-average rainfall time-series, all the CMIP5 models are interpolated using a conservative mapping approach to a common 1.5° × 1.5° grid.

2.2 Description of methods

As mentioned in the Introduction, three categories of methods are used to project future all-Victoria rainfall (i.e., rainfall averaged over the whole state) under a changing climate at three future time slices, each 30 years in length and centered at: 2025 (2010–2039; near-term), 2055 (2040–2069; medium-term), and 2085 (2070–2099; long-term). However, we consider a variety of slightly different details within each category, resulting in a total of nine different estimates of future rainfall; a summary of these methods is provided in Table 1. To better present and summarise the results, we group them into three different categories according to how they use observations and climate models. Throughout this paper, we adopt the \({R}_{data\ type}^{historical\ period}\) notation to refer to the methods, where R is used for RFD, the subscript ‘data type’ indicates the types of data being used and the superscript ‘historical period’ gives the start and end year of the historical period used to compute the RFD. The valid values for ‘data type’ are: ‘obs’ to indicate that only historical observations are used, ‘obs&mods’ to indicate that model-based scaling factors are used to scale observations, and ‘obs*(Qx)&mods’ to indicate that the contribution of external forcing is removed from historical observations before applying scaling factors. The term ‘Qx’ shows which estimate (quartiles: Q1, Q2, or Q3) of the model-based externally forced contribution to the observed change is being used to remove the influence of external forcing from the historical observations. We will now describe the methods in more detail:

Table 1 Description of the nine methods used in this study to estimate rainfall distributions for the near-term (2010–2039), medium-term (2040–2069), and long-term (2070–2099)

2.2.1 Category 1 (i.e., methods that use observations only)

As stated in the Introduction, the methods in Category 1 utilize the historical observations only and the RFDs are represented using the \({R}_{obs}^{historical\ period}\) notation. We use one of three different historical (baseline) periods, all ending in 2018 (inclusive). These methods presume that rainfall in coming years can be adequately approximated by rainfall experienced during past periods. This is the approach that is sometimes taken or contemplated by water resource managers (Milly et al. 2015; Montanari and Koutsoyiannis 2014; Koutsoyiannis and Montanari 2015; Sun et al. 2018; Wang and Sun 2020). The inclusion of Category 1 enables us to see how well distributions of past observations over particular reference periods approximate future rainfall distributions estimated using more sophisticated approaches. It also allows us to place future rainfall distributions into a historical context.

The first method in Category 1 (hereafter \({R}_{obs}^{1900-2018}\)) incorporates the full historical record (i.e., 1900–2018). This would be most appropriate under the assumption of a stationary climate (Milly et al. 2015; Sun et al. 2018; Salas et al. 2018). The second method (\({R}_{obs}^{1975-2018}\)) uses the rainfall distribution only for the more recent period 1975–2018. The third method (\({R}_{obs}^{1997-2018}\)) uses the rainfall distribution for the very recent 1997–2018 period. The last two baseline periods were recommended, in light of VicCI (Hope et al. 2017) and VicWaCI (DELWP et al. 2020) research highlighting the recent dry conditions and the prospects for future drying in response to increases in greenhouse gases (Potter et al. 2016; Hope et al. 2017), in the "Guidelines for Assessing the Impact of Climate Change on Water Supplies in Victoria" developed by DELWP (DELWP 2016b, 2020). In this study, we use method \({R}_{obs}^{1975-2018}\) as a reference to gauge the changes in future rainfall.

2.2.2 Category 2 (i.e., methods that use observations and models outputs)

Current practice in water management, both in Victoria and commonly worldwide, is in general to use subsets of historical records to approximate future conditions, and not to use climate model output directly (Chiew et al. 2009a, b; Westra et al. 2010; Potter et al. 2016). As climate scientists, we know that climate models provide information on future conditions and we wish to make use of this information in providing estimates of future rainfall. On the other hand, we want to avoid the situation in which water managers would need to overhaul the approach they take for estimating conditions in the near-term. Category 2 is designed with these issues in mind. Category 2 comprises methods in which SFs (denoted as Γ) derived from climate models are applied to the area-averaged Victorian observed rainfall time-series. SFs are calculated using the area-averaged model-simulated Victorian rainfall time-series for each decile bin, and they are then used to adjust the observed rainfall values inside each decile bin of the historical period chosen. This ensures that information from climate models on future change is incorporated, while preserving important statistical properties of the observational data (Chiew et al. 2009a). The methods in Category 2 are referred to as \({R}_{obs\&mods}^{1900-2018}\text{, } {R}_{obs\&mods}^{1975-2018} {\text{ and }} {R}_{obs\&mods}^{1997-2018}\), where the subscript ‘obs&mods’ denotes that ‘obs’ is scaled by the model-based scaling factors to compute the RFDs (i.e., the methods in Category 2 are the scaled versions of the three methods in Category 1).

SFs are calculated separately for the near-term (2010–2039), medium-term (2040–2069), and long-term (2070–2099) futures. The decile SFs for each model j, \({\Gamma}_{j}\), are computed using that model’s projection and its historical simulation. The mean of each decile bin for the (cool season) rainfall time series is calculated for the historical period and then for the future period. The SF for each decile bin i (1, 2, ..., 10) of each model j (1, 2, ..., 40) is given by:

$${\Gamma}_{i, j}= {\mu}_{i, j}^{future}/{\mu}_{i, j}^{past}$$
(1)

where \({\mu}_{i, j}^{future} \text{ and } {\mu}_{i, j}^{past}\) are the averages of the values inside the ith decile bin of model j's rainfall simulation for the future and historical (past) periods, respectively. Note that decile 1 contains the lowest 10% and decile 10 contains the highest 10% of rainfall values. The observed rainfall values in decile bin i are then scaled using model j's SFs:

$${\widehat{R}}_{i, j}={\Gamma}_{i, j}\times {R}_{i,j}$$
(2)
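
Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the code used in this study: the function names are hypothetical, the input series are synthetic, and the rule used to place bin edges when the record length is not a multiple of ten (the integer floor of the rank boundaries) is an assumption.

```python
from statistics import mean


def decile_bins(values):
    """Return 10 lists of indices, ordered from driest (bin 1) to wettest (bin 10).

    Assumption: bin edges sit at floor(i * n / 10) of the ranked series.
    """
    n = len(values)
    if n < 10:
        raise ValueError("need at least 10 values to form decile bins")
    order = sorted(range(n), key=lambda k: values[k])
    return [order[(i * n) // 10:((i + 1) * n) // 10] for i in range(10)]


def decile_means(values):
    """Mean of the values inside each decile bin."""
    return [mean([values[k] for k in idx]) for idx in decile_bins(values)]


def scaling_factors(model_past, model_future):
    """Eq. (1): ratio of future to past decile-bin means for a single model."""
    future = decile_means(model_future)
    past = decile_means(model_past)
    return [f / p for f, p in zip(future, past)]


def scale_observations(obs, sf):
    """Eq. (2): scale each observed value by the SF of its own decile bin."""
    scaled = list(obs)
    for i, idx in enumerate(decile_bins(obs)):
        for k in idx:
            scaled[k] = sf[i] * obs[k]
    return scaled


# Synthetic example (invented numbers, mm per month), not study data.
model_past = [40.0 + 0.9 * k for k in range(44)]            # 44 "historical" years
model_future = [0.9 * (40.0 + 1.3 * k) for k in range(30)]  # 30 "future" years
obs = [45.0 + ((7 * k) % 44) for k in range(44)]
sf = scaling_factors(model_past, model_future)
scaled_obs = scale_observations(obs, sf)
```

The scaled series keeps the chronological order of the observations, so the year-to-year sequencing of the observed record is preserved while the magnitude in each decile bin is adjusted.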

For all three methods in Category 2, the above steps generate 40 time-series of scaled rainfall observations for each future (e.g., near-, medium- and long-term) period. Next, the RFDs or probability density functions (PDFs) of the scaled observations are estimated by fitting a gamma distribution. Finally, the multi-model median (MMMed) PDFs are estimated for all the individual future periods for all three methods in Category 2. The second method in Category 2 (i.e., \({R}_{obs\&mods}^{1975-2018}\)) is very similar to the scaling method used by Potter et al. (2016) and adopted by DELWP (2020), except that Potter et al. (2016) use the 1986–2005 period as the historical period to compute the SFs instead of the period we use (i.e., 1975–2018). However, the results are very similar whether the SFs are based on the 1975–2018 period or 1986–2005 period (not shown).
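
The gamma fit and the multi-model median step can be illustrated as follows. This is a sketch only: the fitting procedure is not specified above, so a method-of-moments fit is assumed here, and the MMMed PDF is assumed to be the pointwise median of the individual models' fitted PDFs on a common rainfall grid. All function names and input values are hypothetical.

```python
import math
from statistics import mean, median, variance


def fit_gamma_moments(sample):
    """Method-of-moments gamma fit (an assumed fitting choice): (shape, scale)."""
    m, v = mean(sample), variance(sample)
    return m * m / v, v / m


def gamma_pdf(x, shape, scale):
    """Gamma probability density at x, evaluated in log space for stability."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def multi_model_median_pdf(scaled_series, grid):
    """Pointwise median, across models, of the fitted gamma PDFs on a common grid."""
    params = [fit_gamma_moments(s) for s in scaled_series]
    return [median([gamma_pdf(x, k, theta) for k, theta in params]) for x in grid]


# Synthetic stand-in for three models' scaled observations (invented values).
base = [45.0, 52.0, 58.0, 61.0, 64.0, 70.0, 77.0, 83.0]
scaled_series = [[f * r for r in base] for f in (0.85, 0.92, 1.0)]
grid = [0.5 * g for g in range(1, 301)]  # 0.5 to 150 mm per month
mmmed_pdf = multi_model_median_pdf(scaled_series, grid)
```

Under this assumption the pointwise median is not guaranteed to integrate exactly to one, so it may need renormalising before it is read as a probability density.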

2.2.3 Category 3 (i.e., methods that use adjusted-observations and models outputs)

As noted in Sect. 1, one limitation of the scaling methods in Category 2 is that the historical periods (and future periods) chosen are of finite length. Strictly speaking, the methods would work best if the periods chosen were sufficiently long that the sample means closely matched the population means or, equivalently, had averages that were not markedly affected by internal variability or external forcing. Unfortunately, and again as noted above, Rauniyar and Power (2020) concluded that this is not the case for the period 1997–2018. They estimated that 80% of the rainfall decline during the period 1997–2018 (relative to 1900–1959) was due to internal variability. So, applying scaling to this baseline period might give estimates of future rainfall that are lower than models suggest they should be. Furthermore, the SF method works best when the contribution of internal variability to the period average is close to zero, which is not the case for shorter periods. We attempt to reduce this problem using the Category 3 methods.

There are two differences between the Category 3 and Category 2 methods. First, the Category 3 methods begin by removing the impact of external forcing on the historical data, estimated using the method described below and by Rauniyar and Power (2020). Specifically, the contribution of externally-forced drying in recent decades is estimated by determining the proportional contribution of the multi-model median rainfall change to the observed change (Rauniyar and Power 2020). All the changes are calculated relative to the 1900–1959 period, assuming that the influence of climate change, if any, on rainfall variability is much less in earlier decades. This also enables a direct comparison of our results with the results of Rauniyar and Power (2020). Removal of the contribution of external forcing results in an estimate of the historical record that would have occurred in the absence of any external forcing. It is therefore an estimate of the rainfall variability that would occur from internal variability alone. Second, the SFs applied to this modified observational record are based on a comparison between early twentieth century historical runs (in which there are minimal changes in external forcing, if any, and essentially all the variability arises from internal variability) and the simulations of future climate. This is a novel approach to estimating future rainfall distributions.

Firstly, we estimate the contribution of external forcing to the observed decline for the 1975–2018 period following the method described in Rauniyar and Power (2020). Selection of this period is based on the fact that there is a negligible contribution of climate change to Victorian rainfall prior to 1975 (Timbal et al. 2016; Hope et al. 2017; Jones and Ricketts 2017; DELWP et al. 2020). Furthermore, the 1975–2018 period is important for DELWP as it is recommended for use as a historical reference period to generate future climate scenarios (DELWP 2020). Secondly, the proportion of the external forcing contribution is removed from 1975 onward and the adjusted time-series is combined with the raw observations prior to 1975 (i.e., 1900–1974). This process results in the longest possible historical rainfall record (i.e., 1900–2018) for Victoria, but with the influence of climate change reduced (i.e., effectively rendering the longest possible stationary time-series). Finally, SFs based on the 1900–1974 period are applied to the adjusted observations for the 1900–2018 period. We will refer to this method as \({R}_{obs*(Q2)\&mods}^{1900-2018}\), where Q2 in the subscript indicates that the median value of the model estimates of the external forcing contribution to the 1975–2018 rainfall decline is used to remove the influence of external forcing from the historical observations. The term ‘obs*’ in the subscript indicates that the contribution of external forcing is removed from the historical observations before model-based scaling factors are applied.

Additional methods are included in Category 3 because Rauniyar and Power (2020) pointed out that there is considerable uncertainty in their estimate of the contribution of internal variability and external forcing to observed rainfall change. We therefore include two additional methods using the first quartile (Q1) and the third quartile (Q3) of the estimated contribution of external forcing to the 1975–2018 rainfall decline, as estimated by Rauniyar and Power (2020). These last two methods will be referred to as \({R}_{obs*(Q1)\&mods}^{1900-2018}\) and \({R}_{obs*(Q3)\&mods}^{1900-2018}\), respectively.
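
The adjustment step of the Category 3 methods can be sketched as below. The exact form of the removal is not given in this section, so the sketch assumes that the forced share of the change in the post-1975 mean (relative to the 1900–1959 mean) is removed as a uniform additive shift. The function name, the synthetic record and the three forced fractions standing in for Q1, Q2 and Q3 are all hypothetical and are not values from Rauniyar and Power (2020).

```python
from statistics import mean


def remove_forced_change(years, rain, forced_fraction,
                         ref_period=(1900, 1959), adjust_from=1975):
    """Remove an assumed externally forced share of the post-1975 rainfall change.

    forced_fraction: share (0 to 1) of the change in the post-`adjust_from` mean,
    relative to the `ref_period` mean, attributed to external forcing.
    Assumption: the forced change is removed as a uniform additive shift.
    """
    ref_mean = mean([r for y, r in zip(years, rain)
                     if ref_period[0] <= y <= ref_period[1]])
    recent_mean = mean([r for y, r in zip(years, rain) if y >= adjust_from])
    forced_change = forced_fraction * (recent_mean - ref_mean)  # negative if drying
    return [r - forced_change if y >= adjust_from else r
            for y, r in zip(years, rain)]


# Synthetic record (invented values): 64 mm/month before 1975, 60 mm/month after.
years = list(range(1900, 2019))
rain = [64.0 if y < 1975 else 60.0 for y in years]
# Hypothetical forced shares standing in for the Q1, Q2 and Q3 estimates.
adjusted = {q: remove_forced_change(years, rain, f)
            for q, f in {"Q1": 0.1, "Q2": 0.25, "Q3": 0.5}.items()}
```

The years before 1975 are returned unchanged, so the output is the combined 1900–2018 record to which SFs based on the 1900–1974 period would then be applied.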

3 RFDs based on observations-only (Category 1 methods)

In this section, we present the results using the three different Category 1 methods (Fig. 1), which all utilize past observations only. Figure 1 shows the RFDs for the periods 1900–2018, 1975–2018, and 1997–2018. While all three RFDs are different, only the distribution for the 1997–2018 period (shown in red in Fig. 1) is statistically different from the other two distributions at the 90% level based on a two-sided Wilcoxon rank-sum test (Fay and Proschan 2010). The average rainfall for these periods is 63.4 mm month−1 for the full period (i.e., 1900–2018), 61.3 mm month−1 for 1975–2018 and 56.3 mm month−1 for 1997–2018. Other key statistics for these periods and other historical periods of interest for this study are summarized in Table 2.

Fig. 1

Relative frequency distributions (RFDs) of observed area-averaged Victorian cool season mean rainfall for three different historical periods that are used in Category 1 methods (see Table 1). The first method (\({R}_{obs}^{1900-2018}\)) uses the full historical record 1900–2018 (black); the second method (\({R}_{obs}^{1975-2018}\)) uses the 1975–2018 period (blue) and the third method (\({R}_{obs}^{1997-2018}\)) uses the 1997–2018 period (red). Each RFD is the best-fit Gamma distribution. DELWP recommends the post-1975 period as the historical (current) climate reference period

Table 2 Key statistics of observed rainfall (mm month−1) for different historical periods used in this study

Given that anthropogenic climate change has exacerbated drying from internal climate variability over Victoria in recent decades (Cai et al. 2014; Hope et al. 2017; DELWP et al. 2020; Rauniyar and Power 2020), the RFD for the full historical period (\({R}_{obs}^{1900-2018}\)) may overestimate future rainfall. The third method (\({R}_{obs}^{1997-2018}\)), which utilizes the recent observations from the beginning of the MD to near-present (i.e., 1997–2018), indicates the driest future of the three periods analysed. However, it seems that the post-1997 period may not be long enough to adequately represent the range of rainfall variability. This can be seen in Fig. 1, which shows that the lowest rainfall on record actually occurred outside the 1997–2018 period (see also Table 2). Furthermore, this method (i.e., \({R}_{obs}^{1997-2018}\)) assumes that the decline in rainfall during 1997–2018 relative to the earlier record was entirely due to external forcing, whereas Rauniyar and Power (2020) concluded that the observed decline in rainfall since 1997 was dominated by internal climate processes. In addition, the projected rainfall reductions from external forcing for 2030 across Victoria are smaller than the observed decline in rainfall for the post-1997 period (e.g., Grose et al. 2015; Timbal et al. 2016; Hope et al. 2017), and hence this period alone (i.e., 1997–2018) may not be a good representation of future rainfall over Victoria. It may be that the RFD for the second method (\({R}_{obs}^{1975-2018}\); shown in blue in Fig. 1) is a good approximation, at least for near-term rainfall, as it includes a known influence from climate change and is long enough to incorporate a wide range of natural forcing and internal climate variability over Victoria. It is also consistent with the World Meteorological Organization (WMO) convention of using a period of at least 30 years to represent current climate.
Previous studies (Jones 2012; Jones and Ricketts 2017) have shown that the start date of this method (i.e., 1975) broadly aligns with the apparent, observed step changes in climate variables, particularly temperature, in the 1970s. Finally, the post-1975 period is recommended by DELWP for use in near-term planning decisions. We will return to the appropriateness of using 1975–2018 and other historical periods as a guide to future rainfall in Sect. 5.3.

4 Future estimates of rainfall by scaling observations (Category 2 methods)

4.1 Robustness of the scaling factors

Before we examine PDFs using Category 2 methods, it is instructive to know how large the SFs for the high emission scenario (i.e., RCP8.5) are compared to the SFs that arise simply from internally-generated variability (e.g., from piCTL runs) or from randomly generated Gaussian white noise (WNoise). WNoise time-series of 200 years in length are generated using the means and standard deviations from the piCTL runs. Figure 2 shows the distributions of SFs for the long-term (2070–2099) and the near-term (2010–2039) periods using the 31 CMIP5 models that are common across the piCTL, historical and RCP8.5 scenarios (see Supplementary Table 1). For the high emission scenario, the SFs are computed relative to the reference period (1975–2018). For the piCTL runs, however, 200 years of rainfall simulations are separated into two segments of different lengths, in a way similar to the historical plus RCP8.5 scenario, and then the SFs are computed. The same steps are repeated on a randomly generated WNoise time-series with 200 samples. As expected, the MMMed values of the SFs for both the piCTL runs (green circle) and the randomly generated WNoise (blue circle) are located close to 1.0 at every decile bin (Fig. 2), reflecting the absence of externally-forced changes in the piCTL runs. The IQRs of the SFs for the piCTL and WNoise are within ± 4% of 1.0, while 90% of the SFs lie within ± 9% of 1.0. The Wilcoxon rank-sum test (Fay and Proschan 2010) shows that the distributions of SFs for piCTL and for WNoise are not statistically different from each other at the 95% significance level. In contrast, the differences between the distributions of the SFs under the high emission scenario and those of either the piCTL runs or WNoise are statistically significant at the 95% level based on the Wilcoxon rank-sum test at all deciles for the long-term, but only for deciles 0–6 in the near-term (shaded boxes in Fig. 2).
The large inter-model spread in the SFs for the high emission scenario reflects the fact that the externally-forced response varies from model to model. This spread reflects uncertainty in the precise value of SFs, due to the presence of internal variability and differences in factors such as climate sensitivity and circulation changes among the models. Nonetheless, the MMMed values under RCP8.5 are significantly less than 1.0 and are also located outside the IQR values of both the piCTL and the WNoise distributions at most of the decile bins, except at the upper deciles of the near-term period.
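
The white-noise null case can be illustrated with a short simulation. This is a sketch under stated assumptions: the mean and standard deviation are invented rather than taken from the piCTL runs, the first 44 values of each 200-value series are treated as the "historical" segment and the last 30 as the "future" segment, and the function names are hypothetical.

```python
import random
from statistics import mean, median


def decile_means(values):
    """Mean of each of the 10 rank-ordered decile bins (bin 1 = driest)."""
    ranked = sorted(values)
    n = len(ranked)
    return [mean(ranked[(i * n) // 10:((i + 1) * n) // 10]) for i in range(10)]


def null_scaling_factors(series, n_past=44, n_future=30):
    """Decile SFs between two segments of one unforced series.

    Assumption: the first n_past values act as the 'historical' segment and the
    last n_future values as the 'future' segment.
    """
    past, future = series[:n_past], series[-n_future:]
    return [f / p for f, p in zip(decile_means(future), decile_means(past))]


def white_noise_sfs(n_series, mu, sigma, length=200, seed=0):
    """SFs from n_series Gaussian white-noise series with the given mean and s.d."""
    rng = random.Random(seed)
    return [null_scaling_factors([rng.gauss(mu, sigma) for _ in range(length)])
            for _ in range(n_series)]


# Invented mean and standard deviation (mm per month), not piCTL statistics.
sfs = white_noise_sfs(n_series=1000, mu=60.0, sigma=10.0)
median_sf_by_decile = [median([s[i] for s in sfs]) for i in range(10)]
```

With no forced change in the series, the median SF in every decile bin should sit close to 1.0, while individual realisations scatter around it; that scatter is the null spread against which the RCP8.5 SFs are compared.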

Fig. 2

Distribution of the decile scaling factors (SFs) for each decile bin, relative to the historical reference period (1975–2018), for (a) the long-term (2070–2099) and (b) the near-term (2010–2039) periods. The SFs are based on the 31 CMIP5 models which have piCTL, historical and RCP8.5 runs. The horizontal line in each box indicates the median, the box represents the inter-quartile range (IQR; 25th to 75th percentiles) and the whiskers indicate the minimum and the maximum values. The median values for the piCTL runs and for randomly generated white noise (WNoise) are overlaid on the box-plots as green and blue circles, with the corresponding IQRs represented by green and blue vertical lines, respectively. The shaded boxes represent the distributions that are statistically different at the 95% level from the distributions of the piCTL or WNoise. The Wilcoxon rank-sum statistical test is applied to evaluate the statistical significance. The likelihood of the RCP8.5 scenario's multi-model median (MMMed) values occurring by chance is shown in parentheses

We estimated the probability of occurrence of the median values under RCP8.5 by random resampling (i.e., bootstrapping) of the SFs under the piCTL runs. We found that the likelihood of obtaining or exceeding the median values under RCP8.5 through internally-generated variability alone is < 1% (i.e., below the 1st percentile) up to the 6th decile bin for the long-term period. The likelihood increases to around 7% for the 7th–9th decile bins and reaches approximately 16% for the 10th decile bin. For the medium-term period (2040–2069), the distributions of the SFs, including the medians, under the high emissions scenario are also found to be robust at all decile bins except the 10th, where there is a 37% probability that the median could occur by chance (not shown). For the near-term period, when the impact of GHG forcing is more modest, the MMMed values of the SFs are less than 1.0 for all decile bins, but the differences between the near-term SFs under the high emissions scenario and the piCTL SFs are statistically significant only in the mid and lower decile bins (Fig. 2b). We also found that the probability of obtaining or exceeding an SF as large as the MMMed SFs in the near-term through internally-generated variability varies between 25 and 35% for bins above the 6th decile and between 13 and 18% for the lower decile bins.
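A bootstrap estimate of this kind can be sketched as follows. The exact resampling design used in the study is not specified here, so this sketch assumes that the piCTL SFs for one decile bin are resampled with replacement, that the median of each resample is taken, and that "obtaining or exceeding" a drying signal means a resampled median at or below the forced median. The function name `chance_probability` and all numerical values are hypothetical.

```python
import random
import statistics

def chance_probability(pictl_sfs, forced_median, n_boot=5000, seed=0):
    """Fraction of bootstrap medians of piCTL SFs at or below forced_median."""
    rng = random.Random(seed)
    n = len(pictl_sfs)
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(pictl_sfs, k=n)  # sample with replacement
        if statistics.median(resample) <= forced_median:
            hits += 1
    return hits / n_boot

# Hypothetical piCTL SFs for a single decile bin, scattered around 1.0.
rng = random.Random(1)
pictl = [rng.gauss(1.0, 0.05) for _ in range(31)]

p_strong = chance_probability(pictl, forced_median=0.85)  # strong forced drying
p_weak = chance_probability(pictl, forced_median=1.0)     # no forced change
print(p_strong, p_weak)
```

A forced median far below the piCTL spread yields a probability near zero, whereas a forced median close to 1.0 cannot be distinguished from internal variability.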

These findings on the statistical significance of the SFs, including the median under RCP8.5, relative to any other historical periods (e.g., 1900–2018, 1900–1974 or 1997–2018) are very similar to the 1975–2018 period, except for the SFs computed using the post-1997 historical period for the near-term (not shown). In this case, only the SF distributions of the 2nd, 4th and 5th decile bins are statistically different to the corresponding piCTL distributions at the 95% level (not shown).

4.2 RFDs based on scaled observations

In this section, we examine the estimates of future rainfall PDFs using the three methods in Category 2, i.e., \({R}_{obs\&mods}^{1900-2018}\text{, } {R}_{obs\&mods}^{1975-2018} {\text{ and }} {R}_{obs\&mods}^{1997-2018}\), in which scaling factors derived from climate model simulations are applied to observations for the periods indicated (i.e., superscripts). The estimates of rainfall distributions for long-term and near-term futures from these approaches are shown in Fig. 3. Irrespective of the historical period used for scaling, there is large variability in the estimated future rainfall distributions (gray lines), arising from different model responses to forcing and from internal variability. In addition, the spread among the rainfall PDFs is wider for the long-term future than for the near-term future. This is because the external forcing signal increases in the longer term and, with it, the model-to-model differences in response.

Fig. 3

Estimates of area-averaged Victorian cool season mean rainfall distributions using the methods in Category 2 for long-term (left column) and near-term (right column) futures. These distributions are the scaled versions of the three methods in Category 1: (first row) the full historical period (\({R}_{obs\&mods}^{1900-2018}\)), (second row) the 1975–2018 period (\({R}_{obs\&mods}^{1975-2018}\)) and (third row) the 1997–2018 period (\({R}_{obs\&mods}^{1997-2018}\)). The estimated PDFs based on individual model SFs are shown in gray and the MMMed PDFs are represented by the dashed lines, while the RFDs for the observed rainfall are represented by the solid lines. All the data have been fitted with Gamma distributions. Summary statistics for each plot are shown in Table 3

Despite large variation from one model to the next, the vast majority of models have dry distribution "tails" that are drier than for the observations (solid lines in Fig. 3), and the MMMed dry tail (dashed lines in Fig. 3) is well below the observational dry tail. This is in sharp contrast to the situation at the high end of the distributions, where the MMMed wet tail (dashed lines in Fig. 3) is similar to the observations. Note that these MMMed distributions represent the best estimate of the impact of anthropogenic forcing on rainfall, as the differing internal variability across ensemble members will tend to cancel out. Table 3 summarizes the percentage changes in key statistics of the scaled-versions of the historical periods relative to their raw-versions. The median values of the scaled-versions of the different historical periods are projected to be about 2–4% lower than the medians of their raw-versions for the near-term period. The equivalent figures for the medium- and long-term are 6.5–7.5% and 13–14%, respectively.

Table 3 Percentage changes in key statistics of future estimates of rainfall for the methods in Categories 2 and 3 at the near term (2010–2039), medium term (2040–2069) and long term (2070–2099)

All the methods in Category 2 project a general decline in rainfall for all parts of future MMMed distributions compared to the PDF of the reference period (\({R}_{obs}^{1975-2018}\)), except the scaled-version of the upper tail values (Fig. 3; Table 3), where there is little change under all three methods (i.e., \({R}_{obs\&mods}^{1900-2018}\), \({R}_{obs\&mods}^{1975-2018}\) and \({R}_{obs\&mods}^{1997-2018}\)). In addition, the decline in rainfall is largest for the scaled-version of post-1997 rainfall (\({R}_{obs\&mods}^{1997-2018}\)), a projected median that is lower than the median of the 1975–2018 baseline by 11% in the near-term and 21% in the long-term. But, as noted in the Introduction, we have less confidence in the future projections of rainfall based on the scaled-version of the 1997–2018 period because Rauniyar and Power (2020) concluded that there is a very large contribution of internal variability to the 1997–2018 rainfall decline, and so applying SFs to this period is expected to underestimate future rainfall. We will return to this point when we examine results using Category 3 methods in Sect. 5.3.

4.3 Caveats of choosing shorter periods for scaling

To examine the impact of choosing shorter periods on estimates of future rainfall, we use the rainfall simulations under the piCTL runs and the high emission (RCP8.5) scenario. The benefit of using the piCTL runs is that they contain no climate change signal, so any differences in the future distributions can arise only from internal variability in the piCTL runs. This helps us to quantify the variations in the estimates of future rainfall distributions that arise entirely from internal variability when shorter (45-year) periods from piCTL runs are used. To do this, one model is taken out of the sample of piCTL models, and the SFs are calculated from the piCTL period (200 years) to the near-term (2010–2039) and long-term (2070–2099) future projections (under RCP8.5) of the remaining 30 models. From these 30 samples of the SFs, the MMMed SFs are computed for all deciles. The whole time-series (1650–1849; 200 years) of the piCTL run of the model that was taken out is then scaled using the MMMed SFs, which provides the best estimate of future rainfall due to external forcing (dashed black line in Fig. 4). The MMMed SFs are selected because the uncertainty in the SFs is substantially reduced when the different phases of internal variability among the models average out. Therefore, any variations in the distributions will be due to internal variability in the reference periods only. Finally, the MMMed SFs are applied again, this time separately to different 45-year blocks of that model's piCTL run.
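The block experiment can be sketched in a few lines. The sketch assumes that applying decile SFs means multiplying each value by the SF of the decile bin into which its rank falls within the series being scaled. The held-out piCTL run is replaced by synthetic white noise and the MMMed SFs are illustrative numbers, not values from the study.

```python
import random
import statistics

def apply_decile_sfs(series, sfs):
    """Scale each value by the SF of the decile bin its rank falls in."""
    n, n_bins = len(series), len(sfs)
    order = sorted(range(n), key=lambda i: series[i])
    scaled = list(series)
    for rank, idx in enumerate(order):
        scaled[idx] = series[idx] * sfs[min(rank * n_bins // n, n_bins - 1)]
    return scaled

def block_medians(series, sfs, block_len=45):
    """Median of the scaled series for each complete block of block_len years."""
    n_blocks = len(series) // block_len
    return [
        statistics.median(
            apply_decile_sfs(series[b * block_len:(b + 1) * block_len], sfs)
        )
        for b in range(n_blocks)
    ]

rng = random.Random(42)
# Hypothetical stand-in for the 200-year piCTL run of the held-out model.
pictl_run = [rng.gauss(64.0, 10.0) for _ in range(200)]
# Illustrative MMMed SFs only (drier at the low deciles, no change at the top).
mmmed_sfs = [0.80, 0.82, 0.84, 0.86, 0.87, 0.88, 0.90, 0.93, 0.96, 1.00]

best_estimate = statistics.median(apply_decile_sfs(pictl_run, mmmed_sfs))
per_block = block_medians(pictl_run, mmmed_sfs)
spread_pct = 100 * (max(per_block) - min(per_block)) / best_estimate
print(round(best_estimate, 2), [round(m, 2) for m in per_block], round(spread_pct, 1))
```

Since the same SFs are applied to every block, any difference between the block medians in this sketch comes from the sampling of the unforced series alone.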

Fig. 4

Impact of internal variability on estimates of area-averaged Victorian cool season mean rainfall for a long-term and b near-term futures, when different and shorter periods are chosen as baselines for a single model. The black solid line represents the PDF of full period (1650–1849; 200-years) of MPI-ESM-MR model's piCTL simulation, while the dashed black line represents the scaled version of the full piCTL period using the MMMed SFs. The coloured lines are the scaled versions of different 45-year blocks as shown in the legends in the piCTL run of the same model. SFs are based on the piCTL runs and RCP8.5 scenarios

Figure 4 shows that the estimates of rainfall for long- and near-term futures based on different 45-year blocks (colored lines) are scattered around the best estimate (dashed black line) because of internal variability in the selected reference periods. It is clear that when the SFs are applied to a drier period, the future distribution will overestimate the best estimate of drying, as can be seen for the 2nd 45-year block (dashed red line in Fig. 4). Similarly, when the reference period is wetter, the estimated future will be less dry than the best estimate (i.e., the drying is underestimated), as can be seen for the 4th 45-year block (dashed blue line in Fig. 4). The difference between the estimated medians of the driest and wettest periods is more than 12%, which shows that picking a particular observed period to estimate the future could misrepresent the expected climate, because internal variability may have been drying things out or making them wetter during that period. This is what is happening for the post-1997 period (Fig. 3c, f), as the drying in this period is largely dominated by internal variability (Rauniyar and Power 2020). Therefore, applying the SFs to this (internally-driven) dry period would overestimate the expected drying. This analysis also illustrates a major shortcoming of the SF method: its results are highly sensitive to internal variability when it is applied to raw data without adjustment for external forcing. The next section deals with this issue.

5 Future estimates of rainfall by scaling adjusted-observations (Category 3 methods)

In Sect. 4.3, we showed that the SFs need to be applied to the longest possible period of observed data with no external forcing component in it for greatest confidence in projected changes. In this section, we describe results using Category 3 methods, which remove the model estimates of external forcing from the historical observation before applying the scaling factors. We begin by explaining how the contribution of externally-forced response is removed from the post-1975 observational data.

5.1 Removing the contribution of external forcing

Following the method described in Rauniyar and Power (2020), we found that external forcing contributes approximately 20% (IQR 72 to − 20%) of the observed reduction in the cool season rainfall over Victoria for the 1975–2018 period relative to the 1900–1959 period (see Sect. 2.2.3 for further details on how to estimate the contribution of external forcing). On average, the observed cool season rainfall over Victoria for the 1975–2018 period is around 2.82 mm month\(^{-1}\) (4.4%) below the observed 1900–1959 period average (i.e., 64.09 mm month\(^{-1}\)). So, to remove the external forcing contribution from the past observations, we added 0.56 mm month\(^{-1}\) (i.e., 20% of 2.82 mm month\(^{-1}\)) to each rainfall record during the 1975–2018 period. The resulting adjusted rainfall is then combined with the raw rainfall records for the period 1900–1974, when the impact of external forcing on rainfall is small according to previous research (e.g., Grose et al. 2015; Timbal et al. 2016; Hope et al. 2017). This produces a continuous historical record for the period 1900–2018, from which our estimate of the climate change signal has been removed. The record is therefore expected to be very largely dominated by internal variability alone.

We repeated the above steps using the IQR values (72% and − 20% of 2.82 mm month\(^{-1}\)) to account for the large uncertainty in the estimates of the external forcing contribution. The RFDs of the adjusted full historical records for all three cases are shown in Fig. 5. Comparison of the distributions of the adjusted records with each other and with the distribution of the raw data shows that they are not strikingly different. Nevertheless, the removal of the external forcing signal based on the median and the 1st quartile leads to a slightly wetter climate compared to the raw data (compare the green or blue RFDs against the black RFD in Fig. 5). The opposite is true when the external forcing (Q3) acts against the internal variability, in which case its removal shifts the climate to the drier side (compare red against the others). This shows that the RFD of the adjusted records, which retain the internal variability, will shift to either the wetter or the drier side relative to the raw data, depending upon whether the external forcing acts to enhance or suppress the rainfall decline due to internal variability in recent decades. Furthermore, the strength of the shift depends on the magnitude of the external forcing: the stronger the climate change signal, the larger the shift.
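The adjustment amounts to adding a constant offset to every 1975–2018 value. The short sketch below reproduces the arithmetic quoted in the text (20% of 2.82 mm per month, and 2.82 relative to 64.09 mm per month) for the median and the two IQR fractions. The function name and the flat synthetic record are hypothetical and serve only to make the example self-contained.

```python
def remove_external_forcing(years, rainfall, fraction, decline=2.82,
                            start=1975, end=2018):
    """Add back `fraction` of the observed decline (mm per month) for start..end."""
    offset = fraction * decline
    return [r + offset if start <= y <= end else r
            for y, r in zip(years, rainfall)]

years = list(range(1900, 2019))
# Hypothetical flat record set to the 1900-1959 mean quoted in the text.
raw = [64.09] * len(years)

# Median contribution (20%) and the two IQR bounds (72% and -20%).
adjusted = {
    "20%": remove_external_forcing(years, raw, 0.20),
    "72%": remove_external_forcing(years, raw, 0.72),
    "-20%": remove_external_forcing(years, raw, -0.20),
}

offset_median = adjusted["20%"][-1] - raw[-1]   # about 0.56 mm per month
decline_pct = 100 * 2.82 / 64.09                # about 4.4 %
print(round(offset_median, 2), round(decline_pct, 1))
```

A negative fraction lowers the post-1975 values, which corresponds to the case in which the estimated forcing acted against the internally-driven decline.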

Fig. 5

RFDs of the area-averaged Victorian cool season mean rainfall using raw and adjusted historical period rainfall records. Three different magnitudes of external forcing contribution (i.e., median and IQR) for the 1975–2018 period are estimated following the methods developed by Rauniyar and Power (2020) which are then removed from the 1975–2018 period. The adjusted 1975–2018 period is then combined with the raw data for the 1900–1974 period to form three different versions of adjusted full period rainfall records which are then used to compute the RFDs. The raw version is represented by the solid black line (\({R}_{obs}^{1900-2018}\)), the median (Q2) and the IQR (Q1 and Q3) adjusted are shown in solid green (\({R}_{obs*(Q2)}^{1900-2018}\)), dashed blue (\({R}_{obs*(Q1)}^{1900-2018}\)) and dashed red lines (\({R}_{obs*(Q3)}^{1900-2018}\)), respectively

5.2 RFDs based on adjusted and scaled observations

Figure 6 shows the estimates of the rainfall distribution for long-term and near-term futures after scaling the adjusted rainfall records for the 1900–2018 period with the SFs relative to the 1900–1974 period. Only the results for the method \({R}_{obs*(Q2)\&mods}^{1900-2018}\) are presented here, as the results of the other two methods in Category 3 are very similar (not shown). For the long-term period (Fig. 6a), the estimates of future distributions show very wide variations. However, the majority of results exhibit drying for most parts of the distribution (relative to all the observed RFDs) except at the higher values. This suggests that internal variability, combined with climate change, could lead to periods of dryness unprecedented in the historical record. This is consistent with the study of Delage and Power (2020), who found that drier conditions towards the end of the twenty-first century over many parts of Australia are projected to be occasionally punctuated by seasons wetter than the wettest years experienced during the twentieth century. In addition, the lower tail of the MMMed RFD (dotted line) shows that a future estimate of rainfall based on the recent period (i.e., \({R}_{obs}^{1997-2018}\)) could heavily overestimate the lower tail of the MMMed distribution and slightly underestimate its upper tail.

Fig. 6

PDFs of area-averaged Victorian cool season mean rainfall using the scaled-versions of adjusted full historical rainfall records for a long-term and b near-term periods. The full historical period is adjusted for the external forcing component for the 1975–2018 period (using the 50th percentile of contribution, see Sect. 5.1) and then scaled by the model-based SFs relative to 1900–1974 period (\({R}_{obs*(Q2)\&mods}^{1900-2018}\)). Distributions based on individual model SFs are shown as gray lines, while the MMMed distribution is shown as dotted black lines. RFDs using Category 1 methods based on three different historical lengths are also shown using same colour scheme (i.e., black: \({R}_{obs}^{1900-2018}\); blue: \({R}_{obs}^{1975-2018}\); and red: \({R}_{obs}^{1997-2018}\)) as in Fig. 1

For the near-term (Fig. 6b), most of the estimated distributions lie (slightly) below those of the full or 1975–2018 periods. In contrast, most models suggest that rainfall will generally be higher than in the RFD of the 1997–2018 period, except at the lower tail. These findings suggest that there is a much larger chance of receiving rainfall below the lowest annual cool season amount seen during the 1997–2018 period. In the near-term, there is only a small difference between the MMMed distribution based on what we regard as our best method (i.e., \({R}_{obs*(Q2)\&mods}^{1900-2018}\)) and the distribution for the 1975–2018 period (\({R}_{obs}^{1975-2018}\)), except that with \({R}_{obs*(Q2)\&mods}^{1900-2018}\) the likelihood of rainfall occurring outside the range obtained using \({R}_{obs}^{1975-2018}\) is increased. In other words, \({R}_{obs*(Q2)\&mods}^{1900-2018}\) indicates that rainfall extremes beyond those witnessed during 1975–2018 are possible. More specifically, the projected median dryings using our best method for the long-, medium- and near-term are slightly above 10%, close to 4% and < 1% of the 1975–2018 period median value (Table 3), respectively. Similarly, the estimated 5th percentiles for the long-, medium- and near-term are about 16%, 11% and 6% less than the 5th percentile of the observed 1975–2018 period. In contrast, compared to the 95th percentile value of the observed 1975–2018 period, the method estimates no change towards the end of the century, but a slight increase for the medium- and near-term periods (Table 3). Furthermore, the results are not very different when the contribution of external forcing is removed from the post-1997 period (not shown). This shows that the climate change signal during the twentieth century is modest in size compared to the variability, consistent with earlier research (e.g., Rauniyar and Power 2020).

5.3 Inter-comparison of future rainfall estimates

In this section we compare the estimates of future rainfall distributions for long-term and near-term futures using all nine methods (see Fig. 7). Only the MMMed distributions are presented for the methods in Categories 2 and 3. The comparison against what we regard as our best estimate (i.e., \({R}_{obs*(Q2)\&mods}^{1900-2018}\); dotted black line in Fig. 7) shows that the PDF based on the full historical record (black solid line) is grossly biased towards wet conditions compared to the distributions of all future periods during the twenty-first century. This is clearly evident in Fig. 8 (the black dashed lines are always above zero), which shows the percentage changes in rainfall at different percentiles relative to the best estimate. The results indicate that using the full historical period observations (i.e., using \({R}_{obs}^{1900-2018}\)) could overestimate the median of the best distribution (× symbol on aqua-pale colour boxplot in Fig. 8) for the long-term by 15% (Fig. 8b), while the 5th and 95th percentiles are 18% and 4% higher (Fig. 8a, c), respectively. The equivalent figures for \({R}_{obs}^{1975-2018}\) are about 11%, 19% and < 1%, and those for \({R}_{obs}^{1997-2018}\) are 1%, 19% and − 10%.

Fig. 7

Inter-comparison of the best estimates of area-averaged Victorian cool season mean rainfall distributions for a long-term and b near-term futures using all nine methods described in Table 1. The solid lines are the RFDs based on methods in Category 1 (see Fig. 1), which utilize three different historical periods (i.e., black: \({R}_{obs}^{1900-2018}\); blue: \({R}_{obs}^{1975-2018}\); and red: \({R}_{obs}^{1997-2018}\)) to compute the RFDs. The dashed curves are the RFDs based on methods in Category 2 (i.e., black: \({R}_{obs\&mods}^{1900-2018}\); blue: \({R}_{obs\&mods}^{1975-2018}\); and red: \({R}_{obs\&mods}^{1997-2018}\)) and are the scaled-version of the three methods in Category 1. Similarly, the dotted curves are based on methods in Category 3 (i.e., black: \({R}_{obs*(Q2)\&mods}^{1900-2018}\); blue: \({R}_{obs*(Q1)\&mods}^{1900-2018}\); and red: \({R}_{obs*(Q3)\&mods}^{1900-2018}\)), which remove the contribution of external forcing from historical data before applying the model-based SFs. See Sect. 2.2 for further details of the methods used

Fig. 8

Spread of rainfall changes (%) for the future estimates of area-averaged Victorian cool season mean rainfall using different methods, as shown in the X-axis label. All changes are relative to the median values of the best method, \({R}_{obs*(Q2)\&mods}^{1900-2018}\) shown as an ‘x’ symbol. The panels on the left represent the differences for long-term future at the 5th, 50th and 95th percentiles, while the panels on the right are for the near-term. The dashed black, blue and red lines represent the differences for the 1900–2018, 1975–2018 and 1997–2018 periods, respectively. The zero line is shown as a solid gray line

For the near-term, the 1900–2018 period also overestimates the best distribution, although the magnitudes are smaller (i.e., 2.7% for the median, and 5.8% and 1.5% for the 5th and 95th percentiles). On the other hand, the PDF obtained using \({R}_{obs}^{1975-2018}\) closely resembles the PDF of the best estimate (\({R}_{obs*(Q2)\&mods}^{1900-2018}\)), with no change in the median, a 6% higher value at the 5th percentile and a 2% lower value at the 95th percentile. This similarity suggests that the historical reference period of 1975–2018 could be used to approximate the near-future (i.e., 2010–2039) conditions. In contrast, the 1997–2018 period grossly underestimates the median value of the best estimate by 10%, and the 95th percentile by 12% (Fig. 8e, f). However, \({R}_{obs}^{1997-2018}\) overestimates the lower tail by 6%, making the distribution narrow compared to the rest of the distributions (Fig. 7), except the scaled-version of itself (i.e., \({R}_{obs\&mods}^{1997-2018}\)). This suggests that using 1997–2018 to estimate the future would underestimate the risk of very dry conditions compared with using our best method (compare the location of the red dashed line with the × symbol on the aqua-pale color boxplot in Fig. 8d).

The PDF of the scaled-version of the 1997–2018 period (i.e., using \({R}_{obs\&mods}^{1997-2018}\)) exhibits a similar deficiency to its raw version (i.e., \({R}_{obs}^{1997-2018}\)). However, it grossly overestimates the dry tail of the future distribution obtained using \({R}_{obs*(Q2)\&mods}^{1900-2018}\), including in the near-term (red boxplot in Fig. 8). Compared with the best estimate, \({R}_{obs\&mods}^{1997-2018}\) suggests an approximately 10% larger decline in median rainfall across all futures. This is equivalent to roughly 20% less than the median of the 1975–2018 period for the long-term, and about 15% and 10% less for the medium-term and near-term futures. Declines of this size seem very unlikely given that the observed drying since 1997 is found to be predominantly due to natural, internal variability (Rauniyar and Power 2020). Hence, \({R}_{obs\&mods}^{1997-2018}\) may be even more unsuitable than its raw version (i.e., \({R}_{obs}^{1997-2018}\)), and we have much less confidence in this method.

6 Outlook for future rainfall and limitations

6.1 Future rainfall

Figure 9 shows the range of possibilities of future rainfall in any individual year relative to the observed All-Victoria rainfall variability. The future rainfall ranges are shown for the MMMed distributions (dotted PDF in Fig. 7) of the best method (i.e., \({R}_{obs*(Q2)\&mods}^{1900-2018}\)), for near-term (2025), medium-term (2055) and long-term (2085) futures. Taking both the externally-forced change and variability into account, the median rainfall is projected to decrease over the remainder of the twenty-first century, due to external forcing. However, we estimate that there is a 90% chance that in any given year from 2025 onward the rainfall will be in the range that has been experienced historically (within the horizontal dotted red lines in Fig. 9). The flipside of this is that there is a 10% probability that All-Victoria rainfall in any given year could be unprecedented. We also estimate that the probability of being below the observed 5th percentile (i.e., 291 mm) in any given year will increase in the future (see brown area below the observed 5th percentile). According to the best estimate, the likelihood of receiving rainfall less than or equal to the observed 5th percentile is approximately 8%, 12% and 16% for the near-term, medium-term and long-term future periods, respectively. In contrast, the probability of rainfall being greater than or equal to the observed 95th percentile (i.e., 580 mm) in any given year decreases into the future and is approximately 5%, 4% and 2.5% for the near-term, medium-term and long-term periods, respectively. Note, however, that these results are based on the MMMed, and not all models exhibit such a simple monotonic relationship or changes of the same sign as the MMMed.
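Tail probabilities of this kind can be estimated from a fitted Gamma distribution, as in the following sketch. The fitting procedure used in the study is not described in this section, so a simple moment-matching fit and Monte Carlo sampling are assumed here instead. The thresholds of 291 mm and 580 mm are the observed 5th and 95th percentiles quoted above, while the "future" sample is purely synthetic and all function names are hypothetical.

```python
import random
import statistics

def gamma_moment_fit(sample):
    """Moment-matching Gamma fit (an assumed method): returns (shape, scale)."""
    mean = statistics.fmean(sample)
    var = statistics.variance(sample)
    return mean * mean / var, var / mean

def tail_probabilities(shape, scale, low, high, n_draws=100_000, seed=0):
    """Monte Carlo estimates of P(X <= low) and P(X >= high) for a Gamma variable."""
    rng = random.Random(seed)
    draws = [rng.gammavariate(shape, scale) for _ in range(n_draws)]
    below = sum(d <= low for d in draws) / n_draws
    above = sum(d >= high for d in draws) / n_draws
    return below, above

# Hypothetical scaled cool season totals (mm); not data from the study.
rng = random.Random(7)
future_sample = [rng.gauss(420.0, 90.0) for _ in range(119)]

shape, scale = gamma_moment_fit(future_sample)
p_dry, p_wet = tail_probabilities(shape, scale, low=291.0, high=580.0)
print(round(p_dry, 3), round(p_wet, 3))
```

By construction, the observed record itself gives 5% in each tail, so values of `p_dry` above 0.05 or `p_wet` below 0.05 in such a calculation indicate a shift in risk relative to the historical climate.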

Fig. 9

Inter-annual variations of the observed and estimated cool season total rainfall (mm) for Victoria. The vertical bars represent the observed rainfall for the period 1900–2018 in blue (red) colors for values above (below) average of DELWP's baseline (1975–2018) period (thick blue dotted line). Overlaid black and red thick dotted lines are the average of the full historical and post-1997 periods respectively. The orange dashed line represents the projected median rainfall, obtained by linearly interpolating the estimated medians at near-term (2025), medium-term (2055) and long-term (2085). The estimated medians are based on the MMMed distribution (black dotted line in Fig. 6) which uses the method that we have most confidence in (i.e. \({R}_{obs*(Q2)\&mods}^{1900-2018}\)). The brown envelope represents the 5th and 95th percentile range of the same MMMed distributions, while the gray shaded area represents the full range of the same distribution. Thin dotted black and red lines show the 5th–95th percentiles and min–max range of the full historical period

6.2 Assessing the methods in Categories 2 and 3

6.2.1 Cross-validation

In this section we use cross-validation, or "buddy checking", to help assess how well the Category 2 and 3 methods work. This is achieved by taking one model out from the pool of 31 and then applying the SFs of the other models, one by one, to the historical rainfall simulation of the model taken out. This results in 30 different estimates of future rainfall, which are then compared with the selected model’s actual (simulated) future rainfall. The results show that the spread of estimated future rainfall distributions of all the models encompasses the selected model's actual rainfall distribution for all the future periods (Fig. 10a, b). However, the envelope is generally wider for the late twenty-first century (Fig. 10a) and narrower for the near-term period (Fig. 10b). This is expected because the ranges in the SFs are larger for the long-term than for the near-term (see Fig. 2a). In general, the multi-model median of the estimated distributions lies close to the actual distribution for most of the models.

Fig. 10

Comparison of probability of exceedances of future rainfall (gray), with the actual simulated future rainfall (red) for a the long-term and b the near-term periods. Multi-model median (MMMed) of the estimated futures is shown as a solid black line. Future estimates of rainfall are obtained by taking one model out and applying the SFs (relative to 1975–2018 historical period) from the 30 remaining models to that model's historical rainfall simulation. c Distributions of rainfall differences in percentage at each decile bin using all the combinations of actual and estimated future rainfalls (i.e., 31 models and each has 30 estimated futures) are shown as boxplots in blue for the near-term, and in orange for the long-term period. The spread due to internal variability is estimated using the piCTL models and shown as green boxes. The horizontal line in the box indicates the median, the box represents the inter-quartile range and the whiskers indicate the minimum and the maximum values

To evaluate how much of the spread in estimated rainfall under RCP8.5 arises from internal variability, the cross-validation process is repeated on the piCTL runs. For each model, the percentage differences are computed at all the decile bins using the estimated and the actual rainfall distributions of the models. Finally, the percentage differences from all the models are combined to form a sample of 930 differences (31 models × 30 estimates) at each decile bin and are shown as boxplots in Fig. 10c. The spread due to the internally-generated variability (green boxplot) ranges between ± 25% with estimated IQR values lying within ± 7% of the actual rainfall. On the other hand, there is a large spread under RCP8.5 runs, especially for the long-term period which has a full-range between 50 and 90% and IQR values of ± 20% of the actual rainfall. Comparison with the spread due to internal variability shows that some of the spread under the high emission scenario is due to internal variability, but a larger portion is due to model-to-model differences in the sensitivity to the forcing applied.

In contrast, for the near-term period, the IQR spread of ± 10% is close to that of internally-generated variability, suggesting that the fraction due to different sensitivity is much smaller, as the external forcing signal is not yet dominant. In addition, the differences are skewed towards the higher positive values at all the deciles, which suggests that the rainfall in many of the models is also skewed to the right (i.e., towards heavier rainfall). Nevertheless, the median values of the differences are close to zero across all the deciles for both the near-term and the long-term futures, which suggests that the SF method exhibits no bias towards over- or under-estimation of actual (simulated) rainfall.
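The bookkeeping of the buddy check can be sketched as follows. To stay short, the sketch applies each donor model's decile SFs to the decile-bin means of the held-out model rather than to its full time-series, and it assumes that an SF is the ratio of future to historical decile-bin means. The 31 synthetic "models" are white noise with hypothetical parameters.

```python
import random
import statistics

def decile_bin_means(values, n_bins=10):
    """Sort the sample and return the mean of each of n_bins equal-count bins."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [round(i * n / n_bins) for i in range(n_bins + 1)]
    return [statistics.fmean(ordered[edges[i]:edges[i + 1]]) for i in range(n_bins)]

def buddy_check(hist_runs, future_runs, n_bins=10):
    """Percentage differences (estimated minus actual) per decile bin, over all pairs."""
    hist = [decile_bin_means(h, n_bins) for h in hist_runs]
    fut = [decile_bin_means(f, n_bins) for f in future_runs]
    n_models = len(hist)
    diffs = [[] for _ in range(n_bins)]
    for target in range(n_models):        # the model taken out
        for donor in range(n_models):     # each remaining model supplies SFs
            if donor == target:
                continue
            for b in range(n_bins):
                sf = fut[donor][b] / hist[donor][b]
                estimate = hist[target][b] * sf
                diffs[b].append(100 * (estimate - fut[target][b]) / fut[target][b])
    return diffs

rng = random.Random(3)
# Hypothetical unforced runs: 44 "historical" and 30 "future" years per model.
hist_runs = [[rng.gauss(64.0, 10.0) for _ in range(44)] for _ in range(31)]
future_runs = [[rng.gauss(64.0, 10.0) for _ in range(30)] for _ in range(31)]

diffs = buddy_check(hist_runs, future_runs)
medians = [statistics.median(d) for d in diffs]
print(len(diffs[0]), [round(m, 1) for m in medians])
```

Each decile bin collects 31 × 30 = 930 differences, matching the sample size described above.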

6.2.2 Verification

In this section we assess the ability of the methods using scaling factors to reproduce observed rainfall changes. This constitutes a direct verification of the methods. This is done by scaling the historical observations for the 1900–1959 period with the model-based SFs to replicate the observed rainfall distributions for three different historical periods: the wettest 20-year period (1960–1979), the driest 22-year period (1997–2018) and a slightly drier period (1975–2018). The predicted distributions are then verified against the observed distribution for the corresponding period. The results (Fig. 11) show that the actual distributions (solid red lines) for 1960–1979 and 1975–2018 remain within the limits of the estimated distributions (gray lines). However, for the driest (1997–2018) period, the observed distribution goes beyond the anticipated ranges at several points and the SF technique struggles to capture it (Fig. 11c). In addition, irrespective of whether the dry or wet period is considered, the MMMed distributions (dashed black lines) systematically underestimate the observed distributions, indicating that the SF method underestimates the variability. This is consistent with Rauniyar and Power (2020), who found that the majority of models significantly underestimate the observed rainfall variability over Victoria. Nevertheless, the degree of underestimation is significantly larger for the driest period than for the wettest period (Fig. 11c). This is expected because, in the historical observations, the post-1997 period exhibits strong drying compared to any other 22-year period, and this drying was found to be largely driven by extremely large internal variability (Rauniyar and Power 2020). However, several models failed to reproduce the observed magnitude of the recent drying.

Fig. 11

Comparison of actual and estimated distributions of area-averaged Victorian cool season mean rainfall for three different historical periods: a the wettest 20-year period (1960–1979), b a slightly drier 44-year period (1975–2018), and c the driest 22-year period (1997–2018). The actual RFDs for the three different historical periods (\({R}_{obs}^{1960-1979}\), \({R}_{obs}^{1975-2018}\), and \({R}_{obs}^{1997-2018}\)) are shown as red lines in each panel. The estimated distributions, shown in gray in a–c, are calculated by applying the model-based scaling factors (SFs) for the corresponding periods relative to the 1900–1959 historical reference period. The best estimate is the MMMed RFD, which is shown as a dashed black line in a–c and labelled as (\({R}_{obs\&mods}^{1900-1959}\)). The RFD for the historical period 1900–1959 (\({R}_{obs}^{1900-1959}\)) is shown as a black solid line

These results suggest that the estimated future rainfall distributions may understate the true spread of future rainfall, because internal variability of Victorian rainfall is underestimated by the models (Rauniyar and Power 2020). The results certainly lower the confidence we have in the ability of models to provide projections of future rainfall distributions. The results also illustrate that cross-validation (model buddy checking), while a sensible thing to do, is not sufficient on its own and could lead to overconfidence in estimates of future rainfall distributions.

7 Summary and discussion

Cool season (i.e., April to October) rainfall over Victoria, Australia, during recent decades was unusually low compared to the average of the first six decades of the twentieth century. These persistent dry conditions challenged the assumption of a stationary climate (i.e., that the long-term average will continue unchanged into the future), which leads to uncertainty in hydro-meteorological engineering design and practices (Milly et al. 2015; Rauniyar et al. 2019). Furthermore, some water managers use Relative Frequency Distributions (RFDs) from selected historical reference periods, or “baselines” (CSIRO 2012; DELWP 2016a, 2020), to approximate what they sometimes refer to as “current climate”, which they then use to make decisions over the coming years. Thus their “current climate” is really an estimate of what the climate will be like over the next 5–10 years (DELWP 2020). Early records are sometimes omitted when making baseline choices, as the impact of anthropogenic forcing was smaller then than in more recent decades. However, this omission reduces the duration of the records and increases the likelihood that important aspects of the variability are omitted. Given that anthropogenic change is increasing (IPCC 2014) and this is reflected in Victorian rainfall changes (Hope et al. 2017; Rauniyar and Power 2020), it is not clear which, if any, historical periods will provide a good indication of near-term future rainfall.

To examine these and other issues we analysed historical rainfall observations for the 1900–2018 period to assess the suitability of using any historical period as a "proxy" for expected futures. We also used 40 global CMIP5 climate models forced under historical conditions and a high-emission (RCP8.5) scenario, in conjunction with observed rainfall records, to estimate Victorian rainfall distributions for near-term (2010–2039), medium-term (2040–2069), and long-term (2070–2099) future periods. Rainfall simulations from 31 CMIP5 models under pre-industrial (piCTL) conditions were also used to assess the robustness of decile scaling factors (SFs) against the variations in SFs due to natural internal climate variability.

A total of nine different methods (Table 1), grouped into three categories, were used to provide probability density functions (PDFs) for future rainfall. Category 1 methods are very simple: they use RFDs of observed rainfall over specified historical periods (1900–2018, 1975–2018 and 1997–2018) as estimates of future rainfall PDFs. Category 2 methods utilize decile SFs derived from model simulations, which are then applied to observational data taken from the same three historical reference periods as in Category 1 methods. SFs, which are calculated separately for each decile bin, are equal to the ratio of the average rainfall of each decile bin from a future period of interest to the corresponding decile bin average of the selected historical period (see Eq. 1). The third and final set of methods (Category 3), in which we have most confidence, are very similar to those in Category 2, except that we modify the historical data to reduce the impact of external forcing before applying the SFs. The impact of external forcing is estimated using the method described by Rauniyar and Power (2020). The SFs in this case are based on a comparison between early twentieth century historical runs (i.e., 1900–1974, when the impact of external forcing is assumed to be small) and twenty-first century simulations under RCP8.5. The modified 1975–2018 records are then combined with the raw observations for the period 1900–1974 to construct the adjusted records for the full historical period, in which the impact of external forcing is reduced. The future rainfall distributions are then estimated by applying the SFs (based on the 1900–1974 period) to the adjusted data for the full period.
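The decile scaling described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in this study: the function names, the equal-count splitting of sorted values into ten bins, and the toy rainfall values are hypothetical, and the exact form of Eq. 1 is not reproduced here. The sketch computes one SF per decile bin as the ratio of the future bin mean to the historical bin mean from a single model, then multiplies each observed value by the SF of the decile bin it falls in.

```python
from statistics import mean


def decile_bounds(n):
    """Index ranges [lo, hi) that split n sorted values into 10 bins."""
    if n < 10:
        raise ValueError("need at least 10 values to form decile bins")
    return [(k * n // 10, (k + 1) * n // 10) for k in range(10)]


def decile_bin_means(values):
    """Mean of each decile bin of the sorted values."""
    ordered = sorted(values)
    return [mean(ordered[lo:hi]) for lo, hi in decile_bounds(len(ordered))]


def decile_scaling_factors(model_hist, model_future):
    """SF per decile bin: future bin mean divided by historical bin mean."""
    hist = decile_bin_means(model_hist)
    fut = decile_bin_means(model_future)
    return [f / h for f, h in zip(fut, hist)]


def apply_scaling_factors(obs, sfs):
    """Scale each observed value by the SF of its own decile bin,
    preserving the original (chronological) order of the observations."""
    order = sorted(range(len(obs)), key=lambda i: obs[i])
    scaled = [0.0] * len(obs)
    for k, (lo, hi) in enumerate(decile_bounds(len(obs))):
        for rank in range(lo, hi):
            scaled[order[rank]] = obs[order[rank]] * sfs[k]
    return scaled


# Hypothetical toy data (mm); not taken from the paper.
model_hist = [300.0 + 10.0 * i for i in range(30)]
model_future = [0.9 * v for v in model_hist]  # uniform 10% drying
sfs = decile_scaling_factors(model_hist, model_future)

obs = [450.0, 320.0, 510.0, 390.0, 610.0, 280.0, 470.0, 430.0, 350.0, 540.0]
scaled_obs = apply_scaling_factors(obs, sfs)
```

Repeating this for every model would give one scaled distribution per model, from which a multi-model median could be taken. For a Category 3 style estimate, `obs` would be the adjusted full-period record rather than the raw observations.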

We have most confidence in estimates based on Category 3 because we found that applying SFs to a shorter period could misrepresent the expected climate: there is a higher chance that a shorter period has been heavily influenced by internal variability that markedly reduced or increased rainfall averaged over the historical reference period (Sect. 4.3). Therefore, the most appropriate method to estimate future rainfall, according to the models, is one that applies appropriate model-derived SFs to the longest possible observational dataset comprising rainfall due to natural, internal variability only. This is what the Category 3 methods do: they remove the impact of external forcing from the post-1975 observational records, following the methods described in Rauniyar and Power (2020), before applying the SFs relative to the 1900–1974 period to the adjusted full historical records.

We found no striking difference among the results of the three methods in Category 3, even though each method uses a different magnitude of external forcing contribution to cater for uncertainty in the models' simulated rainfall change. Therefore, only the results from the method (i.e., \({R}_{obs*(Q2)\&mods}^{1900-2018}\)) that uses the MMMed estimate of the contribution of external forcing to the observed change over 1975–2018 are used to estimate future rainfall PDFs. We found that with all model-based methods there are large model-to-model differences in the estimation of future rainfall PDFs (Fig. 6), which reflect large model-to-model differences in the response of Victorian rainfall to external forcing and the presence of different realizations of internal variability in each model run. Nevertheless, under \({R}_{obs*(Q2)\&mods}^{1900-2018}\), the vast majority of models exhibit drying across most of the distribution, with a larger shift at the lower extremes (relative to all the observed RFDs), except at the higher extremes where changes are small. This tendency is clearly reflected in the multi-model median (MMMed) of the projected PDFs, whose medians are 10%, 4% and < 1% lower than the median of the observed data for the 1975–2018 period, for the long- (2070–2099), medium- (2040–2069) and near-term (2010–2039) periods, respectively (Table 3). Similarly, the estimated 5th percentiles are about 16%, 11% and 6% less than the 5th percentile of the 1975–2018 period for long-, medium- and near-term futures, respectively. These distributions are all lower than the distribution based on observed rainfall for the whole historical (1900–2018) period (i.e., \({R}_{obs}^{1900-2018}\)). In sharp contrast, the RFD based on the recent 22-year period (i.e., \({R}_{obs}^{1997-2018}\)) is too dry and too narrow compared with the estimate we have most confidence in (\({R}_{obs*(Q2)\&mods}^{1900-2018}\)).
The scaled version of the 1997–2018 period (i.e., \({R}_{obs\&mods}^{1997-2018}\)) exhibits deficiencies similar to its raw version. The distributions based on the scaled version of the 1975–2018 period (\({R}_{obs\&mods}^{1975-2018}\)) underestimate the PDF of the best estimate by only 3–5%, and could be a more suitable approximation to future rainfall PDFs. Note that this method is very similar to the method proposed by DELWP (2020).

Comparison of the future rainfall estimates from the method we have most confidence in (i.e., \({R}_{obs*(Q2)\&mods}^{1900-2018}\)) against observed rainfall at the inter-annual time-scale indicates that the median rainfall is projected to keep declining over time. We estimate that in any given year from 2025 onward there is a 10% chance that All-Victoria rainfall will lie outside the range of historical experience. Furthermore, it is likely that the probability of receiving rainfall less than or equal to the 5th percentile of the observed value (i.e., 291 mm) will also increase in future. Compared to the full historical period, we estimate that rainfall below the 5th percentile will become more than 1.5 and 3 times more likely for the near-term and long-term futures, respectively. In contrast, on average, it is likely that the probability of receiving rainfall greater than or equal to the 95th percentile of the observed value (i.e., 580 mm) will decrease marginally over time.
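The "times more likely" comparison above can be made concrete with a small Python sketch. It is a hedged illustration only and not the procedure used in this study: the function names, the linear-interpolation percentile rule, and the toy data are hypothetical. The sketch takes the 5th percentile of a historical sample as the dry threshold, then compares the fraction of years at or below that threshold in a future sample with the same fraction in the historical sample.

```python
def empirical_percentile(values, pct):
    """Empirical percentile (pct in [0, 100]) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def fraction_at_or_below(values, threshold):
    """Fraction of values less than or equal to the threshold."""
    return sum(v <= threshold for v in values) / len(values)


def dry_tail_likelihood_ratio(historical, future, pct=5.0):
    """Return (threshold, ratio), where ratio is the future probability of
    rainfall at or below the historical pct-th percentile divided by the
    historical probability of the same event."""
    threshold = empirical_percentile(historical, pct)
    p_hist = fraction_at_or_below(historical, threshold)
    p_future = fraction_at_or_below(future, threshold)
    return threshold, p_future / p_hist


# Hypothetical toy data; not taken from the paper.
historical = [float(v) for v in range(1, 101)]
future = [v - 5.0 for v in historical]  # uniformly drier future sample

threshold, ratio = dry_tail_likelihood_ratio(historical, future)
```

A ratio of 2 in this toy case means that years at or below the historical 5th percentile occur twice as often in the future sample. The same function with a high percentile and a reversed inequality could be adapted for the wet tail.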

Even though we have more confidence in the methodology for Category 3 than in the other methods described in this paper, this confidence is somewhat lessened because previous studies have shown that the models underestimate the internal variability and have difficulty simulating the observed drying since 1997 (Rauniyar and Power 2020), and because the ability of the methods outlined, which use CMIP5 models, to simulate the observed drying in 1997–2018 is very poor (Sect. 6.2.2). This contrasts with the generally favorable assessment we made based on the ability of models to estimate projected changes in other models (i.e., “buddy checking”). This indicates that buddy checking, while a useful thing to do, could give rise to an unwarranted degree of confidence in the estimated projected changes.

The fact that climate change has contributed to the current observed rainfall decline (Rauniyar and Power 2020) informed the guidance provided in “Guidance for Assessing the Impact of Climate Change on Water Availability in Victoria”, published by the Victorian Department of Environment, Land, Water and Planning in 2020. This guidance is used by Catchment Management Authorities to account for the changing climate in their management plans. In that report there were still questions around which baseline is appropriate. The results from this study will help refine the guidance and best practice provided in the next update of the Guidelines for catchment and land managers. It would be prudent for Victorian decision-makers to factor in these plausible futures when planning for the future availability of water in Victoria. We recommend that, for developing future plans, the adjustment techniques introduced in this study be adopted, as they make significant differences, at least for medium- and long-term planning. In addition, although we focus on Victoria and on drying only, the adjustment methods can be applied anywhere and can also be used to estimate the likelihood of wet conditions.

In the future, we plan to extend this study by estimating future rainfall distributions using our favored method in conjunction with the new state-of-the-art CMIP6 models (Grose et al. 2020). We will also examine past and future changes over smaller sub-regions, and the ability of the CMIP6 models to simulate Victorian rainfall variability. Comparing scaled observations with direct output from climate models (perhaps with bias correction) could form a further step. Utilizing features that are well simulated by the models, rather than rainfall alone, may provide greater insight and certainty around Victoria's future rainfall regime.