1 Introduction

Precipitation is the primary source of accessible freshwater on Earth. It plays a pivotal role in maintaining Earth’s system equilibrium, supporting ecosystems, and crucially, sustaining human survival and activities (Masson-Delmotte et al. 2021). However, it also harbors the potential for catastrophic events. Intense rainfall can lead to devastating floods and adversely impact agricultural yields. Severe droughts inflict significant damage on ecosystems, agriculture, and access to potable water. Given the contemporary backdrop of global climate change, it is crucial to study potential changes in precipitation patterns and extremes.

The study of precipitation is inherently complex. It is a non-continuous variable, neither in temporal nor spatial terms. Precipitation occurrences are characterized by their frequency and intensity but also by their duration and spatial extent. Investigating precipitation series across diverse temporal and spatial scales is imperative for a comprehensive grasp of their inherent nature. While rainfall or snowfall may be influenced by large-scale atmospheric circulations, they can also manifest as highly localized events due to small-scale physical processes (e.g., convective instability, cold pool.. Ducrocq et al. 2008), influenced by local topography or surface heterogeneity, among other factors. Fine spatial and temporal resolution is, therefore, imperative when modeling precipitation and studying its local changes in the context of global climate change.

Undeniably, regional climate models (RCMs) stand out as one of the most widely employed modeling tools today, to fulfill the imperative for precise spatial and temporal resolution in projecting the future dynamics of precipitation. RCMs are a specific kind of climate models used to downscale at high-resolution and over a limited domain the low resolution simulations produced with Global Climate Models. Their high computational costs render unfeasible the production of large ensembles of high resolution simulations necessary to address the different sources of uncertainty associated with the local impacts of climate change (Hawkins and Sutton 2009; Evin et al. 2019). To try to address this high-resolution versus large-ensemble dilemma, various papers (Chadwick et al. 2011; Holden et al. 2015; Walton et al. 2015; Berg et al. 2015; Maraun and Widmann 2018) introduced the concept of emulator for Regional Climate Model (RCM) as a solution to create large ensembles of high resolution climate projections blending the RCM approach with modern machine-learning techniques. In the recent years, many RCM-emulator have been proposed, focusing on different variables or different regions, showing the exponential success of the approach (Wang et al. 2021; Serifi et al. 2021; Babaousmail et al. 2021; Boé et al. 2022; Meer et al. 2023).

Doury et al. (2023) introduced a RCM-emulator for the near-surface temperature for a RCM at its full resolution (12 km) over Europe. The concept of the RCM-emulator involves using machine learning tools to learn the relationship between low-resolution tropospheric variables describing the atmospheric circulation on a specific day and a high-resolution local surface variable, such as daily precipitation. This downscaling function is learnt inside existing RCM simulations. The aim is to tackle the cost limitation of RCM by mimicking its downscaling function for a specific variable at a low computational cost and then by applying it to any global and low resolution simulation. RCM-emulators are categorized as hybrid downscaling methods because they incorporate both statistical and dynamical downscaling. Using historical and future RCM simulations in the training set enables the RCM-emulator to learn how this relationship may evolve under changing climate conditions. Moreover, a RCM-emulator can also be built over regions with no long series of good quality precipitation records as it relies only on RCM simulations. Here, we propose testing if the RCM-emulator introduced by Doury et al. (2023) is suitable for the downscaling of daily precipitation at 12 km and we propose an adaptation to better capture the complexity of this new variable.

Numerous studies have proposed statistical downscaling methods to estimate the relationship between large-scale and local-scale variables in observational records. Maraun et al. (2010) or Gutiérrez et al. (2019) provide an overview of available approaches for precipitations. Some recent studies (Baño-Medina et al. 2020, 2021; Vandal et al. 2019) have successfully implemented convolutional neural networks for this purpose. The RCM-emulator employed in Doury et al. (2023) and here is based on a fully convolutional neural network architecture called UNet (Ronneberger et al 2015). It has exhibited an excellent ability to emulate the temperature, notably in reproducing the complex spatial structure and daily variability brought by the RCM. However, since precipitation is more challenging to model than temperature, this study proposes to explore the use of the loss function to help the neural network focusing on a specific task, following the spirit of previous work (Cannon 2008; Baño-Medina et al. 2020; Ravuri et al. 2021; Rampal et al. 2022). Here, the challenge relies in the reproduction of the complex precipitations distribution and specifically for the heavy precipitation events. Here, we introduce a novel asymmetric loss function tailored for daily precipitation, which we compare to two classical choices for regression problems that we consider as benchmarks. Other recent studies have proposed different strategies to improve the skills on the reproduction of precipitation such as oversampling approach or generative neural networks (Wang et al. 2021; Addison et al. 2022).

After assessing the suitability of the RCM-Emulator for precipitation, we propose in this study to profit from the EURO-CORDEX simulations to evaluate the transferability of the tool. Indeed the emulator is trained using a given set of available RCM simulations (driven by a given GCM and RCP scenario) and it is crucial to study its behavior when downscaling other socio-economic scenarios or GCMs. Then, in a first step, we evaluate the emulator in a perfect model framework (presented in Sect. 2.1)regarding all available simulations with the emulated RCM. Then in a final step, we propose a first showcase of application by downscaling GCM simulations.

This paper is organised into four main sections. In Sect. 2, we recall the concept of the RCM-emulator introduced in Doury et al. (2023), define the technical aspects related to the neural network and the loss functions, and present the framework of the study, including the data, the target domain, and the associated predictors. Section 3 presents the detailed evaluation and comparison of the emulators within a perfect model framework, while Sect. 4 shows the results of applying the asymmetric emulator to GCM simulations. The concluding section summarizes the paper and initiates the discussion.

2 Methodology

In this section, we define the framework used to build and evaluate the RCM emulator for precipitation. Firstly, we recall the emulator concept and present the simulations and the chosen target domain and predictors for this study. We present the neural network architecture and the three loss functions used to train the three emulators for the inter-comparison. The perfect model framework approach used to train and evaluate the emulator is also recalled. Finally, we detail the metrics used to evaluate the emulator under different aspects.

2.1 RCM-emulator concept and calibration process

Regional climate models (RCMs) are driven by global climate models (GCMs) as they continuously receive incoming data at their domain’s borders from a specific GCM simulation at regular intervals. The resulting RCM simulation essentially represents a downscaling of the data from the driving GCM. Nevertheless, within the boundaries of its domain, the RCM develops its own narrative and may consequently deviate from the driving GCM. This can lead to significant differences, both on a daily scale and on a climatological scale, as discussed by Laprise et al. (2008). This large scale transformation primarily arises from the chaotic nature of weather (Lucas-Picher et al. 2008), but it is also influenced by differences in how the models represent physical processes or their inherent complexity, as explored by Boé et al. (2020) and Taranu et al. (2022). Thanks to a lower computational cost, GCMs typically include more components than RCM such as ocean coupling or evolving aerosols. Consequently, Doury et al. (2023) decided to develop an RCM emulator specifically designed to learn the downscaling process inside the RCM simulation while excluding the impact of large-scale transformations.

To isolate the downscaling function, the emulator is trained within a “perfect model" framework, where both the inputs and target data are sourced from the same RCM simulation. The methodology is detailed in Fig. 1. The chosen predictors (described in Sect. 2.3) are upscaled to match the resolution of the GCM, typically around 150 km, through a conservative interpolation method, which involves a straightforward average of all points encompassed within the low-resolution grid. This strategy implies to build coarsened RCM predictors that are statistically equivalent with standard GCM predictors to downscale GCM simulations. For this reason, Doury et al. (2023) applied a spatial moving average filter eliminate at most high-resolution features that might persist through the interpolation. Subsequently, the emulator is trained to accurately replicate the relationship between these “upscaled" inputs and the target variable, such as precipitation, at the resolution of the RCM. A different strategy, generally referred as imperfect training (Boé et al. 2022; Meer et al. 2023; Bano-Medina et al. 2023), consist in training directly between GCM low resolution outputs and RCM high resolution outputs. Nevertheless, Bano-Medina et al. (2023) showed that the large scales differences differ from one RCM/GCM couple to the other, thus limiting the transferability of the tool.

This perfect model framework also facilitates a rigorous evaluation of the emulator, with the RCM series serving as an ideal reference that it should be capable of faithfully reproducing. In practical application, the emulator is directly applied to a GCM simulation, and the smoothing step is retained to consider the GCM at its effective resolution, as discussed by Klaver et al. (2020).

Fig. 1
figure 1

Scheme of the training (left), perfect model evaluation (middle) and GCM world application (right) protocols. Redrawn from Doury et al. (2023)

2.2 Data: the RCM matrix

The emulator proposed in this study relies on the regional climate model ALADIN63 (Nabat et al. 2020). A total of ten simulations have been published with this RCM over the whole Europe in the EURO-CORDEX framework (Coppola et al. 2021). They downscale four different GCMs and three different scenarios of greenhouse gas emissions (cf Table 1). The CNRM-CM5 global climate model is developed in the same institute as ALADIN63, so they belong to the same family of models. CNRM-CM5 drove 4 ALADIN63 simulations, the historical (1951–2005) and three RCP scenarios (2.6, 4.5 and 8.5, on the period 2006–2100). MPI-ESM-LR, NorESM1-M and HadGEM2-ES are the three other GCMs used to drive ALADIN63 following the historical and RCP8.5 scenarios of greenhouses gases emissions. From now, CNRM-CM5 will be referred to as CNRM, MPI-ESM-LR as MPI, NorESM1-M as NCC and HadGEM2-ES as HGM.

Table 1 RCM x GCM x Scenario matrix

2.3 Predictands, predictors and neural network architecture

This study focuses on the challenging task of emulating of daily precipitation from ALADIN63 at \(0.11^\circ\) horizontal resolution (about 12 km). We selected a sub-domain of the EURO-CORDEX domain centred over the Alps, consisting of \(128\times 128\) grid points. The target domain is visible on the left side of Fig. 2. It includes the entire Alps and goes from Sardinia until the north of France and from the Pyrenees until Croatia. This domain is of particular interest due to its diverse areas with distinct precipitation regimes. For example, the Cevennes (South-East of France) region is known for its very extreme events in autumn, similarly to other coastal areas of the Mediterranean region. High terrains receive more precipitation than low elevations. They are known to be spots of RCM added value, especially regarding extremes (Torma et al. 2015). The flat regions of the north of the domain receive a lot of precipitation throughout the year but have less strong daily extremes than the southern regions. The Alps have also a specific precipitation regime with intense summer storms.The emulator is trained to replicate both land and ocean precipitations, although at times, we will concentrate our evaluation solely on land. Additionally, this domain is four time larger than the one in Doury et al. (2023).

The emulator used in this paper for precipitation downscaling follows the principles developed in Doury et al. (2023). It can be viewed as a conventional machine learning problem

$$\begin{aligned} Y_t = F\left( X_t,Z_t\right) \end{aligned}$$

where \((X_t,Z_t)\) are the low resolution predictors, \(Y_{t}\) the high resolution target variable (in this case daily amount of precipitation) at day t and F the downscaling function we aim to estimate using a neural network. The list of predictors and the standardization procedure remain consistent, encompassing both sets of 1D and 2D inputs, as detailed in Table 2. As we considered the daily precipitation we also provide daily inputs. For each day, we perform spatial normalization on each 2D input, in order to provide normalized “images” to constrain the network to focus on the comprehension of the spatial structures. Since they are crucial information, the daily spatial mean and standard deviation, that are retrieved on each maps, are subsequently provided to the emulator through the set of 1D inputs, which also includes external forcings (yearly greenhouse gas concentrations, solar and ozone forcings) and the seasonal indicator (sinus-cosinus vector). The inclusion of the external forcings should help the network to get some context to better differentiate, if necessary, the climate state it is considering (season, CO\(_{2}\) concentration.. ). The predictors selection are the ones commonly used in statistical downscaling (Gutiérrez et al. 2019; Maraun and Widmann 2017), more details can be found in Doury et al. (2023). The input domain is adjusted to align with the new target domain. It is a 22 \(\times\) 16 grid points on the CNRM-CM5 grid (\(1.4^\circ\)) centred over the target domain, (the whole map on Fig. 2, left).

Fig. 2
figure 2

Illustration of the input (left) and target (right) domains through the climatology of the daily rainfall over the 1980–2000 period. The black line on the left panel shows the target domain while the input domain is the entire map. On the target domain: the red points are the three illustrating points on Fig. 5 and 9. From North to South, there is Paris, a high point (2247 ms) in the Swiss Alps and Roma. The three blue boxes are the three regions used for the SAL evaluation in section 3.1.4: The north region, centred over Belgium, the Cevennes region (south-east France) and the Dinaric Alps

The neural network architecture is adapted from the UNet architecture (Ronneberger et al 2015). The small differences with the one presented in Doury et al. (2023) are due to the size of the input and target domains. As shown in Fig. 3, the first layer of the network reshapes the 2D inputs from [16, 22, 32] to [16, 16, 64] in order to obtain squared images before the encoding path. On the other side, the expanding path is extended to reach the target domain size. This leads to a network of about 28 millions of parameters. The emulator presented in this paper is trained over the 150 years of the ALADIN63 simulations driven by the CNRM-CM5 historical and RCP85 runs. It takes about two hours and an half and 60 epochs to train the network on a GPU (Tesla V100 PCIe 16GB) using the keras environment (Chollet et al. 2015).

Fig. 3
figure 3

Illustration of the neural network architecture, adapted from Doury et al. (2023) where a complete description of the network is provided

Table 2 List of predictors, identical to Doury et al. (2023)

2.4 Loss function for the neural network training

Over this study, we propose a deeper look on the impact of the loss function on the emulator’s performance. The loss function is an essential part of the neural network training. In the training phase, the network sees examples of inputs and target pairs. For each day of the training set, it makes a prediction and compares it with the truth. The loss function evaluates the network prediction against the expected outcome. The network parameters are then updated according to the loss function results. This operation is repeated until the cost (i.e. the loss mean on the training set) stabilises. The best combination of parameters has the lowest cost over a validation set, different from the training set. This is then a minimisation problem to find the best estimate \(\hat{F}\) such that:

$$\begin{aligned} \hat{F} = \arg \min _{\theta \in \varTheta } L\left( {\mathcal {V}},\theta \right) \end{aligned}$$
(1)

Where \(\varTheta\) is the ensemble of possible parameters, \({\mathcal {V}}\) the validation set and L the loss function.

Fig. 4
figure 4

Illustration of daily precipitation distribution (in mm/day), in the Cevennes box (cf Fig. 2), all points and days are pooled. On the left side we show the classical representation of the distribution while the right side illustrate the distribution in term fractional contribution to mean (explained in Sect. 2.5.3)

Precipitations are particularly complicated to emulate with neural networks because of their distribution. Indeed, as illustrated in Fig. 4, the distribution of precipitation looks like a highly left-skewed gamma distribution. There are many days with no precipitation and few ones with very high precipitation, which induces heavy tail distributions. These different events contribute non equally to the mean, with a few days having more impact than the other ones as illustrated on Fig. 4. It is of fundamental interest that the emulator reproduces well the entire distribution. The good reproduction of the frequency and intensity of rare extreme events constitutes a substantial added value of RCM, so the emulator should reproduce them accurately. The loss function is therefore a possible way to rebalance the data and to force the emulator to look more specifically into some specific part of the distribution (Ayzel et al. 2020).

We compare here three emulators, constructed with different loss functions:

  • Emul-MSE uses the classical mean squared error for the loss function, as stated in Doury et al. (2023). It corresponds to the L2 distance.

    $$\begin{aligned} L(y,\hat{y})\, =\, \frac{1}{N \times T} \sum _{t=0}^{T} \sum _{i \in {\mathcal {D}}} \left( y_{i,t} - \hat{y}_{i,t}\right) ^2 \end{aligned}$$
    (2)

    With \({\mathcal {D}}\) the ensemble of grid points, N the number of grid points and T the number of days.

  • Emul-MAE uses the mean absolute error. It corresponds to the L1 distance.

    $$\begin{aligned} L(y,\hat{y})\, =\, \frac{1}{N \times T} \sum _{t=0}^{T} \sum _{i \in {\mathcal {D}}} \left| y_{i,t} - \hat{y}_{i,t}\right| \end{aligned}$$
    (3)
  • Emul-ASYM uses a specific loss function designed for the precipitation problem. It is based on the MAE loss function plus an asymmetric term which penalizes the emulator when it underestimates the true value while it was a raining day. The stronger the rain the stronger the penalty.

    $$\begin{aligned} L(y,\hat{y})\,= & {} \, \frac{1}{N \times T} \sum _{t=0}^{T} \sum _{i \in {\mathcal {D}}} \left| y_{i,t} - \hat{y}_{i,t}\right| \nonumber \\{} & {} + \gamma _{i,t}^2 \times max\left( 0,y_{i,t} - \hat{y}_{i,t} \right) \end{aligned}$$
    (4)

    With \(\gamma _{i,t} = G_i(y_{i,t})\) and \(G_i\) the cumulative distribution function of a random variable \(Y_i\) following a gamma distribution

    $$\begin{aligned} Y_i \sim \varGamma _i: \varGamma \left( \alpha _i,\beta _i\right) \end{aligned}$$

    where the \(\alpha _i\) and \(\beta _i\) parameters are fitted on the historical precipitation series at each grid point i.

The MAE and MSE losses are the most commonly used loss functions for regression problems. The MAE loss sums the absolute distance between an observation and its prediction. It gives the same weight to each observation. Knowing that daily rainfalls are strongly left-skewed, with a vast number of observations with a small amount of precipitation, the EMUL-MAE should be able to fit these days well. However, the rare cases with large precipitations could be less well reproduced.

The MSE loss function gives more weight to the significant errors than the small ones. The MSE generally shows the best results in regression problems and is equivalent to the maximum likelihood estimation in a Gaussian setting. It leads theoretically to the best estimate for normally distributed data knowing the inputs. In the case of precipitations, it is not likely to be the case because of their highly intermittent nature. So the MSE loss function might not be well suited. Note that Emul-MSE is the same emulator as the one introduced in Doury et al. (2023).

The choice of the asymmetric loss function comes from the results of both EMUL-MAE and EMUL-MSE presented in Sect. 3. The idea is to add a penalty when the emulator underestimates strong precipitations. This is done by the asymmetric term: \(max(0,y_{i,t} - \hat{y}_{i,t} )\). Moreover it needs to depend on the rain intensity. The more extreme the precipitation, the rarest it is and so the higher the penalty should be. The \(\gamma _{i,t}\) parameter determines how extreme is a given observation and defines the weight accordingly. At each grid point, we estimated the parameters of a gamma distribution on the rainy days (over 1 mm) of the training set (using the scipy python package, Virtanen et al. 2020). The Gamma distribution has been widely used to described precipitation data (Katz 1977; Vrac and Naveau 2007) but other distribution could be considered. In order to make this parameter estimation more robust, we fit them yearly and then average these parameters over the years. It gives a map of the shape and scale parameters. The \(\gamma _{i,t}\) parameter is then the evaluation of \(y_{i,t}\) (the target value at point i and time t) by the Cumulative Distribution Function (CDF) associated to the gamma distribution \(\varGamma _i\) fitted for this point. It is an objective way to indicate the relative intensity of the precipitation for a given location. The asymmetric term acts like a “regularization” parameter in the loss function as it aims to correct the defaults of the MAE based emulator.

2.5 Evaluation metrics

In order to evaluate and compare the performances of the emulators we will evaluate their predictions with respect to the daily precipitation series from the corresponding RCM simulation (cf Fig. 1). The evaluation relies on various metrics to compare the targeted (Y) and the predicted (\(\hat{Y}\)) series to have the most complete evaluation possible and understand the strengths and weaknesses of each emulators. The different metrics are detailed below.

2.5.1 Time series comparison

First of all we will evaluate in each grid point if the emulated time series matches the original RCM series through two metrics:

  • Temporal Anomalies Correlation. This is the Pearson correlation coefficient after removing the seasonal cycle:

    $$\begin{aligned} ACC(Y,\widehat{Y}) = \rho \left( Y_a,\widehat{Y_a}\right), \end{aligned}$$
    (5)

    with \(\rho\) the Pearson correlation coefficient and \(Y_a\) and \(\widehat{Y_a}\) are the anomaly series after removing a seasonal cycle computed on the whole series.

  • Ratio of Variance. It indicates the performance of the emulator in reproducing the local daily variability. We provide this score as a percentage:

    $$\begin{aligned} RoV(Y,\widehat{Y})=\frac{Var\left( \widehat{Y}\right) }{Var(Y)}*100 \end{aligned}$$
    (6)

Both metrics are computed at each grid point. Each map is summarised with its spatial mean and 5th and 95th super-quantiles. The super-quantile \(\alpha\) is defined as the mean of all the values larger (resp. smaller) than the quantile of order \(\alpha\), when \(\alpha\) is larger (resp. smaller) than 0.5.

2.5.2 Climatological scale metrics

It is necessary to evaluate the emulators at the climatological scale. We use three statistics over at least 20 years: the daily precipitation mean, the 99th quantile and the percentage of dry days (precipitations lower than 1 mm/day). These three metrics, often used in the climate community, are snapshots of the variable distribution illustrating the performances of the emulators from low to high precipitation. The biases maps are presented in percentage. When the biases are too strong, notably because of comparing very small values, we use the simple bias \((\hat{Y}-Y)\), expressed in mm/days. Again, the statistics are computed point-wise, and each map is summarised by its spatial mean and super-quantiles.

$$\begin{aligned} \text {Relative bias } = \frac{\left( \hat{Y} - Y\right) }{Y} \times 100 \end{aligned}$$
(7)

These three statistics will be looked at in present climate but also in climate change context. Each statistic will be computed in a future period and the climate change statistic is the relative difference with the past period. Then the simple bias is computed between RCM and emulator climate change statistics.

2.5.3 PDF normalisation

Since the probability density function for daily precipitation are very heavy-tailed, it is difficult to compare distribution we get from the emulators with the RCM. We propose here to use the ASoP method introduced in Klingaman et al. (2017) and applied in multiple studies as Berthou et al. (2020) or Vergara-Temprado et al. (2020). It consists in computing the precipitation frequency following some well-chosen bins \(b_n\) defined in Eq. 8. The bins are such that they contain a similar number of events for bins over 1 mm and as long as the number of events is sufficient.

$$\begin{aligned} b_n = e \left( log(0.005) + \left[ n \frac{\left( log(120)-log(0.005)\right) ^2}{59}\right] ^{\frac{1}{2}} \right) \text { with } n \in \llbracket 0, 100 \rrbracket \end{aligned}$$
(8)

Then we can look at each bin’s contribution \(C_n\) to the mean by multiplying each frequency by the corresponding bin’s mean as described in Eq. 9. Both frequency and contribution are interesting in comparing the emulated series with the true RCM.

$$\begin{aligned} \begin{gathered} C_n = f_n m_n \\ \text { where } f_n \text { and } m_n \text { are the frequency and the mean of bin } b_n \end{gathered} \end{aligned}$$
(9)

We use the skill score proposed in Berthou et al. (2020) to evaluate the difference between the emulators and the RCM truth contributions curves. The fractional contributions are the actual contributions divided by the total mean precipitation of the series. They give information on the shape of the distribution independently from the mean. The Fractional Contribution Skill Score (FCSS) sums the absolute difference in each bin between the fractional contributions of an emulator and the targeted true series. The area under the FC curve is equal to 1, so the FCSS is equal to 0 when the two distributions are identical and to 2 when there is no overlap between them. It measures the differences between the two distribution shapes independently from the series mean. This score is illustrated on Fig. 9 and further commented in the results Sect. 3.1.3.

$$\begin{aligned} \begin{gathered} FCSS(Emul,RCM) = \sum _{n\in \llbracket 0, 100 \rrbracket } \left| FC_n^{Emul} - FC_n^{RCM}\right| \\ \text {where } FC_n= \frac{C_n}{\sum _n C_n} = \frac{C_n}{mean} \end{gathered} \end{aligned}$$
(10)

2.5.4 SAL score

In order to further evaluate the performances of the emulator, we use an object-oriented score introduced in Wernli et al. (2008). The SAL score aims to evaluate the spatial structure of precipitation objects from a predicted map versus a reference. It compares two maps of precipitation at a given time step. It accounts for the objects’ structure (S-component), location (L-component) and the total amplitude of precipitation (A-component). In perfect model evaluation, the emulator should be able to reproduce the precipitation events accurately. This score indicates if the emulator recreates objects with the same characteristics than the RCM. Note that the days are dealt independently meaning that the life time of the objects is not considered.

The first step is to identify the precipitation objects. To do so, we used the pysteps (Pulkkinen et al. 2019) python library, which integrates a SAL implementation. On each daily map, the objects are define as the groups of at least 5 consecutive points with precipitation higher than a threshold equal to \(R^* = \frac{1}{15}R^{(95)}\), \(R^{(95)}\) being the 95th quantile on the map. Multiple objects can be detected every day. Then, the three components are computed aiming to differentiate objectively different precipitation objects. The A- and S- components take values between – 2 and 2 while the L-component takes values between 0 and 2. If all objects are similar on the maps the three components will be close to 0. A more detailed presentation of the score behavior can be find in Wernli et al. (2008, 2009).

The results are then presented in a diagram where each day is represented by a point with the S and A components on the x and y axis respectively, and the L component given by the color of the point. SAL diagram are visible in Fig. 11 and commented in Sect. 3.1.4. Following the recommendation of Wernli et al. (2009), we apply this score on sub-domains of a maximum of 500 km by side represented with blue squares on Fig. 2. It is noteworthy that other metrics could be applied to evaluate an RCM-emulator in an object perspective through tracking (Caillaud et al. 2021) or spectral density (Vosper et al. 2023) approaches.

3 Perfect model evaluation

This section is divided in two parts. In a first evaluation step we evaluate and compare the three emulators in perfect model framework. We use the CNRM-ALADIN RCP45 simulation, from 2006 to 2100, which has not been seen during the training of the neural network (see Fig. 1). After a first impression on the emulators’ abilities through some examples, we extend the analysis with climatological and daily scores. This section also aims to understand the impact of the loss function on the trained emulator by comparing the results of EMUL-ASYM with the two benchmarks emulators. A second step focuses the evaluation on the Emul-ASYM and the assessment of the transferability question by extending the analysis to all available ALADIN63 simulations (cf. Table 1).

3.1 Comparison of the three emulators

3.1.1 First look into the emulators’ prediction

Before evaluating the emulators’ performances with metrics, it seemed worthwhile to look into the raw series they produce. Figure 5 shows the times series at four grid points for the year 2022 in the evaluation simulation for the RCM truth and the three emulators. The three grid points show very different series. The Alps point series shows the strongest variability and intensities, with many days over 50 mm and almost no dry spell. The Paris series has minimal variability with numerous small precipitation days and lower extremes compared with the other points. The Roma series shows dry spells during spring and summer 2022 in this simulation and has a very strong rainfall event in fall.

Fig. 5
figure 5

Daily precipitation time series for four grid points. The RCM truth (in red) and the three emulators are plotted on each panel

The emulators series are very encouraging. They reproduce the original series accurately, respecting each point’s characteristics. They look like precipitation series as they appear to be able to produce periods with no precipitation and days with heavy rainfall. All emulators capture the extreme autumn rainfall in Roma and the dry spell between May and June. The very high variability over the Alpine point also appears to be well reproduced by the three emulators. On all points, the three emulators seem to miss some extremes simulated by the RCM, as it occurs several times that the red line comes higher than the others. However, it does not seem that Emul-MSE or Emul-MAE ever make stronger extremes than the RCM. At this point, it is impossible to decide if an emulator performs better than the others.

Figure 6 shows the precipitation field over the target for three days randomly picked along the simulation. It shows the RCM truth, the three emulators and the UPscaled precipitation field (UPRCM). The UPRCM helps to have an insight into the input resolution and shows how the RCM and the emulators refine it, even if precipitation is not part of the predictors. Several exciting points appear in this figure. First of all, the emulators’ prediction on each panel is very coherent with the RCM. The precipitation predicted by the emulator occur at the right place on these 3 days (with respect to the RCM “reality”) and with a correct intensity. It seems, however, that the emulators are producing too smooth objects. On the RCM maps, there are some very sharp and precise structures that the emulators fail to reproduce with the same precision. For example, on the lower panel, there is a hole with no rain over the southwest of France, which is missed by all emulators, even if Emul-MAE and Emul-ASYM make less intense precipitation over this area. The middle panel RCM map also shows very sharp structures that appear smoother in the emulators’ maps. Nevertheless, the extreme points are well located for the three days.

Fig. 6
figure 6

3 randomly chosen days illustrating the precipitation field of ALADIN63 at the Upscaled resolution (UPRCM), its native resolution (RCM truth). The three right-most plots show the precipitation field for each of the three emulators. The values corresponds to the spatial mean and 5th and 95th super-quantiles

In terms of intensities, the three emulators have mostly the correct spatial mean. Emul-ASYM reproduces better the spatial extremes as it has closer 95th superquantiles than Emul-MSE and Emul-MAE, which are both under-estimating the spatial extremes on these three days. Emul-ASYM is overestimating the spatial SQ95 on the first panel, as it creates a more significant local extreme over the Alps than in the RCM map. It is, however, remarkable that this extreme is not inconsistent with the UPRCM map. Indeed it is interesting to notice the differences between the RCM and the UPRCM maps, which attest to the resolution’s impact. The RCM is able to create sharp and well defined objects, with locally strong intensities. Regarding this aspect, the emulators seem to have an adequate capacity to refine the low-resolution maps and always recreate high resolution maps consistent with the driving large scale circulation. Nevertheless, it seems that the objects created by the emulator are smoother than the original RCM maps.

3.1.2 Daily scale analysis

In a second step, and to extend the first observations from the previous section, we can look at some scores over the time series. Firstly, the upper panel on Fig. 7 shows the Pearson correlation coefficients calculated between the RCM and the emulators’ series in each grid point. The three emulators appear to have similar performances regarding this aspect, with average correlation of 0.80 (de-seasonalised and de-trended) with the true series over the whole domain. The best correlations are over the high terrains with Pearson coefficients larger 0.9. The lowest correlation appears over the driest area (cf Fig. 8), like the south of the Pyrenees or the North-East corner of the domain, but the correlations are still around 0.75.

Fig. 7
figure 7

Temporal Anomalies Correlation (up) and Ratio of variance (bottom) computed on the entire evaluation simulation (2006–2100) for the three emulators

The lower panel on Fig. 7 shows the variance ratio for the three emulators against the RCM truth. Emul-ASYM manages to reproduce in each point the RCM variance much better than the two others. Its variance ratio ranges from 80 to 120 percent, with a big part of the map being very light, showing about 100% variance reproduction. It slightly overestimates the relief’s variance and slightly underestimates it over the regions with low rain average (cf Fig. 8). On the other hand, both Emul-MSE and Emul-MAE vastly underestimate the variance over the whole domain, even if Emul-MSE is slightly better.

It seems that the three emulators understand the driving atmospheric circulation described by the predictor similarly as they seem able to create the chronology of the original RCM series very accurately. They can identify where and when the precipitations occur at the grid point scale, as shown by the good correlation maps. However, the loss choice seems to substantially impact the reproduction of the events’ intensity as the emulators have different variance ratio maps. Let us see if this is confirmed when we look at aggregated statistics.

3.1.3 Climatological scale analysis

In this section, we look at some aggregated statistics to evaluate if the series produced by the emulator are statistically similar to the RCM one and how they differ. Figure 8 shows three climatological metrics over 20 years in the present period for the RCP4.5 simulation which is not in the training set. The upper panel shows the average daily precipitation over 2006–2025, the middle one is the 99th quantile, and the lower one shows the proportion of dry days. This figure illustrates well the impact of each loss function on the emulator.

The Emul-MSE mean is very similar to the RCM map. The spatial mean and superquantiles are the same. The bias map shows that it slightly underestimates the RCM values, but at maximum by 15% and over regions with low precipitations. However, it presents much poorer results on the other part of the distribution: it largely underestimates the 99th quantile (– 15% on average) and the number of dry days (– 10% on average). It is due to the nature of the mean squared error loss, mainly concentrating around the mean.

The Emul-MAE is, meanwhile, very accurate for the representation of dry days, very slightly overestimating them. However, it fails to reproduce the mean and the 99th quantile maps, broadly underestimating them. The MAE loss gives the same weight to all errors. Since the number of dry days is the most represented (between 35 and 85% of the days are between 0 and 1 mm) they weigh much more in the emulator training, so it mainly focuses on them.

The Emul-ASYM aims to correct the EMUL-MAE by giving more weight to the rainy days, proportionally to the amount of rain. It has similar performances to Emul-MAE over the dry days’ map, which is expected since both emulators have the same loss function on this part of the distribution. However, the Emul-ASYM mean and 99th quantile maps are also very accurate. It shows in both cases less than 15% bias over the worst points and almost no bias on average over the maps. Regarding both climatologic maps, it seems to slightly overestimate the precipitation over the high terrains where it is raining the most and under-estimates at the driest points. Nevertheless, these errors are small, and the Emul-ASYM is clearly the best option if we aggregate the performances for the three metrics.

On all maps in Fig. 8, it is striking to see how well the emulators reproduce the complex spatial structures. Emul-MAE and Emul-MSE have strong biases that are uniform over the domain. All three statistics present locally different patterns, and the emulators reproduce that. For instance, on the 99th quantile maps, there is a strong pattern in the Cevennes, just south of the Massif Central (France), which is much less intense in the daily mean map. It is the same for the emulators’ maps. The spatial structure over Italy is also very complex; there is a thin line over the high terrains with more rainy days and higher extremes, which is also almost perfectly reproduced by the emulators. Similar examples exist for the entire domain.

Fig. 8
figure 8

(Upper) the mean map of daily precipitation accumulations over the 2006–2025 period, (middle) the 99th quantile map over the same period and (lower) the percentage of dry days. These three statistics are shown for the RCM and the three emulators. For each emulator and each metric, the relative bias maps are shown. The spatial mean and 95th and 5th superquantiles are given for each map

In order to extend this result, we can look at the entire distribution using the ASoP method described in Sect. 2.5.3. In Fig. 9, the pdf analysis is detailed for the three regions of 9\(\times\)9 grid points previously centered over Paris, Roma and a high point in the Swiss Alps. The first column shows the events frequencies for each bin defined in Sect.  2.5.3. Most days fall in bins under 0.1mm/day as the RCM curve comes from high on the left part of the plots. The Emul-ASYM and the Emul-MAE reproduce this part well, while the Emul-MSE underestimates the very low precipitations (\(\le 0.1mm/day\)) and overestimates the ones between 0.01 and 10 mm/day. It is less pronounced for the Alps point, where the event distribution is more uniform across the bins than the other three points. Emul-ASYM reproduces the frequency of these stronger events better than the two other emulators.

Fig. 9
figure 9

Illustration of the probability density function analysis following the ASoP method (Klingaman et al. 2017) on three example regions composed of 9\(\times\)9 grid points. Each line is a point and each column is a different step of the method. The first column shows the frequency of events in each bins, the second and the third the actual and the fractional contribution and the last column illustrates the fractional skill score (i.e. bin per bin difference between RCM and emulators fractional contribution plots). The numbers in the last column plots are the FCSS for each emulator at the corresponding point

The second column shows the actual contributions to the mean, which are the frequencies multiplied by the bins’ mean. The first remark is that Emul-ASYM slightly overestimates the contribution of the precipitations around 10 mm, which probably led to the wet bias on the mean map of Fig. 8. Emul-MAE produces insufficient rainfall over \(\sim\) 8 mm as the right part of the distribution is shifted to the left. The same remark applies to the Emul-MSE to a minor extent, which has a better reproduction of the mean, confirming what we saw in Fig. 8.

The last column illustrates the fractional contributions skill score by plotting the difference between the emulators and the RCM distributions on the third column. The fractional contributions are the actual contributions normalized by the mean of the series, allowing us to compare only the shape of the distribution across the bins. It helps to see that the Emul-MSE and Emul-MAE distribution are generally left-shifted, with too many small precipitations and not enough strong events. The Emul-ASYM curve generally looks better even if it tends to produce slightly too many precipitations between the mean and the 75th quantile. The regularization term in the Emul-ASYM loss appears to play its role pretty well as the distribution of precipitation is closer to the real one but might sometimes be too strong.

The skill score measures the area between the emulators’ fractional contribution and the RCM one, and we can see that the Emul-ASYM outperforms the others over these three points. It is interesting to notice that Emul-MSE and Emul-MAE perform better over the Alps point, where the precipitations are more uniformly distributed across the bins. Finally, Fig. 10 shows that the Emul-ASYM skill score is better over the whole domain. It generalizes the distribution analysis and confirms that the specifically designed loss function is more adapted than the two others to reproduce the highly skewed distribution of precipitation.

Fig. 10
figure 10

Fractional Contribution Skill Score maps for the three emulators

3.1.4 Object oriented analysis

Figure 6 seems to illustrate that the precipitation objects created by the three emulators are smoother than in the RCM. The SAL method presented in Sect. 2.5.4 is a objected-oriented evaluation approach which compares on two maps the object similarities.

Following the recommendation of Wernli et al. (2009), we limited the evaluation to three subdomains of about 500 km by the side. The blue boxes represent them on Fig. 2. The first subdomain focuses on the Cevennes regions. This part of South France is well known for its extreme autumn precipitation events. These events are the object of multiple studies (Ribes et al. 2019; Caillaud et al. 2021) because of their strong socio-economic impacts. It is then important to assess whether the emulator is able or not to reproduce such events. The second domain is another hotspot for Mediterranean extreme precipitation events (Ivušić et al. 2021) located in Croatia, over the Dinaric Alps and the North of the Adriatic Sea. The last subdomain is centred around Belgium, including the South-East of England, the North-East of France and West of Germany. This region presents a different climatology with extreme events of smaller intensities occurring more in winter.

Figure 11 presents the SAL scores’ results for the Emul-ASYM, while SM.F2 and SM.F3 in supplementary material show the results for the two others emulators. We focus here the comments on the Emul-ASYM as the three emulators present similar conclusions on this aspect. For each region, there are five SAL diagrams. The left most diagram represents the results for all rainy days. Then going to the right we consider only days where the spatial 99th percentile of the RCM truth series is above an increasing threshold. The threshold and the number of considered days are indicated on each diagram. Thus, from left to right we consider only more and more extreme events. The first general comment is that over all these diagrams, the emulator reproduces accurately the large majority of the events. Indeed the red boxes regroup 90% of the days and they are always centred around 0 with most points in deep blue, showing good Location score.

On the first column representing all rainy days, the emulator underestimates the global amount of precipitation over the domain, with the red box being pulled down left. As it gets more centered when we look only at stronger events we can conclude that the emulator misses some small precipitation objects. Knowing the chaotic nature of rainfall, we assume that it is perfectly fine if the emulator misses or add some small events. Moreover, the SAL metrics are one-sided: they evaluate how the predicted map matches the reference one. As we fix the threshold according to the RCM true series, it is logical that events, especially small ones, are missed or underestimated by the emulator. Besides, when we fix the threshold according to the emulated series, then the emulator overestimates the amplitude of some small RCM events and the red box is pushed up-right. It shows that the emulator sometimes misses small objects and sometimes creates some.

On the right of the figure, when we look at days with heavier precipitation, the amplitude gets centred around zero or slightly positive on the right-most column of the two Mediterranean regions. In addition, the emulator tends to produce larger objects with a positive S-component. However, the centre of the object is most of the time well located. It tends to generalize that the emulator produces smoother objects than the RCM, especially on significant intensities events. There is a correlation between the errors in amplitude and structure metrics. It can attest that the emulator always creates objects consistent with the RCM. They are either smaller or bigger in terms of both shape and amplitude.

Generally speaking, the emulator manages to reproduce the precipitation objects simulated by the RCM, even if they do not always have the perfect characteristics. The emulator captures most of the extreme events with the most suitable characteristics. The emulator seems nevertheless to produce smoother objects implying that it misses the high frequency details. A further analysis, with an application to a hydrological impact study, should be conducted to determine whether it is a fundamental limitation and how we could maybe adapt the emulator.

Fig. 11
figure 11

SAL diagram for the three regions: Cevennes (up), North of the domain centred around Belgium (middle), and a region over Croatia and the North of the Adriatic sea. From left to right, the panel show the SAL results for days with maximum events intensities above an increasing threshold. Each point on the diagram represents a day with the Amplitude component on the y-axis, the Structure on the x-axis and the color give the Location score. The red box includes 90% of the points, and the black cross indicates the A and S median. On the colorbar the white point indicates the L metric median over selected days and the bars the 95th and 5th percentiles

3.2 Transferability assessment in perfect model framework

In order to give more robustness to the good performances of the Emul-ASYM, we extend its evaluation in perfect model to all ALADIN63 simulations available for our target domain. Indeed, up to now we focused the evaluation on the ALADIN simulation driven by CNRM-CM5 RCP4.5, which share the same driving GCM. The EURO-CORDEX matrix gives us the opportunity to evaluate the emulator on simulations driven a by different GCMs. We keep using coarse grained inputs from ALADIN simulations driven by different GCM which allow us to perform a first assessment for transferability to different GCMs, as it is a necessary condition for the application of the emulator to downscale large ensemble of simulations.

3.2.1 Present climate

Figure 12 summarizes climatological maps as the ones shown on Fig. 8. The three panels (from left to right) correspond to the three statistics we looked at in Sect. 3.1.3: the mean amount of daily precipitation, the 99th quantile and the percentage of dry days over the 2006–2025 period. On each panel, the upper part shows the summary statistics for the raw maps of the RCM and the emulator, and the lower part summarises the relative bias maps of the emulator with respect to the RCM truth. On each panel, the columns correspond to a simulation. Each bar shows the spatial mean of the map, the upper bound shows the 95th super-quantile and the lower bound shows the 05th super-quantile. The first column shows the results for the CNRM RCP85 simulation, which has been used to train the emulator. The results on this simulation are given here as an indicator and cannot be taken alone to evaluate the emulator’s performances. On each panel, the second column is the summary of the evaluation on the CNRM-RCP45 simulation presented on Fig. 8. The bars illustrate well the main conclusions with for example a slight over-estimation over the wettest point (as the green bar goes higher) or the low biases on the lower panel.

The results are encouraging as the performances of the emulator are very similar across simulations, even if those are different in several aspects. For instance, the 3 CNRM simulations have higher daily means than the three others since the spatial mean and superquantiles are higher. It is less evident on the 99th quantile maps, where only the NCC simulation produces “eye-visible" lower heavy precipitations. The emulator’s bars reproduce the diversity of behavior of the various GCM-RCM pairs, as for example the NCC simulation seems to have less spatial variability on the 99th quantile maps than the other simulations. As observed previously, the emulator overestimates the average daily precipitation on the wettest points and underestimates it over the driest points whatever the statistics (see Fig SM.F4), which stays valid for all simulations. The biases on land points are similar to the ones observed for the CNRM RCP45 simulation, showing that the emulator reproduces each simulation with the same accuracy. In all these simulations, the emulator reproduces the three parts of the distribution well over the whole domain.

Fig. 12
figure 12

Summary plots of the three climatological statistics regrouping the results on all ALADIN63 simulations in perfect model evaluation. On each error bar, the lower (resp. upper) bound is the spatial 5th (resp. 95th) superquantile and the spatial mean is represented by the dot. The upper panels show the raw maps summary statistics for the RCM (in red) and the Emul-ASYM (in green), and the lower panels show them for the relative bias maps

The analysis is the same regarding the variance maps summarized in Figure SM.F5 in supplementary material. The daily variance differs according to the simulation. For example, the RCM simulation driven by NCC has a smaller variance than the CNRM simulations or the HGM. The emulator reproduces in each case the variance maps quite accurately. However, in every simulation, it strengthens the variance where it is the strongest. The variance ratio summary plot confirms that the analysis made for the CNRM RCP45 in Sect. 3.1.2 extends to all other simulations. The emulator can reproduce the daily time series with globally acceptable variance at every grid point. The temporal correlations (not shown) are also similar to what we observed on the CNRM-RCP45 simulation across all simulations. In the worse cases, it misestimates the variance by about 20%. Figure SM.F5 also shows the FC skill score maps for the four missing evaluation simulations (the RCP45 simulation in in 10). Here again, we can observe that the emulators reproduce the shape of the precipitation distribution correctly at each grid point in all simulations. The emulator has similar performances across all simulations at the grid point scale, with for example lower skill scores on the South-West corner of the domain and better ones over the eastern alps.

3.2.2 Climate change reproduction

In order to finalise the evaluation of the emulator in the perfect model framework, we can look at the climate change maps. To do so, we will look at the three statistics used in the previous sections: the mean daily precipitation, the 99th quantile and the percentage of dry days. In each simulation, we compute the relative changes in a future period (2070–2100) versus a past period (1950–1980). The changes in precipitation are likely to be different according to the seasons over western Europe so we will look at the seasonal climate change here. Previous studies tend to project a decrease in summer precipitation over the region, notably around the Mediterranean sea, and an increase of winter precipitation on the North (Coppola et al. 2021). Besides, a possible increase in extreme precipitation, especially over northern Europe, is expected. The results for the summer and winter and the three statistics on all simulations are summarised through summary plots in Fig. 13 while the results for the MPI and HGM simulations are illustrated in Fig. 14. We chose those two maps as they show very contrasted climate change signal.

Fig. 13
figure 13

Same as Fig 12 for the winter and summer climate change (2070–2100 vs 1950–80) summary plots for the three statistics of interest: the daily precipitation mean, the 99th quantile and the percentage of dry days. The changes are the relative difference between the future period and the past one. The biases are simple bias between the emulator and RCM relative change maps. On each bias summary plot the number indicates the % of points where RCM and emulator agree on the sign

The first remark is that on all plots summarising the raw maps, the green bar corresponds very well to the red one, implying that the emulator correctly reproduces the maps and the intensity of the local changes. It is particularly notable on the summer plot, where the differences between the projections are the strongest. The MPI and NCC simulations show a substantial decrease in the mean daily precipitation over the entire map, associated with a global increase in the percentage of dry days. On the other hand, the HGM simulation projects an increase in average daily rainfall over some regions in summer. The emulator reproduces each simulation specificity with mainly the right intensity. Figure 14 shows summer and winter changes for the MPI and HGM simulations. It illustrates well that the emulator correctly captures the big spatial pattern. Still, in summer, we can observe that the emulator precisely places the regions where the HGM simulation produces an increase in average rainfall. This increase matches an increase of the 99th quantile in the same regions, and the emulator produces the same relationship. Similar analysis can exist on the winter maps, concluding that the emulator reproduces the ALADIN63 simulation with excellent accuracy.

Nevertheless, the emulator’s maps are more continuous than the RCM maps, especially for the 99th quantile maps, which are patchy. It results in significant local biases between the emulator and the RCM maps. It partly explains the large biases on the bias maps summary plots in Fig. 13. Generally, the emulator tends to overestimate some changes as we can see that the green bar is often longer than the red one. The number given on top of the bias maps summary plots shows the percentage of sign agreement between RCM and emulator over the grid points. It shows that the emulator identifies well the changes as these numbers are very high ( always above 75%, very often above 90%). Moreover, on the bias maps of Fig. 14, the hatching shows the points where RCM and emulator disagree on the signs. It is visible that they mostly correspond to points with minor changes.

To conclude, the emulator can reproduce high-resolution climate change maps with the same strong spatial pattern and intensities. Another relevant remark, not shown here, is that Emul-MSE and Emul-MAE have the same ability as Emul-ASYM to reproduce the climate change maps. It means that each emulator keeps the same biases along the simulation, and the changes are mainly driven by the large scale, which the emulators captures well (see Table 3).

Fig. 14
figure 14

Perfect model evaluation. Relative changes (in %) between 2070–2100 and 1950–1980 for the MPI and HGM driven simulations regarding (up) the mean map of daily precipitation accumulations, (middle) the 99th quantile map and (low) the percentage of dry days. These three statistics are shown for the RCM and the emulator, plus the simple bias map between the two. For each map, the spatial mean and 95th and 5th superquantiles are given. The hatching indicates the point where RCM and emulator disagree on the sign

3.2.3 Perfect model evaluation conclusions

Table 3 Summary of the emulators’ comparison results

Through Sects. 3.1 and 3.2 we have analysed the ability of the emulator trained with the asymmetric loss function to reproduce the precipitation field simulated by the RCM. The conclusion on the emulator performances are summarised here:

  • The three emulators are able to produce realistic precipitation time series well correlated to the RCM ones and with the right spatio-temporal variability.

  • The grid-point regularization term in the asymmetric loss function helps to respect and reproduce the entire complex distribution of precipitation everywhere on the target domain. However it is important to mention that the evaluation does not account for the full spatio-temporal complexity of precipitation and we did not explore the extremes (5 or 10 year return period events).

  • The asymmetric emulator tend to underestimate the precipitation in generally dry regions and overestimate it in the wettest parts of the domain.

  • The emulators creates coherent objects of precipitation, with generally the right characteristics even if they tend to miss highest frequency details, and this is true for the three emulators.

  • Those conclusions are the same for any ALADIN63 simulations available to evaluate the emulator in perfect model, including the ones driven by different GCMs than the one used during the training. It notably showed good ability to reproduce the diversity across simulations which attests for the good transferability of the learnt function and so gives some confidence on its applicability to various GCMs simulations. This is a key results for future applications.

  • Finally the climate change maps obtained from the emulated series are in excellent agreement to the RCM ones. It gives a lot of confidence to use the emulator in climate change context.

The emulator present therefore satisfactory results in perfect model evaluation, where it downscales coarse grained RCM inputs, even is there is space for improvements. The proposed loss function allowed to reproduce correctly the entire precipitation distribution at the grid point scale validating so far the use of the RCM emulator for precipitation downscaling. Therefore, the loss function plays here as a cursor to set the event intensities, while the chronology of the series is well captured from the predictors. It brings a clear added value compared to the two benchmark loss function that we proposed here.

4 Application to GCM predictors

This section aims to assess the emulator’s applicability to GCM simulations. The ultimate objective of the emulator is to downscale large ensembles of GCM simulations to generate high-resolution simulations, allowing the study of local precipitation evolution and the associated uncertainty. Hence, it is crucial to evaluate if the emulator is indeed applicable to GCM simulations while maintaining similar performance levels than in perfect model framework evaluation. The application protocol is illustrated in the right panel of Fig. 1, where the emulator processes GCM data after interpolating them onto a common grid. In this evaluation, we utilized the emulator to downscale four RCP85 GCM simulations-CNRM-CM5, MPI-ESM-LR, HadGEM2-ES, and NorESM1 (refer to Table 1), which were employed to drive ALADIN63. The corresponding RCM simulations serve as a comparison basis, yet they cannot be deemed as the reference truth for the emulated series.

Indeed, as elucidated in Doury et al. (2023) and in Sect. 2.1, differences between an RCM simulation and its driving GCM entail low day-to-day correlation and long-term statistical disparities. The climatological inconsistencies between an RCM and its driving GCM is a well identified issue that has been notably posed in Laprise et al. (2008). The authors discuss whether those differences should be expected (linked to the upscaled added-value of a better representation of the local processes) or if the RCM shouldn’t deviate from the GCM large scale climatology. Recent studies (Boé et al. 2020; Taranu et al. 2022) tend to agree that at least a part that the climatological biases observed in Doury et al. (2023) is not representing an added-value of the RCM at large-scale. The challenge of this section therefore lies in evaluating whether the emulator generates a series that aligns with the large-scale characteristics of the GCM while incorporating high-resolution features from the RCM. Another way to frame the objective of this section is that we try to identify if the Emulator in GCM application mode is able to reproduce an added-value with respect to its driving GCM similar to the one proposed by the original RCM. Consequently, we will compare the emulator’s output with both the RCM and GCM series. Our expectation is that the emulator produces a series consistent with the GCM’s large scale while integrating high-resolution features akin to those introduced by the RCM.

Fig. 15
figure 15

Illustration of 3 consecutive days for the UPRCM, the emulator downscaling the UPRCM, the RCM, the emulator downscaling the GCM, and the GCM precipitation fields. The contours on low resolution maps show the sea-level-pressure

4.1 Showcases on the large scale differences between RCM and GCM

Figure 15 showcases the precipitation field for three consecutive autumn days in the CNRM RCP85 simulation. Each day includes the RCM truth simulation alongside the emulated maps in perfect model (UPRCM) and application (GCM) mode, complemented the UPRCM and GCM precipitation maps for the respective days. It is important to remember here that the low-resolution precipitation field is not a predictor. The UPRCM precipitation is simply the RCM map interpolated on the GCM grid, and we use it to compare with the GCM precipitation map.

These three days vividly illustrate the low day-to-day correlation between the RCM and its driving GCM. Comparing the low-resolution maps reveals distinct chronologies. One can observe that the situation in day 1 is quite similar between the GCM and the RCM thanks to the sea-level-pressure contour, but that it evolves very differently leading to very different precipitation maps.

However, the three high-resolution maps offer assurance regarding the emulator’s ability to downscale GCM simulations. It generates a series consistent with the GCM, depicting precipitation objects that align with the GCM daily weather patterns. Moreover, the emulator refines the high resolution in a manner similar to the RCM. For instance, on day one, it precisely localizes extremes in the Alps and along the northern Italian coast. On day two, the GCM’s situation over Italy closely resembles the RCM’s depiction on day three, with the emulator producing similar events in mid-Italy in both cases. The emulator also adjusts the intensity of daily extremes, generating stronger heavy precipitations compared to the GCM as captured by the SQ95. However, it exhibits similar limitations in both UPRCM and GCM applications, with objects appearing overly blurred and lacking sharpness, as discussed in Sect. 3.1.4. This consistency underscores the emulator’s stability when downscaling GCM data. These three days exemplify the challenge of evaluating the emulator in application mode without a proper reference, given the day-to-day mismatches that hinder distinguishing potential emulator issues from large-scale-induced divergences.

4.2 Present climate

In this section we analyse the series downscaled by the emulator in present climate. As in the perfect model evaluation, we compute the annual average daily rainfall, the 99th quantile and the percentage of dry days in the present climate (2006–2025) in the four simulations. We compare the emulator’s maps with the RCM ones and the GCM ones.

The most striking observation lies in the added value brought by both the RCM and the emulator when compared to the GCM maps. CNRM, among the GCMs, exhibits some spatial structure across all three statistical measures, while the remaining three show notably flat maps, especially concerning heavy precipitation events. The emulator’s maps exhibit a high spatial correlation with the RCM ones, effectively replicating the fine-scale spatial structure across mean climate conditions and within dry or wet conditions. It successfully captures topography-driven spatial patterns, portraying areas like the central Alps experiencing more precipitation compared to the rest of the range across all RCM and emulator simulations. Additionally, intricate structures over Italy and the Mediterranean coastline are faithfully reproduced by the emulator. Another point of validation is the spatial super-quantile that are comparable with the RCM, confirming the emulator’s high-resolution consistency with the RCM.

Fig. 16
figure 16

Present (2006–2025) climate statistics of 4 simulations (CNRM RCP85, MPI, NCC and HGM) for (upper) the mean map of daily precipitation accumulations, (middle) the 99th quantile map and (lower) the percentage of dry days. For each simulation, we see the RCM, the emulated one and the corresponding GCM map. The spatial mean and 95th and 5th superquantiles are given for each map

In all four simulations and across the three statistical measures, significant disparities exist between the emulator and the RCM maps. As explained in Sect. 2.1 or in Doury et al. (2023), the daily inconsistencies between GCM and RCM large scales can lead to climatological differences. For instance, the emulator driven by CNRM generates more intense precipitation over the Alps than the RCM simulation, resulting in a higher 99th quantile and fewer dry days in the region. Conversely, the HGM-driven emulator simulation reflects a drier tendency, characterized by a lower 99th quantile and a larger number of dry days across the entire domain. The consistency between the three statistics and the fact that the differences vary across simulations tend to support the hypothesis of real large scale differences rather than a problem in the emulator downscaling.

However, some biases in the emulator’s outputs warrant attention. For instance, all emulated simulations underestimate the 99th quantile over the Cevennes in southern France. This region is recognized for its extreme events, an area where the RCMs usually bring a proven added-value at daily scale. While the emulator generates significant extreme events here, they appear comparatively less intense than those over the Alps in contrast to the RCM maps, where they exhibit a similar intensity. Dedicated studies specifically investigating the added value of emulators compared to RCMs and GCMs by analyzing particular events could certainly be conducted. However, such studies are beyond the scope of our current investigation (see Fig. 16).

Fig. 17
figure 17

Autumn relative changes of average daily precipitation between future (2080–2100) and present (2006–2025) period for the 4 GCM simulations downscaled with the emulator: CNRM, MPI, NCC and HGM under RCP85 scenario. From up to down, the rows show: the RCM, the emulator downscaling GCM, and the GCM maps. The spatial mean and 95th and 5th superquantiles are given for each map

4.3 Climate change

In order to complete the study of the emulator ability to downscale GCM simulations, we propose to look at climate change maps. Given the inherent challenges in assessing the emulator’s performance when downscaling GCMs, we will emphasize specific examples in this section. While the emulator is not expected to precisely replicate the changes simulated by the RCM, it should align with those produced by the GCM while integrating small-scale features consistent with the RCM. We compare the changes in autumn precipitation presented in Figs. 17 and 18 produced by the emulator maps for the four simulations with the RCM and the driving GCM simulations.

These figures affirm the emulator’s capability to incorporate high-resolution features into GCM simulations. In terms of both heavy precipitation and mean changes, the emulator generally aligns with the patterns observed in the GCM maps. For instance, the CNRM simulation exhibits an intensification of autumn precipitation over the northern domain, particularly noticeable in the 99th quantiles. The emulator echoes this trend, demonstrating a consistent signal with a more refined localization of pronounced changes, notably over northern and western France. The Emulator also clarifies the North-South contrast in precipitation change in the Alps with respect to the GCM low-resolution map, mimicing well the RCM pattern.

Moreover, the emulator appears capable of modifying the signal produced by the GCM. For instance, both the MPI and HGM simulations indicate a decrease in average precipitation across the entire domain, despite showing an intensification in the 99th quantile. In contrast, the emulator portrays an increase in autumn precipitations over the eastern domain, propelled by a more substantial intensification of extreme events in those regions. If it is difficult to assess for the validity of the modification, it is in agreement with the two other emulated simulations and the four RCM maps.

Even if some spatial structures are consistent between the RCM and the emulator maps, they remain fundamentally distinct. The emulator’s structures are generally smoother than the RCM ones. However, the maps produced by the emulator include realistic high resolution features influenced by topography or coastline for example. Setting aside the differences in smoothness, distinguishing between the RCM and emulator maps becomes a challenging task.

Fig. 18
figure 18

Same as Fig. 17 for the 99th quantile changes

5 Conclusion

This study aims to propose a credible solution to the high computational costs of Regional Climate Models to build large ensembles of high-resolution precipitation projections at daily scale. It extends the RCM-emulator introduced in Doury et al. (2023) for the case of temperature downscaling. RCM-emulators belong to the family of hybrid downscaling methods. They use RCM simulations to estimate the downscaling relationship between low-resolution and large-scale variables and a high-resolution surface variable. It is important to recall here that the present study propose an emulator of a given RCM, CNRM-ALADIN63, in its EURO-CORDEX configuration. This manuscript has three main objectives:

  1. 1.

    Addressing the suitability of the emulator for the complex variable of precipitation, including the extreme parts of its distribution.

  2. 2.

    Studying the transferability of the trained emulator to different sources of inputs.

  3. 3.

    Evaluating the emulator behavior when applied to GCM simulations.

To address these objectives we extended the Doury et al. (2023)’s work with some developments while keeping as most the same basis. Indeed a strength of the RCM-emulator should be its universality across domain or variables. Thus the emulator presented here relies on the same perfect model framework as in Doury et al. (2023), it takes the same list of predictors and the neural network architecture is simply adapted to match the new input and target domains. The target domain considered here is four times bigger which also implied increasing the size of the input domain. Because of the non-Gaussian nature of precipitation we introduced a novel asymmetric loss function and put those results in perspective with two classical functions (MSE and MAE) for regression problems that we consider as benchmark for this study. Finally we also extended the evaluation of the emulator to a larger test set including simulations driven by various GCMs allowing to study its transferability. A first result is the good stability of the methodology set in Doury et al. (2023) with a bigger domain even regarding to computational efficiency.

Regarding the first main objective we have shown that RCM-emulators are a credible strategy to downscale precipitation fields. The perfect model evaluation ensures a perfect reference against which we can precisely evaluate and compared the three emulators (corresponding to the three different loss functions). All of them managed to capture the relationship between the daily large scale circulation and the associated high resolution precipitation accumulation as they all showed very good temporal correlation. It validates the concept of the emulator as it is possible to identify and learn the RCM downscaling function associated to precipitation. Nevertheless, only the asymmetric loss function ensured the emulator to reproduce the full high resolution daily variability that the RCM creates as well as the entire precipitation distribution including heavy precipitation. In addition to our study, a dedicated work should be conducted to evaluate the emulator ability to reproduce specific rarest events. Indeed, we have seen that a dedicated loss function to re-balance the data is necessary to deal with precipitation, and the one introduced here is a credible strategy. We also evaluated the accuracy of precipitation object created by the emulator. We found that they are quite realistic and coherent even if they tend to textcolorbluesmooth high frequency details. Another deficiency of the asymmetric loss function we designed is that it leads to an over-estimation of the precipitation where it rains the most and underestimation where it rains the less. Therefore, the loss function is a critical aspect to ensure that emulators suit well a given variable. The asymmetric loss function is a proposition that showed some success, but other loss functions or different strategy could be used in the same purpose in future studies. Moreover, this result shows that adapting the loss function, which is a simple technical modification, helps to generalize the emulator to complex variables.

The EURO-CORDEX matrix allowed us to study the emulator’s behavior when we move out from the world corresponding to the Scenarios/GCM/RCM triplet used for training. We highlighted the robustness of the learnt function as it presents similar performances across all available simulations. The emulator notably managed to reproduce the specificity of each simulation in present climate but also in climate change signal. Indeed each simulation showed different climate change signals with different spatial patterns and variability over the domain and the emulator showed an excellent ability to reproduce this diversity. This question of transferability is essential for the potential applications it opens to the emulator. Our result tends to show that the emulator can be used to downscale various GCMs and various scenarios.

A critical point in the emulator evaluation is to ensure its good applicability to GCM simulations as it is its purpose. Because the emulator is trained in perfect model framework (i.e. with both target and input coming from the same RCM simulation), it learns only the downscaling function and not any large scale modification. Thus in GCM application it applies this function to the large scale provided by the GCM which is very likely to differ from the RCM one and so the run produced by the emulator is expected to differ from the RCM simulation driven by the same GCM. We expect the emulator to be coherent with the GCM large scale but also to include high resolution features brought by the RCM. We analysed the emulator performance over 4 GCMs and under different time horizons: we looked at some daily maps and at climatological statistics in present climate and in climate change. The conclusions are robust over all those aspects, the emulator brings a strong added-value with respect to its driving GCM that is consistent with the original RCM added-value. However, there are substantial differences between RCM and emulator maps, and it is difficult to assess if they results from large scale discrepancies between the RCM and its driving GCM, and that we do not want to reproduce, or from a misconception of the emulator. Further studies focused on given phenomenon or including other, specifically designed, simulations are probably necessary to assess if we can have a complete trust in the current version of the emulator when it is used to downscale GCM simulations.