Abstract
Climate model outputs are commonly corrected using statistical univariate bias correction methods. Most of the time, those 1d-corrections do not modify the ranks of the time series to be corrected. This implies that biases in the spatial or inter-variable dependences of the simulated variables are not adjusted. Hence, over the last few years, some multivariate bias correction (MBC) methods have been developed to account for inter-variable structures, inter-site ones, or both. As a proof of concept, we propose to adapt a computer vision technique used for Image-to-Image translation tasks (CycleGAN) for the adjustment of spatial dependence structures of climate model projections. The proposed algorithm, named MBC-CycleGAN, aims to transfer simulated maps (seen as images) with inappropriate spatial dependence structure from climate model outputs to more realistic images with spatial properties similar to the observed ones. For evaluation purposes, the method is applied to adjust maps of temperature and precipitation from climate simulations through two cross-validation approaches. The first one is designed to assess two different post-processing schemes (Perfect Prognosis and Model Output Statistics). The second one assesses the influence of non-stationary properties of climate simulations on the performance of MBC-CycleGAN to adjust spatial dependences. Results are compared against a popular univariate bias correction method, a “quantile-mapping” method, which ignores inter-site dependencies in the correction procedure, and two state-of-the-art multivariate bias correction algorithms aiming to adjust spatial correlation structure. In comparison with these alternatives, the MBC-CycleGAN algorithm reasonably corrects spatial correlations of climate simulations for both temperature and precipitation, encouraging further research on the improvement of this approach for multivariate bias correction of climate model projections.
Introduction
With ongoing climate change, mitigation and adaptation strategies have to be anticipated by decision makers in order to reduce potential future consequences of climate change on human societies and activities (IPCC 2014). Such consequences are commonly assessed through climate change impact studies, for instance in hydrology (e.g., Bates et al. 2008), agronomy (e.g., Wheeler and von Braun 2013) or epidemiology (e.g., Caminade et al. 2014). They rely on impact model simulations, the quality of which highly depends on the reliability of the climate information used as inputs (e.g., Muerth et al. 2013; Ramirez-Villegas et al. 2013). Besides observations, global and regional climate models (GCM and RCM) are the major tools to understand the climate system and its future evolution (Randall et al. 2007; Reichler and Kim 2008). However, despite considerable improvements in climate modelling, climate simulations often remain biased compared to observations: even for the current climate, key statistical features such as mean, variance or the dependence structures between physical variables or between sites can differ from those calculated for observational references (e.g., Eden et al. 2012; Cattiaux et al. 2013; Mueller and Seneviratne 2014). Consequently, biases are expected to be present in climate projections for future periods, making bias correction an often unavoidable data preprocessing step for impact studies (e.g., Christensen et al. 2008; Maraun et al. 2010; Teutschbein and Seibert 2012).
In recent years, many statistical bias correction (BC) methods have been developed that aim to correct (selected features of) the distribution of climate variables. The idea of statistical bias correction is to find a mathematical transformation that gives climate simulations statistical properties similar to those of a reference dataset over the historical period, and then to apply this transformation to the modeled projection. Such transformations may be determined with statistical models based on either perfect prognosis (PP) or model output statistics (MOS) approaches (Maraun et al. 2010). The PP approach consists in determining the statistical link between a variable of interest from references (predictand) and one or several observed variables (predictors) occurring at the same time. Simultaneous values of predictand and predictors are indeed required to implement the PP approach and learn the (synchronous) relationships between them. By applying these relationships to predictors from climate simulations, this approach implicitly makes the assumption that these predictors are realistically simulated (Wilks 2006). In the MOS approach, observed and simulated variables are not considered to be synchronized in time, and biases relate to differences in some statistics (such as means or variances) or in distributions between references and modeled climate variables. Adjustments can be made to the simulated mean (e.g., Delta method, Xu 1999), variance (e.g., simple scaling adjustment, Berg et al. 2012) and also all moments of higher order and percentiles (e.g., “quantile-mapping”, Haddad and Rosenfeld 1997; Déqué 2007; Gudmundsson et al. 2012). In particular, the quantile-mapping technique has received keen interest since it allows adjusting not only the mean and variance but also the whole distribution of climate variables. It has led to the development of many variants (e.g., Vrac et al. 2012, 2016; Tramblay et al. 2013; Cannon et al. 2015), and has been applied in various studies (e.g., Vigaud et al. 2013; Defrance et al. 2017; Bartok et al. 2019; Tong et al. 2020). However, such BC methods are designed to only correct statistical aspects of univariate distributions. Simulated variables are indeed adjusted separately for each physical variable at each specific location. Thus, potential biases in the spatial dependence structure of modeled variables are not corrected (e.g., Wilcke et al. 2013), which can generate corrections with inappropriate multivariate situations and can affect subsequent analyses that depend on spatial characteristics of climate variables (e.g., Zscheischler et al. 2019). For instance, this can occur with flood risk assessment, which depends on spatial (and temporal) properties of precipitation, soil moisture and river flow (Vorogushyn et al. 2018), or with drought-related impacts, which depend on complex interactions of natural and anthropogenic processes (Van Loon et al. 2016). It is hence crucial to provide end users with bias corrections of climate simulations that present not only relevant 1-dimensional information at each individual site but also appropriate spatial representation.
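To make the quantile-mapping idea described above concrete, the following minimal sketch implements an empirical version with NumPy: each value to correct is assigned its non-exceedance probability under the simulated calibration distribution, and the reference quantile with the same probability is returned. The function name `quantile_mapping`, the number of quantile nodes and the synthetic Gaussian data are assumptions made for illustration only; this is not the specific variant used in any of the cited studies.

```python
import numpy as np

def quantile_mapping(ref_cal, sim_cal, sim_proj, n_nodes=101):
    """Empirical quantile mapping for one variable at one grid cell (illustrative)."""
    probs = np.linspace(0.0, 1.0, n_nodes)
    q_ref = np.quantile(ref_cal, probs)   # reference quantiles (calibration)
    q_sim = np.quantile(sim_cal, probs)   # simulated quantiles (calibration)
    # Non-exceedance probability of each value under the simulated calibration CDF.
    # Values outside the calibration range are clamped to the extreme quantiles.
    p = np.interp(sim_proj, q_sim, probs)
    # Reference quantile associated with the same probability.
    return np.interp(p, probs, q_ref)

# Synthetic example: a simulation that is too warm and too variable.
rng = np.random.default_rng(0)
ref_cal = rng.normal(10.0, 2.0, size=5000)   # stand-in for observations
sim_cal = rng.normal(12.0, 3.0, size=5000)   # stand-in for biased simulations
corrected = quantile_mapping(ref_cal, sim_cal, sim_cal)
```

Because the transfer function is monotone, the temporal ranks of the corrected series are those of the simulation, which is why such a univariate correction, applied cell by cell, leaves the rank-based spatial dependence unchanged.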
Over the last few years, a few multivariate bias correction (MBC) methods have been developed to address the issues of biases in multivariate dependencies. Not only do these methods correct marginal properties of simulated variables, they are also designed to adjust statistical dependencies between variables. Although it has been found for specific cases that MBC methods do not particularly outperform univariate ones for the adjustment of dependencies between multiple variables (Räty et al. 2018), this finding cannot be generalized to all applications and methods. For instance, François et al. (2020) showed the added value of MBC to improve inter-variable dependence and spatial structures for temperature and precipitation over Europe. More generally, MBCs could be of great interest for compound events studies, where dependencies between drivers of extreme events with large impacts are crucial to evaluate their risks (Zscheischler et al. 2018).
A categorization of MBC methods in three main families of approaches has been proposed in the literature (e.g., Vrac 2018; François et al. 2020):

the “marginal/dependence” correction approach, that consists of MBC methods adjusting in two distinct steps, i.e. separately, marginal distributions and multivariate dependencies of climate simulations (e.g., Bárdossy and Pegram 2012; Mehrotra and Sharma 2016; Hnilica et al. 2017; Nahar et al. 2018; Cannon 2018; Nguyen et al. 2019; Guo et al. 2019; Vrac and Thao 2020).

the “successive conditional” category, made up of MBC methods performing successive univariate corrections of climate variables conditionally on the previously adjusted ones (e.g., Piani and Haerter 2012; Dekens et al. 2017).

the “all-in-one” correction approach, that adjusts directly the whole statistical distribution (i.e. both univariate and multivariate properties) of climate simulations at the same time (e.g., Robin et al. 2019).
Based on this categorization, François et al. (2020) performed an intercomparison and critical review of MBC methods. It presents a global picture of the performances of MBCs in terms of multivariate adjustments of climate simulations, as well as the different assumptions and statistical techniques used.
In parallel, i.e., in contexts other than bias correction, over the last decades, machine learning techniques have emerged as a promising approach to model highly nonlinear and complex relationships between statistical variables. Major improvements have been obtained with Deep Learning models (see the overview of Schmidhuber 2015), which have proved to be efficient at extracting high-level feature information from various datasets. In particular, convolutional neural networks (CNNs, see e.g., Lecun and Bengio 1995) have been shown to capture complex spatial structures with great performance. Initially developed for computer vision problems (e.g., Szegedy et al. 2015; He et al. 2016), they have found numerous applications in climate sciences: for instance for weather forecast prediction uncertainty (Scher and Messori 2018), emulations of atmospheric dynamics (Shi et al. 2015; Scher and Messori 2019; Chapman et al. 2019), detection of extreme weather events (Liu et al. 2016; Racah et al. 2017) and statistical downscaling (Vandal et al. 2017; Rodrigues et al. 2018; Baño-Medina et al. 2020). A recent overview of Deep Learning applications for Earth system science is offered by Reichstein et al. (2019).
Recently, a new class of artificial neural networks, named Generative Adversarial Networks (GANs; Goodfellow et al. 2014), has attracted tremendous interest due to its ability to infer high-dimensional probability distributions. Initially, this machine-learning method was developed for estimating the distribution of images from a target dataset, with the aim of sampling new (and unseen) images from this distribution. GANs, implemented with deep convolutional neural networks, have achieved impressive results in computer vision problems (e.g., Radford et al. 2016) and are a subject of active research to improve computing architectures (e.g., Salimans et al. 2016; Karras et al. 2018; Menick and Kalchbrenner 2018) and optimization techniques (e.g., Mao et al. 2017; Arjovsky et al. 2017; Roth et al. 2017). Conditional formulations of GANs have also been developed, for which additional information, such as class labels or images, can serve as inputs to condition the generation of the new images (e.g., Mirza and Osindero 2014; Gauthier 2014; Denton et al. 2015; Kim et al. 2017; Isola et al. 2017). In particular, image-conditional GANs make it possible to perform image-to-image translation tasks by learning how to map the statistical distribution of one set of images (source dataset) to the statistical distribution of another set (target dataset). Depending on the correspondence between images of the source and target datasets, different versions of image-conditional GANs have been developed. When all the images are paired (i.e., there is a known one-to-one correspondence between every image of the source and target datasets), conditional GANs are trained by supervised learning (Yoo et al. 2016; Isola et al. 2017). When only a few images are paired, semi-supervised learning is used (Gan et al. 2017) and when all images are unpaired, only unsupervised learning can be applied (Kim et al. 2017; Yi et al. 2017; Zhu et al. 2017).
Due to the stochastic and high-dimensional nature of many physical processes of the Earth system, GANs and conditional GANs are particularly appealing for atmospheric science problems. Recently, they have been used for various Earth-science-related applications: for instance for statistical downscaling (Leinonen et al. 2020; Wang et al. 2021), temporal disaggregation of spatial rainfall fields (Scher and Peßenteiner 2020), sampling of extreme values (Bhatia et al. 2020), modelling of chaotic dynamical systems (e.g., Xie et al. 2018; Wu et al. 2020), classification of snowflake images (Leinonen and Berne 2020), weather forecasting (Bihlo 2020) and stochastic parameterization in geophysical models (Gagne II et al. 2020).
In the climate modelling context, no one-to-one correspondence exists between observations and model simulations as they have different internal variabilities and thus are not synchronized in time. Biases refer to differences in distributional properties between references and simulated climate variables. Hence, in this context, bias correction can be seen as an unsupervised image-to-image problem that aims to map daily images from model simulations to daily images from historical observational references in order to adjust the distributional properties of the climate model.
In this study, we adapt a specific formulation of conditional GANs, initially used for unsupervised image-to-image translation problems (CycleGAN, Zhu et al. 2017), for multi-site corrections of climate simulations. The new MBC method, referred to as MBC-CycleGAN in the following, is introduced and applied in a proof-of-concept context for the correction of daily temperature and precipitation fields with a simple neural network architecture. In order to investigate and evaluate the proposed methodology, applications and comparisons of MBC-CycleGAN based on PP (corresponding to a supervised context) and MOS (unsupervised context) approaches are performed through a cross-validation method. In addition, a second cross-validation method is used in this study to assess the performances of MBC-CycleGAN in a context of different degrees of non-stationarity of the climate model between present (i.e., calibration) and future (i.e., projection) periods. One univariate quantile-mapping-based BC method and two MBC algorithms are included in the study in order to gain a better understanding of the performances of MBC-CycleGAN concerning univariate, spatial and temporal properties.
The paper is organized as follows: Sect. 2 presents the model and reference data used, and Sect. 3 describes the MBC-CycleGAN algorithm. Then, Sect. 4 displays the experimental setup used in this study, and results are provided in Sect. 5. Conclusions, discussions and perspectives for future research are finally proposed in Sect. 6.
Reference and model data
In this study, the dataset employed as reference for the bias correction is the “Système d’Analyse Fournissant des Renseignements Atmosphériques à la Neige” (SAFRAN) reanalysis (Vidal et al. 2010) with an approximate 8 km \(\times\) 8 km spatial resolution. Daily temperature and precipitation time series from 1 January 1979 to 31 December 2016 are extracted over the region of Paris, France ([47.878, 49.830\(^{\circ }\) N] \(\times\) [0.949, 3.947\(^{\circ }\) E]), which corresponds to a domain with 28 \(\times\) 28 = 784 continental grid cells.
For the climate simulation data to be corrected, daily temperature and precipitation time series are taken from runs of the IPSL-CM5A-MR Earth system model (Marti et al. 2010; Dufresne et al. 2013) with a 1.25\(^{\circ }\) \(\times\) 2.5\(^{\circ }\) spatial resolution over the same region of Paris. For the 1979–2005 period, a historical run is extracted and concatenated with a run under the RCP 8.5 scenario (i.e., the scenario with highest CO\(_{2}\) concentration) for the 2006–2016 period, to obtain the desired 1979–2016 period. To perform a bias correction, a one-to-one correspondence between model and reference grid cells is needed, i.e., spatial resolutions between reference and model data have to be the same. Hence, IPSL data are regridded to the SAFRAN spatial resolution with a bilinear interpolation for both temperature and precipitation.
More data are required for this study, in particular for the implementation of the PP approach and to assess the influence of non-stationary properties of climate simulations on the performance of the proposed MBC method. For the sake of clarity and readability, these data will be introduced in the appropriate sections.
For illustration purposes, Fig. 1a displays the topographic map of France with the region of Paris in a box, and Fig. 1b–e display the mean daily temperature (Fig. 1b, c) and precipitation (Fig. 1d, e) maps for the SAFRAN and IPSL datasets during winter over the 1979–2016 period for Paris.
Methodology
GAN
In its most basic formulation, a generative adversarial network consists of two neural networks that are trained conjointly: a generator and a discriminator. We first consider one random variable \(\mathbf {Y}\) living in \(\mathbb {R}^{d}\), with a probability distribution denoted \(\mathbb {P}_{\mathbf {Y}}\). This random variable characterizes the available data, such as images of the target dataset (i.e., references), and hence takes its values in a high-dimensional space. We assume to have at hand samples \(\mathbf{y} _{1}, \ldots , \mathbf{y} _{n}\) drawn according to the density \(\mathbb {P}_{\mathbf {Y}}\) on \(\mathbb {R}^{d}\). The generator, denoted G, is a function from \(\mathbb {R}^{d'}\) to \(\mathbb {R}^{d}\) and is intended to be applied to a \(d^{'}\)-dimensional random variable \(\mathbf {W}\), usually multivariate Gaussian random noise (with \(d^{'} \ll d\)), such that the random variable \(G(\mathbf {W})\) follows the law of \(\mathbf {Y}\), i.e. \(\mathbb {P}_{\mathbf {Y}} = \mathbb {P}_{\mathbf {G(\mathbf {W})}}\). Let \(\mathbf{w} _{1}, \dots , \mathbf{w} _{n}\) be a sample drawn from the distribution of \(\mathbf {W}\). To train the generator G, the discriminator \(D_{\mathbf {Y}}\), which is a function from \(\mathbb {R}^{d}\) to [0, 1], is used as a complex loss function (Goodfellow et al. 2014). This neural network is a binary classifier that returns the probability that a given observation, or image, comes from \(\mathbb {P}_{\mathbf {Y}}\). The discriminator is trained in a supervised way to return maximal probability values on the reference images \(\mathbf{y} _{i}\) and minimal values on the artificially generated images \(G(\mathbf{w} _{i})\).
Conversely, the goal of the generator is to “fool” the discriminator by making the distribution of \(G(\mathbf{w} _{i})\) as indistinguishable as possible from that of \(\mathbf{y} _{i}\), i.e., making it difficult for the discriminator to determine that a sample \(G(\mathbf{w} _{i})\) comes from a distribution different from \(\mathbb {P}_{\mathbf {Y}}\). Generator and discriminator are trained in turns and are in competition (i.e. “adversarial training”) to improve themselves until they reach an optimal equilibrium state.
The original formulation of GANs explained above is unconditional: the generator G only takes as input noise vectors \(\mathbf{w} _{i}\) to produce new samples that are drawn from the target distribution \(\mathbb {P}_{\mathbf {Y}}\). The idea of conditional GANs (e.g., Goodfellow et al. 2014; Mirza and Osindero 2014) is to add some information as inputs to direct the generation. By conditioning the generation on an input image, the generator is able to generate a corresponding output image, rendering conditional GANs appropriate for image-to-image translation tasks (e.g., Isola et al. 2017).
CycleGAN for unsupervised image-to-image translation
CycleGAN (Zhu et al. 2017) is a particular image-conditional GAN that is commonly used for unsupervised image-to-image translation. In the original application, CycleGAN has been applied with great success to transform photographs into the styles of master paintings by modifying colour information (i.e., RGB colour channels and/or spatial features of colours) of the photographs. Instead of the random noise \(\mathbf {W}\), we introduce another random variable \(\mathbf {X}\), with probability distribution \(\mathbb {P}_{\mathbf {X}}\), living in the same dimensional space as \(\mathbf {Y}\) (i.e., \(\mathbb {R}^{d}\)). This random variable \(\mathbf {X}\) characterizes the images of the source dataset (i.e., biased simulations to correct). The CycleGAN approach consists in learning a mapping (i.e., a generator) \(G_{\mathbf {X} \rightarrow \mathbf {Y}}: \mathbb {R}^{d} \rightarrow \mathbb {R}^{d}\) such that the random variable \(G_{\mathbf {X} \rightarrow \mathbf {Y}}(\mathbf{X} )\) follows the law of \(\mathbf {Y}\) (i.e., \(\mathbb {P}_{\mathbf {Y}} = \mathbb {P}_{G_{\mathbf {X} \rightarrow \mathbf {Y}}(\mathbf{X} )}\)). In addition to samples \(\mathbf{y} _{1}, \dots , \mathbf{y} _{n}\), we assume to have at hand image samples \(\mathbf{x} _{1}, \ldots , \mathbf{x} _{n}\) drawn according to density \(\mathbb {P}_{\mathbf {X}}\) on \(\mathbb {R}^{d}\). Similarly to unconditional GANs, the mapping \(G_{\mathbf {X} \rightarrow \mathbf {Y}}\) is learned using an adversarial loss, i.e. with a discriminator \(D_{\mathbf {Y}}\) which forces the generator \(G_{\mathbf {X} \rightarrow \mathbf {Y}}\) to generate images from a distribution close to the target distribution \(\mathbb {P}_{\mathbf {Y}}\). The adversarial loss is defined as:
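Following the log-likelihood formulation of Goodfellow et al. (2014) and Zhu et al. (2017), and assuming that expectations are estimated by empirical means over the \(n\) available samples (the \(1/n\) normalization is an assumption), this loss can be written as:

\[
L_{GAN}(G_{\mathbf{X} \rightarrow \mathbf{Y}}, D_{\mathbf{Y}}) = \frac{1}{n} \sum_{i=1}^{n} \log D_{\mathbf{Y}}(\mathbf{y}_{i}) + \frac{1}{n} \sum_{i=1}^{n} \log \left( 1 - D_{\mathbf{Y}}(G_{\mathbf{X} \rightarrow \mathbf{Y}}(\mathbf{x}_{i})) \right)
\]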
\(G_{\mathbf {X} \rightarrow \mathbf {Y}}\) aims to minimize this adversarial objective against \(D_{\mathbf {Y}}\), that is, it tries to fool the discriminator with its generated images (i.e., maximizing the probability \(D_{\mathbf {Y}}(G_{\mathbf {X} \rightarrow \mathbf {Y}}(\mathbf{x} _{i}))\)). On the contrary, the discriminator \(D_{\mathbf {Y}}\) aims to maximize the adversarial loss by distinguishing between transferred samples \(G_{\mathbf {X} \rightarrow \mathbf {Y}}(\mathbf{x} _{i})\) and samples \(\mathbf{y} _{i}\) from the distribution \(\mathbb {P}_{\mathbf {Y}}\). A perfect discriminator \(D_{\mathbf {Y}}\) would return probability values equal to 1 for samples drawn from \(\mathbb {P}_{\mathbf {Y}}\) and equal to 0 for samples generated by \(G_{\mathbf {X} \rightarrow \mathbf {Y}}\). Hence, \(G_{\mathbf {X} \rightarrow \mathbf {Y}}\) is designed to solve the optimization problem against \(D_{\mathbf {Y}}\):
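In the same min-max notation as the one used for the inverse mapping in the next paragraph (the star superscript denoting the optimal generator is a notational assumption), this problem reads:

\[
G^{*}_{\mathbf{X} \rightarrow \mathbf{Y}} = \mathrm{arg\,} \underset{G_{\mathbf{X} \rightarrow \mathbf{Y}}}{\mathrm{min\,}} \underset{D_{\mathbf{Y}}}{\mathrm{max\,}} L_{GAN}(G_{\mathbf{X} \rightarrow \mathbf{Y}}, D_{\mathbf{Y}})
\]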
As highlighted by Zhu et al. (2017), this adversarial objective for unsupervised problems is under-constrained: there is no guarantee that “an individual input \(\mathbf{x} _{i}\) and output \(\mathbf{y} _{i}\) are paired up in a meaningful way” with such a mapping \(G_{\mathbf {X} \rightarrow \mathbf {Y}}\). In fact, without further constraints, several different mappings can optimize the adversarial loss equally well by transferring the same set of images from \(\mathbb {P}_{\mathbf {X}}\) to any random permutation of the same set of images from the distribution \(\mathbb {P}_{\mathbf {Y}}\). Moreover, optimizing this under-constrained adversarial objective alone has been found to be difficult in practice for unsupervised problems, often leading to a well-known problem called “mode collapse”. Mode collapse appears when a generator fails to model the complete range of input images. This results in a lack of diversity in the generated outputs. To address these issues, Zhu et al. (2017) propose to reduce the number of possible mapping functions by adding more constraints to the optimization problem. To do so, they introduce the inverse mapping \(G_{\mathbf {Y} \rightarrow \mathbf {X}}: \mathbb {R}^{d} \rightarrow \mathbb {R}^{d}\), as well as a second discriminator \(D_{\mathbf {X}}\) aimed at recognizing images from the distribution \(\mathbb {P}_{\mathbf {X}}\). Similarly to the mapping \(G_{\mathbf {X} \rightarrow \mathbf {Y}}\), an equivalent adversarial loss can be used to learn the mapping \(G_{\mathbf {Y} \rightarrow \mathbf {X}}\) by solving \(\mathrm {arg\,} \underset{G_{\mathbf {Y} \rightarrow \mathbf {X}}}{\mathrm {min\,}} \underset{D_{\mathbf {X}}}{\mathrm {max\,}} L_{GAN}(G_{\mathbf {Y} \rightarrow \mathbf {X}},D_{\mathbf {X}})\). Zhu et al. (2017) proposed to use \(G_{\mathbf {Y} \rightarrow \mathbf {X}}\) to enforce the learned mappings to be cycle-consistent.
That means that, for each input image \(\mathbf{x} _{i}\), the mappings \(G_{\mathbf {X} \rightarrow \mathbf {Y}}\) and \(G_{\mathbf {Y} \rightarrow \mathbf {X}}\) can be constrained such that they learn to translate \(\mathbf{x} _{i}\) back to the initial image, i.e. \(G_{\mathbf {Y} \rightarrow \mathbf {X}} \circ G_{\mathbf {X} \rightarrow \mathbf {Y}}(\mathbf{x} _{i}) \approx \mathbf{x} _{i}\) (and similarly for image \(\mathbf{y} _{i}\), such that \(G_{\mathbf {X} \rightarrow \mathbf {Y}} \circ G_{\mathbf {Y} \rightarrow \mathbf {X}}(\mathbf{y} _{i}) \approx \mathbf{y} _{i}\)). This property can be enforced by using a “cycle-consistency” loss which is defined as:
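Assuming the \(L_{1}\) norm and the empirical means used by Zhu et al. (2017), a form consistent with the two approximate identities above is:

\[
L_{cyc}(G_{\mathbf{X} \rightarrow \mathbf{Y}}, G_{\mathbf{Y} \rightarrow \mathbf{X}}) = \frac{1}{n} \sum_{i=1}^{n} \left\Vert G_{\mathbf{Y} \rightarrow \mathbf{X}}(G_{\mathbf{X} \rightarrow \mathbf{Y}}(\mathbf{x}_{i})) - \mathbf{x}_{i} \right\Vert_{1} + \frac{1}{n} \sum_{i=1}^{n} \left\Vert G_{\mathbf{X} \rightarrow \mathbf{Y}}(G_{\mathbf{Y} \rightarrow \mathbf{X}}(\mathbf{y}_{i})) - \mathbf{y}_{i} \right\Vert_{1}
\]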
Finally, to ensure that images in \(\mathbf{x} _{1}, \ldots , \mathbf{x} _{n}\) that already seem to be drawn from the distribution \(\mathbb {P}_{\mathbf {Y}}\) (and vice versa) are not mapped to other images, an identity mapping loss can also be defined as:
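Assuming the identity loss of Zhu et al. (2017), in which each generator should leave unchanged an image that already belongs to its target domain, and again using the \(L_{1}\) norm and empirical means, this loss can be written as:

\[
L_{id}(G_{\mathbf{X} \rightarrow \mathbf{Y}}, G_{\mathbf{Y} \rightarrow \mathbf{X}}) = \frac{1}{n} \sum_{i=1}^{n} \left\Vert G_{\mathbf{X} \rightarrow \mathbf{Y}}(\mathbf{y}_{i}) - \mathbf{y}_{i} \right\Vert_{1} + \frac{1}{n} \sum_{i=1}^{n} \left\Vert G_{\mathbf{Y} \rightarrow \mathbf{X}}(\mathbf{x}_{i}) - \mathbf{x}_{i} \right\Vert_{1}
\]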
which further reduces the solution space of mapping functions and prevents even more the optimization problem from being under-constrained. The full objective function of the CycleGAN architecture can be expressed as follows:
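Combining the two adversarial losses with the weighted cycle-consistency and identity terms, as in Zhu et al. (2017), gives the following expression (the symbol \(L\) and its argument list are notational assumptions):

\[
L(G_{\mathbf{X} \rightarrow \mathbf{Y}}, G_{\mathbf{Y} \rightarrow \mathbf{X}}, D_{\mathbf{X}}, D_{\mathbf{Y}}) = L_{GAN}(G_{\mathbf{X} \rightarrow \mathbf{Y}}, D_{\mathbf{Y}}) + L_{GAN}(G_{\mathbf{Y} \rightarrow \mathbf{X}}, D_{\mathbf{X}}) + \lambda_{cyc} \, L_{cyc}(G_{\mathbf{X} \rightarrow \mathbf{Y}}, G_{\mathbf{Y} \rightarrow \mathbf{X}}) + \lambda_{id} \, L_{id}(G_{\mathbf{X} \rightarrow \mathbf{Y}}, G_{\mathbf{Y} \rightarrow \mathbf{X}})
\]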
where \(\lambda _{cyc}\) and \(\lambda _{id}\) control the relative importance of the cycle-consistency and identity losses, respectively. Finally, the CycleGAN aims to solve:
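With \(L\) denoting the full objective function (the symbol and the star superscripts are notational assumptions), the problem can be stated as:

\[
G^{*}_{\mathbf{X} \rightarrow \mathbf{Y}}, G^{*}_{\mathbf{Y} \rightarrow \mathbf{X}} = \mathrm{arg\,} \underset{G_{\mathbf{X} \rightarrow \mathbf{Y}}, G_{\mathbf{Y} \rightarrow \mathbf{X}}}{\mathrm{min\,}} \underset{D_{\mathbf{X}}, D_{\mathbf{Y}}}{\mathrm{max\,}} L(G_{\mathbf{X} \rightarrow \mathbf{Y}}, G_{\mathbf{Y} \rightarrow \mathbf{X}}, D_{\mathbf{X}}, D_{\mathbf{Y}})
\]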
Although estimating the inverse mapping \(G_{\mathbf {Y} \rightarrow \mathbf {X}}\) is not necessarily the initial goal of many image-to-image translation problems, its use to constrain the optimization problem has been found to be crucial in an unsupervised context for the convergence of the algorithm and the estimation of the desired mapping \(G_{\mathbf {X} \rightarrow \mathbf {Y}}\). Illustrations of the adversarial, cycle-consistent and identity losses within the CycleGAN architecture are given in Fig. 2.
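As a numerical illustration of how the adversarial, cycle-consistency and identity losses described above combine, the sketch below evaluates the full objective for given generators and discriminators. It assumes log-likelihood adversarial terms and \(L_{1}\) norms averaged over samples, as in Zhu et al. (2017); the function names, the toy generators (a shift by plus or minus one) and the constant discriminators are hypothetical stand-ins, not trained networks.

```python
import numpy as np

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Log-likelihood adversarial loss from discriminator outputs in [0, 1]."""
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

def l1_mean(a, b):
    """Mean over samples of the L1 distance between flattened images."""
    return np.mean(np.abs(a - b).sum(axis=1))

def cyclegan_objective(x, y, g_xy, g_yx, d_x, d_y, lam_cyc, lam_id):
    """Full objective: two adversarial terms plus weighted cycle and identity terms."""
    fake_y, fake_x = g_xy(x), g_yx(y)
    l_gan = (adversarial_loss(d_y(y), d_y(fake_y))
             + adversarial_loss(d_x(x), d_x(fake_x)))
    l_cyc = l1_mean(g_yx(fake_y), x) + l1_mean(g_xy(fake_x), y)
    l_id = l1_mean(g_xy(y), y) + l1_mean(g_yx(x), x)
    return l_gan + lam_cyc * l_cyc + lam_id * l_id

# Toy stand-ins (hypothetical): exact inverse shifts and undecided discriminators.
def g_xy(img):
    return img + 1.0

def g_yx(img):
    return img - 1.0

def d_half(img):
    return np.full(len(img), 0.5)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(8, 4))   # 8 flattened "images" with 4 pixels
y = rng.normal(1.0, 1.0, size=(8, 4))
total = cyclegan_objective(x, y, g_xy, g_yx, d_half, d_half,
                           lam_cyc=10.0, lam_id=1.0)
```

In this toy setting the two generators are exact inverses of each other, so the cycle-consistency term vanishes while the identity term does not: the two constraints penalize different failure modes.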
The MBC-CycleGAN approach
Adaptation of CycleGAN for MBC
The main idea of the proposed methodology, named MBC-CycleGAN, is to adapt the CycleGAN approach so that it turns daily maps of a simulated variable whose spatial features are inappropriate compared to a reference dataset into more realistic maps. Here, MBC-CycleGAN is developed in the context of the “marginal/dependence” MBC category, i.e., correcting separately marginal distributions and dependence relationships. In addition to marginal distributions, we consider the adjustment of spatial dependence structures. The algorithm is trained on a historical period (i.e., calibration) for which both climate simulations and reference datasets are available. Once the adversarial neural network has converged, adjustment of climate simulations over a projection period (e.g., a future time period) is performed using the pre-trained algorithm. The MBC-CycleGAN proceeds as follows:

1. As MBC-CycleGAN belongs to the marginal/dependence category, univariate distributions of modeled climate variables are first corrected independently using a univariate BC method for both calibration and projection periods. In this study, the quantile-quantile (QQ) mapping method is used (Déqué 2007).

2. Then, quantile-quantile and reference data over the calibration time period are transformed to belong to [0, 1] using a pointwise min-max normalization. For each grid cell, the minimum and maximum values from the reference during the calibration period are taken to compute the normalization. The resulting daily maps are then given to a CycleGAN model to learn the transfer between the two distributions of images. Generators and discriminators are trained until the spatial distribution of the corrected maps stops improving. More details about the criteria used to evaluate spatial distributions are presented thereafter.

3. Once the CycleGAN model has been trained for the calibration period, the same pointwise normalization is performed for quantile-quantile data over the projection period, i.e., using the same minimum and maximum values from the reference during the calibration period. Normalized daily maps from quantile-quantile data in the projection period are translated into the normalized reference domain using the pre-trained adversarial neural network. Then, the corrected outputs obtained are rescaled to physical values by applying the inverse of the pointwise min-max normalization used.

4. Finally, by taking advantage of the Schaake Shuffle technique (Clark et al. 2004), quantile-quantile data for the projection period obtained from Step 1 are reordered such that the rank structure of the data obtained from Step 3 is reproduced. This shuffling technique, already employed in a few multivariate bias correction methods (e.g., Vrac 2018; Cannon 2018; Mehrotra and Sharma 2019), yields bias-corrected data with marginal properties from quantile-quantile outputs and rank dependence structure from CycleGAN outputs.
A summary of the successive steps in the form of a flowchart is provided in Fig. 3. More details about the different algorithmic steps are presented in Appendix 1.
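The data handling around the CycleGAN in Steps 2 to 4 can be sketched with NumPy as follows: a pointwise min-max normalization fitted on the reference over the calibration period, its inverse, and a Schaake-Shuffle-style reordering that gives the quantile-quantile values the rank structure of a template. The function names are hypothetical, arrays are assumed to be shaped (time, rows, columns), and the "generator output" is a placeholder that merely permutes days; this is an illustration under these assumptions, not the implementation detailed in Appendix 1.

```python
import numpy as np

def minmax_fit(ref_cal):
    """Per-grid-cell minimum and maximum of the reference (time on axis 0)."""
    return ref_cal.min(axis=0), ref_cal.max(axis=0)

def minmax_apply(data, lo, hi):
    """Pointwise normalization; reference calibration data end up in [0, 1]."""
    return (data - lo) / (hi - lo)

def minmax_invert(data, lo, hi):
    """Inverse of the pointwise normalization, back to physical units."""
    return data * (hi - lo) + lo

def schaake_shuffle(values, template):
    """Reorder `values` in time, cell by cell, to reproduce the ranks of `template`."""
    sorted_vals = np.sort(values, axis=0)
    ranks = np.argsort(np.argsort(template, axis=0), axis=0)
    return np.take_along_axis(sorted_vals, ranks, axis=0)

rng = np.random.default_rng(1)
ref_cal = rng.gamma(2.0, 3.0, size=(200, 3, 3))   # synthetic reference maps
qq_proj = rng.gamma(2.0, 3.5, size=(100, 3, 3))   # synthetic Step 1 output

lo, hi = minmax_fit(ref_cal)
qq_norm = minmax_apply(qq_proj, lo, hi)           # Step 3: normalize
gan_norm = qq_norm[rng.permutation(100)]          # placeholder for the generator
gan_phys = minmax_invert(gan_norm, lo, hi)        # Step 3: rescale
final = schaake_shuffle(qq_proj, gan_phys)        # Step 4: reorder Step 1 values
```

By construction, `final` contains exactly the quantile-quantile values at each grid cell, so the marginal correction of Step 1 is preserved, while the day-to-day ordering follows the template.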
Network architecture
To set the weights for the cycle-consistency mapping loss \(\lambda _{cyc}\) and the identity mapping loss \(\lambda _{id}\), preliminary tests have been conducted by checking a couple of combinations of weights and verifying that our optimization process improved the spatial structure of the climate simulations. Based on these results (not shown), the weights have been set to \(\lambda _{cyc} = 10\) and \(\lambda _{id} = 1\).
Additionally, in this paper, we only present results obtained with a simple architecture for the CycleGAN neural networks. Our work being a proof of concept, we did not tune any further the architecture or the hyperparameters of the neural networks. However, the results presented later in Sect. 5 appear sufficient to illustrate the potential of CycleGANs for MBC. Schematics of the convolutional neural networks for both generators and discriminators are presented in Fig. 4. The architectures of the generators for the mapping and inverse mapping are identical and are based on deep convolutional layers (DCGAN, Radford et al. 2016). First, the daily maps, i.e. images of size \(28 \times 28\), are given as inputs to the generators. Then, images flow through three 2D convolution layers with an increasing number of \(3 \times 3\) filters (64–128–256). Two of them perform convolutions that downsample input images to capture complex patterns at different scales. Then, two 2D transpose convolutional layers with a decreasing number of \(4 \times 4\) filters (128–64) are used to perform inverse convolution operations and upsample the data. Finally, one 2D convolution layer with one \(1 \times 1\) filter is used to generate an output image of the same size as the initial one. Skip connections between convolution and transpose convolutional layers are used to ease the training of the CycleGAN network (He et al. 2016). All the other hyperparameters for the neural network architecture of the generators are detailed in Appendix 2.
Concerning the discriminators, they also take images of size \(28 \times 28\) as inputs. Then, two 2D convolution layers with an increasing number of \(3 \times 3\) filters (64–128) are used. Finally, outputs are flattened, i.e., converted into a 1-dimensional array, before being given to a fully connected layer (dense layer) that computes the sigmoid values (i.e., probabilities) for the classification of images.
The number of parameters is equal to 1,025,281 for each generator and 80,769 for each discriminator, bringing the total number of parameters to 2,212,100 for the whole CycleGAN architecture. Please note that each convolution and transpose convolutional layer used within the neural network architectures of both generators and discriminators includes a bias vector to fit. The number of parameters added by an individual convolutional layer depends on its number of filters \(f_{2}\), the filter size (here \(3 \times 3\)) and also the number of filters \(f_{1}\) of the previous convolutional layer. Adding a convolutional layer with \(f_{2}\) filters to a generator architecture will add \((3 \times 3 \times f_{1} +1) \times f_{2}\) parameters. Hence, constructing a (deeper) neural network with more and more layers drastically increases the number of parameters to train. In order to keep an algorithm which is relatively fast to train while being stable, we decided not to add further layers to the generator and discriminator architectures. For a concise summary of the network architectures used, we refer to Tables 3 and 4 in Appendix 2.
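The quoted parameter counts can be checked with the bias-inclusive formula given above, generalized to a \(k \times k\) filter. The sketch below makes three assumptions that are not stated explicitly in this section: inputs have a single channel, skip connections add no trainable parameters (e.g., element-wise additions rather than concatenations), and the two discriminator convolutions each halve the resolution (28 to 14 to 7) before the dense layer. These assumptions are inferred because they reproduce the quoted totals; the authoritative layer settings remain those of Appendix 2.

```python
def conv_params(kernel, f_in, f_out):
    """Weights plus one bias per filter: (kernel * kernel * f_in + 1) * f_out."""
    return (kernel * kernel * f_in + 1) * f_out

# Each tuple is (kernel size, input channels, number of filters).
generator_layers = [
    (3, 1, 64), (3, 64, 128), (3, 128, 256),   # 2D convolutions
    (4, 256, 128), (4, 128, 64),               # 2D transpose convolutions
    (1, 64, 1),                                # final 1 x 1 convolution
]
discriminator_layers = [(3, 1, 64), (3, 64, 128)]

# Assumed flattened size: 7 x 7 feature maps with 128 channels.
dense_inputs = 7 * 7 * 128

n_generator = sum(conv_params(*layer) for layer in generator_layers)
n_discriminator = (sum(conv_params(*layer) for layer in discriminator_layers)
                   + dense_inputs + 1)          # dense weights plus one bias
n_total = 2 * (n_generator + n_discriminator)   # two generators, two discriminators

print(n_generator, n_discriminator, n_total)
```

Under these assumptions the script yields 1,025,281 parameters per generator, 80,769 per discriminator and 2,212,100 in total, matching the figures in the text. The two transpose convolutions alone account for roughly two thirds of each generator.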
Training details
In this study, CycleGAN networks are trained using the Adam optimizer (Kingma and Ba 2017) with learning rates of \(1\mathrm {e}{-4}\) and \(5\mathrm {e}{-5}\) for the generators and discriminators, respectively. Please note that no grid search has been performed to determine optimal values of the learning rates, and hence there is room for improvement. For the performance assessment of the CycleGAN model during training, the energy distance (Székely and Rizzo 2004; Székely and Rizzo 2013) is used. This metric, already used in the bias correction literature (e.g., Cannon 2018), measures the statistical discrepancy between two multivariate distributions that are potentially in high dimension. Given two independent k-dimensional random vectors \(\mathbf{P}\) and \(\mathbf{Q}\) with multivariate probability distributions \(\mu\) and \(\upsilon\) respectively, the energy distance \(\mathcal {E}\) between the two distributions is:
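In the standard form given by Székely and Rizzo (2013), which is assumed here to be the definition intended (some authors take the square root of this quantity), the energy distance reads:

\[
\mathcal{E}(\mu, \upsilon) = 2\,\mathrm{E}\Vert \mathbf{P} - \mathbf{Q} \Vert - \mathrm{E}\Vert \mathbf{P} - \mathbf{P}^{\prime} \Vert - \mathrm{E}\Vert \mathbf{Q} - \mathbf{Q}^{\prime} \Vert
\]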
with \(\mathrm {E}\) denoting the expected value, \(\mathbf{P} ^{\prime }\) (resp. \(\mathbf{Q} ^{\prime }\)) an independent and identically distributed copy of \(\mathbf{P}\) (resp. \(\mathbf{Q}\)) and \(\Vert .\Vert\) the Euclidean norm. The corresponding energy statistic of \(\mathcal {E}\) between two k-dimensional statistical samples \(\mathbf{p}\) and \(\mathbf{q}\) can be computed as follows:
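A usual plug-in form of this statistic replaces each expectation by an average over all pairs of realizations. The sample sizes \(n_{p}\) and \(n_{q}\) and the extra indices \(j\) and \(l\) are notation assumed here for illustration:

\[
\widehat{\mathcal{E}}(\mathbf{p}, \mathbf{q}) = \frac{2}{n_{p} n_{q}} \sum_{i=1}^{n_{p}} \sum_{m=1}^{n_{q}} \Vert \mathbf{p}_{i} - \mathbf{q}_{m} \Vert - \frac{1}{n_{p}^{2}} \sum_{i=1}^{n_{p}} \sum_{j=1}^{n_{p}} \Vert \mathbf{p}_{i} - \mathbf{p}_{j} \Vert - \frac{1}{n_{q}^{2}} \sum_{m=1}^{n_{q}} \sum_{l=1}^{n_{q}} \Vert \mathbf{q}_{m} - \mathbf{q}_{l} \Vert
\]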
where \(\mathbf{p} _{i}\) denotes the realizations of \(\mathbf{P}\) at the time step i across the k dimensions (and similarly for \(\mathbf{q} _{m}\) with \(\mathbf{Q}\)). The energy statistic goes to zero when the two multivariate samples \(\mathbf{p}\) and \(\mathbf{q}\) are drawn from the same distribution.
During training, energy distances are computed every 10 epochs, i.e. each time the CycleGAN has worked 10 times through the entire training dataset. Estimated energy distances \(\widehat{\mathcal {E}}\) are calculated on multivariate distributions of ranks between references and bias-corrected data. This makes it possible to assess, throughout training, the ability of the method to correct the whole spatial dependence structure of climate simulations. Computing the energy distance using ranks instead of raw values removes the influence of univariate properties on the spatial relationships. The CycleGAN model that minimizes the energy distance on ranks during training is chosen for the correction of the projection period. Training for 1000 epochs takes \(\sim\) 4 h on a single NVIDIA Tesla V100 GPU.
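The selection rule described above can be illustrated with a minimal sketch. It assumes the usual plug-in estimator of the energy distance (averages of pairwise Euclidean distances), and the tiny rank samples and the function names `energy_statistic` and `select_checkpoint` are hypothetical, introduced only for the example.

```python
import math


def mean_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y)."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_statistic(p, q):
    """Plug-in energy statistic between two samples of k-dimensional vectors."""
    return 2.0 * mean_distance(p, q) - mean_distance(p, p) - mean_distance(q, q)


def select_checkpoint(scores):
    """Return the epoch whose energy distance on ranks is smallest."""
    return min(scores, key=scores.get)


# Hypothetical rank maps: 3 days, 2 grid cells.
reference_ranks = [[1, 1], [2, 2], [3, 3]]
# Hypothetical corrected outputs saved every 10 epochs.
checkpoints = {
    10: [[1, 3], [2, 2], [3, 1]],
    20: [[1, 1], [2, 3], [3, 2]],
    30: [[1, 1], [2, 2], [3, 3]],
}

scores = {
    epoch: energy_statistic(ranks, reference_ranks)
    for epoch, ranks in checkpoints.items()
}
best_epoch = select_checkpoint(scores)
print(best_epoch, scores[best_epoch])
```

The double loop scales quadratically with the number of days, which is acceptable for an evaluation run only every 10 epochs but would call for a vectorized implementation on the full 784-cell maps.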
Design of experiments
For evaluation purposes, the proposed MBCCycleGAN method is applied to adjust climate simulation outputs with SAFRAN data as references. Bias correction is performed on separate seasons in order to preserve seasonal properties. In the following, for the sake of clarity, only the winter results are presented. Data are available for the 1979–2016 period (i.e., 3420 winter days), and need to be divided into a calibration period and a projection period to train and evaluate our algorithm. In accordance with common practices in machine learning, the 1979–2016 period is split as follows: 70% (2394 days) as training dataset and 30% (1026 days) as evaluation dataset. In this study, two different cross-validation methods, which differ in how calibration and projection periods are constructed, are used to evaluate our methodology.
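As a minimal sketch of the two ways of building calibration and projection periods from the 3420 winter days, the snippet below indexes days by integers in chronological order. The function names and the random seed are illustrative assumptions, not part of the study.

```python
import random

N_DAYS = 3420
N_CALIBRATION = N_DAYS * 70 // 100  # 70% of the days, i.e. 2394


def random_split(n_days, n_calibration, seed=0):
    """Draw calibration days at random, removing any chronological trend."""
    rng = random.Random(seed)
    days = list(range(n_days))
    rng.shuffle(days)
    return sorted(days[:n_calibration]), sorted(days[n_calibration:])


def chronological_split(n_days, n_calibration):
    """Use the first days for calibration and the remaining ones for projection."""
    days = list(range(n_days))
    return days[:n_calibration], days[n_calibration:]


random_calib, random_proj = random_split(N_DAYS, N_CALIBRATION)
chrono_calib, chrono_proj = chronological_split(N_DAYS, N_CALIBRATION)
print(len(random_calib), len(random_proj), chrono_calib[-1], chrono_proj[0])
```

With the random split, both periods sample the whole 1979–2016 range; with the chronological split, every projection day comes after every calibration day.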
Model output statistics (MOS) vs. Perfect prog (PP)
The first cross-validation method consists in randomly drawing the days that define the calibration and projection periods. As these periods are drawn randomly, the potential climate change signal present in the data during the 1979–2016 period vanishes. Hence, for this cross-validation method, no changes in marginal and dependence properties are expected between the calibration and projection periods, allowing for the assessment of the method in a stationary context. We take advantage of this first stationary cross-validation technique to apply our method in both PP and MOS post-processing schemes for the adjustment of IPSL climate simulations. Implementing and evaluating both the PP and MOS approaches in such a validation context makes it possible to determine which approach is better suited to our context of bias correction of climate simulations. For the MOS approach, MBCCycleGAN is applied directly to IPSL data according to the 4 steps already described in Sect. 3.3. Concerning the implementation of the PP approach, the same procedure is applied but the CycleGAN model is trained in a slightly different way. Indeed, as already explained in Sect. 1, a PP approach consists in establishing the statistical relationships between large-scale predictors and local-scale predictands from observational or reanalysis data (including for the predictors) before applying them to climate model data. Hence, large-scale predictors temporally matching the SAFRAN dataset are needed for a PP approach. For this purpose, a new climate dataset is constructed for both temperature and precipitation as follows: initial local-scale SAFRAN data with 8 km \(\times\) 8 km spatial resolution are upscaled using conservative interpolation onto a large-scale grid of 32 km \(\times\) 32 km spatial resolution. Then, the obtained large-scale data are regridded using bilinear interpolation to the initial grid of SAFRAN, so that they can be used to train the CycleGAN.
This results in “biased” daily maps of temperature and precipitation (large-scale predictors) derived from the initial SAFRAN data (local-scale predictands) and temporally matching the chronology of the SAFRAN time series. Using these new data, hereafter referred to as “low-resolution (LR) SAFRAN”, a CycleGAN model is trained for the implementation of the PP approach by learning the transfer of maps from 1dBC large-scale predictors (QQ(LR SAFRAN)) to maps from local-scale predictands (SAFRAN). This trained model is then used to bias correct IPSL simulations over the projection period and, hence, evaluate the CycleGAN results in a PP context.
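The construction of LR SAFRAN maps can be sketched on a single \(28 \times 28\) field. The sketch assumes a regular grid with equal-area cells, so that conservative upscaling from 8 km to 32 km reduces to a \(4 \times 4\) block mean; the edge handling of the bilinear step (values held constant outside the outermost coarse cell centres) is also an assumption, as are the function names. Regridding tools used in practice may treat both points differently.

```python
def block_mean(fine, factor):
    """Conservative upscaling on a regular grid: average each factor x factor block."""
    n_rows, n_cols = len(fine) // factor, len(fine[0]) // factor
    return [
        [
            sum(
                fine[i * factor + a][j * factor + b]
                for a in range(factor)
                for b in range(factor)
            ) / factor ** 2
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


def bilinear_refine(coarse, factor):
    """Bilinear interpolation from coarse cell centres back to the fine grid."""
    n_rows, n_cols = len(coarse), len(coarse[0])

    def locate(index, n):
        # Position of the fine cell centre in coarse-index coordinates,
        # clamped to the range covered by coarse cell centres.
        x = min(max((index + 0.5) / factor - 0.5, 0.0), n - 1.0)
        lower = min(int(x), n - 2)
        return lower, x - lower

    fine = []
    for i in range(n_rows * factor):
        r0, tr = locate(i, n_rows)
        row = []
        for j in range(n_cols * factor):
            c0, tc = locate(j, n_cols)
            top = (1 - tc) * coarse[r0][c0] + tc * coarse[r0][c0 + 1]
            bottom = (1 - tc) * coarse[r0 + 1][c0] + tc * coarse[r0 + 1][c0 + 1]
            row.append((1 - tr) * top + tr * bottom)
        fine.append(row)
    return fine


# Hypothetical 28 x 28 daily map standing in for a SAFRAN field.
fine_field = [[float(i + j) for j in range(28)] for i in range(28)]
coarse_field = block_mean(fine_field, 4)     # 7 x 7 map at 32 km
lr_field = bilinear_refine(coarse_field, 4)  # back on the 28 x 28 grid
print(len(coarse_field), len(lr_field), len(lr_field[0]))
```

The resulting map has the size of the original one but has lost its fine-scale variability, which is the "bias" that the CycleGAN is then trained to undo in the PP setting.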
Nonstationarity investigation
To evaluate the behavior of the proposed method in a nonstationary context, a second cross-validation method is defined, which consists in dividing the 1979–2016 period chronologically. With the same 70–30% split, the calibration and projection periods correspond approximately to 1979–2005 and 2006–2016, respectively. Hence, the potential climate change signal between the calibration and projection periods is not removed by the cross-validation technique. Within this second cross-validation method, IPSL simulations and SAFRAN references can potentially have different marginal and spatial dependence changes between calibration and projection periods. In this respect, depending on the level of agreement in changes between simulations and references, and on how MBC methods account for these changes in their correction procedure, the quality of the correction for projection periods can differ. Hence, to provide a global picture of the performance of the MBCCycleGAN method in the nonstationary context, three bias correction exercises on climate data with different statistical changes are performed with respect to SAFRAN references:

the correction of IPSL simulations, which present different marginal and spatial properties from SAFRAN, with potentially different changes from those of SAFRAN;

the correction of the LR SAFRAN dataset (presented above), whose marginal and spatial properties as well as their changes are in line with those from SAFRAN;

the correction of a third dataset called IPSLbis (presented below), which presents different marginal and spatial properties from SAFRAN, but whose changes are in line with those from SAFRAN.
For the sake of clarity, a summary of the different attributes of the three datasets to correct is presented in Table 1.
The LR SAFRAN dataset presented above has, by construction, little bias with respect to SAFRAN references: its biases are only due to the interpolation technique used to obtain data with a lower resolution. Hence, statistical changes between the calibration and projection periods for LR SAFRAN are in line with those from the SAFRAN dataset. Adjusting LR SAFRAN data for the projection period makes it possible to assess whether the MBCCycleGAN method is able to reproduce the changes from the reference in the correction. Also, the LR SAFRAN dataset has the particularity of being synchronous in time with the references. Hence, in addition to evaluating the proposed method in terms of distributional properties, which is not considered sufficient to identify successful bias correction techniques (Maraun 2016), this pairwise correspondence between predictors and predictands offers the possibility to directly compare corrected daily maps with those from the references using classic forecast verification statistics.
As IPSL simulations exhibit a different combination of variability and warming from that of the SAFRAN reanalysis, the IPSL model and SAFRAN references are likely to present disagreeing changes in their statistical (marginal and dependence) properties between calibration and projection periods. To evaluate the influence of these potentially disagreeing changes on the correction performance of the proposed method, we constructed the third dataset, referred to as “IPSLbis”, for the projection period only. IPSLbis is specifically constructed so that its marginal and dependence changes between calibration and projection periods are in line with those from the reference. In order to ease the comparison of results with the first bias correction exercise, we forced IPSLbis to have the same changes as LR SAFRAN. This is achieved with a two-step procedure that takes advantage of a nonstationary quantile mapping technique for marginal changes (CDF-t, Vrac et al. 2012) and a matrix-recorrelation technique for dependence changes (Bárdossy and Pegram 2012). More details about the generation of the IPSLbis data can be found in Appendix 3, and a detailed evaluation of the evolution of the statistical properties of the different datasets between the calibration and projection periods is provided in Appendix 4. In particular, results presented in Appendix 4 indicate that, as expected, changes in spatial structures from SAFRAN references are (globally) in agreement with those from LR SAFRAN for both temperature and precipitation. However, concerning changes in spatial structures for IPSL simulations, conclusions differ depending on the physical variable. While, for temperature, simulated changes of spatial correlations are partially in line with those from LR SAFRAN, the IPSL model presents discrepancies in changes for precipitation.
Overall, the construction of IPSLbis with the two-step procedure described in Appendix 3 imposes on IPSL data spatial changes for both temperature and precipitation that are in line with those from LR SAFRAN.
Comparisons to existing MBCs: R\(^{2}\)D\(^{2}\) and dOTC
Although evaluating the correction performance for IPSL simulations is of primary interest, applying our method to these three datasets (IPSL, IPSLbis, LR SAFRAN) makes it possible to assess progressively how well our method performs depending on the biases present in the dataset to correct. Note that, as IPSL and IPSLbis data during calibration are identical, there is no need to train the CycleGAN model a second time for IPSLbis data: the CycleGAN model trained with IPSL data can be used directly to adjust IPSLbis simulations for the projection period. In addition, two MBCs with different assumptions about nonstationarity are applied for comparison using the second cross-validation method: the “Rank Resampling For Distributions And Dependences” (R\(^{2}\)D\(^{2}\), Vrac and Thao 2020) and the “Dynamical Optimal Transport Correction” (dOTC, Robin et al. 2019) methods.
R\(^{2}\)D\(^{2}\), developed within the marginal/dependence category of MBC methods, relies on an analogue-based method that resamples ranks from a reference dataset according to some conditioning information and reconstructs the dependence structure of the simulated time series. The information used to condition the analogues can be multivariate by considering, for example, a set of variables to be corrected at a given time t. Conditioning for the rank resampling can also be extended to rank sequences, i.e. conditioning on not only one but several lagged time steps. Please note that, for the different implementations of \(\hbox {R}^2\hbox {D}^2\) in this study, the multivariate conditioning used includes 4 grid points that uniformly cover the region of interest. In addition, 5 lagged time steps are used for the conditioning, as this has been found to stabilize the \(\hbox {R}^2\hbox {D}^2\) method (not shown). Also, the QQ method is used to correct the marginal properties for \(\hbox {R}^2\hbox {D}^2\) outputs.
The dOTC method was developed within the all-in-one category, i.e., it adjusts the univariate distributions and dependence structures at the same time. The dOTC method takes advantage of optimal transport theory to construct a multivariate transfer function, named a transport plan, for the adjustment of climate simulations with respect to references while minimizing an associated cost function. This particular transfer function links, through conditional laws, all the multivariate elements of the biased multivariate distribution to their corrections. Corrections are then derived by drawing directly from these conditional laws to obtain the bias-corrected data.
Both R\(^{2}\)D\(^{2}\) and dOTC methods are applied according to the spatial-dimensional configuration (hereinafter referred to as “Spatial”), where all the 784 time series for a particular physical variable are corrected jointly. While R\(^{2}\)D\(^{2}\) assumes spatial dependence structures (i.e., the rank correlations, or copulas) to be stable in time, the dOTC method assumes nonstationarity of the dependence structure between the calibration and the projection periods, which allows the changes of the model (e.g., due to climate change) to be taken into account in the bias correction procedure. Comparing the results from both SpatialR\(^{2}\)D\(^{2}\) and SpatialdOTC for adjusting the spatial dependence structure of climate simulations with those from MBCCycleGAN makes it possible to better assess how the proposed method performs in a nonstationary context.
Results
In this section, analyses are presented for the winter season (December, January and February) only. CycleGAN models are trained during the calibration period and selected such that energy distances on ranks are minimized. All evaluations are performed on the projection period for the corrected outputs obtained from the two cross-validation methods, and results are compared to those from the reference dataset. For bias-corrected precipitation time series, a threshold of 1 mm is applied before evaluation: values lower than 1 mm are replaced by 0. Bias correction outputs from the first and second cross-validation methods are evaluated in terms of both marginal and spatial properties. Analyses of temporal properties are only provided for outputs from the second cross-validation method, in which calibration and projection periods are divided chronologically and hence do not distort temporal properties, contrary to the first cross-validation method, which randomly defines these periods. To assess the potential benefits of considering spatial aspects in the correction procedure, the univariate QQ method (Déqué 2007) is also included in the study as a benchmark.
MOS vs. PP
Training of MBCCycleGANs
Figure 5 shows energy distances with respect to SAFRAN references for temperature computed on physical values (Fig. 5a, b) and ranks (Fig. 5c, d) for LR SAFRAN, plain IPSL simulations, 1dQQ, and MBCCycleGAN (MBCCG) outputs during the training on the calibration period. In addition, results for RawCycleGAN (RawCG) are presented. Differences between RawCG and MBCCG only lie in their marginal properties: while RawCG corresponds to the outputs obtained from the CycleGAN after denormalization at the end of Step 3, MBCCG is the combination of the spatial structure from RawCG and univariate properties from QQ outputs (see the flowchart provided in Fig. 3). The results for precipitation are presented in Fig. S1 of the Supplement.
Clearly, Fig. 5a, b show large energy distances computed on physical values of temperature for the LR SAFRAN and IPSL datasets, indicating some biases in spatial structures for those datasets with respect to SAFRAN references. Adjusting marginal properties with the univariate QQ method reduces values of energy distance computed on physical values, highlighting the influence of marginal properties on spatial features. Correction of the spatial dependence structure provided by MBCCG occurs relatively quickly, with energy distances on physical variables reduced by 2 compared to QQ after approximately 1000 epochs for both PP and MOS approaches. However, for RawCG, marginal properties generated by the inverse pointwise min-max normalization do not seem to improve values of energy distances, which justifies the post-processing of univariate properties adopted in the MBCCycleGAN method with the Schaake Shuffle.
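The combination of RawCG spatial structure with QQ marginals can be illustrated for a single grid cell. The sketch below follows one common reading of the Schaake Shuffle: the QQ values are sorted and then reassigned so that their rank sequence in time matches that of the RawCG series. The function name and the toy values are hypothetical, and ties in the template are broken by time order, which is an assumption.

```python
def schaake_shuffle(marginal_values, template):
    """Reorder marginal_values so that they follow the rank sequence of template.

    marginal_values: series with the desired univariate distribution (e.g. QQ).
    template: series with the desired rank chronology (e.g. RawCG).
    """
    # Time indices of the template from its smallest to its largest value;
    # sorted() is stable, so tied template values keep their time order.
    order = sorted(range(len(template)), key=lambda t: template[t])
    sorted_values = sorted(marginal_values)
    shuffled = [None] * len(template)
    for rank, time_index in enumerate(order):
        shuffled[time_index] = sorted_values[rank]
    return shuffled


# Hypothetical 4-day series at one grid cell.
qq_series = [10.0, 30.0, 20.0, 40.0]   # marginals to keep
rawcg_series = [0.3, 0.1, 0.9, 0.5]    # rank structure to keep
mbccg_series = schaake_shuffle(qq_series, rawcg_series)
print(mbccg_series)  # [20.0, 10.0, 40.0, 30.0]
```

Applying the same reordering independently at every grid cell leaves each marginal distribution identical to the QQ output while the ranks across cells, and hence the spatial dependence, are inherited from the CycleGAN output.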
Figure 5c, d show that computing energy distances on ranks for temperature removes the influence of univariate properties on spatial features. Energy distances for both LR SAFRAN and IPSL with their respective QQ corrections are indeed the same (Fig. 5c). The same remark holds for MBCCG and RawCG energy distances on ranks that have, by construction, similar spatial dependence structures. As explained in Sect. 3.3.3, the CycleGAN model that minimizes the energy distance on ranks of MBCCycleGAN outputs is selected.
For precipitation (Fig. S1), the same conclusions hold, indicating a relative ability of the CycleGAN to adjust spatial dependence structure of precipitation fields. Nevertheless, contrary to temperature, one should remark that energy distances on ranks are different for LR SAFRAN, IPSL and their respective QQ corrections (Figs. S1c, d), which is specific to precipitation variables that can contain several null values for dry events. Indeed, ranks are computed here such that, when tied values are encountered, the minimum value of rank is attributed to each tied value. The combination of the correction with the QQ method and the thresholding for precipitation below 1 mm could modify the frequency of dry events, which could result in obtaining different rank structures, and hence, mechanically, different energy distances with respect to SAFRAN references. This mechanism is also obtained between MBCCG and RawCG (Figs. S1c, d), that present different energy distances due to the difference of dry events.
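The effect of dry-day ties on ranks can be seen on a toy precipitation series. The sketch assumes the tie rule stated above (each tied value receives the minimum rank) and the 1 mm threshold applied to bias-corrected precipitation; the series and the function names are illustrative only.

```python
def apply_dry_threshold(values, threshold=1.0):
    """Replace precipitation values lower than the threshold (in mm) by 0."""
    return [v if v >= threshold else 0.0 for v in values]


def min_ranks(values):
    """Ranks starting at 1, with tied values all given the minimum rank."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


# Hypothetical daily precipitation (mm) at one grid cell.
raw = [0.2, 1.5, 0.0, 3.0, 0.7]
thresholded = apply_dry_threshold(raw)

print(min_ranks(raw))          # [2, 4, 1, 5, 3]
print(min_ranks(thresholded))  # [1, 4, 1, 5, 1]
```

Three distinct small values collapse into a single tied rank once the threshold is applied, so two datasets with different dry-day frequencies end up with different rank structures even if their wet-day ordering is the same.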
Univariate distribution properties
Once the CycleGAN models have been selected for both the PP and MOS approaches, the corrections of IPSL simulations can be performed for the projection period. First, bias-corrected data are evaluated in terms of univariate statistics. For temperature and precipitation, differences of mean values between the bias-corrected data and the SAFRAN references are computed at each grid cell. For the temperature mean, absolute differences are computed, while for precipitation, which has an absolute zero, relative mean differences are more appropriate. Maps of differences with respect to the reference (for IPSL simulations and the bias-corrected data) are displayed in Fig. 6 for both temperature and precipitation. The mean absolute error (MAE) with respect to the reference dataset is also reported on each map. For more results on marginal properties, maps of standard deviation relative differences for both physical variables are also provided in Fig. S2 of the Supplement.
For both temperature and precipitation, the maps for the IPSL model (Fig. 6c, d) present large values of mean differences with respect to the SAFRAN map (Fig. 6a, b) and highlight the need to adjust univariate properties of simulations. Maps provided by 1dQQ outputs (Fig. 6e, f) indicate that, as expected, the univariate method globally improves marginal properties at each individual site. In agreement with the properties of the marginal/dependence MBC methods, maps for MBCCG for PP (MBCCGPP, Fig. 6g, h) and MOS (MBCCGMOS, Fig. 6i, j) are exactly the same as those from the 1dQQ method. Indeed, by construction, the univariate distribution properties are identical between QQ and MBCCycleGAN outputs, regardless of the spatial correlation adjustments. Although MBCCGPP and MBCCGMOS do not use the same data for the training of the CycleGAN to adjust spatial features, same marginals are taken from the QQ outputs of IPSL data, which results in obtaining the same univariate properties between the three corrections.
Spatial correlations
Quality of the corrections in terms of spatial correlations is now assessed. For each grid cell, spatial dependencies are evaluated for temperature and precipitation by computing Pearson pairwise correlations between the cell of interest and each of the remaining 783 grid cells over the region of Paris for the different climate datasets. The biases of these 783 spatial Pearson correlations are then summarized by computing the Mean Squared Error (MSE) with the corresponding 783 correlations computed for the references. By computing the MSE values for each grid cell, 784 MSE values are obtained for each climate dataset and can be intercompared from one dataset to another. Figure 7 shows the boxplots of the MSE values obtained for both temperature and precipitation for the plain IPSL simulations and BC outputs. For both variables, the boxplots for the IPSL simulations indicate strong values of MSE with respect to SAFRAN references. For QQ outputs, only slight reductions of MSE of spatial correlations are observed compared to those from IPSL, indicating that QQ globally conserves the spatial structure of the IPSL model. This result could have been expected, as, for each site, the univariate QQ method does not modify (too much) rank sequences of the simulated time series. The slight improvement of spatial statistics, which is greater for precipitation (Fig. 7b) than temperature (Fig. 7a), is in fact mainly attributable to the correction of univariate properties provided by the QQ method. Concerning MBCCycleGAN, the PP and MOS approaches display different performances in adjusting the spatial properties of simulations. Boxplots of MSE for MBCCGMOS indicate clear improvements of spatial correlations with respect to QQ outputs for both temperature and, to a lesser extent, precipitation. However, results for MBCCGPP show less pronounced improvements, suggesting a failure for the MBCCGPP approach to adjust spatial properties. 
This difference in performance for the PP approach indicates that, although CycleGAN models are able to learn the spatial relationships between large-scale predictors (LR SAFRAN) and local-scale predictands (SAFRAN) during the training of the algorithm, as previously shown in Figs. 5 and S1, these relationships do not prove to be suited for adjusting IPSL simulations. Indeed, the simulated large-scale predictors seem here to present biases with respect to LR SAFRAN that are too large for the CycleGAN fitted in a PP context to be applicable to the IPSL simulations. Hence, the perfect-prognosis approach should be discarded in our context of bias correction of climate simulations. Therefore, in the following, only the MOS approach of MBCCG is further investigated.
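The spatial-correlation diagnostic summarized in the boxplots of Fig. 7 can be sketched as follows: for each grid cell, the Pearson correlations with all other cells are compared with the same correlations in the reference, and the squared differences are averaged. The three-cell toy data and the function names are hypothetical; the study uses 784 cells, giving 783 correlations per cell.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined for a constant series (e.g. a cell that is always dry).
    return cov / math.sqrt(var_x * var_y)


def correlation_mse_per_cell(data, reference):
    """One MSE per grid cell, over its correlations with all other cells.

    data, reference: lists of time series, one per grid cell.
    """
    n_cells = len(data)
    mse_values = []
    for cell in range(n_cells):
        squared_errors = [
            (pearson(data[cell], data[other])
             - pearson(reference[cell], reference[other])) ** 2
            for other in range(n_cells)
            if other != cell
        ]
        mse_values.append(sum(squared_errors) / len(squared_errors))
    return mse_values


# Hypothetical 3-cell, 4-day datasets.
reference = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]]
simulated = [[1, 2, 3, 4], [4, 3, 2, 1], [4, 3, 2, 1]]
print(correlation_mse_per_cell(simulated, reference))
```

In this toy case the second cell has the largest MSE because both of its correlations have the wrong sign, whereas each of the other two cells has one correct and one wrong correlation.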
MBCCycleGAN in the nonstationary context
In the following, analyses are presented for the application of the MBCCycleGAN method with the MOS approach in a nonstationary context using the second cross-validation method. Results are provided for the correction of the three datasets (IPSL, IPSLbis and LR SAFRAN), which present different changes in marginal and dependence properties between the calibration and projection periods.
Univariate distribution properties
As for the first cross-validation method, univariate properties are evaluated using mean differences computed at each grid cell. Figure 8 shows, for the bias-corrected outputs from the three bias correction exercises, the maps of temperature mean differences with respect to SAFRAN references. Maps for precipitation relative mean differences are presented in Fig. S6 of the Supplement. For information purposes only, relative differences of standard deviation for temperature and precipitation are also displayed in Figs. S7 and S8, respectively.
For temperature, values of IPSL and IPSLbis mean differences (Fig. 8b, c) are high, indicating strong biases of temperature mean with respect to the SAFRAN reference dataset (Fig. 8a), although these are less pronounced for IPSLbis. This was somewhat expected since IPSLbis data are specifically constructed to mimic the SAFRAN changes in terms of marginal (and dependence) properties. As a result, IPSLbis temperature means are closer to those from the SAFRAN reference for the projection period. The map for LR SAFRAN (Fig. 8d) shows small differences with the reference. Clear improvements of the temperature mean are provided by the QQ method for each of the bias correction exercises (Fig. 8e–g). Nevertheless, quite interestingly, the QQ method provides less pronounced improvements for IPSL data (Fig. 8e), suggesting that the correction is degraded when changes of marginal properties between calibration and projection periods for the climate data to be corrected are not in agreement with those from the references. With regard to the performance of the MBC methods, MBCCycleGAN presents exactly the same results as the QQ method (Fig. 8h–j), in agreement with the marginal/dependence MBC properties. For Spatial\(\hbox {R}^2\hbox {D}^2\) (S\(\hbox {R}^2\hbox {D}^2\)), very slight modifications of the marginal mean values provided by QQ are observed (Fig. 8k–m), due to the use of the multivariate conditioning to adjust the spatial dependence structure (Vrac and Thao 2020). Concerning SpatialdOTC (SdOTC), the corrected outputs for IPSLbis (Fig. 8o) and LR SAFRAN (Fig. 8p) present results similar to those obtained for QQ and MBCCycleGAN. However, it is worth mentioning that, for the correction of IPSL, SdOTC (Fig. 8n) slightly improves marginal properties (MAE=0.37) compared to those obtained from QQ outputs (MAE=0.42).
For precipitation relative mean differences (Fig. S6), the same conclusions hold for each (M)BC method, indicating no particular influence of the variable to correct on the results of the marginal statistics adjustment.
Spatial correlations
We now evaluate the ability of MBCCycleGAN to adjust spatial dependence. First, as in Sect. 5.1, we compute the MSE of spatial Pearson correlations for both temperature and precipitation. Figure 9 displays the results with boxplots for the different datasets to correct and their adjusted outputs. Scatterplots of MSE values with respect to QQ outputs are presented in Fig. S9 to better assess the potential benefits of using MBC methods relative to univariate ones. For temperature (Fig. 9a), the positive values of MSE for IPSL suggest biases with respect to the SAFRAN references, illustrating the necessity to correct the spatial properties of the model before using it in subsequent analyses. For IPSLbis, MSE values are slightly smaller, but still indicate strong differences of spatial correlations with respect to the references. The difference in results between IPSL and IPSLbis highlights that discrepancies of changes with the references can potentially have a non-negligible effect on spatial properties; in fact, reducing those discrepancies, as is done with the generation of IPSLbis, reduces here the biases in spatial correlations. Concerning LR SAFRAN, MSE values are small, suggesting that upscaling the reference dataset deteriorates its spatial structure only slightly. By simply correcting univariate distributions, the three QQ outputs do not present a particular improvement of temperature MSE values. Clear improvements of the spatial correlation structures are provided by the MBCCycleGAN method for the adjustment of IPSL, IPSLbis and LR SAFRAN, although some differences in performance are observed between the three corrected outputs. Temperature MSE values are indeed closer to 0 for the correction of LR SAFRAN than for the correction of IPSLbis and IPSL, for which similar results are obtained.
Concerning Spatial\(\hbox {R}^2\hbox {D}^2\), the corrections of IPSL and IPSLbis provide major improvements in adjusting the spatial correlations. In particular, better results are obtained for the correction of IPSLbis. However, with regard to the Spatial\(\hbox {R}^2\hbox {D}^2\) outputs with LR SAFRAN, the benefits provided by \(\hbox {R}^2\hbox {D}^2\) are smaller, as not all of the spatial correlations are improved. This result can better be seen in Fig. S9e. This contrasted performance for the \(\hbox {R}^2\hbox {D}^2\) method appears in the context of the correction of LR SAFRAN that already presents small spatial biases with respect to SAFRAN references. The correction obtained for LR SAFRAN suggests that the \(\hbox {R}^2\hbox {D}^2\) method is too constrained by the selected conditioning to find an appropriate collection of analogues for the projection period of this specific dataset.
For SpatialdOTC outputs, results show low MSE values for each bias correction exercise, indicating that spatial correlations are satisfactorily corrected by this method. Nevertheless, the adjustments are slightly better for the corrected output of IPSL than for that of IPSLbis, which may seem counterintuitive here. Indeed, as dOTC is specifically designed to take into account the changes of the data to adjust in the correction procedure, better results would have been expected for IPSLbis, for which changes of spatial correlations are in line with those from SAFRAN references. The good performance of dOTC in correcting spatial correlations for IPSL could be due to the fact that, as explained in Appendix 4, IPSL simulated changes for temperature are not in total disagreement with those from SAFRAN, and hence there is no strong discrepancy of changes affecting the corrections.
For precipitation (Fig. 9b), the same conclusions as those drawn for temperature hold. Nevertheless, quite interestingly, IPSL and IPSLbis data present even larger differences of MSE values. This shows the effects on spatial correlations of the strong discrepancies of precipitation changes between the IPSL model and the references observed in Appendix 4: reducing this discrepancy of marginal and spatial changes with IPSLbis significantly decreases the biases in spatial correlations. In contrast with temperature, these differences of spatial correlations for precipitation between IPSL and IPSLbis are large enough to propagate into the bias-corrected outputs: for each of the BC methods, the corrected outputs for IPSLbis present systematically lower MSE values compared to the corrections of IPSL.
To better assess the spatial structure adjustments brought by MBCs, energy distances between the bias-corrected time series and the references are computed for each physical variable according to two different multivariate distributions:

on values of the physical variable directly over the whole region of Paris to assess differences of spatial properties (i.e., including both the marginals and their dependence);

on ranks of the physical variable over the whole region of Paris to assess differences of spatial dependence structures (i.e., without the influence of marginal properties).
Values of energy distances are estimated using a bootstrap method. For each dataset, it consists in (i) sampling (with replacement) daily fields, (ii) computing the energy distance on the bootstrapped dataset, and (iii) repeating the previous two steps 1000 times to construct the bootstrap sampling distribution. From this bootstrap sampling distribution, we derive the bootstrap estimator (mean of the 1000 energy distances obtained) and a 90% bootstrap sampling interval that provides uncertainty bands for the estimated distance. Results for temperature and precipitation are displayed in Fig. 10. The closer the values of the energy distances are to 0, the closer the spatial properties of the outputs are to those of the reference data.
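The three bootstrap steps can be sketched as below. Several choices are assumptions made for the example: the energy distance is computed with the usual plug-in estimator, only the evaluated dataset is resampled while the reference is kept fixed, the interval is taken from the 5th and 95th percentiles of the sorted replicates, and the toy data, seed and function names are hypothetical.

```python
import math
import random


def mean_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y)."""
    return sum(math.dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_statistic(p, q):
    """Plug-in energy statistic between two samples of daily fields."""
    return 2.0 * mean_distance(p, q) - mean_distance(p, p) - mean_distance(q, q)


def bootstrap_energy(sample, reference, n_boot=1000, level=0.90, seed=0):
    """Bootstrap mean and percentile interval of the energy distance."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # (i) resample daily fields with replacement (reference kept fixed)
        resampled = rng.choices(sample, k=len(sample))
        # (ii) energy distance on the bootstrapped dataset
        estimates.append(energy_statistic(resampled, reference))
    # (iii) summarize the bootstrap sampling distribution
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[math.floor(tail * (n_boot - 1))]
    upper = estimates[math.ceil((1.0 - tail) * (n_boot - 1))]
    return sum(estimates) / n_boot, lower, upper


# Hypothetical data: 30 days, 3 grid cells, with a shifted "corrected" sample.
data_rng = random.Random(42)
reference = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(30)]
corrected = [[data_rng.gauss(0.5, 1.0) for _ in range(3)] for _ in range(30)]

estimate, lower, upper = bootstrap_energy(corrected, reference, n_boot=200)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

Fixing the seed makes the interval reproducible; with the full 784-cell maps and 1000 replicates, the pairwise distances would in practice be computed with vectorized array operations rather than Python loops.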
For temperature, the two estimators of energy distances on physical values (Fig. 10a) and ranks (Fig. 10b) for IPSL and IPSLbis data are quite high compared to those for LR SAFRAN, in agreement with the differences in spatial properties already observed between these datasets and the references in Fig. 9. For the three QQ outputs, energy distances on physical values are lower (Fig. 10a), but energy distances on ranks are similar to those of the datasets to correct (Fig. 10b). This highlights again that, although the QQ method adjusts the univariate distributions, it is not designed to modify the rank sequences of the time series, and therefore the spatial dependence structures, during the correction procedure. With regard to the three MBC methods for the correction of IPSL, dOTC performs slightly better on raw values (Fig. 10a) than MBCCycleGAN and \(\hbox {R}^2\hbox {D}^2\), for which comparable results are obtained. For energy distances computed on ranks (Fig. 10b), dOTC and \(\hbox {R}^2\hbox {D}^2\) produce similar results. MBCCycleGAN performs slightly worse than the two other MBC methods, although it strongly improves the spatial dependence structures of the IPSL simulations. Note that, while the bootstrap sampling intervals of energy distances on temperature values overlap for the three MBC methods, this is less the case for energy distances on temperature ranks, which makes it possible to determine with more confidence the best method for the adjustment of spatial dependence properties. However, the energy distances obtained for the three MBCs are very close, so the differences in performance between them might not be significant. Concerning the correction of IPSLbis, the best performance is provided by dOTC for both multivariate distributions. For multivariate distributions with raw values, MBCCycleGAN is second best, while it ranks third for the rank dependence structure.
This swap of performances between raw values and ranks for MBCCycleGAN and \(\hbox {R}^2\hbox {D}^2\) must be analyzed with caution as differences of estimated energy distances between the two MBC methods are again very small and thus might not be significant. This swap can however be explained by both the strong influence of marginal properties on energy distances and the slight deterioration of marginal properties provided by \(\hbox {R}^2\hbox {D}^2\) compared to the QQ outputs, already mentioned in Sect. 5.2.1. For the corrections of LR SAFRAN, MBCCycleGAN performs best and dOTC second best, with a more significant difference of performance for estimated energy distances evaluated on rank values (Fig. 10b).
For precipitation (Fig. 10c, d), conclusions similar to those obtained for temperature can be drawn for IPSL, IPSLbis and LR SAFRAN outputs. However, conclusions are slightly different for QQ and the MBCs. As already explained in Sect. 5.1, QQ modifies the frequency of dry events and consequently changes the rank dependence structure of precipitation, which results here in an improvement of spatial energy distances on ranks for the 1dQQ corrections of IPSL, IPSLbis and LR SAFRAN. Concerning the performances of the three MBCs for IPSL, \(\hbox {R}^2\hbox {D}^2\) performs best on energy distances for both raw values and ranks, while MBCCycleGAN produces reasonable results, in particular for the adjustment of the rank dependence structure of precipitation. The dOTC method produces results that are clearly unsatisfactory concerning the rank dependence structure of precipitation. Instead of improving the rank dependence structure, dOTC correction strongly degrades it. This underperformance is in fact due to the presence of too many wet events in the corrections provided by dOTC (not shown) compared to the references, which mechanically largely affects the quality of its rank dependence structure for precipitation. For the same reason, this underperformance on precipitation rank dependence structure is also observed for the adjustments of IPSLbis and LR SAFRAN with dOTC. For IPSLbis, estimated energy distances on ranks are similar between MBCCycleGAN and \(\hbox {R}^2\hbox {D}^2\). Note here that similar values of energy distances do not necessarily imply that their spatial dependence structures are similar. Concerning LR SAFRAN corrections, MBCCycleGAN again outperforms both dOTC and \(\hbox {R}^2\hbox {D}^2\) algorithms according to estimated energy distances on raw values and ranks.
Temporal structure
In this section, bias-corrected data are evaluated with respect to temporal properties. As a reminder, the MBCCycleGAN and dOTC methods have been specifically implemented to adjust only the marginal and spatial properties of climate simulations. Similarly, the \(\hbox {R}^2\hbox {D}^2\) algorithm is applied to adjust marginal and spatial features but, contrary to the two other methods, it also takes into account (part of) the temporal dependence properties through the multivariate conditioning chosen for its implementation, as previously explained in Sect. 4. In theory, this choice of conditioning dimensions allows \(\hbox {R}^2\hbox {D}^2\) to partially recover the temporal properties of the reference dataset (Vrac and Thao 2020). Adjusting spatial coherence necessarily modifies the rank sequences of the initial time series during the correction procedure (e.g., Vrac 2018). It is hence interesting to quantify how strong those modifications are for each MBC method, whether or not temporal properties are taken into account in the correction procedure. Temporal properties are evaluated by computing lag-1 (one-day) Pearson autocorrelations (AR1) at each grid cell for both temperature and precipitation. The resulting maps of differences with respect to the SAFRAN references for the different BC outputs are presented in Fig. 11 (resp. Fig. S10) for temperature (resp. precipitation).
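As an illustration of the AR1 diagnostic, the following minimal Python sketch computes the lag-1 Pearson autocorrelation at each grid cell and a map of differences with respect to a reference. The function names, the (n_days, n_lat, n_lon) array layout and the sign convention of the difference (corrected minus reference) are assumptions made for this example, not details taken from the study.

```python
import numpy as np


def lag1_autocorrelation(field):
    """Lag-1 Pearson autocorrelation at each grid cell.

    field: array of shape (n_days, n_lat, n_lon).
    Returns an array of shape (n_lat, n_lon). Cells with a constant
    series give NaN because their variance is zero.
    """
    today = field[:-1] - field[:-1].mean(axis=0)
    tomorrow = field[1:] - field[1:].mean(axis=0)
    covariance = (today * tomorrow).sum(axis=0)
    scale = np.sqrt((today ** 2).sum(axis=0) * (tomorrow ** 2).sum(axis=0))
    return covariance / scale


def ar1_difference_map(corrected, reference):
    """AR1 of the corrected data minus AR1 of the reference, per grid cell."""
    return lag1_autocorrelation(corrected) - lag1_autocorrelation(reference)
```

Values of the difference map close to 0 indicate that the day-to-day persistence of the corrected series is close to that of the reference at the corresponding grid cell.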
For temperature, IPSL shows relatively low values of AR1 differences (Fig. 11b), indicating that its temporal properties for temperature are relatively in line with those of the SAFRAN references (Fig. 11a). A similar difference map is obtained for IPSLbis outputs (Fig. 11c). In fact, IPSLbis temporal properties are inherited from IPSL outputs: even in a high-dimensional context, the two-step procedure used to construct IPSLbis from IPSL, and in particular the matrix-recorrelation technique, does not strongly modify temporal properties. This result on the temporal properties of data preprocessed with this matrix-recorrelation technique is consistent with the conclusions obtained in François et al. (2020) for an MBC method (MRec) using the same matrix-recorrelation. For LR SAFRAN outputs (Fig. 11d), values of AR1 differences are very close to 0, highlighting that the upscaling step used to construct LR SAFRAN data does not strongly modify the temporal properties of the initial SAFRAN reference dataset, as expected by construction. Difference maps for temperature from QQ outputs (Fig. 11e–g) are relatively similar to those of the three respective datasets to adjust. However, the three MBC methods used to adjust spatial dependence structure do not modify the temporal properties of temperature in the same way. For MBCCycleGAN and dOTC outputs (Fig. 11h, i, j, n, o and p), temporal statistics are close to those of the QQ outputs. This suggests that both the MBCCycleGAN and dOTC algorithms, although they correct the spatial features, make only small changes to the temporal sequencing of the time series to correct. For MBCCycleGAN, this is partly explained by the fact that, within the CycleGAN procedure, input maps from QQ outputs are transformed into outputs with improved spatial features without modifying the initial input image too much.
This partially preserves the temporal properties of the QQ outputs used as inputs of the CycleGAN while improving the spatial representation. This particular point is discussed in greater detail below. Concerning \(\hbox {R}^2\hbox {D}^2\) outputs, different results are obtained depending on the dataset to correct. For the correction of both IPSL and IPSLbis (Fig. 11k, l), \(\hbox {R}^2\hbox {D}^2\) provides small improvements in the temporal properties of temperature, which illustrates that, by including lags in the conditional dimensions, \(\hbox {R}^2\hbox {D}^2\) is able to improve the temporal structure of climate datasets in addition to their spatial properties. However, for the correction of LR SAFRAN (Fig. 11m), the AR1 temperature differences deteriorate with respect to the initial LR SAFRAN data (Fig. 11d). This result can be linked with the previously mentioned contrasting performances of the \(\hbox {R}^2\hbox {D}^2\) method when adjusting the LR SAFRAN dataset in Subsect. 5.2.2.
For precipitation (Fig. S10), the same conclusions hold for IPSL, IPSLbis and LR SAFRAN outputs. However, contrary to temperature, the 1dQQ corrections of IPSL and IPSLbis (Fig. S10e, f) show a pronounced improvement in temporal properties for precipitation, highlighting the potential influence of the marginal properties of precipitation time series on their autocorrelation values. Moreover, the improvements in temporal properties of temperature provided by \(\hbox {R}^2\hbox {D}^2\) for the corrections of IPSL and IPSLbis are no longer observed for precipitation (Fig. S10k, l). Instead, temporal properties with unexpected behaviors are obtained, potentially due to the difficulty \(\hbox {R}^2\hbox {D}^2\) has in correcting physical variables with events occurring at local scale, such as precipitation (Vrac and Thao 2020). It can also be due to the choice of the conditioning information made in \(\hbox {R}^2\hbox {D}^2\). As a reminder, it is the rank structure of simulated precipitation (resp. temperature) that serves as conditioning to generate Spatial\(\hbox {R}^2\hbox {D}^2\) outputs for precipitation (resp. temperature). As the temporal properties (including rank sequences) of precipitation time series are less well simulated by the IPSL model (Fig. S10b) than those of temperature (Fig. 11b), this potentially affects the quality of the corrections provided by Spatial\(\hbox {R}^2\hbox {D}^2\) for precipitation, including their temporal properties. This highlights the importance of choosing a relevant conditioning dimension for the implementation of \(\hbox {R}^2\hbox {D}^2\) (Vrac and Thao 2020).
To illustrate that MBCCycleGAN makes only small changes to the temporal sequencing of the inputs to adjust, we compare corrected daily maps from LR SAFRAN with those from the references. As the LR SAFRAN dataset temporally matches the SAFRAN dataset by construction, classic forecast statistics such as the Root Mean Square Error (RMSE) are relevant to assess the performance of MBC methods. Table 2 shows, for temperature and precipitation, the RMSE values with respect to the SAFRAN references for the different BC outputs of LR SAFRAN. For temperature, the RMSE between daily maps of the reference and the LR SAFRAN dataset is around 0.36. The QQ method provides a slight improvement (RMSE = 0.31). As the QQ method preserves the temporal sequencing of the time series to correct, this improvement is only due to the correction of marginal properties. The MBCCycleGAN method presents better results (RMSE = 0.23), which supports with more confidence the statement that, while it adjusts the spatial dependence structure, it only slightly modifies the temporal sequencing of the time series to correct. For \(\hbox {R}^2\hbox {D}^2\) outputs, the RMSE value is quite large (RMSE = 1.51), suggesting a strong modification of temporal properties. This can be linked with the underperformance of \(\hbox {R}^2\hbox {D}^2\) already observed in Fig. 11m for the correction of LR SAFRAN. Concerning dOTC outputs, the RMSE value (RMSE = 0.42) is slightly higher than those observed for LR SAFRAN and QQ outputs. This suggests that the influence on temporal properties of the dOTC correction of univariate distributions and spatial dependence is strong enough to affect its ability to provide appropriate forecasts at a daily scale. For precipitation, the same conclusions hold for the different BC outputs.
To better illustrate the results from Table 2, two animations presenting the successive daily temperature and precipitation maps generated by MBCCycleGAN for the correction of LR SAFRAN, as well as the corresponding daily maps from the references and the different BC methods, are provided as supplementary materials.
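The reasoning behind the RMSE comparison of Table 2 can be illustrated with a small synthetic Python example. It is a sketch under assumptions: the function name is hypothetical, the RMSE is assumed to be pooled over all days and grid cells, and the data are made up (they do not reproduce the values of Table 2). It shows that, for temporally matched datasets, a correction that keeps the day-to-day sequence yields a low RMSE, whereas reordering the days inflates the RMSE even though the distribution at each grid cell is unchanged.

```python
import numpy as np


def daily_map_rmse(corrected, reference):
    """RMSE pooled over all days and grid cells of temporally matched maps."""
    if corrected.shape != reference.shape:
        raise ValueError("datasets must share the same (n_days, n_lat, n_lon) shape")
    return float(np.sqrt(np.mean((corrected - reference) ** 2)))


# Synthetic illustration (made-up data, not the values of Table 2).
rng = np.random.default_rng(0)
days = np.arange(365)
seasonal = 10.0 * np.sin(2 * np.pi * days / 365)[:, None, None]
reference = seasonal + rng.standard_normal((365, 28, 28))

# Same day-to-day sequence as the reference, small added error.
sequence_kept = reference + 0.3 * rng.standard_normal(reference.shape)
# Same values overall, but days reordered: marginals intact, sequencing lost.
sequence_lost = reference[rng.permutation(365)]

print(daily_map_rmse(sequence_kept, reference))
print(daily_map_rmse(sequence_lost, reference))
```

This is why RMSE is only meaningful here for LR SAFRAN, which matches SAFRAN in time by construction; for free-running climate simulations, no day-to-day correspondence with the references is expected.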
Conclusion, discussion and future work
Conclusions
Biases in climate simulations are typically corrected with univariate BC methods, which adjust one physical variable and one location at a time, so that spatial dependencies remain uncorrected. In this study, MBCCycleGAN, an adaptation of the CycleGAN approach (Zhu et al. 2017) used to train image-to-image translation models, was presented. It allows for the adjustment of not only univariate distributions but also spatial dependence structures of climate simulations. The newly proposed MBC method takes advantage of convolutional neural networks with simple architectures that are trained in competition to adjust the spatial properties of simulated variables. The MBCCycleGAN method was tested by adjusting temperature and precipitation time series from IPSL simulations with respect to the SAFRAN dataset over the region of Paris using two different cross-validation methods. The first cross-validation, which defines calibration and projection periods randomly, allows the new methodology to be tested in a stationary context. We took advantage of this first cross-validation method to compare two postprocessing schemes (PP and MOS) that differ in the statistical relationships the MBCCycleGAN model learns to adjust spatial dependences. The MOS approach, which considers biases as systematic distributional differences between reference and simulated climate variables, was found to be more appropriate for the implementation of the MBCCycleGAN method and was applied for the rest of the study. The second cross-validation method, which defines calibration and projection periods chronologically, was then used to evaluate the ability of the MBCCycleGAN method to adjust climate datasets in a nonstationary context.
As IPSL simulations and SAFRAN references present different marginal and spatial changes between calibration and projection periods, two additional climate datasets (LR SAFRAN and IPSLbis) with changes that are in line with the references were specifically constructed and adjusted. This made it possible to better assess the quality of the corrections provided by the new method depending on the statistical biases of the data to be corrected. A wide range of metrics was used to compare bias adjustment outputs with the references and the initial climate data, and to assess the corrections of univariate distributions, spatial correlations and temporal properties. In addition to the 1dQQ method, two state-of-the-art MBC methods (\(\hbox {R}^2\hbox {D}^2\) and dOTC) were implemented and used as benchmarks to better evaluate the influence of nonstationary properties on the results of the MBCCycleGAN method. The results indicate that all the (M)BC methods implemented in this study generally provide similar corrections of univariate distributions. Regarding spatial properties, the benefits of using MBC methods are clear compared to the 1dQQ method. The MBCCycleGAN method produced reasonable adjustments of spatial correlations with respect to the \(\hbox {R}^2\hbox {D}^2\) and dOTC methods for both temperature and precipitation and for the three climate datasets to adjust. Concerning the temporal aspect, the MBCCycleGAN method is not designed to correct this specific statistical property and tends to conserve the temporal sequencing of the time series to correct. Combined with the corrections of spatial features, this property proved to be particularly interesting for applications of MBCCycleGAN in which the data to correct temporally match the references (e.g., as for the LR SAFRAN and SAFRAN datasets, see Sect. 5.2.2).
The proposed method indeed outperformed all the other (M)BC alternatives for the correction of LR SAFRAN by generally presenting both spatial and temporal statistics closer to those of the references. Concerning nonstationary properties, it was found that changes in both marginal and spatial properties between the calibration and projection periods of the climate data to adjust can have a non-negligible effect on the quality of corrections from the MBCCycleGAN algorithm, and more generally from all (M)BC methods. In general, better results are obtained for the corrections of simulations with changes that are in agreement with those of the references, whether or not the MBCs assume nonstationarity of marginal properties and dependence structures.
Discussion and perspectives
In this study, the development of the MBCCycleGAN method was mainly intended as a proof of concept, in order to test whether GANs can be used for multivariate bias correction of climate simulations. Although the proposed algorithm yields corrections of comparable quality to those of well-established MBC methods, several avenues can be considered for its improvement.
First, in order to remain in a proof-of-concept context, a simple neural network architecture with a small number of convolutional layers was considered for the discriminators and generators constituting the MBCCycleGAN method. Similarly, a classic formulation of the CycleGAN procedure, as initially described in Zhu et al. (2017), was used with a binary cross-entropy loss function for the adversarial training (Eq. 1). Improving the training performance of GANs through more advanced architectures and optimization techniques is an active area of research (e.g., Salimans et al. 2016; Arjovsky et al. 2017; Karras et al. 2018, among others). A first natural step to potentially improve results would be to opt for a more sophisticated CycleGAN model. For example, more layers could be added to the neural network architectures of both generators and discriminators to potentially capture more complex spatial relationships for the correction of climate simulations. Also, modifying the initial adversarial loss functions (\(L_{GAN}\) in Eq. 1), as proposed in Arjovsky et al. (2017), would be interesting as it could improve the stability of the learning and help prevent mode collapse. However, although GANs are progressing steadily, it is well known that this particular class of neural networks can be more difficult to train than classical neural networks (e.g., Wu et al. 2020). The possible modifications of the parameters defining a CycleGAN model are numerous, and none is guaranteed a priori to improve the overall performance of the CycleGAN for the specific application of bias correction. Testing the different possibilities goes well beyond the scope of the present study and is left for future work.
Second, it has to be noted that our method, by combining the 1dQQ method and the CycleGAN approach to adjust both marginal and spatial properties, is not designed to specifically account for any simulated changes for future periods. For marginal properties, other 1dBC methods that are able to account for potential changes of univariate CDFs from the calibration to the projection period (e.g., CDFt or QDM, Vrac et al. 2012; Cannon et al. 2015) can of course be employed instead of QQ, as long as they do not modify the rank sequences of temperature and precipitation time series (too much) and thus do not disrupt the convergence of the CycleGAN procedure. Concerning changes in spatial properties, the CycleGAN approach as implemented in this study is based on the key assumption that the conditional distributions \(\mathbf{X}|\mathbf{Y}\) and \(\mathbf{Y}|\mathbf{X}\) are the same in the training (i.e., calibration) and test (i.e., projection) datasets. In our context, this amounts to a strong assumption of copula stationarity between present and future periods. Although spatial dependence structures can be considered stable in time over a specific region of interest, as imposed by physical laws (e.g., Vrac 2018), this cannot be generalized to every physical variable and region. For example, more concentrated spatial rainfall events are expected with higher temperatures in the future (Guinard et al. 2015; Wasko et al. 2016). Therefore, should the changes in spatial properties in the simulations between calibration and projection periods be reproduced in the correction? By comparing our results obtained with different levels of nonstationarity in the model evolution and with two well-established MBCs based on copula stationarity (\(\hbox {R}^2\hbox {D}^2\)) and nonstationarity (dOTC) for future periods, we shed light on how the nonstationary properties of the simulations are taken into account by the different multivariate BC methods.
The benefits of MBC methods assuming copula nonstationarity for the correction of such climate datasets are not always as clear-cut as expected compared to MBC methods assuming copula stationarity. This raises the question of whether developing MBC methods assuming copula nonstationarity is justified, i.e., whether it is worth striving to develop complicated statistical methods that consider the simulated evolution of the copula in the correction procedure if, in the end, they do not produce drastically better results than MBCs assuming copula stationarity. In practice, accounting for the nonstationarity of simulations in bias correction procedures remains an open question that needs to be answered on a case-by-case basis. Developing new MBC methods that are specifically able to reproduce these simulated changes in the correction is of course an important perspective, but the application of such methods would be inappropriate as long as the changes from climate simulations for future periods have not first been identified as relevant.
Third, the MBCCycleGAN method has been developed to correct spatial correlations of climate simulations for each physical variable separately, and thus considers neither the adjustment of intervariable correlations nor that of temporal structure. A possible extension of the initial method is to account for intervariable and/or temporal correlations by providing the CycleGAN model with images that have not one but several channels for the different physical variables to correct. For example, for the adjustment of intervariable correlations between temperature and precipitation, daily temperature and precipitation maps concatenated in an array of dimension \(2 \times 28 \times 28\) can be provided as inputs to the adversarial neural network. Similarly, adjusting temporal correlations could be considered by adding channels with lagged versions of the physical variable. Using images with additional channels would imply changing, at least, the neural network architecture by replacing 2d-convolutional neural networks with 3d ones to allow the CycleGAN model to consider interchannel correlations. However, as adding channels can potentially make the training of the CycleGAN more complicated, it is likely that other changes to the architecture of the neural networks and to the optimization techniques, such as those mentioned previously, would be required.
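The construction of such multi-channel inputs can be sketched as follows in Python. This is a minimal illustration under assumptions: the function names are hypothetical, daily maps are assumed to be stored as arrays of shape (n_days, 28, 28), and a channel-first layout is chosen arbitrarily. It only prepares the input arrays and does not address the changes to the network architecture discussed above.

```python
import numpy as np


def stack_variables(temperature, precipitation):
    """Stack two (n_days, 28, 28) series into (n_days, 2, 28, 28) images."""
    return np.stack([temperature, precipitation], axis=1)


def add_lagged_channels(field, n_lags=1):
    """Turn (n_days, H, W) into (n_days - n_lags, n_lags + 1, H, W).

    Channel 0 holds the current day and channel k the map k days earlier;
    the first n_lags days are dropped because they lack a full history.
    """
    n_days = field.shape[0]
    channels = [field[n_lags - k:n_days - k] for k in range(n_lags + 1)]
    return np.stack(channels, axis=1)
```

With `n_lags=1`, each training image would contain the map of a given day and that of the previous day, so that a discriminator could in principle also penalise unrealistic day-to-day transitions.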
Fourth, according to the results for the correction of the references at large scale (LR SAFRAN), MBCCycleGAN showed greater improvements in both spatial and temporal statistics than the other MBC methods. These promising results suggest that MBCCycleGAN can be used directly in downscaling applications, a practice that is not initially recommended with univariate quantile mapping techniques (Maraun 2013; Gutmann et al. 2014). Although MBCCycleGAN produces reasonable adjustments of the temperature and precipitation spatial distributions of the IPSL and IPSLbis datasets, the outperformance observed for the correction of LR SAFRAN is not obtained for these climate outputs. A possible reason why the performance of MBCCycleGAN differs between these three correction exercises is the magnitude of the distributional differences between the input and target datasets considered. Indeed, unsupervised image-to-image translation algorithms such as CycleGAN can have difficulty mapping two random variables \(\mathbf{X}\) and \(\mathbf{Y}\) whose probability distributions exhibit strong differences (Gokaslan et al. 2019; Royer et al. 2020). As LR SAFRAN presents smaller biases with respect to the references than IPSL and IPSLbis data, outstanding results are obtained for the correction of LR SAFRAN with MBCCycleGAN, while results of more moderate quality are produced for IPSL and IPSLbis. Improving the MBCCycleGAN algorithm so that it produces satisfactory results even when distributions with very strong (marginal and spatial) differences are considered is of great interest to allow its use for operational purposes.
Fifth, in this study, particular precautions have been taken to prevent overfitting during the training of the CycleGAN networks, such as including a regularization technique called “dropout” in both the generator and discriminator architectures (see Appendix B for further details), or verifying that the performance of MBCCycleGAN on projection periods does not deteriorate during training (not shown). These precautions make it possible to apply MBCCycleGAN algorithms to projection periods with confidence. The issue of overfitting raises the question of the generalization capability of statistical models, and of how they cope with new (and unseen) data. In most of the study, calibration and projection periods have been defined chronologically for the 1979–2016 period, and one can argue that only small differences in spatial properties are obtained between the two periods. Assessing the performance of the MBCCycleGAN algorithm for the adjustment of climate projections with very different spatial structures remains an interesting perspective. For example, this could be done by adapting the methodology used for the generation of IPSLbis to generate alternative climate simulations for the projection period with strong spatial changes, and applying the pretrained CycleGAN neural network used for the correction of IPSL in this study.
Finally, as implemented in this study, the proposed MBCCycleGAN algorithm produces a single correction (output) for a given input. Although uncertainty quantification is essential in climate applications, the uncertainty of MBCCycleGAN outputs is not estimated here. An interesting extension to model the uncertainty of corrected outputs would be to introduce some stochasticity into the correction procedure by giving the generators not only daily maps to adjust but also random noise vectors. Then, for a given daily map, the generator would produce an ensemble of plausible corrections. The spread between the ensemble members would represent the uncertainty associated with the multivariate bias correction.
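The ensemble idea can be illustrated with the short Python sketch below. It is purely hypothetical: `toy_stochastic_generator` is a made-up stand-in that adds a noise-driven perturbation to the input map, whereas an actual stochastic generator would be a neural network trained adversarially, and the noise dimension and number of members are arbitrary choices. The sketch only shows how an ensemble mean and spread could be derived once such a generator is available.

```python
import numpy as np


def toy_stochastic_generator(daily_map, noise, weights):
    """Hypothetical stand-in for a trained generator G(map, noise).

    It adds a noise-driven perturbation to the input map; a real generator
    would be a neural network trained adversarially.
    """
    return daily_map + np.tensordot(noise, weights, axes=1)


def ensemble_correction(daily_map, generator, n_members=50, noise_dim=8, seed=0):
    """Draw n_members corrections of one map; return their mean and spread."""
    rng = np.random.default_rng(seed)
    members = np.stack([
        generator(daily_map, rng.standard_normal(noise_dim))
        for _ in range(n_members)
    ])
    return members.mean(axis=0), members.std(axis=0)


# Usage with made-up weights and a made-up 28 x 28 daily map.
rng = np.random.default_rng(1)
weights = 0.1 * rng.standard_normal((8, 28, 28))
daily_map = rng.standard_normal((28, 28))
mean_map, spread_map = ensemble_correction(
    daily_map, lambda m, z: toy_stochastic_generator(m, z, weights))
```

Here `spread_map` plays the role of the per-grid-cell uncertainty associated with the correction of one daily map.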
We hope that this study serves as a starting point for the use of GANs for multivariate bias correction of climate simulations. One of the main advantages of using MBCCycleGAN is that the adjustment is performed image by image, i.e., map by map. If well trained, the discriminators provide some guarantee that the individual maps produced by the generators are realistic with respect to the references, while daily maps with strong statistical artefacts are rejected. This is not the case for the other MBC methods such as \(\hbox {R}^2\hbox {D}^2\) or dOTC, which provide corrected simulations with appropriate distributional statistics without being particularly constrained to generate realistic daily maps. Providing corrections with realistic maps at a daily scale can be useful for the scientific community working on climate change impacts, e.g., in hydrology, for which daily spatial features are of major concern.
Availability of data and material
The IPSLCM5AMR model data simulations as part of the CMIP5 climate model simulations can be downloaded through the Earth System Grid Federation portals. Instructions to access the data are available here: https://pcmdi.llnl.gov/mips/cmip5/dataaccessgettingstarted.html, last access: 06 September 2020, (PCMDI, 1989). The SAFRAN reanalysis dataset is available upon request to the French National Centre for Meteorological Research (CNRM, MétéoFrance CNRS).
References
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. arXiv:1701.07875
BañoMedina J, Manzanas R, Gutiérrez JM (2020) Configuration and intercomparison of deep learning neural models for statistical downscaling. Geosci Model Dev 13(4):2109–2124. https://doi.org/10.5194/gmd1321092020
Bárdossy A, Pegram G (2012) Multiscale spatial recorrelation of RCM precipitation to produce unbiased climate change scenarios over large areas and small. Water Resour Res 48:9502. https://doi.org/10.1029/2011WR011524
Bartok B, Tobin I, Vautard R, Vrac M, Jin X, Levavasseur G, Denvil S, Dubus L, Parey S, Michelangeli PA, Troccoli A, SaintDrenan YM (2019) A climate projection dataset tailored for the European energy sector. Clim Serv 16(100):138. https://doi.org/10.1016/j.cliser.2019.100138
Bates B, Kundzewicz Z, Wu S, Burkett V, Doell P, Gwary D, Hanson C, Heij B, Jiménez B, Kaser G, Kitoh A, Kovats S, Kumar P, Magadza C, Martino D, Mata L, Medany M, Miller K, Arnell N (2008) Climate change and water. Technical Paper of the Intergovernmental Panel on Climate Change. Tech. rep, The Intergovernmental Panel on Climate Change
Beltrami E (1873) Sulle funzioni bilineari. Giornale Mat Uso degli Stud Delle Univ 11:98–106
Berg P, Feldmann H, Panitz HJ (2012) Bias correction of high resolution regional climate model data. J Hydrol 448–449:80–92. https://doi.org/10.1016/j.jhydrol.2012.04.026
Bhatia S, Jain A, Hooi B (2020) ExGAN: adversarial generation of extreme samples. arXiv:2009.08454
Bihlo A (2020) A generative adversarial network approach to (ensemble) weather prediction. arXiv:2006.07718
Caminade C, Kovats S, Rocklov J, Tompkins AM, Morse AP, ColónGonzález FJ, Stenlund H, Martens P, Lloyd SJ (2014) Impact of climate change on global malaria distribution. Proc Natl Acad Sci USA 111(9):3286–3291. https://doi.org/10.1073/pnas.1302089111
Cannon AJ (2018) Multivariate quantile mapping bias correction: an Ndimensional probability density function transform for climate model simulations of multiple variables. Clim Dyn 50(1):31–49. https://doi.org/10.1007/s0038201735806
Cannon A, Sobie S, Murdock T (2015) Bias correction of GCM precipitation by quantile mapping: how well do methods preserve changes in quantiles and extremes? J Clim 28(17):6938–6959. https://doi.org/10.1175/JCLID1400754.1
Cattiaux J, Douville H, Peings Y (2013) European temperatures in CMIP5: origins of presentday biases and future uncertainties. Clim Dyn 41:2889–2907. https://doi.org/10.1007/s003820131731y
Chapman WE, Subramanian AC, Delle Monache L, Xie SP, Ralph FM (2019) Improving atmospheric river forecasts with machine learning. Geophys Res Lett 46(17–18):10627–10635. https://doi.org/10.1029/2019GL083662
Christensen JH, Boberg F, Christensen OB, Lucas-Picher P (2008) On the need for bias correction of regional climate change projections of temperature and precipitation. Geophys Res Lett 35(20):L20709. https://doi.org/10.1029/2008GL035694
Clark M, Gangopadhyay S, Hay L, Rajagopalan B, Wilby R (2004) The Schaake shuffle: a method for reconstructing space-time variability in forecasted precipitation and temperature fields. J Hydrometeor 5(1):243–262
Defrance D, Ramstein G, Charbit S, Vrac M, Famien AM, Sultan B, Swingedouw D, Dumas C, Gemenne F, Alvarez-Solas J, Vanderlinden JP (2017) Consequences of rapid ice sheet melting on the Sahelian population vulnerability. Proc Natl Acad Sci USA 114(25):6533–6538. https://doi.org/10.1073/pnas.1619358114
Dekens L, Parey S, Grandjacques M, Dacunha-Castelle D (2017) Multivariate distribution correction of climate model outputs: a generalization of quantile mapping approaches. Environmetrics 28:e2454. https://doi.org/10.1002/env.2454
Denton E, Chintala S, Szlam A, Fergus R (2015) Deep generative image models using a Laplacian pyramid of adversarial networks. arXiv:1506.05751
Déqué M (2007) Frequency of precipitation and temperature extremes over France in an anthropogenic scenario: model results and statistical correction according to observed values. Glob Planet Change 57(1):16–26. https://doi.org/10.1016/j.gloplacha.2006.11.030
Dufresne JL, Foujols MA, Denvil S, Caubel A, Marti O, Aumont O, Balkanski Y, Bekki S, Bellenger H, Benshila R, Bony S, Bopp L, Braconnot P, Brockmann P, Cadule P, Cheruy F, Codron F, Cozic A, Cugnet D, de Noblet N, Duvel JP, Ethé C, Fairhead L, Fichefet T, Flavoni S, Friedlingstein P, Grandpeix JY, Guez L, Guilyardi E, Hauglustaine D, Hourdin F, Idelkadi A, Ghattas J, Joussaume S, Kageyama M, Krinner G, Labetoulle S, Lahellec A, Lefebvre MP, Lefevre F, Levy C, Li ZX, Lloyd J, Lott F, Madec G, Mancip M, Marchand M, Masson S, Meurdesoif Y, Mignot J, Musat I, Parouty S, Polcher J, Rio C, Schulz M, Swingedouw D, Szopa S, Talandier C, Terray P, Viovy N, Vuichard N (2013) Climate change projections using the IPSL-CM5 Earth System Model: from CMIP3 to CMIP5. Clim Dyn 40(9):2123–2165. https://doi.org/10.1007/s00382-012-1636-1
Eden J, Widmann M, Grawe D, Rast S (2012) Skill, correction, and downscaling of GCM-simulated precipitation. J Clim 25:3970–3984. https://doi.org/10.1175/JCLI-D-11-00254.1
Fisher RA (1915) Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10(4):507–521
François B, Vrac M, Cannon AJ, Robin Y, Allard D (2020) Multivariate bias corrections of climate simulations: which benefits for which losses? Earth Syst Dyn 2020:1–41. https://doi.org/10.5194/esd-2020-10
Gagne DJ II, Christensen HM, Subramanian AC, Monahan AH (2020) Machine learning for stochastic parameterization: generative adversarial networks in the Lorenz ‘96 model. J Adv Model Earth Syst 12(3):e2019MS001896. https://doi.org/10.1029/2019MS001896
Gan Z, Chen L, Wang W, Pu Y, Zhang Y, Liu H, Li C, Carin L (2017) Triangle generative adversarial networks. arXiv:1709.06548
Gauthier J (2014) Conditional generative adversarial nets for convolutional face generation. In: Class Project for Stanford CS231N: convolutional neural networks for visual recognition, Winter semester vol. 5, p 2
Gokaslan A, Ramanujan V, Ritchie D, Kim KI, Tompkin J (2019) Improving shape deformation in unsupervised image-to-image translation. arXiv:1808.04325
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst. https://doi.org/10.1145/3422622
Gudmundsson L, Bremnes JB, Haugen JE, Engen-Skaugen T (2012) Technical note: downscaling RCM precipitation to the station scale using statistical transformations—a comparison of methods. Hydrol Earth Syst Sci 16(9):3383–3390. https://doi.org/10.5194/hess-16-3383-2012
Guinard K, Mailhot A, Caya D (2015) Projected changes in characteristics of precipitation spatial structures over North America. Int J Climatol 35:596–612. https://doi.org/10.1002/joc.4006
Guo Q, Chen J, Zhang X, Shen M, Chen H, Guo S (2019) A new two-stage multivariate quantile mapping method for bias correcting climate model outputs. Clim Dyn 53(5):3603–3623. https://doi.org/10.1007/s00382-019-04729-w
Gutmann E, Pruitt T, Clark M, Brekke L, Arnold J, Raff D, Rasmussen R (2014) An intercomparison of statistical downscaling methods used for water resource assessments in the United States. Water Resour Res 50:7167–7186. https://doi.org/10.1002/2014WR015559
Haddad Z, Rosenfeld D (1997) Optimality of empirical Z-R relations. Q J R Meteor Soc 123(541):1283–1293. https://doi.org/10.1002/qj.49712354107
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), pp 770–778, https://doi.org/10.1109/CVPR.2016.90
Hnilica J, Hanel M, Puš V (2017) Multisite bias correction of precipitation data from regional climate models. Int J Climatol 37:2934–2946. https://doi.org/10.1002/joc.4890
IPCC (2014) Climate change 2014: synthesis report. In: Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Core Writing Team, R.K. Pachauri and L.A. Meyer (eds.)]. IPCC, Geneva, Switzerland, p 151. https://www.ipcc.ch/report/ar5/syr/
Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on computer vision and pattern recognition (CVPR), pp 5967–5976, https://doi.org/10.1109/CVPR.2017.632
Jordan C (1874a) Mémoire sur les formes bilinéaires. J Math Pures Appl 19(Deuxième Série):35–54
Jordan C (1874b) Sur la réduction des formes bilinéaires. C R Acad Sci Paris 78(Deuxième Série):614–617
Karras T, Aila T, Laine S, Lehtinen J (2018) Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196
Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to discover cross-domain relations with generative adversarial networks. arXiv:1703.05192
Kingma DP, Ba J (2017) Adam: a method for stochastic optimization. arXiv:1412.6980
Lecun Y, Bengio Y (1995) Convolutional networks for images, speech, and time-series. In: Arbib MA (ed) The handbook of brain theory and neural networks. MIT Press, Cambridge, MA, pp 255–258
Leinonen J, Berne A (2020) Unsupervised classification of snowflake images using a generative adversarial network and \(K\)-medoids classification. Atmos Meas Tech 13(6):2949–2964. https://doi.org/10.5194/amt-13-2949-2020
Leinonen J, Nerini D, Berne A (2020) Stochastic super-resolution for downscaling time-evolving atmospheric fields with a generative adversarial network. IEEE Trans Geosci Remote Sens. https://doi.org/10.1109/TGRS.2020.3032790
Liu Y, Racah E, Prabhat, Correa J, Khosrowshahi A, Lavers D, Kunkel K, Wehner M, Collins W (2016) Application of deep convolutional neural networks for detecting extreme weather in climate datasets. arXiv:1605.01156
Mao X, Li Q, Xie H, Lau RYK, Wang Z, Smolley SP (2017) Least squares generative adversarial networks. arXiv:1611.04076
Maraun D (2013) Bias correction, quantile mapping, and downscaling: revisiting the inflation issue. J Clim 26(6):2137–2143. https://doi.org/10.1175/JCLI-D-12-00821.1
Maraun D (2016) Bias correcting climate change simulations—a critical review. Curr Clim Chang Rep 2:211–220. https://doi.org/10.1007/s40641-016-0050-x
Maraun D, Wetterhall F, Ireson AM, Chandler RE, Kendon EJ, Widmann M, Brienen S, Rust HW, Sauter T, Themeßl M, Venema VKC, Chun KP, Goodess CM, Jones RG, Onof C, Vrac M, Thiele-Eich I (2010) Precipitation downscaling under climate change: recent developments to bridge the gap between dynamical models and the end user. Rev Geophys. https://doi.org/10.1029/2009RG000314
Marti O, Braconnot P, Dufresne JL, Bellier J, Benshila R, Bony S, Brockmann P, Cadule P, Caubel A, Codron F, de Noblet N, Denvil S, Fairhead L, Fichefet T, Foujols MA, Friedlingstein P, Goosse H, Grandpeix JY, Guilyardi E, Hourdin F, Idelkadi A, Kageyama M, Krinner G, Lévy C, Madec G, Mignot J, Musat I, Swingedouw D, Talandier C (2010) Key features of the IPSL ocean atmosphere model and its sensitivity to atmospheric resolution. Clim Dyn 34:1–26. https://doi.org/10.1007/s00382-009-0640-6
Mehrotra R, Sharma A (2016) A multivariate quantile-matching bias correction approach with auto- and cross-dependence across multiple time scales: implications for downscaling. J Clim 29(10):3519–3539. https://doi.org/10.1175/JCLI-D-15-0356.1
Mehrotra R, Sharma A (2019) A resampling approach for correcting systematic spatiotemporal biases for multiple variables in a changing climate. Water Resour Res 55(1):754–770. https://doi.org/10.1029/2018WR023270
Menick J, Kalchbrenner N (2018) Generating high fidelity images with subscale pixel networks and multidimensional upscaling. arXiv:1812.01608
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784
Mueller B, Seneviratne S (2014) Systematic land climate and evapotranspiration biases in CMIP5 simulations. Geophys Res Lett 41:128–134. https://doi.org/10.1002/2013GL058055
Muerth MJ, Gauvin St-Denis B, Ricard S, Velázquez JA, Schmid J, Minville M, Caya D, Chaumont D, Ludwig R, Turcotte R (2013) On the need for bias correction in regional climate scenarios to assess climate change impacts on river runoff. Hydrol Earth Syst Sci 17(3):1189–1204. https://doi.org/10.5194/hess-17-1189-2013
Nahar J, Johnson F, Sharma A (2018) Addressing spatial dependence bias in climate model simulations—an independent component analysis approach. Water Resour Res 54(2):827–841. https://doi.org/10.1002/2017WR021293
Nguyen H, Mehrotra R, Sharma A (2019) Correcting systematic biases across multiple atmospheric variables in the frequency domain. Clim Dyn 52:1283–1298. https://doi.org/10.1007/s00382-018-4191-6
Piani C, Haerter J (2012) Two dimensional bias correction of temperature and precipitation copulas in climate models. Geophys Res Lett 39:L20401. https://doi.org/10.1029/2012GL053839
Racah E, Beckham C, Maharaj T, Kahou SE, Prabhat, Pal C (2017) ExtremeWeather: a large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events. arXiv:1612.02095
Radford A, Metz L, Chintala S (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434
Ramirez-Villegas J, Challinor A, Thornton P, Jarvis A (2013) Implications of regional improvement in global climate models for agricultural impact research. Environ Res Lett 8(2):024018. https://doi.org/10.1088/1748-9326/8/2/024018
Randall D, Wood R, Bony S, Colman R, Fichefet T, Fyfe J, Kattsov V, Pitman A, Shukla J, Srinivasan J, Ronald S, Sumi A, Taylor K (2007) Climate models and their evaluation. Cambridge University Press, Cambridge, pp 589–662
Reichler T, Kim J (2008) How well do coupled models simulate today's climate? Bull Am Meteorol Soc 89:303–311. https://doi.org/10.1175/BAMS-89-3-303
Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N, Prabhat M (2019) Deep learning and process understanding for data-driven Earth system science. Nature 566:195–204. https://doi.org/10.1038/s41586-019-0912-1
Robin Y, Vrac M, Naveau P, Yiou P (2019) Multivariate stochastic bias corrections with optimal transport. Hydrol Earth Syst Sci 23(2):773–786. https://doi.org/10.5194/hess-23-773-2019
Rodrigues ER, Oliveira I, Cunha RLF, Netto MAS (2018) DeepDownscale: a deep learning strategy for high-resolution weather forecast. In: 2018 IEEE 14th International Conference on eScience (eScience), pp 415–422, https://doi.org/10.1109/eScience.2018.00130
Roth K, Lucchi A, Nowozin S, Hofmann T (2017) Stabilizing training of generative adversarial networks through regularization. arXiv:1705.09367
Royer A, Bousmalis K, Gouws S, Bertsch F, Mosseri I, Cole F, Murphy K (2020) XGAN: unsupervised image-to-image translation for many-to-many mappings. Springer International Publishing, pp 33–49. https://doi.org/10.1007/978-3-030-30671-7_3
Räty O, Räisänen J, Bosshard T, Donnelly C (2018) Intercomparison of univariate and joint bias correction methods in changing climate from a hydrological perspective. Climate 6:33. https://doi.org/10.3390/cli6020033
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training GANs. arXiv:1606.03498
Scher S, Messori G (2018) Predicting weather forecast uncertainty with machine learning. Q J R Meteorol Soc 144(717):2830–2841. https://doi.org/10.1002/qj.3410
Scher S, Messori G (2019) Weather and climate forecasting with neural networks: using general circulation models (GCMs) with different complexity as a study ground. Geosci Model Dev 12(7):2797–2809. https://doi.org/10.5194/gmd-12-2797-2019
Scher S, Peßenteiner S (2020) Technical note: temporal disaggregation of spatial rainfall fields with generative adversarial networks. Hydrol Earth Syst Sci 2020:1–23. https://doi.org/10.5194/hess-2020-464
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003
Shi X, Chen Z, Wang H, Yeung DY, Wong W, Woo W (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. arXiv:1506.04214
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(56):1929–1958
Stewart GW (1993) On the early history of the singular value decomposition. SIAM Rev 35(4):551–566. https://doi.org/10.1137/1035134
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: 2015 IEEE Conference on computer vision and pattern recognition (CVPR), pp 1–9, https://doi.org/10.1109/CVPR.2015.7298594
Székely G, Rizzo M (2004) Testing for equal distributions in high dimension. InterStat 5:1249–1272
Székely G, Rizzo M (2013) Energy statistics: a class of statistics based on distances. J Stat Plan Inference 143:1249–1272. https://doi.org/10.1016/j.jspi.2013.03.018
Teutschbein C, Seibert J (2012) Bias correction of regional climate model simulations for hydrological climate-change impact studies: review and evaluation of different methods. J Hydrol 456:12–29. https://doi.org/10.1016/j.jhydrol.2012.05.052
Tong Y, Gao X, Han Z, Xu Y, Xu Y, Giorgi F (2020) Bias correction of temperature and precipitation over China for RCM simulations using the QM and QDM methods. Clim Dyn. https://doi.org/10.1007/s00382-020-05447-4
Tramblay Y, Ruelland D, Somot S, Bouaicha R, Servat E (2013) High-resolution Med-CORDEX regional climate model simulations for hydrological impact studies: a first evaluation of the ALADIN-Climate model in Morocco. Hydrol Earth Syst Sci 17(10):3721–3739. https://doi.org/10.5194/hess-17-3721-2013
Van Loon A, Gleeson T, Clark J, van Dijk A, Stahl K, Hannaford J, Di Baldassarre G, Teuling A, Tallaksen L, Uijlenhoet R, Hannah D, Sheffield J, Svoboda M, Verbeiren B, Wagener T, Rangecroft S, Wanders N, Van Lanen H (2016) Drought in the anthropocene. Nat Geosci 9:89–91. https://doi.org/10.1038/ngeo2646
Vandal T, Kodra E, Ganguly S, Michaelis A, Nemani R, Ganguly AR (2017) DeepSD: generating high resolution climate change projections through single image super-resolution. In: Proceedings of the 23rd ACM SIGKDD International Conference on knowledge discovery and data mining, pp 1663–1672, https://doi.org/10.1145/3097983.3098004
Vidal JP, Martin E, Franchistéguy L, Baillon M, Soubeyroux JM (2010) A 50-year high-resolution atmospheric reanalysis over France with the Safran system. Int J Climatol 30(11):1627–1644. https://doi.org/10.1002/joc.2003
Vigaud N, Vrac M, Caballero Y (2013) Probabilistic downscaling of GCM scenarios over southern India. Int J Climatol 33:1248–1263. https://doi.org/10.1002/joc.3509
Vorogushyn S, Bates PD, de Bruijn K, Castellarin A, Kreibich H, Priest S, Schröter K, Bagli S, Blöschl G, Domeneghetti A, Gouldby B, Klijn F, Lammersen R, Neal JC, Ridder N, Terink W, Viavattene C, Viglione A, Zanardo S, Merz B (2018) Evolutionary leap in largescale flood risk assessment needed. WIREs Water 5(2):e1266. https://doi.org/10.1002/wat2.1266
Vrac M (2018) Multivariate bias adjustment of high-dimensional climate simulations: the rank resampling for distributions and dependences (R\(^2\)D\(^2\)) bias correction. Hydrol Earth Syst Sci 22(6):3175–3196. https://doi.org/10.5194/hess-22-3175-2018
Vrac M, Thao S (2020) R\(^2\)D\(^2\) v2.0: accounting for temporal dependences in multivariate bias correction via analogue ranks resampling. Geosci Model Dev 2020:1–29. https://doi.org/10.5194/gmd-2020-132
Vrac M, Drobinski P, Merlo A, Herrmann M, Lavaysse C, Li L, Somot S (2012) Dynamical and statistical downscaling of the French Mediterranean climate: uncertainty assessment. Nat Hazards Earth Syst Sci 12(9):2769–2784. https://doi.org/10.5194/nhess-12-2769-2012
Vrac M, Noël T, Vautard R (2016) Bias correction of precipitation through singularity stochastic removal: because occurrences matter. J Geophys Res Atmos 121:5237–5258. https://doi.org/10.1002/2015JD024511
Wang J, Liu Z, Foster I, Chang W, Kettimuthu R, Kotamarthi R (2021) Fast and accurate learned multiresolution dynamical downscaling for precipitation. arXiv:2101.06813
Wasko C, Sharma A, Westra S (2016) Reduced spatial extent of extreme storms at higher temperatures. Geophys Res Lett 43(8):4026–4032. https://doi.org/10.1002/2016GL068509
Wheeler T, von Braun J (2013) Climate change impacts on global food security. Science 341(6145):508–513. https://doi.org/10.1126/science.1239402
Wilcke RAI, Mendlik T, Gobiet A (2013) Multi-variable error correction of regional climate models. Clim Change 120:871–887. https://doi.org/10.1007/s10584-013-0845-x
Wilks DS (2006) Statistical methods in the atmospheric sciences. Academic Press
Wu JL, Kashinath K, Albert A, Chirila D, Prabhat, Xiao H (2020) Enforcing statistical constraints in generative adversarial networks for modeling chaotic dynamical systems. J Comput Phys 406:109209. https://doi.org/10.1016/j.jcp.2019.109209
Xie Y, Franz E, Chu M, Thuerey N (2018) TempoGAN: a temporally coherent, volumetric GAN for superresolution fluid flow. ACM Trans Graph. https://doi.org/10.1145/3197517.3201304
Xu CY (1999) From GCMs to river flow: a review of downscaling methods and hydrologic modelling approaches. Prog Phys Geogr 23:229–249. https://doi.org/10.1177/030913339902300204
Yi Z, Zhang H, Tan P, Gong M (2017) DualGAN: unsupervised dual learning for image-to-image translation. In: 2017 IEEE International Conference on computer vision (ICCV), pp 2868–2876, https://doi.org/10.1109/ICCV.2017.310
Yoo D, Kim N, Park S, Paek AS, Kweon IS (2016) Pixel-level domain transfer. arXiv:1603.07442
Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv:1703.10593
Zscheischler J, Westra S, Hurk B, Seneviratne S, Ward P, Pitman A, AghaKouchak A, Bresch D, Leonard M, Wahl T, Zhang X (2018) Future climate risk from compound events. Nat Clim Change. https://doi.org/10.1038/s41558-018-0156-3
Zscheischler J, Fischer E, Lange S (2019) The effect of univariate bias adjustment on multivariate hazard estimates. Earth Syst Dyn 10:31–43. https://doi.org/10.5194/esd-10-31-2019
Acknowledgements
This work was granted access to the HPC resources of IDRIS under the allocation 20XX[AD011011646] made by GENCI. MV acknowledges support from the CoCliServ project, which is part of ERA4CS, an ERA-NET initiated by JPI Climate and co-funded by the European Union.
Funding
This research has been supported by the CoCliServ project, which is part of ERA4CS, an ERA-NET initiated by JPI Climate and co-funded by the European Union.
Author information
Contributions
MV had the initial idea of the study and its structure, which was enriched by all coauthors. BF made all computations and figures, with help from ST. BF wrote the first draft of the article, with inputs, corrections and additional writing contributions from MV and ST.
Ethics declarations
Conflicts of interest/Competing interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Code availability
The code for MBC-CycleGAN is publicly available at https://github.com/bastienfrancois/MBC_CycleGAN. The R package for R\(^{2}\)D\(^{2}\) is available at https://github.com/thaos/R2D2 (Vrac and Thao 2020). dOTC is publicly available at https://github.com/yrobink/SBCK (Robin et al. 2019).
Appendices
Appendix A: Details on the MBC-CycleGAN method
Let us consider the correction of a random variable, denoted \(\mathbf{X}\) (e.g., biased climate simulation outputs), with respect to a reference random variable, denoted \(\mathbf{Y}\). In our study, \(\mathbf{X}\) and \(\mathbf{Y}\) live in a space of dimension \(28 \times 28 = 784\). We denote by \(\mathbf{X} ^0\) and \(\mathbf{X} ^1\) the random variables to correct from climate simulations during the calibration and projection periods, respectively. Similarly, \(\mathbf{Y} ^0\) is the reference random variable for the calibration period. The goal of any BC method is to infer the future unobserved data \(\mathbf{Y} ^1\) from the reference variable \(\mathbf{Y} ^0\) during calibration and from the model simulations for the calibration (\(\mathbf{X} ^0\)) and projection (\(\mathbf{X} ^1\)) periods.
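For a single grid cell, a simple univariate instance of this setup is empirical quantile mapping. The sketch below is only an illustration under stated assumptions: the function name `quantile_mapping` and the toy samples are hypothetical, and the rank-matching rule (the smallest reference value whose empirical CDF reaches the simulated one) is one possible convention, not necessarily the one used in this study.

```python
from bisect import bisect_right


def quantile_mapping(x_calib, y_calib, x_values):
    """Map each simulated value to the reference value of matching rank.

    x_calib plays the role of a sample from X^0, y_calib of a sample from Y^0,
    and x_values is the sample to correct (from X^0 or X^1).
    """
    x_sorted = sorted(x_calib)
    y_sorted = sorted(y_calib)
    n_x, n_y = len(x_sorted), len(y_sorted)
    corrected = []
    for x in x_values:
        # Number of calibration values <= x, i.e. n_x times the empirical CDF.
        count = bisect_right(x_sorted, x)
        # Integer ceiling of count * n_y / n_x, shifted to a 0-based index.
        index = -(-count * n_y // n_x) - 1
        # Values outside the calibration range are clamped to the extremes.
        corrected.append(y_sorted[min(max(index, 0), n_y - 1)])
    return corrected


x0 = [1.0, 2.0, 3.0, 4.0]      # toy simulated sample, calibration period
y0 = [10.0, 20.0, 30.0, 40.0]  # toy reference sample, calibration period
x1 = [0.5, 2.5, 5.0]           # toy simulated sample, projection period
corrected_projection = quantile_mapping(x0, y0, x1)
```

Because each site is treated separately and ranks are left unchanged, such a correction cannot adjust the dependence between sites, which is the limitation the multivariate approach addresses.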
In practice, BC methods are applied to correct samples \((\mathbf{x} _{1}^0, \ldots , \mathbf{x} _{n}^0 )\) and \((\mathbf{x} _{1}^1, \ldots , \mathbf{x} _{n}^1 )\) from the random variables \(\mathbf{X} ^0\) and \(\mathbf{X} ^1\), with respect to a sample \((\mathbf{y} _{1}^0, \ldots , \mathbf{y} _{n}^0 )\) from the random variable \(\mathbf{Y} ^0\). For example, 1d-bias corrections of \((\mathbf{x} _{1}^0, \ldots , \mathbf{x} _{n}^0 )\) and \((\mathbf{x} _{1}^1, \ldots , \mathbf{x} _{n}^1 )\) with the QQ method can be denoted \((\mathbf{qq} _{1}^0, \ldots , \mathbf{qq} _{n}^0 )\) and \((\mathbf{qq} _{1}^1, \ldots , \mathbf{qq} _{n}^1 )\). As explained in Sect. 3, the CycleGAN approach within the MBC-CycleGAN methodology is applied between 1d-QQ outputs and references. Hence, two generators \(G_{\mathbf {QQ} \rightarrow \mathbf {Y}}\) and \(G_{\mathbf {Y} \rightarrow \mathbf {QQ}}\) are considered, as well as two discriminators \(D_{\mathbf {QQ}}\) and \(D_{\mathbf {Y}}\). The different steps constituting the MBC-CycleGAN method are described in an algorithmic way as follows:
Appendix B: Details on the simple architecture of neural networks used in MBC-CycleGAN
The simple neural network architectures used for the discriminators and generators constituting the MBC-CycleGAN method in this study are described in more detail in this appendix.
Appendix B.1: Architecture of the generators
As explained in Sect. 3.3.2, skip connections are used in the architecture of the generators to ease the training process. Skip connections allow a given layer to receive information not only from the layer directly before it, but also from other upstream convolution layers in the architecture. Skipping over layers helps to avoid vanishing gradients, a problem that can make the network hard to train. All layers except the first one have leaky rectified linear unit (leaky ReLU) activation functions defined as: \(y = \left\{ \begin{array}{ll} x &{} \text{ if } x \ge 0, \\ \alpha x &{} \text{ otherwise, } \end{array} \right.\) with \(\alpha =0.2\). Dropout regularization, which consists in ignoring neurons chosen at random during training, is used after the second and third 2D convolutional layers to prevent overfitting (e.g., Srivastava et al. 2014). The probability used for dropout is 0.4. A summary of the simple neural network architecture used for the generators is given in Table 3.
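The three ingredients described above can be sketched on plain Python lists. This is a minimal illustration, not the network code of this study: the function names are hypothetical, the rescaling of surviving units ("inverted" dropout) is an assumption borrowed from common deep learning libraries, and the skip connection is assumed to be additive, which the text does not specify.

```python
import random

ALPHA = 0.2      # negative slope of the leaky ReLU quoted in the text
DROP_PROB = 0.4  # dropout probability quoted in the text


def leaky_relu(x, alpha=ALPHA):
    """Identity for x >= 0, slope alpha otherwise."""
    return x if x >= 0 else alpha * x


def dropout(values, drop_prob=DROP_PROB, training=True, rng=random):
    """Zero units at random during training and rescale the survivors.

    The rescaling by 1 / (1 - drop_prob) is an assumption (inverted dropout).
    """
    if not training:
        return list(values)
    keep = 1.0 - drop_prob
    return [v / keep if rng.random() >= drop_prob else 0.0 for v in values]


def skip_connection(layer, inputs):
    """Add the layer input back to its output (assumed additive skip)."""
    return [out + inp for out, inp in zip(layer(inputs), inputs)]


def activation_layer(values):
    """Toy 'layer' that only applies the activation, for illustration."""
    return [leaky_relu(v) for v in values]


with_skip = skip_connection(activation_layer, [1.0, -1.0])
```

With an additive skip, the gradient of the output with respect to the input always contains an identity term, which is one way to see why such connections help against vanishing gradients.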
Appendix B.2: Architecture of the discriminators
A summary of the simple neural network architecture used for the discriminators is given in Table 4.
Appendix C: Methodology for the generation of IPSL-bis
For the generation of IPSL-bis data, a two-step procedure is developed to construct, from IPSL data, climate data whose marginal and spatial changes between the calibration and projection periods are in line with those of the references. To keep the changes comparable to those of LR SAFRAN, the LR SAFRAN changes are reproduced. We recall that, for the calibration period, IPSL and IPSL-bis data are strictly identical. The two-step procedure is only used to produce alternative climate data for the projection period.
Appendix C.1: Marginal changes with CDF-t
The first step of the procedure consists in producing time series for the projection period of IPSL-bis by taking into account marginal changes of LR SAFRAN with the 1d-BC method named CDF-t (Vrac et al. 2012). Initially, CDF-t is a version of the univariate quantile-mapping method designed to correct, at each individual grid cell, the marginal properties of climate simulation outputs during the calibration and projection periods according to the reference data observed during calibration. By defining a specific transfer function, CDF-t has been conceived to take into account the potential simulated changes of univariate distributions from the calibration to the projection period, in order to produce adjusted data whose marginal changes are in line with those from the simulations. While this quantile-mapping approach is traditionally used, in a bias correction context, to find a mathematical transformation going from simulations to references, we here applied CDF-t to go from “large-scale” references (LR SAFRAN) to simulations for future periods. By proceeding this way, the produced time series are projected distributions in the domain of IPSL simulations that have been obtained while taking into account the potential evolution of the CDFs of the LR SAFRAN dataset between the calibration and projection periods. By concatenating the time series from IPSL for the calibration period and those obtained from the CDF-t method for the projection period, new climate time series are obtained, presenting changes in marginal distributions in line with those from the references.
Appendix C.2: Spatial changes with a matrix-recorrelation technique
The second step consists in deriving a spatial dependence structure for the projection period such that spatial changes of LR SAFRAN are reproduced. To do so, we take advantage of a matrix-recorrelation technique used for the MBC method presented in Bárdossy and Pegram (2012) to impose a specific spatial dependence structure on climate data for the projection period. Our methodology is summarized in Table 5. It consists in first projecting each variable of both IPSL simulations and LR SAFRAN, individually and for both the calibration and projection periods, onto the univariate normal distribution with a Gaussian quantile-mapping method. This “Gaussianization” step is particularly suited for variables with mixed distributions such as precipitation (composed of wet and dry events). Computing Pearson correlation matrices on such Gaussianized data instead of raw data gives a better description of their dependence structure. Thus, Pearson correlation matrices of the different Gaussianized data are computed. They are respectively denoted as \(C_{I, C}\), \(C_{I, P}\), \(C_{I, C}^{(bis)}\), \(C_{I, P}^{(bis)}\), \(C_{S, C}\), \(C_{S, P}\) for IPSL during calibration, IPSL during projection, IPSL-bis during calibration, IPSL-bis during projection, LR SAFRAN during calibration and LR SAFRAN during projection. Additionally, let \(r_{I, C}\), \(r_{I, P}\), \(r_{I, C}^{(bis)}\), \(r_{I, P}^{(bis)}\), \(r_{S, C}\), \(r_{S, P}\) denote one of their entries. Note that by construction, \(C_{I, C}\) is the same as \(C_{I, C}^{(bis)}\) and that \(C_{I, P}^{(bis)}\) is unknown. Assessing the changes of LR SAFRAN spatial correlations between calibration and projection periods is now required to derive the spatial dependence structure of IPSL-bis for the projection period.
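The Gaussianization step and the correlation computed on its output can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the plotting position `rank / (n + 1)` is one common choice, and ties (e.g., dry days in precipitation series) are simply broken by time order here, whereas a real application would need an explicit treatment.

```python
from statistics import NormalDist


def gaussianize(series):
    """Map a series to standard normal scores through its ranks."""
    n = len(series)
    order = sorted(range(n), key=lambda i: series[i])
    standard_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) stays strictly inside (0, 1), as inv_cdf requires.
        scores[i] = standard_normal.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Pearson correlation between two series of equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


site_a = [0.3, 1.2, 0.7, 5.0, 2.4]   # toy series at one site
site_b = [v ** 3 for v in site_a]    # strongly skewed but monotone in site_a
r_gauss = pearson(gaussianize(site_a), gaussianize(site_b))
r_raw = pearson(site_a, site_b)
```

In this toy case the two sites are perfectly dependent, and the correlation of the Gaussianized series reflects that, whereas the correlation of the raw values is pulled below 1 by the skewed marginal of `site_b`.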
A simple approach to determine \(r_{I, P}^{(bis)}\), the correlation of the Gaussianized data of IPSL-bis for projection, would be to compute it based on the difference of correlations from Gaussianized LR SAFRAN data such as \(r_{I, P}^{(bis)} = r_{I, C} + r_{S, P} - r_{S, C}\). However, computing \(r_{I, P}^{(bis)}\) this way can yield correlation values that are out of range, i.e. greater than 1 or less than \(-1\), which is not appropriate.
From Bárdossy and Pegram (2012), given \(r_{I, C}\), \(r_{S, C}\), \(r_{S, P}\), one can derive \(r_{I, P}^{(bis)}\) using the Fisher-Z transformation (Fisher 1915) as follows:
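One form consistent with this description, assuming that the additive change is applied to the Fisher-Z transformed correlations and then transformed back, is \(r_{I, P}^{(bis)} = \tanh \left( \text{arctanh}(r_{I, C}) + \text{arctanh}(r_{S, P}) - \text{arctanh}(r_{S, C}) \right)\). The Python sketch below compares it with the naive difference; the function names and numerical values are hypothetical, and correlations equal to exactly 1 or \(-1\) are excluded because their Fisher-Z transform is infinite.

```python
import math


def naive_projection_correlation(r_ic, r_sc, r_sp):
    """Additive change applied directly to correlations (may leave [-1, 1])."""
    return r_ic + r_sp - r_sc


def fisher_z_projection_correlation(r_ic, r_sc, r_sp):
    """Additive change applied in Fisher-Z space, then transformed back."""
    z = math.atanh(r_ic) + math.atanh(r_sp) - math.atanh(r_sc)
    return math.tanh(z)  # tanh always returns a value within [-1, 1]


# Toy values: the reference correlation increases from 0.80 to 0.95.
r_ic, r_sc, r_sp = 0.90, 0.80, 0.95
naive = naive_projection_correlation(r_ic, r_sc, r_sp)
fisher = fisher_z_projection_correlation(r_ic, r_sc, r_sp)
```

Here the naive value exceeds 1, while the Fisher-Z version remains a valid correlation and reduces to \(r_{I, C}\) when the reference correlation does not change.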
The Fisher-Z transformation maps a bounded random variable to another random variable that can be assumed to be Normal, and on which an additive correction can be performed (see Mehrotra and Sharma (2019) for the derivation of Eq. 7). By deriving all the new correlation coefficients this way, the potential changes in correlations in the Gaussianized LR SAFRAN data are preserved and the Pearson correlation matrix for Gaussianized IPSL-bis during the projection period is obtained.
Now that the Pearson correlation matrix, \(C_{I, P}^{(bis)}\), is computed, a combination of “decorrelation” and “recorrelation” steps using decompositions of correlation matrices through singular value decomposition (SVD, Beltrami 1873; Jordan 1874a, b; Stewart 1993) is applied to the Gaussianized data of IPSL during the projection period, forcing its Pearson correlation matrix to be exactly \(C_{I, P}^{(bis)}\). The new dependence structure for IPSL-bis is thus obtained. Finally, the time series from the CDF-t outputs are reordered according to this new dependence structure using the Schaake Shuffle method to obtain IPSL-bis data for the projection period.
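The decorrelation, recorrelation and reordering steps can be illustrated on two sites only, where the decomposition of the correlation matrix \([[1, r], [r, 1]]\) is known in closed form (singular values \(1+r\) and \(1-r\)). This is a toy sketch with hypothetical function names and data, assuming a sample correlation strictly between \(-1\) and 1 and series without ties; on a full grid a numerical SVD routine would be required.

```python
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def standardize(v):
    n = len(v)
    m = sum(v) / n
    s = (sum((x - m) ** 2 for x in v) / n) ** 0.5
    return [(x - m) / s for x in v]


def corr_power(r, p):
    """Entries of C**p for C = [[1, r], [r, 1]], from its decomposition."""
    a, b = (1 + r) ** p, (1 - r) ** p  # powered singular values
    return (a + b) / 2, (a - b) / 2    # diagonal and off-diagonal entries


def recorrelate(z1, z2, r_target):
    """Impose the correlation r_target on two Gaussianized series."""
    z1, z2 = standardize(z1), standardize(z2)
    d_w, o_w = corr_power(pearson(z1, z2), -0.5)  # decorrelation matrix
    d_c, o_c = corr_power(r_target, 0.5)          # recorrelation matrix
    w1 = [d_w * a + o_w * b for a, b in zip(z1, z2)]
    w2 = [o_w * a + d_w * b for a, b in zip(z1, z2)]
    return ([d_c * a + o_c * b for a, b in zip(w1, w2)],
            [o_c * a + d_c * b for a, b in zip(w1, w2)])


def schaake_shuffle(values, template):
    """Reorder values so that their ranks follow those of the template."""
    ordered = sorted(values)
    order = sorted(range(len(template)), key=lambda i: template[i])
    shuffled = [0.0] * len(values)
    for rank, i in enumerate(order):
        shuffled[i] = ordered[rank]
    return shuffled


z1 = [0.1, -1.2, 0.8, 1.5, -0.4, -0.9]   # toy Gaussianized series, site 1
z2 = [0.3, -0.7, 0.2, 1.1, 0.5, -1.4]    # toy Gaussianized series, site 2
new1, new2 = recorrelate(z1, z2, r_target=0.6)
marginal_1 = [5.0, 1.0, 3.0, 2.0, 4.0, 6.0]  # toy stand-in for CDF-t output
shuffled_1 = schaake_shuffle(marginal_1, new1)
```

The shuffle leaves the values of each series, and hence its marginal distribution, untouched. Only their order in time changes, so that ranks follow the recorrelated template.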
Appendix D: Spatial correlation changes analysis
We present an analysis of spatial changes to provide a better picture of how the climate data change between the calibration and projection periods. As a reminder, IPSL-bis data are generated using the two-step procedure described in Appendix C such that their marginal and dependence changes are in line with those from LR SAFRAN (and therefore SAFRAN) for the projection period. Fig. S3 displays scatterplots of the differences in Spearman spatial correlations of temperature and precipitation, evaluated for all pairwise combinations of sites, between the calibration (1979–2005) and the projection (2006–2016) periods. The scatterplots compare these differences with those from LR SAFRAN, which makes it possible to verify visually whether changes in the spatial dependence structure are in line with those of the large-scale references. Using rank correlation here isolates the spatial dependence between two sites from their marginal properties. Figures for the analysis of marginal changes (in particular, mean and standard deviation changes) are also displayed in Figs. S4 and S5 for information purposes only. Results on univariate properties can be briefly summarized as follows: changes in marginal properties from SAFRAN references (resp. IPSL model) are in agreement (resp. disagreement) with those from LR SAFRAN for both temperature and precipitation. For IPSL-bis, the application of the CDF-t method yields marginal changes for both temperature and precipitation similar to those from LR SAFRAN. Concerning spatial properties, as expected, changes in spatial correlations from SAFRAN references are (partially) in agreement with those from LR SAFRAN for both temperature (Fig. S3a) and precipitation (Fig. S3d). Concerning the IPSL simulations, simulated changes of spatial correlations for temperature (Fig. S3b) are globally in line with those from LR SAFRAN, highlighting the ability of the climate model to provide appropriate temperature changes in spatial structure between the calibration and the projection periods. However, conclusions are quite different for precipitation, for which simulated changes are not at all in agreement with those from the large-scale reference (Fig. S3e). Hence, the IPSL model presents changes for precipitation that differ from those of LR SAFRAN (and thus of SAFRAN references), which could potentially affect the quality of the correction depending on how MBC-CycleGAN accounts for these changes in its correction procedure. Concerning the results for IPSL-bis, changes for both temperature (Fig. S3c) and precipitation (Fig. S3f) are similar to those from LR SAFRAN, confirming that the two-step methodology used to impose specific changes of spatial correlations on IPSL is appropriate here.
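The quantity plotted for each pair of sites, the change in Spearman correlation between the two periods, can be sketched as follows. The function names and toy series are hypothetical; ties are handled with average ranks, which is one common convention and an assumption here.

```python
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def ranks(series):
    """Average ranks: tied values share the mean of their positions."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    result = [0.0] * len(series)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and series[order[end + 1]] == series[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def correlation_change(calib_a, calib_b, proj_a, proj_b):
    """Spearman correlation in projection minus that in calibration."""
    return spearman(proj_a, proj_b) - spearman(calib_a, calib_b)


# Toy pair of sites whose dependence flips sign between the two periods.
change = correlation_change([1.0, 2.0, 3.0], [3.0, 2.0, 1.0],
                            [1.0, 2.0, 3.0], [2.0, 4.0, 8.0])
```

Computing this change for every pair of sites, once for the dataset under study and once for LR SAFRAN, gives the two coordinates of each point in scatterplots such as those described above.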
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
François, B., Thao, S. & Vrac, M. Adjusting spatial dependence of climate model outputs with cycle-consistent adversarial networks. Clim Dyn (2021). https://doi.org/10.1007/s00382-021-05869-8
Keywords
 Bias correction
 Spatial dependence
 Post-processing
 Climate simulations
 Generative adversarial networks
 Model output statistics