1 Introduction

Global warming caused by increasing concentrations of greenhouse gases has an evident influence on past and present climate. This may in turn affect hydrology and the management of water resources in sectors such as agriculture, hydraulic structures, forestry, and human health. Introducing appropriate climate change signals into impact models is therefore becoming a significant issue.

In hydrology, a conventional way to analyze the hydrological response to climate change is to define scenarios of change in hydrological input from the output of general circulation models (GCMs). GCMs are among the most useful products of climate-related studies. They attempt to represent past and present climate through consideration of the internal and external driving forces and feedbacks in the climate system, and to estimate the enhanced greenhouse effect and its consequences for the future global climate. GCMs are capable of representing complex processes of the global system at continental and/or hemispheric spatial scales and monthly temporal scales. They are, however, still weak in representing local subgrid-scale features and dynamics (Wigley et al. 1990; Carter et al. 1994). This weakness is mainly caused by an incomplete understanding of mesoscale atmospheric processes occurring at relatively small scales, such as cloud formation and moist convection (Risbey and Stone 1996). In addition, because of the high computational cost, global numerical models resolve only the primary energetic motions, which is not sufficient for motions occurring at scales of several kilometers (Hack 1994). At present, the parameterized variables can represent only large-scale averages, not the real local features.

Running a regional climate model (RCM) with boundary conditions from an overlying GCM is one form of dynamical downscaling that transfers the climate change signal from the global scale to the regional scale. RCMs normally outperform GCMs in representing local phenomena. This may be partly due to the inclusion of extra local information such as topography, intensive studies of subscale processes such as new land surface schemes, increased resolution, and increased computer power (Kjellström et al. 2006). However, RCMs share the weakness of GCMs that arises from coarse spatial resolution and the resulting difficulties. A number of challenges remain, so it may take a long time before RCMs adequately describe small-scale processes at a reasonable scale.

Statistical downscaling is another alternative for improving GCM simulations. It transfers atmospheric information through an established statistical relationship between one or several large-scale atmospheric variables and local variables. Compared to RCMs, statistical downscaling is normally relatively simple, easier to implement, and requires lower computational cost. However, it has drawbacks. One disadvantage relates to the basic assumption that a statistical relationship identified from the present climate remains unchanged in a future climate. This is by no means guaranteed. The problem can be mitigated to some extent by using sufficiently long historical records to cover all possible atmospheric phenomena (von Storch et al. 1993; Rummukainen 1997).

A number of statistical downscaling methods have been developed. They can be classified as analog, regression-based, or circulation-based approaches (Zorita and von Storch 1999; Wilby and Wigley 1994). As one important family, the circulation-based approach has been widely adopted in statistical downscaling (Hay et al. 1991; Wilby and Wigley 1994). It uses several representative patterns that contain rich information about the atmosphere to explain events recorded in long-term historical observations. The patterns are derived either from expert knowledge of atmospheric motions (subjective classification) or from statistical characteristics of the observations (objective classification) and are accordingly called subjective and objective circulation patterns (CPs). Unlike other approaches, each pattern describes a specific climate condition using multiple grid points rather than a single grid point. The method is thus able to capture the overall properties of local climate situations comprehensively. The circulation-based approach conveys information from the large-scale atmosphere to local regions and provides useful products. However, the continuous properties of the climate system are not properly preserved because circulation patterns are discrete rather than continuous variables. Stéhlík and Bárdossy (2002) noticed this weakness in their work. Their study not only showed satisfactory agreement between observed and downscaled precipitation but also highlighted the model’s weakness in capturing interannual variability. They suggested that the model could be improved by including additional variables, for instance, measurements of humidity or vertical stability.

In this paper, a new statistical downscaling model for daily precipitation is introduced. It is able to generate daily precipitation time series at multiple locations simultaneously. In the first phase of the work, a continuous atmospheric variable was identified and incorporated into the model to provide information on precipitation amount in addition to the classified circulation patterns. The model was then applied to the Rhine River basin (Yeshewatesfa and Bárdossy 2008) and three regions in China (Wetterhall et al. 2006) and compared with other statistical downscaling methods. Both studies show clear improvement in capturing the year-to-year and season-to-season variability of precipitation. In the second phase of the work, the model was further developed by introducing logistic regression for precipitation probability and two other distributions for precipitation amount. The new model is able to reproduce complex precipitation processes accurately and, at the same time, is flexible enough to fit precipitation to other appropriate distributions.

This paper focuses mainly on the work in the second phase. For the sake of completeness, all related model setups are introduced in turn. The methodology and evaluation procedure are described in Section 3. The models’ performance is demonstrated in Section 4 with a case study carried out in the Rhine River basin. The paper finishes with conclusions in Section 5.

2 Case study area and data set

2.1 Case study area

The Rhine River basin is chosen as the study area to demonstrate the model application (see Fig. 1). The River Rhine originates in the Swiss Alps, passes through western Europe, and finally flows into the North Sea. It drains around 185,000 km² and consists of four main subbasins: the alpine region in Switzerland, the Neckar basin, the Main basin, and the Mosel basin in parts of Germany and France. This study focuses on these subbasins, excluding the one in Switzerland.

Fig. 1 Distribution of precipitation stations in the Rhine River basin. Solid lines are the boundaries of the NCEP grid boxes

The Rhine River basin is very densely populated (around 50 million inhabitants in total). Its temperate climate and abundant water resources were crucial for local socioeconomic development, in both the agriculture-based and the industry-based economy, in past decades. The local climate is characterized by cool winters and warm summers. Precipitation falls all year round and shows weak seasonal variability. Its amount varies from 500 to 1,800 mm. The region was affected by several serious flooding events in the last century; for example, the floods of 1925, 1926, 1955, 1993, and 1995 led to huge economic losses and a number of casualties. Most of the large-area flooding occurred in late winter and early spring and was caused by heavy rainfall combined with snowmelt.

Since the second half of the last century, local observations have indicated trends toward more extreme values in the meteorological variables. This is supported by the extreme value analysis conducted by Hundecha and Bárdossy (2005). In their study, the local climate tends to be more humid and warmer in winter and warmer in summer; extreme daily maximum and minimum temperatures are found to have increased, especially the extreme minimum temperature; and extreme heavy precipitation has become more extreme, in terms of both magnitude and frequency, in winter and the transition seasons.

Therefore, the focus of this research is mainly the winter seasons. Whether the model is able to represent the variability of winter rainfall, especially under extreme climate conditions, is of great importance. The results for summer seasons are also included in this paper, which aims to evaluate the models’ capability of capturing the characteristics of precipitation governed by different climate mechanisms.

2.2 Predictand

The predictand of the downscaling process is multisite precipitation at daily scale that is required by most hydrological rainfall–runoff models. Particularly, their behavior under extreme conditions is of great importance. The daily precipitation over long time is therefore used for model calibration and validation.

In the German part of the River Rhine basin, there are hundreds of stations. Of the whole data set, 100 “evenly” distributed meteorological stations with well-controlled records for 43 years are selected. The time period slice starts from the year 1958 and ends in the year 2000. These hundred stations are partitioned into three groups, as exemplified in Fig. 1.

2.3 Predictor

The predictors for the downscaling procedure include mean sea level pressure (MSLP), geostrophic wind field (U, V), specific humidity (Sh), and a new combined term, moisture flux (MF). All predictors are derived from National Centers for Environmental Prediction (NCEP) re-analysis data at a spatial resolution of 2.5°×2.5°, provided by the NCEP in the USA.

As a precondition for the circulation-based downscaling approach, circulation patterns are required. In this work, a fuzzy rule-based classification scheme was used to classify the circulation patterns. The scheme is based on the concept of fuzzy sets (Zadeh 1965) and uses imprecise statements to describe the climate system. The classification of CPs follows four steps: the transformation of large-scale data, the definition of fuzzy rules, the optimization of fuzzy rules, and the classification of circulation patterns.

Anomalies of normalized MSLP serve as a predictor, and each CP is described by a fuzzy rule k represented by a vector V(k) = (v(1)k, v(2)k, ..., v(n)k). Here, n is the number of locations (grid points), i = 1, ..., n indexes the locations, and k is the index of the CP. The entries v(i)k are the indices of the membership functions corresponding to the selected locations. Based on the membership functions, membership grades for the anomalies are calculated for a given time t and a given location i. These membership grades are combined to calculate the degree of fulfillment (DOF) of each rule. The rule k with the highest DOF is then selected as the CP for that day. A detailed description of the methodology can be found in Bárdossy et al. (2002).

The relationship between circulation patterns and precipitation has been highlighted by numerous studies. However, circulation patterns were found to be insufficient for capturing the continuous variation of climate conditions. To address this weakness, two large-scale variables, daily geostrophic wind and humidity, are taken into consideration. Geostrophic wind is a daily airflow index described by a vector containing the zonal (U) and meridional (V) components of the wind field. A positive value of U indicates airflow from west to east, and a positive value of V indicates airflow from south to north. Humidity represents the concentration of water vapor in the air and can be expressed as absolute humidity (Ah), specific humidity (Sh), or relative humidity (Rh). Geostrophic wind and humidity-related variables are often considered as potential predictors in atmospheric studies (Charles et al. 1999; Murphy 2000; Linderson et al. 2004). In those studies, relative humidity is taken as one of the humidity measures for downscaling precipitation. It has proven useful for describing rainfall occurrence but has very limited influence on rainfall amount. Specific humidity, which is independent of temperature, has also been shown to be useful for downscaling rainfall amount on rainy days (Beckmann and Buishand 2002).

In this work, specific humidity is coupled with wind speed in the zonal and/or meridional direction to form a new term, MF. The aim is to describe the amount of water vapor conveyed by the wind field to the study area, which is considered important for both rainfall occurrence and rainfall amount. Moisture flux is expressed as:

$${\text{MF} = \overline{\text{Geostrophic\ wind}}\times \text{Sh}} $$
(1)
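
As a minimal illustration of Eq. 1, the sketch below multiplies a daily wind component by specific humidity. The function name, the units, and the sample values are illustrative assumptions and are not taken from the NCEP data. The zonal component U is used, so that, with the sign convention given above, a positive flux corresponds to airflow from west to east.

```python
def moisture_flux(wind, specific_humidity):
    """Eq. 1: moisture flux as wind component times specific humidity."""
    return [w * q for w, q in zip(wind, specific_humidity)]


# Hypothetical daily values: zonal wind U in m/s, specific humidity in kg/kg.
u_wind = [8.0, -3.0, 12.5]
sh = [0.004, 0.003, 0.006]

mf = moisture_flux(u_wind, sh)
print(mf)
```

The sign of the result follows the sign of the wind component, which is what allows the flux to distinguish westerly from easterly moisture transport.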

3 Methodology

3.1 Conditional multivariate precipitation downscaling model

Precipitation is distinctly asymmetrical, with positive skewness, and is physically constrained to be nonnegative. It is also characterized by temporal intermittence in its occurrence. Conventionally, daily rainfall probability is treated as a function of the weather state on the previous or the current day, as in the Markov chain model (Richardson 1981) and the semi-empirical model (Semenov and Barrow 1997), or as dependent also on the CP of the day (Bárdossy and Plate 1992). Daily rainfall amount is described by a statistical probability distribution, for instance, the exponential distribution (Todorovic and Woolhiser 1975), the gamma distribution (Groisman 1999), the mixed exponential distribution (Woolhiser and Pegram 1979), or the transformed normal distribution (Bárdossy and Plate 1992). Here, circulation patterns are not the only predictor determining the occurrence and amount of rainfall; moisture flux is selected as an additional predictor. In addition, three probability distributions are implemented: the skewed normal distribution, the exponential distribution, and the gamma distribution.

For the model using the skewed normal distribution, rainfall occurrence was determined only by the governing CP, whereas for the models using the exponential and gamma distributions, a logistic regression was used to link daily moisture flux and rainfall occurrence conditioned on a CP. Logistic regression is a technique for binary predictands, in this case a day being wet or dry, in which the regression parameters are fitted to a nonlinear equation. By taking the moisture flux into account, the probability of rainfall occurrence can be differentiated even under the same governing CP.

$$ {y = \frac{1}{1+\exp\big(b_{\!0}+b_{\!1}x\big)}} $$
(2)

In Eq. 2, moisture flux is the additional predictor, x, and y is the rainfall probability associated with the binary predictand (wet or dry day). \(b_{0}\) and \(b_{1}\) are the two regression parameters. The probability of rainfall occurrence obtained by logistic regression always lies between zero and one.
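
The behavior of Eq. 2 can be illustrated with a short sketch. The parameter values and CP labels below are hypothetical and chosen only for demonstration. With the sign convention of Eq. 2 as written, a negative slope parameter makes the probability rise with moisture flux.

```python
import math


def rain_probability(mf, b0, b1):
    """Eq. 2 as written: y = 1 / (1 + exp(b0 + b1 * x))."""
    return 1.0 / (1.0 + math.exp(b0 + b1 * mf))


# Hypothetical parameters for two CPs (not fitted values from the paper).
params = {"wet_cp": (-0.5, -40.0), "dry_cp": (1.0, -40.0)}
flux_values = (-0.02, 0.0, 0.02, 0.05)

for name, (b0, b1) in params.items():
    probs = [rain_probability(x, b0, b1) for x in flux_values]
    print(name, [round(p, 3) for p in probs])
```

For the same moisture flux, the two parameter sets give different probabilities, which is how the model separates days that share a flux value but fall under different CPs.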

Among the precipitation generators, a commonly used continuous distribution for rainfall amount is the gamma distribution, whose probability density function (PDF) is expressed as:

$${f(x) =\frac{{(x/\beta)^{\alpha-1}\exp(-x/\beta)}}{\beta\Gamma(\alpha)}} \ \ \ \ \ \ \ \ x,\alpha,\beta>0 $$
(3)

It is a two-parameter distribution with shape parameter α and scale parameter β. The density function of the gamma distribution cannot be integrated analytically. Its cumulative distribution function has to be approximated numerically by means of the incomplete gamma function.

The PDF of the gamma distribution possesses a variety of shapes depending on its shape parameter. For the case of α = 1, the gamma distribution is called an exponential distribution, with a PDF that takes the form of:

$$ f(x) =\frac{1}{\mu}\exp\left(\frac{-x}{\mu}\right) \ \ \ \ \ \ \ \ x, \mu>0 $$
(4)

The exponential distribution has only one parameter, μ, the expectation of the precipitation amount. When α approaches a very large value, the shape of the gamma distribution resembles a normal distribution.

$$ f(x) =\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \ \ \ \ \ \ \ \ \sigma>0 $$
(5)

The normal distribution has many advantages over other distributions. Its favorable characteristics make it popular in many aspects of higher-order multivariate analysis, including representation of precipitation, even though precipitation is asymmetrically distributed. The Box–Cox power transformation is always required to correct the skewness of precipitation to mathematically fit the normal distribution.

To incorporate moisture flux, the aforementioned distributions are modified so that the expectation of precipitation amount depends linearly on the moisture flux. The resulting set of conditional precipitation downscaling models is given by the equations below. All of them depend on the governing CP.

$${Z(t,u) =\left\{ \begin{array}{ll} 0 & \textrm{if } W(t,u)\leq 0\\[4pt] G(t,u) & \textrm{else} \end{array} \right.} $$
(6)
$${G(t,u) =\left\{ \begin{array}{ll} W(t,u)^{\beta} & \textrm{for the truncated normal distribution}\\[8pt] F^{-1}\left[\frac{\Phi\left(\frac{W(t,u)-\mu(t,u)}{\sigma(t,u)}\right)-\Phi\left(-\frac{\mu(t,u)}{\sigma(t,u)}\right)}{1-\Phi\left(-\frac{\mu(t,u)}{\sigma(t,u)}\right)}\right] & \textrm{for the exponential or gamma distribution} \end{array}\right.} $$
(7)

Here,

Z(t,u):

Daily precipitation amount at location u and time t

W(t,u):

Normal random variable

G(t,u):

Function for generating daily precipitation amount

F − 1():

Inverse of cumulative distribution function (CDF) of exponential distribution or gamma distribution

Φ():

CDF of the standard normal distribution

μ(t,u):

Expectation of random variable W(t,u)

σ(t,u):

Standard deviation of random variable W(t,u)

β :

Transformation exponent relating G(t,u) to W(t,u) when the truncated normal distribution is applied
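
Equations 6 and 7 can be sketched as follows for the power-transform branch and for the exponential branch, where the inverse CDF has the closed form F⁻¹(p) = −m ln(1 − p) for a distribution with mean m. The function names, the parameter `mean_rain`, and the sample values are assumptions made for illustration. The gamma branch would need a numerical inverse CDF and is not shown.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; .cdf plays the role of Phi


def precip_power(w, beta):
    """Eqs. 6-7, truncated normal branch: Z = W**beta if W > 0, else 0."""
    return w ** beta if w > 0.0 else 0.0


def precip_exponential(w, mu, sigma, mean_rain):
    """Eqs. 6-7, exponential branch.

    The probability of W conditional on W > 0 is mapped through the
    inverse exponential CDF, F^{-1}(p) = -mean_rain * ln(1 - p).
    """
    if w <= 0.0:
        return 0.0
    p_dry = STD_NORMAL.cdf(-mu / sigma)
    p = (STD_NORMAL.cdf((w - mu) / sigma) - p_dry) / (1.0 - p_dry)
    p = min(p, 1.0 - 1e-12)  # guard against log(0) caused by rounding
    return -mean_rain * math.log(1.0 - p)


# Hypothetical values of the normal variable W for three days.
for w in (-0.3, 1.0, 2.0):
    print(w, precip_power(w, 2.0), precip_exponential(w, 0.2, 1.0, 5.0))
```

In both branches a non-positive W yields a dry day, and the generated amount increases monotonically with W.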

A dependence between daily precipitation and MF is assumed and was verified by a preliminary analysis for the study area. The distribution parameter μ is therefore expressed as the sum of the CP-dependent term \(\mu_{0}\) and a term linear in MF (Eq. 8). The annual cycle of \(\mu_{0}\) is approximated by a Fourier series (Eq. 9):

$$ {\mu(t,u)=\mu_{0}\big(t^{\ast},u\big)+a\times MF(t,u)} $$
(8)
$$ \mu_{0}\big(t^{\ast},u\big)=\frac{a_{0}(i,u)}{2}+\sum\limits_{k=1}^{K}\left(a_{k}(i,u)\cos\big(kwt^{\ast}\big)+b_{k}(i,u)\sin\big(kwt^{\ast}\big)\right) $$
(9)

where t* stands for the Julian day corresponding to each actual day, a describes the relationship between the expectation of precipitation and the daily moisture flux, and \(\mu_{0}(t^*, u)\) is the expectation of precipitation on Julian day t* at location u. \(a_{k}\) and \(b_{k}\) are the coefficients of the harmonics of the Fourier series conditioned on CP i. According to harmonic analysis, the Fourier approximation reproduces the observed time series exactly when (t* − 1)/2 harmonics are included. Normally, the first three harmonics are sufficient.
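
A minimal sketch of Eqs. 8 and 9 is given below. It assumes that the angular frequency w equals 2π/365 (one cycle per year), which is not stated explicitly in the text, and it uses hypothetical coefficients for a single CP and station with K = 3 harmonics.

```python
import math

OMEGA = 2.0 * math.pi / 365.0  # assumed angular frequency of the annual cycle


def mu0(julian_day, a0, a, b):
    """Eq. 9: truncated Fourier series with K = len(a) harmonics."""
    total = a0 / 2.0
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        total += ak * math.cos(k * OMEGA * julian_day)
        total += bk * math.sin(k * OMEGA * julian_day)
    return total


def mu(julian_day, mf, a0, a, b, slope):
    """Eq. 8: CP-dependent annual cycle plus a linear moisture flux term."""
    return mu0(julian_day, a0, a, b) + slope * mf


# Hypothetical coefficients (not estimates from the paper).
a0, a, b = 1.2, [0.4, 0.1, 0.05], [-0.2, 0.05, 0.0]
for day in (15, 105, 196, 288):
    print(day, round(mu(day, 0.03, a0, a, b, slope=20.0), 4))
```

The parameter `slope` corresponds to a in Eq. 8. It is renamed here only to avoid a clash with the list of cosine coefficients.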

The spatial structure of rainfall is presented by a spatial covariance structure, which takes spatial covariance, \(C_{i}(t^*)\), and autocorrelation, r(t*), into account. With the introduction of random numbers ψ(t), precipitation can be generated at multiple sites day by day:

$$ W\left(t,u\right)=r\left(t^{\ast}\right)W\left(t-1,u\right)+C_{i}\left(t^{\ast}\right)\psi\left(t\right) $$
(10)
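
The recursion in Eq. 10 can be sketched as below. For simplicity, the sketch holds the autocorrelation r and the matrix C constant, whereas the model lets them vary with the Julian day and the CP. The two-site matrix, the seed, and the function name are illustrative assumptions.

```python
import random


def simulate_w(n_days, r, c, seed=0):
    """Eq. 10: W(t) = r * W(t-1) + C psi(t), with psi ~ N(0, I).

    r is a scalar autocorrelation and c a square matrix (list of rows).
    """
    rng = random.Random(seed)
    n_sites = len(c)
    w_prev = [0.0] * n_sites
    series = []
    for _ in range(n_days):
        # One independent standard normal number per site and day.
        psi = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
        w_now = [
            r * w_prev[u] + sum(c[u][j] * psi[j] for j in range(n_sites))
            for u in range(n_sites)
        ]
        series.append(w_now)
        w_prev = w_now
    return series


# Hypothetical two-site example with a lower-triangular C.
c = [[1.0, 0.0], [0.8, 0.6]]
w = simulate_w(5, r=0.3, c=c)
for day in w:
    print([round(v, 3) for v in day])
```

Because the same random vector enters every site through C, the generated values are spatially correlated, while the term r W(t − 1, u) carries over the state of the previous day.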

3.2 Parameter estimation

3.2.1 Parameter for rainfall probability model

The rainfall probability for each station is estimated from the moisture flux and the governing CP using logistic regression. The logistic regression contains two parameters, \(b_{0}\) and \(b_{1}\), which are estimated in this work using the maximum likelihood estimator (MLE).

MLE is an alternative to the method of moments for estimating distribution parameters. It finds the parameter values that maximize the likelihood of the observed data.

The log-likelihood of the logistic regression model is given by Eq. 11:

$$ {L(\textbf{b})=\sum\limits_{i=1}^{N}x_{i}^{T}\textbf{b}\,y_{i}-\sum\limits_{i=1}^{N}n_{i}\log\left(1+\exp\left(x_{i}^{T}\textbf{b}\right)\right)} $$
(11)

The estimated parameters, \(b_{0}\) and \(b_{1}\), are used to determine the rainfall probability for a day with a given CP and moisture flux.
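
One way to maximize Eq. 11 is sketched below, using Newton-Raphson iterations with one wet/dry observation per day, i.e., \(n_{i} = 1\). The choice of optimizer and the sample data are assumptions; the text does not specify how the maximization was carried out. Note that Eq. 11 corresponds to the parameterization p = 1/(1 + exp(−(b0 + b1 x))), so the fitted values are the negatives of the parameters in Eq. 2 as written.

```python
import math


def log_likelihood(b0, b1, x, y):
    """Eq. 11 with n_i = 1 (one wet/dry observation per day)."""
    return sum(
        yi * (b0 + b1 * xi) - math.log1p(math.exp(b0 + b1 * xi))
        for xi, yi in zip(x, y)
    )


def fit_logistic(x, y, n_iter=25):
    """Maximize Eq. 11 by Newton-Raphson iterations on (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            wgt = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to b0
            g1 += (yi - p) * xi     # gradient with respect to b1
            h00 += wgt              # entries of the negative Hessian
            h01 += wgt * xi
            h11 += wgt * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical standardized moisture flux and wet (1) / dry (0) states.
x = [-2, -1, -1, 0, 0, 1, 1, 2, 2, 3]
y = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
b0_hat, b1_hat = fit_logistic(x, y)
print(round(b0_hat, 3), round(b1_hat, 3))
```

In the downscaling model, such a fit would be repeated for each station and each CP, so that every CP has its own pair of parameters.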

3.2.2 Parameter for the rainfall amount model

The method of moments is conventionally used to estimate the parameters of a distribution. However, it may not be suitable for all distributions and variables. Thom (1958), for instance, pointed out that the moment estimator for the gamma distribution performs relatively well for larger values of α, but not for small values. In addition, the particular characteristics of precipitation make the moment estimator inefficient at capturing its mixed discrete-continuous nature. Therefore, MLE is used instead.

Using MLE, probabilities for dry days are described using a cumulative distribution, while probabilities for wet days are described with a density function. In the following expressions, Φ and φ denote the cumulative distribution function and the density function of the standard normal distribution. \(\mu_{0}\) stands for the expectation conditioned on a particular CP; a represents the dependence between the daily moisture flux and the precipitation under the influence of a certain CP; MF is the daily moisture flux derived from NCEP re-analysis data; \(Z_{0}\) is the threshold value for a rainfall event.

For the skewed normal distribution, the log-likelihood is expressed as:

$$\begin{array}{lll} && L^n\big(\mu_{0}, \sigma, a\big)\\ &&=\sum\limits_{Z(t,u)\leq Z_{0}}\ln\Phi\left(\frac{-\big(\mu_{0}\big(t^{\ast},u\big)+a\times \text{MF}\big)}{\sigma\big(t^{\ast},u\big)}\right) \\ &&\phantom{=} +\ \sum\limits_{Z(t,u)> Z_{0}}\ln\varphi\left(\!\frac{Z(t,u)^{1/\beta}-\big(\mu_{0}\big(t^{\ast},u\big)+a\times \text{MF}\big)}{\sigma\big(t^{\ast},u\big)}\!\right)\end{array}$$
(12)
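
Equation 12 can be evaluated term by term as in the following sketch, which treats a single station and a single CP and, as a simplification, holds \(\mu_{0}\) and σ constant instead of letting them vary with the Julian day. The threshold of 0.1 mm and the sample data are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def loglik_normal(z, mf, mu0, sigma, a, beta, z0=0.1):
    """Eq. 12 for one station and one CP, with constant mu0 and sigma."""
    total = 0.0
    for zi, mfi in zip(z, mf):
        mean = mu0 + a * mfi
        if zi <= z0:
            # Dry day: probability mass at or below the threshold.
            total += math.log(STD_NORMAL.cdf(-mean / sigma))
        else:
            # Wet day: density of the power-transformed amount.
            total += math.log(
                STD_NORMAL.pdf((zi ** (1.0 / beta) - mean) / sigma)
            )
    return total


# Hypothetical daily rainfall (mm) and moisture flux for one CP.
z = [0.0, 0.0, 2.5, 8.0, 0.0, 1.2]
mf = [-0.01, 0.00, 0.02, 0.05, -0.02, 0.01]
print(loglik_normal(z, mf, mu0=0.2, sigma=1.0, a=0.0, beta=2.0))
print(loglik_normal(z, mf, mu0=0.2, sigma=1.0, a=30.0, beta=2.0))
```

Comparing the two printed values shows how the likelihood responds when the moisture flux term is switched on. A numerical optimizer would search over \(\mu_{0}\), σ, and a for the maximum.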

To apply MLE for finding the parameters of the exponential distribution, the function has to be reformatted with a parameter λ, which is the inverse of the expected precipitation.

$$\begin{array}{rll} L^e(\lambda)&=&\sum\limits_{Z(t,u)\leq Z_{0}}\left(\lambda(t,u)\times x\right)\\ &&+ \sum\limits_{Z(t,u)> Z_{0}}\left(\ln\lambda(t,u)-\lambda(t,u) \times x\right) \end{array}$$
(13)
$$ \lambda(t,u) = \frac{1}{\mu_{0}\big(t^{\ast},u\big)+a\times MF} $$
(14)

When the gamma distribution is applied, the equation adopts a more complicated form.

$$\begin{array}{lll} && L^g\big(\alpha, \eta,\mu,a\big)\\ &&=\sum\limits_{Z(t,u)\leq Z_{0}}\left(\ln\gamma\left(\alpha, \frac{\alpha \times Z(t,u)}{\mu\big(t^{\ast},u\big)}\right)-\ln\Gamma(\alpha)\right) \\ &&\phantom{=} + \sum\limits_{Z(t,u)> Z_{0}}\left((\alpha-1)\ln\left(\frac{Z(t,u)}{\eta(t,u)}\right)-\frac{Z(t,u)}{\eta(t,u)}-\ln\big(\eta(t,u)\,\Gamma(\alpha)\big)\right)\end{array}$$
(15)

where

α :

Shape parameter of the gamma distribution

η(t,u):

Scale parameter of the gamma distribution

μ(t ∗ ,u):

Parameter including influence of daily moisture flux into the distribution parameter \(\mu(t^{\ast},u) =\mu_{0}(t^{\ast},u)\) + a×MF(t,u)

The integral of the gamma density function cannot be found analytically. It must be obtained by computing approximations to the CDF or from tabulated probabilities. Whichever option is selected, the variable has to be rescaled to follow the standardized gamma distribution. After rescaling, the standard variate is dimensionless, and the shape parameter remains the same. The cumulative probabilities for the standard gamma distribution can be calculated with the incomplete gamma function γ.
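
One common way to approximate the gamma CDF is the power series of the lower incomplete gamma function, sketched below. The series is a standard textbook expansion and is only one possible choice; the text does not state which approximation was used. The function name and tolerances are assumptions.

```python
import math


def gamma_cdf(x, alpha, scale, tol=1e-12, max_terms=10000):
    """Gamma CDF via the series for the regularized incomplete gamma function.

    Uses gamma(alpha, s) = s**alpha * exp(-s) * sum_n s**n / (alpha ... (alpha + n)),
    where s = x / scale is the standardized, dimensionless variate.
    """
    if x <= 0.0:
        return 0.0
    s = x / scale
    term = 1.0 / alpha
    total = term
    for n in range(1, max_terms):
        term *= s / (alpha + n)
        total += term
        if term < tol * total:
            break
    log_prefactor = alpha * math.log(s) - s - math.lgamma(alpha)
    return min(1.0, total * math.exp(log_prefactor))


# Hypothetical shape and scale; amounts in mm.
for amount in (1.0, 5.0, 20.0):
    print(amount, round(gamma_cdf(amount, alpha=0.8, scale=6.0), 4))
```

For α = 1 the result reduces to the exponential CDF, 1 − exp(−x/scale), which provides a simple check. For very large standardized values, a continued-fraction expansion would typically converge faster than this series.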

3.3 Evaluation procedure

Evaluation for CP classification

Circulation patterns were classified based on the daily anomaly of MSLP over central Europe covering the study area. Each day is assigned to a particular CP. Whether the classified CPs are able to distinguish the weather state in both normal and extreme rainfall events is considered as a criterion.

\(I_{w}\) is a wetness index, given in Eq. 16. It is calculated for each CP and each season to identify the dependence between local rainfall events and the governing CPs.

$$ I_{w}=\frac{R_{i}}{N_{i}} $$
(16)

where

i:

Index of the CP

R i :

Rainfall contribution in percent

N i :

CP occurrence/frequency in percent

The index describes the relative contribution of a given CP to the regional rainfall amount. Values of 1, <1, and >1 indicate normal, dry, and wet conditions, respectively. A low frequency of occurrence combined with a large rainfall amount leads to \(I_{w} > 1\), whereas a high frequency combined with a small amount leads to \(I_{w} < 1\). The larger the wetness index, the wetter the CP, and vice versa. In addition, several statistical indices are calculated to analyze how well the classified CPs explain the variability of precipitation in the study area from CP to CP: CP frequency, Prec mean, and Prec 90 (Table 1).
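
The wetness index of Eq. 16 can be computed from a daily CP series and a daily rainfall series as in the following sketch. The ten-day record and the function name are hypothetical and serve only to illustrate the calculation.

```python
def wetness_index(cp_of_day, rain_of_day):
    """Eq. 16: share of total rainfall divided by share of days, per CP."""
    n_days = len(cp_of_day)
    total_rain = sum(rain_of_day)
    index = {}
    for cp in sorted(set(cp_of_day)):
        amounts = [r for c, r in zip(cp_of_day, rain_of_day) if c == cp]
        rain_share = 100.0 * sum(amounts) / total_rain  # R_i in percent
        freq_share = 100.0 * len(amounts) / n_days      # N_i in percent
        index[cp] = rain_share / freq_share
    return index


# Hypothetical ten-day record with two CPs.
cps = [5, 5, 5, 5, 5, 5, 11, 11, 5, 5]
rain = [0.0, 0.5, 0.0, 0.0, 1.5, 0.0, 12.0, 6.0, 0.0, 0.0]
iw = wetness_index(cps, rain)
print(iw)
```

In this example, the CP labeled 11 occurs on 20% of the days but contributes 90% of the rainfall, giving an index of 4.5, while the CP labeled 5 has an index of 0.125.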

Table 1 Diagnostics of classified circulation patterns

Evaluation for model performance

The annual cycle at each station is reproduced to ensure that the model is capable of capturing the annual variability of precipitation. In addition, indices representing the statistics of daily precipitation are selected, covering both precipitation occurrence (90N, 90T) and precipitation amount (P90, Pav, R5D), together with indices reflecting average precipitation (SDI, CDD, \(L_{dd}\), \(L_{ww}\), \(P_{dd}\), \(P_{ww}\)). Table 2 summarizes the applied indices. Full descriptions of the indices can be obtained from the web site of the EU STARDEX project (http://www.cru.uea.ac.uk/cru/projects/stardex/).

Table 2 Diagnostics of daily precipitation

The diagnostics are calculated seasonally over all stations. The interannual variability is analyzed by the correlation between indices calculated from observed and downscaled daily precipitation time series.

4 Results and discussions

4.1 Circulation patterns

The circulation patterns for the Rhine River basin were classified within the STARDEX project (Bárdossy and Hundecha 2003). They were generated using anomalies of MSLP covering the central European window ranging from 65° N–15° W to 35° N–25° E and optimized against calculated daily discharge differences for the Moselle catchment. Thirteen CPs are classified. CP01 to CP12 are distinct CPs that each explain a specific large-scale circulation. CP13 covers the days that do not belong to any of the representative CPs and is therefore considered an unclassified CP.

The importance of each individual CP to rainfall events at each station is evaluated using the wetness index and the statistical analysis introduced in Section 3.3. Figure 2 shows the wetness index calculated over all stations for each half year: the summer half year (March–August) and the winter half year (September–February). CP04, CP10, and CP11 are clearly much wetter than the other CPs, while CP05 and CP08 are much drier. Moreover, as can be seen from the statistics of single CPs in Tables 3 and 4, the two extreme CPs, CP05 and CP11, behave in opposite ways in both the winter (December–January–February) and summer (June–July–August) seasons. In winter, 35.5% of the extreme rainfall events are caused by CP11 despite its low occurrence of 8.8%, whereas CP05, with an occurrence of 14.9%, accounts for only 2.86% of the total extreme rainfall events.

Fig. 2 Wetness index of the CPs in summer and winter for the period from 1960 to 1978 and 1994 to 2000

Table 3 The contribution of CPs to the frequency and magnitude of precipitation in winter
Table 4 The contribution of CPs to the frequency and magnitude of precipitation in summer

The statistical analysis is also consistent with the climatology. In the anomaly map of CP11 (Fig. 3), a cyclone can be readily identified. A large area of low pressure is centered over the North Sea and influences most parts of western and central Europe. Pressure decreases from the high-pressure region toward the low-pressure region, and the wind flows counterclockwise around the cyclone, bringing large amounts of moisture from the northern Atlantic to central Europe. This explains why heavy rainfall, especially large-area rainfall, is typically produced when this particular CP occurs. In contrast to CP11, the pressure map of CP05 shows a typical anticyclone. The high-pressure zone above western Europe indicates the negative dependence between the occurrence of CP05 and local rainfall events.

Fig. 3 Driest and wettest CP classified for the River Rhine basin

In conclusion, the classified CP set captures large-scale information quite well, which makes it reliable enough to serve as a precondition for downscaling precipitation in this study area.

4.2 Moisture flux

As mentioned in Section 2.3, moisture flux is used as an additional predictor in this study. It is expected to provide information beyond the given CPs and thereby to enhance the original circulation-based downscaling model. Moisture flux and wind direction (U and V) at different pressure levels, ranging from 850 to 500 hPa, were used. The stations were divided into three groups (Fig. 1). For simplicity, the gridded moisture flux that has the highest correlation with all stations in a group is used as the predictor for the target variable, precipitation. An individual gridded moisture flux could also be chosen for each station, which may increase the overall dependence.

Firstly, whole sets of zonal and net moisture flux were compared to daily precipitation over the whole year to identify their dependence in general. The results summarized in Table 5 indicate that the westerly moisture flux (zonal moisture flux) has a greater impact on local rainfall, which is also reflected by the pressure map of the circulation patterns. Including meridional moisture flux deteriorates the dependence. In addition, the moisture flux impacts more on the coastal regions (group 1 and group 3) than on the inland region (group 2). Compared with other pressure levels, moisture flux at 700 hPa pressure level shows the strongest dependence on local rainfall.

Table 5 The correlation between daily precipitation and daily flux at different pressure levels

Secondly, the identified westerly moisture flux was used in three linear regression models: MF, MF + CP, and MF + CP + AC. Model MF generates precipitation as a function of daily moisture flux without considering the governing CP. Model MF + CP uses the CP as a condition for generating precipitation from the daily moisture flux. Model MF + CP + AC takes both the seasonal variation of moisture flux and the governing CP into account. The three simple models were calibrated using the periods 1960–1978 and 1994–2000 and validated using data from 1979 to 1993. The correlations between the model outputs and the observed precipitation time series were calculated for different seasons, and the results for the validation period are tabulated. As seen from Tables 6 and 7, there is considerable dependence between daily moisture flux and precipitation, and it varies from season to season. The highest correlation of 0.59 is reached in winter; the lowest correlation of 0.25 appears in summer. An improvement is seen when moisture flux is treated as a predictor conditioned on the governing circulation patterns instead of as the most important predictor, which is consistent with other related studies. Model MF + CP + AC reaches the highest correlation among the different combinations of moisture flux, annual cycle, and governing CP. This indicates that including moisture flux may help generate reasonable rainfall amounts when it is combined with both the CP and the seasonal variation.

Table 6 Averaged correlation coefficients between observed and simulated daily precipitation generated by the three mentioned models at selected 100 stations (see Fig. 1; winter and summer)
Table 7 Averaged correlation coefficients between observed and simulated daily precipitation generated by the three mentioned models at selected 100 stations (see Fig. 1; spring and autumn)
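The difference between the three regression setups can be illustrated with a minimal sketch. It assumes that each model is an ordinary least-squares line fitted separately within each conditioning group (no grouping for MF, one group per CP for MF + CP, one group per CP and season for MF + CP + AC). The function names, the toy data, and the truncation of negative predictions at zero are illustrative assumptions and are not taken from the study.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope

def fit_conditional(flux, precip, keys):
    """Fit one regression line per conditioning key (hypothetical helper)."""
    groups = defaultdict(lambda: ([], []))
    for x, y, key in zip(flux, precip, keys):
        groups[key][0].append(x)
        groups[key][1].append(y)
    return {key: fit_line(xs, ys) for key, (xs, ys) in groups.items()}

def predict(model, flux, keys):
    """Predicted precipitation; negative values are truncated at zero (assumption)."""
    return [max(0.0, model[k][0] + model[k][1] * x) for x, k in zip(flux, keys)]

# Toy data (invented): precipitation responds more strongly to flux under the wet CP.
flux = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
cp = ["wet"] * 4 + ["dry"] * 4
season = ["winter", "winter", "summer", "summer"] * 2
precip = [3.0, 5.0, 7.0, 9.0, 0.5, 1.0, 1.5, 2.0]

model_mf = fit_conditional(flux, precip, ["all"] * len(flux))          # MF
model_mf_cp = fit_conditional(flux, precip, cp)                        # MF + CP
model_mf_cp_ac = fit_conditional(flux, precip, list(zip(cp, season)))  # MF + CP + AC
```

In this toy case the pooled MF model has to average over both CPs, whereas the conditioned models recover a separate slope for each group, which mirrors the improvement reported when moisture flux is conditioned on the CP.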

4.3 Probability of precipitation

Logistic regression is used to describe the rainfall probability conditioned on the classified circulation patterns and moisture flux. The dependence between moisture flux and rainfall probability is represented by a monotonically increasing curve that shows the probability of a day being wet for a given moisture flux under a particular CP.
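As a hedged illustration of this dependence, the sketch below fits a two-parameter logistic curve, p = 1 / (1 + exp(-(a + b * MF))), separately for each CP by Newton iterations. The parameterization, the function names, and the wet/dry toy records are assumptions made for illustration only; they do not reproduce the fitting procedure or the data of the study.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(flux, wet, iterations=25):
    """Maximum-likelihood fit of p = sigmoid(a + b * flux) by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(flux, wet):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to a
            g1 += (y - p) * x      # gradient with respect to b
            h00 += w               # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:           # stop if the system becomes ill-conditioned
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Toy records (invented): four days per flux value, 1 = wet day, 0 = dry day.
flux_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
wet_counts = {"wet_cp": [1, 1, 2, 3, 3], "dry_cp": [0, 0, 1, 1, 2]}

params = {}
for cp_name, counts in wet_counts.items():
    flux, wet = [], []
    for x, k in zip(flux_values, counts):
        flux += [x] * 4
        wet += [1] * k + [0] * (4 - k)
    params[cp_name] = fit_logistic(flux, wet)

def rain_probability(cp_name, flux):
    """Probability of a wet day for a given CP and moisture flux."""
    a, b = params[cp_name]
    return sigmoid(a + b * flux)
```

With these toy records both fitted slopes are positive, so the probability rises with moisture flux, and the curve for the wet CP lies above the curve for the dry CP at the same flux.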

Figure 4 illustrates the dependence between the probability of rainfall and the corresponding moisture flux under the impact of CPs at station GERMERSHEIM in southern Germany. In the figure, the dots show the dependence under the wet CP (CP11) and the diamonds show it under the dry CP (CP05). The dashes show the dependence without considering the CP.

Fig. 4 Rainfall probability for the station GERMERSHEIM conditioned to CPs and moisture flux (MF)

As can be seen in the figure, the classified circulation patterns successfully differentiate the behavior of rainfall under the impact of moisture flux. Rainfall probability increases with increasing moisture flux. In particular, the probability of rainfall is quite low when the moisture flux is negative, that is, when the flux is easterly. This again supports the assumption that the westerly moisture flux is a dominant factor for the Rhine River basin. It is worth noting that the wet CP always induces a higher rainfall probability than the dry one for the same amount of moisture flux.

To determine whether the difference between the estimated and observed values is statistically significant, a statistical test has to be performed. Confidence intervals are constructed at a specified level of 95%. The probability of rainfall can simply be considered as the frequency of the weather state being wet. Because the weather state is either wet or dry, a parametric test based on the binomial distribution is appropriate. The observed weather states are shuffled to calculate the observed frequency of rainfall for each moisture flux interval conditioned on each CP. The binomial distribution is then applied to construct the corresponding intervals at the 95% confidence level.
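One simple way to build such an interval is sketched below. It assumes that, for a moisture flux interval containing n days of which k are wet, the bounds are the 2.5% and 97.5% quantiles of a binomial distribution with success probability k / n, divided by n. This particular construction, the function names, and the example counts are assumptions for illustration; the text does not specify the exact construction used.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k wet days out of n under Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def binom_quantile(q, n, p):
    """Smallest k such that P(X <= k) >= q for X ~ Binomial(n, p)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if cumulative >= q:
            return k
    return n

def frequency_interval(wet_days, n_days, level=0.95):
    """Observed wet-day frequency with a two-sided binomial interval."""
    p_obs = wet_days / n_days
    tail = (1.0 - level) / 2.0
    lower = binom_quantile(tail, n_days, p_obs) / n_days
    upper = binom_quantile(1.0 - tail, n_days, p_obs) / n_days
    return p_obs, lower, upper

# Invented example: 60 wet days out of 100, and 240 wet days out of 400.
p_small, lo_small, hi_small = frequency_interval(60, 100)
p_large, lo_large, hi_large = frequency_interval(240, 400)

def within_interval(p_model, lower, upper):
    """True if a modeled probability lies inside the interval."""
    return lower <= p_model <= upper
```

The interval narrows as the number of days in a flux interval grows, which is one reason intervals tend to be wide at the extremes of moisture flux, where few days are available.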

Figure 5 presents the rainfall probability derived from both the observations and the model, together with the 95% confidence intervals. The observed frequency of rainfall is shown as a discrete variable and the modeled probability as a continuous variable. The confidence intervals constructed around the observed frequencies are quite narrow except at the limits of extremely high and low moisture flux. The calculated rainfall probabilities are consistent with those from the observations: they all fall within the 95% confidence intervals. The statistical test therefore indicates that the difference between the estimated and observed values is not statistically significant at the 5% level.

Fig. 5 Rainfall probabilities calculated from the observations and logistic regression for CP11 (diamonds modeled rainfall probability; squares observed rainfall probability; dashes confidence level of 95%)

The logistic regression proved useful for integrating the nonlinear dependence among CPs, moisture flux, and rainfall probability into one expression. Consequently, the daily rainfall probability conditioned on an individual CP is no longer constant but varies with the daily moisture flux, which provides more detailed information for generating daily rainfall time series.

4.4 Validation of the model

The downscaling models were implemented with different setups. They were calibrated using historical records from the periods 1958–1978 and 1994–2000 and validated with records from 1979 to 1993. Mean monthly simulated precipitation was compared with observed precipitation to check the agreement in intraannual variation and in annual precipitation amount. Several example stations located in the Mosel, Neckar, and Main subbasins were selected. The bias and the Pearson correlation between the observations and the model outputs were used as measures of model performance. In addition, a number of precipitation indices were calculated seasonally from both simulated and observed precipitation. Their rank correlations were used to evaluate how well the model captures the interannual variability of average and extreme precipitation.
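The three measures named above can be sketched as follows. The sketch assumes that bias is the mean simulated value minus the mean observed value and that the rank correlation is the Spearman coefficient, computed as the Pearson correlation of average ranks; the function names and the monthly totals are invented for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def bias(simulated, observed):
    """Mean simulated minus mean observed (negative means a deficit)."""
    return mean(simulated) - mean(observed)

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_correlation(x, y):
    """Spearman rank correlation (assumed form of the rank correlation)."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented monthly totals in mm for six months.
observed = [60.0, 45.0, 50.0, 70.0, 80.0, 65.0]
simulated = [55.0, 44.0, 52.0, 66.0, 79.0, 60.0]

monthly_bias = bias(simulated, observed)
monthly_pearson = pearson(simulated, observed)
monthly_rank = rank_correlation(simulated, observed)
```

For the interannual evaluation, the same rank correlation would be applied to one index value per year and season rather than to monthly totals.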

In general, all the models are able to capture the annual cycle of rainfall at the station locations, whether or not moisture flux is considered. As shown in Fig. 6, no model clearly outperforms the others. The model using the skewed normal distribution coupled with moisture flux yields the smallest bias, −2.4 mm/month during the winter half year and −2.3 mm/month in the summer half year, while the remaining models produce larger deficits over the whole year. The model using the exponential distribution with moisture flux reaches the highest average correlation of day-to-day precipitation, although its bias is much higher than that of the model using the skewed normal distribution with moisture flux (see Tables 8 and 9).

Fig. 6 Annual cycle of monthly precipitation at the selected stations in the Rhine River basin (1979–1993)

Table 8 Correlation between monthly precipitation derived from observed and simulated daily precipitation (1979–1993)
Table 9 Bias between monthly precipitation derived from the observation records and the simulation results over the whole study area (1979–1993)

Figures 7 and 8 show the rank correlations calculated from observed and simulated precipitation over all stations. The calculation was carried out for each season; here, the results for the selected indices in winter (December–January–February) and summer (June–July–August) are presented as bar charts. Each bar represents the result of a single model setup. The first and second setups use generators based on the skewed normal distribution, the former without moisture flux and the latter with it. In these setups, the rainfall probability is controlled by the governing CP. The last two setups use generators based on the exponential distribution and the gamma distribution, respectively. Both consider moisture flux, and their weather state is controlled by both the CP and moisture flux.
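To make the structure of such a generator concrete, the following sketch shows a two-step setup of the kind described for the exponential variant: the wet or dry state is drawn from a logistic probability that depends on moisture flux, and the amount on wet days is drawn from an exponential distribution. The parameters a, b, c0, and c1 would in practice be specific to each CP; their values here, the linear dependence of the mean amount on moisture flux, and the lower bound of 0.1 mm are all hypothetical choices for illustration.

```python
import math
import random

def wet_probability(flux, a, b):
    """Logistic probability of a wet day for a given moisture flux."""
    return 1.0 / (1.0 + math.exp(-(a + b * flux)))

def generate_daily_rainfall(flux_series, a, b, c0, c1, rng):
    """Draw occurrence first, then an exponential amount on wet days.

    a, b   : logistic parameters for occurrence (hypothetical values)
    c0, c1 : mean wet-day amount as c0 + c1 * flux (assumed linear form)
    """
    rainfall = []
    for flux in flux_series:
        if rng.random() < wet_probability(flux, a, b):
            mean_amount = max(0.1, c0 + c1 * flux)  # keep the mean positive
            rainfall.append(rng.expovariate(1.0 / mean_amount))
        else:
            rainfall.append(0.0)
    return rainfall

# Invented flux series cycling through five values, 5000 days in total.
rng = random.Random(42)
flux_series = [-2.0, -1.0, 0.0, 1.0, 2.0] * 1000
rain = generate_daily_rainfall(flux_series, a=0.0, b=0.7, c0=4.0, c1=1.0, rng=rng)

wet_fraction = sum(1 for r in rain if r > 0.0) / len(rain)
wet_low_flux = sum(1 for i, r in enumerate(rain) if i % 5 == 0 and r > 0.0)
wet_high_flux = sum(1 for i, r in enumerate(rain) if i % 5 == 4 and r > 0.0)
```

Swapping the amount distribution for a gamma or skewed normal distribution would change only the sampling line, which is why the setups can be compared on the same indices.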

Fig. 7 Comparison between the performances of models with and without moisture flux (MF) in summer (June–July–August). Definitions of the daily precipitation diagnostics acronyms are given in Table 1

Fig. 8 Comparison between the performances of models with and without moisture flux (MF) in winter (December–January–February). Definitions of the daily precipitation diagnostics acronyms are given in Table 1

As can be seen from the figures, the models differ in their particular skills. Nevertheless, the moisture-related models perform better in representing average precipitation and the persistence of dry and wet days, which underlines the importance of adding continuous atmospheric predictors to the discrete circulation patterns. The representation of interannual variability in average and extreme rainfall is considerably improved in both winter and summer. In general, all model setups reproduce winter rainfall better than summer rainfall. This is mainly caused by the different mechanisms governing the synoptic climate. In winter, long-lasting rainfall is caused by large-scale circulation, which can be captured by the classified circulation patterns. Summer rainfall, however, is mostly dominated by local processes that are difficult to represent with large-scale patterns. The model setup with the skewed normal distribution as its marginal performs best in winter, followed by the one with the exponential distribution and then the one with the gamma distribution. In contrast, the model with the exponential distribution performs best in summer. The improvement in dry and wet spells is noticeable in the model setups that use both CP and moisture flux to determine rainfall occurrence. Furthermore, the variability related to the average rainfall amount is increased; however, the variability related to extreme precipitation is still quite low. Overall, the models are improved by the inclusion of moisture flux over the whole year. Their performance varies between indices and seasons and depends on the governing atmospheric mechanisms.

5 Conclusion

In this paper, a conditional multivariate downscaling model was introduced. The model uses a continuous predictor, moisture flux, in addition to the circulation patterns to generate the rainfall amount. Furthermore, logistic regression was applied to determine the rainfall probability.

The analysis of the dependence between the precipitation series and the moisture flux showed that the influence is strongly related to local geography and orography. For example, the eastward moisture flux is dominant for regions in central Europe, while for other regions it may not be a dominant factor. The linear regression model and the logistic regression model show that both the precipitation amount and the precipitation probability depend on circulation patterns and moisture flux, which underlines the key role of both predictors in generating precipitation.

The model was successfully applied to the Rhine River basin. Its performance was evaluated by diagnostic analysis for both extreme and average conditions. The simulated results are consistent with historical observations. In particular, the representation of interannual variability improved considerably with the incorporation of moisture flux. The main conclusions are summarized as follows:

  • Classification of circulation patterns is useful to capture the representative climate phenomena.

  • Moisture flux plays an important role in local rainfall events.

  • Logistic regression is useful for binary predictands and for fitting regression parameters to a nonlinear equation. In this study, it helps to determine the rainfall probability as a function of daily moisture flux conditioned on a given circulation pattern.

  • Under the same circulation pattern, the larger the moisture flux, the higher the probability of rainfall.

  • With the same amount of moisture flux, the wet CP is more likely to cause the occurrence of rainfall.

  • Generators with skewed normal distribution and exponential distribution were proven to be reliable in representing the variability of precipitation.

Compared with the other model setups, the model using the skewed normal distribution coupled with moisture flux is the best in terms of reproducing the interannual variability and representing the annual cycle. As an alternative, the exponential distribution with logistic regression for determining the occurrence of rainfall also works relatively well. In general, all the models perform best in winter, followed by the transition seasons, spring and autumn.

Winter rainfall in high-latitude regions of the northern hemisphere is important for major river flooding, which is often caused by long-lasting rainfall due to large-scale west cyclonic circulation (Caspary 1996) that can be captured by the classified circulation patterns. For winter rainfall events, all the models coupled with moisture flux yield better results for indices related to both mean and extreme precipitation conditions. For the model adopting the skewed normal distribution in particular, the skill in reproducing the interannual variability improved by 100%. The models with the exponential distribution and the gamma distribution do not perform as well as the one using the skewed normal distribution. Nevertheless, they produce better results than the model without moisture flux. This underlines the importance of introducing continuous meteorological predictors in addition to the discrete circulation patterns for generating meteorological variables.

The capability of the models to reproduce the interannual variability in summer also improves when moisture flux is considered, though not as much as in winter. The model with the exponential distribution shows the best performance of all three model setups. The skill in reproducing extreme rainfall events increases by 25% and that for normal rainfall events by 60%. The model performance in summer is in general weaker than in the other seasons. This weakness arises because summer rainfall is governed by different mechanisms than winter rainfall: it is partly the result of local convective motion, so the rainfall events are much more local. More research on moisture flux is needed. Studying the vertical variation of moisture flux may help to further improve model performance.