1 Introduction

Return period \(\mathcal T\) is probably one of the most used and misused concepts in hydrological and geophysical risk analysis. \(\mathcal T\) is commonly written as:

$$\begin{aligned} \mathcal T = \dfrac{\mu }{p} = \dfrac{\mu }{\mathbb P [X > x]} = \dfrac{\mu }{1-F(x)} \end{aligned}$$

where \(X\) is a random variable describing the process under study (e.g. flow or rainfall peaks above a given threshold), \(\mu >0\) denotes the average inter-arrival time between two realizations of the process, \(p = \mathbb P [X > x]\) is the probability to observe realizations exceeding a specific value \(x\), and \(F(x) = 1-p= \mathbb P [X \le x]\) indicates the distribution function of \(X\). Even though the attitude of considering observations of physical processes as realizations of random variables is questionable (e.g., Klemeš 1986, 2000, 2002), pure statistical frequency analyses and computation of \(\mathcal T\) values and corresponding return levels \(x\) are widespread in engineering, environmental sciences, and many other disciplines.

In this context, \(\mathcal T\) is usually preferred to values of the underlying probability of exceedance \(p\) as it seems to be (apparently) more friendly than the concept of probability. However, experience tells us that this feeling is generally not well founded, and often leads to misleading statements such as “The 50-year return period flood peak of 100 \(\text{m}^3 \text{s}^{-1}\) occurs once every 50 years” or “A flood peak of 100 \(\text{m}^3 \text{s}^{-1}\) has been recorded recently in this area. Therefore the value of 100 \(\text{m}^3 \text{s}^{-1}\) 50-year return period flood peak is now wrong”.

The causes of these incorrect conclusions (still widespread in technical reports and scientific literature) are discussed in textbooks and guidelines referring to frequency analyses of a single variable under the hypothesis that the observations are independent and identically distributed (iid) (see e.g., McCuen 1998; Fleming et al 2002; Gupta 2011, among others). However, the fast growth of nonstationary and multivariate frequency analyses over the last decade has led to the extension of the concept of return period to these frameworks. The aim of this study is to show how the causes of the misconceptions mentioned above have propagated into nonstationary and multivariate frequency analyses, yielding further ill-posed procedures and misleading statements. In addition, we also show that the concepts of probability of exceedance and risk of failure over a given design life period provide more coherent, general and suitable tools to measure and communicate the risk corresponding to hydrological and geophysical hazards.

2 Reviewing some basic concepts: what risk do we really need to measure?

Before discussing nonstationary and multivariate cases, it is worth reviewing some basic concepts concerning the stationary univariate setting. Let us assume that a geophysical phenomenon is described by a random variable \(X\) and we observe realizations of the phenomenon at fixed time intervals (e.g. daily or annual time scales). Under the hypotheses that the phenomenon is stationary (i.e. the distribution function \(F\) is independent of time or other covariates) and each realization is independent of the previous ones (i.e. the realizations represent outcomes of a series of independent experiments under the same (controlled) conditions), \(\mathcal T\) can be defined in different ways: (1) as the expected value of the number of realizations (observed at fixed time steps) that one has to wait before observing an event whose magnitude exceeds a fixed value \(x\); (2) as the expected value of the number of trials between two successive occurrences of events exceeding \(x\). The first definition is known as “average occurrence interval” (Douglas et al. 2002) and implies that a finite time \(\tau \) has elapsed since a past exceedance, and the interest is in the residual or remaining waiting time for the next occurrence (Fernández and Salas 1999a), whereas the latter is known as “average recurrence interval” (Douglas et al. 2002) and conveys information about the mean elapsed time between occurrences of critical events. In the first case, \(\tau \) can be known (from historical records) or unknown, whereas in the second case, we have \(\tau = 0\), meaning that an exceedance has just occurred. The difference between these definitions can be (and actually is) commonly overlooked just because they both lead to Eq. 1 under iid hypotheses.

Referring to Chow et al. (1988, p. 382) or Cooley (2013) for the formal derivation of Eq. 1, we only recall that it corresponds to the Wald equation (Wald 1944) allowing for the calculation of the expected value of the sum of a random number \(N\) of iid random variables \((T_n)_{n\in \mathbb N}\) with common mean \({\text{E}}[T]\) under the hypothesis that the nonnegative integer-valued random variable \(N\) is also independent of the sequence \((T_n)_{n\in \mathbb N}\), that is (e.g., Shiau 2003; Salvadori 2004):

$$\begin{aligned} \mathcal T= {\text{E}}\left[ \sum ^{N}_{n = 1} T_n\right] = {\text{E}}[T] {\text{E}}[N] = \dfrac{\mu }{p} \end{aligned}$$

where \(\mu := {\text{E}}[T]\), whereas \({\text{E}}[N] = 1/p\) results from the expected value of the geometric distribution describing the number of events occurring before observing a realization larger than \(x\).
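As a brief check of the last identity, and assuming that \(N\) counts the trials up to and including the first exceedance, so that \(\mathbb P[N=n] = p(1-p)^{n-1}\) for \(n \ge 1\), the mean of the geometric distribution follows from the series \(\sum _{n \ge 1} n q^{n-1} = 1/(1-q)^2\) with \(q = 1-p\):

$$\begin{aligned} {\text{E}}[N] = \sum ^{\infty }_{n = 1} n\, p\, (1-p)^{n-1} = \dfrac{p}{\left( 1-(1-p)\right) ^2} = \dfrac{1}{p}. \end{aligned}$$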

As \(\mathcal T\) can be always expressed in years, the return period is deemed a friendly measure of the degree of rarity of an event, which however leads to statements such as “This event is expected to occur on average once each \(\mathcal T\) years”. This statement is formally correct but also possibly misleading because, as is well known, the underlying probability \(p\) actually says that there is a probability \(p\) to observe the so-called \(\mathcal T\)-year event “each year”, or better, each time interval of duration \(\mu \).

In fact, what really matters in systems’ design and planning is not \(\mathcal T\) and the corresponding \(x\) values obtained by inverting Eq. 1, but the risk of failure, which is the probability \(p_M\) to observe a critical event at least once in \(M\) years of design life. Under iid conditions, \(p_M\) is defined as (Chow et al. 1988, p. 383):

$$\begin{aligned} p_M:= 1- \prod ^{M}_{j=1} (1-p_j) = 1- (1-p)^M = 1- (F(x_d))^M. \end{aligned}$$

However, in common practice, Eq. 3 is not used directly to define the design value \(x_d\), as it should be, but only to verify the value of \(p_M\) corresponding to the \({\mathcal T}\)-year value \(x\), resulting in the well-known expression

$$\begin{aligned} p_M= 1- (F(x))^M = 1 - \left( 1- \dfrac{1}{{\mathcal T}} \right) ^M. \end{aligned}$$

In other words, Eq. 3 allows us to compute a design value \(x_d\) with an appropriate probability \(p_M\) describing the actual risk of observing at least a failure in the entire design life period

$$\begin{aligned} x_d = F^{-1}((1-p_M)^{1/M}), \end{aligned}$$

whereas Eq. 4 provides the risk of failure corresponding to a value \(x\) which only accounts for a fraction \(1/M\) of the time of exposure to a hazard. For example, when \(M= \mathcal T\), the probability \(p_M\) is about 63 %, a level of risk that is hardly acceptable in most applications. A criticism of such remarks could be that one can always choose a value of \(\mathcal T\) yielding a design value equal to \(x_d\). However, this only introduces an intermediate step that makes the derivation of \(x_d\) unnecessarily more complicated. Indeed, the variables of true interest are \(p_M\) and \(x_d\), whereas starting from the value of \(p\) corresponding to \(x_d\) is superfluous (as \(p=1-(1-p_M)^{1/M}\)), and even more superfluous is the reciprocal transformation of \(p\) introduced by Eq. 1.
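The relationships in Eqs. 3 to 5 can be illustrated with a minimal Python sketch. The Gumbel form of \(F\) and its parameters (`loc`, `scale`) are assumptions made only for illustration, as are the function names; they are not part of the original analysis.

```python
import math

def risk_of_failure(T, M):
    """Eq. 4: chance of at least one exceedance of the T-year level in M years (iid)."""
    return 1.0 - (1.0 - 1.0 / T) ** M

def per_trial_probability(p_M, M):
    """Invert Eq. 3 for p: p = 1 - (1 - p_M)^(1/M)."""
    return 1.0 - (1.0 - p_M) ** (1.0 / M)

def gumbel_quantile(prob, loc, scale):
    """Inverse of a Gumbel distribution function (assumed form of F)."""
    return loc - scale * math.log(-math.log(prob))

def design_value(p_M, M, loc, scale):
    """Eq. 5: x_d = F^{-1}((1 - p_M)^(1/M))."""
    return gumbel_quantile((1.0 - p_M) ** (1.0 / M), loc, scale)

# Designing for the T-year level with M = T leaves a risk of about 63 %.
risk_T_equals_M = risk_of_failure(T=100, M=100)

# Designing directly for a 10 % risk over a design life of M = 50 years.
p = per_trial_probability(p_M=0.10, M=50)
x_d = design_value(p_M=0.10, M=50, loc=100.0, scale=30.0)  # hypothetical parameters
print(risk_T_equals_M, p, 1.0 / p, x_d)
```

In this sketch the target risk \(p_M\) and the design life \(M\) are the only inputs needed to obtain \(x_d\); the reciprocal \(1/p\) is printed only to show that it is a by-product rather than a required step.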

The shortcomings of \(\mathcal T\) and the advantages of reasoning in terms of \(p\) and \(p_M\) definitely emerge when we move from univariate iid conditions to univariate non-independent and/or non-identically distributed (ni/nid) data, and to the iid multivariate framework.

3 Univariate nonstationary analyses: highlighting the limits of \(\mathcal T\)

When we move from stationary to nonstationary conditions (i.e. independent non-identically distributed i/nid data), the concept of return period becomes further ambiguous (Cooley 2013). However, it can still be defined in two ways for operational purposes. The first definition is the extension to nonstationary conditions of the concept of expected occurrence interval (expected waiting time until an exceedance occurs; Olsen et al 1998; Salas and Obeysekera 2014). In more detail, under nonstationarity, \(p_j= \mathbb {P}[X_j > x] =1- F_{j}(x)\) is no longer constant and equal to \(p\) but changes for each trial (time step) \(j\) along the time series. Therefore, the return period in Eq. 1 becomes (Cooley 2013; Salas and Obeysekera 2014)

$$\begin{aligned} {\mathcal T} = 1+\sum ^{ \infty }_{ k = 1 } \prod ^{k}_{j=1} (1-p_j) = 1+\sum ^{ \infty }_{ k = 1 } \prod ^{k}_{j=1} F_{j}(x). \end{aligned}$$

Parey et al. (2007, 2010) extended to i/nid conditions an alternative definition of return period such that \(x\) is the value for which the expected number of exceedances in \(\mathcal T\) years (trials) is equal to one. Therefore, \(x\) is the solution of the equation (Cooley 2013)

$$\begin{aligned} 1 = \sum ^{{\mathcal T}}_{ j = 1 } p_j = \sum ^{{\mathcal T}}_{ j = 1 } (1-F_{j}(x)). \end{aligned}$$

Unlike the iid case, both Eqs. 6 and 7 need to be solved numerically to obtain the required \(x\) value corresponding to the assigned \(\mathcal T\) (see e.g., Cooley 2013; Salas and Obeysekera 2014, for numerical details).
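A minimal sketch of how the infinite sum in Eq. 6 might be evaluated numerically is given below. The truncation rule, the function name, and the linear trend in \(p_j\) are illustrative assumptions and do not reproduce the procedures of the cited works; the truncation is only sensible when \(p_j\) does not decay towards zero.

```python
def expected_waiting_time(p_of_j, tol=1e-12, max_terms=1_000_000):
    """Eq. 6: T = 1 + sum_{k>=1} prod_{j<=k} (1 - p_j).

    The sum is truncated once the running product (the probability of no
    exceedance in the first k trials) drops below tol.
    """
    total = 1.0
    survival = 1.0
    for k in range(1, max_terms + 1):
        survival *= 1.0 - p_of_j(k)
        total += survival
        if survival < tol:
            break
    return total

# Stationary check: a constant p_j = p must give back T = 1/p.
T_const = expected_waiting_time(lambda j: 0.01)

# Hypothetical upward trend in the exceedance probability.
T_trend = expected_waiting_time(lambda j: min(1.0, 0.01 + 0.0005 * j))
print(T_const, T_trend)
```

With a constant \(p_j\) the sum collapses to the iid result of Eq. 1 with \(\mu = 1\), which gives a simple sanity check of the truncation.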

Whatever definition is used, the return periods and/or corresponding return levels \(x\) given by Eqs. 6 and 7 simply summarize (exactly or approximately) the average annual probability of exceedance similar to the iid case. Indeed, dividing both terms in Eq. 7 by \(\mathcal T\) and taking the reciprocal we obtain:

$$\begin{aligned} \frac{1}{{\mathcal{T}}} = \frac{1}{{\mathcal{T}}}\sum^{{\mathcal{T}}}_{ j = 1 } p_{j} ={\bar{p}} \Rightarrow {\mathcal{T}} = \frac{1}{\bar{p}} = \frac{1}{1 - {\overline{F_{j}(x)}}}. \end{aligned}$$

Therefore, similar to \(\mathcal T\) under iid conditions, Eq. 8 reveals that nonstationary \({\mathcal T}\) under i/nid hypotheses does not provide additional information compared with the average value of the probabilities of exceedance \(p_j\) over the period \(\mathcal T\). In other words, one can choose a prescribed average annual probability of exceedance \(\bar{p}\) to be met in the \(\mathcal T\) period and compute \(x\equiv x_{\bar{p}}\) solving \(\bar{p} = 1 - \overline{F_{j}(x)}\) without introducing the redundant concept of return period.
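Solving \(\bar{p} = 1 - \overline{F_{j}(x)}\) for \(x\) can be sketched with a simple bisection. The Gumbel margins with a linearly drifting location parameter (`loc0`, `trend`, `scale`), the bracketing interval, and the function names are hypothetical choices made only to obtain a runnable example.

```python
import math

def gumbel_cdf(x, loc, scale):
    """Gumbel distribution function (assumed form of F_j)."""
    return math.exp(-math.exp(-(x - loc) / scale))

def mean_exceedance(x, T, loc0, trend, scale):
    """Average of p_j = 1 - F_j(x) over j = 1..T, i.e. p-bar in Eq. 8."""
    probs = (1.0 - gumbel_cdf(x, loc0 + trend * j, scale) for j in range(1, T + 1))
    return sum(probs) / T

def return_level(T, loc0, trend, scale, iters=200):
    """Bisection on x so that the mean exceedance probability equals 1/T (Eq. 7)."""
    target = 1.0 / T
    locs = (loc0, loc0 + trend * T)
    lo = min(locs) - 10.0 * scale   # mean exceedance is close to 1 here
    hi = max(locs) + 50.0 * scale   # mean exceedance is close to 0 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_exceedance(mid, T, loc0, trend, scale) > target:
            lo = mid                # level too low: too many exceedances
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_stationary = return_level(T=100, loc0=100.0, trend=0.0, scale=30.0)
x_trend = return_level(T=100, loc0=100.0, trend=0.5, scale=30.0)
print(x_stationary, x_trend)
```

The only quantity driving the solution is the prescribed average probability \(1/\mathcal T\); with `trend=0.0` the result coincides with the stationary quantile \(F^{-1}(1-1/\mathcal T)\).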

At this stage it is worth recalling that the two definitions of \({\mathcal T}\) as average occurrence and recurrence intervals can yield different relationships between \({\mathcal T}\) and \(p\) when the data are not independent but identically distributed (i.e. under ni/id conditions; Loaiciga and Mariño 1991; Fernández and Salas 1999a, b; Douglas et al 2002). This happens for instance for trials following a simple first-order Markov chain (see e.g., Fernández and Salas 1999a). Moreover, also for ni/id data, the final expressions of \({\mathcal T}\) are functions of the unconditional and conditional probabilities of failure and safe events at each trial (time step).

The above discussion highlights that (1) the definition of \({\mathcal T}\) is not unique and depends on some hypotheses about data that seldom if ever hold true for real world records, making the concept ambiguous (e.g., Fernández and Salas 1999a; Cooley 2013); (2) \({\mathcal T}\) only summarizes the average probability of exceedance (or failure) at single time steps (trials), thus generally underestimating the actual risk of failure; and (3) in the most common applications (i.e. applying Eq. 1 under iid conditions), \({\mathcal T}\) does not add information compared with \(p\).

On the contrary, \(p_M\) has a well devised and unique general definition that suitably specializes for each situation (iid, i/nid, etc.). This is better emphasized by resorting to copula notation (Nelsen 2006). Since \(p_M\) is defined as the complement to unity of the joint probability of observing no failures in the design life period (e.g., Şen 1999; Şen et al. 2003), it can be written as:

$$\begin{aligned} p_M&= 1- \mathbb P [X_1 \le x_d \cap X_2 \le x_d \cap \ldots \cap X_M \le x_d]\nonumber \\&= 1- H_M(X_1 \le x_d,X_2 \le x_d,\ldots ,X_M \le x_d)\nonumber \\&= 1- C_M(F_{1}(x_d),F_{2}(x_d),\ldots ,F_{M}(x_d)), \end{aligned}$$

where \(H_M\) denotes the joint distribution of a set of random variables \(X_j\), for \(j= 1,\ldots ,M\), describing the process at each time step, \(F_{j}\) indicates the marginal univariate distribution of \(X_j\), and \(C_M\) is the copula describing the margin-free serial dependence structure. Under iid conditions, \(F_{j}=F\), \(\forall j \in \left\{ 1,\ldots ,M\right\} \), and \(C_M = \mathrm {\Pi }\), so that Eq. 9 reduces to Eq. 3. For i/nid data (i.e. independent but nonstationary conditions), \(C_M = \mathrm {\Pi }\), whereas \(F_{j}\) changes at each time step (trial). For ni/id data (i.e. serially correlated stationary random variables), \(C_M \ne \mathrm {\Pi }\) and \(F_{j}=F\), \(\forall j \in \left\{ 1,\ldots ,M\right\} \). Therefore, Eq. 9 can account for whatever condition, and copula notation allows us to explicitly distinguish the role of temporal dependence/independence (summarized by \(C_M\)) and stationarity/nonstationarity (i.e. the assumption of identical/non-identical marginal distributions \(F_{j}\)).
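For the two independent cases (\(C_M = \mathrm {\Pi }\)), Eq. 9 reduces to one minus the product of the marginal probabilities, which can be sketched as follows. The Gumbel margins, the drift in the location parameter, and all names are illustrative assumptions; the serially dependent case (\(C_M \ne \mathrm {\Pi }\)) would require an explicit \(M\)-dimensional copula and is not covered here.

```python
import math

def make_gumbel_margin(loc, scale):
    """Return a Gumbel distribution function F_j (assumed marginal form)."""
    def cdf(x):
        return math.exp(-math.exp(-(x - loc) / scale))
    return cdf

def risk_independent(x_d, margins):
    """Eq. 9 with C_M = Pi: p_M = 1 - prod_j F_j(x_d)."""
    return 1.0 - math.prod(F(x_d) for F in margins)

M = 50
x_d = 250.0  # hypothetical design value

# iid: identical margins at every time step (Eq. 3).
iid_margins = [make_gumbel_margin(100.0, 30.0) for _ in range(M)]

# i/nid: independent trials with a hypothetical upward drift in location.
inid_margins = [make_gumbel_margin(100.0 + 0.5 * j, 30.0) for j in range(1, M + 1)]

p_M_iid = risk_independent(x_d, iid_margins)
p_M_inid = risk_independent(x_d, inid_margins)
print(p_M_iid, p_M_inid)
```

The same function serves both cases; only the list of margins changes, which mirrors the separation between dependence structure and marginal behaviour described above.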

To summarize, \(p_M\) in Eq. 9 (1) describes the actual risk of failure in the design life period; (2) has a unique definition independent of the nature of data, comprising every combination of (in)dependence and (non)stationarity assumptions as special cases; (3) does not imply elaborate analytical derivations and/or reasoning, and extrapolations beyond the design life (unlike the summations in Eqs. 6 and 7); and (4) has an easy and straightforward interpretation. In this respect, the so-called “design life levels” proposed by Rootzén and Katz (2013) for univariate nonstationary data (i.e. i/nid conditions) provide \(x_d\) values yielded by one of the special cases of Eq. 9 mentioned above [see also Sivapalan and Samuel (2009), for the rationale of risk assessment under nonstationary conditions].

4 Multivariate analyses: the sleep of \(p\) reason produces \(\mathcal T\) monsters

4.1 Preliminary remarks

In this section, we extend the discussion to a multivariate framework involving multiple iid random variables. In the literature dealing with the application of multivariate frequency analysis to hydrological variables, some effort has been made to understand how Eq. 1 can be adapted to be applied in a multivariate context. Indeed, moving from the univariate to the multivariate framework, the following apparent problem seems to arise: in the univariate case, a critical value \(x\) defines a unique critical region, i.e. the set of values so that \(X>x\), and the denominator in Eq. 1 is uniquely defined as \(1 - F(x)\), whereas in a multivariate context it seems that we have multiple choices. Referring for instance to a bivariate case involving two random variables \(X\) and \(Y\), they can combine in different ways yielding for instance the events \((X>x \cap Y>y)\), \((X>x \cup Y>y)\), \((X>x | Y>y)\), among many others. Such combinations of events are described by different joint and conditional distributions summarizing the corresponding joint and conditional probabilities (e.g., \(\mathbb P[X>x \cap Y>y]\), \(\mathbb P[X>x \cup Y>y]\), etc.). Moreover, unlike the univariate case, several (actually infinitely many) pairs of values \((x,y)\) share the same joint probability \(t\) because an infinite set of pairs \((x,y)\) fulfills for instance the equation \(t = H(x,y)\), where \(H\) denotes the joint distribution of \(X\) and \(Y\).

In light of this variety of possible cases, several studies attempted to examine the relationships between the \(\mathcal T\) values yielded by Eq. 1 replacing different conditional and joint probabilities of exceedance into the denominator in order to define the most appropriate choice, also making comparisons in terms of \(\mathcal T\) values and corresponding return levels. However, as is shown in the following, these analyses are essentially not well founded and related to the misleading nature of \(\mathcal T\). The evolution of such a literature is an interesting example of how misconceptions tend to spread more easily than good procedures and recommendations. Therefore the chronological path offers an outline for the discussion and an interesting interpretative lens.

4.2 Setting the stage

The examination of multivariate return periods and return levels requires the preliminary introduction of some concepts. For the sake of simplicity and without loss of generality, let us focus on the bivariate case so that \(X\) and \(Y\) are two random variables representing two hydrological/geophysical variables (e.g. the rainfall intensity at two measurement stations). Let \(F_X\) and \(F_Y\) be the marginal distributions of \(X\) and \(Y\), \(C\) their copula, and \(H(x,y)=C(F_X(x),F_Y(y))= C(u,v)\), where \(H\) is the bivariate joint distribution function of \(X\) and \(Y\), and \(U=F_X(X)\) and \(V=F_Y(Y)\) are standard uniform random variables. The following discussion is based on \(C(u,v)\) as this allows us to work in the unit square \([0,1]^2\) (where \(C\) is defined), making the results independent of the marginal distributions, and the graphical visualization more effective and easier. Moreover, we deal with iid data, i.e. temporally independent and identically distributed two-dimensional observations \((x, y)\). We also introduce the expressions of some joint and conditional probabilities corresponding to some bivariate return periods commonly studied in the literature. Using copula notation, we define

$$\begin{aligned} p_{\text{AND}}&:= \mathbb P [U > u \cap V > v] = 1-u-v+C(u,v),\end{aligned}$$
$$\begin{aligned} p_{\text{OR}}&:= \mathbb P [U > u \cup V > v] = 1-C(u,v),\end{aligned}$$
$$\begin{aligned} p_{\text{COND1}}&:= \mathbb P [U > u | V > v] = (1-u)(1-u-v+C(u,v)), \end{aligned}$$
$$\begin{aligned} p_{\text{COND2}}&:= \mathbb P [U > u | V \le v] = 1-\dfrac{C(u,v)}{u}, \end{aligned}$$
$$\begin{aligned} p_{\text{COND3}}&:= \mathbb P [U > u | V = v] = 1-\dfrac{\partial C(u,v)}{\partial u}, \end{aligned}$$
$$\begin{aligned} p_{\text{K}}&:= \mathbb P [\mathbb P [U \le u \cap V \le v] > t] = \mathbb P [C(U,V) > t] = 1-K_C(t), \end{aligned}$$
$$\begin{aligned} p_{\text{S}}&:= \mathbb P [g(U,V) > z] = 1-F_Z(z). \end{aligned}$$

Equations 10–14 describe the probabilities of exceedance used by Yue and Rasmussen (2002), Shiau (2003) and Salvadori (2004) in the first works providing a systematic discussion of multivariate return periods. Equation 15 describes the probability of exceedance corresponding to the so-called “secondary” or “Kendall” return period introduced by Salvadori (2004) and further investigated by Durante and Salvadori (2010), Salvadori and De Michele (2010), and Salvadori et al (2011). The function \(K_C\) in Eq. 15 is the so-called Kendall function and represents the distribution function of the copula [see e.g., Salvadori and De Michele (2010), for further details]. Finally, Eq. 16 refers to the so-called “structure-based” return period introduced by Volpi and Fiori (2014), where \(g\) is a functional relationship linking the forcing (environmental) variables with the design (structural) variable \(Z=g(U,V)\). The meaning of all these probabilities will be examined in more detail in the following sections.
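As an illustration of the joint probabilities \(p_{\text{AND}}\), \(p_{\text{OR}}\) and \(p_{\text{K}}\), the following minimal sketch evaluates them for a Gumbel copula. The choice of copula family, the closed form \(K_C(t) = t - t \ln t/\theta \) (valid for the Gumbel family), the relationship \(\theta = 1/(1-\tau )\) between the copula parameter and Kendall's \(\tau \), and the numerical values of \(u\), \(v\) and \(\tau \) are assumptions of this example.

```python
import math

def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v) for theta >= 1 and u, v in (0, 1)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def kendall_function(t, theta):
    """Kendall function K_C(t) = t - t*ln(t)/theta of the Gumbel copula."""
    return t - t * math.log(t) / theta

tau = 0.7                    # illustrative Kendall correlation
theta = 1.0 / (1.0 - tau)    # Gumbel parameter implied by tau
u, v = 0.6, 0.8              # illustrative thresholds in the unit square

C_uv = gumbel_copula(u, v, theta)
p_and = 1.0 - u - v + C_uv                  # P[U > u and V > v]
p_or = 1.0 - C_uv                           # P[U > u or V > v]
p_k = 1.0 - kendall_function(C_uv, theta)   # P[C(U, V) > t] with t = C(u, v)
print(p_and, p_or, p_k)
```

The three numbers differ even though they are computed from the same pair \((u,v)\), because each one measures the probability of a different set of events.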

4.3 Some classical definitions of multivariate return periods: is something better than something else?

To our knowledge, Yue and Rasmussen (2002) provided the first systematic discussion about multivariate return periods. The aim of that work was praiseworthy recognizing that the conditional distributions, conditional return periods, and joint return periods, were misused in spite of their importance for understanding and interpreting a multivariate event. Indeed, “incorrect interpretations of these concepts will lead to misinterpretation of frequency analysis results. Thus, for both practitioners and researchers to apply these concepts appropriately in the future, the authors feel that it is necessary to assemble these concepts together and to give a clear illustration of them”. Thus, Yue and Rasmussen (2002) collected and discussed some concepts related to conditional and joint distributions and return periods, and derived some relationships between univariate and bivariate return periods. Unfortunately, the road to hell is paved with good intentions, and the same work also introduced some ambiguous final recommendations whose negative consequences still persist. Based on a bivariate model describing the relationship between flood peak and volume, Yue and Rasmussen (2002) concluded that “under a given return period, the flood peak/volume value given by the single frequency analysis is greater than those by the joint distribution. This implies that if one neglects the close correlation between flood peak and volume, and carries out single-variable frequency analysis on flood peak or volume only, the severity of a flood event may be overestimated. If a hydrologic engineering design is based on the results from the single-variable frequency analysis, then this over-evaluation will lead to an increased cost. Hence, single-variable frequency analysis cannot provide a sufficient probabilistic assessment of a correlated multivariate event”.

Leaving out the actual nature of the correlation between flood peak and volume and the correctness of using joint distributions to describe such a relationship [see Serinaldi and Kilsby (2013), for a discussion], this sentence can be misleading, suggesting some comparisons that are actually illogical from both theoretical and practical points of view. To better understand this problem, it is worth starting from some very basic concepts. In applied sciences, probabilistic models are built and set up to describe specific situations concerning the behavior of a system. For example, hydraulic structures are designed to fulfill specific requirements, and are characterized by some key features (e.g., the length of a spillway) and operational rules. In these cases, if some variables of interest are known with uncertainty, a probabilistic model can be used to describe them and their interaction, according to physical constraints and device operating principles. In this respect, borrowing the example of flood events, if a device is designed to protect against flood peaks and is insensitive to flood volume, or the flood volume and/or duration are not of interest because the device does not manage these quantities in any way, then the variable of interest is only one and multivariate probabilistic models are not required. Thus, stating that the univariate frequency analysis of flood peak or volume yields an overestimation of the severity of a flood event is essentially meaningless without specifying (1) which variables are critical and are required to characterize a flood event, and (2) how these variables interact in light of the design/protection purposes.

This misconception, which is the basis of several ill-posed comparisons described in the literature, is partly related to the use of \(\mathcal T\) instead of the underlying probabilities. Indeed, based on Eq. 1, fixing \(\mathcal T\) does not mean to select a value expressed in a friendly measurement unit and summarized by the sentence “...under a given return period, e.g. 100 years...”; it means to select the probability of exceedance (or failure) corresponding to a specific and unique type of event. To clarify this point, let us consider the probabilities \(p_X= \mathbb P [X>x]\), \(p_Y= \mathbb P [Y>y]\), \(p_{\text{OR}}\), \(p_{\text{AND}}\) and \(p_{\text{COND1}}\), and the corresponding return periods given by Eq. 1 (i.e. the reciprocal of such probabilities up to the multiplying factor \(\mu \)). Referring to Yue and Rasmussen (2002) and Vandenberghe et al. (2011) for analytical derivations, it can be shown that

$$\begin{aligned} p_{\text{OR}} \ge \max \{p_X,p_Y\} \ge \min \{p_X,p_Y\} \ge \, p_{\text{AND}} \ge \, p_{\text{COND1}}, \end{aligned}$$

and therefore

$$\begin{aligned} \mathcal T\!_{\text{OR}} \le \min \{\mathcal T_X,\mathcal T_Y\} \le \max \{\mathcal T_X,\mathcal T_Y\} \le \mathcal T\!_{\text{AND}} \le \mathcal T\!_{\text{COND1}}. \end{aligned}$$

The conclusions of Yue and Rasmussen (2002) reflect these theoretical inequalities, but overlook that the probabilities in Eq. 17 describe events that cannot be ordered and compared, making statements about risk overestimation or underestimation essentially meaningless. A visualization of such concepts can help understand this issue. Figure 1 highlights the domains where the probabilities \(p_{\text{AND}}\), \(p_{\text{K}}\), \(p_{\text{OR}}\), \(p_{\text{COND1}}\), \(p_{\text{COND2}}\), and \(p_{\text{COND3}}\) are defined (black bold lines) along with the subsets where the critical events corresponding to such probabilities fall (grey areas). For example, \(p_{\text{AND}}\) is a bivariate joint probability defined on the entire unit square (as \(U\) and \(V\) can assume every value in \([0,1]^2\)) and measures the chance to observe an event in the upper right quadrant defined by \((U>u \cap V>v)\), whereas \(p_{\text{COND1}}\) is defined over the subset \((u,1]\times [0,1]\) and measures the probability to observe an event in the upper part of such a subset. Therefore, even though the analytical relationships in Eqs. 17 and 18 hold true because of pure mathematical constraints, it is evident that the different probabilities refer to different sets of events defined over different domains. In this respect, the inequality \(p_{\text{OR}} \ge p_{\text{AND}}\) (or \({\mathcal T}_{\text{OR}} \le {\mathcal T}_{\text{AND}}\)) naturally derives from the fact that both probabilities are defined on \([0,1]^2\) but describe the chance to observe events over different subsets, the OR subset being always larger than the AND subset for each fixed pair \((u,v)\).
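The purely mathematical origin of the OR and AND parts of Eq. 17 can be made explicit with a short derivation. It relies on the upper Fréchet–Hoeffding bound \(C(u,v) \le \min \{u,v\}\), a standard property of copulas that is assumed here, together with \(p_X = 1-u\) and \(p_Y = 1-v\):

$$\begin{aligned} p_{\text{OR}}&= 1-C(u,v) \ge 1-\min \{u,v\} = \max \{1-u,1-v\} = \max \{p_X,p_Y\},\nonumber \\ p_{\text{AND}}&= 1-u-v+C(u,v) \le 1-u-v+\min \{u,v\} = 1-\max \{u,v\} = \min \{p_X,p_Y\}. \end{aligned}$$

Both bounds hold for every copula and every pair \((u,v)\), so they carry no information about which critical region is relevant for a given design problem.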

Fig. 1 Synopsis of the domains and critical regions corresponding to different types of probabilities (\(p_{\text{AND}}\), \(p_{\text{K}}\), etc.) described in the text. Bold black lines define the domains where the probability is computed, whereas grey areas denote the critical regions fulfilling the condition related to each type of probability. For example, \(p_{\text{AND}}\) is defined over the unit square \([0,1]^2\) and describes the chance to observe events in the top right corner fulfilling the condition \((U>u \cap V>v)\)

Salvadori and De Michele (2004) clearly described these aspects and highlighted that the univariate analyses are fine if only one variable is significant in the design process, whereas multivariate approaches are obviously required when several variables are involved. However, this did not prevent subsequent comparisons reported in several works. Referring to a case study discussed by De Michele et al. (2005), Salvadori and De Michele (2004) showed that \(\mathcal T\!_{\text{OR}}\) is about 20 % smaller than \(\mathcal T_X(= \mathcal T_Y)=\pi \), which in turn is about 30 % smaller than \(\mathcal T\!_{\text{AND}}\), thus concluding that values different from \(x_{\pi } (=y_{\pi })\) (corresponding to univariate return periods) must be used to obtain joint events with a return period \(\mathcal T\!_{\text{OR}}\) or \(\mathcal T\!_{\text{AND}}\) equal to \(\pi \). Even though this line of reasoning seems to be correct, the following question arises. If the critical configuration is described by e.g. the OR sets of bivariate events, why should one use the sets of univariate events as a reference? In other words, given that \(\mathcal T_X(=\mathcal T_Y)\), \(\mathcal T\!_{\text{OR}}\) and \(\mathcal T\!_{\text{AND}}\) are different (excluding some limiting cases) as they refer to different mechanisms of failure and different sets of events, why should the values of \(x_{\pi } (=y_{\pi })\) corresponding to a univariate return period \(\pi \) match the values corresponding to \(\mathcal T\!_{\text{OR}}\) or \(\mathcal T\!_{\text{AND}}\)? Based on the inequalities in Eq. 18 and Fig. 1, it is evident that the pairs \((x^{\text{AND}}_{\pi },y^{\text{AND}}_{\pi })\) yielding \(\mathcal T\!_{\text{AND}}=\pi \) are generally different from those giving univariate \(\mathcal T_X(=\mathcal T_Y)=\pi \).
Of course, the comparison makes sense if one wants to quantify the error of using a probabilistic model that does not provide a suitable description of the actual mechanisms of failure. However, without specifying such mechanisms, there is no way to make comparisons and draw conclusions about possible underestimation or overestimation. Moreover, advocating the multivariate nature of some geophysical phenomena (such as floods, droughts or storms) is also insufficient to assert that a multivariate approach is better than a univariate one. Indeed, every phenomenon can be described in principle by multiple variables; however, as mentioned above, sometimes only one variable is of interest for design purposes (Salvadori and De Michele 2004).

We provide a visual description to further highlight that every probability in Eqs. 10–15 (and the corresponding return periods) is perfectly coherent with the scenarios of events that it describes. Figure 2 shows 1,000 pairs \((u,v)\) simulated from a Gumbel copula with parameter corresponding to a value of Kendall correlation equal to 0.7 and \(u=0.6\) and \(v=0.8\). Each panel displays both the set of pairs falling within the domains over which a specific probability is defined, and the subsets of pairs falling within the critical regions according to the different definitions. Theoretical and empirical values of the probabilities of observing critical events are also reported. The agreement between theoretical and empirical probabilities, and the visualization of the sets and subsets of interest in each case should definitely clarify that (1) there is no definition better than others, (2) each definition is coherent with the scenario that it describes, and (3) making comparisons between probabilities defined over different sets and subsets of data is allowable only to show the error related to an incorrect choice of the probabilistic model. It should also be noted that reasoning in terms of probabilities allows us to bear in mind the underlying scenarios, whereas reasoning in terms of \(\mathcal T\)-year return period easily leads to miss the meaning of the underlying probability and to make comparisons of values that seem to be similar in terms of measurement unit, but actually describe incomparable mechanisms of failure.

Fig. 2 Similar to Fig. 1, but showing 1,000 pairs \((u,v)\) simulated from a Gumbel copula. Each panel highlights the sets of pairs falling within the domains where each probability (\(p_{\text{AND}}\), \(p_{\text{K}}\), etc.) is defined (grey circles and filled circles), and the subsets of pairs fulfilling the condition related to each type of probability (filled circles)

4.4 Kendall and structure-based return periods: something better than classical definitions or other facets of the same die?

Referring to OR and AND cases and flood peak and volume, Shiau (2003) anticipated a summary of the above discussion stating that “The use of \(\mathcal T\!_{\text{OR}}\) or \(\mathcal T\!_{\text{AND}}\) as the design criterion depends on what situations will destroy the structure. Under the condition that either flood peak or flood volume exceeding a certain magnitude will cause damage, then \(\mathcal T\!_{\text{OR}}\) can be used to evaluate the average recurrence interval. On the other hand, when the flood volume and flood peak must exceed a certain magnitude that will cause damage, then \(\mathcal T\!_{\text{AND}}\) is used”. As this recommendation holds true not only for OR and AND cases but also for any other case, it raises some considerations about \(p_{\text{K}}\) and \(p_{\text{S}}\) and the corresponding return periods \(\mathcal T\!_{\text{K}}\) and \(\mathcal T\!_{\text{S}}\).

\(\mathcal T\!_{\text{K}}\) was proposed by Salvadori (2004) and thereafter extensively applied (e.g., Salvadori and De Michele 2010, 2013; Durante and Salvadori 2010; Salvadori et al. 2011; Vandenberghe et al. 2011, among others). The idea behind \(\mathcal T\!_{\text{K}}\) is to overcome an apparent shortcoming of \(\mathcal T\!_{\text{OR}}\) (and \(\mathcal T\!_{\text{AND}}\)) based on the following arguments. Different pairs of \((U,V)\), e.g. \((u,v)\), \((u',v')\) and \((u'',v'')\), lying on the same level curve of a bivariate joint distribution share the same joint probability, i.e. \(\mathbb P[X \le x \cap Y \le y] = \mathbb P[X \le x' \cap Y \le y'] = \mathbb P[X \le x'' \cap Y \le y'']\), but define different and partially overlapping \(p_{\text{AND}}\) critical regions (see e.g. panel \(p_{\text{K}}\) in Fig. 1). Thus, there are infinitely many OR (AND) critical regions characterized by the same joint probability, making a choice among them impossible (e.g., Salvadori and De Michele 2010). Since this lack of correspondence between each \(\mathcal T\!_{\text{OR}}\) (\(\mathcal T\!_{\text{AND}}\)) value and a unique critical region is incorrect from a measure-theoretic point of view, Salvadori (2004) introduced \(\mathcal T\!_{\text{K}}\), which relies on the Kendall distribution (or measure) \(K_C\) and measures the chance of observing an event in one of the two unique subregions defined by a level curve characterized by a unique value of joint probability. This solves the lack of dichotomy mentioned above.

However, is \(\mathcal T\!_{\text{K}}\) really a better tool for dealing with multivariate return periods? In other words, is \(\mathcal T\!_{\text{K}}\) better than \(\mathcal T\!_{\text{AND}}\) (or \(\mathcal T\!_{\text{OR}}\))? Also in this case, removing the concealing effect of Eq. 1 and reasoning in terms of probabilities, a positive answer to the above question implies for example that \(\mathbb P [ \mathbb P [U \le u \cap V \le v] \le t ]=K_C\) is better than \(\mathbb P [U \le u \cap V \le v]=C\). Of course, both probabilities legitimately exist along with every other joint and conditional probability describing the infinite possible combinations of bivariate events. They are simply different because they describe different situations; they cannot be interchanged, and their use depends only on which one better describes the design requirements and mechanisms of failure. In terms of critical regions, \({\text{AND}}\) and \({\text{OR}}\) (which rely on \(C\)) describe the probabilities associated with critical regions defined by a unique pair of values \((u,v)\), whereas \(\mathcal T\!_{\text{K}}\) (which relies on \(K_C\)) measures the probability associated with critical regions defined by an infinite set of points lying on a \(t\)-level curve. In the first case, the design criterion intrinsically focuses on \((u,v)\), whereas in the second case, the focus is on \(t\). In other words, in the first case the implicit requirement is that the final unique design pair \((u,v)\) guarantees a prescribed joint probability of exceedance, provided that a failure occurs when both specific values \(u\) and \(v\) are exceeded. In the second case, we implicitly deal with a system that is sensitive to, and can fail for, a set of bivariate events characterized by the same joint probability of exceedance. Thus, \(\mathcal T\!_{\text{K}}\), \(\mathcal T\!_{\text{AND}}\) and \(\mathcal T\!_{\text{OR}}\) simply describe different mechanisms of failure associated with different systems and must be used accordingly.
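The difference between focusing on \((u,v)\) and focusing on \(t\) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: a Gumbel copula with \(\theta = 1/(1-\tau)\) and \(\tau = 0.7\), an arbitrary level \(t=0.6\), and the standard expressions \(p_{\text{OR}} = 1-C\), \(p_{\text{AND}} = 1-u-v+C\) and \(p_{\text{K}} = 1-K_C(t)\) with \(K_C(t)=t-t\ln t/\theta\). The helper names are hypothetical.

```python
import math


def gumbel_copula_cdf(u, v, theta):
    """Gumbel copula C(u, v) for theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def v_on_level_curve(u, t, theta):
    """Solve C(u, v) = t for v; requires t < u <= 1."""
    rest = (-math.log(t)) ** theta - (-math.log(u)) ** theta
    return math.exp(-(rest ** (1.0 / theta)))


def level_curve_table(t, theta, us):
    """Probabilities attached to several points (u, v) of the same t-level curve."""
    p_k = 1.0 - (t - t * math.log(t) / theta)  # depends on t only
    rows = []
    for u in us:
        v = v_on_level_curve(u, t, theta)
        c = gumbel_copula_cdf(u, v, theta)  # equals t up to rounding
        rows.append({"u": u, "v": v, "C": c,
                     "p_OR": 1.0 - c, "p_AND": 1.0 - u - v + c, "p_K": p_k})
    return rows


theta = 1.0 / (1.0 - 0.7)  # assumed Kendall's tau of 0.7
for row in level_curve_table(0.6, theta, (0.65, 0.75, 0.90)):
    print({key: round(value, 4) for key, value in row.items()})
```

All points of the level curve share the same \(C\) (hence the same \(p_{\text{OR}}\)) and the same \(p_{\text{K}}\), while \(p_{\text{AND}}\) and the critical regions themselves change from point to point. The quantities answer different questions, which is why none of them can replace the others.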

In this context, the structure-based return period introduced by Volpi and Fiori (2014) allows us to further expand the above discussion. The authors highlighted that “being strictly dependent on the particular structure under examination, the return period of structure failure usually does not match that of the hydrological loads. This entails that the multivariate approach may not fully rely on the assumption of hydrological design events, i.e., a multivariate event or an ensemble of events which all share the same (multivariate) return period”. These remarks led Volpi and Fiori (2014) to introduce a so-called structure-based return period \(\mathcal T\!_{\text{S}}\). Also in this case, reasoning in terms of probabilities provides a clearer picture than working with return periods. The idea was to move from the (multivariate) distribution of the hydraulic loads \(X\) and \(Y\) (e.g. peak and volume of the input hydrograph in a reservoir) to that of the actual design variable \(Z\) (e.g. the spillway design discharge) by propagating the probability density function of the hydrological loads through the function \(Z = g(X,Y)\), which describes the physical dynamics of the system (e.g. the reservoir routing through the spillway). This approach is known as transformation of two random variables (e.g., Papoulis and Pillai 2002, p. 139). Its univariate version (\(Z=g(X)\)) has been used in several applications (e.g. Kunstmann and Kastens 2006; Ashkar and Aucoin 2011; Serinaldi 2013), and in the present case it yields Eq. 16.

The comparison of Eqs. 15 and 16 highlights that \(p_{\text{K}}\) and \(p_{\text{S}}\) have the same form, meaning that \(C\) is just a particular case of \(g\). Both \(C\) and \(g\) are used to define sets of events that fulfill some specific requirements (a prescribed value of the joint probability or a physical law) and identify two sub- and super-critical regions uniquely defined by a critical region (i.e. a curve on \([0,1]^2\)). In other words, if the generic function \(g\) describes a physical transformation of \((X,Y)\), the resulting design variable \(Z\) has a physical meaning, whereas if \(g\) specializes as \(C\), the resulting design variable \(Z\) is implicitly the value of the joint probability. Is \(p_{\text{S}}\) (\(\mathcal T\!_{\text{S}}\)) better than \(p_{\text{K}}\) (\(\mathcal T\!_{\text{K}}\)) or vice versa? Actually, neither is. They simply focus on two different design variables, among the infinite options that can be selected using different forms of \(g\). The choice depends on the final aim, as for \(p_{\text{OR}}\), \(p_{\text{AND}}\), etc. Therefore, the comparison between \(\mathcal T\!_{\text{OR}}\), \(\mathcal T\!_{\text{K}}\) and \(\mathcal T\!_{\text{S}}\) critical regions is, unfortunately, once again not very informative. Indeed, \(p_{\text{OR}}\), \(p_{\text{K}}\) and \(p_{\text{S}}\) correctly describe their own underlying probabilistic structures, which are different and cannot be compared. Moreover, in that specific example, only \(p_{\text{S}}\) is correct as it is the only probability describing the physical mechanism under study, and stating that \(p_{\text{OR}}\) and \(p_{\text{K}}\) underestimate or overestimate the probability of failure is not meaningful as it is known a priori that they do not describe the critical scenarios corresponding to the mechanism of failure at hand. These comparisons may only be useful to show the error associated with the use of probabilities (return periods) that are known a priori to be inappropriate for the physical process of interest. Finally, the reduction of dimensionality given by the use of \(\mathcal T\!_{\text{K}}\) and \(\mathcal T\!_{\text{S}}\), that is, the use of the univariate distributions \(K_C\) and \(F_Z\) instead of the bivariate distribution \(H\), can be ineffective if the design (structural) variable is not unique (e.g., \(\left\{(Z,W):\,Z=g(X,Y)\,{\text{and}}\, W=h(X,Y) \right\} \)).
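The statement that \(C\) is a particular case of \(g\) can be made concrete with a small Monte Carlo sketch. Everything in it is an illustrative assumption rather than the setting of Volpi and Fiori (2014): the loads are taken as independent uniforms (so that \(C(u,v)=uv\) and \(K_C(t)=t-t\ln t\)), and the "structural" function \(g(u,v)=(u+v)/2\) is a hypothetical stand-in for a physical transformation such as reservoir routing. The same estimator of \(\mathbb P[g(U,V)>z]\) serves both cases.

```python
import math
import random


def exceedance_probability(g, pairs, z):
    """Monte Carlo estimate of P[g(U, V) > z] from simulated pairs."""
    return sum(1 for u, v in pairs if g(u, v) > z) / len(pairs)


def g_copula(u, v):
    """g specialized as the copula: here the independence copula C(u, v) = u*v."""
    return u * v


def g_structure(u, v):
    """Hypothetical structural response, used only for illustration."""
    return 0.5 * (u + v)


rng = random.Random(0)
pairs = [(rng.random(), rng.random()) for _ in range(100_000)]

t = 0.5   # level of joint probability (Kendall-type design variable)
z = 0.75  # critical value of the hypothetical structural variable

p_k_mc = exceedance_probability(g_copula, pairs, t)
p_k_exact = 1.0 - (t - t * math.log(t))  # 1 - K_C(t) under independence
p_s_mc = exceedance_probability(g_structure, pairs, z)
p_s_exact = 2.0 * (1.0 - z) ** 2          # P[(U+V)/2 > z] for z >= 0.5

print(f"p_K: Monte Carlo={p_k_mc:.4f}, exact={p_k_exact:.4f}")
print(f"p_S: Monte Carlo={p_s_mc:.4f}, exact={p_s_exact:.4f}")
```

Both probabilities are obtained by exactly the same operation, and they differ only because the design variable differs. Asking which of the two is "better" is therefore equivalent to asking which function \(g\) describes the system at hand.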

4.5 Multivariate risk of failure

Similar to the univariate case, the definition of risk of failure \(p_M\) easily adapts to a multivariate iid framework. Denoting by \(\mathcal E_j\), for \(j= 1,\ldots ,M\), a generic safe event, \(p_M\) can be written as

$$\begin{aligned} p_M&= 1- \mathbb P [\mathcal E_1 \cap \mathcal E_2 \cap \ldots \cap \mathcal E_M]\nonumber \\&= 1- H_M(\mathcal E_1, \mathcal E_2,\ldots , \mathcal E_M). \end{aligned}$$

As for the return period, the choice of \(\mathcal E_j\) is not arbitrary but must describe the mechanism of failure. Thus, it can be \(\left\{ \mathcal E_j: X_j \le x_d \cup Y_j \le y_d \right\} \), \(\left\{ \mathcal E_j: X_j \le x_d \cap Y_j \le y_d \right\} \), \(\left\{ \mathcal E_j: g(X_j,Y_j) \le z_d \right\} \) or whatever else is related to the design process. Also in this case, \(p_M\) provides a more transparent and suitable measure of risk than \(\mathcal T\), as it measures the overall risk of failure along the entire design life period and requires thinking explicitly and carefully about what type of event \(\mathcal E_j\) (and corresponding probability) must be used to obtain meaningful results fulfilling the true design requirements.
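Under the iid assumption the joint probability of the \(M\) safe events factorizes, so that \(p_M = 1-(1-p)^M\), where \(p\) is the per-period probability of the critical event selected for the design. The following minimal sketch evaluates this for OR-type and AND-type critical events. The Gumbel copula with \(\tau=0.7\), the thresholds \(u=v=0.99\), the design life \(M=50\) and one event per period are illustrative assumptions, and the function names are hypothetical.

```python
import math


def gumbel_copula_cdf(u, v, theta):
    """Gumbel copula C(u, v) for theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def risk_of_failure(p_fail, design_life):
    """p_M = 1 - (1 - p)^M for iid periods with failure probability p each."""
    return 1.0 - (1.0 - p_fail) ** design_life


def risk_of_failure_varying(p_fail_by_period):
    """Same quantity when p_j changes across independent periods."""
    p_safe = 1.0
    for p in p_fail_by_period:
        p_safe *= 1.0 - p
    return 1.0 - p_safe


theta = 1.0 / (1.0 - 0.7)  # assumed Kendall's tau of 0.7
u = v = 0.99               # assumed design thresholds (probability scale)
M = 50                     # assumed design life in periods

c = gumbel_copula_cdf(u, v, theta)
p_or = 1.0 - c             # failure if either variable exceeds its threshold
p_and = 1.0 - u - v + c    # failure only if both thresholds are exceeded

print(f"OR : p = {p_or:.4f}, p_M = {risk_of_failure(p_or, M):.4f}")
print(f"AND: p = {p_and:.4f}, p_M = {risk_of_failure(p_and, M):.4f}")
```

The definition of \(p_M\) is the same in both rows. What changes is only the event \(\mathcal E_j\) plugged into it, which must be selected from the mechanism of failure before any number is computed.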

5 Conclusions

In this study, we attempted to show that the concept of return period is prone to additional misinterpretations when we move from the classical univariate frequency analysis of iid data to nonstationary and multivariate settings. Even though we used examples referring to hydrological processes and corresponding engineering problems, it should be noted that the discussion and methodological framework are fully general and concern the risk assessment of any process (environmental, geophysical, anthropogenic, etc.). Therefore, referring to a generic system (Dooge 1968) which can fail under critical conditions according to a given mechanism of failure, our conclusions can be summarized as follows:

  1. Independent of the particular framework (univariate/multivariate and stationary/nonstationary), the concept of return period \(\mathcal T\) does not add information compared with the underlying probabilities of exceedance \(p_j\), which measure the risk of failure at each time or time interval \(j\) in which there is exposure to a specific hazard. Using financial terminology, \(\mathcal T\) can be seen as a derivative of the underlying \(p_j\), and as we learned from the financial crisis of 2007–2008, derivatives can be toxic. Indeed, in spite of the simple relationships linking \(\mathcal T\) and \(p_j\), the return period tends to conceal the actual meaning of \(p_j\) and the underlying mechanisms of failure behind an apparently friendly and understandable measurement unit.

  2. Focusing on the univariate nonstationary case, we have shown that the effort to define \(\mathcal T\) resulted in two measures that simply summarize the average value of \(p_j\) over the \(\mathcal T\) period, thus better highlighting an aspect that is well known in the classical analysis of univariate iid data, but concealed by the compact form of Eq. 1.

  3. While the concealing nature of \(\mathcal T\) can have a limited impact in a univariate (stationary or nonstationary) context, it easily leads to incoherent calculations and misleading conclusions in the multivariate iid case. Since multiple variables can combine in almost infinite ways, the multiple definitions of \(\mathcal T\) (\(\mathcal T\!_{\text{OR}}\), \(\mathcal T\!_{\text{AND}}\), etc.) introduced the belief that the choice is somewhat arbitrary and subjective, and can be a matter of debate. However, looking at the underlying probabilities, it is clear that such a belief is not well founded, and no meaningful debate exists because each type of probability (\(p_{\text{OR}}\), \(p_{\text{AND}}\), etc.) describes in a unique way a specific mechanism of failure. Therefore, the choice between the multiple definitions depends on how the system (e.g., a hydraulic device or whatever else) responds to a specific forcing. This mechanism has a unique probabilistic description that results in a specific type of \(p\) (univariate, multivariate, conditional, etc.), which in turn corresponds to a unique type of \(\mathcal T\) according to the mere reciprocal transformation \(\mathcal T = \mu /p\).

  4. Given that multivariate return periods are not interchangeable because the underlying probabilities are not interchangeable, comparisons between different definitions (so widespread in the literature) also lose their meaning. Indeed, comparing different multivariate return periods means comparing the probabilities describing different sets of events corresponding to different mechanisms of failure, only one of which describes the response of the system at hand. Therefore, conclusions about supposed overestimation or underestimation are illogical and misleading because every univariate, multivariate and conditional \(\mathcal T\) and \(p\) correctly describes its own set of events (as shown in Fig. 2 and discussed in Sect. 4). Such comparisons may make sense only to assess the error of using a type of \(\mathcal T\) different from the correct one. However, also in this case the usefulness is limited as the different return periods usually correspond to very different combinations of critical events. In this context, the chain of inequalities linking some types of \(\mathcal T\) (see Eq. 18) results from pure mathematical constraints and provides numerical boundaries for the values of different return periods for fixed values of \(U\) and \(V\); however, the existence of these relationships should not be confused with the possibility of comparing probabilities that describe heterogeneous types of events defined over different domains (as is shown in Fig. 2).

  5. Unlike \(\mathcal T\), the risk of failure in the design life period \(p_M\) (1) has a unique and general definition that can fit every situation (univariate/multivariate and stationary/nonstationary); (2) has an easy and coherent interpretation; and (3) provides a well-devised measure of the actual risk of observing at least one critical event in the design life period, moving from the average “annual” risk summarized by \(p\) and \(\mathcal T\) to the actual joint probability of failure in the entire design life.