Introduction

A multitude of processes in water engineering involve more than one random variable. For example, floods are characterized by peak, duration, volume, and inter-arrival time, all of which are random. Droughts are described by their severity, duration, inter-arrival time, and areal extent, which are also random. Extreme precipitation events are represented by their intensity, amount, duration, and inter-arrival time. Inter-basin water transfer moves excess water from a donor basin to a water-deficient recipient basin; the volume of water transferred, the availability of water in both basins, the duration and rate of transfer, and the time interval between transfers are all random variables. Water quality entails the pollutant load, the duration for which the load exceeds protection limits, and the peak pollutant concentration. Likewise, erosion in a basin may be characterized by sediment yield and by the number, duration, and intensity of erosion events and the time interval between two consecutive events. Flooding in a coastal watershed may be caused by the simultaneous occurrence of high precipitation and high tides, where both precipitation and tide are random variables. Examples of processes involving more than one random variable abound in hydrologic, hydraulic, environmental, and water resources engineering. There usually exists some degree of dependence among the random variables, or at least among some of them. Often we are therefore concerned with multivariate stochastic modeling and risk analysis of such systems and processes, which involve deriving probability distributions of the random variables while accounting for the dependence structure among them.
Nowadays, these stochastic processes can be modeled with the copula–entropy theory, which has proven more flexible and accurate than traditional approaches. The objective of this paper, therefore, is to reflect on some recent advances in the application of the copula–entropy theory and on future challenges.

Methods

Copula–entropy theory

The copula–entropy theory (CET) is an amalgam of the copula theory and the entropy theory. These two theories are now discussed.

Entropy theory

The entropy theory comprises (1) formulation of entropy, (2) the principle of maximum entropy (POME), and (3) the theorem of concentration (TOC). Entropy can be defined in the real domain or the frequency domain. In the real domain, the best-known form of entropy is the Shannon entropy (Shannon 1948), although the Tsallis entropy (Tsallis 1988) and the Rényi entropy (Rényi 1961) have received much attention in recent years. Another popular formulation is the cross-entropy, or relative entropy, due to Kullback and Leibler (1951), which is a generalization of the Shannon entropy. For a continuous random variable X with a probability density function (PDF), f(x), and cumulative distribution function (CDF), F(x), the Shannon entropy, H(X) or H[f(x)], can be defined as

$$ H(X) = H[f(x)] = - \;\int\limits_{0}^{\infty } {f(x)\ln f(x)\,{\text{d}}x,} $$
(1)

where X ∊ [0,∞] but can also vary from − ∞ to + ∞ or from a finite lower limit to a finite upper limit. The Shannon entropy can also be defined in an analogous manner for a discrete variable.

The principle of maximum entropy (POME), propounded by Jaynes (1957), states that of all the distributions that satisfy the given constraints, the distribution yielding the maximum entropy is the least-biased distribution and should hence be preferred. If there are no constraints then POME says that the resulting distribution would be a uniform distribution, which is consistent with the Laplacian principle of insufficient reason.

The theorem of concentration states that, among all distributions satisfying the given constraints, the entropies concentrate near the maximum; hence the POME distribution is the preferred inference and best represents our state of knowledge about the behavior of the system. This is a consequence of Shannon’s inequality and the relation between entropy and the Chi square statistic.

Copula theory

The foundation of the copula theory is the Sklar theorem (Sklar 1959). The theorem states that the joint (multivariate) probability distribution of two or more random variables is a function of the probability distributions of the individual variables (also referred to as marginal distributions, which are one-dimensional). In other words, the multivariate distribution is coupled to its marginal distributions; it is implied that these random variables are not independent of each other. The copula theory does not specify how the marginal distributions should be derived, nor does it identify a single best copula in practice: there are different ways to construct copulas and different ways to select the best copula.

Methodology for application of copula–entropy theory

The copula–entropy theory can be applied in different ways: (1) the marginal distributions are derived using the entropy theory and the joint distribution using the copula theory (e.g., Hao and Singh 2012; Zhang and Singh 2012); since more than one joint distribution can be fitted to the multivariate random variables, the best distribution is then selected from either a visual goodness-of-fit plot (e.g., a QQ plot) or formal goodness-of-fit test statistics (Genest et al. 2009). (2) With the marginal distributions derived using the entropy theory, the best copula is selected as the copula function yielding the maximum entropy. (3) Both marginal and joint distributions are derived using the entropy theory (e.g., Chu 2011; Chen et al. 2013; Aghakouchak 2014). The methodology thus depends on the way the theory is applied, and each of the three ways is outlined below. First, however, the methodology for application of the entropy theory is outlined, since entropy is needed in all three ways.

Methodology for application of entropy theory

Fundamental to applying the entropy theory is the specification of constraints that the derived probability distribution must satisfy. Any number of constraints can be defined in different ways, but the easiest is to define them in terms of moments. Let \( g_{i}(x) \) be any function of random variable X. Then, the ith constraint, \( C_{i} \), can be expressed as

$$ C_{i} = \int\limits_{0}^{\infty } {g_{i} \left( x \right)f\left( x \right)\,{\text{d}}x = E\left[ {g_{i} \left( x \right)} \right],\quad i = 1,2, \ldots ,m,} $$
(2)

where E is the expectation operator. If \( g_{0} \left( x \right) = 1, \) then Eq. (2) leads to the total probability constraint as

$$ C_{0} = \int\limits_{0}^{\infty } {f\left( x \right)\,{\text{d}}x = 1.} $$
(3)

The next step is to maximize the entropy given by Eq. (1), subject to Eqs. (2) and (3). Entropy maximization can be done using the method of Lagrange multipliers, where the Lagrange function L can be written as

$$ L = - \;\int\limits_{0}^{\infty } {f(x)\ln f(x)\,{\text{d}}x} - (\lambda_{0} - 1)\left[ {\int\limits_{0}^{\infty } {f(x)\,} {\text{d}}x - 1} \right] - \sum\limits_{i = 1}^{m} {\lambda_{i} \left[ {\int\limits_{0}^{\infty } {g_{i} (x)f(x)\,{\text{d}}x} - C_{i} } \right]} , $$
(4)

where \( \lambda_{i} ,i = 0,1, \ldots ,m, \) are the unknown Lagrange multipliers. Applying the Lagrange–Euler calculus of variation, Eq. (4) leads to the maximum entropy distribution:

$$ f(x) = \exp \left[ { - \;\sum\limits_{i = 0}^{m} {\lambda_{i} g_{i} (x)} } \right]. $$
(5)

Now the unknown Lagrange multipliers are determined from the known constraints. The multipliers can be determined in two ways: regular entropy method and parameter space expansion method (Singh 1998; Singh and Rajagopal 1986). Substituting Eq. (5) in Eq. (3), we get

$$ \begin{array}{*{20}c} {\exp (\lambda_{0} ) = Z = \int\limits_{0}^{\infty } {\exp } \left[ { - \;\sum\limits_{i = 1}^{m} {\lambda_{i} g_{i} (x)} } \right]\,{\text{d}}x} \\ {\text{or}} \\ {\lambda_{0} = \ln Z = \ln \left\{ {\int\limits_{0}^{\infty } {\exp } \left[ { - \;\sum\limits_{i = 1}^{m} {\lambda_{i} g_{i} (x)} } \right]\,{\text{d}}x} \right\}} \\ \end{array} , $$
(6)

where Z is called the partition function, and

$$ ZC_{i} = \int\limits_{0}^{\infty } {g_{i} (x)\exp \left[ { - \,\sum\limits_{j = 1}^{m} {\lambda_{j} g_{j} (x)} } \right]} \,{\text{d}}x,\quad i = 1,2, \ldots ,m. $$
(7)

Equation (6) shows that λ0 is a function of \( \lambda_{1} ,\lambda_{2} , \ldots ,\lambda_{m} , \) i.e., \( \lambda_{0} = \lambda_{0} \left( {\lambda_{1} ,\lambda_{2} , \ldots ,\lambda_{m} } \right) \), and the function is convex. Differentiating λ0 with respect to each \( \lambda_{i} \) yields relations between the Lagrange multipliers and the constraints. Dividing Eq. (7) by Eq. (6), we obtain

$$ C_{i} = \frac{{\int\limits_{0}^{\infty } {g_{i} (x)\exp \left[ { - \;\sum\nolimits_{j = 1}^{m} {\lambda_{j} g_{j} (x)} } \right]\,{\text{d}}x} }}{{\int\limits_{0}^{\infty } {\exp \left[ { - \;\sum\nolimits_{j = 1}^{m} {\lambda_{j} g_{j} (x)} } \right]\,{\text{d}}x} }},\quad i = 1,2, \ldots ,m. $$
(8)

Equation (8) shows that \( C_{i} ,\;i = 1,2, \ldots ,m, \) are functions of \( \lambda_{1} ,\lambda_{2} , \ldots ,\lambda_{m} \).

Differentiating Eq. (6) with respect to \( \lambda_{i} \) and using Eqs. (2) and (5), we obtain

$$ \frac{{\partial \lambda_{0} }}{{\partial \lambda_{i} }} = - \;E[g_{i} (x)] = - \;C_{i} ,\quad i = 1,2, \ldots ,m. $$
(9)

For obtaining parameters, the derivatives in Eq. (9) are equated to the derivatives obtained from \( \lambda_{0} = \lambda_{0} \left( {\lambda_{ 1} , \, \lambda_{ 2} , \ldots , \, \lambda_{m} } \right) \). Similarly, it can be shown that

$$ \frac{{\partial^{2} \lambda_{0} }}{{\partial \lambda_{i}^{2} }} = E\left[ {g_{i}^{2} \left( x \right)} \right]\; - \;\left\{ {E\left[ {g_{i} \left( x \right)} \right]} \right\}^{2} = {\text{Var}}\left[ {g_{i} \left( x \right)} \right],\quad i = 1,2, \ldots ,m $$
(10)

and

$$ \frac{{\partial^{2} \lambda_{0} }}{{\partial \lambda_{i} \partial \lambda_{j} }} = E\left[ {g_{i} \left( x \right)g_{j} \left( x \right)} \right]\; - \;E\left[ {g_{i} \left( x \right)} \right]E\left[ {g_{j} \left( x \right)} \right] = {\text{Cov}}\left[ {g_{i} \left( x \right),g_{j} \left( x \right)} \right],\quad i,j = 1,2, \ldots ,m,\;i \ne j. $$
(11)

The maximum entropy, Hmax, of the derived POME-based PDF can be expressed as

$$ \begin{aligned} H_{{{\text{max}}} } = - \,\int\limits_{0}^{\infty } {f(x)\ln f(x)} \,{\text{d}}x = - \;\int\limits_{0}^{\infty } {\left[ { - \,\lambda_{0} - \sum\limits_{i = 1}^{m} {\lambda_{i} g_{i} (x)} } \right]} \exp \left[ { - \,\lambda_{0} - \sum\limits_{i = 1}^{m} {\lambda_{i} g_{i} (x)} } \right]\,{\text{d}}x \\ = \lambda_{0} + \sum\limits_{i = 1}^{m} {\lambda_{i} E[g_{i} (x)]} = \sum\limits_{i = 0}^{m} {\lambda_{i} C_{i} } . \\ \end{aligned} $$
(12)

Equation (12) shows that the maximum entropy is a function of the Lagrange multipliers and the constraints, and that Hmax is a concave function of the constraints. Equation (12) also shows that the Lagrange multipliers, \( \lambda_{1} ,\lambda_{2} , \ldots ,\lambda_{m} , \) are the partial derivatives of Hmax with respect to the constraints \( C_{i} ,\;i = 1,2, \ldots ,m, \) respectively.
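The dual relation embodied in Eqs. (6)–(12) can be exercised numerically. The sketch below (an illustration, not from the text) uses the single constraint E[X] = C1 on [0, ∞): minimizing the dual objective λ0 + λ1C1 over λ1 should recover the exponential density, whose analytic Lagrange multiplier is λ1 = 1/C1. The constraint value C1 = 2 is an assumed example.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

C1 = 2.0  # single assumed constraint: E[X] = 2 on x in [0, inf)

def dual(lam1):
    # lambda_0 = ln Z(lambda_1); Eq. (12) gives the dual objective lambda_0 + lambda_1 * C_1
    Zval, _ = quad(lambda x: np.exp(-lam1 * x), 0.0, np.inf)
    return np.log(Zval) + lam1 * C1

res = minimize_scalar(dual, bounds=(1e-3, 10.0), method="bounded")
print(round(res.x, 3))  # analytic solution: lambda_1 = 1/C1 = 0.5
```

The minimizer converges to λ1 = 0.5, i.e., the POME density f(x) = (1/2)e^{−x/2}, consistent with ∂λ0/∂λ1 = −C1 in Eq. (9).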

If q i and p i , \( i = 1,2, \ldots ,n, \) are the frequencies computed from the POME-based distribution and from a given fitted parametric distribution, respectively, for n class intervals, and N is the sample size, then we have

$$ 2N\Delta H = \sum\limits_{i = 1}^{n} {\frac{{\left( {q_{i} - p_{i} } \right)^{2} }}{{p_{i} }}} = \chi^{2} , $$
(13)

where χ2 is Chi square distributed with s degrees of freedom as

$$ s = n - m - 1. $$
(14)

With the Chi square distribution as the limiting distribution of 2NΔH, the Chi square statistic may be applied to assess whether a fitted parametric distribution is close to the POME-based distribution (i.e., the reference distribution of the random variable).

Methodology for application of copula theory

Definition and main properties for copula

As stated by Sklar (1959), a copula couples the multivariate distribution to its marginal distributions, which are uniformly distributed on [0,1]. In other words, a copula is a mapping \( \left[ {0, 1} \right]^{d} \to [0, 1] \). For d-dimensional continuous random variables, there is a unique copula function (C) representing the joint distribution function (H) as

$$ {H(x_{1} ,x_{2} , \ldots ,x_{d} ) = C(u_{1} ,\,u_{2} , \ldots ,\,u_{d} );}\quad u_{i} = F_{i} (x_{i} )\sim {\text{uniform}}\; ( 0 , 1 ) ,\;i = 1, \ldots ,d. $$
(15)

As shown in Eq. (15), \( u_{i} = F_{i} (x_{i} ) \) is the marginal CDF of random variable \( X_{i} \) evaluated at \( x_{i} \). As a representation of the joint distribution, the copula function has the following properties:

1. \( 0\; \le \;C(u_{1} , \ldots ,u_{d} )\; \le \;1 \);

2. if any \( u_{i} = 0 \), then \( C(u_{1} , \ldots ,u_{d} ) = 0 \);

3. if all \( u_{j} = 1,\;j = 1, \ldots ,d,\;j \ne i \), then \( C(1, \ldots ,u_{i} , \ldots ,1) = u_{i} \);

4. C is bounded by the Fréchet–Hoeffding bounds as

$$ {W \le C \le M;} \, \quad W = \hbox{max} \,\left( {1 - d + \sum\limits_{i = 1}^{d} {u_{i} } ,0} \right), \, M = \hbox{min} \,(u_{1} , \ldots ,u_{d} ). $$
(16)

In Eq. (16), W represents perfectly negative dependence, while M represents perfectly positive dependence. For independent random variables, the corresponding copula function is simply \( \varPi = u_{1} u_{2} \cdots u_{d} = F_{1} (x_{1} )F_{2} (x_{2} ) \cdots F_{d} (x_{d} ) \); and

5. C is d-increasing, that is, the C-volume of any d-dimensional interval is non-negative.

Copula families and parameter estimation

The major copula families are the Archimedean, meta-elliptical, extreme value, vine, and entropic copulas. The (two-dimensional) Archimedean copula is symmetric and easy to construct through its generating function as

$$ C(u,v) = \phi^{ - 1} \left( {\phi (u)\; + \;\phi (v)} \right), $$
(17)

where ϕ is the generating function, which is strictly decreasing and convex with ϕ(1) = 0. Depending on the choice of generator, different copulas within the family cover different ranges of dependence (Nelsen 2006). For example, the Gumbel–Hougaard copula can model only positive dependence, while the Frank copula can model the entire range of dependence. Given their easy construction, Archimedean copulas have been extensively applied in bivariate hydrological frequency analysis (e.g., Sraj et al. 2015; Salvadori and De Michele 2015; Requena et al. 2016a, b).
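To make Eq. (17) concrete, the following sketch builds the Gumbel–Hougaard copula from its generator ϕ(t) = (−ln t)^θ and checks the boundary property C(u, 1) = u; the value θ = 2 is an arbitrary illustrative choice.

```python
import numpy as np

theta = 2.0  # Gumbel-Hougaard dependence parameter (theta >= 1); illustrative value

def phi(t):      # generator: strictly decreasing, convex, phi(1) = 0
    return (-np.log(t)) ** theta

def phi_inv(s):  # inverse generator
    return np.exp(-s ** (1.0 / theta))

def gumbel_copula(u, v):  # Eq. (17): C(u, v) = phi^{-1}(phi(u) + phi(v))
    return phi_inv(phi(u) + phi(v))

print(round(gumbel_copula(0.7, 1.0), 3))  # boundary property C(u, 1) = u -> 0.7
print(round(gumbel_copula(0.5, 0.5), 3))  # joint probability under positive dependence -> 0.375
```

Since the Gumbel–Hougaard copula models positive dependence, C(u, v) ≥ uv everywhere, which is easy to spot-check with this function.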

Meta-elliptical copulas (Fang et al. 2002), as the name suggests, are derived from elliptical joint distributions. The most widely applied meta-elliptical copulas are the meta-Gaussian and meta-Student t copulas. Unlike Archimedean copulas, meta-elliptical copulas can model the entire range of dependence and are easily applied to high-dimensional multivariate modeling. Of the two, the meta-Student t copula exhibits symmetric tail dependence, while the meta-Gaussian copula has no tail dependence (e.g., Genest et al. 2007; Song and Singh 2010).
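A minimal sketch of the meta-Gaussian copula: sampling correlated normals and mapping them through the standard normal CDF yields copula samples whose Spearman's rho follows the closed form (6/π) arcsin(ρ/2). The correlation ρ = 0.8 and the sample size are illustrative assumptions, not values from the text.

```python
import numpy as np
from scipy.stats import norm, spearmanr

rng = np.random.default_rng(1)
rho = 0.8
cov = [[1.0, rho], [rho, 1.0]]

# sample the meta-Gaussian copula: correlated normals mapped through their own CDF
z = rng.multivariate_normal([0.0, 0.0], cov, size=5000)
u, v = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])

rs_emp, _ = spearmanr(u, v)                  # empirical Spearman's rho
rs_theory = 6 / np.pi * np.arcsin(rho / 2)   # closed form for the meta-Gaussian copula
print(round(rs_theory, 3))                   # -> 0.786
```

The empirical rank correlation of the (u, v) sample agrees with the closed form to sampling error, illustrating that the meta-Gaussian copula covers the full dependence range as ρ sweeps (−1, 1).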

The extreme value copula is derived in accordance with extreme value theory and may be applied to model rare events. As stated by Gudendorf and Segers (2009) and Joe (2014), the following limiting relation holds:

$$ C_{F} (u_{1}^{1/n} , \ldots ,u_{d}^{1/n} )^{n} \to C(u_{1} , \ldots ,u_{d} ),\quad n \to \infty . $$
(18)

In Eq. (18), C denotes the extreme value copula, and C F is a copula satisfying this limiting relation (i.e., a copula in the domain of attraction of C).

In other words, the extreme value copula must be max-stable. For the bivariate case, the extreme value copula may be written as

$$ C(u,v) = \exp \left[ {\log (uv)\,A\left( {\tfrac{\log (v)}{\log (uv)}} \right)} \right], \, \quad u,v \in (0,1). $$
(19)

In Eq. (19), A denotes the Pickands dependence function (Pickands 1981; Falk and Reiss 2005), which is convex with \( A{:}\;[0,1] \to [1/2,1] \) and \( \hbox{max} (t,1 - t) \le A(t) \le 1 \) for \( t \in [0,1] \).

The Gumbel–Hougaard copula is the only Archimedean copula that also belongs to the extreme value family. Hence, it has been widely applied in bivariate flood frequency analysis, storm analysis, drought analysis, etc.

The vine copula is constructed from a decomposition of the joint probability density and is applied for high-dimensional analysis (i.e., d ≥ 3). Vine copulas are usually categorized into canonical (C-) vine, D-vine, and regular (R-) vine copulas (Aas et al. 2007). Taking the 3-dimensional case as an example, the joint probability density function can be written as

$$ f(x_{1} ,x_{2} ,x_{3} ) = \left[ {\prod\limits_{i = 1}^{3} {f_{i} (x_{i} )} } \right]c_{12} \left( {F_{1} (x_{1} ),F_{2} (x_{2} )} \right)c_{23} \left( {F_{2} (x_{2} ),F_{3} (x_{3} )} \right)c_{13|2} \left( {F_{1|2} (x_{1} |x_{2} ),F_{3|2} (x_{3} |x_{2} )} \right). $$
(20)

In Eq. (20), c denotes the copula density function. As seen in Eq. (20), the vine copula is very flexible, since bivariate copulas are applied at every level of the decomposition. The vine copula has also been applied in high-dimensional hydrological frequency analysis (e.g., Pham et al. 2016; Arya and Zhang 2017; Vernieuwe et al. 2015).

The parameters of the parametric copula functions constructed above may be estimated with one of the following three approaches:

(i) Full-Maximum Likelihood Estimation (Full-MLE): the parameters of the marginal distributions and of the copula function are estimated simultaneously.

(ii) Two-Stage Maximum Likelihood Estimation (Two-Stage MLE): the parameters of the marginal distributions are estimated first; the parameters of the copula function are then estimated by MLE using the marginals computed from the previously fitted marginal distributions.

(iii) Semi-Parametric (or Pseudo) Maximum Likelihood Estimation (Pseudo-MLE): the parameters of the copula function are estimated from the empirical marginals (i.e., empirical CDFs computed from a plotting-position formula or a kernel density function).

Of the three estimation methods for parametric copula functions, Pseudo-MLE is considered the least affected by possible misidentification of the marginal distributions, since the parameters of the marginal distributions and of the copula function are estimated separately.
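The rank-based idea behind Pseudo-MLE can be illustrated with a meta-Gaussian copula (this is an illustration, not the paper's own computation): the normal-scores correlation of the pseudo-observations is a simple, widely used approximation to the Pseudo-MLE of the Gaussian-copula parameter. The marginal transforms exp(·) and (·)³ below are arbitrary monotone distortions standing in for unknown marginals.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(7)
rho = 0.6
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=3000)
x, y = np.exp(z[:, 0]), z[:, 1] ** 3          # observations with non-normal marginals

n = len(x)
u = rankdata(x) / (n + 1)                     # empirical CDFs (pseudo-observations);
v = rankdata(y) / (n + 1)                     # the n + 1 denominator keeps them in (0, 1)

s, t = norm.ppf(u), norm.ppf(v)               # normal scores
rho_hat = np.mean(s * t) / np.sqrt(np.mean(s ** 2) * np.mean(t ** 2))
print(round(rho_hat, 1))                      # recovers rho ~ 0.6 without knowing the marginals
```

Because the estimation uses ranks only, misspecifying the marginals (here, not modeling them at all) does not bias the dependence estimate, which is the advantage the text ascribes to Pseudo-MLE.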

The most entropic canonical copula may be derived using the entropy theory, similar to the application of entropy theory to the univariate random variables. The Shannon entropy of the copula function for two variables is written as

$$ H(u,v) = - \,\int_{{[0,1]^{2} }} {c(u,v)\ln \,c(u,v)} \,{\text{d}}u{\text{d}}v $$
(21)

and the joint density function is given through the copula function as

$$ f(x,y) = f_{X} (x)f_{Y} (y)c(u,v). $$
(22)

Substituting Eq. (22) into Eq. (21), one may conclude, with some simple algebra, that the negative copula entropy [i.e., Eq. (21)] denotes the mutual information of random variables X and Y through the Kullback–Leibler cross-entropy as

$$ \begin{aligned} H_{C} (u,v) & = - \int_{{[0,1]^{2} }} {c(u,v)\ln [c(u,v)]\,{\text{d}}u{\text{d}}v} \\ & = - \int {\int {\frac{f(x,y)}{{f_{X} (x)f_{Y} (y)}}\ln \left( {\frac{f(x,y)}{{f_{X} (x)f_{Y} (y)}}} \right)f_{X} (x)f_{Y} (y)\,{\text{d}}x{\text{d}}y} } \\ & = - \int {\int {f(x,y)\ln \left( {\frac{f(x,y)}{{f_{X} (x)f_{Y} (y)}}} \right)\,{\text{d}}x{\text{d}}y} } = - \,KLCE({f_{X}{:}f_{Y}} ) = - \,I(X;Y). \\ \end{aligned} $$
(23)

According to information theory, the mutual information \( I({X{;}Y}) \) is a measure of the total correlation between random variables, that is, the mutual dependence between X and Y. From the copula theory [e.g., Eq. (22) for bivariate random variables], the copula density \( c(u,v) \) also encodes this mutual dependence. Thus, the information maintained in the copula function is the mutual information (i.e., total correlation) between X and Y, which makes the copula entropy non-positive. In other words, a larger absolute value of the copula entropy represents stronger mutual dependence (or total correlation) among the random variables.
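The identity in Eq. (23) can be checked by Monte Carlo for the meta-Gaussian copula, whose mutual information has the closed form I(X;Y) = −½ ln(1 − ρ²); the correlation ρ = 0.8 and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
rho, n = 0.8, 20000
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
s, t = z[:, 0], z[:, 1]

# log-density of the meta-Gaussian copula evaluated at its own samples
log_c = (-0.5 * np.log(1 - rho ** 2)
         + (2 * rho * s * t - rho ** 2 * (s ** 2 + t ** 2)) / (2 * (1 - rho ** 2)))

H_c = -np.mean(log_c)                    # Monte Carlo estimate of the copula entropy, Eq. (21)
I_xy = -0.5 * np.log(1 - rho ** 2)       # mutual information of a bivariate normal
print(round(-I_xy, 3))                   # H_c should be close to -I(X;Y), and both negative
```

The estimate confirms H_C = −I(X;Y) < 0: stronger dependence (larger |ρ|) drives the copula entropy further below zero.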

Similar to the POME-based univariate distribution, the common constraints are the total probability, the moment constraints of the marginals (which are uniformly distributed on [0,1]), and a measure of dependence (also called association):

$$ \int_{{[0,1]^{2} }} {c(u,v)\,{\text{d}}u{\text{d}}v = 1} \quad \left( {{\text{total}}\;{\text{probability}}} \right) $$
(24)
$$ \int_{{[0,1]^{2} }} {u^{r} c(u,v)} \,{\text{d}}u{\text{d}}v = E(u^{r} ) = \frac{1}{r + 1}, \quad r = 1,{ 2}, \ldots \;\left( {{\text{constraints of}}\;u = F_{X} (x)} \right). $$
(25a)

Applying \( f(x) = \int {f(x,y)\,{\text{d}}y} \), or equivalently \( \int_{0}^{1} {c(u,v)\,{\text{d}}v} = 1 \), we can evaluate Eq. (25a) as

$$ \int_{{[0,1]^{2} }} {u^{r} c(u,v)\,{\text{d}}u{\text{d}}v} = \int_{0}^{1} {u^{r} \left[ {\int_{0}^{1} {c(u,v)\,{\text{d}}v} } \right]{\text{d}}u} = \int_{0}^{1} {u^{r} f(u)\,{\text{d}}u} = \int_{0}^{1} {u^{r} \,{\text{d}}u} = E(u^{r} ) = \frac{1}{r + 1}. $$
(25b)

In Eq. (25b), \( f(u) = 1 \) since u ~ uniform (0,1). Similarly,

$$ \int_{{[0,1]^{2} }} {v^{r} c(u,v)\,} {\text{d}}u{\text{d}}v = E(v^{r} ) = \frac{1}{r + 1}, \, \quad r = 1, \, 2, \ldots \;\left( {{\text{constraints}}\;{\text{of}}\;v = F_{Y} (y)} \right). $$
(26)
$$ \int_{{[0,1]^{2} }} {a_{j} (u,v)c(u,v)} \,{\text{d}}u{\text{d}}v = E[a_{j} (u,v)] = \varTheta_{j} , \quad j = 1,2, \ldots \;\left( {{\text{constraints}}\;{\text{of}}\;{\text{dependence}}\;{\text{measure}}} \right). $$
(27)

In Eq. (27), Spearman’s rho is commonly applied as the constraint to measure the dependence with \( a_{j} (u,v) = uv \Rightarrow E(uv) = \frac{{\rho_{s} + 3}}{12} \). One can also apply other dependence measures discussed in Nelsen (2006) and Chu (2011).
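The Spearman-rho constraint \( E(uv) = (\rho_{s} + 3)/12 \) can be verified numerically on samples from any copula; here a meta-Gaussian copula with ρ = 0.7 is used purely for illustration.

```python
import numpy as np
from scipy.stats import norm, spearmanr

rng = np.random.default_rng(5)
rho = 0.7
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=20000)
u, v = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])   # copula samples on [0, 1]^2

rho_s, _ = spearmanr(u, v)
lhs = np.mean(u * v)                  # sample estimate of E(uv)
rhs = (rho_s + 3) / 12                # Spearman-rho side of the constraint
print(round(lhs, 2), round(rhs, 2))   # the two sides agree to sampling error
```

This is why Spearman's rho is a convenient dependence constraint for the entropic copula: it is a linear functional of the copula density, \( \rho_{s} = 12E(uv) - 3 \).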

Using the constraints [Eqs. (24)–(27)], the Lagrange function for the most entropic canonical copula (MECC) can be written as

$$ \begin{aligned} L & = - \int_{{[0,1]^{2} }} {c(u,v)\ln [c(u,v)]} \,{\text{d}}u{\text{d}}v - (\lambda_{0} - 1)\left[ {\int_{{[0,1]^{2} }} {c(u,v)\,{\text{d}}u{\text{d}}v} - 1} \right] \\ & \quad - \sum\limits_{i = 1}^{n} {\lambda_{i} \left[ {\int_{{[0,1]^{2} }} {u^{i} c(u,v)} \,{\text{d}}u{\text{d}}v - \frac{1}{i + 1}} \right]\; - \;} \sum\limits_{i = 1}^{n} {\gamma_{i} \left[ {\int_{{[0,1]^{2} }} {v^{i} c(u,v)} \,{\text{d}}u{\text{d}}v - \frac{1}{i + 1}} \right]} \\ & \quad - \sum\limits_{j = 1}^{k} {\lambda_{n + j} \left[ {\int_{{[0,1]^{2} }} {a_{j} (u,v)c(u,v)} \,{\text{d}}u{\text{d}}v - \varTheta_{j} } \right]} . \\ \end{aligned} $$
(28)

In Eq. (28), \( \lambda_{0} , \ldots ,\lambda_{n} ,\gamma_{1} , \ldots ,\gamma_{n} ,\lambda_{n + 1} , \ldots ,\lambda_{n + k} \) are the Lagrange multipliers. For the MECC specifically, \( \lambda_{r} = \gamma_{r} , \, r = 1, \ldots ,n \). The multipliers \( \lambda_{n + 1} , \ldots ,\lambda_{n + k} \) pertain to the constraints on the rank-based dependence measures.

Differentiating Eq. (28) with respect to \( c(u,v) \) and setting the derivative to zero, we have

$$ c(u,v) = \frac{{\exp \left( { - \,\sum\nolimits_{i = 1}^{n} {\lambda_{i} u^{i} } - \sum\nolimits_{i = 1}^{n} {\gamma_{i} v^{i} } - \sum\nolimits_{j = 1}^{k} {\lambda_{n + j} a_{j} (u,v)} } \right)}}{{\int_{{[0,1]^{2} }} {\exp \left( { - \,\sum\nolimits_{i = 1}^{n} {\lambda_{i} u^{i} } - \sum\nolimits_{i = 1}^{n} {\gamma_{i} v^{i} } - \sum\nolimits_{j = 1}^{k} {\lambda_{n + j} a_{j} (u,v)} } \right)\,{\text{d}}u{\text{d}}v} }}. $$
(29)

Based on the principle of maximum entropy, maximizing Eq. (21) subject to the constraints is equivalent to minimizing the objective function

$$ \begin{aligned} Z(\varLambda ) & = \ln \left[ {\int_{{[0,1]^{2} }} {\exp \left( { - \sum\limits_{i = 1}^{n} {\lambda_{i} u^{i} } - \sum\limits_{i = 1}^{n} {\gamma_{i} v^{i} } - \sum\limits_{j = 1}^{k} {\lambda_{n + j} a_{j} (u,v)} } \right)\,{\text{d}}u{\text{d}}v} } \right] \\ & \quad + \,\sum\limits_{i = 1}^{n} {\lambda_{i} \frac{1}{i + 1}} + \sum\limits_{i = 1}^{n} {\gamma_{i} \frac{1}{i + 1} + \sum\limits_{j = 1}^{k} {\lambda_{n + j} \,\hat{\varTheta }_{j} } } . \\ \end{aligned} $$
(30)

In Eq. (30), \( \varLambda = \left[ {\lambda_{1} , \ldots ,\lambda_{n} ,\gamma_{1} , \ldots ,\gamma_{n} ,\lambda_{n + 1} , \ldots ,\lambda_{n + k} } \right] \).
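A minimal numerical sketch of the MECC: discretize [0,1]², impose n = 2 moment constraints per marginal plus a single product-moment constraint E(uv) = 0.30 (an assumed target, corresponding to ρs = 0.6), and minimize the objective of Eq. (30). The grid size and target value are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

m = 80
g = (np.arange(m) + 0.5) / m              # midpoint nodes for the integrals over [0, 1]^2
U, V = np.meshgrid(g, g)
dA = 1.0 / m ** 2

theta_uv = 0.30                            # assumed target E(uv), i.e., Spearman rho = 0.6

def Z(p):                                  # objective of Eq. (30) with n = 2, k = 1, a_1 = uv
    l1, l2, g1, g2, l5 = p
    expo = np.exp(-(l1 * U + l2 * U ** 2 + g1 * V + g2 * V ** 2 + l5 * U * V))
    return (np.log(expo.sum() * dA)
            + l1 / 2 + l2 / 3 + g1 / 2 + g2 / 3 + l5 * theta_uv)

l1, l2, g1, g2, l5 = minimize(Z, np.zeros(5), method="BFGS").x
expo = np.exp(-(l1 * U + l2 * U ** 2 + g1 * V + g2 * V ** 2 + l5 * U * V))
c = expo / (expo.sum() * dA)               # MECC density, Eq. (29)
print(round((U * V * c).sum() * dA, 3))    # the fitted copula reproduces E(uv) = 0.3
```

At the optimum, the gradient conditions of the dual force the fitted density to satisfy every imposed constraint, including the uniform-marginal moments 1/(i + 1).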

The most entropic canonical copula (MECC) may be generalized to most entropic copula (MEC) with respect to a given parametric copula (Chu 2011). In the case of MEC, Eqs. (29)–(30) can be re-written as

$$ c(u,v) = \frac{{\exp \left( { - \,\sum\nolimits_{i = 1}^{n} {\lambda_{i} u^{i} } - \sum\nolimits_{i = 1}^{n} {\gamma_{i} v^{i} } - \sum\nolimits_{j = 1}^{k} {\lambda_{n + j} \,a_{j} (u,v) - b\tilde{c}(u,v)} } \right)}}{{\int_{{[0,1]^{2} }} {\exp \left( { - \,\sum\nolimits_{i = 1}^{n} {\lambda_{i} u^{i} } - \sum\nolimits_{i = 1}^{n} {\gamma_{i} v^{i} } - \sum\nolimits_{j = 1}^{k} {\lambda_{n + j} \,a_{j} (u,v) - b\tilde{c}} (u,v)} \right)\,{\text{d}}u{\text{d}}v} }}, $$
(31a)
$$ \begin{aligned} Z(\varLambda ) & = \ln \left[ {\int_{{[0,1]^{2} }} {\exp \left( { - \,\sum\limits_{i = 1}^{n} {\lambda_{i} u^{i} } - \sum\limits_{i = 1}^{n} {\gamma_{i} v^{i} } - \sum\limits_{j = 1}^{k} {\lambda_{n + j} \,a_{j} (u,v) - b\tilde{c}(u,v)} } \right)\,{\text{d}}u{\text{d}}v} } \right] \\ & \quad + \,\sum\limits_{i = 1}^{n} {\lambda_{i} \frac{1}{i + 1} + \sum\limits_{i = 1}^{n} {\gamma_{i} \frac{1}{i + 1} + \sum\limits_{j = 1}^{k} {\lambda_{n + j} \,\hat{\varTheta }_{j} } } } . \\ \end{aligned} $$
(31b)

In Eq. (31), b is a generic constant and \( \tilde{c}(u,v) \) is the given reference copula. The MECC is recovered by setting b = 0. In what follows, we focus on the application of the MECC to bivariate cases through examples.

Copula–entropy for multivariate modeling

Following the discussion of Shannon entropy and copula theory in the previous sections, we will outline the copula–entropy theory for stochastic modeling in this section. In general, we can apply the copula–entropy theory in three ways:

(i) The marginal distributions are derived using the entropy theory, while the joint distribution is modeled through a parametric copula function with its parameters estimated using Full-MLE, Two-Stage MLE, or Pseudo-MLE. The goodness-of-fit of the copula function may be assessed either graphically through the KK plot or statistically with formal goodness-of-fit test statistics (Genest et al. 2009).

(ii) This approach differs from (i) in that the parametric copula function is selected as the one yielding the maximum entropy among all candidate copulas.

(iii) This approach takes full advantage of the entropy theory: both marginal and joint distributions are derived using it. The Lagrange multipliers are estimated by maximizing entropy, or equivalently by minimizing the corresponding objective function, which is the dual problem of maximizing entropy. The Lagrange multipliers of the MECC (joint distribution) may be optimized from the fitted POME-based marginal distributions or from the empirical marginal distributions. Approach (iii) is adopted for the applications that follow.

Application to multivariate data of known population

Here, we first apply the copula–entropy theory to a bivariate sample dataset drawn from a known population. The sample dataset (N = 1000) is generated from the Gumbel–Hougaard copula (θ = 4.5) with the true marginal distributions:

$$ \begin{aligned} X\sim {{\text{Gamma (10}} . 5 , { 4} . 3 ) {\text{:} }}\;\frac{ 1}{ 1 0. 5\varGamma ( 4. 3 )}\left( {\frac{x}{10.5}} \right)^{3.3} e^{ - (x/10.5)} \hfill \\ Y\sim {{\text{Lognormal (4, 0}} . 7^{ 2} ){:}}\;\frac{1}{{y(0.7)\sqrt {2\pi } }}\exp \left( { - \,\frac{{(\ln y - 4)^{2} }}{{2(0.7^{2} )}}} \right). \hfill \\ \end{aligned} $$
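A sketch of how such a dataset can be generated: sample the Gumbel–Hougaard copula (θ = 4.5) by the Marshall–Olkin method with a positive stable mixing variable (Kanter's algorithm), then map the copula uniforms through the stated gamma and lognormal marginals. The seed is arbitrary, and a sample larger than N = 1000 is drawn here only to stabilize the Kendall's tau check.

```python
import numpy as np
from scipy.stats import gamma, lognorm, kendalltau

rng = np.random.default_rng(42)
theta, n = 4.5, 4000
alpha = 1.0 / theta

# positive stable V with Laplace transform exp(-t**alpha) (Kanter's algorithm)
W = rng.uniform(0.0, np.pi, n)
E = rng.exponential(1.0, n)
V = (np.sin(alpha * W) / np.sin(W) ** (1.0 / alpha)
     * (np.sin((1.0 - alpha) * W) / E) ** ((1.0 - alpha) / alpha))

# Marshall-Olkin: u_i = exp(-(E_i/V)**alpha) is the Gumbel-Hougaard inverse generator
E1, E2 = rng.exponential(1.0, (2, n))
u = np.exp(-(E1 / V) ** alpha)
v = np.exp(-(E2 / V) ** alpha)

# map the copula uniforms through the stated marginals
x = gamma.ppf(u, a=4.3, scale=10.5)            # Gamma(scale 10.5, shape 4.3)
y = lognorm.ppf(v, s=0.7, scale=np.exp(4.0))   # Lognormal(mu = 4, sigma = 0.7)

tau, _ = kendalltau(x, y)
print(round(1 - 1 / theta, 2))  # Kendall's tau implied by theta: 1 - 1/4.5 -> 0.78
```

The sample Kendall's tau of (x, y) matches the copula-implied value 1 − 1/θ, confirming that the dependence survives the marginal transforms.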

Study of univariate variates

In Singh (1998), it was shown that \( E[X] \) and \( E[\ln X] \) should be applied as constraints to derive the POME-based gamma distribution, while \( E\left[ {\ln x} \right] \) and \( E\left[ {\left( {\ln x} \right)^{2} } \right] \) are the constraints for the POME-based lognormal distribution. Following Singh (1998), we have the following:

Gamma distribution

The POME-based gamma distribution may be written as

$$ f(x) = \exp ( - \lambda_{0} - \lambda_{1} x - \lambda_{2} \ln x) $$
(32a)
$$ \frac{{\partial \lambda_{0} }}{{\partial \lambda_{1} }} = \frac{{\lambda_{2} - 1}}{{\lambda_{1} }} = - \,E(X) \approx - \,\bar{x} $$
(32b)
$$ \frac{{\partial \lambda_{0} }}{{\partial \lambda_{2} }} = \ln \lambda_{1} - \psi (1 - \lambda_{2} ) = - {\,E[\ln x]{;}} \, \quad \psi ( {\text{t)}} = \frac{{{\text{d}}\ln [\varGamma (t)]}}{{{\text{d}}t}}. $$
(32c)

The relation of Lagrange multipliers to the parameters of gamma distribution (Singh 1998) is given as

$$ \lambda_{1} = \frac{1}{a}; \, \lambda_{2} = 1 - b; \, f(x;a,b) = \frac{1}{a\varGamma (b)}\left( {\frac{x}{a}} \right)^{b - 1} \exp \left( { - \frac{x}{a}} \right). $$
(32d)

Lognormal distribution

The POME-based lognormal distribution may be written as

$$ f(x) = \exp \left( { - \lambda_{0} - \lambda_{1} \ln x - \lambda_{2} (\ln x)^{2} } \right) $$
(33a)
$$ \frac{{\partial \lambda_{0} }}{{\partial \lambda_{1} }} = \frac{{\lambda_{1} - 1}}{{2\lambda_{2} }} = - \;E[\ln X] $$
(33b)
$$ \frac{{\partial \lambda_{0} }}{{\partial \lambda_{2} }} = - \,\frac{{\left( {\lambda_{1} - 1} \right)^{2} }}{{4\lambda_{2}^{2} }} - \frac{1}{{2\lambda_{2} }} = - \,E\left[ {\left( {\ln x} \right)^{2} } \right] $$
(33c)
$$ \lambda_{1} = 1 - {\frac{{\bar{y}}}{{s_{y}^{2} }}{;}}\quad \, \lambda_{2} = \frac{1}{{2s_{y}^{2} }}. $$
(33d)

In Eq. (33d), y = ln (x), \( \bar{y} \) is the sample mean of y, and s 2 y represents the sample variance of y.

Using the bivariate data sampled from the true population, Table 1 lists the Lagrange multipliers estimated for the univariate variables based on both sample moments and population moments.

Table 1 Lagrange multipliers estimated from sample dataset and the true population

Besides applying the constraints directly related to the parametric distribution that may be fitted to the observed dataset, one may also directly apply the first three or four moments about the origin [i.e., \( E(X), \, E(X^{2} ), \, E(X^{3} ), \, E(X^{4} ) \)], given that these moments govern the shape and mode of the univariate probability density functions (Zellner and Highfield 1988; Cobb et al. 1983). The POME-based distribution so derived is given as

$$ f(x) = \exp ( - \lambda_{0} - \lambda_{1} x - \lambda_{2} x^{2} - \lambda_{3} x^{3} ),\quad{\text{if}}\;{\text{kurtosis}}\;{\text{is}}\;{\text{not}}\;{\text{significantly}}\;{\text{different}}\;{\text{from 3}} $$
(34a)
$$ f(x) = \exp ( - \lambda_{0} - \lambda_{1} x - \lambda_{2} x^{2} - \lambda_{3} x^{3} - \lambda_{4} x^{4} ),\quad {\text{if the kurtosis is significantly different from 3}} . $$
(34b)

The objective function [the dual of entropy maximization, cf. Eq. (12)], with \( a_{i} = E(X^{i} ) \) denoting the sample moments about the origin, is written as

$$ Z(\varLambda ) = \ln \left[ {\int {\exp \left( { - \sum\limits_{i = 1}^{m} {\lambda_{i} x^{i} } } \right)\,{\text{d}}x} } \right] + \sum\limits_{i = 1}^{m} {\lambda_{i} a_{i} } ,\quad m = 3,4;\quad \varLambda = \left[ {\lambda_{1} , \ldots ,\lambda_{m} } \right]. $$
(35)

To avoid possible integration problems, the univariate variable is commonly scaled to [0,1] or [− 1,1] (Hao and Singh 2012; Zhang and Singh 2014). In this study, the univariate variables are scaled to [0,1], and the appropriateness of this scaling is assessed. The scaled variable x s is given as

$$ x_{s} = \frac{x - (1 - d)\hbox{min} (x)}{(1 + d)\hbox{max} (x) - (1 - d)\hbox{min} (x)}. $$
(36)

In Eq. (36), d is a small number chosen so that the scaled variable reaches neither the lower limit 0 nor the upper limit 1, i.e., avoiding P(X ≤ max (x)) = 1. Here, we chose d = 0.01. Equations (34)–(35) may then be re-organized in terms of the scaled variable as

$$ f(x) = f(x_{s} )\left| {\frac{{{\text{d}}x_{s} }}{{{\text{d}}x}}} \right| = \frac{1}{(1 + d)\max (x) - (1 - d)\min (x)}\exp \left( { - \,\lambda_{0} - \sum\limits_{i = 1}^{m} {\lambda_{i} x_{s}^{i} } } \right) $$
(37)
$$ Z(\varLambda ) = \ln \left[ {\int\limits_{0}^{1} {\exp \left( { - \sum\limits_{i = 1}^{m} {\lambda_{i} x_{s}^{i} } } \right)\,{\text{d}}x_{s} } } \right] + \sum\limits_{i = 1}^{m} {\lambda_{i} a_{i} } ;\quad m = 3\;{\text{or}}\;4. $$
(38)
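The scaling of Eq. (36) and the optimization over the Lagrange multipliers can be sketched as follows (a minimal illustration, not the study's code: trapezoidal quadrature on [0,1], SciPy's general-purpose minimizer, and function names of our own choosing). The convex dual minimized here, \( \ln \int_0^1 \exp(-\sum_i \lambda_i x_s^i)\,\mathrm{d}x_s + \sum_i \lambda_i a_i \), has a vanishing gradient exactly when the fitted moments match the sample moments \( a_i \):

```python
import numpy as np
from scipy.optimize import minimize

def scale_variable(x, d=0.01):
    """Scale x into (0, 1), per Eq. (36)."""
    x = np.asarray(x, dtype=float)
    lo = (1.0 - d) * x.min()
    hi = (1.0 + d) * x.max()
    return (x - lo) / (hi - lo)

def fit_pome(xs, m=4, n_grid=2001):
    """Fit f(x_s) = exp(-lam0 - sum_i lam_i x_s^i) by minimizing the
    convex dual of the entropy maximization with moment constraints."""
    a = np.array([np.mean(xs ** i) for i in range(1, m + 1)])  # sample moments
    g = np.linspace(0.0, 1.0, n_grid)
    powers = np.vstack([g ** i for i in range(1, m + 1)])
    w = np.full(n_grid, g[1] - g[0])          # trapezoid weights
    w[0] *= 0.5
    w[-1] *= 0.5

    def log_integral(lam):
        expo = -(lam @ powers)
        c = expo.max()                        # guard against overflow
        return c + np.log(np.sum(w * np.exp(expo - c)))

    lam = minimize(lambda l: log_integral(l) + l @ a, np.zeros(m)).x
    lam0 = log_integral(lam)                  # log normalizing constant
    f = np.exp(-lam0 - lam @ powers)          # fitted density on the grid
    return lam0, lam, g, f, w
```

By construction the fitted density integrates to one under the same quadrature, and its first moment reproduces the sample moment to within the optimizer tolerance.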

Applying the first four moments about the origin to the scaled variable X and the first three to the scaled variable Y, Table 2 lists the parameters estimated by optimizing the objective function of Eq. (38).

Table 2 Lagrange multipliers estimated using the first four moments about origin

The POME-based distributions for the original variables are expressed as

$$ f(x) = \frac{1}{153.4792}\exp \left( { - \,1.6011 + 29.2444x_{s} - 101.8716x_{s}^{2} + 125.7947x_{s}^{3} - 57.5913x_{s}^{4} } \right) $$
(39a)
$$ f(y) = \frac{1}{582.2347}\exp \left( {1.2604\; + \;11.6222y_{s} \; - \;103.6613y_{s}^{2} \; + \;182.3986y_{s}^{3} \; - \;101.5606y_{s}^{4} } \right). $$
(39b)

Furthermore, Fig. 1 plots the relative frequency together with the frequency computed from the POME-based distributions. As shown in Fig. 1, the POME-based univariate distributions (derived using the constraints pertaining to the given population and the first four moments about the origin) visually fit the observed data very well. Using the true population from the reference distribution, Table 3 lists the Chi-square test results for the fitted parametric and POME-based distributions. The results in Table 3 clearly indicate that the POME-derived distributions may be applied to model the univariate variables. Thus, it is safe to conclude that one may directly use the moments about the origin as the constraints to model the univariate random variables.

Fig. 1

Comparison of the true population and POME-based distributions from given population or from first four moments about origin (i.e., entropy scaled)

Table 3 Chi square univariate goodness-of-fit results (comparing to the population parameters)

Study of dependence

As previously discussed, one may apply three different approaches to study the dependence using the copula–entropy theory. Hereafter, each approach is evaluated. In line with the objective of the study, the Gumbel–Hougaard, Clayton, Frank, and meta-t copulas (Nelsen 2006) were applied as parametric copulas. The MECC was derived with the constraints of \( E\left( U \right),E\left( {U^{2} } \right),E\left( V \right),E\left( {V^{2} } \right) \) and \( E\left( {UV} \right) \). Following the univariate analysis discussed in the “Univariate analysis of peak discharge and flood volume” section, we simply apply the POME-based distributions derived from the moments about the origin of the scaled variables.

POME-based marginals with parametric copulas

In this approach, the copula parameters were estimated from the POME-based marginals by maximizing the log-likelihood function (also called two-stage MLE). Table 4 lists the estimated parameters as well as the corresponding log-likelihood. The copula yielding the largest log-likelihood was selected for further analysis. As seen in Table 4, the Gumbel–Hougaard copula was the best candidate, in agreement with the sample data having actually been generated from the Gumbel–Hougaard copula, as discussed earlier.

Table 4 Parameters, LogL, and entropy estimated from parametric copula

POME-based marginals with parametric copulas selected based on the entropy

In this approach, the parameters of the copulas were again estimated by two-stage MLE. The difference is that the copula entropy was computed for each fitted parametric copula. Here, the copula entropy was estimated using

$$ H_{C} = - \,E\left[ {\ln c(u,v;\theta )} \right] = - \,\frac{1}{n}\sum\limits_{i = 1}^{n} {\ln \;c(u_{i} ,v_{i} ;\theta )} ;\quad n{:}\;{\text{sample size}}. $$
(40)

The computed entropy is also listed in Table 4. From the computed entropy using Eq. (40), it is seen that the Gumbel–Hougaard copula yielded the highest mutual information (the absolute value of the copula entropy) among all the copula candidates.
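For illustration, Eq. (40) can be evaluated for a copula whose density has a simple closed form; the sketch below uses the Clayton copula (one of the candidates above), with samples drawn by the conditional-inverse method. The function names are ours, not from the study:

```python
import numpy as np

def clayton_density(u, v, theta):
    """Closed-form Clayton copula density."""
    return ((1.0 + theta) * (u * v) ** (-1.0 - theta)
            * (u ** (-theta) + v ** (-theta) - 1.0) ** (-2.0 - 1.0 / theta))

def clayton_sample(n, theta, rng):
    """Draw (u, v) from the Clayton copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

def copula_entropy(u, v, theta):
    """Sample estimate of H_C = -E[ln c(u, v; theta)], Eq. (40)."""
    return -np.mean(np.log(clayton_density(u, v, theta)))
```

For dependent data the estimate is negative, i.e., the mutual information (the absolute value of the copula entropy) is positive.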

Parametric copulas estimated using Pseudo-MLE

In this approach, the parameters of the copulas were directly estimated using the empirical distribution (e.g., the empirical distribution obtained from the Weibull plotting-position formula); the estimates are listed in Table 4. It is seen that with the pseudo-MLE, the Gumbel–Hougaard copula again yielded the largest log-likelihood and the highest mutual information.
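A minimal sketch of the pseudo-observations used by the pseudo-MLE (Weibull plotting position, rank/(n + 1); the function name is ours):

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(x):
    """Empirical CDF values from the Weibull plotting position, rank/(n + 1)."""
    x = np.asarray(x, dtype=float)
    return rankdata(x) / (len(x) + 1)
```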

Most entropic canonical copula with POME-based marginals (or empirical marginals)

In this approach, copulas were derived from the entropy theory with the constraints \( E(U) = E(V) = \frac{1}{2} \), \( E(U^{2} ) = E(V^{2} ) = \frac{1}{3} \), and \( \rho_{\text{spearman}} = 0.9287 \) (sample Spearman’s rho), with the parameters listed in Table 5. From Eqs. (24)–(31), it is seen that the dependence structure (i.e., Spearman’s rho in this sample study) was the controlling factor in optimizing the objective function of the MECC. Thus, regardless of how the marginals were handled, the MECC would not change for a given Spearman’s rho, as shown in Table 5. To further evaluate the impact of the Spearman’s rho correlation coefficient, we changed the constraint to the population Spearman’s rho (i.e., ρ = 0.9298 from the true Gumbel–Hougaard copula). As seen in Table 5, this produced a significant difference in the Lagrange multipliers estimated for the MECC.

Table 5 Parameters estimated for MECC copula
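The MECC can be derived numerically by minimizing the convex dual of the entropy maximization under the stated constraints. The sketch below is our own illustration (trapezoidal quadrature on a uniform grid, SciPy's BFGS minimizer, and the hypothetical name `build_mecc`), with E(UV) obtained from Spearman's rho as E(UV) = (ρ + 3)/12:

```python
import numpy as np
from scipy.optimize import minimize

def build_mecc(rho_s, n_grid=201):
    """Derive MECC multipliers for the constraints E(U) = E(V) = 1/2,
    E(U^2) = E(V^2) = 1/3, and E(UV) = (rho_s + 3)/12 by minimizing the
    convex dual of the entropy maximization."""
    a = np.array([0.5, 1.0 / 3.0, 0.5, 1.0 / 3.0, (rho_s + 3.0) / 12.0])
    g = np.linspace(0.0, 1.0, n_grid)
    U, V = np.meshgrid(g, g)
    feats = np.stack([U, U ** 2, V, V ** 2, U * V])   # constraint functions
    w = np.ones(n_grid)
    w[0] = w[-1] = 0.5
    W = np.outer(w, w) * (g[1] - g[0]) ** 2           # trapezoid weights on [0,1]^2

    def log_integral(lam):
        expo = -np.tensordot(lam, feats, axes=1)
        c = expo.max()                                # guard against overflow
        return c + np.log(np.sum(W * np.exp(expo - c)))

    lam = minimize(lambda l: log_integral(l) + l @ a, np.zeros(5)).x
    lam0 = log_integral(lam)                          # log normalizing constant

    def density(u, v):
        f = np.stack([u, u ** 2, v, v ** 2, u * v])
        return np.exp(-lam0 - np.tensordot(lam, f, axes=1))

    return lam0, lam, density
```

At the optimum the gradient of the dual vanishes, so the density reproduces the imposed moments (in particular E(UV), and hence Spearman's rho) up to the optimizer tolerance.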

To assess whether the MECC so derived fulfilled the fundamental properties of a copula function, i.e.,

$$ C(u,1) = u;\quad C(1,v) = v, $$
(41)

Figure 2 compares the marginal variables computed using Eq. (41) from the MECC with both empirical and POME-based univariate marginals. As seen in Fig. 2, C(u,1) and C(1,v) were in good agreement with their empirical and POME-based univariate marginals. This also implied the appropriateness of POME-based univariate marginals derived using the first four and three non-central moments for random variables X and Y, respectively.

Fig. 2

Comparison of empirical and POME-based univariate marginals to \( C\left( {u,1} \right) \) and C(1,v) computed from MECC

With the fundamental properties fulfilled by the derived MECC, one may further evaluate its goodness-of-fit. Here, the Kendall distribution plots were generated and compared to the Kendall distribution of the underlying true Gumbel–Hougaard copula:

$$ K_{\theta } (t) = \frac{t(\theta - \ln \;t)}{\theta }. $$
(42)

The Kendall distribution [\( K_{\varvec{\lambda}} (t) \)] for MECC may be approximated following the procedure discussed in Genest et al. (2009) as follows:

  1.

    Generate random variables [U1, U2] of sample size N from the derived MECC, where N is greater than the sample size of the observed dataset.

  2.

    Approximate \( K_{\varvec{\lambda}} (t) \) using:

$$ V_{i} = \frac{1}{N}\sum\limits_{j = 1}^{N} {\mathbf{1}\left( {U_{1j} \le U_{1i} \cap U_{2j} \le U_{2i} } \right)} ;\quad i = 1,2, \ldots ,N $$
(43a)
$$ K^{\text{approx}} (t) = \frac{1}{N}\sum\limits_{i = 1}^{N} {\mathbf{1}\left( {V_{i} \le t} \right)} ;\quad t \in [0,1] $$
(43b)
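The two-step approximation of Eqs. (43a)–(43b) can be sketched as follows (our own illustration; note the O(N²) cost of computing the \( V_{i} \)):

```python
import numpy as np

def kendall_distribution(u1, u2, t_values):
    """Approximate the Kendall distribution K(t) via Eqs. (43a)-(43b)."""
    n = len(u1)
    # Eq. (43a): V_i = fraction of sample points dominated by point i in both coords
    v = np.array([np.mean((u1 <= u1[i]) & (u2 <= u2[i])) for i in range(n)])
    # Eq. (43b): empirical CDF of the V_i
    return np.array([np.mean(v <= t) for t in t_values])
```

As a quick check, samples from the independence copula should approximately reproduce the known Kendall distribution K(t) = t − t ln t.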

Comparisons shown in Fig. 3 lead to the following conclusions: (i) both the fitted Gumbel–Hougaard (GH) copula and the derived MECC properly represent the true GH (θ = 4.5) population, as indicated by the Kendall distribution plots; (ii) visually, there is minimal difference between the GH copula fitted with empirical marginals (b) and the MECC derived from the sample or population Spearman’s rho (c and d); (iii) this minimal difference arises because the rank-based empirical distribution does not impose any external bias on parameter estimation for the parametric copulas, and because the MECC derived here relies not on the actual marginal values but on the population moments about the origin of the uniformly distributed variables; and (iv) though the POME-based marginals represent the univariate random variables well, they do introduce external bias into the estimation of the parametric copulas (a).

Fig. 3

Comparison of Kendall distribution from fitted Gumbel–Hougaard copula and MECC to that of the true Gumbel–Hougaard copula

Overall, from the bivariate analysis of the sample data, the MECC may be directly applied to model the dependence structure of the random variables. In the case of the MECC application, the impact of the marginal distributions is eliminated. In the next section, we use real watershed data as a case study to further illustrate the copula–entropy theory as well as risk analysis.

Case study with real watershed data

Collected from Flume 1 at Walnut Gulch Watershed in Arizona, the annual maximum flood data [i.e., peak discharge (Q) and flood volume (V)] from 1957 to 2012 were considered for the case study. Based on the findings from analysis of sample data, the case study proceeded as follows: (i) the POME-based univariate distribution was applied to model the univariate peak discharge and flood volume; and (ii) the MECC was applied to model the dependence between peak discharge and flood volume.

Univariate analysis of peak discharge and flood volume

As discussed in the “Univariate analysis of peak discharge and flood volume” section, the moments about the origin of the scaled variables were considered as constraints to capture the shape and mode of the univariate flood variables. Choosing d = 0.1 in Eq. (36), Table 6 lists the sample statistics for the scaled variables. In Table 6, T and P denote the test statistic and the corresponding P value used to evaluate whether the kurtosis was significantly different from 3 using

Table 6 Sample statistics for scaled peak discharge and flood volume
$$ \gamma_{2}^{\text{ex}} = \gamma_{2} - 3 $$
(44a)
$$ G_{2} = \frac{n - 1}{(n - 2)(n - 3)}\left( {(n + 1)\gamma_{2}^{\text{ex}} + 6} \right) $$
(44b)
$$ T = \frac{{G_{2} }}{\text{SEK}};\quad {\text{SEK}} = 2\sqrt {\frac{{6n(n - 1)^{2} }}{{(n - 2)(n + 5)(n^{2} - 9)}}} . $$
(44c)

In Eqs. (44a)–(44c), γ2 and \( \gamma_{2}^{\text{ex}} \) denote the sample kurtosis and excess kurtosis, respectively; n is the sample size; SEK is the standard error of kurtosis; and T is the test statistic, which follows the standard normal distribution.
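The test of Eqs. (44a)–(44c) can be sketched as follows (our own illustration, assuming, as in the standard G2 estimator, that Eq. (44b) is applied to the excess kurtosis; the function name is ours):

```python
import numpy as np

def kurtosis_test(x):
    """Test statistic of Eqs. (44a)-(44c) for H0: kurtosis = 3."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()
    gamma2 = np.mean(dev ** 4) / np.mean(dev ** 2) ** 2           # sample kurtosis
    g2_ex = gamma2 - 3.0                                          # Eq. (44a)
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2_ex + 6)    # Eq. (44b)
    sek = 2.0 * np.sqrt(6.0 * n * (n - 1) ** 2
                        / ((n - 2) * (n + 5) * (n ** 2 - 9)))     # Eq. (44c)
    return G2 / sek           # approximately standard normal under H0
```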

Results in Table 6 indicate that the first three moments about the origin were sufficient to derive the POME-based distributions for the scaled peak discharge and flood volume variables. The optimized Lagrange multipliers are listed in Table 7. Figure 4 compares the POME-based probability density with the histogram, as well as the POME-based CDF with the empirical CDF. The comparisons confirmed the appropriateness of the POME-based univariate distributions.

Table 7 Lagrange multipliers for POME-based univariate distribution
Fig. 4

Comparison of POME-based probability density function to the histogram

Bivariate flood frequency analysis with MECC

Let U and V represent the univariate marginals of peak discharge and flood volume, respectively. The same constraints used to construct the MECC for the sample data [i.e., \( E\left( U \right),E\left( {U^{2} } \right),E\left( V \right),E\left( {V^{2} } \right), E(UV) \)] were applied to model the dependence of peak discharge and flood volume. The Lagrange multipliers were optimized by minimizing the objective function of Eq. (31a) with b = 0.

With the observed data, the sample Spearman’s rho was computed as \( \hat{\rho }_{\text{spearman}} = 0.9419 \), and E(UV) was approximated from it as \( E(UV) = \frac{{\rho_{\text{spearman}} + 3}}{12} \approx 0.3285 \). With these constraints, the copula density function of the MECC for modeling the bivariate flood frequency was obtained as

$$ c(u,v) = \exp ( - 1.8194 - 1.1352u - 44.9511u^{2} - 1.1352v - 44.9511v^{2} + 92.1726uv). $$
(45)

Using Eq. (45), Fig. 5 compares (a) \( C\left( {u,1} \right) \) and C(1,v) with the corresponding empirical and POME-based marginals, and (b) the approximated parametric Kendall distribution of the MECC with the empirical Kendall distribution. The comparisons in Fig. 5 indicate that (a) the constructed MECC successfully fulfilled the copula properties \( C\left( {u,1} \right) = u \) and \( C\left( {1,v} \right) = v \); and (b) there was good agreement between the empirical and parametric (i.e., MECC) Kendall distributions, which indicated the appropriateness of the constructed MECC. Applying the POME-based univariate distributions, Fig. 6 plots the simulated random variates versus the observed random variables. Figure 6 shows that the dependence structure was well preserved with the application of the MECC and the POME-based marginals. To further compare the MECC with the empirical copula, Fig. 7 compares the copula and the survival copula at different contour levels; the plot on the left is for the copula function, while the plot on the right is for the survival copula. As shown in Fig. 7, there is good agreement between the contours obtained from the empirical copula (and its survival copula) and the MECC (and its survival copula) at different probability levels. This finding further assures the appropriateness of the MECC.

Fig. 5

Comparison of \( C\left( {u,1} \right), \;C(1,v) \), and the Kendall distribution

Fig. 6

Simulation results

Fig. 7

Comparison of the empirical copula to the MECC

Risk analysis

Having established that the MECC may be applied to model the dependence of the annual flood sequences (i.e., peak discharge and flood volume), one may proceed to study the associated risk measure, which may be used as an engineering design criterion. In hydrology and water engineering, risk has commonly been assessed through the return period. In what follows, the joint return period of the “AND” case was applied for risk analysis. The joint return period of the “AND” case is given as

$$ T_{\text{and}} = \frac{\mu }{P(X \ge x,Y \ge y)} = \frac{\mu }{{1 - F_{X} (x) - F_{Y} (y) + C\left( {F_{X} (x),F_{Y} (y)} \right)}}. $$
(46)
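The joint “AND” return period above can be sketched as follows (our own illustration, assuming the Gumbel–Hougaard copula as C and μ = 1 year for annual maxima):

```python
import numpy as np

def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C(u, v)."""
    return np.exp(-(((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta)))

def t_and(Fx, Fy, copula, mu=1.0):
    """Joint 'AND' return period: mu / P(X >= x, Y >= y)."""
    return mu / (1.0 - Fx - Fy + copula(Fx, Fy))
```

For positively dependent variables, the “AND” return period at Fx = Fy = 0.9 lies between the comonotone bound (10 years) and the independence value (100 years).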

To assess the “AND”-case return period, the peak discharge and flood volume with univariate CDF values of P = 0.8, 0.9, 0.96, and 0.98 were used. Given the limitation of the sample size (n = 56), P = 0.99 was not chosen for the study. Table 8 lists the joint return periods estimated from both the empirical copula and the MECC. Results in Table 8 indicated the following:

Table 8 Joint CDF and Tand estimated from the empirical copula and MECC
  (i)

    There was a small difference between the joint CDFs computed from the empirical copula and the MECC. The absolute relative difference ranged from 0.96% for C(0.8,0.8) to 2.17% for C(0.8,0.9). Thus, in regard to the joint CDF, the differences were insignificant.

  (ii)

    Though the differences in the estimated joint CDF may not be significant, they resulted in larger differences in the “AND”-case return period. With increasing marginal probability, the discrepancy between the Tand estimated from the empirical copula and that from the MECC also increased.

  (iii)

    An interesting finding, on which the Tand estimates from the empirical copula and the MECC agreed, was the following. Using volume = 6.44 × 10^5 m^3 (corresponding to P = 0.96) as an example, the joint return period computed with the smaller peak discharge (e.g., Q = 73.2 cms, corresponding to P = 0.8) was less than that computed with the larger peak discharge (e.g., Q = 109.9 cms). This is expected, since (Q ≥ 73.2 cms and V ≥ 6.44 × 10^5 m^3) is more likely to occur simultaneously than (Q ≥ 109.9 cms and V ≥ 6.44 × 10^5 m^3). The finding also agrees with the observation that a higher discharge is most likely associated with a higher flood volume. The same behavior occurred for a large flood volume with a relatively low peak discharge and vice versa.

Discussion and conclusions

In this study, we investigated the copula–entropy theory in bivariate analysis. Using sample data with known univariate populations (i.e., gamma and lognormal) and known dependence (Gumbel–Hougaard), it is concluded that the derived POME-based distributions model the univariate distributions well. There is minimal difference between the POME-based distribution derived from the moments of the observed variable and that derived from the scaled variable (i.e., the observed variable scaled to [0,1]). To avoid improper integrals, the scaled variable is suggested for deriving the POME-based distribution. Compared with the true Gumbel–Hougaard copula, the MECC derived using the constraints E(U), E(U2), E(V), E(V2), and E(UV) properly models the dependence structure of the sample data. The constructed MECC successfully fulfills the fundamental properties of a copula, i.e., C(u,1) = u and C(1,v) = v. In addition, the derived MECC well represents the true dependence structure given by the Gumbel–Hougaard copula.

Using the real watershed data (i.e., Flume 1 at Walnut Gulch, Arizona), the case study shows the appropriateness of the POME-based univariate distribution of the scaled variable for modeling the univariate distributions of the observed variates. With the constraints E(U), E(U2), E(V), and E(V2) converging to the population moments of the uniformly distributed variables, \( E(U^{i} ) = E(V^{i} ) = 1/(i + 1) \), the constructed MECC depends only on the rank-based dependence measure (in this case, Spearman’s rho). The derived MECC properly models the dependence of annual peak discharge and flood volume and is independent of the marginal distributions (non-parametric or parametric). The evaluation of flood risk (using the “AND”-case return period) indicates that the MECC reasonably represents the change of the “AND”-case return period.

Overall, the study concludes as follows:

  (i)

    For the bivariate random variables investigated, the MECC may be easily and efficiently applied to model the dependence structure. Unlike other copulas, the MECC is uniquely defined for a given set of constraints. Its uniqueness allows one universal solution for the proposed frequency analysis.

  (ii)

    Similar to other copula families (e.g., Archimedean copulas, meta-elliptical copulas, vine copulas, etc.), the MECC may be applied for multivariate analysis in hydrology and water engineering, including multivariate rainfall analysis, multivariate drought analysis, spatial analysis of drainage networks, and spatial analysis of water quality, to name a few examples.

  (iii)

    The bivariate MECC may be easily extended to higher dimensions. For example, for the d-dimensional variables \( \left[ {X_{1} ,X_{2} , \ldots ,X_{d} } \right] \) with marginals \( U_{i} = F_{i} (X_{i} ),i = 1,2, \ldots ,d \), the MECC may be constructed using the marginal constraints \( E(U_{i}^{r} ) = 1/(r + 1),\quad i = 1,2, \ldots ,d \), and the pair-wise constraints \( E(U_{i} U_{j} );\quad i,j \in [1,d],i \ne j \), estimated from the rank-based Spearman coefficient of correlation. The same optimization procedure applied in the bivariate case may then be used to construct the MECC for the dependence structure in higher dimensions.