1 Introduction

In the last two decades the concept of copulas has become well-established for describing multivariate relationships between several attributes of a system. Copula models have been successfully applied in many fields, for instance in the fields of finance (Embrechts 2009), econometrics (Fan and Patton 2014), system reliability (Zhang and Wilson 2016) and astronomy (Vio et al. 2020) giving deep insights in the multivariate structure of the data. Copula-based approaches allow to split the structure of multivariate distributions into two parts: first, the one-dimensional marginal distributions of the attributes separately, and secondly, the copula which provides a complete description of the dependencies of all attributes among each other. For example in finance, copula approaches enables to estimate the risk of port-folios of assets more precisely, by providing insights on the dependencies between assets.

Recent studies have emphasized the potential of copulas for modelling multivariate distributions in ecology (Ghosh et al. 2020, and 2021; Anderson et al. 2019). Copula approaches have been utilized to model the relationship between several environmental quantities. Thereby, studies broaden knowledge on copulas presenting methodical frameworks to use copulas (Anderson et al. 2019), and discussing new findings and challenges applying them in ecology (Ghosh et al. 2020). She and Xia (2018) modelled the relationship between drought duration and severity by standard Archimedean copulas to investigate drought processes on the Loess Plateau of China, see also Dayal et al. (2020). In Anderson et al. (2019), fish population abundances are considered. Emura and Michimae (2017) analyzed data of salamander metamorphoses where the copula describes the dependence of the two variables in the censoring setup. The paper by Chang and Joe (2019) deals with vine copulas and their application to abalone data, see also Michimae et al. (2020). Since copulas have only been applied to a limited number of ecological field data yet, insights are still lacking on the generality of those statements, concerning different modelling approaches as well as regarding different types of observations.

Earlier studies mostly considered Archimedean or/and elliptical copulas (Ghosh et al. 2020; Anderson et al. 2019; She and Xia 2018 among others). One main characteristic of these copulas is that they are exchangeable, which means that the copula remains the same if the components are permuted. As a consequence of exchangeability, their bivariate marginal distributions are symmetric with regard to the main diagonal, and all bivariate correlations are identical. However, scatterplots and empirical correlations of practical data often do not show these symmetries. Recent studies show that copulas in ecology are typically asymmetric (see for instance Ghosh et al. (2020), Dayal et al. (2020)). In the paper by Ghosh et al. (2020), the rationale behind the asymmetries in the empirical copula is investigated. Here asymmetry of copulas means that they do not feature symmetric properties explained in Section 5.1. In this paper, we employ product copula models according to Liebscher (2008) to overcome the disadvantages of Archimedean or elliptical copula models. We show the great potential of product copulas for modelling the distribution of ecological data.

The aim of this paper is twofold: first we introduce a copula-based methodology of fitting multivariate distributions. We consider data vectors of an arbitrary dimension contrary to most papers on this subject. Secondly, we analyze data from a widely used ecological database of vegetation attributes to give an example of how to utilize the proposed methodology. In contrast to other papers, the model parameters are fit by means of a minimum distance method by use of Cramér-von-Mises divergence. This method exhibits some advantages. We do not need copula densities having a sophisticated and numerically inconvenient structure in many cases, especially in higher dimensions and for product copulas. Moreover, the minimum distance method provides an approximation coefficient assessing the fit quality. Earlier studies have focused on the maximum likelihood method to fit the copulas (see Hofert et al. (2021) and Ghosh et al. (2020), for example) which appears to be an alternative method. The maximum likelihood method gives theoretically best results whenever the underlying distribution belongs to the model family. The latter is rather seldom the case in practice. The minimum Cramér-von-Mises distance estimation method puts emphasis on fitting the copula rather than parameters, see Weiß (2011) for a comparison of several estimation methods.

The proposed approach is used to describe dependencies of a plant community. In vegetation ecology, a community of interacting plants can be described by the distribution of abundant plant traits. Plant traits (measured at the individual plant level) specify morphological, anatomical, biochemical, physiological or phenological properties of a plant (Violle et al. 2007) and characterize, for example, growth dynamics and functions of the vegetation (Kattge et al. 2011, 2020; Violle et al. 2007). The development of large databases like the global plant trait network GlopNet (Wright et al. 2004), the plant trait data-base for invasive species BiolFlor (Kü hn et al., 2004), the LEDA data-base of life-history traits (Kleyer et al. 2008), and the global plant trait database TRY (Kattge et al. 2011, 2020) enabled new insights on plant properties at the local scale, across biomes and to the global scale. Earlier studies found that the (marginal) distributions of plant traits tend to be lognormal (Kattge et al. 2011), but can markedly change due to climate (Butler et al. 2017), management regime (Herz et al. 2017), and fine-scale soil conditions (Bruelheide et al. 2018).

To investigate links between plant traits, one typical approach was to correlate empirical trait data pairwise (e.g., by using linear or power-law regression) and investigate the coefficient of determination (Paul et al. 1999; Reich et al. 1992, 1997). Theories about plant strategies (e.g., (Grime 1979)) were tested and revealed trade-offs between plant traits related to environmental conditions. For example, plants with large seeds tend to have a higher chance to establish in a community. In turn, the number of their seeds is usually lower resulting in a reduced probability that soil conditions promote their establishment (Heisse et al. 2007). More recent studies investigated plant traits not only pairwise but also in a higher dimension (Wright et al. 2004) and showed trade-offs in multivariate distributions.

However, going to higher dimensions is still challenging. Firstly, datasets can have limitations in the number of pairwise corresponding trait samples for each species. A method to reduce the range of attributes to those that have a high influence is, for example, the Principal Component Analysis (PCA) (Lever et al. 2017). Studies revealed plant traits associated with resource management (leaf traits that determine capture, usage and release of resources like light, carbon and nitrogen) as well as size related traits (e.g., leaf size, canopy height) to mainly determine plant ecosystem functioning (Díaz et al. 2004).

Second, skewed (marginal-) distributions of traits can span large ranges. In contrast to other multivariate statistics, which exclude outer ranges of observed values as outliers, copulas are able to cover any distribution (Anderson et al. 2019), not only separately but also combined in a variable number of dimensions.

Here we used a Germany-wide dataset of measured plant traits with a focus on 20 herbaceous species (Herz et al., 2017) and investigate marginal distributions of plant traits independently and combined in copulas to answer the following questions:

  •       Which statistical marginal distributions can be used to describe plant attributes?

  •       How do plant traits differ in their marginal distributions?

  •       Which plant traits correlate with each other under the given marginal distributions?

The analyzed dataset is part of the global trait database TRY (Kattge et al. 2011, 2020) which covers plant ecosystems like forests, croplands and grasslands worldwide. We introduce a new methodology of modeling multivariate distributions of data vectors from natural science applications. This methodology is rather general and flexible for data with diverse origins, and is demonstrated here on the example of the TRY database. By this, we developed a novel copula not modeled so far.

The paper is structured as follows: In Section 2 we present a description of the database used for numerical studies. Section 3 is devoted to the modelling of marginal distributions where the focus is on the Weibull distribution and on the Gamma distribution. Modelling the copula is subject of Section 4. Details to the results are collected in the appendices.

2 Ecological trait data: An exemplary data set from TRY

TRY plant trait database is one of the main plant trait databases (Kattge et al. 2011, 2020). The globally collected data includes plant attributes on an individual level across different ecosystems such as forests, crops and grasslands, and across climatic gradients.

In this study, we used a dataset on individual plant traits of 20 species (10 grass species and 10 forbs) in managed grasslands (Herz et al. (2017), two censuses measured in 2014 and 2015). Plant traits were measured across a climatic gradient in Germany covered by three study sites of the German Biodiversity Exploratories ((Fischer et al. 2010), Schorfheide-Chorin, Hainich, Schwäbische Alb). We focused the analysis on five plant traits explained in Table 1.

Table 1 Table of attributes

RSR was calculated as the fraction of the measured ‘dry mass aboveground biomass’ [g] and ‘dry mass roots total’ [g] (Herz et al. 2017). SLA was determined as leaf area per leaf dry mass. LCNC was calculated by the ratio of leaf carbon concentration to leaf nitrogen concentration and RCNC analogously by the ratio of root carbon concentration and root nitrogen concentration. HW was determined by the ratio of plant standing height [mm] and the maximum of plant diameter in north-south direction [mm] and east-west direction [mm]. All traits were derived for each census and each study site separately. The analyzed data are presented in a shortened form in Table 4.

3 Multivariate distributions

Let \(Y=(Y^{(1)},\ldots ,Y^{(d)})^{T}\) be a d-dimensional random vector having the joint distribution function H:

$$\begin{aligned} H(y_{1},\ldots ,y_{d})={\mathbb {P}}\left\{ Y^{(1)}\le y_{1},\ldots ,Y^{(d)}\le y_{d}\right\} \quad \text {for }y_{i}\in {\mathbb {R}}\text {.} \end{aligned}$$

In our context \(Y^{(1)},\ldots ,Y^{(d)}\) are the measured values of the traits. They are comprised in the data vector Y. All probabilities related to the random vector Y can be computed in terms of H. We denote the marginal distribution function of \(Y^{(j)}\) by \(F_{j}\) (\(j=1,\ldots ,d\)):

$$\begin{aligned} F_{j}(t)={\mathbb {P}}\left\{ Y^{(j)}\le t\right\} \quad \text {for }t\in {\mathbb {R}}\text {.} \end{aligned}$$

We assume that the marginal distributions are continuous. Then \(f_{j}\) denotes the marginal density of \(Y^{(j)}\):

$$\begin{aligned} f_{j}(t)=F_{j}^{\prime }(t)\quad \text {for }t\in {\mathbb {R}}\text {.} \end{aligned}$$

Sklar’s theorem (Sklar, 1959) implies that

$$\begin{aligned} H(y_{1},\ldots ,y_{d})=C(F_{1}(y_{1}),\ldots ,F_{d}(y_{d}))\quad \text {for } y_{i}\in {\mathbb {R}}\text {.} \end{aligned}$$
(1)

This formula shows that the joint distribution function comprises two parts: the marginal distribution functions \(F_{1},\ldots ,F_{d}\) and the so-called copula C. The copula C describes the dependence of the attributes irrespectively of the marginal distributions. Therefore, the methodology is divided into two parts related to the modeling of the marginal distributions and the copula, respectively. The challenge of the approach is to find suitable parametric models for \(F_{1},\ldots ,F_{d}\) and C. Using these functions, probabilities of rectangles \([y_{1},z_{1}]\times \ldots \times [y_{d},z_{d}]\) can be evaluated by

$$\begin{aligned}&{\mathbb {P}}\left\{ y_{1}\le Y^{(1)}\le z_{1},\ldots ,y_{d}\le Y^{(d)}\le z_{d}\right\} \\&\quad =\sum _{\delta _{1}=0}^{1}\ldots \sum _{\delta _{d}=0}^{1}(-1)^{d+\delta _{1}+\ldots +\delta _{d}}C(F_{1}(w_{\delta _{1},1}),\ldots ,F_{d}(w_{\delta _{d},d})), \end{aligned}$$

where \(w_{0,j}=y_{j},w_{1,j}=z_{j}\). This formula shows how to compute probabilities using \(F_{1},\ldots ,F_{d}\) and C.

In the next section we consider the problem of modelling and fitting the marginal distributions by use of the data. Modeling and fitting the copulas is then the subject of Section 5.

4 Analysis of the marginal distributions

4.1 Methodology of fitting the marginals

In a first step the data is divided into several subsets. Each subset corresponds to one class, here to one species. \(m_{c}\) denotes the median (alternatively the mean) of class c. In the following we consider the measured values of trait j. Let \(Z_{1},\ldots ,Z_{n}\) be the data sample of the considered trait; i.e. \(Z_{i}\) is the i-th measured value of the trait under consideration. \(c(1)\ldots c(n)\) denote the classes of the sample items. For each class c, we compute the empirical median \({\hat{m}} _{c}\).

In the second step we normalize the data by:

$$\begin{aligned} {\tilde{Z}}_{i}=\frac{Z_{i}}{{\hat{m}}_{c(i)}} \end{aligned}$$
(2)

for \(i=1,\ldots ,n\) in order to make the data comparable. Normalized data have the property that the empirical median equals 1.

In the last step, the Weibull distribution or the gamma distribution (see Appendix A) is fit to the transformed data \({\tilde{Z}}_{1},\ldots ,{\tilde{Z}} _{n}\) using the maximum-likelihood method. According to the maximum-likelihood method, the estimated parameters of \(\theta _{j}\) are evaluated by:

$$\begin{aligned} {\hat{\theta }}=\arg \max _{\theta \in \Theta _{1}}\sum _{i=1}^{n}\ln f_{j}( {\tilde{Z}}_{i}\mid \theta ) \end{aligned}$$
(3)

where \(f_{j}(.\mid \theta )\) denotes density model of trait j and \(f_{j}\) is the Weibull or the gamma distribution. In case of Weibull or gamma distribution, no explicit formulas for the estimators are available. The minimization problem (3) has to be solved numerically deploying an efficient optimization algorithm (see e.g. Meeker and Escobar (1998), Chapters 8 and 11). To find the best model, one can compare the values of the Akaike and the Bayesian information criterion (AIC, BIC). The smallest value gives the best model in the considered case.

Finally, \(f_{j}(.\mid {\hat{\theta }})\) is the best-fit model density for the normalized marginal data. For the original data, the best-fit density for the original marginal data of class c is \(t\mapsto {\hat{m}}_{c}^{-1}f_{j}(t~ {\hat{m}}_{c}^{-1}\mid {\hat{\theta }})\) and its median is about \({\hat{m}}_{c}\).

The computations were performed using R software. For distribution fitting and goodness-of-fit, the packages MASS (function fitdistr) and goftest are available in R.

4.2 Results for marginal distributions of the observed data

Looking at the density plots of the variables for the various species, we see that shapes of these curves are rather similar, see Figure 5 in Appendix A2 depicting the densities of the variable RSR. Further in case of some species, there is a sizable difference between the mean and the median being an indicator for the presence of outliers (cf. Table 5 in Appendix A2, exemplified for variable RSR and several grass species). We found similarities between species in density plots of all tested traits. Hence, the whole transformed data of any trait can be regarded all as coming from the same statistical population of the trait. Its distribution can be considered as a baseline one.

The results of the baseline distribution fitting are provided in Tables 6 and  7, see Appendix A2. All the estimated distribution models are checked using the Anderson-Darling goodness-of-fit test (AD-Test in Tables 6,7). All checks were successful since all p-values are higher than 0.05. Additionally, the AIC and BIC show mostly a small difference such that both models, gamma and Weibull distribution, are reasonable for modeling marginal distributions in most cases. Throughout the results reveal that the estimated models give good approximations of the marginal distributions.

4.3 Discussion

The mean is widely used as location parameter of the distribution. If outliers are present in the dataset, then the median should be preferred over the mean as description of the location since the median represents a more robust statistic. Because of the variety of the species medians \(m_{j}\) , normalization according to (2) makes the trait values comparable.

In our investigations concerning the marginal distributions, we found that two types of one-dimensional distributions are relevant in the context of the TRY database: the Weibull distribution and the Gamma distribution. These two distribution families play an important role in modelling positive random variables in the framework of biology and ecology. This is shown in the papers by Taubert et al. (2013), by Hagey et al. (2016) and by de Freitas Costa et al. (2021), for example. The Anderson-Darling goodness-of-fit test is proved to be a powerful method for testing how good the distribution fit is, see Stephens (1986). In this book alternative tests are discussed such as the Kolmogorov test and the Cramér-von Mises test.

5 Methodology of modeling and fitting the copula

5.1 Basics of copulas

Formula (1) provides a link between the joint distribution function H of the random vector \(Y=(Y^{(1)},\ldots ,Y^{(d)})^{T}\) and the copula. Suppose that the marginal distributions are continuous (i.e. \(F_{m}\) has a density \(f_{m}\)). Then the copula C is uniquely determined. A function \(C:[0,1]^{d}\rightarrow [0,1]\) is a copula if it has the following properties (cf. Nelsen 2006, p. 45):

$$\begin{aligned}&\text {1) }C(u) =0\text { if }u_{j}=0\text { for any }j, \\&\text {2) }C(u) =u_{j}\text { if }u_{l}=1\text { for all }l\ne j\text { and any }j, \\&\text {3)}\qquad \left. \sum _{\delta _{1}=0}^{1}\ldots \sum _{\delta _{d}=0}^{1}(-1)^{d+\delta _{1}+\ldots +\delta _{d}}C(u+\delta \cdot h)\ge 0\right. \end{aligned}$$

for all \(u,h\in [0,1]^{d}:u+h\in [0,1]^{d}\), where \(\delta =(\delta _{1},\ldots ,\delta _{d})^{T},u=(u_{1},\ldots ,u_{d})^{T}\).

Condition 3) ensures that C is a multivariate distribution function. This implies that C is increasing in each component. Moreover, according to condition 2), C has marginals which are uniformly distributed on [0, 1] (see Fig. 1). Concerning the theory of copulas we refer to the monograph by Nelsen (2006).

Fig. 1
figure 1

three-dimensional plot of a bivariate Frank copula \(C(u_1,u_2)\) with parameter 6, see Table 8 of copula models

We introduce

$$\begin{aligned} U_{j}=F_{j}(Y^{(j)})\ \ \text { for }j=1\ldots d. \end{aligned}$$

Then \(U_{j}\) has a uniform distribution on [0, 1]. The random vector \( U=(U_{1},\ldots ,U_{d})^{T}\) has the distribution function C. Thus the random vector U involves the information about the dependence of the attributes. In bivariate copula plots this idea is utilized.

There are several copula-based measures describing the dependence of two components. Spearman’s \(\rho _{S}\) and Kendall’s \(\tau \) are the most popular coefficients in this framework. The formulas for the population version of the coefficients are given by

$$\begin{aligned} \rho _{S}= & {} 12\int _{0}^{1}\int _{0}^{1}C(u,v)~\text {d}u\text {d}v-3, \\ \tau= & {} 4\int _{0}^{1}\int _{0}^{1}C(u,v)~\text {d}C(u,v)-1, \end{aligned}$$

where C is the copula of two attributes under consideration. Both association coefficients can be estimated by use of the data.

Two multivariate copulas are of special interest, especially in Proposition 5.1:

$$\begin{aligned} {independent copula }\text {: }&\Pi (u)=u_{1}u_{2}\ldots u_{d}, \\ {comonotonicity copula }&:&M(u)=\min \{u_{1},u_{2},\ldots ,u_{d}\} \end{aligned}$$

The copula M is also known as Fréchet-Hoeffding upper bound copula. If U has distribution M, then \(U_{1}=U_{2}=\ldots =U_{d}\) holds with probability 1. Theorems 2.10.14 and 5.1.9 in Nelsen (2006) lead to the following proposition:

Proposition 5.1

  1. 1)

    If \(Y^{(1)},\ldots ,Y^{(d)}\) are independent then

  2. a)

    \(U_{1},\ldots ,U_{d}\) are independent and \(C=\Pi \),

  3. b)

    all bivariate Spearman’s \(\rho _{S}\) and Kendall’s \(\tau \) of two components of Y are equal to 0.

  4. c)

    If \(C=\Pi \) then \(Y^{(1)},\ldots ,Y^{(d)}\) are independent.

  5. 2)

    Each of the variables \(Y^{(1)},\ldots ,Y^{(d)}\) is a strictly increasing function of any of the others with probability 1 if and only if \(C=M\). Furthermore, in this case, all Spearman’s \(\rho _{S}\) and Kendall’s \(\tau \) of two components of Y equal 1.

From the point of view of the visualization of the distribution and the dependence, a diagram of the copula density gives more information, see figure below. The copula density is evaluated by

$$\begin{aligned} c(u)=\frac{\partial ^{d}}{\partial u_{1}\ldots \partial u_{d}}C(u)\text { for }u\in [0,1]^{d}\text {.} \end{aligned}$$
Fig. 2
figure 2

three-dimensional plot of the density \(c(u_1,u_2)\) of a bivariate Frank copula with parameter 6

Several concepts of symmetry of copulas can be distinguished (see Nelsen (1993)). We mention only two of them here. A copula is referred to as exchangeable, if

$$\begin{aligned} C(u_{1},\ldots ,u_{d})=C(u_{\pi (1)},\ldots ,u_{\pi (d)}) \end{aligned}$$

for any permutation \((\pi (1),\ldots ,\pi (d))\) of \((1,\ldots ,d)\) and all \( u\in [0,1]^{d}\). Interchanging two components \(u_{j}\) leads to the same copula. Especially in the case \(d=3\), exchangeability means

$$\begin{aligned} C(u,v,w)= & {} C(v,u,w),\ \ C(u,v,w)=C(u,w,v),\\ C(u,v,w)= & {} C(w,v,u)\text { for }u,v,w\in [0,1]. \end{aligned}$$

Corresponding equalities hold true for copula densities. Exchangeability is one kind of symmetry of a copula. As a consequence, all bivariate correlations of exchangeable copulas are identical. All the commonly used copula models of Table 8 in Appendix B are exchangeable.

A copula is called radially symmetric (or tail symmetric), if \( (U_{1},\ldots ,U_{d})^{T}\) and \((1-U_{1},\ldots ,1-U_{d})^{T}\) have the same distribution. In the case \(d=2\), bivariate copulas are radially symmetric if C coincides with the survival copula:

$$\begin{aligned} C(u,v)=u+v-1+C(1-u,1-v)\ \ \text {for }u,v\in [0,1]. \end{aligned}$$

The bivariate Frank copula is the only one of Table 8 which is radially symmetric. Elliptical copulas are radially symmetric, too.

5.2 Product copula models

The classical copulas of Table 8 in Appendix B are used as basic models for the construction of product models. The theory of this model class of asymmetric copulas was established in Liebscher (2008). Let \(C_{1}\) and \( C_{2}\) be two copulas. Then the formula for the product copula is given by

$$\begin{aligned} C(u)=C_{1}(u_{1}^{\alpha _{1}},\ldots ,u_{d}^{\alpha _{d}})\cdot C_{2}(u_{1}^{1-\alpha _{1}},\ldots ,u_{d}^{1-\alpha _{d}}) \end{aligned}$$
(4)

where \(\alpha _{1},\ldots ,\) \(\alpha _{d}\in [0,1]\) are parameters inducing the asymmetry of C. In summary, the copula in (4) has \( d+2\) parameters: the two parameters of the copulas \(C_{1},C_{2}\), and the exponents \(\alpha _{1},\ldots ,\) \(\alpha _{d}\). Identity (4) gives the basic model for fitting it to ecological data from TRY database. The copulas \(C_{1}\) and \(C_{2}\) will be taken then from Table 8.

Notice that most of the copulas of Table 8 have pairwise positive correlation. If negative correlations occur, then one should transform some variables to obtain pairwise positive correlations. To achieve this, one simple idea would be to consider the negative of some variables instead of these variables. The methodology of fitting is the subject of the next sections.

5.3 Fitting the copula

Let \(Y_{1},\ldots ,Y_{n}\) be the sample of d-dimensional random vectors with distribution function H and copula C. Vector \(Y_{i}=(Y_{i1},\ldots ,Y_{id})^{T}\) includes the normalized values of the traits of the i-th sample item (i-th measurement). \(Y_{i}\) is the normalized i-th row of the data matrix (see Table  4). Note that the data preprocessing described in Section 4.1 has to be performed before the copula fitting. We denote the joint empirical distribution function by \({\hat{H}}_{n}\) (estimator for H):

$$\begin{aligned} {\hat{H}}_{n}(y)=\frac{1}{n}\sum _{i=1}^{n}{\mathbf {1}}\left\{ Y_{i1}\le y_{1},\ldots ,Y_{id}\le y_{d}\right\} \end{aligned}$$

for \(y=(y_{1},\ldots ,y_{d})^{T}\in {\mathbb {R}}^{d}\), where \({\mathbf {1}}\left\{ A\right\} \) is the indicator of the event A; i.e. \({\mathbf {1}}\left\{ A\right\} =1\) if A occurs and \(=0\) if A does not occur. Let \({\tilde{F}} _{n}(y)=(F_{1n}(y_{1}),\ldots ,F_{dn}(y_{d}))^{T}\) be the vector of the marginal empirical distribution functions

$$\begin{aligned} F_{jn}(t)=\frac{1}{n}\sum _{i=1}^{n}{\mathbf {1}}\left\{ Y_{ij}\le t\right\} \end{aligned}$$

for \(t\in {\mathbb {R}}\). The values \(F_{jn}(Y_{1j}),\ldots ,F_{jn}(Y_{nj})\) represent a permutation of the relative ranks \(\{i/n:1=1\ldots n\}\), and they mimic so the uniform distribution. Now a copula plot of the j-th and the \(\mu \)-th component can be drawn to show the dependence of both variables. The plot contains the points \((F_{jn}(Y_{1j}),F_{\mu n}(Y_{1\mu })),\ldots ,(F_{jn}(Y_{nj}), F_{\mu n}(Y_{n\mu }))\) (see Figure 6, Appendix D).

The empirical copula \({\hat{C}}_{n}\) estimates the copula and its formula is given by

$$\begin{aligned} {\hat{C}}_{n}(u)=\frac{1}{n}\sum _{i=1}^{n}{\mathbf {1}}\left\{ F_{1n}(Y_{i1})\le u_{1},\ldots ,F_{dn}(Y_{id})\le u_{d}\right\} \end{aligned}$$

for \(u\in [0,1]^{d}\). In the case \(d=2\), \({\hat{C}}_{n}(u_{1},u_{2})\) is just the relative frequency of points in the copula plot lying in \( [0,u_{1}]\times [0,u_{2}]\).

We consider now the family \(\{C_{\gamma }: \gamma \in \Theta \}\) of copulas where \(\gamma \) is the q-dimensional parameter vector and \(\Theta \subset {\mathbb {R}}^{q}\) the parameter space. To fit this family to the given data, we search for a copula \(C_{{\hat{\gamma }}}\) of the family that fits best to the data. In this paper the fit of the copulas is based on the Cramér-von-Mises divergence, see Liebscher (2009). A divergence is something like a distance, but it does not fulfil some of the mathematical requirements of a distance. The estimated Cramér-von-Mises divergence is given by:

$$\begin{aligned} \widehat{{\mathcal {D}}}_{n}(C_{\gamma })=\frac{1}{n} \sum _{i=1}^{n}\left( {\hat{H}}_{n}(Y_{i})-C_{\gamma }({\tilde{F}} _{n}(Y_{i}))\right) ^{2} \end{aligned}$$
(5)

for \(\gamma \in \Theta \). This divergence measures the distance between the nonparametrically estimated distribution function and the parametric model of it. It is an interesting feature that we do not need \( {\hat{C}}_{n}\) in (5). Fitting the copula model means minimizing \( \widehat{{\mathcal {D}}}_{n}(C_{\gamma })\) with respect to the parameter \(\gamma \). The minimizer \(\mathbb {{\hat{\gamma }}}\) of \(\gamma \mapsto \widehat{{\mathcal {D}}}_{n}(C_{\gamma })\) is called a minimum-distance estimator for \(\gamma \) (cf. Tsukahara 2005; Liebscher 2009) if the following identity holds true:

$$\begin{aligned} \widehat{\mathcal {D}}_{n} (C_{\hat{\gamma }})=\min _{\gamma \in \Theta }\widehat{\mathcal {D}}_{n} (C_{\gamma }). \end{aligned}$$
(6)

Assessing the goodness-of-approximation is discussed in Liebscher (2015). For comparisons, we compute the approximation coefficient:

$$\begin{aligned} \hat{\rho }=1-\frac{\widehat{\mathcal {D}}_{n} (C_{\hat{\gamma }})}{\widehat{\mathcal {D}}_{n}(\Pi )}. \end{aligned}$$
(7)

\(\widehat{{\mathcal {D}}}_{n}(\Pi )\) is the Cramér-von-Mises divergence ( 5) for the independent copula \(\Pi \) introduced in Section 5.1. Since \({\hat{\rho }}\le 1\) holds, the coefficient \({\hat{\rho }}\) is a normed measure and can be regarded as a counterpart to the coefficient of determination in regression. If \(\Pi \in \{C_{\gamma }\}\) then \({\hat{\rho }}\in [0,1]\). Ideally, \({\hat{\rho }}\) is close to 1 indicating a perfect fit.

Using R software the computations for fitting the copulas were performed. We deployed the function optim (R package stats) for optimization.

5.4 Discussion

There are several alternative construction concepts for asymmetric copulas. Among them, nested Archimedean Copulas (McNeil 2008) are asymmetric in general but they have the disadvantage being exchangeable in some components, see also Grimaldi and Serinaldi (2006).

We apply the Cramer-von Mises divergence to fit the copulas. Alternatively, one can use maximum-likelihood estimation for fitting. This may lead to rather different parameters in comparison to minimum-distance method especially in the often occurring case that C does not belong to the family \(\{C_{\gamma }: \gamma \in \Theta \}\). Maximum-likelihood estimation has the disadvantage that one needs the density of the model copula. According to formula (5), the Cramér-von-Mises divergence needs only the empirical distribution functions \({\hat{H}} _{n},F_{1n},\ldots ,F_{dn}\) for computations. Concerning estimation, we refer to Hofert et al. (2012) and Liebscher (2015).

6 Fitting copulas to the observed ecological data

6.1 Results

First we give some comments on the choice of the data. To comprise species with similar bivariate correlations, we concentrate on the grass species Al.pr, An.od, Ar.el, Av.pu, Cy.cr, Da.gl, Fe.pr, Lo.pe and the herb species Ga.mo, Ru.ac, Ce.ja, Ga.ve, Pl.la (see Appendix C for abbreviations of species). From a glance at the correlations, it is evident that variable RCNC can be regarded as independent of the remaining variables (confidence intervals cover zero), see Table  9 in Appendix B2. The variable LCNC is omitted because of a lack of sufficient amount of data. Therefore, in this section, we consider the 3-dimensional distribution of the normalized data vector \((Y^{(1)},Y^{(2)},Y^{(3)})\) including the variables RSR, SLA, and minus HW (normalization according to Section 4.2).

Next we fit copulas C from Table 8 and product copulas according to (4) to the data vector of the variables RSR, SLA and minus HW. The copulas \(C_{1}\) and \(C_{2}\) in (4) are taken from Table 8. \( Y^{(3)}\) is taken to be the negative of HW in order to obtain positive bivariate correlations (see Table 12 for correlations) which are present for most of the copula models in Table 8. The parameter vector of product copulas consists of the parameters \(\gamma _{1},\gamma _{2}\) of \(C_{1}\) and \( C_{2}\), and the exponents \(\alpha _{1},\alpha _{2},\alpha _{3}\). The best fitting results are given in Tables 2 and 3, in Tables 10 and 11 in more detail.

Table 2 Fitting results (copula of variables RSR, SLA and minus HW): the approximation coefficient \({\hat{\rho }}\) (defined in (7)) for various basic copula models abbreviated according to Table 8
Table 3 Fitting results (copula of variables RSR, SLA and minus HW): the approximation coefficient \({\hat{\rho }}\) (defined in (7)) for various product copula models with copulas \(C_{1}\) and \(C_{2}\) abbreviated according to Table 8

Approximation coefficients greater than 0.9 indicate a very good fitting accuracy, and the best fits of Tables 2 and 3 exhibit this property. Next we focus on the grass data. The estimated copula density of the components SLA and minus HW depicted in Figure 3 reveals a significant asymmetry with respect to the main diagonal. This effect is especially shown in Figure 4 where the difference \(\Delta (u_{1},u_{2})=c(u_{1},s(u_{1}))-c(s(u_{1}),u_{1})\) (\(u_{1}\in [0,1]\)) of copula density values along cross sections \(u_{2}=s(u_{1})\) is depicted. In case of a symmetrical bivariate density, we have \(C(u,v)=C(v,u)\) for \( u,v\in [0,1]\), and therefore \(\Delta \equiv 0\).

Fig. 3
figure 3

estimated bivariate copula density of RSR and SLA, product model J-F from Table 3

Fig. 4
figure 4

sections parallel to the secondary diagonal (section along \(u_2=1-u_1\) in red, section along \(u_2=1.5-u_1\) in green, section along \(u_2=0.5-u_1\) in blue) of the estimated bivariate copula density of RSR and SLA, model J-F from Table 3

Next we show the asymmetry of the distribution of RSR, SLA and minus HW using probabilities and tail indices \(\lambda _{L}\), and \(\lambda _{U}\) (for the definition see Ghosh et al. (2020), p. 420). For this purpose, we calculate the following two quantities

$$\begin{aligned} \gamma _{1}= & {} {\mathbb {P}}\left\{ 0.75\le U_{1}\le 1,0\le U_{2}\le 0.25\right\} =0.25-C(0.75,0.25), \\ \gamma _{2}= & {} {\mathbb {P}}\left\{ 0\le U_{1}\le 0.25,0.75\le U_{2}\le 1\right\} =0.25-C(0.25,0.75). \end{aligned}$$

In case of a symmetric copula, \(\gamma _{1}\) and \(\gamma _{2}\) are identical. The corresponding results are given in Table 12 together with the empirical bivariate Spearman correlations and these correlations in the estimated copula model. The results provided in Table 12 exhibit that the multivariate distribution is non-exchangeable and with regard to the tails (upper and lower). The correlations are significantly different which is not the case for the commonly used basic models (see Table 8, Appendix B1). The noticeable difference between \(\gamma _{1}\) and \(\gamma _{2}\) for the pair SLA and -HW gives a further evidence for asymmetry. Furthermore, we see from Table 12 that the combination of small SLA and small HW is less probable in comparison to large SLA and large HW. Table 12 reveals that the difference between empirical correlations and estimated model-driven ones are reasonably small showing the good quality of estimation results. On the other hand, bivariate Spearman’s \(\rho _{S}\) coefficients differ from each other significantly which also exhibits the asymmetry of the multivariate distribution.

6.2 Discussion

Looking at the results of the previous section, the dependence structure of the copula fit reflects the asymmetric dependence structure in the data. Similar results can be obtained by incorporating variable LCNC. Since symmetric copulas exhibit identical Spearman coefficients for all pairs of variables, it is comprehensible that the use of asymmetric copulas gives better fitting results in our context. A further analysis showed that the dependence structure varies over the species. To identify these differences more data are needed. We omitted to perform goodness-of-fit tests for several reasons. In higher dimensions they are hard to apply because of a sophisticated structure of variances of test statistics (often bootstrapping has to be applied). Moreover, goodness-of-fit tests do not yield a ranking of the models under consideration based on appropriate statistics.

Typically, copulas are fitted using big data (for example in finance sciences using daily stock exchange data). Ecological plant measurements are limited because of high efforts for data gathering. In our study, we used a state-of-the art data set, having (60-80 samples per species). If sample size is too small, then there will be a degree of uncertainty in the results.

7 Conclusions

The paper provides a methodology for analyzing environmental data. The description of the multivariate distribution is split up into the one-dimensional marginal distributions and the copula. In the case of the example data set from the TRY database, the data analysis exhibits good to very good accuracy of fitting results. The results have demonstrated that copulas are a valuable tool to analyze ecological data. Product copulas according to Liebscher (2008) prove to be flexible model classes for accurate fitting even in the case of higher dimensions \(d>2\). Product copulas have not been considered up to now in the context of ecological data to our knowledge. They are also flexible to describe asymmetries in several directions occurring in most applications.

We see challenges in the accuracy, due to the limited number of samples and mainly caused by the exclusion of samples in which some traits were missing. General approaches on how to deal with missing data would be a first step towards more reliable estimates. In light of the increasing effort in observing traits reflected by growing trait databases (e.g., TRY) we see a high potential to combine ecological data with copulas to gain deeper insights in measured characteristics.