1 Introduction

Datasets arising in the social sciences often contain ordinal variables. Sometimes they are genuine ordered assessments (judgements, preferences, degree of liking of a product or adhesion to a sentence, etc.), whereas in other circumstances they are discretized or categorized for convenience (age of people in classes, education achievement, levels of blood pressure, etc.). The former situation often arises when a survey is administered to a group of people under study, e.g. questionnaires submitted by a company to its customers with the aim of assessing their level of satisfaction with a product or service the company has provided. Respondents choose a qualitative assessment on a graduated sequence of verbal definitions (for instance, “extremely dissatisfied”, “very dissatisfied”, \(\dots \), “very satisfied”, “extremely satisfied”), also known as a “Likert scale”, which can be coded as integers (\(1,2,\dots ,m\)) just for convenience: this amounts to assuming that the categories are evenly spaced (Iannario and Piccolo 2012). There are several statistical models and techniques that can be employed for handling multivariate ordinal data without trying to quantify their ordered categories. (The review by Liu and Agresti (2005) and the later textbook of Agresti (2010) give a thorough treatment.) Among them, correlation models and association models both study departures from independence in contingency tables and involve the assignment of scores to the categories of the row and column variables in order to maximize the relevant measure of relationship (the correlation coefficient in correlation models or the measure of intrinsic association in association models, see Faust and Wasserman 1993).
We also mention nonlinear principal component analysis (NLPCA), a special case of the multivariate reduction technique known as homogeneity analysis, which can be usefully applied in customer satisfaction surveys (Ferrari and Manzi 2010) for mapping the observed ordinal variables into a one-dimensional (or, more generally, lower-dimensional) quantitative variable. Neither the weights of the original variables nor the differences between their categories (this is the distinction with standard principal component analysis) are assumed a priori.

However, substituting the ordered categories with the corresponding integers, though an arbitrary assumption, is still quite a common and accepted practice, which leads to further multivariate statistical analyses handling them as (correlated) discrete variables (Norman 2010; Carifio and Perla 2008).

Now, one may be interested in building and simulating a multivariate random vector whose univariate components are point-scale variables. In fact, describing a real phenomenon by creating mirror images and imperfect proxies of the (partially) unknown underlying population in a repeated manner allows researchers to study the performance of their statistical methods through simulated data replicates that mimic the real data characteristics of interest in any given setting (Demirtas and Yavuz 2015; Demirtas and Vardar-Acar 2017). This is often necessary since exact analytic results are seldom available for finite sample sizes, and thus, simulation is required to assess the reliability, validity, and plausibility of inferential techniques and to evaluate to what extent they are robust to deviations from statistical assumptions. However, rather than completely detailing a joint distribution for modelling the phenomena under study, it is often more convenient and realistic to specify only the marginal distributions and pairwise correlations, which are very easy to interpret and whose sample analogues can be easily computed on the dataset at hand one wants to “reproduce”. In Lee (1997), some methods are described for generating random vectors of categorical or ordinal variables with specified marginal distributions and degrees of association between variables. For ordinal variables, a common index for measuring association is Goodman and Kruskal’s \(\gamma \) coefficient (Kruskal and Goodman 1954; Ruiz and Hüllermeier 2012), ranging between \(-1\) and \(+1\), with zero corresponding to independence. A first proposal is based upon using convex combinations of joint distributions with extremal values of \(\gamma \) (extremal tables); a second one relies on threshold arguments and involves Archimedean copulas. In this paper, we suggest a modification of the latter method for correlated point-scale variables.

In the following, we will limit our analysis to the bivariate case, which is by far easier to deal with, but whose results, with some caution, can be generalized to the multivariate context. We consider two point-scale random variables (rvs), \(X_1\) and \(X_2\), defined over the support spaces \(\mathcal {X}_1=\left\{ 1,2,\dots ,m_1\right\} \) and \(\mathcal {X}_2=\left\{ 1,2,\dots ,m_2\right\} \), respectively, with probability mass functions \(p_{1}(i)=p_{i\cdot }=P(X_1=i),i=1,\dots ,m_1\), and \(p_{2}(j)=p_{\cdot j}=P(X_2=j),j=1,\dots ,m_2\). We want to determine some bivariate probability mass function \(p_{ij}=p(i,j)=P(X_1=i,X_2=j), i=1,\dots ,m_1;j=1,\dots ,m_2\), such that its margins are \(p_{1}\) and \(p_{2}\) and the correlation \(\rho _{X_1,X_2}\) is equal to an assigned \(\rho \). To answer this question, we first have to recall two properties of Pearson’s correlation, which apply to both the continuous and, to an even larger extent, the discrete case; this is the topic of Sect. 2. In Sect. 3, we first state the problem of finding a joint probability function with assigned margins and correlation in general terms; then, we focus on a particular class of joint distributions, recalling how to build copula-based bivariate discrete distributions; finally, we describe the proposed procedure for inducing a desired value of correlation between two point-scale variables. Section 4 illustrates an application to CUB distributions. Section 5 recalls inferential procedures for dependent rvs and, based on the algorithm of Sect. 3, devises a sort of moment method for estimating the dependence parameter of the copula-based bivariate distribution. Section 6 describes a Monte Carlo simulation study whose aim is to comparatively assess the statistical performance of the new inferential method and of existing ones based on maximum likelihood. Section 7 provides an application to a real data set. In the concluding section, some final remarks are provided.

2 Attainable correlations between two random variables

Pearson’s linear correlation is by far the most popular measure of correlation between two quantitative variables.

It is often employed to measure correlation also between Likert scale variables, which raises some criticism.

Despite the many useful properties it enjoys, it also reveals some disadvantages. A first drawback of Pearson’s correlation is that given two marginal cumulative distribution functions (cdfs) \(F_1\) and \(F_2\) and a correlation value \(\rho \in [-1,+1]\), it is not always possible to construct a joint distribution F with margins \(F_1\) and \(F_2\), whose correlation is equal to the assigned \(\rho \). This is an issue that is often underrated if not neglected by researchers (Leonov and Qaqish 2020). We can state this result (often reported as “attainable correlations”, see McNeil et al. 2005, pp. 204-205) in the following way. Let \((X_1,X_2)\) be a random vector with marginal cdfs \(F_1\) and \(F_2\) and an unspecified joint cdf; assume also that \(\text {Var}(X_1) > 0\) and \(\text {Var}(X_2) > 0\). The following statements hold:

  1. 1.

    The attainable correlations form a closed interval \([\rho _{\min }, \rho _{\max }]\) with \(\rho _{\min }< 0 < \rho _{\max }\).

  2. 2.

    The minimum correlation \(\rho =\rho _{\min }\) is attained if and only if \(X_1\) and \(X_2\) are countermonotonic. The maximum correlation \(\rho = \rho _{\max }\) is attained if and only if \(X_1\) and \(X_2\) are comonotonic.

  3. 3.

    \(\rho _{\min } = -1\) if and only if \(X_1\) and \(-X_2\) are of the same type, and \(\rho _{\max } = 1\) if and only if \(X_1\) and \(X_2\) are of the same type.

For point-scale rvs \(X_1\) and \(X_2\), it is then clear that the maximum correlation is \(+1\) if and only if they are identically distributed: \(p_1(i)=p_2(i)\), \(\forall i=1,\dots ,m\), with \(m=m_1=m_2\), whereas the minimum correlation is \(-1\) if and only if \(m_1=m_2=m\) and \(p_1(i)=p_2(m-i+1)\), \(\forall i=1,\dots ,m\). In the general case, the values \(\rho _{\min }\) and \(\rho _{\max }\) can be computed by building the cograduation and countergraduation tables (see Salvemini 1939 and Ferrari and Barbiero 2012 for an example of calculation).
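For illustration, the bounds can be computed directly from the Fréchet–Hoeffding lower and upper bound cdfs, which correspond to the countergraduation and cograduation tables. A minimal sketch in Python (numpy assumed; the function names are ours):

```python
import numpy as np

def corr_from_joint_cdf(F):
    """Correlation of a point-scale rv (X1, X2) given its joint cdf on the
    grid {1,...,m1} x {1,...,m2}; the pmf is recovered by finite differencing."""
    Fp = np.pad(F, ((1, 0), (1, 0)))                 # prepend F(0,.) = F(.,0) = 0
    p = Fp[1:, 1:] - Fp[:-1, 1:] - Fp[1:, :-1] + Fp[:-1, :-1]
    m1, m2 = p.shape
    i, j = np.arange(1, m1 + 1), np.arange(1, m2 + 1)
    p1, p2 = p.sum(axis=1), p.sum(axis=0)            # marginal pmfs
    mu1, mu2 = i @ p1, j @ p2
    s1 = np.sqrt(((i - mu1) ** 2) @ p1)
    s2 = np.sqrt(((j - mu2) ** 2) @ p2)
    return (i @ p @ j - mu1 * mu2) / (s1 * s2)

def corr_bounds(p1, p2):
    """rho_min, rho_max for margins p1, p2, attained by the countermonotonic
    (countergraduation) and comonotonic (cograduation) couplings."""
    F1, F2 = np.cumsum(p1), np.cumsum(p2)
    W = np.maximum(np.add.outer(F1, F2) - 1, 0)      # lower Frechet bound cdf
    M = np.minimum.outer(F1, F2)                     # upper Frechet bound cdf
    return corr_from_joint_cdf(W), corr_from_joint_cdf(M)
```

For instance, with margins \((1/3, 2/3)\) and \((1/4, 1/2, 1/4)\) this returns \(\rho _{\min }=-3/4\) and \(\rho _{\max }=3/4\) (cf. Example 1).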

A second result about Pearson’s correlation can be summarized as follows: given two margins \(F_1\) and \(F_2\) and a feasible linear correlation \(\rho \) (i.e. a value falling within the interval \([\rho _{\min },\rho _{\max }]\)), the joint distribution F having margins \(F_1\) and \(F_2\) and correlation \(\rho \) is not unique. In other terms, the marginal distributions and pairwise correlation of a bivariate rv do not uniquely determine its joint distribution. Even if this second feature may represent a limitation on one side, on the other it represents a form of flexibility, since it means that given two point-scale distributions and a consistent value of \(\rho \), there are generally several (possibly, infinitely many) different ways to join them into a bivariate distribution with that value of correlation, as we will see in the next two sections.

3 A procedure for inducing a desired value of correlation between two point-scale random variables with assigned marginal distributions

We will now state the problem addressed in this work in general terms; then, resorting to copulas, we will reformulate it in a more specific context. We will split the original problem into two sequential sub-problems: (i) finding a family of joint distributions with the assigned margins, (ii) finding within this family a distribution with the desired value of correlation.

3.1 Statement of the problem

The problem of finding a bivariate point-scale distribution with assigned marginal distributions and Pearson’s correlation can be laid out as follows. We have to find the \(m_1\times m_2\) probabilities \(p_{ij}\), \(0\le p_{ij} \le 1\), defining the joint pmf of the rv \((X_1,X_2)\), which satisfy the following system of equalities:

$$\begin{aligned} {\left\{ \begin{array}{ll} \sum _{j=1}^{m_2} p_{ij} = p_{i\cdot }, \quad i=1,\dots ,m_1\\ \sum _{i=1}^{m_1} p_{ij} = p_{\cdot j},\quad j=1,\dots ,m_2\\ \rho _{X_1,X_2} =(\mathbb {E}(X_1X_2)-\mathbb {E}(X_1)\mathbb {E}(X_2))(\text {Var}(X_1)\text {Var}(X_2))^{-1/2} \end{array}\right. } \end{aligned}$$
(1)

with \(\mu _1=\mathbb {E}(X_1)=\sum _{i=1}^{m_1} ip_{i\cdot }\), \(\text {Var}(X_1)=\sum _{i=1}^{m_1} (i-\mu _1)^2p_{i\cdot }\) (analogous results hold for \(X_2\)), and \(\mathbb {E}(X_1X_2)=\sum _{i=1}^{m_1}\sum _{j=1}^{m_2} ijp_{ij}\). The first two sets of equalities correspond to matching the assigned marginal distributions, and the last one to the assigned correlation. The total number of equality constraints is \(m_1+m_2\) (\(m_1+m_2-1\) actual constraints on the two margins, plus one on Pearson’s correlation).

If, for example, \(m_1=m_2=2\) (i.e. if \(X_1\) and \(X_2\) are shifted Bernoulli rvs), then we have a system of 4 equations in 4 variables, which yields a unique solution for the \(p_{ij}\). In this case, in fact, one can easily prove that if the assigned \(\rho \) falls within the bounds \(\rho _{\min }\) and \(\rho _{\max }\), equal to

$$\begin{aligned} \rho _{\max }&= [\min (1-p_{1\cdot },1-p_{\cdot 1})-(1-p_{1\cdot })(1-p_{\cdot 1})]/\sqrt{p_{1\cdot }(1-p_{1\cdot })p_{\cdot 1}(1-p_{\cdot 1})}\\ \rho _{\min }&= [\max (0,1-p_{\cdot 1}-p_{1\cdot })-(1-p_{1\cdot })(1-p_{\cdot 1})]/\sqrt{p_{1\cdot }(1-p_{1\cdot })p_{\cdot 1}(1-p_{\cdot 1})} \end{aligned}$$

the probabilities satisfying the system (1) are

$$\begin{aligned} {\left\{ \begin{array}{ll} p_{22} &{}= (1-p_{1\cdot })(1-p_{\cdot 1}) + \rho \sqrt{p_{1\cdot }(1-p_{1\cdot })p_{\cdot 1}(1-p_{\cdot 1})}\\ p_{12}&{}=1-p_{\cdot 1}-p_{22}\\ p_{21}&{}=1-p_{1\cdot }-p_{22}\\ p_{11}&{}=p_{1\cdot }-p_{12} \end{array}\right. } \end{aligned}$$
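These closed-form expressions translate directly into code; a minimal sketch (the function name is ours, numpy assumed):

```python
import numpy as np

def bernoulli_joint(p1_dot, p_dot1, rho):
    """Unique 2x2 joint pmf with margins P(X1=1)=p1_dot, P(X2=1)=p_dot1
    and correlation rho, following the closed-form solution of system (1);
    rho is assumed to lie within [rho_min, rho_max]."""
    s = np.sqrt(p1_dot * (1 - p1_dot) * p_dot1 * (1 - p_dot1))
    p22 = (1 - p1_dot) * (1 - p_dot1) + rho * s
    p12 = 1 - p_dot1 - p22
    p21 = 1 - p1_dot - p22
    p11 = p1_dot - p12
    return np.array([[p11, p12], [p21, p22]])
```

For \(\rho =0\) the function returns the product table \(p_{ij}=p_{i\cdot }\,p_{\cdot j}\), as expected under independence.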

For higher values of \(m_1\) (and \(m_2\)) the solution is not unique; generally, there are infinitely many solutions (i.e. bivariate distributions) satisfying system (1)—given that the correlation bounds are respected (i.e. \(\rho _{\min }\le \rho \le \rho _{\max }\)). We illustrate this through the following example.

Example 1

(Computation of joint probabilities for a \(2\times 3\) contingency table). Let us consider a rv \((X_1,X_2)\), whose marginal distributions are \(p_{1\cdot }=1/3\), \(p_{2\cdot }=2/3\); \(p_{\cdot 1}=1/4\), \(p_{\cdot 2}=1/2\), \(p_{\cdot 3}=1/4\), with support \(\left\{ 1,2\right\} \) and \(\left\{ 1,2,3 \right\} \), respectively. Marginal expected values and variances are: \(\mathbb {E}(X_1)=5/3\), \(\mathbb {E}(X_2)=2\); \(\text {Var}(X_1)=2/9\), \(\text {Var}(X_2)=1/2\). The bounds for Pearson’s correlation, based on countergraduation and cograduation tables (see Table 1a and b), are \(\rho _{\min }=-3/4\) and \(\rho _{\max }=3/4\). Let the target correlation value be \(\rho =1/3\). Then, system (1) consists of the following 6 equations in 6 variables:

$$\begin{aligned} {\left\{ \begin{array}{l} p_{11} + p_{12} + p_{13} = 1/3\\ p_{21} + p_{22} + p_{23}= 2/3\\ p_{11} + p_{21} = 1/4\\ p_{12} + p_{22} = 1/2\\ p_{13} + p_{23} = 1/4\\ 3(p_{11} + 2p_{12} + 3p_{13} + 2p_{21} + 4p_{22} + 6p_{23} -10/3) = 1/3 \end{array}\right. } \end{aligned}$$

Solving the system above, and imposing that each \(p_{ij}\) be non-negative, we obtain the family of valid joint pmfs respecting the assigned margins and correlation coefficient:

$$\begin{aligned} {\left\{ \begin{array}{ll} p_{12} &{}= 4/9 -2p_{11}\\ p_{13} &{}= p_{11}-1/9\\ p_{21} &{}= 1/4-p_{11}\\ p_{22} &{}= 2p_{11}+1/18\\ p_{23} &{}= 13/36-p_{11} \end{array}\right. } \end{aligned}$$

with \(p_{11}\) taking any value contained in the interval [1/9, 2/9]; for example, if we set \(p_{11}=1/6\), we obtain the contingency table displayed in Table 2a; letting \(p_{11}=1/9\), the result is Table 2b.
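The one-parameter family of solutions above can be verified numerically; a sketch (the function names are ours, numpy assumed), where any \(p_{11}\in [1/9, 2/9]\) yields non-negative probabilities, the assigned margins, and \(\rho =1/3\):

```python
import numpy as np

def example1_joint(p11):
    """Joint pmf of Example 1 as a function of the free probability p11,
    valid for p11 in [1/9, 2/9]."""
    return np.array([[p11, 4/9 - 2*p11, p11 - 1/9],
                     [1/4 - p11, 2*p11 + 1/18, 13/36 - p11]])

def pearson_corr(P):
    """Pearson correlation of a point-scale rv with joint pmf P."""
    m1, m2 = P.shape
    i, j = np.arange(1, m1 + 1), np.arange(1, m2 + 1)
    p1, p2 = P.sum(axis=1), P.sum(axis=0)
    mu1, mu2 = i @ p1, j @ p2
    s1 = np.sqrt(((i - mu1) ** 2) @ p1)
    s2 = np.sqrt(((j - mu2) ** 2) @ p2)
    return (i @ P @ j - mu1 * mu2) / (s1 * s2)
```

Setting \(p_{11}=1/6\) or \(p_{11}=1/9\) reproduces Table 2a and b, respectively.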

Table 1 Cograduation and countergraduation tables corresponding to the marginal distributions of Example 1
Table 2 Examples of joint distributions with the same assigned margins and the same correlation coefficient \(\rho =1/3\); see Example 1

Instead of considering the whole set of feasible joint probabilities \(p_{ij}\) obtained by solving system (1), one can restrict the analysis to a particular subset of these solutions satisfying the first two constraints on margins. To this aim, we will now recall the concept of copula and copula-based bivariate discrete distributions.

3.2 Generating bivariate discrete distributions having the pre-specified margins

Using copulas represents a straightforward solution for easily constructing a multivariate distribution respecting the assigned margins. A d-dimensional copula C is a joint cdf on \([0, 1]^d\) with standard uniform margins \(U_j\), \(j=1,\dots ,d\):

$$\begin{aligned} C(u_1,\dots ,u_d):=P(U_1\le u_1,\dots ,U_d\le u_d),\quad (u_1,\dots ,u_d)\in [0,1]^d. \end{aligned}$$

The importance of copulas in studying multivariate cdfs is summarized by Sklar’s theorem (McNeil et al. 2005), whose version for \(d=2\) states that if \(F_1\) and \(F_2\) are the cdfs of the rvs \(X_1\) and \(X_2\), the function

$$\begin{aligned} F(x,y)=C(F_1(x),F_2(y)),\quad x,y\in \mathbb {R}, \end{aligned}$$
(2)

defines a valid joint cdf, whose margins are exactly \(F_1\) and \(F_2\). This result keeps holding if \(X_1\) and \(X_2\) are point-scale rvs; in this case, the joint pmf can be derived from (2) as:

$$\begin{aligned} p(i,j)=F(i,j)-F(i-1,j)-F(i,j-1)+F(i-1,j-1), \end{aligned}$$
(3)

for \(i=1,\dots ,m_1;j=1,\dots ,m_2\). It is worth noting that given a joint cdf F with margins \(F_1\) and \(F_2\), Sklar’s theorem also states that there exists a copula C such that F can be written as in (2). This copula is unique if \(F_1\) and \(F_2\) are continuous; on the contrary, uniqueness is not guaranteed if they are discrete.

There exists a multitude of copulas, which—as happens for joint cdfs—usually depend on some parameter \(\theta \); we will now review three well-known parametric copula families.

3.2.1 The Gauss copula

The d-variate Gauss copula is the copula that can be extracted from a d-variate normal vector \(\pmb {Y}\) with mean vector \(\pmb {\mu }\) and covariance matrix \(\varSigma \) and is exactly the same as the copula of \(\pmb {X} \sim \mathcal {N}_d (\pmb {0}, P)\), where P is the correlation matrix of \(\pmb {Y}\). In two dimensions, it can be expressed, for \(\rho _{Ga}\ne \pm 1\), as:

$$\begin{aligned} C^{Ga}(u_1,u_2;\rho _{Ga})=\int _{-\infty }^{\varPhi ^{-1}(u_1)}\int _{-\infty }^{\varPhi ^{-1}(u_2)} \frac{1}{2\pi \sqrt{1-\rho _{Ga}^2}} e^{-\frac{s_1^2-2\rho _{Ga} s_1s_2+s_2^2}{2(1-\rho _{Ga}^2)}}\text {d}s_1\text {d}s_2. \end{aligned}$$

Independence, comonotonicity, and countermonotonicity copulas are special cases of the bivariate Gauss copula (for \(\rho _{Ga}=0\), \(\rho _{Ga}\rightarrow +1\), and \(\rho _{Ga}\rightarrow -1\), respectively).

3.2.2 The Frank copula

The one-parameter bivariate Frank copula is defined as

$$\begin{aligned} C^F(u_1,u_2;\kappa )=-\frac{1}{\kappa } \ln \left[ 1 + \frac{(e^{-\kappa u_1}-1)(e^{-\kappa u_2}-1)}{e^{-\kappa }-1}\right] , \end{aligned}$$

with \(\kappa \ne 0\). For \(\kappa \rightarrow 0\), we have that the Frank copula reduces to the independence copula; for \(\kappa \rightarrow +\infty \), it tends to the comonotonicity copula; for \(\kappa \rightarrow -\infty \), it tends to the countermonotonicity copula.
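For numerical work, the Frank copula is conveniently evaluated with expm1 and log1p to avoid cancellation for small \(\kappa \); a sketch (the function name is ours, numpy assumed):

```python
import numpy as np

def frank_copula(u1, u2, kappa):
    """Bivariate Frank copula C^F(u1, u2; kappa), kappa != 0."""
    return -np.log1p(np.expm1(-kappa * u1) * np.expm1(-kappa * u2)
                     / np.expm1(-kappa)) / kappa
```

Quick checks: the uniform-margin property \(C^F(u,1;\kappa )=u\) holds for any \(\kappa \ne 0\), and for \(\kappa \) close to zero the copula is close to the independence copula \(u_1u_2\).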

3.2.3 The Plackett copula

The one-parameter bivariate Plackett copula is defined as

$$\begin{aligned} C^P(u_1,u_2;\theta )=\frac{1+(\theta -1)(u_1+u_2)-\sqrt{[1+(\theta -1)(u_1+u_2)]^2-4\theta (\theta -1)u_1u_2}}{2(\theta -1)}, \end{aligned}$$

with \(\theta \in (0,+\infty )\setminus \left\{ 1\right\} \). When \(\theta \rightarrow 1\), it reduces to the independence copula, whereas for \(\theta \rightarrow 0\) it tends to the countermonotonicity copula and for \(\theta \rightarrow \infty \) to the comonotonicity copula.
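The Plackett copula likewise admits a direct implementation; a sketch (the function name is ours, numpy assumed), with the uniform-margin property \(C^P(u,1;\theta )=u\) serving as a quick check:

```python
import numpy as np

def plackett_copula(u1, u2, theta):
    """Bivariate Plackett copula C^P(u1, u2; theta), theta > 0, theta != 1."""
    a = 1 + (theta - 1) * (u1 + u2)
    return (a - np.sqrt(a**2 - 4 * theta * (theta - 1) * u1 * u2)) \
        / (2 * (theta - 1))
```

For \(\theta \) close to 1 the value approaches the independence copula \(u_1u_2\), while large \(\theta \) pushes it towards \(\min (u_1,u_2)\).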

3.3 An algorithm for inducing any feasible value of correlation within a parametric copula-based family of distributions

In order to induce any feasible value of correlation between the two discrete margins of the distribution (2), we further have to impose that the copula \(C(\cdot ;\theta )\) is able to encompass the entire range of dependence, from perfect negative dependence (which leads to the linear correlation \(\rho _{\min }\)) to perfect positive dependence (\(\rho _{\max }\)). Copulas enjoying this property are named “comprehensive”; the three copulas recalled in the previous section are all comprehensive.

Once the marginal distributions of \(X_1\) and \(X_2\) are assigned and the parametric copula \(C(\cdot ;\theta )\) has been selected, their correlation coefficient \(\rho _{X_1,X_2}\) depends only on the copula parameter \(\theta \in [\theta _{\min },\theta _{\max }]\); this relationship may be written in analytical or numerical form, say \(\rho _{X_1,X_2}=g(\theta |F_1,F_2)\). Since the function g is usually not analytically invertible, inducing a desired value of correlation \(\rho \in [\rho _{\min },\rho _{\max }]\) between two point-scale variables by setting an appropriate value of \(\theta \) is a task that can generally be accomplished only numerically, by finding the (unique) root of the equation \(g(\theta )-\rho =0\). If \(g(\theta )\) is a monotone increasing function of the copula parameter, this root search can be implemented by resorting to the following iterative procedure (see a similar proposal for the Gauss copula in Ferrari and Barbiero 2012; Barbiero and Ferrari 2017; and an early extension to other copulas in Barbiero 2018):

  1. 1.

    Set \(\theta ^{(0)}=\theta ^{\varPi }\) (with \(\theta ^{\varPi }\) being the value of \(\theta \) for which the copula C reduces to the independence copula); \(\rho ^{(0)}=0\).

  2. 2.

    Set \(t\leftarrow 1\) and \(\theta =\theta ^{(t)}\), with \(\theta ^{(t)}\) some value strictly greater (smaller) than \(\theta ^{(0)}\) if \(\rho >(<)0\)

  3. 3.

    Compute \(F(i,j;\theta ^{(t)})\) using (2)

  4. 4.

    Compute \(p(i,j;\theta ^{(t)})\) using (3)

  5. 5.

    Compute \(\rho ^{(t)}\)

  6. 6.

    If \(|\rho ^{(t)} - \rho |<\epsilon \) stop; else set \(t\leftarrow t+1\), \(\theta ^{(t)}\leftarrow \min (\theta _{\max },\theta ^{(t-1)}+m^{(t)}(\rho -\rho ^{(t-1)}))\) if \(\rho >0\), or \(\theta ^{(t)}\leftarrow \max (\theta _{\min },\theta ^{(t-1)}+m^{(t)}(\rho -\rho ^{(t-1)}))\) if \(\rho <0\), with \(m^{(t)}=\dfrac{\theta ^{(t-1)}-\theta ^{(t-2)}}{\rho ^{(t-1)}-\rho ^{(t-2)}}\); go back to 3.

The iterative process at the basis of the algorithm is quite clear (see Fig. 1). One starts from two points in the \((\theta ,\rho )\) Cartesian diagram, \(A=(\theta ^{(0)},\rho ^{(0)})\) and \(B=(\theta ^{(1)},\rho ^{(1)})\), where \(\theta ^{(1)}\) can be chosen arbitrarily, with the only condition that the resulting \(\rho ^{(1)}\) has the same sign as the target \(\rho \). From these two points, one derives the next value of \(\theta \), \(\theta ^{(2)}\) (corresponding to the abscissa of point C), by linear interpolation, using the slope \(m^{(2)}\) of the line passing through them and respecting the lower or upper bounds of \(\theta \) (this is why the \(\min \) and \(\max \) operators appear in the recursive formulas for \(\theta ^{(t)}\) of step 6). The procedure then continues, computing the actual value \(\rho ^{(2)}\) (ordinate of point D), and then iteratively updating \(\theta ^{(t)}\) (and computing \(\rho ^{(t)}\)), taking into account just the last two points for determining the updated slope \(m^{(t)}\).
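As an illustration, steps 1–6 can be sketched in Python under the Frank copula (so that \(\theta =\kappa \) and \(\theta ^{\varPi }\) corresponds to \(\kappa \rightarrow 0\)); the function names are ours, numpy is assumed, and the bounds \(\theta _{\min },\theta _{\max }\) are truncated to \(\pm 35\) for numerical convenience:

```python
import numpy as np

def frank(u1, u2, kappa):
    # bivariate Frank copula (kappa != 0)
    return -np.log1p(np.expm1(-kappa * u1) * np.expm1(-kappa * u2)
                     / np.expm1(-kappa)) / kappa

def rho_of_kappa(kappa, p1, p2):
    """g(kappa | F1, F2): correlation of the point-scale margins p1, p2
    joined by a Frank copula with parameter kappa (steps 3-5)."""
    F1 = np.concatenate(([0.0], np.cumsum(p1)))    # F1(0), F1(1), ..., F1(m1)
    F2 = np.concatenate(([0.0], np.cumsum(p2)))
    C = frank(F1[:, None], F2[None, :], kappa)     # joint cdf on the grid, Eq. (2)
    p = C[1:, 1:] - C[:-1, 1:] - C[1:, :-1] + C[:-1, :-1]   # joint pmf, Eq. (3)
    i = np.arange(1, len(p1) + 1); j = np.arange(1, len(p2) + 1)
    mu1, mu2 = i @ p1, j @ p2
    s1 = np.sqrt(((i - mu1) ** 2) @ p1); s2 = np.sqrt(((j - mu2) ** 2) @ p2)
    return (i @ p @ j - mu1 * mu2) / (s1 * s2)

def find_kappa(rho, p1, p2, eps=1e-7, k_min=-35.0, k_max=35.0, max_iter=100):
    """Secant-type search of steps 1-6 for the Frank parameter matching a
    nonzero, feasible target rho."""
    k_old, r_old = 0.0, 0.0                        # step 1: independence limit
    k = 1.0 if rho > 0 else -1.0                   # step 2: same side as rho
    r = rho_of_kappa(k, p1, p2)
    for _ in range(max_iter):
        if abs(r - rho) < eps:                     # step 6: stopping rule
            return k
        m = (k - k_old) / (r - r_old)              # interpolation slope m^(t)
        k_new = min(k_max, max(k_min, k + m * (rho - r)))
        k_old, r_old, k = k, r, k_new
        r = rho_of_kappa(k, p1, p2)
    return k
```

With the margins of Example 1 and target \(\rho =1/3\), the search converges in a handful of iterations to a positive \(\kappa \).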

The above heuristic algorithm makes sense if g is a monotone increasing function, which is often the case: for the Gauss, Frank, and Plackett copulas, the linear correlation is an increasing function of the dependence parameter \(\theta \), keeping fixed the two marginal distributions.

In fact, let us recall that we say that the joint cdf \(F(x_1,x_2;\theta )\), with fixed margins \(F_1\) and \(F_2\), is “increasing in concordance” as \(\theta \) increases if, for any \(\theta _2>\theta _1\),

$$ F(x_1,x_2;\theta _1)\le F(x_1,x_2;\theta _2)\quad \forall x_1,x_2\in \mathbb {R}. $$

Then, it follows (see, for example, Scarsini and Shaked 1996)

$$ \rho _{X_1,X_2}(\theta _1)\le \rho _{X_1,X_2}(\theta _2). $$

Since the Gauss copula, the Frank copula, and the Plackett copula are all increasing in concordance with respect to their parameter, i.e.

$$ C(u_1,u_2;\theta _1)\le C(u_1,u_2;\theta _2), $$

(Joe 2014) and the same holds for the joint cdf \(F(x_1,x_2;\theta )=C(F_1(x_1),F_2(x_2);\theta )\), then we can claim that \(\rho _{X_1,X_2}\) is increasing in \(\theta \).

A very particular case is represented by the Gauss copula. A known theoretical result, which goes under the name of Lancaster theorem (Lancaster 1957, p. 290), and later reported for example in Cario and Nelson (1997), allows us to claim that the correlation between the discrete rvs \(X_1\) and \(X_2\) has the same sign as the Gauss copula correlation and is not greater in absolute value: \(\text {sgn}(\rho _{X_1,X_2})=\text {sgn}(\rho _{Ga})\) and \(|\rho _{X_1,X_2}|\le |\rho _{Ga}|\). Therefore, a reasonable choice for the starting value \(\theta ^{(1)}:=\rho _{Ga}^{(1)}\) is the target correlation itself.

The advantage of the proposed algorithm lies in the following four (connected) features: (i) the flexibility in the choice of the underlying copula, which can be different from the Gaussian and is just required to span the entire dependence range; (ii) the capacity of finding the appropriate value of \(\theta \) without making use of any sample from the two marginal distributions, thus avoiding the introduction of sampling errors; (iii) the possibility of controlling a priori the error \(\epsilon \) (absolute difference between target and actual value of \(\rho _{X_1,X_2}\))—setting \(\epsilon \) equal to \(10^{-7}\) generally allows one to recover \(\theta \) in a few steps; and (iv) the absence of inner, potentially time-consuming optimization or root-finding routines.

Existing procedures for solving the same (or a similar) problem are available in the literature, but they do not enjoy all the features mentioned above. For example, the proposal by Demirtas (2006) is based on simulating binary data whose marginals are derived by collapsing the prespecified marginals of the ordinal variables. The correlation matrix of the binary variables is adjusted by an iterative process in order to match the target correlation matrix for ordinal data, which requires the generation of a “huge” bivariate sample of binary data.

Other proposals by Madsen and Dalthorp (2007), Ferrari and Barbiero (2012), or Xiao (2017) for simulating multivariate ordinal/discrete variables with assigned margins and correlations exclusively address the dependence structure induced by the Gauss copula.

Lee and Kaplan (2018) proposed two procedures based on the principles of maximum entropy and minimum cross-entropy to simulate multivariate ordinal variables with assigned values of marginal skewness and kurtosis; they rely on the multivariate normal distribution as a latent variable. Foldnes and Olsson (2016) proposed a simulation technique for nonnormal data with pre-specified skewness, kurtosis, and covariance matrix, by using linear combinations of independent generator (IG) variables; its most important feature is that the resulting copula is not Gaussian. In Nelsen (1987), using convex linear combinations of the pmfs for the discrete Fréchet boundary distributions (i.e. those corresponding to comonotonicity and countermonotonicity) and the pmf for independent rvs, the author constructs bivariate pmfs for dependent discrete rvs with arbitrary marginals and any correlation between the theoretical minimum and maximum. A similar rationale was later used by Demirtas and Vardar-Acar (2017) for devising an algorithm for inducing any desired Pearson or Spearman correlation to independent bivariate data whose marginals can be of any distributional type and nature. The algorithm we propose, though limited to point-scale rvs, is much more flexible, as it allows a much broader choice of dependence structures, whereas the latter two procedures employ a convex combination of the bivariate comonotonicity and countermonotonicity copulas. (An analogous way was followed by Lee (1997) for the first method.)

Obviously, in the simplest case of a bivariate shifted Bernoulli (\(m_1=m_2=2\)), the proposed algorithm recovers (numerically) the same unique bivariate distribution yielded (analytically) by system (1), whatever copula is selected (provided it spans the entire dependence spectrum, i.e. it is comprehensive). For the case of dependent Bernoulli rvs, see also the example presented in McNeil et al. 2005, p.188.

We remark that even if we mentioned three well-known comprehensive copulas (Gauss, Frank, and Plackett), which are all exchangeable and radially symmetric (see, for example, McNeil et al. (2005), chapter 5), exchangeability and radial symmetry are not necessary conditions for the algorithm to work; thus, the proposed procedure is able to deal with asymmetrical dependence, which often occurs in many fields, especially in finance. The comprehensive property is instead required if one wants to span the entire range of feasible linear correlations between the two point-scale rvs. If one uses the Gumbel or the Clayton copula (both belonging to the broad class of Archimedean copulas) to induce correlation between the rvs, since these two copulas can only model positive dependence through their scalar parameter \(\theta \), it follows that only positive (or at most null) values of linear correlation can be induced and then assigned. A useful reference is Table 4.1 in Nelsen (2006), where some important one-parameter families of Archimedean copulas are listed along with their special and limiting cases; from there, it is possible to identify and select comprehensive copulas.

The algorithm presented in this section is naturally conceived for one-parameter copulas, but it can be extended to p-parameter copulas, \(p\ge 2\) (just think of Student’s t copula); in this case, \(p-1\) higher-order correlations or co-moments need to be assigned along with the usual linear correlation in order to calibrate all the parameters.

Fig. 1

Graphical representation of the algorithm proposed for inducing a feasible value of correlation between point-scale rvs. The abscissas of black-filled points (A, B, D, F, \(\dots \)) represent the succession of the values \(\theta ^{(t)}\) of the copula parameter, converging to the solution of the problem (1)

3.4 Extension to multivariate context

The extension of the proposed procedure to dimension \(d>2\)—finding a joint distribution with d assigned margins and \(d(d-1)/2\) distinct pairwise correlations—is not straightforward at all, but this is due to theoretical rather than computational reasons. To explain it, we will refer to a counterexample described in Bergsma and Rudas (2002) and Chaganty and Joe (2006). Let X, Y and Z be three correlated binary rvs, each with support \(\left\{ 0, 1\right\} \), such that \(P(X = 1) = P(Y = 1) = P(Z = 1) = 0.5\) (nothing changes if we select \(\left\{ 1, 2\right\} \) as common support). Based on the bivariate correlation bounds (see point 3 in Sect. 2), the three correlation coefficients \(\rho _{XY}\), \(\rho _{XZ}\), \(\rho _{YZ}\) can lie anywhere in the interval \([-1, +1]\). However, if we choose \(\rho _{XY} = \rho _{YZ}= 0.4\) and \(\rho _{XZ} = -0.4\), then a trivariate distribution for \((X,Y,Z)\) does not exist. So, a first type of problem is related to the feasibility of the (assigned) correlation matrix P for the discrete d-variate random vector: even if all the bivariate correlation bounds are respected, and even if the matrix P collecting all the pairwise correlations is a valid correlation matrix (a symmetric positive semidefinite matrix with all ones on the main diagonal), it may nevertheless be impossible to construct a random vector with the d assigned margins and correlation matrix P. A second type of problem is related to the near absence of copulas able to calibrate—through their parameters—the values of the resulting pairwise correlation coefficients. The typical generalizations of the Frank and Plackett copulas, which we discussed in Sect. 3.2, are still characterized by a unique scalar parameter. Assigning arbitrary—though feasible—values to the pairwise correlations of the discrete random vector would then generally lead to no solution in terms of the dependence parameter.
A way to overcome this issue is resorting to a copula whose number of parameters is at least equal to the number of distinct pairwise correlations: an obvious candidate is the Gauss copula; a richer option is represented by the t copula. Alternatively, one can resort to pair copula constructions through vines, which are graphical models that represent high-dimensional distributions and can model a rich variety of dependencies. Another limitation of the algorithm of Sect. 3.3 is that it only handles rvs with finite supports: extensions to count variables need to seek an accurate approximation of the correlation coefficient at step 5, possibly entailing a truncation of the support of the joint distribution at steps 3 and 4, in order to compute the correlation coefficient through a double finite summation.

3.5 Pseudo-random simulation

Simulating samples from a bivariate rv with assigned point-scale margins and correlation, built according to the procedure described in Sects. 3.2 and 3.3, is straightforward. One can resort to the following general algorithm for meta-copula distributions:

  1.

    Simulate a random sample \((u_1,u_2)\) from the copula \(C(u_1,u_2;\theta )\), where \(\theta \) is the value of the copula parameter recovered through the algorithm of Sect. 3.3;

  2.

    Set \(x_1=F_1^{-1}(u_1)\) and \(x_2=F_2^{-1}(u_2)\), where \(F_1^{-1}\) and \(F_2^{-1}\) are the generalized inverse functions of \(F_1\) and \(F_2\), respectively;

  3.

    \((x_1,x_2)\) is a random sample from the target bivariate distribution, with copula \(C(u_1,u_2;\theta )\) and margins \(F_1\) and \(F_2\).
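The three steps above can be sketched compactly, assuming a Gauss copula; the two 5-point margins and all function names below are illustrative, not taken from the paper's supplementary code:

```python
import numpy as np
from scipy.stats import norm

def make_inverse_cdf(support, pmf):
    """Generalized inverse F^{-1}(u) = min{x : F(x) >= u} of a discrete margin."""
    cum = np.cumsum(pmf)
    return lambda u: support[np.minimum(np.searchsorted(cum, u), len(support) - 1)]

def sample_meta_copula(F1_inv, F2_inv, theta, n, rng):
    # Step 1: draw (u1, u2) from the Gauss copula with parameter theta
    cov = [[1.0, theta], [theta, 1.0]]
    u = norm.cdf(rng.multivariate_normal([0.0, 0.0], cov, size=n))
    # Steps 2-3: apply the generalized inverses of the marginal cdfs;
    # the resulting pairs follow the target bivariate distribution
    return F1_inv(u[:, 0]), F2_inv(u[:, 1])

rng = np.random.default_rng(42)
support = np.arange(1, 6)
p1 = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
p2 = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
x1, x2 = sample_meta_copula(make_inverse_cdf(support, p1),
                            make_inverse_cdf(support, p2),
                            theta=0.6, n=10_000, rng=rng)
```

With a positive copula parameter, the sample Pearson correlation comes out positive, though attenuated by the discretization of the margins.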

Alternatively, since both \(X_1\) and \(X_2\) have a finite support, one can resort to the “inversion algorithm” described in Devroye (1986, pp. 85, 559) and used in Lee (1997): in this case, one directly considers the joint probability mass function p(i, j) of Eq.(3) and proceeds as follows:

  1.

    Set \(N=m_1\times m_2\); let \(\pi _t\), \(t=1,\dots ,N\) be the joint probabilities \(p_{ij}\) arranged in descending order;

  2.

    Let \(y_t\) be the corresponding possible values of \(Y=(X_1,X_2)\), arranged similarly;

  3.

    Define \(z_0,z_1,\dots , z_{N}\) in the following way:

    $$\begin{aligned} z_0&= 0\\ z_t&= z_{t-1} + \pi _t,\quad t=1,\dots , N \end{aligned}$$
  4.

    Simulate a random number u from a standard uniform rv;

  5.

    Return \(y_t\), where \(z_{t-1}< u \le z_t\).
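The five steps translate almost verbatim into code; a sketch with a toy \(2\times 2\) joint pmf, in which the deterministic pair returned for a given u makes the descending ordering easy to verify:

```python
import numpy as np

def inversion_sample(p, values1, values2, u):
    """Return the pair y_t of Y = (X1, X2) such that z_{t-1} < u <= z_t."""
    m1, m2 = p.shape
    flat = p.ravel()
    order = np.argsort(flat)[::-1]                 # steps 1-2: the N = m1*m2 joint
    pi = flat[order]                               # probabilities in descending order
    pairs = [(values1[k // m2], values2[k % m2]) for k in order]
    z = np.concatenate(([0.0], np.cumsum(pi)))     # step 3: z_0, z_1, ..., z_N
    t = np.searchsorted(z, u)                      # steps 4-5: locate u in (z_{t-1}, z_t]
    return pairs[t - 1]

p = np.array([[0.4, 0.1],
              [0.2, 0.3]])
print(inversion_sample(p, [1, 2], [1, 2], u=0.35))  # (1, 1), the most probable cell
```

In practice, u at step 4 is a fresh standard-uniform draw; passing it explicitly here keeps the lookup deterministic and testable.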

4 Application to CUB random variables

A CUB rv X is defined as the mixture, with weights \(\pi \) and \(1-\pi \), of a shifted binomial with parameters m and \(\xi \) and a discrete uniform distribution over the support \(\left\{ 1,2,\dots ,m \right\} \), for \(m>3\) (Piccolo 2003). Its probability mass function is thus

$$\begin{aligned} P(X=i) = {\pi }\left( {\begin{array}{c}m-1\\ i-1\end{array}}\right) \xi ^{m-i}(1-\xi )^{i-1} + (1-\pi )\frac{1}{m},\quad i=1,2,\dots ,m, \end{aligned}$$

with \((\pi ,\xi )\) a parameter vector with parameter space \(\varTheta =(0,1]\times [0,1]\). This rv has proved well suited to modelling the discrete choice of an individual from an ordinal list of m alternatives, which can be regarded as the result of a complex decision blending both the attractiveness (“feeling”) towards an item, expressed by \(1-\xi \), and the fuzziness surrounding the final choice (“uncertainty”), expressed by \(1-\pi \). The requirement \(m>3\) ensures the identifiability of the model, as it rules out the cases of a degenerate rv (if \(m=1\)), an indeterminate model (if \(m=2\)) and a saturated model (if \(m=3\)). With only two parameters, the CUB rv is able to model extreme and intermediate modes, positive and negative skewness, flat and peaked behaviours, etc. (see Piccolo 2003; Iannario and Piccolo 2010, for further properties of this stochastic model).
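The pmf above is simple to code; a quick sketch, checking that the probabilities sum to one and that a large \(\xi \) (low feeling) pushes mass towards the low categories:

```python
from math import comb

def cub_pmf(m, pi, xi):
    """P(X = i), i = 1, ..., m, for a CUB(m, pi, xi) random variable."""
    return [pi * comb(m - 1, i - 1) * xi ** (m - i) * (1 - xi) ** (i - 1)
            + (1 - pi) / m
            for i in range(1, m + 1)]

p = cub_pmf(m=5, pi=0.4, xi=0.8)
print(round(sum(p), 10))  # 1.0: the shifted-binomial and uniform components each sum to 1
```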

Corduas (2011) proposed using the Plackett copula to construct a one-parameter bivariate distribution from CUB margins; this proposal was later investigated by Andreis and Ferrari (2013), also in a multivariate direction. Here, we reprise and extend these attempts at constructing a bivariate CUB rv, by resorting to the results discussed in Sect. 3.2. Suppose we want to build a bivariate model with margins \(X_1\sim \text {CUB}(m_1=5,\pi _1=0.4,\xi _1=0.8)\) and \(X_2\sim \text {CUB}(m_2=5,\pi _2=0.7,\xi _2=0.3)\); we can find the range of attainable correlations by using the function corrcheck in GenOrd (Barbiero and Ferrari 2015a), which returns the values \(\rho _{\min }=-0.952003\) and \(\rho _{\max }=0.8640543\). We then select a desired feasible value of correlation between the two CUB variates, say \(\rho =0.6\).

Afterward, we recover the values of \(\rho _{Ga}\) (for the Gauss copula), \(\kappa \) (for the Frank copula), and \(\theta \) (for the Plackett copula), according to the iterative procedure illustrated in the previous section. By setting \(\epsilon =10^{-7}\), we obtain \(\rho _{Ga}=0.6898959\), \(\kappa =5.453455\), and \(\theta =11.30106\). Table 3 reports the detailed iterations of the algorithm for the three copula-based models. Note that since \(\rho >0\), for the Frank copula we selected a value \(\kappa ^{(1)}\) larger than zero (tentatively, \(\kappa ^{(1)}=1\)) and for the Plackett copula a value \(\theta ^{(1)}\) larger than one (tentatively, \(\theta ^{(1)}=2\)). For the Gauss copula, we set \(\theta ^{(1)}=\rho =0.6\).

Table 3 Iterations of the algorithm when applied to the example CUB model. Note that for (b) and (c) the sequence \(\left\{ \theta ^{(t)}\right\} \) is increasing

The three joint pmfs, sharing the same value of linear correlation, are reported in Table 4. It is easy to notice the differences among them. For example, the probability \(p_{23} = P(X_1=2,X_2=3)\) takes the values 0.0922, 0.0948, and 0.1008, across the three joint distributions.

Table 4 Bivariate distribution with margins \(X_1\sim \text {CUB}(m_1=5,\pi _1=0.4,\xi _1=0.8)\) and \(X_2\sim \text {CUB}(m_2=5,\pi _2=0.7,\xi _2=0.3)\) and \(\rho _{x_1x_2}=0.6\), obtained based on different copulas

Figure 2 displays the relationship between the copula parameter \(\theta \) (on the x axis) and the corresponding Pearson’s correlation of the example bivariate CUB model with Gauss, Frank, and Plackett copula, constructed by applying steps 3–5 of the proposed algorithm over a dense grid of uniformly spaced values of \(\theta \). The almost linear trend for the Gauss copula is easily noted (in this case, the copula parameter is itself a correlation coefficient), whereas for the other two copulas the relationship is markedly nonlinear, due also to the unbounded domains of \(\kappa \) and \(\theta \). From the figure, one can see that \(\rho _{X_1,X_2}\) is a monotone concave function of the copula parameter for the Plackett copula, and for the Frank copula only when \(\kappa >0\): this explains the increasing nature of the sequences \(\left\{ \theta ^{(t)}\right\} \) in Tables 3b and 3c.

Fig. 2
figure 2

Relationship between linear correlation and dependence parameter for a bivariate CUB model with Gaussian (top panel), Frank (centre), and Plackett (bottom) copula. Points are drawn in correspondence of \(\rho =0.6\)

5 Inferential aspects

If we have a bivariate ordinal sample \((x_{1t},x_{2t})\), \(t=1,\dots ,n\), assumed to be drawn from a joint cdf \(F(x_1,x_2;\theta ,\theta _1,\theta _2)=C(F_1(x_1;\theta _1),F_2(x_2;\theta _2);\theta )\), then parameter estimation can be carried out through different inferential techniques. First, let us define the log-likelihood function \(\ell \) as

$$\begin{aligned} \ell (\theta _1,\theta _2,\theta )&=\ell (\theta _1,\theta _2,\theta |(x_{11},x_{21}),\dots ,(x_{1n},x_{2n})) = \sum _{t=1}^n \log p(x_{1t},x_{2t};\theta _1,\theta _2,\theta )\nonumber \\&= \sum _{i=1}^{m_1}\sum _{j=1}^{m_2} n_{ij} \log p(i,j;\theta _1,\theta _2,\theta ) \end{aligned}$$
(4)

with p being the joint pmf of Eq.(3) and \(n_{ij}\) the absolute joint frequency of the pair (i, j), \(i=1,\dots ,m_1\), \(j=1,\dots ,m_2\).

We now present three possible ways to perform point estimation of the distribution’s parameters: the first one is the customary maximum likelihood method; the second one is a modification thereof, whose use with copula models is by now quite consolidated; the third one is directly suggested by the simulation procedure of Sect. 3.3 and can be regarded as a by-product of the methodology presented in Sect. 3.

5.1 Full maximum likelihood

The most standard estimation technique consists of maximizing \(\ell \) with respect to all three parameters (or parameter vectors) simultaneously. This task can usually be carried out only numerically (i.e. no closed-form expressions are available for the parameter estimates), by resorting to customary optimization routines. The resulting maximum likelihood (ML) estimates are thus

$$ (\hat{\theta }_1,\hat{\theta }_2,\hat{\theta })=\arg \max _{(\theta _1,\theta _2,\theta )\in \varTheta } \ell (\theta _1,\theta _2,\theta ) $$

where \(\varTheta \) is the parameter space.
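A compact sketch of full ML for a bivariate CUB model with a Frank copula. This is our own toy setup, not the paper's code: the “data” are expected frequencies under known parameters, so the maximizer should land near the truth.

```python
import numpy as np
from math import comb
from scipy.optimize import minimize

def cub_pmf(m, pi, xi):
    return np.array([pi * comb(m - 1, i - 1) * xi ** (m - i) * (1 - xi) ** (i - 1)
                     + (1 - pi) / m for i in range(1, m + 1)])

def frank_copula(u, v, k):
    return -np.log1p(np.expm1(-k * u) * np.expm1(-k * v) / np.expm1(-k)) / k

def joint_pmf(theta, m1, m2):
    pi1, xi1, pi2, xi2, k = theta
    F1 = np.concatenate(([0.0], np.cumsum(cub_pmf(m1, pi1, xi1))))
    F2 = np.concatenate(([0.0], np.cumsum(cub_pmf(m2, pi2, xi2))))
    return np.diff(np.diff(frank_copula(F1[:, None], F2[None, :], k), axis=0), axis=1)

def negloglik(theta, n):
    """Negative of the log-likelihood in Eq. (4), from the frequency table n_ij."""
    p = np.clip(joint_pmf(theta, *n.shape), 1e-300, None)
    return -np.sum(n * np.log(p))

truth = np.array([0.4, 0.8, 0.7, 0.3, 5.0])
n = 1000 * joint_pmf(truth, 5, 5)       # expected counts for a sample of size 1000
res = minimize(negloglik, x0=[0.5, 0.5, 0.5, 0.5, 1.0], args=(n,),
               bounds=[(1e-3, 1), (0, 1), (1e-3, 1), (0, 1), (0.1, 30)],
               method="L-BFGS-B")
```

With expected counts, the objective is the cross-entropy, minimized at the true parameter vector; with real data, `n` would hold the observed \(n_{ij}\).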

5.2 Two-step maximum likelihood

This technique aims at reducing the computational burden of the previous one, by splitting the original maximization problem into two subsequent (sets of) maximizations in lower dimensions. In the first step, one estimates \(\theta _1\) and \(\theta _2\) separately, by resorting to maximum likelihood estimation, as if the two univariate components of the bivariate sample were independent, i.e. maximizing separately their marginal log-likelihood functions:

$$\begin{aligned} \ell _1(\theta _1) = \sum _{i=1}^{m_1} n_{i\cdot }\cdot \log p_1(i;\theta _1),\quad \ell _2(\theta _2) = \sum _{j=1}^{m_2} n_{\cdot j}\cdot \log p_2(j;\theta _2), \end{aligned}$$

with \(n_{i\cdot }\) and \(n_{\cdot j}\) being the observed marginal frequencies of \(X_1\) and \(X_2\), respectively, thus finding the estimates \(\hat{\theta }_1^{TS}\) and \(\hat{\theta }_2^{TS}\) (the superscript “TS” standing for “two-step”). Then, one sets \(\theta _1=\hat{\theta }_1^{TS}\) and \(\theta _2=\hat{\theta }_2^{TS}\) in (4) and maximizes it with respect to \(\theta \), finding \(\hat{\theta }^{TS}\). This technique was introduced in a more general context and exhaustively described in Joe and Xu (1996), where it is named “inference function for margins” (IFM). The authors compared the efficiency of the IFM with that of full ML by simulation and found that the ratio of the mean square error of the IFM estimator to that of the full ML estimator is close to 1. Theoretically, the ML estimator should be asymptotically the most efficient, since it attains the minimum asymptotic variance bound. However, for finite samples, Patton (2006) found that the IFM was often even more efficient than the ML. As a result, IFM is the main method employed in estimating copula models.
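The first step amounts to two separate low-dimensional maximizations; a sketch for one CUB margin (again with expected counts standing in for data, and all names ours):

```python
import numpy as np
from math import comb
from scipy.optimize import minimize

def cub_pmf(m, pi, xi):
    return np.array([pi * comb(m - 1, i - 1) * xi ** (m - i) * (1 - xi) ** (i - 1)
                     + (1 - pi) / m for i in range(1, m + 1)])

def neg_marginal_loglik(theta, counts):
    """- l_1(theta_1): marginal CUB log-likelihood from the frequencies n_i."""
    p = np.clip(cub_pmf(len(counts), *theta), 1e-300, None)
    return -np.sum(counts * np.log(p))

counts = 1000 * cub_pmf(5, 0.7, 0.3)     # expected marginal frequencies
res = minimize(neg_marginal_loglik, x0=[0.5, 0.5], args=(counts,),
               bounds=[(1e-3, 1), (0, 1)], method="L-BFGS-B")
# res.x approximates (pi, xi) = (0.7, 0.3); the dependence parameter is then
# estimated in a second, one-dimensional maximization with the margins held fixed
```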

5.3 Two-step maximum likelihood + method of moment

This method is directly suggested by the algorithmic procedure of Sect. 3.3. First, one estimates the marginal parameters \(\theta _1\) and \(\theta _2\) from the sample data \(x_{1i}\) and \(x_{2i}\), \(i=1,\dots ,n\), independently maximizing \(\ell _1\) and \(\ell _2\) with respect to \(\theta _1\) and \(\theta _2\), as for the previous technique, obtaining \(\hat{\theta }_1^{TS}\) and \(\hat{\theta }_2^{TS}\). Then, one considers the maximum likelihood estimates of the marginal cdfs, \(\hat{F}_j(\cdot )=F_j(\cdot ;\hat{\theta }_j^{TS})\), \(j=1,2\), and obtains the estimate of the dependence parameter \(\theta \) via the method of moment, by inverting the relationship between \(\theta \) and Pearson’s correlation: \(\hat{\theta }^{TSM}=g^{-1}(\hat{\rho }_{X_1X_2};\hat{F}_1,\hat{F}_2)\), where \(\hat{\rho }_{X_1X_2}\) is Pearson’s sample correlation coefficient and TSM stands for “two-step-moment method”. The evaluation of \(g^{-1}\) at \(\hat{\rho }_{X_1X_2}\), given \(\hat{F}_1\) and \(\hat{F}_2\), is carried out through the algorithm of Sect. 3.3.

We remark that these three estimation methods are just possible alternatives to be employed for the specific context of copula-based discrete distributions considered in this work. When dealing with copula-based distributions in the continuous case, a straightforward estimation method for the parameter of a specific bivariate parametric copula is the method of moment. It consists in considering a rank correlation between the two rvs (say, Kendall’s \(\tau \) or Spearman’s \(\rho \)), looking for a theoretical relationship between it and the parameter of the copula, and substituting the empirical value of the rank correlation into this relationship to derive an estimate of the copula parameter. The main advantage of this method is that it does not require any assumption about the marginal distributions, since rank correlations are margin-free, i.e. they depend on the copula only and are not affected by the marginal distributions.

When dealing with factor copulas, another consolidated estimation method is represented by a sort of simulated method of moments (SMM), where the “moments” that are used in estimation are not the usual ones, but functions of rank statistics. The SMM estimator is derived as the minimizer of the distance between data dependence measures and dependence measures obtained through Monte Carlo simulation of the model (Oh and Patton 2017). In the context of hierarchical Archimedean copulas, Okhrin and Tetereva (2017) investigate a clustering estimator based on Kendall’s \(\tau \) by means of Monte Carlo simulations; it is shown to be competitive in terms of statistical properties (bias and variance) and to be computationally advantageous.

However, using these methods would not be convenient in our context, since rank correlations lose the nice properties they enjoy in the continuous set-up, due to the presence of tied values (see, for example, Nešlehová (2007)).

6 Monte Carlo study

The relative performance of the estimators derived through the three methods described in the previous section, expressed in terms of statistical indicators such as bias or mean-squared error, can be assessed for finite sample sizes only via Monte Carlo (MC) simulation. Usually, the estimators of the marginal parameters \(\theta _1\) and \(\theta _2\) derived through the full maximum likelihood technique and the two-step methods exhibit very similar statistical behaviour; differences are instead expected to arise among the estimators of the dependence parameter. Here we will examine the joint behaviour and performance of all the parameters’ estimators.

For the multivariate case, we recall that the bias of an estimator \(\hat{\pmb {\theta }}=(\hat{\theta }_1,\dots ,\hat{\theta }_p)'\) of a p-dimensional vector parameter \(\pmb {\theta }=(\theta _1,\dots ,\theta _p)'\) is defined as:

$$ \text {bias}(\hat{\pmb {\theta }}):=\mathbb {E}(\hat{\pmb {\theta }})-\pmb {\theta }; $$

\(\hat{\pmb {\theta }}\) is said to be an unbiased estimator of \(\pmb {\theta }\) if \(\mathbb {E}(\hat{\pmb {\theta }})=\pmb {\theta }\) for any \(\pmb {\theta }\in \varTheta \).

A multivariate generalization of the mean-squared error (MSE) is provided by the MSE matrix (see, for example, Mittelhammer 2013, p.377):

$$ \text {MSE}(\hat{\pmb {\theta }})=\mathbb {E}[(\hat{\pmb {\theta }}-\pmb {\theta })(\hat{\pmb {\theta }}-\pmb {\theta })'].$$

The MSE matrix can be decomposed into variance and bias components, analogous to the scalar case. Specifically, \(\text {MSE}\) is equal to the sum of the covariance matrix of \(\pmb {\hat{\theta }}\) and the outer product of the bias vector of \(\pmb {\hat{\theta }}\):

$$ \text {MSE}(\hat{\pmb {\theta }})=\text {cov}(\pmb {\hat{\theta }}) + \text {bias}(\hat{\pmb {\theta }}) \text {bias}(\hat{\pmb {\theta }})'.$$

The trace of the MSE matrix defines the expected squared distance (ESD) of the vector estimator \(\hat{\pmb {\theta }}\) from the vector estimand \(\pmb {\theta }\) and is equal to:

$$\begin{aligned} ESD(\pmb {\hat{\theta }})&:=\text {tr}(\text {MSE}(\hat{\pmb {\theta }}))=\text {tr}(\mathbb {E}[(\hat{\pmb {\theta }}-\pmb {\theta })(\hat{\pmb {\theta }}-\pmb {\theta })'])=\mathbb {E}(\text {tr}((\hat{\pmb {\theta }}-\pmb {\theta })(\hat{\pmb {\theta }}-\pmb {\theta })'))\\&=\mathbb {E}[(\hat{\pmb {\theta }}-\pmb {\theta })'(\hat{\pmb {\theta }}-\pmb {\theta })]=\mathbb {E}\left[ \sum _{i=1}^p (\hat{\theta }_i-\theta _i)^2\right] \\&=\sum _{i=1}^p \mathbb {E}[(\hat{\theta }_i-\theta _i)^2]=\text {tr}(\text {cov}(\pmb {\hat{\theta }})) + \text {bias}(\hat{\pmb {\theta }})'\text {bias}(\hat{\pmb {\theta }}), \end{aligned}$$

where “\(\text {tr}\)” denotes the trace of a matrix. Being a scalar, the ESD allows direct and easy comparison among different estimators of the same parameter vector: the lower the value of ESD, the better the estimator. Generally, for the estimator vectors corresponding to the three methods of Sect. 5, such quantities cannot be derived analytically, but an approximation can be obtained through the corresponding MC means computed over S simulation runs:

$$\begin{aligned} \widehat{ESD}_{MC}(\pmb {\hat{\theta }}) =\frac{1}{S} \sum _{t=1}^S (\hat{\pmb {\theta }}_t-\pmb {\theta })'(\hat{\pmb {\theta }}_t-\pmb {\theta }), \end{aligned}$$

with \(\hat{\pmb {\theta }}_t\) being the value of the vector estimator for run t, \(t=1,\dots ,S\). Analogously, the MC approximation of the bias vector is

$$ \widehat{\text {bias}}_{MC}(\hat{\pmb {\theta }}) = \frac{1}{S}\sum _{t=1}^S \hat{\pmb {\theta }}_t- \pmb {\theta }. $$

The larger the value of S, the more accurate the approximations.
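As a toy check of the MC approximation (our example, not one of the paper's scenarios): for the sample mean of a bivariate normal, the exact ESD is \(\text {tr}(\text {cov}(\hat{\pmb {\theta }}))=(\sigma _1^2+\sigma _2^2)/n\) since the bias is zero, and the MC estimate should approach it.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, 2.0])          # true parameter vector (the two means)
sigma = np.array([1.0, 2.0])          # marginal standard deviations
S, n = 5000, 50

# S Monte Carlo runs of the estimator (the sample mean of n observations)
est = np.array([rng.normal(theta, sigma, size=(n, 2)).mean(axis=0)
                for _ in range(S)])
esd_mc = np.mean(np.sum((est - theta) ** 2, axis=1))

exact = np.sum(sigma ** 2) / n        # tr(cov) = (1 + 4)/50 = 0.1
print(abs(esd_mc - exact) < 0.01)     # True: the MC value is close to the exact ESD
```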

An MC study is designed to assess the relative statistical performance of the three types of vector estimators for a bivariate CUB model under an array of artificial scenarios, obtained by varying the dependence structure, the CUB marginal parameters, and the sample size. We point out that this study does not allow general conclusions, but is intended merely to demonstrate how the different inferential methods work and to check the potential of the proposal. As possible dependence structures, we evaluate the Gauss, Frank, and Plackett copulas: the first with parameter \(\rho _{Ga}\) equal to \(-0.6\) or \(+0.6\); the second with parameter \(\kappa =-5\) or \(\kappa =+5\); the last with parameter \(\theta =1/4\) or \(\theta =4\). As CUB marginal parameters, we consider \(m_1=m_2=5\) and \(m_1=m_2=7\), combined with the following values for the marginal parameter vector \(\pmb {\theta }_M=(\pi _1,\xi _1,\pi _2,\xi _2)'\): \((0.4,0.8,0.4,0.8)'\), \((0.4,0.8,0.7,0.3)'\), and \((0.7,0.3,0.7,0.3)'\). For all the \(2\times 2\times 3=12\) combinations above, the sample sizes 50 and 100 are investigated, for a total of 24 artificial settings for each type of copula.

Note that the two-step procedures (Sects. 5.2 and 5.3) require that both empirical marginal distributions assume at least 4 distinct values in order for the MLEs to be computed (Piccolo 2003). To ensure the feasibility of these two techniques in every simulation run, we simply discarded the samples not satisfying this condition and kept the number of feasible samples to be drawn and used for the statistical analysis fixed at \(S=1000\) under each artificial setting.

Simulation results reporting the values of \(\widehat{ESD}_{MC}\) are displayed in Tables 5a, b, and c. Other summary indices computed over the 1,000 simulation runs, such as the MC mean of the bias vector, are available on request.

For the models with the Gauss copula, the full MLE performs best in terms of ESD; the values of ESD for the other two estimators are very close to each other in every setting, with a slight preference for the two-step MLE for \(n=100\). There is also a relevant difference in the values of ESD across the settings; as expected, ESD decreases moving from \(n=50\) to \(n=100\), holding the other parameters fixed. The setting with \(\pmb {\theta }=(0.7, 0.3, 0.7, 0.3, -0.6)'\) and \(n=100\) minimizes the value of the index for all three estimators; the setting with \(\pmb {\theta }=(0.4, 0.8, 0.4, 0.8, -0.6)'\) maximizes it.

For the models with the Frank copula, the two-step MLE surprisingly overtakes the full MLE in every setting; the two-step moment estimator shows the worst performance, even if for \(n=100\) this difference is attenuated. All three estimators improve significantly when moving from \(n=50\) to \(n=100\); the values of the distribution’s parameters also affect the values of ESD, although the effect of the model parameters cannot be easily extracted from the results on the examined scenarios.

For the models with the Plackett copula, we have the following interesting result: the MLE is the best performer (smallest \(\widehat{ESD}_{MC}\)) when \(\theta =1/4\) (corresponding to negative dependence); for the complementary scenarios, corresponding to \(\theta =4\) and thus positive correlation, the two-step MLE is the best performer. The two-step moment estimator behaves far worse than its competitors, apart from one scenario, where it is the second best after the full MLE; this is especially apparent for \(\theta =4\). Note that in this case, differently from the previous models, the values of \(\widehat{ESD}_{MC}\) change considerably when moving from \(\theta =4\) to \(\theta =1/4\), holding the other parameters fixed; this is due to the different magnitudes of the two values of \(\theta \) considered here, whose choice reflects the different meaning of the copula parameter’s values: for the Frank and Gauss copulas, changing the sign of the parameter changes the sign of the correlation while keeping its intensity fixed; this does not occur for the Plackett copula, whose parameter ranges within \(\mathbb {R}^+\) (see Fig.2), leading to negative dependence if it falls in (0, 1) and to positive dependence if it is larger than 1.

Although in this MC study the statistical performance of the inferential technique proposed in Sect. 5.3 is overall worse than that of the two more consolidated techniques described in Sects. 5.1 and 5.2, the former can still be useful for providing starting values for the copula parameter to the maximization routines of the latter two.

Table 5 Simulation results: Monte Carlo expected squared distance \(\widehat{ESD}_{MC}\) computed over 1000 runs

We also measured, on a selection of artificial settings, the total computational times in minutes required by each of the three estimation methods over the \(S=1000\) MC runs; they are displayed in Table 6. Although the magnitude of the computation time depends on the selected dependence structure, in each setting the two-step method of moment is by far the least time-consuming, followed by the two-step maximum likelihood method and the full maximum likelihood method. We remark that the suggested estimation procedure is even faster when moving from the Gaussian copula to the other two dependence structures.

Table 6 Overall computation times in minutes for the three estimation methods for some artificial scenarios considered in the MC study

7 Empirical analysis

In this section, we consider an application of the inferential techniques of Sect. 5 to real data, specifically, survey data coming from the 2000 International Social Survey Programme (ISSP), which addressed attitudes to environmental protection and preferred government measures for environmental protection. The prefmod package (Hatzinger and Dittrich 2012) comprises the raw data structured as a dataset with 1,595 complete observations (one for each respondent) on 11 variables, namely five socio-demographic variables (gender, location of residence, age, country, and education) and six items (on a 5-point rating scale, i.e. Likert-type). Respondents from Austria and Great Britain were asked about their perception of environmental dangers; the questions concerned air pollution caused by cars (variable CAR), air pollution caused by industry (IND), pesticides and chemicals used in farming (FARM), pollution of the country’s rivers, lakes, and streams (WATER), a rise in the world’s temperature (TEMP), and modifying the genes of certain crops (GENE). The answers were given on a 5-point rating scale, with response categories: (1) extremely dangerous, (2) very dangerous, (3) somewhat dangerous, (4) not very dangerous, and (5) not dangerous at all for the environment. We focus on respondents from Austria only and on the WATER and GENE items. The joint and marginal empirical distributions of the two items are reported in Table 7. By considering the ratings as numerical values, we can treat this distribution as a sample from a bivariate discrete rv; its sample correlation is equal to 0.2988.

Table 7 Joint distribution of the items WATER and GENE for Austrian respondents, ISSP2000 dataset

The nature of the data suggests using CUB as the parametric family for the marginal distributions; the Gaussian, Frank, and Plackett copulas are assumed as dependence structures for the joint distribution. Table 8 summarizes the estimation results obtained by applying the full maximum likelihood method and both two-step methods. From Table 8, it is easy to note that within each dependence structure, the three estimation methods provide estimates of the marginal parameters that are very close to each other, whereas the estimates of the dependence parameter are slightly different. For example, under the Gaussian dependence structure, the estimates of the dependence parameter \(\rho _{Ga}\) range between 0.343 and 0.361—incidentally, this confirms a positive and moderate dependence between the two observed variables, as suggested by the value of the sample correlation. Differences in terms of maximized log-likelihood emerge across the three dependence structures examined here. Among the three models, the bivariate CUB with Plackett copula (and parameters set equal to the corresponding MLEs) is the one providing the best fit (\(\ell =-2006.317\)).

Table 8 Bivariate CUB models’ estimates of the joint distribution of the items WATER and GENE for several dependence structures and according to the inferential methods of Sect. 5

Alternative ways of comparing the goodness of fit of the three models can be pursued; for example, one can consider the Aitchison distance between the empirical and theoretical probabilities, \(\varDelta =\sum \sum (\log (\hat{p}_{ij}/p_{ij}) - \bar{L})^2\), where \(\bar{L}=\sum \sum \log (\hat{p}_{ij}/p_{ij})/k\), with k being the total number of points of the support of the bivariate rv. The minimum value of the Aitchison distance is achieved by the model with the Plackett copula (\(\varDelta =3.830\)); the model with the Frank copula provides \(\varDelta =4.089\); the Gauss copula returns \(\varDelta =4.390\). Other possible distances or divergences between discrete distributions are mentioned in Fossaluza et al. (2018).
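The Aitchison distance is a one-liner; a sketch with toy probability tables (note that it is invariant to a common rescaling of one argument, which is what makes it suited to comparing compositions):

```python
import numpy as np

def aitchison_distance(p_hat, p):
    """Delta = sum_ij (log(p_hat_ij / p_ij) - L_bar)^2 over the k support points."""
    L = np.log(np.asarray(p_hat) / np.asarray(p)).ravel()
    return float(np.sum((L - L.mean()) ** 2))

p     = np.array([[0.20, 0.10], [0.30, 0.40]])   # toy "theoretical" probabilities
p_hat = np.array([[0.25, 0.05], [0.30, 0.40]])   # toy "empirical" probabilities
print(aitchison_distance(p, p) == 0.0)           # True: identical distributions
```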

An evaluation in absolute terms of the goodness of fit of this bivariate model can be carried out by resorting to the chi-squared statistic, defined as \(\chi ^2=\sum \sum (n_{ij}^{obs}-n_{ij}^{theo})^2/n_{ij}^{theo}\), with \(n_{ij}^{obs}\) and \(n_{ij}^{theo}\) indicating the observed and theoretical frequencies, respectively. In order to meet the requirement that each \(n_{ij}^{theo}\) be no smaller than 5, we collapse the last two ordered categories (4 and 5) of each variable, thus obtaining the new observed joint distribution in Table 9. There, we also report the theoretical joint frequencies corresponding to the “best” bivariate CUB model, for which the chi-squared statistic above takes the value 21.68, with a p-value of 0.0168 (under the null hypothesis that the data come from the bivariate CUB model, the test statistic asymptotically follows a chi-squared distribution with \(16-1-5=10\) degrees of freedom). This means that the goodness of fit of the model is hardly satisfactory, although at the 1% significance level we do not reject the null hypothesis. Looking at Table 9, discrepancies between observed and theoretical joint (and also marginal) frequencies are visible to the unaided eye.
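The statistic and its reference distribution are straightforward to compute; as a sketch, plugging in the figures quoted above (statistic 21.68 on \(16-1-5=10\) degrees of freedom) reproduces the stated p-value:

```python
import numpy as np
from scipy.stats import chi2

def chi_squared_stat(obs, theo):
    """Pearson chi-squared statistic between observed and theoretical frequencies."""
    obs, theo = np.asarray(obs, float), np.asarray(theo, float)
    return float(np.sum((obs - theo) ** 2 / theo))

df = 16 - 1 - 5                 # cells minus 1, minus the number of estimated parameters
p_value = chi2.sf(21.68, df)
print(p_value)                  # close to the 0.0168 reported in the text
```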

One can also consider not using a parametric model for the two margins and estimating them nonparametrically. In the opposite direction, other families of bivariate distributions may be tested for fit improvement; for example, cumulative link models could be used for the univariate margins (Agresti and Kateri 2019). Introducing covariates (gender, education, age, location of residence) for the marginal (and dependence) parameters would also likely improve the fit. However, we remark that the aim of this section was not so much to evaluate the fit of bivariate (CUB) models to real data as to illustrate and compare different estimation techniques, one of which is derived through a correlation-matching procedure. Besides, we remark that pooling cells, although a viable way to obtain accurate p-values in some instances, should preferably be performed before the analysis is carried out, in order to obtain a statistic with the appropriate asymptotic reference distribution; otherwise, it may distort the purpose of the analysis (Maydeu-Olivares and García-Forero 2010). Alternatively, one could resort to resampling methods (e.g. the bootstrap), but unfortunately, existing evidence suggests that resampling methods do not yield accurate p-values for the \(\chi ^2\) statistic (Tollenaar and Mooijaart 2003).

Table 9 Observed and theoretical (within brackets) joint frequencies of the items WATER and GENE under the bivariate CUB model with Plackett copula after collapsing the last two ordered categories

8 Conclusions

In this work, we showed how to build a joint probability distribution with assigned discrete point-scale margins attaining a target (feasible) value of correlation. We proposed a two-step copula-based approach: first, one selects a copula function and constructs a bivariate distribution preserving the assigned margins; then, one adjusts the value of the copula parameter in order to achieve the target correlation. This leads to an iterative procedure whose accuracy can be set a priori, differently from other approaches in the literature that are based on rearranging very large samples drawn independently from the two margins.

The approach is designed to work with any one-parameter copula family, provided that it encompasses the entire range of dependence, thus allowing the use of dependence structures other than the Gaussian, whose use in the social sciences has been predominant.

Although in this paper we considered three exchangeable and radially symmetric copulas, this feature is not necessary for the algorithm to function. Moreover, comprehensiveness is the property that makes the algorithm applicable to any feasible correlation value; however, if one is only concerned, for example, with positive correlations, then non-comprehensive copulas, such as the Gumbel or the Clayton, can be used as well.

As said, the algorithm is specifically conceived for one-parameter copulas, but it can be extended to copulas with two or more parameters: in this case, some higher co-moments need to be assigned along with the linear correlation in order to calibrate the additional parameters.

The extension of the proposed procedure to dimension \(d>2\), discussed in Sect. 3.4, is not straightforward, being limited by theoretical rather than computational reasons, related to the properties of Pearson’s correlation matrix for a d-variate random vector.

Future research will explore these aspects.

9 Supplementary material

The R code implementing the algorithm of Section 3, along with the example of Section 4, is provided as supplementary material here: https://tinyurl.com/ASTB-D-19-00215.