1 Introduction

Ordinal data are frequently encountered in various disciplines. As mentioned in Anderson (1984), ordinal data often arise in two situations: (1) thresholding an underlying continuous variable, and (2) ranking provided by an assessor after processing an unspecified amount of available information. An example of the first type is the abundance of a species based on percentage ground cover, which can be coded as 0 (absence), 1 (> 0–5% cover), 2 (> 5–12% cover), and so on (Guisan and Harrell 2000). When it is reasonable to assume the existence of a latent continuous variable, logit- or probit-type regression models are commonly employed to analyse the data (McCullagh 1980; Agresti 2010); see Tutz (2022) for a recent review of various ordinal regression models.

The second type of ordinal data is usually recorded on a Likert scale, which has become a widely used tool in research involving surveys and questionnaires (Joshi et al. 2015). For example, in visual grading experiments for medical images, assessors are often asked to classify an image using one of several options such as “Definitely it is not clearly visible”, “Probably it is not clearly visible”, and so on (Al-Humairi et al. 2022). Since this type of data is usually collected from human respondents, response biases exist, and the data may not truly reflect the respondent’s actual opinion towards the survey item (Baumgartner and Steenkamp 2006). For example, in answering a survey question, some people may choose a satisficing option rather than investing the time to give the optimal answer (Krosnick 1999). Van Vaerenbergh and Thomas (2013) have also reported different response styles in which respondents tend to choose an answer regardless of the content. Thus, any serious attempt to analyse survey data should take into account the potential response biases inherent in the data. As argued by Iannario and Piccolo (2016), one of the simplest ways to model these kinds of data is to use a two-component model, which explicitly assumes that the data are generated from two processes, as described below.

To this end, this paper focuses on the use of finite mixture models to analyse ordinal data arising from surveys. An advantage of finite mixture models is that the data can be regarded as generated from different underlying processes or heterogeneous populations, allowing for greater flexibility (McLachlan et al. 2019). A popular mixture model that has gained attention recently is the combination of uniform and binomial (CUB) model. Since their introduction by Piccolo (2003) and D’Elia and Piccolo (2005), CUB models and their variants have been widely applied in various disciplines to model ordinal data, especially those arising from surveys that require respondents to choose a response from a Likert scale. For example, CUB models have been applied in modelling survival probabilities (Iannario and Piccolo 2010b), customer preferences on food quality (Piccolo and D’Elia 2008), and job satisfaction (Gambacorta and Iannario 2013), to name just a few.

Under the settings of CUB models, the uniform component represents the indecisiveness or uncertainty of the respondent towards the survey item. In such a case, the respondent is assumed to pick an answer completely at random. The binomial component, on the other hand, is related to the feeling or actual opinion of the respondent towards the survey item. The stronger the feeling, the higher the rating. However, the analyst will not be able to distinguish whether the response is a completely random selection or a reflection of the actual feeling of the respondent. Nonetheless, the estimated parameters could inform the measure of uncertainty and preference for typical respondents. More details regarding the foundations and developments of CUB models can be found in a recent review (Piccolo and Simone 2019).

Most of the CUB models developed so far are univariate in nature. In other words, they focus on merely one survey item or question. Since most surveys contain more than one question, the data collected are multivariate in nature. To capture the dependency structure between the responses from several survey items, multivariate models are required. Some notable attempts to introduce multivariate CUB distributions include Corduas (2011, 2015), Andreis and Ferrari (2013), Colombi and Giordano (2016) and Colombi et al. (2019). Except for the last one, all these works use copula-based methods to combine univariate CUB random variables. In particular, Colombi and Giordano (2016) employed the Sarmanov distribution while the others used the Plackett distribution. On a related note, Barbiero (2021) demonstrates how a joint distribution of two CUB margins can be constructed using copulas to match a desired correlation. While copula-based methods are flexible, they have limitations that cannot be overlooked. Firstly, copulas are usually applied to continuous random variables. The dangers and restrictions of applying the same practices to discrete distributions have been outlined by several authors; see, for example, Genest and Nešlehová (2007) and Geenens (2020). Specifically, since copulas cannot be uniquely defined for discrete variables (Nelsen 2006), there are identifiability issues, which may cause inconsistency in parameter estimation (Genest and Nešlehová 2007). Secondly, the parameters of copula models are usually related to either the rank or the Pearson correlation between the two univariate random variables. However, since CUB random variables are a combination of two processes, the copula parameters (assuming they are consistently estimated) would relate only to the overall correlation between the mixtures, rather than to the correlation between the individual uniform or binomial components. This may make the estimated copula parameters difficult to interpret.

To avoid the above concerns, this paper aims to construct a joint distribution for \((R_1,R_2)\), which represents a pair of ratings arising from a survey, using bivariate uniform and bivariate binomial distributions. Some important features of the proposed model include (1) both \(R_1\) and \(R_2\) follow a CUB distribution marginally, (2) the joint distribution is not derived through copula-based routines, and (3) the dependency between the uniform and binomial components can be estimated separately, allowing better interpretation of model parameters. Our proposed model is similar to the hierarchical marginal models with latent uncertainty (HMMLU) proposed by Colombi et al. (2019), which will be described more formally in Sect. 3. Briefly, in their work, the uncertainty components can take a more flexible shape while the feeling components and the corresponding associations are modelled directly using marginal logits and log odds ratios, in the spirit of marginal models (Molenberghs and Lesaffre 1994; Bartolucci et al. 2007). One drawback of HMMLU is that the uncertainty components are assumed to be independent. Our proposed model overcomes this by having a parameter that directly measures the correlation between the uncertainty components. Another drawback of HMMLU lies in the large number of parameters, which characterise the marginal logits and log odds ratios, especially in the absence of covariates. In our proposed model, the feelings are modelled using a bivariate binomial distribution which contains only three parameters, making it more parsimonious.

The rest of the paper is organised as follows. Section 2 provides a brief account of the CUB, bivariate uniform and bivariate binomial distributions. Section 3 demonstrates how these distributions can be combined to form a new class of bivariate CUB models. A comparison between the proposed model and HMMLU is provided as well. Section 4 deals with various inferential issues including identifiability, parameter estimation, calculation of standard errors and hypothesis tests. Simulation and application results are reported in Sects. 5 and 6, respectively. Finally, Sect. 7 provides concluding remarks and a discussion.

2 Preliminaries

Formally, a random variable R is said to follow the CUB distribution with parameters \(\pi \) and \(\xi \), denoted by \(R\sim CUB(\pi ,\xi )\), if its probability mass function (pmf) is a mixture of the discrete uniform distribution and a binomial distribution. If R takes one of the \((m+1)\) values in \(\lbrace 0,1,2,\ldots ,m\rbrace \), the pmf admits the form

$$\begin{aligned} P(R=r) = \frac{1-\pi }{1+m} + \pi C^m_r (1-\xi )^r \xi ^{m-r}. \end{aligned}$$

Here, the mixing weight \((1-\pi )\) measures the degree of uncertainty while \((1-\xi )\) measures the degree of feeling. As \((1-\xi )\) increases, there is a higher chance of observing a higher rating.
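As an illustration, the pmf above can be evaluated directly. The following is a minimal sketch in Python; the function name `cub_pmf` and the parameter values are illustrative assumptions, not part of the original paper.

```python
from math import comb

def cub_pmf(r, m, pi, xi):
    """P(R = r) for R ~ CUB(pi, xi) on {0, 1, ..., m}, using the unshifted binomial."""
    uniform_part = 1.0 / (m + 1)
    binomial_part = comb(m, r) * (1.0 - xi) ** r * xi ** (m - r)
    return (1.0 - pi) * uniform_part + pi * binomial_part

# Illustrative values (assumed, not from the paper): 5 categories, pi = 0.7, xi = 0.3.
m, pi, xi = 4, 0.7, 0.3
probs = [cub_pmf(r, m, pi, xi) for r in range(m + 1)]
total = sum(probs)                               # should equal 1 up to rounding
mean = sum(r * p for r, p in enumerate(probs))   # average rating under the model
```

With \(1-\xi =0.7\), most of the binomial mass sits on the higher categories, so the highest rating is more likely than the lowest one, in line with the interpretation of \((1-\xi )\) as the degree of feeling.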

In the literature, CUB random variables are assumed to range from 1 to m instead of starting from zero, represented by a shifted binomial distribution (Iannario and Piccolo 2010a). However, in this paper, we use the ordinary binomial distribution for several reasons. Firstly, real survey choice sets are often textual and arbitrary in numbering, making numerical interpretation less meaningful. Secondly, we treat the binomial component as a sum of independent Bernoulli variables, which naturally starts from zero. Lastly, ordinary binomial distribution results are more accessible and less confusing for readers unfamiliar with the history of CUB models.

To construct a bivariate model for \(R_1\) and \(R_2\) (which may represent rating responses from two survey questions), we first provide some details on a bivariate discrete uniform distribution and a bivariate binomial distribution which we have chosen to work on. A main feature of these distributions is that the marginal distributions belong to the same class.

2.1 Bivariate discrete uniform distribution

Let \(U_1\) and \(U_2\) be two random variables where the pmf for \(U_1\) admits the form

$$\begin{aligned} P(U_1=u_1) = \frac{1}{m+1},\qquad u_1=0,1,2,\ldots ,m. \end{aligned}$$
(1)

We further assume the following form for the conditional distribution of \(U_2\) given \(U_1\):

$$\begin{aligned} P(U_2=u_2|U_1=u_1) = {\left\{ \begin{array}{ll} \frac{1+\alpha _U}{m+1}, &{} \text {if }u_2=u_1;\\ \frac{m-\alpha _U}{m(m+1)}, &{} \text {otherwise.}\end{array}\right. } \end{aligned}$$
(2)

In other words, the conditional distribution of \(U_2|U_1\) is not uniform but a categorical distribution. The parameter \(\alpha _U\) characterises the dependence between \(U_1\) and \(U_2\). Depending on the value of \(\alpha _U\), the probability of choosing the same answer in Question 2, given the response in Question 1, can be higher, lower, or unchanged. The admissible range of \(\alpha _U\) is \([-1,m]\), with \(\alpha _U=0\) representing the case of independence. Marginally, \(U_2\) follows the discrete uniform distribution, since

$$\begin{aligned} P(U_2=u_2)= & {} \sum _{u_1=0} ^{m} P(U_2|U_1) P(U_1)\\= & {} \frac{1}{m+1}\left( \frac{1+\alpha _U}{1+m}\right) + m \frac{1}{m+1} \left( \frac{m-\alpha _U}{m(m+1)}\right) = \frac{1}{m+1}. \end{aligned}$$

The joint distribution of \(U_1\) and \(U_2\) can be written as

$$\begin{aligned} P(U_1=u_1,U_2=u_2)=\frac{m+m\alpha _U 1_{u_2=u_1}-\alpha _U1_{u_2\ne u_1}}{m(m+1)^2}\equiv U_{12}(u_1,u_2,\alpha _U), \end{aligned}$$

where \(1_{A}\) is the indicator variable, which takes the value 1 if condition A is satisfied and 0 otherwise. Notice that \(U_1\) and \(U_2\) are independent if and only if \(\alpha _U=0\).

Studies on psychological aspects of survey responses have revealed a tendency for respondents to select the same category regardless of the question; we therefore expect \(\alpha _U\) to be positive in practice. For example, three of the common response styles reported by Baumgartner and Steenkamp (2001) and Van Vaerenbergh and Thomas (2013) are acquiescence response style, extreme response style and midpoint responding. These response styles refer, respectively, to the tendency to agree with the item regardless of content, to select the most extreme category regardless of content, and to choose the middle scale category regardless of content. All these tendencies would make the probability of two identical responses higher than expected under the independence assumption. The first two moments of \(U_1\) and \(U_2\) are summarised below. The derivations can be found in Appendix A.

$$\begin{aligned} E(U_i)= & {} \frac{m}{2},\\ \text {Var}(U_i)= & {} \frac{m(m+2)}{12},\\ \text {Cov}(U_1,U_2)= & {} \frac{\alpha _U (m+2)}{12},\\ r_U= & {} \text {Corr}(U_1,U_2) = \frac{\alpha _U}{m}. \end{aligned}$$
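The moment formulas above can be checked by brute-force enumeration of the joint pmf \(U_{12}\). The sketch below is only a numerical sanity check; the function names `biv_uniform_pmf` and `uniform_moments` are hypothetical helpers introduced for this illustration.

```python
def biv_uniform_pmf(u1, u2, m, alpha_u):
    """Joint pmf U_12(u1, u2, alpha_U) on {0, ..., m}^2."""
    if not -1.0 <= alpha_u <= m:
        raise ValueError("alpha_U must lie in [-1, m]")
    if u1 == u2:
        return (1.0 + alpha_u) / (m + 1) ** 2
    return (m - alpha_u) / (m * (m + 1) ** 2)

def uniform_moments(m, alpha_u):
    """Return (mean, variance, covariance, correlation) by direct enumeration."""
    centre = m / 2.0  # E(U_1) = E(U_2) = m/2 because both margins are uniform
    mean = var = cov = 0.0
    for u1 in range(m + 1):
        for u2 in range(m + 1):
            p = biv_uniform_pmf(u1, u2, m, alpha_u)
            mean += u1 * p
            var += (u1 - centre) ** 2 * p
            cov += (u1 - centre) * (u2 - centre) * p
    # Var(U_1) = Var(U_2), so the correlation is simply cov / var.
    return mean, var, cov, cov / var

# Illustrative values (assumed): m = 5 and alpha_U = 2.
mean_u, var_u, cov_u, corr_u = uniform_moments(5, 2.0)
```

For these values the enumeration should reproduce \(m/2\), \(m(m+2)/12\), \(\alpha _U(m+2)/12\) and \(\alpha _U/m\).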

2.2 Bivariate binomial distribution

We will make use of the bivariate binomial distribution introduced in Biswas and Hwang (2002), where further details can be found. Let \(T_1\) and \(T_2\) be two random variables. Following Biswas and Hwang (2002), we consider \(T_1\) as a sum of m independent Bernoulli variables, i.e., \(T_1=\sum _{i=1} ^m T_{1i}\) where \(T_{1i}\overset{\text {i.i.d.}}{\sim } Ber(1-\xi _1)\). Given \(T_{1i}\), another Bernoulli variable \(T_{2i}\) is generated such that

$$\begin{aligned} P(T_{2i}=1|T_{1i})=\frac{1-\xi _2+\alpha _B(\xi _1 -\xi _2)+\alpha _B T_{1i}}{1+\alpha _B}, \end{aligned}$$
(3)

where \(\alpha _B\) measures the dependency between \(T_{1i}\) and \(T_{2i}\), with the admissible ranges

$$\begin{aligned} \alpha _B \in {\left\{ \begin{array}{ll} \left( \max \left\{ -\frac{\xi _2}{1-\xi _1+\xi _2},\frac{\xi _2-1}{1+\xi _1-\xi _2}\right\} ,\frac{1-\xi _2}{\xi _2-\xi _1}\right) , &{} 1-\xi _1> 1-\xi _2;\\ \left( \max \left\{ -\frac{\xi _2}{1-\xi _1+\xi _2},\frac{\xi _2-1}{1+\xi _1-\xi _2}\right\} ,\frac{\xi _2}{\xi _1-\xi _2}\right) , &{} 1-\xi _2 > 1-\xi _1;\\ (\max \left\{ -\xi ,-(1-\xi )\right\} ,\infty ), &{} 1-\xi _2=1-\xi _1=1-\xi . \end{array}\right. } \end{aligned}$$
(4)

Remark 1

The admissible ranges given in (4) ensure \(P(T_{2i}=0|T_{1i})\) and \(P(T_{2i}=1|T_{1i})\) are between 0 and 1. The above ranges correct the ones provided in Biswas and Hwang (2002).
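To make the ranges in (4) concrete, the following minimal sketch computes the admissible interval for \(\alpha _B\) and evaluates the conditional probability in (3) near its end points. The helper names `alpha_b_range` and `cond_prob_one` are assumptions made for this illustration, as are the numerical values.

```python
import math

def alpha_b_range(xi1, xi2):
    """Open interval (lower, upper) of admissible alpha_B from (4), for 0 < xi1, xi2 < 1."""
    lower = max(-xi2 / (1.0 - xi1 + xi2), (xi2 - 1.0) / (1.0 + xi1 - xi2))
    if xi1 < xi2:      # case 1 - xi1 > 1 - xi2
        upper = (1.0 - xi2) / (xi2 - xi1)
    elif xi1 > xi2:    # case 1 - xi2 > 1 - xi1
        upper = xi2 / (xi1 - xi2)
    else:              # equal feeling parameters: no finite upper bound
        upper = math.inf
    return lower, upper

def cond_prob_one(t1, xi1, xi2, alpha_b):
    """P(T_2i = 1 | T_1i = t1) from (3), with t1 in {0, 1}."""
    return (1.0 - xi2 + alpha_b * (xi1 - xi2) + alpha_b * t1) / (1.0 + alpha_b)

# Illustrative values (assumed): xi1 = 0.3, xi2 = 0.6.
lo, hi = alpha_b_range(0.3, 0.6)
inside = [cond_prob_one(t1, 0.3, 0.6, a) for a in (lo + 1e-6, 0.0, hi - 1e-6) for t1 in (0, 1)]
above = cond_prob_one(0, 0.3, 0.6, hi + 1e-3)   # alpha_B slightly too large
below = cond_prob_one(0, 0.3, 0.6, lo - 1e-3)   # alpha_B slightly too small
```

Inside the interval both conditional probabilities stay in \([0,1]\); just outside it, the conditional probability given \(T_{1i}=0\) falls below 0 or exceeds 1 in this example.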

When \(\alpha _B=0\), \(T_{1i}\) and \(T_{2i}\) are independent. Furthermore, \(T_{1i}\) and \(T_{2j}\) are assumed to be independent for all \(i\ne j\). Marginally, it can be checked that \(T_{2i}\sim Ber(1-\xi _2)\) since

$$\begin{aligned} P(T_{2i}=1)= & {} (1-\xi _1)\left( \frac{1-\xi _2+\alpha _B(\xi _1-\xi _2)+\alpha _B}{1+\alpha _B}\right) +\xi _1\left( \frac{1-\xi _2+\alpha _B(\xi _1-\xi _2)}{1+\alpha _B}\right) \\= & {} 1-\xi _2. \end{aligned}$$

We further define \(T_2=\sum _{i=1}^m T_{2i}\). Consequently, \(T_1\) and \(T_2\) follow binomial distributions with parameters \((m,1-\xi _1)\) and \((m,1-\xi _2)\), respectively. The conditional distribution of \(T_2\) given \(T_1\) is given as

$$\begin{aligned} P(T_2=t_2|T_1=t_1)=(1+\alpha _B)^{-m} \times \sum _{j=0} ^{t_1} C_{j} ^{t_1} C_{t_2-j} ^{m-t_1} w_1^{j} w_2^{t_1-j} w_3^{t_2-j} w_4^{m-t_1-t_2+j}, \end{aligned}$$

where

$$\begin{aligned} w_1= & {} 1-\xi _2+\alpha _B(\xi _1 -\xi _2)+\alpha _B,\\ w_2= & {} \xi _2-\alpha _B(\xi _1-\xi _2),\\ w_3= & {} 1-\xi _2+\alpha _B(\xi _1-\xi _2),\quad \text {and}\\ w_4= & {} \xi _2-\alpha _B(\xi _1-\xi _2)+\alpha _B. \end{aligned}$$

Hence, the joint distribution of \(T_1\) and \(T_2\) can be written as

$$\begin{aligned} P(T_1=t_1,T_2=t_2)= & {} B_1 (t_1) \times (1+\alpha _B)^{-m}\\{} & {} \times \sum _{j=0} ^{t_1} C_{j} ^{t_1} C_{t_2-j} ^{m-t_1} w_1^{j} w_2^{t_1-j} w_3^{t_2-j} w_4^{m-t_1-t_2+j}\\\equiv & {} B_{12}(t_1,t_2;\xi _1,\xi _2,\alpha _B), \end{aligned}$$

where \(B_1 (t_1) = C_{t_1} ^m (1-\xi _1)^{t_1}\xi _1^{m-t_1}\). The covariance and correlation of \(T_1\) and \(T_2\) are given below [see also Biswas and Hwang (2002) for a more general class of the bivariate binomial distribution]:

$$\begin{aligned} \text {Cov}(T_{1},T_{2})= & {} \frac{m \alpha _B}{1+\alpha _B} \xi _1 (1-\xi _1),\nonumber \\ r_T= & {} \text {Corr}(T_{1},T_{2}) =\frac{\alpha _B}{1+\alpha _B} \sqrt{\frac{\xi _1 (1-\xi _1)}{\xi _2 (1-\xi _2)}}. \end{aligned}$$
(5)

Mathematical derivations are provided in Appendix A. When two survey questions inquire about similar aspects, it is reasonable to expect a positive correlation in the responses (\(\alpha _B>0\)). Conversely, if the two questions probe conflicting aspects (for example, satisfaction with salary and tendency to leave the company), one may anticipate a negative \(\alpha _B\). Studies of survey response have also revealed that prior questions often influence later responses (Krosnick and Alwin 1987; Tourangeau et al. 2000); thus it is important to capture the correlation between \(T_1\) and \(T_2\).
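The joint pmf \(B_{12}\) and the covariance in (5) can be verified numerically. The sketch below is a direct transcription of the formulas in this subsection under assumed parameter values; `biv_binomial_pmf` is a hypothetical name. The summation index is restricted to the values of \(j\) for which both binomial coefficients are non-zero, which is equivalent to summing over \(j=0,\ldots ,t_1\).

```python
from math import comb

def biv_binomial_pmf(t1, t2, m, xi1, xi2, alpha_b):
    """Joint pmf B_12(t1, t2; xi1, xi2, alpha_B) of the bivariate binomial distribution."""
    d = alpha_b * (xi1 - xi2)
    w1 = 1.0 - xi2 + d + alpha_b
    w2 = xi2 - d
    w3 = 1.0 - xi2 + d
    w4 = xi2 - d + alpha_b
    b1 = comb(m, t1) * (1.0 - xi1) ** t1 * xi1 ** (m - t1)  # marginal pmf of T_1
    total = 0.0
    # Only j with 0 <= t2 - j <= m - t1 contributes to the sum.
    for j in range(max(0, t1 + t2 - m), min(t1, t2) + 1):
        total += (comb(t1, j) * comb(m - t1, t2 - j)
                  * w1 ** j * w2 ** (t1 - j) * w3 ** (t2 - j) * w4 ** (m - t1 - t2 + j))
    return b1 * total / (1.0 + alpha_b) ** m

# Illustrative values (assumed); alpha_B = 0.5 lies inside the range given by (4).
m, xi1, xi2, alpha_b = 4, 0.3, 0.6, 0.5
joint = {(t1, t2): biv_binomial_pmf(t1, t2, m, xi1, xi2, alpha_b)
         for t1 in range(m + 1) for t2 in range(m + 1)}
marginal_t2 = [sum(joint[(t1, t2)] for t1 in range(m + 1)) for t2 in range(m + 1)]
cov_numeric = sum((t1 - m * (1 - xi1)) * (t2 - m * (1 - xi2)) * p
                  for (t1, t2), p in joint.items())
cov_formula = m * alpha_b / (1.0 + alpha_b) * xi1 * (1.0 - xi1)
```

Here `marginal_t2` should coincide with the binomial pmf with parameters \((m,1-\xi _2)\), and `cov_numeric` should agree with the closed form in (5).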

3 A new class of bivariate CUB distributions

Suppose \(R_1\) and \(R_2\) represent the ordinal responses from two survey questions answered by the same respondent. Although there is no requirement for \(R_1\) and \(R_2\) to be the responses from two consecutive questions, the process may be easier to understand when viewed that way. We assume the following generating process.

The respondent first decides if s/he is uncertain or certain about his/her feeling towards Question 1. If s/he is uncertain, the rating is given randomly according to a discrete uniform distribution. If s/he is certain, the rating is given by a binomial distribution reflecting her/his feeling. Hence, \(R_1\) resembles the generating process of a univariate CUB variable. The same process is repeated for Question 2. However, this time the rating may depend on the rating provided in the previous question.

Since the decision process is repeated twice, there are four scenarios: (uncertain, uncertain), (uncertain, certain), (certain, uncertain) and (certain, certain), with respective probabilities \((1-\pi _1)(1-\pi _2)\), \((1-\pi _1)\pi _2\), \(\pi _1(1-\pi _2)\) and \(\pi _1\pi _2\). Symbolically, let \(D_1\) and \(D_2\) be two independent Bernoulli variables with \(P(D_i = 1) = \pi _i\). The four scenarios can be written as \((D_1=0,D_2=0)\), \((D_1=0,D_2=1)\), \((D_1=1,D_2=0)\), and \((D_1=1,D_2=1)\). We also assume that if the ‘regime’ goes from uncertain to certain (or vice versa), the ratings given in the two questions are independent. Such a process is represented schematically in Fig. 1. Note that all the stages except the outcome are latent and therefore unobserved. The process described above results in the following joint distribution:

$$\begin{aligned}{} & {} P(R_1=r_1,R_2=r_2) \nonumber \\{} & {} \quad =(1-\pi _1)(1-\pi _2)U_{12}(r_1,r_2,\alpha _U) + \frac{(1-\pi _1)\pi _2B_2 (r_2)}{m+1} + \frac{\pi _1(1-\pi _2)B_1(r_1)}{m+1}\nonumber \\{} & {} \qquad + \pi _1\pi _2 B_{12}(r_1,r_2;\xi _1,\xi _2,\alpha _B) \end{aligned}$$
(6)
Fig. 1 Schematic flowchart showing the generating process of \((R_1,R_2)\)

From the joint distribution, it can be checked that, marginally, both \(R_1\) and \(R_2\) follow a univariate CUB distribution with parameters \((\pi _1,\xi _1)\) and \((\pi _2,\xi _2)\), respectively. Also, \(R_1\) and \(R_2\) are independent if and only if \(\alpha _B=\alpha _U=0\). In that case,

$$\begin{aligned} P(R_1=r_1,R_2=r_2)=\left[ \frac{1-\pi _1}{m+1}+\pi _1B_1 (r_1)\right] \left[ \frac{1-\pi _2}{m+1}+\pi _2 B_2 (r_2)\right] . \end{aligned}$$
Fig. 2 Contour plots and 3D histograms of some bivariate CUB models under three sets of parameters with \(m=9\)

The first two moments of the proposed bivariate CUB distribution are given by

$$\begin{aligned} E(R_i)= & {} (1-\pi _i)\frac{m}{2} + \pi _i m (1-\xi _i), \quad i=1,2; \nonumber \\ \text {Var}(R_i)= & {} (1-\pi _i)m\left[ \frac{2m+1}{6}-\frac{(1-\pi _i)m}{4}\right] \nonumber \\{} & {} + \pi _i m(1-\xi _i)\xi _i [1-m(1-\pi _i)],\quad i=1,2; \nonumber \\ \text {Cov}(R_1,R_2)= & {} (1-\pi _1)(1-\pi _2)\frac{\alpha _U (m+2)}{12}+\pi _1\pi _2\frac{m \alpha _B}{1+\alpha _B} \xi _1 (1-\xi _1). \end{aligned}$$
(7)

Derivation details are provided in Appendix A. The correlation, \(r_R\), can then be derived from the covariance and the variances. Figure 2 shows the contour plots and 3D histograms for the joint probability mass functions under three sets of parameters. From top to bottom panels, the figure demonstrates the cases where \(R_1\) and \(R_2\) are positively correlated, independent, and negatively correlated, respectively.
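The joint pmf (6) and the moments in (7) can be cross-checked by enumeration. The following sketch assembles (6) from its four components; all function names and parameter values are illustrative assumptions, and the code is a numerical check rather than the implementation used to produce Fig. 2.

```python
from math import comb

def binom_pmf(t, m, xi):
    """B_i(t): binomial pmf with m trials and success probability 1 - xi."""
    return comb(m, t) * (1.0 - xi) ** t * xi ** (m - t)

def biv_uniform_pmf(u1, u2, m, alpha_u):
    """U_12(u1, u2, alpha_U)."""
    if u1 == u2:
        return (1.0 + alpha_u) / (m + 1) ** 2
    return (m - alpha_u) / (m * (m + 1) ** 2)

def biv_binomial_pmf(t1, t2, m, xi1, xi2, alpha_b):
    """B_12(t1, t2; xi1, xi2, alpha_B)."""
    d = alpha_b * (xi1 - xi2)
    w1, w2 = 1.0 - xi2 + d + alpha_b, xi2 - d
    w3, w4 = 1.0 - xi2 + d, xi2 - d + alpha_b
    total = 0.0
    for j in range(max(0, t1 + t2 - m), min(t1, t2) + 1):
        total += (comb(t1, j) * comb(m - t1, t2 - j)
                  * w1 ** j * w2 ** (t1 - j) * w3 ** (t2 - j) * w4 ** (m - t1 - t2 + j))
    return binom_pmf(t1, m, xi1) * total / (1.0 + alpha_b) ** m

def biv_cub_pmf(r1, r2, m, pi1, pi2, xi1, xi2, alpha_u, alpha_b):
    """Joint pmf (6): mixture over the four (uncertain/certain) scenarios."""
    return ((1 - pi1) * (1 - pi2) * biv_uniform_pmf(r1, r2, m, alpha_u)
            + (1 - pi1) * pi2 * binom_pmf(r2, m, xi2) / (m + 1)
            + pi1 * (1 - pi2) * binom_pmf(r1, m, xi1) / (m + 1)
            + pi1 * pi2 * biv_binomial_pmf(r1, r2, m, xi1, xi2, alpha_b))

def covariance_formula(m, pi1, pi2, xi1, alpha_u, alpha_b):
    """Closed-form Cov(R_1, R_2) from (7)."""
    return ((1 - pi1) * (1 - pi2) * alpha_u * (m + 2) / 12.0
            + pi1 * pi2 * m * alpha_b / (1.0 + alpha_b) * xi1 * (1 - xi1))

# Illustrative parameter values (assumed, not from the paper).
m = 4
params = dict(pi1=0.6, pi2=0.8, xi1=0.3, xi2=0.6, alpha_u=1.5, alpha_b=0.5)
joint = {(r1, r2): biv_cub_pmf(r1, r2, m, **params)
         for r1 in range(m + 1) for r2 in range(m + 1)}
mean1 = sum(r1 * p for (r1, r2), p in joint.items())
mean2 = sum(r2 * p for (r1, r2), p in joint.items())
cov_numeric = sum((r1 - mean1) * (r2 - mean2) * p for (r1, r2), p in joint.items())
cov_formula = covariance_formula(m, 0.6, 0.8, 0.3, 1.5, 0.5)
```

Summing `joint` over \(r_2\) should also recover the univariate CUB pmf of \(R_1\) with parameters \((\pi _1,\xi _1)\).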

From (7), it can be deduced that the correlation between \(R_1\) and \(R_2\) is zero if \(\alpha _B=\alpha _U=0\), or when

$$\begin{aligned} \alpha _U=\frac{-12\pi _1\pi _2}{(1-\pi _1)(1-\pi _2)}\frac{m \xi _1 (1-\xi _1)}{m+2}\frac{\alpha _B}{1+\alpha _B}, \end{aligned}$$

as long as the right-hand side (RHS) of the above equation lies within the admissible range \([-1,m]\) of \(\alpha _U\) (with \(\alpha _B\) itself satisfying (4)). In other words, the dependency within the uniform components may sometimes cancel out that due to the binomial components.
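A small numerical illustration of this cancellation is sketched below. The helper names are hypothetical and the parameter values are assumed; the function returns `None` when the cancelling value falls outside the range \([-1,m]\) allowed for \(\alpha _U\).

```python
def cub_covariance(m, pi1, pi2, xi1, alpha_u, alpha_b):
    """Cov(R_1, R_2) from (7)."""
    return ((1 - pi1) * (1 - pi2) * alpha_u * (m + 2) / 12.0
            + pi1 * pi2 * m * alpha_b / (1.0 + alpha_b) * xi1 * (1 - xi1))

def cancelling_alpha_u(m, pi1, pi2, xi1, alpha_b):
    """alpha_U that makes Cov(R_1, R_2) zero, or None if it is not admissible."""
    value = (-12.0 * pi1 * pi2 / ((1 - pi1) * (1 - pi2))
             * m * xi1 * (1 - xi1) / (m + 2)
             * alpha_b / (1.0 + alpha_b))
    return value if -1.0 <= value <= m else None

# Low certainty weights: a mildly negative alpha_U offsets a positive alpha_B.
a_u = cancelling_alpha_u(4, 0.3, 0.3, 0.3, 0.2)
# High certainty weights: the required alpha_U would be below -1, so no cancellation.
no_solution = cancelling_alpha_u(4, 0.8, 0.8, 0.3, 0.5)
```

In the first case the covariance evaluated at the returned \(\alpha _U\) is zero up to rounding; in the second, the binomial component dominates and no admissible \(\alpha _U\) can offset it.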

The correlation \(r_R\) between the two responses is governed not only by \(\alpha _U\) and \(\alpha _B\), but also by all other parameters. For this reason, \(r_R\) may sometimes be misleading, or at least understate the dependency between the respondent’s feelings towards the two items. For instance, if \(\pi _1=\pi _2=0.5\), then approximately half of the pairs \((R_1,R_2)\) will be generated independently, which may shrink the overall correlation \(r_R\), even when \(r_U\) and \(r_T\) are reasonably large. Yet, in practice, \(r_U\) and \(r_T\) may be of greater interest. The former represents the tendency to choose the same category when the respondent is uncertain about both questions, while the latter represents the correlation between the liking of the two survey items. Once the model parameters have been estimated, \(r_T\) and \(r_U\) can be computed from them. This separation of the overall dependency into different components cannot be accomplished with any previously proposed copula-based method, as these methods tend to estimate the overall correlation between the two margins.

3.1 Comparison with HMMLU

A model that is similar to the bivariate CUB model proposed is the aforementioned HMMLU (Colombi et al. 2019). Similar to our approach, the data generating process of HMMLU assumes the existence of latent states that represent if the respondent’s answer was based on feeling or uncertainty. In the bivariate case, the four scenarios \((D_1=0,D_2=0)\), \((D_1=0,D_2=1)\), \((D_1=1,D_2=0)\), and \((D_1=1,D_2=1)\) would still apply. The major difference between HMMLU and our proposal lies in the distributions of the responses under each of the four scenarios.

When an answer is given with uncertainty \((D=0)\), HMMLU assumes a distribution \(h_i(r_i), i=1,2,\) which can take different shapes such as a U-shape or a bell shape. The uniform distribution is one of the special cases. When both answers are given with uncertainty, \(R_1\) and \(R_2\) are assumed to be independent under HMMLU. Conversely, when an answer is given with certainty (\(D=1\)), HMMLU does not impose any specific distribution on the responses. Rather, the marginal distributions and the joint distribution are parameterised through marginal logits and log odds ratios, respectively. Such an approach stems from the general framework of marginal models for categorical data (Bergsma and Rudas 2002; Bartolucci et al. 2007). The joint distribution of \(R_1\) and \(R_2\) under HMMLU can be written as

$$\begin{aligned} P(R_1=r_1,R_2=r_2)= & {} \sum _{i,j=0,1} \pi _{ij} P(R_1=r_1,R_2=r_2|D_1=i,D_2=j)\nonumber \\= & {} \pi _{00}h_1(r_1)h_2(r_2) + \pi _{01}h_1(r_1) P(R_2=r_2|D_2=1) \nonumber \\{} & {} +\pi _{10}h_2(r_2) P(R_1=r_1|D_1=1) \nonumber \\{} & {} + \pi _{11}P(R_1=r_1,R_2=r_2|D_1=D_2=1) \end{aligned}$$
(8)

Comparing Eqs. (6) and (8), some differences between HMMLU and the proposed bivariate CUB model are notable. Firstly, HMMLU does not allow correlation between uncertain responses. In the bivariate CUB model, such a correlation is captured through \(\alpha _U\) in \(U_{12}\). Of course, when \(\alpha _U=0\), the uncertain responses under the bivariate CUB model are generated in the same manner as HMMLU when both \(h_i\) take the uniform distribution. Secondly, the mixing weights (\(\pi \)’s) are generated differently. Implicitly, our approach assumes that \(D_1\) and \(D_2\) are independent while HMMLU allows them to be dependent.

Lastly, the distributions of the certain responses under HMMLU need not be binomial, and are hence more flexible. A consequence, however, is that HMMLU contains many more parameters. The situation is more obvious in the absence of covariates. For example, with \(m+1\) categories, HMMLU would require m parameters for the marginal logits of each of \(R_1\) and \(R_2\), and \(m^2\) log odds ratios to parameterise the joint distribution. As mentioned in Colombi et al. (2019, p. 599), the large number of parameters will usually lead to identifiability issues, and constraints are therefore required. By contrast, the distributions of the certain responses under the proposed bivariate CUB model can be characterised using three parameters \(\xi _1,\xi _2\) and \(\alpha _B\).

4 Inferential issues

Next, we discuss various issues related to the inferential processes. We start with the identifiability since the estimation of the parameters is only meaningful if the model is identifiable. Next, we discuss the strategy of estimating the parameters. Before closing this section, we provide details for the standard error calculations and hypothesis tests for some of the parameters.

4.1 Identifiability

The following theorem specifies the conditions under which the bivariate CUB model is identifiable.

Theorem 1

Given that \(0<\pi _1,\pi _2,\xi _1,\xi _2<1\), \(\xi _1\ne \xi _2\), and \(m\ge 3\), the bivariate CUB model given in (6) is identifiable.

Before we provide a proof of the theorem above, we first make a couple of remarks. Firstly, the condition \(m\ge 3\) (i.e., the number of categories is at least 4) is equivalent to the condition required in the univariate case (Iannario 2010). Similar to the identifiability condition for HMMLU, restrictions on the number of categories are necessary to make sure that the number of parameters is less than the number of free frequencies (Colombi et al. 2019). The univariate CUB model is still identifiable when \(\pi =1\) (Iannario 2010) because the discrete uniform distribution does not contain any parameters (\(\xi \) can be identified even when \(\pi =1\)). Thus, identifiability is ensured for \(\pi >0\). However, in the bivariate case, as specified in Theorem 1, while it is still required that \(\pi _1,\pi _2>0\), having either \(\pi _1=1\) or \(\pi _2=1\) would make the model non-identifiable, since an infinite number of possible \(\alpha _U\) would then yield the same joint distribution (6). Similarly, when \(\xi _1\) or \(\xi _2\) takes a value of either 0 or 1, \(\alpha _B\) cannot be identified either. The additional requirement of \(\xi _1\ne \xi _2\) may seem restrictive. In practice, however, since \(\xi _1\) and \(\xi _2\) correspond to the feeling of a respondent towards two survey questions, the values rarely coincide, unless the two questions probe exactly the same aspect (in which case a single question would be sufficient).

Proof

Let \(\varvec{\theta }=(\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'\in \varvec{\Theta }=(0,1)^4\times [-1,m]\times {\mathcal {A}}_B\) where \({\mathcal {A}}_B\) is the parameter space for \(\alpha _B\) governed by (4), with the exception that \(\xi _1\) cannot be equal to \(\xi _2\). Further, denote by \(P_{r_1,r_2}(\varvec{\theta })=P(R_1=r_1,R_2=r_2;\varvec{\theta })\), \(P_{\bullet r_2}=\sum _{r_1=0} ^m P_{r_1,r_2}\) and \(P_{r_1 \bullet }=\sum _{r_2=0} ^m P_{r_1,r_2}\). The bivariate CUB model is identifiable if and only if, for any parameter vector \(\varvec{\theta ^*}\), the system of equations in \(\varvec{\theta }\):

$$\begin{aligned} P_{r_1,r_2}(\varvec{\theta })=P_{r_1,r_2}(\varvec{\theta ^*}),\qquad r_1,r_2 =0,1,\ldots ,m, \end{aligned}$$
(9)

admits only one solution in the parameter space (Manisera and Zuccolotto 2015). With \((m+1)\) categories, there are altogether \((m+1)^2\) equations in (9). Fortunately, results in Manisera and Zuccolotto (2015) also demonstrate that it is possible to reduce the number of equations in the system by constructing some equations that allow the parameters to be specified sequentially.

For the bivariate CUB model on hand, we consider the following system of equations:

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{P_{m\bullet }(\varvec{\theta })-P_{0\bullet }(\varvec{\theta })}{P_{0\bullet }(\varvec{\theta })-1/(m+1)}=\frac{P_{m\bullet }(\varvec{\theta ^*})-P_{0\bullet }(\varvec{\theta ^*})}{P_{0\bullet }(\varvec{\theta ^*})-1/(m+1)} \\ \pi _1=\frac{P_{0\bullet }(\varvec{\theta ^*})-1/(m+1)}{\xi _1^{m}-1/(m+1)}\\ \frac{P_{\bullet m}(\varvec{\theta })-P_{\bullet 0}(\varvec{\theta })}{P_{\bullet 0}(\varvec{\theta })-1/(m+1)}=\frac{P_{\bullet m}(\varvec{\theta ^*})-P_{\bullet 0}(\varvec{\theta ^*})}{P_{\bullet 0}(\varvec{\theta ^*})-1/(m+1)} \\ \pi _2=\frac{P_{\bullet 0}(\varvec{\theta ^*})-1/(m+1)}{\xi _2^{m}-1/(m+1)}\\ \frac{P_{01}(\varvec{\theta })-P_{10}(\varvec{\theta })}{\xi _1^{m-1}(\xi _1-\xi _2)} = \frac{P_{01}(\varvec{\theta ^*})-P_{10}(\varvec{\theta ^*})}{\xi _1^{m-1}(\xi _1-\xi _2)}\\ \alpha _U=\left( P_{00}(\varvec{\theta ^*})-\frac{(1-\pi _1)\pi _2 B_2(0)}{m+1}-\frac{\pi _1(1-\pi _2) B_1(0)}{m+1}-\frac{\pi _1\pi _2B_1(0)w_4^m}{(1+\alpha _B)^m}\right) \frac{(m+1)^2}{(1-\pi _1)(1-\pi _2)}-1 \end{array}\right. } \end{aligned}$$
(10)

The above system was selected merely for the simplicity of the algebra involved, as shown below. In the first equation, both \(P_{m\bullet }\) and \(P_{0\bullet }\) represent marginal probabilities, which are free of \(\alpha _U\) and \(\alpha _B\). According to Iannario (2010), the first two equations in (10) allow \(\pi _1\) and \(\xi _1\) to be uniquely specified. Similarly, the third and fourth equations allow \(\pi _2\) and \(\xi _2\) to be uniquely specified. If \(\alpha _B\) can be uniquely specified, the last equation will yield only one \(\alpha _U\) (hence unique). Thus, it remains to prove the uniqueness of \(\alpha _B\). For this purpose, we consider in detail the fifth equation in (10). Since

$$\begin{aligned} P_{01}(\varvec{\theta })= & {} (1-\pi _1)(1-\pi _2)\frac{m-\alpha _U}{m(m+1)^2} + \frac{(1-\pi _1)\pi _2 B_2(1)}{m+1} + \frac{\pi _1(1-\pi _2)B_1(0)}{m+1} \\{} & {} +\frac{\pi _1\pi _2 B_1(0)mw_3w_4 ^{m-1}}{(1+\alpha _B)^{m}}, \qquad \text {and}\\ P_{10}(\varvec{\theta })= & {} (1-\pi _1)(1-\pi _2)\frac{m-\alpha _U}{m(m+1)^2} + \frac{(1-\pi _1)\pi _2 B_2(0)}{m+1} + \frac{\pi _1(1-\pi _2)B_1(1)}{m+1} \\{} & {} + \frac{\pi _1\pi _2 B_1(1) w_2 w_4^{m-1}}{(1+\alpha _B)^{m}}, \end{aligned}$$

we have

$$\begin{aligned} \frac{P_{01}(\varvec{\theta })-P_{10}(\varvec{\theta })}{\xi _1^{m-1}(\xi _1-\xi _2)}= & {} \frac{(1-\pi _1)\pi _2(B_2(1)-B_2(0))}{(m+1)\xi _1^{m-1}(\xi _1-\xi _2)} + \frac{\pi _1(1-\pi _2)(B_1(0)-B_1(1))}{(m+1)\xi _1^{m-1}(\xi _1-\xi _2)}\nonumber \\{} & {} +\pi _1\pi _2 \frac{mw_4 ^{m-1}}{(1+\alpha _B)^{m-1}} \end{aligned}$$
(11)

which is a function of \(\alpha _B\), and free of \(\alpha _U\), given the already specified parameters \(\pi _1,\pi _2,\xi _1\) and \(\xi _2\). Furthermore, this function is continuous in \(\alpha _B\). To see this, we simply need to show that \(\alpha _B>-1\). With \(\xi _1\ne \xi _2\), the lower bound of \(\alpha _B\) is always greater than \(-1\) since

$$\begin{aligned}{} & {} -\frac{\xi _2}{1-\xi _1+\xi _2}-(-1) = \frac{1-\xi _1}{1-\xi _1+\xi _2}>0, \qquad \text {and}\\{} & {} \frac{\xi _2-1}{1+\xi _1-\xi _2} -(-1) = \frac{\xi _1}{1+\xi _1-\xi _2}>0. \end{aligned}$$

Now, we will show that the above function is monotonically increasing in \(\alpha _B\). Differentiating (11) with respect to \(\alpha _B\) yields

$$\begin{aligned} \frac{\pi _1\pi _2 m(m-1)(1-\xi _1)[\xi _2-\alpha _B(\xi _1-\xi _2)+\alpha _B]^{m-2}}{(1+\alpha _B)^m}. \end{aligned}$$

Since \(\alpha _B>-1\), the denominator is always positive. Now, consider \(\xi _2-\alpha _B(\xi _1-\xi _2)+\alpha _B\). The lower bound of \(\alpha _B\) is given by

$$\begin{aligned} \alpha _B> & {} \max \left\{ -\frac{\xi _2}{1-\xi _1+\xi _2},\frac{\xi _2-1}{1+\xi _1-\xi _2}\right\} . \end{aligned}$$

Since

$$\begin{aligned} \frac{\xi _2-1}{1+\xi _1-\xi _2}-\frac{-\xi _2}{1-\xi _1+\xi _2} = \frac{\xi _1+\xi _2-1}{1-(\xi _1-\xi _2)^2}, \end{aligned}$$

we can deduce that

$$\begin{aligned} \max \left\{ -\frac{\xi _2}{1-\xi _1+\xi _2},\frac{\xi _2-1}{1+\xi _1-\xi _2}\right\} ={\left\{ \begin{array}{ll} \frac{\xi _2-1}{1+\xi _1-\xi _2}, &{} \text {if}\,\, \xi _1+\xi _2-1 \ge 0;\\ -\frac{\xi _2}{1-\xi _1+\xi _2}, &{} \text {if}\,\, \xi _1+\xi _2-1 < 0. \end{array}\right. } \end{aligned}$$

If \(\xi _1+\xi _2-1 \ge 0\),

$$\begin{aligned} \xi _2+\alpha _B(1-\xi _1+\xi _2)> & {} \xi _2 + \left( \frac{\xi _2-1}{1+\xi _1-\xi _2}\right) (1-\xi _1+\xi _2)\\= & {} \frac{\xi _1+\xi _2-1}{1+\xi _1-\xi _2} \ge 0. \end{aligned}$$

Conversely, if \(\xi _1+\xi _2-1 < 0\),

$$\begin{aligned} \alpha _B> & {} -\frac{\xi _2}{1-\xi _1+\xi _2}\\ \alpha _B(1-\xi _1+\xi _2)> & {} -\xi _2\\ \xi _2+\alpha _B(1-\xi _1+\xi _2)> & {} 0. \end{aligned}$$

Hence, \(\xi _2+\alpha _B(1-\xi _1+\xi _2)\) is always positive. Since Eq. (11) is continuous and monotonically increasing, one and only one \(\alpha _B\) will be specified. This completes the proof. \(\square \)
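As an informal numerical sanity check of the two inequalities used in the proof (it is not a substitute for the argument above), the following Python sketch evaluates the lower bound of \(\alpha _B\) on a grid of \((\xi _1,\xi _2)\) values and confirms that the bound exceeds \(-1\) and that \(\xi _2+\alpha _B(1-\xi _1+\xi _2)\) is positive just above the bound. The function names, the grid step and the offset `eps` are illustrative assumptions.

```python
import itertools


def alpha_b_lower_bound(xi1: float, xi2: float) -> float:
    """Lower bound of alpha_B: the larger of the two terms stated in the proof."""
    return max(-xi2 / (1 - xi1 + xi2), (xi2 - 1) / (1 + xi1 - xi2))


def check_grid(step: float = 0.05, eps: float = 1e-6) -> bool:
    """Check both claims on an interior grid of (xi1, xi2) with xi1 != xi2."""
    n = round(1 / step)
    grid = [k * step for k in range(1, n)]  # interior points of (0, 1)
    for xi1, xi2 in itertools.product(grid, grid):
        if abs(xi1 - xi2) < 1e-12:
            continue  # the proof assumes xi1 != xi2
        bound = alpha_b_lower_bound(xi1, xi2)
        alpha_b = bound + eps  # an admissible value just above the bound
        if not bound > -1:
            return False
        if not xi2 + alpha_b * (1 - xi1 + xi2) > 0:
            return False
    return True


print(check_grid())
```

A grid check of this kind can only fail to find a counterexample; the uniqueness of \(\alpha _B\) rests on the analytical argument.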

4.2 Parameter estimation

The parameter estimation can be carried out using the EM algorithm (Dempster et al. 1977). Although the chief focus of Dempster et al. (1977) was on handling incomplete data, the EM algorithm has been proven to work well for mixture distributions, including CUB models (Piccolo 2006). Further details on this topic can be found in Everitt and Hand (1981), Redner and Walker (1984), McLachlan and Peel (2000) and Arcidiacono and Jones (2003), among many others. The details of the algorithm for the proposed bivariate CUB model are provided in Appendix B.

4.3 Standard errors

The variance-covariance matrix of the estimated parameters can be obtained by inverting the observed information matrix:

$$\begin{aligned} \text {Var}(\hat{\varvec{\theta }})=I(\hat{\varvec{\theta }})^{-1} = [-\nabla ^2 \log L(\varvec{\theta })]^{-1} |_{\varvec{\theta }=\hat{\varvec{\theta }}}. \end{aligned}$$

The standard errors of the parameters are the square roots of the diagonal elements of \(\text {Var}(\hat{\varvec{\theta }})\). The use of the observed information matrix, instead of the expected information matrix, has been justified in Efron and Hinkley (1978). Explicit expressions of the elements in \(\text {Var}(\hat{\varvec{\theta }})\) are provided in Appendix C.
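To illustrate the mechanics of obtaining standard errors from the observed information matrix, the following Python sketch approximates the Hessian of a negative log-likelihood by central finite differences, inverts it, and takes the square roots of the diagonal. It is only a sketch under stated assumptions: the likelihood is a toy i.i.d. normal model rather than the bivariate CUB likelihood, the data are made up for illustration, and the analytical expressions of Appendix C are replaced by a numerical approximation. All function names are hypothetical.

```python
import math


def neg_log_lik(theta, data):
    """Toy negative log-likelihood: i.i.d. normal with theta = (mu, sigma)."""
    mu, sigma = theta
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2)
               + (x - mu) ** 2 / (2 * sigma ** 2) for x in data)


def hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of f at theta."""
    p = len(theta)
    hess = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            def shifted(di, dj):
                t = list(theta)
                t[i] += di * h
                t[j] += dj * h
                return f(t)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return hess


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def standard_errors(f, theta_hat):
    """Square roots of the diagonal of the inverse observed information."""
    cov = invert(hessian(f, theta_hat))  # f is a NEGATIVE log-likelihood
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))]


data = [4.1, 5.3, 4.8, 6.0, 5.1, 4.6, 5.7, 5.0]  # illustrative values only
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))
se = standard_errors(lambda t: neg_log_lik(t, data), [mu_hat, sigma_hat])
print(se)
```

For this toy model the result can be compared with the known closed forms \(\hat{\sigma }/\sqrt{n}\) and \(\hat{\sigma }/\sqrt{2n}\), which gives a simple way to check the numerical Hessian before applying the same idea to a more complex likelihood.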

4.4 Model selection

For a particular dataset at hand, when selecting between non-nested models such as the bivariate CUB and HMMLU, common measures such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) can be employed. In the context of the proposed bivariate CUB model, nested models can be compared via the likelihood ratio test (Hoel 1962). Here, we list some of the tests that can be carried out regarding the dependency parameters:

  • \(H_0 ^1: \alpha _B=c_1, \alpha _U=c_2\),

  • \(H_0 ^2: \alpha _B=c\), and

  • \(H_0 ^3: \alpha _U=c\),

for some constants \(c, c_1\) and \(c_2\), against the alternative hypothesis that \(H_0\) is not true. In particular, testing whether either or both of \(\alpha _U\) and \(\alpha _B\) are zero would be of high interest. Under \(H_0 ^1\), if \(\alpha _B=\alpha _U=0\), \(R_1\) and \(R_2\) are completely independent. Under \(H_0 ^2\), if \(\alpha _B=0\), provided that the respondent chose to express his/her opinions on both questions, the feelings towards the two questions are independent. Under \(H_0 ^3\), if \(\alpha _U=0\), provided that the respondent was uncertain about both questions, his/her choices of the categories are independent (both completely random). The test statistic is

$$\begin{aligned} -2\log [L({\varvec{r}}_k;\hat{\varvec{\theta }}_0)/L({\varvec{r}}_k;\hat{\varvec{\theta }})], \end{aligned}$$

where \(\hat{\varvec{\theta }}_0\) is the maximum likelihood estimator of \(\varvec{\theta }\) evaluated under the restrictions specified in \(H_0\). The test statistic approximately follows a \(\chi ^2\) distribution, with 2 degrees of freedom for \(H_0 ^1\), and 1 degree of freedom for each of \(H_0 ^2\) and \(H_0 ^3\).
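The likelihood ratio test described above can be sketched in a few lines of Python. The sketch assumes that the maximised log-likelihoods under \(H_0\) and under the unrestricted model are already available; the function names are hypothetical and the log-likelihood values in the usage line are made up for illustration. Only the two cases needed here (1 and 2 degrees of freedom) are implemented, using the closed-form upper-tail probabilities of the \(\chi ^2\) distribution.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable, for df = 1 or 2 only."""
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))  # P(Z^2 > x) for standard normal Z
    if df == 2:
        return math.exp(-x / 2)             # chi-square(2) is exponential
    raise ValueError("only df = 1 or 2 are supported in this sketch")


def likelihood_ratio_test(loglik_null: float, loglik_full: float, df: int):
    """Return (statistic, approximate p-value) for nested models."""
    stat = -2 * (loglik_null - loglik_full)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods, e.g. for H_0^2: alpha_B = 0 (df = 1).
stat, p_value = likelihood_ratio_test(loglik_null=-100.0, loglik_full=-98.0, df=1)
print(stat, p_value)
```

With `df=2` the same function covers \(H_0 ^1\), where both dependency parameters are restricted simultaneously.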

5 Simulation

Simulations were conducted to investigate the accuracy of the estimates based on the procedure described in Sect. 4.2 under two cases: (1) large sample with many categories, and (2) small sample with relatively fewer categories. As the number of categories is typically between 2 and 11, with 5 to 10 categories being the easiest to rate (Wakita et al. 2012), we have purposely chosen 5, 7 and 10 categories in the simulation studies below.

5.1 Large sample with 10 categories

In this simulation study, we set \(m=9\) (which means a total of 10 categories) and used two sets of parameters as given below.

  • Set 1: \((\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'=(0.7,0.5,0.6,0.4,5.0,1.5)'\)

  • Set 2: \((\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'=(0.5,0.6,0.6,0.4,3.0,-0.3)'\)

For each set of parameters, we first simulated two Bernoulli variables \(D_1\) and \(D_2\) using \(\pi _1\) and \(\pi _2\) as the respective parameters. If \(D_1=D_2=0\), \(R_1\) and \(R_2\) were simulated using (1) and (2), respectively. If \(D_1=0\) and \(D_2=1\), \(R_1\) was simulated using (1) and \(R_2\) was simulated using a binomial distribution with parameter \((1-\xi _2)\). If \(D_1=1\) and \(D_2=0\), \(R_1\) was simulated using a binomial distribution with parameter \((1-\xi _1)\) and \(R_2\) was simulated using (1). If \(D_1=D_2=1\), then \(m\) Bernoulli variables were simulated using \((1-\xi _1)\) as the parameter. These \(m\) Bernoulli variables were summed to yield \(R_1\). Conditional on the value of each of these \(m\) Bernoulli variables, another \(m\) Bernoulli variables were generated with the parameter specified in (3). The sum of the latter \(m\) Bernoulli variables resulted in \(R_2\). Three sample sizes \(n=\lbrace 1000,2000,3000\rbrace \) were used. For each sample size, 1000 replicates were simulated. The convergence threshold for the EM algorithm was set to \(1\times 10^{-5}\).
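The four-branch data generating scheme above can be sketched in Python as follows. This is a minimal sketch under explicit assumptions, not the code used for the reported simulations. First, for the case \(D_1=D_2=0\) it assumes that \(R_2\) equals \(R_1\) with probability \((1+\alpha _U)/(m+1)\) and is otherwise drawn uniformly from the remaining \(m\) categories, which is inferred from the off-diagonal term \((m-\alpha _U)/\{m(m+1)^2\}\) appearing in \(P_{01}(\varvec{\theta })\). Second, the conditional Bernoulli parameter of Eq. (3) is not reproduced here; it is passed in as a hypothetical callable `cond_prob`. The placeholder used in the usage line ignores its argument and therefore corresponds to no dependence between the feeling components.

```python
import random


def simulate_pair(pi1, pi2, xi1, xi2, alpha_u, m, cond_prob, rng):
    """Draw one (R1, R2) pair following the four-branch scheme in the text.

    cond_prob(b) is a hypothetical stand-in for the conditional Bernoulli
    parameter of Eq. (3), given the value b (0 or 1) of the first variable.
    """
    d1 = rng.random() < pi1  # D1 = 1: R1 comes from the binomial component
    d2 = rng.random() < pi2  # D2 = 1: R2 comes from the binomial component

    def binom(p):
        return sum(rng.random() < p for _ in range(m))

    def unif():
        return rng.randrange(m + 1)  # categories 0, 1, ..., m

    if not d1 and not d2:
        # Assumed form of the dependent discrete uniform pair (see lead-in).
        r1 = unif()
        if rng.random() < (1 + alpha_u) / (m + 1):
            r2 = r1
        else:
            r2 = rng.choice([r for r in range(m + 1) if r != r1])
    elif not d1 and d2:
        r1, r2 = unif(), binom(1 - xi2)
    elif d1 and not d2:
        r1, r2 = binom(1 - xi1), unif()
    else:
        bits = [rng.random() < 1 - xi1 for _ in range(m)]
        r1 = sum(bits)
        r2 = sum(rng.random() < cond_prob(int(b)) for b in bits)
    return r1, r2


rng = random.Random(2024)
# Marginal parameters of Set 1; the lambda is a placeholder, NOT Eq. (3).
sample = [simulate_pair(0.7, 0.5, 0.6, 0.4, 5.0, 9, lambda b: 1 - 0.4, rng)
          for _ in range(1000)]
print(sample[:5])
```

Passing the conditional probability as a function keeps the branching logic separate from the specific bivariate binomial construction, so the sketch can be completed by supplying the expression in (3).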

Under the parameters specified in Set 1, \(r_U=0.56, r_T=0.60\), and \(R_1\) and \(R_2\) are positively correlated, with a theoretical correlation of 0.24. Under those specified in Set 2, \(r_U=0.33, r_T=-0.43\), and \(R_1\) and \(R_2\) are only weakly positively correlated, with a theoretical correlation of 0.05. Table 1 summarises the estimation results across all simulation replicates.

For both sets of parameters, the biases of all estimated parameters were very small, with a generally decreasing trend as the sample size increased. Meanwhile, the coefficients of variation decreased with the sample size as well, as one would expect. Not surprisingly, the variabilities of \(\alpha _U\) and \(\alpha _B\) were greater than those of the other parameters. This is probably because these parameters can only be estimated when \(D_1=D_2=0\) and \(D_1=D_2=1\), respectively, hence requiring a larger sample size than the marginal parameters to achieve a low variability. Overall, we conclude that the EM algorithm described in Sect. 4.2 worked well and is therefore an appropriate method for fitting the bivariate CUB model when both the sample size and the number of categories are large.
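The summary measures discussed above (mean, bias and coefficient of variation across replicates) can be computed as in the following Python sketch. It assumes that the CV is the sample standard deviation divided by the absolute value of the mean, which is the usual definition but is not stated explicitly in the text; the function name and the three input values are illustrative only and are not simulation results.

```python
import statistics


def summarise(estimates, true_value):
    """Mean, bias and coefficient of variation of replicate estimates."""
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)  # sample standard deviation (n - 1)
    return {"mean": mean, "bias": mean - true_value, "cv": sd / abs(mean)}


# Illustrative replicate estimates of a parameter whose true value is 0.7.
print(summarise([0.68, 0.70, 0.72], true_value=0.7))
```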

Table 1 Mean and coefficient of variation (CV) of the estimated parameters under different simulation scenarios for the large sample sizes with ten categories

5.2 Small sample with 5 or 7 categories

The data generating process was the same as that described in Sect. 5.1, except that the number of categories and the sample sizes were smaller. Specifically, the cases where \(m=4\) and 6 were considered. For each value of \(m\), sample sizes of 100, 200 and 300 were used. The parameters used were \((\pi _1,\pi _2,\xi _1,\xi _2,\alpha _U,\alpha _B)'=(0.7,0.3,0.8,0.4,3.0,0.2)'\). Compared to the previous two sets of parameters, this set makes the data sparser as \(\xi _1\) is closer to 1, meaning that the values of \(R_1\) are more concentrated at the lower end. The simulation results are provided in Table 2. As the sample size increases, a generally decreasing trend in the biases of the estimates can be observed. The marginal parameters can be accurately estimated even with the smaller sample sizes considered, although the biases are larger than in the large sample cases reported in Table 1. Consistent with the large sample case, the estimation of the dependency parameters \(\alpha _U\) and \(\alpha _B\) is less accurate than that of the marginal parameters. The number of categories does not seem to have a large impact on the estimation of the parameters.

Table 2 Mean and CV of the estimated parameters for 1000 simulation replicates with \(n=100, 200\) and 300, and \(m=4\) and 6

6 Application

The proposed bivariate CUB model was applied to the “relgoods” dataset, available within the CUB package (Iannario et al. 2020) in R (R Core Team 2022). The dataset contains results from a survey conducted in Naples, Italy, in 2014. Respondents of the survey were asked to evaluate their scores for various relational goods (for example, time dedicated to friends and family) and related issues such as safety of surroundings and their feeling of happiness. We focused on two of the questions related to the following aspects:

  • Environment: the level of comfort with the surrounding environment, and

  • Safety: the level of safety in the streets.

In the original survey, for both questions, respondents provided a score on a 10-point Likert scale, ranging from 1 = “never, at all” to 10 = “always, a lot”. For our purpose, we have re-scaled the responses to 0 to 9 by subtracting 1 from each response (meaning that \(m=9\)). The dataset contains many other variables. Univariate analysis results on some of the variables can be found in, for example, Iannario and Simone (2017) and Capecchi et al. (2018). Further details regarding the dataset can be found on https://rdrr.io/cran/CUB/man/relgoods.html. The R code used to obtain the results in this section is available as Supplementary Information online.

As one can naturally expect some association between the level of comfort with the surrounding environment and the level of safety in the surrounding areas, a bivariate model would be appropriate. Originally, there were a total of 2,459 responses. Upon removing 9 observations that contained missing values, the proposed bivariate CUB model was fitted on the remaining 2,450 observations. Here we label “Environment” as \(R_1\) and “Safety” as \(R_2\). The procedures described in Sects. 4.2 to 4.4 were employed to gain insights from the dataset.

Table 3 Estimated parameters under the proposed bivariate CUB model and separate univariate CUB models

Table 3 presents the estimated parameters based on the proposed bivariate CUB model and separate univariate CUB models. The parameters under the univariate case were obtained using the functionalities within the CUB package (Iannario et al. 2020). Overall, the bivariate model resulted in a higher log-likelihood as well as a lower AIC and BIC, indicating a better goodness-of-fit (GOF). The better performance can also be checked visually by assessing the contour plots and 3D histograms provided in Fig. 3. In particular, the separate univariate models were not able to capture the positive correlation between the two ratings.

Based on the estimated parameters in the bivariate model, we have \({\hat{r}}_U\) = 0.191 and \({\hat{r}}_T\) = 0.316, while the empirical correlation between \(R_1\) and \(R_2\) was \(r_R = 0.229\). Thus, the correlation between the feelings towards the two questions was larger than that suggested by \(r_R\). Results of the hypothesis tests in Table 4 also show that both \(\alpha _U\) and \(\alpha _B\) are significantly different from zero.

Supposing that the respondent was uncertain about both questions, the estimated value of \({\hat{\alpha }}_U = 1.723\) suggests that the estimated probability of choosing the same category, given the first response, was \(1.723/10=0.1723\), a 72.3% increase compared to a model assuming independence among the responses. Moreover, supposing that the respondent chose to express his/her feelings towards the two questions, the model found a moderate positive correlation (\({\hat{r}}_T\) = 0.316) between the two responses, indicating that the two responses tended to go in the same direction. That is, respondents who were satisfied with the level of comfort with the surrounding environment tended to be satisfied with the level of safety in the streets as well. These kinds of insights regarding the associations between the two survey items would not be obtainable if the two variables were fitted separately.

Table 4 Results of likelihood ratio tests under \(H_0^1: \alpha _B=\alpha _U=0\), \(H_0^2: \alpha _B=0\) and \(H_0^3:\alpha _U=0\)

Fig. 3 Contour plots and 3D histograms for the observed data (left) and fitted models (middle: using bivariate CUB model; right: using univariate CUB models fitted separately)

The same dataset was also analysed using HMMLU with \(h_i(r_i)\) taking the form of a discrete uniform distribution. In total, 22 parameters were used: three for the mixing weights \(\pi _{00}, \pi _{01}\) and \(\pi _{10}\) (\(\pi _{11}\) can be derived from these three), nine for the marginal logits for each of \(R_1\) and \(R_2\), and one for the log odds ratio. In particular, local logits in the form of \(\eta _r ^j=\log \left[ P(R_j = r+1|D_j=1)/P(R_j = r|D_j=1)\right] \) for \(j=1,2\) and \(r=0,1,\ldots ,8\) were used, and a global odds ratio (Dale 1986)

$$\begin{aligned} \psi =\frac{P(R_1\le i, R_2\le j)P(R_1> i, R_2> j)}{P(R_1> i, R_2\le j)P(R_1\le i, R_2> j)} \end{aligned}$$

that is identical for all i and j was used. The use of only one log odds ratio was to ensure model identifiability (Colombi et al. 2019, p. 599). Table 5 shows the estimated values of the parameters and the overall GOF of the model. Not surprisingly, HMMLU provided a better fit in terms of all measures used since it contained substantially more parameters. The relative advantage of the bivariate CUB model lies in parsimony and interpretability.
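To make the definition of the global odds ratio concrete, the following Python sketch computes \(\psi \) at a given cut point \((i,j)\) from a joint probability table. The function name and the small table are illustrative assumptions; the table is built as an outer product of two marginal distributions, so \(R_1\) and \(R_2\) are independent and \(\psi \) should equal 1 at every cut point.

```python
def global_odds_ratio(joint, i, j):
    """Global odds ratio at cut point (i, j) of a joint probability table.

    joint[a][b] is P(R1 = a, R2 = b); i and j must leave all four
    quadrants non-empty with positive probability.
    """
    rows, cols = len(joint), len(joint[0])
    low_low = sum(joint[a][b] for a in range(i + 1) for b in range(j + 1))
    high_high = sum(joint[a][b] for a in range(i + 1, rows)
                    for b in range(j + 1, cols))
    high_low = sum(joint[a][b] for a in range(i + 1, rows)
                   for b in range(j + 1))
    low_high = sum(joint[a][b] for a in range(i + 1)
                   for b in range(j + 1, cols))
    return (low_low * high_high) / (high_low * low_high)


# Illustrative independent table: outer product of two marginals.
p = [0.2, 0.3, 0.5]
q = [0.1, 0.6, 0.3]
table = [[pa * qb for qb in q] for pa in p]
print([global_odds_ratio(table, i, j) for i in range(2) for j in range(2)])
```

Requiring a single \(\psi \) for all cut points, as in the model above, means that all of these values are constrained to be equal, which is what reduces the association structure to one parameter.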

Table 5 Estimated parameters under HMMLU with \(h_i(r_i)\) taking the form of discrete uniform distribution, and the overall GOF of the model

7 Discussions and conclusion

In this work, we have proposed a novel bivariate CUB model for modelling correlated ordinal variables, especially those arising from surveys that require people to rate or express their opinions on a Likert scale. The joint distribution belongs to a general class of mixture distributions while the marginal variables follow the CUB distribution. Combining the two CUB variables allows further insights, such as the association between the two variables, to be drawn from the dataset. Identifiability and other inferential issues around the proposed model have been discussed throughout the paper. The estimation procedure has been found to work satisfactorily through simulation studies. Additional simulation studies under varied scenarios would further clarify the model’s performance. Upon applying the proposed model to a set of publicly available data, we have demonstrated the capability of the model in analysing two variables jointly instead of separately, and how further insights on the associations of the survey items could be discovered.

Since responses from surveys involve psychological behaviours of the respondents, it is important to take into account the potential biases that may have been introduced. Apart from indecision or uncertainty, the uncertainty component of the CUB model can also be used to account for other elements such as difficulty in expressing an actual feeling, limited knowledge, fatigue or willingness to satisfy the interviewer (Iannario and Piccolo 2016; Iannario and Tarantola 2023). As shown by Colombi et al. (2019), ignoring the uncertainty component during the modelling stage would lead to substantial biases in the estimation results.

One distinctive feature of the proposed model is the ability to estimate the associations within the uncertainty and feeling components separately. Previous attempts to generalise CUB models to the multivariate setting typically rely on copula-based methods, in particular the Plackett distribution (Corduas 2011; Andreis and Ferrari 2013; Corduas 2015). Another notable work by Colombi and Giordano (2016) used the Sarmanov distribution to bind the univariate margins. Both the Plackett and Sarmanov distributions have a parameter that is related to either the rank or Pearson correlation of the two marginal variables. However, it is not possible to tell whether the correlation results from the uncertainty or the feeling component of the underlying CUB variables. Our proposal, on the other hand, allows the decomposition of the overall correlation into two separate elements. In particular, the estimated correlation between the respondents’ feelings/preferences would be considered an important measure in many applications. Although the model of Colombi et al. (2019) does not use a copula, it assumes independence between the uncertain responses.

One of the reasons why CUB models have become popular is the ability to include respondents’ covariates in the model, enabling analysts to explore the relationship between the CUB parameters and the subjects’ covariates for better interpretation. Under the proposed bivariate CUB model, we conjecture that it would be straightforward to include covariates for the uncertainty parameters \(\pi \). However, it may be challenging to include covariates for the feeling parameters \(\xi \), as the admissible range of \(\alpha _B\) [which is a function of \(\xi _1\) and \(\xi _2\) as provided in Eq. (4)] will then be affected by the covariates. Re-parameterising \(\alpha _B\) could be a way to overcome this challenge, but it is unclear at this stage how this would affect the likelihood function and the mechanism of the EM algorithm introduced in this paper. Further studies are needed to devise a solution. Nonetheless, we have purposely not considered models with covariates since their identifiability has not been established. In fact, to the best of our knowledge, no work has fully tackled the identifiability issue even for univariate CUB models with covariates.

Our proposed model can be extended in several ways. For example, the inclusion of a “shelter”/“refuge” category (Iannario 2012) or a “don’t know” category (Manisera and Zuccolotto 2014; Iannario et al. 2018) would be a direction for future research. Assuming identifiability is not an issue, other bivariate binomial and discrete uniform distributions could replace those utilised in this work. In the univariate case, Gottard et al. (2016) provide details of some other distributions that could be used to replace the uniform distribution in the uncertainty part. Building a bivariate model using these distributions would potentially lead to models that are more interpretable under certain contexts.

In this work, we have focused on the bivariate case. The model developed will serve as a building block for higher dimensional models. As the dependency structure becomes more complicated, the number of parameters will inevitably increase as well. Our proposed bivariate model would be useful if some pairwise dependence or Markov assumptions are to be imposed. These assumptions are particularly suitable for time series (Varin and Vidoni 2006) or spatial ordinal data (Feng et al. 2014; Ip and Wu 2024). More parameters will also mean a more complex observed information matrix. In that case, the empirical information matrix (Meilijson 1989; McLachlan and Peel 2000; Scott 2002), which requires only the first derivatives, can be used to avoid the laborious task of obtaining the second derivatives.