Psychological scales and educational tests are developed to measure a particular construct by selecting items from several identified domains (Gibbons et al. 2007). For example, a questionnaire or instrument, used in psychometrics to assess abstract concepts, such as the well-being has a large number of items or questions that are sampled from several subdomains such as depression, anxiety and stress. This special classification of items in educational assessments is termed ‘testlets’ (Wainer and Kiely 1987). It is essential to investigate the factorial structure, as implementing unstructured factor models on testlet-based items could result in biased estimates and a poor fit (Wang and Wilson 2005; DeMars 2006; Zenisky et al. 2002; Sireci et al. 1991; Lee and Frisbie 1999; Wainer and Thissen 1996).

To account for the homogeneous dependence in several subdomains of some larger domain, Gibbons and Hedeker (1992) and Gibbons et al. (2007) proposed bi-factor models for binary and ordinal items, respectively. They consist of a common factor, that is linked to all items, and non-overlapping group-specific factors. The common factor explains the dependence between all the items, while the group-specific factors explain the dependence amongst items within each domain or group. The items are assumed to be independent given the group-specific and common factors.

An alternative way of modelling items that are split into several domains is via the second-order model (e.g., de la Torre and Song 2009; Rijmen 2010), where items are indirectly mapped to an overall (second-order) factor via non-overlapping group-specific (first-order) factors. Second-order models are suitable when the first-order factors are associated with each other, and there is a second-order factor that accounts for the relations among the first-order factors. The second-order model can be described as an independent clusters factor model (McDonald 1999) with a single second-order factor.

The bi-factor and the second-order models are not generally equivalent (Yung et al. 1999; Gustafsson and Balke 1993; Mulaik and Quartetti 1997; Rijmen 2010), unless proportionality constraints are imposed by using the Schmid–Leiman transformation method (Schmid and Leiman 1957). More importantly, both models are restricted to the MVN assumption for the latent variables, which might not be valid. Nikoloulopoulos and Joe (2015) emphasized that if the ordinal variables in item response can be thought of as discretization of latent random variables that are maxima/minima or mixtures of means, then the use of factor models based on the MVN assumption for the latent variables could provide poor fit. In the context of item response data, latent maxima, minima and means can arise depending on how a respondent considers specific items. An item might make the respondent think about M past events which, say, have values \(W_1,\ldots ,W_M\). In answering the item, the subject might take the average, maximum or minimum of \(W_1,\ldots ,W_M\) and then convert to the ordinal scale depending on the magnitude. The case of a latent maxima/minima can occur if the response is based on a best or worst case.

Nikoloulopoulos and Joe (2015) have studied factor copula models for item response where for the first factor there are bivariate copulas that couple each item to the first latent variable, and for the second factor there are copulas that link each item to the second latent variable conditioned on the first factor (leading to conditional dependence parameters), etc. They have shown that there is an improvement on the factor models based on the MVN assumption for the latent variables both conceptually and in fit to data. This improvement relies on the aforementioned reasons, i.e., items can have more probability in joint upper or lower tail than would be expected with a discretized MVN or items can be considered as discretized maxima/minima or mixtures of discretized means rather than discretized means. When all the bivariate copulas are bivariate normal (BVN), then the resulting model is the same as the discretized MVN model with a p-factor correlation matrix (Maydeu-Olivares 2006), also known as the p-dimensional normal ogive model (Jöreskog and Moustaki 2001). For example, the 1-factor copula model with BVN copulas is the same as the variant of Samejima’s (1969) graded response IRT model, known as normal ogive model (McDonald 1997) with an 1-factor correlation matrix. We refer to Nikoloulopoulos and Joe (2015, Section 2.3) for further details and explanations on the normal ogive models as special cases of factor copula models.

In this paper, we propose copula extensions for bi-factor and second-order models. The construction of the bi-factor copula model exploits the use of bivariate copulas that link the items to the common and group-specific factors. Note that if there is only one group of items, then the bi-factor model reduces to the two-factor copula model in Nikoloulopoulos and Joe (2015). Similarly with the bi-factor copula model, we also use bivariate copulas to construct the second-order copula model. In this case, there are bivariate copulas that link the items to the group-specific factors, and also bivariate copulas that link the group-specific to the second-order factor. To account for the dependence between the items and group-specific factors, each group of variables in fact is modelled using the one-factor copula model proposed by Nikoloulopoulos and Joe (2015). In addition, if there is only one group of items, then the second-order copula model reduces to the one-factor copula model. Hence, the proposed models contain the one- and two-factor copula models in Nikoloulopoulos and Joe (2015) as special cases, while allowing flexible dependence structure for both within- and between-group dependence. As a result, the models are suitable for modelling a high-dimensional item response classified into non-overlapping groups.

The proposed copula constructions are truncated vine copula models (Brechmann et al. 2012) that involve both observed and latent variables. Joe et al. (2010) have shown that by choosing bivariate linking copulas appropriately, truncated vine copula models can have a wide range of asymmetric dependence as well as tail dependence (dependence among extreme values) and different lower/upper tail dependence parameters for each bivariate margin. Hence, the bi-factor and second-order copula models will be useful when the items have more probability in joint upper or lower tail than would be expected with a discretized MVN. If the bivariate linking copulas are BVN, then the Gaussian bi-factor and second-order models are special cases of our constructions which are the discrete counterparts of the structured factor copula models Krupskii and Joe (2015) where dependence and tail properties are obtained.

The remainder of the paper proceeds as follows: Section 1 introduces the bi-factor and second-order copula models for item response and discusses their relationship with the existing models. Estimation techniques and computational details are provided in Sect. 2. Section 3 proposes simple diagnostics based on semi-correlations and a heuristic method to select suitable bivariate copulas and build plausible bi-factor and second-order copula models. Section 4 summarizes the assessment of goodness of fit of these models using the \(M_2\) statistic of Maydeu-Olivares and Joe (2006), which is based on a quadratic form of the deviations of sample and model-based proportions over all bivariate margins. Section 5 contains an extensive simulation study to gauge the small-sample efficiency of the proposed estimation, investigate the misspecification of the bivariate copulas, and examine the reliability of the model selection and goodness-of-fit techniques. Section 6 presents an application of our methodology to the Toronto Alexithymia Scale. In this example, it turns out that our models, with linking copulas selected according to the items being discretized latent minima or mixtures of discretized means, provide better fit than the Gaussian bi-factor and second-order models. We conclude with some discussion in Sect. 7.

1 Bi-factor and Second-Order Copula Models

Let \(\underbrace{Y_{11}, \ldots ,Y_{d_11}}_1,\ldots , \underbrace{Y_{1g},\ldots , Y_{d_gg}}_g, \ldots , \underbrace{Y_{1G}, \ldots , Y_{d_GG}}_G\) denote the item response variables classified into the G non-overlapping groups. There are \(d_g\) items in group g; \(g=1,\ldots ,G,\,j=1,\ldots ,d_g\) and collectively there are \(d=\sum _{g=1}^{G} d_g\) items, which are all measured on an ordinal scale; \(Y_{jg}\in \{0,\ldots ,K-1\}\). Let the cutpoints in the uniform U(0, 1) scale for the jg’th item be \(a_{jg,k}\), \(k=1,\ldots ,K-1\), with \(a_{jg,0}=0\) and \(a_{jg,K}=1\). These correspond to \(a_{jg,k}=\Phi (\alpha _{jg,k})\), where \(\alpha _{jg,k}\) are cutpoints in the normal N(0, 1) scale.

The bi-factor and second-order factor copula models are presented in Sects. 1.1 and 1.2, respectively. Section 1.3 discusses their relationship with the existing Gaussian bi-factor and second-order models, and Sect. 1.4 provides the bivariate linking copulas we consider along with their properties.

1.1 Bi-factor Copula Model

Consider a common factor \(V_0\) and G group-specific factors \(V_1,\ldots ,V_G\), where \(V_0,V_1,\ldots ,V_G\) are independent and standard uniformly distributed. Let \(Y_{jg}\) be the jth observed variable in group g, with \(y_{jg}\) being the realization. The bi-factor model assumes that \(Y_{1g},\ldots , Y_{d_gg}\) are conditionally independent given \(V_0\) and \(V_g\), and that \(Y_{jg}\) in group g does not depend on \(V_{g'}\) for \(g\ne g'\). Figure 1 depicts a graphical representation of the model.

Fig. 1
figure 1

Graphical representation of the bi-factor copula model with G group-specific factors and a common factor \(V_0\).

The joint probability mass function (pmf) is given by:

$$\begin{aligned} \pi (\textbf{y})= & {} \Pr (Y_{jg} = y_{jg} ; j=1,\ldots ,d_g,g=1,\ldots ,G) \\= & {} \int _{[0,1]^{G+1}} \prod _{g=1}^{G}\prod _{j=1}^{d_g} \Pr (Y_{jg} = y_{jg}| V_0 =v_0, V_g=v_g) d{v_1}\cdots d{v_G} \textrm{d}{v_0}. \end{aligned}$$

According to Sklar’s theorem (1959), there exists a bivariate copula \(C_{Y_{jg},V_0}\) such that \(\Pr (Y_{jg} \le y_{jg}, V_0 \le v_0)= C_{Y_{jg},V_0}\bigl (F_{{Y_{jg}}}(y_{jg}), v_0\bigr )\), for \(v_0 \in [0,1]\), where \(C_{Y_{jg},V_0}\) is the copula that links the item \(Y_{jg}\) with the common factor \(V_0\), \(F_{Y_{jg}}\) is the cumulative distribution function (cdf) of \(Y_{jg}\); note that \(F_{Y_{jg}}\) is a step function with jumps at \(0,\ldots ,K-1\), i.e., \(F_{Y_{jg}}(y_{jg})=a_{jg,y_{jg}+1}\). Then, it follows that,

$$\begin{aligned} F_{Y_{jg}|V_0}(y_{jg}|v_0):= \Pr (Y_{jg} \le y_{jg} | V_0 = v_0)= \frac{\partial }{\partial v_0} C_{Y_{jg},V_0}\bigl (a_{jg,y_{jg}+1}, v_0\bigr ). \end{aligned}$$

For shorthand notation, we let \(C_{Y_{jg}|V_0}\bigl (a_{jg,y_{jg}+1}|v_0\bigr ) = \frac{\partial }{\partial v_0} C_{Y_{jg},V_0}\bigl ( a_{jg,y_{jg}+1}, v_0\bigr )\).

The observed variables also load on the group-specific factors; hence to account for this dependence, we let \(C_{Y_{jg},V_g|V_0 }\) be a bivariate copula that links the item \(Y_{jg}\) with the group-specific factor \(V_g\) given the common factor \(V_0\). Hence,

$$\begin{aligned}&\Pr (Y_{jg} \le y_{jg}| V_0 =v_0, V_g=v_g)= \frac{\partial }{\partial v_g} \Pr (Y_{jg} \le y_{jg} , V_g\le v_g | V_0 =v_0) \\&\qquad = \frac{\partial }{\partial v_g} C_{Y_{jg},V_g|V_0}\bigl (F_{Y_jg|V_0}(y_{jg}|v_0), v_g\bigr )= C_{Y_{jg}|V_g;V_0}\bigl (F_{Y_jg|V_0}(y_{jg}|v_0)| v_g\bigr ). \end{aligned}$$

To this end, the pmf of the bi-factor copula model takes the form

$$\begin{aligned} \pi (\textbf{y})&= \int _{[0,1]^{G+1}} \prod _{g=1}^{G} \prod _{j=1}^{d_g} \biggr \{C_{Y_{jg}|V_g;V_0}\bigl (F_{Y_{jg}|V_0}(y_{jg}|v_0)| v_g\bigr ) \nonumber \\ {}&\quad -C_{Y_{jg}|V_g;V_0}\bigr (F_{Y_{jg}|V_0}(y_{jg}-1|v_0)| v_g\bigl ) \biggl \} d{v_1}\cdots d{v_G} \textrm{d}{v_0} \nonumber \\&= \int _0^1 \prod _{g=1}^{G} \Bigg \{ \int _0^1 \prod _{j=1}^{d_g} \biggl [ C_{Y_{jg}|V_g;V_0}\bigl (F_{Y_{jg}|V_0}(y_{jg}|v_0)| v_g\bigr )\nonumber \\ {}&\quad -C_{Y_{jg}|V_g;V_0}\bigr (F_{Y_{jg}|V_0}(y_{jg}-1|v_0)| v_g\bigl ) \biggr ] \textrm{d}{v_g} \Bigg \} \textrm{d}{v_0} \nonumber \\&= \int _0^1 \prod _{g=1}^{G} \Bigg \{ \int _0^1 \prod _{j=1}^{d_g} \biggr [ C_{Y_{jg}|V_g;V_0}\bigl ( C_{Y_{jg}|V_0}(a_{jg,y_{jg}+1}|v_0) | v_g\bigr ) \nonumber \\ {}&\quad -C_{Y_{jg}|V_g;V_0}\bigr ( C_{Y_{jg}|V_0}(a_{jg,y_{jg}}|v_0) | v_g\bigr ) \biggl ] \textrm{d}{v_g} \Bigg \} \textrm{d}{v_0}\nonumber \\&= \int _0^1 \prod _{g=1}^{G} \Bigg \{ \int _0^1 \prod _{j=1}^{d_g} f_{Y_{jg}|V_{g};V_0}(y_{jg}|v_g,v_0) \textrm{d}{v_g} \Bigg \} \textrm{d}{v_0}. \end{aligned}$$
(1)

It is shown that the pmf is represented as an one-dimensional integral of a function which is in turn a product of G one-dimensional integrals. Thus, we avoid \((G+1)\)-dimensional numerical integration.

In addition to the computational advancements the proposed model offers, it can provide, with appropriately chosen linking copulas, more probability in joint upper or lower tail than would be expected with a discretized MVN. The bi-factor copula can be explained as a 2-truncated vine. d-dimensional vine copulas can cover flexible dependence structures through the specification of d bivariate marginal copulas at level 1 and \(d(d-1)/2\) bivariate conditional copulas at higher levels (Nikoloulopoulos et al. 2012). For the d-dimensional bi-factor copula, the pairs at level 1 are \(Y_{jg},V_0\) for \(g=1,\ldots ,G,\,j=1,\ldots ,d_g\), the pairs at level 2 are \(Y_{jg},V_g|V_0\) for \(g=1,\ldots ,G,\,j=1,\ldots ,d_g\), and for higher levels the (conditional) copula pairs are set to independence. That is the bi-factor copula has d bivariate copulas \(C_{Y_{jg},V_0}\) that link \(Y_{jg},\,g=1,\ldots ,G,\,j=1,\ldots ,d_g\) with \(V_0\) in the 1st level of the vine, d bivariate copulas \(C_{Y_{jg},V_g|V_0}\) that link \(Y_{jg},\,g=1,\ldots ,G,\,j=1,\ldots ,d_g\) with \(V_g,\,g=1,\ldots ,G\) given \(V_0\) in the 2nd level of the vine, and independence copulas in all the remaining levels of the vine (truncated after the 2nd level). From results in Joe et al. (2010) and Krupskii and Joe (2015), upper or lower tail dependent copulas in levels 1 and 2 will lead to items that have more probability in joint upper or lower tail than would be expected with a discretized MVN.

For the parametric version of the bi-factor copula model, we let \(C_{Y_{jg},V_0}\) and \(C_{Y_{jg},V_g|V_0}\) be parametric copulas with dependence parameters \(\theta _{jg}\) and \(\delta _{jg}\), respectively.

1.2 Second-Order Copula Model

Assume that for a fixed \(g=1,\ldots ,G\), the items \(Y_{1g},\ldots , Y_{d_gg}\) are conditionally independent given the first-order factors \(V_g \sim U(0,1),\,g=1,\ldots ,G\) and that \(\textbf{V}=(V_1,\ldots ,V_G)\) are conditionally independent given the second-order factor \(V_0 \sim U(0,1)\). That is the joint distribution of \(\textbf{V}\) has an one-factor structure. We also assume that \(Y_{jg}\) in group g does not depend on \(V_{g'}\) for \(g\ne g'\). Figure 2 depicts the graphical representation of the model.

The joint pmf takes the form

$$\begin{aligned} \pi (\textbf{y}) =\int _{[0,1]^G} \Bigg \{ \prod _{g=1}^{G} \prod _{j=1}^{d_g} \Pr (Y_{jg} = y_{jg} | V_g = v_g) \Bigg \} c_\textbf{V}(v_1,\ldots , v_G) \textrm{d}{v_1}\cdots \textrm{d}{v_G}; \end{aligned}$$

\(c_\textbf{V}\) is the one-factor copula density (Krupskii and Joe 2013) of \(\textbf{V}\), viz.

$$\begin{aligned} c_\textbf{V}(v_1,\ldots , v_G) = \int _0^1 \prod _{g=1}^{G} c_{V_g,V_0}(v_g,v_0) \textrm{d}v_0, \end{aligned}$$

where \(c_{V_g,V_0}\) is the bivariate copula density of the copula \(C_{V_g,V_0}\) linking \(V_g\) and \(V_0\).

Fig. 2
figure 2

Graphical representation of the second-order copula model with G first-order factors and one second-order factor \(V_0\).

Letting \(C_{Y_{jg},V_g}\) be a bivariate copula that joins the item \(Y_{jg}\) and the group-specific factor \(V_g\) such that

$$\begin{aligned} F_{Y_{jg}|V_g}(y_{jg}|v_g):= \Pr (Y_{jg} \le y_{jg} | V_g = v_g)= \frac{\partial }{\partial v_g} C_{Y_{jg},V_g}\bigl (a_{jg,y_{jg}+1}, v_g\bigr )=C_{Y_{jg}|V_g}\bigl (a_{jg,y_{jg}+1}| v_g\bigr ), \end{aligned}$$

the pmf of the second-order copula model becomes

$$\begin{aligned} \pi (\textbf{y})&= \int _0^1 \int _{[0,1]^G} \Bigg \{ \prod _{g=1}^{G} \prod _{j=1}^{d_g} \Big ( C_{Y_{jg}|V_g}\bigl (a_{jg,y_{jg}+1} | v_g\bigr ) - C_{Y_{jg}|V_g}\bigl (a_{jg,y_{jg}} | v_g \bigr ) \Big ) \Bigg \}\nonumber \\ {}&\quad \times \Bigg \{ \prod _{g=1}^{G} c_{V_g,V_0}\bigl (v_g,v_0\bigr ) \Bigg \} \textrm{d}{v_1} \cdots \textrm{d}{v_G} \textrm{d}{v_0}\nonumber \\&= \int _0^1 \Bigg \{ \prod _{g=1}^{G} \int _0^1 \Bigg [ \prod _{j=1}^{d_g} \Big ( C_{Y_{jg}|V_g}\bigl (a_{jg,y_{jg}+1} | v_g\bigr ) - C_{Y_{jg}|V_g}\bigl (a_{jg,y_{jg}} | v_g \bigr ) \Big ) \Bigg ] c_{V_g,V_0}\bigl (v_g,v_0\bigr ) \textrm{d}{v_g} \Bigg \} \textrm{d}{v_0}\nonumber \\&=\int _0^1 \Bigg \{ \prod _{g=1}^{G} \int _0^1 \Big [ \prod _{j=1}^{d_g} f_{Y_{jg}|V_g}(y_{jg}|v_g) \Big ] c_{V_g,V_0}\bigl (v_g,v_0\bigr ) \textrm{d}{v_g} \Bigg \} \textrm{d}{v_0}. \end{aligned}$$
(2)

Similarly with the bi-factor copula model, the pmf is represented as an one-dimensional integral of a function, which is in turn a product of G one-dimensional integrals.

In addition to the computational advancements, the second-order model offers, it can provide, with appropriately chosen linking copulas, more probability in joint upper or lower tail than would be expected with a discretized MVN. The second-order copula can be explained as an 1-truncated vine. For the d-dimensional second-order copula, the pairs at level 1 are \(Y_{jg},V_g\) for \(g=1,\ldots ,G,\,j=1,\ldots ,d_g\) and \(V_0,V_g\) for \(g=1,\ldots ,G\), and for higher levels the (conditional) copula pairs are set to independence. That is the second copula has d bivariate copulas \(C_{Y_{jg},V_g}\) that link \(Y_{jg},\,g=1,\ldots ,G,\,j=1,\ldots ,d_g\) with \(V_g,\,g=1,\ldots ,G\) and G bivariate copulas \(C_{V_g,V_0}\) that link \(V_{g},\,g=1,\ldots ,G\) with \(V_0\) in the 1st level of the vine, and independence copulas in all the remaining levels of the vine (truncated after the 1st level). Joe et al. (2010) have shown that in order for a vine copula to have tail dependence for all bivariate margins, it is only necessary for the bivariate copulas in level 1 to have tail dependence and it is not necessary for the conditional bivariate copulas in levels \(2,\ldots ,d\) to have tail dependence. Hence, upper or lower tail dependent copulas in level 1 will lead to will lead to items that have more probability in joint upper or lower tail than would be expected with a discretized MVN.

For the parametric version of the second-order copula model, we let \(C_{Y_{jg},V_g}\) and \(C_{V_g,V_0}\) be parametric copulas with dependence parameters \(\theta _{jg}\) and \(\delta _{g}\), respectively.

1.3 Special Cases

In this subsection, we show what happens when all bivariate copulas are BVN. Let \(Z_{jg}\) be the underlying continuous variable of the ordinal variable \(Y_{jg}\), i.e., \(Y_{jg} = y_{jg}\) if \(\alpha _{jg,y_{jg}} \le Z_{jg} \le \alpha _{jg,y_{jg}+1}\) with \(\alpha _{jg,K} = \infty \) and \(\alpha _{jg,0} = -\infty \).

For the bi-factor model, if \(C_{Y_{jg},V_0}(\cdot ;\theta _{jg})\) and \(C_{Y_{jg},V_g|V_0}(\cdot ;\delta _{jg})\) are BVN copulas,

$$\begin{aligned} C_{Y_{jg}|V_g; V_0}(C_{Y_{jg} | V_0}(a_{jg,y_{jg}+1}| v_0) | v_g) =\Phi \left( \frac{ \alpha _{jg,y_{jg}+1} - \theta _{jg} \Phi ^{-1}(v_0) - \delta _{jg} \sqrt{1-\theta _{jg}^2} \Phi ^{-1}(v_g)}{\sqrt{(1 - \theta _{jg}^2) (1-\delta _{jg}^2) }} \right) . \end{aligned}$$

Hence, the pmf for the bi-factor copula model in (1) becomes:

$$\begin{aligned} \pi (\textbf{y})= & {} \int _0^1 \prod _{g=1}^G \Biggl \{ \int _0^1 \prod _{j=1}^{d_g} \Biggl [ \Phi \left( \frac{ \alpha _{jg,y_{jg}+1} - \theta _{jg} \Phi ^{-1}(v_0) - \delta _{jg} \sqrt{1-\theta _{jg}^2} \Phi ^{-1}(v_g)}{\sqrt{(1 - \theta _{jg}^2) (1-\delta _{jg}^2) }} \right) \\{} & {} -\Phi \left( \frac{ \alpha _{jg,y_{jg}} - \theta _{jg} \Phi ^{-1}(v_0) - \delta _{jg} \sqrt{1-\theta _{jg}^2} \Phi ^{-1}(v_g)}{\sqrt{(1 - \theta _{jg}^2) (1-\delta _{jg}^2) }} \right) \Biggr ] dv_g \Biggr \} \textrm{d}v_0 \\= & {} \int _{-\infty }^{\infty } \prod _{g=1}^G \Biggl \{ \int _{-\infty }^{\infty } \prod _{j=1}^{d_g} \Biggl [ \Phi \left( \frac{ \alpha _{jg,y_{jg}+1} - \theta _{jg} z_{0} - \delta _{jg} \sqrt{1-\theta _{jg}^2} z_{g}}{\sqrt{(1 - \theta _{jg}^2) (1-\delta _{jg}^2) }} \right) \\{} & {} - \Phi \left( \frac{ \alpha _{jg,y_{jg}} - \theta _{jg} z_{0} - \delta _{jg} \sqrt{1-\theta _{jg}^2} z_{g}}{\sqrt{(1 - \theta _{jg}^2) (1-\delta _{jg}^2) }} \right) \Biggr ] \phi (z_{g}) dz_{g} \Biggr \} \phi (z_{0}) \textrm{d}z_{0}. \end{aligned}$$

This model is the same as the Gaussian bi-factor model (Gibbons and Hedeker 1992; Gibbons et al. 2007) with stochastic representation

$$\begin{aligned} Z_{jg} = \theta _{jg} Z_{0} + \gamma _{jg} Z_{g} +\sqrt{1 - \theta _{jg}^2 - \gamma _{jg}^2} \epsilon _{jg}, \qquad g=1,\ldots ,G,\quad j=1,\ldots ,d_g, \end{aligned}$$
(3)

where \(\gamma _{jg}=\delta _{jg}\sqrt{1 - \theta _{jg}^2}\) and \(Z_{0},Z_{g},\epsilon _{jg}\) are iid N(0, 1) random variables. The parameter \(\theta _{jg}\) of \(C_{Y_{jg},V_0}\) is the correlation of \(Z_{jg}\) and \(Z_0\), and the parameter \(\delta _{jg}\) of \(C_{Y_{jg},V_g|V_0}\) is the partial correlation between \(Z_{jg}\) and \(Z_{g}=\Phi ^{-1}(V_g)\) given \(Z_0=\Phi ^{-1}(V_0)\).

It implies that the underlying random variables \(Z_{jg}\)’s have a multivariate Gaussian distribution where the off-diagonal entries of the correlation matrix have the form \(\theta _{j_1g}\theta _{j_2g} + \gamma _{j_1g}\gamma _{j_2g}\) and \(\theta _{j_1g_1}\theta _{j_2g_2}\) for \( j_1 \ne j_2\) and \(g_1 \ne g_2\), respectively. For the Gaussian bi-factor model to be identifiable, the number of dependence parameters has to be \(2d - N_{1}-N_{2}\), where \(N_{1}\) and \(N_{2}\) is the number of groups that consist of 1 and 2 items, respectively. For a group g of size 1 with variable j, \(Z_g\) is absorbed with \(\epsilon _{jg}\) because \(\gamma _{jg}\) would not be identifiable. For a group g of size 2 with variable indices \(j_1,j_2\), the parameters \(\gamma _{j_1g}\) and \(\gamma _{j_2g}\) appear only in one correlation; hence, one of \(\gamma _{j_1g},\gamma _{j_2g}\) can be taken as 1 without loss of generality. For the bi-factor copula with non-Gaussian linking copulas, near non-identifiability can occur when there are groups of size 2; in this case, one of the linking copulas to the group latent variable can be fixed at comonotonicity.

For the Gaussian second-order model, let \(Z_0,Z_1',\ldots ,Z_G'\) be the dependent latent N(0, 1) variables, where \(Z_0\) is the second-order factor and \(Z_g'=\beta _{g}Z_0+\sqrt{1-\beta _{g}^2}Z_g\) is the first-order factor for group g. That is, there is an one second-order factor \(Z_0\), and the first-order factors \(Z_1',\ldots ,Z_G'\) are linear combinations of the second-order factor, plus a unique variable \(Z_g\) for each first-order factor. The stochastic representation is (Krupskii and Joe 2015):

$$\begin{aligned} Z_{jg}= & {} \beta _{jg} Z _g' +\sqrt{1-\beta _{jg}^2}\epsilon _{jg}\\ Z_g'= & {} \beta _{g}Z_0+\sqrt{1-\beta _{g}^2}Z_g, \qquad g=1,\ldots ,G, \quad j=1,\ldots ,d_g, \end{aligned}$$

or

$$\begin{aligned} Z_{jg}=\beta _{jg} \beta _{g}Z_0+\beta _{jg}\sqrt{1-\beta _{g}^2}Z_g +\sqrt{1-\beta _{jg}^2}\epsilon _{jg}, g=1,\ldots ,G, \quad j=1,\ldots ,d_g. \end{aligned}$$
(4)

Hence, this is a special case of (3) where \(\theta _{jg} = \beta _{jg} \beta _{g}\) and \(\gamma _{jg} = \beta _{jg} \sqrt{1-\beta _{g}^2}\).

1.4 Other Choices of Parametric Bivariate Copulas

In line with Nikoloulopoulos and Joe (2015), we use bivariate parametric copulas that can be used when considering latent maxima, minima or mixtures of means. For different dependent items based on latent maxima or minima, multivariate extreme value and copula theory (e.g., Joe 1997) can be used to select suitable copulas that link observed to latent variables. Copulas that arise from extreme value theory have more probability in one joint tail (upper or lower) than expected with a discretized MVN distribution or a MVN copula with discrete margins. If item responses are based on discretizations of latent variables that are means, then it is possible that there can be more probability in both the joint upper and joint lower tail, compared with discretized MVN models. This happens if the respondents consist of a ‘mixture’ population (e.g., different locations or genders). From the theory of elliptical distributions and copulas (e.g., McNeil et al. 2005), it is known that the multivariate Student-t distribution as a scale mixture of MVN has more dependence in the tails. Extreme value and elliptical copulas can model item response data that have reflection asymmetric and symmetric dependence, respectively.

A bivariate copula C is reflection symmetric if its density satisfies \(c(u_1,u_2)=c(1-u_1,1-u_2)\) for all \(0\le u_1,u_2\le 1\). Otherwise, it is reflection asymmetric often with more probability in the joint upper tail or joint lower tail. Upper tail dependence means that \(c(1-u,1-u)=O(u^{-1})\) as \(u\rightarrow 0\) and lower tail dependence means that \(c(u,u)=O(u^{-1})\) as \(u\rightarrow 0\). If \((U_1,U_2)\sim C\) for a bivariate copula C, then \((1-U_1,1-U_2)\sim \widehat{C}\), where \(\widehat{C}(u_1,u_2)=u_1+u_2-1+C(1-u_1,1-u_2)\) is the survival or reflected copula of C; this “reflection” of each uniform U(0, 1) random variable about 1/2 changes the direction of tail asymmetry.

After briefly providing definitions of tail dependence and reflection symmetry/asymmetry, we provide below the bivariate copula choices we consider:

  • The extreme value Gumbel copula with cdf

    $$\begin{aligned} C(u_1,u_2;\theta )=\exp \Bigl [-\Bigl \{(-\log u_1)^{\theta } +(-\log u_2)^{\theta }\Bigr \}^{1/\theta }\Bigr ], \theta \ge 1. \end{aligned}$$

    A model with bivariate Gumbel copulas has latent (ordinal) variables that can be considered as (discretized) maxima, and there is more probability in the joint upper tail as the Gumbel copula has reflection asymmetry and upper tail dependence.

  • The survival Gumbel (s.Gumbel) copula with cdf

    $$\begin{aligned} C(u_1,u_2;\theta )= & {} u_1+u_2-1 \\ {}{} & {} + \exp \Bigl [-\Bigl \{\bigl (-\log (1-u_1)\bigr )^{\theta } +\bigl (-\log (1-u_2)\bigr )^{\theta }\Bigr \}^{1/\theta }\Bigr ], \theta \ge 1. \end{aligned}$$

    A model with bivariate s.Gumbel copulas has latent (ordinal) variables that can be considered as (discretized) minima, and there is more probability in the joint lower tail as the s.Gumbel copula has reflection asymmetry and lower tail dependence.

  • The elliptical bivariate \(t_\nu \) copula with cdf

    $$\begin{aligned} C(u_1,u_2;\theta )=\mathcal {T}_2\Bigl (\mathcal {T}^{-1}(u_1;\nu ),\mathcal {T}^{-1}(u_2;\nu );\theta ,\nu \Bigr ),-1\le \theta \le 1, \end{aligned}$$

    where \(\mathcal {T}(;\nu )\) is the univariate Student t cdf with (non-integer) \(\nu \) degrees of freedom, and \(\mathcal {T}_2\) is the cdf of a bivariate Student-t distribution with \(\nu \) degrees of freedom and correlation parameter \(\theta \). A model with bivariate \(t_\nu \) copulas has latent (ordinal) variables that can be considered as mixtures of (discretized) means, since the bivariate Student-t distribution arises as a scale mixture of bivariate normals. A small value of \(\nu \), such as \(1 \le \nu \le 5\), leads to a model with more probabilities in the joint upper and joint lower tails compared with the BVN copula as the \(t_\nu \) copula has reflection symmetric upper and lower tail dependence.

The BVN and \(t_\nu \) are comprehensive copulas, i.e., they interpolate between countermonotonicity (perfect negative dependence) to comonotonicity (perfect positive dependence), but the Gumbel copulas interpolates between independence and perfect positive dependence. Nevertheless, negative dependence or interpolation between perfect negative dependence and independence can be obtained from the Gumbel copulas by considering reflection of one of the uniform random variables on (0, 1). If \((U_1,U_2)\sim C\) for a bivariate copula C with positive dependence, then

  • \((1-U_1,U_2)\sim \widehat{C}^{(1)}\), where \(\widehat{C}^{(1)}(u_1,u_2)=u_2-C(1-u_1,u_2)\) is the 1-reflected copula of C with negative lower-upper tail dependence;

  • \((U_1,1-U_2)\sim \widehat{C}^{(2)}\), where \(\widehat{C}^{(2)}(u_1,u_2)=u_1-C(u_1,1-u_2)\) is the 2-reflected copula of C with negative upper-lower dependence.

Negative upper-lower tail dependence means that \(c(1-u,u)=O(u^{-1})\) as \(u\rightarrow 0^+\) and negative lower-upper tail dependence means that \(c(u,1-u)=O(u^{-1})\) as \(u\rightarrow 0^+\) (Joe 2011). The proposed models can provide, with linking copulas that allow for negative (tail) dependence, negative (tail) dependence between the observed variables as the dependence between the observed and latent variables is inherited to the dependence amongst the observed variables.

In Fig. 3, to depict the concepts of refection symmetric or asymmetric tail dependence, we show contour plots of the corresponding copula densities with standard normal margins and dependence parameters corresponding to Kendall’s \(\tau \) value of 0.6 in absolute value. To make the models comparable, we convert the BVN/\(t_\nu \) and (reflected) Gumbel copula parameters to Kendall’s \(\tau \)’s via

$$\begin{aligned} \tau (\theta )=\frac{2}{\pi }\arcsin (\theta ) \end{aligned}$$
(5)

and

$$\begin{aligned} \quad \tau (\theta )=\left\{ \begin{array}{lll}1-\theta ^{-1},&{}\quad C, \widehat{C}\\ \theta ^{-1}-1,&{}\quad \widehat{C}^{(1)},\widehat{C}^{(2)}\end{array}\right. , \end{aligned}$$
(6)

respectively. Sharper corners (relative to ellipse) in Fig. 3 indicate tail dependence.

Fig. 3
figure 3

Contour plots of bivariate copulas with standard normal margins and dependence parameters corresponding to Kendall’s \(\tau \) value of 0.6 in absolute value.

The Kendall’s tau parameters, which are strictly increasing functions of the copula parameters, account for the dependence dominated by the middle of the data, and they are expected to be similar among different families of bivariate copulas. However, the tail dependence varies and it is a property to consider when choosing among different families of bivariate copulas (Nikoloulopoulos and Karlis 2008).

2 Estimation and Computational Details

For the set of all parameters, let \(\varvec{\theta }=(\textbf{a},\varvec{\theta }_g,\varvec{\delta }_{g})\) for the bi-factor copula model and \(\varvec{\theta }=(\textbf{a},\varvec{\theta }_g,\varvec{\delta })\) for the second-order copula model, where \(\textbf{a}=(a_{jg,k}: j=1,\ldots ,d_g, g=1,\ldots ,G, k=1,\ldots ,K-1)\), \(\varvec{\theta }_g=(\theta _{1g}, \ldots , \theta _{jg}, \ldots ,\theta _{d_gg}: g=1,\ldots ,G)\), \(\varvec{\delta }_{g}=(\delta _{1g}, \ldots , \delta _{jg}, \ldots ,\delta _{d_gg}: g=1,\ldots ,G)\) and \(\varvec{\delta }=(\delta _{1}, \ldots ,\delta _{G})\). The dimension of \(\textbf{a}\), \(\varvec{\theta }_g\), \(\varvec{\delta }_{g}\) and \(\varvec{\delta }\) are \(d(K-1)\), d, d and G, respectively. Hence, the dimension q of \(\varvec{\theta }\) is \(d(K+1)\) and \(dK +G\) for the bi-factor and second-order copula model, respectively.

With sample size n and data \(\textbf{y}_1,\ldots ,\textbf{y}_n\), the joint log-likelihood of the bi-factor and second-order copula is

$$\begin{aligned} \ell (\varvec{\theta };\textbf{y}_1,\ldots ,\textbf{y}_n)=\sum _{i=1}^n\log \pi (\textbf{y}_i;\varvec{\theta }). \end{aligned}$$
(7)

with \(\pi (\textbf{y}_i;\varvec{\theta })\) as in (1) and (2), respectively. Maximum likelihood (ML) estimation, i.e., maximization of (7), is numerically possible but time-consuming for large d because the large number of univariate cutpoints and dependence parameters. Hence, we approach estimation using the two-step Inference Function of Margins (IFM) method (Joe and Xu 1996; Joe 1997). Joe (2005) has established its asymptotic efficiency and has shown that can efficiently, in the sense of computing time and asymptotic variance, estimate the univariate and dependence parameters.

In the first step of the IFM, the univariate parameters, i.e., the cutpoints, are estimated using the univariate sample proportions. The univariate cutpoints for the jth item in group g are estimated as \(\hat{a}_{jg,k} = \sum _{y=0}^{k} p_{jg,y}\), where \(p_{jg,y}\,,y=0,\ldots ,K-1\) for \(g=1,\ldots ,G\) and \(j=1,\ldots ,d_g\) are the univariate sample proportions. In the second step of the IFM method, the joint log-likelihood in (7) is maximized over the copula parameters with the cutpoints fixed as estimated at the first step. That is for \(i=1,\ldots ,n\) we start from a d-variate sample \(y_{i11}, \ldots ,y_{id_11},\ldots , y_{i1G}, \ldots , y_{id_GG}\) from which d estimators \(F_{11}(y_{i11}), \ldots ,F_{d_11}(y_{id_11}), \ldots , F_{1G}(y_{i1G}), \ldots ,\) \(F_{d_GG}(y_{id_GG})\) are obtained. We use these estimators, i.e., the cutpoints, to transform the \(y_{i11}, \ldots ,y_{id_11},\ldots , y_{i1G},\) \(\ldots , y_{id_GG}\) sample to a uniform sample \({\hat{\alpha }}_{11,y_{i11}+1},\ldots ,{\hat{\alpha }}_{d_11,y_{id_11}+1},\ldots {\hat{\alpha }}_{1G,y_{i1G}+1}, \ldots , {\hat{\alpha }}_{d_GG,y_{id_GG}+1}\) on \([0, 1]^d\) and then fit the factor copula model at the second step. Hence, the IFM approach can be regarded as a two-step approach on the original data or simply as the standard one-step ML method on the transformed (copula) data. Note also in passing that compared to the ML, the IFM method is not as punishing for misspecification of the dependence structure (Joe and Xu 1996; Xu 1996).

For the bi-factor copula model, numerical evaluation of the joint pmf can be achieved with the following steps:

  1. 1.

    Calculate Gauss–Legendre quadrature (Stroud and Secrest 1966) points \(\{\mathcal {v}_q: q=1,\ldots ,n_q\}\) and weights \(\{w_q: q=1,\ldots ,n_q\}\) in terms of standard uniform.

  2. 2.

    Numerically evaluate the joint pmf

    $$\begin{aligned} \int _0^1 \prod _{g=1}^{G} \Bigg \{ \int _0^1 \prod _{j=1}^{d_g} f_{Y_{jg}|V_{jg};V_0}(y_{jg}|v_g,v_0) \textrm{d}{v_g} \Bigg \} \textrm{d}{v_0} \end{aligned}$$

    in a double sum

    $$\begin{aligned} \sum _{q_1=1}^{n_q} w_{q_1} \prod _{g=1}^{G} \Bigg \{ \sum _{q_2=1}^{n_q} w_{q_2} \prod _{j=1}^{d_g} f_{Y_{jg}|V_{jg};V_0}(y_{jg}|\mathcal {v}_{q_2},\mathcal {v}_{q_1}) \Bigg \}. \end{aligned}$$

For the second-order copula model, numerical evaluation of the joint pmf can be achieved with the following steps:

  1. 1.

    Calculate Gauss–Legendre quadrature points \(\{\mathcal {v}_q : q=1,\ldots , n_q \}\) and weights \(\{w_q : q=1,\ldots , n_q \}\) in terms of standard uniform.

  2. 2.

    Numerically evaluate the joint pmf

    $$\begin{aligned} \int _0^1 \Bigg \{ \prod _{g=1}^{G} \int _0^1 \Big [ \prod _{j=1}^{d_g} f_{Y_{jg}|V_g}(y_{jg}|v_g;\theta _{jg}) \Big ] c_{V_g,V_0}\bigl (v_g,v_0;\delta _{g}\bigr ) \textrm{d}{v_g} \Bigg \} \textrm{d}{v_0} \end{aligned}$$

    in a double sum

    $$\begin{aligned} \sum _{q_1=1}^{n_q} w_{q_1} \Bigg \{ \prod _{g=1}^{G} \sum _{q_2=1}^{n_q} w_{q_2} \Big [ \prod _{j=1}^{d_g} f_{Y_{jg}|V_g}(y_{jg}|\mathcal {v}_{q_2|q_1};\theta _{jg}) \Big ] \Bigg \}, \end{aligned}$$

    where \(\mathcal {v}_{q_2|q_1} = C^{-1}_{Y_{jg}|V_g;V_0}( {v}_{q_2} | {v}_{q_1};\delta _{g})\). Note that the independent quadrature points \(\{\mathcal {v}_{q_1}: q_1 = 1,\ldots ,n_q\}\) and \(\{\mathcal {v}_{q_2}: q_2 = 1,\ldots ,n_q\}\) have been converted to dependent quadrature points that have an one-factor copula distribution \(C_{X}(\cdot ;\varvec{\delta })\).

The estimated copula parameters can be obtained by using a quasi-Newton (Nash 1990) method applied to the logarithm of the joint likelihood. With Gauss–Legendre quadrature, the same nodes and weights are used for different functions; this helps in yielding smooth numerical derivatives for numerical optimization via quasi-Newton. Our comparisons show that \(n_q=25\) quadrature points are adequate with good precision.

3 Bivariate Copula Selection

In the following subsections, we describe simple diagnostics based on semi-correlations and a heuristic method that automatically selects the bivariate parametric copula families that build either the bi-factor or the second-order copula model.

Choices of copulas with upper or lower tail dependence are better if the items have more probability in joint lower or upper tail than would be expected with the BVN copula. This can be shown with summaries of polychoric correlations in the upper and lower joint tail (Kadhem and Nikoloulopoulos 2021). In the context of items that can be split into G non-overlapping groups, such that there is homogeneous dependence within each group, it is sufficient to (a) summarize the average of the polychoric semi-correlations for all pairs within each of the G groups and for all pairs of items, and (b) not mix bivariate copulas for a single factor; hence, for both the bi-factor and second-order copula models we allow \(G+1\) different copula families, one for each group specific factor \(V_g\) and one for \(V_0\).

We distinguish the simple descriptives, i.e., the semi-correlations, from the heuristic algorithm and model fitting. On the one hand, the descriptive statistics can suggest more probability to the lower or upper tail for many pairs of items, but bi-factor and second-order copula models with asymmetrical dependence can be used to check whether the two tails are significantly different.

3.1 Simple Diagnostics Based on Semi-correlations

Consider again the underlying N(0, 1) latent variables \(Z_{jg}\)’s of the ordinal variables \(Y_{jg}\)’s. The correlations of \(Z_{jg}\)’s in the upper and lower tail, hereafter semi-correlations, are defined as (Joe 2014, p. 71):

$$\begin{aligned} \rho _N^+= & {} \text{ Cor }\Bigl (Z_{jg},Z_{j'g'}|Z_{jg}>0,Z_{j'g'}>0\Bigr )\\= & {} \frac{\int _{0}^{\infty }\int _{0}^{\infty } z_1z_2\phi (z_1)\phi (z_2)c\bigl (\Phi (z_1),\Phi (z_2)\bigr )dz_1dz_2-{\biggr (\int _0^\infty z\phi (z)\Bigr (1- C_{2|1}\bigr (0.5|\Phi (z)\bigl )\Bigr )dz\biggr )^2}/{C(0.5,0.5)}}{\int _0^\infty z^2\phi (z)\Bigr (1- C_{2|1}\bigr (0.5|\Phi (z)\bigl )\Bigr )\textrm{d}z-{\biggr (\int _0^\infty z\phi (z)\Bigr (1- C_{2|1}\bigr (0.5|\Phi (z)\bigl )\Bigr )\textrm{d}z\biggr )^2}/{C(0.5,0.5)}};\nonumber \\ \rho _N^-= & {} \text{ Cor }\Bigl (Z_{j_1g},Z_{j_2g}|Z_{j_1g}<0,Z_{j_2g}<0\Bigr )\nonumber \\= & {} \frac{\int _{-\infty }^{0}\int _{-\infty }^{0} z_1z_2\phi (z_1)\phi (z_2)c\bigl (\Phi (z_1),\Phi (z_2)\bigr )\textrm{d}z_1\textrm{d}z_2-{\biggr (\int _{-\infty }^0 z\phi (z)C_{2|1}\bigr (0.5|\Phi (z)\bigl )\textrm{d}z\biggr )^2}/{C(0.5,0.5)}}{\int _{-\infty }^0 z^2\phi (z) C_{2|1}\bigr (0.5|\Phi (z)\bigl )dz-{\biggr (\int _{-\infty }^0 z\phi (z)C_{2|1}\bigr (0.5|\Phi (z)\bigl )\textrm{d}z\biggr )^2}/{C(0.5,0.5)}}.\nonumber \end{aligned}$$
(8)

From the above expressions, it is clear that the semi-correlations depend only on the copula C of \(\Bigl (\Phi (Z_{jg}),\) \(\Phi (Z_{j'g'})\Bigr )\); \(C_{2|1}\) is the conditional copula cdf. For the BVN and \(t_\nu \) copulas \(\rho _N^{-}=\rho _N^{+}\), while for the Gumbel and s.Gumbel copulas \(\rho _N^{-}<\rho _N^{+}\) and \(\rho _N^{-}>\rho _N^{+}\), respectively. The sample versions of \(\rho _N^{+},\rho _N^{-}\) for item response data are the polychoric correlations in the joint lower and upper quadrants of \(Y_{jg}\) and \(Y_{j'g'}\) (Kadhem and Nikoloulopoulos 2021).

3.2 Selection Algorithm

We propose a heuristic method that selects appropriate bivariate copulas for each factor of the bi-factor and second-order copula models. It starts with an initial assumption, that all bivariate linking copulas are BVN copulas, i.e. the starting model is either the Gaussian bi-factor or second-order model, and then, sequentially other copulas with lower or upper tail dependence are assigned to the factors where necessary to account for more probability in one or both joint tails. The selection algorithm involves the following steps:

  1. 1.

    Fit the bi-factor or second-order copula model with BVN copulas.

  2. 2.

    Fit all the possible bi-factor or second-order copula models, iterating over all the copula candidates that link all items \(Y_{jg}\)’s in group g or each group-specific factor \(V_g\), respectively, to \(V_0\).

  3. 3.

    Select the copula family that corresponds to the lowest Akaike information criterion (AIC), that is, \(\text {AIC}= -2 \times \ell +2 \times \#\text {copula parameters}\).

  4. 4.

    Fix the selected copula family that links the observed (bi-factor model) or latent (second-order model) variables to \(V_0\).

  5. 5.

    For \(g=1,\dots ,G\):

    1. (a)

      Fit all the possible models, iterating over all the copula candidates that link all the items in group g to the group-specific factor \(V_g\).

    2. (b)

      Select the copula family that corresponds to the lowest AIC.

    3. (c)

      Fix the selected linking copula family for all the items in group g with \(V_g\).

For vine copula models (bi-factor and second-order copula models are vine copula models that involve both observed and latent variables), Dissmann et al. (2013) also found that bivariate copula selection based on AIC seems to be better than even using bivariate goodness-of-fit tests. The goodness-of-fit procedures involve a global distance measure between the model-based and empirical distribution; hence, they might not be sensitive to tail behaviours and are not diagnostics in the sense of suggesting improved parametric models in the case of small p-values (Joe 2014, p. 254). A smaller AIC indicates a model that better approximates both the dependence structure of the data and the strength of dependence in the tails.

4 Goodness of Fit

We will use the limited information \(M_2\) statistic proposed by Maydeu-Olivares and Joe (2006) to evaluate the overall fit of the proposed bi-factor and second-order copula models. The \(M_2\) statistic is based on a quadratic form of the deviations of sample and model-based proportions over all bivariate margins. For the bi-factor and second-order copula models with parameter vector \(\varvec{\theta }\) of dimension q, let \(\varvec{\pi }_2(\varvec{\theta })=\bigl (\dot{\varvec{\pi }}_1(\varvec{\theta })^\top ,\dot{\varvec{\pi }}_2(\varvec{\theta })^\top \bigr )^\top \) be the column vector of the univariate and bivariate model-based marginal probabilities that do not include category 0 with sample counterpart \(\textbf{p}_2=(\dot{\textbf{p}}_1^\top ,\dot{\textbf{p}}_2^\top )^\top \). The total number of the univariate and bivariate residuals \(\bigl (\textbf{p}_2-\varvec{\pi }_2({\hat{\varvec{\theta }}})\bigr )^\top \) is

$$\begin{aligned} s = d (K-1) + \left( {\begin{array}{c}d\\ 2\end{array}}\right) (K - 1)^2, \end{aligned}$$

where \(d (K-1)\) is the dimension of the univariate residuals and \(\left( {\begin{array}{c}d\\ 2\end{array}}\right) (K - 1)^2\) is the dimension of the bivariate residuals excluding category 0.

With a sample size n, the limited-information \(M_2\) statistic is given by

$$\begin{aligned} M_2=M_2({\hat{\varvec{\theta }}})=n\bigl (\textbf{p}_2-\varvec{\pi }_2({\hat{\varvec{\theta }}})\bigr )^\top \textbf{C}_2({\hat{\varvec{\theta }}})\bigl (\textbf{p}_2-\varvec{\pi }_2\bigl ({\hat{\varvec{\theta }}})\bigr ), \end{aligned}$$

with

$$\begin{aligned} \textbf{C}_2(\varvec{\theta })=\varvec{\Xi }_2^{-1}-\varvec{\Xi }_2^{-1}\varvec{\Delta }_2(\varvec{\Delta }_2^\top \varvec{\Xi }_2^{-1}\varvec{\Delta }_2)^{-1}\varvec{\Delta }_2^\top \varvec{\Xi }_2^{-1} =\varvec{\Delta }_2^{(c)}\bigl ([\varvec{\Delta }_2^{(c)}]^\top \varvec{\Xi }_2\varvec{\Delta }_2^{(c)}\bigr )^{-1}[\varvec{\Delta }_2^{(c)}]^\top , \end{aligned}$$

where \(\varvec{\Delta }_2=\partial \varvec{\pi }_2(\varvec{\theta })/\partial \varvec{\theta }^\top \) is an \(s\times q\) matrix of full column rank q with the first-order derivatives of the univariate and bivariate marginal probabilities with respect to the estimated model parameters (in the Supplementary Tables 1–5 of the electronic supplementary material, we provide details on the calculation of these derivatives), \(\varvec{\Delta }_2^{(c)}\) is an \(s\times (s-q)\) orthogonal complement to \(\varvec{\Delta }_2\) of full column rank \(s-q\), such that \([\varvec{\Delta }_2^{(c)}]^\top \varvec{\Delta }_2=\textbf{0}\) and \(\varvec{\Xi }_2\) is the asymptotic \(s \times s\) covariance matrix of \(\sqrt{n}\bigl (\textbf{p}_2-\varvec{\pi }_2({\hat{\varvec{\theta }}})\bigr )^\top \).

The asymptotic covariance matrix \(\varvec{\Xi }_2\) can be partitioned according to the partitioning of \(\textbf{p}_2\) into \(\varvec{\Xi }_{11}=\sqrt{n}\text{ Acov }(\dot{\textbf{p}}_1)\), \(\varvec{\Xi }_{21}=\sqrt{n}\text{ Acov }(\dot{\textbf{p}}_2,\dot{\textbf{p}}_1)\) and \(\varvec{\Xi }_{22}=\sqrt{n}\text{ Acov }(\dot{\textbf{p}}_2)\), where \(\text{ Acov }(\cdot )\) denotes asymptotic covariance matrix. The elements of \(\varvec{\Xi }_{11}\), \(\varvec{\Xi }_{21}\) and \(\varvec{\Xi }_{22}\) involve up to the 4-dimensional probabilities as shown below:

$$\begin{aligned}{} & {} \sqrt{n}\text{ Acov }(p_{j_1,y_1},p_{j_2,y_2})=\pi _{j_1j_2,y_1y_2}-\pi _{j_1,y_1}\pi _{j_2,y_2}\\{} & {} \sqrt{n}\text{ Acov }(p_{j_1j_2,y_1y_2},p_{j_3,y_3})=\pi _{j_1j_2j_3,y_1y_2y_3}-\pi _{j_1j_2,y_1y_2}\pi _{j_3,y_3}\\{} & {} \sqrt{n}\text{ Acov }(p_{j_1j_2,y_1y_2},p_{j_3j_4,y_3y_4})=\pi _{j_1j_2j_3j_4,y_1y_2y_3y_4}-\pi _{j_1j_2,y_1y_2}\pi _{j_3j_4,y_3y_4}, \end{aligned}$$

where \( \pi _{j,y} = \Pr (Y_j = y)\), \(\pi _{j_1j_2,y_1y_2}= \Pr (Y_{j_1} = y_1,Y_{j_2} = y_2)\), \(\pi _{j_1j_2j_3,y_1y_2y_3}=\Pr (Y_{j_1} = y_1,Y_{j_2} = y_2,Y_{j_3} = y_3)\), and \(\pi _{j_1j_2j_3j_4,y_1y_2y_3y_4}=\Pr (Y_{j_1} = y_1,Y_{j_2} = y_2,Y_{j_3} = y_3,Y_{j_4} = y_4)\).

The limited information statistic \(M_2\) under the null hypothesis has an asymptotic distribution that is \(\chi ^2\) with \(s-q\) degrees of freedom as the IFM estimate \({\hat{\varvec{\theta }}}\) is \(\sqrt{n}\)-consistent.

5 Simulations

An extensive simulation study is conducted to (a) gauge the small-sample efficiency of the IFM estimation method and investigate the misspecification of the bivariate pair-copulas, (b) examine the reliability of using the heuristic algorithm to select the true (simulated) bivariate linking copulas, and (c) study the small-sample performance of the \(M_2\) statistic.

Table 1 Small sample of size \(n = 500\) simulations (10\(^3\) replications) from the bi-factor and second-order copula models with Gumbel copulas and group estimated average biases, root mean square errors (RMSE), and standard deviations (SD), scaled by n, for the IFM estimates under different pair-copulas from the bi-factor and second-order copula models.
Table 2 Small sample of size \(n = 500\) simulations (10\(^3\) replications) from the bi-factor and second-order copula models with Gumbel copulas and group estimated average biases, root mean square errors (RMSE), and standard deviations (SD), scaled by n, for the ML estimates under different pair-copulas from the bi-factor and second-order copula models.

We randomly generate 1000 datasets with samples of size \(n=500\) or 1000 and \(d=16\) items, with \(K=3\) or \(K=5\) equally weighted categories, that are equally separated into \(G=4\) non-overlapping groups from the bi-factor and second-order copula model. In each simulated model, we use different linking copulas to cover different types of dependence. To make the models comparable, we convert the BVN/\(t_\nu \) and Gumbel/s.Gumbel copula parameters to Kendall’s \(\tau \)’s via the relations in (5) and (6), respectively. For the bi-factor copula models, we set \(\tau (\varvec{\theta }_g)=(0.45,0.55,0.65,0.75)\) and \(\tau (\varvec{\delta }_g)=(0.30,0.35,\) 0.40, 0.50) for \(g=1,\ldots ,4\). For the second-order copula models, we set \(\tau (\varvec{\theta }_g)=(0.4,0.5,0.6,0.7)\) for \(g=1,\ldots ,4\) and \(\tau (\varvec{\delta })=(0.30,0.35,0.40,0.45)\).

The Kendall’s tau parameters \(\tau (\varvec{\theta }_g)\) and \(\tau (\varvec{\delta }_g)\) as described above are common for each group; hence, Table 1 (Table 2) contains the group estimated average biases, root mean square errors (RMSE), and standard deviations (SD), scaled by n, for the IFM (ML) estimates under different pair-copulas from the bi-factor and second-order copula models. In the true (simulated) models, the linking copulas are Gumbel copulas. Given the large number of cutpoints as the number of categories K increases, for ML estimation we restrict ourselves to \(K=3\) categories.

Conclusions from the values in the tables are the following:

Table 3 Small sample of size \(n = 500\) simulations (\(10^3\) replications) from the bi-factor and second-order copula models with various linking copulas and frequencies of the true bivariate copula identified using the model selection algorithm.
  • IFM with the true bi-factor or second-order model is highly efficient according to the simulated biases, SDs and RMSEs.

  • The IFM estimates of \(\tau \)’s are not robust under bivariate copula misspecification and their biases increase when the assumed bivariate copula has tail dependence of opposite direction from the true bivariate copula. For example, in Table 1 the scaled biases for the IFM estimates increase substantially when the linking copulas are the s.Gumbel copulas.

  • IFM is not as punishing for bivariate copula misspecification as ML estimation. For example, the scaled biases for the ML estimates are even larger when the bivariate linking copulas are misspecified to the s.Gumbel copulas.

To examine the reliability of using the heuristic algorithm to select the true (simulated) bivariate linking copulas, samples of size 500 were generated from various bi-factor and second-order copula models. Table 3 presents the number of times that the true (simulated) bivariate linking copulas were chosen over 1000 simulation runs. It is revealed that the model selection algorithm performs extremely well for various bi-factor and second-order copulas models with different choices of linking copulas as the number of categories K increases. For a small K dependence in the tails cannot be easily quantified. Hence, for example, when the true copula is the \(t_5\) which has the same upper and lower tail dependence, the algorithm selected either \(t_5\) or BVN which has zero lower and upper tail dependence, because both copulas provide reflection symmetric dependence.

Table 4 Small sample of size \(n =\{500,1000\}\) simulations (10\(^3\) replications) from bi-factor and second-order copula models and the empirical rejection levels at \(\alpha = \{0.20, 0.10, 0.05, 0.01\}\), degrees of freedom (df), mean and variance.

To check whether the \(\chi ^2_{s-q}\) is a good approximation for the distribution of the \(M_2\) statistic under the null hypothesis, samples of sizes 500 and 1000 were generated from various bi-factor second-order copula models. Table 4 contains four common nominal levels of the \(M_2\) statistic under the bi-factor and second-order copula models with different bivariate copulas. As can be seen in the table, the observed levels of \(M_2\) are close to the nominal \(\alpha \) levels and remain accurate even for extremely sparse tables (\(d=16\) and \(K=5\)).

6 Application

The Toronto Alexithymia Scale (TAS) is the most utilized measure of alexithymia in empirical research (e.g., Bagby et al. 1994; Taylor et al. 2003; Parker et al. 2003; Gignac et al. 2007; Reise et al. 2013; Tuliao et al. 2020; Carnovale et al. 2021) and is composed of \(d=20\) items. The aforementioned works suggest that the items measure 3 facets of alexithymia, namely Difficulty Identifying Feelings (DIF; \(d_1=7\) items), Difficulty Describing Feelings (DDF; \(d_2=5\) items), Externally Oriented Thinking (EOT; \(d_3=8\) items), where each facet represents different non-overlapping items. Therefore, a 3-factor model was initially called, but given the hypothesized association among the three facets of the alexithymia construct, oblique (correlated) factor or bi-factor models have been used (e.g., Bagby et al. 1994; Parker et al. 2003; Gignac et al. 2007). Tuliao et al. (2020) has demonstrated that a bi-factor model outperforms any other competing factor model for the TAS scale. To this end, recent studies adopted the bi-factor structure for the TAS scale (e.g., Carnovale et al. 2021) and further supported a general alexithymia factor and group-specific factors (DIF, DDf and EOT) that account for homogeneous dependence amongst the non-overlapping items. Note also in passing that estimating a 3-factor or an oblique 3-factor model is computationally demanding as it requires 3-dimensional integration. This is not case for the bi-factor or second-order (an oblique 3-factor model where the group-specific factors are linked to another latent variable via a 1-factor model) models. In spite of the fact they involve \(G+1=4\) latent variables, only require one-dimensional integrals of a function which in turn is a product of 3 one-dimensional integrals.

We use a dataset of 1925 university students from the French-speaking region of Belgium (Briganti and Linkowski 2020). Students were 17 to 25 years old, and 58% of them were female and 42% were male. They were asked to respond to each item using one of \(K=5\) categories: “\(1=\) completely disagree”, “\(2=\) disagree”, “\(3=\) neutral, “\(4=\) agree”, “\(5=\) completely agree”. The dataset and full description of the items can be found in the R package BGGM (Williams and Mulder 2020).

For these items, a respondent might be thinking about the average “sensation” of many past relevant events, leading to latent means. That is, based on the item descriptions, this seems more natural than a discretized maxima or minima. Since the sample is a mixture (male and female students), we can expect a priori that a bi-factor or second-order copula model with \(t_\nu \) copulas might be plausible, as in this case the items can be considered as mixtures of discretized means.

Table 5 Average observed polychoric correlations and semi-correlations for all pairs within each group and for all pairs of items for the Toronto Alexithymia Scale (TAS), along with the corresponding theoretical semi-correlations for BVN, \(t_5\), Frank, Gumbel , and survival Gumbel (s.Gumbel) copulas.

In Table 5, we summarize the averages of polychoric semi-correlations for all pairs within each facet of alexithymia and for all pairs of items along with the theoretical semi-correlations in (8) under different choices of copulas. For a BVN/\(t_\nu \) copula, the copula parameter is the sample polychoric correlation, while for a Gumbel/s.Gumbel copula the polychoric correlation was converted to Kendall’s tau with the relation in (5) and then from Kendall’s \(\tau \) to Gumbel/s.Gumbel copula parameter via the functional inverse in (6). The summary of findings from the diagnostics in the table show that the items appear to be a mixed selection between discretized means and minima. For the indicators of the DIF factor (items in group 1) there is more correlation in the joint lower tail, i.e., the items are based on discretizations of latent variables that are minima and have more probability in the joint lower tail, suggesting the use of s.Gumbel linking copulas when modelling the responses to these items. All the other items have stronger correlation in both the joint upper and joint lower tail than with the BVN, i.e., the items are based on discretizations of latent variables that are means and have more probability in both the joint lower and upper tail, suggesting the use of \(t_\nu \) bivariate linking copulas as the respondents consist of a “mixture” population (different genders), Hence, a bi-factor or second-order copula model with the aforementioned linking copulas might provide a better fit than the (Gaussian) models with BVN copulas.

Then, we fit the bi-factor and second-order models with the bivariate copulas selected by the heuristic algorithm in Sect. 3.2. For a baseline comparison, we also fit their special cases; these are the one- and two-factor copula models where we have also selected the bivariate copulas using the heuristic algorithm proposed by Kadhem and Nikoloulopoulos (2021). To show the improvement of the copula models over their Gaussian analogues, we have also fitted all the classes of copula models with BVN copulas. The fitted models are compared via the AIC, since the number of parameters is not the same between the models. In addition, we use Vuong’s test (Vuong 1989) to show if (a) the best fitted model according to the AICs provides better fit than the other fitted models and (b) a model with the selected copulas provides better fit than the one with BVN copulas. The Vuong’s test is the sample version of the difference in Kullback–Leibler divergence between two models and can be used to differentiate two parametric models which could be non-nested. For the Vuong’s test we provide the 95% confidence interval of the test statistic (Joe 2014, p. 258). If the interval does not contain 0, then the best fitted model according to the AICs is better if the interval is completely above 0. To assess the overall goodness-of-fit of the bi-factor and second-order copula models, we use the \(M_2\) statistic, along with the Root Mean Square Error of Approximation based on \(M_2\) (Maydeu-Olivares and Joe 2014), viz.

$$\begin{aligned} \text{ RMSEA}_2= \sqrt{\text{ Max } \left( \frac{{M}_2 - df}{n \times df},0\right) }. \end{aligned}$$
Table 6 AICs, Vuong’s 95% CIs, and \(M_2\) statistics for the 1-factor, 2-factor, bi-factor and second-order copula models with BVN copulas and selected copulas, along with the maximum deviations of observed and expected counts for all pairs within each group and for all pairs of items for the Toronto Alexithymia Scale.

Table 6 gives the AICs, the 95% CIs of Vuong’s tests and the \(M_2\) statistics for all the fitted models. The best fitted model, based on AIC values, is the bi-factor copula model obtained from the selection algorithm. The best-fitted bi-factor copula model results when we use s.Gumbel for the DIF factor, \(t_3\) for both the DDF and EOT factors and \(t_2\) for the common factor (alexithymia). This is in line with the preliminary analyses based on the interpretations of items as mixtures of means and the diagnostics in Table 5. The proposed model selection algorithm has selected the \(t_\nu \) copula that has the same lower and upper tail dependence for the common and all the group specific factors except the group specific factor in group 1 for which the survival Gumbel copula has been selected. For the items in group 1, the largest difference \({\hat{\rho }}_N^--{\hat{\rho }}_N^+=0.07\) is found showing more dependence in the lower tail and the fact that for this group the survival Gumbel copula is selected it shows that this difference is statistically significant. No other difference is statistically significant. It is revealed that the DIF items and DIF factor are discretized and latent minima, respectively, as the participants seem to reflect that they “disagree” or “completely disagree” having difficulty identifying feelings. From the Vuong’s 95% Cls and \(M_2\) statistics, it is shown that factor copula models provide a big improvement over their Gaussian analogues and that the selected bi-factor copula model outperforms all the fitted models.

Table 7 Estimated copula parameters and their standard errors (SE) in Kendall’s \(\tau \) scale for the bi-factor copula models with BVN copulas and selected copulas for the Toronto Alexithymia Scale.

The highly statistical significant \(M_2\) statistics are not surprising since one should expect discrepancies between the postulated parametric model and the population probabilities, when the sample size or dimension is sufficiently large (Maydeu-Olivares and Joe 2014) as in our case; none should expect a model with 3000 df to fit exactly. To further show that the fit has been improved we have calculated the maximum deviations of observed and model-based counts for each bivariate margin, that is, \(D_{j_1j_2}=n\max _{y_1,y_2}|p_{j_1,j_2,y_1,y_2}-\pi _{j_1,j_2,y_1,y_2}({\hat{\varvec{\theta }}})|\). In Table 6, we summarize the averages of these deviations for all pairs within each group and for all pairs of items. Overall, the maximum discrepancies have been sufficiently reduced in the selected bi-factor model.

Table 7 gives the copula parameter estimates in Kendall’s \(\tau \) scale and their standard errors (SE) for the selected bi-factor copula model and the Gaussian bi-factor model as the benchmark model. The SEs of the estimated parameters are obtained by the inversion of the Hessian matrix at the second step of the IFM method. These SEs are adequate to assess the flatness of the log-likelihood. Proper SEs that account for the estimation of cutpoints can be obtained by jackknifing the two-stage estimation procedure. The loading parameters (\({\hat{\tau }}\)’s converted to BVN copula parameters via the functional inverse in (5) and then to loadings using the relations in Section 2.3) show that the common alexithymia factor is mostly loaded on DIF and DDF items, suggesting that items in the domains DIF and DDF are good indicators for alexithymia. The items in the EOT although they loaded on the EOT latent factor, they had poor loadings in the common alexithymia factor.

7 Discussion

For items from several domains, we have proposed bi-factor and second-order copula models where we replace BVN distributions, between observed and latent variables, with bivariate copulas. Our copula constructions include the Gaussian bi-factor and second-order models as special cases and can provide a substantial improvement over the Gaussian models based on AIC, Vuong’s and goodness-of-fit statistics. Hence, superior statistical inference for the loadings can be achieved. We have demonstrated that the Kendall’s \(\tau \)’s or loading parameters are not robust under bivariate linking copula misspecification and their biases increase when the assumed bivariate copula has tail dependence of opposite direction from the true bivariate copula.

The improvement relies on the fact that when we use appropriate bivariate copulas other than BVN copulas in the construction, there is an interpretation of latent variables that can be maxima/minima or mixture of means instead of means. The bi-factor and second-order copula models, if other than BVN copulas are called, have a latent structure that is not additive as in (3) and (4), respectively. The bi-factor copula (dependence) parameters are interpretable as dependence of an observed variable with the common factor, or conditional dependence of an observed variable with the group-specific latent variable given the common factor, i.e., the bi-factor copula model permits conditional dependence within identified subsets of items.

Both the bi-factor and second-order copula models lead to substantial simplification of the joint likelihood. The joint pmfs in (1) and (2) reduce to one-dimensional integrals of a function which in turn is a product of G one-dimensional integrals. Hence, the evaluation of the joint likelihood requires only low-dimensional integration, as in the one- and two-factor copula models, regardless of the dimension \(G+1\) of the factors. This is an advantage over the p-factor (\(p>2\)) copula models where the joint pmf requires p-dimensional integration and becomes intractable as the number of factors increases. Hence, the proposed structured multidimensional factor models provide parsimonious factor solutions without any computational deficiencies as in the p-factor copula models when p increases.

We have proposed a numerically stable likelihood estimation technique based on Gauss–Legendre quadrature. The use of independent Gauss–Legendre quadrature points for this kind of models has been proposed in Krupskii and Joe (2015). For the bi-factor copula models for item response the integrants are bounded and thus independent Gauss–Legendre points are fine. For the second-order copula models for item response the integrants can be unbounded as copula densities can be unbounded, hence we have proposed the novel use of dependent Gauss–Legendre quadrature points that have an 1-factor copula distribution.

Building on the models proposed in this paper, there are several extensions that can be implemented. The adoption of the structure of the Gaussian tri-factor and third-order models (e.g., Rijmen et al. 2014), to account for any additional layer of dependence, is feasible using the notion of truncated vine copulas that involve both observed and latent variables.

8 Software

R functions for estimation, simulation, model selection and goodness-of-fit of the bi-factor and second-order copula models are part of the R package FactorCopula (Kadhem and Nikoloulopoulos 2022). All the analyses presented in Sect. 6 are given as code examples in the package.