Factor Tree Copula Models for Item Response Data

Factor copula models for item response data are more interpretable and fit better than (truncated) vine copula models when dependence can be explained through latent variables, but are not robust to violations of conditional independence. To circumvent these issues, truncated vines and factor copula models for item response data are joined to define a combined model, the so-called factor tree copula model, with individual benefits from each of the two approaches. Rather than adding factors, which causes computational problems and difficulties in interpretation and identification, a truncated vine structure is assumed on the residuals conditional on one or two latent variables. This structure can be interpreted as conditional dependence given a few interpretable latent variables. On the one hand, the parsimonious feature of factor models remains intact; on the other, any residual dependencies are taken into account. We discuss estimation along with model selection. In particular, we propose model selection algorithms to choose a plausible factor tree copula model to capture the (residual) dependencies among the item responses. Our general methodology is demonstrated with an extensive simulation study and illustrated by analysing Post-Traumatic Stress Disorder data.


Introduction
Factor or conditional independence models are widely used techniques for analysing item response data using much fewer unobserved/latent variables or factors (Bartholomew et al., 2011). These are natural if the dependence amongst the d observed variables or items is assumed to arise from p latent variables with p << d. They are parsimonious models and favourable for large dimensions as the number of parameters is O(d) instead of O(d^2). Nevertheless, factor models mainly assume that the items are conditionally independent given some latent variables. This assumption implies that the dependence amongst the observed variables is fully accounted for by the factors with no remaining dependence. This could lead to biased estimates if the strict assumption of conditional independence is violated (Braeken et al., 2007; Sireci et al., 1991; Chen and Thissen, 1997; Yen, 1993). The conditional independence assumption is violated if there exists local or residual dependence. Mitigating the residual dependence might be achieved by adding more latent variables to the factor model, but at the expense of computational problems and difficulties in interpretation and identification.
To circumvent these problems, the items can be allowed to interrelate by forming a dependence structure with conditional dependence given a few interpretable latent variables. In this way, on the one hand the parsimonious feature of factor models remains intact, and on the other any residual dependencies are taken into account. This can be achieved by incorporating copulas into the conditional distribution of factor models in order to provide a conditional dependence structure given very few latent variables. Such copula approaches for item response data are proposed by Braeken et al. (2007, 2013) and Braeken (2011), who explored the use of Archimedean copulas or a mixture of the independence and comonotonicity copulas to capture the residual dependence of traditional item response theory models. Therein simple copulas have been used for subgroups of items that are chosen from the context with homogeneous within-subgroup dependence. This is due to the fact that Archimedean copulas allow only for exchangeable dependence with a narrower range as the dimension increases (McNeil and Nešlehová, 2009).
Without a priori knowledge of obvious subgroups of items that are approximately exchangeable, we will propose a more general residual dependence approach that makes use of truncated regular vine copula models (Brechmann et al., 2012). Within a vine copula specification, no such restrictions need to be made. Regular vine copulas are a flexible class of models that are constructed from a set of bivariate copulas in hierarchies or tree levels (Joe, 1996; Bedford and Cooke, 2001, 2002; Kurowicka and Cooke, 2006; Kurowicka and Joe, 2011; Joe, 2014). A d-dimensional regular vine copula can cover flexible dependence structures, different from assuming simple linear correlation structures, tail independence and normality (Nikoloulopoulos et al., 2012), through the specification of d − 1 bivariate parametric copulas at level 1 and (d − 1)(d − 2)/2 bivariate conditional parametric copulas at higher levels; at level ℓ for ℓ = 2, ..., d − 1, there are d − ℓ bivariate conditional copulas that condition on ℓ − 1 variables. Joe et al. (2010) have shown that in order for a vine copula to have (tail) dependence for all bivariate margins, it is only necessary for the bivariate copulas in level 1 to have (tail) dependence and it is not necessary for the conditional bivariate copulas in levels 2, ..., d − 1 to have (tail) dependence. That provides the theoretical justification for the idea of modelling the dependence in the first level and then just using independence copulas for the conditional dependence at higher levels, without sacrificing the tail dependence of the vine copula distribution. That is, the 1-truncated vine copula has d − 1 parametric bivariate copulas in the first level of the vine and independence copulas in all the remaining levels of the vine (truncated after the first level). This truncation, as per the terminology in Brechmann et al. (2012), provides a parsimonious vine copula model. The 1-truncated vine copula can provide, with appropriately chosen linking copulas, an asymmetric dependence structure as well as tail dependence (dependence among extreme values). Joe et al. (2010) have shown that by choosing bivariate linking copulas appropriately, vine copulas can have a flexible range of lower/upper tail dependence and different lower/upper tail dependence parameters for each bivariate margin. Choices of copulas with upper or lower tail dependence are better if the items have more joint upper or lower tail probability than would be expected with the discretized multivariate normal (MVN) model (Muthén, 1978). Note in passing that the discretized MVN distribution is a special case of the vine copula model with discrete margins: if all bivariate copulas are bivariate normal (BVN) in the vine copula model, then the resulting model is the discretized MVN.
To define the conditional independence part of the model we also use truncated vine copulas, rather than the traditional factor models for item response used in Braeken et al. (2007, 2013) and Braeken (2011). Nikoloulopoulos and Joe (2015) have proposed factor copula models for item response data. These factor models can be explained as truncated canonical vines rooted at the latent variables. The canonical vine is a boundary case of regular vine copulas, which is suitable if there exists a (latent) variable that drives the dependence among the items. For the first factor there are bivariate copulas that couple each item to the first latent variable, and for the second factor there are copulas that link each item to the second latent variable conditioned on the first factor (leading to conditional dependence parameters), etc. Factor copula models with appropriately chosen linking copulas will be useful when the items (a) have more probability in the joint upper or lower tail than would be expected with a discretized multivariate normal, or (b) can be considered as discretized maxima/minima or mixtures of discretized means rather than discretized means (Nikoloulopoulos and Joe, 2015).
The proposed parsimonious approach, which requires no a priori knowledge of subgroups of items, can be explained as a truncated regular vine copula model that involves both observed and latent variables; but, more simply, we derive the models as conditional dependence models with a few interpretable latent variables that model the residual dependence of the factor copula model via a 1-truncated vine copula. The factor copula model explains most of the dependence, and the remaining dependence can be further accounted for by a 1-truncated vine copula conditioned on the factors. Brechmann and Joe (2014) and Joe (2018) initiated the study of such conditional dependence models with a unidimensional factor/latent variable for continuous data. The combined 1-factor and 1-truncated vine model for continuous data in Brechmann and Joe (2014) is restricted to Gaussian dependence, but Joe (2018) proposed a combination of a 1-factor copula model with a 1-truncated vine copula model with non-Gaussian bivariate copulas. Our models for item response data are discrete counterparts of the models in Brechmann and Joe (2014) and Joe (2018), with interpretation and technical details that are quite different, and provide an extension to more than one factor.
The remainder of the paper proceeds as follows. In Section 2, we introduce the combined factor/truncated vine copula models for item response data. Section 3 provides estimation techniques and computational details. Section 4 discusses vine tree and bivariate copula selection. Section 5 contains an extensive simulation study to assess the estimation techniques and model selection algorithms. Our methodology is illustrated using real data in Section 6. We conclude with some discussion in Section 7, followed by a brief section with software details.
2 Factor tree copula models for item response

This section introduces the theory of the combined factor/truncated vine copula models for item response data. Before that, the first two subsections provide some background on factor (Nikoloulopoulos and Joe, 2015) and truncated vine (Panagiotelis et al., 2012, 2017) copula models for discrete responses.

Factor copula models
We first introduce the notation used in this paper. Let Y = {Y_1, ..., Y_d} denote the vector of item response variables, all measured on an ordinal scale; Y_j ∈ {0, ..., K_j − 1}. Let the cutpoints on the uniform U(0, 1) scale for the jth item be a_{j,k}, k = 1, ..., K − 1, with a_{j,0} = 0 and a_{j,K} = 1. These correspond to a_{j,k} = Φ(α_{j,k}), where α_{j,k} are cutpoints on the normal N(0, 1) scale.
The p-factor model assumes that Y, with corresponding realizations y = {y_1, ..., y_d}, is conditionally independent given the p-dimensional latent vector X = (X_1, ..., X_p). The joint probability mass function (pmf) of the p-factor model is

π_d(y) = Pr(Y_1 = y_1, ..., Y_d = y_d) = ∫ ∏_{j=1}^d Pr(Y_j = y_j | X = x) dF_X(x),   (1)

where F_X is the distribution of the latent vector X. The factor copula methodology uses a set of bivariate copulas that link the items to the latent variables to specify Pr(Y_j = y_j | X_1 = x_1, ..., X_p = x_p). Below we include the theory for one and two factors.
For the 1-factor model, let X_1 be a latent variable that is standard uniform. From Sklar (1959), there is a bivariate copula C_{X_1 j} such that Pr(X_1 ≤ x, Y_j ≤ y) = C_{X_1 j}(x, F_j(y)) for 0 ≤ x ≤ 1, where F_j(y) = a_{j,y+1} is the cdf of Y_j. Then it follows that

F_{j|X_1}(y|x) := Pr(Y_j ≤ y | X_1 = x) = ∂C_{X_1 j}(x, F_j(y))/∂x.   (2)

Hence, the pmf for the 1-factor copula model becomes

π_d(y) = ∫_0^1 ∏_{j=1}^d f_{j|X_1}(y_j|x) dx,

where

f_{j|X_1}(y|x) = F_{j|X_1}(y|x) − F_{j|X_1}(y − 1|x).   (3)

For the 2-factor copula model, let X_1, X_2 be latent variables that are independent uniform U(0, 1) random variables. Let C_{X_1 j} be defined as in the 1-factor copula model and C_{X_2 j} be a bivariate copula such that

Pr(X_2 ≤ x_2, Y_j ≤ y | X_1 = x_1) = C_{X_2 j}(x_2, F_{j|X_1}(y|x_1)),

where

F_{X_2 j|X_1}(y|x_1, x_2) := Pr(Y_j ≤ y | X_1 = x_1, X_2 = x_2) = ∂C_{X_2 j}(x_2, F_{j|X_1}(y|x_1))/∂x_2.   (4)

Hence, the pmf for the 2-factor copula model is

π_d(y) = ∫_0^1 ∫_0^1 ∏_{j=1}^d f_{j|X_1,X_2}(y_j|x_1, x_2) dx_1 dx_2,

where

f_{j|X_1,X_2}(y|x_1, x_2) = F_{X_2 j|X_1}(y|x_1, x_2) − F_{X_2 j|X_1}(y − 1|x_1, x_2).   (5)
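As a concrete illustration of (2), when C_{X_1 j} is the BVN copula the partial derivative has a well-known closed form (the so-called h-function). The sketch below is ours, not from the paper; the function names `bvn_hfunc` and `cond_item_cdf` are illustrative:

```python
from math import sqrt
from statistics import NormalDist

def bvn_hfunc(u, v, theta):
    """Conditional cdf of the BVN copula: the partial derivative of
    C(u, v; theta) with respect to its first argument u."""
    nd = NormalDist()
    return nd.cdf((nd.inv_cdf(v) - theta * nd.inv_cdf(u)) / sqrt(1.0 - theta**2))

def cond_item_cdf(y, x, cutpoints, theta):
    """F_{j|X1}(y | x) for an ordinal item with uniform-scale cutpoints
    (a_{j,1}, ..., a_{j,K-1}) and a BVN linking copula with parameter theta."""
    if y >= len(cutpoints):          # y = K - 1: cdf equals 1
        return 1.0
    a = (0.0,) + tuple(cutpoints) + (1.0,)
    return bvn_hfunc(x, a[y + 1], theta)   # F_j(y) = a_{j,y+1}
```

With theta = 0 the conditional cdf reduces to the marginal cdf, which gives a quick sanity check.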

1-truncated vine copula models
Vine copula models are flexible tools to analyse dependence structures and have been popular in many application areas (Kurowicka and Joe, 2011). They involve d − 1 trees; the first tree represents dependence (as edges) amongst d variables (as nodes). Then the edges become nodes in the next tree, involving the conditional dependencies given a common variable. This process continues until tree d − 1, which includes two nodes and one edge, representing conditional dependence of two variables given d − 2 variables (Chang and Joe, 2019).
If one is restricted to the first tree, that is, truncation at level 1, then the result is a Markov tree dependence structure where two variables not connected by an edge are conditionally independent given the variables in the tree between them. In a Markov tree or 1-truncated vine with d variables, d − 1 of the d(d − 1)/2 possible pairs are identified as the edges of a tree with d nodes corresponding to the items, i.e., there are a total of d − 1 edges, where two connected items form an edge. Let j and k be indices for any pair of items with 1 ≤ k < j ≤ d. For a given vine tree structure, let E denote the set of edges. Each edge jk ∈ E is represented with a bivariate copula C_{jk} such that

Pr(Y_j ≤ y_j, Y_k ≤ y_k) = C_{jk}(F_j(y_j), F_k(y_k)).

Since the densities of vine copulas can be factorized in terms of bivariate linking copulas and lower-dimensional margins, they are computationally tractable for high-dimensional continuous variables. Nevertheless, the cdf of a d-dimensional vine copula lacks a closed form and requires (d − 1)-dimensional integration (Joe, 1997). Hence, deriving the d-dimensional pmf using finite differences of the d-dimensional cdf (e.g., Braeken et al. 2007 or Nikoloulopoulos 2013) poses non-negligible numerical challenges. This problem has been solved by Panagiotelis et al. (2012), who decomposed the d-dimensional pmf into finite differences of bivariate copula cdfs. Hence, the pmf of a 1-truncated vine model takes the form

π_d(y) = ∏_{j=1}^d f_j(y_j) ∏_{jk∈E} f_{jk}(y_j, y_k) / [f_j(y_j) f_k(y_k)],   (6)

where

f_{jk}(y_j, y_k) = C_{jk}(a_{j,y_j+1}, a_{k,y_k+1}) − C_{jk}(a_{j,y_j}, a_{k,y_k+1}) − C_{jk}(a_{j,y_j+1}, a_{k,y_k}) + C_{jk}(a_{j,y_j}, a_{k,y_k})

and f_j(y) = Pr(Y_j = y) = a_{j,y+1} − a_{j,y}.
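The decomposition above is straightforward to code: each bivariate pmf is a rectangle (finite) difference of a bivariate copula cdf. A minimal sketch, with our own function names and the independence copula as a placeholder for any parametric bivariate cdf:

```python
def rect_prob(C, aj, ak, yj, yk, delta):
    """Bivariate pmf f_jk(yj, yk) as a rectangle difference of the
    bivariate copula cdf C evaluated at the uniform-scale cutpoints."""
    return (C(aj[yj + 1], ak[yk + 1], delta) - C(aj[yj], ak[yk + 1], delta)
            - C(aj[yj + 1], ak[yk], delta) + C(aj[yj], ak[yk], delta))

def tree_pmf(y, cutpoints, edges, deltas, C):
    """pmf of a 1-truncated vine (Markov tree):
    prod_j f_j(y_j) * prod_{(j,k) in E} f_jk / (f_j * f_k)."""
    a = [(0.0,) + tuple(c) + (1.0,) for c in cutpoints]   # pad with 0 and 1
    f = [a[j][y[j] + 1] - a[j][y[j]] for j in range(len(y))]  # marginal pmfs
    p = 1.0
    for fj in f:
        p *= fj
    for (j, k), d in zip(edges, deltas):
        p *= rect_prob(C, a[j], a[k], y[j], y[k], d) / (f[j] * f[k])
    return p

indep = lambda u, v, _d: u * v   # independence copula, as a placeholder
```

With the independence copula on every edge, the pmf reduces to the product of the univariate marginals, which is a convenient check.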

Combined factor/truncated vine copula models
In this section we combine the factor copula model with a 1-truncated vine copula to account for the residual dependence. The pmf of a 1-truncated vine copula in (6) can be used in the pmf of the factor copula model in (1), instead of the product of conditional pmfs, to capture any residual dependencies. Hence the pmf of the combined factor/truncated vine copula model takes the form

π_d(y) = ∫ ∏_{j=1}^d f_{j|X}(y_j|x) ∏_{jk∈E} f_{jk|X}(y_j, y_k|x) / [f_{j|X}(y_j|x) f_{k|X}(y_k|x)] dF_X(x).

With one factor and a 1-truncated vine given the latent variable X_1 (hereafter 1-factor tree), let C_{jk;X_1} be a bivariate copula such that

Pr(Y_j ≤ y_j, Y_k ≤ y_k | X_1 = x) = C_{jk;X_1}(F_{j|X_1}(y_j|x), F_{k|X_1}(y_k|x)),

where F_{j|X_1} and F_{k|X_1} are given in (2). Then, for a given 1-truncated vine structure with a set of edges E, the pmf of the 1-factor tree copula model is

π_d(y) = ∫_0^1 ∏_{j=1}^d f_{j|X_1}(y_j|x) ∏_{jk∈E} f_{jk|X_1}(y_j, y_k|x) / [f_{j|X_1}(y_j|x) f_{k|X_1}(y_k|x)] dx,   (7)

where

f_{jk|X_1}(y_j, y_k|x) = C_{jk;X_1}(F_{j|X_1}(y_j|x), F_{k|X_1}(y_k|x)) − C_{jk;X_1}(F_{j|X_1}(y_j − 1|x), F_{k|X_1}(y_k|x)) − C_{jk;X_1}(F_{j|X_1}(y_j|x), F_{k|X_1}(y_k − 1|x)) + C_{jk;X_1}(F_{j|X_1}(y_j − 1|x), F_{k|X_1}(y_k − 1|x))

and f_{j|X_1}(y_j|x), f_{k|X_1}(y_k|x) are given in (3).

Figure 1 depicts the graphical representation of a 1-factor tree copula model with d = 5 items as a 2-truncated vine. Tree 1 shows the typical 1-factor model, while Tree 2 accounts for the residual dependence via the pairwise conditional dependencies of two items conditioned on the factor X_1.

With two factors and a 1-truncated vine given the latent variables X_1, X_2 (hereafter 2-factor tree), let C_{jk;X} be a bivariate copula such that

Pr(Y_j ≤ y_j, Y_k ≤ y_k | X_1 = x_1, X_2 = x_2) = C_{jk;X}(F_{X_2 j|X_1}(y_j|x_1, x_2), F_{X_2 k|X_1}(y_k|x_1, x_2)),

where F_{X_2 j|X_1} and F_{X_2 k|X_1} are given in (4). Then, for a given vine structure with a set of edges E, the pmf of the 2-factor tree copula model is

π_d(y) = ∫_0^1 ∫_0^1 ∏_{j=1}^d f_{j|X_1,X_2}(y_j|x_1, x_2) ∏_{jk∈E} f_{jk|X_1,X_2}(y_j, y_k|x_1, x_2) / [f_{j|X_1,X_2}(y_j|x_1, x_2) f_{k|X_1,X_2}(y_k|x_1, x_2)] dx_1 dx_2,   (8)

where f_{jk|X_1,X_2} is the analogous finite difference of C_{jk;X} and f_{j|X_1,X_2}, f_{k|X_1,X_2} are given in (5).

For parametric 1-factor and 2-factor tree copula models, we let C_{X_1 j}, C_{X_2 j} and C_{jk;X} be parametric bivariate copulas, say with parameters θ_{1j}, θ_{2j}, and δ_{jk}, respectively. For the set of all parameters, let θ = {a_{j,k}, θ_{1j}, δ_{jk} : j = 1, ..., d; k = 1, ..., K − 1; jk ∈ E} for the 1-factor tree copula model and θ = {a_{j,k}, θ_{1j}, θ_{2j}, δ_{jk} : j = 1, ..., d; k = 1, ..., K − 1; jk ∈ E} for the 2-factor tree copula model.

Choices of parametric bivariate copulas
In line with Nikoloulopoulos and Joe (2015), we use bivariate parametric copulas that can be used when considering latent maxima, minima or mixtures of means. For dependent items based on latent maxima or minima, multivariate extreme value and copula theory (e.g., Joe 1997) can be used to select suitable copulas that link observed to latent variables. Copulas that arise from extreme value theory have more probability in one joint tail (upper or lower) than expected with a discretized MVN distribution or an MVN copula with discrete margins. If item responses are based on discretizations of latent variables that are means, then it is possible that there can be more probability in both the joint upper and joint lower tail, compared with discretized MVN models. This happens if the respondents consist of a 'mixture' population (e.g., different locations or genders).
From the theory of elliptical distributions and copulas (e.g., McNeil et al. 2005), it is known that the multivariate Student-t distribution, as a scale mixture of the MVN, has more dependence in the tails. Extreme value and elliptical copulas can model item response data that have reflection asymmetric and symmetric dependence, respectively.
A bivariate copula C is reflection symmetric if its density satisfies c(u_1, u_2) = c(1 − u_1, 1 − u_2) for all 0 ≤ u_1, u_2 ≤ 1. Otherwise, it is reflection asymmetric, often with more probability in the joint upper tail or joint lower tail. Upper tail dependence means that c(1 − u, 1 − u) = O(u^{−1}) as u → 0, and lower tail dependence means that c(u, u) = O(u^{−1}) as u → 0. If (U_1, U_2) ∼ C, then (1 − U_1, 1 − U_2) is distributed according to the survival or reflected copula of C; this "reflection" of each uniform U(0, 1) random variable about 1/2 changes the direction of tail asymmetry. Choices of copulas with upper or lower tail dependence are better if the items have more probability in the joint upper or lower tail than would be expected with the BVN copula. This can be shown with summaries of polychoric correlations in the upper and lower joint tail (Kadhem and Nikoloulopoulos, 2021b).
Having briefly provided definitions of tail dependence and reflection symmetry/asymmetry, we provide below the bivariate copula choices we consider:

• The elliptical bivariate normal (BVN) copula with cdf

C(u_1, u_2; θ) = Φ_2(Φ^{−1}(u_1), Φ^{−1}(u_2); θ),   −1 ≤ θ ≤ 1,

where Φ is the univariate standard normal cdf and Φ_2 is the cdf of a BVN distribution with correlation parameter θ. A model with BVN copulas has latent (ordinal) variables that can be considered as (discretized) means, and there is less probability in both the joint upper and joint lower tail as the BVN copula has reflection symmetry and tail independence.
• The extreme value Gumbel copula with cdf

C(u_1, u_2; θ) = exp{−[(−log u_1)^θ + (−log u_2)^θ]^{1/θ}},   θ ≥ 1.

A model with bivariate Gumbel copulas has latent (ordinal) variables that can be considered as (discretized) maxima, and there is more probability in the joint upper tail as the Gumbel copula has reflection asymmetry and upper tail dependence.
• The survival Gumbel (s.Gumbel) copula with cdf

C(u_1, u_2; θ) = u_1 + u_2 − 1 + exp{−[(−log(1 − u_1))^θ + (−log(1 − u_2))^θ]^{1/θ}},   θ ≥ 1.

A model with bivariate s.Gumbel copulas has latent (ordinal) variables that can be considered as (discretized) minima, and there is more probability in the joint lower tail as the s.Gumbel copula has reflection asymmetry and lower tail dependence.
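The Gumbel cdf and its survival (reflected) version can be coded directly from these formulas. A small sketch with our own function names:

```python
from math import exp, log

def gumbel_cdf(u, v, theta):
    """Gumbel copula cdf; theta >= 1, with theta = 1 giving independence."""
    return exp(-((-log(u)) ** theta + (-log(v)) ** theta) ** (1.0 / theta))

def sgumbel_cdf(u, v, theta):
    """Survival (reflected) Gumbel copula cdf:
    C_hat(u, v) = u + v - 1 + C(1 - u, 1 - v)."""
    return u + v - 1.0 + gumbel_cdf(1.0 - u, 1.0 - v, theta)
```

At theta = 1 both reduce to the independence copula C(u, v) = uv, and theta > 1 shifts probability towards the joint upper (Gumbel) or joint lower (s.Gumbel) tail.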
• The elliptical bivariate t_ν copula with cdf

C(u_1, u_2; θ) = T_2(T_ν^{−1}(u_1), T_ν^{−1}(u_2); θ, ν),   −1 ≤ θ ≤ 1,

where T_ν is the univariate Student-t cdf with ν degrees of freedom and T_2 is the cdf of a bivariate Student-t distribution with correlation parameter θ and ν degrees of freedom. A model with bivariate t_ν copulas has more probability in both the joint upper and joint lower tail, as the t_ν copula has reflection symmetry and both upper and lower tail dependence.

For the residual part of the model, in addition to the aforementioned bivariate parametric copulas, for computational improvements we can use the Archimedean Frank copula with cdf

C(u_1, u_2; θ) = −θ^{−1} log[1 + (e^{−θu_1} − 1)(e^{−θu_2} − 1)/(e^{−θ} − 1)],   θ ≠ 0,

which has reflection symmetry and tail independence. Its tail independence is not a distributional concern about the tail dependence/asymmetry between the items, due to the main result in Joe et al. (2010): for all the bivariate margins to have more probability in the joint lower or upper tail, it suffices that the bivariate copulas in the first trees (factor part) have upper/lower tail dependence, and it is not necessary for the bivariate copulas in the higher trees (residual part) to have tail dependence. For discrete data, such as item responses, the Frank copula has the same tail behaviour as the BVN copula but provides simplified computations, as it has a closed-form cdf; thus it can be preferred over the BVN copula for the residual part of the model, which involves finite differences of bivariate copula cdfs.

Estimation
With sample size n and data y_1, ..., y_n, the joint log-likelihood of the factor tree copula models is

ℓ(θ) = Σ_{i=1}^n log π_d(y_i; θ),   (9)

with π_d(y; θ) as defined in (7) and (8) for the 1-factor and 2-factor tree copula model, respectively. Maximization of (9) is numerically possible but time-consuming for large d because of the many univariate cutpoints and dependence parameters. Hence, we approach estimation using the two-step IFM method proposed by Joe (2005), which can efficiently, in the sense of computing time and asymptotic variance, estimate the model parameters.
In the first step, the cutpoints are estimated using the univariate sample proportions. The univariate cutpoints for the jth item are estimated as â_{j,k} = Σ_{y=0}^{k−1} p_{j,y}, where p_{j,y}, y = 0, ..., K − 1, j = 1, ..., d, are the univariate sample proportions. In the second step of the IFM method, the joint log-likelihood in (9) is maximized over the copula parameters with the cutpoints fixed as estimated at the first step. The estimated copula parameters can be obtained by using a quasi-Newton (Nash, 1990) method applied to the logarithm of the joint likelihood.
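The first IFM step is simply a cumulative sum of category proportions, optionally mapped to the N(0, 1) scale via Φ^{−1}. A sketch with our own function names:

```python
from statistics import NormalDist

def estimate_cutpoints(counts):
    """First-step IFM estimates of the uniform-scale cutpoints
    a_{j,k} = sum_{y < k} p_{j,y}, from the category counts of one item."""
    n = sum(counts)
    cum, a = 0, []
    for c in counts[:-1]:           # K - 1 interior cutpoints
        cum += c
        a.append(cum / n)
    return a

def to_normal_scale(a):
    """Cutpoints alpha_{j,k} = Phi^{-1}(a_{j,k}) on the N(0,1) scale."""
    return [NormalDist().inv_cdf(x) for x in a]
```

For example, category counts (10, 20, 30, 40) over n = 100 respondents yield uniform-scale cutpoints (0.1, 0.3, 0.6).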
For the 1-factor tree copula model, numerical evaluation of the joint pmf can be achieved with the following steps:

1. Calculate Gauss-Legendre quadrature (Stroud and Secrest, 1966) points {x_q : q = 1, ..., n_q} and weights {w_q : q = 1, ..., n_q} transformed to the standard uniform scale.

2. Numerically evaluate the joint pmf in (7) via the approximation

π_d(y) ≈ Σ_{q=1}^{n_q} w_q ∏_{j=1}^d f_{j|X_1}(y_j|x_q) ∏_{jk∈E} f_{jk|X_1}(y_j, y_k|x_q) / [f_{j|X_1}(y_j|x_q) f_{k|X_1}(y_k|x_q)].

For the 2-factor tree copula model, the joint pmf in (8) is approximated analogously via a double sum over the quadrature points in each latent dimension.
With Gauss-Legendre quadrature, the same nodes and weights are used for different functions; this helps in yielding smooth numerical derivatives for numerical optimization via quasi-Newton. Our comparisons show that n_q = 15 quadrature points are adequate with good precision.
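The quadrature step can be sketched as follows, assuming the bracketed product of conditional pmfs is available as a function of the latent value x (`gl_points_weights` and `approx_pmf` are our illustrative names, not from the paper):

```python
import numpy as np

def gl_points_weights(nq=15):
    """Gauss-Legendre nodes/weights transformed from [-1, 1] to (0, 1)."""
    t, w = np.polynomial.legendre.leggauss(nq)
    return (t + 1.0) / 2.0, w / 2.0

def approx_pmf(inner, nq=15):
    """Approximate pi_d(y) = int_0^1 inner(x) dx by a quadrature sum,
    where inner(x) is the product of conditional (bivariate) pmfs given X1 = x."""
    x, w = gl_points_weights(nq)
    return float(np.sum(w * np.array([inner(xq) for xq in x])))
```

Since the transformed weights sum to 1, integrating the constant function returns 1 exactly; polynomials up to degree 2*n_q − 1 are integrated exactly, which is why a modest n_q = 15 already gives good precision for smooth integrands.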

Model selection
In this section we will discuss model selection strategies for the factor tree copula models. Section 4.1 proposes vine tree structure selection methods for the residual part of the model that assume the factor tree copula models are constructed with bivariate normal (BVN) copulas. Section 4.2 proposes a heuristic algorithm that sequentially selects suitable bivariate copulas to account for any tail dependence/asymmetry, as in Kadhem and Nikoloulopoulos (2021a,b).

1-truncated vine tree structure selection
We propose two selection algorithms to choose the 1-truncated vine tree structure E for the residual part of the model, namely the polychoric and partial selection algorithms. Before that, we provide the necessary tools to form the aforementioned algorithms. These are the estimated polychoric correlations (Olsson, 1979), the correlations between each of the items and the first factor, and the partial correlations between each of the items and the second factor given the first factor (Nikoloulopoulos and Joe, 2015).
The sample polychoric correlation for any pair of items (j, k) can be estimated by maximizing the bivariate likelihood with the cutpoints fixed at their first-step estimates, viz.

ρ̂_jk = argmax_ρ Σ_{y_j} Σ_{y_k} n_{y_j y_k} log[Φ_2(Φ^{−1}(â_{j,y_j+1}), Φ^{−1}(â_{k,y_k+1}); ρ) − Φ_2(Φ^{−1}(â_{j,y_j}), Φ^{−1}(â_{k,y_k+1}); ρ) − Φ_2(Φ^{−1}(â_{j,y_j+1}), Φ^{−1}(â_{k,y_k}); ρ) + Φ_2(Φ^{−1}(â_{j,y_j}), Φ^{−1}(â_{k,y_k}); ρ)],

where n_{y_j y_k} is the observed frequency of the pair (y_j, y_k) and Φ_2(·, ·; ρ) is the BVN cdf with correlation parameter ρ.
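A hedged sketch of this two-step polychoric estimate, with fixed uniform-scale cutpoints and a one-dimensional likelihood maximization (the function name `polychoric` is ours):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.optimize import minimize_scalar

def polychoric(table, a_j, a_k):
    """ML estimate of the polychoric correlation from a K_j x K_k
    contingency table, with fixed uniform-scale cutpoints a_j, a_k."""
    # N(0,1)-scale cutpoints, with 0 and 1 clipped to avoid infinities
    zj = norm.ppf(np.clip(np.concatenate(([0.0], a_j, [1.0])), 1e-10, 1 - 1e-10))
    zk = norm.ppf(np.clip(np.concatenate(([0.0], a_k, [1.0])), 1e-10, 1 - 1e-10))

    def negloglik(rho):
        bvn = multivariate_normal(cov=[[1.0, rho], [rho, 1.0]])
        ll = 0.0
        for yj in range(table.shape[0]):
            for yk in range(table.shape[1]):
                # rectangle probability of cell (yj, yk) under the BVN model
                p = (bvn.cdf([zj[yj + 1], zk[yk + 1]]) - bvn.cdf([zj[yj], zk[yk + 1]])
                     - bvn.cdf([zj[yj + 1], zk[yk]]) + bvn.cdf([zj[yj], zk[yk]]))
                ll += table[yj, yk] * np.log(max(p, 1e-300))
        return -ll

    return minimize_scalar(negloglik, bounds=(-0.99, 0.99), method="bounded").x
```

For a balanced 2x2 table the estimate is near 0, while a strongly concordant table yields a large positive correlation.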
When all the bivariate copulas are BVN, the p-factor copula model is the same as the discretized MVN model with a p-factor correlation matrix, also known as the p-dimensional normal ogive model (Jöreskog and Moustaki, 2001). The 1-factor copula model in (3) with BVN copulas is then the same as the variant of Samejima's (1969) graded response model known as the unidimensional normal ogive model. The parameter θ_{1j} of C_{X_1 j} is the correlation of the underlying normal variable Z_j of Y_j with Z_{01} = Φ^{−1}(X_1), and the parameter θ_{2j} of C_{X_2 j} is the partial correlation between Z_j and Z_{02} = Φ^{−1}(X_2) given Z_{01}.
Subsequently, for all possible pairs of items we can estimate the partial correlations between Z_j and Z_k given Z_{01}, and the partial correlations between Z_j and Z_k given Z_{01}, Z_{02}, via the relations

ρ̂_{jk;Z_{01}} = (ρ̂_jk − θ̂_{1j} θ̂_{1k}) / √[(1 − θ̂_{1j}^2)(1 − θ̂_{1k}^2)]

and

ρ̂_{jk;Z_{01},Z_{02}} = (ρ̂_{jk;Z_{01}} − θ̂_{2j} θ̂_{2k}) / √[(1 − θ̂_{2j}^2)(1 − θ̂_{2k}^2)],

respectively, where θ̂_{1j}, θ̂_{1k} are the estimated unidimensional normal ogive model's parameters and θ̂_{1j}, θ̂_{1k}, θ̂_{2j}, θ̂_{2k} are the estimated bidimensional normal ogive model's parameters.
The polychoric and partial selection algorithms select the best vine tree using the minimum spanning tree algorithm (Prim, 1957). The former algorithm selects the edges E of the tree that minimize the sum of the weights log(1 − ρ̂_jk^2), while the latter minimizes the sum of the weights log(1 − ρ̂_{jk;Z_{01}}^2) for the 1-factor tree copula model and log(1 − ρ̂_{jk;Z_{01},Z_{02}}^2) for the 2-factor tree copula model.
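The spanning-tree step can be sketched with Prim's algorithm, given a precomputed matrix of polychoric (or partial) correlations; `mst_tree` is our illustrative name:

```python
import math

def mst_tree(R):
    """Prim's algorithm: spanning tree minimising the sum of
    log(1 - r_jk^2) over the selected edges, where R is a
    (polychoric or partial) correlation matrix."""
    d = len(R)
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        best = None
        for j in in_tree:
            for k in range(d):
                if k not in in_tree:
                    w = math.log(1.0 - R[j][k] ** 2)   # smaller = stronger |r|
                    if best is None or w < best[0]:
                        best = (w, j, k)
        edges.append((best[1], best[2]))
        in_tree.add(best[2])
    return edges
```

Since log(1 − r^2) is decreasing in |r|, minimizing the total weight is equivalent to connecting the most strongly correlated pairs into a tree.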

Bivariate copula selection
We propose a heuristic method that selects appropriate bivariate copulas for the proposed models. It starts with an initial assumption that all bivariate copulas are BVN copulas in the factor part and independence copulas in the 1-truncated vine part of the model. Then, sequentially, suitable copulas with lower or upper tail dependence are assigned where necessary to account for more probability in one or both joint tails. For ease of interpretation, we do not mix Gumbel, s.Gumbel, t_ν and BVN copulas within a single tree of the model; e.g., for the 2-factor tree copula model we allow three different copula families, one for the first factor, one for the second factor and one for the 1-truncated vine (the residual dependence part of the model).
The selection algorithm involves the following steps:

1. Start with a factor tree copula model with BVN copulas in the factor part and independence copulas in the 1-truncated vine part of the model.

2. Factor part
   (a) Factor 1
       i. Fit all the possible models, iterating over all the bivariate copula candidates that link each of the items to X_1.
       ii. Select the bivariate copula that corresponds to the highest log-likelihood.
       iii. Replace the BVN with the selected bivariate copula that links each of the items to X_1.
   (b) Factor 2
       i. Fit all the possible models, iterating over all the copula candidates that link each of the items to X_2.
       ii. Select the bivariate copula that corresponds to the highest log-likelihood.
       iii. Replace the BVN with the selected bivariate copula that links each of the items to X_2.

3. 1-truncated vine part
   (a) Select the best 1-truncated vine tree structure E using both the polychoric and partial selection algorithms proposed in Section 4.1.
   (b) Fit all the possible models, iterating over all the bivariate copula candidates that link the pairs of items in E given the factors.
   (c) Select the bivariate copula that corresponds to the highest log-likelihood.
   (d) Replace the independence copula with the selected bivariate copula that links each pair of items in E given the factors.
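The inner loop of each step above has the same shape: refit the model with each candidate family on one tree and keep the family with the highest log-likelihood. A hypothetical sketch (`fit_model`, `FAMILIES` and `select_family` are placeholders of ours, not functions from the paper or any package):

```python
# Candidate families, as in the paper: BVN, Gumbel, s.Gumbel, t_nu
FAMILIES = ["bvn", "gumbel", "sgumbel", "t"]

def select_family(fit_model, current_spec, tree):
    """Try each candidate family on one tree of the model and keep
    the one with the highest maximised log-likelihood."""
    best_fam, best_ll = None, -float("inf")
    for fam in FAMILIES:
        spec = dict(current_spec)
        spec[tree] = fam              # one family per tree, as in the paper
        ll = fit_model(spec)          # assumed to return the maximised log-likelihood
        if ll > best_ll:
            best_fam, best_ll = fam, ll
    return best_fam, best_ll
```

For the 2-factor tree model, this routine would be called three times, once per tree, with the previously selected families held fixed.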

Simulations
An extensive simulation study is conducted to assess (a) the efficiency of the proposed estimation method and (b) the reliability of using the model selection algorithms to select the correct 1-truncated vine tree structure for the residual dependence part of the model. We randomly generated 1,000 datasets with sample size n = 500 and d = {8, 16, 24} items with K = 5 equally weighted categories from 1-factor and 2-factor tree copula models with Gumbel copulas. The items in the last tree are either serially connected in ascending order with a 1-truncated drawable vine or randomly connected with a 1-truncated regular vine. Note in passing that the drawable vine is a boundary regular vine case.

Figure 3: Small sample of size n = 500 simulations (10^3 replications) and d = {8, 16, 24} items with K = 5 equally weighted categories from 1-factor and 2-factor tree copula models with Gumbel copulas and a 1-truncated drawable/regular vine residual dependence structure; resultant number of times a pair of items is correctly selected as an edge, for each of the edges of the 1-truncated drawable and regular vine copula, for both the partial and polychoric selection algorithms.

In Figure 3 we report the frequency with which a pair of items is correctly selected as an edge, for each of the edges of the 1-truncated vine, from the simulations of the 1- and 2-factor tree copula models with Gumbel copulas with d = 8, d = 16 and d = 24 items, for both the partial and polychoric selection algorithms. As the dimension increases, the partial selection algorithm performs extremely well for the 1-truncated drawable vine residual dependence structure, but poorly for the 1-truncated regular vine structure. The contrary (or complementary) results are seen for the polychoric algorithm. The polychoric selection algorithm performs extremely well in selecting the true edges in the 1-truncated regular vine residual dependence structure. It is most accurate for the initial edges, while it is less accurate for the final edges. This is because the dependence strength is specified in descending order as τ = {0.40, ..., 0.10}, so the polychoric selection algorithm is highly reliable in selecting the edges with stronger dependence. The edges with weaker dependence are not easily quantified and can be approximated with other edges that lead to a similar correlation matrix, or even be accounted for by the previous trees (the factor part of the model).

Application
In this section we illustrate the proposed methodology by analysing d = 20 items from a subsample of n = 221 veterans who reported clinically significant Post-Traumatic Stress Disorder (PTSD) symptoms (Armour et al., 2017). The items are divided into four domains: (1) intrusions (e.g., repeated, disturbing and unwanted memories), (2) avoidance (e.g., avoiding external reminders of the stressful experience), (3) cognition and mood alterations (e.g., trouble remembering important parts of the stressful experience) and (4) reactivity alterations (e.g., taking too many risks or doing things that could cause you harm). Each item is answered on a five-point ordinal scale: "0 = Not at all", "1 = A little bit", "2 = Moderately", "3 = Quite a bit" and "4 = Extremely".
The dataset and its complete description can be found in Armour et al. (2017) or in the R package BGGM (Williams and Mulder, 2020). For some items, it is plausible that a veteran might be thinking about the maximum trauma (or a high quantile) of many past events. For example, for the items in the first domain, a participant might reflect on past relevant events where an intrusion affected their life; then, by considering the worst case, i.e., the event where the negative effect of an intrusion on their life was substantial, they choose an appropriate ordinal response. For some of the other items, one might consider a median or less extreme harm of past relevant events. To sum up, the items appear to be a mixed selection between discretized averages and maxima, so that a factor model with more probability in the joint upper tail might be an improvement over a factor model based on a discretized MVN.
The interpretations above suggest that a factor tree model with a combination of Gumbel and BVN or t_ν copulas might provide a better fit. To further explore these interpretations, we calculate the average of the lower and upper polychoric semi-correlations (Kadhem and Nikoloulopoulos, 2021a,b) for all variables to check if there is any overall tail asymmetry. For comparison, we also report the theoretical semi-correlations under different choices of copulas. Table 3 shows the averages of the polychoric semi-correlations for all pairs, along with the theoretical semi-correlations under different choices of copulas. Overall, we see that there is more correlation in the joint upper tail than in the joint lower tail, suggesting that factor tree copula models with Gumbel bivariate copulas might be plausible.
We then select a suitable vine tree structure using the partial and polychoric selection algorithms proposed in Section 4.1 and compute various discrepancy measures between the observed polychoric correlation matrix R observed and the correlation matrix R model based on factor tree copula models with BVN copulas.
We report the maximum absolute correlation difference D_1 = max|R_model − R_observed|, the average absolute correlation difference D_2 = avg|R_model − R_observed|, and the correlation matrix discrepancy measure

D_3 = log(det(R_model)) − log(det(R_observed)) + tr(R_model^{−1} R_observed) − d.

For a baseline comparison, we also compute the discrepancy measures for the 1- and 2-factor copula models with BVN copulas. We aim to obtain a dependence structure that results in the lowest discrepancy measures; this will indicate a suitable vine structure for the item response data at hand.
After finding a suitable vine structure, we construct a plausible factor tree copula model, to analyse any type of items, by using the proposed heuristic algorithm in Section 4.2. We use the AIC at the IFM estimates as a rough diagnostic measure for model selection between the models. In addition, we use the Vuong (1989) procedure, which is based on the sample version of the difference in Kullback-Leibler divergence. Let Model 1 and Model 2 have parametric pmfs π_d^{(1)}(y; θ_1) and π_d^{(2)}(y; θ_2), respectively; θ_1, θ_2 are the IFM estimates. The procedure computes the average D̄ of the log differences between the two parametric models,

D̄ = n^{−1} Σ_{i=1}^n D_i,   D_i = log[π_d^{(2)}(y_i; θ_2) / π_d^{(1)}(y_i; θ_1)].

Vuong (1989) has shown that asymptotically √n D̄/s → N(0, 1), where s^2 is the sample variance of the D_i's; hence a 95% confidence interval (CI) for the average log difference is D̄ ± 1.96 s/√n. If it includes 0, then Model 1 and Model 2 are considered to be non-significantly different, while if it is above 0, then Model 2 is favourable and considered to fit better than Model 1. We will compare (1) the selected factor (tree) copula models (Model 2) versus their Gaussian analogues (Model 1), (2) the selected factor tree copula model according to AIC (Model 2) versus all the other factor (tree) copula models with BVN copulas (Model 1), and (3) the selected factor tree copula model according to AIC (Model 2) versus all the other factor (tree) copula models with selected copulas (Model 1). Note in passing that the 2-factor (tree) copula models with BVN copulas will have one dependence parameter less, as one copula in the second factor is set to independence for identification purposes.
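Given per-observation log-likelihoods under the two models, the Vuong 95% CI is a one-liner to compute. A minimal sketch, with `vuong_interval` as our illustrative name:

```python
import math

def vuong_interval(ll1, ll2, z=1.96):
    """95% CI for the average log-likelihood difference (Model 2 minus
    Model 1), from per-observation log-likelihoods ll1 and ll2."""
    n = len(ll1)
    diffs = [b - a for a, b in zip(ll1, ll2)]
    dbar = sum(diffs) / n                               # average log difference
    s2 = sum((d - dbar) ** 2 for d in diffs) / (n - 1)  # sample variance
    half = z * math.sqrt(s2 / n)
    return dbar - half, dbar + half
```

An interval entirely above 0 favours Model 2; an interval containing 0 indicates no significant difference.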
Table 4 shows that the sample correlation matrix of the data has a 2-factor tree structure according to the discrepancy measures. The table also gives the AICs and the 95% CIs of Vuong's tests for all the fitted models. The best fitted model, based on AIC values, is the 2-factor tree copula model obtained from the partial selection algorithm. The best fitted 2-factor tree copula model has t_2 copulas for the 1st tree, Gumbel copulas for the 2nd tree, and t_5 copulas for the 3rd tree. The Vuong 95% CIs show that the 2-factor tree copula model provides a substantial improvement over its Gaussian analogue and outperforms all the other fitted models except the 2-factor tree model obtained from the polychoric selection algorithm. The tree selection algorithms might not yield the same 'true' vine tree; nevertheless, closely approximating factor tree copula models are obtained. The factor tree copula model is mostly constructed with t_2 bivariate copulas, which are suitable for both positive and negative dependence; however, the highest dependence is found in the 2nd factor, which is constructed with Gumbel copulas. This is in line with both the initial interpretations and the preliminary analysis, which suggest that some items can be considered as discretized maxima.
Table 5 includes the copula parameter estimates in Kendall's τ scale and their standard errors (SE) for the selected 2-factor and 2-factor tree copula models. The latter is obtained from the partial selection algorithm. To make it easier to compare strengths of dependence, we convert the BVN/t_ν and Gumbel/s.Gumbel copula parameters to Kendall's τ's via the relation τ(θ) = (2/π) arcsin(θ) and via (10), respectively. Interestingly, the Kendall's τ's in the 2-factor copula model are roughly equivalent to the estimates in the 1st and 2nd factors of the 2-factor tree copula model. Most of the dependence is captured in the first two trees, resulting in weak to medium residual dependencies in the 1-truncated vine copula model, which are nevertheless significantly different from independence. Overall, the items in the Markov tree are mostly positively associated with one another, with only a few negative conditional dependencies. The residual dependencies reveal a stronger association between the 10th and 11th items, which are "Blame of self or others" and "Negative trauma-related emotions", respectively. In addition, there is moderate association between items 9 and 11, which are "Negative beliefs" and "Negative trauma-related emotions", respectively, and similar moderate dependence between items 4 and 9, which are "Emotional cue reactivity" and "Negative beliefs", respectively.
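The BVN/t_ν conversion above, together with the Gumbel relation (assumed here to be the standard τ = 1 − 1/θ, which is presumably the relation labelled (10) in the paper), can be sketched in Python:

```python
import numpy as np

def tau_bvn(theta):
    """Kendall's tau for BVN and t_nu copulas: tau = (2/pi) arcsin(theta)."""
    return (2.0 / np.pi) * np.arcsin(theta)

def tau_gumbel(theta):
    """Kendall's tau for the Gumbel (and survival Gumbel) copula,
    assuming the standard relation tau = 1 - 1/theta, theta >= 1."""
    return 1.0 - 1.0 / theta

def theta_bvn(tau):
    """Inverse map: BVN/t_nu copula parameter from Kendall's tau."""
    return np.sin(np.pi * tau / 2.0)
```

The inverse map `theta_bvn` is the conversion used when translating estimated τ's back to normal copula parameters (and then to loadings), as described in Section 4.1.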

Discussion
We have proposed combined factor/truncated vine copula models to capture the residual dependence for item response data. They model conditional dependence of the items given the latent variables, and thus go beyond the factor models, where the items are conditionally independent given the latent variables. By combining the factor copula models with a 1-truncated vine copula model, we construct conditional dependence models given very few interpretable latent variables. The combined factor/truncated vine structure has the form of (i) primary dependence being explained by one or more latent variables, and (ii) conditional dependence of item response variables given the latent variables (Joe, 2018). These models are especially useful and interpretable when there are a few latent variables that can explain most, but not all, of the dependence in the item responses.
The flexibility of the factor tree copula models underlines the importance of model selection. In practice, one has to first select the 1-truncated vine tree structure E and then suitable bivariate copulas to account for more probability in one or both joint tails. We tackle these model selection issues by proposing heuristic algorithms to choose a plausible factor tree copula model that can adequately capture the (residual) dependencies among the item responses. We have shown that the proposed models provide a substantial improvement over the 1-factor and 2-factor (tree) copula models with selected (BVN) copulas on the basis of the AIC and Vuong's statistics. The 1-factor and 2-factor tree copula models with BVN copulas can be viewed as first-order models if models based on other tail-dependent copulas are called for. We consider the 1-factor and 2-factor tree copula models to be reasonable parsimonious models, as most of the dependence is explained via the first few trees in the factor model. This is because, for all the bivariate margins to have upper/lower tail dependence, it suffices that the bivariate copulas in the first trees (the factor part) have upper/lower tail dependence; it is not necessary for the bivariate copulas in the higher trees after the 1-truncated vine to have tail dependence (Joe et al., 2010).
The proposed models are reproducible, as the conditional independence and residual dependence parts are modelled separately. The residual dependencies are taken into account by a Markov tree without changing anything in the conditional independence part of the model. The use of a Markov tree for the residual dependence is a new direction for parsimonious dependence modelling. This reproducibility, as per the terminology in Liang et al. (1992), means that we can remain within the well-known and conceptually attractive framework offered by the factor copula models when applying a factor tree copula model. This will be attractive to practitioners who have a basic and conceptual understanding of factor models, but are less familiar with more complicated models that are available to approach the problem of residual dependence. The main change in the factor copula model is only in the formulation of the joint conditional distribution; the conditional independence part of the model, i.e., the unique loading parameters (the τ's converted to normal copula parameters θ_1j and θ_2j and then to loadings with the relations in Section 4.1), is left intact.

Software
R functions for estimation, simulation and model selection of the factor tree copula models will be part of the next major release of the R package FactorCopula (Kadhem and Nikoloulopoulos, 2021c).

Figure 1 :
Figure 1: Graphical representation of a 1-factor tree copula model with d = 5 items. The first tree is the 1-factor model. The residual dependence is captured in Tree 2 with a 1-truncated vine model.

Figure 2
Figure 2 depicts the graphical representation of a 2-factor tree copula model with d = 5 items as a 3-truncated vine. Trees 1 and 2 show the common 2-factor model, while Tree 3 involves the pairwise conditional dependencies of two items given the factors.

Figure 2 :
Figure 2: Graphical representation of a 2-factor tree copula model with d = 5 items. The first and second trees represent the 2-factor model. The residual dependence is captured in Tree 3 with a 1-truncated vine model. Note that the factors are linked to one another with an independence copula in Tree 1.
graded response IRT model, known as the normal ogive model (McDonald, 1997), with a 1-factor correlation matrix R = (r_jk), where r_jk = θ_1j θ_1k for j ≠ k. The 2-factor model in (5) is the same as the bidimensional (2-factor) normal ogive model with a 2-factor correlation matrix R = (r_jk). Here T(·; ν) is the univariate Student-t cdf with (non-integer) ν degrees of freedom, and T_2 is the cdf of a bivariate Student-t distribution with ν degrees of freedom and correlation parameter θ. A model with bivariate t_ν copulas has latent (ordinal) variables that can be considered as mixtures of (discretized) means, since the bivariate Student-t distribution arises as a scale mixture of bivariate normals. A small value of ν, such as 1 ≤ ν ≤ 5, leads to a model with more probability in the joint upper and joint lower tails compared with the BVN copula, as the t_ν copula has reflection-symmetric upper and lower tail dependence.
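The scale-mixture representation above can be checked empirically; a Python sketch (with hypothetical values ρ = 0.5 and ν = 2) that generates bivariate t_ν draws as bivariate normals divided by an independent chi-squared scale, and compares joint lower-tail frequencies at matched 5% marginal quantiles:

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho, nu = 200_000, 0.5, 2

# Bivariate normal sample with correlation rho via Cholesky factor
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
z = rng.standard_normal((n, 2)) @ L.T

# Bivariate t_nu as a scale mixture of bivariate normals:
# X = Z / sqrt(W / nu), with W ~ chi^2_nu independent of Z
w = rng.chisquare(nu, size=n)
x = z / np.sqrt(w / nu)[:, None]

# Empirical joint lower-tail probability at matched 5% marginal quantiles
q_z = np.quantile(z, 0.05, axis=0)
q_x = np.quantile(x, 0.05, axis=0)
p_normal = np.mean((z[:, 0] < q_z[0]) & (z[:, 1] < q_z[1]))
p_t = np.mean((x[:, 0] < q_x[0]) & (x[:, 1] < q_x[1]))
# p_t exceeds p_normal: the t_2 copula puts more mass in the joint tails
```

The comparison at matched marginal quantiles isolates the copula (dependence) effect from the heavier t marginals, illustrating why a small ν yields more joint-tail probability than the BVN copula.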

Table 3 :
Average observed polychoric correlations and semi-correlations for all pairs of items for the Post-Traumatic Stress Disorder dataset, along with the corresponding theoretical semi-correlations for BVN, t_2, t_5, Frank, Gumbel, and survival Gumbel (s.Gumbel) copulas.

Table 5 :
Estimated copula parameters and their standard errors (SE) in Kendall's τ scale for the selected 2-factor and 2-factor tree copula models obtained from the partial selection algorithm for the Post-Traumatic Stress Disorder dataset.