A Multivariate Logistic Distance Model for the Analysis of Multiple Binary Responses

We propose a Multivariate Logistic Distance (MLD) model for the analysis of multiple binary responses in the presence of predictors. The MLD model can be used to simultaneously assess the dimensional/factorial structure of the data and to study the effect of the predictor variables on each of the response variables. To enhance interpretation, the results of the proposed model can be graphically represented in a biplot, showing predictor variable axes, the categories of the response variables and the subjects’ positions. The interpretation of the biplot uses a distance rule. The MLD model belongs to the family of marginal models for multivariate responses, as opposed to latent variable models and conditionally specified models. By setting the distance between the two categories of every response variable to be equal, the MLD model becomes equivalent to a marginal model for multivariate binary data estimated using a GEE method. In that case the MLD model can be fitted using existing statistical packages with a GEE procedure, e.g., the genmod procedure from SAS or the geepack package from R. Without the equality constraint, the MLD model is a general model which can be fitted by its own right. We applied the proposed model to empirical data to illustrate its advantages.


Introduction
Multivariate binary data with multiple binary response variables and one or more predictor variables, are often collected in empirical sciences such as psychology, criminology, epidemiology, life sciences and medicine. In the Netherlands Study of Depression and Anxiety (NESDA), for example, data were collected to investigate the interplay between personality traits and co-morbidity of depressive and anxiety disorders (Pennix et al., 2008;Spinhoven et al., 2009). Another study in which multivariate binary data arises is the Indonesian Children's Study (ICS: Sommer, Katz, and Tarwotjo, 1984;Liang, Zeger, and Qaqish, 1992) where over three-thousand children were medically examined to investigate whether they had respiratory infection, diarrhoeal infection, and xerophthalmia. The aim of the ICS study was to investigate whether vitamin A deficiency places children at increased risk of respiratory and diarrhoeal infections.
The availability of the multivariate normal distribution for multivariate interval responses, makes application of maximum likelihood-based statistical models on such data relatively easy. However, for binary responses, no multivariate distribution is available and therefore estimation becomes more difficult. Liang and Zeger (1986) proposed Generalized Estimating Equations (GEE) for marginal modelling of correlated categorical data. GEE is a quasi-likelihood (QL) estimation method that does not require specification of a particular multivariate distribution. It is widely used as a standard approach for fitting marginal models on multivariate data (Ziegler, Kastner, and Blettner, 1998;Fitzmaurice et al., 2008;Ziegler, 2011). The GEE approach, however, does not allow for a dimensional approach to analysis. Often researchers have theories how different response variables belong to one underlying dimension, factor, or latent variable.
For the dimensional approach often latent variable models are used, such as structural equation models or item response models. These models explicitly define underlying dimensions. However, these models make distributional assumptions of the latent dimensions or assume an underlying distribution for the dichotomous responses or both. Such assumptions are often unverifiable, i.e. we cannot check the assumptions using the data.
In this paper, we will develop a dimensional model for multivariate binary data within the marginal framework. The model does not make unverifiable assumptions. The model will be developed within a distance framework, but we show it can also be seen as a specific marginal model. To enhance interpretation, a biplot is developed to accompany the model that visualizes the result.
De Rooij (2009) proposed the Ideal Point Classification (IPC) model for analyzing a multinomial response variable in the presence of predic-tors. The IPC is a probabilistic distance model based on a two-mode distance function. De Rooij (2009) also showed that a simple logistic regression for binary response variable can be written as a unidimensional IPC model. Worku and De Rooij (2016) extended the IPC model to the analysis of two binary response variables, i.e., the bivariate, binary data setting, and showed that a new parameterization of the IPC model recovered both the marginal probabilities and the association structure of bivariate binary data well. However, this parameterization cannot be easily extended to handling multivariate binary data because all the possible pairwise and higher order association terms must be specified in the likelihood function, which makes the model complex and therefore hard to estimate. Therefore, in this paper we propose a Multivariate Logistic Distance (MLD) model for analyzing multivariate binary data that extends marginal models for multivariate data. The MLD model unifies two domains of statistical methods, i.e., Multidimensional Scaling (MDS: Kruskal and Wish, 1978;Borg and Groenen, 2005) and Generalized Linear Model (GLM: McCullagh and Nelder, 1989;Agresti, 2002). As a form of regularization, the MLD model allows for dimension reduction and therefore less parameters are estimated compared to the existing marginal models for multivariate data. Moreover, the model enhances interpretation by using a biplot (Gabriel, 1971;Gower and Hand, 1996;Gower, Lubbe, and Le Roux, 2011) based on a distance interpretation.
Unlike existing marginal models for multivariate data, the MLD model can be used for assessing the factorial/dimensional structure of multivariate data. In the area of mental disorders (with the NESDA data as example), clinical psychologists and epidemiologists are often interested in comorbidity and how co-morbidity is related to risk factors such as personality traits (Krueger, 1999;Beesdo-Baum et al., 2009;Spinhoven et al., 2013). Three candidate theories about the co-morbidity of mental disorders have been proposed, i.e., (1) a 2-dimensional structure with one dimension representing distress and the other one fear (d/f); (2) a different 2-dimensional structure with one dimension representing depression and the other one anxiety (d/a); and (3) an unidimensional structure where all the disorders are represented by a single dimension. The MLD model can be used to represent such theories within a unified framework, i.e., the candidate theories can be compared using appropriate statistics, and at the same time the MLD model allows for a direct relationship between co-morbidity of mental disorders and the predictor variables.
The paper is organized as follows. Section 2 develops the multivariate logistic distance model, investigates the link with marginal model for multivariate binary data estimated using a GEE method, and discusses the construction of biplots for the multivariate logistic model. In Section 3, the proposed model is fitted to empirical data and the results are interpreted using the estimated parameters and a graphical representation. We conclude in Section 4 with a discussion.

Logistic Regression as a Distance Model
Logistic regression is a standard method for modelling dichotomous response data. Let y i denote the observed value for a binary dependent variable Y for subject i, where i = 1, 2, . . . , N. Logistic regression models the probability of a category conditional on the value of a predictor variable x i , where β * 0 and β * 1 are the intercept and the regression coefficient, respectively. Logistic regression can easily be generalized to accommodate multiple predictors, where β * is now a vector with regression coefficients. De Rooij (2009) showed that logistic regression can be expressed as a distance model in a joint space with points representing the two categories of the response variable and points representing the subjects. In this section, we revisit this relationship and in Section 2.2 discuss an extension for multivariate binary responses.
Let us define a joint unidimensional space for subjects and the classes of the response variables. Denote by η i the coordinate of the position for subject i and by γ 0 the coordinate of the position of one category and by γ 1 the coordinate of the position of the other category of the binary response variable. Define δ i0 and δ i1 to be the squared Euclidean distances between the position of subject i and the two categories respectively. That is, With these two distances we can define the following probability model .
The smaller the relative distance between a person point and a class point, the larger the probability that the subject belongs to that class. Therefore, the class probability is inversely related to the squared Euclidean distance between the points. The coordinate for subject i, η i , is assumed to be a linear combination of the predictor variable x i , i.e., η i = β 0 + β 1 x i or in case of multiple predictors η i = β 0 + x T i β. The parameters of the distance model are the regression weights and the category points.
An important tool in the interpretation of probability models is the log-odds. The log-odds representation of the distance model becomes, In the case of multiple predictors, the logistic distance model takes the same form, having an intercept and extra slopes for the additional predictors. For example, with two predictors For a unit increase in x i1 , the log-odds in the distance model changes by β 1 (γ 1 − γ 0 ), similarly for x i2 . By setting β * 0 = β 0 (γ 1 − γ 0 ) + 0.5(γ 2 0 − γ 2 1 ) and β * 1 = β 1 (γ 1 − γ 0 ), a standard logistic regression is obtained. The logistic distance model (4) is not identified and therefore an identifiability constraint must be imposed. For example, with β 1 = 2 and (γ 1 − γ 0 ) = 1, β * 1 = 2. The same value β * 1 = 2 can also be obtained when β 1 = 0.5 and (γ 1 − γ 0 ) = 2. By imposing an identifiability constraint on the class points, the logistic distance model can be identified, for example by setting γ 1 = 1 and γ 1 = 0. The logistic distance model is now identified and its relationship with the univariate logistic model presented in (1) becomes

Multivariate Extension of the Distance Model
In this section, the logistic distance model for a single response variable will be extended to handling multivariate binary data. Suppose y i = (y i1 , y i2 , . . . , y ij , . . . , y iJ ) T denotes the multivariate responses observed on the i−th subject, which is a (J × 1)-dimensional vector of all responses, where y ij is the binary measurement of the j-th response variable observed on the i-th subject. It is not difficult to generalize the methodology to the case where the number of response variables differs over subjects, but that complicates the notation. As before, let x i represent the multiple predictors observed on i−th subject. In Table 1, we display the structure of multivariate data in long format. The first column, SID, is a variable which contains the subjects' identification number. The second column, Index, is a categorical indicator variable that indicates for which particular response variable the binary measurement y ij is obtained. In Table 1 five response variables are assumed, i.e., R 1 , R 2 , . . . , R 5 . The other columns represent measurements of the response variable and predictor variables, respectively. A unidimensional space was used to represent the logistic regression model (3), which positions both the subjects and the two categories of the response variable. In the case of multiple responses y i , the distance model can be extended to accommodate the additional responses. Suppose there is a second response variable. One possibility for generalization is to add the two points representing the categories of the second response variable to the unidimensional space. In that case, the predictor variables have a similar influence on the two response variables.
Another generalization is that the second response variable pertains to another dimension, giving rise to a two-dimensional model. The definition of the distance becomes where η im is the coordinate for the point representing subject i on the m-th dimension and is defined as a linear combination of the predictors, η im = β 0m + x T i β m ; and γ kj,m is the coordinate for category k (k = {0, 1}) of response variable j on dimension m. Each response variable belongs to one and only one dimension. This assumption is driven by theories often developed by applied scientists. In the Introduction section, we discussed three different theories about comorbidity of mental disorders. Spinhoven et al. (2013), for example, found two dimensions of which the first dimension (distress) was represented by major depressive disorder, generalized anxiety disorder, and dysthimia; and the second dimension (fear) was represented by panic disorder and social phobia.
The probability for category 1 on response variable j given the pre- The log-odds representation of the multivariate distance model becomes, Because each response variable belongs to a single dimension, the log odds representation can be further simplified. Suppose response variable j belongs to dimension 1 so that γ 0j,m and γ 1j,m equal zero for all m > 1, i.e. all dimensions except the first one. In that case, (8) simplifies to a single equation instead of a sum over dimensions. This distance model for multivariate binary data (7 -8) is called the Multivariate Logistic Distance (MLD) model. Because the MLD model is a type of bilinear model, for each dimension we have to fix the origin and scale. Like in the simple logistic regression representation we fix the class coordinates for one of the response variables on every dimension, e.g., γ 1j,m = 1 and γ 0j,m = 0.
The effect of a predictor variable on a specific response variable j is determined by the dimension the j-th response variable is positioned on. More specifically, the effect β m (γ 1j,m − γ 0j,m ). Therefore, for different response variables on the same dimension the size of the effect is different, depending on (γ 1j,m − γ 0j,m ), but the direction is the same as long as γ 1j,m ≥ γ 0j,m , ∀j, m, and defined by β m . Furthermore, the larger (γ 1j,m − γ 0j,m ) the bigger the effect is and vice versa. In other words, the larger the distance between the two points representing the categories of a single response variable, the better the predictor variables can discriminate between the categories.

Parameter Estimation
The parameters in the MLD model are estimated by maximizing the likelihood function for independent data, in the multivariate situation known as quasi-likelihood; i.e., where θ is the concatenation of all the class points and all the regression weights. Liang and Zeger (1986) showed that maximizing this quasi-likelihood provides consistent parameter estimates for the multivariate model. However, the standard errors based on the corresponding Hessian matrix are biased. The same authors proposed a sandwich estimator for the covariance matrix to correct for the bias (Liang and Zeger, 1986). Another method for obtaining correct standard errors is to apply a clustered bootstrap method (Sherman and Le Cessie, 1997;De Rooij and Worku, 2012;Cheng et al., 2013). In this case, the re-sampling procedure is applied on the subject (cluster) level so that the correlation between the multivariate responses is retained in each bootstrap sample.
The number of independent parameters estimated in the MLD model, q, equals The first term in (10), i.e., [M × (p + 1)], corresponds to the number of regression coefficients; the other term to the number of estimable class points. The identifiability constraints are accounted for in the second term, i.e., in each dimension the class coordinates for a single response variable are set to fixed values.
The MLD model can be fitted using the NLMIXED procedure in SAS software (SAS Institute Inc., 2011). Scripts for the analyses in this paper are available upon request from the first author.

The Relationship of the MLD Model to Generalized Estimating Equations
By setting the distance between the two categories of every response variable to be equal to one, i.e., (γ 1j,m − γ 0j,m ) = 1, the MLD model becomes equivalent to a marginal model for multivariate binary data estimated using GEE method (Liang and Zeger, 1986). The restriction of the class points implies that predictor variables discriminate equally well for all response variables belonging to a specific dimension. Existing statistical packages with a GEE procedure (e.g., the genmod procedure from SAS or the geepack package from R) can be used for fitting this "restricted" MLD model on multivariate binary data.
Fitting the restricted MLD model using a GEE procedure involves a three-step approach: (1) construction of a design matrix for both the response and the predictor variables; and (2) applying the GEE method with the constructed design matrix; and (3) transforming the GEE parameters to MLD parameters. We now show construction of the design matrix using the example presented in Table 1.
Suppose we want to fit a 2-dimensional model on the five binary response variables. Further, suppose we like the first three response variables to be represented on the first dimension, and the fourth and the fifth on the second dimension. Therefore define a response indicator matrix, denoted by Z, with dimension (J × M ). The response indicator matrix specifies for each response variable to which dimension it pertains, with position (j, m) equal to one if the j-th response variable belongs to the m-th dimension and zero otherwise. For the structure layed-out above, The design matrix for subject i is then obtained by computing the Kronecker product between the response indicator matrix and the predictors vector (without intercept), We concatenate U i and the identity matrix to get the final design matrix, Then, a vertical concatenation of all S i matrices will give us the final design matrix S on which the GEE method is finally applied to obtain parameter estimates of the marginal model. This results in five response specific intercepts (β * 01 , . . . , β * 05 ) corresponding to the first five columns of S and two sets of p regression weights (β * 11 , . . . , β * p1 and β * 12 , . . . , β * p2 ). The MLD parameters can be derived from these as follows γ 0j,m = −(β * 0j + 0.5) for the dimension, m, to which disorder j belongs, zero otherwise. The regression weights β jm are equal to the regression weights obtained from GEE method, β jm = β * jm . The number of parameters in the "restricted" MLD model then becomes q = [M × (p + 1)] + (J − M ) since additional constraints are imposed on the class points.

Model Selection
In statistical analysis, we often select a parsimonious and best fitting model from a set of candidate models given the data. In the MLD model, we select not only predictor variables for the final model, but also the dimensionality of the model must be determined. Pan (2001) proposed the quasi-likelihood under the independence model criterion (QIC) as an extension of Akaike Information Criterion (AIC) to GEE: whereV R stands for robust variance estimator obtained under the assumption of a general "working" covariance structure R; andΩ is for the naive variance estimator obtained under the assumption of an independence correlation structure. Pan (2001) also proposed a simplified version of QIC when trace(Ω In the MLD model, we use QIC u fit statistics both for determining the dimensionality of the model and for variable selection. The model with the lowest QIC u statistics is considered the most parsimonious and best fitting model. As recommended in Yu and De Rooij (2013), we first determine the dimensionality of the model and then proceed to the variable selection.

Biplot for the Multivariate Logistic Distance Model
To enhance interpretation of the model, the results of a MLD model can be graphically represented in a biplot (Gabriel, 1971;Gower and Hand, 1996;Gower et al., 2011). The biplot represents the subjects, the response variables, and the predictor variables so that the relationship between predictors and responses can be read from the graph. We first demonstrate how the response variables are included in the biplot, and then the predictors.
Because each response is positioned on one and only one dimension, one of the columns in γ j equals zero. So, γ j represents two points either on the first or second dimension. Halfway between the two points, a decision line is drawn indicating equal probabilities for the two categories of a response variable. Due to these lines (horizontal for response variables on the second dimension and vertical for response variables on the first dimension), the two dimensional space is partitioned into rectangles, each representing a most probable response profile. The predictors are included in the biplot by variable axes (Gower and Hand, 1996). To derive the variable axis, first, a pseudo-design matrixX is constructed containing ones in the first column and zeros in the other columns except for the column representing the variable to be plotted. In this column, marker values are included within the range of the observed variable. Second, the matrix B with regression weights is defined, i.e.
Finally we can compute the matrix H as H =XB, defining a straight line in our biplot. We will include variable axes for every statistically significant predictor. Positions of the subjects are computed as the linear combination of predictor variables and are included in the biplot as points.

Application: The NESDA Data
To illustrate the MLD model, the NESDA data (Penninx et al., 2008) introduced earlier were analysed. The sample comprised of N = 2, 938 subjects aged 18 to 65 years (Mean = 42; S.D. = 13.1). About 66.5% were female and the average number of years of education attained was 12.2 with S.D. = 3.3. In this study, 37.1% of the subjects have major depressive disorder (MDD), 10.2% have dysthmia (DYST), 15.3% have generalized anxiety disorder (GAD), 22.4% have social anxiety disorder (SP), and 28.6% have panic disorder (PD). These five disorders are the response variables.
The predictors are Neuroticism (N), Extraversion (E), Openness to experience (O), Agreeableness (A), and Conscientiousness (C). We also took into account three background variables, i.e., age (AGE), years of education attained (EDU), and gender (GEN: 1 = female; 0 = male). The linear predictor part of the MLD model is where η im is a coordinate for the i-th subject position on the m-th dimension; and the β's are regression weights. The candidate MLD models fitted on the NESDA data are (1) "distress-fear" (d/f) dimensions, in which MDD, GAD, are DYST are presumed to be indicators of distress, and PD and SP for fear; (2) "depression-anxiety" (d/a) dimensions, in which MDD and DYST are indicators of depression, and GAD, PD, and SP for anxiety; (3) "unidimensional" where all the five mental disorders are indicators of a single dimension.
These three theories are then translated into the following indicator matrices: respectively. The superscript corresponds to a theory. For illustration, we fitted both the MLD model with and without imposing equal distance restrictions on the class points. The results of the MLD model with the restrictions will be presented first, thereafter the solution without the restrictions will be discussed. Table 2 shows the fit statistics of the candidate MLD models. As shown in the first block of Table 2 which compares different dimensionality, the 2-dimensional distress-fear (d/f) model fitted the data best (QIC u = 12, 396.42). In the second block of Table 2, fit statistics for the comparison of different sets of predictor variables are given. The model with all predictor variables fitted the data best (QIC u = 12,396.42).
The regression weights of this selected model are given in Table 3. The standard errors based on both the sandwich and the clustered bootstrap method are included in Table 3. Both methods resulted in similar estimates.
There is a strong positive association between neuroticism and the two dimensions:β 41 = 0.1167 with distress; andβ 42 = 0.1039 with fear. With every unit increase in neuroticism the log odds for MDD, DYST, and GAD go up by 0.1167 while the log odds for SP and PD go up by 0.1039. There is a moderate negative association between extraversion and the two dimensions:β 51 = −0.0419 with distress; andβ 52 = −0.0320 with fear. With every unit increase in extraversion the log odds for MDD, DYST, and GAD go down by 0.0419 while the log odds for SP and PD go down by 0.0320. From the background variables, only education has a statistically significant effect on both dimensions:β 11 = −0.0386 with distress; and β 12 = −0.0575 with fear. Less educated people have a higher risk of getting a mental disorder. The variable conscientiousness had a significant effect only on the second dimension (distress),β 82 = 0.0189, i.e. it only influences PD and SP. Although the total number of parameters of the final d/f model is q = 21, only sixteen of the parameters were displayed in Table 3. The others are the intercepts obtained from GEE method which are response-specific, i.e., β MDD 01 = −2.23, β GAD 02 = −3.73, β DYST 03 = −4.28, β PD 04 = −3.74, and β SP 05 = −4.14. Using γ 0j,m = −(β * 0j + 0.5) as shown in Section 2.4 and γ 1j,m = 1 + γ 0j,m , the class point coordinates for each response variable can be obtained. Thus, γ 01,1 = 1.73 for MDD, γ 02,1 = 3.23 for GAD, γ 03,1 = 3.78 for DYST, γ 04,2 = 3.24 for PD, and γ 05,2 = 3.64 for SP. We can use the estimated class points to compare the effect of predictors on the risk of developing disorders belonging to the same dimension. For example, MDD, DYST and GAD belong to the first dimension. Because γ 03,1 = 3.78 for DYST is larger than both γ 01,1 = 1.73 for MDD and γ 02,1 = 3.23 for GAD, it means that starting from a very low subject position on the first dimension and then increasing this position will first lead to higher probabilities of MDD, followed by GAD, and than for DYST. The model accounts for comorbidity because a high probability of DYST implies a high probability of GAD and MDD. The results of the selected MLD model are displayed in a biplot shown in Figure 1. In order to interpret the biplot, let us first discuss how the biplot was constructed. The biplot is composed of two parts, i.e., the response space and the variable axes, as shown in Figures 2 and 3, respectively. The positions of the two categories of all response variables are displayed in Figure 2. For example, on the vertical dimension there are four points corresponding to no PD, no SP, having PD, and having SP from the bottom to the top, respectively. Included in the same representation are decision lines (vertical and horizontal lines) crossing the mid-points between the two categories. The decision lines partition the two-dimensional space into rectangles (regions), each representing a most probable response profile.
Each region shows the disorder profile by 1's and 0's for the order MDD, GAD, DYST, PD, SP. An index '10011', for example, corresponds to the presence of MDD, PD, and SP, but the absence of GAD and DYST. In the top-right, an index of '11111' is used to represent a co-morbidity of all five mental disorders, while the region '00000' in the bottom left representing the absence of disorders.
In Figure 3, both the variable axes (lines) and the subjects points (grey dots) are displayed. The space includes only statistically significant predic-  tors. On the variable axes markers are placed that represent μ x ± tσ x , where μ x is the mean of x, σ x is the standard deviation, and t = 0, 1, 2, 3. Variable labels are included at the positive side of the variable axis.
Let us now interpret the biplot displayed in Figure 1. Most of the subjects are in the bottom left region representing absence of all the disorders. However, significant number of subjects are in other regions representing co-morbidity of mental disorders. The regions are '10000' corresponding to the presence of having only MDD; and '10010' corresponding to the presence of having both MDD and PD; '10011' corresponding to the presence of having MDD, PD, and SP; and, '11011' corresponding to the presence of all disorders, except DYST. Also a few subjects are in the upper right region having all the mental disorders. Now let us interpret the variable axes. The variable axis for Neuroticism (N) runs from the lower left (low values of neuroticism) to the upper right (high values of Neuroticism), indicating that persons with low values of Neuroticism are located in the '00000' region, whereas persons with very high values of neuroticism are located in the '11111' region. In short, the higher neuroticism the more disorders. Contrarily, the variable axes of extraversion points to the other direction.
The length of the variable axis indicates effect size; the longer the variable axis the larger the effect of the corresponding variable on the disorders.
The angle between the variable axis and the dimension measures the strength of their association. The smaller the angle between them, the stronger the association. For example, the angle of the extraversion variable axis with the first (horizontal) dimension is relatively small compared to the angle of extraversion with the second dimension. This indicates that extraversion has a larger effect on the disorders represented on the first dimension (MDD, DYST, and GAD) than on the disorders presented on the second dimension (PD and SP). The angle of neuroticism with both dimensions is about equal, although a bit smaller with the first dimension, indicating that the relationship of neuroticism with the disorders on the first dimension (MDD, GAD, and DYST) is slightly stronger than with the other two disorders. The variable conscientiousness is highly correlated to the second dimension and not to the first as its variable axis is orthogonal to the first dimension.
Finally, there is a strong correlation between the estimates of the subject points in the two dimensions, corr(η i1 ,η i2 ) = 0.99, indicating that the distress and fear dimensions are strongly correlated.
We now present the results of MLD model that does not impose restriction on the class points, i.e., the "unrestricted" MLD model, to address specifically the extra information from this model. The regression esti- mates are shown in Table 4. The estimates obtained from the "unrestricted" MLD model are slightly different compared to results obtained from the "restricted" MLD model fitted on NESDA data (shown in Table 3). However, both results lead to the same conclusion about significance of predictors, which is also indicated by the "Wald" statistics displayed in the last column of both tables. The class points for MDD are fixed for identification on the first dimension, i.e. the coordinates are 0 for no MDD and 1 for MDD. Similarly, the coordinates of PD on the second dimension are fixed to 0 for absence and 1 for presence of the disorder. The other parameters are the class points, i.e., γ 02,1 = 0.96 and γ 12,1 = 1.73 for GAD; γ 03,1 = 1.10 and γ 13,1 = 1.99 for DYST; and, γ 05,2 = −0.25 and γ 15,2 = 1.28 for SP. The distance between the two category points is 0.77 for GAD, 0.89 for DYST, and 1.53 for SP. This unrestricted MLD model provides additional information about how well the predictors can discriminate between the response categories. According to equation (8), the effect of the predictor variables on each response is partially determined by the distance between class points of the response variable. The larger the distance between the class points of a response variable, the better the predictor variables are able to discriminates between the categories. In the fitted model, both DYST and GAD are posi-tioned on the first dimension; because the distance for DYST (0.89) is larger than the distance for GAD (0.77), the effect of the predictor variables on DYST is stronger.

Conclusion and Discussion
We proposed a multivariate logistic distance (MLD) model for analyzing multivariate binary data that extends existing marginal models in a distance framework. The distance model for a single response variable was extended to analyzing multivariate binary data in the presence of predictors. The advantage of the MLD model over existing marginal model for multivariate data, is the possibility for dimension reduction as a form of regularization which simplifies the complexity of standard multivariate GLM model because less parameters are estimated. Moreover, using this dimension reduction substantial theories can be represented and investigated.
We have shown applications of both the "restricted" and the "unrestricted" MLD models using an empirical data set. The former MLD model imposes a restriction on the class points and the latter model does not. The "restricted" MLD model is equivalent to a marginal model for multivariate binary data estimated using GEE method, which is an advantage for our model because existing software package developed for GEE can be adopted to fit the MLD model. For the unrestricted case, the MLD model is a general model and can be fitted by its own right. The general MLD model provides us with additional information about how well the predictors can discriminate between the categories of the response variable.
The MLD model has a clear interpretation where both the odds ratio expressions as well as the biplot representation can be used. The space in the biplot is partitioned into different regions that indicate the most probable response profile. It is important to note that the assumption of which response variables belong to which dimension has a crucial impact on which regions might occur. In a unidimensional model there are maximal 6 regions, whereas in the two dimensional solution in Figure 2 there are 12 regions. Having 5 response variables, a total of 32 different profiles can be defined. In a five dimensional model all these 32 profiles are present. Dimension reduction thus reduces the number of most probable response profiles. Moreover, the regions also account for comorbidity. In the solution of Figure 2 there is never a response profile where MDD is absent and DYST and GAD are present. Similarly, if PD is present then also SP is present in the response profile. Notice, however, that the model is a probability model not a deterministic model. So, a response profile is most probable but the model does not say that in that region only a profile must occur.
The effect size of predictor variables can be read from the biplot by the length of the variable axis. The longer the variable axis the stronger the effect. The differential effect on the two dimensions can be read from the angle of a variable axis with the dimension. The smaller the angle the stronger the effect. If a variable has a 90 o angle with a dimension, the variable has no effect on the disorders belonging to that dimension.
The MLD model is related to Canonical Correspondence Analysis (CCA), as proposed by Ter Braak (1986), which is a multivariate method used for ordination axes that maximizes the separation among the multivariate binary responses (Ter Braak, 1986;Ter Braak and Verdonschot, 1995). There are two main differences between CCA and our model. The first is that these models work in different framework, i.e., the MLD model in a logistic framework where as CCA in a Gaussian framework. Due to this difference, the MLD can provide a clear interpretation in terms of (log)-odds and probabilities. The second is that unlike in CCA, the MLD model can position responses (e.g., mental disorders) on certain dimensions driven by the theories that we would like to test.
In areas like psychology, epidemiology, criminology, economics, political sciences, etc, researchers often use Structural Equation Models (SEM) for the analysis of data similar to the NESDA data (Plewis, 1996;Von Oertzen et al., 2010;Spinhoven et al., 2013). Despite its popularity, SEM has limitations as it makes unverifiable assumptions about the underlying distributions of latent as well as observed variables. Moreover, SEM often suffers from improper solutions, non-convergent solutions, and the predicted factors are not determinate, i.e., for the same number of response variables multiple solutions can be obtained for the underlying latent variables. Therefore, they cannot be uniquely identified (Acito and Anderson, 1986;Boomsma and Hoogland, 2001;Hubbard et al., 2010). In the application section, we showed that the MLD model can be used for comparing theories of interest, without making unverifiable assumptions about underlying distributions. Asar and Ilk (2013) proposed marginal model with shared-parameter within the GEE method (Asar and Ilk, 2013). To compare with our MLD model, they use the five dimensional model where each response variable pertains to a unique dimension. Then, they incorporate equality restrictions for certain predictors over different dimensions, giving a so-called shared parameter. In the restricted MLD model the regression weights are shared for all response variables belonging to a specific dimension.
Although our focus was on binary data, the model can be extended to polytomous data. Where for binary data there are two class points on each dimension for polytomous data there are multiple class points. Interpretation follows largely the binary model, although in the ordinal case we can derive odds ratios for every contrast of two categories of a response variable. These are formed by the difference of class points coordinates, just like in the binary case. The polytomous situation, however, is often more complicated than the binary one. The binary model for every response variable is by definition unidimensional, which is not the case for polytomous data. Therefore, the polytomous case needs further study.
We developed a package (an alpha version) in R, the mldm package, for fitting the MLD model on multivariate binary data in the presence of predictors. The package handles both the clustered bootstrap method and the sandwich estimators for correcting standard errors of model parameters. The package provides a biplot function for the fitted model. We also have SAS scripts for fitting the models. The first author can provide the package or the script upon request.