1 Introduction

Researchers in many disciplines apply latent structure models in which observed variables are treated as indicators of an underlying latent variable that cannot be measured directly. A frequently used strategy in this context consists of three steps (Vermunt 2010). First, the parameters of the measurement model, which describes the relationship between the latent variable and its indicators, are estimated. Second, each respondent is assigned a latent score based on their scores on the indicators. Finally, the relationships between the latent scores and scores on exogenous variables are assessed.

Croon (2002) showed that for general latent structure models, such a strategy leads to inconsistent estimates of the parameters of the joint distribution of the latent variable and the exogenous variables. Bolck et al. (2004) discussed this problem in the context of latent class analysis, where observed variables are categorical. They also derived a correction procedure that produces consistent estimates, known as the BCH correction method. Subsequent simulation studies by Vermunt (2010), Bakk et al. (2013), Bakk and Vermunt (2016), and Nylund-Gibson and Masyn (2016) have demonstrated that this procedure produces unbiased parameter estimates and correct inference for a large range of simulation conditions. When the BCH correction method is applied with categorical exogenous variables, two problems can arise. First, negative cell proportion estimates can be obtained (Asparouhov and Muthén 2015). Second, the approach cannot deal with situations where marginals need to be constrained. An example is edit restrictions in official statistics, which fix certain marginals to zero (De Waal et al. 2012) and which have also been used in combination with latent class modelling (Boeschoten et al. 2017).

In this research note, the BCH method is extended to solve these two problems. We allow for linear equality and inequality constraints by noting that the correction method minimizes a quadratic loss function, and we give a closed-form solution for linear equality restrictions. Next, we demonstrate how solutions for inequality constraints may be obtained using numerical methods. We first discuss the three-step approach to the latent class model and the BCH correction method. We then show how to impose linear restrictions and how to extend this to non-negativity constraints. Finally, the extended BCH method is applied to a dataset from the Political Action Survey. In the Appendix, R code is given to apply the procedure.

2 The Three-Step Approach to the Latent Class Model and the BCH Correction Method

Let us denote a set of observed exogenous variables by Q and an unobserved latent variable by X. All variables involved are assumed to be categorical. Let \(\mathbf{Q} = (Q_{1},Q_{2},\ldots,Q_{J})\) be the Cartesian product of J different discrete random variables \(Q_{j}\). If the variable \(Q_{j}\) has \(n_{j}\) categories, the distribution of Q can be specified as a multinomial distribution with \(n={\prod }_{j = 1}^{J}n_{j}\) categories.

In the basic latent class model considered by Bolck et al. (2004), a single categorical latent variable X with m categories is introduced. The variable X itself is not directly observed but only indirectly via a set of indicator variables \(\mathbf{Y} = (Y_{1},Y_{2},\ldots,Y_{K})\). Let the joint distribution of the categorical variables Q, X, and Y be denoted by

$$\text{p}(\mathbf{{Q}}=\mathbf{{q}},{{X}}={x},\mathbf{{Y}}=\mathbf{{y}}) = \text{p}(\mathbf{{q}},{x},\mathbf{{y}}). $$

Then, a possible factorization is

$$\text{p}(\mathbf{{q}},{x},\mathbf{{y}}) = \text{p}(\mathbf{{q}})\text{p}({x}|\mathbf{{q}})\text{p}(\mathbf{{y}}|{x},\mathbf{{q}}). $$

Since in the basic latent class model Q is assumed to have no direct effect on Y, the latter result simplifies to

$$\begin{array}{@{}rcl@{}} \text{p}(\mathbf{{q}},{x},\mathbf{{y}}) &= & \text{p}(\mathbf{{q}})\text{p}({x}|\mathbf{{q}})\text{p}(\mathbf{{y}}|{x}). \end{array} $$

The three-step approach to the estimation of the parameters of the latent class model starts with the estimation of the parameters of the measurement model represented by the conditional probability distribution p(y|x). Once this estimation procedure is completed, individual research units may be assigned to one of the latent classes solely on the basis of their observed scores on Y. This defines the second step of the estimation procedure and results in an assignment of each individual to a latent class. If the random variable W represents the latent classes individuals are assigned to, and assignment is done using a modal rule where each individual is assigned to the class for which its posterior membership probability is the largest, this can be expressed as

$$ \text{p}({w}|\mathbf{{y}}) = \left\{\begin{array}{ll} 1 & \quad \text{if } \text{p}({X}={w}|\mathbf{{y}}) > \text{p}({X}={x}|\mathbf{{y}}) \ \forall {x} \neq {w}, \\ 0 & \quad \text{otherwise.} \end{array}\right. \tag{1} $$

Assignment rules other than the modal rule will yield a different form for Eq. 1. All subsequent results also apply to other assignment rules, such as proportional or random assignment (Bakk 2015).
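As a minimal sketch of the modal rule, assume we already have posterior class probabilities p(x|y) for each respondent. The function name `modal_assign`, the tie-breaking choice, and the numbers below are illustrative assumptions, not part of the original analysis.

```python
def modal_assign(posteriors):
    """Return the hard class assignment w for each respondent.

    posteriors[i][x] is the posterior probability p(x | y_i) of class x
    for respondent i. Ties go to the lowest class index, which is an
    assumption made here; the modal rule presumes a unique maximum.
    """
    return [max(range(len(p)), key=lambda x: p[x]) for p in posteriors]


# Hypothetical posteriors for three respondents and two latent classes.
posteriors = [[0.9, 0.1], [0.3, 0.7], [0.55, 0.45]]
w = modal_assign(posteriors)
print(w)  # 0-based class labels: [0, 1, 0]
```

Proportional assignment would instead keep the full posterior vector for each respondent, which is why the form of Eq. 1 depends on the rule chosen.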

Since Y and Q are conditionally independent given X, so are W and Q and the conditional distributions are related by

$$\text{p}({w}|\mathbf{q})=\sum\limits_{{x}= 1}^{m}\text{p}({w}|{x})\text{p}({x}|\mathbf{{q}}). $$

In terms of the joint distribution, this becomes

$$\text{p}(\mathbf{{q}},{w})=\sum\limits_{{x}= 1}^{m}\text{p}(\mathbf{{q}},{x})\text{p}({w}|{x}). $$

The latter result can be recast as a matrix equation

$$\mathbf{{E}}=\mathbf{{AD}}, $$

with the elements of the three matrices defined as \(e_{qw} = \text{p}(\mathbf{q},w)\), \(a_{qx} = \text{p}(\mathbf{q},x)\), and \(d_{xw} = \text{p}(w|x)\). After completing the first and the second estimation steps, the elements of the matrices E and D are known. The joint distribution of Q and the latent variable X is then given by

$$\mathbf{{A}}=\mathbf{{ED}}^{-1}. $$

Here, it is assumed that matrix D is not singular so that its inverse exists (see Bolck et al. (2004, pp. 13–14) for a discussion on when this assumption may be violated). A consistent estimate of A is \(\hat {\mathbf {E}}\hat {\mathbf {D}}^{-1}\).
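A small numerical sketch may help to see why the correction is needed. The 2 × 2 matrices below are hypothetical: starting from an assumed true joint distribution `A_true` and assumed classification probabilities `D`, the implied table E differs from `A_true`, while E multiplied by the inverse of D recovers it.

```python
def inv2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("D is singular; the correction is undefined")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(X, Y):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


# Hypothetical true joint distribution p(q, x): rows are q, columns are x.
A_true = [[0.30, 0.10], [0.20, 0.40]]
# Hypothetical classification probabilities p(w | x): rows are x, columns are w.
D = [[0.9, 0.1], [0.2, 0.8]]

E = matmul(A_true, D)        # joint distribution p(q, w) implied by E = AD
A_bch = matmul(E, inv2(D))   # corrected table A = E D^{-1}

print("E     =", E)      # what an uncorrected third step would analyse
print("A_bch =", A_bch)  # agrees with A_true up to rounding error
```

In practice E and D are replaced by their sample estimates, so the recovered table is only a consistent estimate rather than an exact reconstruction.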

The previously obtained algebraic solution for matrix A can also be derived via a rather trivial minimization of a least squares function. Let E and D be matrices with known elements. Matrix E is of order n × m and D is an invertible matrix of order m × m. Let A be an n × m matrix of unknown elements and consider the following least squares function:

$$\varphi=\frac{1}{2}\text{tr}(\mathbf{{AD}}-\mathbf{{E}})^{\prime}(\mathbf{{AD}}-\mathbf{{E}}). $$

Minimizing φ with respect to the unknown matrix A yields \(\mathbf{A}=\mathbf{ED}^{-1}\), for which φ attains its global minimum value of zero. Note that the factor 1/2 is introduced to obtain simpler expressions for the first derivatives. Its introduction does not change the solution of the minimization problem.
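The stationarity condition behind this result can be written out as a short worked step, using the standard matrix derivative of a squared Frobenius norm and the invertibility of D:

$$\frac{\partial \varphi}{\partial \mathbf{A}} = (\mathbf{AD}-\mathbf{E})\mathbf{D}^{\prime} = \mathbf{0} \quad\Longrightarrow\quad \mathbf{AD}=\mathbf{E} \quad\Longrightarrow\quad \mathbf{A}=\mathbf{ED}^{-1}. $$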

3 The Correction Procedure Under Linear Equality Constraints

In some applications, simple linear restrictions may be imposed on the elements of matrix A. For instance, some of the probabilities in the joint distribution of Q and X may be set equal to zero, for example for combinations of Q and X that cannot occur in practice. After imposing such zero constraints, all the non-zero cell probabilities should still add to one. The quadratic loss function φ can be minimized under equality constraints on the unknown elements of matrix A by applying the method of Lagrangian multipliers.

We first rewrite the quadratic loss function φ in the following way using vectorization operations on matrices (see Schott 1997, pp. 261–266). For the vector of residuals r, we obtain

$$\begin{array}{@{}rcl@{}} {\mathbf{r}} &= &\text{vec}({\mathbf{AD}}-{\mathbf{E}}) \\ &=& \text{vec}({\mathbf{I}}_{n\times n}{\mathbf{AD}})-\text{vec}({\mathbf{E}}), \end{array} $$

where \(\mathbf{I}_{n\times n}\) is an n × n identity matrix. Applying Theorem 7.15 from Schott (1997, p. 263) yields

$${\mathbf{r}} =({\mathbf{D}}^{\prime}\otimes{\mathbf{I}}_{n\times n})\cdot\text{vec}({\mathbf{A}})-\text{vec}({\mathbf{E}}), $$

in which ⊗ is the Kronecker product of two matrices (Graham 1982). Defining \({\mathbf {P}}={\mathbf {D}}^{\prime }\otimes {\mathbf {I}}_{n\times n}\), a = vec(A) and e = vec(E), we are able to write

$$\mathbf{r} = {\mathbf{Pa}}-{\mathbf{e}}, $$

so that the least squares function becomes

$$\begin{array}{@{}rcl@{}} \varphi &=& \frac{1}{2}{\mathbf{r}}^{\prime}{\mathbf{r}}\\ &=& \frac{1}{2}({\mathbf{a}}^{\prime}{\mathbf{P}}^{\prime}{\mathbf{P}} {\mathbf{a}}-2{\mathbf{e}}^{\prime}{\mathbf{Pa}}+{\mathbf{e}}^{\prime}{\mathbf{e}}). \end{array} $$

The completely unconstrained solution to the minimization problem is given by

$$\mathbf{a}_{0} = ({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}\cdot{\mathbf{P}}^{\prime}{\mathbf{e}}. $$
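The vectorization step can be checked numerically. The sketch below uses hypothetical 3 × 2 and 2 × 2 matrices and plain-Python helpers (all names are illustrative) to confirm that the Kronecker form applied to vec(A) reproduces vec(AD), with vec stacking columns.

```python
def vec(M):
    """Stack the columns of M (a list of rows) into one flat list."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def kron(X, Y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in xrow for y in yrow] for xrow in X for yrow in Y]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def identity(k):
    return [[float(i == j) for j in range(k)] for i in range(k)]


# Hypothetical example with n = 3 categories of Q and m = 2 latent classes.
A = [[0.25, 0.10], [0.20, 0.30], [0.05, 0.10]]
D = [[0.9, 0.1], [0.2, 0.8]]

P = kron(transpose(D), identity(3))   # P = D' (x) I, of order 6 x 6
lhs = matvec(P, vec(A))               # P a
rhs = vec(matmul(A, D))               # vec(AD)

print(max(abs(l - r) for l, r in zip(lhs, rhs)))  # zero up to rounding error
```

Writing the residual as Pa − e is what makes the problem an ordinary linear least squares problem in the vector a.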

Now suppose that the S linear equality constraints can be represented by a matrix equation

$$\mathbf{Ha} = {\mathbf{c}}. $$

The matrix H is of order S × N, with N = nm being the number of cells in matrices A and E. We may assume that H is of rank S; otherwise, the linear equality constraints would not be linearly independent. To minimize the least squares function φ under a set of S linear constraints on the elements of A, the Lagrangian is defined as

$$ L = \varphi-\boldsymbol{\lambda}^{\prime}({\mathbf{Ha}}-{\mathbf{c}}). \tag{2} $$

Setting the first derivatives of L with respect to a equal to the zero vector, and solving for a yields:

$$\mathbf{a} = ({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}({\mathbf{P}}^{\prime}{\mathbf{e}}+{\mathbf{H}}^{\prime}\boldsymbol{\lambda}), $$

which can be rewritten as:

$$\mathbf{a} ={\mathbf{a}}_{0}+({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}{\mathbf{H}}^{\prime}\boldsymbol{\lambda}. $$

Solving for the unknown Lagrangian multipliers by taking the derivative of the Lagrangian (Eq. 2) with respect to \(\boldsymbol{\lambda}\) and setting it to zero, or equivalently by imposing the linear constraints \(\mathbf{Ha}-\mathbf{c}=\mathbf{0}\), yields:

$$\boldsymbol{\lambda} = [{\mathbf{H}}({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}{\mathbf{H}}^{\prime}]^{-1}({\mathbf{c}}-{\mathbf{Ha}}_{0}). $$

The final solution for a is therefore:

$${\mathbf{a}} = {\mathbf{a}}_{0}+({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}{\mathbf{H}}^{\prime}[{\mathbf{H}}({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}{\mathbf{H}}^{\prime}]^{-1}({\mathbf{c}}-{\mathbf{Ha}}_{0}). $$

Note that the vector \(\mathbf{c}-\mathbf{Ha}_{0}\) represents the deviations of the unconstrained solution from the linear equality constraints. Again, a consistent estimate of a can be obtained by replacing P and \(\mathbf{a}_{0}\) with their sample estimates.
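The closed-form solution can be sketched in a few lines of plain Python. Everything below is illustrative: the helper names, the function `constrained_bch`, and the 2 × 2 input tables are assumptions, and a real analysis would normally rely on a linear algebra library rather than these hand-written routines.

```python
def transpose(M):
    return [list(c) for c in zip(*M)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*Y)] for r in X]


def kron(X, Y):
    return [[x * y for x in xr for y in yr] for xr in X for yr in Y]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    k = len(M)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(M)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def constrained_bch(E, D, H, c):
    """Minimise 0.5 * ||Pa - e||^2 subject to Ha = c; returns a = vec(A)."""
    n = len(E)
    I_n = [[float(i == j) for j in range(n)] for i in range(n)]
    P = kron(transpose(D), I_n)                                  # P = D' (x) I
    e = [[E[i][j]] for j in range(len(E[0])) for i in range(n)]  # vec(E) as a column
    PtP_inv = inverse(matmul(transpose(P), P))
    a0 = matmul(PtP_inv, matmul(transpose(P), e))                # unconstrained solution
    G = matmul(PtP_inv, transpose(H))                            # (P'P)^{-1} H'
    gap = [[ci - row[0]] for ci, row in zip(c, matmul(H, a0))]   # c - H a0
    lam = matmul(inverse(matmul(H, G)), gap)                     # Lagrange multipliers
    corr = matmul(G, lam)
    return [x[0] + y[0] for x, y in zip(a0, corr)]


# Hypothetical 2 x 2 tables; vec order is (a11, a21, a12, a22).
E = [[0.29, 0.11], [0.26, 0.34]]
D = [[0.9, 0.1], [0.2, 0.8]]
H = [[1.0, 1.0, 1.0, 1.0],   # all cells sum to one
     [0.0, 0.0, 1.0, 0.0]]   # cell a12 is fixed to zero
c = [1.0, 0.0]

a = constrained_bch(E, D, H, c)
print(a)
```

Both constraints hold exactly (up to rounding) in the returned vector, but nothing in this closed form prevents other cells from becoming negative, which motivates the inequality-constrained version.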

4 The Correction Procedure Under Linear Equality and Inequality Constraints

A second issue with the BCH procedure is that in finite samples the consistent estimate \({\hat {\mathbf {A}}}\) may contain negative values. This issue is similar to the occurrence of Heywood cases in factor analysis (Heywood 1931). Such negative values in the probability table estimate \({\hat {\mathbf {A}}}\) may prevent subsequent analyses. We suggest preventing such inadmissible solutions by imposing inequality constraints. The resulting minimization problem is a quadratic program that can be solved by an iterative method.

Such a numerical iterative method for an equality and inequality constrained minimization of a quadratic function has been described by Goldfarb and Idnani (1983). Their numerical algorithm solves the quadratic programming problem of the form

$${\min}\left( \frac{1}{2}{\mathbf{b}}^{\prime}\mathbf{{D}}_{\text{mat}}\mathbf{b}-{\mathbf{d}}_{\text{vec}}^{\prime}{\mathbf{b}}\right), $$

subject to the constraints

$${\mathbf{H}}^{\prime}{\mathbf{b}} \geq{\mathbf{b}}_{0}, $$

with respect to the n unknown parameters in vector b. The matrix \(\mathbf{D}_{\text{mat}}\) is a given n × n symmetric positive definite matrix, whereas \(\mathbf{d}_{\text{vec}}\) is a given n × 1 vector.

To apply the Goldfarb-Idnani optimization procedure in the present context, the following definitions are needed. First, we make use of Theorem 7.6 from Schott (1997, p. 254) to obtain

$$\begin{array}{@{}rcl@{}} \mathbf{D}_{\text{mat}} &=& {\mathbf{P}}^{\prime}{\mathbf{P}}\\ &=&({\mathbf{DD}}^{\prime})\otimes{\mathbf{I}}_{n\times n}, \end{array} $$

and

$$\mathbf{d}_{\text{vec}}= {\mathbf{P}}^{\prime}{\mathbf{e}}. $$

Since it is assumed that matrix D is of full rank, the matrix \({\mathbf {P}}^{\prime }{\mathbf {P}}\) is positive definite. This ensures that the quadratic loss function φ is strictly convex. Moreover, the equality and inequality constraints considered here (the elements of matrix A sum to 1, all elements are ≥ 0, and some are fixed to 0) define a convex region in the parameter space.
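For completeness, the Kronecker form of \({\mathbf {P}}^{\prime }{\mathbf {P}}\) follows from the transpose and mixed-product rules for Kronecker products, written out here as a worked step:

$${\mathbf{P}}^{\prime}{\mathbf{P}} = ({\mathbf{D}}^{\prime}\otimes{\mathbf{I}}_{n\times n})^{\prime}({\mathbf{D}}^{\prime}\otimes{\mathbf{I}}_{n\times n}) = ({\mathbf{D}}\otimes{\mathbf{I}}_{n\times n})({\mathbf{D}}^{\prime}\otimes{\mathbf{I}}_{n\times n}) = ({\mathbf{DD}}^{\prime})\otimes{\mathbf{I}}_{n\times n}. $$

The same rules give \(({\mathbf{P}}^{\prime}{\mathbf{P}})^{-1}{\mathbf{P}}^{\prime} = ({\mathbf{D}}^{\prime})^{-1}\otimes{\mathbf{I}}_{n\times n}\), so that the unconstrained solution reduces to \(\text{vec}({\mathbf{ED}}^{-1})\), in agreement with the algebraic BCH solution.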

To represent the constraints on the cell probabilities, we now define matrix H in such a way that the first row of H has all its elements equal to 1. This row represents a constraint on the sum of all cell probabilities. We represent this row vector as matrix \(\mathbf{H}_{0}\). Let \(J = \{1,2,3,\ldots,N\}\) be an index set corresponding to the column numbers of matrix H. This index set can be partitioned into two non-overlapping subsets \(J_{1}\) and \(J_{2}\):

  • Subset \(J_{1}\) contains the indices of the elements of vector a which are set exactly equal to zero: for those indices j we require \(a_{j} = 0\);

  • Subset \(J_{2}\) contains the indices of the elements of vector a which are required to be non-negative: for those indices j we require \(a_{j} \geq 0\).

Now let \(\mathbf{I}_{N}\) be an N × N identity matrix and permute the rows of this matrix so that the upper part contains the rows corresponding with the index numbers in \(J_{1}\), and the lower part of the permuted identity matrix contains the rows corresponding with the index numbers in \(J_{2}\). Referring to the two parts of the permuted identity matrix as \(\mathbf{H}_{1}\) and \(\mathbf{H}_{2}\), respectively, the matrix H is obtained by

$${\mathbf{H}} =\left( \begin{array}{c} {\mathbf{H}}_{0} \\ {\mathbf{H}}_{1} \\ {\mathbf{H}}_{2} \end{array}\right), $$

where H is used to obtain the final solution for a. The rows \(\mathbf{H}_{0}\) and \(\mathbf{H}_{1}\) are imposed as equalities and the rows of \(\mathbf{H}_{2}\) as inequalities. If no cells need to be fixed to zero and only the non-negativity constraints are of interest, we simply omit \(\mathbf{H}_{1}\). Vector \(\mathbf{b}_{0}\) is of length N + 1, with its first element equal to 1 and all the remaining elements equal to 0.

With this procedure, we are able to find a solution for A (the joint distribution of latent variable X and exogenous covariates Q) where the sum of the elements is equal to 1, where no negative elements are created, and where impossible combinations of scores can be set to have a probability of zero. Having defined \(\mathbf{D}_{\text{mat}}\), \(\mathbf{d}_{\text{vec}}\), H and \(\mathbf{b}_{0}\), with b = a, the solution can be obtained using standard software for quadratic programming, such as the R package quadprog (Turlach and Weingessel 2013).
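The construction of H and \(\mathbf{b}_{0}\) described above can be sketched as follows. The function name `build_constraints` and the use of 0-based indices are assumptions made for illustration. The returned count `meq` is the number of leading rows to be treated as equalities, which is how solvers such as `solve.QP` in quadprog distinguish equality from inequality rows; that routine expects the transpose of H as its constraint matrix.

```python
def build_constraints(N, zero_cells):
    """Build H (rows H0, H1, H2), b0 and the number of equality rows.

    N          : number of cells in A, i.e. the length of a = vec(A).
    zero_cells : 0-based indices J1 of cells fixed to zero; all other
                 cells form J2 and are constrained to be non-negative.
    """
    J1 = sorted(set(zero_cells))
    J2 = [j for j in range(N) if j not in J1]

    def unit_row(j):
        """Row j of the N x N identity matrix."""
        return [1.0 if k == j else 0.0 for k in range(N)]

    H0 = [[1.0] * N]                   # sum of all cells equals 1
    H1 = [unit_row(j) for j in J1]     # a_j = 0 for j in J1
    H2 = [unit_row(j) for j in J2]     # a_j >= 0 for j in J2
    H = H0 + H1 + H2
    b0 = [1.0] + [0.0] * N             # right-hand side, length N + 1
    meq = 1 + len(J1)                  # H0 and H1 are equalities
    return H, b0, meq


# Hypothetical use: a 3 x 4 table (N = 12) with the last cell fixed to zero.
H, b0, meq = build_constraints(12, zero_cells=[11])
print(len(H), len(b0), meq)  # 13 rows, 13 right-hand sides, 2 equalities
```

Omitting `zero_cells` entries leaves only the sum constraint as an equality, which corresponds to dropping \(\mathbf{H}_{1}\).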

5 Application

As an illustration, the extended BCH method is applied to a dataset from the Political Action Survey (Barnes et al. 1979; Jennings and Van Deth 1990). The dataset consists of five dichotomous indicators on political involvement and tolerance (“System Responsiveness”; “Ideological Level”; “Repression Potential”; “Protest Approval”; “Conventional Participation”) and three nominal covariates (“Sex”; “Level Of Education”; “Age”). This dataset has previously been used in Hagenaars (1993) and Vermunt and Magidson (2000) and in the Latent GOLD user’s manual (Vermunt and Magidson 2005). The dataset as well as the syntax used in this illustration can be found in Latent GOLD version 5.1 under “syntax examples” → LCA → restrictions → equalities → Model C.

In the first step, a four-class restricted model is applied to distinguish between four latent classes on involvement and tolerance. In this model, response probabilities are restricted to be equal for the items “System Responsiveness” and “Conventional Participation,” and the response probability for the variable “Ideological Level” is fixed to 0 by specifying a logit of 100.

In the second step, cases are assigned to a latent class by using modal assignment, resulting in the imputed latent variable W. In the third step, the relationship between the imputed latent variable “Involvement And Tolerance” (W ) and exogenous covariate “Age” (Q) is investigated. The E-matrix containing the joint probabilities of these variables is:

$$\mathbf{E} = \begin{array}{l|cccc} & W_{1} & W_{2} & W_{3} & W_{4} \\ \hline Q_{\text{16-34}} & 0.05795848 & 0.15743945 & 0.01643599 & 0.09256055 \\ Q_{\text{35-57}} & 0.08477509 & 0.17560554 & 0.05276817 & 0.03979239 \\ Q_{\text{58-91}} & 0.12802768 & 0.10034602 & 0.06920415 & 0.02508651 \end{array}. $$

The D-matrix describing the relationship between the imputed latent variable “Involvement And Tolerance” (W) and the latent variable “Involvement And Tolerance” (X) is also obtained:

$$\mathbf{D} = \begin{array}{l|cccc} & W_{1} & W_{2} & W_{3} & W_{4} \\ \hline X_{1} & 0.67389148 & 0.1570985 & 0.02678610 & 0.1422239 \\ X_{2} & 0.01898361 & 0.7891416 & 0.05879905 & 0.1330757 \\ X_{3} & 0.17186997 & 0.2725275 & 0.54176422 & 0.0138383 \\ X_{4} & 0.12184782 & 0.3220914 & 0.01975761 & 0.5363031 \end{array}. $$

The BCH method can now be applied by computing \(\mathbf{ED}^{-1}\), resulting in the A matrix:

$$\mathbf{A}_{\text{unconstrained}} = \begin{array}{l|cccc} & X_{1} & X_{2} & X_{3} & X_{4} \\ \hline Q_{\text{16-34}} & 0.0577223 & 0.13465976 & 0.008359502 & 0.123652898 \\ Q_{\text{35-57}} & 0.1018944 & 0.17635045 & 0.073167182 & 0.001529175 \\ Q_{\text{58-91}} & 0.1618782 & 0.06157076 & 0.113576159 & -0.014360760 \end{array}. $$

As can be seen, this result is inadmissible since the cell \(Q_{\text{58-91}} \times X_{4}\) contains a negative value. Therefore, it is not possible to estimate posterior membership probabilities or to carry out subsequent analyses with this matrix.
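The unconstrained computation can be retraced with the E and D matrices reported above. The sketch below is written in plain Python with hand-written helpers (an illustrative choice, not the Appendix code) and flags any negative cells in the resulting table.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*Y)] for r in X]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    k = len(M)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(M)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


# E: rows are age groups 16-34, 35-57, 58-91; columns are W1..W4.
E = [
    [0.05795848, 0.15743945, 0.01643599, 0.09256055],
    [0.08477509, 0.17560554, 0.05276817, 0.03979239],
    [0.12802768, 0.10034602, 0.06920415, 0.02508651],
]
# D: rows are X1..X4; columns are W1..W4.
D = [
    [0.67389148, 0.1570985, 0.02678610, 0.1422239],
    [0.01898361, 0.7891416, 0.05879905, 0.1330757],
    [0.17186997, 0.2725275, 0.54176422, 0.0138383],
    [0.12184782, 0.3220914, 0.01975761, 0.5363031],
]

A = matmul(E, inverse(D))   # unconstrained BCH solution
for label, row in zip(("16-34", "35-57", "58-91"), A):
    print(label, [round(v, 6) for v in row])

negative = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v < 0]
print("negative cells (0-based row, column):", negative)
```

Because the inputs are printed to limited precision, the recomputed cells can differ from the reported ones in the last digits.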

When the extended BCH method is applied, the following constrained A matrix is obtained:

$$\mathbf{A}_{\text{constrained}} = \begin{array}{l|cccc} & X_{1} & X_{2} & X_{3} & X_{4} \\ \hline Q_{\text{16-34}} & 0.05741718 & 0.13472999 & 0.007627791 & 0.1229631559 \\ Q_{\text{35-57}} & 0.10158926 & 0.17642067 & 0.072435471 & 0.0008394325 \\ Q_{\text{58-91}} & 0.15689781 & 0.05436459 & 0.114714655 & 0.0000000000 \end{array}. $$

The cell \(Q_{\text{58-91}} \times X_{4}\) no longer contains a negative value, so this matrix can now be used to estimate posterior membership probabilities and to carry out subsequent analyses.

Since all combinations of scores on “Involvement And Tolerance” and “Age” can occur in practice, there is no need to fix any cells to zero in this example.

6 Conclusion

We have modified the BCH method to include linear equality and inequality constraints, solving the problem of negative solutions and allowing for restrictions on arbitrary cell margins. With these adjustments, analysts interested in relating covariates to assignments on latent class variables will now be able to, for example, impose edit restrictions, further analyse solutions that were previously inadmissible, and analyse datasets involving more complex marginal restrictions. The application demonstrates that when a negative value is obtained using the regular BCH method, this can be solved by using the extended BCH method. In the Appendix, R code is given to apply the extended BCH method, and an addition to the example is given that demonstrates how margins can be fixed to zero using the extended BCH method.