In social, behavioral, and educational assessments, researchers and practitioners commonly adopt measurement tools to collect data from respondents sampled from the population of interest. These tools take various forms: standardized tests used in academic aptitude measurement, surveys deployed in consumer satisfaction research, or self-rating scales utilized in psychological symptom diagnosis, for example. Among all measurement tools, instruments comprising multiple dichotomous items are the most prevalent, owing to their long history of use and convention (Crocker & Algina, 1986; Cronbach, 1951). A dichotomous item elicits binary responses, such as true–false answers, present–absent symptoms, or endorsed–nonendorsed attitudes. Each item should be carefully created to measure one or more latent (unobservable) variables of interest based on particular theories. To illustrate, addition, subtraction, multiplication, and division are four latent variables defined in a math ability assessment, where test items such as “2 + 4 – 1” measure the first two latent variables, and items such as “4 × 2/3” measure the last two. Table 1 displays the general form, called a Q-matrix (also called an expert matrix), which aligns the items with the latent variables in accordance with a theoretical framework or content experts’ judgment.

To obtain latent-variable information at the respondent level, granted that the Q-matrix has been scientifically verified, one usually chooses a statistical modeling strategy such as classical test theory (CTT) or item response theory (IRT; Lord, 1980). CTT and IRT treat the latent variables as continua, which are conventionally assumed to follow a (multivariate) normal distribution. Recent advances in analytic methods, collectively referred to as diagnostic classification models (DCMs; Rupp & Templin, 2008) or cognitive diagnostic models (CDMs; DiBello & Stout, 2007), instead assume that the latent variables are defined on a binary scale. As compared with CTT and IRT, DCMs have higher reliability and can offer rich diagnostic information to aid decision making. In particular, DCM reliability is computed in a unique way, as a function of the tetrachoric correlation of the replication contingency table formed from posterior attribute-mastery probability estimates (see Templin & Bradshaw, 2014, for computing details). This follows the reliability definition designed for categorical latent traits; the higher reliability yielded by DCMs is a consequence of the smaller range of latent-variable values that examinee estimates can take in DCMs (Liu, Qian, Luo, & Woo, 2017; Rupp & Templin, 2008). In addition, DCMs offer high interpretability for a latent-variable structure, which can be used to support or verify a given theory.

To date, researchers across different fields have adopted DCMs because of these advantages. For example, Seixas, Zadrozny, Laks, Conci, and Saade (2014) used DCMs to study dementia and Alzheimer’s disease; Dakanalis, Timko, Clerici, Zanetti, and Riva (2014) implemented DCMs to investigate eating disorders; and Clark and Wells (1995) modeled social phobia via a DCM framework. In sum, by analyzing responses within a DCM framework, one can obtain information not only about each respondent’s latent-variable profile, but also about the features of the assessment items.
To differentiate them from the continuous latent variables of the CTT and IRT frameworks, latent variables on a binary scale are called “attributes” throughout this article.

Table 1 A Q-matrix sample

Although DCMs provide an innovative and useful angle for analyzing response data, the attribute estimates can be questionable in four situations. The first situation in which the attribute estimates are not sufficiently accurate is model misspecification. For instance, when responses were generated from a saturated model (Footnote 1) and/or a model with a hierarchical attribute structure (Footnote 2), a simpler and/or misspecified model can yield inappropriate estimates (Liu, 2017; Templin & Bradshaw, 2014). The second situation is an inaccurate specification of the Q-matrix. The Q-matrix strongly influences both the item parameters and the attribute estimates (Rupp & Templin, 2008); misspecified Q-matrices are therefore detrimental, since the modeling procedures then run the risk of starting from an incorrect base. The third situation arises when the sample size is small: Chiu and Douglas (2013) proved that with small samples, traditional estimation approaches do not assign attributes accurately. The fourth situation, which is the focus of this article, is the inadequacy of the estimation approaches per se. The sample-size problem in the third situation can be regarded as a special case of this issue; more broadly, estimation problems are a matter of algorithmic robustness, and estimation methods that perform well only under limited conditions fall into this category. In sum, this problem stems from the limitations of the traditional estimation approaches; to remedy it, this article proposes an innovative estimation approach called iterative latent-class analysis. The idea is to construct a more robust attribute estimator such that, under many conditions, it produces more accurate results than the traditional approaches do.

Log-linear cognitive diagnostic models

Within the DCM literature, assumptions about how skills (or cognitive/psychological processes) affect test performance are realized through the selection of a specific DCM. Depending on the specific DCM chosen, the modeling family can be defined as either noncompensatory or compensatory (Henson, Templin, & Willse, 2009). Noncompensatory models assume that one cannot “make up” for nonmastery of some attributes through mastery of others; compensatory models, in contrast, allow an individual to compensate for a lacking skill with a mastered one. Either set of assumptions can be hard to test. The log-linear CDM (LCDM; Henson et al., 2009) and similar general models, however, avoid such strong assumptions. To introduce the LCDM, this section starts with a general description of a DCM.

Given that a DCM is essentially a constrained latent-class model (Day, 1969; Titterington, Smith, & Makov, 1985; Wolfe, 1970), mathematically it can be defined as

$$ P\left({\boldsymbol{Y}}_p={\boldsymbol{y}}_p\right)=\sum \limits_{c=1}^C\left({v}_c\prod \limits_{i=1}^I{\pi}_{ic}^{y_{pi}}{\left(1-{\pi}_{ic}\right)}^{1-{y}_{pi}}\right), $$
(1)

where yp = (yp1, yp2, … , ypI) is the binary response vector of person p on a test comprising I items, and element ypi is the response to item i. The variable vc is the probability of membership in latent class c, and πic is the probability of a correct response to item i for a respondent in class c. Converted from Eq. 1, the log-likelihood function for the model of N persons can be expressed as

$$ L=\sum \limits_{p=1}^N\log \left\{\sum \limits_{c=1}^C\left({v}_c\prod \limits_{i=1}^I{\pi}_{ic}^{y_{pi}}{\left(1-{\pi}_{ic}\right)}^{1-{y}_{pi}}\right)\right\}. $$
(2)

Furthermore, Eq. 2 can be converted to

$$ L=\sum \limits_{p=1}^N\log \left\{\sum \limits_{c=1}^C\left(\exp \left(\log \left({v}_c\right)+\log \left[\prod \limits_{i=1}^I{\pi}_{ic}^{y_{pi}}{\left(1-{\pi}_{ic}\right)}^{1-{y}_{pi}}\right]\right)\right)\right\} $$
(3)

where \( \log \left[\prod \limits_{i=1}^I{\pi}_{ic}^{y_{pi}}{\left(1-{\pi}_{ic}\right)}^{1-{y}_{pi}}\right] \) is replaced with \( \sum \limits_{i=1}^I\log \left[{\pi}_{ic}^{y_{pi}}{\left(1-{\pi}_{ic}\right)}^{1-{y}_{pi}}\right] \), because the logarithm of a product equals the sum of the logarithms of its factors.
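To make Eqs. 2 and 3 concrete, here is a minimal R sketch of the marginal log-likelihood. The function name and all objects (an N × I binary response matrix Y, a class-probability vector v, and an I × C matrix pi of class-conditional correct-response probabilities) are illustrative assumptions, not part of any package.

```r
# Marginal log-likelihood of a latent-class model (Eq. 2).
# Y:  N x I binary response matrix
# v:  length-C vector of class probabilities
# pi: I x C matrix; pi[i, c] = P(correct on item i | class c)
loglik_lca <- function(Y, v, pi) {
  C <- length(v)
  ll <- 0
  for (p in 1:nrow(Y)) {
    # likelihood of person p's responses under each class (inner product of Eq. 2)
    class_lik <- sapply(1:C, function(c)
      prod(pi[, c]^Y[p, ] * (1 - pi[, c])^(1 - Y[p, ])))
    ll <- ll + log(sum(v * class_lik))
  }
  ll
}
```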

Assuming that the number of attributes is D, the mastery status of the attributes for a random person is denoted by α = (α1, α2, … , αD), where each element of α is binary (i.e., the ath attribute, αa, is either 1 or 0). In total, there are 2^D possible attribute profiles (i.e., classes). To illustrate, a person p with profile α = (1, 1, 1, 0) has mastered all but the last of four attributes. Recent advances in modeling development have produced general diagnostic model variants, for example, the LCDM and, equivalently, the generalized deterministic input, noisy “and” gate model (G-DINA; de la Torre, 2011). Throughout this article, the general diagnostic model is referred to as the LCDM, for the sake of consistency. The LCDM provides great flexibility: it subsumes most core DCMs, accommodates both additive and nonadditive relationships between skills and items simultaneously, and can be combined with other psychometric models to allow even greater insight. Rupp, Templin, and Henson (2010, p. 163) showed that LCDMs can be converted into core DCMs such as the deterministic input, noisy “and” gate model (DINA; Junker & Sijtsma, 2001); the noisy input, deterministic “and” gate model (NIDA; Junker & Sijtsma, 2001); and the reparameterized unified model (RUM; Hartz, 2002), whereas examples of disjunctive models include the deterministic input, noisy “or” gate model (DINO; Templin & Henson, 2006). Given these advantages, the LCDM is introduced here and will be used in the following sections.

As is illustrated in Table 1, a Q-matrix of size I × D is necessary for an LCDM, where, again, I and D are the numbers of items and attributes, respectively. The (i, a) element of the Q-matrix, qia, is 1 when item i measures attribute a, and 0 otherwise. Given that person p’s attribute profile is αc, the conditional probability of a correct response to item i can be stated as

$$ {\pi}_{ic}=P\left({y}_{pi}=1\mid {\boldsymbol{\alpha}}_c\right)=\frac{\exp \left({\lambda}_{i,0}+{\boldsymbol{\lambda}}_i^T\boldsymbol{h}\left({\boldsymbol{\alpha}}_c,{\boldsymbol{q}}_i\right)\right)}{1+\exp \left({\lambda}_{i,0}+{\boldsymbol{\lambda}}_i^T\boldsymbol{h}\left({\boldsymbol{\alpha}}_c,{\boldsymbol{q}}_i\right)\right)}, $$
(4)

where qi is the vector of Q-matrix entries for item i, λi,0 is the intercept parameter, λi is a (2^D − 1) × 1 vector containing the main-effect and interaction-effect parameters for item i, and h(αc, qi) is a vector of size (2^D − 1) comprising combinations of αc and qi. In particular, \( {\boldsymbol{\lambda}}_i^T\boldsymbol{h}\left({\boldsymbol{\alpha}}_c,{\boldsymbol{q}}_i\right) \) can be expanded to

$$ {\boldsymbol{\lambda}}_i^T\boldsymbol{h}\left({\boldsymbol{\alpha}}_c,{\boldsymbol{q}}_i\right)=\sum \limits_{a=1}^D{\lambda}_{i,1,(a)}{\alpha}_{ca}{q}_{ia}+\sum \limits_{a=1}^{D-1}\sum \limits_{a^{\prime }>a}^D{\lambda}_{i,2,\left(a,{a}^{\prime}\right)}{\alpha}_{ca}{\alpha}_{c{a}^{\prime }}{q}_{ia}{q}_{i{a}^{\prime }}+\cdots, $$
(5)

where λi,1,(a) and \( {\lambda}_{i,2,\left(a,{a}^{\prime}\right)} \) are, respectively, the main effect for the ath attribute αa and the two-way interaction effect for αa and \( {\alpha}_{a^{\prime }} \). Since the elements of αc and qi are binary, h(αc, qi) contains binary elements that indicate which effects are the estimates of interest. For an item measuring n attributes, interaction effects up to the n-way term should be specified in h(αc, qi). Table 2 provides a three-item sample for a measure with three attributes: the first item, measuring only one attribute (i.e., α1), has two estimates, whereas the third item, associated with all three attributes, contains eight estimates.

Table 2 A three-item sample of expressions for a log-linear cognitive diagnosis model
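To make Eqs. 4 and 5 concrete, the sketch below builds the (2^D − 1)-element vector h(αc, qi) for a single item and maps the kernel through the logistic function. The function name and the example λ values are illustrative assumptions, not taken from Table 2 or from any package.

```r
# Item response probability under the LCDM (Eqs. 4 and 5) for one item.
# alpha:   length-D binary attribute profile
# q:       length-D row of the Q-matrix for this item
# lambda0: intercept; lambda: length-(2^D - 1) effect vector
lcdm_prob <- function(alpha, q, lambda0, lambda) {
  D <- length(alpha)
  # all nonempty attribute subsets, one row per element of h
  subsets <- as.matrix(expand.grid(rep(list(0:1), D))[-1, ])
  # h is 1 only if every attribute in the subset is both mastered and measured
  h <- apply(subsets, 1, function(s) {
    idx <- which(s == 1)
    as.numeric(all(alpha[idx] == 1) && all(q[idx] == 1))
  })
  kernel <- lambda0 + sum(lambda * h)
  exp(kernel) / (1 + exp(kernel))          # logistic link of Eq. 4
}

# Hypothetical example: an item measuring all three attributes (cf. item 3
# of Table 2), answered by a respondent who has mastered attributes 1 and 2
lcdm_prob(alpha = c(1, 1, 0), q = c(1, 1, 1),
          lambda0 = -1.5, lambda = rep(1, 7))
```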

In practice, estimating an LCDM typically relies on the expectation–maximization (EM) algorithm (Bock & Aitkin, 1981), which maximizes the marginal likelihood; this is the most common algorithm in the DCM literature. Similarly, Jiang and Ma (2018) deployed a differential evolution optimizer for LCDMs within the EM framework. In addition to the EM algorithm, Markov chain Monte Carlo (MCMC) techniques can be used to estimate the LCDM, but to date they have been applied only to simpler DCMs, such as the DINA model (da Silva, de Oliveira, Davier, & Bazán, 2017). Given that their updating procedures are based on random sampling, MCMC approaches tend to be slower than the EM algorithm (Patz & Junker, 1999). This study focuses on the EM algorithm, due to its practicality and popularity. Traditionally, a researcher specifies an LCDM within a likelihood-function framework, as Eqs. 1 to 5 demonstrate. For instance, de la Torre (2011) described marginalized maximum-likelihood estimation for a G-DINA model (which can be converted to an LCDM) within the EM framework, and Templin and Hoffman (2013) estimated LCDMs using Mplus (L. K. Muthén & Muthén, 2013), which deploys an accelerated EM algorithm. These two likelihood-based approaches offer estimates of both item parameters and respondents’ attributes, as well as statistical inferences, and therefore provide strong interpretability by aligning item features and respondent performance on a common scale (i.e., one can find the impact of attribute changes on the item responses). Alternatively, Jiang and Carter (2018a) described a Bayesian approach that yields results similar to those of EM algorithms. However, Chiu and Douglas (2013) proved that when the sample size is small, assigning attributes correctly becomes difficult for likelihood-based approaches; instead, they proposed the nonparametric classification (NPC) method, which Chiu, Sun, and Bian (2018) extended to a general nonparametric classification (GNPC) method that handles LCDMs, such that attribute accuracy can be improved when the sample size is small. NPC and GNPC, however, remain relatively little known, and their utility requires further examination. In this article, only EM likelihood-based approaches are considered, due to their prevalence (see Li, Hunter, & Lei, 2016; Ravand, 2016; Svetina, Dai, & Wang, 2017; and Templin & Bradshaw, 2014, for examples). This article introduces a new EM-based approach for estimating the attributes with a higher accuracy level; the following section describes the proposed approach.
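For reference, a likelihood-based fit of this kind can be obtained in R with the CDM package used later in the simulation. The sketch below assumes an N × I binary matrix Y and an I × D Q-matrix Q; the argument names follow my reading of the CDM documentation and should be checked against the installed version.

```r
# Likelihood-based (EM) estimation of an LCDM: the G-DINA model with a
# logit link is equivalent to the LCDM parameterization
library(CDM)
fit <- gdina(data = Y, q.matrix = Q, rule = "GDINA", linkfct = "logit")

# item parameters and MAP attribute classifications
summary(fit)
head(IRT.factor.scores(fit, type = "MAP"))
```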

The iterative latent-class analysis (ILCA) approach

As the name suggests, ILCA is based on sequential latent-class modeling estimations that repeatedly update the attributes in order. To define the algorithm, let A be the matrix recording all respondents’ attributes; for example, in a simplified two-attribute model with only two respondents, \( \boldsymbol{A}=\left[\begin{array}{cc}0& 1\\ 1& 1\end{array}\right] \) means that the first respondent masters only the second attribute, whereas the second respondent masters both attributes. Let Y^a be the response matrix comprising the items measuring attribute a, and A^{−a} be the attribute matrix without the ath column. To estimate respondents’ ath attributes, the method begins by joining Y^a and A^{−a} to form a new response matrix X^a (i.e., X^a = [Y^a | A^{−a}]); for example, given \( {\boldsymbol{Y}}^a=\left[\begin{array}{ccc}1& 0& 1\\ 1& 0& 0\end{array}\right] \) and \( {\boldsymbol{A}}^{-a}=\left[\begin{array}{c}0\\ 1\end{array}\right] \), then \( {\boldsymbol{X}}^a=\left[\begin{array}{cccc}1& 0& 1& 0\\ 1& 0& 0& 1\end{array}\right] \). X^a is then fed to a latent-class analysis (LCA), which essentially performs unsupervised clustering. The LCA is conducted via the EM algorithm, in which the log-likelihood function is identical to Eq. 2, except that the response matrix is X^a instead of Y. Derivation details for implementing the EM algorithm in the LCA can be found in Bartholomew, Knott, and Moustaki (2011) and B. Muthén and Shedden (1999). A code sketch of the joining step appears below, and the summarized EM steps follow.
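This joining step is a simple column bind; in the R sketch below, the helper name is illustrative.

```r
# Form the augmented matrix X^a = [Y^a | A^{-a}] for attribute a.
# Y: N x I responses; Q: I x D Q-matrix; A: current N x D attribute estimates
build_Xa <- function(Y, Q, A, a) {
  Ya      <- Y[, Q[, a] == 1, drop = FALSE]   # items measuring attribute a
  A_minus <- A[, -a, drop = FALSE]            # all attribute columns except a
  cbind(Ya, A_minus)
}
```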

Assume that all \( \widehat{\boldsymbol{\pi}} \)s were estimated at the previous iteration, t − 1, and are therefore treated as known parameters. The posterior probability H(C = c | Yp = yp) for a respondent can then be obtained via \( \frac{v_c^{t-1}\prod \limits_{i=1}^I{\pi}_{ic}^{y_{pi}}{\left(1-{\pi}_{ic}\right)}^{1-{y}_{pi}}}{\sum \limits_{c=1}^C{v}_c^{t-1}\prod \limits_{i=1}^I{\pi}_{ic}^{y_{pi}}{\left(1-{\pi}_{ic}\right)}^{1-{y}_{pi}}} \), which leads to a new \( \widehat{v_c^t} \) calculated as \( \frac{\sum \limits_{p=1}^NH\left(C=c\ |\ {\boldsymbol{Y}}_p={\boldsymbol{y}}_p\right)}{N} \). Again treating the parameters estimated at the previous iteration as known, \( {\widehat{\pi}}_{ic} \) can be obtained from \( \frac{\sum \limits_{p=1}^N{y}_{pi}H\left(C=c\ |\ {\boldsymbol{Y}}_p={\boldsymbol{y}}_p\right)}{\sum \limits_{p=1}^NH\left(C=c\ |\ {\boldsymbol{Y}}_p={\boldsymbol{y}}_p\right)} \), where yp is the complete response vector of respondent p. This step aggregates the respondents’ posterior class estimates to obtain the class probabilities of the structural model at the current iteration; at this point, an iteration is complete. This intertwined process runs until a certain convergence criterion is met—for example, when the change in the log-likelihood value is lower than .001.
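These updates translate into a short EM routine. The two-class sketch below is illustrative only (the article’s implementation relies on poLCA, described in the Software section), and the function name is an assumption.

```r
# Two-class LCA via EM, following the E and M steps described above.
# X: N x J binary matrix (here, X^a). Returns class sizes, conditional
# probabilities, posteriors, and the final log-likelihood.
em_lca2 <- function(X, max_iter = 500, tol = 1e-3) {
  N <- nrow(X); J <- ncol(X); C <- 2
  v  <- c(0.5, 0.5)
  pi <- matrix(runif(J * C, 0.2, 0.8), J, C)      # random starting values
  ll_old <- -Inf
  for (t in 1:max_iter) {
    # E step: unnormalized posterior H(C = c | y_p), an N x C matrix
    H <- sapply(1:C, function(c)
      v[c] * apply(X, 1, function(y) prod(pi[, c]^y * (1 - pi[, c])^(1 - y))))
    ll <- sum(log(rowSums(H)))
    H  <- H / rowSums(H)                          # normalize per respondent
    # M step: update v_c and pi_jc from the posteriors
    v  <- colMeans(H)
    pi <- t(t(H) %*% X / colSums(H))
    pi <- pmin(pmax(pi, 1e-6), 1 - 1e-6)          # guard against 0/1 probabilities
    if (abs(ll - ll_old) < tol) break             # e.g., change < .001
    ll_old <- ll
  }
  list(v = v, pi = pi, posterior = H, loglik = ll)
}
```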

For each attribute, ILCA deploys the aforementioned EM algorithm, such that respondents are classified into two classes (i.e., masters and nonmasters). Once the LCA for the ath attribute has converged, its clustering results are carried into the subsequent LCAs for the other attributes. The iterative LCA stops either when the estimated attribute matrix \( \widehat{\boldsymbol{A}} \) does not change from the previous iteration or when the iteration count reaches a predefined value. Like other mixture models, the LCA estimation faces the risk of local maxima (Jin, Zhang, Balakrishnan, Wainwright, & Jordan, 2016). To reduce the problem, multiple sets of initial values can be fed to the ILCA, and the set of results with the largest log-likelihood value can be used as the final estimates. Let ITER be the predefined iteration number; the pseudocode is outlined in Table 3 (Footnote 3), where G(+) and G(–) represent the two groups/clusters estimated via the LCA. Note that Steps 3 and 4 in the inner for loop are implemented to eliminate the label-switching problem; that is, theoretically, the group mastering the required attributes should perform better than the other group. The steps listed in the pseudocode blocks are straightforward: The LCA iterates by treating both the attributes estimated at the last iteration and the associated items as (known) predictors.

Table 3 The iterative latent class analysis approach pseudo-codes
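To complement the Table 3 pseudocode, here is a hedged R sketch of the outer loop using poLCA (the package adopted in the Software section). The wrapper name is illustrative, the mastery-group assignment mimics Steps 3 and 4, and real use would add multiple starts and error handling.

```r
# ILCA outer loop: sequential two-class LCAs over attributes (cf. Table 3).
library(poLCA)

ilca <- function(Y, Q, ITER = 20) {
  N <- nrow(Y); D <- ncol(Q)
  A <- matrix(rbinom(N * D, 1, 0.5), N, D)         # random initial attributes
  for (iter in 1:ITER) {
    A_old <- A
    for (a in 1:D) {
      Xa <- cbind(Y[, Q[, a] == 1, drop = FALSE], A[, -a, drop = FALSE])
      df <- as.data.frame(Xa + 1)                  # poLCA wants categories 1, 2
      names(df) <- paste0("V", seq_along(df))
      f   <- as.formula(paste0("cbind(", paste(names(df), collapse = ","), ") ~ 1"))
      fit <- poLCA(f, df, nclass = 2, verbose = FALSE)
      # label switching (Steps 3-4): the cluster scoring higher on the
      # attribute-a items is treated as the mastery group G(+)
      scores <- rowMeans(Y[, Q[, a] == 1, drop = FALSE])
      master <- which.max(tapply(scores, fit$predclass, mean))
      A[, a] <- as.numeric(fit$predclass == master)
    }
    if (all(A == A_old)) break                     # A-hat unchanged: converged
  }
  A
}
```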

Why are Y^a and A^{−a} used to estimate the ath attributes via the LCA? The regular LCA for estimating the ath attribute is based on Y^a by definition. In addition, the columns of A^{−a} (i.e., the other attributes) are presumably informative about the ath attribute to some degree because, theoretically, the attributes should be correlated. Y^{−a}, on the other hand, may not contain much information useful to the estimation, because it is not directly associated with the ath attribute and thus runs a high risk of introducing noise. To examine the utility of the ILCA approach and compare it with the likelihood-based approach, a simulation study is described in the next section.

Simulation study

Simulation design

Simulations were conducted to evaluate the performance of the ILCA approach by comparing it with the EM algorithm (referred to as the likelihood-based approach from now on) when the data conformed to the LCDM. The simulation conditions were varied by manipulating (1) the number of attributes, D = [3, 5]; (2) the number of items, I = [30, 50]; (3) the number of respondents, N = [30, 50, 100, 500]; (4) the item quality level, ω = [“Low,” “Medium,” “High”]; and (5) the shape of the latent-variable profile, S = [“Flat,” “Nonflat”]. Similar simulation settings can be found in Chiu, Sun, and Bian (2018), as well as Templin and Bradshaw (2014). In total, there were 2 × 2 × 4 × 3 × 2 = 96 conditions. The levels of D, I, and N are straightforward and need no further clarification. When there were three attributes (D = 3), a balanced Q-matrix was used in which each item measured one or two attributes; similarly, in the five-attribute condition, each item measured two or three attributes. To avoid situations in which an attribute was not associated with any item, a constraint that each attribute be measured at least three times was imposed when generating the Q-matrix. The levels of ω reflected the power of the items: how changes in the required attributes influenced respondents’ responses. When ω = “Low,” the main effects were all set to 0.2 and the interactions to 0.1; in the same order, these parameters were set to 1 and –0.5 when ω = “Medium,” and to 3 and –1.5 when ω = “High.” These settings were similar to Chiu et al.’s. Given known main and interaction effects, the intercepts were set to −0.5 · \( {\boldsymbol{\lambda}}_i^T\boldsymbol{h}\left({\boldsymbol{\alpha}}_{2^D},{\boldsymbol{q}}_i\right) \), where \( {\boldsymbol{\alpha}}_{2^D}=\mathbf{1} \) (i.e., the profile in which all attributes are mastered). S described the relations among the attributes. The “Flat” shape indicated that the probabilities of the latent classes were equal (e.g., each of the eight classes had probability .125 when D = 3); that is, the attributes were independent. The “Nonflat” shape, under which the attributes were correlated, was achieved in several steps: Continuous values were first generated from a multivariate normal distribution MVN(0, Σ), of which the diagonal elements of Σ were constrained to 1 and the off-diagonal values were randomly drawn from a uniform distribution ranging from .6 to .9; these continuous values were then converted to the binary scale by comparison with zero (i.e., 1 if the value was larger than zero, and 0 otherwise).
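As a sketch, the “Nonflat” profiles can be generated as follows, using MASS::mvrnorm; the symmetrization of the random off-diagonal draws is an added implementation assumption, since a correlation matrix must be symmetric.

```r
# Generate correlated ("Nonflat") binary attribute profiles by thresholding
# multivariate normal draws at zero
library(MASS)

gen_nonflat <- function(N, D) {
  Sigma <- matrix(runif(D * D, 0.6, 0.9), D, D)
  Sigma <- (Sigma + t(Sigma)) / 2   # make the off-diagonal draws symmetric
  diag(Sigma) <- 1
  Z <- mvrnorm(N, mu = rep(0, D), Sigma = Sigma)
  (Z > 0) * 1                       # 1 if above zero, 0 otherwise
}
```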

The attribute accuracy was recorded in two ways: (1) the average classification rate across all individual attributes (labeled γ1) and (2) the attribute-profile classification rate (labeled γ2). That is, the first measure examined the accuracy of each attribute a for a = 1, . . . , D, whereas the second focused on the latent-class level.
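In code, with A the true and A_hat the estimated N × D attribute matrices (names illustrative), the two measures reduce to:

```r
gamma1 <- mean(A == A_hat)                      # average single-attribute accuracy
gamma2 <- mean(rowSums(A == A_hat) == ncol(A))  # whole-profile accuracy
```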

Software and hardware

The simulation study was conducted in the R environment (R Core Team, 2018). Specifically, the likelihood-based estimation approach was implemented through the CDM package (George, Robitzsch, Kiefer, Groß, & Ünlü, 2016); alternatively, one could choose the GDINA package (Ma & de la Torre, 2018) or Latent Gold (Vermunt & Magidson, 2005; see DeCarlo, 2018, for details), which offer functionality similar to that of CDM. The proposed approach, on the other hand, deployed the poLCA package (Linzer & Lewis, 2011) to execute the LCA. Each condition was replicated 50 times on a cloud computing facility with four 8-core (16-thread) AMD Opteron “Bulldozer” processors, for 32 cores (64 threads) in total, and 64 GB of RAM.

Results

The complete results are presented in Table 4. Overall, the ILCA approach substantially outperformed the likelihood-based one. The minimum, mean, and maximum of γ1 for the likelihood-based approach were .52, .65, and .95, respectively, whereas the corresponding values were .52, .73, and .98 for the ILCA approach; in only nine of the 96 conditions did the likelihood-based approach yield a higher γ1 than the ILCA approach. As compared with γ1, γ2 had larger variability, since matching the entire profile is generally harder than matching a single attribute. In the same order, the three statistics were .03, .22, and .87 for the likelihood-based approach, and .04, .28, and .90 for the ILCA approach; there was only a 9.6% chance that the likelihood-based approach would yield the larger γ2.

Table 4 Simulation results

When D was small (i.e., 3), both γ1 and γ2 were higher than in the conditions with five attributes: the means of γ1 were .69 and .74 for the likelihood-based and ILCA approaches, respectively, which dropped to .60 and .73 when D = 5. This pattern was more obvious for γ2: when D = 3, the means were .34 and .39 for the likelihood-based and ILCA approaches, respectively, and they dropped to .09 and .26 when D increased to 5. Across all conditions, when D increased from 3 to 5, the standard deviations (SDs) of γ1 and γ2 decreased for both approaches; the mean sizes of the SD decrements were .10 and .16, meaning that the accuracy had less variability when D was larger. When D varied, ILCA was also more robust: the five quantiles of the accuracy changes were below .05 for γ1 and below .23 for γ2, whereas the same statistics were .19 and .60 for the likelihood-based approach.

The sample size N influenced the simulation results only modestly when N < 500. The means of γ1 for the likelihood-based approach were .64, .64, .63, and .69 across the four sample-size levels, whereas the values obtained via the ILCA approach were .73, .73, .73, and .76. This pattern again held for γ2. However, the increases in γ2 from N = 100 to N = 500 were relatively large: (1) the means rose from .19 to .30 for the likelihood-based approach and from .30 to .42 for the ILCA approach, and (2) the standard deviations increased from .16 to .27 for the likelihood-based approach and from .17 to .26 for the ILCA approach. These findings reaffirm the robustness of the ILCA approach when the sample size varies.

The item number I did not show substantial main effects on either accuracy measure. The item quality ω, on the other hand, made a great difference, as Fig. 1 shows (see Jiang & Carter, 2018b, for visualization details). When ω = “Low,” the accuracy measures for the likelihood-based approach were much lower: the maxima of γ1 and γ2 were only .58 and .21, respectively. The ILCA approach, by contrast, could still yield accurate results under low item quality. As the figure shows, item quality was an important driving factor in attribute accuracy. When ω = “High,” the ILCA approach had a 75% chance of γ1 > .8 and γ2 > .3, whereas the chances for the likelihood-based approach were around 25% and 50%, respectively. Finally, regarding the attribute shape S: when S = “Flat,” both approaches produced more accurate results than in the “Nonflat” conditions.

Fig. 1 Influence of item quality

Discussion

This article has proposed ILCA, a straightforward approach to estimating attributes within an LCDM framework. As compared with traditional likelihood-based estimation, the ILCA approach yields substantially higher accuracy, making it more appropriate for producing performance/score reports. Performing LCA via the EM algorithm is generally fast, and therefore ILCA, which essentially chains LCAs across attributes, does not demand substantial computing capacity. It can slow down, however, when a large number of initial-value sets are fed into it; a solution is the doParallel package (Calaway, Microsoft Corp., Weston, & Tenenbaum, 2017), which executes parallel computing such that simultaneous operations replace unnecessary loops, reducing the computation time. Although speed was not the primary concern of this article, note that ILCA is slower than the EM algorithm, especially when the number of attributes is small and the modeling complexity is low: the average computing times for the conditions with D = 3 were 11 and 34 s for the EM and ILCA algorithms, respectively, and these became 232 and 313 s when D = 5.
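A sketch of that parallel multiple-starts strategy is shown below; ilca_with_loglik is a hypothetical wrapper (not part of any package) that returns both the attribute estimates and the final log-likelihood of one ILCA run.

```r
# Run ILCA from several random starts in parallel and keep the best solution
library(doParallel)
library(foreach)

cl <- makeCluster(4)                 # four worker processes
registerDoParallel(cl)
fits <- foreach(s = 1:20, .packages = "poLCA") %dopar% {
  set.seed(s)                        # one random initialization per start
  ilca_with_loglik(Y, Q)             # hypothetical wrapper around ILCA
}
stopCluster(cl)

best <- fits[[which.max(sapply(fits, `[[`, "loglik"))]]
```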

The simulation study showed that when attributes are independent (i.e., the shape is flat), attribute accuracy is higher than with correlated attributes. The subscore literature has shown that in practice, however, the attribute correlation is rarely zero; instead, in many situations the correlation level can be higher than .90 (see Haberman, 2008, and Sinharay, Haberman, & Puhan, 2007, for details). If the true attribute correlation level is above .9, the attribute accuracy is expected to be lower, according to these simulation results. However, if this situation indeed happens, the concern becomes whether it is necessary to distinguish the attributes, given that theoretically they can be collapsed into one attribute. Discussion of the theoretical definitions of subscore and attribute/latent variable relations can be found in Sinharay, Puhan, and Haberman (2011).

Although ILCA can be used to estimate the attributes, it does not provide item-level information as the traditional approaches do. In some situations, especially in large-scale (and standardized) assessments, item parameters are important, since they can be used for linking different forms, anchoring different groups, and other similar functions that place subjects on a common scale (Vale, 1986). In addition, item parameters are useful indicators for item-bank construction, item-feature verification, computerized adaptive-testing design, and theoretical-framework investigations. Without item-level information, the ILCA approach cannot offer these rich features. However, if item parameters are not the primary concern, which might be the case in some informal and/or small-sample settings, the ILCA approach can be adopted, since it produces more accurate assessment results. Such results are often the core component of score reports: Classifying respondents into the right groups is critical, so that further treatments and/or interventions can be made accordingly and precisely. To emphasize, ILCA is not recommended for every possible CDM application, especially large-scale assessments; whether to use it depends on the purpose of the work.

Although ILCA is based on sequential parametric analyses, as a whole it is not necessarily a parametric approach; therefore, it cannot provide inferential statistics that convey the uncertainty of the estimates. To address this drawback, future research should focus on adopting bootstrapping or other nonparametric techniques to estimate the uncertainty. In addition, this article has examined only binary attributes and binary responses. The DCM literature has recently developed beyond binary scales; for example, Chen and de la Torre (2013) described models for polytomous attributes, and Ma and de la Torre (2016) proposed a sequential cognitive diagnosis model for polytomous item designs. Future ILCA work can therefore focus on extending the technique to polytomous designs and investigating how it performs on an ordinal scale (Liu & Jiang, 2018). In addition, features such as response times and population-level variables in multiple contexts were not considered here (Jiao, Zhan, Liao, & Man, 2018); future studies should integrate ILCA into such information-rich settings. Furthermore, how ILCA can be incorporated into a longitudinal setting (especially one with intervention effects; Madison & Bradshaw, 2018) should be investigated in follow-up research. Last but not least, ILCA is likely to be used for group comparisons, in which the invariance of the proposed approach can be tested (Shi, Song, Liao, Terry, & Snyder, 2017; Shi, DiStefano, McDaniel, & Jiang, 2018).