Cognitive diagnostic computerized adaptive testing (CD-CAT; Cheng, 2009; McGlohen & Chang, 2008) is computerized adaptive testing (CAT) built upon a cognitive diagnostic model (CDM; Rupp & Templin, 2008; Rupp et al., 2010). CDMs are statistical tools that link item responses to latent cognitive profiles, which capture the strengths and weaknesses of each respondent in terms of their mastery of discrete knowledge points or attributes. Hence, testing programs built on CDMs have features of both model-based measurement and formative assessment (Embretson, 2001).

In a typical adaptive testing system, items are sequentially selected from an item bank, tailored to each respondent according to certain item selection rules, for example, maximizing test information or minimizing the standard error of measurement of the latent trait. In CD-CAT, the goal is to efficiently estimate the latent cognitive profiles by sequentially choosing the most suitable items for each candidate (Cheng, 2009; Dai et al., 2016; Yu et al., 2019; Zheng & Chang, 2016; Zheng & Wang, 2017). Given a well-designed item bank, continuous testing can be offered through CD-CAT, which means that efficient formative assessment can be provided to students continuously.

In real applications, any CAT system that offers continuous testing needs to replenish its item bank periodically, because repeated use of items may pose a risk to test security and validity. Therefore, retiring flawed, obsolete, or overexposed items and replacing them with newly calibrated items, a process called item replenishment, is important for continuous testing (Chen et al., 2012; Chen et al., 2015; Chen & Xin, 2011; Ren et al., 2017). For this reason, new items constantly need to be developed, reviewed, and calibrated for CAT programs.

Online calibration in CAT refers to estimating the parameters of new items that are administered to respondents during the course of their operational testing along with previously calibrated items (Wainer & Mislevy, 2000). Ren et al. (2017) pointed out several main advantages of online calibration. First, new items are calibrated under the exact same condition as for their future operational use. Second, the item parameters of the new items are calibrated on the same scale as the operational items, which means linking or rescaling is no longer required. Commonly used methods that have been proposed to calibrate new items include Method A and Method B (Stocking, 1988), marginal maximum likelihood estimation with one expectation maximization (OEM) iteration (Wainer & Mislevy, 2000), and marginal maximum likelihood estimation with multiple EM (MEM) iterations (Ban et al., 2001; Ban et al., 2002).

New items for CD-CAT need to be calibrated in terms of both item parameters and attribute vectors, whereas in traditional CAT, item calibration refers only to the estimation of item parameters. Online item calibration is therefore even more challenging in CD-CAT than in regular CAT. Chen et al. (2012) considered the online calibration of only the item parameters in CD-CAT and proposed three methods, namely Cognitive Diagnostic-Method A (CD-MA), Cognitive Diagnostic-One EM cycle (CD-OEM), and Cognitive Diagnostic-Multiple EM cycles (CD-MEM). These methods assume known attribute vectors and are analogous to the methods described in the preceding paragraph. The literature on online calibration of both item parameters and attribute vectors is relatively scarce. Chen and Xin (2011) proposed a joint estimation algorithm (JEA), which jointly estimates the attribute vectors and the item parameters under the DINA (Deterministic Input, Noisy "AND" gate; see Junker & Sijtsma, 2001; de la Torre, 2009) model. Their results indicated that the JEA can perform well. Chen et al. (2015) considered two Bayesian variations of the JEA: the single item estimation (SIE) method and the simultaneous item estimation (SimIE) method. As their names suggest, SIE calibrates a single new item at a time, while SimIE calibrates multiple new items at a time. With sample sizes larger than 800, Chen et al. (2015) showed that SIE and SimIE outperform the JEA in the estimation of both attribute vectors and item parameters; due to their iterative nature, SIE and SimIE showed very similar performance in estimating attribute vectors and item parameters. For all three methods (JEA, SIE, and SimIE), the estimation of the item parameters depends heavily on the estimation of the attribute vectors. However, if the sample size is relatively small (e.g., 400 or fewer), item parameters cannot be estimated well even with known attribute vectors, let alone with unknown attribute vectors (Chen et al., 2015).

Given the limitations of existing methods, in this paper we propose an iterative two-step procedure to estimate both attribute vectors and item parameters with relatively small sample sizes. First, we propose to use a residual-based statistic to estimate the attribute vectors in the context of CD-CAT. This step does not require known or precisely estimated item parameters. In the second step, we treat the estimated attribute vector as true, and estimate the item parameters based on CD-Method A, CD-OEM, or CD-MEM. The procedure proceeds iteratively until convergence is reached.

The rest of this paper is organized as follows. First, we review the existing methods on this topic, which involve two main lines of research: online calibration of the item parameters only, and online calibration of both the item parameters and attribute vectors. Next, we introduce in detail a new method of attribute vector estimation using a residual-based statistic, and the iterative two-step procedure for estimating both item parameters and attribute vectors. A simulation study to assess the performance of the proposed estimation methods is then described. A real-data analysis is provided to illustrate the application of one of the proposed methods (RMEM) in practice. Discussions and implications of the results are given in the last section.

Online calibration methods in CD-CAT

In this section, we briefly review several existing methods. For convenience but without loss of generality, we first introduce the terms and notation used throughout the remainder of the paper. As discussed earlier, new items are items whose attribute vectors and item parameters are unknown, in contrast to the operational items that have been previously calibrated in the item bank. Suppose an existing item bank contains J operational items, and the item parameters and attribute vectors of M new items need to be estimated. Consider a CD-CAT that targets a total of K attributes. Each of the J operational items requires a specific subset of the K attributes (denoted qj, j = 1, 2, …, J) to be answered correctly. The stacked qj's form the item-attribute association matrix for the item bank, namely the Q-matrix, a binary J × K matrix. The Q-matrix for the M new items is denoted Qnew. The mastery status of each of N test takers is captured by αi (i = 1, 2, …, N), the attribute mastery pattern (AMP) vector. L denotes the fixed test length, and an N × L matrix X denotes the item response matrix with binary elements Xij, where Xij = 1 indicates a correct response of test taker i to item j and Xij = 0 an incorrect response. Let nm be the total number of respondents responding to the mth new item.

As a parsimonious and popular CDM, the DINA model is used here as an example (de la Torre, 2009). An expected or ideal response under the DINA model is characterized by an indicator variable, \({\eta}_{ij}={\prod}_{k=1}^K{\alpha}_{ik}^{q_{jk}}\), which indicates whether the ith respondent possesses all the attributes required by the jth item. Unexpected responses are accounted for by the slipping and guessing parameters, sj = P(Xij = 0| ηij = 1) and gj = P(Xij = 1| ηij = 0), respectively. The probability of a correct response to the jth item by the ith respondent under the DINA model is therefore defined as

$$P\left({X}_{ij}=1|{\boldsymbol{\alpha}}_i\right)={\left(1-{s}_j\right)}^{\eta_{ij}}{g_j}^{1-{\eta}_{ij}}.$$
(1)

For a new item m, its attribute vector qm and item parameters (sm, gm) are of key interest in online calibration.
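To make the notation concrete, the following minimal sketch computes the DINA response probability of Eq. 1 for one respondent-item pair. The sketches in this paper use Python with NumPy; all function and variable names are ours, not from the original studies.

```python
import numpy as np

def dina_prob(alpha, q, s, g):
    """P(X = 1 | alpha) under the DINA model (Eq. 1).

    alpha : (K,) binary mastery profile of one respondent
    q     : (K,) binary attribute vector of the item
    s, g  : slipping and guessing parameters
    """
    eta = int(np.all(alpha >= q))   # 1 iff all required attributes are mastered
    return (1.0 - s) ** eta * g ** (1 - eta)

# e.g., dina_prob(np.array([1, 0, 1]), np.array([1, 0, 0]), s=0.1, g=0.2) -> 0.9
```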

Online calibration of item parameters

The following three methods are based on the assumption that the attribute vectors of the new items are known (i.e., the qm's are available, perhaps through content experts who label each item with the attributes it measures), and only their item parameters need to be estimated.

CD-Method A For a new item m, suppose that there are nm respondents responding to the item. CD-Method A treats the estimated AMP \({\hat{\boldsymbol{\alpha}}}_i\), obtained from the operational items answered by the ith respondent, as the true αi, and then estimates the slipping and guessing parameters through maximum likelihood (de la Torre, 2009) by solving

$$\frac{\partial {l}_m}{\partial {\textrm{s}}_m}=0,$$
(2)
$$\frac{\partial {l}_m}{\partial {g}_m}=0,$$
(3)

where \({l}_m\left({\textbf{x}}_i|{\textbf{q}}_m,{s}_m,{g}_m\ \right)=\log \left(\prod_{i=1}^{n_m}{P}_{s_m,{g}_m}{\left({\textbf{q}}_m,{\hat{\boldsymbol{\alpha}}}_i\right)}^{x_{im}}{\left[1-{P}_{s_m,{g}_m}\left({\textbf{q}}_m,{\hat{\boldsymbol{\alpha}}}_i\right)\right]}^{1-{x}_{im}}\right)\) is the log-likelihood function, qm is the attribute vector of item m, xim ∈ {0, 1} is the response of respondent i to the mth new item, and \({P}_{s_m,{g}_m}\left({\textbf{q}}_m,{\hat{\boldsymbol{\alpha}}}_i\right)\) is the probability of a correct response to new item m under the DINA model evaluated at \({\hat{\boldsymbol{\alpha}}}_i\).
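Because the AMPs are plugged in as known, the score equations (2)-(3) have a closed-form solution under DINA: \({\hat{s}}_m\) is the proportion of incorrect answers among respondents whose ideal response is 1, and \({\hat{g}}_m\) the proportion of correct answers among those whose ideal response is 0. A minimal sketch, assuming complete responses to the new item:

```python
import numpy as np

def cd_method_a(alpha_hat, q_m, x_m):
    """Closed-form ML solution of Eqs. (2)-(3) for one new item.

    alpha_hat : (n_m, K) estimated AMPs, treated as true
    q_m       : (K,) known attribute vector of the new item
    x_m       : (n_m,) 0/1 responses to the new item
    """
    eta = np.all(alpha_hat >= q_m, axis=1)               # ideal responses
    s_hat = np.mean(x_m[eta] == 0) if eta.any() else np.nan
    g_hat = np.mean(x_m[~eta] == 1) if (~eta).any() else np.nan
    return s_hat, g_hat
```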

CD-OEM CD-OEM applies a single cycle of the EM algorithm (Chen et al., 2012; de la Torre, 2009) to estimate the item parameters of each new item. For the mth new item, the E-step uses the posterior distribution of the AMPs to obtain the expected proportion of respondents with AMP \({\boldsymbol{\alpha}}_v\) among those who answered the new item, where \({\boldsymbol{\alpha}}_v\) is one of the 2^K possible attribute profiles and \(\sum_{v=1}^{2^K}{P}_m\left({\boldsymbol{\alpha}}_v\right)=1\). The M-step then finds the \({\hat{s}}_m\) and \({\hat{g}}_m\) that maximize the logarithm of the corresponding expected likelihood.

CD-MEM By allowing multiple EM cycles, the CD-OEM becomes the CD-MEM. The first EM cycle in CD-MEM is the same as in the CD-OEM method, and the obtained item parameters and attribute vectors are taken as the initial values of the second EM cycle. From the second EM cycle onward, the CD-MEM method uses the responses to both the operational items and the new items to calculate the posterior distribution of the AMPs in the E-step, fixes the item parameters of the operational items, and adopts the same M-step as the CD-OEM method to update the item parameters of the new items (refer to Chen et al., 2012 for further details). The EM cycles are repeated until a stopping criterion is met.
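The following sketch illustrates one such EM cycle for a single new item, given posterior weights over the 2^K profiles; CD-MEM would recompute the posterior from both operational and new items and repeat the cycle. The weighted M-step below follows from maximizing the expected complete-data log-likelihood under DINA; the interface is our own simplification.

```python
import numpy as np

def oem_cycle(post, profiles, q_m, x_m):
    """One EM cycle for a new item (CD-OEM style), a sketch.

    post     : (n_m, 2**K) posterior P(alpha_v | operational responses)
    profiles : (2**K, K) all candidate attribute profiles
    q_m      : (K,) assumed attribute vector of the new item
    x_m      : (n_m,) 0/1 responses to the new item
    """
    eta_v = np.all(profiles >= q_m, axis=1).astype(float)  # (2**K,) ideal responses
    w_eta = post @ eta_v                                   # P(eta_im = 1) per respondent
    # M-step: weighted analogs of the closed-form CD-Method A estimates
    s_hat = np.sum(w_eta * (1 - x_m)) / np.sum(w_eta)
    g_hat = np.sum((1 - w_eta) * x_m) / np.sum(1 - w_eta)
    return s_hat, g_hat
```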

Results of Chen et al. (2012) showed that CD-Method A, CD-OEM, and CD-MEM can recover item parameters accurately with large sample sizes; CD-Method A performs best when items have small slipping and guessing parameters, but its performance is strongly affected by the magnitude of the item parameters.

Online calibration of both item parameters and attribute vectors

The Joint Estimation Algorithm (JEA)

Based on the DINA model, Chen and Xin (2011) proposed the JEA to jointly estimate the attribute vectors and the item parameters; it is the analog of joint maximum likelihood estimation (JMLE; Baker & Kim, 2004) in item response theory (IRT). As an extension of CD-Method A, the JEA treats the AMPs estimated from the operational items as true, and then estimates the item parameters and attribute vectors of the new items, one item at a time. For the mth new item, the JEA maximizes lm(qm, sm, gm) with respect to qm given (sm, gm), then treats the estimated qm as true and maximizes lm(qm, sm, gm) with respect to (sm, gm). This is repeated until convergence, which can be defined as a sufficiently small change in the log-likelihood from one iteration to the next.
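A minimal sketch of this alternating scheme for one new item, assuming plug-in AMP estimates and complete responses (the clipping of parameter updates to (0, 0.5) for numerical stability is our own detail):

```python
import numpy as np
from itertools import product

def item_loglik(alpha_hat, q, s, g, x_m):
    """Plug-in log-likelihood l_m of one new item under the DINA model."""
    eta = np.all(alpha_hat >= q, axis=1).astype(float)
    p = (1 - s) ** eta * g ** (1 - eta)          # P(X_im = 1 | alpha_i-hat)
    return np.sum(x_m * np.log(p) + (1 - x_m) * np.log(1 - p))

def jea_single_item(alpha_hat, x_m, K, s=0.2, g=0.2, tol=1e-6, max_iter=100):
    """Alternating maximization over q_m and (s_m, g_m) for one new item."""
    candidates = [np.array(c) for c in product([0, 1], repeat=K) if any(c)]
    old_ll = -np.inf
    for _ in range(max_iter):
        # (a) best attribute vector at the current item parameters
        q = max(candidates, key=lambda c: item_loglik(alpha_hat, c, s, g, x_m))
        # (b) closed-form ML update of (s, g) at the chosen q (cf. Eqs. 2-3)
        eta = np.all(alpha_hat >= q, axis=1)
        if eta.any():
            s = float(np.clip(np.mean(x_m[eta] == 0), 1e-3, 0.5 - 1e-3))
        if (~eta).any():
            g = float(np.clip(np.mean(x_m[~eta] == 1), 1e-3, 0.5 - 1e-3))
        new_ll = item_loglik(alpha_hat, q, s, g, x_m)
        if abs(new_ll - old_ll) < tol:                   # convergence check
            break
        old_ll = new_ll
    return q, s, g
```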

The SIE and SimIE methods are two Bayesian versions of the JEA that account for the uncertainty in the estimated AMPs.

The Single Item Estimation Method (SIE) Instead of plugging in the estimates of the AMPs of the respondents who answered the mth new item, the SIE method considers the expected log likelihood

$${\displaystyle \begin{array}{l}\textrm{E}\left({l}_m\left({\textbf{x}}_i|{\textbf{q}}_m,{s}_m,{g}_m\ \right)\right)\\ {}=\sum\limits_{i=1}^{n_m}\sum\limits_{{\boldsymbol{\alpha}}_i}{\pi}_i\left({\boldsymbol{\alpha}}_i;{s}_m,{g}_m\right)\left[{x}_{im}\log {P}_{s_m,{g}_m}\left({\textbf{q}}_m,{\boldsymbol{\alpha}}_i\right)+\left(1-{x}_{im}\right)\log \left(1-{P}_{s_m,{g}_m}\left({\textbf{q}}_m,{\boldsymbol{\alpha}}_i\right)\right)\right],\end{array}}$$
(4)

where πi(αi; sm, gm) is the posterior distribution of αi based on the operational items (in the first EM cycle) or on both the operational and new items (in subsequent EM cycles). In this way, SIE takes the uncertainty in \({\hat{\boldsymbol{\alpha}}}_i\) into account. SimIE further calibrates multiple new items at a time.
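A compact sketch of evaluating Eq. 4 for a candidate qm, given a matrix of posterior weights (the vectorized layout is our choice):

```python
import numpy as np

def expected_loglik(post, profiles, q_m, s, g, x_m):
    """Expected log-likelihood of Eq. (4) for one new item (a sketch).

    post     : (n_m, 2**K) posterior pi_i(alpha_v) per respondent
    profiles : (2**K, K) all candidate attribute profiles
    """
    eta_v = np.all(profiles >= q_m, axis=1).astype(float)
    p_v = (1 - s) ** eta_v * g ** (1 - eta_v)            # P(X = 1 | alpha_v)
    ll_v = np.outer(x_m, np.log(p_v)) + np.outer(1 - x_m, np.log(1 - p_v))
    return np.sum(post * ll_v)                           # posterior-weighted sum
```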

The Simultaneous Item Estimation Method (SimIE)

As noted by Chen et al. (2015), the more accurate the information about the AMPs, the better the calibration. The motivation of SimIE is therefore to borrow useful information from the new items to improve the estimation of the unknown AMPs. However, borrowing information from inadequately calibrated items may harm the estimation of the AMPs. To address this issue, Chen et al. (2015) proposed an index, here denoted ωm (denoted ηj in the original paper; ωm is used here to avoid confusion), to evaluate the confidence in the fit of \({\hat{\textbf{q}}}_m\). ωm is defined as the difference between the log-likelihood values of the two most probable \({{\hat{\textbf{q}}}_m}^{\prime }s\) for the mth item. Half of the 95th percentile of the χ2 distribution with one degree of freedom, i.e., 1.92, was chosen as the empirical cutoff for "good" new items in Chen et al. (2015). Treating the first chosen new item, which has the maximum ωm with ωm > 1.92, as an additional operational item, SimIE updates the posterior distribution of the respondents' AMPs based on all operational items and recalibrates the second chosen new item. This process is repeated until all the chosen new items have been treated as additional operational items. The new items not selected in the preceding step are then calibrated one at a time. This constitutes one estimation cycle; the algorithm proceeds until the chosen items do not change over two consecutive cycles.
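As an illustration, ωm can be sketched as the gap between the two best candidate q-vectors. Here we score candidates with the expected log-likelihood function from the SIE sketch above; the original paper's exact likelihood bookkeeping may differ.

```python
import numpy as np
from itertools import product

def omega_index(post, profiles, x_m, s, g, K):
    """Confidence index for a calibrated q-vector (a sketch; cutoff 1.92).

    Returns the gap in expected log-likelihood between the two most
    probable candidate q-vectors; gaps above 1.92 flag a "good" item.
    """
    candidates = [np.array(c) for c in product([0, 1], repeat=K) if any(c)]
    # reuses expected_loglik() from the SIE sketch above
    lls = sorted(expected_loglik(post, profiles, q, s, g, x_m)
                 for q in candidates)
    return lls[-1] - lls[-2]
```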

Attribute vector estimation based on a residual-based statistic

In this section, we first briefly introduce the residual-based statistic (see Yu & Cheng, 2020, for more details) used to measure the appropriateness of the attribute vector of an item. We then present a theoretical proof that, under the DINA model and certain assumptions, the proposed residual-based statistic identifies the true attribute vector of the mth new item with arbitrarily chosen item parameters. This may relieve the dependence of existing methods on large sample sizes.

Let E(Xim| αi) be the expected score of the ith respondent with AMP αi, and let P(Xim = xim| αi), denoted P(xim| αi) for short, be the probability of the respondent obtaining score xim, with xim being 0 or 1. The appropriateness index of the attribute vector for the mth item can then be defined as

$${R}_m\left(\boldsymbol{\alpha}, {\textbf{q}}_m,{s}_m,{g}_m\right)=\sum_{i=1}^{n_m}\log {\left[\frac{x_{im}-E\left({X}_{im}|{\boldsymbol{\alpha}}_i\right)}{P\left({x}_{im}|{\boldsymbol{\alpha}}_i\right)}\right]}^2\quad \textrm{or}\quad \sum_{i=1}^{n_m}\log \left|\frac{x_{im}-E\left({X}_{im}|{\boldsymbol{\alpha}}_i\right)}{P\left({x}_{im}|{\boldsymbol{\alpha}}_i\right)}\right|,$$
(5)

where α is the matrix of vertically stacked \({\boldsymbol{\upalpha}}_i^{\prime}\textrm{s},\) i.e., the attribute profiles of the respondents who answered the mth new item. The squared form is numerically twice the absolute form, so the performance of the method based on the two forms is equivalent; the squared form is used in all our simulation conditions for coding consistency. Under the DINA model, according to the values of ηim and the response xim, each respondent is classified into one of four groups, G1, G2, G3, and G4: respondents in G1 have ηim = 1 and xim = 1, those in G2 have ηim = 1 and xim = 0, those in G3 have ηim = 0 and xim = 1, and those in G4 have ηim = 0 and xim = 0. Hence, Eq. 5 can be expanded to

$${\displaystyle \begin{array}{l}{R}_m\left(\boldsymbol{\upalpha}, {\textbf{q}}_m,{s}_m,{g}_m\right)\\ {}=2\sum\limits_{i=1}^{n_m}\log \left[{\eta}_{im}{\left(\frac{s_m}{1-{s}_m}\right)}^{x_{im}}{\left(\frac{1-{s}_m}{s_m}\right)}^{1-{x}_{im}}+\left(1-{\eta}_{im}\right){\left(\frac{g_m}{1-{g}_m}\right)}^{1-{x}_{im}}{\left(\frac{1-{g}_m}{g_m}\right)}^{x_{im}}\right],\end{array}}$$
(6)

where \({\eta}_{im}=\prod_{k=1}^K{\alpha_{ik}}^{q_{mk}}\) is the ideal response of the ith examinee (with attribute profile αi) to the mth item (with attribute vector qm). We expect that, given \(\hat{\boldsymbol{\alpha}}\) from the operational items, \({R}_m\left(\hat{\boldsymbol{\alpha}},{\textbf{q}}_m,{s}_m,{g}_m\right)\) as a function of qm is minimized when qm is at its true value.
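A direct transcription of Eq. 6, assuming plug-in AMP estimates (function name ours):

```python
import numpy as np

def residual_stat(alpha_hat, q_m, s_m, g_m, x_m):
    """Residual-based statistic R_m of Eq. (6) under the DINA model.

    Smaller values indicate a more appropriate attribute vector q_m.
    """
    eta = np.all(alpha_hat >= q_m, axis=1).astype(float)
    term = (eta * (s_m / (1 - s_m)) ** x_m * ((1 - s_m) / s_m) ** (1 - x_m)
            + (1 - eta) * (g_m / (1 - g_m)) ** (1 - x_m) * ((1 - g_m) / g_m) ** x_m)
    return 2.0 * np.sum(np.log(term))
```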

Theorem 1. Consider an infinite sample, that is, N → ∞, with true item parameters sm, gm ∈ (0, 0.5), and assume the true AMP matrix α is known in advance. Given provisional item parameters \(\left({s}_m^0,{g}_m^0\right)\) for the mth item, where \({s}_m^0\) and \({g}_m^0\) are two arbitrarily chosen real numbers in (0, 0.5), and denoting \({\hat{R}}_m^0\left(\boldsymbol{\alpha}, {\textbf{q}}_m,{s}_m^0,{g}_m^0\right)\) as the value of the residual-based statistic evaluated at \(\left({s}_m^0,{g}_m^0\right)\), \({\hat{R}}_m^0\left(\boldsymbol{\alpha}, {\textbf{q}}_m,{s}_m^0,{g}_m^0\right)\) reaches its minimum only when qm is correctly specified.

Theorem 1 is the basis of our proposed iterative two-step online calibration method leveraging the residual statistic. According to Theorem 1, we can obtain the attribute vector for each new item by arbitrarily assigning item parameters to it, e.g., \({s}_m^0=0.25\), \({g}_m^0=0.25\), and minimizing the residual statistic. In other words, it is not necessary to jointly estimate the attribute vector and the item parameters of each new item: the attribute vector can be obtained with fixed item parameters as long as α is known, and the item parameters can then be estimated based on the attribute vector obtained in the preceding step. This is very useful in situations where existing joint estimation methods suffer, e.g., when the sample size is small, which is often the case for a diagnostic test. The proof is presented in Appendix A.

The iterative two-step online item calibration method

Based on the preceding theorem, we propose an iterative two-step method for online item calibration. A flow chart describing the procedure is presented in Fig. 1. By fixing the new item parameters at 0.25 (or any value between 0 and 0.5), the attribute vector of the mth new item can be estimated based on the attribute profiles estimated from responses to the operational items. In the second step, treating the estimated attribute vector of each new item as true, CD-MA, CD-OEM, or CD-MEM can be applied to calibrate the item parameters as described in Chen et al. (2012). The resulting three variations of the iterative two-step online calibration method based on the residual statistic are denoted RMA, ROEM, and RMEM, respectively. Let \(\hat{R}\left(\hat{\boldsymbol{\alpha}},{\hat{\boldsymbol{Q}}}_{\boldsymbol{new}},\hat{\boldsymbol{s}},\hat{\boldsymbol{g}}\right)\) denote the sum of R over all new items, that is, \(\hat{R}\left(\hat{\boldsymbol{\alpha}},{\hat{\boldsymbol{Q}}}_{\boldsymbol{new}},\hat{\boldsymbol{s}},\hat{\boldsymbol{g}}\right)=\sum_{m=1}^M{\hat{R}}_m\left(\hat{\boldsymbol{\alpha}},{\hat{\textbf{q}}}_m,{\hat{s}}_m,{\hat{g}}_m\ \right)\), and let \({\hat{R}}_{{\hat{\textbf{Q}}}_{new}}^t\) be shorthand for its value in the tth iteration. The iterative algorithm stops when the number of iterations reaches a prespecified maximum or when the difference between two adjacent iterations, \({\hat{R}}_{{\hat{\textbf{Q}}}_{new}}^t\) and \({\hat{R}}_{{\hat{\textbf{Q}}}_{new}}^{t-1}\), is smaller than a preset threshold.

Fig. 1

The flow chart of the iterative two-step online item calibration method. Note. \({\hat{\textbf{Q}}}_{new}^t\) is the attribute vector specification of the new items in the tth iteration. \({\hat{R}}_{{\hat{\textbf{Q}}}_{new}}^t\) and \({\hat{R}}_{{\hat{\textbf{Q}}}_{new}}^{t-1}\) refer to the sum of the R statistic over all new items in the tth and (t − 1)th iteration, respectively. \({\textbf{Q}}_{q_m}\) refers to the set of possible attribute vectors of the mth new item, and \({\hat{\textbf{q}}}_m\) is the estimate of the attribute vector of item m. \(\hat{\boldsymbol{\alpha}}\) refers to the AMP estimates of the respondents who were administered the new item. \({\hat{s}}_m\) and \({\hat{g}}_m\) are the estimates of the slipping and guessing parameters, and \({s}_m^0\) and \({g}_m^0\) are their initial values. In the context of cognitive diagnosis, CD-MA, CD-OEM, and CD-MEM refer to online calibration of item parameters based on Method A, OEM, and MEM, respectively

The calibration of the mth item proceeds as follows (a code sketch follows the list):

  • Step 1: Estimate the attribute vector for the mth new item:

    (1) Obtain \(\hat{\boldsymbol{\alpha}}\) for each examinee based on their responses to the operational items;

    (2) Assigning the initial slipping and guessing parameters as 0.25, estimate the attribute vector of each new item based on the proposed R statistic.

  • Step 2: Based on the estimated attribute vectors obtained from the preceding step, apply the CD-MA, CD-OEM, or CD-MEM method to update the slipping and guessing parameters of the mth item.
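Below is a minimal sketch of the whole loop, reusing residual_stat() from the earlier sketch and the closed-form CD-Method A update as the second step (CD-OEM or CD-MEM could be substituted). For simplicity it assumes every respondent answered every new item; in CD-CAT, each new item would use only the nm respondents who actually saw it. Parameter updates are clipped to (0, 0.5) per the assumption of Theorem 1.

```python
import numpy as np
from itertools import product

def two_step_calibration(alpha_hat, x_new, K, s0=0.25, g0=0.25,
                         tol=1e-4, max_iter=50):
    """Sketch of the iterative two-step method for M new items.

    alpha_hat : (N, K) AMPs estimated from the operational items
    x_new     : (N, M) 0/1 responses to the new items
    """
    M = x_new.shape[1]
    candidates = [np.array(c) for c in product([0, 1], repeat=K) if any(c)]
    s, g = np.full(M, s0), np.full(M, g0)
    q_hat = [candidates[0]] * M
    old_R = np.inf
    for _ in range(max_iter):
        for m in range(M):
            # Step 1: q_m minimizing R at the current (fixed) item parameters
            q_hat[m] = min(candidates,
                           key=lambda c: residual_stat(alpha_hat, c,
                                                       s[m], g[m], x_new[:, m]))
            # Step 2: CD-Method A update of (s_m, g_m) at the chosen q_m
            eta = np.all(alpha_hat >= q_hat[m], axis=1)
            if eta.any():
                s[m] = np.clip(np.mean(x_new[eta, m] == 0), 0.01, 0.49)
            if (~eta).any():
                g[m] = np.clip(np.mean(x_new[~eta, m] == 1), 0.01, 0.49)
        # stop when the summed R statistic stabilizes
        new_R = sum(residual_stat(alpha_hat, q_hat[m], s[m], g[m], x_new[:, m])
                    for m in range(M))
        if abs(new_R - old_R) < tol:
            break
        old_R = new_R
    return q_hat, s, g
```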

Two practical concerns arise when using the iterative two-step procedure in real applications. One is that the true AMPs are unknown, and the AMPs estimated from responses to the operational items are used in their place. The other is that Theorem 1 holds only when N → ∞. Therefore, the robustness of the proposed procedure in the presence of unknown AMPs and limited sample size remains to be examined, and a simulation study is conducted to evaluate it. According to the results of Chen et al. (2015), SIE and SimIE have almost the same performance with sample sizes smaller than 1600. Since our main goal is to compare online item calibration methods in the context of CD-CAT with relatively small sample sizes, only the JEA, SIE, and the three residual-based methods are included in the following simulation study. The purpose of this article is thus twofold: (a) to introduce three residual-based methods implemented in an iterative algorithm for online calibration in cognitive diagnostic assessment, and (b) to examine how the performance of these methods compares to that of the JEA and SIE under a wide range of conditions by means of a simulation study.

Simulation study

Diagnostic assessment holds great promise for classroom use, which calls for consideration of small sample sizes and short test lengths. Furthermore, the AMP distributions most likely differ across classes. Therefore, in a comprehensive simulation study, we evaluate the performance of the proposed methods under various conditions, e.g., different sample sizes, test lengths, distributions of AMPs, and proportions of new items to operational items. The performance of the proposed methods is compared against two existing methods, JEA and SIE. Each condition is replicated 1000 times. Following Chen et al. (2012), the number of attributes measured by the test is set to K = 6, so the number of possible AMPs is 2^6 = 64. The comparison is made in terms of the accuracy of the estimation of the attribute vectors of the new items, the slipping and guessing parameters, and the respondents' AMPs.

Sample Size

Five sample sizes (200, 400, 600, 800, and 1000) are considered. The first three are small sample sizes, and the last two are medium sample sizes.

Test Length

Three test lengths (20, 30, and 40) are considered, each test consisting of a mix of operational items and new items. For each test length, the ratio of new to operational items (denoted λ) is 1:4, 1:3, or 1:2. For example, at a test length of 30, there could be six new and 24 operational items, roughly eight new and 22 operational items, or ten new and 20 operational items.

Respondent Generation

We generate the AMPs of respondents in a similar way to Chen et al. (2012) and Chen et al. (2015). Two independent groups of respondents are simulated. In the first group, each respondent has a 50% probability of mastering each attribute, i.e., all attributes are equally "difficult". In the second group, the probability of mastery varies across attributes: it is set at 0.65, 0.25, 0.75, 0.45, 0.55, and 0.35 for attributes 1 to 6, where 0.65 and 0.75 represent low difficulty, 0.45 and 0.55 medium difficulty, and 0.25 and 0.35 high difficulty.
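A minimal sketch of generating the two respondent groups (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed
N, K = 200, 6                         # sample size and number of attributes

# Group 1: every attribute mastered with probability .5
amp_same = rng.binomial(1, 0.5, size=(N, K))

# Group 2: attribute-specific mastery probabilities
p_master = np.array([0.65, 0.25, 0.75, 0.45, 0.55, 0.35])
amp_diff = rng.binomial(1, p_master, size=(N, K))
```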

Item Bank Generation

Similar to Chen et al. (2012) and Chen et al. (2015), two item banks are simulated based on the ranges of the item parameters. The slipping and guessing parameters are randomly drawn from U(0.05, 0.25) for the first item bank, which features items with high discrimination (Kaplan et al., 2015), and from U(0.15, 0.35) for the second, resulting in an item bank of low discrimination (Kaplan et al., 2015). A total of 360 items with the same Q-matrix as in Chen et al. (2012) are generated. Typically, highly discriminating items involve less noise (as represented by slipping and guessing) and lead to better measurement outcomes.
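Drawing the two banks' parameters is straightforward; the Q-matrix itself is taken from Chen et al. (2012) and is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(seed=2)   # arbitrary seed
J = 360                               # bank size, as in the study

s_high = rng.uniform(0.05, 0.25, size=J)   # high-discrimination bank
g_high = rng.uniform(0.05, 0.25, size=J)
s_low = rng.uniform(0.15, 0.35, size=J)    # low-discrimination bank
g_low = rng.uniform(0.15, 0.35, size=J)
```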

New Item Generation

As in Chen et al. (2012) and Chen et al. (2015), the number of new items is set to 20, i.e., Qnew contains 20 items, whose attribute vectors are randomly drawn from those of the operational item bank. The set of new items is drawn either from the low-discrimination bank or from the high-discrimination bank, denoted New1 and New2, respectively. Table 1 presents detailed information on the new items.

Table 1 The settings of the new items

Simulation of CD-CAT and Online Calibration

For each respondent, the CD-CAT and the online calibration proceed as follows: (1) generate the initial AMP estimate randomly, with each attribute having an equal probability of being mastered or not; (2) select the next item based on the most recent AMP estimate; (3) generate the response to the selected item and update the AMP estimate according to the responses to all previously administered items. Steps 2 and 3 are repeated until the stopping rule is satisfied. During the process, a certain number of new items (1/3, 1/4, or 1/5 of the test length) are randomly seeded into each respondent's test. Three fixed test lengths, L = 20, 30, and 40, are simulated, and the item selection strategy for operational items is the Shannon entropy method (SHE; Cheng, 2009; Tatsuoka, 2002; Xu et al., 2003). The prior distribution of the AMP is assumed to be uniform. It should be noted that the AMP estimates of CD-Method A, CD-OEM, and CD-MEM are based on the operational items only, while those of SIE and SimIE are based on both the operational and new items.
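A sketch of SHE selection for one respondent: for each unused operational item, compute the entropy of the updated posterior under each possible response, weight by the marginal response probabilities, and pick the item with the smallest expected entropy (implementation details ours):

```python
import numpy as np

def she_select(post, profiles, bank_q, bank_s, bank_g, administered):
    """Shannon entropy (SHE) item selection for one respondent (a sketch).

    post : (2**K,) current posterior over all attribute profiles
    Picks the unused operational item minimizing expected posterior entropy.
    """
    best_h, best_j = np.inf, None
    for j in range(len(bank_q)):
        if j in administered:
            continue
        eta = np.all(profiles >= bank_q[j], axis=1).astype(float)
        p1 = (1 - bank_s[j]) ** eta * bank_g[j] ** (1 - eta)  # P(X_j=1 | alpha_v)
        exp_h = 0.0
        for p_x in (p1, 1.0 - p1):                 # possible responses 1 and 0
            marg = np.sum(post * p_x)              # P(X_j = x)
            upd = post * p_x / marg                # posterior given X_j = x
            exp_h += marg * -np.sum(upd * np.log(upd + 1e-12))
        if exp_h < best_h:
            best_h, best_j = exp_h, j
    return best_j
```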

Update of the AMP

In the simulation, the maximum a posteriori (MAP; Huebner & Wang, 2011) method is used to update the AMP estimates of respondents:

$${\hat{\boldsymbol{\alpha}}}_i=\underset{v=1,2,\cdots, {2}^K}{\textrm{argmax}}P\left({\boldsymbol{\alpha}}_v|{\textbf{X}}_i\right),$$
(7)

where Xi refers to the response pattern of the ith respondent. As noted by Chen et al. (2012), the AMP is re-estimated after each operational item is answered. The test terminates as soon as the test length reaches L.
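A minimal sketch of the MAP update, accumulating the log-likelihood contributions of the administered operational items over all 2^K profiles (uniform prior by default):

```python
import numpy as np

def map_estimate(profiles, resp, item_q, item_s, item_g, prior=None):
    """MAP update of one respondent's AMP (Eq. 7), a sketch.

    resp   : (J_admin,) 0/1 responses to the administered operational items
    item_q : (J_admin, K) attribute vectors of those items
    """
    V = len(profiles)
    log_post = np.log(np.full(V, 1.0 / V) if prior is None else prior)
    for x, q, s, g in zip(resp, item_q, item_s, item_g):
        eta = np.all(profiles >= q, axis=1).astype(float)
        p1 = (1 - s) ** eta * g ** (1 - eta)       # P(X=1 | alpha_v) for this item
        log_post += np.log(p1 if x == 1 else 1 - p1)
    return profiles[np.argmax(log_post)]           # argmax over all 2**K profiles
```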

Evaluation Criteria

For each condition, the following seven criteria are applied to evaluate the performance of the online calibration methods. The first three indices evaluate the estimation of the AMPs, while the remaining indices address the estimation accuracy of the item parameters and attribute vectors of the new items.

Person Pattern Accuracy Rate (PPAR)

The PPAR represents the proportion of respondents whose AMPs are correctly estimated, which is defined as follows:

$$PPAR=\frac{\sum_{i=1}^NI\left({\boldsymbol{\alpha}}_i={\hat{\boldsymbol{\alpha}}}_i\right)}{N},$$
(8)

where \(I\left({\boldsymbol{\alpha}}_i={\hat{\boldsymbol{\alpha}}}_i\right)\) is an indicator function that equals 1 if the estimated AMP \({\hat{\boldsymbol{\alpha}}}_i\) of the ith respondent equals its true value αi, and 0 otherwise.

Person Attribute Accuracy Rate (PAAR)

The PAARk quantifies the estimation accuracy rate for attribute k:

$${PAAR}_k=\frac{\sum_{i=1}^NI\left({\alpha}_{ik}={\hat{\alpha}}_{ik}\right)}{N}.$$
(9)

Average Person Attribute Accuracy Rate (APAAR)

The APAAR summarizes the average attribute estimation accuracy at the person level for the CD-CAT, which can be determined as follows

$$APAAR=\frac{\sum_{i=1}^N\sum_{k=1}^KI\left({\alpha}_{ik}={\hat{\alpha}}_{ik}\right)}{NK}.$$
(10)

The following four indices evaluate the estimation of the new items.

Root Mean Squared Error (RMSE)

The RMSE summarizes the overall performance of the calibration accuracy of the slipping and guessing parameters of the M new items (Chen et al., 2012; Chen et al., 2015):

$${s}_{RMSE}=\sqrt{\frac{1}{M}\sum_{m=1}^M{\left({s}_m-{\hat{s}}_m\right)}^2},$$
(11)
$${g}_{RMSE}=\sqrt{\frac{1}{M}\sum_{m=1}^M{\left({g}_m-{\hat{g}}_m\right)}^2}.$$
(12)

Item Pattern Accuracy Rate (IPAR)

The IPAR indicates the calibration accuracy for the attribute vector of the new items, which is defined as follows:

$$IPAR=\frac{\sum_{m=1}^MI\left({\hat{\textbf{q}}}_m={\textbf{q}}_m\right)}{M},$$
(13)

where I(∙) is an indicator function: \(I\left({\hat{\textbf{q}}}_m={\textbf{q}}_m\right)\) returns a value of 1 when \({\hat{\textbf{q}}}_m\) and qm are equal, and returns a 0 otherwise.

Item Attribute Accuracy Number (IAAN)

The IAAN quantifies the average number of attributes per item that are specified correctly for the new items:

$$IAAN=\frac{\sum_{m=1}^M\sum_{k=1}^KI\left({\hat{q}}_{mk}={q}_{mk}\right)}{M}.$$
(14)

Among the preceding indices, PPAR, PAAR, and APAAR summarize the estimation accuracy of the AMPs; higher values indicate better estimation. sRMSE and gRMSE evaluate the item parameter estimation accuracy for the new items; smaller values indicate more accurate estimation. IPAR and IAAN quantify the attribute vector estimation accuracy for the new items, with larger values representing more accurate estimation.
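For reference, all of these indices (Eqs. 8-14) are one-liners; a minimal sketch, with array shapes as in the earlier sketches:

```python
import numpy as np

def ppar(alpha, alpha_hat):
    """Proportion of respondents with fully correct AMP estimates (Eq. 8)."""
    return np.mean(np.all(alpha == alpha_hat, axis=1))

def paar(alpha, alpha_hat):
    """Per-attribute accuracy rates (Eq. 9); their mean is APAAR (Eq. 10)."""
    return np.mean(alpha == alpha_hat, axis=0)

def rmse(true, est):
    """RMSE of slipping or guessing parameters (Eqs. 11-12)."""
    return np.sqrt(np.mean((np.asarray(true) - np.asarray(est)) ** 2))

def ipar(Q, Q_hat):
    """Proportion of new items with fully correct attribute vectors (Eq. 13)."""
    return np.mean(np.all(Q == Q_hat, axis=1))

def iaan(Q, Q_hat):
    """Average number of correctly specified attributes per new item (Eq. 14)."""
    return np.mean(np.sum(Q == Q_hat, axis=1))
```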

Results

Figure 2 and Table 2 provide the indices of AMP estimation accuracy for the CD-CAT (PPAR, PAAR, and APAAR) under the sample size of 200. (Results for other sample sizes show similar patterns and are omitted to save space; they are available upon request.) It should be noted that these three indices are calculated based only on the operational items. The two uppercase letters in the first column of the tables refer to the range of the item parameters and the attribute mastery probability: "L" and "H" denote low- and high-discrimination items with parameter ranges [0.15, 0.35] and [0.05, 0.25], respectively, and "S" and "D" refer to respondents with the same and different mastery probabilities, respectively. Results indicate that tests with highly discriminating items are indeed better for estimating respondents' attribute profiles, consistent with expectation. For example, a test with 13 high-discrimination operational items (i.e., the 20-item highly discriminating test with a 1:2 new-to-operational ratio) can reach a PPAR comparable to that of a test with 22 low-discrimination operational items (i.e., the low-discrimination test with a test length of 30 and a 1:4 new-to-operational ratio). Similar results between HS and HD, as well as between LS and LD, suggest that the attribute mastery probabilities have little effect on the estimation of respondents' attribute profiles. Because the test length of the CD-CAT is fixed, AMP estimation precision decreases as the number of seeded new items increases, since AMP estimation depends on the responses to the operational items. For example, for the 20-item test with new-to-operational ratios of 1:4, 1:3, and 1:2, the PPARs are 0.944, 0.906, and 0.810, respectively.

Fig. 2

The PPAR (Person Pattern Accuracy Rate) of the new items. Note. The first letter ‘H’ or ‘L’ in the labels for the x-axis refers to items with high- or low-discrimination, the second letter ‘S’ or ‘D’ refer to respondents with the same or different attribute mastery probability (ies). The number after the underscore refers to the test length. For example, HS_20 refers to the test with highly discriminative items and test length of 20. The numbers in the legend refer to the ratio of the number of seeded new items to the number of operational items

Table 2 Estimation accuracy of the respondents under the sample size of 200

The six columns below PAAR in Table 2 give the estimation accuracy for each of the six attributes. They indicate that tests with high-discrimination items result in higher PAAR, and tests with more operational items also lead to higher accuracy, as can easily be seen for the PPAR in Fig. 2. When the test length reaches 40, the difference caused by the ratio of new to operational items becomes less pronounced (see Fig. 2). On the other hand, the distribution of the attribute mastery probability shows only a small effect on the estimation of respondents' attribute profiles. Table 2 also shows that the PPAR and APAAR indices follow the same trend as the PAAR.

Tables 3, 4, 5, 6 and 7 present the IPAR index for the new items. Based on the results, more discriminating items, i.e., items with lower guessing and slipping parameters, are beneficial for online calibration. The proposed residual-based (R-based) methods outperform the JEA and SIE methods in estimating the attribute vectors of the new items. When all attributes are equally likely to be mastered, RMEM has the highest IPAR in most cases. Between JEA and SIE, there does not seem to be a consistent winner in terms of the IPAR index, suggesting that the Bayesian version of the JEA cannot always borrow enough information to help the item calibration. Among the R-based methods, RMA and ROEM perform similarly. Results also suggest that a higher IPAR can be obtained with more seeded new items. For example, with a sample size of 200 and a test length of 20, the IPAR for RMEM under the three ratios of seeded new items to operational items is 0.464, 0.524, and 0.538, respectively (see Table 3). Seeding more new items yields more responses to each new item and subsequently better estimation of the new items' attribute vectors. Consider a sample size of 400 and a 20-item test: if five new items are seeded (a 1:3 new-to-operational ratio), about 400 × 5/20 = 100 respondents answer each new item on average, whereas if seven new items are seeded (a 1:2 ratio), about 400 × 7/20 = 140 respondents answer each new item on average. Meanwhile, fewer operational items lead to a lower PPAR, which is harmful to calibration. Therefore, a trade-off between the numbers of seeded new items and operational items needs to be considered.

Table 3 The IPAR (Item Pattern Accuracy Rate) for the new items with the sample size of 200
Table 4 The IPAR (Item Pattern Accuracy Rate) for the new items with the sample size of 400
Table 5 The IPAR (Item Pattern Accuracy Rate) for the new items with the sample size of 600
Table 6 The IPAR (Item Pattern Accuracy Rate) for the new items with the sample size of 800
Table 7 The IPAR (Item Pattern Accuracy Rate) for the new items with the sample size of 1000

All five methods perform better with more discriminating items, which is consistent with the findings of Chen et al. (2012). For example, for the RMEM method in the 20-item test with 200 respondents, the IPAR values for the HS and LS conditions with a 1:4 new-to-operational ratio are .790 and .464, respectively. Regarding the two distributions of respondents' attribute mastery probability, each method performs better in terms of IPAR when respondents have the same attribute mastery probability of 0.5. Again taking the 20-item test with 200 respondents as an example, under the HS and HD conditions with a 1:4 new-to-operational ratio, the IPARs of the RMEM method are .790 and .708, respectively.

The same trend in the IAAN index is observed across the five sample sizes. Hence, we only provide the results for the sample sizes of 200 and 400, which are presented in Tables 8 and 9. For this index, 6 means that all attributes of an item are estimated correctly, and the closer to 6 the better. As we can see, RMEM performs best in most of the conditions, while RMA and ROEM have comparable IAAN in some cases. For example, 4.897 attributes are correctly recovered on average under the condition of a 20-item test with 1/4 seeded new items and respondents with uniform attribute mastery probability.

Table 8 The IAAN (Item Attribute Accuracy Number) for the new items with the sample size of 200
Table 9 The IAAN (Item Attribute Accuracy Number) for the new items with the sample size of 400

Regarding the item parameter estimation of the new items, RMA and RMEM lead to comparable RMSEs for both the slipping and guessing parameters, and together they outperform the other three methods. As shown in Tables 10, 11, 12, 13 and 14, ROEM yields higher sRMSE and gRMSE than RMEM and RMA. As discussed before, the information borrowed from the respondents' posterior distributions may not be enough to improve the online item calibration, and in most cases the JEA has the largest sRMSE and gRMSE. As with the attribute vector estimation, each method performs better or comparably when respondents have the same attribute mastery probability. With more seeded new items, the estimation of the new items improves, as more seeded new items per respondent mean more responses collected for each new item. Although the estimation accuracy of the respondents' AMPs decreases with more seeded new items, the increase in the number of respondents per new item can improve the calibration of the new items, again pointing to a trade-off.

Table 10 The RMSE (root mean squared error) of the item parameters for the new items with the sample size of 200
Table 11 The RMSE (root mean squared error) of the item parameters for the new items with the sample size of 400
Table 12 The RMSE (root mean squared error) of the item parameters for the new items with the sample size of 600
Table 13 The RMSE (root mean squared error) of the item parameters for the new items with the sample size of 800
Table 14 The RMSE (root mean squared error) of the item parameters for the new items with the sample size of 1000

Figure 3 illustrates the IPAR for the sample size of 200 under different test lengths. On one hand, the IPAR improves with more seeded new items (moving from a 1:4 to a 1:2 new-to-operational ratio within each test length). On the other hand, the IPAR increases with the test length, and its full range gets tighter. Figure 4 shows the IPAR in the 20-item test with a 1:4 new-to-operational ratio under different sample sizes. The R-based methods clearly have higher IPAR; JEA outperforms SIE when the sample size is smaller than 600, and SIE has an equal or higher IPAR than JEA when the sample size is 600 or larger. Figure 5 shows the IPAR for the RMEM method in the 20-item test under different sample sizes, indicating that the proposed method performs better both when the items are highly discriminating and when the attribute mastery probability is uniform across attributes.

Fig. 3

The IPAR (Item Pattern Accuracy Rate) in different test lengths with 200 respondents. Note. The first letter ‘H’ or ‘L’ in the legend refer to items with high- or low-discrimination, the second letter ‘S’ or ‘D’ refer to respondents with the same or different attribute mastery probability (ies), and \(\frac{1}{2}\), \(\frac{1}{3},\) or \(\frac{1}{4}\) denote the rate of new to operational items. RMA, ROEM, and RMEM are variations of CD-MA, CD-OEM, and CD-MEM, respectively. JEA and SIE refer to the joint estimation algorithm and the single item estimation method, respectively

Fig. 4

The IPAR (Item Pattern Accuracy Rate) in the 20-item test with 1/4 seeded new items under different sample sizes. Note. The first letter ‘H’ or ‘L’ in the legend refer to items with high- or low-discrimination, and the second letter ‘S’ or ‘D’ refer to respondents with the same or different attribute mastery probability (ies). RMA, ROEM, and RMEM are variations of CD-MA, CD-OEM, and CD-MEM, respectively. JEA, and SIE refer to the joint estimation algorithm and the single item estimation method, respectively

Fig. 5

The IPAR (Item Pattern Accuracy Rate) for the RMEM method with different sample sizes in the 20-item test. Note. The first letter ‘H’ or ‘L’ in the legend refer to items with high- or low-discrimination, the second letter ‘S’ or ‘D’ refer to respondents with the same or different attribute mastery probability (ies), and \(\frac{1}{2}\), \(\frac{1}{3},\) or \(\frac{1}{4}\) denotes the rate of new to operational items. RMEM is a variation of the CD-MEM method

It is worth pointing out that although the proposed method performs well in calibrating new items in small samples and theoretically does not depend on the initial values of the item parameters, it relies on accurate estimation of the respondents' AMPs. In other words, its independence of the initial item parameters is premised on the AMPs being estimated sufficiently well from the operational items. For that reason, neither the number of operational items taken by each respondent nor the number of respondents taking each new item should be too small.

Real data example

Because a real CD-CAT dataset was unavailable, a dataset collected from a non-adaptive test is used to illustrate the proposed iterative two-step method. It is important to note that this does not mean that the proposed method is restricted to non-adaptive testing. The application to non-adaptive testing can be viewed as a special case in which the attribute profiles of test takers are obtained from responses to items with known attribute vectors (corresponding to the operational items in adaptive testing), and the items to be estimated correspond to the new items in online calibration of CD-CAT. In fact, although the motivation for this approach was to develop an online calibration method for adaptive testing, the method can be used for both adaptive and non-adaptive tests.

The real dataset used here was collected in a learning experiment at the University of Tuebingen in Germany. It contains responses from 504 examinees to 12 elementary probability theory problems that measure the following four attributes: (A1) calculate the classic probability of an event, (A2) calculate the probability of the complement of an event, (A3) calculate the probability of the union of two disjoint events, and (A4) calculate the probability of two independent events. The Q-matrix was initially produced by content experts, and the response data are available in the R package pks (Heller & Wickelmaier, 2013). Wang et al. (2020) applied several methods to estimate the Q-matrix by treating eight of the 12 items as operational and the remaining four as new. Here we follow a similar strategy: we take items 1, 2, 3, 4, 6, 7, 9, and 11 as operational items and the remaining four (items 5, 8, 10, and 12) as new. The Q-matrix for the operational items and the original Q-matrix for the new items from the pks package are given in Table 15.

Table 15 The Q-matrix for the operational items, and the original and suggested Q-matrix for the new items

Responses to the eight operational items are denoted XO, and responses to the four new items XN. Based on the two-step online item calibration method, we follow the process below to obtain the Q-matrix for the new items:

  (1) Obtain the estimate of the attribute profile \(\hat{\boldsymbol{\alpha}}\) of each examinee based on XO;

  (2) Assigning the initial slipping and guessing parameters as 0.25, estimate the attribute vector of each new item based on the proposed R statistic;

  (3) Based on the attribute vectors obtained from the preceding step, apply the CD-MEM method to estimate the slipping and guessing parameters;

  (4) Repeat steps 2 and 3 until the convergence criterion is met.

The estimated Q-matrix for the new items is presented at the bottom of Table 15. The proposed method suggests four changes to the original Q-matrix, all from 1 to 0, which seems to indicate that the method tends to assign fewer attributes to each new item. Take the first new item (named p105 in the pks package) as an example; its stem is "Given a standard deck containing 32 different cards, what is the probability of not drawing a heart?" The RMEM suggests that it measures only attribute A1, calculating the classic probability of an event, whereas the original specification lists A1 and A2, where A2 refers to "calculate the probability of the complement of an event". Based on our analysis, answering this item does not seem to require mastery of A2. The estimated Q-matrix could serve as a reference for domain experts, who can further review the changes.

Conclusions and further discussion

In this paper, we proposed a method based on a residual statistic to estimate the attribute vectors of new items in the online calibration of CD-CAT. The rationale for using the residual-based statistic in online calibration is presented in Appendix A: essentially, the residual statistic is minimized when the attribute vector of a new item is at its true value, regardless of the item parameters. An iterative two-step online calibration method was thus developed in the context of CD-CAT, in which the attribute vectors and item parameters are estimated in separate steps iteratively. By coupling CD-MA, CD-OEM, and CD-MEM with the residual-based statistic, three new online calibration methods, RMA, ROEM, and RMEM, were developed. The analytical result in Appendix A holds when N → ∞ and the AMPs of respondents are known. When the AMPs must be estimated and the sample size is limited, the performance of RMA, ROEM, and RMEM is not guaranteed to be optimal, but can still be superior to that of existing methods.

The results from the simulation study indicate that the methods based on the proposed statistic work well in terms of both item parameter recovery and attribute vector recovery, even with a small sample size. Compared to the JEA and SIE methods, the residual-based methods show clear advantages, especially with small samples. Results also suggest that RMA and ROEM perform similarly in estimating the attribute vectors of the new items, and that RMA and SIE perform similarly in estimating the item parameters of the new items, especially in tests with highly discriminating items. For a CD-CAT system, the quality of items (operational and new) is very important, because it strongly affects the efficiency and accuracy of the test, as well as the online calibration.

Several future research directions should be considered. First, the Q-matrix in this study was generated assuming that attributes are independent. In more realistic conditions, relationships may exist among the attributes, such as hierarchical relationships (Leighton et al., 2004). Non-independence may affect the performance of the proposed methods, which is worth investigating in the future. Second, the proposed methods were evaluated under the DINA model; they could be adapted to many other CDMs, such as the RRUM (Hartz, 2002), DINO (Templin & Henson, 2006), and more general models (e.g., Ma & de la Torre, 2016, 2019) such as the G-DINA model (de la Torre, 2011). Under the G-DINA model, each respondent is classified into one of \({2}^{k_m^{\ast }}\) groups, where \({k}_m^{\ast }=\sum_{k=1}^K{q}_{mk}\). The residual statistic defined in Eq. (6) can then be adapted for the G-DINA model as follows:

$${R}_m\left(\boldsymbol{\alpha}, {\textbf{q}}_m,{s}_m,{g}_m\right)=2\sum\nolimits_{l=1}^{2^{k_m^{\ast }}}\sum\nolimits_{i=1}^{n_{lm}}\log \left\{{\left[\frac{1-p\left({\alpha}_{lm}^{\ast}\right)}{p\left({\alpha}_{lm}^{\ast}\right)}\right]}^{x_{im}}{\left[\frac{p\left({\alpha}_{lm}^{\ast}\right)}{1-p\left({\alpha}_{lm}^{\ast}\right)}\right]}^{1-{x}_{im}}\right\},$$
(15)

where nlm refers to the number of respondents with reduced attribute vector \({\boldsymbol{\alpha}}_{lm}^{\ast }=\left({\alpha}_{l{m}_1},\cdots, {\alpha}_{l{m}_{k_m^{\ast }}}\right)\), and \(p\left({\boldsymbol{\alpha}}_{lm}^{\ast}\right)=p\left({X}_{im}=1|{\boldsymbol{\alpha}}_{lm}^{\ast}\right)\) denotes the probability that respondents with attribute pattern \({\boldsymbol{\alpha}}_{lm}^{\ast }\) answer item m correctly. By defining an appropriate residual statistic, the proposed method is thus potentially applicable to other models. That said, it remains to be investigated how well the adapted residual statistic works, and whether statistical properties such as the one demonstrated in Theorem 1 still hold for other models.
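As an illustration, a sketch of this adapted statistic, assuming the respondents answering item m have already been grouped by their reduced attribute patterns (the multiplicative form, which follows the definition in Eq. 5, is used; names are ours):

```python
import numpy as np

def residual_stat_gdina(p_group, x_groups):
    """Adapted residual statistic of Eq. (15) for G-DINA (a sketch).

    p_group  : (2**k_star,) P(X = 1) for each reduced attribute pattern
    x_groups : list of 0/1 response arrays, one per reduced pattern
    """
    R = 0.0
    for p, x in zip(p_group, x_groups):
        # |x - E(X)| / P(x) equals (1-p)/p for x = 1 and p/(1-p) for x = 0
        term = ((1 - p) / p) ** x * (p / (1 - p)) ** (1 - x)
        R += 2.0 * np.sum(np.log(term))
    return R
```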

Third, this study assumes that the attribute vectors and item parameters of all operational items are known. In reality, they must have been estimated or specified by content experts at some point. How will the proposed methods perform when the attribute vectors, the item parameters, or both are misspecified for some operational items? How badly will different methods react to such misspecification? These issues are yet to be investigated. Finally, the recent popularity of online learning environments has prompted advances in continuous item calibration for CAT that may not require any operational items to begin with (Fink et al., 2018). The same philosophy may be applicable to CD-CAT and is certainly an interesting direction to pursue.