Abstract
An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers’ ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
References
Abdelbasit, K.M., & Plackett, R.L. (1983). Experimental design for binary data. Journal of the American Statistical Association, 78, 90–98.
Atchadé, Y.F., & Rosenthal, J.S. (2005). On adaptive Markov chain Monte Carlo algorithms. Bernoulli, 11, 815–828.
Berger, M.P.F. (1991). On the efficiency of IRT models when applied to different sampling designs. Applied Psychological Measurement, 15, 293–306.
Berger, M.P.F. (1992). Sequential sampling designs for the two-parameter item response theory model. Psychometrika, 57, 521–538.
Berger, M.P.F. (1994). D-optimal sequential sampling designs for item response theory models. Journal of Educational Statistics, 19, 43–56.
Berger, M.P.F., King, C.Y.J., & Wong, W.K. (2000). Minimax D-optimal designs for item response theory models. Psychometrika, 65, 377–390.
Berger, M.P.F., & van der Linden, W.J. (1991). Optimality of sampling design in item response theory models. In M. Wilson (Ed.), Objective measurement: theory into practice (pp. 274–288). Norwood: Ablex.
Berger, M.P.F., & Wong, W.K. (2009). Introduction to optimal designs for social and biomedical research. Chichester: Wiley.
Cai, L. (2010). Metropolis–Hastings Robbins–Monro algorithm for confirmatory factor analysis. Journal of Educational and Behavioral Statistics, 35, 307–335.
Chaloner, K., & Larntz, K. (1989). Optimal Bayesian design applied to logistic regression experiments. Journal of Statistical Planning and Inference, 21, 191–208.
Chang, Y.-C.I., & Lu, H.-Y. (2010). Online calibration via variable length computerized adaptive testing. Psychometrika, 75, 140–157.
Fedorov, V.V. (1972). Theory of optimal experiments. New York: Academic Press.
Fox, J.-P. (2010). Bayesian item response modeling. New York: Springer.
Gilks, W.R., Richardson, S., & Spiegelhalter, D.J. (1996). Introducing Markov chain Monte Carlo. In W.R. Gilks, S. Richardson, & D.J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 1–19). London: Chapman & Hall.
Johnson, V.E., & Albert, J.H. (1999). Ordinal data modeling. New York: Springer.
Jones, D.H., & Jin, Z. (1994). Optimal sequential designs for on-line item estimation. Psychometrika, 59, 59–75.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale: Erlbaum.
Makransky, G., & Glas, C.A.W. (2010). An automatic online calibration design in adaptive testing. Journal of Applied Testing Technology, 11, 1. Retrieved from http://www.testpublishers.org/mc/page.do?sitePageId=112031&orgId=atpu.
Mislevy, R.J., & Chang, H.-H. (2000). Does adaptive testing violate local independence? Psychometrika, 65, 149–156.
Patz, R.J., & Junker, B.W. (1999a). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146–178.
Patz, R.J., & Junker, B.W. (1999b). Applications and extensions of MCMC in IRT: multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24, 342–366.
Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22, 400–407.
Rosenthal, J.S. (2007). AMCMC: an R interface for adaptive MCMC. Computational Statistics & Data Analysis, 51, 5467–5470.
Silverman, B.W. (1986). Density estimation for statistics and data analysis. London: Chapman & Hall.
Silvey, S.D. (1980). Optimal design. London: Chapman & Hall.
Stefanski, L.A., & Carroll, R.J. (1985). Covariate measurement error in logistic regression. The Annals of Statistics, 13, 1335–1351.
Stocking, M.L. (1990). Specifying optimum examinees for item parameter estimation in item response theory. Psychometrika, 55, 461–475.
van der Linden, W.J. (1988). Optimizing incomplete sampling designs for item response model parameters (Research Report No. 88-5). Enschede, The Netherlands: University of Twente.
van der Linden, W.J. (1994). Optimal design in item response theory: applications to test assembly and item calibration. In G.H. Fischer & D. Laming (Eds.), Contributions to mathematical psychology, psychometrics, and methodology (pp. 305–318). New York: Springer.
van der Linden, W.J. (1999). Empirical initialization of the trait estimator in adaptive testing. Applied Psychological Measurement, 23, 21–29. [Erratum, 23, 248].
van der Linden, W.J. (2005). Linear models for optimal test design. New York: Springer.
van der Linden, W.J. (2008). Using response times for item selection in adaptive testing. Journal of Educational and Behavioral Statistics, 33, 5–20.
van der Linden, W.J. (2010). Sequencing an adaptive test battery. In W.J. van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing (pp. 103–119). New York: Springer.
van der Linden, W.J., & Pashley, P.J. (2010). Item selection and ability estimation in adaptive testing. In W.J. van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing (pp. 3–30). New York: Springer.
Wingersky, M., & Lord, F.M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8, 347–364.
Wynn, H.P. (1970). The sequential generation of D-optimum experimental designs. The Annals of Mathematical Statistics, 41, 1655–1664.
Appendices
Appendix A. Implementations of the MCMC Algorithm
The MCMC algorithms used for the calibration design are variations of the usual Metropolis–Hastings (MH) within Gibbs algorithm, with blocks of item and examinee parameters and symmetric proposal densities for the 3PL model. The general structure of the algorithm has been documented extensively (e.g., Fox, 2010, Chapter 3; Johnson & Albert, 1999, Section 2.5; Patz & Junker, 1999a, 1999b). The following two versions of the algorithm are used.
A.1 Posterior of Ability Parameter
The first version produces the draws from the posterior distribution \(g(\theta_{j}\mid \mathbf{u}_{i_{k-1}})\) of the ability parameter θ for test taker j in (2), given the responses to items l=1,…,k−1 in the adaptive test. The posterior distributions of the parameters of these items, \(g(\boldsymbol {\eta }_{i_{l}})\), are assumed to be available in the system in the form of vectors of random draws \(\boldsymbol {\eta }_{i_{l}}^{(t)}= (\boldsymbol {\eta }_{i_{l}}^{(1)},\ldots,\boldsymbol {\eta }_{i_{l}}^{(T)})\).
The version can be summarized as iterations r=1,…,R, each consisting of the following two steps:

1. The rth draw from the posterior distribution of θ for test taker j is obtained by

(a) drawing a candidate value \(\theta _{j}^{(c)}\) for \(\theta _{j}^{(r)}\) from the proposal density \(q(\theta _{j}\mid \theta _{j}^{(r-1)})\);

(b) accepting \(\theta _{j}^{(r)}=\theta _{j}^{(c)}\) with probability

$$ \min \Biggl\{ \frac{g(\theta _{j}^{(c)})\prod_{l=1}^{k-1}p(\theta _{j}^{ ( c ) };\boldsymbol {\eta }_{i_{l}}^{(r-1)})^{u_{i_{l}}}[1-p(\theta _{j}^{(c)};\boldsymbol {\eta }_{i_{l}}^{(r-1)})]^{1-u_{i_{l}}}}{g(\theta _{j}^{(r-1)})\prod_{l=1}^{k-1}p(\theta _{j}^{ ( r-1 ) };\boldsymbol {\eta }_{i_{l}}^{(r-1)})^{u_{i_{l}}}[1-p(\theta _{j}^{(r-1)}; \boldsymbol {\eta }_{i_{l}}^{(r-1)})]^{1-u_{i_{l}}}},1 \Biggr\} $$
(A.1)

Otherwise, \(\theta _{j}^{(r)}=\theta _{j}^{(r-1)}\).

2. The rth draws from the posterior distributions of the operational item parameters \(\boldsymbol {\eta }_{i_{l}}\), l=1,…,k−1, are sampled at random from the vectors \(\boldsymbol {\eta }_{i_{l}}^{(t)}\) stored in the system.

Upon stationarity, the draws from the posterior distribution of θ for test taker j in the first step are collected in a vector \(\boldsymbol {\theta }_{j}^{(s)}= (\theta _{j}^{(1)},\ldots,\theta _{j}^{(S)})\).
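For illustration, the two steps above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names `p3pl` and `sample_theta`, a standard normal prior for g(θ), and a normal random-walk proposal are all assumptions of the sketch.

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL response probability: p = c + (1 - c) / (1 + exp(-a(theta - b)))."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def sample_theta(u, eta_draws, n_iter=2000, prop_sd=0.5, prior_sd=1.0, seed=None):
    """MH-within-Gibbs draws from the posterior of theta for one test taker.

    u         : 0/1 responses to the k-1 operational items
    eta_draws : per item, a (T, 3) array of stored posterior draws (a, b, c),
                resampled each iteration (step 2 of version A.1)
    """
    rng = np.random.default_rng(seed)
    u = np.asarray(u, dtype=float)

    def log_post(theta, eta):
        # log g(theta) for an assumed N(0, prior_sd^2) prior, plus log-likelihood
        p = p3pl(theta, eta[:, 0], eta[:, 1], eta[:, 2])
        return -0.5 * (theta / prior_sd) ** 2 + np.sum(
            u * np.log(p) + (1.0 - u) * np.log1p(-p))

    theta, draws = 0.0, np.empty(n_iter)
    for r in range(n_iter):
        # Step 2: resample the operational item parameters from the stored vectors
        eta = np.stack([d[rng.integers(len(d))] for d in eta_draws])
        # Step 1(a): candidate from the symmetric random-walk proposal q
        cand = theta + prop_sd * rng.normal()
        # Step 1(b): accept with probability min{ratio, 1}, as in (A.1)
        if np.log(rng.uniform()) < log_post(cand, eta) - log_post(theta, eta):
            theta = cand
        draws[r] = theta
    return draws
```

Because the proposal is symmetric, the MH ratio reduces to the ratio of posteriors in (A.1); resampling the item parameters each iteration is what allows the stored draws to be reused without refitting the operational items.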
A.2 Update of Posterior of Field-Test Parameters
The second version is for the updates of the posterior distributions of the parameters of field-test item f after each batch b_f=1,2,… of test takers. The current posterior distributions of the ability parameters θ_j for the test takers j∈b_f are available in the form of estimates of their densities g(θ_j) derived from the vectors of draws \(\boldsymbol {\theta }_{j}^{(s)}=(\theta _{j}^{(1)},\ldots,\theta _{j}^{(S)})\). Likewise, the current posterior distributions of the field-test parameters η_f, f=1,…,F, are available in the form of estimates of their densities \(g^{(b-1)}(\boldsymbol {\eta }_{f}\mid \mathbf{u}_{f_{j}})\) derived from the vectors of random draws \(\boldsymbol {\eta }_{f}^{(b-1,t)}=(\boldsymbol {\eta }_{f}^{(b-1,1)},\ldots, \boldsymbol {\eta }_{f}^{(b-1,T)})\). See the main text for the derivation of these density estimates.
The second version can be summarized as iterations r=1,…,R, each consisting of the following two steps:

1. For each j∈b, the rth draw from the posterior distribution of θ_j is obtained by

(a) drawing a candidate value \(\theta _{j}^{(c)}\) for \(\theta _{j}^{(r)}\) from the proposal density \(q(\theta _{j}\mid \theta _{j}^{(r-1)})\);

(b) accepting \(\theta _{j}^{(r)}=\theta _{j}^{(c)}\) with probability

$$ \min \Biggl\{ \frac{g(\theta _{j}^{(c)})\prod_{f=1}^{F}p(\theta_{j}^{ ( c ) }; \boldsymbol {\eta }_{f}^{(r-1)})^{u_{f_{j}}}[1-p(\theta_{j}^{(c)}; \boldsymbol {\eta }_{f}^{(r-1)})]^{1-u_{f_{j}}}}{g(\theta_{j}^{(r-1)}) \prod_{f=1}^{F}p(\theta _{j}^{ ( r-1 ) };\boldsymbol {\eta }_{f}^{(r-1)})^{u_{f_{j}}}[1-p(\theta _{j}^{(r-1)}; \boldsymbol {\eta}_{f}^{(r-1)})]^{1-u_{f_{j}}}},1 \Biggr\} $$
(A.2)

Otherwise, \(\theta _{j}^{(r)}=\theta _{j}^{(r-1)}\).

2. For each of the field-test items f administered to a test taker j∈b_f, the rth draw from the posterior distribution of η_f is obtained by

(a) drawing a candidate value \(\boldsymbol {\eta }_{f}^{(c)}\) for \(\boldsymbol {\eta}_{f}^{(r)}\) from a proposal density \(q(\boldsymbol {\eta }_{f}\mid \boldsymbol {\eta }_{f}^{(r-1)})\);

(b) accepting \(\boldsymbol {\eta }_{f}^{(r)}=\boldsymbol {\eta }_{f}^{(c)}\) with probability

$$ \min \Biggl\{ \frac{g^{(b-1)}(\boldsymbol {\eta }_{f}^{(c)}) \prod_{j=1}^{n_{b_{f}}} \{ p(\theta _{j}^{(r)};\boldsymbol {\eta }_{f}^{(c)})^{u_{f_{j}}} [1-p(\theta _{j}^{(r)};\boldsymbol {\eta }_{f}^{(c)})]^{1-u_{f_{j}}} \} }{g^{(b-1)}(\boldsymbol {\eta }_{f}^{(r-1)}) \prod_{j=1}^{n_{b_{f}}} \{ p(\theta _{j}^{ (r ) }; \boldsymbol {\eta }_{f}^{(r-1)})^{u_{f_{j}}}[1-p(\theta _{j}^{(r)}; \boldsymbol {\eta }_{f}^{(r-1)})]^{1-u_{f_{j}}} \} },1 \Biggr\} $$
(A.3)

Otherwise, \(\boldsymbol {\eta }_{f}^{(r)}=\boldsymbol {\eta }_{f}^{(r-1)}\).

Upon stationarity, the draws \(\boldsymbol {\eta }_{f}^{(b,t)}=(\boldsymbol {\eta }_{f}^{(b,1)},\ldots, \boldsymbol {\eta }_{f}^{(b,T)})\) from the posterior update of η_f for batch b in the second step are saved as updates of the vectors \(\boldsymbol {\eta }_{f}^{(b-1,t)}\).
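The second step of this version can likewise be sketched in Python. Again a sketch with illustrative names: `log_prior` stands in for the log of the density estimate of \(g^{(b-1)}(\boldsymbol{\eta}_{f})\) described in the main text, and the theta draws from the first step are treated as given.

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL response probability: p = c + (1 - c) / (1 + exp(-a(theta - b)))."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def update_eta(u, thetas, log_prior, eta0, n_iter=2000,
               prop_sd=(0.10, 0.10, 0.05), seed=None):
    """MH draws updating the posterior of eta_f = (a_f, b_f, c_f) after one batch.

    u         : 0/1 responses of the batch to field-test item f
    thetas    : one current theta draw per test taker in the batch (step 1)
    log_prior : callable returning log g^(b-1)(eta); any log-density on (a, b, c)
    """
    rng = np.random.default_rng(seed)
    u = np.asarray(u, dtype=float)
    thetas = np.asarray(thetas, dtype=float)
    prop_sd = np.asarray(prop_sd, dtype=float)

    def log_post(eta):
        a, b, c = eta
        if a <= 0.0 or not 0.0 < c < 1.0:
            return -np.inf  # candidate outside the parameter space
        p = p3pl(thetas, a, b, c)
        return log_prior(eta) + np.sum(u * np.log(p) + (1.0 - u) * np.log1p(-p))

    eta = np.asarray(eta0, dtype=float)
    lp = log_post(eta)
    draws = np.empty((n_iter, 3))
    for r in range(n_iter):
        # Step 2(a): candidate from a symmetric random-walk proposal
        cand = eta + prop_sd * rng.normal(size=3)
        lp_cand = log_post(cand)
        # Step 2(b): accept with probability min{ratio, 1}, as in (A.3)
        if np.log(rng.uniform()) < lp_cand - lp:
            eta, lp = cand, lp_cand
        draws[r] = eta
    return draws
```

The returned draws play the role of the updated vectors \(\boldsymbol{\eta}_{f}^{(b,t)}\); in the actual design, `log_prior` would be the kernel-density estimate carried over from the previous batch rather than an analytic density.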
Appendix B. Information Matrices
B.1 Observed Information Matrix
For the 3PL model, the observed information matrix \(J_{u_{f_{j}}}(\boldsymbol {\eta }_{f};\theta _{j})\) in (17) has the entries given in (B.1)–(B.6), where p_f is the response probability on field-test item f in (1).
B.2 Expected Information Matrix
The expected information matrix \(I_{U_{f}}(\boldsymbol {\eta }_{f};\theta _{j})\) in (9) is readily available in the literature (e.g., Lord, 1980, Section 12.1). Using the notation in this paper, it is obtained by taking the expectations of (B.1)–(B.6) over the response distribution.
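Because each response is Bernoulli, the expected information matrix can equivalently be computed as \(\nabla p\,\nabla p^{\top}/[p(1-p)]\), with the gradient of the 3PL response probability taken with respect to (a_f, b_f, c_f). The following is a numerical sketch of this identity, not a transcription of the paper's closed-form entries; the function name is illustrative.

```python
import numpy as np

def expected_info_3pl(theta, a, b, c):
    """Expected information matrix for one Bernoulli response under the 3PL,
    via the identity I = grad(p) grad(p)^T / [p(1 - p)]."""
    e = np.exp(-a * (theta - b))
    P2 = 1.0 / (1.0 + e)              # 2PL part of the response probability
    p = c + (1.0 - c) * P2            # 3PL response probability, as in (1)
    dP2 = P2 * (1.0 - P2)
    grad = np.array([
        (1.0 - c) * dP2 * (theta - b),  # dp/da
        -(1.0 - c) * dP2 * a,           # dp/db
        1.0 - P2,                       # dp/dc
    ])
    return np.outer(grad, grad) / (p * (1.0 - p))
```

A single response yields a rank-one (hence singular) matrix; the information accumulated over a batch of test takers is the sum of these matrices over their theta values.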
B.3 Observed Information Matrix for Transformed Parameters
For a multivariate normal proposal distribution based on the transformations \(a_{f}^{\ast }=\ln a_{f}\) and \(c_{f}^{\ast }=\operatorname {logit}c_{f}\) in (5), the entries of the observed information matrix in (B.1)–(B.6) take the following form:
Observe that, before entering the acceptance criterion in (A.3), the draws of \(a_{f}^{\ast }\) and \(c_{f}^{\ast }\) from the proposal distribution with this version of the covariance matrix have to be transformed back to their original scales as \(a_{f}=\exp (a_{f}^{\ast })\) and \(c_{f}=[1+\exp (-c_{f}^{\ast})]^{-1}\).
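The transformation pair can be written as a small helper (function names are illustrative):

```python
import numpy as np

def to_proposal_scale(a, c):
    """Forward transforms a* = ln a and c* = logit c used for the proposal."""
    return np.log(a), np.log(c / (1.0 - c))

def to_original_scale(a_star, c_star):
    """Back-transforms a = exp(a*) and c = [1 + exp(-c*)]^{-1} before the
    acceptance step."""
    return np.exp(a_star), 1.0 / (1.0 + np.exp(-c_star))
```

Proposing on the (a*, c*) scales keeps candidate draws inside the parameter space (a_f > 0, 0 < c_f < 1) without any boundary handling in the sampler.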
Cite this article
van der Linden, W.J., Ren, H. Optimal Bayesian Adaptive Design for Test-Item Calibration. Psychometrika 80, 263–288 (2015). https://doi.org/10.1007/s11336-013-9391-8