Item Selection and Ability Estimation in Adaptive Testing

Part of the book series: Statistics for Social and Behavioral Sciences (SSBS)

Abstract

The last century saw tremendous progress in the refinement and use of standardized linear tests. The College Board administered its first exam in 1901, and the first Scholastic Assessment Test (SAT) was given in 1926. Since then, progressively more sophisticated standardized linear tests have been developed for a multitude of assessment purposes, such as college placement, professional licensure, higher-education admissions, and tracking educational standing or progress. Standardized linear tests are now administered around the world. For example, the Test of English as a Foreign Language (TOEFL) has been delivered in approximately 88 countries.

Copyright information

© 2009 Springer Science+Business Media, LLC

Cite this chapter

van der Linden, W.J., Pashley, P.J. (2009). Item Selection and Ability Estimation in Adaptive Testing. In: van der Linden, W., Glas, C. (eds) Elements of Adaptive Testing. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-0-387-85461-8_1
