Abstract
Item response theory (IRT) is an important component of both computerized adaptive testing (CAT) and multistage testing (MST): it is the underlying framework for item bank calibration, ability estimation, and item or module selection. This chapter gives a brief overview of the theory, presenting key information and introducing the notation used in subsequent chapters. Only topics directly related to adaptive and multistage testing are covered; appropriate references for further reading are therefore also given.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Magis, D., Yan, D., von Davier, A.A. (2017). An Overview of Item Response Theory. In: Computerized Adaptive and Multistage Testing with R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-69218-0_2
DOI: https://doi.org/10.1007/978-3-319-69218-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69217-3
Online ISBN: 978-3-319-69218-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)