An Overview of Item Response Theory

Magis, David; Yan, Duanli; von Davier, Alina A.

doi:10.1007/978-3-319-69218-0_2


Part of the book series: Use R! (USE R)


Abstract

Item response theory (IRT) is an important component of both computerized adaptive testing (CAT) and multistage testing (MST): it serves as the underlying framework for item bank calibration, ability estimation, and item/module selection. This chapter presents a brief overview of the theory, providing key information and introducing the notation used in subsequent chapters. Only topics directly related to adaptive and multistage testing are covered; references for further reading are provided for topics beyond this scope.



Author information

Authors and Affiliations

Department of Education, University of Liege, Liege, Belgium
David Magis
Educational Testing Service, Princeton, NJ, USA
Duanli Yan
ACTNext by ACT, Iowa City, IA, USA
Alina A. von Davier



About this chapter

Cite this chapter

Magis, D., Yan, D., von Davier, A.A. (2017). An Overview of Item Response Theory. In: Computerized Adaptive and Multistage Testing with R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-69218-0_2


DOI: https://doi.org/10.1007/978-3-319-69218-0_2
Published: 24 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69217-3
Online ISBN: 978-3-319-69218-0
eBook Packages: Mathematics and Statistics (R0)
