Using Item Response Theory as a Tool in Educational Measurement

  • Margaret Wu
Part of the Education in the Asia-Pacific Region: Issues, Concerns and Prospects book series (EDAP, volume 18)


Item response theory (IRT) and classical test theory (CTT) are invaluable tools for constructing assessment instruments and measuring student proficiencies in educational settings. However, the advantages of IRT over CTT are not always clear. This chapter uses an example item analysis to contrast IRT and CTT. It is hoped that readers will gain a deeper understanding of IRT through comparisons of the similarities and differences between IRT and CTT statistics. In particular, the chapter discusses item properties such as the difficulty and discrimination power of items, as well as person ability measures, contrasting weighted likelihood estimates and plausible values in non-technical ways. The main advantage of IRT over CTT is outlined through a discussion of the construction of a developmental scale on which individual students are located. Further, some limitations of both IRT and CTT are brought to light to guide the valid use of their results. Lastly, the IRT software program ConQuest (Wu et al. 2007) is used to run the item analysis and to illustrate some of the program's functionalities.
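The CTT and IRT item statistics contrasted in this chapter can be sketched in a few lines of code. The snippet below is a minimal illustration, not the chapter's method or ConQuest's implementation: the response matrix is invented, CTT difficulty is computed as the proportion correct, CTT discrimination as a corrected item-total correlation, and the IRT side is shown by the Rasch model's item response function.

```python
import numpy as np

# Hypothetical 0/1 response matrix: 6 persons x 4 items
# (illustrative data only, not from the chapter).
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
])

# --- CTT item statistics ---
# Item difficulty (facility): proportion of persons answering correctly.
ctt_difficulty = X.mean(axis=0)

# Item discrimination: point-biserial correlation between the item score
# and the total score on the remaining items (corrected item-total).
total = X.sum(axis=1)
ctt_discrimination = np.array([
    np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(X.shape[1])
])

# --- IRT (Rasch model) ---
# Probability of a correct response for a person with ability theta on an
# item with difficulty delta; both parameters sit on the same logit scale,
# which is what allows a developmental scale locating persons and items.
def rasch_prob(theta, delta):
    return 1.0 / (1.0 + np.exp(-(theta - delta)))
```

Note the key contrast: the CTT statistics are sample-dependent proportions and correlations, whereas the Rasch function places person ability and item difficulty on one common scale, so a person with ability equal to an item's difficulty has a 0.5 probability of success.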


Keywords: Test score · Item response · Item response theory · Item difficulty · Item response theory model


References

  1. Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: CBS College Publishing.
  2. Lord, F. M. (1952). A theory of test scores. Psychometrika Monograph, No. 7, 17(4, Pt. 2).
  3. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
  4. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
  5. Wu, M. L. (2005). The role of plausible values in large-scale surveys. In Postlethwaite (Ed.), special issue of Studies in Educational Evaluation (SEE) in memory of R. M. Wolf, 31, 114–128.
  6. Wu, M. L., & Adams, R. J. (2008). Properties of Rasch residual fit statistics. Unpublished paper.
  7. Wu, M. L., Adams, R. J., Wilson, M. R., & Haldane, S. A. (2007). ACER ConQuest version 2: Generalised item response modelling software. Camberwell: Australian Council for Educational Research.

Copyright information

© Springer Science+Business Media Dordrecht 2012

Authors and Affiliations

  1. Work-based Education Research Centre, Victoria University, Melbourne, Australia
