Volume 75, Issue 2, pp 209–227

Reporting of Subscores Using Multidimensional Item Response Theory

  • Shelby J. Haberman
  • Sandip Sinharay


Recently, there has been increasing interest in reporting subscores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models (e.g., Reckase in Appl. Psychol. Meas. 21:25–36, 1997; C.R. Rao and S. Sinharay (Eds), Handbook of Statistics, vol. 26, pp. 607–642, North-Holland, Amsterdam, 2007; Beguin & Glas in Psychometrika, 66:471–488, 2001). A MIRT model is fitted using a stabilized Newton–Raphson algorithm (Haberman in The Analysis of Frequency Data, University of Chicago Press, Chicago, 1974; Sociol. Methodol. 18:193–211, 1988) with adaptive Gauss–Hermite quadrature (Haberman, von Davier, & Lee in ETS Research Rep. No. RR-08-45, ETS, Princeton, 2008). A new statistical approach is proposed to assess when subscores using the MIRT model have any added value over (i) the total score or (ii) subscores based on classical test theory (Haberman in J. Educ. Behav. Stat. 33:204–229, 2008; Haberman, Sinharay, & Puhan in Br. J. Math. Stat. Psychol. 62:79–95, 2008). The MIRT-based methods are applied to several operational data sets. The results show that the subscores based on MIRT are slightly more accurate than subscore estimates derived by classical test theory.
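As a rough illustration of the quadrature idea mentioned in the abstract, the sketch below approximates the marginal probability of a response pattern under a unidimensional 2PL model with a fixed three-point Gauss–Hermite rule. It is not the authors' procedure: the paper uses a multidimensional model, adaptive quadrature, and a stabilized Newton–Raphson fit, none of which is reproduced here. The parameterization a(θ − b), the function names, and the item parameter values are all assumptions made for the example.

```python
import math

# Three-point Gauss-Hermite rule rescaled for a standard normal ability
# distribution: nodes 0 and +/-sqrt(3), weights 2/3 and 1/6.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def prob_correct(theta, a, b):
    """2PL item response function (assumed form): P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def marginal_likelihood(responses, items):
    """Approximate the marginal probability of a 0/1 response pattern.

    responses: list of 0/1 item scores.
    items: list of (a, b) pairs, one per item (hypothetical values).
    """
    total = 0.0
    for theta, weight in zip(NODES, WEIGHTS):
        # Conditional likelihood of the pattern at this quadrature node.
        like = 1.0
        for x, (a, b) in zip(responses, items):
            p = prob_correct(theta, a, b)
            like *= p if x == 1 else 1.0 - p
        total += weight * like
    return total


# Hypothetical item parameters, chosen only for illustration.
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]

# The approximated probabilities of all 2^3 patterns should sum to one,
# because the quadrature weights themselves sum to one.
pattern_probs = [
    marginal_likelihood([x1, x2, x3], items)
    for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1)
]
print(sum(pattern_probs))
```

Three nodes are far too few for practical estimation; the fixed rule is used here only to keep the example short and dependency-free.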


Keywords: 2PL model; mean squared error; augmented subscore




  1. Ackerman, T., & Shu, Z. (2009). Using confirmatory MIRT modeling to provide diagnostic information in large scale assessment. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA, April 2009.
  2. Adams, R.J., Wilson, M.R., & Wang, W.C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
  3. Beguin, A.A., & Glas, C.A.W. (2001). MCMC estimation and some fit analysis of multidimensional IRT models. Psychometrika, 66, 471–488.
  4. Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika, 46, 443–459.
  5. de la Torre, J., & Patz, R.J. (2005). Making the most of what we have: a practical application of multidimensional IRT in test scoring. Journal of Educational and Behavioral Statistics, 30, 295–311.
  6. Dwyer, A., Boughton, K.A., Yao, L., Steffen, M., & Lewis, D. (2006). A comparison of subscale score augmentation methods using empirical data. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA, April 2006.
  7. Haberman, S.J. (1974). The analysis of frequency data. Chicago: University of Chicago Press.
  8. Haberman, S.J. (1988). A stabilized Newton-Raphson algorithm for log-linear models for frequency tables derived by indirect observation. Sociological Methodology, 18, 193–211.
  9. Haberman, S.J. (2007). The information a test provides on an ability parameter (ETS Research Rep. No. RR-07-18). Princeton, NJ: ETS.
  10. Haberman, S.J. (2008). When can subscores have value? Journal of Educational and Behavioral Statistics, 33, 204–229.
  11. Haberman, S.J., & Sinharay, S. (2010, in press). Subscores based on multidimensional item response theory (ETS Research Rep.). Princeton, NJ: ETS.
  12. Haberman, S.J., Sinharay, S., & Puhan, G. (2008). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62, 79–95.
  13. Haberman, S.J., von Davier, M., & Lee, Y. (2008). Comparison of multidimensional item response models: multivariate normal ability distributions versus multivariate polytomous distributions (ETS Research Rep. No. RR-08-45). Princeton, NJ: ETS.
  14. Haladyna, S.J., & Kramer, G.A. (2004). The validity of subscores for a credentialing test. Evaluation and the Health Professions, 24(7), 349–368.
  15. Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Newbury Park: Sage.
  16. Luecht, R.M., Gierl, M.J., Tan, X., & Huff, K. (2006). Scalability and the development of useful diagnostic scales. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA, April 2006.
  17. Puhan, G., Sinharay, S., Haberman, S.J., & Larkin, K. (2010, in press). The utility of augmented subscores in a licensure exam: an evaluation of methods using empirical data. Applied Measurement in Education.
  18. Reckase, M.D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.
  19. Reckase, M.D. (2007). Multidimensional item response theory. In Rao, C.R., & Sinharay, S. (Eds.), Handbook of statistics (Vol. 26, pp. 607–642). Amsterdam: North-Holland.
  20. Schilling, S., & Bock, R.D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70, 533–555.
  21. Sinharay, S. (2010, in press). How often do subscores have added value? Results from operational and simulated data. Journal of Educational Measurement.
  22. Sinharay, S., Haberman, S.J., & Puhan, G. (2007). Subscores based on classical test theory: to report or not to report. Educational Measurement: Issues and Practice, 21–28.
  23. Thissen, D., Nelson, L., & Swygert, K.A. (2001). Item response theory applied to combinations of multiple-choice and constructed-response items: approximation methods for scale scores. In Thissen, D., & Wainer, H. (Eds.), Test scoring (pp. 293–341). Hillsdale: Lawrence Erlbaum.
  24. Wainer, H., Vevea, J.L., Camacho, F., Reeve, B.B., Rosa, K., & Nelson, L. (2001). Augmented scores: “borrowing strength” to compute scores based on small numbers of items. In Thissen, D., & Wainer, H. (Eds.), Test scoring (pp. 343–387). Hillsdale: Lawrence Erlbaum.
  25. Yao, L.H., & Boughton, K.A. (2007). A multidimensional item response modeling approach for improving subscale proficiency estimation and classification. Applied Psychological Measurement, 31(2), 83–105.
  26. Yen, W.M. (1987). A Bayesian/IRT measure of objective performance. Paper presented at the annual meeting of the Psychometric Society, Montreal, Quebec, April 1987.

Copyright information

© The Psychometric Society 2010

Authors and Affiliations

  1. ETS, Princeton, USA
