Assessing Item Fit for Unidimensional Item Response Theory Models Using Residuals from Estimated Item Response Functions

Haberman, Shelby J.; Sinharay, Sandip; Chon, Kyong Hee

doi:10.1007/s11336-012-9305-1

Published: 14 December 2012

Psychometrika, Volume 78, pages 417–440 (2013)

Shelby J. Haberman¹,
Sandip Sinharay³ &
Kyong Hee Chon²

Abstract

Residual analysis (e.g., Hambleton & Swaminathan 1985; Hambleton, Swaminathan, & Rogers 1991) is a popular method for assessing the fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large-sample distribution of the residual is proved to be standard normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing item fit for unidimensional IRT models.

Notes

By standardization, we refer to dividing the difference of a variable and its expectation by the standard deviation of the difference.
The program is available on request from us.
We also computed our suggested residuals for 15 equispaced values between −2.8 and 2.8 to make these values the same as the midpoints of the intervals used to compute the standardized residuals (Hambleton et al. 1991). The results were virtually unchanged.
Sinharay (2010) reported the average disattenuated correlations among subtest scores from 20+ operational tests. The lowest value reported was 0.69.
We imposed this condition because we noticed that for some easy items, the values of both \(\hat{F}_{j}(\theta)\) and \(\bar{F}_{j}(\theta)\) are larger than 0.99 for \(0<\theta<2\), so that the corresponding residual should not be practically significant, but it is statistically significant.
Note that this reordering was done for convenience. Operationally, the anchor items are interspersed with the operational items.
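
The grid and the standardization described in the notes above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' program: the function names `midpoint_grid` and `standardize` are hypothetical, the interval range of −3 to 3 is inferred from the stated midpoints (15 equal-width intervals whose midpoints run from −2.8 to 2.8), and the numbers passed to `standardize` are made up.

```python
import numpy as np


def midpoint_grid(lower=-3.0, upper=3.0, n_intervals=15):
    """Return the edges and midpoints of equal-width ability intervals.

    The default range [-3, 3] is an assumption inferred from the notes:
    15 equal-width intervals on that range have midpoints -2.8, ..., 2.8.
    """
    edges = np.linspace(lower, upper, n_intervals + 1)
    midpoints = (edges[:-1] + edges[1:]) / 2.0
    return edges, midpoints


def standardize(value, expectation, std_dev):
    """Divide the difference between a value and its expectation by the
    standard deviation of that difference."""
    value = np.asarray(value, dtype=float)
    expectation = np.asarray(expectation, dtype=float)
    return (value - expectation) / np.asarray(std_dev, dtype=float)


edges, theta_grid = midpoint_grid()

# The midpoints coincide with 15 equispaced values between -2.8 and 2.8.
assert np.allclose(theta_grid, np.linspace(-2.8, 2.8, 15))

# Made-up inputs: two estimates of an item characteristic curve at one
# ability value, and an assumed standard deviation of their difference.
residual = standardize(0.62, 0.60, 0.01)
```

Under a fitting model, a residual standardized this way would be compared with standard normal quantiles, which is the large-sample result stated in the abstract.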

References

American Educational Research Association, American Psychological Association, & National Council for Measurement in Education (1999). Standards for educational and psychological testing. Washington: American Educational Research Association.
Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika, 46, 443–459.
Bock, R.D., & Haberman, S.J. (2009). Confidence bands for examining goodness-of-fit of estimated item response functions. Paper presented at the annual meeting of the Psychometric Society. Cambridge, UK.
Box, G.E.P., & Draper, N.R. (1987). Empirical model-building and response surfaces. New York: Wiley.
Chon, K.H., Lee, W., & Dunbar, S.B. (2010). A comparison of item fit statistics for mixed IRT models. Journal of Educational Measurement, 47, 318–338.
Cochran, W.G. (1977). Sampling techniques (3rd ed.). New York: Wiley.
Dodeen, H. (2004). The relationship between item parameters and item fit. Journal of Educational Measurement, 41, 259–268.
du Toit, M. (2003). IRT from SSI. Lincolnwood: Scientific Software International.
Glas, C.A.W., & Suarez-Falcon, J.C. (2003). A comparison of item-fit statistics for the three-parameter logistic model. Applied Psychological Measurement, 27(2), 87–106.
Haberman, S.J. (1976). Generalized residuals for log-linear models. In Proceedings of the ninth international biometrics conference (Vol. 1, pp. 104–172). Boston: International Biometric Society.
Haberman, S.J. (1977a). Log-linear models and frequency tables with small expected cell counts. The Annals of Statistics, 5, 1148–1169.
Haberman, S.J. (1977b). Maximum likelihood estimates in exponential response models. The Annals of Statistics, 5, 815–841.
Haberman, S.J. (1978). Analysis of qualitative data, Vol. I: Introductory topics. New York: Academic Press.
Haberman, S.J. (1979). Analysis of qualitative data, Vol. II: New developments. New York: Academic Press.
Haberman, S.J. (1988). A stabilized Newton-Raphson algorithm for log-linear models for frequency tables derived by indirect observation. Sociological Methodology, 18, 193–211.
Haberman, S.J. (2006). Adaptive quadrature for item response models (Research Rep. No. RR-06-29). Princeton: ETS.
Haberman, S.J. (2009). Use of generalized residuals to examine goodness of fit of item response models (Research Rep. No. RR-09-15). Princeton: ETS.
Haberman, S.J., & Sinharay, S. (2012). Assessing goodness of fit of item response theory models using generalized residuals (Unpublished manuscript).
Hambleton, R.K., & Han, N. (2005). Assessing the fit of IRT models to educational and psychological test data: a five step plan and several graphical displays. In W.R. Lenderking & D. Revicki (Eds.), Advances in health outcomes research methods, measurement, statistical analysis, and clinical applications (pp. 57–78). Washington: Degnon Associates.
Hambleton, R.K., & Swaminathan, H. (1985). Item response theory: principles and applications. Boston: Kluwer Academic.
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Newbury Park: Sage.
Holland, P.W. (1990). The Dutch identity: a new tool for the study of item response models. Psychometrika, 55, 5–18.
Kang, T., & Chen, T.T. (2008). Performance of the generalized S−χ² item-fit index for polytomous IRT models. Journal of Educational Measurement, 45, 391–406.
Kolen, M.J., & Brennan, R.L. (2004). Test equating, scaling, and linking (2nd ed.). New York: Springer.
Li, Y., & Rupp, A.A. (2011). Performance of the S−χ² statistic for full-information bifactor models. Educational and Psychological Measurement, 71, 986–1005.
Liang, T., Han, T.K., & Hambleton, R.K. (2009). ResidPlots-2: computer software for IRT graphical residual analyses. Applied Psychological Measurement, 33, 411–412.
Louis, T. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B, 44.
Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47.
Mislevy, R.J., & Bock, R.D. (1991). BILOG 3.11 [computer software]. Lincolnwood: Scientific Software International.
Muraki, E. (1997). A generalized partial credit model. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory. New York: Springer.
Muraki, E., & Bock, R.D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating-scale data [computer program]. Chicago: Scientific Software.
Naylor, J.C., & Smith, A.F.M. (1982). Applications of a method for the efficient computation of posterior distributions. Applied Statistics, 31, 214–225.
Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.
Rao, C.R. (1973). Linear statistical inference and its applications (2nd ed.). New York: Wiley.
Reckase, M.D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.
Sinharay, S. (2005). Assessing fit of unidimensional item response theory models using a Bayesian approach. Journal of Educational Measurement, 42, 375–394.
Sinharay, S. (2006). Bayesian item fit analysis for unidimensional item response theory models. British Journal of Mathematical & Statistical Psychology, 59, 429–449.
Sinharay, S. (2010). How often do subscores have added value? Results from operational and simulated data. Journal of Educational Measurement, 47, 150–174.
Stone, C.A., & Zhang, B. (2003). Assessing goodness of fit of item response theory models: a comparison of traditional and alternative procedures. Journal of Educational Measurement, 40(4), 331–352.
von Davier, M., Sinharay, S., Beaton, A.E., & Oranje, A. (2006). The statistical procedures used in national assessment of educational progress. In C.R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 205–233). Amsterdam: North-Holland.
Wainer, H., & Thissen, D. (1987). Estimating ability with the wrong model. Journal of Educational Statistics, 12, 339–368.
Yen, W. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245–262.

Author information

Authors and Affiliations

Research & Development, Educational Testing Service, Rosedale Road, Princeton, NJ, 08541, USA
Shelby J. Haberman
Educational Administration, Leadership, and Research, Western Kentucky University, 1906 College Heights Building, Bowling Green, KY, 42101, USA
Kyong Hee Chon
CTB/McGraw-Hill, Monterey, CA, USA
Sandip Sinharay

Corresponding author

Correspondence to Sandip Sinharay.

Additional information

Note: Any opinions expressed in this publication are those of the authors and not necessarily of Educational Testing Service. Sandip Sinharay conducted this study and wrote this report while on staff at Educational Testing Service. He is currently at CTB/McGraw-Hill.

About this article

Cite this article

Haberman, S.J., Sinharay, S. & Chon, K.H. Assessing Item Fit for Unidimensional Item Response Theory Models Using Residuals from Estimated Item Response Functions. Psychometrika 78, 417–440 (2013). https://doi.org/10.1007/s11336-012-9305-1

Received: 25 April 2011
Revised: 08 March 2012
Published: 14 December 2012
Issue Date: July 2013
DOI: https://doi.org/10.1007/s11336-012-9305-1
