Person Fit Across Subgroups: An Achievement Testing Example

Essays on Item Response Theory

Part of the book series: Lecture Notes in Statistics (LNS, volume 157)

Abstract

Item response theory (IRT) models are used to describe answering behavior on tests and examinations. Although items may fit an IRT model, some persons may produce misfitting item score patterns, for example, as a result of cheating or lack of motivation. Several statistics have been proposed to detect deviant item score patterns. Misfitting item score patterns may be related to group characteristics such as gender or race. Investigating misfitting item score patterns across different groups is strongly related to differential item functioning (DIF). In this study the usefulness of person fit to compare item score patterns for different groups was investigated. In particular, the effect of misspecification of a model due to DIF on person fit was explored. Empirical data of a math test were analyzed with respect to misfitting item score patterns and DIF for men and women and blacks and whites. Results indicated that there were small differences between subgroups with respect to the number of misfitting item score patterns. Also, the influence of DIF on the fit of a score pattern was small for both gender and ethnic groups. The results imply that person-fit analysis is not very sensitive to model misspecification on the item level.
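
To make the idea of a person-fit statistic concrete, here is a minimal sketch of one widely used index, the standardized log-likelihood statistic l_z, computed under a two-parameter logistic (2PL) model. The abstract does not say which statistics the chapter uses, so this choice is an assumption made for illustration only. The function names, the item parameters, and the response patterns are hypothetical, and the ability value theta is treated as known rather than estimated.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz_statistic(responses, theta, a_params, b_params):
    """Standardized log-likelihood person-fit statistic l_z.

    responses: list of 0/1 item scores
    theta:     ability value, assumed known here (illustrative assumption)
    a_params:  item discriminations
    b_params:  item difficulties
    Large negative values suggest a misfitting item score pattern.
    """
    log_lik = 0.0   # observed log-likelihood of the score pattern
    expected = 0.0  # its expectation under the model
    variance = 0.0  # its variance under the model
    for x, a, b in zip(responses, a_params, b_params):
        p = p_2pl(theta, a, b)
        q = 1.0 - p
        log_lik += x * math.log(p) + (1 - x) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item test, ordered from easy to hard.
a_params = [1.0, 1.0, 1.0, 1.0, 1.0]
b_params = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.0

consistent = [1, 1, 1, 0, 0]        # easy items right, hard items wrong
reversed_pattern = [0, 0, 1, 1, 1]  # easy items wrong, hard items right

lz_consistent = lz_statistic(consistent, theta, a_params, b_params)
lz_reversed = lz_statistic(reversed_pattern, theta, a_params, b_params)
print(lz_consistent, lz_reversed)
```

In this toy example the pattern that matches the item ordering yields a positive l_z, while the reversed pattern yields a strongly negative value and would be flagged as misfitting. In practice theta is estimated from the same responses, so comparing l_z with a standard normal distribution is only approximate.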




Copyright information

© 2001 Springer Science+Business Media New York

About this chapter

Cite this chapter

Meijer, R.R., van Krimpen-Stoop, E.M.L.A. (2001). Person Fit Across Subgroups: An Achievement Testing Example. In: Boomsma, A., van Duijn, M.A.J., Snijders, T.A.B. (eds) Essays on Item Response Theory. Lecture Notes in Statistics, vol 157. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-0169-1_20


  • DOI: https://doi.org/10.1007/978-1-4613-0169-1_20

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-95147-8

  • Online ISBN: 978-1-4613-0169-1

  • eBook Packages: Springer Book Archive
