Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model

Abstract

A variety of statistical methods have been suggested for detecting differential item functioning (DIF) in the Rasch model. Most of these methods are designed for the comparison of pre-specified focal and reference groups, such as males and females. Latent class approaches, on the other hand, allow the detection of previously unknown groups exhibiting DIF. However, these approaches provide no straightforward interpretation of the groups with respect to person characteristics. Here, we propose a new method for DIF detection based on model-based recursive partitioning that can be considered a compromise between those two extremes. With this approach, it is possible to detect groups of subjects exhibiting DIF that are not pre-specified but result from combinations of observed covariates. These groups are directly interpretable and can thus help generate hypotheses about the psychological sources of DIF. The statistical background and construction of the new method are introduced by means of an instructive example, and extensive simulation studies are presented to support and illustrate the statistical properties of the method, which is then applied to empirical data from a general knowledge quiz. A software implementation of the method is freely available in the R system for statistical computing.
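
To make the notion of DIF concrete, the following minimal sketch evaluates the Rasch model's response probability for two persons of equal ability who belong to different groups. It is not the authors' implementation (which is in R) and does not perform any tree construction; the function name `rasch_prob`, the group labels, and all numeric values are illustrative assumptions rather than anything taken from the article.

```python
import math


def rasch_prob(theta, beta):
    """Probability of a correct response under the Rasch model,
    given person ability theta and item difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# Hypothetical difficulties of the SAME item in two groups (illustrative values).
# Under the Rasch model without DIF, these would be identical.
beta_by_group = {"group_a": 0.0, "group_b": 1.0}

# Two persons with the same ability, one from each group.
theta = 0.5
probs = {group: rasch_prob(theta, beta) for group, beta in beta_by_group.items()}

# A nonzero gap at equal ability is what DIF means for this item.
dif_gap = probs["group_a"] - probs["group_b"]
print(probs, dif_gap)
```

In this toy setting the groups are fixed in advance; the method described in the abstract instead searches for such groups among combinations of observed covariates.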

Notes

  1. At the time the quiz was conducted, Croatia was not yet a member of the EU.

References

  1. Andersen, E. (1972). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.

  2. Andrews, D.W.K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica, 61, 821–856.

  3. Ben-Shakhar, G., & Sinai, Y. (1991). Gender differences in multiple-choice tests: the role of differential guessing tendencies. Journal of Educational Measurement, 28(1), 23–35.

  4. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. Lord & M. Novick (Eds.), Statistical theories of mental test scores. Reading: Addison-Wesley.

  5. Boulesteix, A.L. (2006). Maximally selected chi-square statistics and binary splits of nominal variables. Biometrical Journal, 48(5), 838–848.

  6. Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. New York: Chapman and Hall.

  7. Cohen, A., & Bolt, D. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42(3), 133–148.

  8. Dobra, A., & Gehrke, J. (2001). Bias correction in classification tree construction. In C.E. Brodley & A.P. Danyluk (Eds.), Proceedings of the seventeenth international conference on machine learning (ICML 2001), Williams College, Williamstown, MA, USA (pp. 90–97). San Mateo: Morgan Kaufmann.

  9. Fischer, G. & Molenaar, I. (Eds.) (1995). Rasch models: foundations, recent developments and applications. New York: Springer.

  10. Fraley, C., & Raftery, A. (2002). Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association, 97(458), 611–631.

  11. Fraley, C., & Raftery, A. (2012). mclust: Model-based clustering/Normal mixture modeling. R package version 3.4.11. http://CRAN.R-project.org/package=mclust.

  12. Gelin, M., Carleton, B., Smith, M., & Zumbo, B. (2004). The dimensionality and gender differential item functioning of the mini asthma quality of life questionnaire (MiniAQLQ). Social Indicators Research, 68, 91–105.

  13. Gustafsson, J. (1980). Testing and obtaining fit of data in the Rasch model. British Journal of Mathematical & Statistical Psychology, 33(2), 205–233.

  14. Hancock, G. & Samuelsen, K. (Eds.) (2007). Advances in latent variable mixture models. Charlotte: Information Age.

  15. Hochberg, Y. & Tamhane, A. (Eds.) (1987). Multiple comparison procedures. New York: Wiley.

  16. Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: a conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.

  17. Hothorn, T., & Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis, 43(2), 121–137.

  18. Hothorn, T., & Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics, 64(4), 1263–1269.

  19. Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.

  20. Kelderman, H., & MacReady, G. (1990). The use of loglinear models for assessing differential item functioning across manifest and latent examinee groups. Journal of Educational Measurement, 27(4), 307–327.

  21. Koziol, J. (1991). On maximally selected chi-square statistics. Biometrics, 47(4), 1557–1561.

  22. Liou, M. (1994). More on the computation of higher-order derivatives on the elementary symmetric functions in the Rasch model. Applied Psychological Measurement, 18(1), 53–62.

  23. Maij-de Meij, A., Kelderman, H., & Van der Flier, H. (2008). Fitting a mixture item response theory model to personality questionnaire data: characterizing latent classes and investigating possibilities for improving prediction. Applied Psychological Measurement, 32(8), 611–631.

  24. Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: the eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9). http://www.jstatsoft.org/v20/i09/.

  25. Mair, P., Hatzinger, R., & Maier, M. (2012). eRm: extended Rasch modeling. R package version 0.15-0. http://CRAN.R-project.org/package=eRm.

  26. Marcus, R., Peritz, E., & Gabriel, K. (1976). Closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63(3), 655–660.

  27. Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.

  28. Merkle, E.C., Fan, J., & Zeileis, A. (2013). Testing for measurement invariance with respect to an ordinal variable. Psychometrika, forthcoming.

  29. Merkle, E.C., & Zeileis, A. (2013). Tests of measurement invariance without subgroups: a generalization of classical methods. Psychometrika, 78(1), 59–82.

  30. Miller, R., & Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics, 38(4), 1011–1016.

  31. Milligan, G., & Cooper, M. (1986). A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behavioral Research, 21(4), 441–458.

  32. Mislevy, R., & Verhelst, N. (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55(2), 195–215.

  33. Pedraza, O., Graff-Radford, N., Smith, G., Ivnik, R., Willis, F., Petersen, R., & Lucas, J. (2009). Differential item functioning of the Boston Naming Test in cognitively normal African American and Caucasian older adults. Journal of the International Neuropsychological Society, 15(05), 758–768.

  34. Penfield, D. (2007). Assessing differential step functioning in polytomous items using a common odds ratio estimator. Journal of Educational Measurement, 44(3), 187–210.

  35. Penfield, D., Alvarez, K., & Lee, O. (2009). Using a taxonomy of differential step functioning to improve the interpretation of DIF in polytomous items: an illustration. Applied Measurement in Education, 22(1), 61–78.

  36. Perkins, A., Stump, T., Monahan, P., & McHorney, C. (2006). Assessment of differential item functioning for demographic comparisons in the MOS SF-36 health survey. Quality of Life Research, 15, 331–348.

  37. R Development Core Team (2012). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org/.

  38. Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8(2), 185–205.

  39. Rizopoulos, D. (2006). ltm: an R package for latent variable modeling and item response analysis. Journal of Statistical Software, 17(5). http://www.jstatsoft.org/v17/i05/.

  40. Rizopoulos, D. (2012). ltm: latent trait models under IRT. R package version 0.9-9. http://CRAN.R-project.org/package=ltm.

  41. Rost, J. (1990). Rasch models in latent classes: an integration of two approaches to item analysis. Applied Psychological Measurement, 14(3), 271–282.

  42. Shih, Y.S. (2004). A note on split selection bias in classification trees. Computational Statistics & Data Analysis, 45(3), 457–466.

  43. Smit, J., Kelderman, H., & Van der Flier, H. (2000). The mixed Birnbaum model: estimation using collateral information. Methods of Psychological Research Online, 5, 1–13.

  44. Strobl, C., Boulesteix, A.L., & Augustin, T. (2007). Unbiased split selection for classification trees based on the Gini index. Computational Statistics & Data Analysis, 52(1), 483–501.

  45. Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: rationale, application and characteristics of classification and regression trees, bagging and random forests. Psychological Methods, 14(4), 323–348.

  46. Strobl, C., Wickelmaier, F., & Zeileis, A. (2011). Accounting for individual differences in Bradley–Terry models by means of recursive partitioning. Journal of Educational and Behavioral Statistics, 36(2), 135–153.

  47. Trepte, S. & Verbeet, M. (Eds.) (2010). Allgemeinbildung in Deutschland—Erkenntnisse aus dem SPIEGEL Studentenpisa-Test. Wiesbaden: VS Verlag.

  48. Van den Noortgate, W., & De Boeck, P. (2005). Assessing and explaining differential item functioning using logistic mixed models. Journal of Educational and Behavioral Statistics, 30(4), 443–464.

  49. Westers, P., & Kelderman, H. (1992). Examining differential item functioning due to item difficulty and alternative attractiveness. Psychometrika, 57(1), 107–118.

  50. Woods, C., Oltmanns, T., & Turkheimer, E. (2009). Illustration of MIMIC-model DIF testing with the schedule for nonadaptive and adaptive personality. Journal of Psychopathology and Behavioral Assessment, 31, 320–330.

  51. Zeileis, A., & Hornik, K. (2007). Generalized m-fluctuation tests for parameter instability. Statistica Neerlandica, 61(4), 488–508.

  52. Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514.

  53. Zeileis, A., Strobl, C., Wickelmaier, F., & Kopf, J. (2012). psychotree: recursive partitioning based on psychometric models. R package version 0.12-2. http://CRAN.R-project.org/package=psychotree.

Acknowledgements

Julia Kopf is supported by the German Federal Ministry of Education and Research (BMBF) within the project “Heterogeneity in IRT-Models” (grant ID 01JG1060).

The authors would like to thank three anonymous reviewers for their very helpful and constructive feedback.

Special thanks go to Reinhold Hatzinger (1953–2012), who stimulated important insights for this and other projects through many conversations and his extensive work on the R package eRm (Mair & Hatzinger, 2007; Mair et al., 2012). We miss him as a researcher and friend.

Author information

Corresponding author

Correspondence to Carolin Strobl.

About this article

Cite this article

Strobl, C., Kopf, J. & Zeileis, A. Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model. Psychometrika 80, 289–316 (2015). https://doi.org/10.1007/s11336-013-9388-3

Key words

  • item response theory
  • IRT
  • Rasch model
  • differential item functioning
  • DIF
  • measurement invariance
  • structural change
  • model-based recursive partitioning