Psychometrika, Volume 79, Issue 2, pp 210–231

Analyses of Model Fit and Robustness. A New Look at the PISA Scaling Model Underlying Ranking of Countries According to Reading Literacy

Abstract

This paper addresses methodological issues that concern the scaling model used in the international comparison of student attainment in the Programme for International Student Assessment (PISA), specifically with reference to whether PISA’s ranking of countries is confounded by model misfit and differential item functioning (DIF). To determine this, we reanalyzed the publicly accessible data on reading skills from the 2006 PISA survey. We also examined whether the ranking of countries is robust to the errors of the scaling model. This was done by studying invariance across subscales and by comparing ranks based on the scaling model with ranks based on models in which some of the flaws of PISA’s scaling model are taken into account. Our analyses provide strong evidence of misfit of the PISA scaling model and very strong evidence of DIF. These findings do not support the claim that the country rankings reported by PISA are robust.
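As background for the comparison described above, the following is a minimal sketch of the distinction between an invariant scaling model and a model allowing for DIF, stated in terms of the dichotomous Rasch model. PISA’s operational scaling model is a generalization within the Rasch family, and the notation below (θ_v for the ability of student v, β_i for the difficulty of item i, β_ic for a country-specific item difficulty) is illustrative rather than taken from the paper. Under measurement invariance, the probability of a correct response depends on the country only through the distribution of ability:

\[ P(X_{vi} = 1 \mid \theta_v) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)} . \]

Under DIF, the item parameter is allowed to depend on the country c of student v:

\[ P(X_{vi} = 1 \mid \theta_v, c) = \frac{\exp(\theta_v - \beta_{ic})}{1 + \exp(\theta_v - \beta_{ic})} , \]

and measurement invariance corresponds to the hypothesis that β_ic = β_i for every country c. If this hypothesis fails for some items, country rankings computed under the invariant model need not agree with rankings computed under a model that takes the DIF into account.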

Key words

differential item functioning; ranking; robustness; educational testing; Programme for International Student Assessment; PISA; Rasch models; reading literacy

Copyright information

© The Psychometric Society 2013

Authors and Affiliations

Department of Biostatistics, University of Copenhagen, Copenhagen K, Denmark
