# Analyses of Model Fit and Robustness. A New Look at the PISA Scaling Model Underlying Ranking of Countries According to Reading Literacy

## Abstract

This paper addresses methodological issues that concern the scaling model used in the international comparison of student attainment in the Programme for International Student Assessment (PISA), specifically with reference to whether PISA’s ranking of countries is confounded by model misfit and differential item functioning (DIF). To determine this, we reanalyzed the publicly accessible data on reading skills from the 2006 PISA survey. We also examined whether the ranking of countries is robust in relation to the errors of the scaling model. This was done by studying invariance across subscales, and by comparing ranks based on the scaling model and ranks based on models where some of the flaws of PISA’s scaling model are taken into account. Our analyses provide strong evidence of misfit of the PISA scaling model and very strong evidence of DIF. These findings do not support the claims that the country rankings reported by PISA are robust.

## Key words

differential item functioning, ranking robustness, educational testing, Programme for International Student Assessment, PISA, Rasch models, reading literacy