Analyses of Model Fit and Robustness. A New Look at the PISA Scaling Model Underlying Ranking of Countries According to Reading Literacy

Kreiner, Svend; Christensen, Karl Bang

doi:10.1007/s11336-013-9347-z

Analyses of Model Fit and Robustness. A New Look at the PISA Scaling Model Underlying Ranking of Countries According to Reading Literacy

Published: 14 June 2013

Volume 79, pages 210–231, (2014)
Cite this article

Psychometrika Aims and scope Submit manuscript

Svend Kreiner¹ &
Karl Bang Christensen¹

3286 Accesses
86 Citations
54 Altmetric
6 Mentions
Explore all metrics

Abstract

This paper addresses methodological issues that concern the scaling model used in the international comparison of student attainment in the Programme for International Student Attainment (PISA), specifically with reference to whether PISA’s ranking of countries is confounded by model misfit and differential item functioning (DIF). To determine this, we reanalyzed the publicly accessible data on reading skills from the 2006 PISA survey. We also examined whether the ranking of countries is robust in relation to the errors of the scaling model. This was done by studying invariance across subscales, and by comparing ranks based on the scaling model and ranks based on models where some of the flaws of PISA’s scaling model are taken into account. Our analyses provide strong evidence of misfit of the PISA scaling model and very strong evidence of DIF. These findings do not support the claims that the country rankings reported by PISA are robust.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education

Article Open access 07 June 2017

The Relation Between Family Socioeconomic Status and Academic Achievement in China: A Meta-analysis

Article 01 August 2019

Recognize the Value of the Sum Score, Psychometrics’ Greatest Accomplishment

Article Open access 17 April 2024

References

Adams, R.J. (2003). Response to ‘Cautions on OECD’s recent educational survey (PISA)’. Oxford Review of Education, 29, 379–389. Note: Publications from PISA can be found at http://www.oecd.org/pisa/pisaproducts/.
Article Google Scholar
Adams, R., Berezner, A., & Jakubowski, M. (2010). Analysis of PISA 2006 preferred items ranking using the percent-correct method. Paris: OECD. http://www.oecd.org/pisa/pisaproducts/pisa2006/44919855.pdf.
Book Google Scholar
Adams, R.J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
Article Google Scholar
Adams, R.J., Wu, M.L., & Carstensen, C.H. (2007). Application of multivariate Rasch models in international large-scale educational assessments. In M. Von Davier & C.H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 271–280). New York: Springer.
Chapter Google Scholar
Andersen, E.B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
Article Google Scholar
Brown, G., Micklewrigth, J., Schnepf, S.V., & Waldmann, R. (2007). International surveys of educational achievement: how robust are the findings? Journal of the Royal Statistical Society. Series A. General, 170, 623–646.
Article Google Scholar
Dorans, N.J., & Holland, P.W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P.W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35–66). Hilsdale: Lawrence Erlbaum Associates.
Google Scholar
Fischer, G.H. & Molenaar, I.W. (Eds.) (1995). Rasch models—foundations, recent developments, and applications. Berlin: Springer.
Google Scholar
Glass, G.V., & Hopkins, K.D. (1995). In Statistical methods in education and psychology. Boston: Allyn & Bacon.
Google Scholar
Goldstein, H. (2004). International comparisons of student attainment: some issues arising from the PISA study. Assessment in Education, 11, 319–330.
Article Google Scholar
Goodman, L.A., & Kruskal, W.H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732–764.
Google Scholar
Hopmann, S.T., Brinek, G., & Retzl, M. (Eds.) (2007). PISA zufolge PISA. PISA according to PISA. Wien: Lit Verlag. http://www.univie.ac.at/pisaaccordingtopisa/pisazufolgepisa.pdf.
Google Scholar
Kelderman, H. (1984). Loglinear Rasch model tests. Psychometrika, 49, 223–245.
Article Google Scholar
Kelderman, H. (1989). Item bias detection using loglinear IRT. Psychometrika, 54, 681–697.
Article Google Scholar
Kirsch, I., de Jng, J., Lafontaine, D., McQueen, J., Mendelovits, J., & Monseur, C. (2002). Reading for change. performance and engagement across countries. results from PISA 2000. Paris: OECD.
Google Scholar
Kreiner, S. (1987). Analysis of multidimensional contingency tables by exact conditional tests: techniques and strategies. Scandinavian Journal of Theoretical Statistics, 14, 97–112.
Google Scholar
Kreiner, S. (2011a). A note on item-restscore association in Rasch models. Applied Psychological Measurement, 35, 557–561.
Article Google Scholar
Kreiner, S. (2011b). Is the foundation under PISA solid? A critical look at the scaling model underlying international comparisons of student attainment. Research report 11/1, Dept. of Biostatistics, University of Copenhagen. https://ifsv.sund.ku.dk/biostat/biostat_annualreport/images/c/ca/ResearchReport-2011-1.pdf.
Kreiner, S., & Christensen, K.B. (2007). Validity and objectivity in health-related scales: analysis by graphical loglinear Rasch models. In M. Von Davier & C.H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 271–280). New York: Springer.
Google Scholar
Kreiner, S., & Christensen, K.B. (2011). Exact evaluation of bias in Rasch model residuals. Advances in Mathematics Research, 12, 19–40.
Google Scholar
Molenaar, I.V. (1983). Some improved diagnostics for failure of the Rasch model. Psychometrika, 48, 49–72.
Article Google Scholar
OECD (2000). Measuring student knowledge and skills. the PISA 2000 assessment of reading, mathematical and scientific literacy. Paris: OECD. http://www.oecd.org/dataoecd/44/63/33692793.pdf.
Google Scholar
OECD (2006). PISA 2006. Technical report. Paris: OECD. http://www.oecd.org/dataoecd/0/47/42025182.pdf.
OECD (2007). PISA 2006. Volume 2: data. Paris: OECD.
Book Google Scholar
OECD (2009). PISA data analysis manual: SPSS (2nd ed.). Paris: OECD. http://www.oecd-ilibrary.org/education/pisa-data-analysis-manual-spss-second-edition_9789264056275-en.
Book Google Scholar
Prais, S.J. (2003). Cautions on OECD’s recent educational survey (PISA). Oxford Review of Education, 29, 139–163.
Article Google Scholar
Rosenbaum, P. (1989). Criterion-related construct validity. Psychometrika, 54, 625–633.
Article Google Scholar
Smith, R.M. (2004). Fit analysis in latent trait measurement models. In E.V. Smith & R.M. Smith (Eds.), Introduction to Rasch measurement (pp. 73–92). Maple Grove: JAM Press.
Google Scholar
Schmitt, A.P., & Dorans, N.J. (1987). Differential item functioning on the scholastic aptitude test. Research memorandum No. 87-1. Princeton NJ: Educational Testing Service.

Download references

Author information

Authors and Affiliations

Department of Biostatistics, University of Copenhagen, Oster Farimagsgade 5, B, PO Box 2029, 1014, Copenhagen K, Denmark
Svend Kreiner & Karl Bang Christensen

Authors

Svend Kreiner
View author publications
You can also search for this author in PubMed Google Scholar
Karl Bang Christensen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Svend Kreiner.

Appendices

Appendix A. Information on Countries

Table A.1 provides information on (a) average scores and the number of students with complete responses to 20 items, (b) DIF equated scores with Azerbaijan as reference country (Kreiner and Christensen 2007), and (c) overall tests of fit of the Rasch model in 56 countries.

Table A.1. Average total and DIF equated scores on 20 items in 56 countries and conditional likelihood ratio (CLR) tests of the Rasch model comparing item parameters estimated for student with raw scores below and above the median raw score in the country.

Full size table

Appendix B. Analyses of Data from All Booklets with Reading Items

In addition to Booklet 6 with 28 reading items, reading items can also be found in Booklets 2, 7, 9, 11, 12, and 13. Each of these booklets contained 14 items. Booklets 9, 11, and 13 had items from reading units R055, R104, and R111. We refer the these booklets together with Booklet 6 that also had these reading units as Booklet set 1. Booklet set 2 consists of Booklets 2, 6, 7, and 12 with reading units R067, R102, R219, and R220.

Table B.1 shows the overall CLR tests of the Rasch model for the two different booklet sets as a whole and for the different booklets. In addition to the CLR tests not DIF relative to country, CLR tests also provided evidence of DIF relative to booklets (Booklet set 1: CLR=1669.0, df=42, p<0.00005; Booklet set 2: CLR=3260.3, df=27p<0.00005).

Additional information on these analyses is available from the authors on request.

Table B.1. Overall fit statistics for Booklets sets 1 and 2 and for Booklets 2,7,9,11,12,13.

Full size table

Appendix C. Assessment of Ranking Error

Let Y _cvi be the score on item i by person v from country c (c=1,…,C; v=1,…,N _c; i=1,…,I) and let A be the indices of a subset of items A⊂{1,…,I}. The total score on all items is \(S_{cv} = \sum_{i = 1}^{I} Y_{cvi}\) and the subscore over items in A is T _cv=∑_i∈A Y _cvi. This Appendix is concerned with errors when countries are ranked according to averages \(S_{c} = \frac{1}{N_{c}}\sum_{v} S_{cv}\) and \(T_{c} = \frac{1}{N_{c}}\sum_{v} T_{cv}\).

We assume that item responses fit a Rasch model with a latent variable Θ. The distribution of Θ may be nonparametric or parametric. In the nonparametric case, the population parameters of interest are the score probabilities P(S _cv=s) and P(T _cv=t). The marginal distribution of T _cv is given by P(T _c=t)=∑_s P(T _cv=t|S _cv=s)P(S _cv=s). Under the Rasch model, P(T _cv=t|S _cv=s) depends on item parameters, but not on Θ. Under such a model, it is consequently easy to calculate estimates of the subscore probabilities P(T _cv=t) in the nonparametric case if consistent estimates of the item parameters and the score probabilities P(S _cv=s) are available.

In the parametric case, we assume that the latent variables are Gaussian normal with means ξ _c and standard deviations σ _c. Given these distributions, Monte Carlo methods provide simple estimates of the distributions of S _cv and T _cv based on estimates of item parameters together with estimates of ξ _c and σ _c.

The country ranks according to (S ₁,…,S _C) and (T ₁,…,T _C) are expected to be similar under the Rasch model, but ranking errors will occur depending on the number of items and on sample sizes in different countries: the smaller the sample size and the smaller the number of items, the larger the ranking error. And, of course, the ranking errors also depend on ξ _c and σ _c. The results reported in this paper are derived under the parametric model with Monte Carlo estimates of the distributions of S _cv and T _cv based on Monte Carlo samples of 10,000 random students from each country.

To estimate the distribution of the country ranks based on country averages S _c and/or T _c, we generated 100,000 random values of S _c and/or T _c and for each set ranked the countries according to these values. Given these estimates, it is easy to find both confidence intervals and probabilities of extreme rankings for the countries.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kreiner, S., Christensen, K.B. Analyses of Model Fit and Robustness. A New Look at the PISA Scaling Model Underlying Ranking of Countries According to Reading Literacy. Psychometrika 79, 210–231 (2014). https://doi.org/10.1007/s11336-013-9347-z

Download citation

Received: 08 December 2011
Revised: 30 January 2013
Published: 14 June 2013
Issue Date: April 2014
DOI: https://doi.org/10.1007/s11336-013-9347-z

Key words

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Analyses of Model Fit and Robustness. A New Look at the PISA Scaling Model Underlying Ranking of Countries According to Reading Literacy

Abstract

Access this article

Similar content being viewed by others

The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education

The Relation Between Family Socioeconomic Status and Academic Achievement in China: A Meta-analysis

Recognize the Value of the Sum Score, Psychometrics’ Greatest Accomplishment

References

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix A. Information on Countries

Appendix B. Analyses of Data from All Booklets with Reading Items

Appendix C. Assessment of Ranking Error

Rights and permissions

About this article

Cite this article

Key words

Navigation

Analyses of Model Fit and Robustness. A New Look at the PISA Scaling Model Underlying Ranking of Countries According to Reading Literacy

Abstract

Access this article

Similar content being viewed by others

The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education

The Relation Between Family Socioeconomic Status and Academic Achievement in China: A Meta-analysis

Recognize the Value of the Sum Score, Psychometrics’ Greatest Accomplishment

References

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix A. Information on Countries

Appendix B. Analyses of Data from All Booklets with Reading Items

Appendix C. Assessment of Ranking Error

Rights and permissions

About this article

Cite this article

Share this article

Key words

Search

Navigation