Abstract
This paper discusses differential item functioning (DIF) in international surveys, where DIF is likely to occur. What is needed is a statistical approach that takes DIF into account while still allowing for meaningful comparisons between countries. Some existing approaches are discussed and an alternative is provided. The core of this alternative approach is to define the construct as a large set of items and to report in terms of summary statistics over that set. Since each student answers only a subset of the items, the data are incomplete, and measurement models are used to complete them. For that purpose, different models can be used across countries. The method is illustrated with PISA’s reading literacy data. The results indicate that this approach fits the data better than the current PISA methodology; however, the league tables are nearly identical. The implications for monitoring changes over time are discussed.
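The reporting idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' estimation procedure: it assumes a Rasch model, assumes that ability estimates and country-specific item difficulties are already available, and fills in missing responses with their expected values rather than with draws from the model. All names and numbers (`rasch_prob`, `completed_score`, `country_mean`, the difficulties and responses) are hypothetical.

```python
import math


def rasch_prob(theta, beta):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def completed_score(responses, theta, betas):
    """Score on the full item set for one student.

    responses: 1/0 for administered items, None for items not administered.
    Observed responses are counted as they are; each missing response is
    replaced by its model-based expected value (an assumed simplification).
    """
    total = 0.0
    for x, beta in zip(responses, betas):
        total += x if x is not None else rasch_prob(theta, beta)
    return total


def country_mean(students, betas):
    """Summary statistic for one country: mean completed score.

    students: list of (responses, theta) pairs.
    betas: item difficulties estimated for this country, so they may
    differ from another country's difficulties for the same items.
    """
    scores = [completed_score(r, t, betas) for r, t in students]
    return sum(scores) / len(scores)


# Hypothetical country: three items, two students with incomplete data.
betas_a = [-1.0, 0.0, 1.0]
students_a = [([1, None, 0], 0.0), ([None, 1, None], 1.0)]
mean_a = country_mean(students_a, betas_a)
print(round(mean_a, 3))
```

Because every country is summarised on the same item set, the means remain comparable even when the difficulties passed in as `betas` differ between countries.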
Notes
In fact, PISA consists of participating economies. However, since most economies are countries, and since we think that the term countries is easier for the reader, we use the term countries instead of economies.
The parameters of polytomous items are connected with a dotted line.
The data were retrieved from http://pisa2003.acer.edu.au/downloads.php and http://pisa2006.acer.edu.au/downloads.php on August 22nd, 2013.
The item numbering is according to the order in which the items appear in booklet 6 of PISA 2006.
Around 2000, it was discussed whether this construct should be part of the PISA survey.
Appendix
See Table 5.
Cite this article
Zwitser, R.J., Glaser, S.S.F. & Maris, G. Monitoring Countries in a Changing World: A New Look at DIF in International Surveys. Psychometrika 82, 210–232 (2017). https://doi.org/10.1007/s11336-016-9543-8
DOI: https://doi.org/10.1007/s11336-016-9543-8