Abstract
International large-scale assessments (ILSAs), such as the Trends in International Mathematics and Science Study (TIMSS), the Progress in International Reading Literacy Study (PIRLS), and the Programme for International Student Assessment (PISA), play an important role in informing educational policies across countries. Such assessments provide rich but complex data, and researchers need to be aware of these complexities in order to analyse ILSA data correctly and interpret the results appropriately. This chapter offers an accessible introduction to the topic and a starting point for the application of multilevel modelling to ILSA data for research and policy. It introduces key concepts and design features of ILSAs relevant to multilevel modelling (e.g., cluster sampling, weights, and plausible values) and considers these issues from a practical perspective to support data preparation and the selection of modelling techniques and software.
Notes
- 1.
- 2. Other similar studies are the International Civic and Citizenship Education Study (ICCS) and the International Computer and Information Literacy Study (ICILS).
- 3.
- 4. For further detail on the item response models and resultant plausible values used in ILSAs, readers are referred to the relevant technical documentation of the ILSA in question. The IEA and the OECD publish technical reports for each ILSA cycle; for example, see Chap. 12 of the PISA 2018 technical report (OECD, 2021), and Chaps. 11 and 12 of the TIMSS 2019 technical report (Martin et al., 2020). A sketch of how estimates are pooled across plausible values is given after these notes.
- 5. These levels are identified using specific cut-off points along the performance continuum for each plausible value in each domain. In each of the ILSAs, students scoring at certain levels in each domain, taking all plausible values into account, are identified as low, medium, or high achievers. Detailed descriptions of the skills that students are expected to demonstrate at each level of performance in each domain and ILSA, and further information about how the cut-off points for each level are set, can be found in the technical reports of ILSAs (see, for example, Martin et al., 2017, 2020; OECD, 2021).
- 6. It should be borne in mind that, in many cases, more than one teacher is linked to a single class.
- 7. The number of country dummies required in the model is k – 1, where k is the number of countries included in the analysis (see the dummy-coding sketch after these notes).
- 8. This practice can also be applied to test how the relationships between explanatory and outcome variables change across different cycles of the same study within a country; see, for example, Karakolidis et al. (2021).
- 9. In PISA, the term final weight, rather than total weight, is used to refer to the student weights that incorporate the school weights (e.g., OECD, 2021). In this chapter, the terms total and final weights are used in line with the IEA studies: the former refers to the student weights that incorporate the school (and class) weights, and the latter to the student weights that are free from the school (and class) weights (e.g., Martin et al., 2020).
- 10. Teacher weights are not equivalent to class weights, as the former are simply total student weights divided by the number of teachers a student has (Rutkowski et al., 2010).
- 11. The nine weighting approaches Mang et al. (2021) compared in their study were: (i) no weights, (ii) unscaled weights, (iii) only student weights, (iv) only school weights, (v) house weights, (vi) cluster weights, (vii) ecluster weights, (viii) clustersum weights, and (ix) withincluster weights. A sketch of one commonly used within-school scaling of student weights is given after these notes.
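As a concrete illustration of the pooling referred to in notes 4 and 5, the following minimal base R sketch estimates the same regression once per plausible value and combines the results using Rubin's (1987) rules. The simulated data and variable names (PV1MATH–PV5MATH, ses) are purely illustrative; real ILSA analyses would also apply sampling weights and replication-based (jackknife or BRR) variance estimation within each plausible value.

```r
# Minimal sketch: pooling an estimate across plausible values (Rubin, 1987).
# The simulated data and variable names are illustrative only.
set.seed(1)
students <- data.frame(ses = rnorm(500))
for (m in 1:5) {
  students[[paste0("PV", m, "MATH")]] <- 500 + 30 * students$ses + rnorm(500, sd = 80)
}

pv_names <- paste0("PV", 1:5, "MATH")

# Fit the same model once per plausible value
fits <- lapply(pv_names, function(pv) {
  lm(reformulate("ses", response = pv), data = students)
})

est <- sapply(fits, function(f) coef(f)[["ses"]])       # M point estimates
u   <- sapply(fits, function(f) vcov(f)["ses", "ses"])  # M sampling variances

M         <- length(est)
q_bar     <- mean(est)                # pooled point estimate
u_bar     <- mean(u)                  # average within-imputation variance
b         <- var(est)                 # between-imputation variance
total_var <- u_bar + (1 + 1 / M) * b  # Rubin's total variance
c(estimate = q_bar, se = sqrt(total_var))
```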
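In relation to note 7, the k – 1 rule is handled automatically when a country identifier is entered as a factor in base R: treatment (dummy) coding creates one column per country other than the reference category. The data below are invented purely for illustration.

```r
# Minimal sketch: k - 1 country dummies via treatment (dummy) coding.
students <- data.frame(country = factor(c("EST", "EST", "FIN", "FIN", "IRL")))

# model.matrix() drops the reference category (the first factor level, "EST"),
# leaving k - 1 = 2 dummy columns alongside the intercept.
model.matrix(~ country, data = students)
```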
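Finally, relating to notes 9–11, one commonly discussed way of preparing level-1 (student) weights for multilevel estimation is to scale them within each school so that they sum to the school's sample size (see, e.g., Pfeffermann et al., 1998; Rabe-Hesketh & Skrondal, 2006). The base R sketch below uses invented data and placeholder variable names (TOTWGT, IDSCHOOL); it illustrates only one possible scaling, not necessarily any specific approach compared by Mang et al. (2021), and is not a recommendation.

```r
# Minimal sketch: within-school scaling of total student weights.
# TOTWGT and IDSCHOOL are placeholders for the relevant ILSA variables.
set.seed(1)
students <- data.frame(IDSCHOOL = rep(1:3, times = c(4, 6, 5)),
                       TOTWGT   = runif(15, min = 5, max = 40))

# Scale each student's total weight so that, within a school, the scaled
# weights sum to the number of sampled students in that school.
students$STUWGT_SCALED <- ave(students$TOTWGT, students$IDSCHOOL,
                              FUN = function(w) w * length(w) / sum(w))

# Check: within each school, the scaled weights sum to the school's sample size.
tapply(students$STUWGT_SCALED, students$IDSCHOOL, sum)
```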
References
Adams, R., & Wu, M. (Eds.). (2002). PISA 2000 technical report. PISA, OECD Publishing. https://doi.org/10.1787/9789264199521-en
Bailey, P., Emad, A., Huo, H., Lee, M., Liao, Y., Lishinski, A., Nguyen, T., Xie, Q., Yu, J., Zhang, T., Buehler, E., & Lee, S. (2021). EdSurvey: Analysis of NCES education survey and assessment data (R package version 2.7.0). https://cran.r-project.org/package=EdSurvey
BIFIE, Robitzsch, A., & Oberwimmer, K. (2019). BIFIEsurvey: Tools for survey statistics in educational assessment (R package version 3.3-12). https://cran.r-project.org/package=BIFIEsurvey
Cohen, L., Manion, L., & Morrison, K. (2017). Research methods in education (8th ed.). Routledge.
Eivers, E., Clerkin, A., Millar, D., & Close, S. (2010). The 2009 National Assessments technical report. Educational Research Centre.
Ersan, O., & Rodriguez, M. C. (2020). Socioeconomic status and beyond: A multilevel analysis of TIMSS mathematics achievement given student and school context in Turkey. Large-Scale Assessments in Education, 8(15). https://doi.org/10.1186/s40536-020-00093-y
Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE.
Fishbein, B., Foy, P., & Yin, L. (2021). TIMSS 2019 user guide for the international database. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).
Foshay, A. W., Thorndike, R. L., Hotyat, F., Pidgeon, D. A., & Walker, D. A. (1962). Educational achievements of thirteen-year-olds in twelve countries. UNESCO Institute for Education.
Garson, G. D. (2013). Introductory guide to HLM with HLM 7 software. In G. D. Garson (Ed.), Hierarchical linear modeling: Guide and applications (pp. 55–96). SAGE Publications Inc. https://doi.org/10.4135/9781483384450.n3
Greaney, V., & Kellaghan, T. (2008). Assessing national achievement levels in education. World Bank. https://hdl.handle.net/10986/6904
Hanushek, E. A., & Woessmann, L. (2005). Does educational tracking affect performance and inequality? Differences-in-differences evidence across countries (IZA DP No. 1901).
Hox, J. J., Moerbeek, M., & van de Schoot, R. (2018). Multilevel analysis: Techniques and applications (3rd ed.). Routledge.
Husén, T., & Postlethwaite, T. N. (1996). A brief history of the International Association for the Evaluation of Educational Achievement (IEA). Assessment in Education: Principles, Policy & Practice, 3(2), 129–141. https://doi.org/10.1080/0969594960030202
IEA. (2021). Help manual for the IEA IDB analyzer (Version 4.0). https://www.iea.nl
Karakolidis, A., Duggan, A., Shiel, G., & Kiniry, J. (2021). Examining educational inequalities: Insights in the context of improved mathematics performance on national and international assessments at primary level in Ireland. Large-Scale Assessments in Education, 9(5). https://doi.org/10.1186/s40536-021-00098-1
Kellaghan, T. (1996). IEA studies and educational policy. Assessment in Education: Principles, Policy & Practice, 3(2), 143–160. https://doi.org/10.1080/0969594960030203
Kellaghan, T., & Greaney, V. (2001). The globalisation of assessment in the 20th century. Assessment in Education: Principles, Policy & Practice, 8(1), 87–102. https://doi.org/10.1080/09695940120033270
Kerkhoff, D., & Nussbeck, F. W. (2019). The influence of sample size on parameter estimates in three-level random-effects models. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.01067
Kim, J. S., Anderson, C. J., & Keller, B. (2013). Multilevel analysis of assessment data. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. Chapman and Hall/CRC Press. https://doi.org/10.1201/b16061
Lai, M. H. C., & Kwok, O. (2015). Examining the rule of thumb of not using multilevel modeling: The “design effect smaller than two” rule. The Journal of Experimental Education, 83(3), 423–438. https://doi.org/10.1080/00220973.2014.907229
Laukaityte, I., & Wiberg, M. (2018). Importance of sampling weights in multilevel modeling of international large-scale assessment data. Communications in Statistics—Theory and Methods, 47(20), 4991–5012. https://doi.org/10.1080/03610926.2017.1383429
Mang, J., Küchenhoff, H., Meinck, S., & Prenzel, M. (2021). Sampling weights in multilevel modelling: An investigation using PISA sampling structures. Large-Scale Assessments in Education, 9(6). https://doi.org/10.1186/s40536-021-00099-0
Martin, M. O., Mullis, I. V. S., & Hooper, M. (Eds.). (2017). Methods and procedures in PIRLS 2016. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College and International Association for the Evaluation of Educational Achievement (IEA).
Martin, M. O., von Davier, M., & Mullis, I. V. S. (Eds.). (2020). Methods and procedures: TIMSS 2019 technical report. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).
Menezes, I. G., Duran, V. R., Mendonça Filho, E. J., Veloso, T. J., Sarmento, S. M. S., Paget, C. L., & Ruggeri, K. (2016). Policy implications of achievement testing using multilevel models: The case of Brazilian elementary schools. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01727
Mirazchiyski, P., & INERI. (2021). RALSA: R analyzer for large-scale assessments (R package version 1.0.2). https://cran.r-project.org/package=RALSA
Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 international results in mathematics and science. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).
Musca, S. C., Kamiejski, R., Nugier, A., Méot, A., Er-Rafiy, A., & Brauer, M. (2011). Data with hierarchical structure: Impact of intraclass correlation and sample size on Type-I error. Frontiers in Psychology, 2. https://doi.org/10.3389/fpsyg.2011.00074
Muthén, L. K., & Muthén, B. O. (2017). Mplus user’s guide (8th ed.). Muthén & Muthén.
National Center for Education Statistics. (2011). Overview of the NAEP assessment design. NAEP Technical Documentation. https://nces.ed.gov/nationsreportcard/tdw/overview/
OECD. (2009). PISA data analysis manual: SPSS second edition. PISA, OECD Publishing. https://doi.org/10.1787/9789264056275-en
OECD. (2013a). PISA 2012 results: Excellence through equity (Volume II): Giving every student the chance to succeed. PISA, OECD Publishing. https://doi.org/10.1787/9789264201132-en
OECD. (2013b). PISA 2012 results: What makes schools successful (Volume IV): Resources, policies and practices. PISA, OECD Publishing. https://doi.org/10.1787/9789264201156-en
OECD. (2016). PISA 2015 results (Volume II): Policies and practices for successful schools. PISA, OECD Publishing. https://doi.org/10.1787/9789264267510-en
OECD. (2018). Effective teacher policies: Insights from PISA. PISA, OECD Publishing. https://doi.org/10.1787/9789264301603-en
OECD. (2019a). TALIS 2018 technical report. OECD Publishing.
OECD. (2019b). PISA 2018 results (Volume III): What school life means for students’ lives. PISA, OECD Publishing. https://doi.org/10.1787/acd78851-en
OECD. (2021). PISA 2018 technical report. PISA, OECD Publishing. https://www.oecd.org/pisa/data/pisa2018technicalreport/
Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., & Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 60(1), 23–40.
Plomp, T., Howie, S., & McGaw, B. (2003). International studies of educational achievement. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation. Kluwer International Handbooks of Education (Vol. 9, pp. 951–978). Springer. https://doi.org/10.1007/978-94-010-0309-4_53
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Rabe-Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(4), 805–827. https://doi.org/10.1111/j.1467-985X.2006.00426.x
Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2020). A user’s guide to MLwiN, v3.05. Centre for Multilevel Modelling, University of Bristol.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). SAGE Publications, Inc.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons Inc. https://doi.org/10.1002/9780470316696
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142–151. https://doi.org/10.3102/0013189X10363170
SAS Institute Inc. (2018). SAS/STAT® 15.1 user’s guide. SAS Institute Inc.
Schütz, G., Ursprung, H. W., & Wößmann, L. (2008). Education policy and equality of opportunity. Kyklos, 61(2), 279–308. https://doi.org/10.1111/j.1467-6435.2008.00402.x
Sempé, L. (2021). School-level inequality measurement based categorical data: A novel approach applied to PISA. Large-Scale Assessments in Education, 9(9). https://doi.org/10.1186/s40536-021-00103-7
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). SAGE.
StataCorp. (2021). Stata base reference manual: Release 17. Stata Press.
van Daal, V., Begnum, A. C., Solheim, R. G., & Adèr, H. (2008). Nordic comparisons in PIRLS 2006. 3rd IEA International Research Conference (IRC-2008).
von Davier, M., Gonzalez, E., & Mislevy, R. J. (2009). What are plausible values and why are they useful? IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 2, 9–36.
Woltman, H., Feldstain, A., Mackay, J. C., & Rocchi, M. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52–69. https://doi.org/10.20982/tqmp.08.1.p052
Wu, M. (2005). The role of plausible values in large-scale surveys. Studies in Educational Evaluation, 31(2–3), 114–128. https://doi.org/10.1016/j.stueduc.2005.05.005
Acknowledgements
The authors are indebted to Alice Duggan for proofreading the chapter.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Karakolidis, A., Pitsia, V., Cosgrove, J. (2022). Multilevel Modelling of International Large-Scale Assessment Data. In: Khine, M.S. (eds) Methodology for Multilevel Modeling in Educational Research. Springer, Singapore. https://doi.org/10.1007/978-981-16-9142-3_8
DOI: https://doi.org/10.1007/978-981-16-9142-3_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-9141-6
Online ISBN: 978-981-16-9142-3
eBook Packages: Education, Education (R0)