Multilevel Modelling of International Large-Scale Assessment Data

Karakolidis, Anastasios; Pitsia, Vasiliki; Cosgrove, Jude

doi:10.1007/978-981-16-9142-3_8

Abstract

International large-scale assessments (ILSAs), such as the Trends in International Mathematics and Science Study (TIMSS), the Progress in International Reading Literacy Study (PIRLS), and the Programme for International Student Assessment (PISA), play an important role in informing educational policies across countries. Such assessments provide rich but complex data, and analysts need to be aware of these complexities in order to analyse ILSA data correctly and interpret results appropriately. This chapter offers an accessible starting point for applying multilevel modelling to ILSA data for research and policy. It introduces key concepts and design features of ILSAs relevant to multilevel modelling (e.g., cluster sampling, weights, and plausible values) and considers issues from a practical perspective to support data preparation and the selection of modelling techniques and software.

Notes

1. For example, compare OECD (2021), Chap. 9 to Adams and Wu (2002), Chap. 9.
2. Other similar studies are the International Civic and Citizenship Education Study (ICCS) and the International Computer and Information Literacy Study (ICILS).
3. For example, the National Assessments of Mathematics and English Reading (NAMER) in Ireland (Eivers et al., 2010) and the National Assessment of Educational Progress (NAEP) in the United States (National Center for Education Statistics, 2011).
4. For further detail on the item response models and resultant plausible values used in ILSAs, readers are referred to the relevant technical documentation of the ILSA in question. The IEA and the OECD publish technical reports for each ILSA cycle; for example, see Chap. 12 of the PISA 2018 technical report (OECD, 2021), and Chaps. 11 and 12 of the TIMSS 2019 technical report (Martin et al., 2020).
5. These levels are identified using specific cut-off points across the performance continuum for each plausible value in each domain. In each of the ILSAs, students scoring at certain levels in each domain, taking all plausible values into account, are identified as low, medium, or high achievers. Detailed descriptions of the skills that students are expected to demonstrate at each level of performance in each domain and ILSA, and further information about how the cut-off points for each level are set, can be found in the technical reports of ILSAs (see, for example, Martin et al., 2017, 2020; OECD, 2021).
6. It should be borne in mind that, in many cases, more than one teacher is linked to one class.
7. The number of country dummies required in the model is k – 1, where k is the number of countries included in the analysis.
8. This practice can also be applied to test how the relationships between explanatory and outcome variables change across different cycles of the same study within a country; see, for example, Karakolidis et al. (2021).
9. In PISA, the term final weight, rather than total weight, is used to refer to the student weights that incorporate the school weights (e.g., OECD, 2021). In this chapter, the terms total and final weights are used in line with the IEA studies; the former refers to the student weights that incorporate the school (and class) weights and the latter to the student weights that are free from the school (and class) weights (e.g., Martin et al., 2020).
10. Teacher weights are not equivalent to class weights as the former are just total student weights divided by the number of teachers a student has (Rutkowski et al., 2010).
11. The nine weighting approaches Mang et al. (2021) compared in their study were: (i) no weights, (ii) unscaled weights, (iii) only student weights, (iv) only school weights, (v) house weights, (vi) cluster weights, (vii) ecluster weights, (viii) clustersum weights, and (ix) withincluster weights.
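
Notes 5, 7, and 10 each describe a small data-preparation step that can be written out in a few lines. The Python sketch below is illustrative only and is not taken from the chapter or from any ILSA software: the function names, the `is_` prefix for dummy columns, the country codes, the weights, and the cut-off of 550 are all assumptions made for the example. It builds k − 1 country indicators against a chosen reference country, divides a total student weight by the number of teachers linked to the student, and computes the weighted share of students at or above a cut-off separately for each plausible value before averaging the shares.

```python
from typing import Dict, List, Sequence


def country_dummies(countries: Sequence[str], reference: str) -> List[Dict[str, int]]:
    """Build k - 1 indicator variables for k countries (Note 7).

    The reference country gets no column and serves as the baseline.
    """
    levels = sorted(set(countries))
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not found in data")
    kept = [c for c in levels if c != reference]
    return [{f"is_{c}": int(row == c) for c in kept} for row in countries]


def teacher_weight(total_student_weight: float, n_teachers: int) -> float:
    """Total student weight divided by the student's number of teachers (Note 10)."""
    if n_teachers < 1:
        raise ValueError("a student must be linked to at least one teacher")
    return total_student_weight / n_teachers


def share_at_or_above(
    pv_columns: Sequence[Sequence[float]],
    weights: Sequence[float],
    cut_off: float,
) -> float:
    """Weighted share at or above a cut-off, averaged over plausible values (Note 5).

    Each inner sequence holds one plausible value for every student. The share
    is computed once per plausible value and the results are then averaged.
    """
    total = sum(weights)
    shares = []
    for pv in pv_columns:
        reached = sum(w for score, w in zip(pv, weights) if score >= cut_off)
        shares.append(reached / total)
    return sum(shares) / len(shares)


if __name__ == "__main__":
    # Made-up data: four students from three countries (k = 3, so two dummies).
    countries = ["IRL", "USA", "TUR", "IRL"]
    print(country_dummies(countries, reference="IRL"))

    # Made-up total student weight for a student linked to three teachers.
    print(teacher_weight(total_student_weight=12.0, n_teachers=3))

    # Two made-up plausible values for three students; 550 is an illustrative cut-off.
    pv1 = [450.0, 560.0, 610.0]
    pv2 = [555.0, 540.0, 600.0]
    print(share_at_or_above([pv1, pv2], weights=[1.0, 2.0, 1.0], cut_off=550.0))
```

The sketch covers point estimates only. Standard errors that reflect both the sampling design and the variation between plausible values require the procedures documented in the technical reports cited in the notes.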

References

Adams, R., & Wu, M. (Eds.) (2002). PISA 2000 technical report. PISA, OECD Publishing. https://doi.org/10.1787/9789264199521-en
Bailey, P., Emad, A., Huo, H., Lee, M., Liao, Y., Lishinski, A., Nguyen, T., Xie, Q., Yu, J., Zhang, T., Buehler, E., & Lee, S. (2021). EdSurvey: Analysis of NCES education survey and assessment data (R package version 2.7.0). https://cran.r-project.org/package=EdSurvey
BIFIE, Robitzsch, A., & Oberwimmer, K. (2019). BIFIEsurvey: Tools for survey statistics in educational assessment (R package version 3.3-12). https://cran.r-project.org/package=BIFIEsurvey
Cohen, L., Manion, L., & Morrison, K. (2017). Research methods in education (8th ed.). Routledge.
Eivers, E., Clerkin, A., Millar, D., & Close, S. (2010). The 2009 National Assessments technical report. Educational Research Centre.
Ersan, O., & Rodriguez, M. C. (2020). Socioeconomic status and beyond: A multilevel analysis of TIMSS mathematics achievement given student and school context in Turkey. Large-Scale Assessments in Education, 8(15). https://doi.org/10.1186/s40536-020-00093-y
Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE.
Fishbein, B., Foy, P., & Yin, L. (2021). TIMSS 2019 user guide for the international database. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).
Foshay, A. W., Thorndike, R. L., Hotyat, F., Pidgeon, D. A., & Walker, D. A. (1962). Educational achievements of thirteen-year-olds in twelve countries. UNESCO Institute for Education.
Garson, G. D. (2013). Introductory guide to HLM with HLM 7 software. In G. D. Garson (Ed.), Hierarchical linear modeling: Guide and applications (pp. 55–96). SAGE Publications Inc. https://doi.org/10.4135/9781483384450.n3
Greaney, V., & Kellaghan, T. (2008). Assessing national achievement levels in education. World Bank. https://hdl.handle.net/10986/6904
Hanushek, E. A., & Woessmann, L. (2005). Does educational tracking affect performance and inequality? Differences-in-differences evidence across countries (IZA DP No. 1901).
Hox, J. J., Moerbeek, M., & van de Schoot, R. (2018). Multilevel analysis: Techniques and applications (3rd ed.). Routledge.
Husén, T., & Postlethwaite, T. N. (1996). A brief history of the International Association for the Evaluation of Educational Achievement (IEA). Assessment in Education: Principles, Policy & Practice, 3(2), 129–141. https://doi.org/10.1080/0969594960030202
IEA. (2021). Help manual for the IEA IDB analyzer (Version 4.0). https://www.iea.nl
Karakolidis, A., Duggan, A., Shiel, G., & Kiniry, J. (2021). Examining educational inequalities: Insights in the context of improved mathematics performance on national and international assessments at primary level in Ireland. Large-Scale Assessments in Education, 9(5). https://doi.org/10.1186/s40536-021-00098-1
Kellaghan, T. (1996). IEA studies and educational policy. Assessment in Education: Principles, Policy & Practice, 3(2), 143–160. https://doi.org/10.1080/0969594960030203
Kellaghan, T., & Greaney, V. (2001). The globalisation of assessment in the 20th century. Assessment in Education: Principles, Policy & Practice, 8(1), 87–102. https://doi.org/10.1080/09695940120033270
Kerkhoff, D., & Nussbeck, F. W. (2019). The influence of sample size on parameter estimates in three-level random-effects models. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.01067
Kim, J. S., Anderson, C. J., & Keller, B. (2013). Multilevel analysis of assessment data. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. Chapman and Hall/CRC Press. https://doi.org/10.1201/b16061
Lai, M. H. C., & Kwok, O. (2015). Examining the rule of thumb of not using multilevel modeling: The “design effect smaller than two” rule. The Journal of Experimental Education, 83(3), 423–438. https://doi.org/10.1080/00220973.2014.907229
Laukaityte, I., & Wiberg, M. (2018). Importance of sampling weights in multilevel modeling of international large-scale assessment data. Communications in Statistics—Theory and Methods, 47(20), 4991–5012. https://doi.org/10.1080/03610926.2017.1383429
Mang, J., Küchenhoff, H., Meinck, S., & Prenzel, M. (2021). Sampling weights in multilevel modelling: An investigation using PISA sampling structures. Large-Scale Assessments in Education, 9(6). https://doi.org/10.1186/s40536-021-00099-0
Martin, M. O., Mullis, I. V. S., & Hooper, M. (Eds.). (2017). Methods and procedures in PIRLS 2016. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College and International Association for the Evaluation of Educational Achievement (IEA).
Martin, M. O., von Davier, M., & Mullis, I. V. S. (Eds.). (2020). Methods and procedures: TIMSS 2019 technical report. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).
Menezes, I. G., Duran, V. R., Mendonça Filho, E. J., Veloso, T. J., Sarmento, S. M. S., Paget, C. L., & Ruggeri, K. (2016). Policy implications of achievement testing using multilevel models: The case of Brazilian elementary schools. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01727
Mirazchiyski, P., & INERI. (2021). RALSA: R analyzer for large-scale assessments (R package version 1.0.2). https://cran.r-project.org/package=RALSA
Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 international results in mathematics and science. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).
Musca, S. C., Kamiejski, R., Nugier, A., Méot, A., Er-Rafiy, A., & Brauer, M. (2011). Data with hierarchical structure: Impact of intraclass correlation and sample size on Type-I error. Frontiers in Psychology, 2. https://doi.org/10.3389/fpsyg.2011.00074
Muthén, L. K., & Muthén, B. O. (2017). Mplus user’s guide (8th ed.). Muthén & Muthén.
National Center for Education Statistics. (2011). Overview of the NAEP assessment design. NAEP Technical Documentation. https://nces.ed.gov/nationsreportcard/tdw/overview/
OECD. (2009). PISA data analysis manual: SPSS second edition. PISA, OECD Publishing. https://doi.org/10.1787/9789264056275-en
OECD. (2013a). PISA 2012 results: Excellence through equity (Volume II): Giving every student the chance to succeed. PISA, OECD Publishing. https://doi.org/10.1787/9789264201132-en
OECD. (2013b). PISA 2012 results: What makes schools successful (Volume IV): Resources, policies and practices. PISA, OECD Publishing. https://doi.org/10.1787/9789264201156-en
OECD. (2016). PISA 2015 results (Volume II): Policies and practices for successful schools. PISA, OECD Publishing. https://doi.org/10.1787/9789264267510-en
OECD. (2018). Effective teacher policies: Insights from PISA. PISA, OECD Publishing. https://doi.org/10.1787/9789264301603-en
OECD. (2019a). TALIS 2018 technical report. OECD Publishing.
OECD. (2019b). PISA 2018 results (Volume III): What school life means for students’ lives. PISA, OECD Publishing. https://doi.org/10.1787/acd78851-en
OECD. (2021). PISA 2018 technical report. PISA, OECD Publishing. https://www.oecd.org/pisa/data/pisa2018technicalreport/
Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., & Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 60(1), 23–40.
Plomp, T., Howie, S., & McGaw, B. (2003). International studies of educational achievement. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation. Kluwer International Handbooks of Education (Vol. 9, pp. 951–978). Springer. https://doi.org/10.1007/978-94-010-0309-4_53
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Rabe-Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society. Series A (Statistics in Society), 169(4), 805–827. https://doi.org/10.1111/j.1467-985X.2006.00426.x
Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2020). A user’s guide to MLwiN, v3.05. Centre for Multilevel Modelling, University of Bristol.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). SAGE Publications, Inc.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons Inc. https://doi.org/10.1002/9780470316696
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142–151. https://doi.org/10.3102/0013189X10363170
SAS Institute Inc. (2018). SAS/STAT® 15.1 user’s guide. SAS Institute Inc.
Schütz, G., Ursprung, H. W., & Wößmann, L. (2008). Education policy and equality of opportunity. Kyklos, 61(2), 279–308. https://doi.org/10.1111/j.1467-6435.2008.00402.x
Sempé, L. (2021). School-level inequality measurement based categorical data: A novel approach applied to PISA. Large-Scale Assessments in Education, 9(9). https://doi.org/10.1186/s40536-021-00103-7
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). SAGE.
StataCorp. (2021). Stata base reference manual: Release 17. Stata Press.
van Daal, V., Begnum, A. C., Solheim, R. G., & Adèr, H. (2008). Nordic comparisons in PIRLS 2006. 3rd IEA International Research Conference (IRC-2008).
von Davier, M., Gonzalez, E., & Mislevy, R. J. (2009). What are plausible values and why are they useful? IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 2, 9–36.
Woltman, H., Feldstain, A., Mackay, J. C., & Rocchi, M. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52–69. https://doi.org/10.20982/tqmp.08.1.p052
Wu, M. (2005). The role of plausible values in large-scale surveys. Studies in Educational Evaluation, 31(2–3), 114–128. https://doi.org/10.1016/j.stueduc.2005.05.005

Acknowledgements

The authors are indebted to Alice Duggan for proofreading the chapter.

Author information

Authors and Affiliations

Educational Research Centre, Dublin, Ireland
Anastasios Karakolidis, Vasiliki Pitsia & Jude Cosgrove

Corresponding author

Correspondence to Anastasios Karakolidis.

Editor information

Editors and Affiliations

Curtin University, Bentley, WA, Australia
Myint Swe Khine

About this chapter

Cite this chapter

Karakolidis, A., Pitsia, V., Cosgrove, J. (2022). Multilevel Modelling of International Large-Scale Assessment Data. In: Khine, M.S. (eds) Methodology for Multilevel Modeling in Educational Research. Springer, Singapore. https://doi.org/10.1007/978-981-16-9142-3_8

DOI: https://doi.org/10.1007/978-981-16-9142-3_8
Published: 11 April 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-9141-6
Online ISBN: 978-981-16-9142-3
eBook Packages: Education, Education (R0)