Multilevel Modelling of International Large-Scale Assessment Data

Methodology for Multilevel Modeling in Educational Research

Abstract

It is indisputable that international large-scale assessments (ILSAs), such as the Trends in International Mathematics and Science Study (TIMSS), the Progress in International Reading Literacy Study (PIRLS), and the Programme for International Student Assessment (PISA), play an important role in informing educational policies across countries. Such assessments provide rich but complex data. It is important to be aware of these complexities in order to analyse ILSA data correctly and interpret results appropriately. This chapter is an accessible introduction to the topic, providing a starting point for the application of multilevel modelling of ILSA data for research and policy. The chapter provides an introduction to key concepts and design features of ILSAs relevant to multilevel modelling (e.g., cluster sampling, weights, and plausible values) and considers issues from a practical perspective to support data preparation and the selection of modelling techniques and software.

Notes

  1. For example, compare OECD (2021), Chap. 9 to Adams and Wu (2002), Chap. 9.

  2. Other similar studies are the International Civic and Citizenship Education Study (ICCS) and the International Computer and Information Literacy Study (ICILS).

  3. For example, the National Assessments of Mathematics and English Reading (NAMER) in Ireland (Eivers et al., 2010) and the National Assessment of Educational Progress (NAEP) in the United States (National Center for Education Statistics, 2011).

  4. For further detail on the item response models and resultant plausible values used in ILSAs, readers are referred to the relevant technical documentation of the ILSA in question. The IEA and the OECD publish technical reports for each ILSA cycle; for example, see Chap. 12 of the PISA 2018 technical report (OECD, 2021), and Chaps. 11 and 12 of the TIMSS 2019 technical report (Martin et al., 2020).

  5. The way these levels are identified is through the use of specific cut-off points across the performance continuum for each plausible value in each domain. In each of the ILSAs, students scoring at certain levels in each domain, taking all plausible values into account, are identified as low, medium, or high achievers. Detailed descriptions of the skills that students are expected to demonstrate at each level of performance in each domain and ILSA, and further information about how the cut-off points for each level are set, can be found in the technical reports of ILSAs (see, for example, Martin et al., 2017, 2020; OECD, 2021).

  6. It should be borne in mind that, in many cases, more than one teacher is linked to one class.

  7. The number of country dummies required in the model is k – 1, where k is the number of countries included in the analysis.

  8. This practice can also be applied to test how the relationships between explanatory and outcome variables change across different cycles of the same study within a country; see, for example, Karakolidis et al. (2021).

  9. In PISA, the term final weight, rather than total weight, is used to refer to the student weights that incorporate the school weights (e.g., OECD, 2021). In this chapter, the terms total and final weights are used in line with the IEA studies; the former refers to the student weights that incorporate the school (and class) weights and the latter to the student weights that are free from the school (and class) weights (e.g., Martin et al., 2020).

  10. Teacher weights are not equivalent to class weights as the former are just total student weights divided by the number of teachers a student has (Rutkowski et al., 2010).

  11. The nine weighting approaches Mang et al. (2021) compared in their study were: (i) no weights, (ii) unscaled weights, (iii) only student weights, (iv) only school weights, (v) house weights, (vi) cluster weights, (vii) ecluster weights, (viii) clustersum weights, and (ix) withincluster weights.
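
Two of the notes above describe simple computations that may be easier to follow in code: the k – 1 country dummies of Note 7 and the teacher weights of Note 10. The following is a minimal Python sketch, not the procedure of any particular ILSA or software package. The function names, the `is_` prefix, the choice of the alphabetically first country as the reference category, and the input layout (one student-teacher pair per record, with no duplicate pairs) are all assumptions made for illustration.

```python
from collections import Counter


def country_dummies(countries, reference=None):
    """Build k - 1 indicator columns for k distinct countries (Note 7).

    `countries` holds one country code per student record. The reference
    country gets no column of its own; by assumption it defaults to the
    alphabetically first code.
    """
    levels = sorted(set(countries))
    if reference is None:
        reference = levels[0]
    if reference not in levels:
        raise ValueError(f"reference country {reference!r} not in data")
    kept = [c for c in levels if c != reference]
    # One 0/1 column per non-reference country.
    return {f"is_{c}": [1 if x == c else 0 for x in countries] for c in kept}


def teacher_weights(links, total_student_weight):
    """Divide each total student weight by that student's teacher count (Note 10).

    `links` is a list of unique (student_id, teacher_id) pairs and
    `total_student_weight` maps student_id to the total student weight.
    Returns one (student_id, teacher_id, weight) tuple per link.
    """
    n_teachers = Counter(student for student, _ in links)
    return [
        (student, teacher, total_student_weight[student] / n_teachers[student])
        for student, teacher in links
    ]


# Hypothetical data: three countries, so two dummy columns are produced.
dummies = country_dummies(["IRL", "NOR", "IRL", "TUR"])

# Hypothetical data: student s1 has two teachers, student s2 has one.
weights = teacher_weights(
    [("s1", "t1"), ("s1", "t2"), ("s2", "t1")],
    {"s1": 10.0, "s2": 4.0},
)
```

In this sketch the teacher weights of a given student sum back to that student's total weight, which is why a student linked to several teachers is not counted more than once in weighted teacher-level analyses.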

References

  • Adams, R., & Wu, M. (Eds.) (2002). PISA 2000 technical report. PISA, OECD Publishing. https://doi.org/10.1787/9789264199521-en

  • Bailey, P., Emad, A., Huo, H., Lee, M., Liao, Y., Lishinski, A., Nguyen, T., Xie, Q., Yu, J., Zhang, T., Buehler, E., & Lee, S. (2021). EdSurvey: Analysis of NCES education survey and assessment data (R package version 2.7.0). https://cran.r-project.org/package=EdSurvey

  • BIFIE, Robitzsch, A., & Oberwimmer, K. (2019). BIFIEsurvey: Tools for survey statistics in educational assessment (R package version 3.3–12). https://cran.r-project.org/package=BIFIEsurvey

  • Cohen, L., Manion, L., & Morrison, K. (2017). Research methods in education (8th ed.). Routledge.

  • Eivers, E., Clerkin, A., Millar, D., & Close, S. (2010). The 2009 National Assessments technical report. Educational Research Centre.

  • Ersan, O., & Rodriguez, M. C. (2020). Socioeconomic status and beyond: A multilevel analysis of TIMSS mathematics achievement given student and school context in Turkey. Large-Scale Assessments in Education, 8(15). https://doi.org/10.1186/s40536-020-00093-y

  • Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE.

  • Fishbein, B., Foy, P., & Yin, L. (2021). TIMSS 2019 user guide for the international database. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).

  • Foshay, A. W., Thorndike, R. L., Hotyat, F., Pidgeon, D. A., & Walker, D. A. (1962). Educational achievements of thirteen-year-olds in twelve countries. UNESCO Institute for Education.

  • Garson, G. D. (2013). Introductory guide to HLM with HLM 7 software. In G. D. Garson (Ed.), Hierarchical linear modeling: Guide and applications (pp. 55–96). SAGE Publications Inc. https://doi.org/10.4135/9781483384450.n3

  • Greaney, V., & Kellaghan, T. (2008). Assessing national achievement levels in education. World Bank. https://hdl.handle.net/10986/6904

  • Hanushek, E. A., & Woessmann, L. (2005). Does educational tracking affect performance and inequality? Differences-in-differences evidence across countries (IZA DP No. 1901).

  • Hox, J. J., Moerbeek, M., & van de Schoot, R. (2018). Multilevel analysis: Techniques and applications (3rd ed.). Routledge.

  • Husén, T., & Postlethwaite, T. N. (1996). A brief history of the International Association for the Evaluation of Educational Achievement (IEA). Assessment in Education: Principles, Policy & Practice, 3(2), 129–141. https://doi.org/10.1080/0969594960030202

  • IEA. (2021). Help manual for the IEA IDB analyzer (Version 4.0). https://www.iea.nl

  • Karakolidis, A., Duggan, A., Shiel, G., & Kiniry, J. (2021). Examining educational inequalities: Insights in the context of improved mathematics performance on national and international assessments at primary level in Ireland. Large-Scale Assessments in Education, 9(5). https://doi.org/10.1186/s40536-021-00098-1

  • Kellaghan, T. (1996). IEA studies and educational policy. Assessment in Education: Principles, Policy & Practice, 3(2), 143–160. https://doi.org/10.1080/0969594960030203

  • Kellaghan, T., & Greaney, V. (2001). The globalisation of assessment in the 20th century. Assessment in Education: Principles, Policy & Practice, 8(1), 87–102. https://doi.org/10.1080/09695940120033270

  • Kerkhoff, D., & Nussbeck, F. W. (2019). The influence of sample size on parameter estimates in three-level random-effects models. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.01067

  • Kim, J. S., Anderson, C. J., & Keller, B. (2013). Multilevel analysis of assessment data. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. Chapman and Hall/CRC Press. https://doi.org/10.1201/b16061

  • Lai, M. H. C., & Kwok, O. (2015). Examining the rule of thumb of not using multilevel modeling: The “design effect smaller than two” rule. The Journal of Experimental Education, 83(3), 423–438. https://doi.org/10.1080/00220973.2014.907229

  • Laukaityte, I., & Wiberg, M. (2018). Importance of sampling weights in multilevel modeling of international large-scale assessment data. Communications in Statistics—Theory and Methods, 47(20), 4991–5012. https://doi.org/10.1080/03610926.2017.1383429

  • Mang, J., Küchenhoff, H., Meinck, S., & Prenzel, M. (2021). Sampling weights in multilevel modelling: An investigation using PISA sampling structures. Large-Scale Assessments in Education, 9(6). https://doi.org/10.1186/s40536-021-00099-0

  • Martin, M. O., Mullis, I. V. S., & Hooper, M. (Eds.). (2017). Methods and procedures in PIRLS 2016. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College and International Association for the Evaluation of Educational Achievement (IEA).

  • Martin, M. O., von Davier, M., & Mullis, I. V. S. (Eds.). (2020). Methods and procedures: TIMSS 2019 technical report. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).

  • Menezes, I. G., Duran, V. R., Mendonça Filho, E. J., Veloso, T. J., Sarmento, S. M. S., Paget, C. L., & Ruggeri, K. (2016). Policy implications of achievement testing using multilevel models: The case of Brazilian elementary schools. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01727

  • Mirazchiyski, P., & INERI. (2021). RALSA: R analyzer for large-scale assessments (R package version 1.0.2). https://cran.r-project.org/package=RALSA

  • Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 international results in mathematics and science. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).

  • Musca, S. C., Kamiejski, R., Nugier, A., Méot, A., Er-Rafiy, A., & Brauer, M. (2011). Data with hierarchical structure: Impact of intraclass correlation and sample size on Type-I error. Frontiers in Psychology, 2. https://doi.org/10.3389/fpsyg.2011.00074

  • Muthén, L. K., & Muthén, B. O. (2017). Mplus user’s guide (8th ed.). Muthén & Muthén.

  • National Center for Education Statistics. (2011). Overview of the NAEP assessment design. NAEP Technical Documentation. https://nces.ed.gov/nationsreportcard/tdw/overview/

  • OECD. (2009). PISA data analysis manual: SPSS second edition. PISA, OECD Publishing. https://doi.org/10.1787/9789264056275-en

  • OECD. (2013a). PISA 2012 results: Excellence through equity (Volume II): Giving every student the chance to succeed. PISA, OECD Publishing. https://doi.org/10.1787/9789264201132-en

  • OECD. (2013b). PISA 2012 results: What makes schools successful (Volume IV): Resources, policies and practices. PISA, OECD Publishing. https://doi.org/10.1787/9789264201156-en

  • OECD. (2016). PISA 2015 results (Volume II): Policies and practices for successful schools. PISA, OECD Publishing. https://doi.org/10.1787/9789264267510-en

  • OECD. (2018). Effective teacher policies: Insights from PISA. PISA, OECD Publishing. https://doi.org/10.1787/9789264301603-en

  • OECD. (2019a). TALIS 2018 technical report. OECD Publishing.

  • OECD. (2019b). PISA 2018 results (Volume III): What school life means for students’ lives. PISA, OECD Publishing. https://doi.org/10.1787/acd78851-en

  • OECD. (2021). PISA 2018 technical report. PISA, OECD Publishing. https://www.oecd.org/pisa/data/pisa2018technicalreport/

  • Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., & Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 60(1), 23–40.

  • Plomp, T., Howie, S., & McGaw, B. (2003). International studies of educational achievement. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation. Kluwer International Handbooks of Education (Vol. 9, pp. 951–978). Springer. https://doi.org/10.1007/978-94-010-0309-4_53

  • R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing.

  • Rabe-Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society. Series A (Statistics in Society), 169(4), 805–827. https://doi.org/10.1111/j.1467-985X.2006.00426.x

  • Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2020). A user’s guide to MLwiN, v3.05. Centre for Multilevel Modelling, University of Bristol.

  • Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). SAGE Publications, Inc.

  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons Inc. https://doi.org/10.1002/9780470316696

  • Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142–151. https://doi.org/10.3102/0013189X10363170

  • SAS Institute Inc. (2018). SAS/STAT® 15.1 user’s guide. SAS Institute Inc.

  • Schütz, G., Ursprung, H. W., & Wößmann, L. (2008). Education policy and equality of opportunity. Kyklos, 61(2), 279–308. https://doi.org/10.1111/j.1467-6435.2008.00402.x

  • Sempé, L. (2021). School-level inequality measurement based categorical data: A novel approach applied to PISA. Large-Scale Assessments in Education, 9(9). https://doi.org/10.1186/s40536-021-00103-7

  • Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). SAGE.

  • StataCorp. (2021). Stata base reference manual: Release 17. Stata Press.

  • van Daal, V., Begnum, A. C., Solheim, R. G., & Adèr, H. (2008). Nordic comparisons in PIRLS 2006. 3rd IEA International Research Conference (IRC-2008).

  • von Davier, M., Gonzalez, E., & Mislevy, R. J. (2009). What are plausible values and why are they useful? IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 2, 9–36.

  • Woltman, H., Feldstain, A., Mackay, J. C., & Rocchi, M. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52–69. https://doi.org/10.20982/tqmp.08.1.p052

  • Wu, M. (2005). The role of plausible values in large-scale surveys. Studies in Educational Evaluation, 31(2–3), 114–128. https://doi.org/10.1016/j.stueduc.2005.05.005

Acknowledgements

The authors are indebted to Alice Duggan for proofreading the chapter.

Corresponding author

Correspondence to Anastasios Karakolidis.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Karakolidis, A., Pitsia, V., Cosgrove, J. (2022). Multilevel Modelling of International Large-Scale Assessment Data. In: Khine, M.S. (eds) Methodology for Multilevel Modeling in Educational Research. Springer, Singapore. https://doi.org/10.1007/978-981-16-9142-3_8

  • DOI: https://doi.org/10.1007/978-981-16-9142-3_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-9141-6

  • Online ISBN: 978-981-16-9142-3

  • eBook Packages: Education, Education (R0)
