Policies and Practices of Assessment: A Showcase for the Use (and Misuse) of International Large Scale Assessments in Educational Effectiveness Research

Chapter in International Perspectives in Educational Effectiveness Research

Abstract

International Large Scale Assessments (ILSAs) such as TIMSS and PISA provide comparative indicators and trend information on educational systems. Scholars have repeatedly claimed that ILSAs should be based on concepts from Educational Effectiveness Research (EER). At the same time, ILSAs can contribute to the further development of EER by providing data, triggering new research studies, and providing instruments that work across cultures. When using ILSA data, however, researchers need to cope with limitations regarding design, sampling, and measurement. Cross-sectional data from individual ILSAs, with little information on students’ learning paths, rarely allow for estimating the effects of policies and practices on student outcomes. Rather, ILSAs inform about the distribution of educational opportunities among students, schools, and regions. Effects of national policies may be identified through country-level trend data, provided ecological fallacies can be avoided. To illustrate these methodological problems and discuss the relationship between ILSAs and EER, the present chapter uses a specific showcase: policies and practices of educational assessment. Several related measures were implemented in PISA. Reanalyzing these data, the chapter identifies national patterns of classroom assessment practices, assessment use, school evaluation, and accountability policies. For example, “soft accountability” (comparing performance with a national standard) is distinguished from “strong accountability” (making test results public); soft accountability was related to country-level growth in achievement. English-speaking countries showed similar patterns, while full invariance with regard to student-perceived assessment and feedback could be established for only four (non-American) countries.
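To make the country-level trend logic concrete, consider a minimal sketch of a first-difference analysis, assuming a hypothetical file `pisa_country_trends.csv` with one row per country and PISA cycle; all column names are illustrative, and this is a sketch of the general identification idea, not the chapter’s actual model. Relating within-country changes in achievement to within-country changes in an accountability measure removes stable country characteristics and keeps the inference at the level where the policy varies, which is one way to guard against ecological fallacies.

```python
# Minimal sketch of a country-level first-difference trend analysis.
# Hypothetical data: one row per country x PISA cycle, with the mean
# achievement score and the share of schools reporting "soft
# accountability" (comparing performance with a national standard).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pisa_country_trends.csv")  # hypothetical file name
df = df.sort_values(["country", "cycle"])

# Within-country changes between consecutive PISA cycles.
df["d_score"] = df.groupby("country")["mean_score"].diff()
df["d_soft"] = df.groupby("country")["soft_accountability"].diff()

# Differencing removes time-invariant country characteristics, so the
# slope reflects within-country covariation over time rather than a
# purely cross-sectional (and potentially ecological) country contrast.
d = df.dropna(subset=["d_score", "d_soft"])
model = smf.ols("d_score ~ d_soft", data=d).fit(
    cov_type="cluster", cov_kwds={"groups": d["country"]}
)
print(model.summary())
```

With only a handful of cycles per country, such estimates are inevitably noisy; richer panel specifications, such as those used by Bergbauer et al. (2018), pool student-level data across cycles with country and year fixed effects.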

Notes

  1. Usually, questionnaires are published in the source language (mostly English) only. For PISA 2015, translated versions from 75 countries, together with item- and scale-level statistics, are available at https://daqs.fachportal-paedagogik.de/search/show/survey/177?language=en. This online repository includes Field Trial material not yet used in the Main Study. For an introduction and conceptual overview, see Kuger, Klieme, Jude, and Kaplan (2016).

  2. The author wants to thank Anindito Aditomo, Sonja Bayer, Janine Buchholz, Jessica Fischer, Jia He, Nina Jude, and Susanne Kuger for collaboration on this topic at the DIPF Department for Research on Educational Quality and Evaluation.

  3. This chapter of the Technical Report was authored by Janine Buchholz from DIPF, who kindly shared findings on the PERFEED scale with the present author. For a review of the scaling method, see Buchholz and Hartig (2017); a simplified sketch of the invariance-checking idea appears after these notes.

  4. https://www.focus.de/politik/deutschland/bildung-lehrer-machen-gegen-testeritis-an-schulen-front_id_3819831.html

  5. Bergbauer et al. (2018, p. 17) classified this item as an instance of “school-based external comparisons”.

  6. With the exception of PISA 2006.

  7. Later replaced by ‘students in national modal grade for 15-year-olds’.

  8. The international school questionnaire allows for national adaptations regarding the level at which comparisons are made.

  9. Changes in background questionnaire wording across cycles of measurement are yet another obstacle to analyzing trend data from ILSAs; cf. Singer et al. (2018, p. 64).
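Note 3 above refers to invariance testing for the PERFEED (perceived feedback) scale. As a simplified illustration of the general idea only, and not the IRT-based item-fit statistic of Buchholz and Hartig (2017), one can fit a one-factor model per country and compare loading patterns against a reference country with Tucker’s congruence coefficient. The data file, item names, and reference country below are hypothetical.

```python
# Rough per-country comparability check for a questionnaire scale:
# fit a one-factor model in each country and compare the loading
# patterns via Tucker's congruence coefficient. Illustrative only.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

df = pd.read_csv("perfeed_items.csv")  # hypothetical: 'country' + item columns
items = [c for c in df.columns if c.startswith("item")]

def loadings(group: pd.DataFrame) -> np.ndarray:
    """One-factor loadings for a single country's item responses."""
    fa = FactorAnalysis(n_components=1)
    fa.fit(group[items])
    return fa.components_.ravel()

def congruence(a: np.ndarray, b: np.ndarray) -> float:
    # Tucker's phi; absolute value because a factor's sign is arbitrary.
    return float(abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

ref = loadings(df[df["country"] == "REF"])  # hypothetical reference country
for country, grp in df.groupby("country"):
    phi = congruence(loadings(grp), ref)
    print(f"{country}: phi = {phi:.3f}")  # phi >= ~.95 often read as similar
```

Scales whose loading patterns diverge strongly across countries would be candidates for the comparability problems the chapter discusses; the operational PISA analyses rely on the more rigorous IRT-based procedure cited in Note 3.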

References

  • Abrams, L. M. (2007). Implications of high-stakes testing for the use of formative classroom assessment. In J. H. McMillan (Ed.), Formative classroom assessment: Theory into practice (pp. 79–98). New York, NY: Teachers College Press.

  • Aloisi, C., & Tymms, P. (2017). PISA trends, social changes, and education reforms. Educational Research and Evaluation, 23(5–6), 180–220.

  • Altrichter, H., & Maag Merki, K. (2016). Handbuch Neue Steuerung im Schulsystem [Handbook of new governance in the school system] (2nd ed.). Wiesbaden, Germany: Springer.

  • Baker, D. P. (2009). The invisible hand of world education culture. In G. Sykes, B. Schneider, & D. N. Plank (Eds.), Handbook of education policy research (pp. 958–968). New York, NY: Routledge.

  • Baumert, J., Kunter, M., Blum, W., Brunner, M., Voss, T., Jordan, A., et al. (2010). Teachers’ mathematical knowledge, cognitive activation in the classroom, and student progress. American Educational Research Journal, 47(1), 133–180. https://doi.org/10.3102/0002831209345157

  • Bayer, S. (2019). Alle alles ganz lehren – Aber wie? Mathematikunterricht vergleichend zwischen den Schularten [Omnes omnia omnino doceantur – But how? Comparing mathematics teaching between school tracks] (Doctoral dissertation). Goethe University, Frankfurt am Main, Germany.

  • Bayer, S., Klieme, E., & Jude, N. (2016). Assessment and evaluation in educational contexts. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (pp. 469–488). Dordrecht, The Netherlands: Springer.

  • Bennett, R. (2011). Formative assessment: A critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25.

  • Bergbauer, A. B., Hanushek, E. A., & Wößmann, L. (2018, July). Testing (CESifo Working Paper No. 7168). Munich, Germany: CESifo.

  • Bischof, L. M., Hochweber, J., Hartig, J., & Klieme, E. (2013). Schulentwicklung im Verlauf eines Jahrzehnts: Erste Ergebnisse des PISA-Schulpanels [School improvement throughout one decade: First results of the PISA school panel study]. Zeitschrift für Pädagogik, special issue, 59, 172–199.

  • Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.

  • Black, P., & Wiliam, D. (2004). The formative purpose: Assessment must first promote learning. In M. Wilson (Ed.), Towards coherence between classroom assessment and accountability: 103rd yearbook of the National Society for the Study of Education, Part II (pp. 20–50). Chicago, IL: University of Chicago Press.

  • Bogdandy, A. V., & Goldmann, M. (2009). The exercise of international public authority through national policy assessment: The PISA study of the OECD as a template for a new international standard legal instrument. Zeitschrift für ausländisches öffentliches Recht und Völkerrecht, 69, 51–102.

  • Bottani, N., & Tuijnman, A. C. (1994). The design of indicator systems. In A. C. Tuijnman & T. N. Postlethwaite (Eds.), Monitoring the standards of education (pp. 47–78). Oxford, UK: Pergamon.

  • Bryk, A., & Hermanson, K. (1994). Observations on the structure, interpretation and use of education indicator systems. In OECD (Ed.), Making education count: Developing and using international indicators (pp. 37–53). Paris, France: OECD.

  • Buchholz, J., & Hartig, J. (2017). Comparing attitudes across groups: An IRT-based item-fit statistic for the analysis of measurement invariance. Applied Psychological Measurement. Advance online publication. https://doi.org/10.1177/0146621617748323

  • Coburn, C., & Turner, E. O. (2011). Research on data use: A framework and analysis. Measurement: Interdisciplinary Research and Perspectives, 9(4), 173–206.

  • Creemers, B. P. M., & Kyriakides, L. (2008). The dynamics of educational effectiveness. A contribution to policy, practice and theory in contemporary schools. London, UK/New York, NY: Routledge.

  • Decristan, J., Klieme, E., Kunter, M., Hochweber, J., Büttner, G., Fauth, B., et al. (2015). Embedded formative assessment and classroom process quality: How do they interact in promoting students’ science understanding? American Educational Research Journal, 52(6), 1133–1159.

  • Donaldson, S. I. (2004). Using professional evaluation to improve the effectiveness of nonprofit organizations. In R. E. Riggio & S. S. Orr (Eds.), Improving leadership in nonprofit organizations (pp. 234–251). San Francisco, CA: Wiley.

  • Elacqua, G. (2016). Building more effective education systems. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective. Dordrecht, The Netherlands: Springer.

  • Ellwart, T., & Konradt, U. (2011). Formative versus reflective measurement: An illustration using work-family balance. Journal of Psychology, 145(5), 391–417.

  • Faubert, V. (2009). School evaluation: Current practices in OECD countries and a literature review (OECD Education Working Papers No. 42). Paris, France: OECD.

  • Fischer, J., He, J., & Klieme, E. (submitted). The structure of teaching practices across countries: A combination of factor analysis and network analysis.

  • Fischer, J., Klieme, E., & Praetorius, A.-K. (submitted). The impact of linguistic similarity on cross-cultural comparability of students’ perceptions of teaching quality.

  • Glas, C. A. W., & Jehangir, K. (2014). Modeling country specific differential item functioning. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large scale assessment (pp. 97–116). Boca Raton, FL: CRC Press.

  • Gustafsson, J.-E. (2007). Understanding causal influences on educational achievement through analysis of differences over time within countries. In T. Loveless (Ed.), Lessons learned: What international assessments tell us about math achievement (pp. 37–63). Washington, DC: The Brookings Institution.

  • Harlen, W., & Deakin Crick, R. (2002). A systematic review of the impact of summative assessment and tests on students’ motivation for learning (EPPI-Centre Review, version 1.1∗). London: EPPI-Centre. https://eppi.ioe.ac.uk/cms/Portals/0/PDF%20reviews%20and%20summaries/ass_rv1.pdf?ver=2006-02-24-112939-763. Accessed 17 June 2016.

  • Hattie, J. (2009). Visible learning. A synthesis of over 800 meta-analyses relating to achievement. London, UK: Routledge.

  • Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.

  • He, J., Buchholz, J., & Klieme, E. (2017). Effects of anchoring vignettes on comparability and predictive validity of student self-reports in 64 cultures. Journal of Cross-Cultural Psychology, 48(3), 319–334.

  • He, J., & Kubacka, K. (2015). Data comparability in the Teaching and Learning International Survey (TALIS) 2008 and 2013 (OECD Education Working Papers No. 124). Paris, France: OECD.

  • Huber, S. G., & Skedsmo, G. (2016). Editorial: Data use – A key to improve teaching and learning. Educational Assessment, Evaluation and Accountability, 28(1), 1–3.

  • Jerrim, J. (2011). England’s “plummeting” PISA test scores between 2000 and 2009: Is the performance of our secondary school pupils really in relative decline? (DoQSS Working Paper No. 11–09). London, UK: Department of Quantitative Social Science, UCL Institute of Education, University College London.

  • Johnson, K., Greenseid, L. O., Toal, S. A., King, J. A., Lawrenz, F., & Volkov, B. (2009). Research on evaluation use: A review of the empirical literature from 1986 to 2005. American Journal of Evaluation, 30(3), 377–410.

  • Jude, N. (2016). The assessment of learning contexts in PISA. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective. Dordrecht, The Netherlands: Springer.

  • Jude, N., & Kuger, S. (2018). Questionnaire development and design for international large-scale assessments (ILSAs). Washington, DC: National Academy of Education.

  • Kaplan, D., & Lee, C. (2018). Optimizing prediction using Bayesian model averaging: Examples using large-scale educational assessments. Evaluation Review. Advance online publication. https://doi.org/10.1177/0193841X18761421

  • Kingston, N., & Nash, B. (2011). Formative assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28–37.

  • Klieme, E. (2012). The role of large-scale assessments in research on educational effectiveness and school development. In M. von Davier, E. Gonzalez, I. Kirsch, & K. Yamamoto (Eds.), The role of international large-scale assessments: Perspectives from technology, economy, and educational research (pp. 115–147). Heidelberg, Germany: Springer.

  • Klieme, E. (2016, December). TIMSS 2015 and PISA 2015: How are they related on the country level? (DIPF working paper). https://pisa.dipf.de/de/pdf-ordner/Klieme_TIMSS2015andPISA2015.pdf

  • Klieme, E. (2018, February). Alles schräg [Biased findings]. Die Zeit. https://www.zeit.de/2018/07/pisa-studie-oecd-politik-eckhard-klieme

  • Klieme, E., Jude, N., Baumert, J., & Prenzel, M. (2010). PISA 2000–2009: Bilanz der Veränderungen im Schulsystem [PISA 2000–2009: Taking stock of changes in the school system]. In E. Klieme, C. Artelt, J. Hartig, N. Jude, O. Köller, M. Prenzel, W. Schneider, & P. Stanat (Eds.), PISA 2009: Bilanz nach einem Jahrzehnt [PISA 2009: Taking stock after a decade]. Münster, Germany: Waxmann.

  • Klieme, E., & Kuger, S. (2015). PISA 2015 context questionnaires framework. In PISA 2015 assessment and analytical framework: Science, reading, mathematic and financial literacy (pp. 101–127). Paris, France: OECD.

  • Klieme, E., & Rakoczy, K. (2003). Unterrichtsqualität aus Schülerperspektive: Kulturspezifische Profile, regionale Unterschiede und Zusammenhänge mit Effekten von Unterricht [Teaching quality from a student perspective: Culture-specific profiles, regional differences, and relationships with teaching effects]. In J. Baumert, C. Artelt, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, W. Schneider, & K.-J. Tillmann (Eds.), PISA 2000: Ein differenzierter Blick auf die Länder der Bundesrepublik Deutschland [PISA 2000: A differentiated look at the Länder of the Federal Republic of Germany] (pp. 334–359). Opladen, Germany: Leske + Budrich.

  • Kuger, S., Klieme, E., Jude, N., & Kaplan, D. (Eds.). (2016). Assessing contexts of learning: An international perspective. Dordrecht, The Netherlands: Springer.

  • Kuger, S., Klieme, E., Lüdtke, O., Schiepe-Tiska, A., & Reiss, K. (2017). Mathematikunterricht und Schülerleistung in der Sekundarstufe: Zur Validität von Schülerbefragungen in Schulleistungsstudien [Mathematics teaching and student achievement in secondary education: The validity of student surveys in school achievement studies]. Zeitschrift für Erziehungswissenschaft, 20(2), 612. https://doi.org/10.1007/s11618-017-0750-6

  • Lenkeit, J., & Caro, D. H. (2014). Performance status and change – Measuring education system effectiveness with data from PISA 2000–2009. Educational Research and Evaluation, 20(2), 146–174.

  • McMillan, J. H. (2007). Formative classroom assessment: The key to improving student achievement. In J. H. McMillan (Ed.), Formative classroom assessment: Theory into practice (pp. 1–7). New York, NY: Teachers College Press.

  • Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 3–16). Amsterdam, The Netherlands/Oxford, UK: Elsevier Science.

  • OECD. (2005). Formative assessment: Improving learning in secondary classrooms. Paris, France: OECD.

  • OECD. (2007). PISA 2006. Science competencies for tomorrow’s world. Paris, France: OECD.

  • OECD. (2013). Synergies for better learning. An international perspective on evaluation and assessment. OECD reviews of evaluation and assessment in education. Paris, France: OECD.

  • OECD. (2014). PISA 2012 technical report. Paris, France: OECD.

  • OECD. (2017a). PISA 2015 technical report. Paris, France: OECD.

  • OECD. (2017b). PISA 2015 results (Volume II): Policies and practices for successful schools. Paris, France: OECD.

  • OECD & Vodafone Stiftung. (2018, January). Erfolgsfaktor Resilienz [Success factor resilience]. https://www.vodafone-stiftung.de/uploads/tx_newsjson/Vodafone_Stiftung_Erfolgsfaktor_Resilienz_01_02.pdf

  • Rakoczy, K., Klieme, E., Leiss, D., & Blum, W. (2017). Formative assessment in mathematics instruction: Theoretical considerations and empirical results of the Co2CA project. In D. Leutner, J. Fleischer, J. Grünkorn, & E. Klieme (Eds.), Competence assessment in education: Research, models and instruments (pp. 447–467). Cham, Switzerland: Springer.

  • Reckwitz, A. (2002). Toward a theory of social practices: A development in culturalist theorizing. European Journal of Social Theory, 5(2), 243–263.

  • Rosenshine, B., & Stevens, R. (1986). Teaching functions. In M. Wittrock (Ed.), Handbook of research on teaching (3rd ed.). New York, NY: Macmillan.

  • Rowan, B. (2002). Large-scale, cross-national surveys of educational achievement: Promises, pitfalls, and possibilities. In A. C. Porter & A. Gamoran (Eds.), Methodological advances in cross-national surveys of educational achievement (pp. 319–350). Washington, DC: National Academy Press.

  • Rozman, M., & Klieme, E. (2017). Exploring cross-national changes in instructional practices: Evidence from four cycles of TIMSS (Policy brief vol. 13). Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement.

  • Rutkowski, L., & Svetina, D. (2014). Assessing the hypothesis of measurement invariance in the context of large-scale international surveys. Educational and Psychological Measurement, 74(1), 31–57.

  • Ryan, K. E., Chandler, M., & Samuels, M. (2007). What should school-based evaluation look like? Studies in Educational Evaluation, 33(3–4), 197–212.

  • Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.

  • Sanders, J. R., & Davidson, E. J. (2003). A model for school evaluation. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation. Part one: Perspectives/ part two: Practice (pp. 807–826). Dordrecht, The Netherlands: Kluwer Academic Publishers.

  • Scheerens, J. (2002). School self-evaluation: Origins, definitions, approaches, methods and implementation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 35–69). Amsterdam, The Netherlands/Oxford, UK: Elsevier Science.

  • Scheerens, J., Glas, C. A., & Thomas, S. M. (2003). Educational evaluation, assessment, and monitoring: A systemic approach. Lisse, The Netherlands/Exton, PA: Swets & Zeitlinger.

  • Schmidt, W. H., Burroughs, N. A., Zoido, P., & Houang, R. T. (2015). The role of schooling in perpetuating educational inequality: An international perspective. Educational Researcher, 44(7), 371–386.

  • Shepard, L. A. (2006). Classroom assessment. In R. L. Brennan (Ed.), Educational measurement (pp. 623–646). Westport, CT: Rowman and Littlefield Publishers.

  • Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153–189.

  • Singer, J., Braun, H., & Chudowsky, N. (Eds.). (2018). International education assessments – Cautions, conundrums, and common sense. Washington, DC: National Academy of Education.

  • Spillane, J. P. (2012). Data in practice: Conceptualizing the data-based decision-making phenomena. American Journal of Education, 118(2), 113–141.

  • Strietholt, R., Bos, W., Gustafsson, J.-E., & Rosén, M. (Eds.). (2014). Educational policy evaluation through international comparative assessments. Münster, Germany: Waxmann.

  • Sun, H., Creemers, B. P. M., & de Jong, R. (2007). Contextual factors and effective school improvement. School Effectiveness and School Improvement, 18(1), 93–122.

  • Teltemann, J., & Klieme, E. (2016). The impact of international testing projects on policy and practice. In G. T. L. Brown & L. R. Harris (Eds.), Handbook of human and social conditions in assessment (pp. 369–386). New York, NY: Routledge.

  • van de Vijver, F. J. R., & He, J. (2016). Bias assessment and prevention in non-cognitive outcome measures in PISA questionnaires. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (pp. 229–253). Dordrecht, The Netherlands: Springer.

  • van de Vijver, F. J. R. (2018). Towards an integrated framework of bias in noncognitive assessment in international large-scale studies: Challenges and prospects. Educational Measurement: Issues and Practice, 37(4), 49–56.

  • Visscher, A. J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321–349.

  • Watermann, R., Maaz, K., Bayer, S., & Roczen, N. (2016). Social background. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (Methodology of educational measurement and assessment) (pp. 117–145). Springer. https://doi.org/10.1007/978-3-319-45357-6

  • Wößmann, L., Lüdemann, E., Schütz, G., & West, M. R. (2009). School accountability, autonomy and choice around the world. Cheltenham, UK: Edward Elgar.

  • Wyatt-Smith, C. (2014). Designing assessment for quality learning: The enabling power of assessment. Heidelberg, Germany: Springer.

Author information

Correspondence to Eckhard Klieme.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Klieme, E. (2020). Policies and Practices of Assessment: A Showcase for the Use (and Misuse) of International Large Scale Assessments in Educational Effectiveness Research. In: Hall, J., Lindorff, A., Sammons, P. (eds) International Perspectives in Educational Effectiveness Research. Springer, Cham. https://doi.org/10.1007/978-3-030-44810-3_7

  • DOI: https://doi.org/10.1007/978-3-030-44810-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44809-7

  • Online ISBN: 978-3-030-44810-3

  • eBook Packages: Education (R0)
