Abstract
International Large Scale Assessments (ILSAs) such as TIMSS and PISA provide comparative indicators and trend information on educational systems. Scholars have repeatedly argued that ILSAs should be grounded in concepts from Educational Effectiveness Research (EER). At the same time, ILSAs can contribute to the further development of EER by providing data, triggering new research studies, and supplying instruments that work across cultures. When using ILSA data, however, researchers must cope with limitations regarding design, sampling, and measurement. Cross-sectional data from individual ILSAs, which carry little information on students’ learning paths, rarely allow for estimating the effects of policies and practices on student outcomes. Rather, ILSAs inform about the distribution of educational opportunities among students, schools, and regions. Effects of national policies may be identified through country-level trend data, provided ecological fallacies can be avoided. To illustrate these methodological problems and to discuss the relationship between ILSAs and EER, the present chapter uses a specific showcase: policies and practices of educational assessment. Several related measures were implemented in PISA. Reanalyzing these data, the chapter identifies national patterns of classroom assessment practices, use of assessment, school evaluation, and accountability policies. For example, “soft accountability” (comparing performance with a national standard) is distinguished from “strong accountability” (making test results public); soft accountability was related to country-level growth in achievement. English-speaking countries turned out to show similar patterns, while full measurement invariance with regard to student-perceived assessment and feedback could be established for four (non-American) countries only.
Notes
- 1.
Usually, questionnaires are published in the source language (mostly English) only. For PISA 2015, translated versions from 75 countries, together with item- and scale-level statistics, are available at https://daqs.fachportal-paedagogik.de/search/show/survey/177?language=en. This online repository also includes Field Trial material not yet used in the Main Study. For an introduction and conceptual overview, see Kuger, Klieme, Jude, and Kaplan (2016).
- 2.
The author wants to thank Anindito Aditomo, Sonja Bayer, Janine Buchholz, Jessica Fischer, Jia He, Nina Jude and Susanne Kuger for collaboration on this topic at the DIPF Department for Research on Educational Quality and Evaluation.
- 3.
This chapter of the Technical Report was authored by Janine Buchholz from DIPF, who kindly shared findings on the PERFEED scale with the present author. For a review of the scaling method, see Buchholz & Hartig, 2017.
- 4.
- 5.
Bergbauer, Hanushek, and Wößmann (2018, p. 17) classified this item as an instance of “school-based external comparisons”.
- 6.
With the exception of PISA 2006.
- 7.
Later replaced by ‘students in national modal grade for 15-year-olds’.
- 8.
The international school questionnaire allows for national adaptations regarding the level on which comparisons are made.
- 9.
Changes in background questionnaire wording across cycles of measurement are yet another obstacle to analyzing trend data from ILSAs; cf. Singer et al., 2018, p. 64.
References
Abrams, L. M. (2007). Implications of high-stakes testing for the use of formative classroom assessment. In J. H. McMillan (Ed.), Formative classroom assessment: Theory into practice (pp. 79–98). New York, NY/London, UK: Teacher College/Columbia University.
Aloisi, C., & Tymms, P. (2017). PISA trends, social changes, and education reforms. Educational Research and Evaluation, 23(5–6), 180–220.
Altrichter, H., & Maag Merki, K. (2016). Handbuch Neue Steuerung im Schulsystem (2nd ed.). Wiesbaden, Germany: Springer.
Baker, D. P. (2009). The invisible hand of world education culture. In G. Sykes, B. Schneider, & D. N. Plank (Eds.), Handbook of education policy research (pp. 958–968). New York, NY: Routledge.
Baumert, J., Kunter, M., Blum, W., Brunner, M., Voss, T., Jordan, A., et al. (2010). Teachers’ mathematical knowledge, cognitive activation in the classroom, and student progress. American Educational Research Journal, 47(1), 133–180. https://doi.org/10.3102/0002831209345157
Bayer, S. (2019). Alle alles ganz lehren – Aber wie? Mathematikunterricht vergleichend zwischen den Schularten [Omnes omnia omnino doceantur – But how? Comparing mathematics teaching between school tracks]. Doctoral dissertation, Goethe University, Frankfurt am Main, Germany.
Bayer, S., Klieme, E., & Jude, N. (2016). Assessment and evaluation in educational contexts. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (pp. 469–488). New York, NY: Springer.
Bennett, R. (2011). Formative assessment: A critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25.
Bergbauer, A. B., Hanushek, E. A., & Wößmann, L. (2018, July). Testing (CESifo working paper no. 7168).
Bischof, L. M., Hochweber, J., Hartig, J., & Klieme, E. (2013). Schulentwicklung im Verlauf eines Jahrzehnts: Erste Ergebnisse des PISA-Schulpanels [School improvement throughout one decade: First results of the PISA school panel study]. Zeitschrift für Pädagogik, special issue, 59, 172–199.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
Black, P., & Wiliam, D. (2004). The formative purpose. Assessment must first promote learning. In M. Wilson (Ed.), Towards coherence between classroom assessment and accountability: 103rd yearbook of the national society for the study of education, Part II (pp. 20–50). Chicago, IL: University of Chicago Press.
Bogdandy, A. V., & Goldmann, M. (2009). The exercise of international public authority through National Policy Assessment. The PISA study of the OECD as a template for a new international standard legal instrument. Zeitschrift für ausländisches öffentliches Recht und Völkerrecht, 69, 51–102.
Bottani, N., & Tuijnman, A. C. (1994). The design of indicator systems. In A. C. Tuijnman & T. N. Postlethwaithe (Eds.), Monitoring the standards of education (pp. 47–78). Oxford, UK: Pergamon.
Bryk, A., & Hermanson, K. (1994). Observations on the structure, interpretation and use of education indicator systems. In OECD (Ed.), Making education count: Developing and using international indicators (pp. 37–53). Paris, France: OECD.
Buchholz, J., & Hartig, J. (2017). Comparing attitudes across groups: An IRT-based item-fit statistic for the analysis of measurement invariance. Applied Psychological Measurement. Advance online publication. https://doi.org/10.1177/0146621617748323
Coburn, C., & Turner, E. O. (2011). Research on data use: A framework and analysis. Measurement: Interdisciplinary Research and Practice, 9(4), 173–206.
Creemers, B. P. M., & Kyriakides, L. (2008). The dynamics of educational effectiveness. A contribution to policy, practice and theory in contemporary schools. London, UK/New York, NY: Routledge.
Decristan, J., Klieme, E., Kunter, M., Hochweber, J., Büttner, G., Fauth, B., et al. (2015). Embedded formative assessment and classroom process quality: How do they interact in promoting students’ science understanding? American Educational Research Journal, 52(6), 1133–1159.
Donaldson, S. I. (2004). Using professional evaluation to improve the effectiveness of nonprofit organizations. In R. E. Riggo & S. S. Orr (Eds.), Improving leadership in nonprofit organizations (pp. 234–251). San Francisco, CA: Wiley.
Elacqua, G. (2016). Building more effective education systems. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective. Dordrecht, The Netherlands: Springer.
Ellwart, T., & Konradt, U. (2011). Formative versus reflective measurement: An illustration using work-family balance. Journal of Psychology, 145(5), 391–417.
Faubert, V. (2009). School evaluation: Current practices in OECD countries and a literature review (OECD Education working papers, no. 42). Paris, France: OECD.
Fischer, J., He, J., & Klieme, E. (Submitted). The structure of teaching practices across countries: A combination of factor analysis and network analysis.
Fischer, J., Klieme, E., & Praetorius, A.-K. (Submitted). The impact of linguistic similarity on cross-cultural comparability of students’ perceptions of teaching quality.
Glas, C. A. W., & Jehangir, K. (2014). Modeling country specific differential item functioning. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large scale assessment (pp. 97–116). Boca Raton, FL: CRC Press.
Gustafsson, J.-E. (2007). Understanding causal influences on educational achievement through analysis of differences over time within countries. In T. Loveless (Ed.), Lessons learned: What international assessments tell us about math achievement (pp. 37–63). Washington, DC: The Brookings Institution.
Harlen, W., & Deakin Crick, R. (2002). A systematic review of the impact of summative assessment and tests on students’ motivation for learning (EPPI-Centre Review, version 1.1∗). London: EPPI-Centre. https://eppi.ioe.ac.uk/cms/Portals/0/PDF%20reviews%20and%20summaries/ass_rv1.pdf?ver=2006-02-24-112939-763. Accessed 17 June 2016.
Hattie, J. (2009). Visible learning. A synthesis of over 800 meta-analyses relating to achievement. London, UK: Routledge.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
He, J., Buchholz, J., & Klieme, E. (2017). Effects of anchoring vignettes on comparability and predictive validity of student self-reports in 64 cultures. Journal of Cross-Cultural Psychology, 48(3), 319–334.
He, J. & Kubacka, K. (2015). Data comparability in the teaching and learning international survey (TALIS) 2008 and 2013 (OECD education working papers vol. 124). Paris, France: OECD.
Huber, S. G., & Skedsmo, G. (2016). Editorial: Data use – A key to improve teaching and learning. Educational Assessment, Evaluation and Accountability, 28(1), 1–3.
Jerrim, J. (2011). England’s “plummeting” PISA test scores between 2000 and 2009: Is the performance of our secondary school pupils really in relative decline? (DoQSS working papers 11–09). London, UK: Department of Quantitative Social Science, UCL Institute of Education, University College London.
Johnson, K., Greenseid, L. O., Toal, S. A., King, J. A., Lawrenz, F., & Volkov, B. (2009). Research on evaluation use: A review of the empirical literature from 1986 to 2005. American Journal of Evaluation, 30(3), 377–410.
Jude, N. (2016). The assessment of learning contexts in PISA. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective. Dordrecht, The Netherlands: Springer.
Jude, N., & Kuger, S. (2018). Questionnaire development and design for international large-scale assessments (ILSAs). Washington, DC: National Academy of Education.
Kaplan, D. & Lee, C. (2018). Optimizing prediction using Bayesian model averaging: Examples using large-scale educational assessments. Evaluation Review. Advance online publication. https://doi.org/10.1177/0193841X18761421
Kingston, N., & Nash, B. (2011). Formative assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28–37.
Klieme, E. (2012). The role of large-scale assessments in research on educational effectiveness and school development. In M. von Davier, E. Gonzalez, I. Kirsch, & K. Yamamoto (Eds.), The role of international large-scale assessments: Perspectives from technology, economy, and educational research (pp. 115–147). Heidelberg, Germany: Springer.
Klieme, E. (2016, December). TIMSS 2015 and PISA 2015 -How are they related on the country level? (DIPF working paper). https://pisa.dipf.de/de/pdf-ordner/Klieme_TIMSS2015andPISA2015.pdf
Klieme, E. (2018, February). Alles schräg (Biased findings). https://www.zeit.de/2018/07/pisa-studie-oecd-politik-eckhard-klieme.
Klieme, E., Jude, N., Baumert, J., & Prenzel, M. (2010). PISA 2000–2009: Bilanz der Veränderungen im Schulsystem [Making up the balance of changes in the school system]. In E. Klieme, C. Artelt, J. Hartig, N. Jude, O. Koeller, M. Prenzel, W. Schneider, & P. Stanat (Eds.), PISA 2009. Bilanz nach einem Jahrzehnt [Making up the balance a decade after]. Münster, Germany: Waxmann.
Klieme, E., & Kuger, S. (2015). PISA 2015 context questionnaires framework. In PISA 2015 assessment and analytical framework: Science, reading, mathematic and financial literacy (pp. 101–127). Paris, France: OECD.
Klieme, E., & Rakoczy, K. (2003). Unterrichtsqualität aus Schülerperspektive: Kulturspezifische Profile, regionale Unterschiede und Zusammenhänge mit Effekten von Unterricht [Teaching quality from a student perspective: Culture-specific profiles, regional differences, and relationships with teaching effects]. In J. Baumert, C. Artelt, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, W. Schneider, & K.-J. Tillmann (Eds.), PISA 2000. Ein differenzierter Blick auf die Länder der Bundesrepublik Deutschland (pp. 334–359). Opladen, Germany: Leske + Budrich.
Kuger, S., Klieme, E., Jude, N. & Kaplan, D. (Eds.) (2016). Assessing contexts of learning: An international perspective. Dordrecht, The Netherlands: Springer.
Kuger, S., Klieme, E., Lüdtke, O., Schiepe-Tiska, A., & Reiss, K. (2017). Mathematikunterricht und Schülerleistungen in der Sekundarstufe: Zur Validität von Schülerbefragungen in Schulleistungsstudien [Mathematics teaching and student achievement in secondary education: The validity of student surveys in school achievement studies]. Zeitschrift für Erziehungswissenschaft, 20(2), 612. https://doi.org/10.1007/s11618-017-0750-6
Lenkeit, J., & Caro, D. H. (2014). Performance status and change – Measuring education system effectiveness with data from PISA 2000–2009. Educational Research and Evaluation, 20(2), 146–174.
McMillan, J. H. (2007). Formative classroom assessment: The key to improving student achievement. In J. H. McMillan (Ed.), Formative classroom assessment. Theory into practice (pp. 1–7). New York/London: Teacher College, Columbia University.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 3–16). Amsterdam, The Netherlands/Oxford, UK: Elsevier Science.
OECD. (2005). Formative assessment: Improving learning in secondary classrooms. Paris, France: OECD.
OECD. (2007). PISA 2006. Science competencies for tomorrow’s world. Paris, France: OECD.
OECD. (2013). Synergies for better learning. An international perspective on evaluation and assessment. OECD reviews of evaluation and assessment in education. Paris, France: OECD.
OECD. (2014). PISA 2012 technical report. Paris, France: OECD.
OECD. (2017a). PISA 2015 technical report. Paris, France: OECD.
OECD. (2017b). PISA 2015 Results, Volume II. Policies and practices for successful schools. Paris, France: OECD.
OECD & Vodafone Stiftung. (2018, January). Erfolgsfaktor Resilienz (Success factor resilience). https://www.vodafone-stiftung.de/uploads/tx_newsjson/Vodafone_Stiftung_Erfolgsfaktor_Resilienz_01_02.pdf
Rakoczy, K., Klieme, E., Leiss, D., & Blum, W. (2017). Formative assessment in mathematics instruction: Theoretical considerations and empirical results of the Co2CA project. In D. Leutner, J. Fleischer, J. Grünkorn, & E. Klieme (Eds.), Competence assessment in education: Research, models and instruments (pp. 447–467). Cham, Switzerland: Springer.
Reckwitz, A. (2002). Toward a theory of social practices: A development in culturalist theorizing. European Journal of Social Theory, 5(2), 243–263.
Rosenshine, B., & Stevens, R. (1986). Teaching functions. In M. Wittrock (Ed.), Handbook of research on teaching (3rd ed.). New York, NY: Macmillan.
Rowan, B. (2002). Large-scale, cross-national surveys of educational achievement: Promises, pitfalls, and possibilities. In A. C. Porter & A. Gamoran (Eds.), Methodological advances in cross-national surveys of educational achievement (pp. 319–350). Washington, DC: National Academic Press.
Rozman, M., & Klieme, E. (2017). Exploring cross-national changes in instructional practices: Evidence from four cycles of TIMSS (Policy brief vol. 13). Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement.
Rutkowski, L., & Svetina, D. (2014). Assessing the hypothesis of measurement invariance in the context of large-scale international surveys. Educational and Psychological Measurement, 74(1), 31–57.
Ryan, K. E., Chandler, M., & Samuels, M. (2007). What should school-based evaluation look like? Studies in Educational Evaluation, 33(3–4), 197–212.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.
Sanders, J. R., & Davidson, E. J. (2003). A model for school evaluation. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation. Part one: Perspectives/ part two: Practice (pp. 807–826). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Scheerens, J. (2002). School self-evaluation: Origins, definitions, approaches, methods and implementation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 35–69). Amsterdam, The Netherlands/Oxford, UK: Elsevier Science.
Scheerens, J., Glas, C. A., & Thomas, S. M. (2003). Educational evaluation, assessment, and monitoring. A systemic approach. Lisse, The Netherlands/Exton, PA: Swets & Zeitlinger.
Schmidt, W. H., Burroughs, N. A., Zoido, P., & Houang, R. T. (2015). The role of schooling in perpetuating educational inequality: An international perspective. Educational Researcher, 44(7), 371–386.
Shepard, L. A. (2006). Classroom assessment. In R. L. Brennan (Ed.), Educational measurement (pp. 623–646). Westport, CT: Rowman and Littlefield Publishers.
Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153–189.
Singer, J., Braun, H., & Chudowsky, N. (Eds.). (2018). International education assessments – Cautions, conundrums, and common sense. Washington, DC: National Academy of Education.
Spillane, J. P. (2012). Data in practice: Conceptualizing the data-based decision-making phenomena. American Journal of Education, 118(2), 113–141.
Strietholt, R., Bos, W., Gustafsson, J.-E., & Rosén, M. (Eds.). (2014). Educational policy evaluation through international comparative assessments. Münster, Germany: Waxmann.
Sun, H., Creemers, B. P. M., & de Jong, R. (2007). Contextual factors and effective school improvement. School Effectiveness and School Improvement, 18(1), 93–122.
Teltemann, J., & Klieme, E. (2016). The impact of international testing projects on policy and practice. In G. T. L. Brown & L. R. Harris (Eds.), Handbook of human and social conditions in assessment (pp. 369–386). New York, NY: Routledge.
van de Vijver, F., & He, J. (2016). Bias assessment and prevention in non-cognitive outcome measures in PISA questionnaires. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (pp. 229–253). New York, NY: Springer.
van de Vijver, F. J. R. (2018). Towards an integrated framework of bias in noncognitive assessment in international large-scale studies: Challenges and prospects. Educational Measurement: Issues and Practice, 37(4), 49–56.
Visscher, A. J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321–349.
Watermann, R., Maaz, K., Bayer, S., & Roczen, N. (2016). Social background. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (Methodology of educational measurement and assessment) (pp. 117–145). Springer. https://doi.org/10.1007/978-3-319-45357-6
Wößmann, L., Lüdemann, E., Schütz, G., & West, M. R. (2009). School accountability, autonomy and choice around the world. Cheltenham, UK: Edward Elgar.
Wyatt-Smith, C. (2014). Designing assessment for quality learning: The enabling power of assessment. Heidelberg, Germany: Springer.
© 2020 Springer Nature Switzerland AG
Cite this chapter
Klieme, E. (2020). Policies and Practices of Assessment: A Showcase for the Use (and Misuse) of International Large Scale Assessments in Educational Effectiveness Research. In: Hall, J., Lindorff, A., Sammons, P. (eds) International Perspectives in Educational Effectiveness Research. Springer, Cham. https://doi.org/10.1007/978-3-030-44810-3_7
DOI: https://doi.org/10.1007/978-3-030-44810-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44809-7
Online ISBN: 978-3-030-44810-3
eBook Packages: Education (R0)