Abstract
International Large Scale Assessments (ILSAs) such as TIMSS and PISA provide comparative indicators and trend information on educational systems. Scholars have repeatedly argued that ILSAs should be grounded in concepts from Educational Effectiveness Research (EER). At the same time, ILSAs can contribute to the further development of EER by providing data, triggering new research studies, and supplying instruments that work across cultures. When using ILSA data, however, researchers must cope with limitations regarding design, sampling, and measurement. Cross-sectional data from individual ILSAs, which carry little information on students’ learning paths, rarely allow for estimating the effects of policies and practices on student outcomes. Rather, ILSAs inform about the distribution of educational opportunities among students, schools, and regions. Effects of national policies may be identified through country-level trend data, provided ecological fallacies can be avoided. To illustrate these methodological problems and to discuss the relationship between ILSAs and EER, the present chapter uses a specific showcase: policies and practices of educational assessment. Several related measures were implemented in PISA. Reanalyzing these data, the chapter identifies national patterns of classroom assessment practices, use of assessment, school evaluation, and accountability policies. For example, “soft accountability” (comparing performance with a national standard) is distinguished from “strong accountability” (making test results public); soft accountability was related to country-level growth in achievement. English-speaking countries turned out to show similar patterns, while full measurement invariance with regard to student-perceived assessment and feedback could be established for four (non-American) countries only.
Notes
- 1.
Usually, questionnaires are published in the source language (mostly English) only. For PISA 2015, translated versions from 75 countries, together with item- and scale-level statistics, are available at https://daqs.fachportal-paedagogik.de/search/show/survey/177?language=en. This online repository also includes Field Trial material not yet used in the Main Study. For an introduction and conceptual overview, see Kuger, Klieme, Jude, and Kaplan (2016).
- 2.
The author wants to thank Anindito Aditomo, Sonja Bayer, Janine Buchholz, Jessica Fischer, Jia He, Nina Jude and Susanne Kuger for collaboration on this topic at the DIPF Department for Research on Educational Quality and Evaluation.
- 3.
This chapter of the Technical Report was authored by Janine Buchholz from DIPF, who kindly shared findings on the PERFEED scale with the present author. For a review of the scaling method, see Buchholz & Hartig, 2017.
- 4.
- 5.
Bergbauer, Hanushek, and Wößmann (2018, p. 17) classified this item as an instance of “school-based external comparisons”.
- 6.
With the exception of PISA 2006.
- 7.
Later replaced by ‘students in national modal grade for 15-year-olds’.
- 8.
The international school questionnaire allows for national adaptations regarding the level on which comparisons are made.
- 9.
Changes in background questionnaire wording across cycles of measurement are yet another obstacle to analyzing trend data from ILSAs; cf. Singer et al., 2018, p. 64.
References
Abrams, L. M. (2007). Implications of high-stakes testing for the use of formative classroom assessment. In J. H. McMillan (Ed.), Formative classroom assessment: Theory into practice (pp. 79–98). New York, NY/London, UK: Teacher College/Columbia University.
Aloisi, C., & Tymms, P. (2017). PISA trends, social changes, and education reforms. Educational Research and Evaluation, 23(5–6), 180–220.
Altrichter, H., & Maag Merki, K. (2016). Handbuch Neue Steuerung im Schulsystem (2nd ed.). Wiesbaden, Germany: Springer.
Baker, D. P. (2009). The invisible hand of world education culture. In G. Sykes, B. Schneider, & D. N. Plank (Eds.), Handbook of education policy research (pp. 958–968). New York, NY: Routledge.
Baumert, J., Kunter, M., Blum, W., Brunner, M., Voss, T., Jordan, A., et al. (2010). Teachers’ mathematical knowledge, cognitive activation in the classroom, and student progress. American Educational Research Journal, 47(1), 133–180. https://doi.org/10.3102/0002831209345157
Bayer, S. (2019). Alle alles ganz lehren – Aber wie? Mathematikunterricht vergleichend zwischen den Schularten [Omnes omnia omnino doceantur – But how? Comparing mathematics teaching between school tracks]. Doctoral dissertation, Goethe University, Frankfurt am Main, Germany.
Bayer, S., Klieme, E., & Jude, N. (2016). Assessment and evaluation in educational contexts. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (pp. 469–488). New York, NY: Springer.
Bennett, R. (2011). Formative assessment: A critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25.
Bergbauer, A. B., Hanushek, E. A., & Wößmann, L. (2018, July). Testing (CESifo working paper no. 7168).
Bischof, L. M., Hochweber, J., Hartig, J., & Klieme, E. (2013). Schulentwicklung im Verlauf eines Jahrzehnts: Erste Ergebnisse des PISA-Schulpanels [School improvement throughout one decade: First results of the PISA school panel study]. Zeitschrift für Pädagogik, special issue, 59, 172–199.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
Black, P., & Wiliam, D. (2004). The formative purpose. Assessment must first promote learning. In M. Wilson (Ed.), Towards coherence between classroom assessment and accountability: 103rd yearbook of the national society for the study of education, Part II (pp. 20–50). Chicago, IL: University of Chicago Press.
Bogdandy, A. V., & Goldmann, M. (2009). The exercise of international public authority through National Policy Assessment. The PISA study of the OECD as a template for a new international standard legal instrument. Zeitschrift für ausländisches öffentliches Recht und Völkerrecht, 69, 51–102.
Bottani, N., & Tuijnman, A. C. (1994). The design of indicator systems. In A. C. Tuijnman & T. N. Postlethwaithe (Eds.), Monitoring the standards of education (pp. 47–78). Oxford, UK: Pergamon.
Bryk, A., & Hermanson, K. (1994). Observations on the structure, interpretation and use of education indicator systems. In OECD (Ed.), Making education count: Developing and using international indicators (pp. 37–53). Paris, France: OECD.
Buchholz, J., & Hartig, J. (2017). Comparing attitudes across groups: An IRT-based item-fit statistic for the analysis of measurement invariance. Applied Psychological Measurement. Advance online publication. https://doi.org/10.1177/0146621617748323
Coburn, C., & Turner, E. O. (2011). Research on data use: A framework and analysis. Measurement: Interdisciplinary Research and Practice, 9(4), 173–206.
Creemers, B. P. M., & Kyriakides, L. (2008). The dynamics of educational effectiveness. A contribution to policy, practice and theory in contemporary schools. London, UK/New York, NY: Routledge.
Decristan, J., Klieme, E., Kunter, M., Hochweber, J., Büttner, G., Fauth, B., et al. (2015). Embedded formative assessment and classroom process quality: How do they interact in promoting students’ science understanding? American Educational Research Journal, 52(6), 1133–1159.
Donaldson, S. I. (2004). Using professional evaluation to improve the effectiveness of nonprofit organizations. In R. E. Riggo & S. S. Orr (Eds.), Improving leadership in nonprofit organizations (pp. 234–251). San Francisco, CA: Wiley.
Elacqua, G. (2016). Building more effective education systems. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective. Dordrecht, The Netherlands: Springer.
Ellwart, T., & Konradt, U. (2011). Formative versus reflective measurement: An illustration using work-family balance. Journal of Psychology, 145(5), 391–417.
Faubert, V. (2009). School evaluation: Current practices in OECD countries and a literature review (OECD Education working papers, no. 42). Paris, France: OECD.
Fischer, J., He, J., & Klieme, E. (Submitted). The structure of teaching practices across countries: A combination of factor analysis and network analysis.
Fischer, J., Klieme, E., & Praetorius, A.-K. (Submitted). The impact of linguistic similarity on cross-cultural comparability of students’ perceptions of teaching quality.
Glas, C. A. W., & Jehangir, K. (2014). Modeling country specific differential item functioning. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large scale assessment (pp. 97–116). Boca Raton, FL: CRC Press.
Gustafsson, J.-E. (2007). Understanding causal influences on educational achievement through analysis of differences over time within countries. In T. Loveless (Ed.), Lessons learned: What international assessments tell us about math achievement (pp. 37–63). Washington, DC: The Brookings Institution.
Harlen, W., & Deakin Crick, R. (2002). A systematic review of the impact of summative assessment and tests on students’ motivation for learning (EPPI-Centre Review, version 1.1∗). London: EPPI-Centre. https://eppi.ioe.ac.uk/cms/Portals/0/PDF%20reviews%20and%20summaries/ass_rv1.pdf?ver=2006-02-24-112939-763. Accessed 17 June 2016.
Hattie, J. (2009). Visible learning. A synthesis of over 800 meta-analyses relating to achievement. London, UK: Routledge.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
He, J., Buchholz, J., & Klieme, E. (2017). Effects of anchoring vignettes on comparability and predictive validity of student self-reports in 64 cultures. Journal of Cross-Cultural Psychology, 48(3), 319–334.
He, J. & Kubacka, K. (2015). Data comparability in the teaching and learning international survey (TALIS) 2008 and 2013 (OECD education working papers vol. 124). Paris, France: OECD.
Huber, S. G., & Skedsmo, G. (2016). Editorial: Data use – A key to improve teaching and learning. Educational Assessment, Evaluation and Accountability, 28(1), 1–3.
Jerrim, J. (2011). England’s “plummeting” PISA test scores between 2000 and 2009: Is the performance of our secondary school pupils really in relative decline? (DoQSS working papers 11–09). London, UK: Department of Quantitative Social Science, UCL Institute of Education, University College London.
Johnson, K., Greenseid, L. O., Toal, S. A., King, J. A., Lawrenz, F., & Volkov, B. (2009). Research on evaluation use: A review of the empirical literature from 1986 to 2005. American Journal of Evaluation, 30(3), 377–410.
Jude, N. (2016). The assessment of learning contexts in PISA. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective. Dordrecht, The Netherlands: Springer.
Jude, N., & Kuger, S. (2018). Questionnaire development and design for international large-scale assessments (ILSAs). Washington, DC: National Academy of Education.
Kaplan, D. & Lee, C. (2018). Optimizing prediction using Bayesian model averaging: Examples using large-scale educational assessments. Evaluation Review. Advance online publication. https://doi.org/10.1177/0193841X18761421
Kingston, N., & Nash, B. (2011). Formative assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28–37.
Klieme, E. (2012). The role of large-scale assessments in research on educational effectiveness and school development. In M. von Davier, E. Gonzalez, I. Kirsch, & K. Yamamoto (Eds.), The role of international large-scale assessments: Perspectives from technology, economy, and educational research (pp. 115–147). Heidelberg, Germany: Springer.
Klieme, E. (2016, December). TIMSS 2015 and PISA 2015 -How are they related on the country level? (DIPF working paper). https://pisa.dipf.de/de/pdf-ordner/Klieme_TIMSS2015andPISA2015.pdf
Klieme, E. (2018, February). Alles schräg (Biased findings). https://www.zeit.de/2018/07/pisa-studie-oecd-politik-eckhard-klieme.
Klieme, E., Jude, N., Baumert, J., & Prenzel, M. (2010). PISA 2000–2009: Bilanz der Veränderungen im Schulsystem [Making up the balance of changes in the school system]. In E. Klieme, C. Artelt, J. Hartig, N. Jude, O. Koeller, M. Prenzel, W. Schneider, & P. Stanat (Eds.), PISA 2009. Bilanz nach einem Jahrzehnt [Making up the balance a decade after]. Münster, Germany: Waxmann.
Klieme, E., & Kuger, S. (2015). PISA 2015 context questionnaires framework. In PISA 2015 assessment and analytical framework: Science, reading, mathematic and financial literacy (pp. 101–127). Paris, France: OECD.
Klieme, E., & Rakoczy, K. (2003). Unterrichtsqualität aus Schülerperspektive: Kulturspezifische Profile, regionale Unterschiede und Zusammenhänge mit Effekten von Unterricht [Teaching quality from a student perspective: Culture-specific profiles, regional differences, and relationships with teaching effects]. In J. Baumert, C. Artelt, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, W. Schneider, & K.-J. Tillmann (Eds.), PISA 2000. Ein differenzierter Blick auf die Länder der Bundesrepublik Deutschland (pp. 334–359). Opladen, Germany: Leske + Budrich.
Kuger, S., Klieme, E., Jude, N. & Kaplan, D. (Eds.) (2016). Assessing contexts of learning: An international perspective. Dordrecht, The Netherlands: Springer.
Kuger, S., Klieme, E., Lüdtke, O., Schiepe-Tiska, A., & Reiss, K. (2017). Mathematikunterricht und Schülerleistungen in der Sekundarstufe: Zur Validität von Schülerbefragungen in Schulleistungsstudien [Mathematics teaching and student achievement in secondary education: The validity of student surveys in school achievement studies]. Zeitschrift für Erziehungswissenschaft, 20(2), 612. https://doi.org/10.1007/s11618-017-0750-6
Lenkeit, J., & Caro, D. H. (2014). Performance status and change – Measuring education system effectiveness with data from PISA 2000–2009. Educational Research and Evaluation, 20(2), 146–174.
McMillan, J. H. (2007). Formative classroom assessment: The key to improving student achievement. In J. H. McMillan (Ed.), Formative classroom assessment. Theory into practice (pp. 1–7). New York/London: Teacher College, Columbia University.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 3–16). Amsterdam, The Netherlands/Oxford, UK: Elsevier Science.
OECD. (2005). Formative assessment: Improving learning in secondary classrooms. Paris, France: OECD.
OECD. (2007). PISA 2006. Science competencies for tomorrow’s world. Paris, France: OECD.
OECD. (2013). Synergies for better learning. An international perspective on evaluation and assessment. OECD reviews of evaluation and assessment in education. Paris, France: OECD.
OECD. (2014). PISA 2012 technical report. Paris, France: OECD.
OECD. (2017a). PISA 2015 technical report. Paris, France: OECD.
OECD. (2017b). PISA 2015 Results, Volume II. Policies and practices for successful schools. Paris, France: OECD.
OECD & Vodafone Stiftung. (2018, January). Erfolgsfaktor Resilienz (Success factor resilience). https://www.vodafone-stiftung.de/uploads/tx_newsjson/Vodafone_Stiftung_Erfolgsfaktor_Resilienz_01_02.pdf
Rakoczy, K., Klieme, E., Leiss, D., & Blum, W. (2017). Formative assessment in mathematics instruction: Theoretical considerations and empirical results of the Co2CA project. In D. Leutner, J. Fleischer, J. Grünkorn, & E. Klieme (Eds.), Competence assessment in education: Research, models and instruments (pp. 447–467). Cham, Switzerland: Springer.
Reckwitz, A. (2002). Toward a theory of social practices: A development in culturalist theorizing. European Journal of Social Theory, 5(2), 243–263.
Rosenshine, B., & Stevens, R. (1986). Teaching functions. In M. Wittrock (Ed.), Handbook of research on teaching (3rd ed.). New York, NY: Macmillan.
Rowan, B. (2002). Large-scale, cross-national surveys of educational achievement: Promises, pitfalls, and possibilities. In A. C. Porter & A. Gamoran (Eds.), Methodological advances in cross-national surveys of educational achievement (pp. 319–350). Washington, DC: National Academic Press.
Rozman, M., & Klieme, E. (2017). Exploring cross-national changes in instructional practices: Evidence from four cycles of TIMSS (Policy brief vol. 13). Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement.
Rutkowski, L., & Svetina, D. (2014). Assessing the hypothesis of measurement invariance in the context of large-scale international surveys. Educational and Psychological Measurement, 74(1), 31–57.
Ryan, K. E., Chandler, M., & Samuels, M. (2007). What should school-based evaluation look like? Studies in Educational Evaluation, 33(3–4), 197–212.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.
Sanders, J. R., & Davidson, E. J. (2003). A model for school evaluation. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation. Part one: Perspectives/ part two: Practice (pp. 807–826). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Scheerens, J. (2002). School self-evaluation: Origins, definitions, approaches, methods and implementation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 35–69). Amsterdam, The Netherlands/Oxford, UK: Elsevier Science.
Scheerens, J., Glas, C. A., & Thomas, S. M. (2003). Educational evaluation, assessment, and monitoring. A systemic approach. Lisse, The Netherlands/Exton, PA: Swets & Zeitlinger.
Schmidt, W. H., Burroughs, N. A., Zoido, P., & Houang, R. T. (2015). The role of schooling in perpetuating educational inequality: An international perspective. Educational Researcher, 44(7), 371–386.
Shepard, L. A. (2006). Classroom assessment. In R. L. Brennan (Ed.), Educational measurement (pp. 623–646). Westport, CT: Rowman and Littlefield Publishers.
Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153–189.
Singer, J., Braun, H., & Chudowsky, N. (Eds.). (2018). International education assessments – Cautions, conundrums, and common sense. Washington, DC: National Academy of Education.
Spillane, J. P. (2012). Data in practice: Conceptualizing the data-based decision-making phenomena. American Journal of Education, 118(2), 113–141.
Strietholt, R., Bos, W., Gustafsson, J.-E., & Rosén, M. (Eds.). (2014). Educational policy evaluation through international comparative assessments. Münster, Germany: Waxmann.
Sun, H., Creemers, B. P. M., & de Jong, R. (2007). Contextual factors and effective school improvement. School Effectiveness and School Improvement, 18(1), 93–122.
Teltemann, J., & Klieme, E. (2016). The impact of international testing projects on policy and practice. In G. T. L. Brown & L. R. Harris (Eds.), Handbook of human and social conditions in assessment (pp. 369–386). New York, NY: Routledge.
van de Vijver, F., & He, J. (2016). Bias assessment and prevention in non-cognitive outcome measures in PISA questionnaires. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (pp. 229–253). New York, NY: Springer.
van de Vijver, F. J. R. (2018). Towards an integrated framework of bias in noncognitive assessment in international large-scale studies: Challenges and prospects. Educational Measurement: Issues and Practice, 37(4), 49–56.
Visscher, A. J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321–349.
Watermann, R., Maaz, K., Bayer, S., & Roczen, N. (2016). Social background. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (Methodology of educational measurement and assessment) (pp. 117–145). Springer. https://doi.org/10.1007/978-3-319-45357-6
Wößmann, L., Lüdemann, E., Schütz, G., & West, M. R. (2009). School accountability, autonomy and choice around the world. Cheltenham, UK: Edward Elgar.
Wyatt-Smith, C. (2014). Designing assessment for quality learning: The enabling power of assessment. Heidelberg, Germany: Springer.
© 2020 Springer Nature Switzerland AG
Cite this chapter
Klieme, E. (2020). Policies and Practices of Assessment: A Showcase for the Use (and Misuse) of International Large Scale Assessments in Educational Effectiveness Research. In: Hall, J., Lindorff, A., Sammons, P. (eds) International Perspectives in Educational Effectiveness Research. Springer, Cham. https://doi.org/10.1007/978-3-030-44810-3_7
DOI: https://doi.org/10.1007/978-3-030-44810-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44809-7
Online ISBN: 978-3-030-44810-3
eBook Packages: Education (R0)