Abedi, J., & Lord, C. (2001). The language factors in mathematics tests. Applied Measurement in Education, 14(3), 219–234.
Auld, E., & Morris, P. (2016). PISA, policy and persuasion: Translating complex conditions into education ‘best practice’. Comparative Education, 52(2), 202–229.
Australian Association of Mathematics Teachers Inc. (2008). Position paper on the practice of assessing mathematical learning. http://www.aamt.edu.au/content/download/9895/126744/file/Assessment_position_paper_2017.pdf. Accessed 9 July 2017.
Ayalon, H., & Livneh, I. (2013). Educational standardization and gender differences in mathematics achievement: A comparative study. Social Science Research, 42(2), 432–445.
Baird, J.-A., Johnson, S., Hopfenbeck, T. H., Isaacs, T., Sprague, T., Stobart, G., & Yu, G. (2016). On the supranational spell of PISA in policy. Educational Research, 58(2), 121–138.
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., Ravitch, D., et al. (2010). Problems with the use of student test scores to evaluate teachers. Economic Policy Institute Briefing Paper #278. http://www.epi.org/publication/bp278/. Accessed 9 July 2017.
Biesta, G. (2009). Good education in an age of measurement: On the need to reconnect with the question of purpose in education. Educational Assessment, Evaluation and Accountability, 21(1), 33–46.
Black, P., & Wiliam, D. (2005). Inside the black box: Raising standards through classroom assessment. The Phi Delta Kappan, 80(2), 139–148.
Black, P., & Wiliam, D. (2012). Assessment for learning in the classroom. In J. Gardner (Ed.), Assessment and learning (pp. 11–32). London: Sage.
Bradshaw, C. P., O’Brennan, L. M., & McNeely, C. A. (2008). Core competencies and the prevention of school failure and early school leaving. New Directions for Child and Adolescent Development, 122, 19–32.
Brown, G. T. L., & Harris, L. R. (2009). Unintended consequences of using tests to improve learning: How improvement-oriented resources heighten conceptions of assessment as school accountability. Journal of Multidisciplinary Evaluation, 6(12), 68–91.
Buchholtz, N., Kaiser, G., & Blömeke, S. (2014). Measuring pedagogical content knowledge in mathematics—conceptualizing a complex domain. Journal für Mathematik-Didaktik, 35(1), 101–128.
Buchholtz, N., Krosanke, N., Orschulik, A. B., & Vorhölter, K. (2018). Combining and integrating formative and summative assessment in mathematics teacher education. ZDM Mathematics Education, 50(4), 1–14.
Buchholtz, N., Leung, F. K. S., Ding, L., Kaiser, G., Park, K., & Schwarz, B. (2013). Future mathematics teachers’ professional knowledge of elementary mathematics from an advanced standpoint. ZDM, 45(1), 107–120.
Burkhardt, H., & Schoenfeld, A. (2003). Improving educational research: Toward a more useful, more influential, and better-funded enterprise. Educational Researcher, 32(9), 3–14.
Burkhardt, H., & Schoenfeld, A. (2018). Assessment in the service of learning: Challenges and opportunities. ZDM Mathematics Education, 50(4), 1–15.
Cai, J., Hwang, S., & Middleton, J. A. (2015). The role of large-scale studies in mathematics education. In J. A. Middleton, S. Hwang & J. Cai (Eds.), Large-scale studies in mathematics education (pp. 405–414). Cham: Springer.
Cai, J., Mok, I. A. C., Reddy, V., & Stacey, K. (2016). International comparative studies in mathematics: Lessons for improving students' learning. In ICME-13 topical surveys (pp. 1–36). Cham: Springer.
Cotton, C., McIntyre, F., & Price, J. (2010). Gender differences disappear with exposure to competition. Working paper 2010–11. University of Miami, Department of Economics. http://moya.bus.miami.edu/~ccotton/papers/cotton_mcintyre_price_2009.pdf. Accessed 9 July 2017.
Elstad, E., Nortvedt, G. A., & Turmo, A. (2009). The Norwegian assessment system: An accountability perspective. CADMO, 17(1), 89–103.
Ernest, P. (2014). Policy debates in mathematics education. In S. Lerman (Ed.), Encyclopedia of mathematics education. Dordrecht: Springer.
Fischer, R. (2004). Standardization to account for cross-cultural response bias: A classification of score adjustment procedures and review of research. Journal of Cross-Cultural Psychology, 35(3), 263–282.
Fujita, T., Jones, K., & Miyazaki, M. (2018). Learners’ use of domain-specific computer-based feedback to overcome logical circularity in deductive proving in geometry. ZDM Mathematics Education, 50(4), 1–15.
Gaber, S., Cankar, G., Umek, L. M., & Tašner, V. (2012). The danger of inadequate conceptualisation in PISA for education policy. Compare, 42(4), 647–663.
Grant, M., & Booth, A. (2009). A typology of reviews: An analysis of 14 review types and associated methodologies. Health Information and Libraries Journal, 26(2), 91–108.
Groß Ophoff, J. (2013). Lernstandserhebungen: Reflexion und Nutzung [Assessments of learning status: Reflection and use]. Münster: Waxmann.
Hallinger, P., & Heck, R. H. (2010). Collaborative leadership and school improvement: Understanding the impact on school capacity and student learning. School Leadership & Management, 30(2), 95–110.
Hamilton, L. S., Stecher, B. M., Marsh, J. A., McCombs, J. S., Robyn, A., Russell, J. L., et al. (2007). Standards-based accountability under No Child Left Behind: Experiences of teachers and administrators in three states. Santa Monica: RAND Corporation.
Hannon, B. (2012). Test anxiety and performance-avoidance goals explain gender differences in SAT-V, SAT-M, and overall SAT scores. Personality and Individual Differences, 53(7), 816–820.
Hattie, J. A. C., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
Heritage, M., & Wylie, C. (2018). Reaping the benefits of assessment for learning: Achievement, identity and equity. ZDM Mathematics Education, 50(4), 1–13.
Hoogland, K., & Tout, D. (2018). Computer-based assessment of mathematics in the 21st century: Pressures and tensions. ZDM Mathematics Education, 50(4), 1–12.
Hopfenbeck, T. H., & Görgen, K. (2017). The politics of PISA: The media, policy and public responses in Norway and England. European Journal of Education, 52(2), 195–205.
Hopson, R., & Hood, S. (2005). An untold story in evaluation roots: Reid E. Jackson and his contribution toward culturally responsive evaluation at three quarters of a century. In S. Hood, R. Hopson & H. Frierson (Eds.), The role of culture and cultural context (pp. 87–104). Greenwich: Information Age Publishing.
Hoth, J., Döhrmann, M., Kaiser, G., Busse, A., König, J., & Blömeke, S. (2016). Diagnostic competence of primary school mathematics teachers during classroom situations. ZDM Mathematics Education, 48(1), 41–53.
Hsieh, F.-J., Chu, C.-T., Hsieh, C.-J., & Lin, P.-J. (2014). In-depth analyses of different countries’ responses to MCK items: A view on the differences within and between East and West. In S. Blömeke, F.-J. Hsieh, G. Kaiser & W. H. Schmidt (Eds.), International perspectives on teacher knowledge, beliefs and opportunities to learn (pp. 115–140). Dordrecht: Springer.
Hyde, J. S., & Mertz, J. E. (2009). Gender, culture, and mathematics performance. Proceedings of the National Academy of Sciences of the United States of America, 106(22), 8801–8807.
Institut zur Qualitätsentwicklung im Bildungswesen (IQB). (2017). Erprobungsstudie 2017 zu den Bildungsstandards Mathematik in der Sekundarstufe I [2017 pilot study on the educational standards in mathematics for lower secondary education]. https://www.iqb.hu-berlin.de/bt/BT2018/Erprobungsstudie2017. Accessed 27 Apr 2018.
Jerrim, J. (2016). PISA 2012: How do results for the paper and computer tests compare? Assessment in Education: Principles, Policy & Practice, 23(4), 495–518.
Kaarstein, H. (2014). Norwegian mathematics teachers’ and educational researchers’ perception of MPCK items used in the TEDS-M study. Nordisk Matematikkdidaktikk, 19(3–4), 57–82.
Kaiser, G., Blömeke, S., König, J., Busse, A., Döhrmann, M., & Hoth, J. (2017). Professional competencies of (prospective) mathematics teachers: Cognitive versus situated approaches. Educational Studies in Mathematics, 94(2), 161–182.
Kilpatrick, J. (2014). History of research in mathematics education. In S. Lerman (Ed.), Encyclopedia of mathematics education. Dordrecht: Springer.
Klenowski, V. (2009). Australian Indigenous students: Addressing equity issues in assessment. Teaching Education, 20(1), 77–93.
Leder, G., & Forgasz, H. J. (2018). Measuring who counts: Gender and mathematics assessment. ZDM Mathematics Education, 50(4), 1–11.
Lester, F. K., Jr. (Ed.). (2007). Second handbook of research on mathematics teaching and learning. Charlotte: Information Age Publishing.
Lin, F.-L., Wang, T.-Y., & Chang, Y.-P. (2018). Effects of large-scale studies on mathematics education policy in Taiwan through the lens of societal and cultural characteristics. ZDM Mathematics Education, 50(4), 1–14.
Lindberg, S. M., Hyde, J. S., Petersen, J. L., & Linn, M. C. (2010). New trends in gender and mathematics performance: A meta-analysis. Psychological Bulletin, 136(6), 1123–1135.
Liu, O. L., & Wilson, M. (2009). Gender differences in large-scale math assessments: PISA trend 2000 and 2003. Applied Measurement in Education, 22(2), 164–184.
Lynch, K., & Star, J. R. (2014). Teachers’ views about multiple strategies in middle and high school mathematics. Mathematical Thinking and Learning, 16(2), 85–108.
Ma, X. (1999). A meta-analysis of the relationship between anxiety towards mathematics and achievement in mathematics. Journal for Research in Mathematics Education, 30(5), 520–540.
Martinovic, D., & Manizade, A. G. (2018). The challenges in the assessment of knowledge for teaching geometry. ZDM Mathematics Education, 50(4), 1–17.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Middleton, J. A., Cai, J., & Hwang, S. (2015). Why mathematics education needs large-scale research. In J. A. Middleton, J. Cai & S. Hwang (Eds.), Large-scale studies in mathematics education (pp. 1–3). Cham: Springer.
Miller, J., & Mitchell, J. (2006). Interrupted schooling and the acquisition of literacy: Experiences of Sudanese refugees in Victorian secondary schools. Australian Journal of Language and Literacy, 29(2), 150–162.
Montenegro, E., & Jankowski, N. A. (2017). Equity and assessment: Moving towards culturally responsive assessment. National Institute for Learning Outcomes Assessment. http://learningoutcomesassessment.org/documents/OccasionalPaper29.pdf. Accessed 9 July 2017.
Mullis, I. V. S., Martin, M. O., Foy, P., & Hooper, M. (2016). TIMSS 2015 international results in mathematics. Boston College TIMSS & PIRLS International Study Center website: http://timssandpirls.bc.edu/timss2015/international-results/. Accessed 9 July 2017.
Museus, S. D., Palmer, R. T., Davis, R. J., & Maramba, D. (2011). Special issue: Racial and ethnic minority student success in STEM education. ASHE Higher Education Report, 36, 1–140.
National Council of Teachers of Mathematics (NCTM). (2016). Large-scale mathematics assessments and high-stakes decisions: A position of the National Council of Teachers of Mathematics. Reston: NCTM.
National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: AERA.
Neubrand, M. (2018). Conceptualizations of professional knowledge for teachers of mathematics. ZDM Mathematics Education, 50(4), 1–12.
Newton, P. E. (2007). Clarifying the purpose of educational assessment. Assessment in Education: Principles, Policy & Practice, 14(2), 149–170.
Newton, P. E., & Shaw, S. D. (2014). Validity in educational and psychological assessment. London: Sage.
Nichols, S. L., & Berliner, D. C. (2007). Collateral damage: How high-stakes testing corrupts America’s schools. Cambridge: Harvard Education Press.
Niss, M. (1993). Assessment in mathematics education and its effects: An introduction. In M. Niss (Ed.), Investigations into assessment in mathematics education: An ICMI study (pp. 1–30). Dordrecht: Springer.
Niss, M. (2007). Reflections on the state of and trends in research on mathematics teaching and learning. In F. K. Lester, Jr. (Ed.), Second handbook of research on mathematics teaching and learning (pp. 1293–1312). Charlotte: Information Age Publishing.
Niss, M. (2015). Mathematical competencies and PISA. In R. Turner & K. Stacey (Eds.), Assessing mathematical literacy: The PISA experience (pp. 35–55). Cham: Springer.
Nortvedt, G. A. (2011). Coping strategies applied to comprehend multistep arithmetic word problems by students with above-average numeracy skills and below-average reading skills. Journal of Mathematical Behavior, 30(3), 255–269.
Nortvedt, G. A. (2018). Policy impact of PISA on mathematics education: The case of Norway. European Journal of Psychology of Education, 33(3), 427–444.
Nortvedt, G. A., Gustafsson, J.-E., & Lehre, A.-C. W. G. (2016). The importance of InQua for the relation between achievement in reading and mathematics. In T. Nilsen & J.-E. Gustafsson (Eds.), Teacher quality, instructional quality and student outcome: Relationships across countries, cohorts and time (pp. 97–113). Cham: Springer.
OECD. (2013a). PISA 2012 results: Student performance in mathematics, reading, science. Volume I. Paris: OECD Publishing.
OECD. (2013b). PISA 2012 results: Ready to learn. Students’ engagement, drive and self-beliefs. Volume III. Paris: OECD Publishing.
OECD. (2013c). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy. Paris: OECD Publishing.
OECD. (2015). Helping immigrant students to succeed at school—and beyond. Paris: OECD Publishing.
OECD. (2016). PISA 2015 results: Excellence and equity in education (Vol. I). Paris: OECD Publishing.
Pajares, F., & Miller, M. D. (1995). Mathematics self-efficacy and mathematics performances: The need for specificity of assessment. Journal of Counseling Psychology, 42(2), 190–198.
Palm, T., Boesen, J., & Lithner, J. (2011). Mathematical reasoning in Swedish upper secondary level assessments. Mathematical Thinking and Learning, 13(3), 221–246.
Pankow, L., Kaiser, G., & König, J. (2018). Perception of students’ errors under time limitation: Are teachers better than mathematicians or students? Results of a validation study. ZDM Mathematics Education, 50(4), 1–12.
Paxton, G., Smith, N., Win, A. K., Mulholland, N., & Hood, S. (2011). Refugee status report: A report on how refugee children and young people in Victoria are faring. Melbourne: Department of Education and Early Childhood Development (DEECD).
Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
Rowland, T., & Ruthven, K. (2010). Mathematical knowledge in teaching. Dordrecht: Springer.
Sälzer, C., & Prenzel, M. (2014). Looking back at five rounds of PISA: Impacts on teaching and learning in Germany. Šolsko polje, 25(5/6), 53–72.
Sangwin, C. J. (2013). Computer aided assessment of mathematics. Oxford: Oxford University Press.
Scherer, P., Beswick, K., DeBlois, L., Healy, L., & Moser Opitz, E. (2016). Assistance of students with mathematical learning difficulties: How can research support practice? ZDM Mathematics Education, 48, 633–649.
Schoenfeld, A. (2007). Issues and tensions in the assessment of mathematical proficiency. In A. Schoenfeld (Ed.), Assessing mathematical proficiency (pp. 3–16). New York: Cambridge University Press.
Seeley, C. (2006). Teaching to the test. NCTM News Bulletin. http://www.nctm.org/News-and-Calendar/Messages-from-the-President/Archive/Cathy-Seeley/Teaching-to-the-Test/. Accessed 9 July 2017.
Semana, S., & Santos, L. (2018). Self-regulation of learning in student participation in mathematics assessment. ZDM Mathematics Education, 50(4), 1–13.
Shen, C., & Tam, H. P. (2008). The paradoxical relationship between student achievement and self-perception: A cross-national analysis based on three waves of TIMSS data. Educational Research and Evaluation, 14(1), 87–100.
Siemon, D., Enilane, F., & McCarty, J. (2004). Supporting indigenous students’ achievement in numeracy. Australian Primary Mathematics Classroom, 9(4), 50–53.
Speer, N. M., King, K. D., & Howell, H. (2015). Definitions of mathematical knowledge for teaching: Using these constructs in research on secondary and college mathematics teachers. Journal of Mathematics Teacher Education, 18(2), 105–122.
Stobart, G. (2008). Testing times: The uses and abuses of assessment. Oxford: Routledge.
Suurtamm, C., & Neubrand, M. (2015). Assessment and testing in mathematics education. In S. J. Cho (Ed.), Proceedings of the 12th International Congress on Mathematical Education (pp. 557–562). Cham: Springer.
Suurtamm, C., Thompson, D. R., Kim, R. Y., Moreno, L. D., Sayac, N., Schukajlow, S., et al. (2016). Assessment in mathematics education: Large-scale assessment and classroom assessment. Cham: Springer.
Ubuz, B., & Aydın, U. (2018). Geometry knowledge test about triangles: Development and validation. ZDM Mathematics Education, 50(4).
van den Heuvel-Panhuizen, M., & Becker, J. (2003). Towards a didactic model for assessment design in mathematics education. In A. J. Bishop, M. A. Clements, C. Keitel, J. Kilpatrick & F. K. S. Leung (Eds.), Second international handbook of mathematics education (pp. 689–716). Dordrecht: Springer.
Wang, S., Jiao, H., Young, M., Brooks, T., & Olson, J. (2007). A meta-analysis of testing mode effects in grade K–12 mathematics tests. Educational and Psychological Measurement, 67(2), 219–238.
Wiliam, D. (2003). The impact of educational research on mathematics education. In A. J. Bishop, M. A. Clements, C. Keitel, J. Kilpatrick & F. K. S. Leung (Eds.), Second international handbook of mathematics education (pp. 471–490). Dordrecht: Springer Netherlands.
Wiliam, D. (2007). Keeping learning on track. In F. K. Lester, Jr. (Ed.), Second handbook of research on mathematics teaching and learning (pp. 1053–1098). Charlotte: Information Age Publishing.
Wilson, A., Watson, C., Thompson, T. L., Drew, V., & Doyle, S. (2017). Learning analytics: Challenges and limitations. Teaching in Higher Education, 22(8), 991–1007.
Wong, P. A., & Glass, R. D. (2005). Assessing a professional development school approach to preparing teachers for urban schools serving low-income, culturally and linguistically diverse communities. Teacher Education Quarterly, 32(3), 63–77.
Wößmann, L. (2005). The effect heterogeneity of central examinations: Evidence from TIMSS, TIMSS-Repeat and PISA. Education Economics, 13(2), 143–169.
Wuttke, J. (2007). Uncertainties and bias in PISA. In S. T. Hopmann, G. Brinek & M. Retzl (Eds.), PISA according to PISA: Does PISA keep what it promises? Vienna: LIT-Verlag.
Hansen, K. Y., & Strietholt, R. (2018). Does schooling actually perpetuate educational inequality in mathematics performance? A question of validity. ZDM Mathematics Education, 50(4), 1–6.