Volume 48, Issue 1–2, pp 29–40

Theoretical and methodological challenges in measuring instructional quality in mathematics education using classroom observations

  • Lena Schlesinger
  • Armin Jentsch
Original Article


In this article, we analyze theoretical and methodological challenges in measuring instructional quality in mathematics classrooms by examining standardized observational instruments. We first describe the results of a systematic literature review that identifies the subject-specific aspects measured in recent lesson studies in mathematics education. The main result is that there is little or no consistency in how subject-specific aspects are conceptualized and named. We therefore structured these aspects along two perspectives on the mathematics educational quality of instruction: a mathematical perspective and a pedagogical perspective. Furthermore, with regard to the use of these observational instruments in the field, we examine methodological challenges in measuring instructional quality in mathematics classrooms, e.g., the optimal number of raters and of lessons to be observed. The results are twofold: on the one hand, recent studies provide useful answers to these questions; on the other hand, these results appear to be specific to the given data. The problem therefore remains unsolved so far.


Keywords

Instructional quality, Methodological challenges, Classroom observations



We thank Nils Buchholtz, Andreas Busse and the reviewers for helpful suggestions and comments on earlier versions of this article.



Copyright information

© FIZ Karlsruhe 2016

Authors and Affiliations

  1. University of Hamburg, Hamburg, Germany
