Classroom observation frameworks for studying instructional quality: looking back and looking forward

Abstract

Observation-based frameworks of instructional quality differ widely in the approach to and purposes of their development, their theoretical underpinnings, the instructional aspects covered, their operationalization and measurement, as well as the existing evidence on reliability and validity. The current paper summarizes and reflects on these differences by considering the 12 frameworks included in this special issue. By comparing the analysis of three focal mathematics lessons through the lens of each framework as presented in the preceding papers, this paper also examines the similarities, differences, and potential complementarities of these frameworks in describing and evaluating mathematics instruction. To do so, a common structure for comparing all frameworks is suggested and applied to the analyses of the three selected lessons. The paper concludes that although significant work has been undertaken in recent years to explore instructional quality through classroom observation frameworks, the field would benefit from establishing agreed-upon standards for understanding and studying instructional quality, as well as from more collaborative work.

Notes

  1. For ease of discussion, we refer to generic, content-specific and hybrid frameworks, although, as explained in the introductory paper (see Charalambous and Praetorius 2018), these frameworks can be situated along a continuum in terms of how generic and content-specific they are.

  2. It is possible, however, that the intended uses of the frameworks are (better) clarified in other publications.

  3. The quotes here and in what follows are directly drawn from the preceding papers of this special issue.

  4. In the publications in this special issue, the CLASS researchers address this challenge by focusing on one version of CLASS (i.e., CLASS-UE), and the TBD researchers by providing a comprehensive overview of the elements included in different studies. In what follows, we refer to the results presented on the basis of these decisions.

  5. For ease of reference, in what follows we use the term “elements” to collectively capture the different terms employed across instruments to describe what they contain at their respective Levels 1, 2, and 3.

  6. DMEE and MECORS complement these with low-inference instruments. High-inference instruments require a high degree of subjective judgment on the raters’ part, thus allowing more latitude for interpretation. In contrast, low-inference instruments constrain such interpretations by focusing on more readily observable behaviors and thus reduce both ambiguity and the need for interpretation (see more on this distinction in the publication by Kennedy 2010, p. 231).

  7. Whereas the two preceding measurement decisions are closely tied to the specific framework, the following two might vary to a certain degree from study to study, but still allow us to point to some differences among frameworks. Presented here is the typical approach according to the authors of the respective papers in the special issue.

  8. Instead of conducting a comprehensive review of these issues, we contacted the authors of each paper and asked them to provide information regarding reliability and validity. For some frameworks (e.g., TBD), a systematic literature review was conducted to gather such information. For other frameworks with a large number of available publications (e.g., CLASS), doing so would have required a prohibitive effort; the data provided should therefore be seen as indicative.

  9. Because the frameworks differed in the criteria used to examine reliability and validity, Tables C1 and C2 include those common criteria that were also reported by most frameworks.

  10. For example, Shavelson et al. (1986) suggest that a crossed design should be preferred when the lessons are considered interchangeable, which is the case when the teachers are observed on the same day teaching the same or similar lessons in terms of content; this design should also be used when certain features are largely similar across teachers (e.g., the time interval between observations among teachers is shorter than the time interval between observations within lessons of each teacher). A minimal numerical sketch of such a crossed design is given after these notes.

  11. In making this argument, we do not mean to imply that the MET study was without limitations in terms of its design and the results obtained.

  12. For the sake of parsimony, in this work we used Level-1 elements instead of Level-2 elements to organize the extensive list generated from the preceding procedure.

  13. The general description of the Level-1 indicators was not always consistent with the Level-2 and Level-3 indicators. Some Level-1 classifications were very broad and encompassed other Level-1 or Level-2 classifications within the same instrument. Some Level-3 indicators captured diverse elements that did not necessarily correspond to or reflect Level-2 indicators within the same instrument.

  14. The distinction among the intended, implemented, and achieved curriculum (McKnight 1979) comes close to this idea, but still does not emphasize the importance of explicitly attending to students’ use of opportunities.
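
As an illustration of the crossed-design logic referred to in note 10, the following sketch estimates variance components for a fully crossed teachers × lessons design (one quality rating per teacher-by-lesson cell) and the corresponding generalizability coefficient, along the lines described by Shavelson and Webb (1991). This is a minimal sketch for illustration only; the simulated ratings, the function names (crossed_g_study, g_coefficient), and the use of Python/NumPy are our own assumptions rather than part of any framework discussed in this issue.

```python
import numpy as np

def crossed_g_study(scores):
    """Variance components for a fully crossed teachers x lessons design
    (one rating per cell), via the standard two-way ANOVA decomposition."""
    n_t, n_l = scores.shape
    grand = scores.mean()
    t_means = scores.mean(axis=1)            # mean rating per teacher
    l_means = scores.mean(axis=0)            # mean rating per lesson/occasion

    ss_t = n_l * np.sum((t_means - grand) ** 2)
    ss_l = n_t * np.sum((l_means - grand) ** 2)
    ss_res = np.sum((scores - grand) ** 2) - ss_t - ss_l

    ms_t = ss_t / (n_t - 1)
    ms_l = ss_l / (n_l - 1)
    ms_res = ss_res / ((n_t - 1) * (n_l - 1))

    var_res = ms_res                          # teacher x lesson interaction confounded with error
    var_t = max((ms_t - ms_res) / n_l, 0.0)   # between-teacher ("true score") variance
    var_l = max((ms_l - ms_res) / n_t, 0.0)   # between-lesson variance
    return var_t, var_l, var_res

def g_coefficient(var_t, var_res, n_lessons):
    """Relative generalizability coefficient when averaging over n_lessons per teacher."""
    return var_t / (var_t + var_res / n_lessons)

# Hypothetical data: 5 teachers, each rated on 3 comparable lessons.
rng = np.random.default_rng(seed=1)
teacher_effect = rng.normal(0.0, 0.4, size=(5, 1))   # stable between-teacher differences
noise = rng.normal(0.0, 0.3, size=(5, 3))            # lesson-specific fluctuation
ratings = 3.0 + teacher_effect + noise

var_t, var_l, var_res = crossed_g_study(ratings)
print(f"teacher variance: {var_t:.3f}, lesson variance: {var_l:.3f}, residual: {var_res:.3f}")
print(f"G coefficient for 3 lessons per teacher: {g_coefficient(var_t, var_res, 3):.3f}")
```

In such a design, averaging over more lessons shrinks the residual term, which is why the generalizability coefficient increases with the number of observed lessons per teacher.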

References

  • AERA/APA/NCME (2014). Standards for educational and psychological testing. Washington: American Educational Research Association.

  • Ball, D. L., Sleep, L., Boerst, T. A., & Bass, H. (2009). Combining the development of practice and the practice of development in teacher education. The Elementary School Journal, 109(5), 458–474.

  • Bell, C., Gitomer, D. H., McCaffrey, D. F., Hamre, B. K., Pianta, R. C., & Qi, Y. (2012). An argument approach to observation protocol validity. Educational Assessment, 17(2–3), 62–87. https://doi.org/10.1080/10627197.2012.715014.

  • Bell, C. A., Qi, Y., Croft, A., Leusner, D., McCaffrey, D. F., Gitomer, D. H., & Pianta, R. (2014). Improving observational score quality: Challenges in observer thinking. In K. Kerr, R. Pianta & T. Kane (Eds.), Designing teacher evaluation systems: New guidance from the Measures of Effective Teaching project (pp. 50–97). San Francisco: Jossey-Bass.

  • Berlin, R., & Cohen, J. (2018). Understanding instructional quality through a relational lens. ZDM Mathematics Education. (this issue).

  • Berliner, D. C. (2005). The near impossibility of testing for teacher quality. Journal of Teacher Education, 56(3), 205–213. https://doi.org/10.1177/0022487105275904.

  • Boston, M. D., & Candela, A. G. (2018). The instructional quality assessment as a tool for reflecting on instructional practice. ZDM Mathematics Education. (this issue).

  • Brennan, R. L. (2001). Generalizability theory. New York: Springer.

  • Casabianca, J. M., Lockwood, J. R., & McCaffrey, D. F. (2015). Trends in classroom observation scores. Educational and Psychological Measurement, 75(2), 311–337. https://doi.org/10.1177/0013164414539163.

  • Chapman, C., Reynolds, D., Muijs, D., Sammons, P., Stringfield, S., & Teddlie, C. (2016). Educational effectiveness and improvement research and practice. In C. Chapman, D. Muijs, D. Reynolds, P. Sammons & C. Teddlie (Eds.), The Routledge international handbook of educational effectiveness and improvement: Research, policy, and practice (pp. 1–24). New York: Routledge.

  • Charalambous, C. Y., & Litke, E. (2018). Studying instructional quality by using a content-specific lens: The case of the mathematical quality of Instruction framework. ZDM Mathematics Education. (this issue).

  • Charalambous, C. Y., & Pitta-Pantazi, D. (2016). Perspectives on priority mathematics education: Unpacking and understanding a complex relationship linking teacher knowledge, teaching, and learning. In L. English & D. Kirshner (Eds.), Handbook of international research in mathematics education (3rd edn., pp. 19–59). Abingdon: Routledge.

  • Charalambous, C. Y., & Praetorius, A. K. (2018). Studying instructional quality in mathematics through different lenses: In search of common ground. ZDM Mathematics Education. (this issue).

  • Cohen, D. K. (2011). Teaching and its predicaments. Cambridge: Harvard University Press.

  • Cronbach, L. (1990). Essentials of psychological testing (5th edn.). Boston: Allyn & Bacon, Inc.

  • Diederich, J., & Tenorth, H. E. (1997). Theorie der Schule. Ein Studienbuch zu Geschichte, Funktionen und Gestaltung [Theory of the school: A textbook on history, functions, and design]. Berlin: Cornelsen.

  • Fend, H. (1981). Theorie der Schule [Theory of the school]. München: Urban & Schwarzenberg.

  • Gitomer, D. (2009). Crisp measurement and messy context: A clash of assumptions and metaphors—Synthesis of Section III. In D. H. Gitomer (Ed.), Measurement issues and assessment for teaching quality (pp. 223–233). Thousand Oaks: Sage.

  • Gitomer, D. H., & Bell, C. A. (2013). Evaluating teaching and teachers. In K. F. Geisinger (Ed.), APA handbook of testing and assessment in psychology (Vol. 3, pp. 415–444). Washington: American Psychological Association.

  • Grossman, P., & McDonald, M. (2008). Back to the future: Directions for research in teaching and teacher education. American Educational Research Journal, 45(1), 184–205. https://doi.org/10.3102/0002831207312906.

  • Hattie, J. A. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York: Routledge.

  • Herlihy, C., Karger, E., Pollard, C., Hill, H. C., Kraft, M. A., Williams, M., & Howard, S. (2014). State and local efforts to investigate the validity and reliability of scores from teacher evaluation systems. Teachers College Record, 116(1), 1–28.

  • Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: Teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56–64. https://doi.org/10.3102/0013189X12437203.

  • Kennedy, M. M. (2010). Approaches to annual performance assessment. In M. M. Kennedy (Ed.), Teacher assessment and the quest for teacher quality: A handbook (pp. 225–250). San Francisco: Jossey-Bass.

  • Ko, J., Sammons, P., & Bakkum, L. (2016). Effective teaching. Education Development Trust. https://www.educationdevelopmenttrust.com/~/media/EDT/Reports/Research/2015/r-effective-teaching.pdf. Accessed September 15, 2017.

  • Konstantopoulos, S. (2012). Teacher effects: Past, present and future. In S. Kelly (Ed.), Assessing teacher quality: Understanding teacher effects on instruction and achievement (pp. 33–48). New York: Teachers College Press.

  • Koretz, D. (2008). Measuring up: What educational testing really tells us. Cambridge: Harvard University Press.

  • Krosnick, J. A., & Presser, S. (2010). Questionnaire design. In J. D. Wright & P. V. Marsden (Eds.), Handbook of survey research (2nd edn., pp. 503–512). West Yorkshire: Emerald Group.

  • Kyriakides, L., Creemers, B. P. M., & Panayiotou, A. (2018). Using educational effectiveness research to promote quality of teaching: The contribution of the dynamic model. ZDM Mathematics Education. (this issue).

  • Lampert, M. (2010). Learning teaching in, from, and for practice: What do we mean? Journal of Teacher Education, 61(1–2), 21–34.

  • Lindorff, A., & Sammons, P. (2018). Going beyond structured observations: Looking at classroom practice through a mixed method lens. ZDM Mathematics Education. (this issue).

  • Maykut, P. S., & Morehouse, R. (1994). Beginning qualitative research: A philosophic and practical guide. London: Falmer Press.

  • McKnight, C. C. (1979). Model for the Second Study of Mathematics. In Bulletin 4: Second IEA Study of Mathematics. Urbana, Illinois: SIMS Study Center.

  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd edn., pp. 13–103). Washington: American Council on Education & National Council on Measurement in Education.

  • Metzler, H. (1990). Methodological interdependencies between conceptualization and operationalization in empirical social sciences. In E. Zarnecka-Bialy (Ed.), Logic counts. Reason and argument (Vol. 3, pp. 167–176). Dordrecht: Springer.

  • Muijs, D., Kyriakides, L., van der Werf, G., Creemers, B., Timperley, H., & Earl, L. (2014). State of the art – teacher effectiveness and professional learning. School Effectiveness and School Improvement, 25(2), 231–256. https://doi.org/10.1080/09243453.2014.885451.

  • Muijs, D., Reynolds, D., Sammons, P., Kyriakides, L., Creemers, B. P. M., & Teddlie, C. (2018). Assessing individual lessons using a generic teacher observation instrument: How useful is the International System for Teacher Observation and Feedback (ISTOF)? ZDM Mathematics Education. (this issue).

  • Open Science Collaboration (2017). Maximizing the reproducibility of your research. In S. O. Lilienfeld & I. D. Waldman (Eds.), Psychological science under scrutiny: Recent challenges and proposed solutions (pp. 1–21). New York: Wiley.

  • Patton, M. Q. (2002). Qualitative research & evaluation methods (3rd edn.). London: Sage Publications.

  • Praetorius, A. K., Lenske, G., & Helmke, A. (2012). Observer ratings of instructional quality: Do they fulfill what they promise? Learning and Instruction, 22(6), 387–400. https://doi.org/10.1016/j.learninstruc.2012.03.002.

  • Praetorius, A.-K., Pauli, C., Reusser, K., Rakoczy, K., & Klieme, E. (2014). One lesson is all you need? Stability of instructional quality across lessons. Learning and Instruction, 31, 2–12.

  • Praetorius, A.-K., Klieme, E., Herbert, B., & Pinger, P. (2018). Generic dimensions of teaching quality: The German framework of the three basic dimensions. ZDM Mathematics Education. (this issue).

  • Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd edn.). Thousand Oaks: Sage Publications.

  • Rosenshine, B. (1983). Teaching functions in instructional programs. The Elementary School Journal, 83(4), 335–351. https://doi.org/10.1086/461321.

  • Scheerens, J. (2013). The use of theory in school effectiveness research revisited. School Effectiveness and School Improvement, 24(1), 1–38. https://doi.org/10.1080/09243453.2012.691100.

  • Schlesinger, L., Jentsch, A., Kaiser, G., König, J., & Blömeke, S. (2018). Subject-specific characteristics of instructional quality in mathematics education. ZDM Mathematics Education. (this issue).

  • Schoenfeld, A. (2018). Video analyses for research and professional development: The teaching for robust understanding (TRU) framework. ZDM Mathematics Education. (this issue).

  • Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47, 609–612. https://doi.org/10.1016/j.jrp.2013.05.009.

  • Seidel, T., & Shavelson, R. J. (2007). Teaching effectiveness research in the past decade: The role of theory and research design in disentangling meta-analysis results. Review of Educational Research, 77(4), 454–499. https://doi.org/10.3102/0034654307310317.

  • Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Thousand Oaks: Sage.

  • Shavelson, R. J., Webb, N. M., & Burstein, L. (1986). Measurement of teaching. In M. Wittrock (Ed.), Handbook of research on teaching (3rd edn., pp. 50–91). New York: Macmillan.

  • Stein, M. K., Grover, B., & Henningsen, M. (1996). Building student capacity for mathematical thinking and reasoning: An analysis of mathematical tasks used in reform classrooms. American Educational Research Journal, 33, 455–488. https://doi.org/10.3102/00028312033002455.

  • Tomlinson, C. A., & Moon, T. R. (2013). Assessment and student success in a differentiated classroom. Alexandria: ASCD.

  • Walkington, C., & Marder, M. (2018). Using the UTeach Observation Protocol (UTOP) to understand the quality of mathematics instruction. ZDM Mathematics Education. (this issue).

  • Walkowiak, T. A., Berry, R. Q., Pinter, H. H., & Jacobson, E. D. (2018). Utilizing the M-Scan to measure standards-based mathematics teaching practices: Affordances and limitations. ZDM Mathematics Education. (this issue).

  • Whetten, D. A. (1989). What constitutes a theoretical contribution? Academy of Management Review, 14, 490–495.

  • Wirtz, M., & Caspar, F. (2002). Beurteilerübereinstimmung und Beurteilerreliabilität [Interrater agreement and interrater reliability]. Göttingen: Hogrefe.

Acknowledgements

We would like to thank all authors who contributed to this special issue and invested considerable time and energy in replying to all our questions and requests. Our gratitude also goes to the reviewers of each individual paper in this special issue, as well as the reviewers of this paper, who helped to improve the quality of the special issue considerably.

Author information

Corresponding author

Correspondence to Anna-Katharina Praetorius.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 86 KB)

About this article

Cite this article

Praetorius, A.-K., & Charalambous, C. Y. (2018). Classroom observation frameworks for studying instructional quality: looking back and looking forward. ZDM Mathematics Education, 50, 535–553. https://doi.org/10.1007/s11858-018-0946-0

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11858-018-0946-0

Keywords

Navigation