Abstract
Observation-based frameworks of instructional quality differ considerably in the approaches to and purposes of their development, their theoretical underpinnings, the instructional aspects they cover, their operationalization and measurement, and the existing evidence on their reliability and validity. The current paper summarizes and reflects on these differences by considering the 12 frameworks included in this special issue. By comparing the analyses of three focal mathematics lessons through the lens of each framework, as presented in the preceding papers, this paper also examines the similarities, differences, and potential complementarities of these frameworks in describing and evaluating mathematics instruction. To do so, a common structure for comparing all frameworks is suggested and applied to the analyses of the three selected lessons. The paper concludes that although significant work has been pursued in recent years in exploring instructional quality through classroom observation frameworks, the field would benefit from establishing agreed-upon standards for understanding and studying instructional quality, as well as from more collaborative work.
Notes
For ease of discussion, we refer to generic, content-specific and hybrid frameworks, although, as explained in the introductory paper (see Charalambous and Praetorius 2018), these frameworks can be situated along a continuum in terms of how generic and content-specific they are.
It is possible, however, that the intended uses of the frameworks are (better) clarified in other publications.
The quotes here and in what follows are directly drawn from the preceding papers of this special issue.
In the publications in the special issue, this challenge is solved by the CLASS researchers by focusing on one version of CLASS (i.e., CLASS-UE), and by the TBD researchers by providing a comprehensive overview of elements included in different studies. In the following, we refer to the results presented based on these decisions.
For ease of reference, in what follows we use the term “elements” to collectively capture the different terms employed across instruments to describe what they contain at their respective Levels 1, 2, and 3.
DMEE and MECORS complement these with low-inference instruments. High-inference instruments require a high degree of subjective judgment on the raters’ part, thus allowing more latitude for interpretation. In contrast, low-inference instruments constrain such interpretations by focusing on more readily observable behaviors and thus reduce both ambiguity and the need for interpretation (see more on this distinction in the publication by Kennedy 2010, p. 231).
Whereas the two preceding measurement decisions are closely tied to the specific framework, the following two might vary to a certain degree from study to study, but still allow us to point to some differences among frameworks. Presented here is the typical approach according to the authors of the respective papers in the special issue.
Instead of conducting a comprehensive review on these issues, we contacted the authors of each paper and asked them to provide us with information regarding reliability and validity. For some frameworks (e.g., TBD) a systematic literature review was conducted to gather such information. For other frameworks for which a large number of publications was available (e.g., CLASS), doing so would have required considerable effort; therefore, the data provided should be seen as indicative.
Because the frameworks differed in the criteria used to examine reliability and validity, Tables C1 and C2 include those common criteria that were also reported by most frameworks.
For example, Shavelson et al. (1986) suggest that a crossed design should be preferred when the lessons are considered as interchangeable, which is the case when the teachers are observed on the same day teaching the same or similar lessons content-wise; this design should also be used when certain features are largely similar across teachers (e.g., the time interval between observations among teachers is shorter than the time interval between observations within lessons of each teacher).
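The crossed design mentioned above can be made concrete with a small simulation. The sketch below is purely illustrative (the data, effect sizes, and variable names are invented): in a crossed teacher-by-lesson (p × l) design where every teacher is observed on every lesson, the teacher, lesson, and residual variance components can be separated with standard two-way mean squares, and a generalizability coefficient can then be computed for a given number of observed lessons.

```python
import random

# Hypothetical crossed p x l design: every teacher is scored on every lesson.
# All numbers below are simulated for illustration only.
random.seed(7)
n_teachers, n_lessons = 20, 4

# Simulate ratings as teacher effect + lesson effect + noise.
teacher_eff = [random.gauss(0, 1.0) for _ in range(n_teachers)]
lesson_eff = [random.gauss(0, 0.3) for _ in range(n_lessons)]
scores = [[3.0 + teacher_eff[p] + lesson_eff[l] + random.gauss(0, 0.5)
           for l in range(n_lessons)] for p in range(n_teachers)]

grand = sum(sum(row) for row in scores) / (n_teachers * n_lessons)
t_means = [sum(row) / n_lessons for row in scores]
l_means = [sum(scores[p][l] for p in range(n_teachers)) / n_teachers
           for l in range(n_lessons)]

# Mean squares for a two-way crossed design with one observation per cell.
ms_t = n_lessons * sum((m - grand) ** 2 for m in t_means) / (n_teachers - 1)
ms_l = n_teachers * sum((m - grand) ** 2 for m in l_means) / (n_lessons - 1)
ss_res = sum((scores[p][l] - t_means[p] - l_means[l] + grand) ** 2
             for p in range(n_teachers) for l in range(n_lessons))
ms_res = ss_res / ((n_teachers - 1) * (n_lessons - 1))

# Estimated variance components (negative estimates clamped to zero).
var_res = ms_res
var_t = max(0.0, (ms_t - ms_res) / n_lessons)
var_l = max(0.0, (ms_l - ms_res) / n_teachers)

# Generalizability coefficient for relative decisions over n_lessons lessons.
g_coef = var_t / (var_t + var_res / n_lessons)
print(f"teacher var={var_t:.2f}, lesson var={var_l:.2f}, residual={var_res:.2f}")
print(f"G coefficient over {n_lessons} lessons: {g_coef:.2f}")
```

Because the design is fully crossed, lesson effects can be estimated and removed from the teacher comparison; in a nested design they would be confounded with teachers, which is why the crossed design is preferable when lessons are interchangeable across teachers.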
In making this argument we do not mean to imply that the MET study was without limitations in terms of its design and the results obtained.
For reasons of parsimony, we used Level-1 elements instead of Level-2 elements to organize the extensive list generated from the preceding procedure.
The general description of the Level-1 indicators was not always consistent with the Level-2 and Level-3 indicators. Some Level-1 classifications were very broad and encompassed other Level-1 or Level-2 classifications within the same instrument. Some Level-3 indicators captured diverse elements that did not necessarily correspond to or reflect the Level-2 indicators within the same instrument.
The distinction among the intended, implemented, and achieved curriculum (McKnight 1979) comes close to this idea, but still does not emphasize the importance of explicitly attending to students’ use of opportunities.
References
AERA/APA/NCME (2014). Standards for educational and psychological testing. Washington: American Educational Research Association.
Ball, D. L., Sleep, L., Boerst, T. A., & Bass, H. (2009). Combining the development of practice and the practice of development in teacher education. The Elementary School Journal, 109(5), 458–474.
Bell, C., Gitomer, D. H., McCaffrey, D. F., Hamre, B. K., Pianta, R. C., & Qi, Y. (2012). An argument approach to observation protocol validity. Educational Assessment, 17(2–3), 62–87. https://doi.org/10.1080/10627197.2012.715014.
Bell, C. A., Qi, Y., Croft, A., Leusner, D., McCaffrey, D. F., Gitomer, D. H., & Pianta, R. (2014). Improving observational score quality: Challenges in observer thinking. In K. Kerr, R. Pianta & T. Kane (Eds.), Designing teacher evaluation systems: New guidance from the Measures of Effective Teaching project (pp. 50–97). San Francisco: Jossey-Bass.
Berlin, R., & Cohen, J. (2018). Understanding instructional quality through a relational lens. ZDM Mathematics Education. (this issue).
Berliner, D. C. (2005). The near impossibility of testing for teacher quality. Journal of Teacher Education, 56(3), 205–213. https://doi.org/10.1177/0022487105275904.
Boston, M. D., & Candela, A. G. (2018). The instructional quality assessment as a tool for reflecting on instructional practice. ZDM Mathematics Education. (this issue).
Brennan, R. L. (2001). Generalizability theory. New York: Springer.
Casabianca, J. M., Lockwood, J. R., & McCaffrey, D. F. (2015). Trends in classroom observation scores. Educational and Psychological Measurement, 75(2), 311–337. https://doi.org/10.1177/0013164414539163.
Chapman, C., Reynolds, D., Muijs, D., Sammons, P., Stringfield, S., & Teddlie, C. (2016). Educational effectiveness and improvement research and practice. In C. Chapman, D. Muijs, D. Reynolds, P. Sammons & C. Teddlie (Eds.), The Routledge international handbook of educational effectiveness and improvement: research, policy, and practice (pp. 1–24). New York: Routledge.
Charalambous, C. Y., & Litke, E. (2018). Studying instructional quality by using a content-specific lens: The case of the Mathematical Quality of Instruction framework. ZDM Mathematics Education. (this issue).
Charalambous, C. Y., & Pitta-Pantazi, D. (2016). Perspectives on priority mathematics education: Unpacking and understanding a complex relationship linking teacher knowledge, teaching, and learning. In L. English & D. Kirshner (Eds.), Handbook of international research in mathematics education (3rd edn., pp. 19–59). Abingdon: Routledge.
Charalambous, C. Y., & Praetorius, A. K. (2018). Studying instructional quality in mathematics through different lenses: In search of common ground. ZDM Mathematics Education. (this issue).
Cohen, D. K. (2011). Teaching and its predicaments. Cambridge: Harvard University Press.
Cronbach, L. (1990). Essentials of psychological testing (5th edn.). Boston: Allyn & Bacon, Inc.
Diederich, J., & Tenorth, H. E. (1997). Theorie der Schule. Ein Studienbuch zu Geschichte, Funktionen und Gestaltung [Theory of the school: A study book on its history, functions, and design]. Berlin: Cornelsen.
Fend, H. (1981). Theorie der Schule [Theory of the school]. München: Urban & Schwarzenberg.
Gitomer, D. (2009). Crisp measurement and messy context: A clash of assumptions and metaphors—Synthesis of Section III. In D. H. Gitomer (Ed.), Measurement issues and assessment for teaching quality (pp. 223–233). Thousand Oaks: Sage.
Gitomer, D. H., & Bell, C. A. (2013). Evaluating teaching and teachers. In K. F. Geisinger (Ed.), APA handbook of testing and assessment in psychology (Vol. 3, pp. 415–444). Washington: American Psychological Association.
Grossman, P., & McDonald, M. (2008). Back to the future: Directions for research in teaching and teacher education. American Educational Research Journal, 45(1), 184–205. https://doi.org/10.3102/0002831207312906.
Hattie, J. A. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York: Routledge.
Herlihy, C., Karger, E., Pollard, C., Hill, H. C., Kraft, M. A., Williams, M., & Howard, S. (2014). State and local efforts to investigate the validity and reliability of scores from teacher evaluation systems. Teachers College Record, 116(1), 1–28.
Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56–64. https://doi.org/10.3102/0013189X12437203.
Kennedy, M. M. (2010). Approaches to annual performance assessment. In M. M. Kennedy (Ed.), Teacher assessment and the quest for teacher quality: A handbook (pp. 225–250). San Francisco: Jossey-Bass.
Ko, J., Sammons, P., & Bakkum, L. (2016). Effective teaching. Education Development Trust. https://www.educationdevelopmenttrust.com/~/media/EDT/Reports/Research/2015/r-effective-teaching.pdf. Accessed September 15, 2017.
Konstantopoulos, S. (2012). Teacher effects: Past, present and future. In S. Kelly (Ed.), Assessing teacher quality: Understanding teacher effects on instruction and achievement (pp. 33–48). New York: Teachers College Press.
Koretz, D. (2008). Measuring up: What educational testing really tells us. Cambridge: Harvard University Press.
Krosnick, J. A., & Presser, S. (2010). Questionnaire design. In J. D. Wright & P. V. Marsden (Eds.), Handbook of survey research (2nd edn., pp. 503–512). West Yorkshire: Emerald Group.
Kyriakides, L., Creemers, B. P. M., & Panayiotou, A. (2018). Using educational effectiveness research to promote quality of teaching: The contribution of the dynamic model. ZDM Mathematics Education. (this issue).
Lampert, M. (2010). Learning teaching in, from, and for practice: What do we mean? Journal of Teacher Education, 61(1–2), 21–34.
Lindorff, A., & Sammons, P. (2018). Going beyond structured observations: Looking at classroom practice through a mixed method lens. ZDM Mathematics Education. (this issue).
Maykut, P. S., & Morehouse, R. (1994). Beginning qualitative research: A philosophic and practical guide. London: Falmer Press.
McKnight, C. C. (1979). Model for the Second Study of Mathematics. In Bulletin 4: Second IEA Study of Mathematics. Urbana: SIMS Study Center.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd edn., pp. 13–103). Washington: American Council on Education & National Council on Measurement in Education.
Metzler, H. (1990). Methodological interdependencies between conceptualization and operationalization in empirical social sciences. In E. Zarnecka-Bialy (Ed.), Logic counts. Reason and argument (Vol. 3, pp. 167–176). Dordrecht: Springer.
Muijs, D., Kyriakides, L., van der Werf, G., Creemers, B., Timperley, H., & Earl, L. (2014). State of the art-teacher effectiveness and professional learning. School Effectiveness and School Improvement, 25(2), 231–256. https://doi.org/10.1080/09243453.2014.885451.
Muijs, D., Reynolds, D., Sammons, P., Kyriakides, L., Creemers, B. P. M., & Teddlie, C. (2018). Assessing individual lessons using a generic teacher observation instrument: How useful is the International System for Teacher Observation and Feedback (ISTOF)? ZDM Mathematics Education. (this issue).
Open Science Collaboration (2017). Maximizing the reproducibility of your research. In S. O. Lilienfeld & I. D. Waldman (Eds.), Psychological science under scrutiny: Recent challenges and proposed solutions (pp. 1–21). New York: Wiley.
Patton, M. Q. (2002). Qualitative research & evaluation methods (3rd edn.). London: Sage Publications.
Praetorius, A. K., Lenske, G., & Helmke, A. (2012). Observer ratings of instructional quality: Do they fulfill what they promise? Learning and Instruction, 22(6), 387–400. https://doi.org/10.1016/j.learninstruc.2012.03.002.
Praetorius, A.-K., Pauli, C., Reusser, K., Rakoczy, K., & Klieme, E. (2014). One lesson is all you need? Stability of instructional quality across lessons. Learning and Instruction, 31, 2–12.
Praetorius, A.-K., Klieme, E., Herbert, B., & Pinger, P. (2018). Generic dimensions of teaching quality: The German framework of the three basic dimensions. ZDM Mathematics Education. (this issue).
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd edn.). Thousand Oaks: Sage Publications.
Rosenshine, B. (1983). Teaching functions in instructional programs. The Elementary School Journal, 83(4), 335–351. https://doi.org/10.1086/461321.
Scheerens, J. (2013). The use of theory in school effectiveness research revisited. School Effectiveness and School Improvement, 24(1), 1–38. https://doi.org/10.1080/09243453.2012.691100.
Schlesinger, L., Jentsch, A., Kaiser, G., König, J., & Blömeke, S. (2018). Subject-specific characteristics of instructional quality in mathematics education. ZDM Mathematics Education. (this issue).
Schoenfeld, A. (2018). Video analyses for research and professional development: the teaching for robust understanding (TRU) framework. ZDM Mathematics Education. (this issue).
Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47, 609–612. https://doi.org/10.1016/j.jrp.2013.05.009.
Seidel, T., & Shavelson, R. J. (2007). Teaching effectiveness research in the past decade: The role of theory and research design in disentangling meta-analysis results. Review of Educational Research, 77(4), 454–499. https://doi.org/10.3102/0034654307310317.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Thousand Oaks: Sage.
Shavelson, R. J., Webb, N. M., & Burstein, L. (1986). Measurement of teaching. In M. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 50–91). New York: Macmillan.
Stein, M. K., Grover, B., & Henningsen, M. (1996). Building student capacity for mathematical thinking and reasoning: An analysis of mathematical tasks used in reform classrooms. American Educational Research Journal, 33, 455–488. https://doi.org/10.3102/00028312033002455.
Tomlinson, C. A., & Moon, T. R. (2013). Assessment and student success in a differentiated classroom. Alexandria: ASCD.
Walkington, C., & Marder, M. (2018). Using the UTeach Observation Protocol (UTOP) to understand the quality of mathematics instruction. ZDM Mathematics Education. (this issue).
Walkowiak, T. A., Berry, R. Q., Pinter, H. H., & Jacobson, E. D. (2018). Utilizing the M-Scan to measure standards-based mathematics teaching practices: Affordances and limitations. ZDM Mathematics Education. (this issue).
Whetten, D. A. (1989). What constitutes a theoretical contribution? Academy of Management Review, 14, 490–495.
Wirtz, M., & Caspar, F. (2002). Beurteilerübereinstimmung und Beurteilerreliabilität [Interrater agreement and interrater reliability]. Göttingen: Hogrefe.
Acknowledgements
We would like to thank all authors who contributed to this special issue and invested considerable time and energy in replying to all our questions and requests. Our gratitude also goes to the reviewers of each paper within this special issue, as well as the reviewers of this paper, who helped considerably to improve the quality of the special issue.
Praetorius, AK., Charalambous, C.Y. Classroom observation frameworks for studying instructional quality: looking back and looking forward. ZDM Mathematics Education 50, 535–553 (2018). https://doi.org/10.1007/s11858-018-0946-0