A Criterion-Referenced Approach to Student Ratings of Instruction

Abstract

We developed a criterion-referenced student rating of instruction (SRI) to facilitate formative assessment of teaching. It involves four dimensions of teaching quality grounded in current instructional design principles: Organization and structure, Assessment and feedback, Personal interactions, and Academic rigor. Using item response theory and Wright mapping methods, we describe teaching characteristics at various points along the latent continuum for each scale. These maps enable criterion-referenced score interpretation by making an explicit connection between test performance and the theoretical framework. We explain how our Wright maps can enhance an instructor’s ability to interpret scores and identify ways to refine teaching. Although our work is aimed at improving score interpretation, a criterion-referenced test is not immune to factors that may bias test scores, and the SRI literature is replete with research on factors unrelated to teaching that may bias ratings. We therefore also used multilevel models to evaluate the extent to which student and course characteristics may affect scores and compromise score interpretation. Results indicated that student anger and the interaction between student gender and instructor gender were significant effects, though they accounted for only a small amount of variance in SRI scores. All things considered, our criterion-referenced approach to SRIs is a viable way to describe teaching quality, help instructors refine pedagogy, and facilitate course development.
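
To make the measurement approach concrete, here is a minimal sketch of the Rasch-family machinery that underlies Wright mapping. We assume, for illustration only, a partial credit parameterization for the polytomous rating items; the exact model used in the study may differ. Under this assumption, the probability that a student at latent location $\theta$ selects category $k$ of item $i$ is

$$P(X_i = k \mid \theta) \;=\; \frac{\exp\!\left(\sum_{j=0}^{k}(\theta - \delta_{ij})\right)}{\sum_{m=0}^{M_i}\exp\!\left(\sum_{j=0}^{m}(\theta - \delta_{ij})\right)}, \qquad k = 0, 1, \ldots, M_i,$$

where $\delta_{ij}$ are the step difficulties of item $i$, $M_i$ is its highest category, and the $j = 0$ term of each sum is defined to be zero. A Wright map places estimated person locations $\hat{\theta}$ and the item step locations $\delta_{ij}$ on the same logit scale, so a given SRI score can be read against the teaching behaviors located near it; this alignment is what permits criterion-referenced interpretation.

The bias analysis can be sketched in the same spirit. The Python fragment below, using statsmodels, fits a multilevel model with a random intercept for course section and fixed effects for the student characteristics named in the abstract. It is an illustration, not the authors’ code; the data file and column names are hypothetical placeholders.

    # Minimal sketch of a multilevel bias analysis -- not the authors' code.
    # The input file and all column names are hypothetical placeholders.
    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per student rating; 'section_id' identifies the course section.
    ratings = pd.read_csv("sri_ratings.csv")

    # Random intercept for course section; fixed effects for student anger
    # and the student-gender by instructor-gender interaction.
    model = smf.mixedlm(
        "sri_score ~ anger + student_gender * instructor_gender",
        data=ratings,
        groups=ratings["section_id"],
    )
    result = model.fit()
    print(result.summary())

    # An intercept-only baseline for judging how much rating variance
    # the predictors account for.
    null = smf.mixedlm("sri_score ~ 1", data=ratings,
                       groups=ratings["section_id"]).fit()

Comparing the variance components of the two fits indicates how much of the rating variance the predictors absorb, which is the kind of evidence summarized above: significant but small effects.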


Notes

  1. The complete measure is available upon request. For brevity, we did not include it in this paper.

  2. An unpublished manuscript about the original study is available upon request.

  3. A Wright map is also referred to as an item map.

Acknowledgments

We thank Emily Bowling, Fares Karam, Bo Odom, and Laura Tortorelli for their work on the original version of this measure. They developed the original teaching framework and wrote the initial pool of items as part of a course project.

Author information

Correspondence to J. Patrick Meyer.

Cite this article

Meyer, J.P., Doromal, J.B., Wei, X. et al. A Criterion-Referenced Approach to Student Ratings of Instruction. Res High Educ 58, 545–567 (2017). https://doi.org/10.1007/s11162-016-9437-8
