
The detection and correction of bias in student ratings of instruction

Research in Higher Education

Abstract

When surveys of instructional effectiveness use Likert rating scales, bias is a potential threat to the validity of interpretations. Neither simple summation of ratings nor the use of larger samples removes bias. In this study, a new model for scaling ratings is examined. The method both identifies and corrects for bias. The model was tested against a variety of criteria using a database of student ratings of college instruction. Results indicated that bias was detected and that it was large enough to warrant concern. The statistical corrections were significant in terms of both the order and the magnitude of class means. Implications for future studies include the specification of more potential sources of bias, the interaction of some of these factors, and the development of more systematic evidence supporting the need to be attentive to bias. The many-faceted Rasch model used in this study needs more evaluation before we are convinced of its utility for studying and correcting bias, but preliminary evidence is encouraging. Recommendations are offered for a theoretical rationale for studying bias in student ratings of instructional effectiveness and for a program of research leading to the use of this model in reporting results for improving instruction and for promotion, tenure, and merit decisions.
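The abstract names the many-faceted Rasch model but does not reproduce its equations. As a rough illustration only, the sketch below uses the standard rating-scale form of that model, in which the log-odds of a rating in category k rather than k-1 equal B - D - C - F_k. The facet names here (a class measure B, an item difficulty D, a rater severity C) and all numeric values are assumptions chosen for the example; the facets and estimates in the study itself may differ.

```python
import math


def category_probabilities(class_measure, item_difficulty, rater_severity, thresholds):
    """Probabilities of ratings 0..K under a rating-scale many-facet Rasch model.

    Assumed (hypothetical) facets: class_measure (B), item_difficulty (D),
    rater_severity (C). thresholds holds the step parameters F_1..F_K.
    All values are in logits.
    """
    eta = class_measure - item_difficulty - rater_severity
    # Log-numerator of category k is the sum over steps 1..k of (eta - F_step);
    # category 0 has log-numerator 0.
    log_numerators = [0.0]
    for step in thresholds:
        log_numerators.append(log_numerators[-1] + eta - step)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [weight / total for weight in weights]


def expected_rating(class_measure, item_difficulty, rater_severity, thresholds):
    """Model-expected rating (0..K) for one class, item, and rater combination."""
    probabilities = category_probabilities(
        class_measure, item_difficulty, rater_severity, thresholds
    )
    return sum(k * p for k, p in enumerate(probabilities))


# Illustrative values only: the same class and item, rated by a lenient
# and a severe rater on a four-category scale (ratings 0..3).
STEPS = [-1.0, 0.0, 1.0]
lenient = expected_rating(1.0, 0.0, -1.0, STEPS)
severe = expected_rating(1.0, 0.0, 1.0, STEPS)
```

In this sketch the severe rater yields a lower expected rating for the same class, which is the kind of rater effect a raw class mean absorbs. Under the model, a corrected class score would be based on the estimated class measure rather than on the raw mean of ratings.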




Cite this article

Haladyna, T., Hess, R.K. The detection and correction of bias in student ratings of instruction. Res High Educ 35, 669–687 (1994). https://doi.org/10.1007/BF02497081


  • DOI: https://doi.org/10.1007/BF02497081
