The detection and correction of bias in student ratings of instruction

Haladyna, Thomas; Hess, Robert K.

doi:10.1007/BF02497081

Published: November 1994

Volume 35, pages 669–687, (1994)

Research in Higher Education

Thomas Haladyna¹ & Robert K. Hess¹

Abstract

When surveys of instructional effectiveness use Likert rating scales, bias is a potential threat to the validity of interpretations. Neither simple summation of ratings nor the use of larger samples removes bias. In this study, a new model for scaling ratings is examined; the method both identifies and corrects for bias. Working with a database of student ratings of college instruction, we tested the model against a variety of criteria. Results indicated that bias was detected and that it was large enough to warrant concern. The statistical corrections were significant in terms of both the order and the magnitude of class means. Implications for future studies include the specification of more potential sources of bias, the interaction of some of these factors, and the development of more systematic evidence supporting the need to be attentive to bias. The many-faceted Rasch model used in this study needs more evaluation before we are convinced of its utility for studying and correcting bias, but preliminary evidence is encouraging. Recommendations are offered for a theoretical rationale for studying bias in student ratings of instructional effectiveness and for a program of research leading to the use of this model in reporting results, both for improving instruction and for promotion, tenure, and merit decisions.
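
The abstract does not give the model's equation. As a rough illustration only, the sketch below uses the common rating-scale form of a many-facet Rasch model, in which the log-odds of choosing rating category k over category k-1 equal the class measure minus the item difficulty minus the rater severity minus a category threshold. The facet names (class_measure, item_difficulty, rater_severity), the threshold values, and every number here are assumptions made for illustration; they are not the facets or estimates reported in the study.

```python
import math


def category_probabilities(class_measure, item_difficulty, rater_severity, thresholds):
    """Return P(rating = k) for k = 0..K under a rating-scale many-facet Rasch form.

    All arguments are in logits. `thresholds` holds the K category thresholds
    F_1..F_K (hypothetical values; a real analysis would estimate them).
    """
    # Linear predictor: the measure of the class after subtracting the
    # difficulty of the item and the severity of the rater.
    eta = class_measure - item_difficulty - rater_severity

    # The log-numerator for category k is the running sum of (eta - F_j), j <= k.
    # Category 0 has a log-numerator of 0 by convention.
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(log_numerators[-1] + (eta - f))

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(class_measure, item_difficulty, rater_severity, thresholds):
    """Expected raw rating (0..K) implied by the category probabilities."""
    probs = category_probabilities(
        class_measure, item_difficulty, rater_severity, thresholds
    )
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical 4-category scale (ratings 0..3) with three thresholds.
    thresholds = [-1.0, 0.0, 1.0]
    # The same class and item, rated by a lenient and by a severe rater.
    lenient = expected_rating(0.5, 0.0, -1.0, thresholds)
    severe = expected_rating(0.5, 0.0, 1.0, thresholds)
    print(f"lenient rater: {lenient:.2f}, severe rater: {severe:.2f}")
```

Under these assumptions, the same class receives a lower expected raw rating from a severe rater than from a lenient one. This illustrates why summing raw ratings does not by itself remove this kind of bias, while a model with an explicit severity term can adjust for it.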

Author information

Authors and Affiliations

Department of Education, Arizona State University West, P.O. Box 37100, Phoenix, AZ 85069-7100
Thomas Haladyna & Robert K. Hess

About this article

Cite this article

Haladyna, T., Hess, R.K. The detection and correction of bias in student ratings of instruction. Res High Educ 35, 669–687 (1994). https://doi.org/10.1007/BF02497081

Received: 31 May 1993
Issue Date: November 1994
DOI: https://doi.org/10.1007/BF02497081
