Abstract
When instructional effectiveness is surveyed with Likert rating scales, bias is a potential threat to the validity of interpretations. Neither simple summation of ratings nor the use of larger samples removes bias. In this study, a new model for scaling ratings is examined; the method both identifies and corrects for bias. The model was tested against a variety of criteria using a database of student ratings of college instruction. Results indicated that bias was detected and that it was large enough to warrant concern. The statistical corrections were significant in terms of both the order and the magnitude of class means. Implications for future studies include the specification of more potential sources of bias, the study of interactions among some of these factors, and the development of more systematic evidence supporting the need to be attentive to bias. The many-faceted Rasch model used in this study needs more evaluation before its utility for studying and correcting bias can be accepted, but preliminary evidence is encouraging. Recommendations are offered for a theoretical rationale for studying bias in student ratings of instructional effectiveness and for a program of research leading to the use of this model in reporting results, both for improving instruction and for promotion, tenure, and merit decisions.
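The abstract does not spell out the model's specification, so the following is only a minimal sketch of why summed ratings can carry bias, assuming the rating-scale form commonly given for a many-faceted Rasch model. The facet names (`quality` for the class, `severity` for the student rater, `item_difficulty` for the survey item) and all numeric values are hypothetical illustrations, not quantities from the study.

```python
import math

def category_probs(quality, severity, item_difficulty, thresholds):
    """Category probabilities (0..K) under a rating-scale many-facet Rasch form.

    Assumed form: log(P_k / P_{k-1}) = quality - severity - item_difficulty - tau_k.
    """
    logit = quality - severity - item_difficulty
    # Cumulative sums of (logit - tau_k); category 0 corresponds to an empty sum.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + logit - tau)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def expected_rating(quality, severity, item_difficulty, thresholds):
    """Model-expected rating: sum over categories of k * P(k)."""
    probs = category_probs(quality, severity, item_difficulty, thresholds)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical five-category scale (categories 0..4) with symmetric thresholds.
thresholds = [-1.5, -0.5, 0.5, 1.5]

# Two classes of identical quality, rated by lenient versus severe students.
lenient = expected_rating(1.0, -1.0, 0.0, thresholds)
severe = expected_rating(1.0, 1.0, 0.0, thresholds)

print(f"lenient raters: {lenient:.2f}, severe raters: {severe:.2f}")
```

In this sketch the two classes have the same `quality`, yet their expected raw ratings differ because of rater severity alone. Averaging over more raters of the same kind would not close that gap, whereas a model that estimates severity as a separate facet can, in principle, adjust for it.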
Cite this article
Haladyna, T., Hess, R.K. The detection and correction of bias in student ratings of instruction. Res High Educ 35, 669–687 (1994). https://doi.org/10.1007/BF02497081
DOI: https://doi.org/10.1007/BF02497081