Many sports, such as gymnastics, diving, and figure skating, use judges’ scores to produce the ranking that determines the winner of a competition. These judges use some type of rating scale when assessing performances. Human ratings, however, are subject to various forms of error and bias, so the overall outcome may depend largely on the set of raters chosen. The aim of this paper is to illustrate how results from the Many-Facet Rasch Measurement framework can be used to provide feedback to judges about their scoring patterns and to detect anomalous rater behaviours analytically. We consider the field of Sport Dance, a discipline that has enjoyed increasing public interest and enthusiasm in recent years, and analyze data from two national competitions held in Italy in 2018 and 2019.
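For orientation, a common rating-scale formulation of the many-facet Rasch model is sketched below. The choice of three facets (performer, judge, criterion) and all symbols are illustrative assumptions; the abstract does not state the specification the authors actually fit.

\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
\]

Here \(P_{nijk}\) is the probability that judge \(j\) awards performer \(n\) category \(k\) on criterion \(i\), \(\theta_n\) is the performer's ability, \(\delta_i\) the difficulty of the criterion, \(\alpha_j\) the severity of the judge, and \(\tau_k\) the threshold between categories \(k-1\) and \(k\). Under this kind of model, a judge whose ratings depart markedly from the model's expectations, for example through an extreme \(\alpha_j\) or poor fit statistics, would be a candidate for the anomalous behaviour the paper aims to detect.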
Data available at the website https://www.federdanza.it/images/gare/2017_2018/EXPORT/CAMPIONATI_2018/arena_bianca_05/12-grp-d_synchrolat_u11_c/index.htm.
Data available at the website https://www.federdanza.it/images/gare/2018_2019/EXPORT/arena_bianca_05/10-p-grp_synchrolat_u15_c/index.htm.
Cite this article
Anderlucci, L., Lubisco, A. & Mignani, S. Investigating the Judges Performance in a National Competition of Sport Dance. Soc Indic Res (2020). https://doi.org/10.1007/s11205-019-02256-z
- Many-Facet Rasch Measurement
- Rater effect
- Aesthetic sport