
Investigating the Judges Performance in a National Competition of Sport Dance


Many sports, such as gymnastics, diving, and figure skating, use judges’ scores to produce the ranking that determines the winner of a competition. Judges assess performances on some type of rating scale, and human ratings are subject to various forms of error and bias, so the overall outcome may depend heavily on the set of raters chosen. The aim of this paper is to illustrate how results from the Many-Facet Rasch Measurement framework can be used to give judges feedback about their scoring patterns and to detect anomalous rater behaviours analytically. We consider the field of Sport Dance, a discipline that has enjoyed increasing public interest in recent years, and analyze data from two national competitions held in Italy in 2018 and 2019.
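As a rough illustration of the kind of model the Many-Facet Rasch Measurement framework rests on, the sketch below computes rating-category probabilities under the standard rating-scale form, in which the log-odds of category k over category k-1 equal performer ability minus criterion difficulty minus judge severity minus a category threshold. This is a minimal sketch under stated assumptions: the function names, the single shared set of thresholds, and all numeric values are hypothetical and are not taken from the paper, whose exact model specification is not given in this abstract.

```python
import math


def category_probabilities(ability, severity, difficulty, thresholds):
    """Probabilities of categories 0..K under a rating-scale many-facet Rasch model.

    Assumed form: log(P_k / P_{k-1}) = ability - difficulty - severity - thresholds[k-1].
    """
    location = ability - difficulty - severity
    # Cumulative log-numerators; category 0 is the reference with value 0.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + location - tau)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(ability, severity, difficulty, thresholds):
    """Model-expected rating: sum over categories of k * P(category k)."""
    probs = category_probabilities(ability, severity, difficulty, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values: one performer, one criterion, two judges of differing severity.
thresholds = [-1.0, 0.0, 1.0]  # four categories, scored 0 to 3
lenient = expected_score(ability=0.5, severity=-1.0, difficulty=0.0, thresholds=thresholds)
severe = expected_score(ability=0.5, severity=1.0, difficulty=0.0, thresholds=thresholds)
print(f"lenient judge: {lenient:.2f}, severe judge: {severe:.2f}")
```

In this form a more severe judge lowers the expected score for the same performance, so comparing a judge's observed ratings with the model-expected ones is one way unusual scoring patterns could be flagged.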







Author information

Correspondence to Laura Anderlucci.


About this article


Cite this article

Anderlucci, L., Lubisco, A. & Mignani, S. Investigating the Judges Performance in a National Competition of Sport Dance. Soc Indic Res (2020).



Keywords

  • Many-Facet Rasch Measurement
  • Rater effect
  • Aesthetic sport