Overcoming Calibration Problems in Pattern Labeling with Pairwise Ratings: Application to Personality Traits

  • Baiyu Chen
  • Sergio Escalera
  • Isabelle Guyon
  • Víctor Ponce-López
  • Nihar Shah
  • Marc Oliu Simón
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9915)


We address the problem of calibrating workers whose task is to label patterns with continuous variables, which arises, for instance, when labeling images or videos of humans with continuous traits. Worker bias is particularly difficult to evaluate and correct when many workers each contribute just a few labels, a situation that typically arises when labeling is crowd-sourced. In the scenario of labeling short videos of people facing a camera with personality traits, we evaluate the feasibility of the pairwise ranking method to alleviate bias problems. Workers are shown one pair of videos at a time and must order them by preference. The variable levels are reconstructed by fitting a Bradley-Terry-Luce model with maximum likelihood. This method may, at first sight, seem prohibitively expensive because, for N videos, up to \(p=N(N-1)/2\) pairs may have to be processed by workers rather than N videos. However, by performing extensive simulations, we determine an empirical law for how the number of pairs needed scales with the number of videos in order to achieve a given accuracy of score reconstruction, and we show that the pairwise method is affordable. We apply the method to the labeling of a large-scale dataset of 10,000 videos used in the ChaLearn Apparent Personality Trait challenge.
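The reconstruction step described above can be illustrated with a minimal sketch. It assumes the logistic form of the Bradley-Terry-Luce model, in which the probability that item i is preferred over item j is 1 / (1 + exp(-(s_i - s_j))), and it fits the scores by plain gradient ascent with a small ridge penalty. The function names `num_pairs` and `fit_btl`, the optimizer, the ridge term, and all parameter values are illustrative assumptions, not the solver or settings used by the authors.

```python
import math


def num_pairs(n_items):
    """Number of distinct unordered pairs among n_items: N(N-1)/2."""
    return n_items * (n_items - 1) // 2


def fit_btl(n_items, comparisons, n_iter=2000, lr=1.0, ridge=1e-3):
    """Estimate BTL scores by gradient ascent on the mean log-likelihood.

    comparisons: list of (winner, loser) index pairs.
    Assumed model: P(i preferred over j) = 1 / (1 + exp(-(s_i - s_j))).
    The ridge penalty (an assumption of this sketch) keeps scores finite
    when an item wins or loses all of its comparisons.
    """
    scores = [0.0] * n_items
    m = len(comparisons)
    for _ in range(n_iter):
        # Start from the gradient of the ridge penalty.
        grad = [-ridge * s for s in scores]
        for winner, loser in comparisons:
            diff = scores[winner] - scores[loser]
            p_win = 1.0 / (1.0 + math.exp(-diff))
            # Gradient of log P(winner beats loser) w.r.t. both scores.
            grad[winner] += (1.0 - p_win) / m
            grad[loser] -= (1.0 - p_win) / m
        scores = [s + lr * g for s, g in zip(scores, grad)]
    # Scores are identifiable only up to an additive constant: center them.
    mean = sum(scores) / n_items
    return [s - mean for s in scores]


# Toy data (hypothetical): item 0 is usually preferred to 1, and 1 to 2.
toy_comparisons = (
    [(0, 1)] * 8 + [(1, 0)] * 2
    + [(1, 2)] * 8 + [(2, 1)] * 2
    + [(0, 2)] * 9 + [(2, 0)] * 1
)
toy_scores = fit_btl(3, toy_comparisons)
print(toy_scores)
print(num_pairs(10000))
```

On the toy data the fitted scores preserve the order item 0 > item 1 > item 2. For N = 10,000 videos, `num_pairs` gives 49,995,000 candidate pairs, which is why the scaling of the number of pairs actually needed matters.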


Keywords: Calibration of labels, Label bias, Ordinal labeling, Variance models, Bradley-Terry-Luce model, Continuous labels, Regression, Personality traits, Crowd-sourced labels



This work was supported in part by donations from Microsoft Research to prepare the personality trait challenge, by Spanish Projects TIN2012-38187-C03-02 and TIN2013-43478-P, and by the European Commission Horizon 2020 granted project SEE.4C under call H2020-ICT-2015. We are grateful to Evelyne Viegas, Albert Clapés i Sintes, Hugo Jair Escalante, Ciprian Corneanu, Xavier Baró Solé, Cécile Capponi, and Stéphane Ayache for stimulating discussions. We are thankful to Prof. Alyosha Efros for his support and guidance.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Baiyu Chen (4)
  • Sergio Escalera (1, 2, 3)
  • Isabelle Guyon (3, 5)
  • Víctor Ponce-López (1, 2, 6)
  • Nihar Shah (4)
  • Marc Oliu Simón (6)

  1. Computer Vision Center, Campus UAB, Barcelona, Spain
  2. Department of Mathematics and Computer Science, University of Barcelona, Barcelona, Spain
  3. ChaLearn, Berkeley, USA
  4. University of California Berkeley, Berkeley, USA
  5. University of Paris-Saclay, Paris, France
  6. EIMT at the Open University of Catalonia, Barcelona, Spain
