Machine Learning, Volume 107, Issue 8–10, pp 1333–1361

Stagewise learning for noisy k-ary preferences

  • Yuangang Pan
  • Bo Han
  • Ivor W. Tsang (corresponding author)
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2018 Journal Track


The aggregation of k-ary preferences is a novel ranking problem that plays an important role in several aspects of daily life, such as ordinal peer grading, online image rating, meta-search and online product recommendation. Crowdsourcing is increasingly used to collect large numbers of k-ary preferences for these ranking problems, owing to the convenience of the platforms and their lower cost. However, preferences from crowd workers are often noisy, which inevitably degrades the reliability of conventional aggregation models. In addition, traditional inference methods usually incur massive computational costs, which limits the scalability of aggregation models. To address both challenges, we propose a reliable CrowdsOUrced Plackett–LucE (COUPLE) model combined with an efficient Bayesian learning technique. To ensure reliability, we introduce an uncertainty vector for each crowd worker in COUPLE, which recovers the ground truth of the noisy preferences with a certain probability. Furthermore, we propose an Online Generalized Bayesian Moment Matching (OnlineGBMM) algorithm, which makes COUPLE scalable to large-scale datasets. Comprehensive experiments on four large-scale synthetic datasets and three real-world datasets show that COUPLE with OnlineGBMM achieves substantial improvements in reliability and noisy worker detection over other well-known approaches.
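
For readers unfamiliar with the base model, the following minimal sketch computes the probability of a single k-ary preference under the standard Plackett–Luce model, in which the most preferred item is chosen first with probability proportional to its score, then the next item is chosen from the remainder, and so on. It is only an illustration of the underlying likelihood: it does not include the per-worker uncertainty vector of COUPLE or the OnlineGBMM updates, and the function name, item labels and score values are assumptions made for this example.

```python
def plackett_luce_prob(ranking, scores):
    """Probability of an ordered k-ary preference under the standard
    Plackett-Luce model (illustrative sketch, not the COUPLE model).

    ranking: item ids listed from most to least preferred.
    scores:  dict mapping item id -> positive score (hypothetical values).
    """
    prob = 1.0
    for i, item in enumerate(ranking):
        # Total score of the items that have not been chosen yet.
        remaining = sum(scores[j] for j in ranking[i:])
        # Chance that `item` is picked next among the remaining items.
        prob *= scores[item] / remaining
    return prob


# Hypothetical scores for three items.
example_scores = {"a": 3.0, "b": 2.0, "c": 1.0}

# P(a > b > c) = 3/6 * 2/3 * 1/1 = 1/3 in exact arithmetic.
p_abc = plackett_luce_prob(["a", "b", "c"], example_scores)
```

A noisy crowd worker would deviate from this likelihood, which is the situation that the uncertainty vector in COUPLE is meant to model.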


Keywords: Noisy preferences; Rank aggregation; Reliable Plackett–Luce model; Online ranking; Bayesian moment matching



This paper was supported by the ARC Future Fellowship FT130100746 and ARC grants LP150100671 and DP180100106.



Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Centre for Artificial Intelligence (CAI), University of Technology Sydney, Sydney, Australia
