Machine Learning

, Volume 107, Issue 5, pp 903–941 | Cite as

Dyad ranking using Plackett–Luce models based on joint feature representations

  • Dirk Schäfer
  • Eyke Hüllermeier


Label ranking is a specific type of preference learning problem, namely the problem of learning a model that maps instances to rankings over a finite set of predefined alternatives. Like in conventional classification, these alternatives are identified by their name or label while not being characterized in terms of any properties or features that could be potentially useful for learning. In this paper, we consider a generalization of the label ranking problem that we call dyad ranking. In dyad ranking, not only the instances but also the alternatives are represented in terms of attributes. For learning in the setting of dyad ranking, we propose an extension of an existing label ranking method based on the Plackett–Luce model, a statistical model for rank data. This model is combined with a suitable feature representation of dyads. Concretely, we propose a method based on a bilinear extension, where the representation is given in terms of a Kronecker product, as well as a method based on neural networks, which allows for learning a (highly nonlinear) joint feature representation. The usefulness of the additional information provided by the feature description of alternatives is shown in several experimental studies. Finally, we propose a method for the visualization of dyad rankings, which is based on the technique of multidimensional unfolding.


Preference learning Label ranking Dyad ranking Plackett–Luce model Neural networks Multidimensional unfolding 


  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., et al. (2016). TensorFlow: Large-scale machine learning on heterogeneous systems. CoRR. arXiv:1603.04467.
  2. Alvo, M., & Yu, P. L. (2014). Statistical methods for ranking data. New York: Springer.CrossRefzbMATHGoogle Scholar
  3. Bakir, G., Hofmann, T., Schölkopf, B., Smola, A. J., Taskar, B., & Vishwanathan, S. V. N. (Eds.). (2007). Predicting structured data. Cambridge: MIT Press.Google Scholar
  4. Basilico, J., & Hofmann, T. (2004). Unifying collaborative and content-based filtering. In Proceedings ICML, 21st international conference on machine learning. ACM, New York, USA.Google Scholar
  5. Bellet, A., Habrard, A., & Sebban, M. (2013). A survey on metric learning for feature vectors and structured data (p. 57). arXiv:1306.6709.
  6. Borg, I. (1981). Anwendungsorientierte Multidimensionale Skalierung. New York: Springer.CrossRefGoogle Scholar
  7. Borg, I., & Groenen, P. (2005). Modern multidimensional scaling: Theory and applications. New York: Springer.zbMATHGoogle Scholar
  8. Borg, I., Groenen, P. J., & Mair, P. (2012). Applied multidimensional scaling. New York: Springer.Google Scholar
  9. Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of the COMPSTAT’2010, 19th international conference on computational statistics (pp. 177–187). Springer, Paris, France.Google Scholar
  10. Bottou, L. (1998). Online algorithms and stochastic approximations. Online learning and neural networks. Cambridge: Cambridge University Press.zbMATHGoogle Scholar
  11. Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Reading, MA: Cambridge University Press.CrossRefzbMATHGoogle Scholar
  12. Brazdil, P., Giraud-Carrier, C., Soares, C., & Vilalta, R. (2008). Metalearning: Applications to data mining (1st ed.). New York: Springer.zbMATHGoogle Scholar
  13. Brinker, K., Fürnkranz, J., & Hüllermeier, E. (2006). A unified model for multilabel classification and ranking. In Proceedings of the ECAI2006: 17th European conference on artificial intelligence (pp. 489–493), Riva Del Garda, Italy.Google Scholar
  14. Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., et al. (2005). Learning to rank using gradient descent. In Proceedings ICML, 22nd international conference on machine learning (pp. 89–96). ACM, New York, USA.Google Scholar
  15. Busing, F. (2010). Advances in multidimensional unfolding. Ph.D. thesis, University of Leiden.Google Scholar
  16. Cao, Z., Qin, T., Liu, T. Y., Tsai, M. F., & Li, H. (2007). Learning to rank: From pairwise approach to listwise approach. In Proceedings ICML, 24th international conference on machine learning (pp. 129–136). ACM, New York, USA.Google Scholar
  17. Caron, F., & Doucet, A. (2012). Efficient bayesian inference for generalized Bradley–Terry models. Journal of Computational and Graphical Statistics, 21(1), 174–196.MathSciNetCrossRefGoogle Scholar
  18. Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart–Young” decomposition. Psychometrika, 35(3), 283–319.CrossRefzbMATHGoogle Scholar
  19. Chechik, G., Sharma, V., Shalit, U., & Bengio, S. (2009). An online algorithm for large scale image similarity learning. Advances in Neural Information Processing Systems, 21, 1–9.zbMATHGoogle Scholar
  20. Chechik, G., Sharma, V., Shalit, U., & Bengio, S. (2010). Large scale online learning of image similarity through ranking. Journal of Machine Learning Research, 11, 1–29.MathSciNetzbMATHGoogle Scholar
  21. Cheng, W., & Hüllermeier, E. (2008). Learning similarity functions from qualitative feedback. In Proceedings of the ECCBR—2008, 9th European conference on case-based reasoning (pp. 120–134). Springer, Trier, Germany, no. 5239 in LNAI.Google Scholar
  22. Cheng, W., Henzgen, S., & Hüllermeier, E. (2013). Labelwise versus pairwise decomposition in label ranking. In Proceedings of Lernen Wissen Adaptivität 2013 (LWA 2013) (pp. 140–147). Otto Friedrich Universität Bamberg, Germany.Google Scholar
  23. Cheng, W., Hühn, J., & Hüllermeier, E. (2009). Decision tree and instance-based learning for label ranking. In Proceedings ICML, 26th international conference on machine learning (pp. 161–168). Omnipress, Montreal, Canada.Google Scholar
  24. Cheng, W., Hüllermeier, E., Waegeman, W., & Welker, V. (2012). Label ranking with partial abstention based on thresholded probabilistic models. In Proceedings NIPS—2012, 26th annual conference on neural information processing systems, Lake Tahoe, Nevada, US.Google Scholar
  25. Cheng, W., Rademaker, M., De Beats, B., & Hüllermeier, E. (2010b). Predicting partial orders: Ranking with abstention. In Proceedings ECML/PKDD—2010, European conference on machine learning and principles and practice of knowledge discovery in databases, Barcelona, Spain.Google Scholar
  26. Cheng, W., Dembczyński, K., & Hüllermeier, E. (2010a). Label ranking methods based on the Plackett–Luce model. In J. Fürnkranz & T. Joachims (Eds.), Proceedings ICML, 27th international conference on machine learning (pp. 215–222). Haifa: Omnipress.Google Scholar
  27. Cohen, W., Schapire, R., & Singer, Y. (1999). Learning to order things. Journal of Artificial Intelligence Research, 10(1), 243–270.MathSciNetzbMATHGoogle Scholar
  28. Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57(3), 145–158.CrossRefGoogle Scholar
  29. David, H. A. (1969). The method of paired comparisons. London: Griffin.Google Scholar
  30. De Leeuw, J., & Mair, P. (2009). Multidimensional scaling using majorization: SMACOF in R. Journal of Statistical Software, 31(1), 1–30.Google Scholar
  31. De Leeuw, J. (1977). Applications of convex analysis to multidimensional scaling. In J. R. Barra, F. Brodeau, G. Romier, & B. Van Cutsem (Eds.), Recent developments in statistics (pp. 133–146). North Holland.Google Scholar
  32. De Leeuw, J., & Heiser, W. J. (1977). Convergence of correction matrix algorithms for multidimensional scaling. In J. C. Lingoes (Ed.), Geometric representations of relational data (pp. 735–752). Ann Arbor, MI: Mathesis Press.Google Scholar
  33. Dekel, O., Singer, Y., & Manning, C. D. (2004). Log-linear models for label ranking. In S. Thrun, L. K. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems (Vol. 16, pp. 497–504). Cambridge: MIT Press.Google Scholar
  34. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society Series B (methodological), 39(1), 1–38.MathSciNetzbMATHGoogle Scholar
  35. Fürnkranz, J., & Hüllermeier, E. (2010). Preference learning: An introduction. In J. Fürnkranz & E. Hüllermeier (Eds.), Preference learning (pp. 1–17). New York: Springer.Google Scholar
  36. Fürnkranz, J., Hüllermeier, E., Mencía, E., & Brinker, K. (2008). Multilabel classification via calibrated label ranking. Machine Learning, 73(2), 133–153.CrossRefGoogle Scholar
  37. Griffin, G., Holub, A., & Perona, P. (2007). Caltech-256 object category dataset. Caltech Mimeo, 11, 20.Google Scholar
  38. Groenen, P. J., & Heiser, W. J. (1996). The tunneling method for global optimization in multidimensional scaling. Psychometrika, 61(3), 529–550.CrossRefzbMATHGoogle Scholar
  39. Groenen, P., & van de Velden, M. (2016). Multidimensional scaling by majorization: A review. Journal of Statistical Software, 73(1), 1–26.Google Scholar
  40. Guiver, J., & Snelson, E. (2009). Bayesian inference for plackett-luce ranking models. In Proceedings ICML, 26th international conference on machine learning (pp. 377–384). ACM, ICML ’09.Google Scholar
  41. Har-Peled, S., Roth, D., & Zimak, D. (2002a). Constraint classification: A new approach to multiclass classification. In Proceedings ALT, 13th international conference on algorithmic learning theory (pp. 365–379). Springer.Google Scholar
  42. Har-Peled, S., Roth, D., & Zimak, D. (2002b). Constraint classification for multiclass classification and ranking. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (Vol. 15, pp. 809–816). Cambridge: MIT Press.Google Scholar
  43. Hüllermeier, E., Fürnkranz, J., Cheng, W., & Brinker, K. (2008). Label ranking by learning pairwise preferences. Artificial Intelligence, 172(16), 1897–1916.MathSciNetCrossRefzbMATHGoogle Scholar
  44. Hüllermeier, E., & Vanderlooy, S. (2009). Why fuzzy decision trees are good rankers. IEEE Transactions on Fuzzy Systems, 17(6), 1233–1244.CrossRefGoogle Scholar
  45. Hunter, D. R. (2004). MM algorithms for generalized Bradley–Terry models. Annals of Statistics, 32(1), 384–406.MathSciNetCrossRefzbMATHGoogle Scholar
  46. Huybrechts, G. (2016). Learning to rank with deep neural networks. Master’s thesis, Ecole polytechnique de Louvain (EPL).Google Scholar
  47. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093.
  48. Kamishima, T., Kazawa, H., & Akaho, S. (2011). A survey and empirical comparison of object ranking methods. In Preference learning (pp .181–201). Springer.Google Scholar
  49. Kanda, J., Soares, C., Hruschka, E. R., & de Carvalho, A.C.P.L.F. (2012). A meta-learning approach to select meta-heuristics for the traveling salesman problem using MLP-based label ranking. In Proceedings ICONIP, 19th international conference on neural information processing (pp. 488–495). Springer, Doha, Qatar.Google Scholar
  50. Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1/2), 81–93.CrossRefzbMATHGoogle Scholar
  51. Kidwell, P., Lebanon, G., & Cleveland, W. (2008). Visualizing incomplete and partially ranked data. IEEE Transactions on Visualization and Computer Graphics, 14(6), 1356–1363.CrossRefGoogle Scholar
  52. Krantz, D. H. (1967). Rational distance functions for multidimensional scaling. Journal of Mathematical Psychology, 4(2), 226–245.CrossRefzbMATHGoogle Scholar
  53. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 25, pp. 1097–1105). Red Hook: Curran Associates, Inc.Google Scholar
  54. Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1), 1–27.MathSciNetCrossRefzbMATHGoogle Scholar
  55. Lange, K. (2016). MM optimization algorithms. Philadelphia: Society for Industrial and Applied Mathematics (SIAM).CrossRefzbMATHGoogle Scholar
  56. Lange, K., Hunter, D., & Yang, I. (2000). Optimization transfer using surrogate objective functions. Journal of Computational and Graphical Statistics, 9, 1–20.MathSciNetGoogle Scholar
  57. Larochelle, H., Erhan, D., & Bengio, Y. (2008). Zero-data learning of new tasks. In Proceedings of the AAAI’08, 23rd national conference on artificial intelligence (pp. 646–651).Google Scholar
  58. Larrañaga, P., Kuijpers, C. M. H., Murga, R. H., Inza, I., & Dizdarevic, S. (1999). Genetic algorithms for the traveling salesman problem: A review of representations and operators. Artificial Intelligence Review, 13, 129–170. Scholar
  59. Lichman, M. (2013). UCI Machine Learning Repository. School of Information and Computer Sciences, University of California, Irvine.
  60. Liu, T. (2011). Learning to rank for information retrieval. New York: Springer.CrossRefzbMATHGoogle Scholar
  61. Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1–3), 503–528.MathSciNetCrossRefzbMATHGoogle Scholar
  62. Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.zbMATHGoogle Scholar
  63. Luce, R. D. (1961). A choice theory analysis of similarity judgments. Psychometrika, 26(2), 151–163.CrossRefzbMATHGoogle Scholar
  64. Luenberger, D. G. (1973). Introduction to linear and nonlinear programming. Reading, MA: Addison-Wesley.zbMATHGoogle Scholar
  65. Luo, T., Wang, D., Liu, R., & Pan, Y. (2015). Stochastic top-k listnet. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 676–684). Association for Computational Linguistics, Lisbon, Portugal.Google Scholar
  66. Mallows, C. L. (1957). Non-null ranking models. Biometrika, 44(1/2), 114–130.MathSciNetCrossRefzbMATHGoogle Scholar
  67. Marden, J. I. (1995). Analyzing and modeling rank data. London: Chapman & Hall.zbMATHGoogle Scholar
  68. Maystre, L., & Grossglauser, M. (2015). Fast and accurate inference of Plackett–Luce models. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 28, pp. 172–180). Red Hook: Curran Associates, Inc.Google Scholar
  69. Menke, J. E., & Martinez, T. R. (2008). A Bradley–Terry artificial neural network model for individual ratings in group competitions. Neural Computing and Applications, 17(2), 175–186.CrossRefGoogle Scholar
  70. Menon, A. K., & Elkan, C. (2010a). Dyadic prediction using a latent feature log-linear model. arXiv:1006.2156.
  71. Menon, A. K., & Elkan, C. (2010b). A log-linear model with latent features for dyadic prediction. In Proceedings of the 2010 IEEE international conference on data mining (pp. 364–373). IEEE Computer Society, ICDM ’10.Google Scholar
  72. Menon, A. K., & Elkan, C. (2010c). Predicting labels for dyadic data. Data Mining and Knowledge Discovery, 21(2), 327–343.MathSciNetCrossRefGoogle Scholar
  73. Meulman, J. J., Van der Kooj, A. J., & Heiser, W. J. (2004). Principal components analysis with nonlinear optimal scaling transformations for ordinal and nominal data. The Sage handbook of quantitative methodology for the social sciences (pp. 49–72). London: Sage.Google Scholar
  74. Mitchell, M. (1998). An introduction to genetic algorithms. Cambridge, MA: MIT Press.zbMATHGoogle Scholar
  75. Murata, N., Kitazono, J., & Ozawa, S. (2017). Multidimensional unfolding based on stochastic neighbor relationship. In Proceedings of the 9th international conference on machine learning and computing (pp. 248–252).Google Scholar
  76. Pahikkala, T., Stock, M., Airola, A., Aittokallio, T., De Baets, B., & Waegeman, W. (2014). A two-step learning approach for solving full and almost full cold start problems in dyadic prediction. In T. Calders, F. Esposito, E. Hüllermeier, & R. Meo (Eds.), Lecture notes in computer science (Vol. 8725, pp. 517–532). Springer.Google Scholar
  77. Pahikkala, T., Waegeman, W., Airola, A., Salakoski, T., & De Baets, B. (2010). Conditional ranking on relational data. In Proceedings ECML/PKDD European conference on machine learning and knowledge discovery in databases (pp. 499–514). Springer.Google Scholar
  78. Pahikkala, T., & Airola, A. (2016). RLScore: Regularized least-squares learners. Journal of Machine Learning Research, 17(221), 1–5.MathSciNetzbMATHGoogle Scholar
  79. Pahikkala, T., Airola, A., Stock, M., De Baets, B., & Waegeman, W. (2013). Efficient regularized least-squares algorithms for conditional ranking on relational data. Machine Learning, 93, 321–356.MathSciNetCrossRefzbMATHGoogle Scholar
  80. Plackett, R. L. (1975). The analysis of permutations. Journal of the Royal Statistical Society. Series C (Applied Statistics), 24(2), 193–202.MathSciNetGoogle Scholar
  81. Prechelt, L. (2012). Early stopping: But when? In G. Montavon, G. B. Orr, & K.-R. Müller (Eds.), Neural networks: Tricks of the trade (pp. 53–67). Springer.Google Scholar
  82. Ribeiro, G., Duivesteijn, W., Soares, C., & Knobbe, A. J. (2012). Multilayer perceptron for label ranking. In Proceedings ICANN, 22nd international conference on artificial neural networks (pp. 25–32). Springer, Lausanne, Switzerland.Google Scholar
  83. Rigutini, L., Papini, T., Maggini, M., & Scarselli, F. (2011). Sortnet: Learning to rank by a neural preference function. IEEE Transactions on Neural Networks, 22(9), 1368–1380.CrossRefGoogle Scholar
  84. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 9.CrossRefzbMATHGoogle Scholar
  85. Schäfer, D., & Hüllermeier, E. (2015). Dyad ranking using a bilinear Plackett–Luce model. In Proceedings ECML/PKDD—2015, European conference on machine learning and knowledge discovery in databases (pp. 227–242). Springer, Porto, Portugal.Google Scholar
  86. Schäfer, D., & Hüllermeier, E. (2016). Plackett–Luce networks for dyad ranking. In Workshop LWDA, “Lernen, Wissen, Daten, Analysen”. Potsdam, Germany.Google Scholar
  87. Soufiani, H., Parkes, D., & Xia, L. (2014). Computing parametric ranking models via rank-breaking. In Proceedings of ICML, 31st international conference on machine learning, Beijing, China.Google Scholar
  88. Tesauro, G. (1989). Connectionist learning of expert preferences by comparison training. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems (NIPS-1988) (Vol. 1, pp. 99–106). Los Altos: Morgan Kaufmann.Google Scholar
  89. Trohidis, K., Tsoumakas, G., Kalliris, G., & Vlahavas, I. (2008). Multilabel classification of music into emotions. In Proceedings of ISMIR 2008, international conference on music information retrieval (pp. 325–330), Philadelphia, PA, USA.Google Scholar
  90. Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.MathSciNetzbMATHGoogle Scholar
  91. Tsoumakas, I., & Katakis, I. (2007). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.CrossRefGoogle Scholar
  92. Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gulliksen & S. Messick (Eds.), Psychological scaling: Theory and applications (pp. 155–167). New York: Wiley.Google Scholar
  93. Van Deun, K., Groenen, P., & Delbeke, L. (2005). VIPSCAL: A combined vector ideal point model for preference data. Econometric Institute Report No. EI 2005-03, Erasmus University Rotterdam.Google Scholar
  94. Vembu, S., & Gärtner, T. (2010). Label ranking algorithms: A survey. In J. Fürnkranz & E. Hüllermeier (Eds.), Preference Learning (pp. 45–64). New York: Springer.CrossRefGoogle Scholar
  95. Weimer, M., Karatzoglou, A., Le, Q. V., & Smola, A. J. (2007). COFI RANK: Maximum margin matrix factorization for collaborative ranking. In J. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems (Vol. 20, pp. 1593–1600). Cambridge: MIT Press.Google Scholar
  96. Werbos, P. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA.Google Scholar
  97. Zhang, M. L., & Zhou, Z. H. (2006). Multilabel neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering, 18(10), 1338–1351.CrossRefGoogle Scholar
  98. Zhou, Y., Liu, Y., Yang, J., He, X., & Liu, L. (2014). A taxonomy of label ranking algorithms. Journal of Computers, 9(3), 557–565.Google Scholar

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. 1.Department of Computer SciencePaderborn UniversityPaderbornGermany

Personalised recommendations