Machine Learning, Volume 91, Issue 1, pp 67–104

Ranking data with ordinal labels: optimality and pairwise aggregation

  • Stéphan Clémençon
  • Sylvain Robbiano
  • Nicolas Vayatis

Abstract

The paper describes key insights into the nature of K-partite ranking. On the theoretical side, the various characterizations of optimal elements are fully described, as is the likelihood ratio monotonicity condition on the underlying distribution that guarantees such elements exist. Then, a pairwise aggregation procedure based on Kendall tau is introduced to relate learning rules dedicated to bipartite ranking to solutions of the K-partite ranking problem. Criteria reflecting ranking performance under these conditions, such as the ROC surface and its natural summary, the volume under the ROC surface (VUS), are then considered as targets for empirical optimization. The consistency of pairwise aggregation strategies is studied under these criteria, and these strategies are shown to be efficient under reasonable assumptions. Finally, numerical results illustrate the relevance of the proposed methodology.
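The two empirical quantities the abstract relies on, the VUS and the Kendall tau distance used for pairwise aggregation, can be illustrated with a minimal Python sketch. It is not the authors' procedure. It assumes K = 3 labels coded 1, 2, 3 and scoring rules represented by their score vectors on a common sample. For the VUS, ties count as failures. For Kendall tau, a pair tied under exactly one score counts 1/2. The median is searched only among the given candidates. All function names are hypothetical.

```python
from itertools import combinations


def empirical_vus3(scores, labels):
    """Empirical VUS for K = 3 ordinal labels coded 1, 2, 3.

    Returns the fraction of triples (one example from each class) whose
    scores increase strictly with the label. Ties count as failures here,
    which is an assumption of this sketch.
    """
    by_class = {k: [s for s, y in zip(scores, labels) if y == k] for k in (1, 2, 3)}
    if any(not v for v in by_class.values()):
        raise ValueError("each of the labels 1, 2, 3 needs at least one example")
    hits = total = 0
    for a in by_class[1]:
        for b in by_class[2]:
            for c in by_class[3]:
                total += 1
                if a < b < c:
                    hits += 1
    return hits / total


def kendall_tau_distance(s1, s2):
    """Fraction of sample pairs ordered differently by two score vectors.

    A pair tied under one score but not the other counts 1/2 (an assumed
    convention for handling ties).
    """
    n = len(s1)
    if n != len(s2) or n < 2:
        raise ValueError("need two score vectors of equal length >= 2")
    disc = 0.0
    for i, j in combinations(range(n), 2):
        d1, d2 = s1[i] - s1[j], s2[i] - s2[j]
        if d1 * d2 < 0:
            disc += 1.0  # discordant pair
        elif (d1 == 0) != (d2 == 0):
            disc += 0.5  # tied in exactly one of the two rankings
    return disc / (n * (n - 1) / 2)


def restricted_median(candidates):
    """Index of the candidate minimizing the summed Kendall tau distance to
    all candidates (e.g. bipartite scorers for each pair of labels).

    Restricting the search to the candidates is a simplification: computing
    an exact Kemeny median over all rankings is NP-hard in general.
    """
    costs = [sum(kendall_tau_distance(c, o) for o in candidates) for c in candidates]
    return min(range(len(candidates)), key=costs.__getitem__)


print(empirical_vus3([0.1, 0.4, 0.5, 0.8, 0.9], [1, 2, 1, 3, 2]))  # 0.25
print(kendall_tau_distance([1, 2, 3], [1, 3, 2]))  # 0.3333333333333333
```

In a K-partite setting, the candidates would be the K(K-1)/2 bipartite scoring rules evaluated on a shared sample. A more faithful median search would also explore scoring rules beyond these candidates.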

Keywords

K-partite ranking · Ordinal data · ROC surface · Volume under the ROC surface · Empirical risk minimization · Median ranking


Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Stéphan Clémençon (1)
  • Sylvain Robbiano (1)
  • Nicolas Vayatis (2)

  1. LTCI UMR Telecom ParisTech/CNRS No. 5141, Telecom ParisTech, Paris cedex 13, France
  2. CMLA UMR CNRS No. 8536, ENS Cachan & UniverSud, Cachan cedex, France
