Pattern Analysis and Applications

Volume 16, Issue 4, pp 475–496

An empirical comparison of learning algorithms for nonparametric scoring: the TreeRank algorithm and other methods

  • Stéphan Clémençon
  • Marine Depecker
  • Nicolas Vayatis


The TreeRank algorithm was recently proposed in [1] and [2] as a scoring-based method relying on recursive partitioning of the input space. This tree induction algorithm builds orderings by recursively optimizing the Receiver Operating Characteristic (ROC) curve through a one-step optimization procedure called LeafRank. One of the aims of this paper is an in-depth analysis of the empirical performance of the variants of the TreeRank/LeafRank method. Numerical experiments based on both artificial and real data sets are provided. Further experiments using resampling and randomization, in the spirit of bagging and random forests [3, 4], are developed, and we show how they increase both stability and accuracy in bipartite ranking. Moreover, an empirical comparison with other efficient scoring algorithms, such as RankBoost and RankSVM, is presented on UCI benchmark data sets.
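The two quantities at the heart of the abstract, the empirical AUC and a split chosen to locally maximize it, can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the names `empirical_auc` and `best_axis_split` are assumptions, and the axis-parallel threshold scan is a much-simplified surrogate for the actual LeafRank procedure of [1, 2].

```python
import numpy as np

def empirical_auc(scores, labels):
    """Empirical AUC: fraction of (positive, negative) pairs ranked
    correctly by the scores, counting ties as one half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # all pairwise comparisons
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def best_axis_split(X, y):
    """One simplified LeafRank-style step: scan axis-parallel thresholds
    and keep the split whose two-leaf scoring rule (score 1 on one side,
    0 on the other) has maximal empirical AUC."""
    best_auc, best_feature, best_threshold = 0.0, None, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            s = (X[:, j] > t).astype(float)
            auc = empirical_auc(s, y)
            auc = max(auc, 1.0 - auc)  # the split's orientation is free
            if auc > best_auc:
                best_auc, best_feature, best_threshold = auc, j, t
    return best_auc, best_feature, best_threshold
```

TreeRank grows a ranking tree by applying such a step recursively to each cell of the partition, so that the leaves, read off in order, define a piecewise-constant scoring rule.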


Scoring rules · Ranking trees · ROC curve · AUC maximization · Resampling · Feature randomization



We warmly thank Cynthia Rudin who kindly provided the code for the P-norm Push algorithm.


References

  1. Clémençon S, Vayatis N (2009) Tree-based ranking methods. IEEE Trans Inf Theory 55(9):4316–4336
  2. Clémençon S, Depecker M, Vayatis N (2011) Adaptive partitioning schemes for bipartite ranking. J Mach Learn 43(1):31–69
  3. Clémençon S, Depecker M, Vayatis N (2009) Bagging ranking trees. In: Proceedings of ICMLA, international conference on machine learning and applications
  4. Clémençon S, Vayatis N (2010) Ranking forests (to be published)
  5. Freund Y, Iyer R, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969
  6. Hastie T, Tibshirani R (1990) Generalized additive models. Chapman & Hall, Boca Raton
  7. Zhu J, Hastie T (2005) Kernel logistic regression and the import vector machine. J Comput Graph Stat 14(1):185–205
  8. Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28(2):337–407
  9. Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 133–142
  10. Pahikkala T, Tsivtsivadze E, Airola A, Boberg J, Salakoski T (2007) Learning to rank with pairwise regularized least-squares. In: Proceedings of SIGIR 2007 workshop on learning to rank for information retrieval, pp 27–33
  11. Burges C, Shaked T, Renshaw E, Lazier A, Deeds M, Hamilton N, Hullender G (2005) Learning to rank using gradient descent. In: Proceedings of ICML, 22nd international conference on machine learning, pp 89–96
  12. Dodd L, Pepe M (2003) Partial AUC estimation and regression. Biometrics 59(3):614–623
  13. Clémençon S, Vayatis N (2007) Ranking the best instances. J Mach Learn Res 8:2671–2699
  14. Clémençon S, Vayatis N (2008) Empirical performance maximization for linear rank statistics. In: Proceedings of NIPS'08, conference on neural information processing systems, pp 305–312
  15. Rudin C (2009) The P-norm push: a simple convex ranking algorithm that concentrates at the top of the list. J Mach Learn Res 10:2233–2271
  16. Robertson S, Zaragoza H (2007) On rank-based effectiveness measures and optimization. Inf Retr 10(3):321–339
  17. Bartlett P, Jordan M, McAuliffe J (2006) Convexity, classification, and risk bounds. J Am Stat Assoc 101(473):138–156
  18. Bartlett P, Tewari A (2007) Sparseness vs estimating conditional probabilities: some asymptotic results. J Mach Learn Res 8:775–790
  19. Mease D, Wyner A (2008) Evidence contrary to the statistical view of boosting. J Mach Learn Res 9:131–156
  20. Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, Berlin
  21. Clémençon S, Vayatis N (2010) Overlaying classifiers: a practical approach for optimal scoring. Constr Approx 32(3):619–648
  22. Boucheron S, Bousquet O, Lugosi G (2005) Theory of classification: a survey of recent advances. ESAIM Probab Stat 9:323–375
  23. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a ROC curve. Radiology 143:29–36
  24. Clémençon S, Lugosi G, Vayatis N (2008) Ranking and empirical risk minimization of U-statistics. Ann Stat 36:844–874
  25. Ailon N, Mohri M (2010) Preference-based learning to rank. Mach Learn J 80(2):189–211
  26. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, Monterey
  27. Bach FR, Heckerman D, Horvitz E (2006) Considering cost asymmetry in learning classifiers. J Mach Learn Res 7:1713–1741

Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  • Stéphan Clémençon (1)
  • Marine Depecker (1)
  • Nicolas Vayatis (2)
  1. Télécom ParisTech, Paris, France
  2. ENS Cachan, UniverSud, Cachan, France