Subset Ranking Using Regression

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4005)

Abstract

We study the subset ranking problem, motivated by its important application in web search. In this context, we consider the standard DCG (discounted cumulated gain) criterion, which measures the quality of items near the top of the rank-list. As with error minimization for binary classification, the DCG criterion leads to a non-convex optimization problem that can be NP-hard. Therefore, a computationally more tractable approach is needed. We present bounds that relate the approximate optimization of DCG to the approximate minimization of certain regression errors. These bounds justify the use of convex learning formulations for solving the subset ranking problem. The resulting estimation methods are not conventional, in that we focus on the estimation quality in the top portion of the rank-list. We further investigate the generalization ability of these formulations. Under appropriate conditions, the consistency of the estimation schemes with respect to the DCG metric can be derived.
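
As a concrete illustration of the quantities in the abstract, the sketch below computes a DCG-style metric with the common 1 / log2(1 + position) discount and ranks a small item subset by least-squares regression scores. The synthetic data, the particular discount, and the plain least-squares fit are illustrative assumptions, not the exact weighted estimation schemes analyzed in the paper.

import numpy as np

def dcg_at_k(relevance, k):
    """Discounted cumulated gain of a ranked list of relevance grades.

    Uses the common 1 / log2(1 + position) discount; the paper's bounds
    allow more general decreasing discount weights, so this is one choice.
    """
    rel = np.asarray(relevance, dtype=float)[:k]
    positions = np.arange(1, len(rel) + 1)
    return float(np.sum(rel / np.log2(1.0 + positions)))

# Toy subset-ranking setup (hypothetical data): each item in the subset has a
# feature vector x and a graded relevance r; we fit a regression function
# f(x) ~ r and rank the subset by the predicted scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                      # 20 items, 5 features
w_true = rng.normal(size=5)
relevance = X @ w_true + 0.1 * rng.normal(size=20)

# Least-squares estimate of the relevance function.
w_hat, *_ = np.linalg.lstsq(X, relevance, rcond=None)
scores = X @ w_hat

# Rank by predicted score and evaluate the DCG of the induced ordering
# against the best achievable (ideal) ordering.
order = np.argsort(-scores)
print("DCG@5 of regression ranking:", dcg_at_k(relevance[order], 5))
print("DCG@5 of ideal ranking:     ", dcg_at_k(np.sort(relevance)[::-1], 5))

In the paper's setting the regression error is emphasized for items near the top of the rank-list; the uniform least-squares objective above is only a simplified stand-in for those formulations.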


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cossock, D., Zhang, T. (2006). Subset Ranking Using Regression. In: Lugosi, G., Simon, H.U. (eds) Learning Theory. COLT 2006. Lecture Notes in Computer Science, vol 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_44

  • DOI: https://doi.org/10.1007/11776420_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35294-5

  • Online ISBN: 978-3-540-35296-9

  • eBook Packages: Computer Science, Computer Science (R0)
