
Active Sampling for Rank Learning via Optimizing the Area under the ROC Curve

  • Conference paper
Advances in Information Retrieval (ECIR 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5478))


Abstract

Learning ranking functions is crucial for solving many problems, ranging from document retrieval to building recommendation systems based on an individual user’s preferences or on collaborative filtering. Learning-to-rank is particularly necessary for adaptive or personalizable tasks, including email prioritization, individualized recommendation systems, personalized news clipping services and so on. Whereas the learning-to-rank challenge has been addressed in the literature, little work has been done in an active-learning framework, where requisite user feedback is minimized by selecting only the most informative instances to train the rank learner. This paper addresses active rank-learning head on, proposing a new sampling strategy based on minimizing hinge rank loss, and demonstrating the effectiveness of the active sampling method for rankSVM on two standard rank-learning datasets. The proposed method shows convincing results in optimizing three performance metrics, as well as improvement against four baselines including entropy-based, divergence-based, uncertainty-based and random sampling methods.
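To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the paper's method. It computes the AUC as the Wilcoxon-Mann-Whitney statistic over (relevant, non-relevant) pairs, together with a pairwise hinge surrogate of the kind commonly used with ranking SVMs. The function names, the `margin` parameter, and the normalization by the number of pairs are assumptions made for illustration; the exact hinge rank loss and the sampling criterion used in the paper may differ.

```python
from itertools import product


def _split(scores, labels):
    """Separate scores of relevant (label 1) and non-relevant (label 0) items."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one non-relevant item")
    return pos, neg


def auc(scores, labels):
    """AUC as the fraction of (relevant, non-relevant) pairs ranked correctly.

    Ties count as half a correct pair (Wilcoxon-Mann-Whitney statistic).
    """
    pos, neg = _split(scores, labels)
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def pairwise_hinge_loss(scores, labels, margin=1.0):
    """Mean hinge penalty over (relevant, non-relevant) pairs.

    Illustrative surrogate (an assumption, not the paper's definition):
    a pair is penalized unless the relevant item outscores the
    non-relevant one by at least `margin`.
    """
    pos, neg = _split(scores, labels)
    total = sum(max(0.0, margin - (p - n)) for p, n in product(pos, neg))
    return total / (len(pos) * len(neg))


# Toy example: two relevant and two non-relevant items.
scores = [2.0, 0.5, 1.0, -1.0]
labels = [1, 1, 0, 0]
print(auc(scores, labels))                  # fraction of correctly ordered pairs
print(pairwise_hinge_loss(scores, labels))  # mean hinge penalty per pair
```

With a margin of 1, the per-pair hinge penalty is never smaller than the per-pair ranking error, so the mean hinge loss in this sketch upper-bounds 1 - AUC. That relationship is what makes a hinge-type loss a usable surrogate when the goal is to optimize the AUC.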




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Donmez, P., Carbonell, J.G. (2009). Active Sampling for Rank Learning via Optimizing the Area under the ROC Curve. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_10


  • DOI: https://doi.org/10.1007/978-3-642-00958-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science (R0)
