Active Sampling for Rank Learning via Optimizing the Area under the ROC Curve

Donmez, Pinar; Carbonell, Jaime G.

doi:10.1007/978-3-642-00958-7_10


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Included in the following conference series:

European Conference on Information Retrieval


Abstract

Learning ranking functions is crucial for solving many problems, ranging from document retrieval to building recommendation systems based on an individual user’s preferences or on collaborative filtering. Learning-to-rank is particularly necessary for adaptive or personalizable tasks such as email prioritization, individualized recommendation systems, and personalized news clipping services. Although the learning-to-rank challenge has been addressed in the literature, little work has been done in an active-learning framework, where the required user feedback is minimized by selecting only the most informative instances to train the rank learner. This paper addresses active rank-learning head on, proposing a new sampling strategy based on minimizing hinge rank loss, and demonstrating the effectiveness of the active sampling method for rankSVM on two standard rank-learning datasets. The proposed method shows convincing results in optimizing three performance metrics, as well as improvement against four baselines: entropy-based, divergence-based, uncertainty-based and random sampling methods.
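
The link between the area under the ROC curve (AUC) and a hinge-style rank loss, which the title and abstract rely on, can be illustrated with a small sketch. This is not the paper's algorithm or its exact loss definition: the function names, the toy scores, and the pairwise form of the hinge loss with margin 1 are assumptions chosen for illustration. It shows only that AUC is the fraction of correctly ordered (relevant, non-relevant) pairs, and that a pairwise hinge loss upper-bounds 1 − AUC.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Wilcoxon-Mann-Whitney statistic).
    """
    pairs = list(product(pos_scores, neg_scores))
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return correct / len(pairs)


def pairwise_hinge_rank_loss(pos_scores, neg_scores, margin=1.0):
    """Mean hinge penalty over all (positive, negative) pairs.

    Hypothetical pairwise surrogate: a pair is penalized unless the positive
    score exceeds the negative score by at least `margin`.
    """
    pairs = list(product(pos_scores, neg_scores))
    return sum(max(0.0, margin - (p - n)) for p, n in pairs) / len(pairs)


# Toy scores (made up for illustration): one relevant item is ranked
# below one non-relevant item, so one of the four pairs is misordered.
pos = [2.0, 0.5]
neg = [1.0, -1.0]

auc = pairwise_auc(pos, neg)
loss = pairwise_hinge_rank_loss(pos, neg)
print(f"AUC = {auc:.3f}, hinge rank loss = {loss:.3f}")
```

With a margin of 1, every misordered or tied pair contributes at least as much to the hinge loss as it does to 1 − AUC, so driving the hinge loss down also pushes AUC up. That is the general reason a hinge rank loss can serve as an optimization target for AUC.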



Author information

Authors and Affiliations

Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA, USA
Pinar Donmez & Jaime G. Carbonell


Editor information

Editors and Affiliations

Université de Toulouse - IRIT, 118 Route de Narbonne, 31062 Toulouse Cedex 4, France
Mohand Boughanem
Laboratoire d’Informatique de Grenoble, BP 53, Université Joseph Fourier, 38041 Grenoble Cedex 9, France
Catherine Berrut
Université de Toulouse - IRIT, 118 Route de Narbonne, 31062 Toulouse Cedex 4, France
Josiane Mothe & Chantal Soule-Dupuy


About this paper

Cite this paper

Donmez, P., Carbonell, J.G. (2009). Active Sampling for Rank Learning via Optimizing the Area under the ROC Curve. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_10


DOI: https://doi.org/10.1007/978-3-642-00958-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science, Computer Science (R0)
