Abstract
Learning ranking functions is crucial for solving many problems, ranging from document retrieval to building recommendation systems based on an individual user’s preferences or on collaborative filtering. Learning-to-rank is particularly necessary for adaptive or personalizable tasks, such as email prioritization, individualized recommendation systems and personalized news clipping services. Although the learning-to-rank challenge has been addressed in the literature, little work has been done in an active-learning framework, where the required user feedback is minimized by selecting only the most informative instances to train the rank learner. This paper addresses active rank-learning head on, proposing a new sampling strategy based on minimizing hinge rank loss and demonstrating the effectiveness of the active sampling method for rankSVM on two standard rank-learning datasets. The proposed method shows convincing results in optimizing three performance metrics, as well as improvement over four baselines: entropy-based, divergence-based, uncertainty-based and random sampling methods.
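To make the two quantities named in the abstract and title concrete, the following minimal sketch computes the area under the ROC curve (as the Wilcoxon-Mann-Whitney pair statistic) and one common pairwise form of the hinge rank loss for a binary-relevance ranking. It is an illustration only and does not reproduce the paper's sampling strategy. The function names, the unit margin and the pairwise averaging are assumptions made for this example.

```python
import numpy as np


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (relevant, non-relevant) pairs ranked correctly.

    Ties between a relevant and a non-relevant score count as half a
    correct pair (the usual Wilcoxon-Mann-Whitney convention).
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    # diff[i, j] = score of relevant item i minus score of non-relevant item j
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def hinge_rank_loss(scores_pos, scores_neg, margin=1.0):
    """Mean pairwise hinge loss (assumed form, chosen for illustration).

    Each pair is penalized by max(0, margin - diff), so a relevant item
    must outscore a non-relevant one by at least `margin` to incur no loss.
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean(np.maximum(0.0, margin - diff)))


if __name__ == "__main__":
    # Hypothetical ranker scores for three relevant and two non-relevant items.
    pos = [2.0, 0.5, 1.5]
    neg = [1.0, 0.0]
    print("AUC:", auc(pos, neg))
    print("hinge rank loss:", hinge_rank_loss(pos, neg))
```

With a margin of 1, every misordered or tied pair contributes at least as much to this hinge loss as it does to 1 - AUC, so the loss is an upper bound on 1 - AUC. This is why reducing a hinge-type rank loss is a reasonable surrogate for increasing the AUC.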
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Donmez, P., Carbonell, J.G. (2009). Active Sampling for Rank Learning via Optimizing the Area under the ROC Curve. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_10
DOI: https://doi.org/10.1007/978-3-642-00958-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science, Computer Science (R0)