Learning to Rank
Many tasks in information retrieval, natural language processing, and data mining are essentially ranking problems. These include document retrieval, expert search, question answering, collaborative filtering, and keyphrase extraction. Learning to rank is a subarea of machine learning that studies methodologies and theories for automatically constructing a model from data for a ranking problem (Liu T-Y, Found Trends Inf Retr 3(3):225–331, 2009; Li H, Synth Lect Hum Lang Technol 4(1):1–113, 2011a; Li H, IEICE Trans Inf Syst 94-D(10):1854–1862, 2011b). Learning to rank is usually formalized as a supervised learning task, although unsupervised and semi-supervised formulations are also possible. In learning, the training data consist of sets of objects together with a total or partial order on the objects in each set, and a ranking model is learned from these data. In prediction, a new set of objects is given, and a ranked list of the objects is created using the ranking model. Learning to rank has been studied intensively over the past decade, and many methods have been proposed. Popular methods include Ranking SVM, IR SVM, AdaRank, LambdaRank, and LambdaMART. The methods can be categorized into pointwise, pairwise, and listwise approaches according to the loss functions they use. Learning-to-rank methods such as LambdaMART are known to be employed in a number of commercial web search engines. In this entry, we describe the formulation as well as several methods of learning to rank. Without loss of generality, we take document retrieval as an example.
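The supervised formulation above (learn from sets of objects with orders, then rank a new set) can be illustrated with a minimal sketch of the pairwise approach. It is written in the spirit of Ranking SVM, using a linear scoring function and a hinge loss on preference pairs, and is not the implementation of any method cited in this entry. All function names, the toy feature vectors, the relevance grades, and the hyperparameters are assumptions made for illustration.

```python
import random

def pairwise_preferences(docs):
    """docs: list of (feature_vector, relevance_grade) for ONE query.
    Returns (preferred, other) feature pairs for documents whose grades differ."""
    return [(xi, xj) for xi, yi in docs for xj, yj in docs if yi > yj]

def score(w, x):
    """Linear ranking model: f(x) = <w, x>."""
    return sum(wk * xk for wk, xk in zip(w, x))

def train_pairwise(queries, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Stochastic subgradient descent on an L2-regularized pairwise hinge loss.
    Pairs are formed only within a query, never across queries."""
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = [p for docs in queries for p in pairwise_preferences(docs)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for xi, xj in pairs:
            margin = score(w, xi) - score(w, xj)
            for k in range(dim):
                grad = reg * w[k]
                if margin < 1.0:  # preferred document not yet ahead by the margin
                    grad -= xi[k] - xj[k]
                w[k] -= lr * grad
    return w

def rank(w, docs):
    """Prediction: indices of the new documents, best first."""
    return sorted(range(len(docs)), key=lambda i: score(w, docs[i]), reverse=True)

# Hypothetical training data: two queries, two features per document,
# integer relevance grades (higher means more relevant).
queries = [
    [([1.0, 0.2], 2), ([0.5, 0.9], 1), ([0.1, 0.4], 0)],
    [([0.9, 0.7], 1), ([0.2, 0.1], 0)],
]
w = train_pairwise(queries, dim=2)

# New documents for an unseen query; they differ only in the first feature.
new_docs = [[0.2, 0.5], [0.8, 0.5], [0.5, 0.5]]
ranking = rank(w, new_docs)
print(ranking)  # -> [1, 2, 0]
```

A pointwise method would instead fit each document's grade independently, and a listwise method would define the loss on the whole ranked list for a query. The pairwise form is shown here only because it is the shortest to write down.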
- Burges CJC (2010) From RankNet to LambdaRank to LambdaMART: an overview. Microsoft Research Technical Report, MSR-TR-2010-82
- Cao Y, Xu J, Liu T-Y, Li H, Huang Y, Hon H-W (2006) Adapting ranking SVM to document retrieval. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, Seattle, pp 186–193
- Herbrich R, Graepel T, Obermayer K (1999) Large margin rank boundaries for ordinal regression. Adv Neural Inf Process Syst 115–132
- Xu J, Li H (2007) AdaRank: a boosting algorithm for information retrieval. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, Amsterdam, pp 391–398