Information Retrieval

, Volume 13, Issue 3, pp 291–314 | Cite as

Learning to rank with (a lot of) word features

  • Bing BaiEmail author
  • Jason Weston
  • David Grangier
  • Ronan Collobert
  • Kunihiko Sadamasa
  • Yanjun Qi
  • Olivier Chapelle
  • Kilian Weinberger
Learning to rank for information retrieval


In this article we present Supervised Semantic Indexing which defines a class of nonlinear (quadratic) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI our models are trained from a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as cross-language retrieval or online advertising placement. Dealing with models on all pairs of words features is computationally challenging. We propose several improvements to our basic model for addressing this issue, including low rank (but diagonal preserving) representations, correlated feature hashing and sparsification. We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.


Semantic indexing Feature hashing Learning to rank Cross language retrieval Content matching 


  1. Baeza-Yates, R., & Ribeiro-Neto, B., et al. (1999). Modern information retrieval. England: Addison-Wesley Harlow.Google Scholar
  2. Bai, B., Weston, J., Collobert, R., & Grangier, D. (2009). Supervised semantic indexing. In European conference on information retrieval.Google Scholar
  3. Berger, A., & Lafferty, J. (1999). Information retrieval as statistical translation. In ACM SIGIR’ 99, (pp. 222–229).Google Scholar
  4. Blei, D. M., & McAuliffe, J. D. (2007). Supervised topic models. In In advances in neural information processing systems (NIPS).Google Scholar
  5. Blei, D. M., Ng, A., & Jordan, M. I. (2003). Latent dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.zbMATHCrossRefGoogle Scholar
  6. Bunescu, R., & Pasca, M. (2006). Using encyclopedic knowledge for named entity disambiguation. In In EACL, (pp. 9–16).Google Scholar
  7. Burges, C., Ragno, R., & Le, Q. V. (2006). Learning to rank with nonsmooth cost functions. In Advances in neural information processing systems: Proceedings of the 2006 conference. Cambridge, MA: MIT Press.Google Scholar
  8. Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., et al. (2005). Learning to rank using gradient descent. In ICML 2005 (pp. 89–96). New York: ACM Press.Google Scholar
  9. Cao, Z., Qin, T., Liu, T. Y., Tsai, M. F., & Li, H. (2007). Learning to rank: From pairwise approach to listwise approach. In Proceedings of the 24th international conference on Machine learning (pp. 129–136). New York: ACM Press.Google Scholar
  10. Caruana, R., Lawrence, S., & Giles. L. (2000). Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In Advances in neural information processing systems, NIPS 13 (pp. 402–408).Google Scholar
  11. Chernov, S., Iofciu, T., Nejdl, W., & Zhou, X. (2006). Extracting semantic relationships between wikipedia categories. In In 1st international workshop: SemWiki2006—From Wiki to semantics (SemWiki 2006), co-located with the ESWC2006 in Budva.Google Scholar
  12. Collins, M., & Duffy, N. (2001). New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 263–270). Morristown, NJ: Association for Computational Linguistics.Google Scholar
  13. Collobert, R., & Bengio, S. (2004). Links between perceptrons, mlps and svms. In ICML 2004.Google Scholar
  14. Cucerzan, S. (2007). Large-scale named entity disambiguation based on wikipedia data. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (pp. 708–716). Prague: Association for Computational Linguistics.Google Scholar
  15. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. JASIS, 41(6), 391–407.CrossRefGoogle Scholar
  16. Dumais, S. T., Letsche, T. A., Littman, M. L., & Landauer, T. K. (1997). Automatic cross-language retrieval using latent semantic indexing. In AAAI spring symposium on cross-language text and speech retrieval.Google Scholar
  17. Gabrilovich, E. & Markovitch, S. (2007). Computing semantic relatedness using wikipedia-based explicit semantic analysis. In International joint conference on artificial intelligence.Google Scholar
  18. Gehler, P., Holub, A., & Welling, M. (2006). The rate adapting poisson (rap) model for information retrieval and object recognition. In Proceedings of the 23rd international conference on machine learning.Google Scholar
  19. Globerson, A., & Roweis, S. (2007). Visualizing pairwise similarity via semidefinite programming. In AISTATS.Google Scholar
  20. Goel, S., Langford, J., & Strehl, A. (2009). Predictive indexing for fast search. In Advances in neural information processing systems 21.Google Scholar
  21. Grangier, D., & Bengio, S. (2005). Inferring document similarity from hyperlinks. In CIKM ’05 (pp. 359–360). New York: ACM.Google Scholar
  22. Grangier, D., & Bengio, S., (2008). A discriminative kernel-based approach to rank images from text queries. IEEE Transactions on PAMI, 30(8), 1371–1384.Google Scholar
  23. Grefenstette, G. (1998). Cross-language information retrieval. Norwell, MA: Kluwer Academic Publishers.Google Scholar
  24. Guyon, I. M., Gunn, S. R., Nikravesh, M., & Zadeh, L. (Eds). (2006). Feature extraction: Foundations and applications. Berlin: Springer.zbMATHGoogle Scholar
  25. Herbrich, R., Graepel, T., & Obermayer, K. (2000). Large margin rank boundaries for ordinal regression. Cambridge, MA: MIT PressGoogle Scholar
  26. Hofmann, T. (1999). Probabilistic latent semantic indexing. In SIGIR 1999 (pp. 50–57). New York: ACM Press.Google Scholar
  27. Hu, J., Fang, L., Cao, Y., Zeng, H., Li, H., Yang, Q., et al. (2008). Enhancing text clustering by leveraging wikipedia semantics. In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (pp. 179–186). New York: ACM.Google Scholar
  28. Jain, P., Kulis, B., Dhillon, I. S., & Grauman, K. (2008). Online metric learning and fast similarity search. In Advances in neural information processing systems (NIPS).Google Scholar
  29. Joachims, T. (2002). Optimizing search engines using clickthrough data. In ACM SIGKDD (pp. 133–142).Google Scholar
  30. Keller, M., & Bengio, S. (2005). A neural network for text representation. In International conference on artificial neural networks, ICANN, IDIAP-RR 05-12.Google Scholar
  31. Langford, J., Li, L., & Zhang, T. (2009). Sparse online learning via truncated gradient. In Advances in neural information processing systems 21.Google Scholar
  32. Liu, T. Y., Xu, J., Qin, T., Xiong, W., & Li, H. (2007). Letor: Benchmark dataset for research on learning to rank for information retrieval. In Proceedings of SIGIR 2007 workshop on learning to rank for information retrieval.Google Scholar
  33. Milne, D. N., Witten, I. H., & Nichols D. M. (2007). A knowledge-based search engine powered by wikipedia. In CIKM ’07: Proceedings of the sixteenth ACM conference on conference on information and knowledge management (pp. 445–454). New York: ACM.Google Scholar
  34. Minier, Z., Bodo, Z., & Csato, L. (2007). Wikipedia-based kernels for text categorization. In In 9th international symposium on symbolic and numeric algorithms for scientific computing (pp. 157–164).Google Scholar
  35. Ruiz-casado, M., Alfonseca, E., & Castells, P. (2005). Automatic extraction of semantic relationships for wordnet by means of pattern learning from wikipedia. In In NLDB pp. 67–79. Berlin: Springer.Google Scholar
  36. Salakhutdinov, R., & Hinton, G. (2007). Semantic hashing. Proceedings of the SIGIR workshop on information retrieval and applications of graphical models, Amsterdam.Google Scholar
  37. Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A., Strehl, A., & Vishwanathan, V. (2009). Hash kernels. In Twelfth international conference on artificial intelligence and statistics.Google Scholar
  38. Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach.Computational Linguistics, 22(1), 1–38.Google Scholar
  39. Sun, J., Chen, Z., Zeng, H., Lu, Y., Shi, C., & Ma, W. (2004). Supervised latent semantic indexing for document categorization. In ICDM 2004 (pp. 535–538). Washington, DC: IEEE Computer Society.Google Scholar
  40. Vinokourov, A., Shawe-Taylor, J., & Cristianini, N. (2003). Inferring a semantic representation of text via cross-language correlation analysis. NIPS (pp. 1497–1504).Google Scholar
  41. Voorhees, E. M.,& Dang, H. T. (2005). Overview of the trec 2005 question answering track. In In TREC 2005.Google Scholar
  42. Wang, X., Sun, J., Chen, Z.,& Zhai, C. (2006). Latent semantic analysis for multiple-type interrelated data objects. In SIGIR’06.Google Scholar
  43. Weinberger, K., & Saul, L. (2008). Fast solvers and efficient implementations for distance metric learning. In International conference on machine learning.Google Scholar
  44. Yue, Y., Finley, T., Radlinski, F., & Joachims, T. (2007). A support vector method for optimizing average precision. In SIGIR (pp. 271–278).Google Scholar
  45. Zighelnic, L., & Kurland, O. (2008). Query-drift prevention for robust query expansion. In SIGIR 2008 (pp. 825–826). New York: ACM.Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Bing Bai
    • 1
    Email author
  • Jason Weston
    • 1
  • David Grangier
    • 1
  • Ronan Collobert
    • 1
  • Kunihiko Sadamasa
    • 1
  • Yanjun Qi
    • 1
  • Olivier Chapelle
    • 2
  • Kilian Weinberger
    • 2
  1. 1.NEC Labs AmericaPrincetonUSA
  2. 2.Yahoo! ResearchSanta ClaraUSA

Personalised recommendations