Information Retrieval Journal, Volume 18, Issue 1, pp 26–50

A new approach to query segmentation for relevance ranking in web search

  • Haocheng Wu
  • Yunhua Hu
  • Hang Li
  • Enhong Chen


In this paper, we investigate how query segmentation can best improve state-of-the-art methods for relevance ranking in web search. Query segmentation separates the input query into segments, typically natural language phrases. We propose employing a re-ranking approach to query segmentation: a generative model first creates the top k candidate segmentations, and a discriminative model then re-ranks the candidates to obtain the final segmentation result. This approach has been widely used for structure prediction in natural language processing but, as far as we know, has not previously been applied to query segmentation. Furthermore, we propose a new method for using the results of query segmentation in relevance ranking, which takes both the original query words and the segmented query phrases as units of query representation. We investigate whether our method can improve three relevance models, namely the n-gram BM25 model, the key n-gram model, and the term dependency model, within the framework of learning to rank. Experimental results on large-scale web search datasets show that our method significantly improves relevance ranking in all three cases.
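The two-stage pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the phrase-frequency table stands in for a generative language model learned from a query log, and the two-feature linear re-ranker stands in for a trained discriminative model; both are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical phrase-frequency table standing in for a generative
# language model estimated from a query log (illustrative values).
PHRASE_FREQ = {
    "new york": 50.0, "york times": 20.0, "times square": 30.0,
    "new": 5.0, "york": 4.0, "times": 6.0, "square": 3.0,
    "new york times": 40.0,
}

def segmentations(words):
    """Enumerate every split of the word sequence into contiguous segments."""
    n = len(words)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield [" ".join(words[bounds[i]:bounds[i + 1]])
                   for i in range(len(bounds) - 1)]

def generative_score(seg):
    """Product-of-frequencies proxy for a unigram phrase language model."""
    score = 1.0
    for phrase in seg:
        score *= PHRASE_FREQ.get(phrase, 0.1)  # smoothing for unseen phrases
    return score

def top_k_candidates(query, k=5):
    """Stage 1: the generative model proposes the top-k segmentations."""
    cands = sorted(segmentations(query.split()),
                   key=generative_score, reverse=True)
    return cands[:k]

def discriminative_rerank(cands, weights=(1.0, -0.5)):
    """Stage 2: a toy linear model over two features re-ranks the candidates:
    total phrase frequency (higher is better) and segment count (fewer is better)."""
    def f(seg):
        freq = sum(PHRASE_FREQ.get(p, 0.0) for p in seg)
        return weights[0] * freq + weights[1] * len(seg)
    return max(cands, key=f)

cands = top_k_candidates("new york times square", k=5)
best = discriminative_rerank(cands)
print(best)  # → ['new york', 'times square']
```

A real discriminative re-ranker would use many more features (click-through statistics, mutual information, part-of-speech cues) with weights learned from labeled segmentations; the candidate set from stage 1 keeps that richer model tractable by bounding the search space to k hypotheses instead of all 2^(n-1) splits.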


Keywords: Web search · Query segmentation · Relevance ranking · Query processing · Re-ranking · BM25 · Term dependency model · Key n-gram extraction



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of Science and Technology of China, Hefei, China
  2. Alibaba.com, Beijing, China
  3. Noah’s Ark Lab of Huawei Technologies, Hong Kong, China
