Information Retrieval, Volume 13, Issue 5, pp 485–506

Statistical query expansion for sentence retrieval and its effects on weak and strong queries

  • David E. Losada
S.I.: Focused Retrieval and Result Aggr.


Retrieving sentences that are relevant to a given information need is a challenging passage retrieval task. In this context, the well-known vocabulary mismatch problem is especially severe because of the fine granularity of the task. Short queries, which are the rule rather than the exception, aggravate the problem. Consequently, effective sentence retrieval methods tend to apply some form of query expansion, usually based on pseudo-relevance feedback. Nevertheless, there are no extensive studies comparing different statistical expansion strategies for sentence retrieval. In this work we thoroughly study the effect of distinct statistical expansion methods on sentence retrieval. We start from a set of retrieved documents in which relevant sentences have to be found. In our experiments we evaluate different term selection strategies and provide empirical evidence that expansion before sentence retrieval yields competitive performance. This is novel because expansion for sentence retrieval is often done after sentence retrieval (i.e. expansion terms are mined from a ranked set of sentences), and no comparative results between the two types of expansion are available. The comparison is also valuable because it has important implications for time efficiency. We also carefully analyze expansion on weak and strong queries and demonstrate that expanding queries before sentence retrieval is not only more convenient for efficiency purposes, but also more effective when handling poor queries.
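The difference between the two expansion orders can be illustrated with a minimal Python sketch. It is not the method evaluated in the article: the frequency-based term selector, the term-overlap scoring function, the stopword list, and every function name below are hypothetical simplifications chosen only to show where the expansion terms come from in each case.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "of", "in", "is", "to", "and", "for", "on", "are"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def score(query_terms, sentence):
    """Toy score: number of distinct query terms that occur in the sentence."""
    tokens = set(tokenize(sentence))
    return sum(1 for t in set(query_terms) if t in tokens)


def rank_sentences(query_terms, sentences):
    """Rank sentences by decreasing score (stable for ties)."""
    return sorted(sentences, key=lambda s: -score(query_terms, s))


def select_terms(query_terms, texts, k):
    """Assumed selector: the k most frequent non-query terms in the texts."""
    counts = Counter(
        t for text in texts for t in tokenize(text) if t not in query_terms
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def expand_before(query, documents, k=2):
    """Mine expansion terms from the retrieved documents, then rank once."""
    q = tokenize(query)
    doc_texts = [" ".join(doc) for doc in documents]
    expanded = q + select_terms(q, doc_texts, k)
    sentences = [s for doc in documents for s in doc]
    return expanded, rank_sentences(expanded, sentences)


def expand_after(query, documents, k=2, top_n=2):
    """Rank sentences, mine terms from the top-ranked ones, then rank again."""
    q = tokenize(query)
    sentences = [s for doc in documents for s in doc]
    first_pass = rank_sentences(q, sentences)
    expanded = q + select_terms(q, first_pass[:top_n], k)
    return expanded, rank_sentences(expanded, sentences)


if __name__ == "__main__":
    # Each document is a list of sentences (made-up example data).
    docs = [
        ["Solar panels convert sunlight into electricity.",
         "Panel efficiency depends on silicon quality."],
        ["Wind turbines generate electricity from wind.",
         "Solar electricity costs are falling."],
    ]
    print(expand_before("solar energy", docs)[0])
    print(expand_after("solar energy", docs)[0])
```

In this sketch, `expand_after` ranks the sentences twice, whereas `expand_before` ranks them only once, which is one way to picture the time-efficiency implication mentioned above.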


Sentence retrieval, Query expansion, Information retrieval



I thank the anonymous reviewers for their useful comments and suggestions, which have been incorporated into this article. This research was co-funded by FEDER and Xunta de Galicia under projects 07SIN005206PR and 2008/068.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Departamento de Electrónica y Computación, Universidad de Santiago de Compostela, Galicia, Spain
