Statistical query expansion for sentence retrieval and its effects on weak and strong queries
The retrieval of sentences that are relevant to a given information need is a challenging passage retrieval task. In this context, the well-known vocabulary mismatch problem is especially severe because of the fine granularity of the task. Short queries, which are usually the rule rather than the exception, aggravate the problem. Consequently, effective sentence retrieval methods tend to apply some form of query expansion, usually based on pseudo-relevance feedback. Nevertheless, there are no extensive studies comparing different statistical expansion strategies for sentence retrieval. In this work we study thoroughly the effect of distinct statistical expansion methods on sentence retrieval. We start from a set of retrieved documents in which relevant sentences have to be found. Our experiments evaluate different term selection strategies and provide empirical evidence that expansion before sentence retrieval yields competitive performance. This is particularly novel because expansion for sentence retrieval is often done after sentence retrieval (i.e. expansion terms are mined from a ranked set of sentences) and no comparative results are available for the two types of expansion. Furthermore, this comparison is particularly valuable because it has important implications for time efficiency. We also carefully analyze expansion on weak and strong queries and demonstrate clearly that expanding queries before sentence retrieval is not only more convenient for efficiency purposes, but also more effective when handling poor queries.
Keywords: Sentence retrieval, Query expansion, Information retrieval
I thank the anonymous reviewers for their useful comments and suggestions, which have been incorporated into this article. This research was co-funded by FEDER and Xunta de Galicia under projects 07SIN005206PR and 2008/068.