Query Expansion for Language Modeling Using Sentence Similarities

  • Conference paper
Multidisciplinary Information Retrieval (IRFC 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6653)

Abstract

We propose a novel method of query expansion for Language Modeling (LM) in Information Retrieval (IR) based on the similarity of the query to sentences in the top-ranked documents from an initial retrieval run. In justification of our approach, we argue that the terms in the expanded query obtained by the proposed method roughly follow a Dirichlet distribution which, being the conjugate prior of the multinomial distribution used in the LM retrieval model, helps the feedback step. IR experiments on the TREC ad hoc retrieval test collections using the sentence-based query expansion (SBQE) show a significant increase in Mean Average Precision (MAP) over baselines obtained with standard term-based query expansion using the LM selection score and the Relevance Model (RLM). The proposed approach increases the likelihood of generating the pseudo-relevant documents: by adding the sentences with maximum term overlap with the query for each top-ranked pseudo-relevant document, it makes the query look more like these documents. A per-topic analysis shows that the new method hurts fewer queries than the baseline feedback methods and improves average precision (AP) over a broad range of queries, from easy to difficult in terms of initial retrieval AP. We also show that the new method adds a higher number of good feedback terms, the gold standard of good terms being the set of terms added by True Relevance Feedback. Additional experiments on the challenging search topics of the TREC-2004 Robust track show that the new method improves MAP by 5.7% without the external resources and query hardness prediction typically used for these topics.
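
As background for the conjugacy argument: if the term proportions θ carry a Dirichlet prior Dir(θ | α) and the observed term counts n are multinomial given θ, the posterior is again Dirichlet, θ | n ~ Dir(α + n), so feedback counts can be folded into the model by simple addition. This restates a standard fact; the paper's precise modelling assumptions are in the full text.

To make the expansion step itself concrete, the following is a minimal Python sketch of sentence-based expansion as the abstract describes it: rank each pseudo-relevant document's sentences by term overlap with the query and append the best-matching ones to the query. All names here (sbqe, sentences_per_doc, the tokenizer and sentence splitter) are illustrative assumptions, not the authors' implementation, which would also involve stopping, stemming, and the paper's own selection criteria.

import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; stands in for the stopping/stemming of a real system.
    return re.findall(r"[a-z0-9]+", text.lower())

def split_sentences(text):
    # Naive split on sentence-final punctuation; a real system would use a
    # proper sentence segmenter.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def term_overlap(query_terms, sentence):
    # Number of distinct query terms that also occur in the sentence.
    return len(set(query_terms) & set(tokenize(sentence)))

def sbqe(query, top_docs, sentences_per_doc=2):
    # Return the expanded query as a term multiset (Counter).
    # query             -- original query string
    # top_docs          -- texts of the top-ranked documents from the initial run
    # sentences_per_doc -- hypothetical parameter: how many best-overlapping
    #                      sentences to take from each document
    query_terms = tokenize(query)
    expanded = Counter(query_terms)
    for doc in top_docs:
        # Rank this document's sentences by overlap with the query; keep the best.
        best = sorted(split_sentences(doc),
                      key=lambda s: term_overlap(query_terms, s),
                      reverse=True)[:sentences_per_doc]
        for sentence in best:
            expanded.update(tokenize(sentence))
    return expanded

The returned Counter would serve as the weighted query for a second retrieval run; because whole sentences are appended document by document, the expanded query increasingly resembles the pseudo-relevant documents, which is the effect the abstract appeals to.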




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ganguly, D., Leveling, J., Jones, G.J.F. (2011). Query Expansion for Language Modeling Using Sentence Similarities. In: Hanbury, A., Rauber, A., de Vries, A.P. (eds) Multidisciplinary Information Retrieval. IRFC 2011. Lecture Notes in Computer Science, vol 6653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21353-3_6

  • DOI: https://doi.org/10.1007/978-3-642-21353-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21352-6

  • Online ISBN: 978-3-642-21353-3

  • eBook Packages: Computer Science, Computer Science (R0)
