Query Expansion for Language Modeling Using Sentence Similarities

  • Conference paper
Multidisciplinary Information Retrieval (IRFC 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6653)

Abstract

We propose a novel method of query expansion for Language Modeling (LM) in Information Retrieval (IR) based on the similarity of the query to sentences in the top-ranked documents from an initial retrieval run. In justification of our approach, we argue that the terms in the expanded query obtained by the proposed method roughly follow a Dirichlet distribution which, being the conjugate prior of the multinomial distribution used in the LM retrieval model, helps the feedback step. IR experiments on the TREC ad hoc retrieval test collections using the sentence-based query expansion (SBQE) show a significant increase in Mean Average Precision (MAP) over baselines obtained with standard term-based query expansion using the LM selection score and the Relevance Model (RLM). The proposed approach increases the likelihood of generating the pseudo-relevant documents: by adding the sentences with maximum term overlap with the query for each top-ranked pseudo-relevant document, it makes the query look more like these documents. A per-topic analysis shows that the new method hurts fewer queries than the baseline feedback methods and improves average precision (AP) over a broad range of queries, from easy to difficult in terms of initial retrieval AP. We also show that the new method adds a higher number of good feedback terms, the gold standard of good terms being the set of terms added by True Relevance Feedback. Additional experiments on the challenging search topics of the TREC-2004 Robust track show that the new method improves MAP by 5.7% without the external resources and query hardness prediction typically used for these topics.
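
As background for the conjugacy argument: if the term proportions θ carry a Dirichlet prior Dir(θ | α) and the observed term counts n are multinomial given θ, the posterior is again Dirichlet, θ | n ~ Dir(α + n), so feedback counts can be folded into the model by simple addition. This restates a standard fact; the paper's precise modelling assumptions are in the full text.

To make the expansion step itself concrete, the following is a minimal Python sketch of sentence-based expansion as the abstract describes it: rank each pseudo-relevant document's sentences by term overlap with the query and append the best-matching ones to the query. All names here (sbqe, sentences_per_doc, the tokenizer and sentence splitter) are illustrative assumptions, not the authors' implementation, which would also involve stopping, stemming, and the paper's own selection criteria.

import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; stands in for the stopping/stemming of a real system.
    return re.findall(r"[a-z0-9]+", text.lower())

def split_sentences(text):
    # Naive split on sentence-final punctuation; a real system would use a
    # proper sentence segmenter.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def term_overlap(query_terms, sentence):
    # Number of distinct query terms that also occur in the sentence.
    return len(set(query_terms) & set(tokenize(sentence)))

def sbqe(query, top_docs, sentences_per_doc=2):
    # Return the expanded query as a term multiset (Counter).
    # query             -- original query string
    # top_docs          -- texts of the top-ranked documents from the initial run
    # sentences_per_doc -- hypothetical parameter: how many best-overlapping
    #                      sentences to take from each document
    query_terms = tokenize(query)
    expanded = Counter(query_terms)
    for doc in top_docs:
        # Rank this document's sentences by overlap with the query; keep the best.
        best = sorted(split_sentences(doc),
                      key=lambda s: term_overlap(query_terms, s),
                      reverse=True)[:sentences_per_doc]
        for sentence in best:
            expanded.update(tokenize(sentence))
    return expanded

The returned Counter would serve as the weighted query for a second retrieval run; because whole sentences are appended document by document, the expanded query increasingly resembles the pseudo-relevant documents, which is the effect the abstract appeals to.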




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ganguly, D., Leveling, J., Jones, G.J.F. (2011). Query Expansion for Language Modeling Using Sentence Similarities. In: Hanbury, A., Rauber, A., de Vries, A.P. (eds) Multidisciplinary Information Retrieval. IRFC 2011. Lecture Notes in Computer Science, vol 6653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21353-3_6

  • DOI: https://doi.org/10.1007/978-3-642-21353-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21352-6

  • Online ISBN: 978-3-642-21353-3

  • eBook Packages: Computer Science, Computer Science (R0)
