Thesaurus-Based Feedback to Support Mixed Search and Browsing Environments

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4675)

Abstract

We propose and evaluate a query expansion mechanism that supports searching and browsing in collections of annotated documents. Based on generative language models, our feedback mechanism uses document-level annotations to bias the generation of expansion terms and to generate browsing suggestions in the form of concepts selected from a controlled vocabulary (as typically used in digital library settings). We provide a detailed formalization of our feedback mechanism and evaluate its effectiveness using the TREC 2006 Genomics track test set. In terms of retrieval effectiveness, we find a 20% improvement in mean average precision over a query-likelihood baseline, together with an increase in precision at 10. When we base the parameter estimation and feedback generation of our algorithm on a large corpus, we also find an improvement over state-of-the-art relevance models. The browsing suggestions are assessed along two dimensions: relevance and specificity. We present an account of per-topic results, which helps to understand for which types of queries our feedback mechanism is particularly helpful.
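
To make the idea in the abstract more concrete, the following is a minimal sketch of annotation-biased feedback, not the formalization given in the paper. It assumes a Jelinek-Mercer smoothed query-likelihood ranker, scores controlled-vocabulary concepts by the normalized query likelihood of the top-ranked documents that carry them, and draws expansion terms from documents annotated with those concepts. The toy collection, the function names, and the parameters `lam`, `k` and `n` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection: texts and concept labels are illustrative only.
DOCS = {
    "d1": {"text": "brca1 mutation breast cancer risk",
           "concepts": ["Breast Neoplasms", "Genes, BRCA1"]},
    "d2": {"text": "brca1 protein dna repair pathway",
           "concepts": ["Genes, BRCA1", "DNA Repair"]},
    "d3": {"text": "insulin signaling glucose metabolism",
           "concepts": ["Insulin"]},
}

def tokenize(text):
    return text.lower().split()

def build_models(docs):
    """Term counts per document and for the whole collection."""
    doc_tf = {d: Counter(tokenize(v["text"])) for d, v in docs.items()}
    coll_tf = Counter()
    for tf in doc_tf.values():
        coll_tf.update(tf)
    return doc_tf, coll_tf

def query_likelihood(query, tf, coll_tf, lam=0.5):
    """log P(query | document) with Jelinek-Mercer smoothing (assumed here)."""
    dlen, clen = sum(tf.values()), sum(coll_tf.values())
    log_p = 0.0
    for t in tokenize(query):
        if coll_tf[t] == 0:
            continue  # skip terms that never occur in the collection
        log_p += math.log((1 - lam) * tf[t] / dlen + lam * coll_tf[t] / clen)
    return log_p

def rank(query, doc_tf, coll_tf):
    scores = {d: query_likelihood(query, tf, coll_tf) for d, tf in doc_tf.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

def suggest_concepts(query, docs, doc_tf, coll_tf, k=2):
    """Score concepts attached to the top-k documents (browsing suggestions)."""
    ranking, scores = rank(query, doc_tf, coll_tf)
    top = ranking[:k]
    z = sum(math.exp(scores[d]) for d in top)
    concept_scores = Counter()
    for d in top:
        if not docs[d]["concepts"]:
            continue  # unannotated documents contribute nothing
        share = math.exp(scores[d]) / z / len(docs[d]["concepts"])
        for c in docs[d]["concepts"]:
            concept_scores[c] += share
    return concept_scores.most_common()

def expansion_terms(query, concepts, docs, doc_tf, n=3):
    """Pick new terms from documents annotated with the suggested concepts."""
    query_terms = set(tokenize(query))
    term_scores = Counter()
    for c, w in concepts:
        tf = Counter()
        for d, v in docs.items():
            if c in v["concepts"]:
                tf.update(doc_tf[d])
        total = sum(tf.values())
        for t, f in tf.items():
            if t not in query_terms:
                term_scores[t] += w * f / total
    return [t for t, _ in term_scores.most_common(n)]
```

A caller would build the models once with `build_models(DOCS)`, pass a query such as `"brca1 cancer"` to `suggest_concepts`, and feed the resulting concept list to `expansion_terms`. The concept list doubles as the browsing suggestions, while the returned terms would be appended to the query for a second retrieval pass.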

Editor information

László Kovács Norbert Fuhr Carlo Meghini

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Meij, E., de Rijke, M. (2007). Thesaurus-Based Feedback to Support Mixed Search and Browsing Environments. In: Kovács, L., Fuhr, N., Meghini, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2007. Lecture Notes in Computer Science, vol 4675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74851-9_21

  • DOI: https://doi.org/10.1007/978-3-540-74851-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74850-2

  • Online ISBN: 978-3-540-74851-9

  • eBook Packages: Computer Science, Computer Science (R0)
