Abstract
We propose and evaluate a query expansion mechanism that supports searching and browsing in collections of annotated documents. Based on generative language models, our feedback mechanism uses document-level annotations to bias the generation of expansion terms and to generate browsing suggestions in the form of concepts selected from a controlled vocabulary (as typically used in digital library settings). We provide a detailed formalization of our feedback mechanism and evaluate its effectiveness using the TREC 2006 Genomics track test set. In terms of retrieval effectiveness, we find a 20% improvement in mean average precision over a query-likelihood baseline, while also increasing precision at 10. When we base the parameter estimation and feedback generation of our algorithm on a large corpus, we also find an improvement over state-of-the-art relevance models. The browsing suggestions are assessed along two dimensions: relevancy and specificity. We present an account of per-topic results, which helps to understand for which types of queries our feedback mechanism is particularly helpful.
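The two retrieval measures named in the abstract, mean average precision and precision at 10, can be illustrated with a minimal sketch. This is not the evaluation code used in the paper or the official TREC tooling; the function names, the dictionary layout of `runs` and `qrels`, and the toy document identifiers are assumptions made purely for illustration.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears,
    normalized by the total number of relevant documents for the topic."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average of per-topic average precision over all topics in the run."""
    scores = [average_precision(runs[t], qrels.get(t, set())) for t in runs]
    return sum(scores) / len(scores)


# Toy data (hypothetical topics and documents, not taken from the test set).
runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5", "d6"]}
qrels = {"t1": {"d1", "d3"}, "t2": {"d6"}}
map_score = mean_average_precision(runs, qrels)
p_at_2 = precision_at_k(runs["t1"], qrels["t1"], k=2)
```

Under this definition, a relative improvement such as the one reported above would be computed as the difference between the two systems' MAP scores divided by the baseline's MAP score.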
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Meij, E., de Rijke, M. (2007). Thesaurus-Based Feedback to Support Mixed Search and Browsing Environments. In: Kovács, L., Fuhr, N., Meghini, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2007. Lecture Notes in Computer Science, vol 4675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74851-9_21
DOI: https://doi.org/10.1007/978-3-540-74851-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74850-2
Online ISBN: 978-3-540-74851-9
eBook Packages: Computer Science, Computer Science (R0)