Abstract
Automatic query expansion techniques are widely applied to improve text retrieval performance, using a variety of approaches that exploit several data sources to find expansion terms. Selecting expansion terms is challenging and requires a framework capable of extracting relationships between terms. Recently, several Deep Learning based Natural Language Processing methods have been proposed for learning high-quality vector representations of terms from large amounts of unstructured text containing billions of words. These vector representations capture a large number of term relationships. In this paper, we experimentally compare several expansion methods with expansion based on these term vector representations. We use language models for information retrieval to evaluate the expansion methods. Experiments conducted on four CLEF collections show a statistically significant improvement over the language models and other expansion models.
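The abstract does not spell out how expansion terms are selected from the term vectors. A minimal sketch, assuming one common approach (adding each query term's k nearest neighbours by cosine similarity), could look like the following. The toy vectors in `TERM_VECTORS` and the helper names are hypothetical illustrations, not the paper's data or implementation.

```python
import math

# Hypothetical toy term vectors (3 dimensions). Real models learned from
# billions of words use hundreds of dimensions (e.g. 600, as in note 1).
TERM_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.7, 0.3, 0.1],
    "engine": [0.6, 0.2, 0.4],
    "banana": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_terms(term, vectors, k):
    """Return the k terms most similar to `term`, excluding the term itself."""
    if term not in vectors:
        return []  # out-of-vocabulary terms get no expansion
    query_vec = vectors[term]
    scored = [
        (other, cosine(query_vec, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    # Sort by descending similarity; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:k]]


def expand_query(query_terms, vectors, k=2):
    """Append each query term's k nearest neighbours, skipping duplicates."""
    expanded = list(query_terms)
    seen = set(query_terms)
    for term in query_terms:
        for neighbour in nearest_terms(term, vectors, k):
            if neighbour not in seen:
                expanded.append(neighbour)
                seen.add(neighbour)
    return expanded


print(expand_query(["car"], TERM_VECTORS, k=2))
```

With these toy vectors, `expand_query(["car"], TERM_VECTORS, k=2)` adds "automobile" and "vehicle". A fuller system would typically also down-weight the added terms relative to the original query terms before retrieval; this sketch only selects them.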
Notes
- 1.
A real-valued vector of predefined dimension (for example, 600 dimensions).
- 2.
Acknowledgements
This work was conducted as a part of the CHIST-ERA CAMOMILE project, which was funded by the ANR (Agence Nationale de la Recherche, France).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
ALMasri, M., Berrut, C., Chevallet, JP. (2016). A Comparison of Deep Learning Based Query Expansion with Pseudo-Relevance Feedback and Mutual Information. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_57
DOI: https://doi.org/10.1007/978-3-319-30671-1_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science, Computer Science (R0)