Abstract
Semantic smoothing is an effective way to improve retrieval performance in the language modeling approach to information retrieval. Previous methods, such as the translation model, perform semantic mapping with individual terms or phrases. These models do not handle ambiguous words and phrases well because they cannot incorporate contextual information. To overcome this limitation, we propose a novel Wikipedia-based semantic smoothing method that decomposes a document into a set of weighted Wikipedia concepts and then maps those unambiguous concepts into query terms. The mapping probabilities from each Wikipedia concept to individual terms are estimated with the EM algorithm, and document models based on Wikipedia concept mapping are then derived. The new smoothing method is evaluated on the TREC Ad Hoc Track collections (Disks 1, 2, and 3). Experiments show significant improvements over both the two-stage language model and the language model with translation-based semantic smoothing.
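To make the concept-mapping idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the smoothed document model is a linear interpolation of a baseline term model with a concept-mapping component, p_t(w|d) = Σ_c p(w|c) p(c|d). The function name `smoothed_doc_model`, the mixing weight `lam`, the toy concepts, and the mapping probabilities are all hypothetical. A plain maximum-likelihood model stands in for the baseline, and the mapping table is supplied by hand rather than estimated with EM.

```python
from collections import Counter

def smoothed_doc_model(doc_terms, concept_weights, mapping, lam=0.4):
    """Return p(w|d) mixing a baseline term model with a concept-mapping model.

    Assumed form (illustrative only):
        p(w|d) = (1 - lam) * p_ml(w|d) + lam * sum_c p(w|c) * p(c|d)
    """
    # Baseline: maximum-likelihood term model (a stand-in for a stronger baseline).
    counts = Counter(doc_terms)
    total = sum(counts.values())
    p_ml = {w: n / total for w, n in counts.items()}

    # Normalize the document's concept weights into a distribution p(c|d).
    z = sum(concept_weights.values())
    p_c = {c: wt / z for c, wt in concept_weights.items()}

    # Concept-mapping component: translate each concept into terms.
    p_map = Counter()
    for c, pc in p_c.items():
        for w, pwc in mapping.get(c, {}).items():
            p_map[w] += pwc * pc

    # Interpolate the two components over the joint vocabulary.
    vocab = set(p_ml) | set(p_map)
    return {w: (1 - lam) * p_ml.get(w, 0.0) + lam * p_map.get(w, 0.0)
            for w in vocab}

# Toy data (hypothetical): a document mentioning "jaguar" in the car sense.
doc_terms = ["jaguar", "speed", "engine"]
concept_weights = {"Jaguar Cars": 3.0, "Engine": 1.0}
mapping = {
    "Jaguar Cars": {"car": 0.5, "jaguar": 0.3, "vehicle": 0.2},
    "Engine": {"engine": 0.6, "motor": 0.4},
}

model = smoothed_doc_model(doc_terms, concept_weights, mapping)
print(sorted(model.items(), key=lambda kv: -kv[1]))
```

In this toy run the term "car" receives probability mass although it never occurs in the document, because the concept "Jaguar Cars" maps to it. That is the effect semantic smoothing is after: a query containing "car" can match a document that only says "jaguar", provided the document was resolved to the right concept.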
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tu, X., He, T., Chen, L., Luo, J., Zhang, M. (2010). Wikipedia-Based Semantic Smoothing for the Language Modeling Approach to Information Retrieval. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_33
DOI: https://doi.org/10.1007/978-3-642-12275-0_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12274-3
Online ISBN: 978-3-642-12275-0
eBook Packages: Computer Science (R0)