Abstract
This paper proposes a novel topic inference framework built on the scalability and adaptability of mutual information (MI) techniques. The framework is designed to systematically construct a more robust language model (LM) for topic-oriented search terms in the domain of electronic programming guides (EPG) for broadcast TV programs. The topic inference system identifies the most relevant topics implied by a search term, using a simplified MI-based classifier trained on a highly structured XML-based text corpus derived from continuously updated EPG data feeds. The proposed framework is evaluated against a set of EPG-specific queries collected from a large user population of a real-world web-based IR system. The MI-based topic inference system achieves 98 percent recall and 82 percent precision on the test set.
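As a rough illustration of how an MI-based classifier can map a search term to topics, the sketch below scores each topic by summing the positive pointwise mutual information (PMI) between the query terms and that topic. It is not the weighting scheme used in the paper, which the abstract does not specify. The function names `train_pmi` and `infer_topics`, the whitespace tokenizer, the clipping of negative PMI to zero, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter


def train_pmi(docs):
    """Estimate PMI(term, topic) from (topic, text) pairs.

    Hypothetical helper: counts term occurrences per topic and returns
    a dict mapping (term, topic) -> log[p(term, topic) / (p(term) p(topic))].
    """
    term_topic = Counter()   # occurrences of a term inside a topic
    term_count = Counter()   # occurrences of a term overall
    topic_count = Counter()  # number of term occurrences per topic
    total = 0
    for topic, text in docs:
        for term in text.lower().split():
            term_topic[(term, topic)] += 1
            term_count[term] += 1
            topic_count[topic] += 1
            total += 1
    return {
        (term, topic): math.log(n * total / (term_count[term] * topic_count[topic]))
        for (term, topic), n in term_topic.items()
    }


def infer_topics(query, pmi, topics, top_k=1):
    """Rank topics by the summed positive PMI of the query terms."""
    scores = {topic: 0.0 for topic in topics}
    for term in query.lower().split():
        for topic in topics:
            # Unseen pairs and negative associations contribute nothing.
            scores[topic] += max(pmi.get((term, topic), 0.0), 0.0)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, score in ranked[:top_k] if score > 0]


# Toy stand-in for EPG program descriptions (invented for illustration).
docs = [
    ("sports", "live football match highlights"),
    ("sports", "basketball playoff game live"),
    ("news", "evening news headlines"),
    ("news", "world news live report"),
    ("cooking", "baking show dessert recipes"),
]
pmi = train_pmi(docs)
topics = sorted({topic for topic, _ in docs})

print(infer_topics("football game", pmi, topics))
print(infer_topics("news", pmi, topics))
```

A query whose terms never occur in the corpus yields an empty list, so the sketch abstains instead of guessing a topic.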
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chang, H. (2011). Topics Inference by Weighted Mutual Information Measures Computed from Structured Corpus. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_7
DOI: https://doi.org/10.1007/978-3-642-22327-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22326-6
Online ISBN: 978-3-642-22327-3