
Topics Inference by Weighted Mutual Information Measures Computed from Structured Corpus

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6716)

Abstract

This paper proposes a novel topic inference framework built on the scalability and adaptability of mutual information (MI) techniques. The framework is designed to systematically construct a more robust language model (LM) for topic-oriented search terms in the domain of electronic programming guides (EPGs) for broadcast TV programs. The topic inference system identifies the most relevant topics implied by a search term, using a simplified MI-based classifier trained on a highly structured XML-based text corpus derived from continuously updated EPG data feeds. The proposed framework is evaluated against a set of EPG-specific queries from a large user population, collected from a real-world web-based IR system. The MI-based topic inference system achieves 98% recall and 82% precision on the test set.
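The abstract does not give the paper's weighted-MI formulation, so the following is only a minimal sketch of the general idea it describes: estimating pointwise mutual information between terms and topic labels from a labeled corpus, then ranking the candidate topics for a search term by the summed MI of its terms. The toy corpus, function names, and scoring rule below are illustrative assumptions, not the authors' implementation.

import math
from collections import Counter, defaultdict

def train_mi_table(corpus):
    """Estimate PMI(term, topic) from (tokens, topic_label) pairs."""
    term_counts = Counter()
    topic_counts = Counter()
    joint_counts = Counter()
    n_docs = len(corpus)
    for tokens, topic in corpus:
        topic_counts[topic] += 1
        for term in set(tokens):  # document-level co-occurrence
            term_counts[term] += 1
            joint_counts[(term, topic)] += 1
    mi = defaultdict(float)
    for (term, topic), n_joint in joint_counts.items():
        p_joint = n_joint / n_docs
        p_term = term_counts[term] / n_docs
        p_topic = topic_counts[topic] / n_docs
        mi[(term, topic)] = math.log(p_joint / (p_term * p_topic))
    return mi, set(topic_counts)

def infer_topics(query, mi, topics, top_k=3):
    """Rank topics by the summed MI of the query terms with each topic."""
    terms = query.lower().split()
    scores = {t: sum(mi.get((term, t), 0.0) for term in terms) for t in topics}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

if __name__ == "__main__":
    # Hypothetical EPG-style documents, each tagged with a single topic label.
    toy_corpus = [
        (["yankees", "baseball", "game"], "sports"),
        (["sitcom", "comedy", "rerun"], "comedy"),
        (["playoff", "basketball", "live"], "sports"),
    ]
    mi_table, topic_set = train_mi_table(toy_corpus)
    print(infer_topics("baseball playoff", mi_table, topic_set))

In this sketch a query is assigned to the topic whose terms it shares the strongest MI with; the paper's actual system additionally weights the MI measures using the structure of the XML corpus, which is not reproduced here.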

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chang, H. (2011). Topics Inference by Weighted Mutual Information Measures Computed from Structured Corpus. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_7

  • DOI: https://doi.org/10.1007/978-3-642-22327-3_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22326-6

  • Online ISBN: 978-3-642-22327-3

  • eBook Packages: Computer Science, Computer Science (R0)
