
Topics Inference by Weighted Mutual Information Measures Computed from Structured Corpus

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6716)

Abstract

This paper proposes a novel topic inference framework built on the scalability and adaptability of mutual information (MI) techniques. The framework is designed to systematically construct a more robust language model (LM) for topic-oriented search terms in the domain of electronic programming guides (EPGs) for broadcast TV programs. The topic inference system identifies the most relevant topics implied by a search term, using a simplified MI-based classifier trained on a highly structured XML-based text corpus derived from continuously updated EPG data feeds. The proposed framework is evaluated against a set of EPG-specific queries from a large user population, collected from a real-world web-based IR system. The MI-based topic inference system achieves 98% recall and 82% precision on the test set.
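The abstract does not give the paper's weighted-MI formulation, so the following is only a minimal sketch of the general idea it describes: estimating pointwise mutual information between terms and topic labels from a labeled corpus, then ranking the candidate topics for a search term by the summed MI of its terms. The toy corpus, function names, and scoring rule below are illustrative assumptions, not the authors' implementation.

import math
from collections import Counter, defaultdict

def train_mi_table(corpus):
    """Estimate PMI(term, topic) from (tokens, topic_label) pairs."""
    term_counts = Counter()
    topic_counts = Counter()
    joint_counts = Counter()
    n_docs = len(corpus)
    for tokens, topic in corpus:
        topic_counts[topic] += 1
        for term in set(tokens):  # document-level co-occurrence
            term_counts[term] += 1
            joint_counts[(term, topic)] += 1
    mi = defaultdict(float)
    for (term, topic), n_joint in joint_counts.items():
        p_joint = n_joint / n_docs
        p_term = term_counts[term] / n_docs
        p_topic = topic_counts[topic] / n_docs
        mi[(term, topic)] = math.log(p_joint / (p_term * p_topic))
    return mi, set(topic_counts)

def infer_topics(query, mi, topics, top_k=3):
    """Rank topics by the summed MI of the query terms with each topic."""
    terms = query.lower().split()
    scores = {t: sum(mi.get((term, t), 0.0) for term in terms) for t in topics}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

if __name__ == "__main__":
    # Hypothetical EPG-style documents, each tagged with a single topic label.
    toy_corpus = [
        (["yankees", "baseball", "game"], "sports"),
        (["sitcom", "comedy", "rerun"], "comedy"),
        (["playoff", "basketball", "live"], "sports"),
    ]
    mi_table, topic_set = train_mi_table(toy_corpus)
    print(infer_topics("baseball playoff", mi_table, topic_set))

In this sketch a query is assigned to the topic whose terms it shares the strongest MI with; the paper's actual system additionally weights the MI measures using the structure of the XML corpus, which is not reproduced here.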

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chang, H. (2011). Topics Inference by Weighted Mutual Information Measures Computed from Structured Corpus. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_7

  • DOI: https://doi.org/10.1007/978-3-642-22327-3_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22326-6

  • Online ISBN: 978-3-642-22327-3

  • eBook Packages: Computer Science, Computer Science (R0)
