Unsupervised Hidden Topic Framework for Extracting Keywords (Synonym, Homonym, Hyponymy and Polysemy) and Topics in Meeting Transcripts

  • J. I. Sheeba
  • K. Vivekanandan
  • G. Sabitha
  • P. Padmavathi
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 177)

Abstract

A keyword is an important item in a document that provides efficient access to its content. It can be used to search for information or to decide whether to read a document. This paper focuses on extracting hidden topics from meeting transcripts. The existing system handles web documents, whereas the proposed framework addresses the synonym, homonym, hyponymy, and polysemy problems in meeting transcripts. For synonyms, different words with similar meanings are grouped and a single keyword is extracted. For hyponymy, a word denoting a subclass is identified and the keyword for its superclass is extracted. A homonym is a word with two or more different meanings. For example, "left" can appear in two different contexts: "the car left" (past tense of "leave") and "left side" (opposite of right). A polysemous word has different but related senses. For example, "count" has related meanings: to say numbers in the right order, and to calculate. Hidden topics in meeting transcripts are found using the LDA model. Finally, a MaxEnt classifier is used to extract keywords and topics, which are then used for information retrieval.
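
The synonym and hyponymy steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries and the names `SYNONYMS`, `HYPERNYMS`, `normalize`, and `keyword_counts` are hypothetical, chosen only for illustration, and a real system would draw these relations from a lexical resource. Homonym and polysemy handling, which depend on context, are not covered here.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption): maps a word to its group representative.
SYNONYMS = {"automobile": "car", "auto": "car", "purchase": "buy"}
# Hypothetical subclass -> superclass mapping (assumption).
HYPERNYMS = {"car": "vehicle", "truck": "vehicle", "apple": "fruit"}


def normalize(token):
    """Map a token to its synonym representative, then to its superclass if one is listed."""
    token = token.lower()
    token = SYNONYMS.get(token, token)
    return HYPERNYMS.get(token, token)


def keyword_counts(tokens):
    """Count tokens after synonym grouping and superclass substitution."""
    return Counter(normalize(t) for t in tokens)


utterance = "We should buy an automobile or a truck before the auto show".split()
counts = keyword_counts(utterance)
print(counts.most_common(1))  # [('vehicle', 3)]
```

Here "automobile" and "auto" are first grouped under "car", and "car" and "truck" are then replaced by the superclass keyword "vehicle", so three surface forms contribute to a single keyword count.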

Keywords

Keyword, Meeting transcripts, LDA, MaxEnt, Synonym, Homonym, Polysemy, Hyponymy

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • J. I. Sheeba
    • 1
  • K. Vivekanandan
    • 1
  • G. Sabitha
    • 1
  • P. Padmavathi
    • 1
  1. Department of Computer Science & Engineering, Pondicherry Engineering College, Puducherry, India
