
Automatic Topic Identification Using Ontology Hierarchy

  • Sabrina Tiun
  • Rosni Abdullah
  • Tang Enya Kong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2004)

Abstract

This paper proposes a method for automatic topic identification that uses an ontology hierarchy. The fundamental idea is to exploit the hierarchical structure of an ontology to find the topic of a text. Keywords extracted from a given text are mapped onto their corresponding concepts in the ontology. By optimizing over these corresponding concepts, we pick the single node among the concept nodes that we believe represents the topic of the target text. However, mapping the keywords onto their corresponding concepts runs into a limited-vocabulary problem: some keywords find no matching concept. This forces us to extend the ontology by enriching each of its concepts with new concepts drawn from an external linguistic knowledge base (WordNet). Our intuition is that the more keywords are mapped onto ontology concepts, the better our topic identification technique performs.
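The abstract leaves the exact optimization over concept nodes unspecified, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes a toy ontology (the hypothetical `PARENT` and `LEXICON` tables), a hand-written `ENRICHMENT` table standing in for WordNet-based enrichment, and a hypothetical selection rule: choose the deepest node whose subtree receives at least a fraction `threshold` of all keyword hits. It illustrates how a hierarchy can turn scattered keyword hits into a single topic node, and why enlarging the vocabulary can change the chosen node.

```python
from collections import defaultdict

# Hypothetical toy ontology: child concept -> parent concept.
PARENT = {
    "science": "root",
    "computing": "science",
    "biology": "science",
    "software": "computing",
    "hardware": "computing",
    "sport": "root",
    "football": "sport",
}

# Hypothetical words attached to each concept node (the limited vocabulary).
LEXICON = {
    "software": {"program", "compiler"},
    "hardware": {"processor", "memory"},
    "biology": {"cell", "gene"},
    "football": {"goal", "striker"},
}

# Hypothetical stand-in for WordNet enrichment: extra related words per concept.
ENRICHMENT = {
    "software": {"code", "debugger"},
    "hardware": {"chip"},
}


def enrich(lexicon, extra):
    """Return a new lexicon with each concept's word set extended by `extra`."""
    return {c: words | extra.get(c, set()) for c, words in lexicon.items()}


def map_keywords(keywords, lexicon):
    """Count keyword hits per concept; an ambiguous keyword counts for every match."""
    hits = defaultdict(int)
    for kw in keywords:
        for concept, words in lexicon.items():
            if kw in words:
                hits[concept] += 1
    return hits


def ancestors(node):
    """Yield the node itself, then each ancestor up to and including the root."""
    while True:
        yield node
        if node not in PARENT:
            return
        node = PARENT[node]


def depth(node):
    """Number of edges between the node and the root."""
    return sum(1 for _ in ancestors(node)) - 1


def pick_topic(keywords, lexicon, threshold=0.6):
    """Assumed rule: deepest node whose subtree holds >= threshold of all hits."""
    hits = map_keywords(keywords, lexicon)
    total = sum(hits.values())
    if total == 0:
        return None  # no keyword maps onto the ontology
    # Propagate each concept's hits up to all of its ancestors.
    subtree = defaultdict(int)
    for concept, n in hits.items():
        for node in ancestors(concept):
            subtree[node] += n
    candidates = [n for n, c in subtree.items() if c / total >= threshold]
    # Prefer depth (specificity), then coverage, then name for determinism.
    return max(candidates, key=lambda n: (depth(n), subtree[n], n))


if __name__ == "__main__":
    kws = ["compiler", "gene", "code", "debugger"]
    print(pick_topic(kws, LEXICON))                          # science
    print(pick_topic(kws, enrich(LEXICON, ENRICHMENT)))      # software
```

With the base lexicon only "compiler" and "gene" map, so the hits split evenly and the rule falls back to the broader node `science`. After enrichment, "code" and "debugger" also map to `software`, which then holds 3 of 4 hits and is selected. This mirrors, in toy form, the abstract's intuition that mapping more keywords lets the hierarchy yield a more specific topic.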

Keywords

Word Sense, Topic Identification, Extraction Module, Ontology Concept, Node Concept
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Sabrina Tiun (1)
  • Rosni Abdullah (2)
  • Tang Enya Kong (1)
  1. UTMK, P. Pengajian Sains Komputer, Universiti Sains Malaysia, Pulau Pinang, Malaysia
  2. Pusat Pengajian Sains Komputer, Universiti Sains Malaysia, Pulau Pinang, Malaysia
