Topic Selection of Web Documents Using Specific Domain Ontology

  • Hyunjang Kong
  • Myunggwon Hwang
  • Gwangsu Hwang
  • Jaehong Shim
  • Pankoo Kim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4293)

Abstract

This paper proposes a topic selection method for web documents that uses an ontology hierarchy. The idea is to use the structure of the ontology to determine the topic of a web document. Selecting topics efficiently from a domain ontology is intended to improve the performance of document clustering. In the preprocessing step, we extract keywords from the web documents using the Term Frequency formula and build a domain ontology by branching off a partial hierarchy from WordNet with an automatic domain ontology building tool. We then select a topic for each web document based on the structure of the domain ontology. We find that this approach contributes to efficient document clustering.
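
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact Term Frequency formula or the topic selection rule, so the sketch assumes relative term frequency for keyword extraction and picks the lowest common ancestor of the top keywords as the topic. The `PARENT` hierarchy, the function names, and the `top_k` parameter are all hypothetical, and the small hand-written hierarchy stands in for a domain ontology branched off from WordNet.

```python
import re
from collections import Counter

# Hypothetical toy hierarchy (child -> parent). It stands in for a
# domain ontology derived from WordNet; it is not taken from the paper.
PARENT = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}
CONCEPTS = set(PARENT) | set(PARENT.values())


def term_frequency(text):
    """Relative frequency of each lowercase word token (assumed TF formula)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {term: n / total for term, n in Counter(tokens).items()}


def ancestors(concept):
    """Return the concept followed by its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def select_topic(text, top_k=3):
    """Assumed rule: lowest common ancestor of the top_k ontology keywords."""
    tf = term_frequency(text)
    known = {t: w for t, w in tf.items() if t in CONCEPTS}
    # Highest frequency first; ties broken alphabetically for determinism.
    keywords = sorted(known, key=lambda t: (-known[t], t))[:top_k]
    if not keywords:
        return None
    chains = [ancestors(k) for k in keywords]
    common = set(chains[0]).intersection(*chains[1:])
    # The first shared concept on the first chain is the most specific one.
    for concept in chains[0]:
        if concept in common:
            return concept
    return None


if __name__ == "__main__":
    print(select_topic("The dog chased the wolf. The dog barked at another dog."))
```

Under these assumptions, a document that mentions only "dog" and "wolf" is assigned the more specific concept "canine", while one that mentions "dog" and "cat" falls back to their shared ancestor "mammal". Documents assigned the same topic concept could then be grouped together in a clustering step.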

Keywords

Mapping Module, Term Frequency, Domain Ontology, Domain Concept, Topic Selection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hyunjang Kong (1)
  • Myunggwon Hwang (1)
  • Gwangsu Hwang (1)
  • Jaehong Shim (1)
  • Pankoo Kim (2)

  1. Dept. of Computer Engineering, Chosun University, Gwangju, South Korea
  2. Corresponding Author, Dept. of Computer Engineering, Chosun University
