Conceptual Clustering of Documents for Automatic Ontology Generation

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7888)

Abstract

In information retrieval, keyword-based retrieval often fails to satisfy user needs because it cannot always retrieve the words relevant to the intended concept. Different words can represent the same concept (synonymy), and one word can represent different concepts (polysemy or homonymy), so mapping words to concepts requires word sense disambiguation. Concept-based information retrieval (IR) can be achieved by implementing a domain-dependent ontology. Because extracting semantic concepts from keywords is the initial phase of automatic ontology construction, this paper proposes an effective method for it. The Reuters-21578 collection is used as input, followed by indexing, training, and clustering with a Self-Organizing Map. Documents are clustered on the basis of their feature vectors using automatic concept selection, so that a hierarchy can be built. Clusters are represented hierarchically according to their assigned topics, and an ontology is generated automatically for each cluster based on its assigned topic.
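The indexing, feature-vector, and Self-Organizing Map steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which the preview does not show. The tf-idf weighting, the one-dimensional map, the decay schedules, the toy corpus, and every function name (`tfidf_vectors`, `best_matching_unit`, `train_som`) are assumptions made for illustration only. Ontology generation from the resulting clusters is not covered.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus standing in for Reuters-21578 articles (assumption).
DOCS = [
    "wheat grain harvest export",
    "grain wheat crop export tonnes",
    "bank interest rate money",
    "money bank rate loan interest",
]


def tfidf_vectors(docs):
    """Return (vocabulary, unit-length tf-idf vectors), one vector per document."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        length = max(len(toks), 1)
        # Smoothed idf (an assumed variant): log((1 + N) / (1 + df)) + 1.
        vec = [
            (counts[term] / length)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term in vocab
        ]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def best_matching_unit(weights, vec):
    """Index of the unit whose weights are closest (squared Euclidean) to vec."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(w, vec))
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j]))


def train_som(vectors, n_units=2, epochs=50, lr0=0.5, seed=0):
    """Train a one-dimensional SOM and return its list of unit weight vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = max(n_units / 2, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighbourhood radius
        for vec in rng.sample(vectors, len(vectors)):  # shuffled presentation order
            bmu = best_matching_unit(weights, vec)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood centred on the best matching unit.
                influence = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * influence * (vec[k] - w[k])
    return weights


if __name__ == "__main__":
    vocab, vectors = tfidf_vectors(DOCS)
    som = train_som(vectors, n_units=2)
    for doc, vec in zip(DOCS, vectors):
        print(best_matching_unit(som, vec), doc)
```

Each document is assigned to the map unit nearest its feature vector, and documents that share a unit form one cluster. A hierarchy could then be built by, for example, training a further map on the documents of each cluster, although the abstract does not say how the paper derives its hierarchy.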

Keywords

  • homonymy
  • polysemy
  • Information retrieval
  • indexing
  • feature vector
  • Self-Organizing Map
  • Clustering

Chapter length: 10 pages

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Krishnan, R., Hussain, A., P.C., S. (2013). Conceptual Clustering of Documents for Automatic Ontology Generation. In: Liu, D., Alippi, C., Zhao, D., Hussain, A. (eds) Advances in Brain Inspired Cognitive Systems. BICS 2013. Lecture Notes in Computer Science, vol 7888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38786-9_27

  • DOI: https://doi.org/10.1007/978-3-642-38786-9_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38785-2

  • Online ISBN: 978-3-642-38786-9

  • eBook Packages: Computer Science, Computer Science (R0)