Unsupervised Domain Ontology Learning from Text

  • Sree Harissh Venu
  • Vignesh Mohan
  • Kodaikkaavirinaadan Urkalan
  • Geetha T.V.
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10089)

Abstract

Ontology construction has become indispensable with the rapid increase in textual information. Most research on ontology learning is supervised and requires manually annotated resources. Moreover, the quality of a learned ontology depends on the quality of the corpus, which may not be readily available. To address these problems, we present an iterative focused web crawler for building the corpus and an unsupervised framework for constructing a domain ontology. The proposed framework consists of five phases: corpus collection using iterative focused crawling with a novel weighting measure, term extraction using the HITS algorithm, taxonomic relation extraction using Hearst and morpho-syntactic patterns, non-taxonomic relation extraction using association rule mining, and domain ontology building. Evaluation results show that the proposed crawler outperforms traditional crawling techniques, that the extracted domain terms achieve higher precision than those obtained with statistical techniques, and that the learnt ontology has a rich knowledge representation.
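
To illustrate the taxonomic relation extraction phase, the following is a minimal sketch of a single Hearst pattern ("X such as A, B and C"), not the implementation used in the paper. The function name `extract_is_a`, the restriction of the hypernym to one word, and the regular expressions are assumptions made for this example; a full system would identify noun phrases with a tagger or chunker and apply several patterns.

```python
import re

# One Hearst pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Assumption for illustration: the hypernym is the single word before "such as",
# and the hyponym list runs to the end of the clause (next "." or ";").
SUCH_AS = re.compile(r"\b(\w+) such as ([^.;]+)")

# Separators inside the hyponym list: commas (optionally followed by and/or)
# or a bare "and"/"or" between two items.
LIST_SEP = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def extract_is_a(sentence):
    """Return (hyponym, hypernym) pairs found by the "such as" pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group(1)
        for hyponym in LIST_SEP.split(match.group(2)):
            hyponym = hyponym.strip()
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    text = "He studied diseases such as malaria, cholera and typhoid."
    for hyponym, hypernym in extract_is_a(text):
        print(f"{hyponym} is-a {hypernym}")
```

Each extracted pair is a candidate is-a edge for the taxonomy; the sketch does no filtering, so pairs would still need to be checked against the extracted domain terms.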

Keywords

  • Iterative Focused Crawling
  • Domain Ontology
  • Domain terms extraction
  • Taxonomy
  • Non Taxonomy


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Sree Harissh Venu (1)
  • Vignesh Mohan (1)
  • Kodaikkaavirinaadan Urkalan (1)
  • Geetha T.V. (1)
  1. Department of Computer Science and Engineering, College of Engineering, Guindy, India
