Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora

  • Jeroen de Knijff
  • Kevin Meijer
  • Flavius Frasincar
  • Frederik Hogenboom
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6997)

Abstract

In this paper, we propose the Automatic Taxonomy Construction from Text (ATCT) framework for building taxonomies from text-based Web corpora. The framework is composed of multiple processing steps. Firstly, domain terms are extracted using a filtering method. Subsequently, Word Sense Disambiguation (WSD) is optionally applied in order to determine the senses of these terms. Then, by means of a subsumption technique, the resulting concepts are arranged in a hierarchy. We construct taxonomies with and without WSD and we investigate the effect of WSD on the quality of concept type-of relations using an evaluation framework that uses a golden taxonomy. We find that WSD improves the quality of the built taxonomy in terms of the taxonomic F-Measure.
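The hierarchy-building step can be illustrated with a small sketch. The abstract does not give the exact subsumption rule used in ATCT, so the code below assumes a document co-occurrence rule in the style of Sanderson and Croft: term x subsumes term y when P(x|y) is at least a threshold (0.8 here, an assumed value) and P(y|x) is below 1. The function name `subsumption_pairs` and the toy corpus are hypothetical, and the sketch omits the term extraction and WSD steps.

```python
from collections import defaultdict


def subsumption_pairs(docs, threshold=0.8):
    """Return sorted (parent, child) pairs where parent subsumes child.

    Assumed rule (not taken from the paper): x subsumes y when
    P(x | y) >= threshold and P(y | x) < 1, with probabilities
    estimated from document co-occurrence counts.
    """
    doc_sets = [set(d) for d in docs]
    doc_freq = defaultdict(int)   # number of documents containing a term
    co_freq = defaultdict(int)    # number of documents containing both terms
    for terms in doc_sets:
        for x in terms:
            doc_freq[x] += 1
            for y in terms:
                if x != y:
                    co_freq[(x, y)] += 1

    pairs = []
    for (x, y), n_xy in co_freq.items():
        p_x_given_y = n_xy / doc_freq[y]
        p_y_given_x = n_xy / doc_freq[x]
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return sorted(pairs)


# Toy corpus: each document is a list of already-extracted domain terms.
corpus = [
    ["finance", "stock"],
    ["finance", "bond"],
    ["finance", "stock", "bond"],
    ["finance"],
]
result = subsumption_pairs(corpus)
```

In this toy corpus, "finance" appears in every document that mentions "stock" or "bond" but not the reverse, so it is placed above both. With WSD enabled, the same procedure would operate on disambiguated senses rather than surface terms.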

Keywords

Domain Pertinence, Word Sense Disambiguation, Computational Linguistics, Concept Hierarchy, Link Open Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jeroen de Knijff (1)
  • Kevin Meijer (1)
  • Flavius Frasincar (1)
  • Frederik Hogenboom (1)
  1. Erasmus University Rotterdam, Rotterdam, The Netherlands
