Automatically Enriching a Thesaurus with Information from Dictionaries

  • Hugo Gonçalo Oliveira
  • Paulo Gomes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7026)

Abstract

Because information in broad-coverage knowledge bases, such as thesauri, is usually incomplete, merging information from different sources is a good way to increase coverage. We propose a method for enriching a thesaurus with information acquired automatically from dictionaries: pairs of synonyms are assigned to candidate synsets, and the pairs whose elements are not in the thesaurus are clustered to identify new synsets. This method was used to enrich a Brazilian Portuguese thesaurus with synonyms from a European Portuguese dictionary, and resulted in a larger and broader thesaurus with new words and new concepts. The assignments and the obtained synsets were manually evaluated and yielded correctness scores higher than 71% and 85%, respectively.
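The two steps named in the abstract (assignment of synonym pairs to candidate synsets, then clustering of the pairs left outside the thesaurus) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the clustering algorithm, so everything below is an assumption made for illustration: the function name `enrich`, the Jaccard overlap used as the score, the `threshold` parameter, and the use of connected components as a stand-in for clustering.

```python
from collections import defaultdict


def enrich(synsets, pairs, threshold=0.1):
    """Illustrative sketch only, not the authors' implementation.

    synsets: list of sets of words (the existing thesaurus).
    pairs:   list of (a, b) synonym pairs taken from a dictionary.
    Returns (enriched_synsets, new_synsets).
    """
    # Neighbours of each word in the graph formed by the dictionary pairs.
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    known = set().union(*synsets)          # every word already in the thesaurus
    enriched = [set(s) for s in synsets]   # copies, so the input is not mutated
    leftover = []                          # pairs with no element in the thesaurus

    for a, b in pairs:
        if a not in known and b not in known:
            leftover.append((a, b))
            continue
        # Candidate synsets are those containing at least one element of the pair.
        # Assumed score: Jaccard overlap between the synset and the pair's context.
        context = {a, b} | neighbours[a] | neighbours[b]
        best, best_score = None, threshold
        for i, synset in enumerate(synsets):
            if a in synset or b in synset:
                score = len(synset & context) / len(synset | context)
                if score > best_score:
                    best, best_score = i, score
        if best is not None:
            enriched[best].update((a, b))

    # Assumed clustering: connected components of the leftover-pair graph.
    graph = defaultdict(set)
    for a, b in leftover:
        graph[a].add(b)
        graph[b].add(a)
    new_synsets, seen = [], set()
    for word in graph:
        if word in seen:
            continue
        component, stack = set(), [word]
        while stack:
            w = stack.pop()
            if w not in component:
                component.add(w)
                stack.extend(graph[w] - component)
        seen |= component
        new_synsets.append(component)

    return enriched, new_synsets
```

With a toy thesaurus `[{"carro", "auto"}, {"casa", "lar"}]` and the pairs `("carro", "viatura")`, `("moradia", "casa")`, `("gajo", "tipo")`, `("tipo", "sujeito")`, the first two pairs extend existing synsets and the last two form one new synset. A real system would likely need a stronger similarity measure and a clustering method that can split ambiguous words across several synsets, which connected components cannot do.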

Keywords

thesaurus, synonymy, lexico-semantic knowledge, clustering

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hugo Gonçalo Oliveira (1)
  • Paulo Gomes (1)
  1. CISUC, University of Coimbra, Portugal
