Bootstrapping a Portuguese WordNet from Galician, Spanish and English Wordnets

  • Alberto Simões
  • Xavier Gómez Guinovart
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)

Abstract

In this article we explore the possibility of bootstrapping a European Portuguese WordNet from the English, Spanish and Galician wordnets, using Probabilistic Translation Dictionaries automatically created from parallel corpora.

The process generated a total of 56 770 synsets and 97 058 variants. An evaluation using the Brazilian OpenWordNet-PT as a gold standard yielded a precision ranging from 53% to 75%, depending on the cut-off line. The results are satisfactory and comparable to those of similar experiments using the WN-Toolkit.
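To make the idea concrete, here is a minimal sketch of how candidate Portuguese variants might be scored from probabilistic translation dictionaries and then checked against a gold standard at a given threshold. It is an illustration only: the function names (`score_candidates`, `precision_at`), the data layout, the toy entries and probabilities, and the scoring rule (the mean, over the pivot languages, of the best translation probability) are all assumptions, since the abstract does not specify the actual formula or data.

```python
from collections import defaultdict


def score_candidates(pivot_variants, ptds):
    """Score each (synset, Portuguese word) candidate.

    pivot_variants: {synset_id: {lang: [variant, ...]}}
    ptds:           {lang: {word: {pt_word: probability}}}

    Assumed scoring rule (not taken from the paper): for each pivot
    language keep the best probability seen for the candidate, then
    average over all pivot languages that have variants in the synset.
    """
    scores = {}
    for synset_id, by_lang in pivot_variants.items():
        best = defaultdict(dict)  # pt_word -> {lang: best probability}
        for lang, words in by_lang.items():
            for word in words:
                translations = ptds.get(lang, {}).get(word, {})
                for pt_word, prob in translations.items():
                    previous = best[pt_word].get(lang, 0.0)
                    best[pt_word][lang] = max(previous, prob)
        for pt_word, per_lang in best.items():
            scores[(synset_id, pt_word)] = sum(per_lang.values()) / len(by_lang)
    return scores


def precision_at(scores, gold, cutoff):
    """Fraction of candidates with score >= cutoff that appear in gold.

    Returns None when no candidate survives the cutoff.
    """
    kept = [pair for pair, score in scores.items() if score >= cutoff]
    if not kept:
        return None
    correct = sum(1 for pair in kept if pair in gold)
    return correct / len(kept)


# Hypothetical toy data: one synset with one variant per pivot language.
pivot_variants = {"s1": {"en": ["dog"], "es": ["perro"], "gl": ["can"]}}
ptds = {
    "en": {"dog": {"cão": 0.8, "cachorro": 0.2}},
    "es": {"perro": {"cão": 0.9}},
    "gl": {"can": {"cão": 0.7, "cana": 0.1}},
}
gold = {("s1", "cão"), ("s1", "cachorro")}  # stand-in for a gold standard

scores = score_candidates(pivot_variants, ptds)
for cutoff in (0.0, 0.5):
    print(cutoff, precision_at(scores, gold, cutoff))
```

Raising the threshold keeps fewer candidates but tends to keep the better-supported ones, which is the kind of trade-off behind a precision figure that varies with the cut-off.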

Keywords

WordNet, Portuguese, probabilistic translation dictionaries, parallel corpora, knowledge acquisition

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Alberto Simões (1)
  • Xavier Gómez Guinovart (2)

  1. Centro de Estudos Humanísticos, Universidade do Minho, Portugal
  2. Seminario de Lingüística Informática, Universidade de Vigo, Spain
