Extraction of Bilingual Cognates from Wikipedia

  • Pablo Gamallo
  • Marcos Garcia
Conference paper

DOI: 10.1007/978-3-642-28885-2_7

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7243)
Cite this paper as:
Gamallo P., Garcia M. (2012) Extraction of Bilingual Cognates from Wikipedia. In: Caseli H., Villavicencio A., Teixeira A., Perdigão F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg

Abstract

In this article, we propose a method to extract translation equivalents with similar spelling from comparable corpora. The method was applied to Wikipedia to extract a large number of Portuguese-Spanish bilingual terminological pairs that were not found in existing dictionaries. The resulting bilingual lexicons consist of more than 27,000 new pairs of lemmas and multiwords, with about 92% accuracy.
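
The abstract does not say how "similar spelling" is measured. As an illustration only, the sketch below shows one common way to score it: strip diacritics and compute a normalized edit distance between the two word forms, keeping the candidate Portuguese-Spanish pairs that score above a cut-off. The function names, the 0.8 threshold and the example pairs are hypothetical assumptions, not the authors' method, and the step that produces candidate pairs from comparable corpora is not modeled here.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Lowercase and drop combining diacritics, e.g. 'Informação' -> 'informacao'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def spelling_similarity(a: str, b: str) -> float:
    """1 minus edit distance normalized by the longer accent-stripped form, in [0, 1]."""
    a, b = strip_accents(a), strip_accents(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def filter_cognates(candidates, threshold=0.8):
    """Keep (pt, es) candidate pairs whose similarity reaches the (hypothetical) threshold."""
    return [(pt, es) for pt, es in candidates
            if spelling_similarity(pt, es) >= threshold]


if __name__ == "__main__":
    pairs = [("informação", "información"),
             ("universidade", "universidad"),
             ("cão", "perro")]
    print(filter_cognates(pairs))
```

With these inputs, the first two pairs pass (similarities of about 0.82 and 0.92) while "cão"/"perro", a translation pair that is not a cognate, is rejected. A spelling filter of this kind only helps with cognates; it cannot by itself tell true equivalents from false friends.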


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Pablo Gamallo (1)
  • Marcos Garcia (1)
  1. Centro de Investigação em Tecnologias da Informação (CITIUS), Universidade de Santiago de Compostela, Galiza, Spain
