Extraction of Bilingual Cognates from Wikipedia
- Cite this paper as:
- Gamallo P., Garcia M. (2012) Extraction of Bilingual Cognates from Wikipedia. In: Caseli H., Villavicencio A., Teixeira A., Perdigão F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg
In this article, we propose a method to extract translation equivalents with similar spelling from comparable corpora. The method was applied to Wikipedia to extract a large number of Portuguese-Spanish bilingual terminological pairs that were not found in existing dictionaries. The resulting bilingual lexicon consists of more than 27,000 new pairs of lemmas and multiwords, with about 92% accuracy.
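The abstract does not say how spelling similarity is measured, so the following is only an illustrative sketch of one common choice: a Levenshtein edit distance normalised by the length of the longer word, with candidate pairs kept when they score above a cut-off. The function names, the 0.7 threshold, and the assumption that candidate pairs are already available (for example, from aligned Wikipedia articles) are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def spelling_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus distance over the longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def cognate_candidates(
    pairs: Iterable[Tuple[str, str]],
    threshold: float = 0.7,  # hypothetical cut-off, not from the paper
) -> List[Tuple[str, str]]:
    """Keep the (Portuguese, Spanish) pairs whose spelling is similar enough."""
    return [
        (pt, es)
        for pt, es in pairs
        if spelling_similarity(pt.lower(), es.lower()) >= threshold
    ]
```

A pair with identical spelling in both languages, such as ("biblioteca", "biblioteca"), scores 1.0 and is kept, while a non-cognate translation pair such as ("cachorro", "perro") falls below the cut-off and is discarded. In practice the threshold would need tuning against a held-out sample, since a plain character-level distance penalises regular Portuguese-Spanish spelling correspondences.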