Paragraph-Level Alignment of an English-Spanish Parallel Corpus of Fiction Texts Using Bilingual Dictionaries

  • Alexander Gelbukh
  • Grigori Sidorov
  • José Ángel Vera-Félix
Conference paper

DOI: 10.1007/11846406_8

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4188)
Cite this paper as:
Gelbukh A., Sidorov G., Vera-Félix J.Á. (2006) Paragraph-Level Alignment of an English-Spanish Parallel Corpus of Fiction Texts Using Bilingual Dictionaries. In: Sojka P., Kopeček I., Pala K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg

Abstract

Aligned parallel corpora are important linguistic resources, useful in many text processing tasks such as machine translation, word sense disambiguation, and dictionary compilation. Nevertheless, few resources of this type are available, especially for fiction texts, because of the difficulty of collecting the texts and the high cost of manual alignment. In this paper, we describe an automatically aligned English-Spanish parallel corpus of fiction texts and evaluate our alignment method, which uses linguistic data, namely existing bilingual dictionaries, to calculate word similarity. The method is based on a simple idea: if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. Experimental results of alignment at the paragraph level are described.
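The idea stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `translation_coverage`, the toy dictionary, the stop-word list, and the scoring formula (the fraction of meaningful source words with at least one dictionary translation in the target) are all assumptions made for illustration only.

```python
def translation_coverage(source_words, target_words, dictionary, stop_words=frozenset()):
    """Hypothetical score: fraction of meaningful source words that have
    at least one dictionary translation present in the target text."""
    target_set = {w.lower() for w in target_words}
    # Treat every word outside the stop-word list as "meaningful" (an assumption).
    meaningful = [w.lower() for w in source_words if w.lower() not in stop_words]
    if not meaningful:
        return 0.0
    hits = sum(
        1
        for w in meaningful
        if any(t in target_set for t in dictionary.get(w, ()))
    )
    return hits / len(meaningful)


# Toy English-to-Spanish dictionary, invented for this example.
toy_dictionary = {
    "house": {"casa", "hogar"},
    "white": {"blanco", "blanca"},
    "dog": {"perro"},
}
stop_words = frozenset({"the", "a", "is"})

english = "the white house".split()
spanish_match = "la casa blanca".split()
spanish_other = "el perro negro".split()

score_match = translation_coverage(english, spanish_match, toy_dictionary, stop_words)
score_other = translation_coverage(english, spanish_other, toy_dictionary, stop_words)
```

Under these assumptions, a candidate paragraph pair whose target contains translations of most meaningful source words scores higher than an unrelated pair, which is the kind of signal a paragraph aligner could use to choose between candidate pairings.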

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alexander Gelbukh (1)
  • Grigori Sidorov (1)
  • José Ángel Vera-Félix (1)
  1. Natural Language and Text Processing Laboratory, Center for Research in Computer Science, National Polytechnic Institute, Mexico City, Mexico
