Alignment of Paragraphs in Bilingual Texts Using Bilingual Dictionaries and Dynamic Programming

  • Alexander Gelbukh
  • Grigori Sidorov
Conference paper

DOI: 10.1007/11892755_85

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4225)
Cite this paper as:
Gelbukh A., Sidorov G. (2006) Alignment of Paragraphs in Bilingual Texts Using Bilingual Dictionaries and Dynamic Programming. In: Martínez-Trinidad J.F., Carrasco Ochoa J.A., Kittler J. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2006. Lecture Notes in Computer Science, vol 4225. Springer, Berlin, Heidelberg

Abstract

Parallel text alignment is a special type of pattern recognition task aimed at discovering the similarity between two sequences of symbols. Given the same text in two different languages, the task is to decide which elements (paragraphs, in the case of paragraph alignment) in one text are translations of which elements of the other text. One of the applications is training statistical machine translation algorithms. The task is not trivial unless detailed text understanding can be afforded. In our previous work we presented a simple technique that relies on bilingual dictionaries but does not perform any syntactic analysis of the texts. In this paper we give a formal definition of the task and present an exact optimization algorithm for finding the best alignment.
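
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how dictionary-based paragraph alignment by dynamic programming could look; it is not the authors' method. Everything in it is an assumption made for illustration: the toy dictionary `TOY_DICT`, the word-overlap `similarity` measure, the `max_group` limit on how many adjacent paragraphs may be merged, and the `merge_penalty` that discourages merging.

```python
# Hypothetical toy English-to-Spanish dictionary (illustration only).
TOY_DICT = {
    "house": {"casa"},
    "dog": {"perro"},
    "red": {"rojo", "roja"},
    "cat": {"gato"},
}


def similarity(src_words, tgt_words, dictionary):
    """Count source words that have a dictionary translation in the target."""
    tgt = set(tgt_words)
    return sum(1 for w in src_words if dictionary.get(w, set()) & tgt)


def align(src_paras, tgt_paras, dictionary, max_group=2, merge_penalty=0.5):
    """Return (score, pairs) for the best monotone alignment.

    Each pair is ((i_start, i_end), (j_start, j_end)), half-open ranges of
    source and target paragraph indices that are aligned to each other.
    """
    n, m = len(src_paras), len(tgt_paras)
    neg_inf = float("-inf")
    # best[i][j]: best score aligning the first i source and j target paragraphs.
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue
            # Extend with a group of di source and dj target paragraphs.
            for di in range(1, min(max_group, n - i) + 1):
                for dj in range(1, min(max_group, m - j) + 1):
                    src = [w for p in src_paras[i:i + di] for w in p]
                    tgt = [w for p in tgt_paras[j:j + dj] for w in p]
                    score = (best[i][j] + similarity(src, tgt, dictionary)
                             - merge_penalty * (di + dj - 2))
                    if score > best[i + di][j + dj]:
                        best[i + di][j + dj] = score
                        back[i + di][j + dj] = (i, j)
    if best[n][m] == neg_inf:
        raise ValueError("no alignment within the group-size limit")
    # Walk the back-pointers from the end to recover the aligned groups.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append(((pi, i), (pj, j)))
        i, j = pi, pj
    pairs.reverse()
    return best[n][m], pairs
```

Paragraphs are passed as lists of already tokenized words, for example `align([["red", "house"], ["dog"]], [["casa", "roja"], ["perro"]], TOY_DICT)`. Because the table is filled over all prefixes, the result is optimal for this toy objective, but the objective is a stand-in and may differ substantially from the one defined in the paper.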

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alexander Gelbukh (1)
  • Grigori Sidorov (1)
  1. Natural Language and Text Processing Laboratory, Center for Research in Computer Science, National Polytechnic Institute, Mexico City, Mexico
