Lexical-Based Alignment for Reconstruction of Structure in Parallel Texts

  • Alexander Gelbukh
  • Grigori Sidorov
  • Liliana Chanona-Hernandez
Conference paper

DOI: 10.1007/978-3-540-73351-5_37

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4592)
Cite this paper as:
Gelbukh A., Sidorov G., Chanona-Hernandez L. (2007) Lexical-Based Alignment for Reconstruction of Structure in Parallel Texts. In: Kedad Z., Lammari N., Métais E., Meziane F., Rezgui Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg

Abstract

In this paper, we present an optimization algorithm that finds the best text alignment based on lexical similarity, and we report the results of its evaluation against baseline methods (Gale and Church, relative position). For evaluation, we use fiction texts, which represent non-trivial cases of alignment. We also present a new method for evaluating algorithms for parallel text alignment: the structure of the text in one language is restored using its lower-level units and the available structure of the text in the other language. For example, for paragraph-level alignment, sentences are used to constitute the restored paragraphs. The advantage of this method is that it does not depend on corpus data.
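
The structure-restoration idea can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a Dice-style lexical overlap computed through a toy bilingual dictionary and an exhaustive dynamic program over contiguous sentence groups. The names `tokens`, `dice_similarity`, `restore_paragraphs`, and the sample dictionary are all hypothetical.

```python
from typing import Callable, Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def dice_similarity(src: str, tgt: str, dictionary: Dict[str, str]) -> float:
    """Dice overlap between source words and dictionary-translated target words.

    The dictionary (target word -> source word) is an assumption of this sketch.
    """
    src_words = tokens(src)
    tgt_words = {dictionary.get(w, w) for w in tokens(tgt)}
    if not src_words or not tgt_words:
        return 0.0
    return 2 * len(src_words & tgt_words) / (len(src_words) + len(tgt_words))


def restore_paragraphs(
    src_paragraphs: List[str],
    tgt_sentences: List[str],
    similarity: Callable[[str, str], float],
) -> List[List[str]]:
    """Split tgt_sentences into len(src_paragraphs) contiguous, non-empty groups
    that maximize the summed similarity to the corresponding source paragraphs."""
    n, m = len(src_paragraphs), len(tgt_sentences)
    if n == 0 or m < n:
        raise ValueError("need at least one sentence per source paragraph")
    neg_inf = float("-inf")
    # best[i][j]: best score covering the first i paragraphs with the first j sentences
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        # leave at least one sentence for each of the remaining n - i paragraphs
        for j in range(i, m - (n - i) + 1):
            for k in range(i - 1, j):  # group i consists of sentences k .. j-1
                if best[i - 1][k] == neg_inf:
                    continue
                group = " ".join(tgt_sentences[k:j])
                score = best[i - 1][k] + similarity(src_paragraphs[i - 1], group)
                if score > best[i][j]:
                    best[i][j] = score
                    back[i][j] = k
    # Follow the back-pointers from the last paragraph to the first.
    groups: List[List[str]] = []
    j = m
    for i in range(n, 0, -1):
        k = back[i][j]
        groups.append(tgt_sentences[k:j])
        j = k
    groups.reverse()
    return groups


# Toy data (invented for illustration): English paragraphs, Spanish sentences.
toy_dictionary = {
    "el": "the", "la": "the", "gato": "cat", "duerme": "sleeps",
    "perro": "dog", "ladra": "barks", "lluvia": "rain", "cae": "falls",
}
source = ["The cat sleeps. The dog barks.", "The rain falls."]
target = ["El gato duerme.", "El perro ladra.", "La lluvia cae."]
restored = restore_paragraphs(
    source, target, lambda s, t: dice_similarity(s, t, toy_dictionary)
)
```

On this toy input, the first two Spanish sentences are grouped into the first restored paragraph and the third sentence forms the second. The restored paragraph boundaries could then be compared with the true boundaries of the target text to score an alignment method, which is the kind of corpus-independent evaluation the abstract describes.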

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Alexander Gelbukh
    • 1
  • Grigori Sidorov
    • 1
  • Liliana Chanona-Hernandez
    • 2
  1. Center for Research in Computer Science, National Polytechnic Institute, Av. Juan Dios Batiz, s/n, Zacatenco, 07738, Mexico City, Mexico
  2. Faculty of Electric and Mechanical Engineering, National Polytechnic Institute, Mexico City, Mexico
