Incorporating Linguistic Information to Statistical Word-Level Alignment

  • Eduardo Cendejas
  • Grettel Barceló
  • Alexander Gelbukh
  • Grigori Sidorov
Conference paper

DOI: 10.1007/978-3-642-10268-4_46

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5856)
Cite this paper as:
Cendejas E., Barceló G., Gelbukh A., Sidorov G. (2009) Incorporating Linguistic Information to Statistical Word-Level Alignment. In: Bayro-Corrochano E., Eklundh JO. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2009. Lecture Notes in Computer Science, vol 5856. Springer, Berlin, Heidelberg

Abstract

Alignment algorithms enrich parallel texts by establishing relationships between the structures of the two languages involved. Depending on the alignment level, a source text and its translation can be aligned by paragraphs, sentences, or words. There are two main approaches to word-level alignment: statistical and linguistic. Because languages differ in their grammar rules, statistical algorithms usually give lower precision, which is why algorithms of this type are generally developed for a specific language pair using linguistic techniques. This paper presents a hybrid alignment system that combines the two traditional approaches. It provides user-friendly configuration and is adaptable to the computational environment. The system uses linguistic resources and procedures such as identification of cognates, morphological information, syntactic trees, dictionaries, and semantic domains. We show that the system outperforms existing algorithms.

Keywords

Parallel texts; word alignment; linguistic information; dictionary; cognates; semantic domains; morphological information

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Eduardo Cendejas (1)
  • Grettel Barceló (1)
  • Alexander Gelbukh (1)
  • Grigori Sidorov (1)
  1. Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
