Aligning Multiword Terms Using a Hybrid Approach

  • Arantza Casillas
  • Raquel Martínez
Conference paper

DOI: 10.1007/3-540-45715-1_28

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2276)
Cite this paper as:
Casillas A., Martínez R. (2002) Aligning Multiword Terms Using a Hybrid Approach. In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg

Abstract

This paper presents an approach to multiword term alignment, set in the context of research on aligning parallel corpora for language pairs that differ in important ways (e.g., structurally, lexically, morpho-syntactically). Our system, ALINTEC, implements a hybrid strategy that combines several kinds of linguistic knowledge (a sentence-aligned corpus, POS tagging, grammatical patterns, and a bilingual glossary) with quantitative criteria such as the frequency and distribution of terms in the corpus. The experiments were carried out on a parallel corpus consisting of a collection of administrative and legal documents in Spanish and Basque. This pair of languages is representative of the context in which our work is framed. The results show that our approach performs reasonably well at aligning terms between typologically different languages such as Spanish and Basque.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Arantza Casillas (1)
  • Raquel Martínez (2)
  1. Facultad de Ciencias, Universidad del País Vasco, Spain
  2. Escuela Superior de CC. Experimentales y Tecnología, Universidad Rey Juan Carlos, Spain