Sentence Alignment for Spanish-Basque Bitexts: Word Correspondences vs. Markup Similarity

  • Arantza Casillas
  • Idoia Fernández
  • Raquel Martínez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2945)

Abstract

In this paper, we present an evaluation of two different sentence alignment techniques. One is the well-known SIMR algorithm based on word correspondences on both sides of a bitext. The other one is the ALINOR algorithm, which is based on the similarity of the markup on both sides of a bitext. Both algorithms are accurate in 1-1 alignment, but ALINOR works slightly better in the case of N-M alignment.
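
To make the 1-1 versus N-M distinction concrete, here is a minimal Python sketch. It is not taken from the paper and implements neither SIMR nor ALINOR. The names `Bead`, `bead_type`, and `count_by_type`, as well as the sample sentence indices, are hypothetical and chosen only for illustration. The sketch assumes an alignment is a list of beads, each pairing a group of source (Spanish) sentence indices with a group of target (Basque) sentence indices, and it treats every bead that is not one-to-one as N-M.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Bead(NamedTuple):
    """One alignment unit: source sentence indices paired with target ones."""
    source: Tuple[int, ...]
    target: Tuple[int, ...]


def bead_type(bead: Bead) -> str:
    """Return '1-1' for a one-to-one bead, otherwise 'N-M' (assumed grouping)."""
    if len(bead.source) == 1 and len(bead.target) == 1:
        return "1-1"
    return "N-M"


def count_by_type(alignment):
    """Count how many beads of each type an alignment contains."""
    return Counter(bead_type(b) for b in alignment)


# Hypothetical alignment of five source sentences against five target sentences.
example = [
    Bead((0,), (0,)),      # one source sentence, one target sentence
    Bead((1, 2), (1,)),    # two source sentences rendered as one target sentence
    Bead((3,), (2, 3)),    # one source sentence split into two target sentences
    Bead((4,), (4,)),      # one source sentence, one target sentence
]

counts = count_by_type(example)
print(counts["1-1"], counts["N-M"])
```

Reporting accuracy separately for each bead type, as the abstract does, shows whether an aligner copes with merged and split sentences as well as it does with the simple one-to-one case.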

References

  1. [Martinez 1998a]
    Martínez, R., Abaitua, J., Casillas, A.: Bitext Correspondences through Rich Mark-up. In: Proceedings of the 17th International Conference on Computational Linguistics (COLING 1998) and 36th Annual Meeting of the Association for Computational Linguistics (ACL 1998) (1998)
  2. [Martinez 1998b]
    Martínez, R., Abaitua, J., Casillas, A.: Aligning Tagged Bitext. In: Proceedings of the Sixth Workshop on Very Large Corpora (1998)
  3. [Melamed 1996]
    Melamed, I.D.: A Geometric Approach to Mapping Bitext Correspondence. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1–12 (1996)
  4. [Melamed 1997]
    Melamed, I.D.: A Portable Algorithm for Mapping Bitext Correspondence. In: 35th Conference of the Association for Computational Linguistics, ACL 1997 (1997)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Arantza Casillas (1)
  • Idoia Fernández (1)
  • Raquel Martínez (2)

  1. Dpt. de Electricidad y Electrónica, Facultad de C. y Tecnología, Universidad del País Vasco
  2. Escuela Superior de CC. Experimentales y Tecnología, Universidad Rey Juan Carlos
