Spanish-Basque Parallel Corpus Structure: Linguistic Annotations and Translation Units

  • A. Casillas
  • A. Díaz de Illarraza
  • J. Igartua
  • R. Martínez
  • K. Sarasola
  • A. Sologaistoa
Conference paper

DOI: 10.1007/978-3-540-74628-7_31

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4629)
Cite this paper as:
Casillas A., de Illarraza A.D., Igartua J., Martínez R., Sarasola K., Sologaistoa A. (2007) Spanish-Basque Parallel Corpus Structure: Linguistic Annotations and Translation Units. In: Matoušek V., Mautner P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg

Abstract

In this paper we propose a corpus structure for representing and managing an aligned parallel corpus. The structure is based on a stand-off annotation model composed of several XML documents. A bilingual parallel corpus represented in the proposed structure contains: (1) the entire corpus together with its corresponding linguistic information, and (2) translation units and alignment relations between units of the two languages: paragraphs, sentences and named entities. The proposed structure makes it possible to work with the corpus both as a linguistically annotated corpus and as a translation memory.
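
As a rough illustration of the stand-off idea described in the abstract, the sketch below keeps each language's sentences in its own XML document and stores the alignment links in a third document that refers to sentences only by identifier. The element names (`segments`, `s`, `alignment`, `link`), the attribute names, the identifiers and the example sentences are all assumptions invented for this sketch; they are not the schema or data used in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-off layout: element names, attribute names, ids and the
# example sentences are illustrative assumptions, not the paper's schema.
ES_SEGMENTS = """<segments lang="es">
  <s id="es-s1">El Gobierno Vasco aprobó el decreto.</s>
  <s id="es-s2">Entrará en vigor mañana.</s>
</segments>"""

EU_SEGMENTS = """<segments lang="eu">
  <s id="eu-s1">Eusko Jaurlaritzak dekretua onartu zuen.</s>
  <s id="eu-s2">Bihar sartuko da indarrean.</s>
</segments>"""

# The alignment document holds no text, only pointers into the two documents.
ALIGNMENT = """<alignment type="sentence">
  <link src="es-s1" tgt="eu-s1"/>
  <link src="es-s2" tgt="eu-s2"/>
</alignment>"""


def index_by_id(xml_text):
    """Map each sentence id to its text for one monolingual document."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): s.text for s in root.iter("s")}


def translation_units(src_xml, tgt_xml, align_xml):
    """Resolve alignment links into (source, target) sentence pairs."""
    src = index_by_id(src_xml)
    tgt = index_by_id(tgt_xml)
    links = ET.fromstring(align_xml).iter("link")
    return [(src[link.get("src")], tgt[link.get("tgt")]) for link in links]


units = translation_units(ES_SEGMENTS, EU_SEGMENTS, ALIGNMENT)
for es, eu in units:
    print(es, "|", eu)
```

Because the links live apart from the text, the same monolingual documents could in principle carry separate linguistic annotation layers while the resolved pairs are read like translation-memory entries.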

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • A. Casillas (1)
  • A. Díaz de Illarraza (2)
  • J. Igartua (2)
  • R. Martínez (3)
  • K. Sarasola (2)
  • A. Sologaistoa (2)

  1. Dpt. Electricidad y Electrónica, UPV-EHU
  2. IXA Taldea
  3. NLP&IR Group, UNED
