Evaluation of Alignment Methods for HTML Parallel Text

  • Enrique Sánchez-Villamil
  • Susana Santos-Antón
  • Sergio Ortiz-Rojas
  • Mikel L. Forcada
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)

Abstract

The Internet is a potentially huge store of parallel text that can be collected and exploited by applications such as multilingual information retrieval and machine translation. These applications usually require bilingual text aligned at least at the sentence level. This paper presents new aligners designed to improve on the performance of classical sentence-level aligners when aligning structured text such as HTML. The new aligners are compared with other well-known geometric aligners.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Enrique Sánchez-Villamil (1)
  • Susana Santos-Antón (1)
  • Sergio Ortiz-Rojas (1)
  • Mikel L. Forcada (1)
  1. Transducens group, Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant, Spain
