Longest Sorted Sequence Algorithm for Parallel Text Alignment

  • Tiago Ildefonso
  • Gabriel Pereira Lopes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3643)

Abstract

This paper describes a statistically supported, language-independent method for aligning parallel texts (texts that are translations of each other, or of a common source text). The approach is inspired by previous work by Ribeiro et al. (2000). The second statistical filter proposed by Ribeiro et al., which is based on Confidence Bands (CB), is replaced by the Longest Sorted Sequence algorithm (LSSA), which is described in this paper. As a result, a 35% decrease in processing time and an 18% increase in the number of aligned segments were obtained for Portuguese-French alignments. Similar results were obtained for Portuguese-English alignments. Both methods are compared and evaluated on a large parallel corpus made up of Portuguese, English and French parallel texts (approximately 250Mb of text per language).
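
The abstract names LSSA without giving its details, so the following is only an illustrative sketch, not the authors' implementation. It assumes that candidate correspondence points are pairs of positions (one per text) and that the filter keeps the longest subset of points whose positions increase in both texts, which is the classic longest increasing subsequence problem. The `Point` type and the function name `longest_sorted_sequence` are hypothetical names chosen for this example.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical representation: (position in text A, position in text B).
Point = Tuple[int, int]


def longest_sorted_sequence(points: List[Point]) -> List[Point]:
    """Return a longest subset of points strictly increasing in both positions.

    Assumption: this is a standard O(n log n) longest increasing
    subsequence, used here as a stand-in for the paper's LSSA.
    """
    # Sort by position in text A; on ties, put larger text-B positions first
    # so two points sharing a text-A position can never both be selected.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))

    tails: List[int] = []      # tails[k]: index of the point ending the best chain of length k + 1
    tail_vals: List[int] = []  # text-B positions of those points, kept sorted for bisect
    prev: List[int] = [-1] * len(ordered)  # back-pointers for reconstruction

    for i, (_, pos_b) in enumerate(ordered):
        k = bisect_left(tail_vals, pos_b)  # bisect_left enforces strict increase
        prev[i] = tails[k - 1] if k > 0 else -1
        if k == len(tails):
            tails.append(i)
            tail_vals.append(pos_b)
        else:
            tails[k] = i
            tail_vals[k] = pos_b

    # Walk the back-pointers from the end of the longest chain.
    chain: List[Point] = []
    i = tails[-1] if tails else -1
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    chain.reverse()
    return chain


if __name__ == "__main__":
    # Made-up candidate points; (20, 95) and (50, 5) break the ordering.
    candidates = [(10, 12), (20, 95), (30, 31), (40, 44), (50, 5), (60, 61)]
    print(longest_sorted_sequence(candidates))
```

In this sketch, points that would make the alignment cross itself are discarded, and the surviving points could then serve as segment boundaries. How the paper actually selects candidates and handles ties is not stated in the abstract.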

References

  1. Ribeiro, A., Lopes, G., Mexia, J.: Using Confidence Bands for Parallel Texts Alignment. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000), October 3-6, Hong Kong, China (2000)
  2. Ribeiro, A., Dias, G., Lopes, G., Mexia, J.: Cognates Alignment. In: Maegaard, B. (ed.) Proceedings of the Machine Translation Summit VIII (MT Summit VIII), Santiago de Compostela, Spain, September 18-22. European Association of Machine Translation, pp. 287–292 (2001)
  3. Ferreira da Silva, J., Dias, G., Guilloré, S., Pereira Lopes, J.G.: Using Local Maxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units. In: Barahona, P., Alferes, J.J. (eds.) EPIA 1999. LNCS (LNAI), vol. 1695, pp. 113–132. Springer, Heidelberg (1999)
  4. Danielsson, P., Mühlenbock, K.: Small but efficient: The misconception of high frequency words in Scandinavian translation. In: White, J.S. (ed.) AMTA 2000. LNCS (LNAI), vol. 1934, pp. 158–168. Springer, Heidelberg (2000)
  5. Simard, M., Foster, G., Isabelle, P.: Using cognates to align sentences in bilingual corpora. In: Proceedings of the 4th International Conference on Theoretical and Methodological Issues in Machine Translation, vol. TMI-92, pp. 67–91 (1992)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Tiago Ildefonso (1)
  • Gabriel Pereira Lopes (1)
  1. Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Centre of Informatics and Information Technologies (CITI), Quinta da Torre, Caparica, Portugal
