Alignment of Paragraphs in Bilingual Texts Using Bilingual Dictionaries and Dynamic Programming

  • Alexander Gelbukh
  • Grigori Sidorov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4225)

Abstract

Parallel text alignment is a special type of pattern recognition task aimed at discovering the similarity between two sequences of symbols. Given the same text in two different languages, the task is to decide which elements (paragraphs, in the case of paragraph alignment) in one text are translations of which elements of the other text. One application is training statistical machine translation algorithms. The task is not trivial unless detailed text understanding can be afforded. In our previous work we presented a simple technique that relies on bilingual dictionaries but does not perform any syntactic analysis of the texts. In this paper we give a formal definition of the task and present an exact optimization algorithm for finding the best alignment.
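
The abstract does not spell out the algorithm, so the following is only a generic sketch of how dictionary-based paragraph alignment can be posed as a dynamic programming problem, not the authors' formulation. The similarity measure (a Dice-like overlap of dictionary-translatable words), the set of allowed moves (1-1, 1-0, 0-1, 2-1, 1-2), the `skip_penalty` value, and the names `similarity` and `align` are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Paragraph = List[str]               # a paragraph as a list of (normalized) words
Span = Tuple[int, int]              # half-open range of paragraph indices


def similarity(src: Paragraph, tgt: Paragraph,
               dictionary: Dict[str, Set[str]]) -> float:
    """Assumed measure: Dice-like share of source words whose dictionary
    translation occurs in the target text."""
    if not src or not tgt:
        return 0.0
    tgt_words = set(tgt)
    matched = sum(1 for w in src if dictionary.get(w, set()) & tgt_words)
    return 2.0 * matched / (len(src) + len(tgt))


def align(src_pars: List[Paragraph], tgt_pars: List[Paragraph],
          dictionary: Dict[str, Set[str]],
          skip_penalty: float = 0.1) -> List[Tuple[Span, Span]]:
    """Return the highest-scoring monotone alignment as (source span, target span) pairs."""
    n, m = len(src_pars), len(tgt_pars)
    neg_inf = float("-inf")
    # best[i][j]: best total score for the first i source and first j target paragraphs
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]  # assumed set of alignment types

    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    gain = -skip_penalty  # a paragraph left without a counterpart
                else:
                    src = [w for p in src_pars[i:ni] for w in p]
                    tgt = [w for p in tgt_pars[j:nj] for w in p]
                    gain = similarity(src, tgt, dictionary)
                if best[i][j] + gain > best[ni][nj]:
                    best[ni][nj] = best[i][j] + gain
                    back[ni][nj] = (i, j)

    # Follow the back-pointers from the final cell to recover the alignment.
    path: List[Tuple[Span, Span]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append(((pi, i), (pj, j)))
        i, j = pi, pj
    path.reverse()
    return path
```

Because every cell of the table is filled from its already-final predecessors, the result is exact with respect to the chosen scoring function; the running time is proportional to the product of the two paragraph counts. A real system would normalize word forms (for example, by morphological analysis) before the dictionary lookup.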



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alexander Gelbukh (1)
  • Grigori Sidorov (1)
  1. Natural Language and Text Processing Laboratory, Center for Research in Computer Science, National Polytechnic Institute, Mexico City, Mexico
