Abstract
Parallel text alignment is a special type of pattern recognition task aimed at discovering the similarity between two sequences of symbols. Given the same text in two different languages, the task is to decide which elements (paragraphs, in the case of paragraph alignment) in one text are translations of which elements of the other text. One application is training statistical machine translation algorithms. The task is not trivial unless detailed text understanding can be afforded. In our previous work we presented a simple technique that relies on bilingual dictionaries but does not perform any syntactic analysis of the texts. In this paper we give a formal definition of the task and present an exact optimization algorithm for finding the best alignment.
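The abstract does not give the scoring function or the allowed paragraph groupings, so the following is only a hypothetical sketch of dictionary-based paragraph alignment solved exactly by dynamic programming. It is not the authors' algorithm. The following are all assumptions made for illustration:

* The dictionary format is a Python dict mapping a source word to a set of target translations.
* The score is the number of source words with a dictionary translation present in the target, divided by the longer token count.
* The move set `MOVES` allows 1:1, 2:1 and 1:2 groupings.
* The helper names `tokenize`, `similarity` and `align` are introduced here.

Under these assumptions the DP returns the monotone alignment with the maximal summed score.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical dictionary format: source word -> set of possible target translations.
Dictionary = Dict[str, Set[str]]
Pair = Tuple[List[int], List[int]]

# Assumed move set: (source paragraphs consumed, target paragraphs consumed).
MOVES = [(1, 1), (2, 1), (1, 2)]


def tokenize(text: str) -> List[str]:
    """Lowercased word tokens; Python's Unicode-aware word pattern keeps accented letters."""
    return re.findall(r"\w+", text.lower())


def similarity(src: str, tgt: str, dictionary: Dictionary) -> float:
    """Assumed score: source words having at least one dictionary translation
    in the target, divided by the longer of the two token counts."""
    src_tokens = tokenize(src)
    tgt_tokens = tokenize(tgt)
    if not src_tokens or not tgt_tokens:
        return 0.0
    tgt_set = set(tgt_tokens)
    hits = sum(1 for w in src_tokens if dictionary.get(w, set()) & tgt_set)
    return hits / max(len(src_tokens), len(tgt_tokens))


def align(src_pars: List[str], tgt_pars: List[str],
          dictionary: Dictionary) -> Tuple[float, List[Pair]]:
    """Monotone alignment maximizing the summed similarity (exact for MOVES)."""
    n, m = len(src_pars), len(tgt_pars)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Every move advances i, so all predecessors of a cell lie in earlier rows
    # and best[i][j] is final by the time row i is processed.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                score = best[i][j] + similarity(
                    " ".join(src_pars[i:ni]), " ".join(tgt_pars[j:nj]), dictionary)
                if score > best[ni][nj]:
                    best[ni][nj] = score
                    back[ni][nj] = (i, j)
    if best[n][m] == neg:
        raise ValueError("texts cannot be aligned with the allowed moves")
    # Follow back-pointers from (n, m) to (0, 0) to recover the aligned groups.
    pairs: List[Pair] = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev = back[i][j]
        assert prev is not None
        pi, pj = prev
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return best[n][m], pairs


if __name__ == "__main__":
    toy_dict = {"the": {"el", "la"}, "cat": {"gato"}, "sleeps": {"duerme"},
                "dog": {"perro"}, "barks": {"ladra"}, "loudly": {"fuerte"}}
    english = ["The cat sleeps.", "The dog barks.", "It barks loudly."]
    spanish = ["El gato duerme.", "El perro ladra muy fuerte."]
    total, pairs = align(english, spanish, toy_dict)
    print(pairs)  # [([0], [0]), ([1, 2], [1])]
```

In the toy run, the two English paragraphs about the dog are grouped against the single Spanish paragraph, because that grouping scores higher than merging the first two English paragraphs instead. The 2:1 and 1:2 moves let merged or split translated paragraphs be matched. Normalizing by the longer side discourages merges that add no translated words. Both choices are illustrative assumptions, not details taken from the paper.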
This work was done under partial support of the Mexican Government (CONACyT, SNI) and the National Polytechnic Institute, Mexico (CGPI, COFAA). We thank an anonymous reviewer for drawing our attention to valuable resources and publications.
Keywords
- Machine Translation
- Dynamic Programming Algorithm
- Parallel Corpus
- Bilingual Dictionary
- Pattern Recognition Task
These keywords were added by machine and not by the authors.
References
Brown, P.F., Lai, J.C., Mercer, R.L.: Aligning Sentences in Parallel Corpora. In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, California, pp. 169–176 (1991)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
Caseli, H.M., Volpe Nunes, M.G.: Evaluation of Sentence Alignment Methods on Portuguese-English Parallel Texts. Scientia 14(2), 1–14 (2003)
Chen, S.: Aligning sentences in bilingual corpora using lexical information. In: Proceedings of ACL-93, pp. 9–16 (1993)
Zhao, B., et al.: Efficient Optimization for Bilingual Sentence Alignment based on Linear Regression. In: HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, pp. 81–87 (2003)
Gale, W.A., Church, K.W.: A Program for Aligning Sentences in Bilingual Corpora. In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, California (1991)
Gelbukh, A., Sidorov, G.: Approach to construction of automatic morphological analysis systems for inflective languages with little effort. In: CICLing 2003. LNCS, vol. 2588, pp. 215–220. Springer, Heidelberg (2003)
Gelbukh, A., Sidorov, G.: Paragraph-Level Alignment of an English-Spanish Parallel Corpus of Fiction Texts using Bilingual Dictionaries. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS (LNAI), vol. 4188. Springer, Heidelberg (2006)
Kay, M., Röscheisen, M.: Text-translation alignment. Computational Linguistics 19(1), 121–142 (1993)
Kit, C., Webster, J.J., Sin, K.K., Pan, H., Li, H.: Clause alignment for Hong Kong legal texts: A lexical-based approach. International Journal of Corpus Linguistics 9(1), 29–51 (2004)
Langlais, P., Simard, M., Véronis, J.: Methods and practical issues in evaluating alignment techniques. In: Proceedings of COLING-ACL (1998)
McEnery, A.M., Oakes, M.P.: Sentence and word alignment in the CRATER project. In: Using Corpora for Language Research, London, pp. 211–231 (1996)
Melamed, I.D.: A Geometric Approach to Mapping Bitext Correspondence. In: Proc. EMNLP-1996, ACL, pp. 1–12 (1996)
Melamed, I.D.: Pattern Recognition for Mapping Bitext Correspondence. In: Parallel Text Processing: Alignment and Use of Translation Corpora, pp. 25–47. Kluwer, Dordrecht (2000)
Meyers, A., Kosaka, M., Grishman, R.: A Multilingual Procedure for Dictionary-Based Sentence Alignment. In: Farwell, D., Gerber, L., Hovy, E. (eds.) AMTA 1998. LNCS (LNAI), vol. 1529, pp. 187–198. Springer, Heidelberg (1998)
Mikhailov, M.: Two Approaches to Automated Text Aligning of Parallel Fiction Texts. Across Languages and Cultures 2(1), 87–96 (2001)
Moore, R.C.: Fast and Accurate Sentence Alignment of Bilingual Corpora. In: Richardson, S.D. (ed.) AMTA 2002. LNCS (LNAI), vol. 2499. Springer, Heidelberg (2002)
Simard, M., Foster, G., Isabelle, P.: Using Cognates to Align Sentences in Bilingual Corpora. In: TMI-1992, pp. 67–81 (1992)
Velásquez, F., Gelbukh, A., Sidorov, G.: AGME: un sistema de análisis y generación de la morfología del español. In: Garijo, F.J., Riquelme, J.-C., Toro, M. (eds.) IBERAMIA 2002. LNCS (LNAI), vol. 2527, pp. 1–6. Springer, Heidelberg (2002)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gelbukh, A., Sidorov, G. (2006). Alignment of Paragraphs in Bilingual Texts Using Bilingual Dictionaries and Dynamic Programming. In: Martínez-Trinidad, J.F., Carrasco Ochoa, J.A., Kittler, J. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2006. Lecture Notes in Computer Science, vol 4225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11892755_85
DOI: https://doi.org/10.1007/11892755_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46556-0
Online ISBN: 978-3-540-46557-7
eBook Packages: Computer Science, Computer Science (R0)