Abstract
Existing sentence alignment methods rely fundamentally on sentence length and lexical correspondences. Methods based on the former generally follow the length proportionality assumption, namely that the lengths of sentences in one language tend to be proportional to those of their translations, and are known to adapt poorly to new languages and corpora. In this paper, we attempt to interpret this assumption from a new perspective via the notion of collaborative matching, based on the observation that sentences can work collaboratively during alignment rather than separately as in previous studies. Our approach is intended to be independent of any specific language or corpus, so that it can be applied adaptively to a variety of texts without relying on prior knowledge about them. We use one-to-one sentence alignment to illustrate this approach and implement two specific alignment methods, which are evaluated on six bilingual corpora of different languages and domains. Experimental results confirm the effectiveness of this collaborative matching approach.
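As a rough illustration of the length proportionality assumption mentioned above, the following sketch scores a candidate sentence pair by how far its character-length ratio deviates from an expected ratio. It is not the collaborative matching approach of this paper, whose details are not given in the abstract; the function names, the toy sentence pairs, and the idea of estimating the expected ratio as a simple mean are all assumptions made for illustration.

```python
from statistics import mean


def length_ratio(src: str, tgt: str) -> float:
    """Character-length ratio len(tgt) / len(src) for one sentence pair."""
    if not src:
        raise ValueError("source sentence must be non-empty")
    return len(tgt) / len(src)


def ratio_deviation(src: str, tgt: str, expected_ratio: float) -> float:
    """Absolute deviation of a pair's length ratio from the expected ratio.

    Under the length proportionality assumption, true translation pairs
    should tend to have a small deviation.
    """
    return abs(length_ratio(src, tgt) - expected_ratio)


# Hypothetical toy data: a few English-French pairs assumed to be aligned.
known_pairs = [
    ("The cat sleeps.", "Le chat dort."),
    ("I like green apples very much.", "J'aime beaucoup les pommes vertes."),
]

# Assumption: estimate the expected ratio as the mean ratio over known pairs.
expected = mean(length_ratio(s, t) for s, t in known_pairs)

# Compare a plausible pairing with an implausible one for the same source.
source = "The cat sleeps."
good = ratio_deviation(source, "Le chat dort.", expected)
bad = ratio_deviation(source, "J'aime beaucoup les pommes vertes.", expected)
print(f"expected ratio: {expected:.3f}, good: {good:.3f}, bad: {bad:.3f}")
```

A scorer of this kind judges each sentence pair separately and depends on a ratio estimated for a particular language pair, which is the kind of corpus-specific knowledge the abstract says the proposed approach aims to avoid.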
This work was supported by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2017ZT07X355).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Quan, X., Kit, C., Chen, W. (2018). Collaborative Matching for Sentence Alignment. In: Sun, M., Liu, T., Wang, X., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2018. Lecture Notes in Computer Science, vol 11221. Springer, Cham. https://doi.org/10.1007/978-3-030-01716-3_4
DOI: https://doi.org/10.1007/978-3-030-01716-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01715-6
Online ISBN: 978-3-030-01716-3
eBook Packages: Computer Science (R0)