Collaborative Matching for Sentence Alignment

Quan, Xiaojun; Kit, Chunyu; Chen, Wuya

doi:10.1007/978-3-030-01716-3_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11221)

Abstract

Existing sentence alignment methods rely fundamentally on sentence length and lexical correspondences. Methods based on the former generally follow the length proportionality assumption, namely that the lengths of sentences in one language tend to be proportional to those of their translations, and are known to adapt poorly to new languages and corpora. In this paper, we attempt to interpret this assumption from a new perspective via the notion of collaborative matching, based on the observation that sentences can work collaboratively during alignment rather than separately as in previous studies. Our approach is intended to be independent of any specific language and corpus, so that it can be applied adaptively to a variety of texts without relying on prior knowledge about them. We use one-to-one sentence alignment to illustrate this approach and implement two specific alignment methods, which are evaluated on six bilingual corpora of different languages and domains. Experimental results confirm the effectiveness of this collaborative matching approach.
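The length proportionality assumption mentioned above can be illustrated with a minimal sketch. This is not the collaborative matching method of the paper, whose details are not given on this page; it only shows how a candidate one-to-one pair might be scored by its deviation from a proportional length ratio. The function names `length_ratio` and `length_deviation`, the sample lengths, and the default `variance` value are all assumptions made for illustration.

```python
import math


def length_ratio(src_lens, tgt_lens):
    """Estimate the target/source length ratio from total lengths.

    Hypothetical helper: a corpus-level ratio is one simple way to
    instantiate "lengths tend to be proportional".
    """
    return sum(tgt_lens) / sum(src_lens)


def length_deviation(src_len, tgt_len, ratio, variance=6.8):
    """Normalized deviation of a candidate pair from proportionality.

    `variance` is an assumed placeholder, not a value taken from the paper.
    A score near 0 means the pair fits the proportionality assumption well.
    """
    return (tgt_len - ratio * src_len) / math.sqrt(src_len * variance)


# Toy sentence lengths (in characters), invented for this example.
src = [20, 45, 10]
tgt = [24, 54, 12]

c = length_ratio(src, tgt)
matched = [length_deviation(s, t, c) for s, t in zip(src, tgt)]
# Pairing the first source sentence with the second target sentence
# should deviate far more than any of the correctly matched pairs.
mismatched = length_deviation(src[0], tgt[1], c)
```

In this sketch a fixed ratio and variance have to be chosen or estimated per language pair, which is one way to see why purely length-based methods can transfer poorly to new languages and corpora.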

The paper was supported by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2017ZT07X355).

Author information

Authors and Affiliations

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
Xiaojun Quan & Wuya Chen
Department of Linguistics and Translation, City University of Hong Kong, Kowloon Tong, Hong Kong
Chunyu Kit

Corresponding author

Correspondence to Xiaojun Quan.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Harbin Institute of Technology, Harbin, China
Ting Liu
Beijing University of Posts and Telecommunications, Beijing, China
Xiaojie Wang
Tsinghua University, Beijing, China
Zhiyuan Liu
Tsinghua University, Beijing, China
Yang Liu

Cite this paper

Quan, X., Kit, C., Chen, W. (2018). Collaborative Matching for Sentence Alignment. In: Sun, M., Liu, T., Wang, X., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2018. Lecture Notes in Computer Science, vol 11221. Springer, Cham. https://doi.org/10.1007/978-3-030-01716-3_4

DOI: https://doi.org/10.1007/978-3-030-01716-3_4
Published: 07 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01715-6
Online ISBN: 978-3-030-01716-3
eBook Packages: Computer Science (R0)