Abstract
Aligning the representation spaces of two languages to induce a bilingual lexicon achieves strong results on European language pairs. However, current methods perform poorly on distant language pairs. To address this problem, we analyze existing models on the lexicon induction task for distant language pairs such as English-Chinese. Based on this analysis, we propose a framework for the task with improved preprocessing, mapping, and inference. Experimental results show that the proposed approach substantially improves the accuracy of induced bilingual lexicons on English-Chinese, as well as on some other distant language pairs.
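For readers unfamiliar with the setting, the general idea of mapping-based lexicon induction can be sketched as follows. This is a generic baseline (length normalization, an orthogonal Procrustes mapping learned from a seed dictionary, and cosine nearest-neighbor inference), not the framework proposed in this paper. The function names and the toy data are assumptions made purely for illustration.

```python
import numpy as np


def normalize(emb):
    """Length-normalize each row so dot products equal cosine similarity."""
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def learn_orthogonal_map(src_seed, tgt_seed):
    """Solve min_W ||src_seed @ W - tgt_seed||_F subject to W being orthogonal."""
    u, _, vt = np.linalg.svd(src_seed.T @ tgt_seed)
    return u @ vt


def induce_lexicon(src_emb, tgt_emb, w):
    """Return, for each source word, the index of its nearest target word."""
    sims = normalize(src_emb @ w) @ normalize(tgt_emb).T
    return sims.argmax(axis=1)


# Toy data (hypothetical): the target space is a rotated copy of the source space.
rng = np.random.default_rng(0)
src = normalize(rng.normal(size=(50, 8)))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
tgt = src @ rotation

seed = np.arange(20)  # indices of the seed dictionary pairs
w = learn_orthogonal_map(src[seed], tgt[seed])
pred = induce_lexicon(src, tgt, w)
accuracy = float((pred == np.arange(50)).mean())
```

Because the toy target space is an exact rotation of the source space, the learned map recovers that rotation from the seed pairs alone. Real embeddings of distant language pairs are far from an exact rotation of each other, which is the difficulty the abstract refers to.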
Acknowledgement
We would like to thank the anonymous reviewers for their insightful comments. Shujian Huang is the corresponding author. This work is supported by the National Science Foundation of China (No. 61772261), the Jiangsu Provincial Research Foundation for Basic Research (No. BK20170074), and the “13th Five-Year” All-Army Common Information System Equipment Pre-Research Project (No. 31510040201). This work is also partially supported by research funding from ZTE Corporation.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhu, W. et al. (2019). Improving Bilingual Lexicon Induction on Distant Language Pairs. In: Huang, S., Knight, K. (eds) Machine Translation. CCMT 2019. Communications in Computer and Information Science, vol 1104. Springer, Singapore. https://doi.org/10.1007/978-981-15-1721-1_1
DOI: https://doi.org/10.1007/978-981-15-1721-1_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1720-4
Online ISBN: 978-981-15-1721-1
eBook Packages: Computer Science, Computer Science (R0)