Abstract
A bilingual lexicon is a useful language resource, but such data are rarely available for lower-density language pairs, especially for those that are closely related. The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task. Using a third language to link two other languages is a well-known solution in low-resource situations, and it usually requires only two input bilingual lexicons to induce the new one automatically. This approach, however, is weak at measuring the semantic distance between bilingual word pairs: it has never been demonstrated to utilize the complete structures of the input bilingual lexicons, and dropped meanings negatively influence the result. This research discusses a constraint approach to pivot-based lexicon induction for the case where the target language pair is closely related. We create constraints from language similarity and model the structures of the input dictionaries as an optimization problem whose solution produces an optimally correct target bilingual lexicon. In addition, we make the created bilingual lexicons of low-resource languages accessible through service grid federation.
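To make the pivot idea concrete, the following minimal Python sketch shows the naive baseline that the abstract contrasts against: two input lexicons are joined through shared pivot words. It is not the constraint approach of this chapter. The function name `induce_via_pivot`, the dictionary layout, and the tokens `s1`, `p1`, `t1` and so on are hypothetical placeholders, and counting shared pivot words is only an assumed, simple ranking heuristic.

```python
from collections import defaultdict


def induce_via_pivot(src_to_pivot, pivot_to_tgt):
    """Naive pivot-based induction (illustrative baseline only).

    src_to_pivot: {source_word: set of pivot words}
    pivot_to_tgt: {pivot_word: set of target words}
    Returns {source_word: {target_word: number_of_shared_pivot_words}}.
    """
    candidates = defaultdict(lambda: defaultdict(int))
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            # Every target word reachable through this pivot is a candidate.
            for tgt in pivot_to_tgt.get(pivot, ()):
                candidates[src][tgt] += 1
    return {src: dict(tgts) for src, tgts in candidates.items()}


# Hypothetical toy lexicons: s* are source words, p* pivot words, t* target words.
src_to_pivot = {"s1": {"p1", "p2"}, "s2": {"p2"}}
pivot_to_tgt = {"p1": {"t1"}, "p2": {"t1", "t2"}}

induced = induce_via_pivot(src_to_pivot, pivot_to_tgt)
print(induced)
```

In this toy input the pivot word `p2` is ambiguous, so `s2` is linked to both `t1` and `t2` with equal support. A plain join of this kind cannot tell which candidate is correct, which is the kind of weakness that a constraint-based formulation is intended to address.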
Notes
- 1.
- 2.
Notice that the optimal assignment may not be unique, since more than one assignment may have the same minimum cost. In that case, the solver selects one of them at random, according to its designated behavior.
- 3.
A library of SAT and Boolean optimization solvers: http://www.sat4j.org.
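The non-uniqueness mentioned in note 2 can be illustrated with a small brute-force sketch in Python. It does not use SAT4J and is not the chapter's formulation. The function `min_cost_assignments`, the single hard constraint, and the two unit-weight soft constraints are assumptions chosen only to produce a tie.

```python
from itertools import product


def min_cost_assignments(n_vars, hard, soft):
    """Enumerate all Boolean assignments and return (min_cost, optimal_assignments).

    hard: predicates that every feasible assignment must satisfy.
    soft: (weight, predicate) pairs; a violated predicate adds its weight to the cost.
    """
    best_cost, best = None, []
    for values in product([False, True], repeat=n_vars):
        if not all(h(values) for h in hard):
            continue  # infeasible: a hard constraint is violated
        cost = sum(w for w, s in soft if not s(values))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [values]
        elif cost == best_cost:
            best.append(values)  # tie: another equally good assignment
    return best_cost, best


# Hypothetical instance: at most one of two variables may be true (hard),
# and each variable would prefer to be true (soft, weight 1 each).
hard = [lambda v: not (v[0] and v[1])]
soft = [(1, lambda v: v[0]), (1, lambda v: v[1])]

best_cost, best = min_cost_assignments(2, hard, soft)
print(best_cost, best)
```

Both `(False, True)` and `(True, False)` reach the minimum cost of 1 here, so a solver that returns a single optimum has to pick one of them.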
Acknowledgements
This research was partially supported by the Service Science, Solutions and Foundation Integrated Research Program of JST RISTEX, and by a Grant-in-Aid for Scientific Research (S) (24220002) from the Japan Society for the Promotion of Science.
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Wushouer, M., Lin, D., Ishida, T., Murakami, Y. (2018). A Constraint Approach to Lexicon Induction for Low-Resource Languages. In: Murakami, Y., Lin, D., Ishida, T. (eds) Services Computing for Language Resources. Cognitive Technologies. Springer, Singapore. https://doi.org/10.1007/978-981-10-7793-7_7
DOI: https://doi.org/10.1007/978-981-10-7793-7_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7792-0
Online ISBN: 978-981-10-7793-7
eBook Packages: Computer Science, Computer Science (R0)