Abstract
High-quality bilingual dictionaries are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language as a pivot to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries to automatically induce the new one. This approach, however, produces many incorrect translation pairs because dictionary entries are normally not transitive, owing to polysemy and ambiguous words in the pivot language. Utilizing the complete structures of the input bilingual dictionaries positively influences the result, since it helps counter dropped meanings. Moreover, an additional input dictionary may provide more complete information for calculating the semantic distance between word senses, which is key to suppressing wrong sense matches. This paper proposes an extended constraint optimization model for inducing new dictionaries of closely related languages from multiple input dictionaries, and its formalization based on Integer Linear Programming. Evaluations indicated that the proposal not only outperforms the baseline method, but also shows improvements in performance and scalability as more dictionaries are utilized.
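The non-transitivity problem described in the abstract can be illustrated with a minimal sketch of naive pivot-based induction. This is not the paper's constraint optimization model; it only shows the plain pivot join that such a model would need to filter. The function name `induce_via_pivot` and the toy Spanish, Portuguese, and English entries are assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict


def induce_via_pivot(dict_a_pivot, dict_b_pivot):
    """Naively pair each A-language word with every B-language word
    that shares at least one pivot-language translation.

    Both arguments map a word to a set of pivot-language translations.
    (Hypothetical helper for illustration only.)
    """
    # Invert the B->pivot dictionary so pivot words can be looked up directly.
    pivot_to_b = defaultdict(set)
    for b_word, pivots in dict_b_pivot.items():
        for pivot in pivots:
            pivot_to_b[pivot].add(b_word)

    # Join the two dictionaries on the shared pivot word.
    induced = defaultdict(set)
    for a_word, pivots in dict_a_pivot.items():
        for pivot in pivots:
            induced[a_word] |= pivot_to_b.get(pivot, set())

    return {a: sorted(bs) for a, bs in induced.items() if bs}


# Toy data (illustrative assumption): the pivot word "bank" is ambiguous
# between a financial institution and a river bank.
spanish_to_english = {"banco": {"bank"}, "orilla": {"bank"}}
portuguese_to_english = {"banco": {"bank"}, "margem": {"bank"}}

pairs = induce_via_pivot(spanish_to_english, portuguese_to_english)
print(pairs)
```

Because both senses collapse onto the single pivot word "bank", the join pairs every Spanish word with every Portuguese word, including the wrong matches ("banco", "margem") and ("orilla", "banco"). Suppressing such pairs is the role the abstract assigns to the semantic distance between word senses.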
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wushouer, M., Lin, D., Ishida, T., Hirayama, K. (2014). Pivot-Based Bilingual Dictionary Extraction from Multiple Dictionary Resources. In: Pham, DN., Park, SB. (eds) PRICAI 2014: Trends in Artificial Intelligence. PRICAI 2014. Lecture Notes in Computer Science, vol 8862. Springer, Cham. https://doi.org/10.1007/978-3-319-13560-1_18
DOI: https://doi.org/10.1007/978-3-319-13560-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13559-5
Online ISBN: 978-3-319-13560-1
eBook Packages: Computer Science (R0)