Pivot-Based Bilingual Dictionary Extraction from Multiple Dictionary Resources

  • Conference paper
PRICAI 2014: Trends in Artificial Intelligence (PRICAI 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8862)

Abstract

High-quality bilingual dictionaries are rarely available for lower-density language pairs, especially for those that are closely related. Using a third language as a pivot to link two other languages is a well-known solution, and usually requires only two input bilingual dictionaries to automatically induce the new one. This approach, however, produces many incorrect translation pairs because dictionary entries are normally not transitive, owing to polysemy and ambiguous words in the pivot language. Utilizing the complete structures of the input bilingual dictionaries positively influences the result since dropped meanings can be countered. Moreover, an additional input dictionary may provide more complete information for calculating the semantic distance between word senses, which is key to suppressing wrong sense matches. This paper proposes an extended constraint optimization model for inducing new dictionaries of closely related languages from multiple input dictionaries, and its formalization based on Integer Linear Programming. Evaluations indicated that the proposal not only outperforms the baseline method, but also shows improvements in performance and scalability as more dictionaries are utilized.
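The non-transitivity problem described in the abstract can be illustrated with a minimal sketch of naive pivot composition. This is not the constraint optimization model proposed in the paper; it only shows the baseline idea of chaining two dictionaries through a pivot. The function name `pivot_induce` and the toy Spanish-English-German entries are assumptions chosen for illustration and do not come from the paper.

```python
from collections import defaultdict


def pivot_induce(a_to_pivot, pivot_to_c):
    """Naively compose two bilingual dictionaries through a pivot language.

    Every A-word is linked to every C-word that shares at least one
    pivot word with it. No sense disambiguation is performed.
    """
    induced = defaultdict(set)
    for a_word, pivot_words in a_to_pivot.items():
        for pivot_word in pivot_words:
            # A pivot word missing from the second dictionary adds nothing.
            for c_word in pivot_to_c.get(pivot_word, ()):
                induced[a_word].add(c_word)
    return {a_word: sorted(c_words) for a_word, c_words in induced.items()}


# Hypothetical toy data (illustration only, not from the paper).
# Spanish "orilla" means the bank of a river.
spanish_to_english = {"orilla": {"bank"}}
# English "bank" is polysemous: river bank ("Ufer") or financial bank ("Bank").
english_to_german = {"bank": {"Ufer", "Bank"}}

print(pivot_induce(spanish_to_english, english_to_german))
# → {'orilla': ['Bank', 'Ufer']}
```

In this toy case the pair orilla/Ufer is correct, while orilla/Bank is a wrong sense match introduced by the polysemous pivot word. Suppressing pairs of that kind is the problem the abstract says the proposed model addresses.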

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wushouer, M., Lin, D., Ishida, T., Hirayama, K. (2014). Pivot-Based Bilingual Dictionary Extraction from Multiple Dictionary Resources. In: Pham, DN., Park, SB. (eds) PRICAI 2014: Trends in Artificial Intelligence. PRICAI 2014. Lecture Notes in Computer Science, vol 8862. Springer, Cham. https://doi.org/10.1007/978-3-319-13560-1_18

  • DOI: https://doi.org/10.1007/978-3-319-13560-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13559-5

  • Online ISBN: 978-3-319-13560-1

  • eBook Packages: Computer Science, Computer Science (R0)
