
A Constraint Approach to Lexicon Induction for Low-Resource Languages

Services Computing for Language Resources

Part of the book series: Cognitive Technologies (COGTECH)


Abstract

A bilingual lexicon is a useful language resource, but such data are rarely available for lower-density language pairs, especially for those that are closely related. The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task. Using a third language to link two other languages is a well-known solution in low-resource situations, and it usually requires only two input bilingual lexicons to automatically induce the new one. This approach, however, is weak in measuring semantic distance between bilingual word pairs because it has never been demonstrated to utilize the complete structures of the input bilingual lexicons, and dropped meanings negatively influence the result. This research discusses a constraint approach to pivot-based lexicon induction for the case where the target language pair is closely related. We create constraints from language similarity and model the structures of the input dictionaries as an optimization problem whose solution produces an optimally correct target bilingual lexicon. In addition, we make the created bilingual lexicons of low-resource languages accessible through service grid federation.
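
The pivot step described in the abstract can be illustrated with a minimal sketch. It is not the chapter's constraint model: it only composes two input lexicons through shared pivot words and ranks each candidate pair by a simple overlap (Jaccard) score, which is an assumption made here for illustration. The function name `induce_candidates` and the placeholder words (`s1`, `t1`, `p1`, ...) are hypothetical.

```python
from collections import defaultdict


def induce_candidates(src_to_pivot, tgt_to_pivot):
    """Pair source and target words that share at least one pivot word.

    Both arguments map a word to the set of its pivot-language translations.
    Returns {(src, tgt): score}, where score is the Jaccard overlap of the
    two pivot sets (an illustrative heuristic, not the chapter's method).
    """
    # Invert the target lexicon so each pivot word points to target words.
    pivot_to_tgt = defaultdict(set)
    for tgt, pivots in tgt_to_pivot.items():
        for p in pivots:
            pivot_to_tgt[p].add(tgt)

    candidates = {}
    for src, src_pivots in src_to_pivot.items():
        for p in src_pivots:
            for tgt in pivot_to_tgt.get(p, ()):
                tgt_pivots = tgt_to_pivot[tgt]
                shared = src_pivots & tgt_pivots
                union = src_pivots | tgt_pivots
                candidates[(src, tgt)] = len(shared) / len(union)
    return candidates


# Toy input lexicons with placeholder words; p2 is a polysemous pivot word.
src_to_pivot = {"s1": {"p1", "p2"}, "s2": {"p2"}}
tgt_to_pivot = {"t1": {"p1", "p2"}, "t2": {"p2", "p3"}}

scores = induce_candidates(src_to_pivot, tgt_to_pivot)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this toy input the shared pivot word `p2` links every source word to every target word, so the naive composition cannot tell correct pairs from spurious ones without some further criterion. This is the weakness the abstract attributes to the plain pivot approach.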

Notes

  1. http://turkic-languages.scienceontheweb.net.

  2. Notice that the optimal assignment may not be unique, since more than one assignment may have the same minimum cost. In that case, the solver selects one randomly, according to its designated behavior.

  3. Library of SAT and Boolean optimization solvers: http://www.sat4j.org.

Acknowledgements

This research was partially supported by the Service Science, Solutions and Foundation Integrated Research Program from JST RISTEX, and by a Grant-in-Aid for Scientific Research (S) (24220002) from the Japan Society for the Promotion of Science.

Author information

Corresponding author

Correspondence to Mairidan Wushouer.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Wushouer, M., Lin, D., Ishida, T., Murakami, Y. (2018). A Constraint Approach to Lexicon Induction for Low-Resource Languages. In: Murakami, Y., Lin, D., Ishida, T. (eds) Services Computing for Language Resources. Cognitive Technologies. Springer, Singapore. https://doi.org/10.1007/978-981-10-7793-7_7

  • DOI: https://doi.org/10.1007/978-981-10-7793-7_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7792-0

  • Online ISBN: 978-981-10-7793-7

  • eBook Packages: Computer Science, Computer Science (R0)
