Complex Technology of Machine Translation Resources Extension for the Kazakh Language

Chapter in: Advanced Topics in Intelligent Information and Database Systems (ACIIDS 2017)

Part of the book series: Studies in Computational Intelligence (SCI, volume 710)

Abstract

This paper is devoted to creating linguistic resources, such as parallel corpora, dictionaries, and transfer rules, for machine translation of low-resource languages. We describe the use of the Bitextor tool for mining parallel corpora from online texts and a dictionary enrichment methodology that lets people without deep linguistic knowledge improve word dictionaries, and we show how transfer rules for machine translation can be learned automatically from a parallel corpus. All of the described methods were applied to Kazakh, Russian, and English, with machine translation between these languages in mind.

Author information

Corresponding author

Correspondence to Zhandos Zhumanov.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Rakhimova, D., Zhumanov, Z. (2017). Complex Technology of Machine Translation Resources Extension for the Kazakh Language. In: Król, D., Nguyen, N., Shirai, K. (eds) Advanced Topics in Intelligent Information and Database Systems. ACIIDS 2017. Studies in Computational Intelligence, vol 710. Springer, Cham. https://doi.org/10.1007/978-3-319-56660-3_26

  • DOI: https://doi.org/10.1007/978-3-319-56660-3_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56659-7

  • Online ISBN: 978-3-319-56660-3

  • eBook Packages: Engineering, Engineering (R0)
