Abstract
The paper is devoted to creating linguistic resources, such as parallel corpora, dictionaries, and transfer rules, for machine translation of low-resource languages. We describe the use of the Bitextor tool for mining parallel corpora from online texts and a dictionary enrichment methodology that lets people without deep linguistic knowledge improve word dictionaries, and we show how transfer rules for machine translation can be learned automatically from a parallel corpus. All described methods were applied to Kazakh, Russian, and English, with machine translation between these languages in mind.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Rakhimova, D., Zhumanov, Z. (2017). Complex Technology of Machine Translation Resources Extension for the Kazakh Language. In: Król, D., Nguyen, N., Shirai, K. (eds) Advanced Topics in Intelligent Information and Database Systems. ACIIDS 2017. Studies in Computational Intelligence, vol 710. Springer, Cham. https://doi.org/10.1007/978-3-319-56660-3_26
DOI: https://doi.org/10.1007/978-3-319-56660-3_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56659-7
Online ISBN: 978-3-319-56660-3
eBook Packages: Engineering, Engineering (R0)