Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna
In this research, we experiment with Amharic-Tigrigna machine translation to promote information sharing. Since no Amharic-Tigrigna parallel text corpus was available, we prepared one from the religious domain, specifically from the Bible. Data preparation involved sentence alignment, sentence splitting, tokenization, and normalization of the Amharic-Tigrigna parallel corpus, followed by splitting the dataset into training, tuning, and testing sets. An Amharic-Tigrigna translation model was then constructed from the training data and further tuned for better translation. Finally, given a target language model, the translation system generates target-language output with reference to the translation model, using either the word or the morpheme as the translation unit. The experimental results are promising for designing machine translation systems between resource-deficient languages. We are now working on post-editing to enhance the performance of the bilingual Amharic-Tigrigna translator.
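The final data preparation step, dividing the aligned corpus into training, tuning, and testing sets, can be sketched as follows. This is a minimal illustration and not the authors' actual procedure: the function name `split_parallel_corpus`, the 10% tuning and 10% testing fractions, and the fixed random seed are all assumptions, since the abstract does not state how the split was made. The point it shows is that source and target sentences must be shuffled together as pairs so that the alignment survives the split.

```python
import random


def split_parallel_corpus(src_sentences, tgt_sentences,
                          tune_frac=0.1, test_frac=0.1, seed=0):
    """Split an aligned parallel corpus into train, tune, and test sets.

    The fractions and the seed are illustrative assumptions; the source
    does not report the split sizes.
    """
    if len(src_sentences) != len(tgt_sentences):
        raise ValueError("source and target must have the same number of sentences")

    # Keep each source sentence attached to its translation while shuffling.
    pairs = list(zip(src_sentences, tgt_sentences))
    rng = random.Random(seed)
    rng.shuffle(pairs)

    n_test = int(len(pairs) * test_frac)
    n_tune = int(len(pairs) * tune_frac)

    test = pairs[:n_test]
    tune = pairs[n_test:n_test + n_tune]
    train = pairs[n_test + n_tune:]
    return train, tune, test


# Placeholder sentences stand in for real Amharic and Tigrigna verses.
amharic = [f"am_sentence_{i}" for i in range(20)]
tigrigna = [f"ti_sentence_{i}" for i in range(20)]
train, tune, test = split_parallel_corpus(amharic, tigrigna)
```

Each returned set is a list of (source, target) tuples, which can be written out as two line-aligned files per set, the layout that statistical machine translation toolkits commonly expect.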
Keywords: Under-resourced language, Amharic-Tigrigna, Semitic language, Machine translation
We would like to thank the Ethiopian Ministry of Communication and Information Technology (MCIT) for funding the collection of the parallel text corpus and the experiments for the bilingual Amharic-Tigrigna statistical machine translation research project.