Abstract
In this research, an attempt has been made to experiment with Amharic-Tigrigna machine translation in order to promote information sharing. Since no Amharic-Tigrigna parallel text corpus was available, we prepared one for the Amharic-Tigrigna machine translation system from the religious domain, specifically from the Bible. The data preparation involves sentence alignment, sentence splitting, tokenization and normalization of the Amharic-Tigrigna parallel corpus, followed by splitting the dataset into training, tuning and testing sets. An Amharic-Tigrigna translation model was then constructed from the training data and further tuned for better translation. Finally, given the target language model and the translation model, the Amharic-Tigrigna translation system generates target-language output using words and morphemes as units. The results of the experiment are promising for designing an Amharic-Tigrigna machine translation system between these resource-deficient languages. We are now working on post-editing to improve the performance of the bilingual Amharic-Tigrigna translator.
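The data-preparation steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes verse-keyed input (so sentence alignment reduces to matching verse IDs), splits a verse into sentences only when both sides contain the same number of Ethiopic full stops (።), uses a small hypothetical homophone-normalization table, and uses 80/10/10 train/tune/test proportions purely for illustration. All function names are hypothetical. Normalization runs before tokenization here so that the traditional wordspace (፡) becomes an ordinary space.

```python
import random
import re

FULL_STOP = "\u1362"  # ። Ethiopic full stop
WORDSPACE = "\u1361"  # ፡ traditional Ethiopic word separator

# Ethiopic punctuation (U+1362-U+1368) and a few ASCII marks become separate tokens.
_PUNCT_RE = re.compile("([\u1362-\u1368!?.,;:()])")

# Hypothetical normalization tables (first-order letters only). Amharic merges
# these homophones; Tigrigna keeps ሀ/ሐ and አ/ዐ distinct, so only ሠ and ፀ are merged.
NORMALIZE = {
    "am": str.maketrans({"ሐ": "ሀ", "ኀ": "ሀ", "ሠ": "ሰ", "ዐ": "አ", "ፀ": "ጸ"}),
    "ti": str.maketrans({"ሠ": "ሰ", "ፀ": "ጸ"}),
}


def align_by_verse(amharic, tigrigna):
    """Pair verses sharing an ID such as ("GEN", 1, 1); unmatched verses are dropped."""
    return [(amharic[k], tigrigna[k]) for k in sorted(amharic) if k in tigrigna]


def split_sentences(text):
    """Split after each Ethiopic full stop, keeping the stop on its sentence."""
    pieces = re.split("(?<=" + FULL_STOP + r")\s*", text)
    return [p.strip() for p in pieces if p.strip()]


def split_pair(am_verse, ti_verse):
    """Split an aligned verse pair one-to-one if both sides have the same
    number of sentences; otherwise keep the whole verse as one pair."""
    am, ti = split_sentences(am_verse), split_sentences(ti_verse)
    if len(am) == len(ti):
        return list(zip(am, ti))
    return [(am_verse, ti_verse)]


def normalize(sentence, lang):
    """Turn the wordspace into a space, merge homophones, collapse whitespace."""
    sentence = sentence.replace(WORDSPACE, " ").translate(NORMALIZE[lang])
    return " ".join(sentence.split())


def tokenize(sentence):
    """Whitespace tokenization after detaching punctuation marks."""
    return _PUNCT_RE.sub(r" \1 ", sentence).split()


def split_corpus(pairs, tune=0.1, test=0.1, seed=0):
    """Shuffle reproducibly and return (training, tuning, testing) lists."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_test, n_tune = int(len(pairs) * test), int(len(pairs) * tune)
    return pairs[n_test + n_tune:], pairs[n_test:n_test + n_tune], pairs[:n_test]


def prepare(amharic, tigrigna, **split_args):
    """Align, sentence-split, normalize and tokenize, then split the corpus."""
    pairs = []
    for am_verse, ti_verse in align_by_verse(amharic, tigrigna):
        for am, ti in split_pair(am_verse, ti_verse):
            pairs.append((tokenize(normalize(am, "am")), tokenize(normalize(ti, "ti"))))
    return split_corpus(pairs, **split_args)
```

Calling `prepare(amharic, tigrigna)` on two dictionaries of verses returns the training, tuning and testing lists of token-list pairs; the fixed `seed` keeps the split reproducible across runs, so tuning and test scores stay comparable between experiments.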
Notes
1. Available at https://www.perl.org.
2. Available at https://www.python.org.
3. The unit obtained with Morfessor segmentation is referred to here as a morpheme, without any linguistic definition of morpheme.
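Using morphemes as the translation unit means words are segmented before training and the decoder's morph-level output is joined back into words. The sketch below assumes a hypothetical convention in which every non-final morph carries a trailing `+`, and takes the segmenter as a plain function (for example, one wrapping a trained Morfessor model); neither the marker nor the helper names come from the paper.

```python
MARK = "+"  # hypothetical marker appended to every non-final morph of a word


def to_morph_units(tokens, segment):
    """Replace each word by its morphs; `segment(word)` must return a non-empty list."""
    units = []
    for word in tokens:
        morphs = segment(word)
        units.extend(m + MARK for m in morphs[:-1])
        units.append(morphs[-1])
    return units


def to_words(units):
    """Glue morph-level output back into words by following the markers."""
    words, pending = [], ""
    for unit in units:
        if unit.endswith(MARK):
            pending += unit[:-len(MARK)]
        else:
            words.append(pending + unit)
            pending = ""
    if pending:  # output ended on a non-final morph; keep it as a word
        words.append(pending)
    return words
```

With a segmenter that splits a word into two morphs `a` and `b`, the word becomes the units `a+ b`; because the marker survives translation, `to_words` can restore word boundaries in the target output without a separate model.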
Acknowledgement
We would like to thank the Ethiopian Ministry of Communication and Information Technology (MCIT) for funding the collection of the parallel text corpus and the experiments of the bilingual Amharic-Tigrigna statistical machine translation research project.
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Woldeyohannis, M.M., Meshesha, M. (2018). Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna. In: Mekuria, F., Nigussie, E., Dargie, W., Edward, M., Tegegne, T. (eds) Information and Communication Technology for Development for Africa. ICT4DA 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 244. Springer, Cham. https://doi.org/10.1007/978-3-319-95153-9_13
DOI: https://doi.org/10.1007/978-3-319-95153-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95152-2
Online ISBN: 978-3-319-95153-9
eBook Packages: Computer Science, Computer Science (R0)