Abstract
Speech recognition systems play an important role in solving problems such as spoken content retrieval. We are therefore interested in speech recognition for low-resource languages such as Amharic. The main challenges in Amharic speech recognition are the limited availability of corpora and the complex morphology of the language. This paper presents a new corpus for the low-resource Amharic language that is suitable for training and evaluating speech recognition systems. The corpus contains 90 h of speech data with word- and syllable-based annotation. In addition, the use of syllable units for the acoustic and language models is compared with a morpheme-based model. A syllable-based triphone speech recognition system provides a lower word error rate of 16.82% on a subset of the dataset, and a syllable-based hybrid deep neural network with hidden Markov model provides a word error rate of 14.36%.
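The results above are reported as word error rates. As a minimal sketch of how such a figure is conventionally computed, the snippet below uses the standard definition WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum edit-distance alignment and N is the number of reference words. The abstract does not state which scoring tool the authors used, so the function name `word_error_rate` and the English sample strings are illustrative assumptions, not the paper's evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name); words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.2%}")
```

The same routine can be applied to syllable sequences instead of words by passing space-separated syllables, which would give a syllable error rate rather than a word error rate.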
Notes
- 1. https://www.ethnologue.com/language/amh (last accessed on 30.11.2018).
Acknowledgments
The authors would like to thank the DAAD and MoSHE for funding this research work and DW for allowing us to use Amharic radio program audio from their online archive.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gebreegziabher, N.H., Nürnberger, A. (2019). An Amharic Syllable-Based Speech Corpus for Continuous Speech Recognition. In: Martín-Vide, C., Purver, M., Pollak, S. (eds) Statistical Language and Speech Processing. SLSP 2019. Lecture Notes in Computer Science, vol 11816. Springer, Cham. https://doi.org/10.1007/978-3-030-31372-2_15
DOI: https://doi.org/10.1007/978-3-030-31372-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31371-5
Online ISBN: 978-3-030-31372-2
eBook Packages: Computer Science, Computer Science (R0)