Morpheme-Based and Factored Language Modeling for Amharic Speech Recognition

  • Martha Yifiru Tachbelie
  • Solomon Teferra Abate
  • Wolfgang Menzel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6562)


This paper presents the application of morpheme-based and factored language models to an Amharic speech recognition task. Because using morphemes as units in both the acoustic and the language models often degrades performance through higher acoustic confusability, and because factored language models are difficult to use in standard word decoders, we applied the models in a lattice rescoring framework. Lattices of the 100 best alternatives for each test sentence of the 5k development test set were generated using a baseline speech recognizer with a word-based backoff bigram language model. These lattices were then rescored with various morpheme-based and factored language models. Morpheme-based language models yielded a slight improvement in word recognition accuracy, while factored language models led to notable improvements.
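The abstract does not spell out how rescored hypotheses are ranked. The following is a minimal sketch, assuming the common log-linear combination of the recognizer's acoustic log-likelihood with a scaled language-model log-probability and a word insertion penalty, and treating each lattice as a plain N-best list. Everything concrete here is hypothetical: the `SEGMENTS` table, the toy English-like tokens standing in for Amharic morphemes, the training sentences, the discount, and the `lm_scale`/`word_penalty` values. A toy absolute-discounting backoff bigram stands in for the morpheme-based models; it is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


class BackoffBigram:
    """Toy absolute-discounting backoff bigram (a stand-in, not the paper's model)."""

    def __init__(self, sentences, discount=0.5):
        self.d = discount                      # must lie in (0, 1]
        self.uni = Counter()                   # counts of predicted tokens
        self.bi = Counter()                    # counts of (history, token) pairs
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            self.uni.update(toks[1:])
            self.bi.update(zip(toks[:-1], toks[1:]))
        self.vocab = set(self.uni) | {"<unk>"}
        self.total = sum(self.uni.values())
        self.hist = Counter()                  # how often each history was seen
        self.followers = defaultdict(set)      # distinct tokens seen after each history
        for (h, w), c in self.bi.items():
            self.hist[h] += c
            self.followers[h].add(w)

    def p_uni(self, w):
        # Add-one smoothed unigram over the vocabulary (including <unk>).
        return (self.uni.get(w, 0) + 1) / (self.total + len(self.vocab))

    def p(self, w, h):
        w = w if w in self.vocab else "<unk>"
        n_h = self.hist.get(h, 0)
        if n_h == 0:                           # unseen history: unigram only
            return self.p_uni(w)
        c = self.bi.get((h, w), 0)
        if c > 0:                              # seen bigram: discounted estimate
            return (c - self.d) / n_h
        # Unseen bigram: spread the discounted mass over the unseen followers.
        alpha = self.d * len(self.followers[h]) / n_h
        seen_mass = sum(self.p_uni(v) for v in self.followers[h])
        return alpha * self.p_uni(w) / (1.0 - seen_mass)

    def logprob(self, tokens):
        toks = ["<s>"] + tokens + ["</s>"]
        return sum(math.log(self.p(w, h)) for h, w in zip(toks[:-1], toks[1:]))


def rescore(nbest, lm, segment, lm_scale=10.0, word_penalty=0.0):
    """Rank (words, acoustic_logprob) hypotheses best-first by combined log score."""
    scored = []
    for words, am_logprob in nbest:
        units = [m for w in words for m in segment(w)]  # words -> morpheme tokens
        score = am_logprob + lm_scale * lm.logprob(units) + word_penalty * len(words)
        scored.append((score, words))
    return sorted(scored, key=lambda t: t[0], reverse=True)


# Hypothetical segmentation table; words not listed are kept whole.
SEGMENTS = {"houses": ["house", "+s"], "housed": ["house", "+d"]}


def segment(word):
    return SEGMENTS.get(word, [word])


train = [["the", "house", "+s", "are", "big"],
         ["the", "house", "+s", "are", "old"],
         ["a", "house", "+d", "man"]]
lm = BackoffBigram(train)

nbest = [(["the", "housed", "are", "big"], -100.0),  # equal acoustic scores,
         (["the", "houses", "are", "big"], -100.0)]  # so the LM decides
ranked = rescore(nbest, lm, segment)
print(ranked[0][1])  # ['the', 'houses', 'are', 'big']
```

The hypotheses keep their word sequences, so word accuracy can still be measured after rescoring on morpheme units. In practice the scale factor and penalty would be tuned on development data; a factored language model would replace `lm.logprob` with a model conditioned on word factors (for example stem or part of speech) that can back off along more than one path.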


Keywords: Morpheme-based language modeling · Amharic · Lattice rescoring · Factored language modeling · Speech recognition



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Martha Yifiru Tachbelie (1)
  • Solomon Teferra Abate (2)
  • Wolfgang Menzel (1)

  1. Department of Informatics, University of Hamburg, Hamburg, Germany
  2. LIG/GETALP, Joseph Fourier University, France
