
Development of Standard Yorùbá speech-to-text system using HTK

International Journal of Speech Technology

Abstract

In this paper, a Standard Yorùbá speech-to-text system was designed and implemented that recognizes isolated words spoken by users, based on previously stored data. The system adopts a syllable-based approach: carefully selected words were recorded, analyzed and annotated using Praat. An experimental database was collected from six native speakers, each of whom spoke 25 bi-syllabic and 25 tri-syllabic words in an acoustically controlled room. Spectral features were extracted using the Mel-frequency cepstral coefficient (MFCC) technique, and the Hidden Markov Model Toolkit (HTK) was used to implement the system. A graphical user interface was also developed to make the system more accessible and interactive. The system was tested and evaluated based on the perception of native speakers of the language. The overall accuracy was 76% for bi-syllabic words and 84% for tri-syllabic words. These results suggest that the approach is promising and could be adopted for a Standard Yorùbá continuous speech recognition system, which would make the system usable by foreign speakers.
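
As a rough illustration of the Mel-frequency step mentioned in the abstract, the following minimal sketch converts between hertz and the mel scale and spaces filterbank centre frequencies evenly in mel. It uses the commonly cited formula mel = 2595 log10(1 + f/700). The function names, the choice of 26 filters and the 0–8000 Hz range are assumptions made for illustration only; the abstract does not state the parameters used in the paper, and this is not the authors' HTK configuration.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filterbank_centres(num_filters: int, low_hz: float, high_hz: float) -> list[float]:
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale.

    The band edges low_hz and high_hz are excluded, so the result has
    exactly num_filters values strictly between them.
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


# Hypothetical settings: 26 filters over 0-8000 Hz (not taken from the paper).
centres = filterbank_centres(26, 0.0, 8000.0)
```

Because the spacing is uniform in mel rather than in hertz, the centre frequencies are packed more densely at low frequencies, which is the property that MFCC features rely on.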



Funding

Self sponsored.

Author information


Correspondence to J. N. Iyanda.

About this article

Cite this article

Adetunmbi, O.A., Obe, O.O. & Iyanda, J.N. Development of Standard Yorùbá speech-to-text system using HTK. Int J Speech Technol 19, 929–944 (2016). https://doi.org/10.1007/s10772-016-9380-2

  • DOI: https://doi.org/10.1007/s10772-016-9380-2
