
Development and analysis of Punjabi ASR system for mobile phones under different acoustic models

  • Puneet Mittal
  • Navdeep Singh

Abstract

Speech technology is gaining importance in our daily life. Speech-based mobile phone applications are becoming popular among the masses due to their usability and ease of access. Speech technology helps people with disabilities, such as blindness or motor impairments, to access and control mobile phone applications through voice, without using a keypad or touchpad. Punjabi is one of the widely spoken languages in various parts of the world. In this paper, an automatic speech recognition (ASR) system for mobile phone applications in Punjabi has been proposed and implemented for four different acoustic models: context independent, context dependent untied, context dependent tied, and context dependent deleted interpolation models. The proposed ASR is evaluated at 4, 16, 32 and 64 GMMs for performance analysis in terms of parameters such as accuracy, word error rate and required storage space. It is observed that context dependent untied models outperform the others, with higher accuracy and a lower word error rate, while context independent models require less storage space than the others. The choice of a suitable acoustic model depends upon the available storage space as well as the desired recognition accuracy. Mobile phones with limited resources may use context independent models, while context dependent untied models can be used to develop ASR systems for high-end mobile phones.
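The word error rate (WER) used in the evaluation above is conventionally computed as the word-level edit distance (substitutions, insertions, deletions) between the reference transcript and the recognizer's hypothesis, divided by the number of reference words. As a minimal sketch (not the paper's evaluation code), this could look like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance between word sequences,
    normalized by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("a b c", "a x c")` gives 1/3 (one substitution among three reference words); accuracy is then often reported as 1 − WER.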

Keywords

Acoustic model · ASR · Context dependent · Context independent · HMM · Speech recognition



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of CSE, BBSBEC, Fatehgarh Sahib, India
  2. Department of Computer Science, Mata Gujri College, Fatehgarh Sahib, India
