Development and analysis of multilingual phone recognition systems using Indian languages

  • K. E. Manjunath
  • Dinesh Babu Jayagopi
  • K. Sreenivasa Rao
  • V. Ramasubramanian

Abstract

In this paper, the development of a Multilingual Phone Recognition System (Multi-PRS) using four Indian languages (Kannada, Telugu, Bengali, and Odia) is described. A Multi-PRS is a universal Phone Recognition System (PRS) that performs phone recognition independently of the language being spoken. Transcription based on the International Phonetic Alphabet (IPA) is used to group acoustically similar phonetic units from multiple languages. Multilingual phone recognisers for Indian languages are studied using two broad language groups, namely Dravidian languages (Kannada and Telugu) and Indo-Aryan languages (Bengali and Odia), and the languages within each group are combined separately to develop bilingual PRSs. We have explored both HMMs and DNNs for developing PRSs under both context-dependent and context-independent setups, and the state-of-the-art DNNs outperformed the HMMs. The performance of the Multi-PRSs is analysed and compared with that of the monolingual PRSs, and the advantages of Multi-PRSs over monolingual PRSs are discussed. Further, we have developed tandem Multi-PRSs that use phone posteriors as tandem features to improve the performance of the baseline Multi-PRSs. The tandem Multi-PRSs outperformed the baseline Multi-PRSs in all cases.
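The abstract states that IPA-based transcription is used to group acoustically similar phonetic units across languages. A minimal sketch of that pooling step follows, assuming each language's native phone labels already have an IPA equivalent. The inventories in `PHONE_TO_IPA` and the helper names `build_common_phone_set` and `to_multilingual` are hypothetical illustrations, not the mapping or code used in the paper.

```python
from collections import defaultdict

# Hypothetical per-language inventories: native phone label -> IPA symbol.
# These entries are illustrative only, NOT the paper's actual mapping.
PHONE_TO_IPA = {
    "kannada": {"a": "a", "aa": "aː", "k": "k", "ll": "ɭ"},
    "telugu": {"a": "a", "aa": "aː", "k": "k", "ll": "ɭ"},
    "bengali": {"o": "ɔ", "k": "k", "bh": "bʱ"},
    "odia": {"o": "ɔ", "k": "k", "bh": "bʱ"},
}


def build_common_phone_set(phone_to_ipa):
    """Pool language-specific phones that share an IPA symbol.

    Returns a dict mapping each IPA symbol to the sorted list of
    (language, native_label) pairs grouped under that symbol.
    """
    groups = defaultdict(list)
    for lang, table in phone_to_ipa.items():
        for native, ipa in table.items():
            groups[ipa].append((lang, native))
    return {ipa: sorted(members) for ipa, members in groups.items()}


def to_multilingual(lang, native_phones, phone_to_ipa):
    """Rewrite a monolingual phone transcription in the shared IPA label set."""
    table = phone_to_ipa[lang]
    return [table[p] for p in native_phones]


if __name__ == "__main__":
    common = build_common_phone_set(PHONE_TO_IPA)
    print(sorted(common))
    print(to_multilingual("telugu", ["k", "aa"], PHONE_TO_IPA))  # ['k', 'aː']
```

Passing only the Kannada and Telugu tables (or only the Bengali and Odia tables) yields the shared label set for a bilingual Dravidian (or Indo-Aryan) recogniser in the same way; every phone that maps to the same IPA symbol then shares one acoustic model.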
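The tandem Multi-PRSs described in the abstract use phone posteriors as features. One common form of the tandem approach takes the logarithm of frame-level phone posteriors, decorrelates them with PCA, and appends the result to conventional acoustic features before training the HMM-based recogniser. The sketch below assumes that variant; the function name `tandem_features`, the array shapes, and the PCA dimension are assumptions for illustration, since the exact configuration used by the authors is not given in this abstract.

```python
import numpy as np


def tandem_features(acoustic, posteriors, n_components=None, eps=1e-10):
    """Build tandem features from frame-level phone posteriors (assumed variant).

    acoustic:     (T, D) array of conventional features, e.g. MFCCs.
    posteriors:   (T, P) array of phone posteriors from a DNN; rows sum to 1.
    n_components: number of principal components kept (None keeps all P).
    Returns a (T, D + K) array: the acoustic features with the
    decorrelated log-posteriors appended.
    """
    # Log compresses the highly skewed posterior values
    logp = np.log(posteriors + eps)
    # Mean-centre each dimension before PCA
    centred = logp - logp.mean(axis=0, keepdims=True)
    # PCA via eigendecomposition of the covariance (eigh returns ascending order)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    k = posteriors.shape[1] if n_components is None else n_components
    projection = eigvecs[:, order[:k]]
    # Project onto the top-k components: output dimensions are uncorrelated
    decorrelated = centred @ projection
    return np.hstack([acoustic, decorrelated])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, P = 200, 13, 40
    mfcc = rng.standard_normal((T, D))
    logits = rng.standard_normal((T, P))
    post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(tandem_features(mfcc, post, n_components=25).shape)  # (200, 38)
```

Keeping fewer components than phones (`n_components` smaller than P) discards the lowest-variance directions. Whether the decorrelated posteriors are appended to or used in place of the acoustic features is a design choice that varies between tandem systems.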

Keywords

Monolingual · Bilingual · Multilingual · Phone recognition · Indian languages · Kannada · Telugu · Bengali · Odia · DNNs

Notes

Acknowledgements

We thank Prof. B. Yegnanarayana, Prof. K. Sri Rama Murthy, and Prof. R. Kumaraswamy for providing the Kannada and Telugu datasets. These datasets were developed as part of the consortium project titled "Prosodically guided phonetic engine for searching speech databases in Indian languages", supported by DIT, New Delhi, India.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • K. E. Manjunath (1)
  • Dinesh Babu Jayagopi (1)
  • K. Sreenivasa Rao (2)
  • V. Ramasubramanian (1)
  1. International Institute of Information Technology, Bangalore, India
  2. Indian Institute of Technology Kharagpur, Kharagpur, India