Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling

  • Original Article
Neural Computing and Applications

Abstract

This paper implements and evaluates a discriminatively trained continuous speech recognition system for Hindi. The system uses the maximum mutual information (MMI) and minimum phone error (MPE) discriminative techniques, with various numbers of Gaussian mixtures, to train the automatic speech recognition (ASR) system. The training dataset consists of transcribed Hindi speech. The experiments show a significant performance gain over a maximum likelihood-based Hindi speech recognition system. The system also uses an efficient recurrent neural network (RNN)-based language model (RNNLM), and the results indicate that it improves the performance of the ASR system. Further, interpolating an n-gram language model (LM) with the RNNLM yields an additional gain in recognition performance. The proposed system also introduces speaker adaptation using the maximum likelihood linear regression (MLLR) technique. The paper also gives an overview of the techniques used for discriminative training, along with the practical issues involved in implementing them.
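The abstract does not state how the n-gram LM and the RNNLM are combined. A common choice is linear interpolation of the per-word probabilities, and the minimal sketch below assumes that scheme. The function names, the weight `lam`, and the example probabilities are all hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def interpolate(p_ngram: float, p_rnn: float, lam: float = 0.5) -> float:
    """Linearly interpolate two word probabilities.

    lam is the (assumed) weight on the RNNLM; 1 - lam goes to the n-gram LM.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_rnn + (1.0 - lam) * p_ngram


def perplexity(ngram_probs: Sequence[float],
               rnn_probs: Sequence[float],
               lam: float = 0.5) -> float:
    """Perplexity of a word sequence under the interpolated model.

    Each sequence holds one probability per word, as assigned by the
    corresponding model to the same word in the same context.
    """
    if len(ngram_probs) != len(rnn_probs) or not ngram_probs:
        raise ValueError("need two non-empty sequences of equal length")
    log_prob = sum(
        math.log(interpolate(pn, pr, lam))
        for pn, pr in zip(ngram_probs, rnn_probs)
    )
    return math.exp(-log_prob / len(ngram_probs))


# Hypothetical per-word probabilities for a three-word utterance.
ngram = [0.10, 0.05, 0.20]
rnn = [0.30, 0.02, 0.25]
for lam in (0.0, 0.5, 1.0):
    print(f"lam={lam:.1f}  perplexity={perplexity(ngram, rnn, lam):.3f}")
```

In practice the weight is usually tuned on held-out data by choosing the value that minimizes perplexity. Setting `lam` to 0 or 1 recovers the n-gram LM or the RNNLM alone.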

Author information

Corresponding author

Correspondence to Mohit Dua.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Dua, M., Aggarwal, R.K. & Biswas, M. Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling. Neural Comput & Applic 31, 6747–6755 (2019). https://doi.org/10.1007/s00521-018-3499-9

  • DOI: https://doi.org/10.1007/s00521-018-3499-9
