Continuous Hindi Speech Recognition Using Kaldi ASR Based on Deep Neural Network

  • Conference paper
  • First Online:
Machine Intelligence and Signal Analysis

Abstract

Deep learning is today one of the most reliable and effective approaches for building accurate speech recognition models and for natural language processing (NLP). In this paper, we propose context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) for large-vocabulary Hindi speech recognition using the Kaldi automatic speech recognition toolkit. Experiments on the AMUAV database demonstrate that CD-DNN-HMMs outperform the conventional CD-GMM-HMM model, yielding a 3.1% improvement in word error rate over the conventional triphone model.
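The improvement reported above is measured in word error rate (WER), the standard ASR metric: the word-level edit distance (substitutions, deletions, insertions) between the recognizer's hypothesis and the reference transcript, divided by the number of reference words. As a minimal illustration (not code from the paper; Kaldi's own scoring scripts compute this internally), WER can be sketched with a small dynamic program:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution (b -> x) and one deletion (d) over 4 reference words: WER 0.5
print(word_error_rate("a b c d", "a x c"))
```

A "3.1% improvement in WER" in this setting means the DNN-based acoustic model produces proportionally fewer such word-level errors than the GMM-based triphone baseline on the same test set.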



Acknowledgements

The authors would like to acknowledge the Institution of Electronics and Telecommunication Engineers (IETE) for sponsoring the research fellowship during the period of this research.

Author information

Correspondence to Prashant Upadhyaya.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Upadhyaya, P., Mittal, S.K., Farooq, O., Varshney, Y.V., Abidi, M.R. (2019). Continuous Hindi Speech Recognition Using Kaldi ASR Based on Deep Neural Network. In: Tanveer, M., Pachori, R. (eds) Machine Intelligence and Signal Analysis. Advances in Intelligent Systems and Computing, vol 748. Springer, Singapore. https://doi.org/10.1007/978-981-13-0923-6_26
