Abstract
Today, deep learning is one of the most reliable approaches for building accurate models for speech recognition and natural language processing (NLP). In this paper, we propose Context-Dependent Deep Neural Network HMMs (CD-DNN-HMMs) for large-vocabulary continuous Hindi speech recognition using the Kaldi automatic speech recognition toolkit. Experiments on the AMUAV database demonstrate that CD-DNN-HMMs outperform the conventional CD-GMM-HMM model, improving the word error rate by 3.1% over the conventional triphone model.
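The reported gain is measured in word error rate (WER), the standard ASR metric. As a quick illustration only (not the authors' evaluation code; Kaldi scores with its own `compute-wer` tool), WER can be computed by Levenshtein alignment between reference and hypothesis word sequences:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / #reference words,
    via Levenshtein alignment over word sequences."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # one substitution over 3 words
```

A 3.1% improvement, as in the abstract, means the DNN system's WER is 3.1 points lower than the triphone GMM baseline's on the same test set.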
Acknowledgements
The authors would like to acknowledge the Institution of Electronics and Telecommunication Engineers (IETE) for sponsoring the research fellowship during this period of research.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
Cite this paper
Upadhyaya, P., Mittal, S.K., Farooq, O., Varshney, Y.V., Abidi, M.R. (2019). Continuous Hindi Speech Recognition Using Kaldi ASR Based on Deep Neural Network. In: Tanveer, M., Pachori, R. (eds) Machine Intelligence and Signal Analysis. Advances in Intelligent Systems and Computing, vol 748. Springer, Singapore. https://doi.org/10.1007/978-981-13-0923-6_26
Print ISBN: 978-981-13-0922-9
Online ISBN: 978-981-13-0923-6
eBook Packages: Intelligent Technologies and Robotics