Abstract
The speech research community has achieved significant improvements in the performance of ASR systems for English and other high-resource languages, but ASR for low-resource languages is still far from mature. The major cause is data scarcity: ASR systems based on deep learning require large amounts of training data. This work develops an ASR system for detecting and recognizing spontaneously spoken sentences in the low-resource Kannada language. The proposed system is implemented over DNN-HMM hybrid acoustic models and n-gram language models, and its efficacy is reported through the Word Error Rate (WER) metric. Feature extraction is performed with perceptual wavelets and Mel filter banks. The results reveal the competitive performance of the proposed ASR system over the baseline features and models.
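For reference, WER, the evaluation metric used in the abstract, is the Levenshtein (edit) distance between the recognized and reference word sequences, normalized by the reference length. The sketch below is a generic illustration of that computation, not the authors' evaluation code; function and variable names are illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (substitutions + deletions + insertions) / reference length,
    computed as Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```

A hypothesis missing one of six reference words thus scores a WER of about 0.167 (16.7%); scores above 1.0 are possible when insertions dominate.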
Data Availability
The datasets generated during and/or analysed during the current study are not publicly available due to the institute regulations, but are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to express their sincere gratitude to the Management of Vidyavardhaka Sangha, the Principal of VVCE, and the HOD, Department of ECE, for their constant support and encouragement throughout the completion of this work.
Funding
The authors declare that no funds or grants were received during the preparation of this manuscript.
Author information
Contributions
All authors contributed to the study conception and design. Content preparation, data collection and analysis were performed by Mahadevaswamy. The first draft of the manuscript was written by Mahadevaswamy, and Editing, Writing and Final review was performed by DJR. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests related to this research.
Ethics Approval
The research conducted in this study adhered to the highest ethical standards and was approved by the institutional review board. The study protocol and any subsequent modifications were reviewed and found to be in compliance with established ethical guidelines.
Informed Consent
Informed consent was obtained from all study participants or their legal representatives, as appropriate. Participants were provided with comprehensive information about the study's purpose, procedures, potential risks, and benefits, and their right to withdraw from the study at any time without consequences. Written consent was obtained from all participants before their inclusion in the research.
Participant Anonymity and Confidentiality
To ensure participant anonymity and confidentiality, all personal identifiers were removed or anonymized from the data and any published materials. Participant identities were protected throughout the study, and data were stored securely in accordance with applicable data protection regulations.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shanthamallappa, M., Ravi, D.J. Robust Perceptual Wavelet Packet Features for the Recognition of Spontaneous Kannada Sentences. Wireless Pers Commun 133, 1011–1030 (2023). https://doi.org/10.1007/s11277-023-10802-9