
Robust vowel region detection method for multimode speech

Multimedia Tools and Applications

Abstract

The aim of this paper is to explore a robust method for detecting vowel regions in multimode speech. In realistic scenarios, speech can be classified into three modes, namely conversation, extempore, and read speech. Existing methods detect vowels in speech recorded in a clean environment, which may not be appropriate for multimode speech tasks. To address this issue, we propose an approach based on continuous wavelet transform (CWT) coefficients and phone boundaries for detecting vowel regions in different modes of the speech signal. The proposed vowel region (VR) detection technique is evaluated on the TIMIT (read speech) and Bengali (read, extempore, and conversation speech) corpora and compared with state-of-the-art methods. The experiments show a significant performance gain for the proposed technique over the state-of-the-art methods. The effectiveness of the proposed technique is further demonstrated by extracting vocal tract and excitation source features from automatically detected VRs to develop a multilingual speech mode classification (MSMC) model. The evaluation results show that the performance of the MSMC model improves significantly when features are extracted from the vowel regions rather than from the entire speech utterance.
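The abstract does not spell out the detection rule, so the following is only a rough sketch of how CWT coefficients and phone boundaries could be combined; it is not the authors' method. It assumes a Ricker (Mexican hat) mother wavelet, a frame-level energy threshold on coarse-scale CWT coefficients (where the strong low-frequency energy of vowels concentrates), and a list of phone boundaries in samples (for example from a phone recognizer) to which detected region edges are snapped. The scales, threshold, frame sizes, and the function names `detect_vowel_regions`, `snap_to_boundaries`, and `vr_mask` are all hypothetical choices for illustration.

```python
import numpy as np

def ricker(points: int, a: float) -> np.ndarray:
    """Ricker (Mexican hat) wavelet of width a, sampled at `points` samples (unit L2 norm)."""
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    t = np.arange(points) - (points - 1) / 2.0
    tsq = t ** 2
    return amp * (1.0 - tsq / a ** 2) * np.exp(-tsq / (2.0 * a ** 2))

def cwt(x: np.ndarray, scales) -> np.ndarray:
    """CWT by direct convolution: one row of coefficients per scale."""
    out = np.empty((len(scales), len(x)))
    for i, a in enumerate(scales):
        w = ricker(min(int(10 * a), len(x)), a)
        out[i] = np.convolve(x, w, mode="same")
    return out

def detect_vowel_regions(x, fs, scales=(8, 16, 32), frame_ms=20, hop_ms=10,
                         rel_thresh=0.1, min_dur_ms=40):
    """Hypothetical detector: frames whose coarse-scale CWT energy exceeds
    rel_thresh * (max frame energy) are merged into (start, end) sample regions."""
    energy = np.sum(cwt(x, scales) ** 2, axis=0)  # per-sample energy summed over scales
    flen, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    n_frames = 1 + (len(x) - flen) // hop
    fe = np.array([energy[i * hop:i * hop + flen].sum() for i in range(n_frames)])
    active = fe >= rel_thresh * fe.max()
    regions, start = [], None
    for i, on in enumerate(np.append(active, False)):  # sentinel closes a trailing region
        if on and start is None:
            start = i
        elif not on and start is not None:
            s, e = start * hop, (i - 1) * hop + flen
            if (e - s) * 1000 / fs >= min_dur_ms:  # drop regions too short to be vowels
                regions.append((s, e))
            start = None
    return regions

def snap_to_boundaries(regions, boundaries, fs, tol_ms=30):
    """Move each region edge to the nearest phone boundary if it lies within tol_ms."""
    b = np.asarray(boundaries)
    if b.size == 0:
        return list(regions)
    tol = fs * tol_ms / 1000

    def snap(p):
        j = int(np.argmin(np.abs(b - p)))
        return int(b[j]) if abs(b[j] - p) <= tol else p

    return [(snap(s), snap(e)) for s, e in regions]

def vr_mask(n, regions):
    """Boolean mask selecting samples inside vowel regions (for VR-only feature extraction)."""
    m = np.zeros(n, dtype=bool)
    for s, e in regions:
        m[s:e] = True
    return m
```

A quick synthetic check: low-level noise with a harmonic, vowel-like segment between 0.3 s and 0.6 s should yield one region that snaps onto phone boundaries placed at those instants.

```python
fs = 16000
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(fs)
t = np.arange(4800, 9600) / fs
x[4800:9600] += sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 11))
regions = detect_vowel_regions(x, fs)
snapped = snap_to_boundaries(regions, [0, 4800, 9600, 16000], fs)
```

Vocal tract or excitation source features would then be computed only over samples where `vr_mask(len(x), snapped)` is True, mirroring the VR-versus-whole-utterance comparison described above.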



Author information


Correspondence to Kumud Tripathi.


About this article


Cite this article

Tripathi, K., Rao, K.S. Robust vowel region detection method for multimode speech. Multimed Tools Appl 80, 13615–13637 (2021). https://doi.org/10.1007/s11042-020-10394-7


  • DOI: https://doi.org/10.1007/s11042-020-10394-7
