Robust vowel region detection method for multimode speech

Tripathi, Kumud; Rao, K. Sreenivasa

doi:10.1007/s11042-020-10394-7

Published: 16 January 2021

Volume 80, pages 13615–13637, (2021)

Multimedia Tools and Applications

Abstract

The aim of this paper is to explore a robust method for vowel region detection from multimode speech. In realistic scenarios, speech can be classified into three modes, namely conversation, extempore, and read. Existing methods detect vowels from speech recorded in a clean environment, which may not be appropriate for multimode speech tasks. To address this issue, we propose an approach based on continuous wavelet transform coefficients and phone boundaries for detecting the vowel regions in different modes of the speech signal. For evaluation of the proposed vowel region (VR) detection technique, the TIMIT (read speech) and Bengali (read, extempore, and conversation speech) corpora are used. The proposed VR detection technique is compared with state-of-the-art methods, and the experiments show a significant performance gain over them. The efficiency of the proposed technique is shown by extracting vocal tract and excitation source features from automatically detected VRs for developing the multilingual speech mode classification (MSMC) model. The evaluation results show that the performance of the MSMC model improves significantly when features are extracted from the vowel regions rather than from the entire speech utterance.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal, India
Kumud Tripathi & K. Sreenivasa Rao

Authors

Kumud Tripathi
K. Sreenivasa Rao

Corresponding author

Correspondence to Kumud Tripathi.

About this article

Cite this article

Tripathi, K., Rao, K.S. Robust vowel region detection method for multimode speech. Multimed Tools Appl 80, 13615–13637 (2021). https://doi.org/10.1007/s11042-020-10394-7

Received: 20 November 2019
Revised: 05 October 2020
Accepted: 22 December 2020
Published: 16 January 2021
Issue Date: April 2021
DOI: https://doi.org/10.1007/s11042-020-10394-7
