Spotting and Recognition of Consonant-Vowel Units from Continuous Speech Using Accurate Detection of Vowel Onset Points

Vuppala, Anil Kumar; Rao, K. Sreenivasa; Chakrabarti, Saswat

doi:10.1007/s00034-012-9391-4

Published: 02 February 2012

Volume 31, pages 1459–1474, (2012)

Circuits, Systems, and Signal Processing

Anil Kumar Vuppala¹,
K. Sreenivasa Rao² &
Saswat Chakrabarti¹

Abstract

In this paper, we propose an efficient approach to spotting and recognizing consonant-vowel (CV) units in continuous speech using accurate detection of vowel onset points (VOPs). Existing methods for VOP detection suffer from limited accuracy, spurious VOPs, and missed VOPs. The proposed VOP detection method is designed to overcome most of these shortcomings and to detect VOPs accurately, thereby improving the performance of spotting and recognition of CV units. The proposed method operates at two levels. At the first level, VOPs are detected by combining the complementary evidence from the excitation source, spectral peaks, and modulation spectrum. At the second level, the hypothesized VOPs are verified (as genuine or spurious), and their positions are corrected using the uniform epoch intervals present in the vowel regions. The spotted CV units are recognized using a two-stage CV recognizer. The two-stage CV recognition system uses hidden Markov models (HMMs) at the first stage to recognize the vowel category of a CV unit and support vector machines (SVMs) at the second stage to recognize its consonant category. The performance of spotting and recognition of CV units from continuous speech is evaluated on a Telugu broadcast news speech corpus.
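
The two-level VOP detection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the three evidences are combined or how "uniform" epoch intervals are tested, so the normalize-and-sum rule, the peak-picking threshold, the `min_run` and `tolerance` parameters, and all function names below are assumptions made for illustration only. Positions are in samples.

```python
from statistics import median


def combine_evidence(source, spectral, modulation):
    """Level 1 (assumed rule): scale each evidence curve to [0, 1] and sum them."""
    def normalise(curve):
        lo, hi = min(curve), max(curve)
        span = hi - lo
        return [(v - lo) / span if span > 0 else 0.0 for v in curve]

    curves = [normalise(c) for c in (source, spectral, modulation)]
    return [sum(values) for values in zip(*curves)]


def pick_peaks(evidence, threshold):
    """Return indices of local maxima at or above threshold (hypothesised VOPs)."""
    peaks = []
    for i in range(1, len(evidence) - 1):
        is_peak = evidence[i] > evidence[i - 1] and evidence[i] >= evidence[i + 1]
        if is_peak and evidence[i] >= threshold:
            peaks.append(i)
    return peaks


def verify_and_correct(vop, epochs, window, min_run=4, tolerance=0.15):
    """Level 2 (assumed criterion): look for an epoch within `window` of the
    hypothesised VOP that starts a run of `min_run` near-uniform epoch intervals.

    Returns the corrected VOP (the first epoch of the run), or None when no
    such run exists and the hypothesis is treated as spurious.
    """
    epochs = sorted(epochs)
    for i, start in enumerate(epochs):
        if start < vop - window:
            continue
        if start > vop + window:
            break
        run = epochs[i:i + min_run + 1]
        if len(run) < min_run + 1:
            break  # not enough epochs left to form a full run
        intervals = [b - a for a, b in zip(run, run[1:])]
        mid = median(intervals)
        # "Uniform" here means every interval lies within tolerance of the median.
        if all(abs(d - mid) <= tolerance * mid for d in intervals):
            return start
    return None


# Hypothetical epoch locations: irregular up to 210, regular afterwards.
epochs = [100, 137, 150, 210, 290, 370, 450, 530, 610]
corrected = verify_and_correct(vop=200, epochs=epochs, window=40)
rejected = verify_and_correct(vop=120, epochs=epochs, window=25)
```

In this toy input the hypothesis at sample 200 is moved to the epoch at 210, where the regular intervals begin, while the hypothesis at 120 finds no uniform run nearby and is discarded. Real evidence curves and epoch locations would come from signal-processing front ends that are outside the scope of this sketch.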

Acknowledgement

The authors would like to thank the reviewers for their valuable comments and suggested corrections, which helped improve the quality of the paper.

Author information

Authors and Affiliations

GSSST, IIT Kharagpur, Kharagpur, 721302, West Bengal, India
Anil Kumar Vuppala & Saswat Chakrabarti
SIT, IIT Kharagpur, Kharagpur, 721302, West Bengal, India
K. Sreenivasa Rao

Corresponding author

Correspondence to Anil Kumar Vuppala.

About this article

Cite this article

Vuppala, A.K., Rao, K.S. & Chakrabarti, S. Spotting and Recognition of Consonant-Vowel Units from Continuous Speech Using Accurate Detection of Vowel Onset Points. Circuits Syst Signal Process 31, 1459–1474 (2012). https://doi.org/10.1007/s00034-012-9391-4

Received: 07 March 2011
Revised: 09 January 2012
Published: 02 February 2012
Issue Date: August 2012
DOI: https://doi.org/10.1007/s00034-012-9391-4
