Abstract
In this paper, we propose an efficient approach to spotting and recognition of consonant-vowel (CV) units from continuous speech using accurate detection of vowel onset points (VOPs). Existing methods for VOP detection suffer from lack of high accuracy, spurious VOPs, and missed VOPs. The proposed VOP detection is designed to overcome most of the shortcomings of the existing methods and provide accurate detection of VOPs for improving the performance of spotting and recognition of CV units. The proposed method for VOP detection is carried out in two levels. At the first level, VOPs are detected by combining the complementary evidence from excitation source, spectral peaks, and modulation spectrum. At the second level, hypothesized VOPs are verified (genuine or spurious), and their positions are corrected using the uniform epoch intervals present in the vowel regions. The spotted CV units are recognized using a two-stage CV recognizer. Two-stage CV recognition system consists of hidden Markov models (HMMs) at the first stage for recognizing the vowel category of a CV unit and support vector machines (SVMs) for recognizing the consonant category of a CV unit at the second stage. Performance of spotting and recognition of CV units from continuous speech is evaluated using Telugu broadcast news speech corpus.
Similar content being viewed by others
References
L.R. Bahl, R. Bakis, P.S. Cohen, A.G. Cole, F. Jelinek, B.L. Lewis, R.L. Mercer, Further results on the recognition of a continuously read natural corpus, in Proc. IEEE International Conference on Acoustics, Speech, Signal Processing (1980), pp. 872–875
R. Collobert, S. Bengio, C. Williamson, SVMTorch, support vector machines for large-scale regression problems. J. Mach. Learn. Res. 1, 143–160 (2001)
O. Fujimura, Syllable as a unit of speech recognition. IEEE Trans. Acoust. Speech Signal Process. 23(1), 82–87 (1975)
S.V. Gangashetty, Neural network models for recognition of consonant-vowel units of speech in Multiple Languages. Ph.D. thesis, IIT Madras, Oct. 2004
S.V. Gangashetty, C.C. Sekhar, B. Yegnanarayana, Spotting multilingual consonant-vowel units of speech using neural networks, in An ISCA Tutorial and Research Workshop on Non-Linear Speech Processing, Apr. 2005, pp. 287–297
S.V. Gangashetty, C.C. Sekhar, B. Yegnanarayana, Detection of vowel onset points in continuous speech using autoassociative neural network models, in Proc. Int. Conf. Spoken Language Processing, Oct. 2004, pp. 401–410
S.V. Gangashetty, C.C. Sekhar, B. Yegnanarayana, Extraction of fixed dimension patterns from varying duration segments of consonant-vowel utterances, in Proc. Intelligent Sensing and Information Processing (2004), pp. 159–164
A. Ganapathiraju, J. Hamaker, J. Picone, M. Ordowski, G.R. Doddington, Syllable based large vocabulary continuous speech recognition. IEEE Trans. Speech Audio Process. 9(4), 358–366 (2001)
J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett, N.L. Dahlgren, V. Zue, TIMIT Acoustic-Phonetic Continuous Speech Corpus (Linguistic Data Consortium, Philadelphia, 1993)
D.J. Hermes, Vowel onset detection. J. Acoust. Soc. Am. 87, 866–873 (1990)
M.Y. Hwang, X.D. Huang, Shared distribution Hidden Markov Models for speech recognition. IEEE Trans. Speech Audio Process. 1, 414–420 (1993)
K.S.R. Murty, B. Yegnanarayana, Epoch extraction from speech signals. IEEE Trans. Audio Speech Lang. Process. 16, 1602–1613 (2008)
J.W. Picone, Signal modeling techniques in speech recognition. Proc. IEEE 81, 1215–1247 (1993)
S.R.M. Prasanna, B.V. Sandeep Reddy, P. Krishnamoorthy, Vowel onset point detection using source, spectral peaks, and modulation spectrum energies. IEEE Trans. Audio Speech Lang. Process. 17(4), 556–565 (2009)
S.R.M. Prasanna, S.V. Gangashetty, B. Yegnanarayana, Significance of vowel onset point for speech analysis, in Signal Processing and Communications (Biennial Conf., IISc) (2001), pp. 81–88
S.R.M. Prasanna, B. Yegnanarayana, Detection of vowel onset point events using excitation source information. Interspeech, pp. 1133–1136, Sep. 2005
L.R. Rabiner, B.H. Juang, Fundamentals of Speech Recognition (PTR Prentice Hall, Englewood Cliffs, 1993)
R.M. Schwartz, Y.L. Chow, S. Roucos, M. Krasner, J. Makhoul, Improved hidden Markov modeling phonemes for continuous speech recognition, in Proc. IEEE International Conference on Acoustics, Speech, Signal Processing (1984), pp. 21–24
C.C. Sekhar, Neural network models for recognition of stop consonant-vowel (SCV) segments in continuous speech. Ph.D. thesis, IIT Madras (1996)
C.C. Sekhar, W.F. Lee, K. Takeda, F. Itakura, Acoustic modeling of subword units using support vector machines, in Proc. of Workshop on Spoken Language Processing (2003), pp. 79–86
R. Thangarajan, A.M. Natarajan, M. Selvam, Syllable modeling in continuous speech recognition for Tamil language. Int. J. Speech Technol. 12, 47–57 (2009)
A.K. Vuppala, S. Chakrabarti, K.S. Rao, Effect of speech coding on recognition of Consonant-Vowel (CV) units, in Proc. Int. Conf. Contemporary Computing, Aug. 2010. Springer Communications in Computer and Information Science (2010), pp. 284–294. ISSN: 1865-0929
J.-H. Wang, S.-H. Chen, A C/V segmentation algorithm for Mandarin speech using wavelet transforms, in Proc. Int. Conf. Acoust. Speech, Signal Process, vol. 1, Sep. 1999, pp. 1261–1264
J.F. Wang, C.H. Wu, S.H. Chang, J.Y. Lee, A hierarchical neural network based on C/V segmentation algorithm for isolated Mandarin speech recognition. IEEE Trans. Signal Process. 39(9), 2141–2146 (1991)
S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, P. Woodland, The HTK Book Version 3.0 (Cambridge University Press, Cambridge, 2000)
Acknowledgement
The authors like to acknowledge the reviewers for their valuable comments and suggested corrections. Those have helped us a lot for improving the quality of the paper.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Vuppala, A.K., Rao, K.S. & Chakrabarti, S. Spotting and Recognition of Consonant-Vowel Units from Continuous Speech Using Accurate Detection of Vowel Onset Points. Circuits Syst Signal Process 31, 1459–1474 (2012). https://doi.org/10.1007/s00034-012-9391-4
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00034-012-9391-4