Vowel Recognition from Telephonic Speech Using MFCCs and Gaussian Mixture Models

Koolagudi, Shashidhar G.; Thakur, Sujata Negi; Barthwal, Anurag; Singh, Manoj Kumar; Rawat, Ramesh; Sreenivasa Rao, K.

doi:10.1007/978-3-642-32112-2_21

Conference paper


Part of the book series: Communications in Computer and Information Science (CCIS, volume 305)

Abstract

This paper presents vowel recognition from speech using mel-frequency cepstral coefficients (MFCCs). Both microphone-recorded speech and telephonic speech are used in the vowel recognition studies. The vowels considered for recognition are from the Hindi alphabet, namely अ(a), इ(i), उ(u), ए(e), ऐ(ai), ओ(o) and औ(au). Gaussian mixture models (GMMs) are used to develop the vowel recognition models. Vowel recognition performance is 91.4% for microphone-recorded speech and 84.2% for telephonic speech.
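
As a rough illustration of the classification step the abstract describes, the sketch below scores a sequence of feature frames against one GMM per vowel and picks the vowel with the highest total log-likelihood. The abstract does not give the covariance type, the number of mixture components, or the feature dimension, so everything here is an assumption: diagonal covariances, two components per vowel, and hand-set 2-D toy parameters in the hypothetical `MODELS` table. A real system would fit the GMMs with EM on MFCC vectors extracted from the recordings.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a frame sequence under one GMM.

    gmm is a list of (weight, mean, var) components; frames are treated
    as independent, so per-frame log-likelihoods are summed.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)
        # log-sum-exp keeps the mixture sum numerically stable
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def classify(frames, models):
    """Return the label whose GMM assigns the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))

# Hypothetical toy models (assumption): 2-D features, two components per vowel.
MODELS = {
    "a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "i": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [1.0, 1.0])],
}

frames = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.5]]  # stand-ins for MFCC vectors
print(classify(frames, MODELS))
```

Summing log-likelihoods over frames treats the frames as independent, a common simplification in GMM-based classifiers; the same decision rule extends to all seven vowels by adding one entry per vowel to `MODELS`.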

Author information

Authors and Affiliations

School of Computing, Graphic Era University, Dehradun, 248002, Uttarakhand, India
Shashidhar G. Koolagudi & Anurag Barthwal
Department of Computer Applications, Graphic Era University, Dehradun, 248002, Uttarakhand, India
Sujata Negi Thakur, Manoj Kumar Singh & Ramesh Rawat
School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal, India
K. Sreenivasa Rao

Editor information

Editors and Affiliations

Department of Computer Science, University of Bristol, BS8 1UB, Bristol, UK
Jimson Mathew & Dhiraj K. Pradhan
Intel Corporation, 211, Northeast 25th Ave., 97124, Hillsboro, Oregon, USA
Priyadarshan Patra
Department of Information Technology, Rajagiri School of Engineering and Technology, Kochi, Kerala, India
A. J. Kuttyamma

About this paper

Cite this paper

Koolagudi, S.G., Thakur, S.N., Barthwal, A., Singh, M.K., Rawat, R., Sreenivasa Rao, K. (2012). Vowel Recognition from Telephonic Speech Using MFCCs and Gaussian Mixture Models. In: Mathew, J., Patra, P., Pradhan, D.K., Kuttyamma, A.J. (eds) Eco-friendly Computing and Communication Systems. ICECCS 2012. Communications in Computer and Information Science, vol 305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32112-2_21

DOI: https://doi.org/10.1007/978-3-642-32112-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32111-5
Online ISBN: 978-3-642-32112-2
eBook Packages: Computer Science, Computer Science (R0)
