Pitch synchronous and glottal closure based speech analysis for language recognition

Rao, K. Sreenivasa; Maity, Sudhamay; Reddy, V. Ramu

doi:10.1007/s10772-013-9193-5

Pitch synchronous and glottal closure based speech analysis for language recognition

Published: 12 April 2013

Volume 16, pages 413–430, (2013)
Cite this article

International Journal of Speech Technology Aims and scope Submit manuscript

K. Sreenivasa Rao¹,
Sudhamay Maity¹ &
V. Ramu Reddy¹

458 Accesses
31 Citations
Explore all metrics

Abstract

This paper explores pitch synchronous and glottal closure (GC) based spectral features for analyzing the language specific information present in speech. For determining pitch cycles (for pitch synchronous analysis) and GC regions, instants of significant excitation (ISE) are used. The ISE correspond to the instants of glottal closure (epochs) in the case of voiced speech, and some random excitations like onset of burst in the case of nonvoiced speech. For analyzing the language specific information in the proposed features, Indian language speech database (IITKGP-MLILSC) is used. Gaussian mixture models are used to capture the language specific information from the proposed features. Proposed pitch synchronous and glottal closure spectral features are evaluated using language recognition studies. The evaluation results indicate that language recognition performance is better with pitch synchronous and GC based spectral features compared to conventional spectral features derived through block processing. GC based spectral features are found to be more robust against degradations due to background noise. Performance of proposed features is also analyzed on standard Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) database.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Implicit excitation source features for robust language identification

Article 17 June 2015

Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise

Article 28 November 2015

Speech synthesis for glottal activity region processing

Article 03 December 2018

References

Ambikairajah, E., Li, H., Wang, L., Yin, B., & Sethu, V. (2011). Language identification: a tutorial. IEEE Circuits and Systems Magazine, 11(2), 82–108.
Article Google Scholar
Benesty, J., Sondhi, M. M., & Huang, Y. (2007). Springer handbook of speech processing. New York: Springer.
Google Scholar
Bhaskararao, P. (2005). Salient phonetic features of Indian languages in speech technology. Sâdhana, 36(5), 587–599.
Article Google Scholar
Carrasquillo, P. A. T., Reynolds, D. A., & Deller, J. R. (2002). Language identification using Gaussian mixture model tokenization. In Proceedings of IEEE int. conf. acoust., speech, and signal processing (Vol. I, pp. 757–760).
Google Scholar
Cimarusti, D., & Eves, R. B. (1982). Development of an automatic identification system of spoken languages: phase I. In Proceedings of IEEE int. conf. acoust., speech, and signal processing (pp. 1661–1663).
Google Scholar
Cole, R. A., Inouye, J. W. T., Muthusamy, Y. K., & Gopalakrishnan, M. (1989). Language identification with neural networks: a feasibility study. In Proc. IEEE pacific rim conf. communications, computers and signal processing (pp. 525–529).
Chapter Google Scholar
Corredor-Ardoy, C., Gauvain, J., Adda-Decker, M., & Lamel, L. (1997). Language identification with language-independent acoustic models. In Proc. EUROSPEECH-1997 (pp. 55–58).
Google Scholar
Dalsgaard, P., & Andersen, O. (1992). Identification of mono- and poly-phonemes using acoustic-phonetic features derived by a self-organising neural network. In Proc. int. conf. spoken language processing (ICSLP-1992) (pp. 547–550).
Google Scholar
Guoliang, Z., Fang, Z., & Zhanjiang, S. (2001). Comparison of different implementations of MFCC. Journal of Computer Science and Technology, 16(16), 582–589.
MATH Google Scholar
Ives, R. (1986). A minimal rule AI expert system for real-time classification of natural spoken languages. In Proc. 2nd annual artificial intelligence and advanced computer technology conf. (pp. 337–340).
Google Scholar
Jayaram, A. K. V. S., Ramasubramanian, V., & Sreenivas, T. V. (2003). Language identification using parallel sub-word recognition. In Proceedings of IEEE int. conf. acoust., speech, and signal processing (Vol. I, pp. 32–35).
Google Scholar
Jyotsna, B., Murthy, H. A., & Nagarajan, T. (2000). Language identification from short segments of speech. In Proceedings of int. conf. spoken language processing, Beijing, China, October 2000 (pp. 1033–1036).
Google Scholar
Koolagudi, S. G., & Rao, K. S. (2011). Two-stage emotion recognition based on speaking rate. International Journal of Speech Technology, 14(3), 167–181.
Article Google Scholar
Lamel, L. F., & Gauvain, J. L. (1994). Language identification using phonebased acoustic likelihoods. In Proceedings of IEEE int. conf. acoust., speech, and signal processing (Vol. 1, pp. 293–296).
Google Scholar
Lander, T., Cole, R., Oshika, B., & Noel, M. (1995). The OGI 22 language telephone speech corpus. In Proc. EUROSPEECH-1995 (pp. 817–820).
Google Scholar
LDC, Philadelphia, PA (1996). LDC96S46–LDC96S60. http://www.ldc.upenn.edu/Catalog.
Leonard, R. G., & Doddington, G. R. (1974). Automatic language identification (Tech. Rep., A.F.R.A.D. Centre Tech. Rep. RADC-TR-74-200).
Mallidi, S. H. R., Prahallad, K., Gangashetty, S. V., & Yegnanarayana, B. (2010). Significance of pitch synchronous analysis for speaker recognition using AANN models. In Proceedings of INTERSPEECH-2010, Makuhari, Japan (pp. 669–672).
Google Scholar
Martínez, D., Burget, L., Ferrer, L., & Scheffer, N. (2012). iVector-based prosodic system for language identification. In ICASSP.
Google Scholar
Mary, L., & Yegnanarayana, B. (2004). Autoassociative neural network models for language identification. In Proc. int. conf. intelligent sensing and information processing, Chennai, India (pp. 317–320).
Google Scholar
Mary, L., & Yegnanarayana, B. (2008). Extraction and representation of prosodic features for language and speaker recognition. Speech Communication, 50, 782–796.
Article Google Scholar
Mary, L., Rao, K. S., & Yegnanarayana, B. (2005). Neural network classifiers for language identification using syntactic and prosodic features. In Proc. IEEE int. conf. intelligent sensing and information processing, Chennai, India, January 2005 (pp. 404–408).
Google Scholar
Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16, 1602–1613.
Article Google Scholar
Muthusamy, Y. K., Cole, R. A., & Oshika, B. T. (1992). The OGI multi-language telephone speech corpus. In Proceedings of int. conf. spoken language processing, October 1992 (pp. 895–898).
Google Scholar
Nagarajan, T., & Murthy, H. A. (2002). Language identification using spectral vector distribution across languages. In Proc. international conference on natural language processing (pp. 327–335).
Google Scholar
Nakagawa, S., Ueda, Y., & Seino, T. (1992). Speaker-independent, text independent language identification by HMM. In Proc. int. conf. spoken language processing (ICSLP-1992) (pp. 1011–1014).
Google Scholar
Narendra, N. P., Rao, K. S., Ghosh, K., Reddy, V. R., & Maity, S. (2011). Development of syllable-based text to speech synthesis system in Bengali. International Journal of Speech Technology, 14(3), 167–181.
Article Google Scholar
Navratil, J. (2001). Spoken language recognition a step toward multilinguality in speech processing. IEEE Transactions on Speech and Audio Processing, 9(6), 678–685.
Article Google Scholar
Pellegrino, F., & Andre-Abrecht, R. (1999). An unsupervised approach to language identification. In Proceedings of IEEE int. conf. acoust., speech, and signal processing (pp. 833–836).
Google Scholar
Pellegrino, F., Farinas, J., & André-Obrecht, R. (1999). Comparison of two phonetic approaches to language identification. In Proc. EUROSPEECH’99 (pp. 399–402).
Google Scholar
Prahalad, L., Prahalad, K., & Ganapathiraju, M. (2005). A simple approach for building transliteration editors for Indian languages. Journal of Zhejiang University. Science A, 6(11), 1354–1361.
Article Google Scholar
Ramasubramanian, V., Sai Jayaram, A. K. V., & Sreenivas, T. V. (2003). Language identification using parallel phone recognition. In WSLP, TIFR, Mumbai, January 2003 (pp. 109–116).
Google Scholar
Rao, K. S. (2010a). Real time prosody modification. Journal of Signal and Information Processing, 1(1), 50–62.
Article Google Scholar
Rao, K. S. (2010b). Voice conversion by mapping the speaker-specific features using pitch synchronous approach. Computer Speech & Language, 24(1), 474–494.
Article Google Scholar
Rao, K. S. (2011a). Application of prosody models for developing speech systems in Indian languages. International Journal of Speech Technology, 14(1), 19–33.
Article Google Scholar
Rao, K. S. (2011b). Role of neural network models for developing speech systems. Sâdhana, 36, 783–836.
Article Google Scholar
Rao, K. S., & Koolagudi, S. G. (2010). Selection of suitable features for modeling the durations of syllables. Journal of Software Engineering and Applications, 3(12), 1107–1117.
Article Google Scholar
Reynolds, D. A. (1995). Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 17(1–2), 91–108.
Article Google Scholar
Riek, L., Mistreta, W., & Morgan, D. (1991). Experiments in language identification (Tech. Rep., Lockheed Sanders Tech. Rep. SPCOT-91-002).
Sangwan, A., Mehrabani, M., & Hansen, J. H. L. (2010). Automatic language analysis and identification based on speech production knowledge. In ICASSP.
Google Scholar
Siu, M.-H.H., Yang, X., & Gish, H. (2009). Discriminatively trained GMMs for language classification using boosting methods. IEEE Transactions on Audio, Speech, and Language Processing, 17(1), 187–197.
Article Google Scholar
Ueda, Y., & Nakagawa, S. (1990). Diction for phoneme/syllable/word-category and identification of language using HMM. In Proc. int. conf. spoken language processing (ICSLP-1990) (pp. 1209–1212).
Google Scholar
Wong, E., & Sridharan, S. (2002). Gaussian mixture model based language identification system. In Proc. int. conf. spoken language processing (ICSLP-2002) (pp. 93–96).
Google Scholar
Zissman, M. A. (1993). Automatic language identification using Gaussian mixture and hidden Markov models. In Proceedings of IEEE int. conf. acoust., speech, and signal processing, April 1993 (pp. 399–402).
Chapter Google Scholar
Zissman, M. A. (1996). Comparison of four approaches to automatic language identification of telephone speech. IEEE Transactions on Speech and Audio Processing, 4, 31–44.
Article Google Scholar
Znai, L.-F, Siu, M.-H.H., Yang, X., & Gish, H. (2006). Discriminatively trained language models using support vector machines for language identification. In Proc. speaker and language recognition workshop 2006. IEEE odyssey (pp. 1–6).
Google Scholar

Download references

Author information

Authors and Affiliations

School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal, India
K. Sreenivasa Rao, Sudhamay Maity & V. Ramu Reddy

Authors

K. Sreenivasa Rao
View author publications
You can also search for this author in PubMed Google Scholar
Sudhamay Maity
View author publications
You can also search for this author in PubMed Google Scholar
V. Ramu Reddy
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to K. Sreenivasa Rao.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rao, K.S., Maity, S. & Reddy, V.R. Pitch synchronous and glottal closure based speech analysis for language recognition. Int J Speech Technol 16, 413–430 (2013). https://doi.org/10.1007/s10772-013-9193-5

Download citation

Received: 07 November 2012
Accepted: 14 March 2013
Published: 12 April 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10772-013-9193-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Pitch synchronous and glottal closure based speech analysis for language recognition

Abstract

Access this article

Similar content being viewed by others

Implicit excitation source features for robust language identification

Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise

Speech synthesis for glottal activity region processing

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Pitch synchronous and glottal closure based speech analysis for language recognition

Abstract

Access this article

Similar content being viewed by others

Implicit excitation source features for robust language identification

Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise

Speech synthesis for glottal activity region processing

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation