Implicit excitation source features for robust language identification

Nandi, Dipanjan; Pati, Debadatta; Rao, K. Sreenivasa

doi:10.1007/s10772-015-9288-2

Implicit excitation source features for robust language identification

Published: 17 June 2015

Volume 18, pages 459–477, (2015)
Cite this article

International Journal of Speech Technology Aims and scope Submit manuscript

Dipanjan Nandi¹,
Debadatta Pati² &
K. Sreenivasa Rao¹

207 Accesses
5 Citations
Explore all metrics

Abstract

In present work, the robustness of excitation source features has been analyzed for language identification (LID) task. The raw samples of linear prediction (LP) residual signal, its magnitude and phase components are processed at sub-segmental, segmental and supra-segmental levels for capturing the robust language-specific phonotactic information. Present LID study has been carried out on 27 Indian languages from Indian Institute of Technology Kharagpur-Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC). Gaussian mixture models are used to develop the LID systems using robust language-specific excitation source information. Robustness of excitation source information has been evinced in view of (i) background noise, (ii) varying amount of training data and (iii) varying length of test samples. Finally, the robustness of proposed excitation source features is compared with the well-known spectral features using LID performances obtained from IITKGP-MLILSC database. Segmental level excitation source features obtained from raw samples of LP residual signal and its phase component perform better at low SNR levels, compared with the vocal tract features.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise

Article 28 November 2015

Phani Kumar Polasi & Kalva Sri Rama Krishna

Acoustic Feature Analysis and Discriminative Modeling for Language Identification of Closely Related South-Asian Languages

Article 04 December 2017

Farah Adeeba & Sarmad Hussain

Feature Extraction Methods in Language Identification: A Survey

Article 22 April 2019

Deepti Deshwal, Pardeep Sangwan & Divya Kumar

References

All India Radio. (AIR). Accessed Feb. 2014 from http://www.newsonair.nic.in/.
Atal, B. S. (1972). Automatic speaker recognition based on pitch contours. Journal of the Acoustical Society of America, 52(6), 1687–1697.
Article Google Scholar
Bajpai, A., & Yegnanarayana, B. (2004). Exploring features for audio clip classification using LP residual and AANN models. In The international conference on intelligent sensing and information processing (ICISIP 2004), Chennai, India (pp. 305–310).
Bapineedu, G., Avinash, B., Gangashetty, S. V., & Yegnanarayana, B. (2009). Analysis of lombard speech using excitation source information. In INTERSPEECH-09, Brighton, UK, September 2009.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1–38.
Foil, J. T. (1986). Language identification using noisy speech. In IEEE international conference on acoustics, speech, and signal processing (ICASSP) (pp. 861–864).
Goodman, F., Martin, A., & Wohlford, R. (1989). Improved automatic language identification in noisy speech. In International conference on acoustics, speech, and signal processing (ICASSP) (Vol. 1, pp. 528–531).
Gupta, C. S., Prasanna, S. R. M., & Yegnanarayana, B. (2002). Autoassociative neural network models for online speaker verification using source features from vowels. In IEEE international joint conference on neural networks (IJCNNS).
Kim, J. H., & Hong, K. S. (2006). Speech and gesture recognition-based robust language processing interface in noise environment. Intelligent Data Engineering and Automated Learning, 4224, 338–345.
Google Scholar
Li, M., & Narayanan, S. (2014). Simplified supervised I-vector modeling with application to robust and efficient language identification and speaker verification. Computer, Speech Language, 28, 940–958.
Article Google Scholar
Lawson, A., McLaren, M., Lei, Y., Mitra, V., Scheffer, N., Ferrer, L., et al. (2013). Improving language identification robustness to highly channel degraded speech through multiple system fusion. In INTERSPEECH, Lyon, France, Aug. 2013.
Manchala, S., Prasad, V. K., & Janaki, V. (2014). GMM based language identification system using robust features. International Journal of Speech Technology, 17, 99–105.
Article Google Scholar
Makhoul, J. (1975). Linear prediction: A tutorial review. Proceedings on IEEE, 63, 561–580.
Article Google Scholar
Maity, S., Vuppala, A. K., Rao, K. S., & Nandi, D. (2012). IITKGP-MLILSC speech database for language identification. In National conference on communication (NCC).
Murty, K. S. R., & Yegnanarayana, B. (2006). Combining evidence from residual phase and MFCC features for speaker recognition. IEEE Signal Processing Letters, 13, 52–55.
Article Google Scholar
Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261.
Article Google Scholar
Rao, K. S., & Koolagudi, S. G. (2013). Characterization and recognition of emotions from speech using excitation source information. International Journal of Speech Technology, 16, 181–201.
Article Google Scholar
Reddy, V. R., Maity, S., & Rao, K. S. (2013). Identification of indian languages using multi-level spectral and prosodic features. International Journal of Speech Technology, 16(4), 489–511.
Article Google Scholar
Rao, K. S., Maity, S., & Reddy, V. R. (2013). Pitch synchronous and glottal closure based speech analysis for language recognition. International Journal of Speech Technology, 16(4), 413–430.
Article Google Scholar
Reynolds, D. A., & Rose, R. C. (1995). Robust text independent speaker identification using gaussian mixture speaker models. IEEE Transactions on Speech and Audio Process, 3, 4–17.
Article Google Scholar
Shahina, A., & Yegnanarayana, V. (2005). Language identification in noisy environments using throat microphone signals. In International conference on intelligent sensing and information processing.
Sreenivasa Rao, K., & Yegnanarayana, B. (2006). Prosody modification using instants of significant excitation. IEEE Transactions on Speech and Audio Processing, 14, 972–980.
Article Google Scholar
Seshadri, G. P., & Yegnanarayana, B. (2009). Perceived loudness of speech based on the characteristics of glottal excitation source. The Journal of the Acoustical Society of America, 126, 2061–2071.
Article Google Scholar
Sreenivasa Rao, K., Prasanna, S. R. M., & Yegnanarayana, B. (2007). Determination of instants of significant excitation in speech using hilbert envelope and group delay function. IEEE Signal Processing Letters, 14(10), 762–765.
Sreenivasa Rao, K., & Vuppala, A. K. (2013). Non-uniform time scale modification using instants of significant excitation and vowel onset points. Speech Communication, 55(6), 745–756.
Varga, A., & Steeneken, H. J. (1993). Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12, 247–251.
Article Google Scholar
Vuppala, A. K., & Sreenivasa Rao, K. (2013). Speaker identification under background noise using features extracted from steady vowel regions. International Journal of Adaptive Control and Signal Processing, 27(9), 781–792.
Vuppala, A. K., & Sreenivasa Rao, K. (2013). Vowel onset point detection for noisy speech using spectral energy at formant frequencies. International Journal of Speech Technology, 16(2), 229–235.
Vuppala, A. K., Sreenivasa Rao, K., & Chakrabarti, S. (2012). Improved vowel onset point detection using epoch intervals. International Journal of Electronics and Communications, 66(8), 697–700.
Yegnanarayana, B., Avendano, C., Hermansky, H., & Murthy, P. S. (1997). Processing linear prediction residual for speech enhancement. In European conference on speech communication and technology (pp. 1399–1402).
Yegnanarayana, B., Prasanna, S., Zachariah, J., & Gupta, C. (2005). Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification system. IEEE Transactions on Speech and Audio Processing, 13, 575–582.
Article Google Scholar
Yegnanarayana, B., Prasanna, S. R. M., & Sreenivasa Rao, K. (2002). Speech enhancement using excitation source information. In International conference on acoustics, speech, and signal processing (ICASSP).
Yegnanarayana, B., Swamy, R. K., & Murthy, K. S. R. (2009). Determining mixing parameters from multispeaker data using speech-specific information. IEEE Transactions on Audio, Speech, and Language Processing, 17(6), 1196–1207.
Article Google Scholar

Download references

Author information

Authors and Affiliations

School of Information Technology, IIT Kharagpur, Kharagpur, 721302, West Bengal, India
Dipanjan Nandi & K. Sreenivasa Rao
Department of Electronics and Communication Engineering, National Institute of Technology, Nagaland, Dimapur, 797103, Nagaland, India
Debadatta Pati

Authors

Dipanjan Nandi
View author publications
You can also search for this author in PubMed Google Scholar
Debadatta Pati
View author publications
You can also search for this author in PubMed Google Scholar
K. Sreenivasa Rao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to K. Sreenivasa Rao.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Nandi, D., Pati, D. & Rao, K.S. Implicit excitation source features for robust language identification. Int J Speech Technol 18, 459–477 (2015). https://doi.org/10.1007/s10772-015-9288-2

Download citation

Received: 26 September 2014
Accepted: 02 June 2015
Published: 17 June 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s10772-015-9288-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Implicit excitation source features for robust language identification

Abstract

Access this article

Similar content being viewed by others

Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise

Acoustic Feature Analysis and Discriminative Modeling for Language Identification of Closely Related South-Asian Languages

Feature Extraction Methods in Language Identification: A Survey

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Implicit excitation source features for robust language identification

Abstract

Access this article

Similar content being viewed by others

Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise

Acoustic Feature Analysis and Discriminative Modeling for Language Identification of Closely Related South-Asian Languages

Feature Extraction Methods in Language Identification: A Survey

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation