Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise

Polasi, Phani Kumar; Sri Rama Krishna, Kalva

doi:10.1007/s10772-015-9326-0


Published: 28 November 2015

Volume 19, pages 75–85 (2016)

International Journal of Speech Technology


Abstract

Language identification has gained significant importance in recent years, in both research and the commercial marketplace, creating demand for machines that can better distinguish between languages. Although methods such as Gaussian mixture models, hidden Markov models and neural networks have been used for language identification, the problem of identifying languages in noisy environments has not been adequately addressed so far. This paper examines the performance of an automatic language identification system in noisy environments. A comparative performance analysis of speech enhancement techniques, namely minimum mean square error (MMSE) estimation, spectral subtraction and temporal processing, is presented for different types of noise at different SNRs. Although these individual enhancement techniques may not perform well across all noise types and SNRs, it is proposed to combine the evidences of all these techniques to improve the overall performance of the system significantly. The language identification studies are performed on the IITKGP-MLILSC (IIT Kharagpur-Multilingual Indian Language Speech Corpus) database, which consists of 27 languages.
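The abstract does not state the rule used to combine the evidences. One common option is score-level fusion: each enhancement front end (MMSE, spectral subtraction, temporal processing) drives its own identification back end, and the per-language scores are normalized and summed. The Python sketch below assumes this setup. The names `combine_evidences` and `identify`, the softmax normalization, the equal default weights and the example scores are all hypothetical illustrations, not the authors' method.

```python
"""Score-level fusion sketch for combining evidences from several LID systems.

Assumption (not from the paper): each enhancement-specific system returns one
log-likelihood per candidate language for a test utterance. Scores are turned
into posterior-like values with a softmax and combined by a weighted sum.
"""
import math
from typing import Dict, List, Optional


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert per-language log-likelihoods into values that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {lang: math.exp(s - peak) for lang, s in scores.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}


def combine_evidences(
    system_scores: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> Dict[str, float]:
    """Weighted sum of softmax-normalized scores from each system."""
    if weights is None:
        # Hypothetical default: trust every system equally.
        weights = [1.0 / len(system_scores)] * len(system_scores)
    if len(weights) != len(system_scores):
        raise ValueError("need one weight per system")
    fused = {lang: 0.0 for lang in system_scores[0]}
    for w, scores in zip(weights, system_scores):
        for lang, p in softmax(scores).items():
            fused[lang] += w * p
    return fused


def identify(
    system_scores: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> str:
    """Return the language with the highest combined evidence."""
    fused = combine_evidences(system_scores, weights)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Made-up log-likelihoods for one noisy utterance from three systems.
    mmse = {"Telugu": -41.2, "Tamil": -40.9, "Kannada": -43.0}
    spectral_sub = {"Telugu": -39.5, "Tamil": -42.1, "Kannada": -42.8}
    temporal = {"Telugu": -40.1, "Tamil": -41.0, "Kannada": -40.7}
    print(identify([mmse, spectral_sub, temporal]))
```

With these made-up scores, the MMSE-based system alone would pick Tamil, while the combined evidence favors Telugu: a system that is weak on one utterance can be outvoted by the others. In practice the weights would presumably be tuned on held-out data for each noise type and SNR, which is again an assumption rather than something the abstract states.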


References

Ambikairajah, E., et al. (2011). Language identification: A tutorial. IEEE Circuits and Systems Magazine, 11(2), 82–108.
Benesty, J., Sondhi, M. M., & Huang, Y. (Eds.). (2008). Springer handbook of speech processing. Berlin: Springer.
Boll, S. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2), 113–120.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1–38.
Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech and Signal Processing, 33(2), 443–445.
Foil, J. (1986). Language identification using noisy speech. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '86) (Vol. 11). IEEE.
Goodman, F. J., Martin, A. F., & Wohlford, R. (1989). Improved automatic language identification in noisy speech. In 1989 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-89). IEEE.
Hegde, R. M., & Murthy, H. A. (2005). Automatic language identification and discrimination using the modified group delay feature. In Proceedings of the 2005 International Conference on Intelligent Sensing and Information Processing. IEEE.
Jothilakshmi, S., Ramalingam, V., & Palanivel, S. (2012). A hierarchical language identification system for Indian languages. Digital Signal Processing, 22(3), 544–553.
Krishnamoorthy, P., & Prasanna, S. R. M. (2009). Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments. Sadhana, 34(5), 729–754.
Lander, T., Cole, R., Oshika, B., & Noel, M. (1995). The OGI 22 language telephone speech corpus. In Eurospeech (pp. 1894–1903).
Lawson, A., McLaren, M., Lei, Y., Mitra, V., Scheffer, N., Ferrer, L., & Graciarena, M. (2013). Improving language identification robustness to highly channel-degraded speech through multiple system fusion. In INTERSPEECH (pp. 1507–1510). Lyon.
Maity, S., et al. (2012). IITKGP-MLILSC speech database for language identification. In 2012 National Conference on Communications (NCC). IEEE.
Mary, L., & Yegnanarayana, B. (2008). Extraction and representation of prosodic features for language and speaker recognition. Speech Communication, 50(10), 782–796.
Nakagawa, S., Ueda, Y., & Seino, T. (1992). Speaker-independent, text-independent language identification by HMM. In Proceedings of ICSLP 92.
Rao, K. S., Maity, S., & Reddy, V. R. (2013). Pitch synchronous and glottal closure based speech analysis for language recognition. International Journal of Speech Technology, 16(4), 413–430.
Reddy, V. R., Maity, S., & Rao, K. S. (2013). Identification of Indian languages using multi-level spectral and prosodic features. International Journal of Speech Technology, 16(4), 489–511.
Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 72–83.
Vuppala, A. K., Rao, K. S., Chakrabarti, S., Krishnamoorthy, P., & Prasanna, S. R. M. (2011). Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing. International Journal of Speech Technology, 14(3), 259–272.
Vuppala, A. K., & Sreenivasa Rao, K. (2013). Vowel onset point detection for noisy speech using spectral energy at formant frequencies. International Journal of Speech Technology, 16(2), 229–235.
Zissman, M. A. (1996). Comparison of four approaches to automatic language identification of telephone speech. IEEE Transactions on Speech and Audio Processing, 4(1), 31–44.


Acknowledgments

The authors are grateful to Dr K Sreenivasa Rao, Associate Professor, and his team at the School of Information Technology (SIT), IIT Kharagpur, for providing the IITKGP-MLILSC (IIT Kharagpur-Multilingual Indian Language Speech Corpus) database, which consists of 27 languages. We would also like to thank them for their suggestions and helpful discussions.

Author information

Authors and Affiliations

ECE Department, V R Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
Phani Kumar Polasi & Kalva Sri Rama Krishna


Corresponding author

Correspondence to Phani Kumar Polasi.


About this article

Cite this article

Polasi, P.K., Sri Rama Krishna, K. Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise. Int J Speech Technol 19, 75–85 (2016). https://doi.org/10.1007/s10772-015-9326-0


Received: 31 July 2015
Accepted: 21 November 2015
Published: 28 November 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10772-015-9326-0
