Abstract
An interesting aspect of the Dravidian languages is the commonality they share through script, vocabulary, and a common root language. In this work, an attempt is made to classify four Dravidian languages using cepstral coefficients and prosodic features. Speech in these languages was recorded in various environments to build the database. It is demonstrated that while cepstral coefficients alone identify the language with a fair degree of accuracy, adding prosodic features to them improves language identification performance. Legendre polynomial fitting and principal component analysis (PCA) are applied to the feature vectors to reduce dimensionality, which also lowers the time complexity. In the experiments conducted, a language identification rate of around 87% is obtained using both cepstral coefficients and prosodic features, about 18% above the baseline system using Mel-frequency cepstral coefficients (MFCCs). The results indicate that temporal variations and prosody are important factors to consider in language identification tasks.
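The two dimensionality-reduction steps named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, contour length, and polynomial order are hypothetical, and PCA is computed here via an SVD of the mean-centred data.

```python
import numpy as np

def legendre_coeffs(contour, order=5):
    """Fit a Legendre polynomial to a prosodic contour (e.g. F0 over a
    segment) and return its coefficients as a fixed-length shape summary."""
    x = np.linspace(-1.0, 1.0, len(contour))
    return np.polynomial.legendre.legfit(x, contour, order)

def pca_reduce(features, n_components):
    """Project feature vectors (one per row) onto the top principal
    components, obtained from the SVD of the mean-centred data matrix."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

# Hypothetical data: 200 utterance-level feature vectors of dimension 39
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 39))
reduced = pca_reduce(feats, 10)
print(reduced.shape)  # (200, 10)
```

A contour of any length is thus mapped to `order + 1` Legendre coefficients, and the concatenated cepstral-plus-prosodic vectors are compressed by PCA before classification, which is what reduces the time complexity mentioned above.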
Notes
The terms 'pitch' and 'F0' are used interchangeably in this article.
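For illustration, a per-frame F0 value can be estimated with a simple autocorrelation method. This is a generic sketch, not the pitch tracker used in the article; the sampling rate, frame length, and pitch-lag bounds are assumptions.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=50.0, fmax=400.0):
    """Crude F0 estimate for one voiced frame: find the strongest
    autocorrelation peak inside the plausible pitch-lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

# Synthetic 120 Hz tone at 16 kHz: the estimate should land near 120 Hz
fs = 16000
t = np.arange(int(0.03 * fs)) / fs
f0 = estimate_f0(np.sin(2 * np.pi * 120.0 * t), fs)
print(round(f0, 1))
```

Real systems add voicing detection and contour smoothing on top of such a frame-level estimate, since autocorrelation alone is prone to octave errors on noisy speech.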
Cite this article
Koolagudi, S.G., Bharadwaj, A., Srinivasa Murthy, Y.V. et al. Dravidian language classification from speech signal using spectral and prosodic features. Int J Speech Technol 20, 1005–1016 (2017). https://doi.org/10.1007/s10772-017-9466-5