A Study on Text-Independent Speaker Recognition Systems in Emotional Conditions Using Different Pattern Recognition Models

  • K. N. R. K. Raju Alluri
  • Sivanand Achanta
  • Rajendra Prasath
  • Suryakanth V. Gangashetty
  • Anil Kumar Vuppala
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10089)


The present study focuses on text-independent speaker recognition in emotional conditions. Both system and source features are considered to represent speaker-specific information. At the model level, Gaussian Mixture Models (GMMs), the Gaussian Mixture Model-Universal Background Model (GMM-UBM), and Deep Neural Networks (DNNs) are explored. The experiments are performed using three emotional databases: the German emotional speech database (EMO-DB), IITKGP-SESC: Hindi, and IITKGP-SESC: Telugu. The emotions considered are neutral, anger, happy, and sad. The results show that the performance of a speaker recognition system trained with clean speech degrades when it is tested with emotional data, irrespective of the feature or model used to build the system. The best results are obtained with score-level fusion of the system-feature-based and source-feature-based systems when speakers are modeled with DNNs.
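The score-level fusion mentioned above can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the min-max normalization, the weighted sum, the default weight of 0.5, and all function names and example scores here are assumptions made purely for illustration, not the authors' method.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize per-speaker scores to [0, 1] (assumed scheme)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All speakers scored the same: this system carries no information.
        return {spk: 0.0 for spk in scores}
    return {spk: (s - lo) / (hi - lo) for spk, s in scores.items()}


def fuse_scores(system_scores: Dict[str, float],
                source_scores: Dict[str, float],
                weight: float = 0.5) -> Dict[str, float]:
    """Weighted-sum fusion of two systems' scores for one test utterance.

    `weight` is the (hypothetical) share given to the system-feature scores;
    the source-feature scores receive 1 - weight.
    """
    if set(system_scores) != set(source_scores):
        raise ValueError("both systems must score the same speakers")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    sys_n = normalize(system_scores)
    src_n = normalize(source_scores)
    return {spk: weight * sys_n[spk] + (1.0 - weight) * src_n[spk]
            for spk in sys_n}


def identify(fused: Dict[str, float]) -> str:
    """Return the speaker with the highest fused score."""
    return max(fused, key=fused.get)


# Made-up scores (e.g. log-likelihoods) for three enrolled speakers.
system_scores = {"spk1": -42.0, "spk2": -40.0, "spk3": -45.0}
source_scores = {"spk1": -30.0, "spk2": -36.0, "spk3": -35.0}

fused = fuse_scores(system_scores, source_scores, weight=0.5)
print(identify(fused))
```

In this made-up example the system-feature scores alone would pick spk2, while the fused scores pick spk1, which shows how a second, complementary score stream can change the decision. In practice the weight would be tuned on held-out data.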


Speaker recognition, Emotion, System features, Source features, Gaussian Mixture Modeling, Universal Background Modeling, Deep Neural Networks



The first author would like to thank the Department of Electronics and Information Technology, Ministry of Communication & IT, Govt. of India, for granting a PhD Fellowship under the Visvesvaraya PhD Scheme.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • K. N. R. K. Raju Alluri (1)
  • Sivanand Achanta (1)
  • Rajendra Prasath (2)
  • Suryakanth V. Gangashetty (1)
  • Anil Kumar Vuppala (1)

  1. Speech and Vision Lab (LTRC), International Institute of Information Technology Hyderabad, Hyderabad, India
  2. NTNU, Trondheim, Norway
