International Journal of Speech Technology, Volume 22, Issue 4, pp 1115–1122

Maximum entropy PLDA for robust speaker recognition under speech coding distortion

  • Ahmed Krobba (corresponding author)
  • Mohamed Debyeche
  • Sid. Ahmed Selouani

Abstract

The combination of i-vectors and probabilistic linear discriminant analysis (PLDA) has been applied with great success to speaker recognition. The i-vector space gives a low-dimensional representation of a speech segment and supplies the training data for a PLDA model, which offers greater robustness under different conditions. In this paper, we propose a new framework based on i-vector/PLDA and Maximum Entropy (ME) to improve the performance of a speaker identification system in the presence of speech coding distortion. Results are reported on the TIMIT database; the coded speech is obtained by passing the TIMIT test utterances through the AMR encoder/decoder. The results show that the proposed method performs better than both the i-vector/PLDA and MEGMM systems.
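To make the i-vector/PLDA baseline mentioned above more concrete, the following is a minimal sketch of closed-set speaker identification with a simplified Gaussian PLDA log-likelihood ratio. It is not the authors' implementation and does not include the maximum entropy component, whose details the abstract does not give. It assumes zero-mean vectors and diagonal between-speaker and within-speaker covariances (a simplification chosen for illustration). The names `plda_llr`, `identify`, `between_var`, and `within_var`, as well as the toy vectors, are hypothetical.

```python
import math


def plda_llr(x1, x2, between_var, within_var):
    """Log-likelihood ratio of 'same speaker' vs. 'different speakers'.

    Assumption (for illustration only): vectors are zero-mean and both
    covariances are diagonal, so each dimension contributes an
    independent term. Per dimension, with total variance t = b + w:
      same speaker:      (x1, x2) ~ N(0, [[t, b], [b, t]])
      different speaker: (x1, x2) ~ N(0, [[t, 0], [0, t]])
    The 2*pi constants cancel in the ratio.
    """
    llr = 0.0
    for a, c, b, w in zip(x1, x2, between_var, within_var):
        t = b + w
        det_same = t * t - b * b  # positive whenever w > 0
        quad_same = (t * a * a - 2.0 * b * a * c + t * c * c) / det_same
        quad_diff = (a * a + c * c) / t
        log_same = -0.5 * (math.log(det_same) + quad_same)
        log_diff = -0.5 * (math.log(t * t) + quad_diff)
        llr += log_same - log_diff
    return llr


def identify(test_vec, enrolled, between_var, within_var):
    """Score a test vector against every enrolled speaker; return the best."""
    scores = {
        spk: plda_llr(test_vec, vec, between_var, within_var)
        for spk, vec in enrolled.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage with made-up 2-dimensional "i-vectors".
enrolled = {"spk_a": [2.0, 1.5], "spk_b": [-2.0, -1.0]}
between_var = [4.0, 4.0]  # hypothetical speaker variability
within_var = [1.0, 1.0]   # hypothetical session/codec variability
best, scores = identify([1.8, 1.2], enrolled, between_var, within_var)
print(best, scores)
```

In this toy setting, a larger `within_var` (for example, to mimic extra variability introduced by a codec) pulls all scores toward zero, which is one way to see why coding distortion makes identification harder.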

Keywords

GMM-UBM; MEGMM; i-vector/PLDA; i-vector/MEPLDA; Speaker identification; Speech coding

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Ahmed Krobba (1), corresponding author
  • Mohamed Debyeche (1)
  • Sid. Ahmed Selouani (2)

  1. Speech Communication and Signal Processing Laboratory, Université des Sciences et de la Technologie Houari Boumediene (USTHB), Algiers, Algeria
  2. LARIHS Laboratory, Campus Shappaing, University of Moncton, Moncton, Canada
