
International Journal of Speech Technology, Volume 21, Issue 1, pp 51–63

Speaker discrimination based on fuzzy fusion and feature reduction techniques

  • S. Khennouf
  • H. Sayoud

Abstract

In this paper, we present a study of speaker discrimination based on multi-classifier fusion, with a focus on the effects of feature reduction. Speaker discrimination consists in automatically distinguishing between two speakers using the vocal characteristics of their speech. Features are extracted as Mel Frequency Spectral Coefficients and then reduced using the Relative Speaker Characteristic (RSC) together with Principal Component Analysis (PCA). Several classification methods are implemented to perform the discrimination task. Since different classifiers are employed, two decision-level fusion algorithms, referred to as Weighted Fusion and Fuzzy Fusion, are proposed to boost classification performance. These algorithms are based on weighting the outputs of the different classifiers. Furthermore, the effects of speaker gender and feature reduction on the speaker discrimination task are also examined. Our approaches were evaluated on a subset of the Hub-4 Broadcast-News corpus. The experimental results show that speaker discrimination accuracy improves by 5–15% with the RSC–PCA feature reduction. In addition, the proposed fusion methods yield an improvement of about 10% over the individual classifier scores. Finally, we observe that gender has a significant impact on discrimination performance.
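To make the pipeline described above more concrete, the following is a minimal, illustrative Python sketch of PCA-based feature reduction followed by decision-level weighted fusion of two classifiers. It is not the paper's implementation: the RSC reduction step is not shown, the classifiers (SVM and k-NN), the random toy features, and the fusion weights are all hypothetical stand-ins chosen for brevity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data: feature vectors standing in for MFSC features of two speakers (labels 0 and 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 24)),
               rng.normal(0.5, 1.0, (100, 24))])
y = np.array([0] * 100 + [1] * 100)

# Feature reduction with PCA (the paper combines this with RSC, which is omitted here).
reducer = PCA(n_components=10)
X_red = reducer.fit_transform(X)

# Two individual classifiers acting as stand-ins for the paper's classifier pool.
clf_a = make_pipeline(StandardScaler(), SVC(probability=True)).fit(X_red, y)
clf_b = KNeighborsClassifier(n_neighbors=5).fit(X_red, y)

# Decision-level weighted fusion: combine posterior scores with weights
# proportional to each classifier's (hypothetical) validation accuracy.
weights = np.array([0.9, 0.8])
weights = weights / weights.sum()

def fused_decision(x):
    """Return the class chosen by the weighted fusion of both classifiers' scores."""
    p_a = clf_a.predict_proba(x.reshape(1, -1))[0]
    p_b = clf_b.predict_proba(x.reshape(1, -1))[0]
    fused = weights[0] * p_a + weights[1] * p_b
    return int(np.argmax(fused))

print(fused_decision(X_red[0]))   # expected: 0
print(fused_decision(X_red[-1]))  # expected: 1
```

A fuzzy fusion scheme would replace the fixed weights with membership values derived from the reliability of each classifier's output; the sketch above only illustrates the simpler weighted-sum variant.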

Keywords

Artificial intelligence · Speaker discrimination · Speech processing · Feature reduction · Gender-based classification · Fusion


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. USTHB University, Algiers, Algeria
