Speaker Discrimination Based on a Fusion Between Neural and Statistical Classifiers

  • Siham Ouamour
  • Halim Sayoud
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9680)


Speaker discrimination consists in checking whether two (or more) speech segments belong to the same speaker. In this framework, we propose a new approach for speaker discrimination based on the fusion of a neural network (NN) classifier and a statistical classifier. The fusion is performed in two ways: first, by combining the scores of the individual classifiers, weighted by confidence coefficients; second, by using the scores of the statistical classifier as an additional input to a Multi-Layer Perceptron (MLP), in order to optimize the NN training (hybrid model).

On the one hand, we observe that the fusion improves the results obtained by each approach alone; on the other hand, the fusion based on the sum of weighted scores from the individual classifiers appears better than the hybrid method. Experiments conducted on a subset of the Hub4 Broadcast News database demonstrate the efficiency of this fusion for speaker discrimination, with an Equal Error Rate (EER) of about 7 %, on speech segments of only 4 s.
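The weighted-score fusion described above can be sketched as a simple combination rule. This is an illustrative sketch, not the authors' implementation: the weight values, function names, and decision threshold are all hypothetical, and the paper's confidence coefficients would in practice be tuned on development data.

```python
# Illustrative sketch of score-level fusion between a neural and a
# statistical classifier. Weights and threshold are hypothetical.

def fuse_scores(nn_score: float, stat_score: float,
                w_nn: float = 0.6, w_stat: float = 0.4) -> float:
    """Weighted sum of the two classifiers' similarity scores,
    where the weights act as confidence coefficients."""
    return w_nn * nn_score + w_stat * stat_score


def same_speaker(nn_score: float, stat_score: float,
                 threshold: float = 0.5) -> bool:
    """Decide that two segments belong to the same speaker
    when the fused score exceeds the decision threshold."""
    return fuse_scores(nn_score, stat_score) > threshold
```

In the hybrid variant, by contrast, the statistical score would be appended to the MLP's input vector rather than combined after the fact.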


Keywords: Speaker discrimination · Fusion · Speech processing



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. USTHB University, Algiers, Algeria
