
A pertinent learning machine input feature for speaker discrimination by voice

  • Published in: International Journal of Speech Technology

Abstract

This research work is part of a global speech-indexing project called ISDS and concerns two types of machine-learning classifiers used by that project: Neural Networks (NN) and Support Vector Machines (SVM). The present paper, however, addresses only the problem of speaker discrimination, using a new, reduced, relative speaker model; our analysis is therefore restricted to this new relative speaker characteristic used as the input feature of the learning machines (NN and SVM). Speaker discrimination consists in deciding whether or not two speech signals belong to the same speaker, using features extracted directly from the speech itself. Our proposed feature is based on a relative characterization of the speaker, called the Relative Speaker Characteristic (RSC), and is well suited to NN and SVM training. The RSC models one speaker relative to another: each speaker model is determined from both the speaker's own speech signal and its dual speech signal. This investigation shows that the relative model, used as the classifier input, optimizes the training by speeding up the learning and improving the discrimination accuracy of the classifier.
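The abstract does not give the exact construction of the RSC, but the underlying idea, modeling one speaker relative to a competing one from both speech signals, can be sketched as follows. The specific statistics used here (per-dimension means and variances of acoustic feature frames) and the function name are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def relative_speaker_characteristic(feats_a, feats_b):
    """Illustrative sketch (not the paper's exact definition): characterize
    speaker A *relative to* a competing speaker B by comparing first- and
    second-order statistics of their acoustic feature frames.

    feats_a, feats_b: (n_frames, n_dims) arrays, e.g. MFCC vectors.
    Returns a fixed-size vector of length 2 * n_dims.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    var_a = feats_a.var(axis=0) + 1e-8   # small floor for numerical safety
    var_b = feats_b.var(axis=0) + 1e-8
    # Mean offset of A normalized by B's spread, plus a log-variance ratio:
    # the model of A changes whenever the competing speaker B changes.
    return np.concatenate([(mu_a - mu_b) / np.sqrt(var_b),
                           np.log(var_a / var_b)])
```

Note that the vector length depends only on the feature dimension, not on utterance length, which is consistent with the abstract's claim that the relative model reduces the classifier input size; the vector is identically zero when a segment is compared with itself.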

Speaker-discrimination experiments are conducted on two different databases, the Hub4 Broadcast-News database and a telephone-speech database, using two learning machines, a Multi-Layer Perceptron (MLP) and a Support Vector Machine (SVM), with several input characteristics. A further comparative investigation uses two classical discriminative measures, the covariance-based mono-Gaussian distance and the Kullback-Leibler distance, on the same databases.
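For the second comparative measure, a Kullback-Leibler distance between two speech segments is conventionally computed by fitting a single Gaussian to each segment's feature frames and symmetrizing the closed-form KL divergence between the two Gaussians. The sketch below shows this standard construction under the assumption of full-covariance mono-Gaussian models; the paper's exact variant may differ.

```python
import numpy as np

def gaussian_kl_distance(feats_a, feats_b):
    """Symmetrized Kullback-Leibler distance between two speech segments,
    each modeled by a single full-covariance Gaussian fitted to its
    (n_frames, n_dims) feature frames. A standard construction, given
    here as an analogue of the measures compared in the paper."""
    d = feats_a.shape[1]
    mu = [feats_a.mean(axis=0), feats_b.mean(axis=0)]
    # A small ridge keeps the covariances invertible for short segments.
    cov = [np.cov(f, rowvar=False) + 1e-6 * np.eye(d)
           for f in (feats_a, feats_b)]

    def kl(m0, c0, m1, c1):
        # Closed-form KL( N(m0, c0) || N(m1, c1) ).
        inv1 = np.linalg.inv(c1)
        diff = m1 - m0
        _, logdet0 = np.linalg.slogdet(c0)
        _, logdet1 = np.linalg.slogdet(c1)
        return 0.5 * (np.trace(inv1 @ c0) + diff @ inv1 @ diff
                      - d + logdet1 - logdet0)

    return kl(mu[0], cov[0], mu[1], cov[1]) + kl(mu[1], cov[1], mu[0], cov[0])
```

A discrimination decision then thresholds this distance: small values suggest that the two segments come from the same speaker.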

The originality of this relativistic approach is that the new characteristic gives the speaker a flexible model, since the model changes whenever the competing speaker changes. Results show that the new input characteristic is effective for speaker discrimination. Furthermore, the Relative Speaker Characteristic reduces both the size of the classifier input and the training time.

Fig. 1, Fig. 2, Fig. 3 (not shown in this preview)



Author information

Corresponding author

Correspondence to H. Sayoud.


Cite this article

Ouamour, S., Sayoud, H. A pertinent learning machine input feature for speaker discrimination by voice. Int J Speech Technol 15, 181–190 (2012). https://doi.org/10.1007/s10772-012-9132-x

