Abstract
This research is part of a global speech-indexing project called ISDS and concerns two types of machine learning classifiers used by that project: Neural Networks (NN) and Support Vector Machines (SVM). The present paper, however, deals only with the problem of speaker discrimination, using a new reduced relative model of the speaker; our analysis is therefore restricted to this new relative speaker characteristic, used as the input feature of the learning machines (NN and SVM). Speaker discrimination consists in checking whether two speech signals belong to the same speaker or not, using features extracted directly from the speech itself. Our proposed feature is based on a relative characterization of the speaker, called the Relative Speaker Characteristic (RSC), and is well suited to NN and SVM training. RSC consists in modeling one speaker relative to another: each speaker model is determined from both its own speech signal and its dual speech. This investigation shows that the relative model, used as input to the classifier, optimizes the training by speeding up the learning and enhancing the discrimination accuracy of the classifier.
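To make the idea of a relative characterization concrete, the following is a minimal sketch (not the paper's exact formulation, which is not given in this abstract): each speech segment is summarized by simple frame statistics, and a segment is then described relative to its dual segment, so the resulting feature vector changes whenever the competing segment changes, while staying half the size of a plain concatenation of both segments' statistics.

```python
import numpy as np

def segment_stats(x):
    """Summarize a (n_frames, n_features) acoustic feature matrix
    (e.g. MFCC frames) by its per-dimension mean and standard deviation."""
    return np.concatenate([x.mean(axis=0), x.std(axis=0)])

def relative_feature(segment, dual_segment):
    """Hypothetical relative characteristic: the model of `segment` is
    expressed relative to its dual, so it is not a fixed absolute model.
    The vector has the size of one segment's statistics, i.e. half the
    size of concatenating both segments' statistics."""
    return segment_stats(segment) - segment_stats(dual_segment)
```

A discrimination classifier (MLP or SVM) would then be trained on such vectors, labeled "same speaker" or "different speakers"; a segment compared against itself yields the zero vector.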
Speaker discrimination experiments are conducted on two different databases, the Hub4 Broadcast-News database and a telephone speech database, using two learning machines, a Multi-Layer Perceptron (MLP) and a Support Vector Machine (SVM), with several input characteristics. A further comparative investigation applies two classical discriminative measures (the covariance-based mono-Gaussian distance and the Kullback-Leibler distance) to the same databases.
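As an illustration of the classical baseline measures mentioned above, the sketch below computes a symmetric Kullback-Leibler distance between two speech segments under a mono-Gaussian model (one multivariate Gaussian fitted per segment); the exact variant used in the paper may differ.

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL divergence KL(p || q) between two
    multivariate Gaussians p = N(mu_p, cov_p), q = N(mu_q, cov_q)."""
    d = mu_p.shape[0]
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(inv_q @ cov_p)
                  + diff @ inv_q @ diff
                  - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

def symmetric_kl(x, y):
    """Symmetric KL distance between mono-Gaussian models of two
    feature sets x, y of shape (n_frames, n_features)."""
    mu_x, cov_x = x.mean(axis=0), np.cov(x, rowvar=False)
    mu_y, cov_y = y.mean(axis=0), np.cov(y, rowvar=False)
    return (gaussian_kl(mu_x, cov_x, mu_y, cov_y)
            + gaussian_kl(mu_y, cov_y, mu_x, cov_x))
```

Two segments from the same speaker should yield a small distance, while segments from acoustically different speakers yield a larger one; a simple threshold on this distance gives a discrimination decision.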
The originality of this relativist approach is that the new characteristic gives the speaker a flexible model, since the model changes whenever the competing speaker model changes. Results show that the new input characteristic is of real interest for speaker discrimination. Furthermore, using the Relative Speaker Characteristic reduces both the size of the classifier input and the training time.
Ouamour, S., Sayoud, H. A pertinent learning machine input feature for speaker discrimination by voice. Int J Speech Technol 15, 181–190 (2012). https://doi.org/10.1007/s10772-012-9132-x