Abstract
This research is part of a global speech-indexing project called ISDS and concerns two types of machine learning classifiers used by that project: Neural Networks (NN) and Support Vector Machines (SVM). The present paper, however, deals only with the problem of speaker discrimination, using a new reduced relative model of the speaker; our analysis is therefore restricted to this new relative speaker characteristic, used as the input feature of the learning machines (NN and SVM). Speaker discrimination consists in checking whether two speech signals belong to the same speaker or not, using features extracted directly from the speech itself. Our proposed feature is based on a relative characterization of the speaker, called the Relative Speaker Characteristic (RSC), and is well suited to NN and SVM training. RSC consists in modeling one speaker relative to another: each speaker model is determined from both its own speech signal and its dual speech. This investigation shows that the relative model, used as input to the classifier, optimizes the training by speeding up the learning and enhancing the discrimination accuracy of the classifier.
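To make the idea of a relative characterization concrete, the following is a minimal sketch (not the paper's exact formulation, which is not given in this abstract): each speech segment is summarized by simple frame statistics, and a segment is then described relative to its dual segment, so the resulting feature vector changes whenever the competing segment changes, while staying half the size of a plain concatenation of both segments' statistics.

```python
import numpy as np

def segment_stats(x):
    """Summarize a (n_frames, n_features) acoustic feature matrix
    (e.g. MFCC frames) by its per-dimension mean and standard deviation."""
    return np.concatenate([x.mean(axis=0), x.std(axis=0)])

def relative_feature(segment, dual_segment):
    """Hypothetical relative characteristic: the model of `segment` is
    expressed relative to its dual, so it is not a fixed absolute model.
    The vector has the size of one segment's statistics, i.e. half the
    size of concatenating both segments' statistics."""
    return segment_stats(segment) - segment_stats(dual_segment)
```

A discrimination classifier (MLP or SVM) would then be trained on such vectors, labeled "same speaker" or "different speakers"; a segment compared against itself yields the zero vector.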
Speaker discrimination experiments are conducted on two different databases, the Hub4 Broadcast-News database and a telephone speech database, using two learning machines, a Multi-Layer Perceptron (MLP) and a Support Vector Machine (SVM), with several input characteristics. A further comparative investigation applies two classical discriminative measures (the covariance-based mono-Gaussian distance and the Kullback-Leibler distance) to the same databases.
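As an illustration of the classical baseline measures mentioned above, the sketch below computes a symmetric Kullback-Leibler distance between two speech segments under a mono-Gaussian model (one multivariate Gaussian fitted per segment); the exact variant used in the paper may differ.

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL divergence KL(p || q) between two
    multivariate Gaussians p = N(mu_p, cov_p), q = N(mu_q, cov_q)."""
    d = mu_p.shape[0]
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(inv_q @ cov_p)
                  + diff @ inv_q @ diff
                  - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

def symmetric_kl(x, y):
    """Symmetric KL distance between mono-Gaussian models of two
    feature sets x, y of shape (n_frames, n_features)."""
    mu_x, cov_x = x.mean(axis=0), np.cov(x, rowvar=False)
    mu_y, cov_y = y.mean(axis=0), np.cov(y, rowvar=False)
    return (gaussian_kl(mu_x, cov_x, mu_y, cov_y)
            + gaussian_kl(mu_y, cov_y, mu_x, cov_x))
```

Two segments from the same speaker should yield a small distance, while segments from acoustically different speakers yield a larger one; a simple threshold on this distance gives a discrimination decision.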
The originality of this relativist approach is that the new characteristic gives the speaker a flexible model, since the model changes whenever the competing speaker model changes. Results show that the new input characteristic is of real interest for speaker discrimination. Furthermore, using the Relative Speaker Characteristic reduces both the size of the classifier input and the training time.
Ouamour, S., Sayoud, H. A pertinent learning machine input feature for speaker discrimination by voice. Int J Speech Technol 15, 181–190 (2012). https://doi.org/10.1007/s10772-012-9132-x