Modelling Speaker Variability Using Covariance Learning

  • Moses EkpenyongEmail author
  • Imeh Umoren
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10841)


In this contribution, we investigate the relationship between speakers and speech utterance, and propose a speaker normalization/adaptation model that incorporates correlation amongst the utterance classes produced by male and female speakers of varying age categories (children: 0–15; youths: 16–30; adults: 31–50; seniors: \({>}50\)). Using Principal Component Analysis (PCA), a speaker space was constructed, and based on the speaker covariance matrix obtained directly from the speech data signals, a visualisation of the first three principal components (PCs) was achieved. For effective covariance learning, a component-wise normalisation of each vector weights of the covariance matrix was performed, and a machine learning algorithm (the SOM: self organising map) implemented to model selected speaker features (F0, intensity, pulse) variability. Results obtained reveal that, for the features selected, F0 gave the most variance, as both genders exhibited high variability. For male speakers, PC1 captured the most variance of 87%, while PC2 and PC3 captured the least variances of 7% and 3%, respectively. For female speakers, PC1 captured the most variance of 97%, while PC2 and PC3 captured the least variances of 2% and 1%, respectively. Further, intensity and pulse features show close similarity patterns between the speech features, and are not most relevant for speaker variability modelling. Component planes visualisation of the respective speech patterns learned from the features covariance revealed consistent patterns, and hence, useful in speaker recognition systems.


Component visualisation Machine learning PCA SOM Speech variability 



This research is funded by the Tertiary Education Trust Fund (TETFund), Nigeria. We also appreciate the students, staff and other participants who accepted to offer their voices for this experiment.


  1. 1.
    Kajarekar, S.S.: Analysis of variability in speech with applications to speech and speaker recognition. Ph.D. thesis, Oregon Health and Science University, Oregon (2002)Google Scholar
  2. 2.
    Chen, T., Huang, C., Chang, E., Wang, J.: On the use of Gaussian mixture model for speaker variability analysis. In: 17th International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 1–4 (2002)Google Scholar
  3. 3.
    Huang, C., Chen, T., Li, S., Chang, E., Zhou, J.: Analysis of speaker variability. In: 7th European Conference on Speech Communication and Technology, Scandinavia, pp. 1–4 (2001)Google Scholar
  4. 4.
    Kohonen, T.: MATLAB Implementations and Applications of the Self-organizing Map. Unigrafia Oy, Helsinki (2014)Google Scholar
  5. 5.
    Zehraoui, F., Bennani, Y.: M-SOM: matricial self organizing map for sequence clustering and classification. In: Proceedings of IEEE International Joint Conference on Neural Networks, Budapest, Hungary, vol. 1, pp. 763–768 (2004)Google Scholar
  6. 6.
    Le Cun, Y., Kanter, I., Solla, S.A.: Eigenvalues of covariance matrices: application to neural-network learning. Phys. Rev. Lett. 66(18), 2396 (1991)CrossRefGoogle Scholar
  7. 7.
    Park, S., Mun, S., Lee, Y., Ko, H.: Acoustic scene classification based on convolution neural network using double image features. In: Proceedings of Detection and Classification of Acoustic Scenes and Events Workshop, Munich, Germany, pp. 1–5 (2017)Google Scholar
  8. 8.
    Vesanto, J., Himberg, J., Alhoniemi, E., Parhankangas, J.: Self-organizing map in MATLAB: the SOM Toolbox. In: Proceedings of MATLAB DSP Conference, Espoo, Finland (1999)Google Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. 1.University of UyoUyoNigeria
  2. 2.Akwa Ibom State UniversityMkpat-EninNigeria

Personalised recommendations