Cepstral Trajectories in Linguistic Units for Text-Independent Speaker Recognition

  • Javier Franco-Pedroso
  • Fernando Espinoza-Cuadros
  • Joaquin Gonzalez-Rodriguez
Part of the Communications in Computer and Information Science book series (CCIS, volume 328)


In this paper, the contributions of different linguistic units to the speaker recognition task are explored by means of the temporal trajectories of their MFCC features. Inspired by successful work in forensic speaker identification, we extend the approach based on temporal contours of formant frequencies in linguistic units to design a fully automatic system that bridges the forensic and automatic speaker recognition worlds. The combination of MFCC features and unit-dependent trajectories provides a powerful tool for extracting individualizing information. At a fine-grained level, we provide a calibrated likelihood ratio per linguistic unit under analysis (extremely useful in applications such as forensics), and at a coarse-grained level, we combine the individual contributions of the different units into a single, highly discriminative system. This approach has been tested on the NIST SRE 2006 datasets and protocols, comprising 9,720 trials from 219 male speakers for the 1side-1side English-only task, with development data drawn from 1,808 conversations of 367 male speakers in the NIST SRE 2004 and 2005 datasets.
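The pipeline the abstract describes can be sketched at a high level: each linguistic-unit segment yields a fixed-length parameterization of its MFCC temporal contours, and the per-unit calibrated log-likelihood ratios are then fused into a single score. The sketch below is illustrative only: all function names are hypothetical, the contour parameterization is assumed here to be a truncated DCT of each cepstral dimension across the segment, and the fusion is a simple weighted sum; the paper's actual parameterization and calibration/fusion backend may differ.

```python
import math

def dct_coeffs(traj, n_coeffs=3):
    """Approximate one temporal contour (a single MFCC dimension across
    a linguistic-unit segment) by its first few DCT-II coefficients.
    Coefficient 0 is the mean level; higher coefficients capture shape."""
    n = len(traj)
    return [
        sum(x * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
            for t, x in enumerate(traj)) / n
        for k in range(n_coeffs)
    ]

def unit_trajectory_features(mfcc_frames, n_coeffs=3):
    """Build a fixed-length feature vector for one unit segment by
    stacking contour coefficients over all cepstral dimensions.
    `mfcc_frames` is a list of frames, each a list of MFCC values."""
    dims = len(mfcc_frames[0])
    feats = []
    for d in range(dims):
        contour = [frame[d] for frame in mfcc_frames]
        feats.extend(dct_coeffs(contour, n_coeffs))
    return feats

def fuse_unit_llrs(unit_llrs, weights=None, offset=0.0):
    """Combine per-unit calibrated log-likelihood ratios into a single
    trial score via a linear fusion (equal weights by default)."""
    if weights is None:
        weights = {u: 1.0 / len(unit_llrs) for u in unit_llrs}
    return offset + sum(weights[u] * llr for u, llr in unit_llrs.items())
```

In this toy form, a constant contour reduces to its mean (all shape coefficients vanish), and two units with log-LRs of 1.0 and 3.0 fuse to 2.0 under equal weights; in practice the fusion weights and offset would be trained on development data, as is standard for calibrated linear fusion.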


Keywords: automatic speaker recognition · forensic speaker identification · temporal contours · linguistic units · cepstral trajectories




  1. Brummer, N., et al.: Application-independent evaluation of speaker detection. Computer Speech and Language 20, 230–275 (2006)
  2. Dehak, N., et al.: Front-End Factor Analysis for Speaker Verification. IEEE Trans. on Audio, Speech and Lang. Proc. 19(4), 788–798 (2011)
  3. de Castro, A., Ramos, D., Gonzalez-Rodriguez, J.: Forensic speaker recognition using traditional features comparing automatic and human-in-the-loop formant tracking. In: Proceedings of Interspeech 2009, pp. 2343–2346 (September 2009)
  4. Ferrer, L.: Statistical modeling of heterogeneous features for speech processing tasks. Ph.D. dissertation, Stanford Univ. (2009)
  5. Franco-Pedroso, J., Gonzalez-Rodriguez, J., Gonzalez-Dominguez, J., Ramos, D.: Fine-grained automatic speaker recognition using cepstral trajectories in phone units. In: Proceedings of IAFPA 2012, Santander, Spain (2012)
  6. Kajarekar, S., et al.: The SRI NIST 2008 Speaker Recognition Evaluation System. In: Proc. IEEE ICASSP 2009, Taipei, pp. 4205–4209 (2009)
  7. Kenny, P., et al.: A Study of Inter-speaker Variability in Speaker Verification. IEEE Trans. on Audio, Speech and Lang. Proc. 16(5), 980–988 (2008)
  8. Kenny, P.: Bayesian speaker verification with heavy tailed priors. Keynote Presentation at Odyssey 2010, Brno (2010)
  9. Kinnunen, T., Li, H.: An overview of text-independent speaker recognition: from features to supervectors. Speech Communication 52, 12–40 (2010)
  10.
  11. Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker verification using adapted Gaussian mixture models. Digital Signal Processing 10, 19–41 (2000)
  12. Shriberg, E.: Modeling prosodic feature sequences for speaker recognition. Speech Communication 46(3-4), 455–472 (2005)
  13. Wikipedia contributors: Arpabet. Wikipedia, The Free Encyclopedia (July 19, 2012)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Javier Franco-Pedroso (1)
  • Fernando Espinoza-Cuadros (1)
  • Joaquin Gonzalez-Rodriguez (1)

  1. ATVS – Biometric Recognition Group, Universidad Autonoma de Madrid, Spain
