Speaker Recognition Via Nonlinear Phonetic- and Speaker-Discriminative Features

  • Lara Stoll
  • Joe Frankel
  • Nikki Mirghafori
Conference paper

DOI: 10.1007/978-3-540-77347-4_8

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4885)
Cite this paper as:
Stoll L., Frankel J., Mirghafori N. (2007) Speaker Recognition Via Nonlinear Phonetic- and Speaker-Discriminative Features. In: Chetouani M., Hussain A., Gas B., Milgram M., Zarader JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg

Abstract

We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, output activations are used as features in a GMM speaker recognition system, while for the latter, hidden activations are used as features in an SVM system. Using a smaller set of MLP training speakers, chosen through clustering, yields system performance similar to that of a Speaker-MLP trained with many more speakers. For the NIST Speaker Recognition Evaluation 2004, both Tandem/HATS-GMM and Speaker-SVM systems improve upon a basic GMM baseline, but are unable to contribute in a score-level combination with a state-of-the-art GMM system. It may be that the application of normalizations and channel compensation techniques to the current state-of-the-art GMM has reduced channel mismatch errors to the point that contributions of the MLP systems are no longer additive.
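As a rough illustration of the two feature types described in the abstract, the sketch below passes one cepstral frame through a one-hidden-layer MLP and returns both the hidden and the output activations. It is a minimal sketch under stated assumptions, not the authors' implementation: the layer sizes, the sigmoid and softmax nonlinearities, the random untrained weights, and the names `make_mlp` and `forward` are all hypothetical, and the downstream GMM and SVM systems are not shown.

```python
import math
import random


def make_mlp(n_in, n_hidden, n_out, seed=0):
    """Build a one-hidden-layer MLP with random weights (untrained, illustration only)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return {"w1": w1, "b1": [0.0] * n_hidden, "w2": w2, "b2": [0.0] * n_out}


def forward(mlp, frame):
    """Return (hidden activations, output activations) for one input frame."""
    # Hidden layer: affine transform followed by a sigmoid (assumed nonlinearity).
    hidden = []
    for weights, bias in zip(mlp["w1"], mlp["b1"]):
        z = sum(w * x for w, x in zip(weights, frame)) + bias
        hidden.append(1.0 / (1.0 + math.exp(-z)))

    # Output layer: affine transform followed by a softmax over the targets
    # (phones or speakers, depending on how the MLP was trained).
    logits = [
        sum(w * h for w, h in zip(weights, hidden)) + bias
        for weights, bias in zip(mlp["w2"], mlp["b2"])
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    outputs = [e / total for e in exps]
    return hidden, outputs


# Toy dimensions (assumptions): 4 cepstral coefficients, 3 hidden units, 5 targets.
mlp = make_mlp(n_in=4, n_hidden=3, n_out=5)
frame = [0.2, -1.0, 0.5, 0.3]
hidden, outputs = forward(mlp, frame)

# With phone targets, the output activations would serve as features;
# with speaker targets, the hidden activations would serve as features.
phone_style_features = outputs
speaker_style_features = hidden
```

In this sketch the same forward pass yields both candidate feature vectors; which one is kept depends only on whether the MLP was trained against phone or speaker targets.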

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Lara Stoll (1, 2)
  • Joe Frankel (1, 3)
  • Nikki Mirghafori (1)
  1. International Computer Science Institute, Berkeley, CA, USA
  2. University of California at Berkeley, CA, USA
  3. Centre for Speech Technology Research, Edinburgh, UK
