Speaker Recognition Via Nonlinear Phonetic- and Speaker-Discriminative Features
Cite this paper as: Stoll L., Frankel J., Mirghafori N. (2007) Speaker Recognition Via Nonlinear Phonetic- and Speaker-Discriminative Features. In: Chetouani M., Hussain A., Gas B., Milgram M., Zarader JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg
We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, the output activations are used as features in a GMM speaker recognition system; in the latter, the hidden activations are used as features in an SVM system. Training the Speaker-MLP on a smaller set of speakers, chosen through clustering, yields system performance similar to that of a Speaker-MLP trained with many more speakers. On the NIST Speaker Recognition Evaluation 2004, both the Tandem/HATS-GMM and Speaker-SVM systems improve upon a basic GMM baseline, but neither contributes any gain in a score-level combination with a state-of-the-art GMM system. It may be that the normalization and channel compensation techniques applied to the current state-of-the-art GMM have reduced channel mismatch errors to the point that the contributions of the MLP systems are no longer additive.
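To make the distinction between the two feature types concrete, the following is a minimal sketch of a forward pass through a toy MLP that returns both the hidden activations (the kind of feature passed to the SVM in the Speaker-MLP case) and the output activations (the kind of feature passed to the GMM in the Tandem/HATS case). Everything specific here is an assumption for illustration and does not come from the paper: the single sigmoid hidden layer, the softmax output, the layer sizes, the random weights, and the function name `mlp_forward`.

```python
import math
import random


def mlp_forward(frame, w_hidden, b_hidden, w_out, b_out):
    """Run one cepstral frame through a toy one-hidden-layer MLP.

    Returns (hidden_activations, output_activations).
    Each row of w_hidden / w_out holds the incoming weights of one unit.
    """
    # Hidden layer: weighted sum followed by a sigmoid (assumed nonlinearity).
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        z = sum(x * w for x, w in zip(frame, weights)) + bias
        hidden.append(1.0 / (1.0 + math.exp(-z)))

    # Output layer: weighted sum followed by a softmax over the targets
    # (phones or speakers, depending on how the network was trained).
    logits = [
        sum(h * w for h, w in zip(hidden, weights)) + bias
        for weights, bias in zip(w_out, b_out)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    outputs = [e / total for e in exps]
    return hidden, outputs


# Toy dimensions and random weights, chosen only so the example runs.
random.seed(0)
n_cepstra, n_hidden, n_targets = 13, 8, 4
frame = [random.gauss(0.0, 1.0) for _ in range(n_cepstra)]
w_hidden = [[random.gauss(0.0, 0.5) for _ in range(n_cepstra)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[random.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_targets)]
b_out = [0.0] * n_targets

hidden_feats, output_feats = mlp_forward(frame, w_hidden, b_hidden, w_out, b_out)
```

In this sketch, `output_feats` stands in for a per-frame feature vector that a GMM system could model, while `hidden_feats` stands in for the hidden-layer representation that an SVM system could consume. How the actual systems post-process or aggregate these activations is not specified in the abstract.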