Speaker Identification Using Higher Order Spectral Phase Features and their Effectiveness vis–a–vis Mel-Cepstral Features

  • Vinod Chandran
  • Daryl Ning
  • Sridha Sridharan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3072)


The effectiveness of higher-order spectral (HOS) phase features in speaker recognition is investigated by comparison with Mel-cepstral features on the same speech data. Unlike Mel-frequency cepstral coefficients (MFCCs), HOS phase features retain phase information from the Fourier spectrum. Gaussian mixture models are constructed from Mel-cepstral features and HOS features, respectively, for the same data from various speakers in the Switchboard telephone speech corpus. Feature clusters, model parameters and classification performance are analyzed. HOS phase features on their own provide a correct identification rate of about 97% on the chosen subset of the corpus, the same level of accuracy as provided by MFCCs. Cluster plots and model parameters are compared to show that HOS phase features can provide complementary information to better discriminate between speakers.
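The identification step summarized in the abstract, in which each speaker is represented by a Gaussian mixture model and a test utterance is assigned to the best-scoring model, can be illustrated with a minimal sketch. The sketch assumes diagonal covariances and already-trained model parameters; the function names `gmm_log_likelihood` and `identify_speaker` and the parameter layout are hypothetical, chosen for illustration, and are not taken from the paper. The feature vectors may be MFCCs or HOS phase features; the scoring is the same in either case.

```python
import numpy as np


def gmm_log_likelihood(X, weights, means, variances):
    """Average per-frame log-likelihood of X under a diagonal-covariance GMM.

    X:         (n_frames, dim) feature vectors
    weights:   (K,) mixture weights summing to 1
    means:     (K, dim) component means
    variances: (K, dim) per-dimension component variances
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Differences between every frame and every component mean: (n, K, dim)
    diff = X[:, None, :] - means[None, :, :]
    # Log of the Gaussian normalising constant for each component: (K,)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    # Log density of each frame under each component: (n, K)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2
    )
    log_weighted = np.log(weights)[None, :] + log_comp

    # Numerically stable log-sum-exp over the components
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())


def identify_speaker(X, models):
    """Return the name of the best-scoring model and all scores.

    models maps a speaker name to a (weights, means, variances) tuple.
    """
    scores = {
        name: gmm_log_likelihood(X, *params) for name, params in models.items()
    }
    return max(scores, key=scores.get), scores
```

In practice the model parameters would be estimated from training speech for each speaker, for example with the expectation-maximization algorithm; that step is omitted here.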


Gaussian Mixture Model; Speaker Recognition; Speaker Model; High Order Spectrum; Cepstral Feature
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Vinod Chandran (1)
  • Daryl Ning (1)
  • Sridha Sridharan (1)
  1. Queensland University of Technology, Brisbane, Australia
