Speaker Identification Using Higher Order Spectral Phase Features and their Effectiveness vis–a–vis Mel-Cepstral Features
The effectiveness of higher-order spectral (HOS) phase features in speaker recognition is investigated by comparing them with Mel–Cepstral features on the same speech data. Unlike Mel–frequency cepstral coefficients (MFCCs), HOS phase features retain phase information from the Fourier spectrum. Gaussian mixture models are constructed separately from Mel–Cepstral features and from HOS features, using the same data from various speakers in the Switchboard telephone speech corpus. Feature clusters, model parameters and classification performance are analyzed. HOS phase features on their own provide a correct identification rate of about 97% on the chosen subset of the corpus, the same level of accuracy as that provided by MFCCs. Cluster plots and model parameters are compared to show that HOS phase features can provide complementary information to better discriminate between speakers.
Keywords: Gaussian Mixture Model; Speaker Recognition; Speaker Model; High Order Spectrum; Cepstral Feature
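The abstract describes identifying speakers with one Gaussian mixture model per speaker. A minimal sketch of the usual maximum-likelihood decision rule is given below for illustration only. It assumes diagonal covariances, already-trained models, and toy two-dimensional feature vectors. The function names, speaker labels and parameter values are hypothetical and are not taken from the paper, and feature extraction (MFCC or HOS phase) is not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm); gmm is a list of (weight, mean, variance) components."""
    terms = [
        math.log(w) + log_gaussian_diag(x, mean, var)
        for w, mean, var in gmm
    ]
    # Log-sum-exp over components for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_speaker(frames, speaker_models):
    """Pick the speaker whose GMM gives the highest mean frame log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical, hand-set models: two components per speaker, 2-D features.
speaker_models = {
    "speaker_a": [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (2.0, 2.0), (1.0, 1.0))],
    "speaker_b": [(0.5, (5.0, 5.0), (1.0, 1.0)), (0.5, (7.0, 7.0), (1.0, 1.0))],
}

# Toy test frames lying near speaker_a's components.
test_frames = [(0.1, -0.2), (1.9, 2.1), (0.3, 0.4)]
best, scores = identify_speaker(test_frames, speaker_models)
print(best)
```

In this sketch the same scoring routine would apply to either feature type, since only the feature vectors passed in as `frames` and the trained model parameters would differ.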