
Speaker Verification with Adaptive Spectral Subband Centroids

  • Tomi Kinnunen
  • Bingjun Zhang
  • Jia Zhu
  • Ye Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4642)

Abstract

Spectral subband centroids (SSC) have been used as an additional feature to cepstral coefficients in speech and speaker recognition. SSCs are computed as the centroid frequencies of subbands, and they capture the dominant frequencies of the short-term spectrum. In the baseline SSC method, the subband filters are pre-specified. To allow better adaptation to formant movements and other dynamic phenomena, we propose to adapt the subband filter boundaries on a frame-by-frame basis using a globally optimal scalar quantization scheme. The method has only one control parameter, the number of subbands. Speaker verification results on the NIST 2001 task indicate that the choice of this parameter is not critical and that the method does not require additional feature normalization.
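The two ideas in the abstract can be sketched in code. The first function computes baseline SSCs as power-weighted mean frequencies over pre-specified subbands; the second illustrates the adaptive variant by choosing band boundaries per frame with a globally optimal scalar quantization of the frequency axis, here solved with a simple dynamic program. This is a minimal sketch under assumptions: the input is one frame's magnitude-squared spectrum, all function names are illustrative, and the O(K·N²) DP stands in for the faster matrix-searching quantizer design the paper builds on.

```python
import numpy as np

def subband_centroids(power, freqs, edges_hz):
    """Baseline SSCs: the power-weighted mean frequency inside each
    subband, for fixed band edges given in Hz."""
    centroids = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        w = power[mask].sum()
        # Fall back to the band midpoint if the band carries no energy.
        centroids.append(freqs[mask] @ power[mask] / w if w > 0 else 0.5 * (lo + hi))
    return np.array(centroids)

def adaptive_band_edges(power, freqs, num_bands):
    """Per-frame boundaries via globally optimal scalar quantization:
    partition the frequency bins into `num_bands` contiguous segments
    minimizing the total power-weighted squared deviation of bin
    frequencies from their segment centroid. Returns bin-index
    boundaries of length num_bands + 1."""
    n = len(freqs)
    # Prefix sums give O(1) evaluation of each segment's distortion.
    W = np.concatenate(([0.0], np.cumsum(power)))
    S = np.concatenate(([0.0], np.cumsum(power * freqs)))
    Q = np.concatenate(([0.0], np.cumsum(power * freqs ** 2)))

    def cost(a, b):  # distortion of bins a..b-1 around their centroid
        w = W[b] - W[a]
        if w <= 0.0:
            return 0.0
        s = S[b] - S[a]
        return (Q[b] - Q[a]) - s * s / w

    D = np.full((num_bands + 1, n + 1), np.inf)
    back = np.zeros((num_bands + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for k in range(1, num_bands + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = D[k - 1, i] + cost(i, j)
                if c < D[k, j]:
                    D[k, j], back[k, j] = c, i
    # Backtrack the optimal boundary positions.
    edges, j = [n], n
    for k in range(num_bands, 0, -1):
        j = back[k, j]
        edges.append(j)
    return edges[::-1]
```

With two well-separated spectral peaks and two bands, the DP places the single interior boundary between the peaks, so the resulting centroids track the peak locations; with fixed edges the same centroids follow only as long as each peak stays inside its pre-specified band, which is the motivation for adapting the boundaries frame by frame.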

Keywords

Equal Error Rate · Speaker Recognition · Scalar Quantizer · Speaker Verification · Noisy Speech


Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Tomi Kinnunen¹
  • Bingjun Zhang²
  • Jia Zhu²
  • Ye Wang²

  1. Speech and Dialogue Processing Lab, Institute for Infocomm Research (I2R), 21 Heng Mui Keng Terrace, Singapore 119613
  2. Department of Computer Science, School of Computing, National University of Singapore (NUS), 3 Science Drive 2, Singapore 117543
