Robust Feature Extraction for Speaker Recognition Based on Constrained Nonnegative Tensor Factorization

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Extracting robust features is an important research topic in the machine learning community. In this paper, we investigate robust feature extraction for speech signals based on tensor structure and develop a new method called constrained Nonnegative Tensor Factorization (cNTF). A novel feature extraction framework based on the cortical representation in the primary auditory cortex (A1) is proposed for robust speaker recognition. Motivated by the neural firing rate model in A1, the speech signal is first represented as a general higher-order tensor. cNTF is then used to learn the basis functions from multiple interrelated feature subspaces and to find a robust sparse representation of the speech signal. Computer simulations evaluate the performance of our method, and comparisons with existing speaker recognition methods are provided. The experimental results demonstrate that the proposed method achieves higher recognition accuracy in noisy environments.
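To make the idea of a sparse nonnegative tensor factorization concrete, the following is a minimal sketch of a generic three-way nonnegative CP factorization with multiplicative updates. It is not the cNTF algorithm of the paper, whose constraints are not given in the abstract. The function name `ntf_sparse`, the L1 penalty weight `lam` on the first factor, and the synthetic tensor are all assumptions made for illustration.

```python
import numpy as np


def ntf_sparse(X, rank, lam=0.0, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative CP factorization of a 3-way tensor.

    Approximates X[i, j, k] by sum_r A[i, r] * B[j, r] * C[k, r] with all
    factors nonnegative. Multiplicative updates for the squared error are
    used; `lam` is an L1 penalty on A that stands in for a sparsity
    constraint (an assumption, not the paper's cNTF constraint).
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    A = rng.random((I, rank)) + eps
    B = rng.random((J, rank)) + eps
    C = rng.random((K, rank)) + eps
    x_norm = np.linalg.norm(X)

    def rel_err():
        # Relative reconstruction error of the current factors.
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        return np.linalg.norm(X - X_hat) / x_norm

    errs = [rel_err()]  # errs[0] is the error at initialization
    for _ in range(n_iter):
        # Each update multiplies a factor by a nonnegative ratio, so
        # nonnegativity is preserved when X itself is nonnegative.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (
            A @ ((B.T @ B) * (C.T @ C)) + lam + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (
            B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (
            C @ ((A.T @ A) * (B.T @ B)) + eps)
        errs.append(rel_err())
    return A, B, C, errs


# Usage on a synthetic nonnegative rank-3 tensor (hypothetical data).
demo_rng = np.random.default_rng(42)
X_demo = np.einsum("ir,jr,kr->ijk",
                   demo_rng.random((8, 3)),
                   demo_rng.random((6, 3)),
                   demo_rng.random((5, 3)))
A, B, C, errs = ntf_sparse(X_demo, rank=3, lam=0.01)
print(f"relative error: {errs[0]:.3f} -> {errs[-1]:.3f}")
```

In a speech setting the three modes might index, for example, frequency, a modulation-type axis, and time, but that mapping is only a guess at how a cortical representation could be arranged. The sketch also leaves the scale ambiguity between factors unhandled, so the L1 penalty on `A` is a loose stand-in for a proper sparsity constraint.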

Author information

Corresponding author

Correspondence to Li-Qing Zhang.

Additional information

The work was supported by the National Natural Science Foundation of China under Grant No. 60775007, the National Basic Research 973 Program of China under Grant No. 2005CB724301, and the Science and Technology Commission of Shanghai Municipality under Grant No. 08511501701.

About this article

Cite this article

Wu, Q., Zhang, LQ. & Shi, GC. Robust Feature Extraction for Speaker Recognition Based on Constrained Nonnegative Tensor Factorization. J. Comput. Sci. Technol. 25, 783–792 (2010). https://doi.org/10.1007/s11390-010-9365-6

  • DOI: https://doi.org/10.1007/s11390-010-9365-6
