Abstract
Extracting robust features is an important research topic in the machine learning community. In this paper, we investigate robust feature extraction for speech signals based on tensor structure and develop a new method called constrained Nonnegative Tensor Factorization (cNTF). A novel feature extraction framework based on the cortical representation in the primary auditory cortex (A1) is proposed for robust speaker recognition. Motivated by the neural firing rate model in A1, the speech signal is first represented as a general higher-order tensor. cNTF is then used to learn the basis functions from multiple interrelated feature subspaces and to find a robust sparse representation of the speech signal. Computer simulations evaluate the performance of our method, and comparisons with existing speaker recognition methods are provided. The experimental results demonstrate that the proposed method achieves higher recognition accuracy in noisy environments.
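To make the idea of learning nonnegative basis functions with a sparse representation concrete, the following is a minimal sketch of a generic nonnegative CP (PARAFAC) factorization of a third-order tensor using multiplicative updates, with an L1 penalty on the last factor. It is not the cNTF algorithm of the paper, whose constraints are not specified in the abstract. The function names (`khatri_rao`, `sparse_ntf`, `rel_error`), the choice of penalty, and the synthetic tensor standing in for a cortical representation are all assumptions made for illustration.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row (j, k) equals B[j] * C[k], k fastest."""
    rank = B.shape[1]
    return (B[:, None, :] * C[None, :, :]).reshape(-1, rank)

def sparse_ntf(X, rank, n_iter=200, sparsity=0.0, eps=1e-9, seed=0):
    """Hypothetical helper: nonnegative CP factorization of a 3-way tensor.

    Minimizes 0.5 * ||X - [[A, B, C]]||^2 + sparsity * sum(C) with
    multiplicative updates, which keep every factor nonnegative.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) + 0.1 for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            # Mode-n unfolding: rows index the current mode, columns the rest.
            Xn = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            numer = Xn @ khatri_rao(others[0], others[1])
            # Gram matrix of the Khatri-Rao product via a Hadamard product.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            denom = factors[mode] @ gram + eps
            if mode == 2:
                denom = denom + sparsity  # L1 penalty on the last factor only
            factors[mode] *= numer / denom
    return factors

def rel_error(X, factors):
    """Relative Frobenius reconstruction error of the CP model."""
    A, B, C = factors
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

if __name__ == "__main__":
    # Synthetic nonnegative tensor (assumption: stands in for real features).
    rng = np.random.default_rng(1)
    A, B, C = rng.random((6, 3)), rng.random((5, 3)), rng.random((4, 3))
    X = np.einsum('ir,jr,kr->ijk', A, B, C)
    factors = sparse_ntf(X, rank=3, n_iter=200, sparsity=0.01)
    print("relative error:", rel_error(X, factors))
```

In a feature extraction setting, the learned factor matrices would play the role of basis functions, and new data would be projected onto them to obtain features. Raising `sparsity` trades reconstruction accuracy for sparser coefficients in the penalized factor.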
Additional information
The work was supported by the National Natural Science Foundation of China under Grant No. 60775007, the National Basic Research 973 Program of China under Grant No. 2005CB724301, and the Science and Technology Commission of Shanghai Municipality under Grant No. 08511501701.
Cite this article
Wu, Q., Zhang, LQ. & Shi, GC. Robust Feature Extraction for Speaker Recognition Based on Constrained Nonnegative Tensor Factorization. J. Comput. Sci. Technol. 25, 783–792 (2010). https://doi.org/10.1007/s11390-010-9365-6
DOI: https://doi.org/10.1007/s11390-010-9365-6