Improvement in monaural speech separation using sparse non-negative tucker decomposition
This paper introduces a monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD). In the proposed work, a sparsity regularization factor is incorporated into the generalized NTD cost function to account for its effect on the separation of the mixed signal. With the proposed algorithm, the vector components of both the target and the mixed signals can be exploited to separate any monaural mixture. Experiments were performed on monaural data generated by mixing speech from two speakers, and by mixing speech with noise, using the TIMIT and NOISEX-92 datasets. The separation results are compared with those of other existing algorithms in terms of the correlation between the separated and original signals, signal-to-distortion ratio (SDR), perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). To obtain more conclusive information about separation ability, speech recognition was also performed with the Kaldi toolkit, and the recognition results are compared in terms of word error rate (WER) using MFCC-based features. The results show that the average WER improvement of the proposed algorithm over the nearest-performing algorithm is up to 2.7% for two-speaker mixed speech and 1.52% for noisy speech input.
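The abstract does not state the update rules, so the following is only a minimal sketch of one common way to fit a sparse NTD, not the paper's algorithm. It assumes a squared-Euclidean (Frobenius) reconstruction cost, which may differ from the paper's generalized cost function, plus an L1 penalty `lam * sum(G)` on the core tensor as the sparsity term, and it fits a non-negative 3-way tensor with standard multiplicative updates. The names `sparse_ntd`, `reconstruct`, `model_entry`, `cost` and the parameter `lam` are hypothetical. Plain Python dictionaries are used so the example runs without third-party packages; it is only practical for tiny tensors.

```python
import itertools
import math
import random

EPS = 1e-12  # guards the multiplicative updates against division by zero


def model_entry(G, A, idx, skip=None):
    """Return sum_c G[c] * prod_m A[m][idx[m]][c[m]] for a 3-way Tucker model.

    With skip=(n, p), only core entries with c[n] == p are summed and the
    mode-n factor is left out, which gives d Xhat[idx] / d A[n][idx[n]][p].
    """
    total = 0.0
    for c, g in G.items():
        if skip is not None and c[skip[0]] != skip[1]:
            continue
        term = g
        for m in range(3):
            if skip is None or m != skip[0]:
                term *= A[m][idx[m]][c[m]]
        total += term
    return total


def reconstruct(G, A, dims):
    """Dense reconstruction Xhat = G x1 A1 x2 A2 x3 A3, stored as {index: value}."""
    return {idx: model_entry(G, A, idx) for idx in itertools.product(*map(range, dims))}


def cost(X, G, A, dims, lam):
    """Assumed objective: 0.5 * ||X - Xhat||_F^2 + lam * sum(G)."""
    Xhat = reconstruct(G, A, dims)
    return 0.5 * sum((X[k] - Xhat[k]) ** 2 for k in X) + lam * sum(G.values())


def sparse_ntd(X, dims, ranks, lam=0.1, iters=100, seed=0):
    """Multiplicative-update sparse NTD of a non-negative 3-way tensor X (a dict)."""
    rng = random.Random(seed)
    A = [[[rng.random() for _ in range(r)] for _ in range(d)] for d, r in zip(dims, ranks)]
    G = {c: rng.random() for c in itertools.product(*map(range, ranks))}
    for _ in range(iters):
        for n in range(3):
            # A_n <- A_n * <X, dXhat/dA_n> / <Xhat, dXhat/dA_n>; the derivative
            # does not involve A_n, so updating entries in place is safe.
            Xhat = reconstruct(G, A, dims)
            for i in range(dims[n]):
                for p in range(ranks[n]):
                    num = den = 0.0
                    for idx in X:
                        if idx[n] == i:
                            b = model_entry(G, A, idx, skip=(n, p))
                            num += X[idx] * b
                            den += Xhat[idx] * b
                    A[n][i][p] *= num / (den + EPS)
        # G <- G * (X x_n A_n^T) / (Xhat x_n A_n^T + lam); lam comes from the L1 penalty
        Xhat = reconstruct(G, A, dims)
        for c in G:
            w = {idx: math.prod(A[m][idx[m]][c[m]] for m in range(3)) for idx in X}
            num = sum(X[idx] * w[idx] for idx in X)
            den = sum(Xhat[idx] * w[idx] for idx in X) + lam
            G[c] *= num / (den + EPS)
    return G, A
```

Because the Tucker model is unchanged when a factor column is scaled up and the matching core slice is scaled down, practical implementations usually normalize the factor columns so the penalty on `G` cannot be sidestepped; that step is omitted here for brevity. How the spectrogram is arranged into a 3-way tensor and how the learned components are assigned to each source are not described in the abstract and are not shown.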
Keywords: Non-negative matrix factorization, Kaldi ASR toolkit, Non-negative Tucker decomposition, Sparse NTD
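The abstract reports recognition results as word error rate (WER). As a reminder of the standard definition, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit word alignment and N is the number of reference words. The minimal Python sketch below computes it with a word-level Levenshtein distance; the function name `wer` is illustrative, and this is not Kaldi's scoring code, which may also normalize the text before alignment.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("WER is undefined for an empty reference")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # vs. deletion, insertion
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6.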