Abstract
Separating the singing voice from the music accompaniment in monaural recordings is useful in many applications, such as lyrics recognition and singer identification. Building on non-negative matrix partial co-factorization (NMPCF), we propose an improved algorithm that constrains the activation coefficients of the singing voice components to be temporally continuous and sparse in each frame. Temporal continuity is favored by a cost term equal to the sum of squared differences between the activation coefficients in adjacent frames, and sparsity is favored by penalizing nonzero values in each frame. We quantify the quality of the separated singing voice by the signal-to-noise ratio (SNR) gain and by the accuracy of singer identification. The experiments show that both the temporal continuity and the sparsity constraints improve singing voice separation, with the temporal continuity constraint contributing the most.
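The two penalty terms described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact sparsity function, any normalization, or the weights, so the L1-style sparsity penalty, the function names, and the weights `alpha` and `beta` are all assumptions made for illustration.

```python
def temporal_continuity_cost(activations):
    """Sum of squared differences between activations in adjacent frames.

    `activations` is a list of rows, one row per singing voice component;
    each row holds that component's activation coefficient for every frame.
    """
    return sum(
        (row[t] - row[t - 1]) ** 2
        for row in activations
        for t in range(1, len(row))
    )


def sparsity_cost(activations):
    """Assumed L1-style penalty: the sum of absolute activation values."""
    return sum(abs(value) for row in activations for value in row)


def penalized_cost(reconstruction_error, activations, alpha=1.0, beta=1.0):
    """Hypothetical total cost: data-fit error plus the two weighted penalties."""
    return (
        reconstruction_error
        + alpha * temporal_continuity_cost(activations)
        + beta * sparsity_cost(activations)
    )


# Two single-component activation tracks with the same total energy (L1 sum).
smooth = [[1.0, 1.0, 1.0, 1.0]]
jumpy = [[0.0, 2.0, 0.0, 2.0]]

smooth_tc = temporal_continuity_cost(smooth)
jumpy_tc = temporal_continuity_cost(jumpy)
```

In this toy case both tracks receive the same sparsity penalty, but only the track that jumps between frames is penalized by the temporal continuity term, which is the behavior the constraint is meant to encourage.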
Acknowledgments
This work is funded by the National Natural Science Foundation of China under Grants 61471311 and 61365005, and by the Scientific Research Programs of the Higher Education Institution of XinJiang under Grant XJEDU2014S006.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Hu, Y., Wang, L., Huang, H., Zhou, G. (2016). Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science, vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_4
DOI: https://doi.org/10.1007/978-3-319-42297-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42296-1
Online ISBN: 978-3-319-42297-8
eBook Packages: Computer Science, Computer Science (R0)