Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria

Hu, Ying; Wang, Liejun; Huang, Hao; Zhou, Gang

doi:10.1007/978-3-319-42297-8_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9773)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Separating the singing voice from the music accompaniment in monaural recordings is useful in many applications, such as lyrics recognition and singer identification. Based on non-negative matrix partial co-factorization (NMPCF), we propose an improved algorithm that constrains the activation coefficients of the singing voice components to be temporally continuous and sparse in each frame. Temporal continuity is favored by a cost term equal to the sum of squared differences between the activation coefficients in adjacent frames, and sparsity is favored by penalizing nonzero values in each frame. We quantify the performance of the system on the separated singing voice by the signal-to-noise ratio (SNR) gain and the accuracy of singer identification. The experiments show that both the temporal continuity and the sparsity constraints improve singing voice separation, most notably the temporal continuity constraint.
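
The two penalty terms described in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation: the abstract does not give the exact normalization, weighting, or sparsity function, so the sketch assumes an activation matrix H of shape (components, frames), an unnormalized sum of squared adjacent-frame differences, and an L1 sum as the sparsity penalty. The function names are hypothetical.

```python
import numpy as np


def temporal_continuity_penalty(H):
    """Sum of squared differences between activations in adjacent frames.

    H has shape (n_components, n_frames); each row holds one component's
    activation coefficients over time. Unnormalized (an assumption).
    """
    diffs = np.diff(H, axis=1)  # H[:, t] - H[:, t - 1] for t = 1 .. T - 1
    return float(np.sum(diffs ** 2))


def sparsity_penalty(H):
    """L1 penalty on the activations (an assumed choice of sparsity term).

    For non-negative H this is simply the sum of all coefficients, so any
    nonzero value in any frame increases the cost.
    """
    return float(np.sum(np.abs(H)))


# A constant activation costs nothing under the continuity term,
# while an activation that switches on and off in every frame is penalized.
H_smooth = np.array([[1.0, 1.0, 1.0, 1.0]])
H_jumpy = np.array([[1.0, 0.0, 1.0, 0.0]])

print(temporal_continuity_penalty(H_smooth))  # 0.0
print(temporal_continuity_penalty(H_jumpy))   # 3.0
print(sparsity_penalty(H_jumpy))              # 2.0
```

In a full system, terms like these would be added, each with its own weight, to the NMPCF reconstruction cost and applied only to the activation coefficients of the singing voice components. The weights and the update rules are not stated in the abstract and are left out here.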

Acknowledgments

This work is funded by the National Natural Science Foundation of China under Grants 61471311 and 61365005, and the Scientific Research Programs of the Higher Education Institution of Xinjiang under Grant XJEDU2014S006.

Author information

Authors and Affiliations

The Institution of Information Science and Technology, Xinjiang University, Shengli Road, 14, 830001, Urumqi, China
Ying Hu, Liejun Wang, Hao Huang & Gang Zhou

Corresponding author

Correspondence to Ying Hu.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
Inha University, Incheon, Korea (Republic of)
Kyungsook Han
Liverpool John Moores University, Liverpool, United Kingdom
Abir Hussain

About this paper

Cite this paper

Hu, Y., Wang, L., Huang, H., Zhou, G. (2016). Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science, vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_4

DOI: https://doi.org/10.1007/978-3-319-42297-8_4
Published: 12 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42296-1
Online ISBN: 978-3-319-42297-8
eBook Packages: Computer Science, Computer Science (R0)
