Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria

  • Conference paper
Intelligent Computing Methodologies (ICIC 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9773)

Abstract

Separating the singing voice from the music accompaniment in monaural recordings is useful in many applications, such as lyrics recognition and singer identification. Building on non-negative matrix partial co-factorization (NMPCF), we propose an improved algorithm that constrains the activation coefficients of the singing voice components to be temporally continuous and sparse in each frame. Temporal continuity is favored by a cost term equal to the sum of squared differences between the activation coefficients in adjacent frames, and sparsity is favored by penalizing nonzero values in each frame. We quantify the quality of the separated singing voice by the signal-to-noise ratio (SNR) gain and the accuracy of singer identification. The experiments show that both constraints improve singing voice separation, with the temporal continuity constraint being especially effective.
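The two penalty terms described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the weights `alpha` and `beta`, the use of an L1 sum as the sparsity penalty, and the squared-error reconstruction term are all assumptions made for illustration, and the co-factorization structure of NMPCF is not shown. `H` stands for the activation matrix of the singing voice components, with one row per component and one column per frame.

```python
import numpy as np


def temporal_continuity_cost(H):
    """Sum of squared differences between activations in adjacent frames.

    H has shape (n_components, n_frames).
    """
    diffs = np.diff(H, axis=1)  # H[:, t+1] - H[:, t] for every frame pair
    return float(np.sum(diffs ** 2))


def sparsity_cost(H):
    """Assumed sparsity penalty: the L1 sum of the activations.

    The abstract only says that nonzero values are penalized in each
    frame; the L1 form used here is an illustrative choice.
    """
    return float(np.sum(np.abs(H)))


def penalized_cost(V, W, H, alpha=1.0, beta=1.0):
    """Assumed total cost: squared reconstruction error plus both penalties.

    V: magnitude spectrogram, shape (n_bins, n_frames)
    W: basis spectra, shape (n_bins, n_components)
    alpha, beta: hypothetical weights for continuity and sparsity.
    """
    reconstruction = float(np.sum((V - W @ H) ** 2))
    return (reconstruction
            + alpha * temporal_continuity_cost(H)
            + beta * sparsity_cost(H))
```

A smooth activation track yields a small continuity cost, while a track that jumps between frames is penalized quadratically in the size of each jump. That is the behavior the continuity term is meant to encourage.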

Acknowledgments

This work is funded by the National Natural Science Foundation of China under Grants 61471311 and 61365005, and by the Scientific Research Programs of the Higher Education Institution of Xinjiang under Grant XJEDU2014S006.

Author information

Corresponding author

Correspondence to Ying Hu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Hu, Y., Wang, L., Huang, H., Zhou, G. (2016). Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science, vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_4

  • DOI: https://doi.org/10.1007/978-3-319-42297-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42296-1

  • Online ISBN: 978-3-319-42297-8

  • eBook Packages: Computer Science, Computer Science (R0)
