Multimedia Tools and Applications, Volume 72, Issue 1, pp 925–949

Monophonic constrained non-negative sparse coding using instrument models for audio separation and transcription of monophonic source-based polyphonic mixtures

  • Francisco José Rodríguez-Serrano
  • Julio José Carabias-Orti
  • Pedro Vera-Candeas
  • Francisco Jesús Cañadas-Quesada
  • Nicolás Ruiz-Reyes


In this paper we propose a monophonic constrained signal decomposition model for polyphonic signals composed of several monophonic sources played by different musical instruments. The model imposes two constraints. The harmonic constraint is particularly effective for tonal instruments because each note is associated with a unique basis. The monophonic constraint is implemented by allowing only one non-zero gain per instrument in each frame of the factorization. The proposed method uses instrument models trained beforehand in a supervised procedure. Both constraints (harmonic and monophonic) are implemented in a deterministic manner. The method has been tested on two audio signal applications: sound source separation and automatic music transcription. Compared with other state-of-the-art methods on a dataset of polyphonic mixtures composed of monophonic sources, it has produced competitive and promising results.
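The monophonic constraint can be illustrated with a small sketch. It assumes that each instrument's trained model is a fixed set of non-negative note spectra, that the fit is measured with the squared Euclidean distance, and that the one-note-per-instrument rule is enforced frame by frame with block coordinate descent over instruments. None of these choices is taken from the paper (its cost function and search procedure are not stated in this abstract), and the function names `monophonic_frame_gains`, `transcribe` and `separate_frame` are hypothetical. The Wiener-style soft mask used for separation is likewise an assumed, commonly used choice.

```python
from typing import List, Optional, Tuple

Spectrum = List[float]                    # magnitude values, one per frequency bin
NoteBases = List[Spectrum]                # one non-negative spectrum per note
Assignment = Tuple[Optional[int], float]  # (active note index or None, gain)


def dot(a: Spectrum, b: Spectrum) -> float:
    return sum(u * v for u, v in zip(a, b))


def monophonic_frame_gains(
    x: Spectrum, bases: List[NoteBases], sweeps: int = 5
) -> List[Assignment]:
    """Fit one frame x with at most one active note per instrument.

    bases[j][n] is the (hypothetical) trained spectrum of note n of
    instrument j. Each instrument is updated in turn while the others are
    held fixed (block coordinate descent on the squared Euclidean error).
    """
    active: List[Assignment] = [(None, 0.0)] * len(bases)
    for _ in range(sweeps):
        for j, notes in enumerate(bases):
            # Residual: x minus the current contribution of all other instruments.
            r = list(x)
            for k, (n_k, g_k) in enumerate(active):
                if k != j and n_k is not None:
                    r = [rf - g_k * bf for rf, bf in zip(r, bases[k][n_k])]
            best: Assignment = (None, 0.0)  # silence leaves the error unchanged
            best_reduction = 0.0
            for n, b in enumerate(notes):
                bb = dot(b, b)
                if bb == 0.0:
                    continue
                br = dot(b, r)
                g = max(0.0, br / bb)  # non-negative least-squares gain
                # ||r||^2 - ||r - g*b||^2 = 2*g*(b.r) - g^2*(b.b)
                reduction = 2.0 * g * br - g * g * bb
                if reduction > best_reduction:
                    best, best_reduction = (n, g), reduction
            active[j] = best
    return active


def transcribe(frames: List[Spectrum], bases: List[NoteBases]) -> List[List[Optional[int]]]:
    """Per frame, the active note index of each instrument (None = silent)."""
    return [[n for n, _ in monophonic_frame_gains(x, bases)] for x in frames]


def separate_frame(
    x: Spectrum, bases: List[NoteBases], active: List[Assignment], eps: float = 1e-12
) -> List[Spectrum]:
    """Split x among instruments with a Wiener-style soft mask (assumed)."""
    models = [
        [g * v for v in bases[j][n]] if n is not None else [0.0] * len(x)
        for j, (n, g) in enumerate(active)
    ]
    total = [sum(col) + eps for col in zip(*models)]
    return [[xf * mf / tf for xf, mf, tf in zip(x, m, total)] for m in models]
```

For instance, with two toy instruments whose note spectra occupy separate bins, a frame mixing note 1 of the first instrument at gain 2 with note 0 of the second at gain 3 is assigned `[(1, 2.0), (0, 3.0)]`. Because both silence and the currently chosen note are always among the candidates, no update can increase the squared error, and ties resolve to the lowest note index, so the result is deterministic for a given input.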


Keywords: Non-negative sparse coding (NNSC), Sparse representations, Non-negative matrix factorization (NMF), Spectral analysis, Harmonicity, Sparsity, Monophony, Music transcription, Source separation



This work was supported by the Andalusian Business, Science and Innovation Council under project P10-TIC-6762 (FEDER), the Spanish Ministry of Science and Innovation under project TEC2009-14414-C03-02, and the University of Jaen under project R1/12/2010/64.

The authors would like to thank Z. Duan for kindly sharing his annotated real-world music database with them.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Francisco José Rodríguez-Serrano (1, corresponding author)
  • Julio José Carabias-Orti (1)
  • Pedro Vera-Candeas (1)
  • Francisco Jesús Cañadas-Quesada (1)
  • Nicolás Ruiz-Reyes (1)

  1. Telecommunication Engineering Department, University of Jaen, Jaen, Spain
