Real-Time Detection of Overlapping Sound Events with Non-Negative Matrix Factorization

  • Arnaud Dessein
  • Arshia Cont
  • Guillaume Lemaitre
Chapter

Abstract

In this paper, we investigate the problem of real-time detection of overlapping sound events using non-negative matrix factorization techniques. We consider a setup where audio streams arrive at the system in real time and are decomposed onto a dictionary of event templates learned off-line, prior to the decomposition. A major drawback of existing approaches in this context is the lack of control over the decomposition. We propose and compare two provably convergent algorithms that address this issue by controlling, respectively, the sparsity of the decomposition and the trade-off between the different frequency components in the decomposition. Sparsity regularization is addressed within the framework of convex quadratic programming, while the frequency compromise is introduced by employing the beta-divergence as the cost function. The two algorithms are evaluated on three multi-source detection tasks: polyphonic music transcription, drum transcription, and environmental sound recognition. The results show that the proposed approaches can improve detection in these applications while keeping computational costs low enough for real-time operation.
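To make the two kinds of control concrete, here is a minimal sketch of decomposing one incoming spectrum `v` onto a fixed, pre-learned dictionary `W` (one template per column). It is not the authors' implementation. It assumes two generic, well-known update schemes: (1) multiplicative updates for the beta-divergence, raised to the exponent gamma(beta) that guarantees a monotone decrease of the cost, and (2) the multiplicative updates for non-negative quadratic programming of Sha, Lin, Saul and Lee, applied to a Euclidean cost with L1 (sparsity) and L2 penalties. The function names, the parameters `beta`, `lam1`, `lam2` and `n_iter`, the all-ones initialization and the toy dictionary are all illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # floor that avoids division by zero and 0 raised to negative powers


def beta_divergence(v, vhat, beta):
    """Sum of element-wise beta-divergences d_beta(v | vhat)."""
    v, vhat = np.maximum(v, EPS), np.maximum(vhat, EPS)
    if beta == 1:  # Kullback-Leibler divergence
        return float(np.sum(v * np.log(v / vhat) - v + vhat))
    if beta == 0:  # Itakura-Saito divergence
        return float(np.sum(v / vhat - np.log(v / vhat) - 1.0))
    return float(np.sum((v**beta + (beta - 1.0) * vhat**beta
                         - beta * v * vhat**(beta - 1.0)) / (beta * (beta - 1.0))))


def mu_exponent(beta):
    """Exponent gamma(beta) that makes the beta multiplicative update monotone."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta > 2:
        return 1.0 / (beta - 1.0)
    return 1.0


def decompose_beta(v, W, beta=0.5, n_iter=100):
    """Activations h >= 0 minimising d_beta(v | W h) for one frame, W fixed."""
    h = np.ones(W.shape[1])
    gamma = mu_exponent(beta)
    for _ in range(n_iter):
        vhat = np.maximum(W @ h, EPS)
        num = W.T @ (v * vhat**(beta - 2.0))
        den = np.maximum(W.T @ vhat**(beta - 1.0), EPS)
        h *= (num / den) ** gamma
    return h


def decompose_sparse_nqp(v, W, lam1=0.1, lam2=0.0, n_iter=100):
    """Activations h >= 0 minimising the convex QP
    0.5*||v - W h||^2 + lam1*sum(h) + 0.5*lam2*||h||^2 for one frame."""
    A = W.T @ W + lam2 * np.eye(W.shape[1])  # quadratic term
    b = lam1 - W.T @ v                       # linear term
    A_pos, A_neg = np.maximum(A, 0.0), np.maximum(-A, 0.0)
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        a = np.maximum(A_pos @ h, EPS)
        c = A_neg @ h
        # The numerator is mathematically >= 0; the clamp only removes rounding error.
        h *= np.maximum(-b + np.sqrt(b * b + 4.0 * a * c), 0.0) / (2.0 * a)
    return h


# Toy dictionary: 3 templates over 6 frequency bins (hypothetical, not learned).
W = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0],
              [0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
v = W @ np.array([2.0, 0.0, 1.0])  # frame in which events 0 and 2 overlap

h_beta = decompose_beta(v, W, beta=0.5)
h_sparse = decompose_sparse_nqp(v, W, lam1=0.1)
print(np.round(h_beta, 3))    # approximately [2, 0, 1]
print(np.round(h_sparse, 3))  # approximately [1.95, 0, 0.95]
```

Because `W` stays fixed, each incoming frame is handled independently with a bounded number of matrix-vector products, which is what makes frame-by-frame processing of a stream feasible. In this toy case the templates do not overlap in frequency, so both methods recover the active events. `lam1` shrinks each activation by `lam1` divided by the squared norm of its template (here 0.1 / 2 = 0.05) and pushes weak activations to zero. Since d_beta(λx | λy) = λ^beta · d_beta(x | y), lowering `beta` gives relatively more weight to low-energy frequency bins, which is one way to read the "frequency compromise".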

Keywords

Real-time multi-source detection, overlapping sound events, non-negative matrix factorization, convex quadratic programming, sparsity regularization, beta-divergence, frequency compromise

Notes

Acknowledgments

This work was partially funded by a doctoral fellowship from the UPMC (EDITE). The authors would like to thank Chunghsin Yeh and Roland Badeau for their valuable help, Emmanouil Benetos for his helpful comments on the paper, Valentin Emiya for kindly providing the MAPS database, as well as Patrick Hoyer and Emmanuel Vincent for sharing their source code.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Arnaud Dessein (1)
  • Arshia Cont (1)
  • Guillaume Lemaitre (1)
  1. STMS Lab (IRCAM, CNRS, UPMC, INRIA), Paris, France