Abstract
In this paper, we investigate the real-time detection of overlapping sound events using non-negative matrix factorization techniques. We consider a setup in which audio streams arrive at the system in real time and are decomposed onto a dictionary of event templates learned off-line beforehand. An important drawback of existing approaches in this context is the lack of control over the decomposition. We propose and compare two provably convergent algorithms that address this issue by controlling, respectively, the sparsity of the decomposition and the trade-off of the decomposition between the different frequency components. Sparsity regularization is considered in the framework of convex quadratic programming, while the frequency compromise is introduced by employing the beta-divergence as a cost function. The two algorithms are evaluated on the multi-source detection tasks of polyphonic music transcription, drum transcription and environmental sound recognition. The results show how the proposed approaches can improve detection in such applications while maintaining low computational costs suitable for real-time use.
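To make the setup concrete, the following minimal sketch decomposes a single incoming spectrum frame onto a fixed dictionary by minimizing the beta-divergence. It uses the standard heuristic multiplicative update from the NMF literature, which is not necessarily the exact algorithm of the chapter. The names `beta_divergence` and `decompose_frame`, the synthetic dictionary `W`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def beta_divergence(v, v_hat, beta):
    """Sum of element-wise beta-divergences d_beta(v | v_hat).

    Assumes strictly positive entries in v and v_hat.
    """
    v = np.asarray(v, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    if beta == 0:  # Itakura-Saito divergence
        r = v / v_hat
        return float(np.sum(r - np.log(r) - 1.0))
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(v * np.log(v / v_hat) - v + v_hat))
    num = v**beta + (beta - 1) * v_hat**beta - beta * v * v_hat**(beta - 1)
    return float(np.sum(num) / (beta * (beta - 1)))

def decompose_frame(v, W, beta=1.0, n_iter=100, eps=1e-12):
    """Estimate non-negative activations h with v ~ W @ h, keeping W fixed.

    Heuristic multiplicative update; the iteration count is an
    illustrative choice, not a value taken from the chapter.
    """
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        v_hat = W @ h + eps
        numerator = W.T @ (v_hat ** (beta - 2) * v)
        denominator = W.T @ (v_hat ** (beta - 1)) + eps
        h *= numerator / denominator
    return h

# Synthetic example: 5 hypothetical event templates over 20 frequency bins.
rng = np.random.default_rng(0)
W = rng.uniform(0.1, 1.0, size=(20, 5))      # dictionary learned off-line (here random)
h_true = np.array([1.0, 0.0, 0.0, 2.0, 0.0])  # two overlapping events are active
v = W @ h_true                                # one observed frame
h = decompose_frame(v, W, beta=1.0)
```

In a streaming context, `decompose_frame` would be called once per incoming frame, and the resulting activations thresholded to decide which events are present. The value of `beta` sets the compromise between frequency components: 2 gives the Euclidean cost, 1 the generalized Kullback-Leibler divergence, and 0 the Itakura-Saito divergence.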
Notes
1. Additional material, including the sound files described in the paper, is available on a companion website: http://imtr.ircam.fr/imtr/Real-Time_Multi-Source_Detection.
2. Recent work presented in [21] may, however, prove a posteriori the cost monotonicity of certain heuristic multiplicative updates with a sparsity penalty.
3. Results in [21] may again prove a posteriori the cost monotonicity of certain heuristic multiplicative updates employed in the literature.
4. The results of the 2010 MIREX evaluation for multiple fundamental frequency estimation and tracking are available on-line: http://www.music-ir.org/mirex/wiki/2010:Multiple_Fundamental_Frequency_Estimation_%26_Tracking_Results.
References
Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)
Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)
Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems, vol. 13, pp. 556–562. MIT Press, Cambridge (2001)
Sha, F., Saul, L.K.: Real-time pitch determination of one or more voices by nonnegative matrix factorization. In: Advances in Neural Information Processing Systems, vol. 17, pp. 1233–1240. MIT Press, Cambridge (2005)
Cheng, C.-C., Hu, D.J., Saul, L.K.: Nonnegative matrix factorization for real time musical analysis and sight-reading evaluation. In: 33rd IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2017–2020. Las Vegas, USA (2008)
Paulus, J., Virtanen, T.: Drum transcription with non-negative spectrogram factorisation. In: 13th European Signal Processing Conference, Antalya, Turkey (2005)
Niedermayer, B.: Non-negative matrix division for the automatic transcription of polyphonic music. In: 9th International Conference on Music Information Retrieval, pp. 544–549. Philadelphia, USA (2008)
Cont, A.: Realtime multiple pitch observation using sparse non-negative constraints. In: 7th International Conference on Music Information Retrieval, Victoria, Canada (2006)
Cont, A., Dubnov, S., Wessel, D.: Realtime multiple-pitch and multiple-instrument recognition for music signals using sparse non-negative constraints. In: 10th International Conference on Digital Audio Effects, Bordeaux, France (2007)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Zdunek, R., Cichocki, A.: Nonnegative matrix factorization with quadratic programming. Neurocomputing 71(10–12), 2309–2320 (2008)
Sha, F., Lin, Y., Saul, L.K., Lee, D.D.: Multiplicative updates for nonnegative quadratic programming. Neural Comput. 19(8), 2004–2031 (2007)
Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika 85(3), 549–559 (1998)
Eguchi, S., Kano, Y.: Robustifying Maximum Likelihood Estimation. Technical Report, Institute of Statistical Mathematics, Tokyo, Japan (2001)
O’Grady, P.D., Pearlmutter, B.A.: Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint. Neurocomputing 72(1–3), 88–101 (2008)
FitzGerald, D., Cranitch, M., Coyle, E.: On the use of the beta divergence for musical source separation. In: 20th IET Irish Signals and Systems Conference, Galway, Ireland (2009)
Vincent, E., Bertin, N., Badeau, R.: Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Trans. Audio Speech Lang. Process. 18(3), 528–537 (2010)
Hennequin, R., Badeau, R., David, B.: Time-dependent parametric and harmonic templates in non-negative matrix factorization. In: 13th International Conference On Digital Audio Effects, pp. 246–253. Graz, Austria (2010)
Hennequin, R., Badeau, R., David, B.: NMF with time-frequency activations to model nonstationary audio events. IEEE Trans. Audio Speech Lang. Process. 19(4), 744–753 (2011)
Nakano, M., Kameoka, H., Le Roux, J., Kitano, Y., Ono, N., Sagayama, S.: Convergence-guaranteed multiplicative algorithms for nonnegative matrix factorization with \(\beta \)-divergence. In: IEEE International Workshop on Machine Learning for Signal Processing, pp. 283–288. Kittilä, Finland (2010)
Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Comput. 23(9), 2421–2456 (2011)
Badeau, R., Bertin, N., Vincent, E.: Stability analysis of multiplicative update algorithms and application to nonnegative matrix factorization. IEEE Trans. Neural Netw. 21(12), 1869–1881 (2010)
Dessein, A., Cont, A., Lemaitre, G.: Real-time polyphonic music transcription with non-negative matrix factorization and beta-divergence. In: 11th International Society for Music Information Retrieval Conference, pp. 489–494. Utrecht, Netherlands (2010)
Berry, M.W., Browne, M., Langville, A., Pauca, V.P., Plemmons, R.J.: Algorithms and applications for approximate nonnegative matrix factorization. Comput. Stat. Data Anal. 52(1), 155–173 (2007)
Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.-i.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Wiley-Blackwell (2009)
Abdallah, S.A., Plumbley, M.D.: Polyphonic music transcription by non-negative sparse coding of power spectra. In: 5th International Conference on Music Information Retrieval, pp. 318–325. Barcelona, Spain (2004)
Smaragdis, P., Brown, J.C.: Non-negative matrix factorization for polyphonic music transcription. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180. New Paltz, USA (2003)
Virtanen, T., Klapuri, A.: Analysis of polyphonic audio using source-filter model and non-negative matrix factorization. In: Neural Information Processing Systems Workshop on Advances in Models for Acoustic Processing (2006)
Raczyński, S.A., Ono, N., Sagayama, S.: Multipitch analysis with harmonic nonnegative matrix approximation. In: 8th International Conference on Music Information Retrieval, pp. 381–386. Vienna, Austria (2007)
Marolt, M.: Non-negative matrix factorization with selective sparsity constraints for transcription of bell chiming recordings. In: 6th Sound and Music Computing Conference, pp. 137–142. Porto, Portugal (2009)
Grindlay, G., Ellis, D.P.W.: Multi-voice polyphonic music transcription using eigeninstruments. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, USA (2009)
Févotte, C., Bertin, N., Durrieu, J.-L.: Nonnegative matrix factorization with the Itakura-Saito divergence with application to music analysis. Neural Comput. 21(3), 793–830 (2009)
Févotte, C.: Itakura-Saito nonnegative factorizations of the power spectrogram for music signal decomposition. In: Wang, W. (ed.) Machine Audition: Principles, Algorithms and Systems, pp. 266–296. IGI Global Press (2010)
Bertin, N., Févotte, C., Badeau, R.: A tempering approach for Itakura-Saito non-negative matrix factorization with application to music transcription. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1545–1548. Taipei, Taiwan (2009)
Bertin, N., Badeau, R., Vincent, E.: Enforcing harmonicity and smoothness in Bayesian non-negative matrix factorization applied to polyphonic music transcription. IEEE Trans. Audio Speech Lang. Process. 18(3), 538–549 (2010)
Shashanka, M., Raj, B., Smaragdis, P.: Probabilistic latent variable models as nonnegative factorizations. Comput. Intell. Neurosci. (2008)
Smaragdis, P., Raj, B., Shashanka, M.: Sparse and shift-invariant feature extraction from non-negative data. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2069–2072. Las Vegas, USA (2008)
Mysore, G.J., Smaragdis, P.: Relative pitch estimation of multiple instruments. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 313–316. Washington, USA (2009)
Grindlay, G., Ellis, D.P.W.: Transcribing multi-instrument polyphonic music with hierarchical eigeninstruments. IEEE J. Sel. Top. Sig. Process. 5(6), 1159–1169 (2011)
Hennequin, R., Badeau, R., David, B.: Scale-invariant probabilistic latent component analysis. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. New Paltz, USA (2011)
Fuentes, B., Badeau, R., Richard, G.: Adaptive harmonic time-frequency decomposition of audio using shift-invariant PLCA. In: 36th International Conference on Acoustics, Speech, and Signal Processing, pp. 401–404. Prague, Czech Republic (2011)
Benetos, E., Dixon, S.: Multiple-instrument polyphonic music transcription using a convolutive probabilistic model. In: 8th Sound and Music Computing Conference, pp. 19–24. Padova, Italy (2011)
Karvanen, J., Cichocki, A.: Measuring sparseness of noisy signals. In: 4th International Symposium on Independent Component Analysis and Blind Signal Separation, pp. 125–130. Nara, Japan (2003)
Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)
Eggert, J., Körner, E.: Sparse coding and NMF. In: IEEE International Joint Conference on Neural Networks, pp. 2529–2533. Budapest, Hungary (2004)
Albright, R., Cox, J., Duling, D., Langville, A.N., Meyer, C.D.: Algorithms, Initializations, and Convergence for the Non Negative Matrix Factorization. NC State University, Technical Report (2006)
Hoyer, P.O.: Non-negative sparse coding. In: 12th IEEE Workshop on Neural Networks for Signal Processing, pp. 557–565. Martigny, Switzerland (2002)
Heiler, M., Schnörr, C.: Learning sparse representations by non-negative matrix factorization and sequential cone programming. J. Mach. Learn. Res. 7, 1385–1407 (2006)
Virtanen, T.: Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria. IEEE Trans. Audio Speech Lang. Process. 15(3), 1066–1074 (2007)
Kompass, R.: A generalized divergence measure for nonnegative matrix factorization. Neural Comput. 19(3), 780–791 (2007)
Wang, D., Brown, G.J.: Computational Auditory Scene Analysis: Principles, Algorithms and Applications. Wiley-IEEE Press (2006)
Klapuri, A., Davy, M.: Signal Processing Methods for Music Transcription. Springer, New York (2006)
Bay, M., Ehmann, A.F., Downie, J.S.: Evaluation of multiple-F0 estimation and tracking systems. In: 10th International Society for Music Information Retrieval Conference, pp. 315–320. Kobe, Japan (2009)
Emiya, V., Badeau, R., David, B.: Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Trans. Audio Speech Lang. Process. 18(6), 1643–1654 (2010)
Yeh, C., Roebel, A., Rodet, X.: Multiple fundamental frequency estimation and polyphony inference of polyphonic music signals. IEEE Trans. Audio Speech Lang. Process. 18(6), 1116–1126 (2010)
Goto, M., Hashiguchi, H., Nishimura, T., Oka, R.: RWC music database: popular, classical, and jazz music databases. In: 3rd International Conference on Music Information Retrieval, pp. 287–288. Paris, France (2002)
Badeau, R.: Gaussian modeling of mixtures of non-stationary signals in the time-frequency domain (HR-NMF). In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 253–256. New Paltz, USA (2011)
Mysore, G., Smaragdis, P., Raj, B.: Non-negative hidden Markov modeling of audio with applications to source separation. In: 9th International Conference on Latent Variable Analysis and Signal Separation, pp. 140–148 (2010)
Nakano, M., Le Roux, J., Kameoka, H., Kitano, Y., Ono, N., Sagayama, S.: Nonnegative matrix factorization with Markov-chained bases for modeling time-varying patterns in music spectrograms. In: 9th International Conference on Latent Variable Analysis and Signal Separation, pp. 149–156 (2010)
Benetos, E., Dixon, S.: A temporally-constrained convolutive probabilistic model for pitch detection. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 133–136. New Paltz, USA (2011)
Acknowledgments
This work was partially funded by a doctoral fellowship from the UPMC (EDITE). The authors would like to thank Chunghsin Yeh and Roland Badeau for their valuable help, Emmanouil Benetos for his helpful comments on the paper, Valentin Emiya for kindly providing the MAPS database, as well as Patrick Hoyer and Emmanuel Vincent for sharing their source code.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Dessein, A., Cont, A., Lemaitre, G. (2013). Real-Time Detection of Overlapping Sound Events with Non-Negative Matrix Factorization. In: Nielsen, F., Bhatia, R. (eds) Matrix Information Geometry. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30232-9_14
DOI: https://doi.org/10.1007/978-3-642-30232-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30231-2
Online ISBN: 978-3-642-30232-9
eBook Packages: Engineering (R0)