Abstract
As seen so far, non-negative models can be quite powerful for resolving mixtures of sounds. However, such models often ignore temporal information and instead resolve each incoming spectrum independently. In this chapter we present methods that learn the temporal structure of sounds and use that information to improve separation. We describe three such models: a convolutive model that learns fixed temporal features; a hidden Markov model that learns state transitions and can incorporate language information; and a continuous dynamical model that learns how sounds evolve over time and can resolve cases where static information is not enough.
Notes
1. In this experiment, we have used \(M=1\), \(\beta _{\text {speech}}=0.5\), \(\beta _{\text {noise}}=0.2\) for filtering, and \(\beta _{\text {speech}}=0.9\), \(\beta _{\text {noise}}=0.6\) for smoothing.
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Smaragdis, P., Mysore, G., Mohammadiha, N. (2018). Dynamic Non-negative Models for Audio Source Separation. In: Makino, S. (eds) Audio Source Separation. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-73031-8_3
DOI: https://doi.org/10.1007/978-3-319-73031-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73030-1
Online ISBN: 978-3-319-73031-8
eBook Packages: Engineering, Engineering (R0)