Musically Informed Audio Decomposition

Müller, Meinard

doi:10.1007/978-3-319-21945-5_8

Abstract

Audio signals are typically complex mixtures of different sound sources. The sound sources can be several people talking simultaneously in a room, different instruments playing together, or a speaker talking in the foreground with music being played in the background. The decomposition of a complex sound mixture into its constituent components is one of the central research topics in digital audio signal processing.

Author information

Authors and Affiliations

International Audio Laboratories Erlangen, Erlangen, Germany
Meinard Müller

Corresponding author

Correspondence to Meinard Müller.

About this chapter

Cite this chapter

Müller, M. (2015). Musically Informed Audio Decomposition. In: Fundamentals of Music Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-21945-5_8

DOI: https://doi.org/10.1007/978-3-319-21945-5_8
Published: 22 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21944-8
Online ISBN: 978-3-319-21945-5
eBook Packages: Computer Science, Computer Science (R0)
