Abstract
Audio signals are typically complex mixtures of different sound sources. The sound sources can be several people talking simultaneously in a room, different instruments playing together, or a speaker talking in the foreground with music being played in the background. The decomposition of a complex sound mixture into its constituent components is one of the central research topics in digital audio signal processing.
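The following Python sketch makes the idea of decomposing a mixture concrete. It uses non-negative matrix factorization (NMF), which is one common decomposition technique. NMF was chosen here for illustration and is not necessarily the method this chapter develops. The sketch assumes the mixture is already available as a non-negative magnitude spectrogram `V` (frequency × time) that is roughly a sum of per-source parts. It applies the multiplicative update rules of Lee and Seung for the Euclidean cost. The function name `nmf` and the toy matrices are hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a nonnegative matrix V (freq x time) as V ~= W @ H.

    Multiplicative updates for the Euclidean (Frobenius) cost: each column
    of W acts as a spectral template, each row of H as its activation over time.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    errors = []
    for _ in range(n_iter):
        # Update activations, then templates; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


# Toy "mixture" (hypothetical data): two sources with fixed spectral shapes
# and their own on/off patterns, summed into one nonnegative spectrogram.
templates = np.array([[1.0, 0.0],
                      [0.5, 0.0],
                      [0.0, 1.0],
                      [0.0, 0.3]])  # 4 frequency bins x 2 sources
activations = np.array([[1, 1, 0, 0, 1, 0],
                        [0, 1, 1, 1, 0, 0]], dtype=float)  # 2 sources x 6 frames
V = templates @ activations

W, H, errors = nmf(V, rank=2)
# Estimated per-source spectrograms: the rank-one terms of W @ H.
components = [W[:, [k]] @ H[[k], :] for k in range(2)]
print(f"initial error {errors[0]:.4f}, final error {errors[-1]:.4f}")
```

Each rank-one term `W[:, [k]] @ H[[k], :]` serves as an estimate of one source's spectrogram. With real recordings, `V` would come from a short-time Fourier transform, and these terms would typically be turned into soft masks before resynthesis. The factors are only determined up to scaling and permutation.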
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Müller, M. (2015). Musically Informed Audio Decomposition. In: Fundamentals of Music Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-21945-5_8
DOI: https://doi.org/10.1007/978-3-319-21945-5_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21944-8
Online ISBN: 978-3-319-21945-5
eBook Packages: Computer Science, Computer Science (R0)