Abstract
When musical instruments are recorded in isolation, modern editing and mixing tools can correct small errors without requiring the group to re-record an entire passage. Isolated recording also makes it possible to rebalance levels between musicians and to apply audio effects to individual instruments without re-recording. Many of these techniques work only on (nearly) isolated instrumental recordings. Unfortunately, in many recording situations (e.g., a stereo recording of a 10-piece ensemble) there are far more instruments than microphones, which makes many editing or remixing tasks difficult or impossible.
Audio source separation is the process of extracting individual sound sources (e.g., a single flute) from a mixture of sounds (e.g., a recording of a concert band made with a single microphone). Effective source separation would allow editing and remixing techniques to be applied to existing recordings in which multiple instruments share a single track.
This chapter focuses on two source separation approaches designed for music audio. The first identifies the repeated elements in the musical scene and separates the repeating from the nonrepeating. The second targets melodic elements: it tracks pitches and streams the audio into separate elements. Finally, we consider how information from the musical score can inform source separation.
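To make the repetition-based idea concrete, here is a minimal sketch in Python with NumPy. It is not the chapter's implementation. It assumes the repeating period (in spectrogram frames) is already known and spans at least two repetitions, and that the input is a magnitude spectrogram. The function name `repeating_mask` and its parameters are hypothetical. The sketch takes the element-wise median over period-length segments as a model of the repeating part and turns it into a soft time-frequency mask.

```python
import numpy as np

def repeating_mask(mag, period):
    """Soft mask for the repeating part of a magnitude spectrogram.

    mag:    non-negative array of shape (n_bins, n_frames)
    period: repeating period in frames (assumed known; hypothetical parameter)
    """
    n_bins, n_frames = mag.shape
    n_segments = int(np.ceil(n_frames / period))
    # Pad with NaN so a partial final segment is ignored by nanmedian.
    padded = np.full((n_bins, n_segments * period), np.nan)
    padded[:, :n_frames] = mag
    segments = padded.reshape(n_bins, n_segments, period)
    # Element-wise median across segments models the repeating segment.
    model = np.nanmedian(segments, axis=1)
    # Tile the model over time; the repeating part cannot exceed the mixture.
    tiled = np.tile(model, (1, n_segments))[:, :n_frames]
    repeating = np.minimum(tiled, mag)
    # Soft mask in [0, 1]; eps guards against division by zero in silent bins.
    eps = 1e-12
    return repeating / (mag + eps)

# Toy example: a background that repeats every 4 frames plus one nonrepeating burst.
background = np.tile(np.arange(12).reshape(3, 4) / 12 + 1.0, (1, 5))
mixture = background.copy()
mixture[1, 6] += 5.0
mask = repeating_mask(mixture, period=4)
```

Multiplying the mask by the mixture's complex spectrogram and inverting the transform would give an estimate of the repeating part; using one minus the mask would give the nonrepeating part. The median is used here because it is robust to occasional nonrepeating energy that appears in only a few segments.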
Abbreviations
- BPM: beats per minute
- ICA: independent component analysis
- MAP: maximum a posteriori
- MFCC: Mel-frequency cepstral coefficient
- MIDI: musical instrument digital interface
- MIS: University of Iowa musical instrument samples
- MMSE: minimum mean square error
- MPE: multipitch estimation
- NMF: nonnegative matrix factorization
- NTF: nonnegative tensor factorization
- PLCA: probabilistic latent component analysis
- REPET: repeating pattern extraction technique
- RPCA: robust principal component analysis
- STFT: short-term Fourier transform/short-time Fourier transform
- UDC: uniform discrete cepstrum
© 2018 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pardo, B., Rafii, Z., Duan, Z. (2018). Audio Source Separation in a Musical Context. In: Bader, R. (eds) Springer Handbook of Systematic Musicology. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55004-5_15
DOI: https://doi.org/10.1007/978-3-662-55004-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-55002-1
Online ISBN: 978-3-662-55004-5
eBook Packages: Engineering, Engineering (R0)