Audio Source Separation in a Musical Context

Pardo, Bryan; Rafii, Zafar; Duan, Zhiyao

doi:10.1007/978-3-662-55004-5_15

Chapter

Part of the book series: Springer Handbooks (SHB)

Abstract

When musical instruments are recorded in isolation, modern editing and mixing tools can correct small errors without requiring the group to re-record an entire passage. Isolated recording also makes it possible to rebalance levels between musicians and to apply audio effects to individual instruments without re-recording. Many of these techniques work only on (nearly) isolated instrumental recordings. Unfortunately, in many recording situations (e.g., a stereo recording of a 10-piece ensemble) there are far more instruments than microphones, which makes many editing or remixing tasks difficult or impossible.

Audio source separation is the process of extracting individual sound sources (e.g., a single flute) from a mixture of sounds (e.g., a recording of a concert band made with a single microphone). Effective source separation would allow editing and remixing techniques to be applied to existing recordings in which multiple instruments share a single track.

In this chapter we focus on two source separation approaches designed to work with music audio. The first seeks the repeated elements in the musical scene and separates the repeating from the nonrepeating. The second looks for melodic elements, using pitch tracking to stream the audio into separate elements. Finally, we consider informing source separation with information from the musical score.
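The idea of separating repeating from nonrepeating elements can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes a magnitude spectrogram `V` (frequency bins by time frames) is already available and that the repeating period, in frames, is already known, whereas a full method would have to estimate it. The function name `repeating_mask` and its parameters are hypothetical. The sketch takes the element-wise median across period-length segments as a model of the repeating background, then builds a soft time-frequency mask from it.

```python
import numpy as np


def repeating_mask(V, period, eps=1e-12):
    """Soft mask for the repeating part of a magnitude spectrogram.

    V      : nonnegative array, shape (n_freq, n_frames)
    period : assumed repeating period in frames (hypothetical input;
             this sketch does not estimate it)
    """
    V = np.asarray(V, dtype=float)
    n_freq, n_frames = V.shape
    if not 1 <= period <= n_frames:
        raise ValueError("period must be between 1 and the number of frames")

    # Pad the time axis with NaN so it splits into whole segments.
    n_seg = -(-n_frames // period)  # ceiling division
    padded = np.full((n_freq, n_seg * period), np.nan)
    padded[:, :n_frames] = V

    # Stack the segments and take the median across them, ignoring padding.
    segments = padded.reshape(n_freq, n_seg, period)
    model = np.nanmedian(segments, axis=1)  # shape (n_freq, period)

    # Repeat the model over time; it may not exceed the mixture energy.
    W = np.tile(model, (1, n_seg))[:, :n_frames]
    W = np.minimum(W, V)

    # Values near 1 mark repeating bins, values near 0 nonrepeating ones.
    return W / (V + eps)
```

Multiplying the mask by the mixture spectrogram gives an estimate of the repeating background, and multiplying by one minus the mask gives the nonrepeating foreground. The median is used here because sparse, varying foreground events appear as outliers across repetitions of the period.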

Abbreviations

BPM: beats per minute
ICA: independent component analysis
MAP: maximum a posteriori
MFCC: Mel-frequency cepstral coefficient
MIDI: musical instrument digital interface
MIS: University of Iowa musical instrument samples
MMSE: minimum mean square error
MPE: multipitch estimation
NMF: nonnegative matrix factorization
NTF: nonnegative tensor factorization
PLCA: probabilistic latent component analysis
REPET: repeating pattern extraction technique
RPCA: robust principal component analysis
STFT: short-term Fourier transform/short-time Fourier transform
UDC: uniform discrete cepstrum

Author information

Authors and Affiliations

Ford Engineering Design Center, Northwestern University, 2133 Sheridan Rd., Evanston, IL 60208, USA
Bryan Pardo
Gracenote, 2000 Powell St., Ste 1500, Emeryville, CA 94608, USA
Zafar Rafii
Dept. of Electrical and Computer Engineering, University of Rochester, 308 Hopeman, Rochester, NY 14627, USA
Zhiyao Duan

Editor information

Editors and Affiliations

Institute of Systematic Musicology, University of Hamburg, Neue Rabenstr. 13, 20354 Hamburg, Germany
Rolf Bader

About this chapter

Cite this chapter

Pardo, B., Rafii, Z., Duan, Z. (2018). Audio Source Separation in a Musical Context. In: Bader, R. (ed.) Springer Handbook of Systematic Musicology. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55004-5_15

DOI: https://doi.org/10.1007/978-3-662-55004-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-55002-1
Online ISBN: 978-3-662-55004-5
eBook Packages: Engineering, Engineering (R0)