Audio Source Separation in a Musical Context

Part of the book series: Springer Handbooks (SHB)

Abstract

When musical instruments are recorded in isolation, modern editing and mixing tools allow correction of small errors without requiring a group to re-record an entire passage. Isolated recording also allows rebalancing of levels between musicians without re-recording and application of audio effects to individual instruments. Many of these techniques require (nearly) isolated instrumental recordings to work. Unfortunately, there are many recording situations (e.g., a stereo recording of a 10-piece ensemble) where there are many more instruments than there are microphones, making many editing or remixing tasks difficult or impossible.

Audio source separation is the process of extracting individual sound sources (e.g., a single flute) from a mixture of sounds (e.g., a recording of a concert band using a single microphone). Effective source separation would allow application of editing and remixing techniques to existing recordings with multiple instruments on a single track.

In this chapter we focus on a pair of source separation approaches designed to work with music audio. The first seeks the repeated elements in the musical scene and separates the repeating from the nonrepeating. The second looks for melodic elements, tracking pitches and streaming the audio into separate elements. Finally, we consider informing source separation with information from the musical score.
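The first approach, separating the repeating from the nonrepeating, can be illustrated with a small sketch. This is not the chapter's implementation. It is a minimal illustration in the spirit of repetition-based separation, assuming the input is a nonnegative magnitude spectrogram (e.g., from an STFT) and that the repeating period in frames is already known. The function name `repeating_mask` and its parameters are hypothetical, chosen only for this example.

```python
import numpy as np

def repeating_mask(V, period):
    """Soft mask for the repeating part of a magnitude spectrogram.

    V      : nonnegative array of shape (n_freq, n_frames).
    period : assumed repeating period in frames (hypothetical input;
             a real system would have to estimate it from the audio).
    """
    n_freq, n_frames = V.shape
    n_seg = int(np.ceil(n_frames / period))

    # Pad with NaN so a partial last segment is ignored by nanmedian.
    padded = np.full((n_freq, n_seg * period), np.nan)
    padded[:, :n_frames] = V

    # Cut the spectrogram into period-long segments.
    segments = padded.reshape(n_freq, n_seg, period)

    # The element-wise median across segments models one repetition:
    # energy that recurs every period survives, sparse events do not.
    model = np.nanmedian(segments, axis=1)

    # Tile the model to full length; the repeating part cannot
    # contain more energy than the mixture itself.
    W = np.minimum(np.tile(model, (1, n_seg))[:, :n_frames], V)

    # Ratio mask in [0, 1]; the small constant avoids division by zero.
    return W / (V + 1e-12)
```

Multiplying the mixture spectrogram by the mask gives an estimate of the repeating background, and multiplying by one minus the mask gives the nonrepeating foreground. The median is used here because it is robust to events that occur in only a minority of the segments.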

Abbreviations

BPM: beats per minute
ICA: independent component analysis
MAP: maximum a posteriori
MFCC: Mel-frequency cepstral coefficient
MIDI: musical instrument digital interface
MIS: University of Iowa musical instrument samples
MMSE: minimum mean square error
MPE: multipitch estimation
NMF: nonnegative matrix factorization
NTF: nonnegative tensor factorization
PLCA: probabilistic latent component analysis
REPET: repeating pattern extraction technique
RPCA: robust principal component analysis
STFT: short-term Fourier transform/short-time Fourier transform
UDC: uniform discrete cepstrum


Copyright information

© 2018 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Pardo, B., Rafii, Z., Duan, Z. (2018). Audio Source Separation in a Musical Context. In: Bader, R. (ed.) Springer Handbook of Systematic Musicology. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-55004-5_15

  • DOI: https://doi.org/10.1007/978-3-662-55004-5_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-55002-1

  • Online ISBN: 978-3-662-55004-5

  • eBook Packages: Engineering, Engineering (R0)
