
Musically Informed Audio Decomposition

Abstract

Audio signals are typically complex mixtures of different sound sources. The sound sources can be several people talking simultaneously in a room, different instruments playing together, or a speaker talking in the foreground with music being played in the background. The decomposition of a complex sound mixture into its constituent components is one of the central research topics in digital audio signal processing.
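The abstract describes decomposition only in general terms. The sketch below illustrates one technique from the keyword list, nonnegative matrix factorization (NMF), and is offered as an example, not as the chapter's method. A nonnegative magnitude spectrogram V is approximated by a product W H of spectral templates W and time activations H. Each component is then pulled out of the mixture with a soft mask.

The sketch rests on several assumptions. It uses Python with NumPy and the Lee-Seung multiplicative updates for the Euclidean cost. A tiny made-up "spectrogram" stands in for real audio. The helpers `nmf` and `component_masks` are hypothetical names introduced only for this illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix V (K x N) by W @ H, with W (K x rank)
    and H (rank x N), using multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    K, N = V.shape
    W = rng.random((K, rank)) + eps  # strictly positive random templates
    H = rng.random((rank, N)) + eps  # strictly positive random activations
    for _ in range(n_iter):
        # Multiplicative updates keep all entries nonnegative and (up to the
        # small eps safeguard) do not increase ||V - W @ H||.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def component_masks(W, H, eps=1e-10):
    """Soft masks: the share of component r in the model W @ H at each bin."""
    V_hat = W @ H + eps
    return [np.outer(W[:, r], H[r, :]) / V_hat for r in range(W.shape[1])]


# Hypothetical toy "magnitude spectrogram" (5 frequency bins x 6 frames):
# two sources, each with one fixed spectral template and an on/off activation.
w1 = np.array([1.0, 0.5, 0.0, 0.0, 0.2])
w2 = np.array([0.0, 0.0, 1.0, 0.8, 0.1])
h1 = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
h2 = np.array([0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
V = np.outer(w1, h1) + np.outer(w2, h2)

W, H = nmf(V, rank=2)
# One masked estimate per component; the masks split each bin of V.
parts = [mask * V for mask in component_masks(W, H)]

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The masks sum to (almost exactly) one wherever the model is nonzero, so the extracted parts add back up to the mixture. This is one reason soft masking is a common choice over using the raw products W[:, r] H[r, :] directly. A real pipeline would presumably obtain V from an STFT and resynthesize each masked part using the mixture phase. That step is an assumption here and is not shown.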

Keywords

  • Instantaneous Frequency
  • Audio Signal
  • Source Separation
  • Nonnegative Matrix Factorization
  • Music Information Retrieval

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Chapter length: 66 pages



Author information


Correspondence to Meinard Müller.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Müller, M. (2015). Musically Informed Audio Decomposition. In: Fundamentals of Music Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-21945-5_8


  • DOI: https://doi.org/10.1007/978-3-319-21945-5_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21944-8

  • Online ISBN: 978-3-319-21945-5

  • eBook Packages: Computer Science, Computer Science (R0)