
Machine Listening of Music

Chapter in Digital Da Vinci (Springer, 2014)

Abstract

The analysis and recognition of sounds in complex auditory scenes is a fundamental step towards context-awareness in machines, and thus an enabling technology for applications across multiple domains including robotics, human-computer interaction, surveillance and bioacoustics. In the realm of music, endowing computers with listening and analytical skills can aid the organization and study of large music collections, the creation of music recommendation services and personalized radio streams, the automation of tasks in the recording studio or the development of interactive music systems for performance and composition.

In this chapter, we survey common techniques for the automatic recognition of timbral, rhythmic and tonal information from recorded music, and for characterizing the similarities that exist between musical pieces. We explore the assumptions behind these methods and their inherent limitations, and conclude by discussing how current trends in machine learning and signal processing research can shape future developments in the field of machine listening.
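
To make these descriptors concrete, here is a minimal sketch, in Python, of how timbral, tonal and rhythmic features of the kind discussed in the chapter are commonly extracted from a recording. It assumes the open-source librosa library and an illustrative file name ("song.wav"); it is not the chapter's own code.

    import librosa

    # Load the recording as a mono waveform (file name is illustrative).
    y, sr = librosa.load("song.wav", sr=22050, mono=True)

    # Timbral description: Mel-frequency cepstral coefficients (MFCCs).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # shape (13, n_frames)

    # Tonal description: chroma features (energy per pitch class).
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)        # shape (12, n_frames)

    # Rhythmic description: onset strength envelope, tempo and beat positions.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)

    print(mfcc.shape, chroma.shape, tempo, len(beats))

Downstream tasks such as instrument recognition, chord estimation or similarity search then operate on summaries or sequences of these frame-level features.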


Notes

  1. Also known as the onset detection function or onset strength signal (see the sketch below).
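
For concreteness, one common way to compute such a signal is the half-wave-rectified spectral flux of a short-time Fourier transform, sketched below in plain NumPy; the frame and hop sizes are illustrative assumptions rather than values prescribed in the chapter.

    import numpy as np

    def onset_strength(y, frame_len=2048, hop=512):
        # Half-wave-rectified spectral flux: a simple onset strength signal.
        window = np.hanning(frame_len)
        n_frames = 1 + (len(y) - frame_len) // hop
        frames = np.stack([y[i * hop:i * hop + frame_len] * window
                           for i in range(n_frames)])
        mag = np.abs(np.fft.rfft(frames, axis=1))   # magnitude spectrogram
        flux = np.diff(mag, axis=0)                 # frame-to-frame spectral change
        return np.maximum(flux, 0.0).sum(axis=1)    # keep only increases in energy

    # Illustrative usage: a 2-second synthetic signal with four impulses.
    sr = 22050
    y = np.zeros(2 * sr)
    y[::sr // 2] = 1.0
    novelty = onset_strength(y)
    print("strongest frames:", np.argsort(novelty)[-4:])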


Author information

Correspondence to Juan Pablo Bello.



Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Bello, J. (2014). Machine Listening of Music. In: Lee, N. (eds) Digital Da Vinci. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0536-2_7

  • DOI: https://doi.org/10.1007/978-1-4939-0536-2_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-0535-5

  • Online ISBN: 978-1-4939-0536-2

  • eBook Packages: Computer Science, Computer Science (R0)
