Content-Based Methods for Knowledge Discovery in Music

Part of the Springer Handbooks book series (SHB)


This chapter presents several computational approaches aimed at supporting knowledge discovery in music. Our work combines data mining, signal processing and data visualization techniques for the automatic analysis of digital music collections, with a focus on retrieving and understanding musical structure.

We discuss the extraction of midlevel feature representations that convey musically meaningful information from audio signals, and show how such representations can be used to synchronize different instances of a musical work and enable new modes of music content browsing and navigation. Moreover, we utilize these representations to identify repetitive structures and representative patterns in the signal, via self-similarity analysis and matrix decomposition techniques that can be made invariant to changes of local tempo and key. We discuss how structural information can serve to highlight relationships within music collections, and explore the use of information visualization tools to characterize the patterns of similarity and dissimilarity that underpin such relationships.
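As a rough illustration of the self-similarity analysis mentioned above, the following minimal sketch builds a self-similarity matrix from 12-bin pitch class (chroma) vectors and a variant that is invariant to key changes. It is not the chapter's implementation: the cosine similarity measure, the maximum over cyclic shifts, the toy triad frames and all function names are assumptions chosen for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cyclic_shift(chroma, k):
    """Rotate a chroma vector upward by k semitones (bin i moves to bin i + k)."""
    chroma = list(chroma)
    k %= len(chroma)
    return chroma[-k:] + chroma[:-k] if k else chroma


def self_similarity_matrix(frames):
    """Pairwise cosine similarity between all frames of one feature sequence."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]


def transposition_invariant_ssm(frames):
    """For each frame pair, keep the best similarity over all 12 cyclic shifts."""
    n = len(frames)
    return [[max(cosine_similarity(frames[i], cyclic_shift(frames[j], k))
                 for k in range(12))
             for j in range(n)]
            for i in range(n)]


def triad(root):
    """Hypothetical toy frame: an idealized major triad as a 12-bin chroma vector."""
    vec = [0.0] * 12
    for interval in (0, 4, 7):
        vec[(root + interval) % 12] = 1.0
    return vec


# Toy sequence: C major, G major, C major again, then D major
# (the C major triad transposed up by two semitones).
frames = [triad(0), triad(7), triad(0), triad(2)]
plain = self_similarity_matrix(frames)
invariant = transposition_invariant_ssm(frames)

for row_plain, row_inv in zip(plain, invariant):
    print([round(x, 3) for x in row_plain], [round(x, 3) for x in row_inv])
```

In this toy sequence the plain matrix marks the repeated C major frame as similar to the first one but treats the transposed D major frame as unrelated, whereas the shift-maximized matrix also matches the transposed frame. Real audio would require chroma features extracted from the signal, which this sketch does not cover.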

With the help of illustrative examples computed on a collection of recordings of Frédéric Chopin’s Mazurkas, we aim to show how these content-based methods can facilitate the development of novel modes of access, analysis and interaction with digital content that can empower the study and appreciation of music.


Keywords: Mazurka, Music Structure Analysis, Pitch Class, Self-similarity Matrix, Music Synchronization



MFCC: Mel-frequency cepstral coefficient
MIDI: musical instrument digital interface
MIR: music information retrieval
NCD: normalized compression distance
PCP: pitch class profile
RCD: radial convergence diagram
SI-PLCA: shift-invariant probabilistic latent component analysis
SSM: self-similarity matrix
STFT: short-term Fourier transform/short-time Fourier transform



This material is based upon work supported by the National Science Foundation, under grant IIS-0844654, and the Cluster of Excellence on Multimodal Computing and Interaction at Saarland University. The authors would like to thank Craig Sapp for kindly providing access to the Mazurka dataset and beat annotations.



Copyright information

© Springer-Verlag Berlin Heidelberg 2018

Authors and Affiliations

  1. New York University, New York, USA
  2. Huawei Technologies Duesseldorf GmbH, München, Germany
  3. International Audio Laboratories Erlangen, Erlangen, Germany
  4. Google Inc., New York, USA
