Deriving Musical Structures from Signal Analysis for Music Audio Summary Generation: “Sequence” and “State” Approach

  • Geoffroy Peeters
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2771)


In this paper, we investigate the derivation of musical structures directly from signal analysis, with the aim of generating visual and audio summaries. From the audio signal, we first derive features: either static features (MFCC, chromagram) or proposed dynamic features. Two approaches are then studied to automatically derive the structure of a piece of music. The sequence approach considers the audio signal as a repetition of sequences of events. Sequences are derived from the similarity matrix of the features by a proposed algorithm based on a 2D structuring filter and pattern matching. The state approach considers the audio signal as a succession of states. Since human segmentation and grouping perform better upon subsequent hearings, this natural approach is followed here using a proposed multi-pass approach combining time segmentation and unsupervised learning methods. Both sequence and state representations are used to create an audio summary using various techniques.
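The role of the similarity matrix in the sequence approach can be illustrated with a minimal sketch. It assumes cosine similarity between per-frame feature vectors and uses a plain average along each diagonal as a stand-in for repetition detection; it is not the paper's 2D structuring filter or pattern-matching algorithm. The function names (`cosine`, `self_similarity`, `diagonal_mean`) and the toy feature vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def self_similarity(frames):
    """Square matrix whose entry [i][j] compares frame i with frame j."""
    return [[cosine(u, v) for v in frames] for u in frames]


def diagonal_mean(sim, lag):
    """Average similarity between frames that are `lag` frames apart.

    A repeated sequence shows up as a stripe of high values parallel to the
    main diagonal, so a high mean at some lag hints at a repetition.
    """
    values = [sim[i][i + lag] for i in range(len(sim) - lag)]
    return sum(values) / len(values)


# Toy features (assumed, not from the paper): section A, section B, then A again,
# two frames per section.
a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]
frames = [a, a, b, b, a, a]

sim = self_similarity(frames)
lag_scores = {lag: diagonal_mean(sim, lag) for lag in range(1, len(frames))}
```

In this toy input, the return of section A four frames later gives a high score at lag 4, while lag 2 only ever compares A frames with B frames and scores zero. Real features such as MFCCs or chroma vectors would give noisier matrices, which is presumably why the paper filters the matrix before matching patterns.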


Hidden Markov Model, Similarity Matrix, Audio Signal, Short Time Fourier Transform, Sequence Representation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Geoffroy Peeters
    • 1
  1. Ircam, Paris, France
