Music Summary Detection with State Space Embedding and Recurrence Plot

  • Yongwei Gao
  • Yichun Shen
  • Xulong Zhang
  • Shuai Yu
  • Wei Li (corresponding author)
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 568)


Automatic music summary detection identifies the most representative part of a song, helping users retrieve the songs they want. In this paper, we propose a novel method based on state space embedding and recurrence plots. First, an extended audio feature is extracted using state space embedding and used to construct a similarity matrix. Compared with raw audio features, this extended feature is more robust against noise. Then, a recurrence plot based on a global strategy is adopted to detect similar segment pairs within a song. Finally, we extract the most repeated part as the summary by selecting and merging the stripes with the lowest distance in the similarity matrix, under constraints on slope and duration. Experimental results show that the proposed algorithm outperforms two competitive baseline methods.
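The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the embedding parameters, the distance measure, or the details of the global strategy, so the function names, the embedding dimension `m`, the delay `tau`, the Euclidean distance, the quantile threshold `rate`, and the restriction to stripes of slope 1 and fixed length are all assumptions made for illustration.

```python
import numpy as np


def delay_embed(features, m=4, tau=1):
    """State space (time-delay) embedding of a feature sequence.

    features: array of shape (n_frames, n_dims), e.g. chroma vectors.
    Returns shape (n_frames - (m - 1) * tau, m * n_dims): each row is the
    concatenation of m frames spaced tau apart. m and tau are assumed values.
    """
    n_out = features.shape[0] - (m - 1) * tau
    if n_out <= 0:
        raise ValueError("sequence too short for this m and tau")
    return np.hstack([features[i * tau: i * tau + n_out] for i in range(m)])


def distance_matrix(x):
    """Pairwise Euclidean distances between embedded frames (assumed metric)."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def recurrence_plot(dist, rate=0.1):
    """Binary recurrence plot using one threshold for the whole matrix.

    The threshold is the `rate` quantile of all distances; this is only one
    possible reading of a global thresholding strategy.
    """
    eps = np.quantile(dist, rate)
    return dist <= eps


def best_stripe(dist, length, min_lag):
    """Find the diagonal stripe of `length` frames with the lowest mean distance.

    Only stripes parallel to the main diagonal (slope 1) and at least
    `min_lag` frames away from it are searched, a simplified stand-in for
    slope and duration constraints. Returns (start_frame, lag, mean_distance).
    """
    n = dist.shape[0]
    best = (None, None, np.inf)
    for lag in range(min_lag, n - length + 1):
        diag = np.diagonal(dist, offset=lag)  # entries dist[i, i + lag]
        csum = np.concatenate(([0.0], np.cumsum(diag)))
        means = (csum[length:] - csum[:-length]) / length  # sliding means
        k = int(np.argmin(means))
        if means[k] < best[2]:
            best = (k, lag, float(means[k]))
    return best
```

Given a feature matrix `feats` with one row per frame, `best_stripe(distance_matrix(delay_embed(feats)), length, min_lag)` returns a candidate pair of repeated segments: frames `start` to `start + length` and the same range shifted by `lag`. Merging several such stripes into one summary is left out of this sketch.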


Keywords: Music summary detection; Extended audio feature; State space embedding; Recurrence plot; Global strategy



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Yongwei Gao (1)
  • Yichun Shen (1)
  • Xulong Zhang (1)
  • Shuai Yu (1)
  • Wei Li (1, 2), corresponding author

  1. School of Computer Science and Technology, Fudan University, Shanghai, China
  2. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China
