A Long-Range Self-similarity Approach to Segmenting DJ Mixed Music Streams

  • Tim Scarfe
  • Wouter M. Koolen
  • Yuri Kalnishkan
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 412)

Abstract

In this paper we describe an unsupervised, deterministic algorithm for segmenting DJ-mixed Electronic Dance Music streams (for example, podcasts, radio shows, live events) into their respective tracks. We attempt to reconstruct boundaries as close as possible to those a human domain expert would produce. The goal of DJ mixing is to render track boundaries effectively invisible to human perception, which makes the problem difficult.

We use Dynamic Programming (DP) to optimally segment a cost matrix derived from a similarity matrix. The similarity matrix is based on the cosines of a time series of kernel-transformed Fourier-based features designed with this domain in mind. Our method is applied to EDM streams. Its formulation incorporates long-term self-similarity as a first-class concept combined with DP, and it is qualitatively assessed on a large corpus of long streams that have been hand-labelled by a domain expert.
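The pipeline described above (pairwise cosine similarities between feature frames, then an optimal contiguous segmentation by dynamic programming) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes the number of segments k is given, and uses the negative mean within-segment similarity as the segment cost; the paper's cost matrix, kernel transform, and Fourier features are not reproduced here.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between rows of a feature matrix X
    (one row per time frame)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)  # guard against zero-norm frames
    return Xn @ Xn.T

def segment(S, k):
    """Split frames 0..n-1 into k contiguous segments by dynamic
    programming, maximising mean within-segment self-similarity.
    Returns the sorted list of segment start indices."""
    n = S.shape[0]

    def cost(i, j):
        # Negative mean similarity of the diagonal block S[i:j, i:j]:
        # coherent (self-similar) segments get low cost.
        return -S[i:j, i:j].mean()

    INF = float("inf")
    best = np.full((k + 1, n + 1), INF)   # best[s, j]: cost of s segments over frames [0, j)
    back = np.zeros((k + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = best[seg - 1, i] + cost(i, j)
                if c < best[seg, j]:
                    best[seg, j] = c
                    back[seg, j] = i
    # Trace back the optimal segment boundaries.
    bounds, j = [], n
    for seg in range(k, 0, -1):
        j = back[seg, j]
        bounds.append(j)
    return sorted(bounds)
```

For example, on two runs of mutually orthogonal feature vectors, the DP recovers the change point: `segment(cosine_similarity_matrix(X), 2)` returns `[0, 5]` when the first five rows of `X` differ from the last five. The cubic-time DP shown here is the textbook form; longer streams would need the pruning or coarser time resolution that a practical system employs.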

Keywords

music segmentation · DJ mix · dynamic programming


Copyright information

© IFIP International Federation for Information Processing 2013

Authors and Affiliations

  • Tim Scarfe¹
  • Wouter M. Koolen¹
  • Yuri Kalnishkan¹

  1. Computer Learning Research Centre and Department of Computer Science, Royal Holloway, University of London, Egham, United Kingdom