Machine Learning, Volume 65, Issue 2–3, pp 485–515

Using duration models to reduce fragmentation in audio segmentation

  • Samer Abdallah
  • Mark Sandler
  • Christophe Rhodes
  • Michael Casey


We investigate explicit segment duration models in addressing the problem of fragmentation in musical audio segmentation. The resulting probabilistic models are optimised using Markov Chain Monte Carlo methods; in particular, we introduce a modification to Wolff’s algorithm to make it applicable to a segment classification model with an arbitrary duration prior. We apply this to a collection of pop songs, and show experimentally that the generated segmentations suffer much less from fragmentation than those produced by segmentation algorithms based on clustering, and are closer to an expert listener’s annotations, as evaluated by two different performance measures.
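The abstract does not say what form the duration prior takes. The sketch below illustrates why a duration prior discourages fragmentation. It scores a frame-label sequence by run-length encoding it into segments and summing a log-prior over the segment durations. The log-normal prior, its parameters `mu` and `sigma`, and all function names are hypothetical choices made for illustration. They are not the model or the optimisation method from the paper. The density is also left unnormalised over integer durations, so the score is only meaningful for comparing labellings.

```python
import math
from itertools import groupby


def segment_durations(labels):
    """Run-length encode a frame-label sequence into (label, duration) segments."""
    return [(lab, sum(1 for _ in run)) for lab, run in groupby(labels)]


def log_duration_prior(d, mu, sigma):
    """Hypothetical log-normal log-density for a segment of d frames (unnormalised over integers)."""
    return (-math.log(d * sigma * math.sqrt(2 * math.pi))
            - (math.log(d) - mu) ** 2 / (2 * sigma ** 2))


def segmentation_log_prior(labels, mu=math.log(8), sigma=0.5):
    """Sum the duration log-prior over all segments; many short fragments score poorly."""
    return sum(log_duration_prior(d, mu, sigma) for _, d in segment_durations(labels))


clean = "AAAAAAAABBBBBBBB"       # two long segments
fragmented = "AAABAAAABBBABBBB"  # same two classes, broken into short runs

print(segment_durations(clean))  # [('A', 8), ('B', 8)]
print(segmentation_log_prior(clean) > segmentation_log_prior(fragmented))  # True
```

In a full model, this prior term would be combined with a likelihood term that measures how well each segment's class explains its audio features, and an MCMC sampler would explore labellings under the combined score. That part is not shown here.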


Keywords: Segmentation, Duration prior, MCMC, Gibbs sampling, Wolff algorithm


References

  1. Abdallah, S., Noland, K., Sandler, M., Casey, M., & Rhodes, C. (2005). Theory and evaluation of a Bayesian music structure extractor. In J. D. Reiss & G. A. Wiggins (Eds.), Proceedings of the sixth international conference on music information retrieval (pp. 420–425).
  2. Allen, J. (1984). Towards a general theory of action and time. Artificial Intelligence, 23, 123–154.
  3. Aucouturier, J.-J., Pachet, F., & Sandler, M. (2005). The way it sounds: Timbre models for analysis and retrieval of polyphonic music signals. IEEE Transactions on Multimedia.
  4. Barbu, A., & Zhu, S.-C. (2004). Cluster sampling and its applications in image processing. Technical Report 409, Department of Statistics, UCLA.
  5. Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. (2004). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.
  6. Brown, J. C. (1991). Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1), 425–434.
  7. Dannenberg, R., & Hu, N. (2002). Discovering musical structure in audio recordings. In Music and artificial intelligence: Second international conference. Edinburgh.
  8. Downie, S., & Nelson, M. (2000). Evaluation of a simple and effective music information retrieval method. In Proceedings of the ACM SIGIR (pp. 73–80).
  9. Eckmann, J.-P., Kamphorst, S. O., & Ruelle, D. (1987). Recurrence plots of dynamical systems. Europhysics Letters, 5, 973–977.
  10. Foote, J. (1999). Visualizing music and audio using self-similarity. In ACM Multimedia (Vol. 1, pp. 77–80).
  11. Galton, A. (Ed.) (1987). Temporal logics and their applications. Academic Press, London.
  12. Goto, M. (2003). A chorus-section detecting method for musical audio signals. In Proc. ICASSP (Vol. V, pp. 437–440).
  13. Hainsworth, S., & Macleod, M. (2003). Onset detection in musical audio signals. In Proc. ICMC.
  14. Hofmann, T., & Buhmann, J. M. (1997). Pairwise data clustering by deterministic annealing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(1).
  15. Huang, Q., & Dom, B. (1995). Quantitative methods of evaluating image segmentation. In Proc. IEEE Intl. Conf. on Image Processing (ICIP’95).
  16. Logan, B. (2000). Mel frequency cepstral coefficients for music modeling. In International Symposium on Music Information Retrieval.
  17. Logan, B., & Chu, S. (2000). Music summarization using key phrases. In International Conference on Acoustics, Speech and Signal Processing.
  18. Lu, L., Wang, M., & Zhang, H. (2004). Repeating pattern discovery and structure analysis from acoustic music data. In 6th ACM SIGMM International Workshop on Multimedia Information Retrieval.
  19. Maddage, N., Xu, C., Kankanhalli, M., & Shao, X. (2004). Content-based music structure analysis with applications to music semantics understanding. In 6th ACM SIGMM International Workshop on Multimedia Information Retrieval.
  20. Merhav, N., & Lee, C.-H. (1993). On the asymptotic statistical behaviour of empirical cepstral coefficients. IEEE Transactions on Signal Processing, 41(5), 1990–1993.
  21. Orio, N., & Neve, G. (2005). Experiments on segmentation techniques for music documents indexing. In J. D. Reiss & G. A. Wiggins (Eds.), Proceedings of the sixth international conference on music information retrieval (pp. 624–627).
  22. Peeters, G., Burthe, A. L., & Rodet, X. (2002). Toward automatic music audio summary generation from signal analysis. In International Symposium on Music Information Retrieval.
  23. Puzicha, J., Hofmann, T., & Buhmann, J. M. (1999). Histogram clustering for unsupervised image segmentation. In Proceedings of CVPR ’99.
  24. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
  25. Robert, C. P., & Casella, G. (1999). Monte Carlo statistical methods. Springer, New York.
  26. Shoham, Y. (1988). Reasoning about change: Time and causation from the standpoint of artificial intelligence. MIT Press, Cambridge, MA.
  27. Swendsen, R. H., & Wang, J.-S. (1987). Non-universal critical dynamics in Monte-Carlo simulations. Physical Review Letters, 58(2), 86–88.
  28. Wakefield, G. H. (1999). Mathematical representation of joint time-chroma distributions. In Advanced Signal Processing Algorithms, Architectures, and Implementations IX (Vol. 3807, pp. 637–645). SPIE.
  29. Wolff, U. (1989). Collective Monte Carlo updating for spin systems. Physical Review Letters, 62(4), 361–364.

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  • Samer Abdallah (1)
  • Mark Sandler (1)
  • Christophe Rhodes (2)
  • Michael Casey (2)

  1. Queen Mary, University of London, London
  2. Goldsmiths College, University of London, London
