Using duration models to reduce fragmentation in audio segmentation

Abstract

We investigate the use of explicit segment duration models to address the problem of fragmentation in musical audio segmentation. The resulting probabilistic models are optimised using Markov chain Monte Carlo (MCMC) methods; in particular, we introduce a modification to Wolff’s algorithm that makes it applicable to a segment classification model with an arbitrary duration prior. We apply this method to a collection of pop songs and show experimentally that the resulting segmentations suffer much less from fragmentation than those produced by clustering-based segmentation algorithms, and are closer to an expert listener’s annotations, as evaluated by two different performance measures.
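The abstract describes the model only in outline. The code below is a minimal sketch of how an explicit duration prior discourages fragmentation. It assumes that each frame t already has a log-likelihood `loglik[t, k]` under each of K segment classes (a hypothetical input, since the paper's features and class models are not reproduced here). It also assumes that the log posterior is the sum of these frame terms plus one term `log_dur_prior(n)` for every segment of length n. Labels are sampled with plain single-site Gibbs updates under simulated annealing. This is *not* the authors' modified Wolff algorithm, which updates whole clusters of sites at once. The function names, the geometric cooling schedule and `step_prior` are all illustrative assumptions.

```python
import numpy as np


def segment_lengths(labels):
    """Lengths of the maximal runs of identical labels, left to right."""
    labels = np.asarray(labels)
    # A boundary sits wherever the label differs from the previous frame.
    boundaries = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    edges = np.concatenate(([0], boundaries, [len(labels)]))
    return np.diff(edges)


def energy(labels, loglik, log_dur_prior):
    """Negative log posterior (up to a constant): frame log-likelihoods plus
    one duration-prior term per segment; the prior may be any function of length."""
    labels = np.asarray(labels)
    data_term = loglik[np.arange(len(labels)), labels].sum()
    dur_term = sum(log_dur_prior(int(n)) for n in segment_lengths(labels))
    return -(data_term + dur_term)


def gibbs_sweep(labels, loglik, log_dur_prior, rng, temperature=1.0):
    """One sweep of single-site Gibbs sampling, visiting frames in random order."""
    labels = np.array(labels)  # work on a copy
    n_frames, n_classes = loglik.shape
    for t in rng.permutation(n_frames):
        # Energy of the whole labelling for each candidate class at frame t.
        energies = np.empty(n_classes)
        for k in range(n_classes):
            labels[t] = k
            energies[k] = energy(labels, loglik, log_dur_prior)
        # p(x_t = k | all other labels) is proportional to exp(-E_k / temperature).
        weights = np.exp(-(energies - energies.min()) / temperature)
        labels[t] = rng.choice(n_classes, p=weights / weights.sum())
    return labels


def anneal(loglik, log_dur_prior, n_sweeps=50, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing with a geometric cooling schedule (assumed, not from the paper)."""
    rng = np.random.default_rng(seed)
    labels = loglik.argmax(axis=1)  # start from the frame-wise best classes
    for temperature in np.geomspace(t_start, t_end, n_sweeps):
        labels = gibbs_sweep(labels, loglik, log_dur_prior, rng, temperature)
    return labels


def step_prior(n):
    """Hypothetical log duration prior: a 3-nat penalty for segments under 3 frames."""
    return -3.0 if n < 3 else 0.0


# Toy data: class 0 fits every frame except frame 5, which slightly prefers class 1.
loglik = np.zeros((10, 2))
loglik[:, 1] = -1.0
loglik[5] = [-1.0, 0.0]
print(anneal(loglik, step_prior, n_sweeps=5, t_start=0.01, t_end=0.01))
print(anneal(loglik, lambda n: 0.0, n_sweeps=5, t_start=0.01, t_end=0.01))
```

At this near-zero temperature, the first call should absorb the one-frame blip at frame 5 into a single segment of class 0. The second call uses a flat prior and keeps the blip, which is the fragmentation the duration prior is meant to suppress. Single-site updates can also get stuck: removing a long, wrongly labelled segment means passing through higher-energy intermediate states. This slow mixing is one reason why cluster moves in the spirit of Wolff's algorithm are attractive.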

References

  1. Abdallah, S., Noland, K., Sandler, M., Casey, M., & Rhodes, C. (2005). Theory and evaluation of a Bayesian music structure extractor. In J. D. Reiss & G. A. Wiggins (Eds.), Proceedings of the sixth international conference on music information retrieval (pp. 420–425).

  2. Allen, J. (1984). Towards a general theory of action and time. Artificial Intelligence, 23, 123–154.

  3. Aucouturier, J.-J., Pachet, F., & Sandler, M. (2005). The way it sounds: Timbre models for analysis and retrieval of polyphonic music signals. IEEE Transactions on Multimedia.

  4. Barbu, A. & Zhu, S.-C. (2004). Cluster sampling and its applications in image processing. Technical Report 409, Department of Statistics, UCLA.

  5. Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. (2004). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.

  6. Brown, J. C. (1991). Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1), 425–434.

  7. Dannenberg, R., & Hu, N. (2002). Discovering musical structure in audio recordings. In Music and artificial intelligence: Second international conference. Edinburgh.

  8. Downie, S., & Nelson, M. (2000). Evaluation of a simple and effective music information retrieval method. In Proceedings of the ACM SIGIR (pp. 73–80).

  9. Eckmann, J.-P., Kamphorst, S. O., & Ruelle, D. (1987). Recurrence plots of dynamical systems. Europhysics Letters, 5, 973–977.

  10. Foote, J. (1999). Visualizing music and audio using self-similarity. In Proceedings of ACM Multimedia (Vol. 1, pp. 77–80).

  11. Galton, A. (Ed.) (1987). Temporal logics and their applications. Academic Press, London.

  12. Goto, M. (2003). A chorus-section detecting method for musical audio signals. In Proceedings of ICASSP (Vol. V, pp. 437–440).

  13. Hainsworth, S., & Macleod, M. (2003). Onset detection in musical audio signals. In Proceedings of ICMC.

  14. Hofmann, T., & Buhmann, J. M. (1997). Pairwise data clustering by deterministic annealing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(1).

  15. Huang, Q., & Dom, B. (1995). Quantitative methods of evaluating image segmentation. In Proceedings of the IEEE International Conference on Image Processing (ICIP ’95).

  16. Logan, B. (2000). Mel frequency cepstral coefficients for music modeling. In International Symposium on Music Information Retrieval.

  17. Logan, B., & Chu, S. (2000). Music summarization using key phrases. In International Conference on Acoustics, Speech and Signal Processing.

  18. Lu, L., Wang, M., & Zhang, H. (2004). Repeating pattern discovery and structure analysis from acoustic music data. In 6th ACM SIGMM International Workshop on Multimedia Information Retrieval.

  19. Maddage, N., Changsheng, X., Kankanhalli, M., & Shao, X. (2004). Content-based music structure analysis with applications to music semantics understanding. In 6th ACM SIGMM International Workshop on Multimedia Information Retrieval.

  20. Merhav, N., & Lee, C.-H. (1993). On the asymptotic statistical behaviour of empirical cepstral coefficients. IEEE Transactions on Signal Processing, 41(5), 1990–1993.

  21. Orio, N., & Neve, G. (2005). Experiments on segmentation techniques for music documents indexing. In J. D. Reiss & G. A. Wiggins (Eds.), Proceedings of the sixth international conference on music information retrieval (pp. 624–627).

  22. Peeters, G., Burthe, A. L., & Rodet, X. (2002). Toward automatic music audio summary generation from signal analysis. In International Symposium on Music Information Retrieval.

  23. Puzicha, J., Hofmann, T., & Buhmann, J. M. (1999). Histogram clustering for unsupervised image segmentation. In Proceedings of CVPR ’99.

  24. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.

  25. Robert, C. P., & Casella, G. (1999). Monte Carlo statistical methods. Springer, New York.

  26. Shoham, Y. (1988). Reasoning about change: time and causation from the standpoint of artificial intelligence. MIT Press, Cambridge, MA.

  27. Swendsen, R. H., & Wang, J.-S. (1987). Non-universal critical dynamics in Monte-Carlo simulations. Physical Review Letters, 58(2), 86–88.

  28. Wakefield, G. H. (1999). Mathematical representation of joint time-chroma distributions. In Advanced signal processing algorithms, architectures, and implementations IX (Proc. SPIE Vol. 3807, pp. 637–645). SPIE.

  29. Wolff, U. (1989). Collective Monte Carlo updating for spin systems. Physical Review Letters, 62(4), 361–364.

Author information

Correspondence to Samer Abdallah.

Additional information

Editor: Gerhard Widmer

About this article

Cite this article

Abdallah, S., Sandler, M., Rhodes, C. et al. Using duration models to reduce fragmentation in audio segmentation. Mach Learn 65, 485–515 (2006). https://doi.org/10.1007/s10994-006-0586-4

Keywords

  • Segmentation
  • Duration prior
  • MCMC
  • Gibbs sampling
  • Wolff algorithm