Music Structure Analysis from Acoustic Signals

  • Roger B. Dannenberg
  • Masataka Goto

Music is full of structure, including sections, sequences of distinct musical textures, and the repetition of phrases or entire sections. The analysis of music audio relies upon feature vectors that convey information about music texture or pitch content. Texture generally refers to the average spectral shape and statistical fluctuation, often reflecting the set of sounding instruments, e.g., strings, vocal, or drums. Pitch content reflects melody and harmony, which is often independent of texture. Structure is found in several ways. Segment boundaries can be detected by observing marked changes in locally averaged texture.
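The last point, detecting segment boundaries from marked changes in locally averaged texture, can be illustrated with a minimal sketch. It is not the authors' method. It assumes the audio has already been reduced to one feature vector per frame, and the Euclidean distance, the window length, the threshold, and all function names are illustrative choices.

```python
import math

def mean_vector(frames):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def texture_change_curve(features, window):
    """For each frame t, distance between the average feature vector of the
    `window` frames before t and the `window` frames starting at t.
    Frames too close to either end keep the value 0.0."""
    curve = [0.0] * len(features)
    for t in range(window, len(features) - window + 1):
        before = mean_vector(features[t - window:t])
        after = mean_vector(features[t:t + window])
        curve[t] = math.dist(before, after)  # Euclidean distance (assumed metric)
    return curve

def pick_boundaries(curve, threshold):
    """Frame indices where the curve has a local maximum above the threshold."""
    peaks = []
    for t in range(1, len(curve) - 1):
        if curve[t] > threshold and curve[t] >= curve[t - 1] and curve[t] > curve[t + 1]:
            peaks.append(t)
    return peaks

# Toy input (hypothetical): 20 frames of one texture, then 20 frames of another.
features = [[1.0, 0.0]] * 20 + [[0.0, 1.0]] * 20
curve = texture_change_curve(features, window=4)
boundaries = pick_boundaries(curve, threshold=1.0)
print(boundaries)  # [20]
```

Averaging over a window before comparing suppresses frame-to-frame fluctuation, so only sustained changes in texture produce a peak. A longer window gives smoother curves at the cost of less precise boundary timing.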

Keywords

Feature Vector, Hidden Markov Model, Acoustic Signal, Similarity Matrix, Audio Signal
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Roger B. Dannenberg (1)
  • Masataka Goto (2)

  1. Carnegie Mellon University, Pittsburgh, USA
  2. National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
