Pattern Induction and Matching in Music Signals

  • Anssi Klapuri
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6684)

Abstract

This paper discusses techniques for pattern induction and matching in musical audio. At all levels of music - harmony, melody, rhythm, and instrumentation - the temporal sequence of events can be subdivided into shorter patterns that are sometimes repeated and transformed. Methods are described for extracting such patterns from musical audio signals (pattern induction) and for retrieving similar patterns from a large database of songs in a computationally feasible way (pattern matching).
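
To make the notion of pattern matching concrete, the following is a minimal sketch, not the paper's algorithm: it compares two fixed-length, beat-synchronous chroma patterns with a cosine distance minimised over the twelve circular pitch-class shifts, so the comparison is invariant to key transposition. All function and variable names are illustrative assumptions.

    # Minimal sketch (illustrative only): transposition-invariant distance
    # between two chroma patterns of shape (frames, 12).
    import numpy as np

    def chroma_pattern_distance(a, b):
        """Cosine distance between two chroma patterns, minimised over the
        12 circular pitch-class shifts (key transpositions)."""
        a = a / (np.linalg.norm(a) + 1e-12)
        best = np.inf
        for shift in range(12):
            b_shifted = np.roll(b, shift, axis=1)
            b_shifted = b_shifted / (np.linalg.norm(b_shifted) + 1e-12)
            best = min(best, 1.0 - float(np.sum(a * b_shifted)))
        return best

    # Toy usage: two 16-beat patterns, the second a transposed copy of the first.
    rng = np.random.default_rng(0)
    p1 = rng.random((16, 12))
    p2 = np.roll(p1, 3, axis=1)
    print(chroma_pattern_distance(p1, p2))  # close to 0.0

Matching a query pattern against a large database would additionally require an indexing scheme (for example approximate nearest-neighbour search) rather than the exhaustive comparison sketched above.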

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Anssi Klapuri
    Centre for Digital Music, Queen Mary University of London, London, United Kingdom
