On Musical Performances Identification, Entropy and String Matching

  • Antonio Camarena-Ibarrola
  • Edgar Chávez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4293)


In this paper we address the problem of matching different renditions of the same piece of music, also known as performances. We use an entropy-based audio fingerprint (AFP) that delivers a framed, small-footprint representation and thereby reduces the problem to one of string matching. The entropy AFP has a very low resolution (750 ms per symbol), which makes it well suited to flexible string matching.
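The idea of turning audio into a string at 750 ms per symbol can be illustrated with a deliberately simplified sketch. It assumes that each non-overlapping frame is mapped to one letter by quantizing the Shannon entropy of a histogram of the frame's sample values. This is not the authors' entropy AFP, which is more elaborate; the function names, the 16-bin histogram and the 8-level alphabet are hypothetical choices made only for illustration.

```python
import math
from typing import List, Sequence


def frame_entropy(frame: Sequence[float], bins: int = 16) -> float:
    """Shannon entropy (in bits) of a histogram of the frame's sample values."""
    lo, hi = min(frame), max(frame)
    if hi == lo:
        return 0.0  # a constant frame carries no information
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in frame:
        # clamp so the maximum value falls into the last bin
        counts[min(int((x - lo) / width), bins - 1)] += 1
    n = len(frame)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def entropy_fingerprint(signal: Sequence[float], sample_rate: int,
                        frame_ms: int = 750, bins: int = 16,
                        levels: int = 8) -> str:
    """Hypothetical simplified AFP: one letter per non-overlapping frame."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    max_h = math.log2(bins)  # entropy of a perfectly flat histogram
    symbols: List[str] = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        h = frame_entropy(signal[start:start + frame_len], bins)
        # quantize entropy in [0, max_h] to one of `levels` letters 'a', 'b', ...
        level = min(int(h / max_h * levels), levels - 1)
        symbols.append(chr(ord("a") + level))
    return "".join(symbols)
```

For example, `entropy_fingerprint([0.0] * 16 + [float(i) for i in range(16)], sample_rate=16, frame_ms=1000)` returns `"ah"`: the silent first frame has zero entropy, while the ramp in the second frame fills every histogram bin once and reaches the maximum of 4 bits. Because each symbol summarizes a long frame, the resulting strings are short, which is what keeps the later string comparisons cheap.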

We present experimental results using dynamic time warping (DTW), the Levenshtein (edit) distance and the longest common subsequence (LCS) distance. We correctly identify (100%) different renditions of masterpieces as well as of pop music, in less than a second per comparison.
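The three distances compared here can be sketched over symbol strings with their textbook dynamic-programming recurrences. This is a minimal illustration, not the authors' code. Two assumptions are worth stating: the DTW local cost (0 for equal symbols, 1 otherwise) is a hypothetical choice, and the LCS distance is taken with its usual definition, |a| + |b| - 2*LCS(a, b), i.e. an edit distance that allows only insertions and deletions.

```python
from typing import Callable, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insert, delete, substitute, all cost 1), two rows of memory."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_distance(a: Sequence, b: Sequence) -> int:
    """Insertion/deletion-only edit distance: |a| + |b| - 2 * LCS(a, b)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def dtw(a: Sequence, b: Sequence,
        cost: Callable[[object, object], float] = lambda x, y: 0 if x == y else 1
        ) -> float:
    """Classic DTW with steps (1,0), (0,1), (1,1); needs the whole of both strings."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = cost(a[i - 1], b[j - 1]) + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

As a usage example, `dtw("aabbc", "abc")` is `0.0` because DTW absorbs a change of tempo by repeating symbols, whereas `levenshtein("aabbc", "abc")` and `lcs_distance("aabbc", "abc")` are both `2`, since the two extra symbols must be deleted.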

All three approaches are 100% effective, but unlike DTW, the LCS and Levenshtein distances can be computed online, which makes them suitable for monitoring applications. Moreover, since they are metric distances, a metric index could be used to speed up the recognition process.
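Computing a distance online means updating it as each new fingerprint symbol arrives from a broadcast, without waiting for the stream to end. A minimal sketch of this for the edit distance, assuming a Sellers-style search (the classic online approximate string matching algorithm, swapped in here as an illustration rather than taken from the paper), keeps a single dynamic-programming column over the query fingerprint. Each incoming symbol therefore costs O(m) time and memory, where m is the query length. The class and function names and the threshold `k` are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


class OnlineEditMatcher:
    """Sellers-style approximate search over a symbol stream.

    After each pushed symbol, returns the smallest edit distance between the
    whole pattern and any stretch of the stream that ends at that symbol.
    """

    def __init__(self, pattern: Sequence):
        self.pattern = pattern
        # column before any stream symbol: matching i pattern symbols to "" costs i
        self.col: List[int] = list(range(len(pattern) + 1))

    def push(self, t) -> int:
        prev = self.col
        cur = [0]  # 0 in the first cell: a match may start at any stream position
        for i, p in enumerate(self.pattern, 1):
            cur.append(min(prev[i] + 1,              # stream symbol t is extra
                           cur[i - 1] + 1,           # pattern symbol p is missing
                           prev[i - 1] + (p != t)))  # match or substitute
        self.col = cur
        return cur[-1]


def monitor(pattern: Sequence, stream: Iterable, k: int) -> Iterator[Tuple[int, int]]:
    """Yield (position, distance) wherever the pattern matches with <= k edits."""
    matcher = OnlineEditMatcher(pattern)
    for pos, sym in enumerate(stream):
        d = matcher.push(sym)
        if d <= k:
            yield pos, d
```

For instance, `list(monitor("abc", "xxabcxx", 0))` reports only the exact occurrence ending at position 4, while with `k=1` it also reports the near-matches ending at positions 3 and 5. Note that this substring-search variant is not itself a metric; it is the full-string Levenshtein distance that satisfies the triangle inequality a metric index relies on to prune candidates.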







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Antonio Camarena-Ibarrola (1)
  • Edgar Chávez (1)
  1. Universidad Michoacana de San Nicolás de Hidalgo, Edif. “B”, Ciudad Universitaria, Morelia, Mich., México
