Advances in Music Information Retrieval, pp. 307–332

Part of the Studies in Computational Intelligence book series (SCI, volume 274)

Audio Cover Song Identification and Similarity: Background, Approaches, Evaluation, and Beyond

  • Joan Serrà
  • Emilia Gómez
  • Perfecto Herrera

Abstract

A cover version is an alternative rendition of a previously recorded song. Given that a cover may differ from the original song in timbre, tempo, structure, key, arrangement, or language of the vocals, automatically identifying cover songs in a given music collection is a rather difficult task. The music information retrieval (MIR) community has paid much attention to this task in recent years and many approaches have been proposed. This chapter comprehensively summarizes the work done in cover song identification while encompassing the background related to this area of research. The most promising strategies are reviewed and qualitatively compared under a common framework, and their evaluation methodologies are critically assessed. A discussion on the remaining open issues and future lines of research closes the chapter.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Joan Serrà (1)
  • Emilia Gómez (1)
  • Perfecto Herrera (1)

  1. Music Technology Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Tànger, Barcelona, Spain
