Multimedia Tools and Applications, Volume 77, Issue 2, pp 2629–2652

Fusing similarity functions for cover song identification

  • Ning Chen
  • Wei Li
  • Haidong Xiao


Cover song identification (CSI) refers to the process of identifying an alternative version, performance, rendition, or recording of a previously recorded musical composition by measuring and modeling the musical similarity between them quantitatively and objectively. However, a single similarity function cannot describe the similarity between tracks comprehensively and reliably. In this paper, the Similarity Network Fusion (SNF) technique, which was originally proposed for combining different kernels for predicting drug-target interactions, is adopted to fuse similarities that are computed from the same descriptor with different similarity functions. First, the Harmonic Pitch Class Profile (HPCP) is extracted from each track. Next, the similarities between the HPCP descriptors of every pair of tracks are calculated in terms of the Qmax and Dmax measures. Then, two track-by-track similarity networks, one based on Qmax and one based on Dmax, are constructed separately and fused into a single network by SNF. Finally, the fused similarities obtained from this network are used to train a classifier, which can then decide whether two input tracks form a reference/cover pair or a reference/non-cover pair. Experimental results on Covers80, on a subset of SecondHandSongs (SHS), and on the Mixed Collection and Mazurka Cover Collection provided by MIREX demonstrate that the proposed scheme performs comparably with or even better than state-of-the-art CSI schemes.
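
The fusion step described above can be illustrated with a minimal sketch. It follows the general SNF recipe (a row-normalised full kernel, a k-nearest-neighbour local kernel, and cross-diffusion between the two networks) rather than the authors' exact implementation, which the abstract does not specify. The function names `full_kernel`, `local_kernel`, and `fuse`, the parameters `k` and `n_iter`, and the random matrices standing in for the Qmax-based and Dmax-based similarity networks are all assumptions made for illustration. HPCP extraction, the Qmax and Dmax computations, and the classifier are omitted.

```python
import numpy as np

def full_kernel(W):
    """Row-normalise W: off-diagonal entries of each row sum to 1/2, diagonal is 1/2."""
    off = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(off, 0.0)
    # Assumes every row has a strictly positive off-diagonal sum.
    P = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k):
    """Keep each row's k largest off-diagonal similarities, normalised to sum to 1."""
    W = np.asarray(W, dtype=float)
    off = W.copy()
    np.fill_diagonal(off, -np.inf)            # never pick a track as its own neighbour
    idx = np.argsort(-off, axis=1)[:, :k]     # indices of the k most similar tracks
    rows = np.arange(W.shape[0])[:, None]
    S = np.zeros_like(W)
    S[rows, idx] = W[rows, idx]
    return S / S.sum(axis=1, keepdims=True)

def fuse(W_a, W_b, k=3, n_iter=10):
    """Cross-diffuse two similarity networks and return their symmetrised average."""
    P_a, P_b = full_kernel(W_a), full_kernel(W_b)
    S_a, S_b = local_kernel(W_a, k), local_kernel(W_b, k)
    for _ in range(n_iter):
        # Each network is updated through the other network's current state.
        next_a = S_a @ P_b @ S_a.T
        next_b = S_b @ P_a @ S_b.T
        P_a, P_b = full_kernel(next_a), full_kernel(next_b)
    fused = (P_a + P_b) / 2.0
    return (fused + fused.T) / 2.0

# Hypothetical stand-ins for the Qmax-based and Dmax-based track-by-track networks.
rng = np.random.default_rng(0)
A = rng.random((6, 6))
B = rng.random((6, 6))
W_q = (A + A.T) / 2.0 + 0.01   # symmetric, strictly positive
W_d = (B + B.T) / 2.0 + 0.01
fused = fuse(W_q, W_d, k=3, n_iter=10)
```

In a pipeline like the one the abstract outlines, each row of `fused` would supply the fused similarities between one track and all others, which could then serve as input features for the classifier.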


Cover song identification (CSI); Qmax; Dmax; Similarity network fusion (SNF)



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
  2. School of Computer Science and Technology, Fudan University, Shanghai, China
  3. Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, China
