Multimedia Tools and Applications, Volume 75, Issue 15, pp 9145–9165

A spectrogram-based audio fingerprinting system for content-based copy detection



This paper presents a novel audio fingerprinting method that is highly robust to a variety of audio distortions. It is based on an unconventional audio fingerprint generation scheme. Robustness is achieved by generating different versions of the spectrogram matrix of the audio signal, each pruned with a threshold based on the average of the spectral values. We transform each pruned version of the spectrogram matrix into a 2-D binary image. These multiple 2-D images suppress noise to varying degrees, which improves the likelihood that one of them matches a reference image. To speed up matching, we convert each image into an n-dimensional vector and perform a nearest-neighbor search on these vectors. We give results with two different feature parameters and their combination. We test the method on the TRECVID 2010 content-based copy detection evaluation dataset and validate its performance on the TRECVID 2009 dataset. Experimental results show the effectiveness of these features even when the audio is distorted. We compare the proposed method to two state-of-the-art audio copy detection systems, namely the NN-based and Shazam systems. Our method outperforms the Shazam system by a wide margin for all audio transformations (or distortions) in terms of detection performance, number of missed queries, and localization accuracy. Compared to the NN-based system, our approach reduces the minimal Normalized Detection Cost Rate (min NDCR) by 23% and improves localization accuracy by 24%.
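The pipeline outlined in the abstract (prune the spectrogram against a mean-based threshold, binarize, reduce each image to a vector, then search for the nearest reference) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the threshold is varied, how a binary image becomes an n-dimensional vector, or which search structure is used. The scale factors, the block-count encoding, the brute-force Euclidean search, and all function names below are assumptions made for illustration, and the magnitude spectrogram is assumed to be already computed.

```python
from math import sqrt

def binarize(spectrogram, scale):
    """Prune values at or below scale * mean and return a 0/1 image.
    The scale factor is an assumed way of producing several versions."""
    values = [v for row in spectrogram for v in row]
    threshold = scale * sum(values) / len(values)
    return [[1 if v > threshold else 0 for v in row] for row in spectrogram]

def image_to_vector(image, block_rows, block_cols):
    """Count the ones in each non-overlapping block (hypothetical encoding)."""
    n_rows, n_cols = len(image), len(image[0])
    vector = []
    for r in range(0, n_rows, block_rows):
        for c in range(0, n_cols, block_cols):
            vector.append(sum(image[i][j]
                              for i in range(r, min(r + block_rows, n_rows))
                              for j in range(c, min(c + block_cols, n_cols))))
    return vector

def fingerprints(spectrogram, scales=(0.5, 1.0, 2.0), block=(2, 2)):
    """One vector per threshold scale, i.e. per pruned version of the matrix."""
    return [image_to_vector(binarize(spectrogram, s), *block) for s in scales]

def nearest(query_vectors, references):
    """Brute-force nearest neighbour over all versions: (label, distance)."""
    best = None
    for q in query_vectors:
        for label, ref_vectors in references.items():
            for r in ref_vectors:
                d = sqrt(sum((a - b) ** 2 for a, b in zip(q, r)))
                if best is None or d < best[1]:
                    best = (label, d)
    return best

# Toy 4x4 magnitude "spectrograms" (made-up numbers, not real audio).
clean = [[9, 1, 1, 8], [1, 7, 1, 1], [1, 1, 8, 1], [9, 1, 1, 7]]
other = [[1, 8, 1, 1], [9, 1, 1, 1], [1, 1, 1, 9], [1, 8, 9, 1]]
noisy = [[9, 3, 2, 8], [2, 7, 3, 1], [3, 2, 8, 2], [9, 1, 3, 7]]  # clean + noise

references = {"clean": fingerprints(clean), "other": fingerprints(other)}
match = nearest(fingerprints(noisy), references)
```

In this toy case the lowest threshold lets the added noise through, while the version pruned at the mean reproduces the clean reference's vector, so `match` points to `"clean"`. This mirrors the abstract's argument that keeping several differently pruned versions raises the chance that one of them matches.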


Content-based copy detection; Audio fingerprints; Feature parameters; Spectrogram; TRECVID



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. ÉTS (École de Technologie Supérieure), Montreal, Canada
  2. CRIM (Computer Research Institute of Montreal), Montreal, Canada
