A novel framework for CBCD using integrated color and acoustic features

  • R. Roopalakshmi
Regular Paper


Most studies in content-based video copy detection (CBCD) concentrate on visual signatures, and very few exploit audio features. Audio, when present, is an essential component of a video; hence, integrating visual and acoustic fingerprints significantly improves copy detection performance. Motivated by this, we propose a new framework that jointly employs color-based visual features and audio fingerprints to detect duplicate videos. The proposed framework has three stages: first, a novel visual fingerprint based on spatio-temporal dominant color features is generated; second, mel-frequency cepstral coefficients (MFCCs) are extracted and compactly represented as acoustic signatures; third, the resulting multimodal signatures are jointly used for the CBCD task by employing a combination rule and weighting strategies. Experiments on the TRECVID 2008 and 2009 datasets demonstrate the improved efficiency of the proposed framework over the reference methods across a wide range of video transformations.
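The third stage, in which visual and acoustic evidence is combined, can be illustrated with a minimal sketch. The abstract does not state which combination rule or weights are used, so the weighted-sum rule, the weight `w_visual`, the decision `threshold`, and the function names `fuse_scores` and `detect_copies` below are all assumptions made for illustration, not the method of the paper. The sketch assumes that each modality has already produced a similarity score in [0, 1] for every candidate reference video.

```python
from typing import Dict, List


def fuse_scores(visual: Dict[str, float],
                audio: Dict[str, float],
                w_visual: float = 0.5) -> Dict[str, float]:
    """Weighted-sum fusion of per-candidate similarity scores.

    Assumption: scores lie in [0, 1] and a higher score means a closer
    match. The weighted-sum rule is illustrative only.
    """
    if not 0.0 <= w_visual <= 1.0:
        raise ValueError("w_visual must lie in [0, 1]")
    w_audio = 1.0 - w_visual

    fused: Dict[str, float] = {}
    for video_id in set(visual) | set(audio):
        v = visual.get(video_id)
        a = audio.get(video_id)
        if v is None:
            # No visual score for this candidate: use the audio score alone.
            fused[video_id] = a
        elif a is None:
            # No audio score (for example, a silent video): use visual alone.
            fused[video_id] = v
        else:
            fused[video_id] = w_visual * v + w_audio * a
    return fused


def detect_copies(fused: Dict[str, float], threshold: float = 0.7) -> List[str]:
    """Return the candidates whose fused score reaches the threshold."""
    return sorted(vid for vid, score in fused.items() if score >= threshold)


if __name__ == "__main__":
    # Hypothetical scores for three reference videos.
    visual_scores = {"ref_a": 0.9, "ref_b": 0.4}
    audio_scores = {"ref_a": 0.8, "ref_b": 0.2, "ref_c": 0.75}
    fused = fuse_scores(visual_scores, audio_scores, w_visual=0.5)
    print(detect_copies(fused))
```

Falling back to a single modality when the other is missing is one possible design choice, consistent with the remark that audio may not always be present. A real system would also need to calibrate the two score scales before mixing them.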


Content-based video copy detection; MPEG-7; Dominant color descriptor; MFCC; Singular value decomposition



Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  1. Department of Computer Science and Engineering, R.V. College of Engineering, Bangalore, India
