Multimedia Tools and Applications, Volume 72, Issue 2, pp 1483–1506

Aligning codebooks for near duplicate image detection

  • Sebastiano Battiato
  • Giovanni Maria Farinella
  • Giovanni Puglisi
  • Daniele Ravì

Abstract

Detecting near duplicate images in large databases, such as those of popular social networks, digital investigation archives, and surveillance systems, is an important task for a number of image forensics applications. In digital investigation, hashing techniques are commonly used to index large quantities of images so that copies belonging to different archives can be detected. In the last few years, several image hashing techniques based on the Bags of Visual Features paradigm have appeared in the literature. Recently, this paradigm has been augmented by using multiple descriptors (e.g., Bags of Visual Phrases) in order to exploit the coherence between different feature spaces. In this paper we propose to further improve the Bags of Visual Phrases approach by considering the coherence between feature spaces not only at the level of image representation, but also during the codebook generation phase. We also introduce a novel image database specifically designed for the development and benchmarking of near duplicate image retrieval techniques. The dataset consists of more than 3,300 images depicting more than 500 different scenes, each having at least three real near duplicates. The dataset exhibits high variability in terms of geometric and photometric transformations between scenes and their corresponding near duplicates. Finally, we suggest a method to compress the proposed image representation for storage purposes. Experiments show the effectiveness of the proposed near duplicate retrieval technique, which outperforms the original Bags of Visual Phrases approach.
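
As a rough illustration of the visual-phrase idea summarized above, the following minimal sketch pairs, for each keypoint, the nearest codeword in two separate codebooks and counts the resulting pairs. It is not the authors' implementation: the function names, the toy codebooks, the Euclidean nearest-codeword rule, and the histogram-intersection similarity are all assumptions made for illustration, and the codebook alignment step is omitted because the abstract does not describe it in detail.

```python
from collections import Counter
from math import dist


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bag_of_visual_phrases(desc_a, desc_b, codebook_a, codebook_b):
    """Count (word_a, word_b) pairs over keypoints described in two feature spaces.

    Assumption: desc_a[k] and desc_b[k] describe the same keypoint k.
    """
    if len(desc_a) != len(desc_b):
        raise ValueError("both descriptor lists must cover the same keypoints")
    phrases = Counter()
    for da, db in zip(desc_a, desc_b):
        word_a = nearest_codeword(da, codebook_a)
        word_b = nearest_codeword(db, codebook_b)
        phrases[(word_a, word_b)] += 1
    return phrases


def histogram_intersection(p, q):
    """Histogram intersection of two phrase counters, scaled to [0, 1].

    Assumption: this similarity is chosen only for illustration.
    """
    total = min(sum(p.values()), sum(q.values()))
    if total == 0:
        return 0.0
    return sum(min(count, q[phrase]) for phrase, count in p.items()) / total
```

Two images could then be compared by building one phrase counter per image and ranking database images by `histogram_intersection` against the query; a real system would use learned codebooks and an index rather than the exhaustive comparison assumed here.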

Keywords

Image forensics; Near duplicate images; Image retrieval; Bags of visual words; Bags of visual phrases; Codebooks alignment

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Sebastiano Battiato (1)
  • Giovanni Maria Farinella (1)
  • Giovanni Puglisi (1)
  • Daniele Ravì (1)
  1. Department of Mathematics and Computer Science, Image Processing Laboratory, University of Catania, Catania, Italy
