
Aligning codebooks for near duplicate image detection


The detection of near duplicate images in large databases, such as those of popular social networks, digital investigation archives, and surveillance systems, is an important task for a number of image forensics applications. In digital investigation, hashing techniques are commonly used to index large quantities of images and to detect copies held in different archives. In the last few years, several image hashing techniques based on the Bags of Visual Features paradigm have appeared in the literature. Recently, this paradigm has been augmented with multiple descriptors (e.g., Bags of Visual Phrases) in order to exploit the coherence between different feature spaces. In this paper we propose to further improve the Bags of Visual Phrases approach by considering the coherence between feature spaces not only at the level of image representation, but also during the codebook generation phase. We also introduce a novel image database specifically designed for the development and benchmarking of near duplicate image retrieval techniques. The dataset consists of more than 3,300 images depicting more than 500 different scenes, each with at least three real near duplicates. The dataset exhibits wide variability in the geometric and photometric transformations between scenes and their corresponding near duplicates. Finally, we suggest a method to compress the proposed image representation for storage purposes. Experiments show the effectiveness of the proposed near duplicate retrieval technique, which outperforms the original Bags of Visual Phrases approach.
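As a rough illustration of the Bags of Visual Phrases idea mentioned above, the sketch below assumes that each keypoint carries two descriptors, that each descriptor is quantized against its own codebook, and that the resulting pair of word indices (a "phrase") is counted in a 2D histogram. The function names, the toy 2-D descriptors, the tiny codebooks, and the histogram-intersection similarity are all hypothetical choices made for readability. This is not the authors' implementation, and it does not cover the codebook alignment step proposed in the paper.

```python
from collections import Counter
from math import dist


def nearest_word(descriptor, codebook):
    """Index of the codebook centroid closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bag_of_phrases(keypoints, codebook_a, codebook_b):
    """Count (word_a, word_b) pairs over all keypoints of one image.

    Each keypoint is a tuple (descriptor_a, descriptor_b): two descriptions
    of the same image region in two different feature spaces.
    """
    return Counter(
        (nearest_word(da, codebook_a), nearest_word(db, codebook_b))
        for da, db in keypoints
    )


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two phrase histograms after L1 normalization."""
    n1, n2 = sum(h1.values()), sum(h2.values())
    if n1 == 0 or n2 == 0:
        return 0.0
    return sum(min(h1[k] / n1, h2[k] / n2) for k in h1.keys() & h2.keys())


# Toy codebooks (hypothetical): two centroids per feature space.
codebook_a = [(0.0, 0.0), (1.0, 1.0)]
codebook_b = [(0.0, 1.0), (1.0, 0.0)]

# Toy images: three keypoints each, every keypoint described twice.
query = [((0.1, 0.0), (0.1, 0.9)), ((0.9, 1.0), (1.0, 0.1)), ((0.0, 0.2), (0.9, 0.0))]
near_dup = [((0.0, 0.1), (0.0, 1.0)), ((1.0, 0.9), (0.9, 0.0)), ((0.2, 0.1), (1.0, 0.2))]
unrelated = [((1.0, 1.0), (0.0, 1.0)), ((0.9, 0.8), (0.1, 1.0)), ((1.0, 0.9), (0.0, 0.9))]

h_query = bag_of_phrases(query, codebook_a, codebook_b)
h_near = bag_of_phrases(near_dup, codebook_a, codebook_b)
h_other = bag_of_phrases(unrelated, codebook_a, codebook_b)

print(histogram_intersection(h_query, h_near))   # high for the perturbed copy
print(histogram_intersection(h_query, h_other))  # low for the unrelated image
```

In this toy setting the slightly perturbed descriptors of `near_dup` fall into the same phrase bins as those of `query`, whereas `unrelated` populates a different bin, which is the behaviour a phrase-based representation relies on.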




  1.

    Note that at this stage other encoding methods can be used starting from the aligned vocabulary [7].

  2.

    We consider a dataset synthetic when the near duplicates are generated from a set of images (or video frames) by applying transformations typically available in image manipulation software (e.g., ImageMagick, http://www.imagemagick.org), such as colorizing, contrast changing, cropping, despeckling, downsampling, format changing, framing, rotating, scaling, saturation changing, intensity changing, and shearing. To generate near duplicates, the basic transformations are usually applied while varying their parameters and/or combining them.
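    The generation procedure described in this note can be sketched as follows. To stay dependency-free, the example represents a grayscale image as a list of rows of integers in 0..255 and implements only three of the listed transformations (cropping, downsampling, contrast changing) by hand; in practice a tool such as ImageMagick would be used. All function names and parameter values are hypothetical.

```python
from itertools import product


def crop(img, margin):
    """Remove `margin` pixels from each border."""
    return [row[margin:len(row) - margin] for row in img[margin:len(img) - margin]]


def downsample(img, step):
    """Keep every `step`-th pixel in both directions."""
    return [row[::step] for row in img[::step]]


def change_contrast(img, gain):
    """Scale pixel values around mid-gray (128) and clamp to 0..255."""
    return [[max(0, min(255, round(128 + gain * (p - 128)))) for p in row] for row in img]


def synthetic_near_duplicates(img, margins, steps, gains):
    """Apply every combination of the parameter values to the source image."""
    variants = {}
    for margin, step, gain in product(margins, steps, gains):
        out = change_contrast(downsample(crop(img, margin), step), gain)
        variants[(margin, step, gain)] = out
    return variants


# 8x8 toy gradient image standing in for a real photo.
source = [[16 * (x + y) for x in range(8)] for y in range(8)]
variants = synthetic_near_duplicates(
    source, margins=[0, 1], steps=[1, 2], gains=[0.5, 1.5]
)

for params, image in sorted(variants.items()):
    print(params, f"{len(image)}x{len(image[0])}")
```

    Each key of `variants` records the parameter combination that produced the corresponding near duplicate, which makes it straightforward to keep ground truth alongside the generated images.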


  1.

    Battiato S, Farinella GM, Gallo G, Ravì D (2010) Exploiting textons distributions on spatial hierarchy for scene classification. EURASIP J Image Video Process Article ID 919367:1–13. doi:10.1155/2010/919367

  2.

    Battiato S, Farinella GM, Messina E, Puglisi G (2012) Robust image alignment for tampering detection. IEEE Trans Inf Forensics Secur 7(4):1105–1117

  3.

    Battiato S, Farinella GM, Guarnera GC, Meccio T, Puglisi G, Ravì D, Rizzo R (2010) Bags of phrases with codebooks alignment for near duplicate image detection. In: Proceedings of the international ACM workshop on multimedia in forensics, security and intelligence (MiFor 2010), in conjunction with the international ACM multimedia conference, pp 65–70

  4.

    Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (SURF). Comput Vis Image Understand 110(3):346–359

  5.

    Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522

  6.

    Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522

  7.

    Chatfield K, Lempitsky V, Vedaldi A, Zisserman A (2011) The devil is in the details: an evaluation of recent feature encoding methods. In: Proceedings of the British machine vision conference

  8.

    Cheng X, Hu Y, Chia L-T (2011) Exploiting local dependencies with spatial-scale space (s-cube) for near-duplicate retrieval. Comput Vis Image Understand 115(6):750–758

  9.

    Chum O, Philbin J, Zisserman A (2008) Near duplicate image detection: min-hash and tf-idf weighting. In: Proceedings of the British machine vision conference

  10.

    Chum O, Perdoch M, Matas J (2009) Geometric min-hashing: finding a (thick) needle in a haystack. In: IEEE computer society conference on computer vision and pattern recognition, pp 17–24

  11.

    De Oliveira R, Cherubini M, Oliver N (2010) Looking at near-duplicate videos from a human-centric perspective. ACM Trans Multimedia Comput Commun Appl 6(3):15:1–15:22

  12.

    Eastlake D, Jones P (2001) RFC 3174. http://tools.ietf.org/html/rfc3174

  13.

    Freeman W, Adelson E (1991) The design and use of steerable filters. IEEE Trans Pattern Anal Mach Intell 13(9):891–906

  14.

    Grauman K, Darrell T (2005) The pyramid match kernel: discriminative classification with sets of image features. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 1458–1465

  15.

    Hu Y, Cheng X, Chia L-T, Xie X, Rajan D, Tan A-H (2009) Coherent phrase model for efficient image near-duplicate retrieval. IEEE Trans Multimedia 11(8):1434–1445

  16.

    Huiskes MJ, Lew MS (2008) The MIR Flickr retrieval evaluation. In: MIR ’08: proceedings of the 2008 ACM International conference on multimedia information retrieval. ACM, New York, NY

  17.

    Johnson AE, Hebert M (1999) Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans Pattern Anal Mach Intell 21(5):433–449

  18.

    Jonker R, Volgenant A (1987) A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38(4):325–340

  19.

    Ke Y, Sukthankar R, Huston L (2004) Efficient near-duplicate detection and sub-image retrieval. In: Proceedings of ACM multimedia, pp 869–876

  20.

    Koenderink J, van Doorn A (1987) Representation of local geometry in the visual system. Biol Cybern 55:367–375

  21.

    Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the 2006 IEEE computer society conference on Computer Vision and Pattern Recognition, CVPR ’06, pp 2169–2178

  22.

    Lazebnik S, Raginsky M (2009) Supervised learning of quantizer codebooks by information loss minimization. IEEE Trans Pattern Anal Mach Intell 31(7):1294–1309

  23.

    Lejsek H, Þormóðsdóttir H, Ásmundsson F, Daðason K, Jóhannsson ÁÞ, Jónsson BÞ, Amsaleg L (2010) Videntifier forensic: large-scale video identification in practice. In: Proceedings of the ACM workshop on multimedia in forensics, security and intelligence, pp 1–6

  24.

    Leung T, Malik J (1999) Recognizing surfaces using three-dimensional textons. In: Proceedings of the IEEE international conference on computer vision, pp 1010–1017

  25.

    Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110

  26.

    Matas J, Chum O, Urban M, Pajdla T (2002) Robust wide-baseline stereo from maximally stable extremal regions. In: Proceedings of the British machine vision conference, pp 384–393

  27.

    Mikolajczyk K, Schmid C (2004) Scale & affine invariant interest point detectors. Int J Comput Vis (IJCV) 60(1):63–86

  28.

    Mikolajczyk K, Schmid C (2005) A performance evaluation of local descriptors. IEEE Trans Pattern Anal Mach Intell (PAMI) 27(10):1615–1630

  29.

    Nistér D, Stewénius H (2006) Scalable recognition with a vocabulary tree. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), pp 2161–2168

  30.

    Papadimitriou CH, Steiglitz K (1982) Combinatorial optimization: algorithms and complexity. Prentice-Hall, Inc

  31.

    Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of the International conference on computer vision and pattern recognition

  32.

    Rivest RL (1992) RFC 1321. http://tools.ietf.org/html/rfc1321

  33.

    Rosten E, Drummond T (2006) Machine learning for high-speed corner detection. In: Proceedings of the European conference on computer vision, pp 430–443

  34.

    Ji R, Yao H, Liu W, Sun X, Tian Q (2012) Task-dependent visual-codebook compression. IEEE Trans Image Process 21(4):2282–2293

  35.

    Ji R, Duan L-Y, Chen J, Xie L, Yao H, Gao W (2013) Learning to distribute vocabulary indexing for scalable visual search. IEEE Trans Multimedia 15(1):153–166

  36.

    Saffari A, Bischof H (2007) Clustering in a boosting framework. In: Computer vision winter workshop, pp 75–82

  37.

    Salton G, McGill M (1983) Introduction to modern information retrieval. McGraw-Hill

  38.

    Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manage 24(5):513–523

  39.

    Sivic J, Russell BC, Efros AA, Zisserman A, Freeman WT (2005) Discovering object categories in image collections. In: Proceedings of the international conference on computer vision

  40.

    Swain MJ, Ballard DH (1991) Color indexing. Int J Comput Vis 7(1):11–32

  41.

    Szeliski R (2010) Computer vision: algorithms and applications. Springer. Available at http://szeliski.org/Book

  42.

    van Gemert JC, Veenman CJ, Smeulders AWM, Geusebroek JM (2010) Visual word ambiguity. IEEE Trans Pattern Anal Mach Intell 32(7):1271–1283

  43.

    Wang Y, Hou Z, Leman K (2011) Keypoint-based near-duplicate images detection using affine invariant feature and color matching. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), pp 1209–1212

  44.

    Wu Z, Ke Q, Isard M, Sun J (2009) Bundling features for large scale partial-duplicate web image search. In: Proceedings of the international conference on computer vision and pattern recognition, pp 25–32

  45.

    Xu D, Chang S-F (2007) Visual event recognition in news video using kernel methods with multi-level temporal alignment. In: Proceedings of the IEEE international conference on computer vision and pattern recognition

  46.

    Xu D, Cham TJ, Yan S, Duan L, Chang S-F (2010) Near duplicate identification with spatially aligned pyramid matching. IEEE Trans Circuits Syst Video Technol (TCSVT) 20(8):1068–1079

  47.

    Zhang D-Q, Chang S-F (2004) Detecting image near-duplicate by stochastic attributed relational graph matching with learning. In: Proceedings of the ACM multimedia conference, pp 877–884

  48.

    Zhao W-L, Ngo C-W, Tan H-K, Wu X (2007) Near-duplicate keyframe identification with interest point matching and pattern learning. IEEE Trans Multimedia 9(5):1037–1048

  49.

    Zhao W-L, Ngo C-W (2009) Scale-rotation invariant pattern entropy for keypoint-based near-duplicate detection. IEEE Trans Image Process 18(2):412–423

  50.

    Zhao W-L, Wu X, Ngo C-W (2010) On the annotation of web videos by efficient near-duplicate search. IEEE Trans Multimedia 12(5):448–461

  51.

    Zhao W-L, Wu X, Ngo C-W (2011) SOTU: a toolkit for efficient near-duplicate image/video & retrieval/detection. Manual for SOTU Version 1.06. http://www.cs.cityu.edu.hk/~wzhao2/sotu.htm

  52.

    Zhu J, Hoi SC, Lyu MR, Yan S (2008) Near-duplicate keyframe retrieval by nonrigid image matching. In: Proceedings of the ACM multimedia conference, pp 41–50



Part of this work was performed in the project PANORAMA, co-funded by grants from Belgium, Italy, France, the Netherlands, the United Kingdom, and the ENIAC Joint Undertaking. The authors would like to thank Giuseppe Claudio Guarnera, Tony Meccio and Rosetta Rizzo for their help at the beginning of this work.

Author information

Correspondence to Giovanni Maria Farinella.


About this article

Cite this article

Battiato, S., Farinella, G.M., Puglisi, G. et al. Aligning codebooks for near duplicate image detection. Multimed Tools Appl 72, 1483–1506 (2014). https://doi.org/10.1007/s11042-013-1470-4



  • Image forensics
  • Near duplicate images
  • Image retrieval
  • Bags of visual words
  • Bags of visual phrases
  • Codebooks alignment