Practical Application of Near Duplicate Detection for Image Database

  • Adi Eshkol
  • Michał Grega
  • Mikołaj Leszczuk
  • Ofer Weintraub
Part of the Communications in Computer and Information Science book series (CCIS, volume 429)


Traditional program guides, TV applications, and online portals alone are no longer sufficient to expose all content, let alone to offer the content that consumers want at the times they are most likely to want it. DEEP (Data Enrichment and Engagement Platform) by Orca Interactive, a comprehensive new content discovery solution, combines search, recommendation, and second-screen devices into a single immersive experience that invites exploration. A key value of DEEP is the automated generation, from Internet sources, of digital magazines for movies, TV shows, cast members, and topics. Unfortunately, using the Internet as a source of pictures can result in the acquisition of so-called “Near Duplicate” (ND) images: similar images from a specific display context, for example multiple red carpet images showing an actor from very similar angles or at very similar degrees of zoom. Therefore, in this paper we present a practical application of ND detection for image databases. The algorithm used is based on the MPEG-7 Colour Structure descriptor. For images provided by the developers of the DEEP software, the algorithm performs very well, and the results are almost identical to those obtained during the training phase.
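The abstract names the MPEG-7 Colour Structure descriptor but does not say how it is computed or compared. The sketch below is a simplified, hypothetical illustration of the idea, not the authors' implementation: each image is summarised by a histogram that records, for every quantised colour, the fraction of positions of a sliding 8x8 structuring element that contain that colour, and two images are flagged as near duplicates when the distance between their histograms falls below a threshold. Assumptions not taken from the paper: uniform RGB quantisation into 64 bins (the normative descriptor uses the HMMD colour space with non-uniform quantisation and rescales the structuring element for large images), the use of the L1 (city-block) distance, images given as equal-length rows of RGB tuples, the function names `colour_structure_histogram` and `is_near_duplicate`, and the threshold value `ND_THRESHOLD`.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b), each 0..255
Image = List[List[Pixel]]      # list of equal-length rows

LEVELS = 4           # hypothetical: quantisation levels per RGB channel (4**3 = 64 bins)
WINDOW = 8           # side of the square structuring element, in pixels
ND_THRESHOLD = 0.2   # hypothetical decision threshold on the L1 distance


def quantise(p: Pixel) -> int:
    """Map an RGB pixel to one of LEVELS**3 uniform colour bins (not HMMD)."""
    r, g, b = (c * LEVELS // 256 for c in p)
    return (r * LEVELS + g) * LEVELS + b


def colour_structure_histogram(img: Image) -> List[float]:
    """For each colour bin, the fraction of window positions containing that colour."""
    if not img or not img[0]:
        raise ValueError("image must be non-empty")
    h, w = len(img), len(img[0])
    q = [[quantise(px) for px in row] for row in img]
    win_h, win_w = min(WINDOW, h), min(WINDOW, w)
    hist = [0] * (LEVELS ** 3)
    positions = 0
    for y in range(h - win_h + 1):
        for x in range(w - win_w + 1):
            # A colour counts once per window, however many of its pixels are inside.
            present = {q[yy][xx] for yy in range(y, y + win_h)
                       for xx in range(x, x + win_w)}
            for bin_index in present:
                hist[bin_index] += 1
            positions += 1
    # Normalise by the number of window positions so each bin lies in [0, 1].
    return [count / positions for count in hist]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """City-block (L1) distance between two equal-length histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def is_near_duplicate(a: Image, b: Image, threshold: float = ND_THRESHOLD) -> bool:
    """Flag two images as near duplicates when their descriptors are close."""
    return l1_distance(colour_structure_histogram(a),
                       colour_structure_histogram(b)) <= threshold


if __name__ == "__main__":
    red = [[(255, 0, 0)] * 16 for _ in range(16)]
    blue = [[(0, 0, 255)] * 16 for _ in range(16)]
    noisy = [row[:] for row in red]
    noisy[0][0] = (0, 255, 0)  # recolour a single pixel
    print(is_near_duplicate(red, noisy), is_near_duplicate(red, blue))  # True False
```

With the hypothetical threshold of 0.2, a uniformly red 16x16 image and a copy with one pixel recoloured green differ by 1/81 (about 0.012), because only one of the 81 window positions sees the green pixel, so they are reported as near duplicates; a uniformly blue image lies at distance 2.0 and is not. In practice the threshold would have to be tuned on labelled data, as the abstract's mention of a training phase suggests.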


Keywords: Image Descriptors, Scalable Colour, Near Duplicates, Query by Example (QbE), MPEG-7





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Adi Eshkol (1)
  • Michał Grega (2)
  • Mikołaj Leszczuk (2)
  • Ofer Weintraub (1)

  1. Orca Interactive, Ra’anana, Israel
  2. AGH University of Science and Technology, Krakow, Poland
