Abstract
Online image detection is one of the most critical components of an image deduplication technique for an efficient cloud storage system. Although extensive research has been conducted in this field, the problem remains challenging. Deep learning techniques have achieved significant success on a variety of computer vision problems and have high potential for image deduplication. Deduplication is an efficient method for cloud storage systems that minimizes redundant data at the file or sub-file level using cryptographic hash signatures. Although significant research on offline image deduplication has been reported, limited research is available on online image deduplication. Image matching accuracy and performance have been major challenges for online image deduplication techniques that detect exact or near-exact images. These techniques first extract image features and then match those features to detect duplicate images. In this paper, we propose a deep CNN based online image deduplication technique for a cloud storage system that detects exact and near-exact images using cross-domains, even in the presence of perturbations such as blur, noise, compression, and lighting variations. The experimental results show that the proposed deep CNN based technique performs better in terms of image matching accuracy and performance. The paper also proposes a Hot Decomposition Vector (HDV) for image patch generation to efficiently store the dissimilar parts of near-exact images. The experimental results demonstrate that HDV exhibits higher and more stable image matching accuracy across all three types of image deformation, with relatively small computation time.
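The two-stage idea described in the abstract (hash signatures for exact duplicates, feature matching for near-exact duplicates) can be illustrated with a minimal sketch. This is not the paper's method: the class `DedupIndex`, the cosine-similarity measure, and the 0.95 threshold are all assumptions chosen for illustration, and the feature vectors are supplied by the caller as a stand-in for whatever a CNN would produce.

```python
import hashlib
import math


def file_signature(data: bytes) -> str:
    """Cryptographic hash signature used to detect byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class DedupIndex:
    """Hypothetical index: exact match by hash, near match by features."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold  # assumed similarity cut-off
        self.by_hash = {}           # signature -> stored image id
        self.features = []          # (image id, feature vector) pairs

    def classify(self, image_id, data: bytes, feature_vector):
        """Return ("exact" | "near" | "unique", matching or stored id)."""
        sig = file_signature(data)
        if sig in self.by_hash:
            return ("exact", self.by_hash[sig])

        # Linear scan for the most similar stored feature vector.
        best_id, best_sim = None, -1.0
        for other_id, vec in self.features:
            sim = cosine_similarity(feature_vector, vec)
            if sim > best_sim:
                best_id, best_sim = other_id, sim
        if best_id is not None and best_sim >= self.threshold:
            return ("near", best_id)

        # Nothing similar enough: store the image as a new unique entry.
        self.by_hash[sig] = image_id
        self.features.append((image_id, list(feature_vector)))
        return ("unique", image_id)
```

In this sketch only unique images are added to the index, so a near-exact image is reported against the stored original rather than stored again. A production system would replace the linear scan with an approximate nearest-neighbour structure.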
Acknowledgements
This research is supported by the Department of Science and Technology, Government of India under WOS (Women Scientists Scheme) sponsored research project entitled “Distributed Data Deduplication Technique for efficient Cloud-Based Storage System” under File No: SR/WOS-A/ET-119/2016.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Kaur, R., Bhattacharya, J. & Chana, I. Deep CNN based online image deduplication technique for cloud storage system. Multimed Tools Appl 81, 40793–40826 (2022). https://doi.org/10.1007/s11042-022-13182-7
DOI: https://doi.org/10.1007/s11042-022-13182-7