Abstract
In the standard vector of locally aggregated descriptors (VLAD) model, the norm of the residual between each descriptor and its assigned cluster center varies significantly across descriptors, which corrupts visual similarity measures that rely on norms. To address this problem, this paper uses a weighted residual that partially balances the distribution of residual norms across the descriptors aggregated into the VLAD vector. Moreover, to improve retrieval accuracy, local gradient and local color features are extracted from images and encoded with the proposed weighted VLAD method. Global structural features and multiple weighted VLAD vectors are then fused to represent an image. Because compressed sensing (CS) captures the key information of the original signal, it is used as a converter that maps the different features into a common feature subspace, where they can be fused more effectively. The proposed method is evaluated on three benchmark datasets: Holidays, Ukbench, and Oxford5k. Experimental results show that it performs well compared with other methods.
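The two ideas in the abstract, residual weighting and CS-based conversion to a common subspace, can be illustrated with a minimal sketch. The abstract gives neither the weighting formula nor the measurement matrix, so both are assumptions here: the weight `1 / ||r||^alpha` and the random Gaussian matrix are stand-ins, and the function names, parameter values, and feature dimensions are hypothetical.

```python
import numpy as np

def weighted_vlad(descriptors, centers, alpha=0.5, eps=1e-12):
    """Aggregate local descriptors into a VLAD vector with reweighted residuals."""
    # Hard-assign each descriptor to its nearest cluster center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    residuals = descriptors - centers[nearest]
    # ASSUMED weight: 1 / ||r||^alpha. alpha=0 gives standard VLAD;
    # alpha=1 scales every residual to unit length.
    norms = np.linalg.norm(residuals, axis=1, keepdims=True)
    weighted = residuals / np.maximum(norms, eps) ** alpha
    # Sum the weighted residuals per center, then flatten and L2-normalize.
    vlad = np.zeros_like(centers, dtype=float)
    np.add.at(vlad, nearest, weighted)
    vlad = vlad.ravel()
    return vlad / max(np.linalg.norm(vlad), eps)

def cs_project(feature, m, seed):
    """Map a feature of any length to m dimensions.

    ASSUMED measurement matrix: random Gaussian. The fixed seed keeps the
    same matrix for every image of a given feature type.
    """
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((m, feature.size)) / np.sqrt(m)
    return phi @ feature

# Toy data standing in for gradient (64-D) and color (11-D) local descriptors.
rng = np.random.default_rng(0)
grad_vlad = weighted_vlad(rng.standard_normal((200, 64)), rng.standard_normal((8, 64)))
color_vlad = weighted_vlad(rng.standard_normal((200, 11)), rng.standard_normal((8, 11)))

# Project both to the same 128-D subspace, then fuse by concatenation.
fused = np.concatenate([
    cs_project(grad_vlad, 128, seed=1),
    cs_project(color_vlad, 128, seed=2),
])
```

Concatenation is only one possible fusion rule. Once both features live in a subspace of the same dimension, a weighted sum would also be possible, and the paper may use a different scheme.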
Cite this article
Wang, Y., Cen, Y., Zhao, R. et al. Compressed sensing based feature fusion for image retrieval. J Ambient Intell Human Comput 14, 14893–14905 (2023). https://doi.org/10.1007/s12652-018-0895-z
DOI: https://doi.org/10.1007/s12652-018-0895-z