FhVLAD: Fine-grained quantization and encoding high-order descriptor statistics for scalable image retrieval

Bhowmick, Alexy; Saharia, Sarat; Hazarika, Shyamanta M.

doi:10.1007/s11042-020-10491-7

1166: Advances of machine learning in data analytics and visual information processing
Published: 02 February 2021

Volume 80, pages 35495–35520, (2021)

Multimedia Tools and Applications


Abstract

We are interested in the encoding of local descriptors of an image (e.g. SIFT) to design a compact representation vector and thereby address scalable image retrieval. We revisit the implicit design choices in the popular vector of locally aggregated descriptors (VLAD), which aggregates the residuals of descriptors to the codewords. VLAD’s use of a coarse codebook and first-order descriptor statistics in residual computation results in less discriminative residuals. To address this problem, we propose a division of codebook feature space using a novel fine-grained quantization strategy. After quantization, we embed the resulting residuals with high-order statistics of descriptor distribution. Experiments on three challenging image retrieval datasets (INRIA Holidays, UKBench, Oxford 5k) confirm the improved discriminative power of our novel encoding method called FhVLAD. We observe superior accuracy to baseline and competitive performance to state-of-the-art techniques with a limited increase in dimension.
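For orientation, the baseline that the abstract revisits can be sketched in a few lines. The block below is a minimal illustration of standard VLAD encoding only: each local descriptor is assigned to its nearest codeword, and the residuals (descriptor minus codeword) are summed per codeword, concatenated, and L2-normalized. The function name `vlad_encode`, the random toy data, and the choice of plain L2 normalization are assumptions made for this example. The fine-grained quantization and high-order statistics of FhVLAD are not specified in this preview, so they are not reproduced here.

```python
import numpy as np


def vlad_encode(descriptors, codebook):
    """Baseline VLAD sketch (hypothetical helper, not the authors' code).

    descriptors: (n, d) array of local descriptors, e.g. SIFT.
    codebook:    (k, d) array of codewords, e.g. k-means centroids.
    Returns a vector of length k * d.
    """
    # Squared Euclidean distance from every descriptor to every codeword: (n, k).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # hard assignment to the closest codeword

    k, d = codebook.shape
    v = np.zeros((k, d))
    for i in range(k):
        assigned = descriptors[nearest == i]
        if len(assigned):
            # First-order statistic: sum of residuals to codeword i.
            v[i] = (assigned - codebook[i]).sum(axis=0)

    v = v.ravel()
    norm = np.linalg.norm(v)
    # L2-normalize; an all-zero vector is returned unchanged.
    return v / norm if norm > 0 else v


# Toy usage with random data (assumed sizes: 50 descriptors, 8 dims, 4 codewords).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 8))
descriptors = rng.normal(size=(50, 8))
code = vlad_encode(descriptors, codebook)
print(code.shape)  # (32,)
```

The output dimension is the number of codewords times the descriptor dimension, which is why the abstract's remark about a "limited increase in dimension" matters for any extension that adds further statistics per codeword.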



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Assam Don Bosco University, Airport Road, Azara, Guwahati, 781017, Assam, India
Alexy Bhowmick
Department of Computer Science and Engineering, Tezpur University, Tezpur, 784028, Assam, India
Alexy Bhowmick & Sarat Saharia
Biomimetic Robotics and Artificial Intelligence Lab, Department of Mechanical Engineering, Indian Institute of Technology, Guwahati, 781039, Assam, India
Shyamanta M. Hazarika

Corresponding author

Correspondence to Alexy Bhowmick.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bhowmick, A., Saharia, S. & Hazarika, S.M. FhVLAD: Fine-grained quantization and encoding high-order descriptor statistics for scalable image retrieval. Multimed Tools Appl 80, 35495–35520 (2021). https://doi.org/10.1007/s11042-020-10491-7

Received: 30 April 2020
Revised: 08 November 2020
Accepted: 29 December 2020
Published: 02 February 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s11042-020-10491-7
