
LDGC-Net: learnable descriptor graph convolutional network for image retrieval

  • Original Article
The Visual Computer

Abstract

Image retrieval is the challenging task of searching a database for images similar to a query. Previous learning-based methods adopt various ingenious designs to increase the number of representative positive and negative sample pairs seen during training. Still, the performance of these methods is inherently limited by the size of the mini-batch. To this end, we introduce the learnable descriptor graph convolutional network (LDGC-Net), which effectively enhances the hard-mining ability of the model and sharpens the boundaries between different categories. We present an analysis of why LDGC-Net can aggregate relationships between original descriptors within a mini-batch of constrained size. We also propose an innovative end-to-end training framework built around LDGC-Net that accelerates model convergence for image retrieval. In particular, LDGC-Net can be conveniently integrated into other current methods as a plug-and-play module at negligible computational cost. Experimental results on three benchmark datasets show that the proposed LDGC-Net improves performance compared with several state-of-the-art approaches.
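To make the core idea concrete, the following is a minimal sketch of graph convolution over a mini-batch of descriptors: each descriptor is treated as a graph node, pairwise cosine similarity serves as the adjacency, and one propagation step lets every descriptor aggregate context from its neighbours. The function name, the use of plain cosine similarity, and the dimensions are illustrative assumptions, not the paper's actual (learnable) adjacency or layer design.

```python
import numpy as np

def graph_conv_refine(descriptors, weight, eps=1e-8):
    """One graph-convolution step over a mini-batch of descriptors.

    Hypothetical sketch: cosine-similarity adjacency, row
    normalization, linear projection, ReLU. Not the authors'
    actual LDGC-Net architecture.
    """
    # L2-normalize rows so the Gram matrix equals cosine similarity
    norms = np.linalg.norm(descriptors, axis=1, keepdims=True)
    x = descriptors / np.maximum(norms, eps)
    adj = x @ x.T                        # (B, B) similarity graph
    adj = np.maximum(adj, 0.0)           # keep only positive affinities
    deg = adj.sum(axis=1, keepdims=True)
    adj = adj / np.maximum(deg, eps)     # row-normalize (random-walk style)
    out = adj @ descriptors @ weight     # aggregate neighbours, then project
    return np.maximum(out, 0.0)          # ReLU

rng = np.random.default_rng(0)
batch = rng.standard_normal((8, 16))     # 8 descriptors of dimension 16
w = rng.standard_normal((16, 16)) * 0.1
refined = graph_conv_refine(batch, w)
print(refined.shape)  # (8, 16)
```

Because the adjacency is built from the descriptors themselves, each output row mixes information from similar samples in the batch, which is the mechanism the abstract appeals to for relating descriptors within a constrained mini-batch.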


Data availability

The datasets used in this paper are public datasets and can be obtained by contacting the relevant providers.


Acknowledgements

This work was supported by a grant from the Key Laboratory of Avionics System Integrated Technology, the Fundamental Research Funds for the Central Universities in China (Grant No. 3072022JC0601), and the National Natural Science Foundation of China (Grant No. 41876110).

Author information

Corresponding author

Correspondence to Xingmei Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, X., Wang, J., Kang, M. et al. LDGC-Net: learnable descriptor graph convolutional network for image retrieval. Vis Comput 39, 6639–6653 (2023). https://doi.org/10.1007/s00371-022-02753-2

