
LDGC-Net: learnable descriptor graph convolutional network for image retrieval

  • Original Article
The Visual Computer

Abstract

Image retrieval is the challenging task of searching a database for images similar to a query. Previous learning-based methods adopt various ingenious designs to increase the number of representative positive and negative sample pairs seen during training. Still, the performance of these methods is inherently limited by the size of the mini-batch. To this end, we introduce the learnable descriptor graph convolutional network (LDGC-Net), which effectively enhances the hard-mining ability of the model and sharpens the boundaries between different categories. We present an analysis of why LDGC-Net can aggregate relationships between original descriptors within a mini-batch of constrained size. We also propose an innovative end-to-end training framework built around LDGC-Net that accelerates model convergence for image retrieval. In particular, LDGC-Net can be conveniently integrated into other current methods as a plug-and-play module at negligible computational cost. Experimental results on three benchmark datasets show that the proposed LDGC-Net improves performance compared with several state-of-the-art approaches.
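To make the core idea concrete, the following is a minimal sketch of graph convolution over a mini-batch of descriptors: each descriptor is treated as a graph node, pairwise cosine similarity serves as the adjacency, and one propagation step lets every descriptor aggregate context from its neighbours. The function name, the use of plain cosine similarity, and the dimensions are illustrative assumptions, not the paper's actual (learnable) adjacency or layer design.

```python
import numpy as np

def graph_conv_refine(descriptors, weight, eps=1e-8):
    """One graph-convolution step over a mini-batch of descriptors.

    Hypothetical sketch: cosine-similarity adjacency, row
    normalization, linear projection, ReLU. Not the authors'
    actual LDGC-Net architecture.
    """
    # L2-normalize rows so the Gram matrix equals cosine similarity
    norms = np.linalg.norm(descriptors, axis=1, keepdims=True)
    x = descriptors / np.maximum(norms, eps)
    adj = x @ x.T                        # (B, B) similarity graph
    adj = np.maximum(adj, 0.0)           # keep only positive affinities
    deg = adj.sum(axis=1, keepdims=True)
    adj = adj / np.maximum(deg, eps)     # row-normalize (random-walk style)
    out = adj @ descriptors @ weight     # aggregate neighbours, then project
    return np.maximum(out, 0.0)          # ReLU

rng = np.random.default_rng(0)
batch = rng.standard_normal((8, 16))     # 8 descriptors of dimension 16
w = rng.standard_normal((16, 16)) * 0.1
refined = graph_conv_refine(batch, w)
print(refined.shape)  # (8, 16)
```

Because the adjacency is built from the descriptors themselves, each output row mixes information from similar samples in the batch, which is the mechanism the abstract appeals to for relating descriptors within a constrained mini-batch.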


Data availability

The datasets used in this paper are public datasets and can be obtained by contacting the relevant providers.


Acknowledgements

This work was supported by a grant from the Key Laboratory of Avionics System Integrated Technology, the Fundamental Research Funds for the Central Universities in China (Grant No. 3072022JC0601), and the National Natural Science Foundation of China (Grant No. 41876110).

Author information

Corresponding author

Correspondence to Xingmei Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, X., Wang, J., Kang, M. et al. LDGC-Net: learnable descriptor graph convolutional network for image retrieval. Vis Comput 39, 6639–6653 (2023). https://doi.org/10.1007/s00371-022-02753-2

