
SEMICON: A Learning-to-Hash Solution for Large-Scale Fine-Grained Image Retrieval

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

In this paper, we propose Suppression-Enhancing Mask based attention and Interactive Channel transformatiON (SEMICON) to learn binary hash codes for large-scale fine-grained image retrieval. In SEMICON, we first develop a suppression-enhancing mask (SEM) based attention to dynamically localize discriminative image regions. More importantly, unlike existing attention mechanisms that simply erase previously found discriminative regions, our SEM restrains such regions and then discovers other complementary regions by considering the relation between activated regions in a stage-by-stage fashion. In each stage, the interactive channel transformation (ICON) module is then applied to exploit correlations across channels of the attended activation tensors. Since channels generally correspond to parts of fine-grained objects, part correlations can be modeled accordingly, which further improves fine-grained retrieval accuracy. Moreover, to keep computation economical, ICON is realized by an efficient two-step process. Finally, the hash learning of SEMICON consists of both global- and local-level branches, to better represent fine-grained objects and then generate binary hash codes that explicitly correspond to multiple levels. Experiments on five benchmark fine-grained datasets show the superiority of our method over competing methods. (Code is available at https://github.com/NJUST-VIPGroup/SEMICON.)
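To make the retrieval setting concrete, the following is a minimal sketch of how multi-level binary hash codes might be formed and compared. It is not the SEMICON implementation: the sign-based binarization, the concatenation of global- and local-level codes, and every function name here (`binarize`, `multi_level_code`, `hamming`, `rank`) are assumptions chosen for illustration, and the feature values are made up.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map real-valued features to a {-1, +1} code with the sign function.

    Assumption: zero is mapped to +1 so that every entry is binary.
    """
    return [1 if x >= 0 else -1 for x in features]


def multi_level_code(
    global_feat: Sequence[float],
    local_feats: Sequence[Sequence[float]],
) -> List[int]:
    """Concatenate a global-level code with one code per local-level branch.

    Hypothetical layout: the global bits come first, then each local branch.
    """
    code = binarize(global_feat)
    for feat in local_feats:
        code.extend(binarize(feat))
    return code


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered by increasing Hamming distance."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Illustrative values only: one 2-d global feature and two 2-d local features.
query = multi_level_code([0.7, -0.2], [[0.1, -0.9], [-0.3, 0.4]])
database = [
    [1, -1, 1, -1, -1, 1],   # identical to the query code
    [-1, 1, -1, 1, 1, -1],   # every bit flipped
    [1, -1, 1, -1, 1, 1],    # one bit flipped
]
print(rank(query, database))
```

Because the codes are binary, ranking a database reduces to counting differing bits, which is what makes hashing attractive for large-scale retrieval.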

Y. Shen, X. Sun, X.-S. Wei and J. Yang are also with the Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and the Jiangsu Key Lab of Image and Video Understanding for Social Security, Nanjing University of Science and Technology, China. This work is supported by the National Key R&D Program of China (2021YFA1001100), the Natural Science Foundation of Jiangsu Province of China under Grant BK20210340, the Fundamental Research Funds for the Central Universities (No. 30920041111, No. NJ2022028), the CAAI-Huawei MindSpore Open Fund, the Beijing Academy of Artificial Intelligence (BAAI), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_0463).

Acknowledgement

The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions. We gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks) and Ascend AI Processor used for this research.

Author information

Corresponding author

Correspondence to Xiu-Shen Wei.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shen, Y., Sun, X., Wei, XS., Jiang, QY., Yang, J. (2022). SEMICON: A Learning-to-Hash Solution for Large-Scale Fine-Grained Image Retrieval. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13674. Springer, Cham. https://doi.org/10.1007/978-3-031-19781-9_31

  • DOI: https://doi.org/10.1007/978-3-031-19781-9_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19780-2

  • Online ISBN: 978-3-031-19781-9

  • eBook Packages: Computer Science, Computer Science (R0)
