
Cross-Modal Visual Correspondences Learning Without External Semantic Information for Zero-Shot Sketch-Based Image Retrieval

  • Conference paper

In: Artificial Intelligence and Robotics (ISAIR 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1998)


Abstract

In this paper, we study zero-shot sketch-based image retrieval (ZS-SBIR), which is challenging because of the modal gap between sketches and images and the semantic inconsistency between seen and unseen categories. Most previous ZS-SBIR methods need external semantic information, i.e., texts and class labels, to minimize the modal gap or the semantic inconsistency. To tackle ZS-SBIR without such labor-intensive external semantic information, we propose a novel method that learns visual correspondences between the two modalities, i.e., sketch and image, to transfer knowledge from seen data to unseen data. The method uses a transformer-based dual-pathway structure to learn the visual correspondences. To eliminate the modal gap between sketch and image, a triplet loss and a Gaussian-distribution-based domain alignment mechanism are applied to the tokens produced by the proposed structure. In addition, knowledge distillation is introduced to maintain the generalization capability of the vision transformer (ViT) used as the backbone. Comprehensive experiments on three benchmark datasets, i.e., Sketchy, TU-Berlin, and QuickDraw, demonstrate that our method outperforms baselines on all three datasets without external semantic information.
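The abstract names three training objectives: a triplet loss on cross-modal tokens, a Gaussian-distribution-based domain alignment term, and knowledge distillation to preserve the ViT backbone's generalization. A minimal NumPy sketch of plausible instantiations follows; the paper's exact formulations may differ, and the function names, the margin value, the standard-normal prior, and the MSE distillation form are all assumptions made for illustration:

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on L2 distances: pull a sketch token toward its
    matching image token and push it from a non-matching one."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()


def gaussian_alignment_loss(features):
    """KL divergence from a diagonal-Gaussian fit of the batch features to a
    standard normal prior -- one plausible reading of a 'Gaussian distribution
    based' domain alignment term applied to both modalities."""
    mu = features.mean(axis=0)
    var = features.var(axis=0) + 1e-8  # avoid log(0)
    return 0.5 * np.sum(var + mu**2 - 1.0 - np.log(var))


def distillation_loss(student_tokens, teacher_tokens):
    """Mean-squared error between student tokens and frozen-teacher (pretrained
    ViT) tokens -- a common form of feature-level knowledge distillation."""
    return np.mean((student_tokens - teacher_tokens) ** 2)
```

Since x - 1 - log(x) is nonnegative for x > 0, the alignment term is a true (nonnegative) KL divergence, and it vanishes only when the batch statistics match the prior, so both modalities are driven toward the same shared distribution.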



Author information

Corresponding author

Correspondence to Zhijie Gao.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Gao, Z., Wang, K. (2024). Cross-Modal Visual Correspondences Learning Without External Semantic Information for Zero-Shot Sketch-Based Image Retrieval. In: Lu, H., Cai, J. (eds) Artificial Intelligence and Robotics. ISAIR 2023. Communications in Computer and Information Science, vol 1998. Springer, Singapore. https://doi.org/10.1007/978-981-99-9109-9_34

  • DOI: https://doi.org/10.1007/978-981-99-9109-9_34

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9108-2

  • Online ISBN: 978-981-99-9109-9

  • eBook Packages: Computer Science, Computer Science (R0)
