Deep cross-modal discriminant adversarial learning for zero-shot sketch-based image retrieval

  • Original Article
  • Neural Computing and Applications

Abstract

Zero-shot sketch-based image retrieval (ZS-SBIR) extends sketch-based image retrieval (SBIR) to the case where query sketches belong to categories unseen during training. Most previous methods concentrate on preserving semantic knowledge and improving domain alignment, but neglect the correlation between inter-modal features, which results in unsatisfactory performance. Hence, a sketch-image cross-modal retrieval framework is proposed to maximize the sketch-image correlation. For this framework, we develop a discriminant adversarial learning method that incorporates intra-modal discrimination, inter-modal consistency, and inter-modal correlation into a deep network for common feature representation learning. Specifically, sketch and image features are first projected into a shared feature subspace to achieve modality invariance. Subsequently, we adopt a category label predictor to achieve intra-modal discrimination, use adversarial learning to confuse modality information for inter-modal consistency, and introduce correlation learning to maximize inter-modal correlation. Finally, the trained model is evaluated on unseen categories. Extensive experiments on three zero-shot datasets show that this method outperforms state-of-the-art methods. In retrieval accuracy on unseen categories, it exceeds the state-of-the-art methods by approximately 0.6% on the RSketch dataset, 5% on the Sketchy dataset, and 7% on the TU-Berlin dataset. We also conduct experiments on an image-based 3D model scene retrieval dataset, where the proposed method significantly outperforms the state-of-the-art approaches on all standard metrics.
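The three training signals named in the abstract can be illustrated with a small NumPy sketch. The abstract does not give the loss definitions, so everything concrete here is an assumption made for illustration: the linear projections `W_s` and `W_i`, the linear label predictor `W_cls`, the linear modality discriminator `w_dis`, the cosine-based correlation term, and the weights `lam_adv` and `lam_cor` are hypothetical and are not taken from the paper.

```python
import numpy as np

def project(x, W):
    """Map features into the shared subspace and L2-normalise each row (assumed linear)."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels (intra-modal discrimination)."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def modality_bce(scores, is_sketch):
    """Binary cross-entropy of a modality discriminator (1 = sketch, 0 = image)."""
    p = np.clip(1.0 / (1.0 + np.exp(-scores)), 1e-7, 1 - 1e-7)
    return -(is_sketch * np.log(p) + (1 - is_sketch) * np.log(1 - p)).mean()

def correlation_loss(z_sketch, z_image):
    """1 - mean cosine similarity of row-aligned, same-class sketch/image pairs (assumed form)."""
    return 1.0 - (z_sketch * z_image).sum(axis=1).mean()

def encoder_objective(x_s, x_i, y, W_s, W_i, W_cls, w_dis, lam_adv=1.0, lam_cor=1.0):
    """Combine the three terms from the encoders' point of view."""
    z_s, z_i = project(x_s, W_s), project(x_i, W_i)
    z = np.vstack([z_s, z_i])
    l_cls = cross_entropy(z @ W_cls, np.concatenate([y, y]))
    modality = np.concatenate([np.ones(len(z_s)), np.zeros(len(z_i))])
    l_dis = modality_bce(z @ w_dis, modality)
    l_cor = correlation_loss(z_s, z_i)
    # The encoders try to fool the discriminator, so its loss enters with a minus sign.
    total = l_cls + lam_cor * l_cor - lam_adv * l_dis
    return total, (l_cls, l_dis, l_cor)

# Toy data: 4 sketch/image pairs, 3 classes, a 5-dimensional shared subspace.
rng = np.random.default_rng(0)
x_s, x_i = rng.normal(size=(4, 8)), rng.normal(size=(4, 6))
y = np.array([0, 1, 2, 1])
W_s, W_i = rng.normal(size=(8, 5)), rng.normal(size=(6, 5))
W_cls, w_dis = rng.normal(size=(5, 3)), np.zeros(5)
total, (l_cls, l_dis, l_cor) = encoder_objective(x_s, x_i, y, W_s, W_i, W_cls, w_dis)
```

In an adversarial setup of this kind the discriminator would be updated separately to minimise `l_dis`, alternating with the encoder update. With the all-zero discriminator used in the toy data, it outputs probability 0.5 for every sample, which is the fully confused state the encoders aim for.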



Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant 2018YFB2101504, in part by the Key Research and Development Program of Shanxi Province of China under Grant 201903D121147, in part by the Natural Science Foundation of Shanxi Province of China under Grant 201901D111150, and in part by the Research Project Supported by Shanxi Scholarship Council of China under Grant 2020-113.

Author information

Corresponding author

Correspondence to Liqun Kuang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Jiao, S., Han, X., Xiong, F. et al. Deep cross-modal discriminant adversarial learning for zero-shot sketch-based image retrieval. Neural Comput & Applic 34, 13469–13483 (2022). https://doi.org/10.1007/s00521-022-07169-6


  • DOI: https://doi.org/10.1007/s00521-022-07169-6
