
Robust zero-shot discrete hashing with noisy labels for cross-modal retrieval

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Recently, zero-shot hashing methods have been successfully applied to cross-modal retrieval. However, these methods typically assume that the training labels are accurate and noise-free, which is unrealistic in real-world scenarios because of the noise introduced by manual or automatic annotation. To address this problem, we propose a robust zero-shot discrete hashing with noisy labels (RZSDH) method, which explicitly accounts for the impact of noisy labels in real scenes. RZSDH imposes sparse and low-rank constraints on the noise matrix and the recovered label matrix, respectively, to effectively reduce the negative impact of noisy labels, which significantly enhances its robustness in practical cross-modal retrieval tasks. In addition, RZSDH learns a representation vector for each category attribute, which effectively captures the relationship between seen and unseen classes. Furthermore, our approach learns a common latent representation with drift from multimodal features, which is more conducive to obtaining stable hash codes and hash functions. Finally, we employ a fine-grained similarity-preserving strategy to generate more discriminative hash codes. Experiments on several benchmark datasets verify the effectiveness and robustness of the proposed RZSDH method.
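To make the key idea of the label-cleaning step concrete, the sketch below is an illustration only, not the paper's actual optimization: it separates a noisy label matrix Y into a low-rank recovered label matrix L and a sparse noise matrix E (Y ≈ L + E) by alternating singular value thresholding and element-wise soft thresholding. All function names and hyperparameters (lam_lowrank, lam_sparse, n_iters) are assumptions introduced for this example.

# Illustrative sketch (assumed formulation): decompose a noisy label matrix Y
# into a low-rank recovered label matrix L and a sparse noise matrix E,
# i.e. Y ~ L + E, via alternating proximal (shrinkage) steps.
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(M, tau):
    """Element-wise shrinkage: proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def recover_labels(Y, lam_lowrank=1.0, lam_sparse=0.1, n_iters=100):
    """Alternately update the low-rank label estimate L and the sparse
    noise estimate E so that Y is approximated by L + E."""
    L = np.zeros_like(Y)
    E = np.zeros_like(Y)
    for _ in range(n_iters):
        L = svt(Y - E, lam_lowrank)            # low-rank constraint on recovered labels
        E = soft_threshold(Y - L, lam_sparse)  # sparse constraint on label noise
    return L, E

# Usage example: a small 0/1 label matrix with roughly 5% of its entries flipped.
rng = np.random.default_rng(0)
Y_clean = (rng.random((20, 5)) > 0.7).astype(float)
noise_mask = rng.random(Y_clean.shape) < 0.05
Y_noisy = np.abs(Y_clean - noise_mask)         # flip the selected entries
L_rec, E_rec = recover_labels(Y_noisy)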


Data availability

All data for this study are available from public repositories.


Acknowledgements

This work was supported by the National Natural Science Foundation of China [Grant nos. 61603159, 62162033, U21B2027], Yunnan Provincial Major Science and Technology Special Plan Projects [Grant nos. 202002AD080001, 202103AA080015], Yunnan Foundation Research Projects [Grant nos. 202101AT070438, 202101BE070001-056].

Author information


Corresponding author

Correspondence to Zhenqiu Shu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yong, K., Shu, Z., Wang, H. et al. Robust zero-shot discrete hashing with noisy labels for cross-modal retrieval. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02131-5
