Abstract
Recently, zero-shot hashing methods have been successfully applied to cross-modal retrieval. However, these methods typically assume that training labels are accurate and noise-free, which is unrealistic in real-world scenarios because manual or automatic annotation inevitably introduces noise. To address this problem, we propose a robust zero-shot discrete hashing with noisy labels (RZSDH) method, which explicitly accounts for the impact of noisy labels in real-world data. RZSDH imposes a sparse constraint on the noise matrix and a low-rank constraint on the recovered label matrix to reduce the negative impact of noisy labels, which significantly enhances its robustness in practical cross-modal retrieval tasks. Additionally, RZSDH learns a representation vector for each category attribute, which effectively captures the relationship between seen and unseen classes. Furthermore, our approach learns a common latent representation with drift from multimodal data features, which is more conducive to obtaining stable hash codes and hash functions. Finally, we employ a fine-grained similarity-preserving strategy to generate more discriminative hash codes. Experiments on several benchmark datasets verify the effectiveness and robustness of RZSDH.
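The noise model described above (observed labels split into a low-rank recovered label matrix plus a sparse noise matrix) resembles robust PCA. The sketch below illustrates that idea only; it is not the RZSDH optimization, which the abstract does not specify. It assumes a simple Frobenius-norm data term and alternates two standard proximal steps: singular value thresholding (the proximal operator of the nuclear norm, as used in the SVT algorithm of Cai, Candès and Shen) and entry-wise soft-thresholding (the proximal operator of the L1 norm). The function names and the thresholds `tau_low_rank` and `tau_sparse` are hypothetical choices for illustration.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entry-wise shrinkage: proximal operator of tau * ||X||_1 (promotes sparsity)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values: proximal operator of tau * ||X||_* (promotes low rank)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def recover_labels(Y, tau_low_rank=1.0, tau_sparse=0.3, n_iter=100):
    """Split a noisy label matrix Y into a low-rank part L and a sparse noise part E.

    Hypothetical objective (block coordinate descent):
        0.5 * ||Y - L - E||_F^2 + tau_low_rank * ||L||_* + tau_sparse * ||E||_1
    Each update is the exact minimizer over one block, so the objective never increases.
    """
    L = np.zeros_like(Y, dtype=float)
    E = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        L = singular_value_threshold(Y - E, tau_low_rank)  # low-rank label estimate
        E = soft_threshold(Y - L, tau_sparse)              # sparse noise estimate
    return L, E


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy clean labels: 30 samples, 3 classes, one-hot rows (rank <= 3).
    clean = np.eye(3)[rng.integers(0, 3, size=30)]
    noisy = clean.copy()
    flip = rng.random(noisy.shape) < 0.05  # corrupt roughly 5% of the entries
    noisy[flip] = 1.0 - noisy[flip]
    L, E = recover_labels(noisy)
    print("rank of recovered labels:", np.linalg.matrix_rank(L, tol=1e-6))
    print("nonzero noise entries:", int(np.count_nonzero(E)))
```

Larger `tau_low_rank` pushes the recovered labels toward lower rank, while larger `tau_sparse` treats fewer entries as noise; how RZSDH balances these terms against its hashing objectives is not stated in the abstract.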
Data availability
All data for this study are available from public repositories.
Acknowledgements
This work was supported by the National Natural Science Foundation of China [Grant nos. 61603159, 62162033, U21B2027], Yunnan Provincial Major Science and Technology Special Plan Projects [Grant nos. 202002AD080001, 202103AA080015], Yunnan Foundation Research Projects [Grant nos. 202101AT070438, 202101BE070001-056].
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yong, K., Shu, Z., Wang, H. et al. Robust zero-shot discrete hashing with noisy labels for cross-modal retrieval. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02131-5
DOI: https://doi.org/10.1007/s13042-024-02131-5