Abstract
Cross-modal hash retrieval methods improve retrieval speed while reducing storage cost. However, existing methods model intra-modal and inter-modal similarity with insufficient accuracy, and the large gap between modalities leads to semantic bias. In this paper, we propose a Graph Rebasing and Joint Similarity Reconstruction (GRJSR) method for cross-modal hash retrieval. In particular, the graph rebasing module filters out graph nodes with weak similarity and associates graph nodes with strong similarity, producing fine-grained intra-modal similarity relation graphs. The joint similarity reconstruction module further strengthens cross-modal correlation and performs fine-grained similarity alignment between modalities. In addition, we combine the similarity representations of real-valued and hash features to design intra-modal and inter-modal training strategies. We conduct extensive experiments on two cross-modal retrieval datasets, and the results validate the superiority of the proposed method, showing significant improvements in retrieval performance.
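The graph rebasing idea described above can be illustrated with a minimal sketch: given intra-modal features, build a similarity graph, prune weak edges, and reinforce strong ones. The threshold values and the cosine-similarity construction here are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def rebase_graph(features, weak_thresh=0.3, strong_thresh=0.8):
    """Illustrative graph rebasing: prune weak-similarity edges and
    reinforce strong ones in an intra-modal similarity graph.
    Thresholds are assumed values for demonstration only."""
    # Cosine similarity between L2-normalized feature rows.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    rebased = sim.copy()
    # Filter out nodes/edges with weak similarity.
    rebased[sim < weak_thresh] = 0.0
    # Associate nodes with strong similarity by saturating their edges.
    rebased[sim >= strong_thresh] = 1.0
    return rebased

# Example: two near-identical items and one orthogonal item.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
graph = rebase_graph(feats)
```

In this toy example the edge between the two identical items is saturated to 1.0, while the edge to the orthogonal item is pruned to 0.0, yielding the kind of fine-grained relation graph the module aims for.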
Acknowledgements
This work is supported by National Natural Science Foundation of China (Nos. 62276073, 61966004), Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), Guangxi "Bagui Scholar" Teams for Innovation and Research Project, Innovation Project of Guangxi Graduate Education (No. YCBZ2023055), and Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Ethics declarations
Ethical Statement
We affirm that the ideas, concepts, and findings presented in this paper are the result of our own original work, conducted with honesty, rigor, and transparency. We have provided proper citations and references for all sources used, and have clearly acknowledged the contributions of others where applicable.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yao, D., Li, Z. (2023). Graph Rebasing and Joint Similarity Reconstruction for Cross-Modal Hash Retrieval. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14170. Springer, Cham. https://doi.org/10.1007/978-3-031-43415-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43414-3
Online ISBN: 978-3-031-43415-0
eBook Packages: Computer Science (R0)