Abstract
Cross-modal hash retrieval methods improve retrieval speed while reducing storage cost. However, existing methods model intra-modal and inter-modal similarity with insufficient accuracy, and the large gap between modalities leads to semantic bias. In this paper, we propose a Graph Rebasing and Joint Similarity Reconstruction (GRJSR) method for cross-modal hash retrieval. In particular, the graph rebasing module filters out graph nodes with weak similarity and associates graph nodes with strong similarity, producing fine-grained intra-modal similarity relation graphs. The joint similarity reconstruction module further strengthens cross-modal correlation and performs fine-grained similarity alignment between modalities. In addition, we combine the similarity representations of real-valued and hash features to design intra-modal and inter-modal training strategies. We conduct extensive experiments on two cross-modal retrieval datasets, and the results validate the superiority of the proposed method, showing significant improvements in retrieval performance.
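The graph rebasing idea described above can be illustrated with a minimal sketch: given intra-modal features, build a similarity graph, prune weak edges, and reinforce strong ones. The threshold values and the cosine-similarity construction here are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def rebase_graph(features, weak_thresh=0.3, strong_thresh=0.8):
    """Illustrative graph rebasing: prune weak-similarity edges and
    reinforce strong ones in an intra-modal similarity graph.
    Thresholds are assumed values for demonstration only."""
    # Cosine similarity between L2-normalized feature rows.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    rebased = sim.copy()
    # Filter out nodes/edges with weak similarity.
    rebased[sim < weak_thresh] = 0.0
    # Associate nodes with strong similarity by saturating their edges.
    rebased[sim >= strong_thresh] = 1.0
    return rebased

# Example: two near-identical items and one orthogonal item.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
graph = rebase_graph(feats)
```

In this toy example the edge between the two identical items is saturated to 1.0, while the edge to the orthogonal item is pruned to 0.0, yielding the kind of fine-grained relation graph the module aims for.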
Acknowledgements
This work is supported by National Natural Science Foundation of China (Nos. 62276073, 61966004), Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), Guangxi "Bagui Scholar" Teams for Innovation and Research Project, Innovation Project of Guangxi Graduate Education (No. YCBZ2023055), and Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Ethics declarations
Ethical Statement
We affirm that the ideas, concepts, and findings presented in this paper are the result of our own original work, conducted with honesty, rigor, and transparency. We have provided proper citations and references for all sources used, and have clearly acknowledged the contributions of others where applicable.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yao, D., Li, Z. (2023). Graph Rebasing and Joint Similarity Reconstruction for Cross-Modal Hash Retrieval. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14170. Springer, Cham. https://doi.org/10.1007/978-3-031-43415-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43414-3
Online ISBN: 978-3-031-43415-0
eBook Packages: Computer Science (R0)