Abstract
Known for efficient computation and compact storage, hashing has been extensively explored for cross-modal retrieval. Most existing hashing models are predicated on a strict one-to-one correspondence between data points across modalities. In practice, however, cross-modal correspondence may be only partially available. In this work, we introduce an unsupervised hashing technique for semi-paired cross-modal retrieval, named Reconstruction Relations Embedded Hashing (RREH). RREH assumes that multi-modal data share a common subspace. For paired data, RREH captures the latent consistent information of the heterogeneous modalities by learning a shared representation. For unpaired data, it preserves latent discriminative features by embedding into the latent subspace the high-order relationships between unpaired samples and a set of anchors, where these relationships are computed by efficient linear reconstruction. The anchors are sampled from the paired data, which improves the efficiency of hash learning. RREH learns the latent features and the binary codes in a unified framework while preserving the high-order reconstruction relations. With a well-devised objective function and a discrete optimization algorithm, RREH scales to large datasets and supports efficient cross-modal retrieval. In our evaluation, the proposed method is tested on partially paired data and shown to outperform several existing methods.
J. Wang and H. Shi—Equal Contributions.
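To make the anchor-based idea in the abstract concrete, the sketch below illustrates one plausible reading of the pipeline: learn a shared latent representation for paired data, sample anchors from the paired set, linearly reconstruct each unpaired sample from the anchors, transfer the anchors' latent codes through those reconstruction weights, and binarize. All variable names, the SVD-based shared representation, and the ridge-regularized least squares are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy semi-paired data: two modalities with d1 and d2 features.
n_paired, n_unpaired, d1, d2, k = 40, 20, 16, 12, 8
X1_p = rng.standard_normal((n_paired, d1))    # modality 1, paired
X2_p = rng.standard_normal((n_paired, d2))    # modality 2, paired
X1_u = rng.standard_normal((n_unpaired, d1))  # modality 1 only, unpaired

# 1) Shared latent representation for paired data (here a simple SVD of
#    the concatenated features, standing in for RREH's learned subspace).
Z_p, _, _ = np.linalg.svd(np.concatenate([X1_p, X2_p], axis=1),
                          full_matrices=False)
Z_p = Z_p[:, :k]

# 2) Sample anchors from the paired set.
m = 10
idx = rng.choice(n_paired, size=m, replace=False)
A1, Z_a = X1_p[idx], Z_p[idx]

# 3) Linear reconstruction of each unpaired sample over the anchors
#    (ridge-regularized least squares), giving reconstruction weights W.
lam = 1e-2
W = X1_u @ A1.T @ np.linalg.inv(A1 @ A1.T + lam * np.eye(m))

# 4) Embed the reconstruction relations: transfer the anchors'
#    latent codes to the unpaired samples through W.
Z_u = W @ Z_a

# 5) Binarize the latent codes into hash codes.
B_p, B_u = np.sign(Z_p), np.sign(Z_u)
print(B_u.shape)
```

Sampling anchors from the paired subset keeps the reconstruction step small (an m-by-m solve rather than one over all paired points), which is consistent with the efficiency claim in the abstract.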
Acknowledgements
This paper is supported by the Key Research and Development Program of Guangdong Province under grant No. 2021B0101400003.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, J., Shi, H., Luo, K., Zhang, X., Cheng, N., Xiao, J. (2024). RREH: Reconstruction Relations Embedded Hashing for Semi-paired Cross-Modal Retrieval. In: Huang, DS., Zhang, X., Zhang, C. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science(), vol 14879. Springer, Singapore. https://doi.org/10.1007/978-981-97-5675-9_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5674-2
Online ISBN: 978-981-97-5675-9
eBook Packages: Computer Science (R0)