Abstract
Owing to their low storage cost and high computational efficiency, hashing approaches have attracted considerable interest and achieved great success in multimodal retrieval. However, most existing works exploit the local geometric structure in the original space, which suffers from intra- and inter-modality ambiguity and yields weakly discriminative hash codes. To address this issue, we propose a novel cross-modal hashing approach that preserves both inter- and intra-modality structure, dubbed discriminative structure preserving hashing (DSPH). Specifically, DSPH explores intra- and inter-modality relationships in the latent structure of a constructed common space. In addition, local geometric consistency is improved by a supervised shrinking scheme. DSPH learns the hash codes and latent features with a factorization-based coding scheme. The objective function combines common latent subspace learning with inter- and intra-modality structure embedding. We devise an alternating optimization scheme in which the hash codes are solved bit by bit, so that the large quantization error introduced by relaxing the binary constraints is avoided. As a result, DSPH generates more discriminative hash codes. Extensive experimental results on several widely used databases demonstrate that the proposed algorithm outperforms several state-of-the-art cross-media retrieval methods.
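The abstract states that the hash codes are solved bit by bit, which keeps them binary throughout optimization instead of relaxing them to real values and thresholding afterwards. DSPH's own update rule is not reproduced here. As a hedged illustration only, the sketch below implements the discrete cyclic coordinate descent (DCC) step of supervised discrete hashing (Shen, Shen, Liu and Shen, "Supervised discrete hashing", CVPR 2015), assuming the code subproblem can be written as minimizing f(B) = ||W^T B||_F^2 - 2 tr(B^T Q) over B in {-1, +1}^(L x n). The matrices `W` (L x C) and `Q` (L x n) and the function names `objective` and `dcc_bitwise` are illustrative assumptions, not the paper's notation.

```python
import numpy as np


def objective(B, W, Q):
    """Assumed code subproblem: f(B) = ||W^T B||_F^2 - 2 tr(B^T Q)."""
    return np.sum((W.T @ B) ** 2) - 2.0 * np.sum(B * Q)


def dcc_bitwise(W, Q, B, n_sweeps=3):
    """Bitwise (row-by-row) discrete update of B in {-1, +1}^(L x n).

    With all other rows fixed, row l of B (one bit across all n samples)
    has the closed-form minimizer z = sgn(q - B'^T W' v), where v and q are
    the l-th rows of W and Q, and B', W' are B and W without row l.
    """
    B = B.astype(float).copy()
    L = B.shape[0]
    for _ in range(n_sweeps):
        for l in range(L):
            rest = [i for i in range(L) if i != l]
            v, q = W[l], Q[l]
            z = q - B[rest].T @ (W[rest] @ v)
            # Ties (z == 0) are mapped to +1; either sign is optimal there.
            B[l] = np.where(z >= 0, 1.0, -1.0)
    return B


rng = np.random.default_rng(0)
L, C, n = 8, 5, 50  # code length, inner dimension of W, number of samples
W = rng.standard_normal((L, C))
Q = rng.standard_normal((L, n))
B0 = rng.choice([-1.0, 1.0], size=(L, n))
B = dcc_bitwise(W, Q, B0)
print(objective(B0, W, Q), objective(B, W, Q))
```

Because each row update exactly minimizes f with the other rows held fixed, f never increases from one sweep to the next, and B stays in {-1, +1} at every step. This is the sense in which a bitwise scheme sidesteps the quantization error of relax-then-threshold approaches.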
Acknowledgements
This research was supported by the National Natural Science Foundation of China [Grants 61672265 and U1836218] and the 111 Project of the Chinese Ministry of Education under Grant B12018.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, D., Wu, XJ. & Yu, J. Learning latent hash codes with discriminative structure preserving for cross-modal retrieval. Pattern Anal Applic 24, 283–297 (2021). https://doi.org/10.1007/s10044-020-00893-6