Learning latent hash codes with discriminative structure preserving for cross-modal retrieval

Zhang, Donglin; Wu, Xiao-Jun; Yu, Jun

doi:10.1007/s10044-020-00893-6

Short paper
Published: 24 June 2020

Volume 24, pages 283–297, (2021)
Pattern Analysis and Applications

Abstract

Owing to their low storage cost and computational efficiency, hashing approaches have drawn considerable interest and achieved great success in multimodal retrieval. However, most existing works study the local geometric structure in the original feature space, which suffers from intra- and inter-modality ambiguity and therefore yields hash codes with low discriminative power. To address this issue, we propose a novel cross-modal hashing approach, dubbed discriminative structure preserving hashing (DSPH), that takes both inter- and intra-modality structure preservation into consideration. Specifically, DSPH explores the intra- and inter-modality structure in the constructed common latent space. In addition, local geometric consistency is improved by a supervised shrinking scheme. DSPH learns the hash codes and latent features with a factorization coding scheme. The objective function combines common latent subspace learning with inter- and intra-modality structure embedding. We devise an alternating optimization scheme in which the hash codes are solved bit by bit, which avoids a large quantization error. As a result, DSPH generates more discriminative hash codes. Extensive experimental results on several widely used databases demonstrate that the proposed algorithm outperforms several state-of-the-art cross-media retrieval methods.
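
The general idea of learning a common latent subspace by factorization and binarizing it into hash codes can be illustrated with a minimal sketch. This is not the DSPH algorithm: it omits the structure-preserving terms and the supervised shrinking scheme, and it replaces the bitwise discrete solver with plain sign thresholding. The function names, the ridge parameter `reg`, and the objective `||X1 - U1 V||^2 + ||X2 - U2 V||^2 + reg * (||U1||^2 + ||U2||^2 + ||V||^2)` are assumptions made for illustration only.

```python
import numpy as np


def latent_factorization_hash(X1, X2, n_bits=16, n_iters=20, reg=1e-2, seed=0):
    """Illustrative shared-latent factorization hashing (not DSPH itself).

    X1: (d1, n) features of modality 1, one paired sample per column.
    X2: (d2, n) features of modality 2, same column order as X1.
    Returns codes B in {-1, +1} of shape (n_bits, n), plus U1, U2, V.
    """
    rng = np.random.default_rng(seed)
    n = X1.shape[1]
    V = rng.standard_normal((n_bits, n))  # shared latent representation
    eye = np.eye(n_bits)
    for _ in range(n_iters):
        # Fix V, update each modality basis by ridge regression:
        # U_m = X_m V^T (V V^T + reg I)^{-1}
        G = V @ V.T + reg * eye
        U1 = np.linalg.solve(G, V @ X1.T).T
        U2 = np.linalg.solve(G, V @ X2.T).T
        # Fix U1, U2, update the shared latent factors:
        # (U1^T U1 + U2^T U2 + reg I) V = U1^T X1 + U2^T X2
        A = U1.T @ U1 + U2.T @ U2 + reg * eye
        V = np.linalg.solve(A, U1.T @ X1 + U2.T @ X2)
    # Simplification: threshold by sign instead of a bitwise discrete solver.
    B = np.where(V >= 0, 1, -1).astype(np.int8)
    return B, U1, U2, V


def hamming_distances(B_query, B_db):
    """Pairwise Hamming distances between columns of two {-1, +1} code matrices."""
    n_bits = B_query.shape[0]
    inner = B_query.astype(np.int32).T @ B_db.astype(np.int32)
    return (n_bits - inner) // 2
```

Retrieval then amounts to ranking database items by `hamming_distances` to the query code. Thresholding a real-valued solution in this way is the kind of step that can introduce quantization error, which is the motivation the abstract gives for solving the codes bit by bit instead.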

Acknowledgements

This research was supported by the National Natural Science Foundation of China [Grants 61672265, U1836218] and the 111 Project of the Chinese Ministry of Education under Grant B12018.

Author information

Authors and Affiliations

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214122, China
Donglin Zhang, Xiao-Jun Wu & Jun Yu
Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, 214122, China
Donglin Zhang, Xiao-Jun Wu & Jun Yu

Corresponding author

Correspondence to Xiao-Jun Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, D., Wu, XJ. & Yu, J. Learning latent hash codes with discriminative structure preserving for cross-modal retrieval. Pattern Anal Applic 24, 283–297 (2021). https://doi.org/10.1007/s10044-020-00893-6

Received: 27 June 2019
Accepted: 09 June 2020
Published: 24 June 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s10044-020-00893-6
