Fast unsupervised consistent and modality-specific hashing for multimedia retrieval

  • Original Article
  • Neural Computing and Applications

Abstract

Hashing is an effective technique for addressing the storage problem of large-scale data and achieving efficient retrieval, and it is also a core technology for promoting the intelligent development of new infrastructure construction. In most practical situations, label information is unavailable, and manual annotation is time-consuming and laborious. Unsupervised cross-modal hashing has therefore received extensive attention from the information retrieval community because of its fast retrieval speed and its feasibility without labels. However, existing unsupervised cross-modal hashing methods cannot comprehensively describe the complex relations among different modalities, such as the balance between complementarity and consistency across modalities. In this article, we propose a new unsupervised cross-modal hashing method called Fast Unsupervised Consistent and Modality-Specific Hashing (FUCMSH). FUCMSH consists of two main modules: a shared matrix factorization module (SMFM) and an individual auto-encoding module (IAEM). In the SMFM, FUCMSH dynamically assigns weights to the different modalities to adaptively balance their contributions, which ensures the information completeness of the shared consistent representation. In the IAEM, FUCMSH learns a modality-specific latent representation for each modality through modality-specific linear autoencoders. Moreover, FUCMSH uses transfer learning to link the different modality-specific latent representations. By combining the SMFM and the IAEM, FUCMSH significantly improves the discriminative capability of the generated binary codes. Extensive experimental results demonstrate the superiority of the proposed FUCMSH.
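
The abstract describes the SMFM only at a high level. As a rough, hypothetical illustration of its core idea (a shared representation obtained by factorizing all modalities jointly, while per-modality weights adapt to how well each modality is reconstructed), the NumPy sketch below alternates closed-form updates and then turns the shared representation into binary codes for Hamming-distance retrieval. Everything here is an assumption rather than the authors' formulation: the objective, the inverse-reconstruction-error weighting rule, the ridge-regression hash projection for queries, the regularization constants, and the function names `fit_weighted_cmf`, `learn_hash_projection`, `binarize`, and `hamming_rank`. The IAEM (modality-specific linear autoencoders) and the transfer-learning link are not modeled.

```python
# Hypothetical sketch only: NOT the FUCMSH objective. The weighting rule and the
# query projection are illustrative assumptions.
import numpy as np


def fit_weighted_cmf(Xs, k, lam=1e-2, n_iter=50, seed=0):
    """Adaptive-weighted collective matrix factorization (hypothetical SMFM stand-in).

    Approximately minimizes
        sum_m alpha_m * ||X_m - U_m V||_F^2 + lam * (sum_m ||U_m||_F^2 + ||V||_F^2)
    by alternating closed-form updates. Each X_m has shape (d_m, n), one column per
    paired sample; V (k, n) is the shared consistent representation.
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[1]
    I = np.eye(k)
    V = rng.standard_normal((k, n))
    alphas = np.full(len(Xs), 1.0 / len(Xs))  # start with equal modality weights
    for _ in range(n_iter):
        # Ridge update of each modality basis: U_m = a X V^T (a V V^T + lam I)^{-1}.
        Us = [np.linalg.solve(a * V @ V.T + lam * I, a * V @ X.T).T
              for a, X in zip(alphas, Xs)]
        # Shared representation: (sum_m a U^T U + lam I) V = sum_m a U^T X.
        lhs = sum(a * U.T @ U for a, U in zip(alphas, Us)) + lam * I
        rhs = sum(a * U.T @ X for a, U, X in zip(alphas, Us, Xs))
        V = np.linalg.solve(lhs, rhs)
        # Reweight modalities: a larger reconstruction error gives a smaller weight.
        errors = np.array([np.linalg.norm(X - U @ V) for U, X in zip(Us, Xs)])
        alphas = 1.0 / (2.0 * errors + 1e-12)
        alphas /= alphas.sum()  # keep the weights on a fixed scale relative to lam
    return Us, V, alphas


def learn_hash_projection(X, V, mu=1e-2):
    """Ridge regression P (k, d) mapping one modality's features X (d, n) onto V (k, n)."""
    d = X.shape[0]
    return np.linalg.solve(X @ X.T + mu * np.eye(d), X @ V.T).T


def binarize(Z, center):
    """Threshold real-valued codes Z (k, n) at a per-bit center (k, 1) to get {-1, +1} codes."""
    return np.where(Z >= center, 1, -1).astype(np.int8)


def hamming_rank(query_code, db_codes):
    """Rank database items (columns of db_codes) by Hamming distance to one query code."""
    k = db_codes.shape[0]
    # For +/-1 codes, Hamming distance = (k - <b_q, b_i>) / 2.
    dist = (k - query_code.astype(np.int32) @ db_codes.astype(np.int32)) // 2
    return np.argsort(dist, kind="stable"), dist
```

Centering each bit at its training mean (`center = V.mean(axis=1, keepdims=True)`) before thresholding keeps the bits roughly balanced. A ridge projection fitted per modality is one common way to hash out-of-sample queries in matrix-factorization hashing and is used here only for illustration: to search with features from one modality, project them with that modality's `P`, binarize the projected columns with the same `center`, and rank the database codes with `hamming_rank`.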

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Data Availability Statement

This publication is supported by multiple datasets, which are openly available at the hyperlinks in the dataset section or at the locations cited in the reference section.

Notes

  1. http://svcl.ucsd.edu/projects/crossmodal/.

  2. https://press.liacs.nl/mirflickr/mirdownload.html.

  3. https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html.

References

  1. Gao D, Jin L, Chen B, Qiu M, Li P, Wei Y, Hu Y, Wang H (2020) FashionBERT: text and image matching with adaptive loss for cross-modal retrieval. In: ACM SIGIR. ACM, pp 2251–2260

  2. Lin K, Xu X, Gao L, Wang Z, Shen HT (2020) Learning cross-aligned latent embeddings for zero-shot cross-modal retrieval. In: AAAI. AAAI Press, pp 11515–11522

  3. Wang B, Yang Y, Xu X, Hanjalic A, Shen HT (2017) Adversarial cross-modal retrieval. In: ACM MM. ACM, pp 154–162

  4. Wu Y, Wang S, Huang Q (2020) Online fast adaptive low-rank similarity learning for cross-modal retrieval. IEEE Trans Multimed 22(5):1310–1322

  5. Zhang Y, Zhou W, Wang M, Tian Q, Li H (2021) Deep relation embedding for cross-modal retrieval. IEEE Trans Image Process 30:617–627

  6. Wang Z, Zhang Z, Luo Y, Huang Z, Shen HT (2021) Deep collaborative discrete hashing with semantic-invariant structure construction. IEEE Trans Multimed 23:1274–1286

  7. Qiang H, Wan Y, Liu Z, Xiang L, Meng X (2020) Discriminative deep asymmetric supervised hashing for cross-modal retrieval. Knowl Based Syst 204:106188

  8. Yang Z, Long J, Zhu L, Huang W (2020) Nonlinear robust discrete hashing for cross-modal retrieval. In: ACM SIGIR, pp 1349–1358

  9. Li Z, Tang J, Zhang L, Yang J (2020) Weakly-supervised semantic guided hashing for social image retrieval. Int J Comput Vis 128(8):2265–2278

  10. Fang Y, Li B, Li X, Ren Y (2021) Unsupervised cross-modal similarity via latent structure discrete hashing factorization. Knowl Based Syst 218:106857

  11. Mandal D, Chaudhury KN, Biswas S (2019) Generalized semantic preserving hashing for cross-modal retrieval. IEEE Trans Image Process 28(1):102–112

  12. Bronstein MM, Bronstein AM, Michel F, Paragios N (2010) Data fusion through cross-modality metric learning using similarity-sensitive hashing. In: CVPR, pp 3594–3601

  13. Liu X, Nie X, Zeng W, Cui C, Zhu L, Yin Y (2018) Fast discrete cross-modal hashing with regressing from semantic labels. In: ACM MM, pp 1662–1669

  14. Shen F, Shen C, Liu W, Shen HT (2015) Supervised discrete hashing. In: IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015, pp 37–45

  15. Luo X, Zhang P, Wu Y, Chen Z, Huang H, Xu X (2018) Asymmetric discrete cross-modal hashing. In: ICMR, pp 204–212

  16. Liu H, Wang R, Shan S, Chen X (2016) Deep supervised hashing for fast image retrieval. In: CVPR, pp 2064–2072

  17. Yang Z, Raymond OI, Huang W, Liao Z, Zhu L, Long J (2020) Scalable deep asymmetric hashing via unequal-dimensional embeddings for image similarity search. Neurocomputing 412:262–275

  18. Li F, Wang T, Zhu L, Zhang Z, Wang X (2021) Task-adaptive asymmetric deep cross-modal hashing. Knowl Based Syst 219:106851

  19. Deng C, Yang E, Liu T, Li J, Liu W, Tao D (2019) Unsupervised semantic-preserving adversarial hashing for image search. IEEE Trans Image Process 28(8):4032–4044

  20. Song J, Yang Y, Yang Y, Huang Z, Shen HT (2013) Inter-media hashing for large-scale retrieval from heterogeneous data sources. In: SIGMOD, pp 785–796

  21. Zhou J, Ding G, Guo Y (2014) Latent semantic sparse hashing for cross-modal similarity search. In: SIGIR, pp 415–424

  22. He K, Wen F, Sun J (2013) K-means hashing: an affinity-preserving quantization method for learning binary compact codes. In: CVPR, pp 2938–2945

  23. Shen F, Xu Y, Liu L, Yang Y, Huang Z, Shen HT (2018) Unsupervised deep hashing with similarity-adaptive and discrete optimization. IEEE Trans Pattern Anal Mach Intell 40(12):3034–3044

  24. Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: NIPS, pp 1753–1760

  25. Zhang H, Liu L, Long Y, Shao L (2018) Unsupervised deep hashing with pseudo labels for scalable image retrieval. IEEE Trans Image Process 27(4):1626–1638

  26. Fang Y, Zhang H, Ren Y (2019) Unsupervised cross-modal retrieval via multi-modal graph regularized smooth matrix factorization hashing. Knowl Based Syst 171:69–80

  27. Yu J, Zhou H, Zhan Y, Tao D (2021) Deep graph-neighbor coherence preserving network for unsupervised cross-modal hashing. In: AAAI. AAAI Press, pp 4626–4634

  28. Zhang D, Li W (2014) Large-scale supervised multimodal hashing with semantic correlation maximization. In: AAAI, pp 2177–2183

  29. Xu X, Shen F, Yang Y, Shen HT, Li X (2017) Learning discriminative binary codes for large-scale cross-modal retrieval. IEEE Trans Image Process 26(5):2494–2507

  30. Lin Z, Ding G, Hu M, Wang J (2015) Semantics-preserving hashing for cross-view retrieval. In: CVPR, pp 3864–3872

  31. Kim S, Choi S (2013) Multi-view anchor graph hashing. In: ICASSP. IEEE, pp 3123–3127

  32. Meng M, Wang H, Yu J, Chen H, Wu J (2021) Asymmetric supervised consistent and specific hashing for cross-modal retrieval. IEEE Trans Image Process 30:986–1000

  33. Sun L, Ji S, Ye J (2008) A least squares formulation for canonical correlation analysis. In: ICML, ACM International Conference Proceeding Series, vol 307, pp 1024–1031

  34. Kumar S, Udupa R (2011) Learning hash functions for cross-view similarity search. In: IJCAI, pp 1360–1365

  35. Lee K, Chen X, Hua G, Hu H, He X (2018) Stacked cross attention for image-text matching. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) ECCV, Lecture notes in computer science, vol 11208. Springer, Berlin, pp 212–228

  36. Ding G, Guo Y, Zhou J (2014) Collective matrix factorization hashing for multimodal data. In: CVPR, pp 2083–2090

  37. Wang D, Wang Q, He L, Gao X, Tian Y (2020) Joint and individual matrix factorization hashing for large-scale cross-modal retrieval. Pattern Recognit 107:107479

  38. Cheng M, Jing L, Ng MK (2020) Robust unsupervised cross-modal hashing for multimedia retrieval. ACM Trans Inf Syst 38(3):30:1-30:25

  39. Wang L, Yang J, Zareapoor M, Zheng Z (2021) Cluster-wise unsupervised hashing for cross-modal similarity search. Pattern Recognit 111:107732

  40. Ji D, Gao J, Fei H, Teng C, Ren Y (2020) A deep neural network model for speakers coreference resolution in legal texts. Inf Process Manag 57(6):102365

  41. Farrugia RA, Guillemot C (2020) Light field super-resolution using a low-rank prior and deep convolutional neural networks. IEEE Trans Pattern Anal Mach Intell 42(5):1162–1175

  42. Zhang C, Liu A, Liu X, Xu Y, Yu H, Ma Y, Li T (2021) Interpreting and improving adversarial robustness of deep neural networks with neuron sensitivity. IEEE Trans Image Process 30:1291–1304

  43. Fu X, Wang W, Huang Y, Ding X, Paisley JW (2021) Deep multiscale detail networks for multiband spectral image sharpening. IEEE Trans Neural Netw Learn Syst 32(5):2090–2104

  44. Yang E, Deng C, Liu W, Liu X, Tao D, Gao X (2017) Pairwise relationship guided deep hashing for cross-modal retrieval. In: AAAI, pp 1618–1625

  45. Jiang Q, Li W (2017) Deep cross-modal hashing. In: CVPR, pp 3270–3278

  46. Wu G, Lin Z, Han J, Liu L, Ding G, Zhang B, Shen J (2018) Unsupervised deep hashing via binary latent factor models for large-scale cross-modal retrieval. In: IJCAI, pp 2854–2860

  47. Zhu L, Lu X, Cheng Z, Li J, Zhang H (2020) Deep collaborative multi-view hashing for large-scale image search. IEEE Trans Image Process 29:4643–4655

  48. Zhang J, Peng Y, Yuan M (2018) Unsupervised generative adversarial cross-modal hashing. In: AAAI, pp 539–546

  49. Shao M, Kit D, Fu Y (2014) Generalized transfer subspace learning through low-rank constraint. Int J Comput Vis 109(1–2):74–93

  50. Kafai M, Eshghi K (2019) Croification: accurate kernel classification with the efficiency of sparse linear SVM. IEEE Trans Pattern Anal Mach Intell 41(1):34–48

  51. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  52. Rasiwasia N, Pereira JC, Coviello E, Doyle G, Lanckriet GRG, Levy R, Vasconcelos N (2010) A new approach to cross-modal multimedia retrieval. In: ACM MM. ACM, pp 251–260

  53. Huiskes MJ, Lew MS (2008) The MIR flickr retrieval evaluation. In: ACM SIGMM, pp 39–43

  54. Chua T, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of the 8th ACM international conference on image and video retrieval, CIVR 2009, Santorini Island, Greece, July 8–10, 2009

  55. Chen Z, Wang Y, Li H, Luo X, Nie L, Xu X (2019) A two-step cross-modal hashing by exploiting label correlations and preserving similarity in both steps. In: ACM MM, pp 1694–1702

  56. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLR

  57. Deng J, Dong W, Socher R, Li R, Li K, Li F (2009) Imagenet: a large-scale hierarchical image database. In: CVPR, pp 248–255

  58. Liu H, Lin M, Zhang S, Wu Y, Huang F, Ji R (2018) Dense auto-encoder hashing for robust cross-modality retrieval. In: ACM MM. ACM, pp 1589–1597

  59. Zheng C, Zhu L, Cheng Z, Li J, Liu A (2021) Adaptive partial multi-view hashing for efficient social image retrieval. IEEE Trans Multimed 23:4079–4092

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant 2021YFB3900902, in part by the National Natural Science Foundation of China under Grants (62202501, U2003208), and in part by the Science and Technology Plan of Hunan Province under Grants (2022JJ40638, 2016TP1003).

Author information

Corresponding author

Correspondence to Jun Long.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, Z., Deng, X. & Long, J. Fast unsupervised consistent and modality-specific hashing for multimedia retrieval. Neural Comput & Applic 35, 6207–6223 (2023). https://doi.org/10.1007/s00521-022-08008-4

  • DOI: https://doi.org/10.1007/s00521-022-08008-4
