Unsupervised multi-graph cross-modal hashing for large-scale multimedia retrieval

Multimedia Tools and Applications

Abstract

With the advance of internet and multimedia technologies, large-scale multi-modal representation techniques such as cross-modal hashing are increasingly in demand for multimedia retrieval. Cross-modal hashing raises three essential problems. First, effective cross-modal relationships should be learned from training data with scarce label information. Second, appropriate weights should be assigned to the different modalities to reflect their importance. Third, the training process should be scalable, which previous methods usually ignore. In this paper, we propose Multi-graph Cross-modal Hashing (MGCMH), which addresses all three points. MGCMH is an unsupervised method that integrates multi-graph learning and hash function learning into a joint framework to learn a unified hash space for all modalities. In MGCMH, the modalities are assigned proper weights for the generation of the multi-graph and of the hash codes, respectively. As a result, more precise cross-modal relationships can be preserved in the hash space. The Nyström approximation is then leveraged to construct the graphs efficiently. Finally, an alternating learning algorithm is proposed to jointly optimize the modality weights, hash codes, and hash functions. Experiments on two real-world multi-modal datasets demonstrate the effectiveness of our method in comparison with several representative cross-modal hashing methods.
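The pipeline named in the abstract (per-modality graphs, modality weights, Nyström approximation, binary codes) can be illustrated with a minimal sketch. This is not the authors' MGCMH algorithm. It rests on several assumptions that do not come from the paper: an RBF kernel as the affinity, randomly chosen landmark points, fixed modality weights (MGCMH learns them by alternating optimization), and codes taken as the sign of a spectral embedding. All function names and parameters are hypothetical.

```python
import numpy as np

def rbf(a, b, gamma):
    # Pairwise RBF affinities between the rows of a and the rows of b.
    d2 = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def nystrom_factor(x, landmarks, gamma):
    # Nystrom approximation K ~= C W^+ C^T, returned as a factor F
    # with F @ F.T ~= K, so the n x n matrix is never formed.
    c = rbf(x, landmarks, gamma)          # n x m cross-affinities
    w = rbf(landmarks, landmarks, gamma)  # m x m landmark affinities
    vals, vecs = np.linalg.eigh(w)
    keep = vals > 1e-10                   # drop numerically null directions
    return c @ vecs[:, keep] / np.sqrt(vals[keep])

def multi_graph_codes(views, weights, n_landmarks, n_bits, gamma, seed=0):
    # views: list of (n x d_v) feature matrices, one per modality.
    # weights: fixed non-negative modality weights (an assumption here).
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    idx = rng.choice(n, size=n_landmarks, replace=False)
    # sum_v weight_v * K_v ~= F F^T with F = [sqrt(weight_v) * F_v] stacked.
    f = np.hstack([np.sqrt(wt) * nystrom_factor(x, x[idx], gamma)
                   for x, wt in zip(views, weights)])
    f = f - f.mean(axis=0)
    # Leading left singular vectors of F are the leading eigenvectors
    # of the combined (centered) low-rank affinity.
    u, _, _ = np.linalg.svd(f, full_matrices=False)
    return np.where(u[:, :n_bits] >= 0, 1, -1)

rng = np.random.default_rng(1)
image_feats = rng.normal(size=(50, 6))   # toy stand-in for image features
text_feats = rng.normal(size=(50, 4))    # toy stand-in for text features
codes = multi_graph_codes([image_feats, text_feats], [0.6, 0.4],
                          n_landmarks=10, n_bits=8, gamma=0.1)
```

With `m` landmarks, each modality costs roughly O(nm² + m³) rather than the O(n²) needed for a full affinity matrix, which is the kind of scalability gain the abstract attributes to the Nyström step.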

Author information

Corresponding author

Correspondence to Lei Zhu.

About this article

Cite this article

Xie, L., Zhu, L. & Chen, G. Unsupervised multi-graph cross-modal hashing for large-scale multimedia retrieval. Multimed Tools Appl 75, 9185–9204 (2016). https://doi.org/10.1007/s11042-016-3432-0

  • DOI: https://doi.org/10.1007/s11042-016-3432-0