Unsupervised multi-graph cross-modal hashing for large-scale multimedia retrieval

Xie, Liang; Zhu, Lei; Chen, Guoqi

doi:10.1007/s11042-016-3432-0

Published: 19 March 2016

Volume 75, pages 9185–9204, (2016)

Multimedia Tools and Applications

Abstract

With the advance of Internet and multimedia technologies, large-scale multi-modal representation techniques such as cross-modal hashing are increasingly in demand for multimedia retrieval. Cross-modal hashing raises three essential problems. First, an effective cross-modal relationship must be learned from training data with scarce label information. Second, appropriate weights should be assigned to the different modalities to reflect their importance. Third, the training process must be scalable, a requirement that previous methods usually ignore. In this paper, we propose Multi-graph Cross-modal Hashing (MGCMH), which addresses all three points. MGCMH is an unsupervised method that integrates multi-graph learning and hash function learning into a joint framework in order to learn a unified hash space for all modalities. In MGCMH, different modalities are assigned proper weights for the generation of the multi-graph and of the hash codes, respectively. As a result, a more precise cross-modal relationship can be preserved in the hash space. The Nyström approximation is then leveraged to construct the graphs efficiently. Finally, an alternating learning algorithm is proposed to jointly optimize the modality weights, hash codes, and hash functions. Experiments on two real-world multi-modal datasets demonstrate the effectiveness of our method in comparison with several representative cross-modal hashing methods.
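To make the idea of a unified hash space concrete, the following minimal Python sketch shows how a query from one modality could be matched against items from another once both are mapped to binary codes. It is an illustration only and not the MGCMH algorithm: the abstract does not specify the form of the hash functions, so the sketch assumes linear sign projections, and the matrices `W_IMAGE` and `W_TEXT`, the toy features, and all function names are hypothetical. In MGCMH the projections and modality weights would be learned jointly rather than hand-picked.

```python
from typing import List, Sequence


def hash_code(x: Sequence[float], W: Sequence[Sequence[float]]) -> int:
    """Pack sign(W x) into an integer, one bit per row of W.

    Assumption: a linear hash function; the abstract does not state its form.
    """
    code = 0
    for row in W:
        activation = sum(w * v for w, v in zip(row, x))
        code = (code << 1) | (1 if activation >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, db_codes: List[int]) -> List[int]:
    """Database indices sorted by Hamming distance to the query (stable sort)."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming(query_code, db_codes[i]))


# Hand-picked toy projections (hypothetical, not learned):
# 2-D "image" features and 3-D "text" features both map to 2-bit codes.
W_IMAGE = [[1.0, 0.0], [0.0, 1.0]]
W_TEXT = [[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]]

image_db = [[0.9, 0.8], [-0.7, 0.6], [-0.5, -0.4]]
db_codes = [hash_code(x, W_IMAGE) for x in image_db]

# A text query retrieves images because both live in the same code space.
text_query = [0.3, 0.1, 0.9]
query_code = hash_code(text_query, W_TEXT)
ranking = rank_by_hamming(query_code, db_codes)
print(db_codes, query_code, ranking)
```

Because the codes are short bit strings, comparing a query against a large database reduces to XOR and bit counting, which is what makes hashing attractive for large-scale retrieval.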

Author information

Authors and Affiliations

Department of Mathematics, Wuhan University of Technology, Wuhan, China
Liang Xie
School of Information Systems, Singapore Management University, Singapore, Singapore
Lei Zhu
School of Automation, Wuhan University of Technology, Wuhan, China
Guoqi Chen

Corresponding author

Correspondence to Lei Zhu.

About this article

Cite this article

Xie, L., Zhu, L. & Chen, G. Unsupervised multi-graph cross-modal hashing for large-scale multimedia retrieval. Multimed Tools Appl 75, 9185–9204 (2016). https://doi.org/10.1007/s11042-016-3432-0

Received: 01 October 2015
Revised: 22 February 2016
Accepted: 03 March 2016
Published: 19 March 2016
Issue Date: August 2016
DOI: https://doi.org/10.1007/s11042-016-3432-0
