
Multi-modal graph regularization based class center discriminant analysis for cross modal retrieval

Published in: Multimedia Tools and Applications

Abstract

A large amount of multi-modal data has emerged on the Internet, and efficiently exploiting such data for cross-modal retrieval has become a hot research topic. Several solutions have been proposed for this problem; however, many of them consider only the local structural information of the data and lose sight of its global structural information. To overcome this limitation and improve retrieval accuracy, we propose a multi-modal graph regularization based class center discriminant analysis for cross-modal retrieval. The core of our method is to maximize the intra-modality distance and minimize the inter-modality distance of class center samples, which strengthens the discriminative ability of the model. Meanwhile, a multi-modal graph, consisting of the inter-modality similarity graph, the class center intra-modality graph and the inter-modality graph, is fused into the method to further reinforce the semantic similarity between different modalities. The method thus considers both the local and the global structural information of the data. Experimental results on three benchmark datasets demonstrate the superiority of the proposed scheme over several state-of-the-art methods.
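To make the general idea concrete, the following is a minimal sketch in Python of a class-center discriminant objective with graph regularization for two modalities. It is not the authors' formulation or code: the loss weights, the k-NN graph construction, and all names (`class_centers`, `knn_graph`, `objective`, `lam`, `gamma`) are illustrative assumptions.

```python
# Hedged sketch: class-center discriminant analysis with a graph regularizer
# for two modalities (e.g. image/text). All design choices below are assumed
# for illustration and are not the paper's exact objective.
import numpy as np

def class_centers(Z, y, n_classes):
    """Mean projected feature of each class (rows: classes)."""
    return np.vstack([Z[y == c].mean(axis=0) for c in range(n_classes)])

def knn_graph(Z, k=5):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix (an assumed choice)."""
    d = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    W = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(Z.shape[0]), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)

def objective(P_img, P_txt, X_img, X_txt, y, n_classes, lam=1.0, gamma=0.1):
    """Toy objective: pull same-class centers of the two modalities together
    in the common space, push different-class centers apart within each
    modality, and add a graph-Laplacian smoothness term."""
    Z_img, Z_txt = X_img @ P_img, X_txt @ P_txt      # projected samples
    C_img = class_centers(Z_img, y, n_classes)
    C_txt = class_centers(Z_txt, y, n_classes)

    # inter-modality term: same-class centers should coincide (minimize)
    inter = np.sum((C_img - C_txt) ** 2)

    # intra-modality term: different-class centers should spread out (maximize)
    def spread(C):
        diff = C[:, None] - C[None, :]
        return np.sum(diff ** 2) / 2.0
    intra = spread(C_img) + spread(C_txt)

    # graph regularization over the projected samples of both modalities
    Z = np.vstack([Z_img, Z_txt])
    W = knn_graph(Z)
    L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian
    smooth = np.trace(Z.T @ L @ Z)

    return inter - lam * intra + gamma * smooth
```

In this sketch, minimizing the returned value over the projection matrices `P_img` and `P_txt` (for example with a gradient-based solver) would drive same-class centers of the two modalities together while separating different classes, with the Laplacian term preserving local neighbourhood structure in the common space.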





Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Nos. 61572298, 61772322, U1836216), the Key Research and Development Foundation of Shandong Province (Nos. 2017GGX10117, 2017CXGC0703), and the Natural Science Foundation of Shandong, China (No. ZR2015PF006).

Author information


Corresponding author

Correspondence to Huaxiang Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, M., Zhang, H., Li, J. et al. Multi-modal graph regularization based class center discriminant analysis for cross modal retrieval. Multimed Tools Appl 78, 28285–28307 (2019). https://doi.org/10.1007/s11042-019-07909-2


DOI: https://doi.org/10.1007/s11042-019-07909-2

