Abstract
Cross-modal retrieval, which enables flexible retrieval across data of different modalities, has gradually attracted the attention of researchers. However, a heterogeneity gap exists between the modalities, so their similarity cannot be measured directly. To bridge this gap, researchers project data of different modalities into a common representation space. Existing methods based on pair or triplet constraints, however, ignore the rich information among samples, which degrades retrieval performance. To fully exploit this information, this paper proposes a cross-modal retrieval method with dual optimization (CMRDO). First, the method optimizes the common representation space from both the inter-modal and the intra-modal perspective. Second, we introduce an efficient sample-construction strategy that avoids sample pairs carrying little information. Finally, the bi-directional retrieval strategy we introduce effectively captures the latent structure of the query modality. On three public datasets, the proposed CMRDO effectively improves cross-modal retrieval accuracy and shows strong generalization ability.
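The dual optimization of the common representation space (an inter-modal term aligning the two modalities, plus intra-modal terms tightening each modality around its class) can be illustrated with a minimal margin-based sketch. The function names, the squared-Euclidean distance, and the mean-positive-versus-mean-negative hinge used here are illustrative assumptions, not the paper's actual objective:

```python
import numpy as np

def pairwise_sq_dist(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

def dual_margin_loss(img, txt, labels, margin=1.0):
    """Toy dual-optimization objective: a hinge over mean positive/negative
    distances, applied inter-modally (image vs. text) and intra-modally
    (image vs. image, text vs. text)."""
    same = labels[:, None] == labels[None, :]  # True where class labels match

    def hinge(d, pos_mask):
        pos = d[pos_mask].mean() if pos_mask.any() else 0.0
        neg = d[~pos_mask].mean() if (~pos_mask).any() else 0.0
        return max(0.0, margin + pos - neg)

    # Inter-modal term: matching image/text pairs should be closer than
    # non-matching ones in the common space.
    inter = hinge(pairwise_sq_dist(img, txt), same)
    # Intra-modal terms: samples of the same class should cluster within each
    # modality (self-pairs with distance 0 are included for simplicity).
    intra = (hinge(pairwise_sq_dist(img, img), same)
             + hinge(pairwise_sq_dist(txt, txt), same))
    return float(inter + intra)

# Usage on random common-space embeddings: 4 samples, 8 dimensions, 2 classes.
rng = np.random.default_rng(0)
img = rng.standard_normal((4, 8))
txt = rng.standard_normal((4, 8))
labels = np.array([0, 0, 1, 1])
print(dual_margin_loss(img, txt, labels))
```

In a real system the embeddings would come from trained modality-specific networks, and the hinge would typically operate on mined hard pairs rather than all-pairs means, in line with the sample-construction strategy the abstract describes.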
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Xu, Q., Liu, S., Qiao, H. et al. Cross-modal retrieval with dual optimization. Multimed Tools Appl 82, 7141–7157 (2023). https://doi.org/10.1007/s11042-022-13650-0