Abstract
Cross-modal retrieval (i.e., image–query–text or text–query–image) is a hot research topic in multimedia information retrieval, but the heterogeneity gap between different modalities poses a critical challenge for multimodal data. Some researchers regard cross-modal retrieval as a learning to rank task and usually measure the similarity between two different modalities in a shared embedding subspace. However, most previous methods focus on constructing a discriminative objective function to optimize the common space and ignore the correlations within each single modality. In this paper, we treat the cross-modal retrieval task, from the perspective of optimizing the ranking model, as a listwise ranking problem, and propose a novel method called learning to rank with relational graph and pointwise constraint (\( {\text{LR}}^{2} {\text{GP}} \)). In \( {\text{LR}}^{2} {\text{GP}} \), we first propose a discriminative ranking model that exploits the relations within each single modality to improve ranking performance and thereby learn an optimal common embedding subspace. Then, a pointwise constraint is introduced in the low-dimensional embedding subspace to compensate for the real loss in the training phase, since the listwise method only directly optimizes the latent permutation as a whole. Finally, a dynamic interpolation algorithm, which gradually transitions from pointwise and pairwise to listwise learning, is adopted to fuse the loss functions in a principled way. Experiments on the Wikipedia and Pascal benchmark datasets demonstrate the effectiveness of the proposed method.
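The abstract combines three loss families (pointwise, pairwise, listwise) over similarities computed in a shared subspace, blended by a schedule that moves from pointwise and pairwise toward listwise learning. The sketch below illustrates those ideas under stated assumptions; it is not the authors' LR²GP implementation. It assumes linear projections U and V into the shared subspace scored by an inner product, a squared-error pointwise loss, a hinge pairwise loss, a Plackett-Luce (ListMLE-style) listwise loss, and a hypothetical piecewise-linear interpolation schedule. All function names and the toy data are illustrative, and the relational-graph term is not modeled.

```python
import math
from typing import List, Sequence, Tuple


def project(W: List[List[float]], x: Sequence[float]) -> List[float]:
    """Map a modality-specific feature vector into the shared subspace (z = W x)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def cross_modal_score(U, V, image_feat, text_feat) -> float:
    """Assumed similarity: inner product of the image and text projections."""
    zi = project(U, image_feat)
    zt = project(V, text_feat)
    return sum(a * b for a, b in zip(zi, zt))


def pointwise_loss(scores, relevance) -> float:
    """Assumed pointwise term: mean squared error between scores and labels."""
    return sum((s - r) ** 2 for s, r in zip(scores, relevance)) / len(scores)


def pairwise_loss(scores, relevance, margin: float = 1.0) -> float:
    """Hinge loss summed over all pairs where item i is more relevant than item j."""
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if relevance[i] > relevance[j]:
                loss += max(0.0, margin - (scores[i] - scores[j]))
    return loss


def listwise_loss(scores, relevance) -> float:
    """Plackett-Luce negative log-likelihood of the ideal permutation (ListMLE-style)."""
    # Ideal permutation: items sorted by descending relevance (stable for ties).
    order = sorted(range(len(scores)), key=lambda i: -relevance[i])
    loss = 0.0
    for k in range(len(order)):
        tail = [scores[i] for i in order[k:]]
        m = max(tail)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_sum - scores[order[k]]
    return loss


def interpolation_weights(t: float) -> Tuple[float, float, float]:
    """Hypothetical schedule: pointwise -> pairwise -> listwise as t goes 0 -> 1.

    Returns (w_point, w_pair, w_list); the weights always sum to one.
    """
    t = min(max(t, 0.0), 1.0)
    if t <= 0.5:
        return (1.0 - 2.0 * t, 2.0 * t, 0.0)
    return (0.0, 2.0 - 2.0 * t, 2.0 * t - 1.0)


def fused_loss(scores, relevance, t: float) -> float:
    """Convex combination of the three losses at training progress t in [0, 1]."""
    w_point, w_pair, w_list = interpolation_weights(t)
    return (w_point * pointwise_loss(scores, relevance)
            + w_pair * pairwise_loss(scores, relevance)
            + w_list * listwise_loss(scores, relevance))


if __name__ == "__main__":
    # Toy example: one text query scored against three images (made-up numbers).
    U = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical image projection
    V = [[0.5, 0.5], [0.5, -0.5]]  # hypothetical text projection
    query = [2.0, 0.0]
    images = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    relevance = [2, 1, 0]
    scores = [cross_modal_score(U, V, img, query) for img in images]
    for t in (0.0, 0.5, 1.0):
        print(t, fused_loss(scores, relevance, t))
```

The piecewise-linear schedule is only one simple way to keep the weights summing to one while shifting emphasis toward the listwise term; the interpolation rule used in the paper may differ, and the paper keeps a pointwise constraint that this toy schedule fades out.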
Acknowledgements
The authors would like to thank the anonymous reviewers and the editor for the very instructive suggestions that led to the much improved quality of this paper.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by A.K. Sangaiah, H. Pham, M.-Y. Chen, H. Lu, F. Mercaldo.
This work was supported in part by the 13th Five-Year Plan for the Development of Philosophy and Social Sciences in Guangzhou (2018GZYB36), the Science Foundation of Guangdong Provincial Communications Department (2015-02-064), the National Natural Science Foundation of China (61402185), and the South China Normal University–Bluedon Information Security Technologies Co., Ltd. joint laboratory project (LD20170201).
About this article
Cite this article
Xu, Q., Li, M. & Yu, M. Learning to rank with relational graph and pointwise constraint for cross-modal retrieval. Soft Comput 23, 9413–9427 (2019). https://doi.org/10.1007/s00500-018-3608-9
DOI: https://doi.org/10.1007/s00500-018-3608-9