Abstract
With the explosive growth of multimedia data, information is often represented in multiple modalities. Cross-modal applications have therefore attracted increasing attention in recent years, and cross-modal retrieval is among the most popular of them. In this paper, we propose a semi-supervised modality-dependent cross-modal retrieval method based on coupled feature selection (Semi-CoFe). Unlike most previous cross-modal retrieval methods, which use only labeled data for training and learn projection matrices under an l2-norm constraint, our method also exploits unlabeled data. Specifically, we propagate the labels of cluster centers to unlabeled data via a devised weight matrix and construct pseudo corresponding heterogeneous pairs. We then jointly consider semantic regression and pair-wise correlation analysis when learning the mapping matrices, preserving both semantic consistency and the closeness of paired data. Meanwhile, an l2,1-norm constraint is imposed to select informative, discriminative features and to reduce noise. In addition, we learn different mapping matrices for different sub-tasks (using an image to search text, I2T, and using text to search images, T2I) to distinguish the semantic information of the query data, and the optimal mapping matrices are obtained via an iterative optimization method. Experimental results on three public datasets verify that the proposed method outperforms state-of-the-art methods.
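To make the role of the l2,1-norm concrete, the following is a minimal NumPy sketch of an objective of the kind the abstract describes: semantic regression toward a label matrix, a pair-wise correlation term tying the two projected modalities together, and l2,1 regularization on the projection matrices. The function names, symbols (Xi, Xt, Y, Ui, Ut) and trade-off parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def l21_norm(U):
    # l2,1-norm: the sum of the l2-norms of the rows of U.
    # Rows pushed to zero correspond to features that are discarded,
    # which is why this norm performs feature selection.
    return np.linalg.norm(U, axis=1).sum()

def objective(Xi, Xt, Y, Ui, Ut, lam_pair=1.0, lam_sparse=0.1):
    # Illustrative objective combining:
    #   - semantic regression: projected features should match labels Y,
    #   - pair-wise correlation: paired image/text data should map close,
    #   - l2,1 sparsity: select informative rows of Ui and Ut.
    semantic = (np.linalg.norm(Xi @ Ui - Y) ** 2
                + np.linalg.norm(Xt @ Ut - Y) ** 2)
    pairwise = np.linalg.norm(Xi @ Ui - Xt @ Ut) ** 2
    sparsity = l21_norm(Ui) + l21_norm(Ut)
    return semantic + lam_pair * pairwise + lam_sparse * sparsity
```

In practice such an objective is minimized by alternating updates of Ui and Ut, as in the iterative optimization the abstract mentions; the non-smooth l2,1 term is typically handled with a reweighted diagonal matrix.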
Acknowledgements
This work is supported by the Natural Science Foundation for Distinguished Young Scholars of Shandong Province (JQ201718), the Key Research and Development Foundation of Shandong Province (2016GGX101009), the Natural Science Foundation of China (U1736122, 61603225, 61601268), and the Shandong Provincial Key Research and Development Plan (2017CXGC1504). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research.
Cite this article
Yu, E., Sun, J., Wang, L. et al. Coupled feature selection based semi-supervised modality-dependent cross-modal retrieval. Multimed Tools Appl 78, 28931–28951 (2019). https://doi.org/10.1007/s11042-018-5958-9