Domain Invariant Subspace Learning for Cross-Modal Retrieval

  • Chenlu Liu
  • Xing Xu
  • Yang Yang
  • Huimin Lu
  • Fumin Shen
  • Yanli Ji
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10705)


Due to the rapid growth of multimodal data, cross-modal retrieval has drawn growing attention in recent years, which aims to take one type of data as the query to retrieve relevant data of another type. To enable directly matching between different modalities, the key issue in cross-modal retrieval is to eliminate the heterogeneity between modalities. A bundle of existing approaches directly project the samples of multimodal data into a common latent subspace with the supervision of class label information, and different samples within the same class contribute uniformly to the subspace construction. However, the subspace constructed by these methods may not reveal the true importance of each sample as well as the discrimination of different class label. To tackle this problem, in this paper we regard different modalities as different domains and propose a Domain Invariant Subspace Learning (DISL) method to associate multimodal data. Specifically, DISL simultaneously minimize the classification error with sample-wise weighting coefficients and preserve the structure similarity within and across modalities with the graph regularization. Therefore, the subspace learned by DISL can well reflect the sample-wise importance and capture the discrimination of different class labels in multi-modal data. Compared with several state-of-the-art algorithms, extensive experiments on three public datasets demonstrate the superiority of the proposed method for cross-modal retrieval tasks such as image-to-text and text-to-image.


Cross-modal retrieval Subspace learning 



This paper is partially supported by NSFC grants No. 61602089, 61572108, 61632007, 61673088; Fundamental Research Funds for Central Universities ZYGX2014Z007 and ZYGX2016KYQD114; LEADER of MEXT-Japan (16809746), The Telecommunications Foundation, REDAS and SCAT.


  1. 1.
    Shen, X., Shen, F., Sun, Q., Yang, Y., Yuan, Y., Shen, H.T.: Semi-paired discrete hashing: learning latent hash codes for semi-paired cross-view retrieval. In: TCYB (2016)Google Scholar
  2. 2.
    He, L., Xu, X., Lu, H., Yang, Y., Shen, F., Shen, H.T.: Unsupervised cross modal retrieval through adversarial learning. In: ICML (2017)Google Scholar
  3. 3.
    Zhang, H., Zha, Z., Yang, Y., Yan, S., Gao, Y., Chua, T.: Attribute-augmented semantic hierarchy: towards bridging semantic gap and intention gap in image retrieval. In: ACM MM, pp. 33–42 (2013)Google Scholar
  4. 4.
    Zhang, H., Shen, F., Liu, W., He, X., Luan, H., Chua, T.: Discrete collaborative filtering. In: SIGIR, pp. 325–334 (2016)Google Scholar
  5. 5.
    Zhuang, Y., Yang, Y., Wu, F.: Mining semantic correlation of heterogeneous multimedia data for cross-media retrieval. IEEE Trans. Multimedia 10, 221–229 (2008)CrossRefGoogle Scholar
  6. 6.
    Yang, Y., Zhuang, Y., Wu, F., Pan, Y.: Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval. IEEE Trans. Multimedia 10, 437–446 (2008)CrossRefGoogle Scholar
  7. 7.
    Xu, X., Shimada, A., Taniguchi, R., He, L.: Coupled dictionary learning and feature mapping for cross-modal retrieval. In: ICME, pp. 1–6 (2015)Google Scholar
  8. 8.
    Xu, X., Shen, F., Yang, Y., Shen, H.T., Li, X.: Learning discriminative binary codes for large-scale cross-modal retrieval. IEEE Trans. Image Proces. 26, 2494–2507 (2017)MathSciNetCrossRefGoogle Scholar
  9. 9.
    Hardoon, D.R., Szedmák, S., Shawe-Taylor, J.: Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16, 2639–2664 (2004)CrossRefzbMATHGoogle Scholar
  10. 10.
    Ranjan, V., Rasiwasia, N., Jawahar, C.V.: Multi-label cross-modal retrieval. In: ICCV, pp. 4094–4102 (2015)Google Scholar
  11. 11.
    Gong, Y., Ke, Q., Isard, M., Lazebnik, S.: A multi-view embedding space for modeling internet images, tags, and their semantics. Int. J. Comput. Vis. 106, 210–233 (2014)CrossRefGoogle Scholar
  12. 12.
    Hardoon, D.R., Shawe-Taylor, J.: KCCA for different level precision in content-based image retrieval. In: CIBM (2003)Google Scholar
  13. 13.
    Wang, K., He, R., Wang, L., Wang, W., Tan, T.: Joint feature selection and subspace learning for cross-modal retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 38, 2010–2023 (2016)CrossRefGoogle Scholar
  14. 14.
    Sharma, A., Kumar, A., Daume, H., Jacobs, D.W.: Generalized multiview analysis: a discriminative latent space. In: CVPR, pp. 2160–2167 (2012)Google Scholar
  15. 15.
    Jia, Y., Salzmann, M., Darrell, T.: Learning cross-modality similarity for multinomial data. In: ICCV, pp. 2407–2414 (2011)Google Scholar
  16. 16.
    Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: NIPS, pp. 2231–2239 (2012)Google Scholar
  17. 17.
    Mignon, A., Jurie, F.: CMML: a new metric learning approach for cross modal matching, pp. 1–14 (2012)Google Scholar
  18. 18.
    Quadrianto, N., Lampert, C.H.: Learning multi-view neighborhood preserving projections. In: ICML, 425–432 (2011)Google Scholar
  19. 19.
    Frome, A., Corrado, G.S., Shlens, J., Bengio, S., Dean, J., Ranzato, M., Mikolov, T.: Devise: a deep visual-semantic embedding model. In: NIPS, pp. 2121–2129 (2013)Google Scholar
  20. 20.
    Wen, Z., Yin, W.: A feasible method for optimization with orthogonality constraints. Math. Program. 142, 397–434 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  21. 21.
    Rasiwasia, N., Moreno, P.J., Vasconcelos, N.: Bridging the gap: query by semantic example. IEEE Trans. Multimedia 9, 923–938 (2007)CrossRefGoogle Scholar
  22. 22.
    Hwang, S.J., Grauman, K.: Reading between the lines: object localization using implicit cues from image tags. IEEE Trans. Pattern Anal. Mach. Intell. 34, 1145–1158 (2012)CrossRefGoogle Scholar
  23. 23.
    Simon, M., Rodner, E., Denzler, J.: Imagenet pre-trained models with batch normalization. CoRR (2016)Google Scholar
  24. 24.
    Rasiwasia, N., Pereira, J.C., Coviello, E., Doyle, G., Lanckriet, G.R.G., Levy, R., Vasconcelos, N.: A new approach to cross-modal multimedia retrieval. In: MM, pp. 251–260 (2010)Google Scholar
  25. 25.
    Li, A., Shan, S., Chen, X., Gao, W.: Cross-pose face recognition based on partial least squares. Pattern Recogn. Lett. 32, 1948–1955 (2011)CrossRefGoogle Scholar
  26. 26.
    Tenenbaum, J.B., Freeman, W.T.: Separating style and content with bilinear models. In: Neural Computation, pp. 1247–1283 (2000)Google Scholar
  27. 27.
    Wang, K., He, R., Wang, W., Wang, L., Tan, T.: Learning coupled feature spaces for cross-modal matching. In: ICCV, pp. 2088–2095 (2013)Google Scholar
  28. 28.
    Li, D., Dimitrova, N., Li, M., Sethi, I.K.: Multimedia content processing through cross-modal association. In: MM, pp. 604–611 (2003)Google Scholar
  29. 29.
    Feng, F., Wang, X., Li, R.: Cross-modal retrieval with correspondence autoencoder. In: MM, pp. 7–16 (2014)Google Scholar
  30. 30.
    Zhai, X., Peng, Y., Xiao, J.: Learning cross-media joint representation with sparse and semisupervised regularization. IEEE Trans. Circ. Syst. Video Technol. 24, 965–978 (2014)CrossRefGoogle Scholar
  31. 31.
    Kang, C., Xiang, S., Liao, S., Xu, C., Pan, C.: Learning consistent feature representation for cross-modal multimedia retrieval. IEEE Trans. Multimedia, 17, 370–381 (2015)Google Scholar
  32. 32.
    Peng, Y., Huang, X., Qi, J.: Cross-media shared representation by hierarchical learning with multiple deep networks. IJCA I, 3846–3853 (2016)Google Scholar

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Chenlu Liu
    • 1
  • Xing Xu
    • 1
  • Yang Yang
    • 1
  • Huimin Lu
    • 2
  • Fumin Shen
    • 1
  • Yanli Ji
    • 3
  1. 1.Center for Future Media & School of Computer Science and EngineeringUniversity of Electronic Science and Technology of ChinaChengduChina
  2. 2.Kyushu Institute of TechnologyKitakyushuJapan
  3. 3.School of Automation EngineeringUniversity of Electronic Science and Technology of ChinaChengduChina

Personalised recommendations