Multi-modal Correlated Centroid Space for Multi-lingual Cross-Modal Retrieval

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9022)


We present a novel cross-modal retrieval approach in which the textual modality is available in multiple languages. We retrieve semantically similar documents across modalities and languages using a correlated centroid space unsupervised retrieval (C2SUR) approach. C2SUR consists of two phases. In the first phase, we extract heterogeneous features from a multi-modal document and project them to a correlated space using kernel canonical correlation analysis (KCCA). In the second phase, correlated space centroids are obtained by clustering and are used to retrieve cross-modal documents with different similarity measures. Experimental results show that C2SUR outperforms existing state-of-the-art English cross-modal retrieval approaches and achieves similar results for other languages.
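
The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy, substitutes regularized linear CCA for KCCA, uses plain k-means for the clustering step, and ranks by cosine similarity only. The function names (`fit_cca`, `kmeans`, `retrieve`), the regularization value, the synthetic features, and the choice to cluster the projected image features are all hypothetical, since the abstract does not specify these details.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_cca(X, Y, k, reg=1e-3):
    """Phase 1 (assumption: linear CCA as a stand-in for KCCA)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Sx, Sy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the correlations.
    U, s, Vt = np.linalg.svd(Sx @ Cxy @ Sy)
    return mx, my, Sx @ U[:, :k], Sy @ Vt.T[:, :k], s[:k]

def assign(Z, centroids):
    """Index of the nearest centroid (squared Euclidean) for each row of Z."""
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Phase 2 (assumption: plain Lloyd's k-means) in the correlated space."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = assign(Z, centroids)
        for c in range(n_clusters):
            if np.any(labels == c):
                centroids[c] = Z[labels == c].mean(axis=0)
    return centroids, assign(Z, centroids)

def cosine_sim(q, M):
    """Cosine similarity between vector q and every row of M."""
    return (M @ q) / (np.linalg.norm(M, axis=1) * np.linalg.norm(q) + 1e-12)

def retrieve(q, centroids, labels, Z, top=5):
    """Pick the centroid most similar to q, then rank that cluster's members."""
    c = int(np.argmax(cosine_sim(q, centroids)))
    idx = np.flatnonzero(labels == c)
    order = np.argsort(-cosine_sim(q, Z[idx]))
    return idx[order[:top]]

# Synthetic paired "image" and "text" features sharing three latent topics.
rng = np.random.default_rng(0)
classes = rng.integers(0, 3, size=150)
img = rng.normal(0, 3, (3, 10))[classes] + rng.normal(size=(150, 10))
txt = rng.normal(0, 3, (3, 8))[classes] + rng.normal(size=(150, 8))

mx, my, Wx, Wy, corr = fit_cca(img, txt, k=2)
Z_img = (img - mx) @ Wx                      # images in the correlated space
centroids, labels = kmeans(Z_img, n_clusters=3)
query = (txt[0] - my) @ Wy                   # a text query in the same space
hits = retrieve(query, centroids, labels, Z_img)
print(hits)
```

In this sketch a text query in any language would only need its own projection matrix into the shared space; the centroid lookup and ranking steps stay the same. Swapping `cosine_sim` for another measure is one way to experiment with the "different similarity measures" the abstract mentions.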


Machine Translation, Image Query, Mean Average Precision, Correlate Space, Text Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Institute AIFB, Karlsruhe Institute of Technology, Germany
