Abstract
We present a novel cross-modal retrieval approach in which the textual modality may appear in different languages. We retrieve semantically similar documents across modalities and languages using correlated centroid space unsupervised retrieval (C2SUR). C2SUR consists of two phases. In the first phase, we extract heterogeneous features from a multi-modal document and project them to a correlated space using kernel canonical correlation analysis (KCCA). In the second phase, correlated space centroids are obtained by clustering and used to retrieve cross-modal documents under different similarity measures. Experimental results show that C2SUR outperforms existing state-of-the-art English cross-modal retrieval approaches and achieves similar results for other languages.
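The two phases can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the features, kernels, clustering algorithm, or ranking rule, so everything below is an assumption made for illustration. The sketch uses synthetic "text" and "image" features, an RBF kernel, a simplified regularized KCCA (a ridge penalty on the dual coefficients, which differs from some published formulations), plain k-means on the projected image points, and cosine similarity as one possible similarity measure. The helper names (`rbf_kernel`, `center_kernel`, `inv_sqrt`, `kcca`, `kmeans`, `retrieve`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def center_kernel(K):
    """Center a square training kernel matrix in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T

def kcca(Kx, Ky, dim, reg=1e-2):
    """Simplified regularized KCCA: whiten both kernel views, then take an SVD.

    Returns dual coefficients for each view and the top canonical correlations.
    """
    n = Kx.shape[0]
    Rx = inv_sqrt(Kx @ Kx / n + reg * np.eye(n))
    Ry = inv_sqrt(Ky @ Ky / n + reg * np.eye(n))
    U, s, Vt = np.linalg.svd(Rx @ (Kx @ Ky / n) @ Ry)
    return Rx @ U[:, :dim], Ry @ Vt[:dim].T, s[:dim]

def kmeans(Z, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns centroids and final cluster labels."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = Z[labels == j].mean(0)
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dists.argmin(1)

def retrieve(query, candidates, centroids, labels, top_k=5):
    """Rank the candidates in the query's nearest-centroid cluster by cosine similarity."""
    nearest = ((centroids - query) ** 2).sum(1).argmin()
    idx = np.flatnonzero(labels == nearest)
    C = candidates[idx]
    sims = C @ query / (np.linalg.norm(C, axis=1) * np.linalg.norm(query) + 1e-12)
    return idx[np.argsort(-sims)[:top_k]]

# Synthetic paired data: each "document" has text and image features
# generated from a shared latent topic (an assumption for this demo).
rng = np.random.default_rng(0)
n, k = 60, 3
topics = rng.integers(0, k, size=n)
latent = np.eye(k)[topics] + 0.1 * rng.normal(size=(n, k))
text_feats = latent @ rng.normal(size=(k, 8)) + 0.1 * rng.normal(size=(n, 8))
image_feats = np.tanh(latent @ rng.normal(size=(k, 6))) + 0.1 * rng.normal(size=(n, 6))

# Phase 1: project both modalities into a correlated space with KCCA.
Kt = center_kernel(rbf_kernel(text_feats, text_feats, gamma=0.1))
Ki = center_kernel(rbf_kernel(image_feats, image_feats, gamma=0.5))
alpha, beta, corrs = kcca(Kt, Ki, dim=2)
text_proj, image_proj = Kt @ alpha, Ki @ beta

# Phase 2: cluster the projected image points, then answer a text query.
centroids, labels = kmeans(image_proj, k)
ranked = retrieve(text_proj[0], image_proj, centroids, labels, top_k=5)
print("canonical correlations:", np.round(corrs, 3))
print("images retrieved for text query 0:", ranked)
```

Restricting the ranking to the cluster of the nearest centroid is one plausible reading of "correlated space centroids"; a different similarity measure could be substituted in `retrieve` without touching the projection step.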
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mogadala, A., Rettinger, A. (2015). Multi-modal Correlated Centroid Space for Multi-lingual Cross-Modal Retrieval. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_9
DOI: https://doi.org/10.1007/978-3-319-16354-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16353-6
Online ISBN: 978-3-319-16354-3
eBook Packages: Computer Science (R0)