Relative image similarity learning with contextual information for Internet cross-media retrieval

  • Regular Paper
Multimedia Systems

Abstract

With the explosive growth of image data on the Internet, how to use it efficiently in cross-media scenarios has become an urgent problem. Images are usually accompanied by contextual textual information, and the two heterogeneous modalities reinforce each other to make Internet content more informative. In most cases, visual information can be regarded as an enhancement of the textual document. To make image-to-image similarity more consistent with document-to-document similarity, this paper proposes a method that learns image similarities from the relations among the accompanying textual documents. More specifically, instead of using static quantitative relations, the paper adopts a rank-based learning procedure built on structural SVM, in which the ranking structure is established by comparing the relative relations of the textual information. The learned similarities are more consistent with human perception. The proposed method can be used not only for image-to-image retrieval but also for cross-modality multimedia retrieval, for which a query expansion framework is proposed to obtain more satisfactory results. Extensive experimental evaluations on a large-scale Internet dataset validate the performance of the proposed methods.
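As a rough illustration of what "comparing the relative relations of textual information" could look like, the following minimal sketch derives relative ordering constraints from text similarities and then checks how well an image similarity reproduces them. It is not the paper's structural SVM formulation. The cosine measure, the `gap` threshold, the function names, and the toy vectors are all assumptions introduced here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relative_constraints(text_vecs, gap=0.1):
    """Build triplets (q, i, j): text i is more similar to text q than text j is.

    Only the relative order is kept, not the similarity values themselves.
    `gap` (hypothetical parameter) drops pairs whose similarities are too close.
    """
    n = len(text_vecs)
    triplets = []
    for q in range(n):
        sims = {i: cosine(text_vecs[q], text_vecs[i]) for i in range(n) if i != q}
        for i in sims:
            for j in sims:
                if sims[i] - sims[j] >= gap:
                    triplets.append((q, i, j))
    return triplets


def violation_rate(triplets, image_sim):
    """Fraction of text-derived triplets that the image similarity fails to respect."""
    if not triplets:
        return 0.0
    bad = sum(1 for q, i, j in triplets if image_sim(q, i) <= image_sim(q, j))
    return bad / len(triplets)


# Toy data (made up): three documents, each with one text vector and one image vector.
text_vecs = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
image_vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

triplets = relative_constraints(text_vecs)
rate = violation_rate(triplets, lambda a, b: cosine(image_vecs[a], image_vecs[b]))
```

A learning procedure of the kind the abstract describes would adjust the image similarity so that this violation rate decreases; the sketch only measures it for a fixed similarity.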

Acknowledgments

This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2012CB316400, in part by the National Natural Science Foundation of China under Grants 61070108, 61025011, and 61035001, and in part by the Key Technologies R&D Program of China under Grant 2012BAH18B02.

Author information

Corresponding author

Correspondence to Shuqiang Jiang.

About this article

Cite this article

Jiang, S., Song, X. & Huang, Q. Relative image similarity learning with contextual information for Internet cross-media retrieval. Multimedia Systems 20, 645–657 (2014). https://doi.org/10.1007/s00530-012-0299-4

  • DOI: https://doi.org/10.1007/s00530-012-0299-4
