Not Just a Matter of Semantics: The Relationship Between Visual and Semantic Similarity

  • Clemens-Alexander Brust
  • Joachim Denzler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11824)


Knowledge transfer, zero-shot learning, and semantic image retrieval are methods that aim to improve accuracy by using semantic information, e.g., from WordNet. They assume that this information can augment or replace missing visual data in the form of labeled training images, on the grounds that semantic similarity correlates with visual similarity.
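To make "semantic similarity" concrete, here is a minimal sketch of one simple, path-based way such a similarity can be computed from an is-a hierarchy: 1 / (1 + number of edges on the shortest path between two classes). The same formula underlies the path similarity in NLTK's WordNet interface. The tiny hierarchy and the names `PARENT`, `path_length`, and `path_similarity` are hypothetical stand-ins for WordNet. The code is not presented as any specific measure used in the paper.

```python
# Hypothetical toy is-a hierarchy (child -> parent), standing in for WordNet.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "truck": "vehicle",
}

def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def path_length(a, b):
    """Edges on the shortest path between a and b, via their lowest common ancestor."""
    up_a = ancestors(a)
    depth_in_b = {n: i for i, n in enumerate(ancestors(b))}
    # The first ancestor of a that is also an ancestor of b is the lowest common one.
    for i, n in enumerate(up_a):
        if n in depth_in_b:
            return i + depth_in_b[n]
    raise ValueError("no common ancestor")

def path_similarity(a, b):
    """Path-based similarity in (0, 1]; identical classes score 1."""
    return 1.0 / (1.0 + path_length(a, b))

for a, b in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
    print(a, b, round(path_similarity(a, b), 3))
# dog cat 0.333
# dog car 0.2
# dog dog 1.0
```

Siblings under "animal" score higher than classes that only meet at the root. The abstract's premise is that this kind of ordering also holds, to some degree, for visual similarity.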

This assumption may seem trivial, but it is crucial for the application of such semantic methods, since any violation can cause mispredictions. It is therefore important to examine the visual-semantic relationship for the specific target problem at hand. In this paper, we use five semantic and five visual similarity measures to analyze the relationship thoroughly without relying too much on any single definition.
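A common way to check whether two similarity measures agree is a rank correlation over class pairs. The sketch below computes Spearman's rho, the Pearson correlation of tie-averaged ranks, between a semantic and a visual similarity for six class pairs. All numbers are invented for illustration and are not results from the paper. The function names are hypothetical. Whether the authors use exactly this statistic and setup is an assumption here.

```python
from statistics import mean

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical class pairs, in order:
# dog-cat, car-truck, dog-car, dog-truck, cat-car, cat-truck
semantic = [0.333, 0.333, 0.2, 0.2, 0.2, 0.2]  # e.g. a path-based measure
visual = [0.81, 0.77, 0.35, 0.40, 0.30, 0.38]   # e.g. cosine of mean CNN features

print(f"Spearman rho = {spearman(semantic, visual):.3f}")
```

A rank correlation only asks whether the two measures order the pairs the same way. That suits comparing measures on different scales. For real data, `scipy.stats.spearmanr` computes the same statistic with the same average-rank tie handling. The hand-rolled version simply makes the ranking step explicit.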

We postulate and verify three highly consequential hypotheses about this relationship. Our results show that the relationship indeed exists and that WordNet semantic similarity carries more information about visual similarity than the mere knowledge that “different classes look different”. They also suggest that classification is not the ideal application for semantic methods and that wrong semantic information is much worse than none.



This work was supported by the DAWI research infrastructure project, funded by the federal state of Thuringia (grant no. 2017 FGI 0031), including access to computing and storage facilities.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Computer Vision Group, Friedrich Schiller University Jena, Jena, Germany
  2. Michael Stifel Center Jena, Jena, Germany
