Learning Visual Models Using a Knowledge Graph as a Trainer

Monka, Sebastian; Halilaj, Lavdim; Schmid, Stefan; Rettinger, Achim

doi:10.1007/978-3-030-88361-4_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12922)

Included in the following conference series:

International Semantic Web Conference

Abstract

Traditional computer vision approaches, based on neural networks (NN), are typically trained on a large amount of image data. By minimizing the cross-entropy loss between a prediction and a given class label, the NN and its visual embedding space are learned to fulfill a given task. However, because they depend solely on the image data distribution of the training domain, these models tend to fail when applied to a target domain that differs from their source domain. To learn an NN that is more robust to domain shifts, we propose the knowledge graph neural network (KG-NN), a neuro-symbolic approach that supervises the training using image-data-invariant auxiliary knowledge. The auxiliary knowledge is first encoded in a knowledge graph with respective concepts and their relationships, which is then transformed into a dense vector representation via an embedding method. Using a contrastive loss function, KG-NN learns to adapt its visual embedding space, and thus its weights, according to the image-data-invariant knowledge graph embedding space. We evaluate KG-NN on visual transfer learning tasks for classification using the mini-ImageNet dataset and its derivatives, as well as road sign recognition datasets from Germany and China. The results show that a visual model trained with a knowledge graph as a trainer outperforms a model trained with cross-entropy in all experiments, in particular when the domain gap increases. Besides better performance and stronger robustness to domain shifts, KG-NN adapts to multiple datasets and classes without suffering heavily from catastrophic forgetting.
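
The idea of aligning a visual embedding space with a fixed knowledge graph embedding space through a contrastive loss can be illustrated with a small sketch. The abstract does not give the exact loss used by KG-NN, so the code below substitutes a generic InfoNCE-style objective as an assumption: each image embedding is pulled toward the knowledge graph embedding of its class and pushed away from the embeddings of all other classes. The function names, the temperature value, and the use of cosine similarity are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def kg_contrastive_loss(image_emb, labels, kg_emb, temperature=0.1):
    """InfoNCE-style loss between image embeddings and class KG embeddings.

    Assumed shapes (hypothetical, for illustration only):
      image_emb: (batch, dim)        output of the visual model
      labels:    (batch,) integers   class index of each image
      kg_emb:    (num_classes, dim)  fixed knowledge graph embeddings
    """
    z = l2_normalize(image_emb)
    k = l2_normalize(kg_emb)

    # Cosine similarity of every image to every class embedding.
    logits = z @ k.T / temperature

    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative log-probability of the correct class, averaged over the batch.
    return float(-log_prob[np.arange(len(labels)), labels].mean())
```

In this sketch the knowledge graph embeddings are treated as constants, so in a real training loop only the visual model would receive gradients. A loss of this form is small when each image embedding points in the direction of its own class embedding and large when it points toward a different class.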

Acknowledgements

This publication was created as part of the research project “KI Delta Learning” (project number: 19A19013D) funded by the Federal Ministry for Economic Affairs and Energy (BMWi) on the basis of a decision by the German Bundestag.

Author information

Authors and Affiliations

Bosch Research, Renningen, Germany
Sebastian Monka, Lavdim Halilaj & Stefan Schmid
Trier University, Trier, Germany
Sebastian Monka & Achim Rettinger

Corresponding author

Correspondence to Sebastian Monka.

Editor information

Editors and Affiliations

University of Würzburg, Würzburg, Germany
Andreas Hotho
Linköping University, Linköping, Sweden
Eva Blomqvist
University of Düsseldorf, Düsseldorf, Germany
Stefan Dietze
IBM Research - Thomas J. Watson Research, Hawthorne, CA, USA
Achille Fokoue
University of Texas, Austin, TX, USA
Ying Ding
Imperial College, London, UK
Payam Barnaghi
Australian National University, Canberra, ACT, Australia
Armin Haller
Fondazione Bruno Kessler, Povo, Trento, Italy
Mauro Dragoni
The Open University Walton Hall, Milton Keynes, UK
Harith Alani

Cite this paper

Monka, S., Halilaj, L., Schmid, S., Rettinger, A. (2021). Learning Visual Models Using a Knowledge Graph as a Trainer. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_21

DOI: https://doi.org/10.1007/978-3-030-88361-4_21
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88360-7
Online ISBN: 978-3-030-88361-4
eBook Packages: Computer Science, Computer Science (R0)
