Abstract
Traditional computer vision approaches, based on neural networks (NN), are typically trained on a large amount of image data. By minimizing the cross-entropy loss between a prediction and a given class label, the NN and its visual embedding space are learned to fulfill a given task. However, because they depend solely on the image data distribution of the training domain, these models tend to fail when applied to a target domain that differs from their source domain. To learn an NN that is more robust to domain shifts, we propose the knowledge graph neural network (KG-NN), a neuro-symbolic approach that supervises the training using image-data-invariant auxiliary knowledge. The auxiliary knowledge is first encoded in a knowledge graph with respective concepts and their relationships, which is then transformed into a dense vector representation via an embedding method. Using a contrastive loss function, KG-NN learns to adapt its visual embedding space, and thus its weights, according to the image-data-invariant knowledge graph embedding space. We evaluate KG-NN on visual transfer learning tasks for classification using the mini-ImageNet dataset and its derivatives, as well as road sign recognition datasets from Germany and China. The results show that a visual model trained with a knowledge graph as a trainer outperforms a model trained with cross-entropy in all experiments, in particular when the domain gap increases. Besides better performance and stronger robustness to domain shifts, KG-NN adapts to multiple datasets and classes without suffering heavily from catastrophic forgetting.
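The idea of aligning a visual embedding space with a fixed knowledge graph embedding space can be illustrated with a small sketch. The abstract does not specify the loss, so the code below assumes one plausible form: a softmax over cosine similarities (InfoNCE-style), in which each image embedding is pulled toward the knowledge graph embedding of its class and pushed away from those of the other classes. The function name `kg_contrastive_loss`, the `temperature` parameter, and the toy two-dimensional embeddings are hypothetical and serve illustration only; this is not the authors' implementation.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kg_contrastive_loss(image_embeddings, labels, kg_embeddings, temperature=0.1):
    """Mean InfoNCE-style loss against fixed class embeddings (assumed form).

    image_embeddings: list of vectors produced by the visual model.
    labels: class name for each image embedding.
    kg_embeddings: dict mapping class name to its knowledge graph embedding,
        treated as fixed (image-data-invariant) targets.
    """
    class_names = list(kg_embeddings)
    anchors = {name: normalize(kg_embeddings[name]) for name in class_names}

    total = 0.0
    for embedding, label in zip(image_embeddings, labels):
        z = normalize(embedding)
        logits = [dot(z, anchors[name]) / temperature for name in class_names]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Negative log-probability of the correct class anchor.
        total += log_denominator - logits[class_names.index(label)]
    return total / len(image_embeddings)


# Toy knowledge graph embeddings for two road sign classes (made-up values).
kg = {"stop_sign": [1.0, 0.0], "speed_limit": [0.0, 1.0]}

aligned = kg_contrastive_loss([[0.9, 0.1]], ["stop_sign"], kg)
misaligned = kg_contrastive_loss([[0.1, 0.9]], ["stop_sign"], kg)
print(aligned < misaligned)
```

In this sketch the knowledge graph embeddings never change; only the image embeddings (and, in a real model, the network weights that produce them) would be updated to reduce the loss, which mirrors the role of the knowledge graph as a fixed trainer.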
Acknowledgements
This publication was created as part of the research project “KI Delta Learning” (project number: 19A19013D) funded by the Federal Ministry for Economic Affairs and Energy (BMWi) on the basis of a decision by the German Bundestag.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Monka, S., Halilaj, L., Schmid, S., Rettinger, A. (2021). Learning Visual Models Using a Knowledge Graph as a Trainer. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_21
DOI: https://doi.org/10.1007/978-3-030-88361-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88360-7
Online ISBN: 978-3-030-88361-4
eBook Packages: Computer Science, Computer Science (R0)