Skip to main content

Towards Visually Intelligent Agents (VIA): A Hybrid Approach

  • Conference paper
  • First Online:
The Semantic Web: ESWC 2021 Satellite Events (ESWC 2021)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12739))

Included in the following conference series:

  • 899 Accesses


Service robots can undertake tasks that are impractical or even dangerous for us - e.g., industrial welding, space exploration, and others. To carry out these tasks reliably, however, they need Visual Intelligence capabilities at least comparable to those of humans. Despite the technological advances enabled by Deep Learning (DL) methods, Machine Visual Intelligence is still vastly inferior to Human Visual Intelligence. Methods which augment DL with Semantic Web technologies, on the other hand, have shown promising results. In the lack of concrete guidelines on which knowledge properties and reasoning capabilities to leverage within this new class of hybrid methods, this PhD work provides a reference framework of epistemic requirements for the development of Visually Intelligent Agents (VIA). Moreover, the proposed framework is used to derive a novel hybrid reasoning architecture, to address real-world robotic scenarios which require Visual Intelligence.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions


  1. Aditya, S., Yang, Y., Baral, C.: Integrating knowledge and reasoning in image understanding. In: Proceedings of IJCAI 2019, pp. 6252–6259 (2019)

    Google Scholar 

  2. Alatise, M.B., Hancke, G.P.: A review on challenges of autonomous mobile robot and sensor fusion methods. IEEE Access 8, 39830–39846 (2020)

    Article  Google Scholar 

  3. Landau, B., Jackendoff, R.: “What’’ and “where’’ in spatial language and spatial cognition. Behav. Brain Sci. 16, 217–265 (1993)

    Article  Google Scholar 

  4. Bastianelli, E., Bardaro, G., Tiddi, I., Motta, E.: Meet HanS, the heath & safety autonomous inspector. In: Proceedings of the International Semantic Web Conference (ISWC), Poster&Demo Track (2018)

    Google Scholar 

  5. Borrmann, A., Rank, E.: Query support for BIMs using semantic and spatial conditions. In: Handbook of Research on Building Information Modeling and Construction Informatics: Concepts and Technologies (2010)

    Google Scholar 

  6. Chang, A.X., et al.: ShapeNet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015)

  7. Chiatti, A., Bardaro, G., Motta, E., Daga, E.: Commonsense spatial reasoning for visually intelligent agents. arXiv preprint arXiv:2104.00387 (2021)

  8. Chiatti, A., Motta, E., Daga, E.: Towards a framework for visual intelligence in service robotics: epistemic requirements and gap analysis. In: Proceedings of KR 2020- Special session on KR & Robotics, pp. 905–916. IJCAI (2020)

    Google Scholar 

  9. Chiatti, A., Motta, E., Daga, E., Bardaro, G.: Fit to measure: reasoning about sizes for robust object recognition. In: To appear in Proceedings of the AAAI2021 Spring Symposium on Combining Machine Learning and Knowledge Engineering (AAAI-MAKE 2021) (2021)

    Google Scholar 

  10. Daruna, A., Liu, W., Kira, Z., Chetnova, S.: RoboCSE: robot common sense embedding. In: Proceedings of ICRA, pp. 9777–9783. IEEE (2019)

    Google Scholar 

  11. Daruna, A.A., et al.: SiRoK: situated robot knowledge-understanding the balance between situated knowledge and variability. In: 2018 AAAI Spring Symposium Series (2018)

    Google Scholar 

  12. Deeken, H., Wiemann, T., Hertzberg, J.: Grounding semantic maps in spatial databases. Robot. Auton. Syst. 105, 146–165 (2018)

    Article  Google Scholar 

  13. Gouidis, F., Vassiliades, A., Patkos, T., Argyros, A., Bassiliades, N., Plexousakis, D.: A review on intelligent object perception methods combining knowledge-based reasoning and machine learning. arXiv:1912.11861 [cs], March 2020

  14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of CVPR, pp. 770–778 (2016)

    Google Scholar 

  15. Hoffman, D.D.: Visual Intelligence: How We Create What We See. WW Norton & Company, New York (2000)

    Google Scholar 

  16. van Krieken, E., Acar, E., van Harmelen, F.: Analyzing differentiable fuzzy implications. In: Proceedings of KR 2020, pp. 893–903 (2020)

    Google Scholar 

  17. Krishna, R., Zhu, Y., Groth, O., Johnson, J., et al.: Visual genome: connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vision 123(1), 32–73 (2017)

    Article  MathSciNet  Google Scholar 

  18. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)

    Article  Google Scholar 

  19. Lake, B.M., Ullman, T.D., Tenenbaum, J.B., Gershman, S.J.: Building machines that learn and think like people. Behav. Brain Sci. 40 (2017)

    Google Scholar 

  20. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553)(2015)

    Google Scholar 

  21. Liu, D., Bober, M., Kittler, J.: Visual semantic information pursuit: a survey. IEEE Trans. Pattern Anal. Mach. Intell. (2019)

    Google Scholar 

  22. Liu, L., et al.: Deep learning for generic object detection: a survey. Int. J. Comput. Vis. 128(2), 261–318 (2020)

    Article  Google Scholar 

  23. Mancini, M., Karaoguz, H., Ricci, E., Jensfelt, P., Caputo, B.: Knowledge is never enough: towards web aided deep open world recognition. In: IEEE ICRA, p. 9543, May 2019

    Google Scholar 

  24. Marcus, G.: Deep learning: a critical appraisal. arXiv preprint arXiv:1801.00631 (2018)

  25. Marino, K., Salakhutdinov, R., Gupta, A.: The more you know: using knowledge graphs for image classification. In: Proceedings of IEEE CVPR, pp. 20–28, July 2017

    Google Scholar 

  26. Parisi, G.I., Kemker, R., Part, J.L., Kanan, C., Wermter, S.: Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54–71 (2019)

    Google Scholar 

  27. Paulius, D., Sun, Y.: A survey of knowledge representation in service robotics. Robot. Auton. Syst. 118, 13–30 (2019)

    Google Scholar 

  28. Pearl, J.: Theoretical impediments to machine learning with seven sparks from the causal revolution. In: Proceedings of WSDM 2018, p. 3. ACM, February 2018

    Google Scholar 

  29. Serafini, L., Garcez, A.D.: Logic tensor networks: deep learning and logical reasoning from data and knowledge. arXiv:1606.04422 [cs], July 2016

  30. Storks, S., Gao, Q., Chai, J.Y.: Recent advances in natural language inference: a survey of benchmarks, resources, and approaches. arXiv preprint arXiv:1904.01172 (2019)

  31. Wu, Q., Teney, D., Wang, P., Shen, C., Dick, A., van den Hengel, A.: Visual question answering: a survey of methods and datasets. Comput. Vis. Image Underst. 163, 21–40 (2017)

    Article  Google Scholar 

  32. Yang, K., Russakovsky, O., Deng, J.: Spatialsense: an adversarially crowdsourced benchmark for spatial relation recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2051–2060 (2019)

    Google Scholar 

  33. Young, J., Kunze, L., Basile, V., Cabrio, E., Hawes, N., Caputo, B.: Semantic web-mining and deep vision for lifelong object discovery. In: Proceedings of ICRA, pp. 2774–2779. IEEE (2017)

    Google Scholar 

  34. Zeng, A., et al.: Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching. In: 2018 IEEE ICRA, pp. 1–8. IEEE (2018)

    Google Scholar 

Download references


I would like to thank my supervisors, Prof. Enrico Motta and Dr. Enrico Daga, for their continuous support and guidance throughout this PhD project. It is also thanks to them if I have found out about the ESWC PhD symposium.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Agnese Chiatti .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Chiatti, A. (2021). Towards Visually Intelligent Agents (VIA): A Hybrid Approach. In: Verborgh, R., et al. The Semantic Web: ESWC 2021 Satellite Events. ESWC 2021. Lecture Notes in Computer Science(), vol 12739. Springer, Cham.

Download citation

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80417-6

  • Online ISBN: 978-3-030-80418-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics