Abstract
Scene understanding is a fundamental challenge for intelligent robots, especially social robots, which are expected to have human-like perception, comprehension, and knowledge. This paper proposes an approach that enables robots not only to detect objects in a scene but also to understand and reason about their working environments. The proposed method has three parts: object detection, object semantic comprehension, and feedback on robotic comprehension. Semantic comprehension is based on dictionary definitions of objects: the category, function, property, and composition of each detected object are analyzed, and these four elements help the robot comprehend the target object in detail. The experimental part of the paper discusses the applicability of the proposed method to robots.
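As a rough illustration of the four semantic elements named in the abstract, the sketch below stores category, function, property, and composition in a small record and fills it from a definition string using a few hand-written cue phrases. Everything in it is an assumption made for illustration: the names `ObjectSemantics`, `comprehend`, and `feedback`, the cue phrases, and the sample definition (written for this example, not quoted from any dictionary) do not come from the paper, whose actual analysis method is not described in the abstract. The object label is assumed to come from an upstream detector.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjectSemantics:
    """Hypothetical record for the four elements named in the abstract."""
    label: str
    category: str = ""
    functions: List[str] = field(default_factory=list)
    properties: List[str] = field(default_factory=list)
    composition: List[str] = field(default_factory=list)


def comprehend(label: str, definition: str) -> ObjectSemantics:
    """Toy cue-phrase extraction; an assumption, not the paper's method."""
    sem = ObjectSemantics(label)
    text = definition.lower().strip(" .")

    # Head phrase: everything before the first comma or cue phrase.
    head = re.split(r",| used for | used to | made of ", text, maxsplit=1)[0]
    words = [w for w in head.split() if w not in {"a", "an", "the"}]
    if words:
        sem.category = words[-1]      # last word of the head phrase
        sem.properties = words[:-1]   # modifiers before it

    # "used for X" / "used to X" is read as a function.
    sem.functions = re.findall(
        r"used (?:for|to) ([a-z ]+?)(?=,| and | made of |$)", text)
    # "made of X" is read as composition.
    sem.composition = re.findall(
        r"made of ([a-z ]+?)(?=,| and | used |$)", text)
    return sem


def feedback(sem: ObjectSemantics) -> str:
    """Turn the record into a spoken-style summary the robot could report."""
    parts = [f"I see a {sem.label}."]
    if sem.category:
        parts.append(f"It is a kind of {sem.category}.")
    if sem.properties:
        parts.append(f"It is {', '.join(sem.properties)}.")
    if sem.functions:
        parts.append(f"It is used for {', '.join(sem.functions)}.")
    if sem.composition:
        parts.append(f"It is made of {', '.join(sem.composition)}.")
    return " ".join(parts)


# Made-up definition for illustration only.
cup = comprehend(
    "cup", "A small container used for drinking, typically made of porcelain.")
print(feedback(cup))
```

A real system would need far more robust language analysis than these cue phrases; the point of the sketch is only the shape of the data passed from detection to comprehension to feedback.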
Funding
This work has been supported by the Wichita Medical Research and Education Foundation and the Regional Institute on Aging (Grant No. 20,000).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Yan, F., Tran, D.M. & He, H. Robotic Understanding of Object Semantics by Referring to a Dictionary. Int J of Soc Robotics 12, 1251–1263 (2020). https://doi.org/10.1007/s12369-020-00657-6
DOI: https://doi.org/10.1007/s12369-020-00657-6