Visual Object Detection for an Autonomous Indoor Robotic System

  • Anima M. Sharma
  • Imran A. Syed
  • Bishwajit Sharma
  • Arshad Jamal
  • Dipti Deodhare
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 703)


This paper discusses an indoor robotic system that integrates a state-of-the-art object detection algorithm, trained with data augmented for an indoor scenario, with mechanisms to localize objects in 3D and display them interactively to a user. Size, weight, and power constraints on a mobile robot limit the type of computing hardware that can be integrated with the robotic platform. On the other hand, the robot’s mobility, if leveraged properly, provides opportunities to detect objects from different distances and viewpoints as the robot approaches them, giving more robust results. This work adapts a CNN-based algorithm, YOLO, to run on a GPU-enabled board, the Jetson TX1. An innovative method to calculate the object position in the 3D environment map is discussed, along with the problems it raises, such as duplicate detections that need to be suppressed. Since multiple objects of the same or different classes may be detected, the user can be overloaded with information, and managing the visualization through human–machine interaction becomes important. A scheme for informative display of objects is implemented that lets the user interactively view object images as well as their positions in the scene. The complete robotic system, including the interactive visualization tool, can be put to various uses such as search and rescue, indoor assistance, patrolling, and surveillance.


Object detection, CNN, Duplicate suppression, 3D localization, Mapping, Visualization



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Anima M. Sharma (1)
  • Imran A. Syed (1)
  • Bishwajit Sharma (1)
  • Arshad Jamal (1)
  • Dipti Deodhare (1)
  1. Centre for Artificial Intelligence and Robotics, Bangalore, India
