6D Object Pose Estimation for Robot Programming by Demonstration

  • Mohammad Ghahramani
  • Aleksandar Vakanski
  • Farrokh Janabi-Sharifi
Conference paper
Part of the Springer Proceedings in Physics book series (SPPHY, volume 233)


Estimating the position and orientation (pose) of objects in images is a crucial step toward successful robot programming by demonstration using visual task learning. Currently, a number of algorithms exist for detecting and tracking objects in images, including conventional image processing methods and the state-of-the-art methods based on deep learning architectures. However, the problem of accurate estimation of 6D poses of objects in a sequence of video frames still poses challenges. In this paper, we present a novel deep learning method for pose estimation based on data augmentation and nonlinear regression. For training purposes, thousands of images associated with views of different poses of an object are generated based on a known CAD model of the object geometry. The trained deep neural network is employed for accurate and real-time estimation of the orientation of the object. The object position coordinates in the demonstrations are obtained from the depth information of the scene captured by a Microsoft Kinect v2.0 sensor. The resulting 6-dimensional poses are estimated at each time frame and are employed for learning robotic tasks at a trajectory level of abstraction. Robot inverse kinematics is applied to generate a program for robotic task execution. The proposed method is validated for transferring new skills to a robot in a painting application.
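The way an orientation estimate and a depth reading combine into a single 6D pose can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pinhole camera model for the depth sensor, assumes the network output is expressed as roll, pitch, and yaw angles (the abstract does not state the orientation representation), and uses placeholder intrinsics (`FX`, `FY`, `CX`, `CY`) that would have to be replaced by calibrated values. All function names are hypothetical.

```python
import math

# Placeholder depth-camera intrinsics in pixels (assumed values, not from the paper).
FX, FY = 365.0, 365.0   # focal lengths
CX, CY = 256.0, 212.0   # principal point


def backproject(u, v, depth_m, fx=FX, fy=FY, cx=CX, cy=CY):
    """Map pixel (u, v) with depth in metres to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def rotation_zyx(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cyaw, syaw = math.cos(yaw), math.sin(yaw)
    return [
        [cyaw * cp, cyaw * sp * sr - syaw * cr, cyaw * sp * cr + syaw * sr],
        [syaw * cp, syaw * sp * sr + cyaw * cr, syaw * sp * cr - cyaw * sr],
        [-sp, cp * sr, cp * cr],
    ]


def pose_matrix(position, rotation):
    """Assemble a 4x4 homogeneous transform from a 3D position and a 3x3 rotation."""
    rows = [list(rotation[i]) + [position[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


# Example: object centre detected at pixel (300, 200), 1.2 m from the sensor,
# with an orientation estimate of 10 degrees roll and 45 degrees yaw.
position = backproject(300.0, 200.0, 1.2)
rotation = rotation_zyx(math.radians(10.0), 0.0, math.radians(45.0))
pose = pose_matrix(position, rotation)
```

Repeating this for each frame yields a sequence of poses, which is the kind of trajectory-level input the abstract describes feeding into task learning and inverse kinematics.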



This work was supported by an NSERC Idea to Innovation (I2I) grant (I2I PJ 486866-15). We would like to thank Ms. Kaiqi Cheng for validating the experiments. The authors received a high-end graphics processing unit (GPU), a Titan Xp, from NVIDIA, which was used for this research.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Mohammad Ghahramani (1)
  • Aleksandar Vakanski (2)
  • Farrokh Janabi-Sharifi (1)

  1. Ryerson University, Toronto, Canada
  2. University of Idaho, Idaho Falls, USA
