6D Pose Estimation for Industrial Applications

  • Federico Cunico
  • Marco Carletti (corresponding author)
  • Marco Cristani
  • Fabio Masci
  • Davide Conigliaro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11808)


Object pose estimation is important for systems and robots that interact with their environment; the main challenge of the task is scene complexity caused by occlusion and clutter. Another key challenge is estimating pose from both RGB and depth information: prior works either extract information from the RGB image and the depth map separately or rely on costly post-processing steps, which limits their performance in highly cluttered scenes and in real-time applications. Traditionally, the pose estimation problem is tackled by matching feature points between 3D models and images. However, these methods require richly textured models. In recent years, the rise of deep learning has produced a growing number of methods based on neural networks, such as DSAC++, PoseCNN, DenseFusion and SingleShotPose. In this work, we present a comparison between two recent algorithms, DSAC++ and DenseFusion, focusing on computational cost, performance and applicability in industry.


Keywords: Pose estimation, Deep learning



We thank The Edge Company, Srl for supporting this research.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Federico Cunico (1)
  • Marco Carletti (1, corresponding author)
  • Marco Cristani (1, 2)
  • Fabio Masci (3)
  • Davide Conigliaro (4)

  1. Department of Computer Science, University of Verona, Verona, Italy
  2. Istituto Italiano di Tecnologia (IIT), Genova, Italy
  3. The Edge Company, Srl, Rimini, Italy
  4. Humatics, Srl, Verona, Italy
