Fast 6D Pose Estimation from a Monocular Image Using Hierarchical Pose Trees

  • Yoshinori Konishi
  • Yuki Hanzawa
  • Masato Kawade
  • Manabu Hashimoto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9905)


It has been shown that template-based approaches can quickly estimate the 6D pose of texture-less objects from a monocular image. However, they tend to become slow when tens of thousands of templates are needed to cover a wider range of 3D object poses. To alleviate this problem, we propose a novel image feature and a tree-structured model. Our proposed perspectively cumulated orientation feature (PCOF) is based on orientation histograms extracted from randomly generated 2D projection images of 3D CAD data, so that each PCOF template explicitly handles a certain range of 3D object poses. The hierarchical pose trees (HPT) are built by clustering 3D object poses and reducing the resolution of templates, and they accelerate 6D pose estimation through a coarse-to-fine search over an image pyramid. In an experimental evaluation on our texture-less object dataset, the combination of PCOF and HPT achieved higher accuracy and faster speed than state-of-the-art techniques.
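
The core PCOF idea in the abstract (accumulate per-pixel orientation histograms over many slightly different projections, then keep only orientations that occur often enough) can be illustrated with a small, simplified sketch. This is not the authors' implementation: the 8 orientation bins over [0, π), central-difference gradients, the `min_freq` threshold of 0.5, weighting by the peak frequency, and all function names (`quantized_orientations`, `build_pcof_template`, `pcof_score`, `square_image`) are illustrative assumptions. Synthetic squares stand in for CAD renders; random pose sampling, orientation spreading, and the image pyramid used by HPT are not modeled.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

NUM_BINS = 8  # assumption: orientations folded to [0, pi) and quantized into 8 bins

Image = List[List[float]]
Pixel = Tuple[int, int]
Template = Dict[Pixel, Tuple[Set[int], float]]  # pixel -> (allowed bins, weight)


def quantized_orientations(img: Image, min_mag: float = 1e-6) -> Dict[Pixel, int]:
    """Quantized gradient orientation (sign ignored) at each interior pixel with a gradient."""
    h, w = len(img), len(img[0])
    out: Dict[Pixel, int] = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            if math.hypot(gx, gy) <= min_mag:
                continue  # flat region: no orientation
            angle = math.atan2(gy, gx) % math.pi  # fold opposite directions together
            out[(y, x)] = int(angle / math.pi * NUM_BINS + 0.5) % NUM_BINS  # nearest bin
    return out


def build_pcof_template(renders: Sequence[Image], min_freq: float = 0.5) -> Template:
    """Cumulate per-pixel orientation histograms over perturbed renders of one pose."""
    hist: Dict[Pixel, List[int]] = {}
    for img in renders:
        for p, b in quantized_orientations(img).items():
            hist.setdefault(p, [0] * NUM_BINS)[b] += 1
    template: Template = {}
    for p, counts in hist.items():
        freqs = [c / len(renders) for c in counts]
        weight = max(freqs)
        if weight < min_freq:
            continue  # orientation too unstable under the pose perturbations
        allowed = {b for b, f in enumerate(freqs) if f >= min_freq}
        template[p] = (allowed, weight)
    return template


def pcof_score(template: Template, orients: Dict[Pixel, int], offset: Pixel = (0, 0)) -> float:
    """Weighted fraction of template pixels whose image orientation is an allowed bin."""
    dy, dx = offset
    total = sum(w for _, w in template.values())
    if total == 0:
        return 0.0
    hit = sum(w for (y, x), (allowed, w) in template.items()
              if orients.get((y + dy, x + dx)) in allowed)
    return hit / total


def square_image(size: int, top: int, left: int, side: int) -> Image:
    """Synthetic stand-in for a rendered silhouette: a bright square on black."""
    return [[1.0 if top <= y < top + side and left <= x < left + side else 0.0
             for x in range(size)] for y in range(size)]


if __name__ == "__main__":
    a = square_image(14, 3, 3, 6)
    b = square_image(14, 3, 4, 6)  # same silhouette, slightly perturbed (shifted 1 px)
    single = build_pcof_template([a])
    print(pcof_score(single, quantized_orientations(a)))  # 1.0
    cumulated = build_pcof_template([a, b])
    print(sorted({w for _, w in cumulated.values()}))  # [0.5, 1.0]
```

In this sketch, pixels whose orientation is stable across all perturbed renders receive weight 1.0, while pixels seen in only some renders are down-weighted (0.5 here), which is one simple way a single template can tolerate a range of poses rather than a single one.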


Keywords: 6D pose estimation · Texture-less objects · Template matching

Supplementary material

Supplementary material 1 (mp4 23360 KB)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Yoshinori Konishi (1), corresponding author
  • Yuki Hanzawa (1)
  • Masato Kawade (1)
  • Manabu Hashimoto (2)
  1. OMRON Corporation, Kyoto, Japan
  2. Chukyo University, Nagoya, Japan