Abstract
Estimating the pose of objects from depth data is a problem of considerable practical importance for many vision applications. This paper presents an approach for accurate and efficient 3D pose estimation from noisy 2.5D depth images obtained from a consumer depth sensor. Initialized with a coarse pose estimate, the proposed approach applies a hypothesize-and-test scheme that combines stochastic optimization and graphics-based rendering to refine the initial pose so that it accurately accounts for the sensed depth image. Pose refinement employs particle swarm optimization to minimize an objective function that quantifies the misalignment between the acquired depth image and a rendered one, synthesized from a hypothesized pose with the aid of an object mesh model. No explicit correspondences between the depth data and the model need to be established, and both pose hypothesis rendering and objective function evaluation are performed efficiently on the GPU. Extensive experimental results demonstrate the superior performance of the proposed approach compared to the ICP algorithm, which is typically used for pose refinement in depth images. Furthermore, the experiments indicate that its performance degrades gracefully under limited computational resources and that it is robust to noisy and reduced polygon count models, attesting to its suitability for use with automatically scanned object models and common graphics hardware.
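The hypothesize-and-test loop described in the abstract can be illustrated with a minimal CPU sketch. It is not the authors' implementation: the toy `render_depth` function (a paraboloid depth bump parameterized only by a 3-vector pose), the mean absolute difference used as the misalignment score, the search ranges, and the PSO coefficients are all assumptions made for illustration, whereas the paper renders an object mesh and evaluates the objective on the GPU.

```python
import random

def render_depth(pose, size=16, k=2.0):
    """Hypothetical stand-in for mesh rendering: a depth image (mm) of a
    paraboloid bump centred at pixel (tx, ty) with base depth tz."""
    tx, ty, tz = pose
    return [[tz + k * ((u - tx) ** 2 + (v - ty) ** 2) for u in range(size)]
            for v in range(size)]

def misalignment(observed, rendered):
    """Mean absolute per-pixel depth difference (an assumed objective)."""
    total, count = 0.0, 0
    for row_o, row_r in zip(observed, rendered):
        for d_o, d_r in zip(row_o, row_r):
            total += abs(d_o - d_r)
            count += 1
    return total / count

def refine_pose(observed, initial_pose, half_range,
                particles=40, iterations=80, seed=0):
    """Refine a coarse pose with a basic global-best particle swarm."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7298, 1.4962, 1.4962  # common constriction-style settings
    dim = len(initial_pose)
    lo = [p - h for p, h in zip(initial_pose, half_range)]
    hi = [p + h for p, h in zip(initial_pose, half_range)]

    def score(pose):
        return misalignment(observed, render_depth(pose))

    # Hypotheses are sampled around the initial pose; one particle keeps it.
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
           for _ in range(particles)]
    pos[0] = list(initial_pose)
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_val = [score(p) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Keep each hypothesis inside the search box.
                pos[i][d] = min(hi[d], max(lo[d], pos[i][d] + vel[i][d]))
            val = score(pos[i])  # render the hypothesis and test it
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

if __name__ == "__main__":
    true_pose = (7.5, 8.2, 800.0)       # synthetic ground truth
    observed = render_depth(true_pose)  # stands in for the sensed image
    coarse = (6.0, 9.5, 815.0)          # coarse initial pose
    pose, err = refine_pose(observed, coarse, (3.0, 3.0, 30.0))
    print("refined pose:", pose, "misalignment:", err)
```

Note that the loop never matches individual depth pixels to model points; each hypothesis is judged only by comparing whole images, which is the sense in which such a scheme is correspondence-free.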
Notes
In our implementation, \(d_T\) is set to 20 mm, a value determined by considering the size of the search space for candidate poses and the sensor's sensitivity to depth changes.
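The note gives only the value of \(d_T\), not how it enters the objective. As a sketch under a stated assumption, one common way to use such a threshold is to cap each pixel's absolute depth difference at \(d_T\), so that gross outliers cannot dominate the score. The function name, the capping rule, and the handling of missing depth below are hypothetical and are not taken from the paper.

```python
D_T_MM = 20.0  # threshold value quoted in the note, in millimetres

def clamped_depth_error(observed, rendered, d_t=D_T_MM):
    """Mean of per-pixel |observed - rendered| with each term capped at d_t.

    Assumption for illustration: pixels where either image has no depth
    (None) are skipped. If no pixel is valid, the worst-case value d_t is
    returned.
    """
    total, count = 0.0, 0
    for row_o, row_r in zip(observed, rendered):
        for d_o, d_r in zip(row_o, row_r):
            if d_o is None or d_r is None:
                continue  # missing measurement or no model surface here
            total += min(abs(d_o - d_r), d_t)
            count += 1
    return total / count if count else d_t

if __name__ == "__main__":
    obs = [[800.0, 810.0], [None, 795.0]]
    ren = [[805.0, 900.0], [700.0, 795.0]]
    print(clamped_depth_error(obs, ren))
```

Under this assumption, a pixel that is off by 90 mm contributes no more than one that is off by 20 mm, which limits the influence of occluders and sensor dropouts on the comparison.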
Acknowledgments
This work was partially supported by the European Commission FP7 DARWIN Project, Grant No. 270138 and the Foundation for Research and Technology Hellas-Institute of Computer Science (FORTH-ICS) internal RTD Programme ‘Ambient Intelligence and Smart Environments’.
Cite this article
Zabulis, X., Lourakis, M.I.A. & Koutlemanis, P. Correspondence-free pose estimation for 3D objects from noisy depth data. Vis Comput 34, 193–211 (2018). https://doi.org/10.1007/s00371-016-1326-9