Abstract
In recent years, object detection and pose estimation have gained significant attention in the context of robotic vision applications. Identifying objects of interest and estimating their pose remain essential capabilities for robots to provide effective assistance in applications ranging from household tasks to industrial manipulation. The problem is particularly challenging because of the heterogeneity of objects, which may have complex shapes, and because of background clutter and partial occlusions between objects. As the main contribution of this work, we propose a system that performs real-time object detection and pose estimation for the purpose of dynamic robot grasping. The robot is pre-trained to perform a small set of canonical grasps from a few fixed poses for each object. When presented with one of these objects in an arbitrary, previously unseen pose, the proposed approach allows the robot to detect the object's identity and its actual pose, and then adapt a canonical grasp to the new pose. During training, the system defines a canonical grasp by capturing the relative pose of an object with respect to the gripper attached to the robot's wrist. During testing, once a new pose is detected, a canonical grasp for the object is identified and then dynamically adapted by adjusting the robot arm's joint angles, so that the gripper can grasp the object in its new pose. We conducted experiments using a humanoid PR2 robot and showed that the proposed framework can detect well-textured objects and provide accurate pose estimation in the presence of tolerable amounts of out-of-plane rotation. The performance is also illustrated by the robot successfully grasping objects from a wide range of arbitrary poses.
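The core adaptation step described above can be sketched with homogeneous transforms: the canonical grasp stores the gripper pose relative to the object, and at test time that relative transform is composed with the newly detected object pose to obtain the target gripper pose. The snippet below is a minimal illustration of that arithmetic only; the numeric poses and helper names are assumptions, and the actual system would then solve the arm's inverse kinematics to reach the resulting gripper pose.

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Training: record the canonical grasp as the gripper pose relative to the
# object, from one demonstration in a fixed (illustrative) configuration.
T_world_obj_canonical = make_pose(np.eye(3), [0.5, 0.0, 0.8])
T_world_gripper_canonical = make_pose(rot_z(np.pi), [0.5, 0.0, 0.95])
T_obj_gripper = np.linalg.inv(T_world_obj_canonical) @ T_world_gripper_canonical

# Testing: the object is detected in a new, arbitrary pose.
T_world_obj_new = make_pose(rot_z(np.pi / 4), [0.3, 0.2, 0.8])

# Adapt the canonical grasp: composing the new object pose with the stored
# object-to-gripper transform yields the target gripper pose in world frame.
T_world_gripper_new = T_world_obj_new @ T_obj_gripper
```

By construction, the gripper keeps the same pose relative to the object as in the demonstration, which is what allows a single canonical grasp to be reused across arbitrary object poses.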
Acknowledgements
This work has been supported in part by the Office of Naval Research award N00014-16-1-2312 and US Army Research Laboratory (ARO) award W911NF-20-2-0084.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Paul, S.K., Chowdhury, M.T., Nicolescu, M., Nicolescu, M., Feil-Seifer, D. (2021). Object Detection and Pose Estimation from RGB and Depth Data for Real-Time, Adaptive Robotic Grasping. In: Arabnia, H.R., Deligiannidis, L., Shouno, H., Tinetti, F.G., Tran, QN. (eds) Advances in Computer Vision and Computational Biology. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-71051-4_10
DOI: https://doi.org/10.1007/978-3-030-71051-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71050-7
Online ISBN: 978-3-030-71051-4