
Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review

Published in Artificial Intelligence Review

Abstract

This paper presents a comprehensive survey of vision-based robotic grasping. We identify three key tasks in vision-based robotic grasping: object localization, object pose estimation and grasp estimation. The object localization task covers object localization without classification, object detection and object instance segmentation, and provides the regions of the target object in the input data. The object pose estimation task mainly refers to estimating the 6D object pose and includes correspondence-based, template-based and voting-based methods; it enables the generation of grasp poses for known objects. The grasp estimation task includes 2D planar grasp methods and 6DoF grasp methods, where the former are constrained to grasp from a single direction. These three tasks can be combined in different ways to accomplish robotic grasping. Many object pose estimation methods do not require a separate object localization step and instead perform object localization and object pose estimation jointly. Likewise, many grasp estimation methods require neither object localization nor object pose estimation and perform grasp estimation in an end-to-end manner. Both traditional methods and recent deep learning-based methods that take RGB-D image inputs are reviewed in detail. Related datasets and comparisons between state-of-the-art methods are also summarized. In addition, challenges in vision-based robotic grasping and future directions for addressing them are pointed out.
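To make concrete how an estimated 6D object pose enables grasp poses for known objects, the following minimal sketch composes two rigid transforms. It is an illustration under stated assumptions, not a method taken from the survey: the helper names (`make_pose`, `rot_z`, `grasp_in_camera_frame`), the frame naming and all numeric values are hypothetical, and the grasp is assumed to have been annotated beforehand in the object's own frame.

```python
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def rot_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def grasp_in_camera_frame(T_cam_obj, T_obj_grasp):
    """Map a grasp defined in the object frame into the camera frame."""
    return T_cam_obj @ T_obj_grasp

# Hypothetical estimated 6D object pose (object frame expressed in camera frame):
# rotated 90 degrees about z, located 0.5 m in front of the camera.
T_cam_obj = make_pose(rot_z(np.pi / 2), [0.10, 0.00, 0.50])

# Hypothetical grasp annotated offline on the known object model:
# 5 cm along the object's y-axis, same orientation as the object.
T_obj_grasp = make_pose(np.eye(3), [0.00, 0.05, 0.00])

T_cam_grasp = grasp_in_camera_frame(T_cam_obj, T_obj_grasp)
grasp_position = T_cam_grasp[:3, 3]
print(grasp_position)
```

The same composition would apply whichever family of pose estimation method produced `T_cam_obj`; a further camera-to-robot transform, not shown here, would be needed before executing the grasp.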


References

  • Akkaya I, Andrychowicz M, Chociej M, Litwin M, McGrew B, Petron A, Paino A, Plappert M, Powell G, Ribas R, et al (2019) Solving rubik’s cube with a robot hand. Preprint arXiv:1910.07113

  • Aldoma A, Vincze M, Blodow N, Gossow D, Gedikli S, Rusu RB, Bradski G (2011) Cad-model recognition and 6dof pose estimation using 3d cues. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), IEEE, pp 585–592

  • Aoki Y, Goforth H, Srivatsan RA, Lucey S (2019) Pointnetlk: robust & efficient point cloud registration using pointnet. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7163–7172

  • Ardón P, Pairet È, Petrick RP, Ramamoorthy S, Lohan KS (2019) Learning grasp affordance reasoning through semantic relations. IEEE Robot Autom Lett 4(4):4571–4578

    Google Scholar 

  • Asif U, Tang J, Harrer S (2018) Graspnet: an efficient convolutional neural network for real-time grasp detection for low-powered devices. In: IJCAI, pp 4875–4882

  • Bay H, Tuytelaars T, Van Gool L (2006) Surf: speeded up robust features. In: European conference on computer vision, Springer, pp 404–417

  • Bellekens B, Spruyt V, Berkvens R, Weyn M (2014) A survey of rigid 3d pointcloud registration algorithms. In: AMBIENT 2014: the fourth international conference on ambient computing, applications, services and technologies, August 24–28, 2014, Rome, Italy, pp 8–13

  • Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522

    Google Scholar 

  • Berscheid L, Meißner P, Kröger T (2019) Robot learning of shifting objects for grasping in cluttered environments. Preprint arXiv:1907.11035

  • Besl PJ, McKay ND (1992) A method for registration of 3-d shapes. IEEE Trans Pattern Anal Mach Intell 14(2):239–256

    Google Scholar 

  • Bhatia S, Chalup SK et al (2013) Segmenting salient objects in 3d point clouds of indoor scenes using geodesic distances. J Signal Inf Process 4(03):102

    Google Scholar 

  • Billings G, Johnson-Roberson M (2018) Silhonet: An RGB method for 3d object pose estimation and grasp planning. CoRR abs/1809.06893

  • Blomqvist K, Breyer M, Cramariuc A, Förster J, Grinvald M, Tschopp F, Chung JJ, Ott L, Nieto J, Siegwart R (2020) Go fetch: mobile manipulation in unstructured environments. Preprint arXiv:2004.00899

  • Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: optimal speed and accuracy of object detection. Preprint arXiv:2004.10934

  • Bohg J, Kragic D (2010) Learning grasping points with shape context. Robot Auton Syst 58(4):362–377

    Google Scholar 

  • Bohg J, Morales A, Asfour T, Kragic D (2014) Data-driven grasp synthesis: a survey. IEEE Trans Robot 30(2):289–309

    Google Scholar 

  • Bolya D, Zhou C, Xiao F, Lee YJ (2019) Yolact++: better real-time instance segmentation. Preprint arXiv:1912.06218

  • Bolya D, Zhou C, Xiao F, Lee YJ (2019) Yolact: real-time instance segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 9157–9166

  • Borji A, Cheng MM, Hou Q, Jiang H, Li J (2019) Salient object detection: A survey. Computational visual media 5(2):117–150

    Google Scholar 

  • Borst C, Fischer M, Hirzinger G (2003) Grasping the dice by dicing the grasp. In: IEEE/RSJ international conference on intelligent robots and systems, IEEE, vol 4, pp 3692–3697

  • Bousmalis K, Irpan A, Wohlhart P, Bai Y, Kelcey M, Kalakrishnan M, Downs L, Ibarz J, Pastor P, Konolige K et al (2018) Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: 2018 IEEE international conference on robotics and automation (ICRA), IEEE, pp 4243–4250

  • Brachmann E, Krull A, Michel F, Gumhold S, Shotton J, Rother C (2014) Learning 6d object pose estimation using 3d object coordinates. In: European conference on computer vision, Springer, pp 536–551

  • Brachmann E, Michel F, Krull A, Ying Yang M, Gumhold S et al (2016) Uncertainty-driven 6d pose estimation of objects and scenes from a single rgb image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3364–3372

  • Bradski G, Kaehler A (2008) Learning OpenCV: computer vision with the OpenCV library. “ O’Reilly Media, Inc.”

  • Cai J, Cheng H, Zhang Z, Su J (2019) Metagrasp: data efficient grasping by affordance interpreter network. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 4960–4966

  • Caldera S, Rassau A, Chai D (2018) Review of deep learning methods in robotic grasp detection. Multimodal Technol Interact 2(3):57

    Google Scholar 

  • Castro P, Armagan A, Kim TK (2020) Accurate 6d object pose estimation by pose conditioned mesh reconstruction. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 4147–4151

  • Chen D, Li J, Wang Z, Xu K (2020) Learning canonical shape space for category-level 6d object pose and size estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11973–11982

  • Chen H, Li Y (2018) Progressively complementarity-aware fusion network for rgb-d salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3051–3060

  • Chen H, Li Y (2019) Cnn-based rgb-d salient object detection: learn, select and fuse. Preprint arXiv:1909.09309

  • Chen H, Li Y, Su D (2019) Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for rgb-d salient object detection. Pattern Recogn 86:376–385

    Google Scholar 

  • Chen H, Sun K, Tian Z, Shen C, Huang Y, Yan Y (2020) Blendmask: top-down meets bottom-up for instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8573–8581

  • Chen IM, Burdick JW (1993) Finding antipodal point grasps on irregularly shaped objects. IEEE Trans Robot Autom 9(4):507–512

    Google Scholar 

  • Chen K, Pang J, Wang J, Xiong Y, Li X, Sun S, Feng W, Liu Z, Shi J, Ouyang W, et al (2019) Hybrid task cascade for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4974–4983

  • Chen LC, Hermans A, Papandreou G, Schroff F, Wang P, Adam H (2018) Masklab: instance segmentation by refining object detection with semantic and direction features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4013–4022

  • Chen W, Jia X, Chang HJ, Duan J, Leonardis A (2020) G2l-net: global to local network for real-time 6d pose estimation with embedding vector features. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4233–4242

  • Chen X, Girshick R, He K, Dollár P (2019) Tensormask: a foundation for dense object segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 2061–2069

  • Chen X, Ma H, Wan J, Li B, Xia T (2017) Multi-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1907–1915

  • Cheng MM, Mitra NJ, Huang X, Torr PH, Hu SM (2014) Global contrast based salient region detection. IEEE Trans Pattern Anal Mach Intell 37(3):569–582

    Google Scholar 

  • Choy C, Dong W, Koltun V (2020) Deep global registration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2514–2523

  • Chu FJ, Xu R, Vela PA (2018) Real-world multiobject, multigrasp detection. IEEE Robot Autom Lett 3(4):3355–3362

    Google Scholar 

  • Chu FJ, Xu R, Vela PA (2019) Detecting robotic affordances on novel objects with regional attention and attributes. Preprint arXiv:1909.05770

  • Crivellaro A, Rad M, Verdie Y, Yi KM, Fua P, Lepetit V (2017) Robust 3d object tracking from monocular images using stable parts. IEEE Trans Pattern Anal Mach Intell 40(6):1465–1479

    Google Scholar 

  • Dai A, Nießner M, Zollhöfer M, Izadi S, Theobalt C (2017) Bundlefusion: real-time globally consistent 3d reconstruction using on-the-fly surface reintegration. ACM Trans Graph (ToG) 36(4):1

    Google Scholar 

  • Dai J, He K, Li Y, Ren S, Sun J (2016) Instance-sensitive fully convolutional networks. In: European conference on computer vision, Springer, pp 534–549

  • Dai J, He K, Sun J (2016) Instance-aware semantic segmentation via multi-task network cascades. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3150–3158

  • Dai J, Li Y, He K, Sun J (2016) R-fcn: object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387

  • Danielczuk M, Matl M, Gupta S, Li A, Lee A, Mahler J, Goldberg K (2019) Segmenting unknown 3d objects from real depth images using mask r-cnn trained on synthetic data. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 7283–7290

  • Deng X, Xiang Y, Mousavian A, Eppner C, Bretl T, Fox D (2020) Self-supervised 6d object pose estimation for robot manipulation. In: International conference on robotics and automation (ICRA)

  • Depierre A, Dellandréa E, Chen L (2018) Jacquard: a large scale dataset for robotic grasp detection. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 3511–3516

  • Depierre A, Dellandréa E, Chen L (2020) Optimizing correlated graspability score and grasp regression for better grasp prediction. Preprint arXiv:2002.00872

  • DeTone D, Malisiewicz T, Rabinovich A (2018) Superpoint: self-supervised interest point detection and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 224–236

  • Ding D, Liu YH, Wang MY (2001) On computing immobilizing grasps of 3-d curved objects. In: IEEE international symposium on computational intelligence in robotics and automation, IEEE, pp 11–16

  • Do TT, Cai M, Pham T, Reid I (2018) Deep-6dpose: recovering 6d object pose from a single rgb image. Preprint arXiv:1802.10367

  • Do TT, Nguyen A, Reid I (2018) Affordancenet: an end-to-end deep learning approach for object affordance detection. In: 2018 IEEE international conference on robotics and automation (ICRA), IEEE, pp 1–5

  • Domae Y, Okuda H, Taguchi Y, Sumi K, Hirai T (2014) Fast graspability evaluation on single depth maps for bin picking with general grippers. In: 2014 IEEE international conference on robotics and automation (ICRA), IEEE, pp. 1997–2004

  • Dong Z, Li G, Liao Y, Wang F, Ren P, Qian C (2020) Centripetalnet: pursuing high-quality keypoint pairs for object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10519–10528

  • Douglas DH, Peucker TK (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartogr Int J Geogr Inf Geovis 10(2):112–122

    Google Scholar 

  • Drost B, Ilic S (2012) 3d object detection and localization using multimodal point pair features. In: International conference on 3D imaging, modeling, processing, visualization transmission, pp 9–16

  • Drost B, Ulrich M, Navab N, Ilic S (2010) Model globally, match locally: efficient and robust 3d object recognition. In: 2010 IEEE computer society conference on computer vision and pattern recognition, pp 998–1005

  • Du L, Ye X, Tan X, Feng J, Xu Z, Ding E, Wen S (2020) Associate-3ddet: perceptual-to-conceptual association for 3d point cloud object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13329–13338

  • Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: keypoint triplets for object detection. In: Proceedings of the IEEE international conference on computer vision, pp 6569–6578

  • Engelmann F, Bokeloh M, Fathi A, Leibe B, Nießner M (2020) 3d-mpa: multi-proposal aggregation for 3d semantic instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9031–9040

  • Erhan D, Szegedy C, Toshev A, Anguelov D (2014) Scalable object detection using deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2147–2154

  • Falco P, Lu S, Natale C, Pirozzi S, Lee D (2019) A transfer learning approach to cross-modal object recognition: from visual observation to robotic haptic exploration. IEEE Trans Robot 35(4):987–998

    Google Scholar 

  • Fan Y, Tomizuka M (2019) Efficient grasp planning and execution with multifingered hands by surface fitting. IEEE Robot Autom Lett 4(4):3995–4002

    Google Scholar 

  • Fan Z, Yu JG, Liang Z, Ou J, Gao C, Xia GS, Li Y (2020) Fgn: fully guided network for few-shot instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9172–9181

  • Fang HS, Wang C, Gou M, Lu C (2020) Graspnet-1billion: a large-scale benchmark for general object grasping. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11444–11453

  • Fang K, Bai Y, Hinterstoisser S, Savarese S, Kalakrishnan M (2018) Multi-task domain adaptation for deep learning of instance grasping from simulation. In: 2018 IEEE international conference on robotics and automation (ICRA), IEEE, pp 3516–3523

  • Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395

    MathSciNet  Google Scholar 

  • Fitzgibbon AW, Fisher RB et al (1996) A buyer’s guide to conic fitting. Department of Artificial Intelligence, University of Edinburgh, Edinburgh

    Google Scholar 

  • Florence PR, Manuelli L, Tedrake R (2018) Dense object nets: learning dense visual object descriptors by and for robotic manipulation. Preprint arXiv:1806.08756

  • Frome A, Huber D, Kolluri R, Bülow T, Malik J (2004) Recognizing objects in range data using regional point descriptors. In: European conference on computer vision, Springer, pp 224–237

  • Gao G, Lauri M, Wang Y, Hu X, Zhang J, Frintrop S (2020) 6d object pose regression via supervised learning on point clouds. Preprint arXiv:2001.08942

  • Gao W, Tedrake R (2019) kpam-sc: generalizable manipulation planning using keypoint affordance and shape completion. Preprint arXiv:1909.06980

  • Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

  • Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE conference on computer vision and pattern recognition, CVPR ’14, pp 580–587

  • Gojcic Z, Zhou C, Wegner JD, Wieser A (2019) The perfect match: 3d point cloud matching with smoothed densities. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5545–5554

  • Gonzalez M, Kacete A, Murienne A, Marchand E (2020) Yoloff: you only learn offsets for robust 6dof object pose estimation. Preprint arXiv:2002.00911

  • Gordo A, Almazán J, Revaud J, Larlus D (2016) Deep image retrieval: learning global representations for image search. In: European conference on computer vision, Springer, pp 241–257

  • Goron LC, Marton ZC, Lazea G, Beetz M (2012) Robustly segmenting cylindrical and box-like objects in cluttered scenes using depth cameras. In: ROBOTIK 2012; 7th German conference on robotics, VDE, pp 1–6

  • Graham B, Engelcke M, van der Maaten L (2018) 3d semantic segmentation with submanifold sparse convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9224–9232

  • Graham B, van der Maaten L (2017) Submanifold sparse convolutional networks. Preprint arXiv:1706.01307

  • Guo D, Kong T, Sun F, Liu H (2016) Object discovery and grasp detection with a shared convolutional neural network. In: IEEE international conference on robotics and automation (ICRA), IEEE, pp 2038–2043

  • Guo D, Sun F, Liu H, Kong T, Fang B, Xi N (2017) A hybrid deep architecture for robotic grasp detection. In: 2017 IEEE international conference on robotics and automation (ICRA), IEEE, pp 1609–1614

  • Guo F, Wang W, Shen J, Shao L, Yang J, Tao D, Tang YY (2017) Video saliency detection using object proposals. IEEE Trans Cybern 48(11):3159–3170

    Google Scholar 

  • Guo Y, Bennamoun M, Sohel F, Lu M, Wan J, Kwok NM (2016) A comprehensive performance evaluation of 3d local feature descriptors. Int J Comput Vis 116(1):66–89

    MathSciNet  Google Scholar 

  • Guo Y, Wang H, Hu Q, Liu H, Liu L, Bennamoun M (2019) Deep learning for 3d point clouds: a survey. Preprint arXiv:1912.12033

  • Hafiz AM, Bhat GM (2020) A survey on instance segmentation: state of the art. Int J Multimed Inf Retr 9(3):171–189

    Google Scholar 

  • Hagelskjær F, Buch AG (2019) Pointposenet: accurate object detection and 6 dof pose estimation in point clouds. Preprint arXiv:1912.09057

  • Han J, Zhang D, Cheng G, Liu N, Xu D (2018) Advanced deep-learning techniques for salient and category-specific object detection: a survey. IEEE Signal Process Mag 35(1):84–100

    Google Scholar 

  • Han L, Zheng T, Xu L, Fang L (2020) Occuseg: occupancy-aware 3d instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2940–2949

  • Hariharan B, Arbeláez P, Girshick R, Malik J (2014) Simultaneous detection and segmentation. In: European conference on computer vision, Springer, pp 297–312

  • He K, Gkioxari G, Dollár P, Girshick RB (2017) Mask r-cnn. IEEE International conference on computer vision (ICCV), pp 2980–2988

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  • He Y, Sun W, Huang H, Liu J, Fan H, Sun J (2020) Pvn3d: a deep point-wise 3d keypoints voting network for 6dof pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11632–11641

  • Hinterstoisser S, Lepetit V, Ilic S, Holzer S, Bradski G, Konolige K, Navab N (2012) Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In: Asian conference on computer vision, Springer, pp 548–562

  • Hinton GE, Krizhevsky A, Wang SD (2011) Transforming auto-encoders. In: International conference on artificial neural networks, Springer, pp 44–51

  • Hodan T, Barath D, Matas J (2020) Epos: estimating 6d pose of objects with symmetries. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11703–11712

  • Hodaň T, Haluza P, Obdržálek Š, Matas J, Lourakis M, Zabulis X (2017) T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects. In: IEEE winter conference on applications of computer vision (WACV)

  • Hodan T, Kouskouridas R, Kim T, Tombari F, Bekris KE, Drost B, Groueix T, Walas K, Lepetit V, Leonardis A, Steger C, Michel F, Sahin C, Rother C, Matas J (2018) A summary of the 4th international workshop on recovering 6d object pose. CoRR abs/1810.03758

  • Hodaň T, Michel F, Brachmann E, Kehl W, GlentBuch A, Kraft D, Drost B, Vidal J, Ihrke S, Zabulis X et al (2018) Bop: benchmark for 6d object pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 19–34

  • Hodaň T, Zabulis X, Lourakis M, Obdržálek Š, Matas J (2015) Detection and fine 3d pose estimation of texture-less objects in rgb-d images. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 4421–4428

  • Hogan FR, Ballester J, Dong S, Rodriguez A (2020) Tactile dexterity: manipulation primitives with tactile feedback. Preprint arXiv:2002.03236

  • Hou J, Dai A, Nießner M (2019) 3d-sis: 3d semantic instance segmentation of rgb-d scans. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4421–4430

  • Hou Q, Cheng MM, Hu X, Borji A, Tu Z, Torr PH (2017) Deeply supervised salient object detection with short connections. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3203–3212

  • Hu Y, Fua P, Wang W, Salzmann M (2020) Single-stage 6d object pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2930–2939

  • Hu Y, Hugonot J, Fua P, Salzmann M (2019) Segmentation-driven 6d object pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3385–3394

  • Jiang H, Wang J, Yuan Z, Wu Y, Zheng N, Li S (2013) Salient object detection: a discriminative regional feature integration approach. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2083–2090

  • Jiang H, Xiao J (2013) A linear approach to matching cuboids in rgbd images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2171–2178

  • Jiang Y, Moseson S, Saxena A (2011) Efficient grasping from rgbd images: learning using a new rectangle representation. In: IEEE international conference on robotics and automation, IEEE, pp 3304–3311

  • Johnson AE (1997) Spin-images: a representation for 3-d surface matching

  • Kaiser A, Ybanez Zepeda JA, Boubekeur T (2019) A survey of simple geometric primitives detection methods for captured 3d data. In: Computer graphics forum, Wiley Online Library, vol 38, pp 167–196

  • Kehl W, Manhardt F, Tombari F, Ilic S, Navab N (2017) Ssd-6d: making rgb-based 3d detection and 6d pose estimation great again. In: Proceedings of the IEEE international conference on computer vision, pp 1521–1529

  • Khan SH, He X, Bennamoun M, Sohel F, Togneri R (2015) Separating objects and clutter in indoor scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4603–4611

  • Kim G, Huber D, Hebert M (2008) Segmentation of salient regions in outdoor scenes using imagery and 3-d data. In: 2008 IEEE workshop on applications of computer vision, IEEE, pp 1–8

  • Kirillov A, Wu Y, He K, Girshick R (2020) Pointrend: image segmentation as rendering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9799–9808

  • Kirkpatrick D, Mishra B, Yap CK (1992) Quantitative steinitz’s theorems with applications to multifingered grasping. Discrete Comput Geom 7(3):295–318

    MathSciNet  MATH  Google Scholar 

  • Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems—volume 1, NIPS’12, pp 1097–1105

  • Kumra S, Joshi S, Sahin F (2019) Antipodal robotic grasping using generative residual convolutional neural network. Preprint arXiv:1909.04810

  • Kumra S, Kanan C (2017) Robotic grasp detection using deep convolutional neural networks. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 769–776

  • Lang AH, Vora S, Caesar H, Zhou L, Yang J, Beijbom O (2019) Pointpillars: fast encoders for object detection from point clouds. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 12697–12705

  • Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750

  • Lee MA, Zhu Y, Srinivasan K, Shah P, Savarese S, Fei-Fei L, Garg A, Bohg J (2019) Making sense of vision and touch: self-supervised learning of multimodal representations for contact-rich tasks. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 8943–8950

  • Lee Y, Park J (2020) Centermask: real-time anchor-free instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13906–13915

  • Lenz I, Lee H, Saxena A (2015) Deep learning for detecting robotic grasps. Int J Robot Res 34(4–5):705–724

    Google Scholar 

  • León B, Ulbrich S, Diankov R, Puche G, Przybylski M, Morales A, Asfour T, Moisio S, Bohg J, Kuffner J, Dillmann R (2010) Opengrasp: a toolkit for robot grasping simulation. In: Ando N, Balakirsky S, Hemker T, Reggiani M, von Stryk O (eds) Simulation, modeling, and programming for autonomous robots. Springer, Berlin, pp 109–120

    Google Scholar 

  • Lepetit V, Fua P et al (2005) Monocular model-based 3d tracking of rigid objects: a survey. Found Trends® Comput Graph Vis 1(1):1–89

  • Lepetit V, Moreno-Noguer F, Fua P (2009) Epnp: an accurate o(n) solution to the pnp problem. IJCV 81(2):155–166

    Google Scholar 

  • Li G, Liu Z, Ye L, Wang Y, Ling H (2020) Cross-modal weighting network for rgb-d salient object detection

  • Li Y, Qi H, Dai J, Ji X, Wei Y (2017) Fully convolutional instance-aware semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2359–2367

  • Li Y, Wang G, Ji X, Xiang Y, Fox D (2018) Deepim: deep iterative matching for 6d pose estimation. Lecture notes in computer science, pp 695–711

  • Li Z, Wang G, Ji X (2019) Cdpn: coordinates-based disentangled pose network for real-time rgb-based 6-dof object pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 7678–7687

  • Liang H, Ma X, Li S, Görner M, Tang S, Fang B, Sun F, Zhang J (2019) Pointnetgpd: detecting grasp configurations from point sets. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 3629–3635

  • Liang M, Yang B, Chen Y, Hu R, Urtasun R (2019) Multi-task multi-sensor fusion for 3d object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7345–7353

  • Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125

  • Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988

  • Liu C, Furukawa Y (2019) Masc: multi-scale affinity with sparse convolution for 3d instance segmentation. Preprint arXiv:1902.04478

  • Liu F, Fang P, Yao Z, Fan R, Pan Z, Sheng W, Yang H (2019) Recovering 6d object pose from rgb indoor image based on two-stage detection network withmulti-task loss. Neurocomputing 337:15–23

    Google Scholar 

  • Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikäinen M (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318

    Google Scholar 

  • Liu M, Pan Z, Xu K, Ganguly K, Manocha D (2019) Generating grasp poses for a high-dof gripper using neural networks. Preprint arXiv:1903.00425

  • Liu N, Han J (2016) Dhsnet: deep hierarchical saliency network for salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 678–686

  • Liu N, Han J, Yang MH (2018) Picanet: learning pixel-wise contextual attention for saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3089–3098

  • Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759–8768

  • Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision, Springer, pp 21–37

  • Liu X, Jonschkowski R, Angelova A, Konolige K (2020) Keypose: multi-view 3d labeling and keypoint estimation for transparent objects. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11602–11610

  • Liu Y, Zhang Q, Zhang D, Han J (2019) Employing deep part-object relationships for salient object detection. In: Proceedings of the IEEE international conference on computer vision, pp 1232–1241

  • Liu Z, Zhao X, Huang T, Hu R, Zhou Y, Bai X (2020) Tanet: robust 3d object detection from point clouds with triple attention. In: AAAI, pp 11677–11684

  • Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

  • Lou X, Yang Y, Choi C (2019) Learning to generate 6-dof grasp poses with reachability awareness. Preprint arXiv:1910.06404

  • Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of the international conference on computer vision-Volume 2, ICCV ’99, p 1150

  • Lu W, Wan G, Zhou Y, Fu X, Yuan P, Song S (2019) Deepicp: an end-to-end deep neural network for 3d point cloud registration. Preprint arXiv:1905.04153

  • Lundell J, Verdoja F, Kyrki V (2019) Robust grasp planning over uncertain shape completions. Preprint arXiv:1903.00645

  • Luo T, Mo K, Huang Z, Xu J, Hu S, Wang L, Su H (2020) Learning to group: a bottom-up framework for 3d part discovery in unseen categories. In: International conference on learning representations

  • Mahajan M, Bhattacharjee T, Krishnan A, Shukla P, Nandi G (2020) Semi-supervised grasp detection by representation learning in a vector quantized latent space. Preprint arXiv:2001.08477

  • Mahler J, Liang J, Niyaz S, Laskey M, Doan R, Liu X, Ojea JA, Goldberg K (2017) Dex-net 2.0: seep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. CoRR arXiv:1703.09312

  • Malisiewicz T, Gupta A, Efros AA (2011) Ensemble of exemplar-svms for object detection and beyond. In: 2011 International conference on computer vision, IEEE, pp 89–96

  • Mellado N, Aiger D, Mitra NJ (2014) Super 4pcs fast global pointcloud registration via smart indexing. In: Computer graphics forum, Wiley Online Library, vol 33, pp 205–215

  • Van der Merwe M, Lu Q, Sundaralingam B, Matak M, Hermans T (2019) Learning continuous 3d reconstructions for geometrically aware grasping. Preprint arXiv:1910.00983

  • Miller AT, Allen PK (2004) Graspit! a versatile simulator for robotic grasping. IEEE Robot Autom Mag 11(4):110–122


  • Miller AT, Knoop S, Christensen HI, Allen PK (2003) Automatic grasp planning using shape primitives. ICRA 2:1824–1829


  • Minaee S, Boykov Y, Porikli F, Plaza A, Kehtarnavaz N, Terzopoulos D (2020) Image segmentation using deep learning: a survey. Preprint arXiv:2001.05566

  • Mirtich B, Canny J (1994) Easily computable optimum grasps in 2-d and 3-d. In: IEEE international conference on robotics and automation, IEEE, pp 739–747

  • Morrison D, Corke P, Leitner J (2018) Closing the loop for robotic grasping: a real-time, generative grasp synthesis approach. Preprint arXiv:1804.05172

  • Morrison D, Corke P, Leitner J (2019) Multi-view picking: next-best-view reaching for improved grasping in clutter. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 8762–8768

  • Mousavian A, Eppner C, Fox D (2019) 6-dof graspnet: variational grasp generation for object manipulation. In: Proceedings of the IEEE international conference on computer vision, pp 2901–2910

  • Mur-Artal R, Montiel JMM, Tardos JD (2015) Orb-slam: a versatile and accurate monocular slam system. IEEE Trans Robot 31(5):1147–1163


  • Murali A, Mousavian A, Eppner C, Paxton C, Fox D (2019) 6-dof grasping for target-driven object manipulation in clutter. Preprint arXiv:1912.03628

  • Najibi M, Lai G, Kundu A, Lu Z, Rathod V, Funkhouser T, Pantofaru C, Ross D, Davis LS, Fathi A (2020) Dops: learning to detect 3d objects and predict their 3d shapes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11913–11922

  • Nguyen VD (1987) Constructing stable grasps in 3d. In: IEEE international conference on robotics and automation, IEEE, vol 4, pp 234–239

  • Ni P, Zhang W, Zhu X, Cao Q (2020) Pointnet++ grasping: learning an end-to-end spatial grasp generation algorithm from sparse point clouds. Preprint arXiv:2003.09644

  • Nikandrova E, Kyrki V (2015) Category-based task specific grasping. Robot Auton Syst 70:25–35


  • Oberweger M, Rad M, Lepetit V (2018) Making deep heatmaps robust to partial occlusions for 3d object pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 119–134

  • Pang Y, Zhang L, Zhao X, Lu H (2020) Hierarchical dynamic filtering network for rgb-d salient object detection. In: Proceedings of the European conference on computer vision (ECCV)

  • Park D, Chun SY (2018) Classification based grasp detection using spatial transformer network. Preprint arXiv:1803.01356

  • Park D, Seo Y, Chun SY (2018) Real-time, highly accurate robotic grasp detection using fully convolutional neural network with rotation ensemble module. Preprint arXiv:1812.07762

  • Park D, Seo Y, Shin D, Choi J, Chun SY (2019) A single multi-task deep neural network with post-processing for object detection with reasoning and robotic grasp detection. Preprint arXiv:1909.07050

  • Park K, Mousavian A, Xiang Y, Fox D (2020) Latentfusion: end-to-end differentiable reconstruction and rendering for unseen object pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10710–10719

  • Park K, Patten T, Vincze M (2019) Pix2pose: pixel-wise coordinate regression of objects for 6d pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 7668–7677

  • ten Pas A, Gualtieri M, Saenko K, Platt R (2017) Grasp pose detection in point clouds. Int J Rob Res 36(13–14):1455–1473


  • ten Pas A, Platt R (2015) Using geometry to detect grasps in 3d point clouds. Preprint arXiv:1501.03100

  • Patil AV, Rabha P (2018) A survey on joint object detection and pose estimation using monocular vision. Preprint arXiv:1811.10216

  • Patten T, Park K, Vincze M (2020) Dgcm-net: dense geometrical correspondence matching network for incremental experience-based robotic grasping. Preprint arXiv:2001.05279

  • Peng H, Li B, Ling H, Hu W, Xiong W, Maybank SJ (2016) Salient object detection via structured matrix decomposition. IEEE Trans Pattern Anal Mach Intell 39(4):818–832


  • Peng H, Li B, Xiong W, Hu W, Ji R (2014) Rgbd salient object detection: a benchmark and algorithms. In: European conference on computer vision, Springer, pp 92–109

  • Peng S, Liu Y, Huang Q, Zhou X, Bao H (2019) Pvnet: pixel-wise voting network for 6dof pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4561–4570

  • Pereira N, Alexandre LA (2019) Maskedfusion: mask-based 6d object pose estimation. Preprint arXiv:1911.07771

  • Pham QH, Nguyen T, Hua BS, Roig G, Yeung SK (2019) Jsis3d: joint semantic-instance segmentation of 3d point clouds with multi-task pointwise networks and multi-value conditional random fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8827–8836

  • Pham QH, Uy MA, Hua BS, Nguyen DT, Roig G, Yeung SK (2020) Lcd: learned cross-domain descriptors for 2d–3d matching. In: AAAI, pp 11856–11864

  • Piao Y, Ji W, Li J, Zhang M, Lu H (2019) Depth-induced multi-scale recurrent attention network for saliency detection. In: Proceedings of the IEEE international conference on computer vision, pp 7254–7263

  • Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. In: Advances in neural information processing systems, pp 1990–1998

  • Pinheiro PO, Lin TY, Collobert R, Dollár P (2016) Learning to refine object segments. In: European conference on computer vision, Springer, pp 75–91

  • Pinto L, Gupta A (2016) Supersizing self-supervision: learning to grasp from 50k tries and 700 robot hours. In: IEEE International conference on robotics and automation (ICRA), IEEE, pp 3406–3413

  • Ponce J, Sullivan S, Boissonnat JD, Merlet JP (1993) On characterizing and computing three-and four-finger force-closure grasps of polyhedral objects. In: IEEE international conference on robotics and automation, IEEE, pp 821–827

  • Qi CR, Chen X, Litany O, Guibas LJ (2020) Imvotenet: boosting 3d object detection in point clouds with image votes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4404–4413

  • Qi CR, Litany O, He K, Guibas LJ (2019) Deep hough voting for 3d object detection in point clouds. In: Proceedings of the IEEE international conference on computer vision, pp 9277–9286

  • Qi CR, Liu W, Wu C, Su H, Guibas LJ (2018) Frustum pointnets for 3d object detection from rgb-d data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 918–927

  • Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 652–660

  • Qi CR, Yi L, Su H, Guibas LJ (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in neural information processing systems, pp 5099–5108

  • Qi Q, Zhao S, Shen J, Lam KM (2019) Multi-scale capsule attention-based salient object detection with multi-crossed layer connections. In: 2019 IEEE international conference on multimedia and expo (ICME), IEEE, pp 1762–1767

  • Qin Y, Chen R, Zhu H, Song M, Xu J, Su H (2020) S4g: Amodal single-view single-shot se (3) grasp detection in cluttered scenes. In: Conference on robot learning, pp 53–65

  • Qu L, He S, Zhang J, Tian J, Tang Y, Yang Q (2017) Rgbd salient object detection via deep fusion. IEEE Trans Image Process 26(5):2274–2285


  • Rabbani T, Van Den Heuvel F (2005) Efficient hough transform for automatic detection of cylinders in point clouds. ISPRS WG III/3, III/4 3:60–65

  • Rad M, Lepetit V (2017) Bb8: a scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth. In: IEEE international conference on computer vision, pp 3828–3836

  • Redmon J, Angelova A (2015) Real-time grasp detection using convolutional neural networks. In: 2015 IEEE international conference on robotics and automation (ICRA), IEEE, pp 1316–1322

  • Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  • Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271

  • Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. Preprint arXiv:1804.02767

  • Ren J, Gong X, Yu L, Zhou W, Ying Yang M (2015) Exploiting global priors for rgb-d saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 25–32

  • Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

  • Rennie C, Shome R, Bekris KE, De Souza AF (2016) A dataset for improved rgbd-based object detection and pose estimation for warehouse pick-and-place. IEEE Robot Autom Lett 1(2):1179–1185


  • Rosten E, Drummond T (2005) Fusing points and lines for high performance tracking. In: Tenth IEEE international conference on computer vision (ICCV’05) Volume 1, IEEE, vol 2, pp 1508–1515

  • Rublee E, Rabaud V, Konolige K, Bradski G (2011) Orb: an efficient alternative to sift or surf. In: 2011 International conference on computer vision, IEEE, pp 2564–2571

  • Rusu RB, Blodow N, Beetz M (2009) Fast point feature histograms (fpfh) for 3d registration. In: IEEE international conference on robotics and automation, pp 3212–3217

  • Rusu RB, Blodow N, Marton ZC, Beetz M (2009) Close-range scene segmentation and reconstruction of 3d point cloud maps for mobile manipulation in domestic environments. In: 2009 IEEE/RSJ international conference on intelligent robots and systems, IEEE, pp 1–6

  • Sabour S, Frosst N, Hinton G (2018) Matrix capsules with em routing. In: 6th international conference on learning representations, ICLR, pp 1–15

  • Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. In: Advances in neural information processing systems, pp 3856–3866

  • Sahbani A, El-Khoury S, Bidaud P (2012) An overview of 3d object grasp synthesis algorithms. Robot Auton Syst 60(3):326–336


  • Sajjan SS, Moore M, Pan M, Nagaraja G, Lee J, Zeng A, Song S (2019) Cleargrasp: 3d shape estimation of transparent objects for manipulation. Preprint arXiv:1910.02550

  • Salti S, Tombari F, Stefano LD (2014) Shot: Unique signatures of histograms for surface and texture description. Comput Vis Image Underst 125:251–264


  • Sanchez J, Corrales JA, Bouzgarrou BC, Mezouar Y (2018) Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey. Int J Robot Res 37(7):688–716


  • Sarode V, Li X, Goforth H, Aoki Y, Dhagat A, Srivatsan RA, Lucey S, Choset H (2019) One framework to register them all: pointnet encoding for point cloud alignment. Preprint arXiv:1912.05766

  • Sarode V, Li X, Goforth H, Aoki Y, Srivatsan RA, Lucey S, Choset H (2019) Pcrnet: point cloud registration network using pointnet encoding. Preprint arXiv:1908.07906

  • Saxena A, Driemeyer J, Kearns J, Osondu C, Ng AY (2008a) Learning to grasp novel objects using vision. In: Experimental robotics, Springer, pp 33–42

  • Saxena A, Driemeyer J, Ng AY (2008b) Robotic grasping of novel objects using vision. Int J Robot Res 27(2):157–173

  • Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) Overfeat: integrated recognition, localization and detection using convolutional networks. Preprint arXiv:1312.6229

  • Shi J, Yan Q, Xu L, Jia J (2015) Hierarchical image saliency detection on extended cssd. IEEE Trans Pattern Anal Mach Intell 38(4):717–729


  • Shi S, Wang X, Li H (2019) Pointrcnn: 3d object proposal generation and detection from point cloud. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–779

  • Shi S, Wang Z, Shi J, Wang X, Li H (2020) From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. Preprint arXiv:1907.03670

  • Shi W, Rajkumar R (2020) Point-gnn: graph neural network for 3d object detection in a point cloud. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1711–1719

  • Simon M, Fischer K, Milz S, Witt CT, Gross HM (2020) Stickypillars: robust feature matching on point clouds using graph neural networks. Preprint arXiv:2002.03983

  • Song C, Song J, Huang Q (2020) Hybridpose: 6d object pose estimation under hybrid representations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 431–440

  • Song S, Xiao J (2014) Sliding shapes for 3d object detection in depth images. In: European conference on computer vision, Springer, pp 634–651

  • Song S, Xiao J (2016) Deep sliding shapes for amodal 3d object detection in rgb-d images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 808–816

  • Sultana F, Sufian A, Dutta P (2020) Evolution of image segmentation using deep convolutional neural network: a survey. Preprint arXiv:2001.04074

  • Sultana F, Sufian A, Dutta P (2020) A review of object detection models based on convolutional neural network. In: Intelligent computing: image processing based applications, Springer, pp 1–16

  • Sundermeyer M, Marton ZC, Durner M, Brucker M, Triebel R (2018) Implicit 3d orientation learning for 6d object detection from rgb images. In: European conference on computer vision, Springer International Publishing, pp 712–729

  • Suzuki K, Yokota Y, Kanazawa Y, Takebayashi T (2020) Online self-supervised learning for object picking: detecting optimum grasping position using a metric learning approach. In: 2020 IEEE/SICE international symposium on system integration (SII), IEEE, pp 205–212

  • Szegedy C, Reed S, Erhan D, Anguelov D, Ioffe S (2014) Scalable, high-quality object detection. Preprint arXiv:1412.1441

  • Tam GK, Cheng ZQ, Lai YK, Langbein FC, Liu Y, Marshall D, Martin RR, Sun XF, Rosin PL (2013) Registration of 3d point clouds and meshes: a survey from rigid to nonrigid. IEEE Trans Vis Comput Graph 19(7):1199–1217


  • Tejani A, Tang D, Kouskouridas R, Kim TK (2014) Latent-class hough forests for 3d object detection and pose estimation. In: European conference on computer vision, Springer, pp 462–477

  • Tekin B, Sinha SN, Fua P (2018) Real-time seamless single shot 6d object pose prediction. In: IEEE conference on computer vision and pattern recognition, pp 292–301

  • Tian H, Wang C, Manocha D, Zhang X (2019) Transferring grasp configurations using active learning and local replanning. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 1622–1628

  • Tian M, Pan L, Ang Jr MH, Lee GH (2020) Robust 6d object pose estimation by learning rgb-d features. Preprint arXiv:2003.00188

  • Tian Z, Shen C, Chen H, He T (2019) Fcos: fully convolutional one-stage object detection. In: Proceedings of the IEEE international conference on computer vision, pp 9627–9636

  • Tosun T, Yang D, Eisner B, Isler V, Lee D (2020) Robotic grasping through combined image-based grasp proposal and 3d reconstruction. Preprint arXiv:2003.01649

  • Tremblay J, To T, Sundaralingam B, Xiang Y, Fox D, Birchfield S (2018) Deep object pose estimation for semantic robotic grasping of household objects. Preprint arXiv:1809.10790

  • Truong P, Apostolopoulos S, Mosinska A, Stucky S, Ciller C, Zanet SD (2019) Glampoints: greedily learned accurate match points. In: Proceedings of the IEEE international conference on computer vision, pp 10732–10741

  • Uijlings JR, Van De Sande KE, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171


  • Vacchetti L, Lepetit V, Fua P (2004) Stable real-time 3d tracking using online and offline information. IEEE Trans Pattern Anal Mach Intell 26(10):1385–1391


  • Vahrenkamp N, Westkamp L, Yamanobe N, Aksoy EE, Asfour T (2016) Part-based grasp planning for familiar objects. In: IEEE-RAS 16th international conference on humanoid robots (Humanoids), IEEE, pp 919–925

  • Varley J, DeChant C, Richardson A, Ruales J, Allen P (2017) Shape completion enabled robotic grasping. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 2442–2447

  • Vidal J, Lin C, Martí R (2018) 6d pose estimation using an improved method based on point pair features. In: 4th international conference on control, automation and robotics (ICCAR), pp 405–409

  • Villena-Martinez V, Oprea S, Saval-Calvo M, Azorin-Lopez J, Fuster-Guillo A, Fisher RB (2020) When deep learning meets data alignment: a review on deep registration networks (drns). Preprint arXiv:2003.03167

  • Vohra M, Prakash R, Behera L (2019) Real-time grasp pose estimation for novel objects in densely cluttered environment. In: 2019 28th IEEE international conference on robot and human interactive communication (RO-MAN), IEEE, pp 1–6

  • Wada K, Sucar E, James S, Lenton D, Davison AJ (2020) Morefusion: multi-object reasoning for 6d pose estimation from volumetric fusion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14540–14549

  • Wang C, Martín-Martín R, Xu D, Lv J, Lu C, Fei-Fei L, Savarese S, Zhu Y (2019) 6-pack: category-level 6d pose tracker with anchor-based keypoints. Preprint arXiv:1910.10750

  • Wang C, Xu D, Zhu Y, Martín-Martín R, Lu C, Fei-Fei L, Savarese S (2019) Densefusion: 6d object pose estimation by iterative dense fusion. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3343–3352

  • Wang H, Sridhar S, Huang J, Valentin J, Song S, Guibas LJ (2019) Normalized object coordinate space for category-level 6d object pose and size estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2642–2651

  • Wang S, Jiang X, Zhao J, Wang X, Zhou W, Liu Y (2019) Efficient fully convolution neural network for generating pixel wise robotic grasps with high resolution images. In: 2019 IEEE international conference on robotics and biomimetics (ROBIO), IEEE, pp 474–480

  • Wang S, Wu J, Sun X, Yuan W, Freeman WT, Tenenbaum JB, Adelson EH (2018) 3d shape perception from monocular vision, touch, and shape priors. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 1606–1613

  • Wang W, Lai Q, Fu H, Shen J, Ling H (2019) Salient object detection in the deep learning era: an in-depth survey. Preprint arXiv:1904.09146

  • Wang W, Shen J, Shao L, Porikli F (2016) Correspondence driven saliency transfer. IEEE Trans Image Process 25(11):5025–5034


  • Wang W, Yu R, Huang Q, Neumann U (2018) Sgpn: similarity group proposal network for 3d point cloud instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2569–2578

  • Wang X, Kong T, Shen C, Jiang Y, Li L (2019) Solo: segmenting objects by locations. Preprint arXiv:1912.04488

  • Wang X, Liu S, Shen X, Shen C, Jia J (2019) Associatively segmenting instances and semantics in point clouds. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4096–4105

  • Wang Y, Solomon JM (2019) Deep closest point: learning representations for point cloud registration. In: Proceedings of the IEEE international conference on computer vision, pp 3523–3532

  • Wang Y, Solomon JM (2019) Prnet: self-supervised learning for partial-to-partial registration. In: Advances in neural information processing systems, pp 8812–8824

  • Wang Z, Jia K (2019) Frustum convnet: sliding frustums to aggregate local point-wise features for amodal 3d object detection. In: 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 1742–1749

  • Watkins-Valls D, Varley J, Allen P (2019) Multi-modal geometric learning for grasping and manipulation. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 7339–7345

  • Wei Y, Wen F, Zhu W, Sun J (2012) Geodesic saliency using background priors. In: European conference on computer vision, Springer, pp 29–42

  • Wong JM, Kee V, Le T, Wagner S, Mariottini GL, Schneider A, Hamilton L, Chipalkatty R, Hebert M, Johnson DM, et al (2017) Segicp: integrated deep semantic segmentation and pose estimation. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 5784–5789

  • Xiang Y, Schmidt T, Narayanan V, Fox D (2018) Posecnn: a convolutional neural network for 6d object pose estimation in cluttered scenes. Preprint arXiv:1711.00199

  • Xie C, Xiang Y, Mousavian A, Fox D (2020) The best of both modes: separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on robot learning, pp 1369–1378

  • Xie E, Sun P, Song X, Wang W, Liu X, Liang D, Shen C, Luo P (2020) Polarmask: single shot instance segmentation with polar representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12193–12202

  • Xie Q, Lai YK, Wu J, Wang Z, Zhang Y, Xu K, Wang J (2020) Mlcvnet: multi-level context votenet for 3d object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10447–10456

  • Xu D, Anguelov D, Jain A (2018) Pointfusion: deep sensor fusion for 3d bounding box estimation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition

  • Xue Z, Kasper A, Zoellner JM, Dillmann R (2009) An automatic grasp planning system for service robots. In: 2009 international conference on advanced robotics, IEEE, pp 1–6

  • Yan X, Hsu J, Khansari M, Bai Y, Pathak A, Gupta A, Davidson J, Lee H (2018) Learning 6-dof grasping interaction via deep geometry-aware 3d representations. In: 2018 IEEE international conference on robotics and automation (ICRA), IEEE, pp 1–9

  • Yan X, Khansari M, Hsu J, Gong Y, Bai Y, Pirk S, Lee H (2019) Data-efficient learning for sim-to-real robotic grasping using deep point cloud prediction networks. Preprint arXiv:1906.08989

  • Yan Y, Mao Y, Li B (2018) Second: sparsely embedded convolutional detection. Sensors 18(10):3337


  • Yang B, Wang J, Clark R, Hu Q, Wang S, Markham A, Trigoni N (2019) Learning object bounding boxes for 3d instance segmentation on point clouds. In: Advances in neural information processing systems, pp 6737–6746

  • Yang C, Zhang L, Lu H, Ruan X, Yang MH (2013) Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3166–3173

  • Yang H, Shi J, Carlone L (2020) Teaser: fast and certifiable point cloud registration. Preprint arXiv:2001.07715

  • Yang J, Li H, Campbell D, Jia Y (2015) Go-icp: a globally optimal solution to 3d icp point-set registration. IEEE Trans Pattern Anal Mach Intell 38(11):2241–2254


  • Yang S, Zhang W, Lu W, Wang H, Li Y (2019) Learning actions from human demonstration video for robotic manipulation. Preprint arXiv:1909.04312

  • Yang Z, Sun Y, Liu S, Jia J (2020) 3dssd: point-based 3d single stage object detector. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11040–11048

  • Yang Z, Sun Y, Liu S, Shen X, Jia J (2019) Std: sparse-to-dense 3d object detector for point cloud. In: Proceedings of the IEEE international conference on computer vision, pp 1951–1960

  • Ye M, Xu S, Cao T (2020) Hvnet: hybrid voxel network for lidar based 3d object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1631–1640

  • Yew ZJ, Lee GH (2018) 3dfeat-net: weakly supervised local 3d features for point cloud registration. In: European conference on computer vision, Springer, pp 630–646

  • Yi KM, Trulls E, Lepetit V, Fua P (2016) Lift: learned invariant feature transform. In: European conference on computer vision, Springer, pp 467–483

  • Yi L, Zhao W, Wang H, Sung M, Guibas LJ (2019) Gspn: generative shape proposal network for 3d instance segmentation in point cloud. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3947–3956

  • Yokota Y, Suzuki K, Kanazawa Y, Takebayashi T (2020) A multi-task learning framework for grasping-position detection and few-shot classification. In: 2020 IEEE/SICE international symposium on system integration (SII), IEEE, pp 1033–1039

  • Yu F, Liu K, Zhang Y, Zhu C, Xu K (2019) Partnet: a recursive part decomposition network for fine-grained and hierarchical shape segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9491–9500

  • Yu P, Rao Y, Lu J, Zhou J (2019) P\(^{2}\)gnet: pose-guided point cloud generating networks for 6-dof object pose estimation. Preprint arXiv:1912.09316

  • Yu X, Zhuang Z, Koniusz P, Li H (2020) 6dof object pose estimation via differentiable proxy voting loss. Preprint arXiv:2002.03923

  • Yuan Y, Hou J, Nüchter A, Schwertfeger S (2020) Self-supervised point set local descriptors for point cloud registration. Preprint arXiv:2003.05199

  • Zakharov S, Shugurov I, Ilic S (2019) Dpod: 6d pose object detector and refiner. In: Proceedings of the IEEE international conference on computer vision, pp 1941–1950

  • Zapata-Impata BS, Gil P, Pomares J, Torres F (2019) Fast geometry-based computation of grasping points on three-dimensional point clouds. Int J Adv Robot Syst 16(1):1729881419831846


  • Zapata-Impata BS, Mateo Agulló C, Gil P, Pomares J (2017) Using geometry to detect grasping points on 3d unknown point cloud

  • Zeng A, Song S, Nießner M, Fisher M, Xiao J, Funkhouser T (2017a) 3dmatch: learning local geometric descriptors from rgb-d reconstructions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1802–1811

  • Zeng A, Yu KT, Song S, Suo D, Walker E, Rodriguez A, Xiao J (2017b) Multi-view self-supervised deep learning for 6d pose estimation in the amazon picking challenge. In: IEEE international conference on robotics and automation (ICRA), IEEE, pp 1386–1383

  • Zeng A, Song S, Yu KT, Donlon E, Hogan FR, Bauza M, Ma D, Taylor O, Liu M, Romo E, et al (2018) Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching. In: IEEE international conference on robotics and automation (ICRA), IEEE, pp 1–8

  • Zhang F, Guan C, Fang J, Bai S, Yang R, Torr P, Prisacariu V (2020) Instance segmentation of lidar point clouds. In: ICRA

  • Zhang H, Lan X, Bai S, Wan L, Yang C, Zheng N (2018) A multi-task convolutional neural network for autonomous robotic grasping in object stacking scenes. Preprint arXiv:1809.07081

  • Zhang H, Lan X, Bai S, Zhou X, Tian Z, Zheng N (2018) Roi-based robotic grasp detection for object overlapping scenes. Preprint arXiv:1808.10313

  • Zhang J, Sclaroff S, Lin Z, Shen X, Price B, Mech R (2016) Unconstrained salient object detection via proposal subset optimization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5733–5742

  • Zhang Q, Qu D, Xu F, Zou F (2017) Robust robot grasp detection in multimodal fusion. In: MATEC web of conferences, EDP Sciences, vol 139, p 00060

  • Zhang Z, Sun B, Yang H, Huang Q (2020) H3dnet: 3d object detection using hybrid geometric primitives. In: Proceedings of the European conference on computer vision (ECCV)

  • Zhao L, Tao W (2020) Jsnet: Joint instance and semantic segmentation of 3d point clouds. In: Thirty-Fourth AAAI conference on artificial intelligence

  • Zhao R, Ouyang W, Li H, Wang X (2015) Saliency detection by multi-context deep learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1265–1274

  • Zhao S, Li B, Xu P, Keutzer K (2020) Multi-source domain adaptation in the deep learning era: a systematic survey. Preprint arXiv:2002.12169

  • Zhao ZQ, Zheng P, Xu S, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232


  • Zhao B, Zhang H, Lan X, Wang H, Tian Z, Zheng N (2020) Regnet: region-based grasp network for single-shot grasp detection in point clouds. Preprint arXiv:2002.12647

  • Zheng T, Chen C, Yuan J, Li B, Ren K (2019) Pointcloud saliency maps. In: Proceedings of the IEEE international conference on computer vision, pp 1598–1606

  • Zhou QY, Park J, Koltun V (2016) Fast global registration. In: European conference on computer vision, Springer, pp 766–782

  • Zhou X, Lan X, Zhang H, Tian Z, Zhang Y, Zheng N (2018) Fully convolutional grasp detection network with oriented anchor box. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 7223–7230

  • Zhou X, Wang D, Krähenbühl P (2019) Objects as points. Preprint arXiv:1904.07850

  • Zhou X, Zhuo J, Krahenbuhl P (2019) Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 850–859

  • Zhou Y, Tuzel O (2018) Voxelnet: end-to-end learning for point cloud based 3d object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4490–4499

  • Zhou Z, Pan T, Wu S, Chang H, Jenkins OC (2019) Glassloc: plenoptic grasp pose detection in transparent clutter. Preprint arXiv:1909.04269

  • Zhu A, Yang J, Zhao C, Xian K, Cao Z, Li X (2020) Lrf-net: learning local reference frames for 3d local shape description and matching. Preprint arXiv:2001.07832

  • Zhu W, Liang S, Wei Y, Sun J (2014) Saliency optimization from robust background detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2814–2821

  • Zou Z, Shi Z, Guo Y, Ye J (2019) Object detection in 20 years: a survey. Preprint arXiv:1905.05055


Author information


Corresponding author

Correspondence to Guoguang Du.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Du, G., Wang, K., Lian, S. et al. Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review. Artif Intell Rev 54, 1677–1734 (2021). https://doi.org/10.1007/s10462-020-09888-5


  • DOI: https://doi.org/10.1007/s10462-020-09888-5
