Abstract
This paper presents a comprehensive survey on vision-based robotic grasping. We conclude three key tasks during vision-based robotic grasping, which are object localization, object pose estimation and grasp estimation. In detail, the object localization task contains object localization without classification, object detection and object instance segmentation. This task provides the regions of the target object in the input data. The object pose estimation task mainly refers to estimating the 6D object pose and includes correspondence-based methods, template-based methods and voting-based methods, which affords the generation of grasp poses for known objects. The grasp estimation task includes 2D planar grasp methods and 6DoF grasp methods, where the former is constrained to grasp from one direction. These three tasks could accomplish the robotic grasping with different combinations. Lots of object pose estimation methods need not object localization, and they conduct object localization and object pose estimation jointly. Lots of grasp estimation methods need not object localization and object pose estimation, and they conduct grasp estimation in an end-to-end manner. Both traditional methods and latest deep learning-based methods based on the RGB-D image inputs are reviewed elaborately in this survey. Related datasets and comparisons between state-of-the-art methods are summarized as well. In addition, challenges about vision-based robotic grasping and future directions in addressing these challenges are also pointed out.
Similar content being viewed by others
References
Akkaya I, Andrychowicz M, Chociej M, Litwin M, McGrew B, Petron A, Paino A, Plappert M, Powell G, Ribas R, et al (2019) Solving rubik’s cube with a robot hand. Preprint arXiv:1910.07113
Aldoma A, Vincze M, Blodow N, Gossow D, Gedikli S, Rusu RB, Bradski G (2011) Cad-model recognition and 6dof pose estimation using 3d cues. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), IEEE, pp 585–592
Aoki Y, Goforth H, Srivatsan RA, Lucey S (2019) Pointnetlk: robust & efficient point cloud registration using pointnet. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7163–7172
Ardón P, Pairet È, Petrick RP, Ramamoorthy S, Lohan KS (2019) Learning grasp affordance reasoning through semantic relations. IEEE Robot Autom Lett 4(4):4571–4578
Asif U, Tang J, Harrer S (2018) Graspnet: an efficient convolutional neural network for real-time grasp detection for low-powered devices. In: IJCAI, pp 4875–4882
Bay H, Tuytelaars T, Van Gool L (2006) Surf: speeded up robust features. In: European conference on computer vision, Springer, pp 404–417
Bellekens B, Spruyt V, Berkvens R, Weyn M (2014) A survey of rigid 3d pointcloud registration algorithms. In: AMBIENT 2014: the fourth international conference on ambient computing, applications, services and technologies, August 24–28, 2014, Rome, Italy, pp 8–13
Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522
Berscheid L, Meißner P, Kröger T (2019) Robot learning of shifting objects for grasping in cluttered environments. Preprint arXiv:1907.11035
Besl PJ, McKay ND (1992) A method for registration of 3-d shapes. IEEE Trans Pattern Anal Mach Intell 14(2):239–256
Bhatia S, Chalup SK et al (2013) Segmenting salient objects in 3d point clouds of indoor scenes using geodesic distances. J Signal Inf Process 4(03):102
Billings G, Johnson-Roberson M (2018) Silhonet: An RGB method for 3d object pose estimation and grasp planning. CoRR abs/1809.06893
Blomqvist K, Breyer M, Cramariuc A, Förster J, Grinvald M, Tschopp F, Chung JJ, Ott L, Nieto J, Siegwart R (2020) Go fetch: mobile manipulation in unstructured environments. Preprint arXiv:2004.00899
Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: optimal speed and accuracy of object detection. Preprint arXiv:2004.10934
Bohg J, Kragic D (2010) Learning grasping points with shape context. Robot Auton Syst 58(4):362–377
Bohg J, Morales A, Asfour T, Kragic D (2014) Data-driven grasp synthesis: a survey. IEEE Trans Robot 30(2):289–309
Bolya D, Zhou C, Xiao F, Lee YJ (2019) Yolact++: better real-time instance segmentation. Preprint arXiv:1912.06218
Bolya D, Zhou C, Xiao F, Lee YJ (2019) Yolact: real-time instance segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 9157–9166
Borji A, Cheng MM, Hou Q, Jiang H, Li J (2019) Salient object detection: A survey. Computational visual media 5(2):117–150
Borst C, Fischer M, Hirzinger G (2003) Grasping the dice by dicing the grasp. In: IEEE/RSJ international conference on intelligent robots and systems, IEEE, vol 4, pp 3692–3697
Bousmalis K, Irpan A, Wohlhart P, Bai Y, Kelcey M, Kalakrishnan M, Downs L, Ibarz J, Pastor P, Konolige K et al (2018) Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: 2018 IEEE international conference on robotics and automation (ICRA), IEEE, pp 4243–4250
Brachmann E, Krull A, Michel F, Gumhold S, Shotton J, Rother C (2014) Learning 6d object pose estimation using 3d object coordinates. In: European conference on computer vision, Springer, pp 536–551
Brachmann E, Michel F, Krull A, Ying Yang M, Gumhold S et al (2016) Uncertainty-driven 6d pose estimation of objects and scenes from a single rgb image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3364–3372
Bradski G, Kaehler A (2008) Learning OpenCV: computer vision with the OpenCV library. “ O’Reilly Media, Inc.”
Cai J, Cheng H, Zhang Z, Su J (2019) Metagrasp: data efficient grasping by affordance interpreter network. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 4960–4966
Caldera S, Rassau A, Chai D (2018) Review of deep learning methods in robotic grasp detection. Multimodal Technol Interact 2(3):57
Castro P, Armagan A, Kim TK (2020) Accurate 6d object pose estimation by pose conditioned mesh reconstruction. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 4147–4151
Chen D, Li J, Wang Z, Xu K (2020) Learning canonical shape space for category-level 6d object pose and size estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11973–11982
Chen H, Li Y (2018) Progressively complementarity-aware fusion network for rgb-d salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3051–3060
Chen H, Li Y (2019) Cnn-based rgb-d salient object detection: learn, select and fuse. Preprint arXiv:1909.09309
Chen H, Li Y, Su D (2019) Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for rgb-d salient object detection. Pattern Recogn 86:376–385
Chen H, Sun K, Tian Z, Shen C, Huang Y, Yan Y (2020) Blendmask: top-down meets bottom-up for instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8573–8581
Chen IM, Burdick JW (1993) Finding antipodal point grasps on irregularly shaped objects. IEEE Trans Robot Autom 9(4):507–512
Chen K, Pang J, Wang J, Xiong Y, Li X, Sun S, Feng W, Liu Z, Shi J, Ouyang W, et al (2019) Hybrid task cascade for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4974–4983
Chen LC, Hermans A, Papandreou G, Schroff F, Wang P, Adam H (2018) Masklab: instance segmentation by refining object detection with semantic and direction features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4013–4022
Chen W, Jia X, Chang HJ, Duan J, Leonardis A (2020) G2l-net: global to local network for real-time 6d pose estimation with embedding vector features. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4233–4242
Chen X, Girshick R, He K, Dollár P (2019) Tensormask: a foundation for dense object segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 2061–2069
Chen X, Ma H, Wan J, Li B, Xia T (2017) Multi-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1907–1915
Cheng MM, Mitra NJ, Huang X, Torr PH, Hu SM (2014) Global contrast based salient region detection. IEEE Trans Pattern Anal Mach Intell 37(3):569–582
Choy C, Dong W, Koltun V (2020) Deep global registration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2514–2523
Chu FJ, Xu R, Vela PA (2018) Real-world multiobject, multigrasp detection. IEEE Robot Autom Lett 3(4):3355–3362
Chu FJ, Xu R, Vela PA (2019) Detecting robotic affordances on novel objects with regional attention and attributes. Preprint arXiv:1909.05770
Crivellaro A, Rad M, Verdie Y, Yi KM, Fua P, Lepetit V (2017) Robust 3d object tracking from monocular images using stable parts. IEEE Trans Pattern Anal Mach Intell 40(6):1465–1479
Dai A, Nießner M, Zollhöfer M, Izadi S, Theobalt C (2017) Bundlefusion: real-time globally consistent 3d reconstruction using on-the-fly surface reintegration. ACM Trans Graph (ToG) 36(4):1
Dai J, He K, Li Y, Ren S, Sun J (2016) Instance-sensitive fully convolutional networks. In: European conference on computer vision, Springer, pp 534–549
Dai J, He K, Sun J (2016) Instance-aware semantic segmentation via multi-task network cascades. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3150–3158
Dai J, Li Y, He K, Sun J (2016) R-fcn: object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387
Danielczuk M, Matl M, Gupta S, Li A, Lee A, Mahler J, Goldberg K (2019) Segmenting unknown 3d objects from real depth images using mask r-cnn trained on synthetic data. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 7283–7290
Deng X, Xiang Y, Mousavian A, Eppner C, Bretl T, Fox D (2020) Self-supervised 6d object pose estimation for robot manipulation. In: International conference on robotics and automation (ICRA)
Depierre A, Dellandréa E, Chen L (2018) Jacquard: a large scale dataset for robotic grasp detection. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 3511–3516
Depierre A, Dellandréa E, Chen L (2020) Optimizing correlated graspability score and grasp regression for better grasp prediction. Preprint arXiv:2002.00872
DeTone D, Malisiewicz T, Rabinovich A (2018) Superpoint: self-supervised interest point detection and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 224–236
Ding D, Liu YH, Wang MY (2001) On computing immobilizing grasps of 3-d curved objects. In: IEEE international symposium on computational intelligence in robotics and automation, IEEE, pp 11–16
Do TT, Cai M, Pham T, Reid I (2018) Deep-6dpose: recovering 6d object pose from a single rgb image. Preprint arXiv:1802.10367
Do TT, Nguyen A, Reid I (2018) Affordancenet: an end-to-end deep learning approach for object affordance detection. In: 2018 IEEE international conference on robotics and automation (ICRA), IEEE, pp 1–5
Domae Y, Okuda H, Taguchi Y, Sumi K, Hirai T (2014) Fast graspability evaluation on single depth maps for bin picking with general grippers. In: 2014 IEEE international conference on robotics and automation (ICRA), IEEE, pp. 1997–2004
Dong Z, Li G, Liao Y, Wang F, Ren P, Qian C (2020) Centripetalnet: pursuing high-quality keypoint pairs for object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10519–10528
Douglas DH, Peucker TK (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartogr Int J Geogr Inf Geovis 10(2):112–122
Drost B, Ilic S (2012) 3d object detection and localization using multimodal point pair features. In: International conference on 3D imaging, modeling, processing, visualization transmission, pp 9–16
Drost B, Ulrich M, Navab N, Ilic S (2010) Model globally, match locally: efficient and robust 3d object recognition. In: 2010 IEEE computer society conference on computer vision and pattern recognition, pp 998–1005
Du L, Ye X, Tan X, Feng J, Xu Z, Ding E, Wen S (2020) Associate-3ddet: perceptual-to-conceptual association for 3d point cloud object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13329–13338
Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: keypoint triplets for object detection. In: Proceedings of the IEEE international conference on computer vision, pp 6569–6578
Engelmann F, Bokeloh M, Fathi A, Leibe B, Nießner M (2020) 3d-mpa: multi-proposal aggregation for 3d semantic instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9031–9040
Erhan D, Szegedy C, Toshev A, Anguelov D (2014) Scalable object detection using deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2147–2154
Falco P, Lu S, Natale C, Pirozzi S, Lee D (2019) A transfer learning approach to cross-modal object recognition: from visual observation to robotic haptic exploration. IEEE Trans Robot 35(4):987–998
Fan Y, Tomizuka M (2019) Efficient grasp planning and execution with multifingered hands by surface fitting. IEEE Robot Autom Lett 4(4):3995–4002
Fan Z, Yu JG, Liang Z, Ou J, Gao C, Xia GS, Li Y (2020) Fgn: fully guided network for few-shot instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9172–9181
Fang HS, Wang C, Gou M, Lu C (2020) Graspnet-1billion: a large-scale benchmark for general object grasping. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11444–11453
Fang K, Bai Y, Hinterstoisser S, Savarese S, Kalakrishnan M (2018) Multi-task domain adaptation for deep learning of instance grasping from simulation. In: 2018 IEEE international conference on robotics and automation (ICRA), IEEE, pp 3516–3523
Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395
Fitzgibbon AW, Fisher RB et al (1996) A buyer’s guide to conic fitting. Department of Artificial Intelligence, University of Edinburgh, Edinburgh
Florence PR, Manuelli L, Tedrake R (2018) Dense object nets: learning dense visual object descriptors by and for robotic manipulation. Preprint arXiv:1806.08756
Frome A, Huber D, Kolluri R, Bülow T, Malik J (2004) Recognizing objects in range data using regional point descriptors. In: European conference on computer vision, Springer, pp 224–237
Gao G, Lauri M, Wang Y, Hu X, Zhang J, Frintrop S (2020) 6d object pose regression via supervised learning on point clouds. Preprint arXiv:2001.08942
Gao W, Tedrake R (2019) kpam-sc: generalizable manipulation planning using keypoint affordance and shape completion. Preprint arXiv:1909.06980
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE conference on computer vision and pattern recognition, CVPR ’14, pp 580–587
Gojcic Z, Zhou C, Wegner JD, Wieser A (2019) The perfect match: 3d point cloud matching with smoothed densities. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5545–5554
Gonzalez M, Kacete A, Murienne A, Marchand E (2020) Yoloff: you only learn offsets for robust 6dof object pose estimation. Preprint arXiv:2002.00911
Gordo A, Almazán J, Revaud J, Larlus D (2016) Deep image retrieval: learning global representations for image search. In: European conference on computer vision, Springer, pp 241–257
Goron LC, Marton ZC, Lazea G, Beetz M (2012) Robustly segmenting cylindrical and box-like objects in cluttered scenes using depth cameras. In: ROBOTIK 2012; 7th German conference on robotics, VDE, pp 1–6
Graham B, Engelcke M, van der Maaten L (2018) 3d semantic segmentation with submanifold sparse convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9224–9232
Graham B, van der Maaten L (2017) Submanifold sparse convolutional networks. Preprint arXiv:1706.01307
Guo D, Kong T, Sun F, Liu H (2016) Object discovery and grasp detection with a shared convolutional neural network. In: IEEE international conference on robotics and automation (ICRA), IEEE, pp 2038–2043
Guo D, Sun F, Liu H, Kong T, Fang B, Xi N (2017) A hybrid deep architecture for robotic grasp detection. In: 2017 IEEE international conference on robotics and automation (ICRA), IEEE, pp 1609–1614
Guo F, Wang W, Shen J, Shao L, Yang J, Tao D, Tang YY (2017) Video saliency detection using object proposals. IEEE Trans Cybern 48(11):3159–3170
Guo Y, Bennamoun M, Sohel F, Lu M, Wan J, Kwok NM (2016) A comprehensive performance evaluation of 3d local feature descriptors. Int J Comput Vis 116(1):66–89
Guo Y, Wang H, Hu Q, Liu H, Liu L, Bennamoun M (2019) Deep learning for 3d point clouds: a survey. Preprint arXiv:1912.12033
Hafiz AM, Bhat GM (2020) A survey on instance segmentation: state of the art. Int J Multimed Inf Retr 9(3):171–189
Hagelskjær F, Buch AG (2019) Pointposenet: accurate object detection and 6 dof pose estimation in point clouds. Preprint arXiv:1912.09057
Han J, Zhang D, Cheng G, Liu N, Xu D (2018) Advanced deep-learning techniques for salient and category-specific object detection: a survey. IEEE Signal Process Mag 35(1):84–100
Han L, Zheng T, Xu L, Fang L (2020) Occuseg: occupancy-aware 3d instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2940–2949
Hariharan B, Arbeláez P, Girshick R, Malik J (2014) Simultaneous detection and segmentation. In: European conference on computer vision, Springer, pp 297–312
He K, Gkioxari G, Dollár P, Girshick RB (2017) Mask r-cnn. IEEE International conference on computer vision (ICCV), pp 2980–2988
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He Y, Sun W, Huang H, Liu J, Fan H, Sun J (2020) Pvn3d: a deep point-wise 3d keypoints voting network for 6dof pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11632–11641
Hinterstoisser S, Lepetit V, Ilic S, Holzer S, Bradski G, Konolige K, Navab N (2012) Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In: Asian conference on computer vision, Springer, pp 548–562
Hinton GE, Krizhevsky A, Wang SD (2011) Transforming auto-encoders. In: International conference on artificial neural networks, Springer, pp 44–51
Hodan T, Barath D, Matas J (2020) Epos: estimating 6d pose of objects with symmetries. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11703–11712
Hodaň T, Haluza P, Obdržálek Š, Matas J, Lourakis M, Zabulis X (2017) T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects. In: IEEE winter conference on applications of computer vision (WACV)
Hodan T, Kouskouridas R, Kim T, Tombari F, Bekris KE, Drost B, Groueix T, Walas K, Lepetit V, Leonardis A, Steger C, Michel F, Sahin C, Rother C, Matas J (2018) A summary of the 4th international workshop on recovering 6d object pose. CoRR abs/1810.03758
Hodaň T, Michel F, Brachmann E, Kehl W, GlentBuch A, Kraft D, Drost B, Vidal J, Ihrke S, Zabulis X et al (2018) Bop: benchmark for 6d object pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 19–34
Hodaň T, Zabulis X, Lourakis M, Obdržálek Š, Matas J (2015) Detection and fine 3d pose estimation of texture-less objects in rgb-d images. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 4421–4428
Hogan FR, Ballester J, Dong S, Rodriguez A (2020) Tactile dexterity: manipulation primitives with tactile feedback. Preprint arXiv:2002.03236
Hou J, Dai A, Nießner M (2019) 3d-sis: 3d semantic instance segmentation of rgb-d scans. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4421–4430
Hou Q, Cheng MM, Hu X, Borji A, Tu Z, Torr PH (2017) Deeply supervised salient object detection with short connections. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3203–3212
Hu Y, Fua P, Wang W, Salzmann M (2020) Single-stage 6d object pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2930–2939
Hu Y, Hugonot J, Fua P, Salzmann M (2019) Segmentation-driven 6d object pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3385–3394
Jiang H, Wang J, Yuan Z, Wu Y, Zheng N, Li S (2013) Salient object detection: a discriminative regional feature integration approach. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2083–2090
Jiang H, Xiao J (2013) A linear approach to matching cuboids in rgbd images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2171–2178
Jiang Y, Moseson S, Saxena A (2011) Efficient grasping from rgbd images: learning using a new rectangle representation. In: IEEE international conference on robotics and automation, IEEE, pp 3304–3311
Johnson AE (1997) Spin-images: a representation for 3-d surface matching
Kaiser A, Ybanez Zepeda JA, Boubekeur T (2019) A survey of simple geometric primitives detection methods for captured 3d data. In: Computer graphics forum, Wiley Online Library, vol 38, pp 167–196
Kehl W, Manhardt F, Tombari F, Ilic S, Navab N (2017) Ssd-6d: making rgb-based 3d detection and 6d pose estimation great again. In: Proceedings of the IEEE international conference on computer vision, pp 1521–1529
Khan SH, He X, Bennamoun M, Sohel F, Togneri R (2015) Separating objects and clutter in indoor scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4603–4611
Kim G, Huber D, Hebert M (2008) Segmentation of salient regions in outdoor scenes using imagery and 3-d data. In: 2008 IEEE workshop on applications of computer vision, IEEE, pp 1–8
Kirillov A, Wu Y, He K, Girshick R (2020) Pointrend: image segmentation as rendering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9799–9808
Kirkpatrick D, Mishra B, Yap CK (1992) Quantitative steinitz’s theorems with applications to multifingered grasping. Discrete Comput Geom 7(3):295–318
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems—volume 1, NIPS’12, pp 1097–1105
Kumra S, Joshi S, Sahin F (2019) Antipodal robotic grasping using generative residual convolutional neural network. Preprint arXiv:1909.04810
Kumra S, Kanan C (2017) Robotic grasp detection using deep convolutional neural networks. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 769–776
Lang AH, Vora S, Caesar H, Zhou L, Yang J, Beijbom O (2019) Pointpillars: fast encoders for object detection from point clouds. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 12697–12705
Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750
Lee MA, Zhu Y, Srinivasan K, Shah P, Savarese S, Fei-Fei L, Garg A, Bohg J (2019) Making sense of vision and touch: self-supervised learning of multimodal representations for contact-rich tasks. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 8943–8950
Lee Y, Park J (2020) Centermask: real-time anchor-free instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13906–13915
Lenz I, Lee H, Saxena A (2015) Deep learning for detecting robotic grasps. Int J Robot Res 34(4–5):705–724
León B, Ulbrich S, Diankov R, Puche G, Przybylski M, Morales A, Asfour T, Moisio S, Bohg J, Kuffner J, Dillmann R (2010) Opengrasp: a toolkit for robot grasping simulation. In: Ando N, Balakirsky S, Hemker T, Reggiani M, von Stryk O (eds) Simulation, modeling, and programming for autonomous robots. Springer, Berlin, pp 109–120
Lepetit V, Fua P et al (2005) Monocular model-based 3d tracking of rigid objects: a survey. Found Trends® Comput Graph Vis 1(1):1–89
Lepetit V, Moreno-Noguer F, Fua P (2009) Epnp: an accurate o(n) solution to the pnp problem. IJCV 81(2):155–166
Li G, Liu Z, Ye L, Wang Y, Ling H (2020) Cross-modal weighting network for rgb-d salient object detection
Li Y, Qi H, Dai J, Ji X, Wei Y (2017) Fully convolutional instance-aware semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2359–2367
Li Y, Wang G, Ji X, Xiang Y, Fox D (2018) Deepim: deep iterative matching for 6d pose estimation. Lecture notes in computer science, pp 695–711
Li Z, Wang G, Ji X (2019) Cdpn: coordinates-based disentangled pose network for real-time rgb-based 6-dof object pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 7678–7687
Liang H, Ma X, Li S, Görner M, Tang S, Fang B, Sun F, Zhang J (2019) Pointnetgpd: detecting grasp configurations from point sets. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 3629–3635
Liang M, Yang B, Chen Y, Hu R, Urtasun R (2019) Multi-task multi-sensor fusion for 3d object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7345–7353
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988
Liu C, Furukawa Y (2019) Masc: multi-scale affinity with sparse convolution for 3d instance segmentation. Preprint arXiv:1902.04478
Liu F, Fang P, Yao Z, Fan R, Pan Z, Sheng W, Yang H (2019) Recovering 6d object pose from rgb indoor image based on two-stage detection network withmulti-task loss. Neurocomputing 337:15–23
Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikäinen M (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318
Liu M, Pan Z, Xu K, Ganguly K, Manocha D (2019) Generating grasp poses for a high-dof gripper using neural networks. Preprint arXiv:1903.00425
Liu N, Han J (2016) Dhsnet: deep hierarchical saliency network for salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 678–686
Liu N, Han J, Yang MH (2018) Picanet: learning pixel-wise contextual attention for saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3089–3098
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759–8768
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision, Springer, pp 21–37
Liu X, Jonschkowski R, Angelova A, Konolige K (2020) Keypose: multi-view 3d labeling and keypoint estimation for transparent objects. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11602–11610
Liu Y, Zhang Q, Zhang D, Han J (2019) Employing deep part-object relationships for salient object detection. In: Proceedings of the IEEE international conference on computer vision, pp 1232–1241
Liu Z, Zhao X, Huang T, Hu R, Zhou Y, Bai X (2020) Tanet: robust 3d object detection from point clouds with triple attention. In: AAAI, pp 11677–11684
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Lou X, Yang Y, Choi C (2019) Learning to generate 6-dof grasp poses with reachability awareness. Preprint arXiv:1910.06404
Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of the international conference on computer vision-Volume 2, ICCV ’99, p 1150
Lu W, Wan G, Zhou Y, Fu X, Yuan P, Song S (2019) Deepicp: an end-to-end deep neural network for 3d point cloud registration. Preprint arXiv:1905.04153
Lundell J, Verdoja F, Kyrki V (2019) Robust grasp planning over uncertain shape completions. Preprint arXiv:1903.00645
Luo T, Mo K, Huang Z, Xu J, Hu S, Wang L, Su H (2020) Learning to group: a bottom-up framework for 3d part discovery in unseen categories. In: International conference on learning representations
Mahajan M, Bhattacharjee T, Krishnan A, Shukla P, Nandi G (2020) Semi-supervised grasp detection by representation learning in a vector quantized latent space. Preprint arXiv:2001.08477
Mahler J, Liang J, Niyaz S, Laskey M, Doan R, Liu X, Ojea JA, Goldberg K (2017) Dex-net 2.0: seep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. CoRR arXiv:1703.09312
Malisiewicz T, Gupta A, Efros AA (2011) Ensemble of exemplar-svms for object detection and beyond. In: 2011 International conference on computer vision, IEEE, pp 89–96
Mellado N, Aiger D, Mitra NJ (2014) Super 4pcs fast global pointcloud registration via smart indexing. In: Computer graphics forum, Wiley Online Library, vol 33, pp 205–215
Van der Merwe M, Lu Q, Sundaralingam B, Matak M, Hermans T (2019) Learning continuous 3d reconstructions for geometrically aware grasping. Preprint arXiv:1910.00983
Miller AT, Allen PK (2004) Graspit! a versatile simulator for robotic grasping. IEEE Robot Autom Mag 11(4):110–122
Miller AT, Knoop S, Christensen HI, Allen PK (2003) Automatic grasp planning using shape primitives. ICRA 2:1824–1829
Minaee S, Boykov Y, Porikli F, Plaza A, Kehtarnavaz N, Terzopoulos D (2020) Image segmentation using deep learning: a survey. Preprint arXiv:2001.05566
Mirtich B, Canny J (1994) Easily computable optimum grasps in 2-d and 3-d. In: IEEE international conference on robotics and automation, IEEE, pp 739–747
Morrison D, Corke P, Leitner J (2018) Closing the loop for robotic grasping: a real-time, generative grasp synthesis approach. Preprint arXiv:1804.05172
Morrison D, Corke P, Leitner J (2019) Multi-view picking: next-best-view reaching for improved grasping in clutter. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 8762–8768
Mousavian A, Eppner C, Fox D (2019) 6-dof graspnet: variational grasp generation for object manipulation. In: Proceedings of the IEEE international conference on computer vision, pp 2901–2910
Mur-Artal R, Montiel JMM, Tardos JD (2015) Orb-slam: a versatile and accurate monocular slam system. IEEE Trans Robot 31(5):1147–1163
Murali A, Mousavian A, Eppner C, Paxton C, Fox D (2019) 6-dof grasping for target-driven object manipulation in clutter. Preprint arXiv:1912.03628
Najibi M, Lai G, Kundu A, Lu Z, Rathod V, Funkhouser T, Pantofaru C, Ross D, Davis LS, Fathi A (2020) Dops: learning to detect 3d objects and predict their 3d shapes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11913–11922
Nguyen VD (1987) Constructing stable grasps in 3d. In: IEEE international conference on robotics and automation, IEEE, vol 4, pp 234–239
Ni P, Zhang W, Zhu X, Cao Q (2020) Pointnet++ grasping: learning an end-to-end spatial grasp generation algorithm from sparse point clouds. Preprint arXiv:2003.09644
Nikandrova E, Kyrki V (2015) Category-based task specific grasping. Robot Auton Syst 70:25–35
Oberweger M, Rad M, Lepetit V (2018) Making deep heatmaps robust to partial occlusions for 3d object pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 119–134
Pang Y, Zhang L, Zhao X, Lu H (2020) Hierarchical dynamic filtering network for rgb-d salient object detection. In: Proceedings of the European conference on computer vision (ECCV)
Park D, Chun SY (2018) Classification based grasp detection using spatial transformer network. Preprint arXiv:1803.01356
Park D, Seo Y, Chun SY (2018) Real-time, highly accurate robotic grasp detection using fully convolutional neural network with rotation ensemble module. Preprint arXiv:1812.07762
Park D, Seo Y, Shin D, Choi J, Chun SY (2019) A single multi-task deep neural network with post-processing for object detection with reasoning and robotic grasp detection. Preprint arXiv:1909.07050
Park K, Mousavian A, Xiang Y, Fox D (2020) Latentfusion: end-to-end differentiable reconstruction and rendering for unseen object pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10710–10719
Park K, Patten T, Vincze M (2019) Pix2pose: pixel-wise coordinate regression of objects for 6d pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 7668–7677
ten Pas A, Gualtieri M, Saenko K, Platt R (2017) Grasp pose detection in point clouds. Int J Rob Res 36(13–14):1455–1473
Pas At, Platt R (2015) Using geometry to detect grasps in 3d point clouds. Preprint arXiv:1501.03100
Patil AV, Rabha P (2018) A survey on joint object detection and pose estimation using monocular vision. Preprint arXiv:1811.10216
Patten T, Park K, Vincze M (2020) Dgcm-net: dense geometrical correspondence matching network for incremental experience-based robotic grasping. Preprint arXiv:2001.05279
Peng H, Li B, Ling H, Hu W, Xiong W, Maybank SJ (2016) Salient object detection via structured matrix decomposition. IEEE Trans Pattern Anal Mach Intell 39(4):818–832
Peng H, Li B, Xiong W, Hu W, Ji R (2014) Rgbd salient object detection: a benchmark and algorithms. In: European conference on computer vision, Springer, pp 92–109
Peng S, Liu Y, Huang Q, Zhou X, Bao H (2019) Pvnet: pixel-wise voting network for 6dof pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4561–4570
Pereira N, Alexandre LA (2019) Maskedfusion: mask-based 6d object pose estimation. Preprint arXiv:1911.07771
Pham QH, Nguyen T, Hua BS, Roig G, Yeung SK (2019) Jsis3d: joint semantic-instance segmentation of 3d point clouds with multi-task pointwise networks and multi-value conditional random fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8827–8836
Pham QH, Uy MA, Hua BS, Nguyen DT, Roig G, Yeung SK (2020) Lcd: learned cross-domain descriptors for 2d–3d matching. In: AAAI, pp 11856–11864
Piao Y, Ji W, Li J, Zhang M, Lu H (2019) Depth-induced multi-scale recurrent attention network for saliency detection. In: Proceedings of the IEEE international conference on computer vision, pp 7254–7263
Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. In: Advances in neural information processing systems, pp 1990–1998
Pinheiro PO, Lin TY, Collobert R, Dollár P (2016) Learning to refine object segments. In: European conference on computer vision, Springer, pp 75–91
Pinto L, Gupta A (2016) Supersizing self-supervision: learning to grasp from 50k tries and 700 robot hours. In: IEEE International conference on robotics and automation (ICRA), IEEE, pp 3406–3413
Ponce J, Sullivan S, Boissonnat JD, Merlet JP (1993) On characterizing and computing three-and four-finger force-closure grasps of polyhedral objects. In: IEEE international conference on robotics and automation, IEEE, pp 821–827
Qi CR, Chen X, Litany O, Guibas LJ (2020) Imvotenet: boosting 3d object detection in point clouds with image votes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4404–4413
Qi CR, Litany O, He K, Guibas LJ (2019) Deep hough voting for 3d object detection in point clouds. In: Proceedings of the IEEE international conference on computer vision, pp 9277–9286
Qi CR, Liu W, Wu C, Su H, Guibas LJ (2018) Frustum pointnets for 3d object detection from rgb-d data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 918–927
Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 652–660
Qi CR, Yi L, Su H, Guibas LJ (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in neural information processing systems, pp 5099–5108
Qi Q, Zhao S, Shen J, Lam KM (2019) Multi-scale capsule attention-based salient object detection with multi-crossed layer connections. In: 2019 IEEE international conference on multimedia and expo (ICME), IEEE, pp 1762–1767
Qin Y, Chen R, Zhu H, Song M, Xu J, Su H (2020) S4g: Amodal single-view single-shot se (3) grasp detection in cluttered scenes. In: Conference on robot learning, pp 53–65
Qu L, He S, Zhang J, Tian J, Tang Y, Yang Q (2017) Rgbd salient object detection via deep fusion. IEEE Trans Image Process 26(5):2274–2285
Rabbani T, Van Den Heuvel F (2005) Efficient hough transform for automatic detection of cylinders in point clouds. Isprs Wg Iii/3, Iii/4 3:60–65
Rad M, Lepetit V (2017) Bb8: a scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth. In: IEEE international conference on computer vision, pp 3828–3836
Redmon J, Angelova A (2015) Real-time grasp detection using convolutional neural networks. In: 2015 IEEE international conference on robotics and automation (ICRA), IEEE, pp 1316–1322
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. Preprint arXiv:1804.02767
Ren J, Gong X, Yu L, Zhou W, Ying Yang M (2015) Exploiting global priors for rgb-d saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 25–32
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Rennie C, Shome R, Bekris KE, De Souza AF (2016) A dataset for improved rgbd-based object detection and pose estimation for warehouse pick-and-place. IEEE Robot Autom Lett 1(2):1179–1185
Rosten E, Drummond T (2005) Fusing points and lines for high performance tracking. In: Tenth IEEE international conference on computer vision (ICCV’05) Volume 1, IEEE, vol 2, pp 1508–1515
Rublee E, Rabaud V, Konolige K, Bradski G (2011) Orb: an efficient alternative to sift or surf. In: 2011 International conference on computer vision, IEEE, pp 2564–2571
Rusu RB, Blodow N, Beetz M (2009) Fast point feature histograms (fpfh) for 3d registration. In: IEEE international conference on robotics and automation, pp 3212–3217
Rusu RB, Blodow N, Marton ZC, Beetz M (2009) Close-range scene segmentation and reconstruction of 3d point cloud maps for mobile manipulation in domestic environments. In: 2009 IEEE/RSJ international conference on intelligent robots and systems, IEEE, pp 1–6
Sabour S, Frosst N, Hinton G (2018) Matrix capsules with em routing. In: 6th international conference on learning representations, ICLR, pp 1–15
Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. In: Advances in neural information processing systems, pp 3856–3866
Sahbani A, El-Khoury S, Bidaud P (2012) An overview of 3d object grasp synthesis algorithms. Robot Auton Syst 60(3):326–336 Autonomous Grasping
Sajjan SS, Moore M, Pan M, Nagaraja G, Lee J, Zeng A, Song S (2019) Cleargrasp: 3d shape estimation of transparent objects for manipulation. Preprint arXiv:1910.02550
Salti S, Tombari F, Stefano LD (2014) Shot: Unique signatures of histograms for surface and texture description. Comput Vis Image Underst 125:251–264
Sanchez J, Corrales JA, Bouzgarrou BC, Mezouar Y (2018) Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey. Int J Robot Res 37(7):688–716
Sarode V, Li X, Goforth H, Aoki Y, Dhagat A, Srivatsan RA, Lucey S, Choset H (2019) One framework to register them all: pointnet encoding for point cloud alignment. Preprint arXiv:1912.05766
Sarode V, Li X, Goforth H, Aoki Y, Srivatsan RA, Lucey S, Choset H (2019) Pcrnet: point cloud registration network using pointnet encoding. Preprint arXiv:1908.07906
Saxena A, Driemeyer J, Kearns J, Osondu C, Ng AY (2008a) Learning to grasp novel objects using vision. In: Experimental robotics, Springer, pp 33–42
Saxena A, Driemeyer J, Ng AY (2008b) Robotic grasping of novel objects using vision. Int J Robot Res 27(2):157–173
Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) Overfeat: integrated recognition, localization and detection using convolutional networks. Preprint arXiv:1312.6229
Shi J, Yan Q, Xu L, Jia J (2015) Hierarchical image saliency detection on extended cssd. IEEE Trans Pattern Anal Mach Intell 38(4):717–729
Shi S, Wang X, Li H (2019) Pointrcnn: 3d object proposal generation and detection from point cloud. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–779
Shi S, Wang Z, Shi J, Wang X, Li H (2020) From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. Preprint arXiv:1907.03670
Shi W, Rajkumar R (2020) Point-gnn: graph neural network for 3d object detection in a point cloud. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1711–1719
Simon M, Fischer K, Milz S, Witt CT, Gross HM (2020) Stickypillars: robust feature matching on point clouds using graph neural networks. Preprint arXiv:2002.03983
Song C, Song J, Huang Q (2020) Hybridpose: 6d object pose estimation under hybrid representations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 431–440
Song S, Xiao J (2014) Sliding shapes for 3d object detection in depth images. In: European conference on computer vision, Springer, pp 634–651
Song S, Xiao J (2016) Deep sliding shapes for amodal 3d object detection in rgb-d images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 808–816
Sultana F, Sufian A, Dutta P (2020) Evolution of image segmentation using deep convolutional neural network: a survey. Preprint arXiv:2001.04074
Sultana F, Sufian A, Dutta P (2020) A review of object detection models based on convolutional neural network. In: Intelligent computing: image processing based applications, Springer, pp 1–16
Sundermeyer M, Marton ZC, Durner M, Brucker M, Triebel R (2018) Implicit 3d orientation learning for 6d object detection from rgb images. In: European conference on computer vision, Springer International Publishing, pp 712–729
Suzuki K, Yokota Y, Kanazawa Y, Takebayashi T (2020) Online self-supervised learning for object picking: detecting optimum grasping position using a metric learning approach. In: 2020 IEEE/SICE international symposium on system integration (SII), IEEE, pp 205–212
Szegedy C, Reed S, Erhan D, Anguelov D, Ioffe S (2014) Scalable, high-quality object detection. Preprint arXiv:1412.1441
Tam GK, Cheng ZQ, Lai YK, Langbein FC, Liu Y, Marshall D, Martin RR, Sun XF, Rosin PL (2013) Registration of 3d point clouds and meshes: a survey from rigid to nonrigid. IEEE Trans Vis Comput Graph 19(7):1199–1217
Tejani A, Tang D, Kouskouridas R, Kim TK (2014) Latent-class hough forests for 3d object detection and pose estimation. In: European conference on computer vision, Springer, pp 462–477
Tekin B, Sinha SN, Fua P (2018) Real-time seamless single shot 6d object pose prediction. In: IEEE conference on computer vision and pattern recognition, pp 292–301
Tian H, Wang C, Manocha D, Zhang X (2019) Transferring grasp configurations using active learning and local replanning. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 1622–1628
Tian M, Pan L, Ang Jr MH, Lee G.H (2020) Robust 6d object pose estimation by learning rgb-d features. Preprint arXiv:2003.00188
Tian Z, Shen C, Chen H, He T (2019) Fcos: fully convolutional one-stage object detection. In: Proceedings of the IEEE international conference on computer vision, pp 9627–9636
Tosun T, Yang D, Eisner B, Isler V, Lee D (2020) Robotic grasping through combined image-based grasp proposal and 3d reconstruction. Preprint arXiv:2003.01649
Tremblay J, To T, Sundaralingam B, Xiang Y, Fox D, Birchfield S (2018) Deep object pose estimation for semantic robotic grasping of household objects. Preprint arXiv:1809.10790
Truong P, Apostolopoulos S, Mosinska A, Stucky S, Ciller C, Zanet SD (2019) Glampoints: greedily learned accurate match points. In: Proceedings of the IEEE international conference on computer vision, pp 10732–10741
Uijlings JR, Van De Sande KE, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
Vacchetti L, Lepetit V, Fua P (2004) Stable real-time 3d tracking using online and offline information. IEEE Trans Pattern Anal Mach Intell 26(10):1385–1391
Vahrenkamp N, Westkamp L, Yamanobe N, Aksoy EE, Asfour T (2016) Part-based grasp planning for familiar objects. In: IEEE-RAS 16th international conference on humanoid robots (Humanoids), IEEE, pp 919–925
Varley J, DeChant C, Richardson A, Ruales J, Allen P (2017) Shape completion enabled robotic grasping. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 2442–2447
Vidal J, Lin C, Martí R (2018) 6d pose estimation using an improved method based on point pair features. In: 4th international conference on control, automation and robotics (ICCAR), pp 405–409
Villena-Martinez V, Oprea S, Saval-Calvo M, Azorin-Lopez J, Fuster-Guillo A, Fisher RB (2020) When deep learning meets data alignment: a review on deep registration networks (drns). Preprint arXiv:2003.03167
Vohra M, Prakash R, Behera L (2019) Real-time grasp pose estimation for novel objects in densely cluttered environment. In: 2019 28th IEEE international conference on robot and human interactive communication (RO-MAN), IEEE, pp 1–6
Wada K, Sucar E, James S, Lenton D, Davison AJ (2020) Morefusion: multi-object reasoning for 6d pose estimation from volumetric fusion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14540–14549
Wang C, Martín-Martín R, Xu D, Lv J, Lu C, Fei-Fei L, Savarese S, Zhu Y (2019) 6-pack: category-level 6d pose tracker with anchor-based keypoints. Preprint arXiv:1910.10750
Wang C, Xu D, Zhu Y, Martín-Martín R, Lu C, Fei-Fei L, Savarese S (2019) Densefusion: 6d object pose estimation by iterative dense fusion. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3343–3352
Wang H, Sridhar S, Huang J, Valentin J, Song S, Guibas LJ (2019) Normalized object coordinate space for category-level 6d object pose and size estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2642–2651
Wang S, Jiang X, Zhao J, Wang X, Zhou W, Liu Y (2019) Efficient fully convolution neural network for generating pixel wise robotic grasps with high resolution images. In: 2019 IEEE international conference on robotics and biomimetics (ROBIO), IEEE, pp 474–480
Wang S, Wu J, Sun X, Yuan W, Freeman WT, Tenenbaum JB, Adelson EH (2018) 3d shape perception from monocular vision, touch, and shape priors. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 1606–1613
Wang W, Lai Q, Fu H, Shen J, Ling H (2019) Salient object detection in the deep learning era: an in-depth survey. Preprint arXiv:1904.09146
Wang W, Shen J, Shao L, Porikli F (2016) Correspondence driven saliency transfer. IEEE Trans Image Process 25(11):5025–5034
Wang W, Yu R, Huang Q, Neumann U (2018) Sgpn: similarity group proposal network for 3d point cloud instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2569–2578
Wang X, Kong T, Shen C, Jiang Y, Li L (2019) Solo: segmenting objects by locations. Preprint arXiv:1912.04488
Wang X, Liu S, Shen X, Shen C, Jia J (2019) Associatively segmenting instances and semantics in point clouds. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4096–4105
Wang Y, Solomon JM (2019) Deep closest point: learning representations for point cloud registration. In: Proceedings of the IEEE international conference on computer vision, pp 3523–3532
Wang Y, Solomon JM (2019) Prnet: self-supervised learning for partial-to-partial registration. In: Advances in neural information processing systems, pp 8812–8824
Wang Z, Jia K (2019) Frustum convnet: sliding frustums to aggregate local point-wise features for amodal 3d object detection. In: 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 1742–1749
Watkins-Valls D, Varley J, Allen P (2019) Multi-modal geometric learning for grasping and manipulation. In: 2019 international conference on robotics and automation (ICRA), IEEE, pp 7339–7345
Wei Y, Wen F, Zhu W, Sun J (2012) Geodesic saliency using background priors. In: European conference on computer vision, Springer, pp 29–42
Wong JM, Kee V, Le T, Wagner S, Mariottini GL, Schneider A, Hamilton L, Chipalkatty R, Hebert M, Johnson DM, et al (2017) Segicp: integrated deep semantic segmentation and pose estimation. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 5784–5789
Xiang Y, Schmidt T, Narayanan V, Fox D (2018) Posecnn: a convolutional neural network for 6d object pose estimation in cluttered scenes. Preprint arXiv:1711.00199
Xie C, Xiang Y, Mousavian A, Fox D (2020) The best of both modes: separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on robot learning, pp 1369–1378
Xie E, Sun P, Song X, Wang W, Liu X, Liang D, Shen C, Luo P (2020) Polarmask: single shot instance segmentation with polar representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12193–12202
Xie Q, Lai YK, Wu J, Wang Z, Zhang Y, Xu K, Wang J (2020) Mlcvnet: multi-level context votenet for 3d object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10447–10456
Xu D, Anguelov D, Jain A (2018) Pointfusion: deep sensor fusion for 3d bounding box estimation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition
Xue Z, Kasper A, Zoellner JM, Dillmann R (2009) An automatic grasp planning system for service robots. In: 2009 international conference on advanced robotics, IEEE, pp 1–6
Yan X, Hsu J, Khansari M, Bai Y, Pathak A, Gupta A, Davidson J, Lee H (2018) Learning 6-dof grasping interaction via deep geometry-aware 3d representations. In: 2018 IEEE international conference on robotics and automation (ICRA), IEEE, pp 1–9
Yan X, Khansari M, Hsu J, Gong Y, Bai Y, Pirk S, Lee H (2019) Data-efficient learning for sim-to-real robotic grasping using deep point cloud prediction networks. Preprint arXiv:1906.08989
Yan Y, Mao Y, Li B (2018) Second: sparsely embedded convolutional detection. Sensors 18(10):3337
Yang B, Wang J, Clark R, Hu Q, Wang S, Markham A, Trigoni N (2019) Learning object bounding boxes for 3d instance segmentation on point clouds. In: Advances in neural information processing systems, pp 6737–6746
Yang C, Zhang L, Lu H, Ruan X, Yang MH (2013) Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3166–3173
Yang H, Shi J, Carlone L (2020) Teaser: fast and certifiable point cloud registration. Preprint arXiv:2001.07715
Yang J, Li H, Campbell D, Jia Y (2015) Go-icp: a globally optimal solution to 3d icp point-set registration. IEEE Trans Pattern Anal Mach Intell 38(11):2241–2254
Yang S, Zhang W, Lu W, Wang H, Li Y (2019) Learning actions from human demonstration video for robotic manipulation. Preprint arXiv:1909.04312
Yang Z, Sun Y, Liu S, Jia J (2020) 3dssd: point-based 3d single stage object detector. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11040–11048
Yang Z, Sun Y, Liu S, Shen X, Jia J (2019) Std: sparse-to-dense 3d object detector for point cloud. In: Proceedings of the IEEE international conference on computer vision, pp 1951–1960
Ye M, Xu S, Cao T (2020) Hvnet: hybrid voxel network for lidar based 3d object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1631–1640
Yew ZJ, Lee GH (2018) 3dfeat-net: weakly supervised local 3d features for point cloud registration. In: European conference on computer vision, Springer, pp 630–646
Yi KM, Trulls E, Lepetit V, Fua P (2016) Lift: learned invariant feature transform. In: European conference on computer vision, Springer, pp 467–483
Yi L, Zhao W, Wang H, Sung M, Guibas LJ (2019) Gspn: generative shape proposal network for 3d instance segmentation in point cloud. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3947–3956
Yokota Y, Suzuki K, Kanazawa Y, Takebayashi T (2020) A multi-task learning framework for grasping-position detection and few-shot classification. In: 2020 IEEE/SICE international symposium on system integration (SII), IEEE, pp 1033–1039
Yu F, Liu K, Zhang Y, Zhu C, Xu K (2019) Partnet: a recursive part decomposition network for fine-grained and hierarchical shape segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9491–9500
Yu P, Rao Y, Lu J, Zhou J (2019) P\(^{2}\)gnet: pose-guided point cloud generating networks for 6-dof object pose estimation. Preprint arXiv:1912.09316 (2019)
Yu X, Zhuang Z, Koniusz P, Li H (2020) 6dof object pose estimation via differentiable proxy voting loss. Preprint arXiv:2002.03923
Yuan Y, Hou J, Nüchter A, Schwertfeger S (2020) Self-supervised point set local descriptors for point cloud registration. Preprint arXiv:2003.05199
Zakharov S, Shugurov I, Ilic S (2019) Dpod: 6d pose object detector and refiner. In: Proceedings of the IEEE international conference on computer vision, pp 1941–1950
Zapata-Impata BS, Gil P, Pomares J, Torres F (2019) Fast geometry-based computation of grasping points on three-dimensional point clouds. Int J Adv Robot Syst 16(1):1729881419831846
Zapata-Impata BS, Mateo Agulló C, Gil P, Pomares J (2017) Using geometry to detect grasping points on 3d unknown point cloud
Zeng A, Song S, Nießner M, Fisher M, Xiao J, Funkhouser T (2017a) 3dmatch: learning local geometric descriptors from rgb-d reconstructions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1802–1811
Zeng A, Yu KT, Song S, Suo D, Walker E, Rodriguez A, Xiao J (2017b) Multi-view self-supervised deep learning for 6d pose estimation in the amazon picking challenge. In: IEEE international conference on robotics and automation (ICRA), IEEE, pp 1386–1383
Zeng A, Song S, Yu KT, Donlon E, Hogan FR, Bauza M, Ma D, Taylor O, Liu M, Romo E, et al (2018) Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching. In: IEEE international conference on robotics and automation (ICRA), IEEE, pp 1–8
Zhang F, Guan C, Fang J, Bai S, Yang R, Torr P, Prisacariu V (2020) Instance segmentation of lidar point clouds. ICRA, Cited by 4(1)
Zhang H, Lan X, Bai S, Wan L, Yang C, Zheng N (2018) A multi-task convolutional neural network for autonomous robotic grasping in object stacking scenes. Preprint arXiv:1809.07081
Zhang H, Lan X, Bai S, Zhou X, Tian Z, Zheng N (2018) Roi-based robotic grasp detection for object overlapping scenes. Preprint arXiv:1808.10313
Zhang J, Sclaroff S, Lin Z, Shen X, Price B, Mech R (2016) Unconstrained salient object detection via proposal subset optimization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5733–5742
Zhang Q, Qu D, Xu F, Zou F (2017) Robust robot grasp detection in multimodal fusion. In: MATEC web of conferences, EDP Sciences, vol 139, p 00060
Zhang Z, Sun B, Yang H, Huang Q (2020) H3dnet: 3d object detection using hybrid geometric primitives. In: Proceedings of the European conference on computer vision (ECCV)
Zhao L, Tao W (2020) Jsnet: Joint instance and semantic segmentation of 3d point clouds. In: Thirty-Fourth AAAI conference on artificial intelligence
Zhao R, Ouyang W, Li H, Wang X (2015) Saliency detection by multi-context deep learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1265–1274
Zhao S, Li B, Xu P, Keutzer K (2020) Multi-source domain adaptation in the deep learning era: a systematic survey. Preprint arXiv:2002.12169
Zhao ZQ, Zheng P, Xu S, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232
Zhao B, Zhang H, Lan X, Wang H, Tian Z, Zheng N (2020) Regnet: region-based grasp network for single-shot grasp detection in point clouds. Preprint arXiv:2002.12647
Zheng T, Chen C, Yuan J, Li B, Ren K (2019) Pointcloud saliency maps. In: Proceedings of the IEEE international conference on computer vision, pp 1598–1606
Zhou QY, Park J, Koltun V (2016) Fast global registration. In: European conference on computer vision, Springer, pp 766–782
Zhou X, Lan X, Zhang H, Tian Z, Zhang Y, Zheng N (2018) Fully convolutional grasp detection network with oriented anchor box. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, pp 7223–7230
Zhou X, Wang D, Krähenbühl P (2019) Objects as points. Preprint arXiv:1904.07850
Zhou X, Zhuo J, Krahenbuhl P (2019) Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 850–859
Zhou Y, Tuzel O (2018) Voxelnet: end-to-end learning for point cloud based 3d object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4490–4499
Zhou Z, Pan T, Wu S, Chang H, Jenkins OC (2019) Glassloc: plenoptic grasp pose detection in transparent clutter. Preprint arXiv:1909.04269
Zhu A, Yang J, Zhao C, Xian K, Cao Z, Li X (2020) Lrf-net: learning local reference frames for 3d local shape description and matching. Preprint arXiv:2001.07832
Zhu W, Liang S, Wei Y, Sun J (2014) Saliency optimization from robust background detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2814–2821
Zou Z, Shi Z, Guo Y, Ye J (2019) Object detection in 20 years: a survey. Preprint arXiv:1905.05055
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Du, G., Wang, K., Lian, S. et al. Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review. Artif Intell Rev 54, 1677–1734 (2021). https://doi.org/10.1007/s10462-020-09888-5
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10462-020-09888-5