Abstract
Object detection is an important problem with a wide range of applications. In recent years, deep learning based object detection with conventional RGB cameras has made great progress. At the same time, the limitations of RGB cameras have become increasingly apparent, and algorithmic progress alone cannot fundamentally resolve the challenges of object detection. Unmanned vehicles and mobile robot platforms are often equipped with a variety of sensors in addition to an RGB camera, each of which has its own characteristics and can extend the sensing range of the RGB camera along a different dimension. For example, infrared thermal imaging cameras and multispectral cameras broaden the sensing range in the spectral dimension, while LiDARs and depth cameras broaden it in the spatial dimension. This paper summarizes deep learning based object detection methods for multi-modal sensors, and surveys and categorizes these methods according to the manner in which data are fused. It also summarizes the datasets available for the different modalities and discusses the advantages and disadvantages of different sensor combinations.
Acknowledgements
This research is sponsored by the Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 21KJB520015) and the Postdoctoral Fund of Jiangsu Province (No. 2021K398C).
Ethics declarations
Conflicts of interest
The authors of this manuscript declare that they have no conflict of interest.
About this article
Cite this article
Liu, Y., Meng, S., Wang, H. et al. Deep learning based object detection from multi-modal sensors: an overview. Multimed Tools Appl 83, 19841–19870 (2024). https://doi.org/10.1007/s11042-023-16275-z
DOI: https://doi.org/10.1007/s11042-023-16275-z