
Deep learning based object detection from multi-modal sensors: an overview

Multimedia Tools and Applications

Abstract

Object detection is an important problem with a wide range of applications. In recent years, deep learning based object detection with conventional RGB cameras has made great progress. At the same time, researchers have become increasingly aware of the limitations of RGB cameras: algorithmic progress alone cannot fundamentally resolve the challenges of object detection. Unmanned vehicles and mobile robot platforms are often equipped with a variety of sensors in addition to the RGB camera, each with its own characteristics, and these sensors can extend the sensing range of the RGB camera along different dimensions. For example, infrared thermal cameras and multispectral cameras broaden the sensing range in the spectral dimension, while LiDARs and depth cameras broaden it in the spatial dimension. This paper summarizes deep learning based object detection methods for multi-modal sensors, and surveys and categorizes them according to how the data from the different sensors are fused. Datasets for the different modalities are summarized, and the advantages and disadvantages of different sensor combinations are also discussed.
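The abstract groups methods by how the modalities are fused. As a rough illustration only, assuming the widely used early (input-level), middle (feature-level) and late (decision-level) taxonomy, which may not match the exact categories used in the paper, the NumPy sketch below fuses an RGB image with a single-channel thermal image at each of the three stages. The functions `toy_backbone` and `toy_head` and the weight `w_rgb` are hypothetical stand-ins for a real CNN backbone, detection head and fusion weight, and the two inputs are assumed to be spatially aligned.

```python
import numpy as np


def toy_backbone(x: np.ndarray, out_channels: int, seed: int) -> np.ndarray:
    """Hypothetical stand-in for a CNN backbone.

    Applies a fixed random 1x1 projection followed by ReLU, mapping a
    (C, H, W) array to an (out_channels, H, W) feature map.
    """
    weights = np.random.default_rng(seed).standard_normal((out_channels, x.shape[0]))
    return np.maximum(np.einsum("oc,chw->ohw", weights, x), 0.0)


def toy_head(features: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a detection head.

    Returns a per-pixel objectness score of shape (H, W): the sigmoid of
    the channel-wise mean of the feature map.
    """
    return 1.0 / (1.0 + np.exp(-features.mean(axis=0)))


def early_fusion(rgb: np.ndarray, thermal: np.ndarray) -> np.ndarray:
    # Input-level fusion: stack the raw channels and run a single backbone.
    stacked = np.concatenate([rgb, thermal], axis=0)  # (3 + 1, H, W)
    return toy_head(toy_backbone(stacked, out_channels=8, seed=1))


def middle_fusion(rgb: np.ndarray, thermal: np.ndarray) -> np.ndarray:
    # Feature-level fusion: one backbone per modality, then concatenate
    # the two feature maps before a shared head.
    f_rgb = toy_backbone(rgb, out_channels=8, seed=2)
    f_thermal = toy_backbone(thermal, out_channels=8, seed=3)
    return toy_head(np.concatenate([f_rgb, f_thermal], axis=0))


def late_fusion(rgb: np.ndarray, thermal: np.ndarray, w_rgb: float = 0.5) -> np.ndarray:
    # Decision-level fusion: two independent detectors whose score maps
    # are combined by a weighted average (w_rgb is a hypothetical weight).
    s_rgb = toy_head(toy_backbone(rgb, out_channels=8, seed=4))
    s_thermal = toy_head(toy_backbone(thermal, out_channels=8, seed=5))
    return w_rgb * s_rgb + (1.0 - w_rgb) * s_thermal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.random((3, 4, 5))      # aligned RGB image, (C, H, W)
    thermal = rng.random((1, 4, 5))  # aligned thermal image, (1, H, W)
    for fuse in (early_fusion, middle_fusion, late_fusion):
        print(fuse.__name__, fuse(rgb, thermal).shape)  # each prints (4, 5)
```

In this toy setting the trade-offs are visible in the structure: early fusion needs only one backbone but depends on pixel-aligned inputs; middle fusion gives each modality its own feature extractor while still letting the head see both; late fusion is the most modular, since each detector can run on its own, but the modalities interact only through their final scores.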


Fig. 1
Fig. 2
Fig. 3


References

  1. An Z, Liu C, Han Y (2022) Effectiveness guided cross-modal information sharing for aligned rgb-t object detection. IEEE Signal Process Lett 29:2562–2566

    ADS  Google Scholar 

  2. An P, Liang J, Yu K, Fang B, Ma J (2022) Deep structural information fusion for 3d object detection on lidar-camera system. Comput Vision Image Underst 214:103295

    Google Scholar 

  3. Bahnsen CH, Moeslund TB (2018) Rain removal in traffic surveillance: Does it matter? IEEE Trans Intell Transp Syst 20(8):2802–2819

    Google Scholar 

  4. Benavides JM, Chang S, Park SY, Richards-Kortum R, Mackinnon N, MacAulay C, Milbourne A, Malpica A, Follen M (2003) Multispectral digital colposcopy for in vivo detection of cervical cancer. Optics Express 11(10):1223–1236

    ADS  PubMed  Google Scholar 

  5. Bhatti UA, Yu Z, Chanussot J, Zeeshan Z, Yuan L, Luo W, Nawaz SA, Bhatti MA, Ain QU, Mehmood A (2021) Local similaritybased spatial-spectral fusion hyperspectral image classification with deep cnn and gabor filtering. IEEE Trans Geosci Remote Sensing 60:1–15

    Google Scholar 

  6. Blin R, Ainouz S, Canu S, Meriaudeau F (2019) Road scenes analysis in adverse weather conditions by polarization-encoded images and adapted deep learning. In: 2019 IEEE intelligent transportation systems conference (ITSC), pp 27–32 . IEEE

  7. Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, Krishnan A, Pan Y, Baldan G, Beijbom O (2020) nuscenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11621–11631

  8. Cao Y, Guan D, Wu Y, Yang J, Cao Y, Yang MY (2019) Box-level segmentation supervised deep neural networks for accurate and real-time multispectral pedestrian detection. ISPRS J Photogramm Remote Sensing 150:70–79

    ADS  Google Scholar 

  9. Cao H, Chen G, Xia J, Zhuang G, Knoll A (2021) Fusion-based feature attention gate component for vehicle detection based on event camera. IEEE Sensors J 21(21):24540–24548

    ADS  Google Scholar 

  10. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16, pp 213–229. Springer

  11. Chen Y, Xie H, Shin H (2018) Multi-layer fusion techniques using a cnn for multispectral pedestrian detection. IET Comput Vision 12(8):1179–1187

    Google Scholar 

  12. Chen K, Liu J, Zhang H (2023) Igt: Illumination-guided rgb-t object detection with transformers. Knowl Based Syst 268:110423

    Google Scholar 

  13. Chen X, Ma H, Wan J, Li B, Xia T (2017) Mult-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1907–1915

  14. Choe G, Kim S-H, Im S, Lee J-Y, Narasimhan SG, Kweon IS (2018) Ranus: Rgb and nir urban scene dataset for deep scene parsing. IEEE Robotics and Automation Letters 3(3):1808–1815

    Google Scholar 

  15. Choi W, Pantofaru C, Savarese S (2012) A general framework for tracking multiple people from a moving camera. IEEE Trans Pattern Anal Mach Intell 35(7):1577–1591

    Google Scholar 

  16. Choi Y, Kim N, Hwang S, Park K, Yoon JS, An K, Kweon IS (2018) Kaist multi-spectral day/night data set for autonomous and assisted driving. IEEE Trans Intell Transp Syst 19(3):934–948

    Google Scholar 

  17. Clark GA, Sengupta SK, Aimonetti WD, Roeske F, Donetti JG (2000) Multispectral image feature selection for land mine detection. IEEE Trans Geosci Remote Sensing 38(1):304–311

    ADS  Google Scholar 

  18. Cui Y, Chen R, Chu W, Chen L, Tian D, Li Y, Cao D (2021) Deep learning for image and point cloud fusion in autonomous driving: A review. IEEE transactions on intelligent transportation systems

  19. Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via regionbased fully convolutional networks. In: Advances in neural information processing systems, pp 379–387

  20. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol 1, pp 886–893 . Ieee

  21. Davis JW, Sharma V (2007) Background-subtraction using contour-based fusion of thermal and visible imagery. Comput Vision Image Underst 106(2–3):162–182

    Google Scholar 

  22. Deng Z, Jan Latecki L (2017) Amodal detection of 3d objects: Inferring 3d bounding boxes from 2d ones in rgb-depth images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5762–5770

  23. Devaguptapu C, Akolekar N, M Sharma, M, N Balasubramanian V (2019) Borrow from anywhere: Pseudo multi-modal object detection in thermal imagery. In: Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition Workshops, pp 0–0

  24. Dhawan AP, D’Alessandro B, Patwardhan S, Mullani N (2009) Multispectral optical imaging of skin-lesions for detection of malignant melanomas. In: 2009 annual international conference of the IEEE engineering in medicine and biology society, pp 5352–5355. IEEE

  25. Ding L, Wang Y, Laganiere R, Huang D, Fu S (2020) Convolutional neural networks for multispectral pedestrian detection. Signal Processing: Image Communication 82:115764

    Google Scholar 

  26. Du X, Ang MH, Karaman S, Rus D (2018) A general pipeline for 3d detection of vehicles. In: 2018 IEEE international conference on robotics and automation (ICRA), pp 3194–3200 . IEEE

  27. Fayyad J, Jaradat MA, Gruyer D, Najjaran H (2020) Deep learning sensor fusion for autonomous vehicle perception and localization: A review. Sensors 20(15):4220

    ADS  PubMed  PubMed Central  Google Scholar 

  28. Feng D, Haase-Schütz C, Rosenbaum L, Hertlein H, Glaeser C, Timm F, Wiesbeck W, Dietmayer K (2020) Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Trans Intell Transp Syst 22(3):1341–1360

    Google Scholar 

  29. Gebhardt E, Wolf M (2018) Camel dataset for visual and thermal infrared multiple object detection and tracking. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS), pp 1–6 . IEEE

  30. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE conference on computer vision and pattern recognition, pp 3354–3361 . IEEE

  31. Gibson KD, Dirks R, Medlin CR, Johnston L (2004) Detection of weed species in soybean using multispectral digital images. Weed Technol 18(3):742–749

    Google Scholar 

  32. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

  33. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

  34. González A, Fang Z, Socarras Y, Serrat J, Vázquez D, Xu J, López AM (2016) Pedestrian detection at day/night time with visible and fir cameras: A comparison. Sensors 16(6):820

    ADS  PubMed  PubMed Central  Google Scholar 

  35. Guan D, Cao Y, Yang J, Cao Y, Tisse C-L (2018) Exploiting fusion architectures for multispectral pedestrian detection and segmentation. Appl Optics 57(18):108–116

    Google Scholar 

  36. Guan D, Cao Y, Yang J, Cao Y, Yang MY (2019) Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection. Inf Fusion 50:148–157

    Google Scholar 

  37. Guerry J, Le Saux B, Filliat D (2017) “ look at this one” detection sharing between modality-independent classifiers for robotic discovery of people. In: 2017 European conference on mobile robots (ECMR), pp 1–6 . IEEE

  38. Gupta S, Girshick R, Arbeláez P, Malik J (2014) Learning rich features from rgb-d images for object detection and segmentation. In: European conference on computer vision, pp 345–360 . Springer

  39. Han Y, Hu D (2020) Multispectral fusion approach for traffic target detection in bad weather. Algorithms 13(11):271

    Google Scholar 

  40. Herrmann C, Ruf M, Beyerer J (2018) Cnn-based thermal infrared person detection by domain adaptation. In: Autonomous systems: Sensors, vehicles, security, and the internet of everything, vol 10643, p 1064308. International Society for Optics and Photonics

  41. Hoffman J, Gupta S, Leong J, Guadarrama S, Darrell T (2016) Crossmodal adaptation for rgb-d detection. In: 2016 IEEE international conference on robotics and automation (ICRA), pp 5032–5039 . IEEE

  42. Hou Y-L, Song Y, Hao X, Shen Y, Qian M, Chen H (2018) Multispectral pedestrian detection based on deep convolutional neural networks. Infrared Phys & Technol 94:69–77

    ADS  Google Scholar 

  43. Hou C, Qiao T, Zhang H, Pang Y, Xiong X (2019) Multispectral visual detection method for conveyor belt longitudinal tear. Measurement 143:246–257

    ADS  Google Scholar 

  44. Huang S, Huang M, Zhang Y, Chen J, Bhatti U (2020) Medical image segmentation using deep learning with feature enhancement. IET Image Process 14(14):3324–3332

    Google Scholar 

  45. Huang T, Liu Z, Chen X, Bai X (2020) Epnet: Enhancing point features with image semantics for 3d object detection. In: European conference on computer vision, pp 35–52 . Springer

  46. Hu X, Yang K, Fei L, Wang K (2019) Acnet: Attention based network to exploit complementary features for rgbd semantic segmentation. In: 2019 IEEE international conference on image processing (ICIP), pp 1440–1444 . IEEE

  47. Hwang S, Park J, Kim N, Choi Y, So Kweon I (2015) Multispectral pedestrian detection: Benchmark dataset and baseline. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1037–1045

  48. Iacono M, Weber S, Glover A, Bartolozzi C (2018) Towards event-driven object detection with off-the-shelf deep learning. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1–9 . IEEE

  49. Jaus A, Yang K, Stiefelhagen R (2023) Panoramic panoptic segmentation: Insights into surrounding parsing for mobile agents via unsupervised contrastive learning. IEEE Trans Intell Transp Syst

  50. Jiang Q, Dai J, Rui T, Shao F, Wang J, Lu G (2022) Attention-based cross-modality feature complementation for multispectral pedestrian detection. IEEE Access 10:53797–53809

    Google Scholar 

  51. Jin L, Ai J, Tian Z, Zhang Y (2017) Detection of polluted insulators using the information fusion of multispectral images. IEEE Trans Dielectrics Electrical Insulation 24(6):3530–3538

    CAS  Google Scholar 

  52. Jnawali K, Chinni B, Dogra V, Rao N (2020) Automatic cancer tissue detection using multispectral photoacoustic imaging. Int J Comput Assist Radiology Surgery 15(2):309–320

    Google Scholar 

  53. Kalkan H, Beriat P, Yardimci Y, Pearson T (2011) Detection of contaminated hazelnuts and ground red chili pepper flakes by multispectral imaging. Comput Electr Agri 77(1):28–34

    Google Scholar 

  54. Kesten R, Usman M, Houston J, Pandya T, Nadhamuni K, Ferreira A, Yuan M, Low B, Jain A, Ondruska P, et al (2019) Lyft level 5 av dataset 2019. https://level5.lyft.com/dataset

  55. Kieu M, Bagdanov AD, Bertini M, Del Bimbo A (2020) Task-conditioned domain adaptation for pedestrian detection in thermal imagery. In: Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXII 16, pp 546–562 .Springer

  56. Kim JU, Park S, Ro YM (2021) Uncertainty-guided cross-modal learning for robust multispectral pedestrian detection. IEEE Trans Circ Syst Video Technol

  57. Kim J, Chung Y, Choi Y, Sa J, Kim H, Chung Y, Park D, Kim H (2017) Depth-based detection of standing-pigs in moving noise environments. Sensors 17(12):2757

    ADS  PubMed  PubMed Central  Google Scholar 

  58. Kim J, Kim H, Kim T, Kim N, Choi Y (2021) Mlpd: multi-label pedestrian detector in multispectral domain. IEEE Robot Auto Lett 6(4):7846–7853

    Google Scholar 

  59. Kim M, Lefcourt A, Chao K, Chen Y, Kim I, Chan D (2002) Multispectral detection of fecal contamination on apples based on hyperspectral imagery: Part i. application of visible and near-infrared reflectance imaging. Trans ASAE 45(6):2027

  60. Kirk R, Cielniak G, Mangan M (2020) L* a* b* fruits: A rapid and robust outdoor fruit detection system combining bio-inspired features with onestage deep learning networks. Sensors 20(1):275

    ADS  PubMed  PubMed Central  Google Scholar 

  61. Konig D, Adam M, Jarvers C, Layher G, Neumann H, Teutsch M (2017) Fully convolutional region proposal networks for multispectral person detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 49–56

  62. Ku J, Mozifian M, Lee J, Harakeh A, Waslander SL (2018) Joint 3d proposal generation and object detection from view aggregation. In: 2018 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 1–8 . IEEE

  63. Lahoud J, Ghanem B (2017) 2d-driven 3d object detection in rgb-d images. In: Proceedings of the IEEE international conference on computer vision, pp 4622–4630

  64. Lauricella A, Cannon J, Branting S, Hammer E (2017) Semi-automated detection of looting in afghanistan using multispectral imagery and principal component analysis. Antiquity 91(359):1344–1355

    Google Scholar 

  65. Law H, Deng J (2018) Cornernet: Detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision, pp 734–750

  66. Li J, Chen L, Huang W, Wang Q, Zhang B, Tian X, Fan S, Li B (2016) Multispectral detection of skin defects of bi-colored peaches based on vis-nir hyperspectral imaging. Postharvest Biol Technol 112:121–133

    CAS  Google Scholar 

  67. Li G, Gan Y, Wu H, Xiao N, Lin L (2018) Cross-modal attentional context learning for rgb-d object detection. IEEE Trans Image Process 28(4):1591–1601

    ADS  MathSciNet  PubMed  Google Scholar 

  68. Li C, Song D, Tong R, Tang M (2019) Illumination-aware faster r-cnn for robust multispectral pedestrian detection. Pattern Recognit 85:161–171

    ADS  Google Scholar 

  69. Li S, Jiao J, Wang C (2021) Research on polarized multi-spectral system and fusion algorithm for remote sensing of vegetation status at night. Remote Sensing 13(17):3510

    ADS  Google Scholar 

  70. Liang M, Yang B, Chen Y, Hu R, Urtasun R (2019) Multi-task multisensor fusion for 3d object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7345–7353

  71. Liang M, Yang B, Wang S, Urtasun R (2018) Deep continuous fusion for multi-sensor 3d object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 641–656

  72. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125

  73. Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988

  74. Linder T, Pfeiffer KY, Vaskevicius N, Schirmer R, Arras KO (2020) Accurate detection and 3d localization of humans using a novel yolobased rgb-d fusion approach and synthetic training data. In: 2020 IEEE International conference on robotics and automation (ICRA), pp 1000–1006 . IEEE

  75. Li X, Shi B, Hou Y, Wu X, Ma T, Li Y, He L (2022) Homogeneous multi-modal feature fusion and interaction for 3d object detection. In: Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXVIII, pp 691–707. Springer

  76. Li C, Song D, Tong R, Tang M (2018) Multispectral pedestrian detection via simultaneous detection and segmentation. arXiv preprint arXiv:1808.04818

  77. Liu H, Chahl JS (2018) A multispectral machine vision system for invertebrate detection on green leaves. Comput Electr Agri 150:279–288

    Google Scholar 

  78. Liu F, Shao X, Han P, Xiangli B, Yang C (2014) Detection of infrared stealth aircraft through their multispectral signatures. Optical Eng 53(9):094101

    ADS  Google Scholar 

  79. Liu J, Liu Y, Zhang G, Zhu P, Chen YQ (2015) Detecting and tracking people in real time with rgb-d camera. Pattern Recognit Lett 53:16–23

    ADS  CAS  Google Scholar 

  80. Liu H, Luo J, Wu P, Xie S, Li H (2016) People detection and tracking using rgb-d cameras for mobile robots. Int J Adv Robot Syst 13(5):1729881416657746

    Google Scholar 

  81. Liu Z, Tan Y, He Q, Xiao Y (2021) Swinnet: Swin transformer drives edge-aware rgb-d and rgb-t salient object detection. IEEE Trans Circ Syst Video Technol 32(7):4486–4497

    Google Scholar 

  82. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: Single shot multibox detector. In: European conference on computer vision, pp 21–37. Springer

  83. Liu Z, Huang T, Li B, Chen X, Wang X, Bai X (2022) Epnet++: Cascade bi-directional fusion for multi-modal 3d object detection. IEEE Trans Pattern Anal Mach Intell

  84. Liu W, Liao S, Ren W, Hu W, Yu Y (2019) High-level semantic feature detection: A new perspective for pedestrian detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5187–5196

  85. Liu J, Liu Y, Cui Y, Chen YQ (2013) Real-time human detection and tracking in complex environments using single rgbd camera. In: 2013 IEEE international conference on image processing, pp 3088–3092. ieee

  86. Liu C, Yang G, Wang S, Wang H, Zhang Y, Wang Y (2022) Tanet: Transformer-based asymmetric network for rgb-d salient object detection. arXiv preprint arXiv:2207.01172

  87. Liu J, Zhang S, Wang S, Metaxas DN (2016) Multispectral deep neural networks for pedestrian detection. In: 27th British machine vision conference, BMVC 2016

  88. Liu H, Zhang J, Yang K, Hu X, Stiefelhagen R (2022) Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers. arXiv preprint arXiv:2203.04838

  89. Lu C, Mandal M (2013) Toward automatic mitotic cell detection and segmentation in multispectral histopathological images. IEEE J Biomed Health Inform 18(2):594–605

    Google Scholar 

  90. Luo Q, Ma H, Tang L, Wang Y, Xiong R (2020) 3d-ssd: Learning hierarchical features from rgb-d images for amodal 3d object detection. Neurocomputing 378:364–374

    Google Scholar 

  91. Mei J, Zhu AZ, Yan X, Yan H, Qiao S, Chen L-C, Kretzschmar H (2022) Waymo open dataset: Panoramic video panoptic segmentation. In: Computer Vision-ECCV 2022: 17th European conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXIX, pp 53–72 . Springer

  92. Meyer GP, Charland J, Hegde D, Laddha A, Vallespi-Gonzalez C (2019) Sensor fusion for joint 3d object detection and semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 0–0

  93. Meyer GP, Laddha A, Kee E, Vallespi-Gonzalez C, Wellington CK (2019) Lasernet: An efficient probabilistic 3d object detector for autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12677–12686

  94. Mitrokhin A, Fermüller C, Parameshwara C, Aloimonos Y (2018) Eventbased moving object detection and tracking. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1–9. IEEE

  95. Nissimov S, Goldberger J, Alchanatis V (2015) Obstacle detection in a greenhouse environment using the kinect sensor. Comput Electr Agri 113:104–115

    Google Scholar 

  96. Novikova T, Rehbinder J, Vizet J, Pierangelo A, Ossikovski R, Nazac A, Benali A, Validire P (2018) Mueller polarimetry as a tool for optical biopsy of tissue. In: 2018 international conference laser optics (ICLO), pp 553–553 . IEEE

  97. Park K, Kim S, Sohn K (2018) Unified multi-spectral pedestrian detection based on probabilistic fusion networks. Pattern Recognit 80:143–155

    ADS  Google Scholar 

  98. Pei D, Jing M, Liu H, Sun F, Jiang L (2020) A fast retinanet fusion framework for multi-spectral pedestrian detection. Infrared Phys & Technol 105:103178

    Google Scholar 

  99. Pham Q-H, Sevestre P, Pahwa RS, Zhan H, Pang CH, Chen Y, Mustafa A, Chandrasekhar V, Lin J (2020) A* 3d dataset: Towards autonomous driving in challenging environments. In: 2020 IEEE international conference on robotics and automation (ICRA), pp 2267–2273. IEEE

  100. Qi CR, Chen X, Litany O, Guibas LJ (2020) Imvotenet: Boosting 3d object detection in point clouds with image votes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4404–4413

  101. Qi CR, Litany O, He K, Guibas LJ (2019) Deep hough voting for 3d object detection in point clouds. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9277–9286

  102. Qi CR, Liu W, Wu C, Su H, Guibas LJ (2018) Frustum pointnets for 3d object detection from rgb-d data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 918–927

  103. Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 652–660

  104. Qi CR, Yi L, Su H, Guibas LJ (2017) Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413

  105. Qin J, Burks T, Zhao X, Niphadkar N, Ritenour M (2011) Multispectral detection of citrus canker using hyperspectral band selection. Trans ASABE 54(6):2331–2341

    Google Scholar 

  106. Rahman MM, Tan Y, Xue J, Shao L, Lu K (2019) 3d object detection: Learning 3d bounding boxes from scaled down 2d bounding boxes in rgb-d images. Inform Sci 476:147–158

    Google Scholar 

  107. Razakarivony S, Jurie F (2016) Vehicle detection in aerial imagery: A small target detection benchmark. J Visual Commun Image Represent 34:187–203

    Google Scholar 

  108. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  109. Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271

  110. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767

  111. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:91–99

    Google Scholar 

  112. Roblyer DM, Richards-Kortum RR, Sokolov KV, El-Naggar AK, Williams MD, Kurachi C, Gillenwater A (2008) Multispectral optical imaging device for in vivo detection of oral neoplasia. J Biomed Optics 13(2):024019

    ADS  Google Scholar 

  113. Sa I, Ge Z, Dayoub F, Upcroft B, Perez T, McCool C (2016) Deepfruits: A fruit detection system using deep neural networks. Sensors 16(8):1222

    ADS  PubMed  PubMed Central  Google Scholar 

  114. Sakla W, Konjevod G, Mundhenk TN (2017) Deep multi-modal vehicle detection in aerial isr imagery. In: 2017 IEEE winter conference on applications of computer vision (WACV), pp 916–923 . IEEE

  115. Schlosser J, Chow CK, Kira Z (2016) Fusing lidar and images for pedestrian detection using convolutional neural networks. In: 2016 IEEE international conference on robotics and automation (ICRA), pp 2198–2205 . IEEE

  116. Schwartz CR, Eismann MT, Cederquist JN, Johnson RO (1996) Thermal multispectral detection of military vehicles in vegetated and desert backgrounds. In: Targets and Backgrounds: Characterization and representation II, vol 2742, pp 286–297 . International Society for Optics and Photonics

  117. Schwarz M, Milan A, Periyasamy AS, Behnke S (2018) Rgb-d object detection and semantic segmentation for autonomous manipulation in clutter. Int J Robot Res 37(4–5):437–451

    Google Scholar 

  118. Shen X, Stamos I (2020) Frustum voxnet for 3d object detection from rgb-d or depth images. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1698–1706

  119. Shin U, Lee K, Kweon IS (2023) Complementary random masking for rgbthermal semantic segmentation. arXiv preprint arXiv:2303.17386

  120. Shi S, Wang X, Li H (2019) Pointrcnn: 3d object proposal generation and detection from point cloud. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 770–779

  121. Silberman N, Fergus R (2011) Indoor scene segmentation using a structured light sensor. In: 2011 IEEE international conference on computer vision workshops (ICCV Workshops), pp 601–608 . IEEE

  122. Silberman N, Hoiem D, Kohli P, Fergus R (2012) Indoor segmentation and support inference from rgbd images. In: European conference on computer vision, pp 746–760 . Springer

  123. Sindagi VA, Zhou Y, Tuzel O (2019) Mvx-net: Multimodal voxelnet for 3d object detection. In: 2019 international conference on robotics and automation (ICRA), pp 7276–7282 . IEEE

  124. Song X, Gao S, Chen C (2021) A multispectral feature fusion network for robust pedestrian detection. Alex Eng J 60(1):73–85

    Google Scholar 

  125. Song S, Lichtenberg SP, Xiao J (2015) Sun rgb-d: A rgb-d scene understanding benchmark suite. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 567–576

  126. Spinello L, Arras KO (2011) People detection in rgb-d data. In: 2011 IEEE/RSJ international conference on intelligent robots and systems, pp 3838–3843 . IEEE

  127. Sun L, Yang K, Hu X, Hu W, Wang K (2020) Real-time fusion network for rgb-d semantic segmentation incorporating unexpected obstacle detection for road-driving images. IEEE Robot Auto Lett 5(4):5558–5565

    Google Scholar 

  128. Sun P, Kretzschmar H, Dotiwalla X, Chouard A, Patnaik V, Tsui P, Guo J, Zhou Y, Chai Y, Caine B, et al. (2020) Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2446–2454

  129. Takumi K, Watanabe K, Ha Q, Tejero-De-Pablos A, Ushiku Y, Harada T (2017) Multispectral object detection for autonomous vehicles. Proceedings of the on Thematic Workshops of ACM Multimedia 2017:35–43

    Google Scholar 

  130. Tian L, Li M, Hao Y, Liu J, Zhang G, Chen YQ (2018) Robust 3-d human detection in complex environments with a depth camera. IEEE Trans Multimedia 20(9):2249–2261

    Google Scholar 

  131. Tomatis S, Carrara M, Bono A, Bartoli C, Lualdi M, Tragni G, Colombo A, Marchesini R (2005) Automated melanoma detection with a novel multispectral imaging system: results of a prospective study. Phys Med Biol 50(8):1675

    PubMed  Google Scholar 

  132. Tu S, Xue Y, Zheng C, Qi Y, Wan H, Mao L (2018) Detection of passion fruits and maturity classification using red-green-blue depth images. Biosyst Eng 175:156–167

    Google Scholar 

  133. Vandersteegen M, Van Beeck K, Goedemé, T (2018) Real-time multispectral pedestrian detection with a single-pass deep neural network. In: International conference image analysis and recognition, pp 419–426 .Springer

  134. Vázquez-Arellano M, Griepentrog HW, Reiser D, Paraforos DS (2016) 3-d imaging systems for agricultural applications! \(^{ {a}}\)a review. Sensors 16(5):618

    ADS  PubMed  PubMed Central  Google Scholar 

  135. Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vision 57(2):137–154

    Google Scholar 

  136. Vora S, Lang AH, Helou B, Beijbom O (2020) Pointpainting: Sequential fusion for 3d object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4604–4612

  137. Wagner J, Fischer V, Herman M, Behnke S (2016) Multispectral pedestrian detection using deep fusion convolutional neural networks. European symposium on artificial neural network, computational intelligence and machine learning 587:509–514

    Google Scholar 

  138. Wanchaitanawong N, Tanaka M, Shibata T, Okutomi M (2021) Multimodal pedestrian detection with large misalignment based on modal-wise regression and multi-modal iou. In: 2021 17th international conference on machine vision and applications (MVA), pp 1–6 . IEEE

  139. Wang J, Chen K, Yang S, Loy CC, Lin D (2019) Region proposal by guided anchoring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2965–2974

  140. Wang C, Ma C, Zhu M, Yang X (2021) Pointaugmenting: Cross-modal augmentation for 3d object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11794–11803

  141. Wang Y, Ye T, Cao L, Huang W, Sun F, He F, Tao D (2022) Bridged transformer for vision and point cloud 3d object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12114–12123

  142. Wolpert A, Teutsch M, Sarfraz MS, Stiefelhagen R (2020) Anchor-free small-scale multispectral pedestrian detection. In: 31st British machine vision conference, BMVC 2020

  143. Wu X, Peng L, Yang H, Xie L, Huang C, Deng C, Liu H, Cai D (2022) Sparse fuse dense: Towards high quality 3d detection with depth completion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5418–5427

  144. Xiang K, Yang K, Wang K (2021) Polarization-driven semantic segmentation via efficient attention-bridged fusion. Optics Express 29(4):4802–4820

    ADS  PubMed  Google Scholar 

  145. Xiang J, Gou S, Li R, Zheng Z (2022) Rgb-thermal based pedestrian detection with single-modal augmentation and roi pooling multiscale fusion. In: IGARSS 2022-2022 IEEE international geoscience and remote sensing symposium, pp 3532–3535 . IEEE

  146. Xie L, Xiang C, Yu Z, Xu G, Yang Z, Cai D, He X (2020) Pi-rcnn: An efficient multi-sensor 3d object detector with point-based attentive cont-conv fusion module. Proceedings of the AAAI conference on artificial intelligence 34:12460–12467

    Google Scholar 

  147. Xu X, Li Y, Wu G, Luo J (2017) Multi-modal deep feature learning for rgb-d object detection. Pattern Recognit 72:300–313

  148. Xu S, Zhou D, Fang J, Yin J, Bin Z, Zhang L (2021) FusionPainting: Multimodal fusion with adaptive attention for 3D object detection. In: 2021 IEEE international intelligent transportation systems conference (ITSC), pp 3047–3054. IEEE

  149. Yang H, Liu Z, Wu X, Wang W, Qian W, He X, Cai D (2022) Graph R-CNN: Towards accurate 3D object detection with semantic-decorated local graph. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VIII, pp 662–679. Springer

  150. Yang X, Qian Y, Zhu H, Wang C, Yang M (2022) BAANet: Learning bidirectional adaptive attention gates for multispectral pedestrian detection. In: 2022 international conference on robotics and automation (ICRA), pp 2920–2926. IEEE

  151. Yan C, Zhang H, Li X, Yang Y, Yuan D (2023) Cross-modality complementary information fusion for multispectral pedestrian detection. Neural Comput Appl 1–26

  152. Yoo JH, Kim Y, Kim J, Choi JW (2020) 3D-CVF: Generating joint camera and LiDAR features using cross-view spatial feature fusion for 3D object detection. In: European conference on computer vision, pp 720–736. Springer

  153. You Y, Ye Z, Lou Y, Li C, Li Y-L, Ma L, Wang W, Lu C (2022) Canonical voting: Towards robust oriented bounding box detection in 3D scenes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1193–1202

  154. Zhang MM, Choi J, Daniilidis K, Wolf MT, Kanan C (2015) VAIS: A dataset for recognizing maritime imagery in the visible and infrared spectrums. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 10–16

  155. Zhang G, Liu J, Li H, Chen YQ, Davis LS (2017) Joint human detection and head pose estimation via multistream networks for RGB-D videos. IEEE Signal Process Lett 24(11):1666–1670

  156. Zhang D, Zhou X, Zhang J, Lan Y, Xu C, Liang D (2018) Detection of rice sheath blight using an unmanned aerial system with high-resolution color and multispectral imaging. PLoS One 13(5):e0187470

  157. Zhang L, Liu Z, Zhang S, Yang X, Qiao H, Huang K, Hussain A (2019) Cross-modality interactive attention network for multispectral pedestrian detection. Inf Fusion 50:20–29

  158. Zhang Q, Xiao T, Huang N, Zhang D, Han J (2020) Revisiting feature fusion for RGB-T salient object detection. IEEE Trans Circ Syst Video Technol 31(5):1804–1818

  159. Zhang Y, Sidibé D, Morel O, Mériaudeau F (2021) Deep multimodal fusion for semantic image segmentation: A survey. Image Vision Comput 105:104042

  160. Zhang Y, Yu H, He Y, Wang X, Yang W (2023) Illumination-guided RGBT object detection with inter- and intra-modality fusion. IEEE Trans Instrum Meas 72:1–13

  161. Zhang H, Fromont E, Lefevre S, Avignon B (2020) Multispectral fusion for object detection with cyclic fuse-and-refine blocks. In: 2020 IEEE international conference on image processing (ICIP), pp 276–280. IEEE

  162. Zhang H, Fromont E, Lefèvre S, Avignon B (2021) Guided attentive feature fusion for multispectral pedestrian detection. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 72–80

  163. Zhang L, Liu Z, Zhu X, Song Z, Yang X, Lei Z, Qiao H (2021) Weakly aligned feature fusion for multimodal object detection. IEEE Trans Neural Netw Learn Syst

  164. Zhang J, Yang K, Stiefelhagen R (2021) ISSAFE: Improving semantic segmentation in accidents by fusing event-based data. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1132–1139. IEEE

  165. Zhang L, Zhu X, Chen X, Yang X, Lei Z, Liu Z (2019) Weakly aligned cross-modal learning for multispectral pedestrian detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 5127–5137

  166. Zhao C, Liu H, Su N, Yan Y (2022) TFTN: A transformer-based fusion tracking framework of hyperspectral and RGB. IEEE Trans Geosci Remote Sensing 60:1–15

  167. Zhao J, Zhang G, Tian L, Chen YQ (2017) Real-time human detection with depth camera via a physical radius-depth detector and a CNN descriptor. In: 2017 IEEE international conference on multimedia and expo (ICME), pp 1536–1541. IEEE

  168. Zheng Y, Izzat IH, Ziaee S (2019) GFD-SSD: Gated fusion double SSD for multispectral pedestrian detection. arXiv preprint arXiv:1903.06999

  169. Zhou K, Chen L, Cao X (2020) Improving multispectral pedestrian detection by addressing modality imbalance problems. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVIII 16, pp 787–803. Springer

  170. Zhou T, Fan D-P, Cheng M-M, Shen J, Shao L (2021) RGB-D salient object detection: A survey. Computational Visual Media 1–33

  171. Zhou K, Paiement A, Mirmehdi M (2017) Detecting humans in RGB-D data with CNNs. In: 2017 Fifteenth IAPR international conference on machine vision applications (MVA), pp 306–309. IEEE

  172. Zhou Y, Tuzel O (2018) VoxelNet: End-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4490–4499

  173. Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv preprint arXiv:1904.07850

  174. Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 840–849

  175. Zhu Q, Ren J, Barclay D, McCormack S, Thomson W (2015) Automatic animal detection from Kinect sensed images for livestock monitoring and assessment. In: 2015 IEEE international conference on computer and information technology; Ubiquitous computing and communications; dependable, autonomic and secure computing; pervasive intelligence and computing, pp 1154–1157. IEEE

  176. Zhu P, Sun Y, Wen L, Feng Y, Hu Q (2020) Drone based RGBT vehicle detection and counting: A challenge. arXiv preprint arXiv:2003.02437

Acknowledgements

This research was sponsored by the Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 21KJB520015) and the Postdoctoral Fund of Jiangsu Province (No. 2021K398C).

Author information

Corresponding author

Correspondence to Jun Liu.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Meng, S., Wang, H. et al. Deep learning based object detection from multi-modal sensors: an overview. Multimed Tools Appl 83, 19841–19870 (2024). https://doi.org/10.1007/s11042-023-16275-z

  • DOI: https://doi.org/10.1007/s11042-023-16275-z