From single- to multi-modal remote sensing imagery interpretation: a survey and taxonomy

  • Review
  • Science China Information Sciences

Abstract

A modality is a source or form of information. By combining information from different modalities, humans perceive the world from multiple perspectives. Remote sensing (RS) observation is likewise multimodal: we observe the world at a macroscopic scale through panchromatic, LiDAR, and other sensors. Multimodal RS observation has become an active research area that benefits urban planning, monitoring, and other applications. Despite numerous advances in this area, a comprehensive assessment that provides a systematic overview with a unified evaluation is still lacking. Accordingly, in this survey we first highlight the key differences between single- and multimodal RS imagery interpretation, and then use these differences to guide our survey of multimodal RS imagery interpretation in a cascaded structure. Finally, we explore and outline some potential future research directions. We hope that this survey will serve as a starting point for researchers to review state-of-the-art developments and to work on multimodal research.


    Article  Google Scholar 

  156. Awange J L, Schumacher M, Forootan E, et al. Exploring hydro-meteorological drought patterns over the Greater Horn of Africa (1979–2014) using remote sensing and reanalysis products. Adv Water Resources, 2016, 94: 45–59

    Article  Google Scholar 

  157. Teillet P. Effects of spectral, spatial, and radiometric characteristics on remote sensing vegetation indices of forested regions. Remote Sens Environ, 1997, 61: 139–149

    Article  Google Scholar 

  158. Babst F, Esper J, Parlow E. Landsat TM/ETM+ and tree-ring based assessment of spatiotemporal patterns of the autumnal moth (Epirrita autumnata) in northernmost Fennoscandia. Remote Sens Environ, 2010, 114: 637–646

    Article  Google Scholar 

  159. Guanter L, Richter R, Kaufmann H. On the application of the MODTRAN4 atmospheric radiative transfer code to optical remote sensing. Int J Remote Sens, 2009, 30: 1407–1424

    Article  Google Scholar 

  160. Bloom A A, Worden J, Jiang Z, et al. Remote-sensing constraints on South America fire traits by Bayesian fusion of atmospheric and surface data. Geophys Res Lett, 2015, 42: 1268–1274

    Article  Google Scholar 

  161. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell, 2013, 35: 1798–1828

    Article  Google Scholar 

  162. Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Commun ACM, 2017, 60: 84–90

    Article  Google Scholar 

  163. Liu Z, Lin Y, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows. 2021. ArXiv:2103.14030

  164. Zhang L, Lan M, Zhang J, et al. Stagewise unsupervised domain adaptation with adversarial self-training for road segmentation of remote-sensing images. IEEE Trans Geosci Remote Sens, 2022, 60: 1–13

    Google Scholar 

  165. Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space. 2013. ArXiv:1301.3781

  166. Pennington J, Socher R, Manning C D. Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. 1532–1543

  167. Devlin J, Chang M W, Lee K, et al. Bert: pre-training of deep bidirectional transformers for language understanding. 2018. ArXiv:1810.04805

  168. Schneider S, Baevski A, Collobert R, et al. wav2vec: unsupervised pre-training for speech recognition. 2019. ArXiv:1904.05862

  169. Xu Q, Baevski A, Likhomanenko T, et al. Self-training and pre-training are complementary for speech recognition. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021. 3030–3034

  170. Perozzi B, Al-Rfou R, Skiena S. Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014. 701–710

  171. Grover A, Leskovec J. node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. 855–864

  172. Wang H, Wang J, Wang J, et al. GraphGAN: graph representation learning with generative adversarial nets. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2018

  173. Guo W, Wang J, Wang S. Deep multimodal representation learning: a survey. IEEE Access, 2019, 7: 63373–63394

    Article  Google Scholar 

  174. LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proc IEEE, 1998, 86: 2278–2324

    Article  Google Scholar 

  175. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 1–9

  176. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of Advances in Neural Information Processing Systems, 2017. 5998–6008

  177. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: transformers for image recognition at scale. In: Proceedings of International Conference on Learning Representations, 2020

  178. Yuan L, Chen Y, Wang T, et al. Tokens-to-Token ViT: training vision transformers from scratch on ImageNet. 2021. ArXiv:2101.11986

  179. Bazi Y, Bashmal L, Rahhal M M A, et al. Vision transformers for remote sensing image classification. Remote Sens, 2021, 13: 516

    Article  Google Scholar 

  180. He X, Chen Y, Lin Z. Spatial-spectral transformer for hyperspectral image classification. Remote Sens, 2021, 13: 498

    Article  Google Scholar 

  181. Chen H, Qi Z, Shi Z. Remote sensing image change detection with transformers. IEEE Trans Geosci Remote Sens, 2022, 60: 1–14

    Google Scholar 

  182. Pascual S, Ravanelli M, Serrà J, et al. Learning problem-agnostic speech representations from multiple self-supervised tasks. In: Proceedings of Interspeech 2019, 2019. 161–165

  183. Liu A T, Yang S W, Chi P H, et al. Mockingjay: unsupervised speech representation learning with deep bidirectional transformer encoders. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020. 6419–6423

  184. Sharma M, Dhanaraj M, Karnam S, et al. YOLOrs: object detection in multimodal remote sensing imagery. IEEE J Sel Top Appl Earth Observations Remote Sens, 2020, 14: 1497–1508

    Article  Google Scholar 

  185. Yang D, Liu X, He H, et al. Air-to-ground multimodal object detection algorithm based on feature association learning. Int J Adv Robotic Syst, 2019, 16: 172988141984299

    Article  Google Scholar 

  186. Flynn H, Cameron S. Multi-modal people detection from aerial video. In: Proceedings of the 8th International Conference on Computer Recognition Systems, 2013. 815–824

  187. de Oliveira D C, Wehrmeister M A. Towards real-time people recognition on aerial imagery using convolutional neural networks. In: Proceedings of IEEE 19th International Symposium on Real-Time Distributed Computing, 2016. 27–34

  188. Breckon T P, Gaszczak A, Han J, et al. Multi-modal target detection for autonomous wide area search and surveillance. In: Proceedings of SPIE—International Society for Optics and Photonics, 2013. 889913

  189. Audebert N, Le Saux B, Lefèvre S. Semantic segmentation of earth observation data using multimodal and multi-scale deep networks. In: Proceedings of Asian Conference on Computer Vision, 2016. 180–196

  190. Audebert N, Le Saux B, Lefèvre S. Beyond RGB: very high resolution urban remote sensing with multimodal deep networks. ISPRS J Photogrammetry Remote Sens, 2018, 140: 20–32

    Article  Google Scholar 

  191. Li X, Lei L, Sun Y, et al. Multimodal bilinear fusion network with second-order attention-based channel selection for land cover classification. IEEE J Sel Top Appl Earth Observations Remote Sens, 2020, 13: 1011–1026

    Article  Google Scholar 

  192. Jeong J, Yoon T S, Park J B. Multimodal sensor-based semantic 3D mapping for a large-scale environment. Expert Syst Appl, 2018, 105: 1–10

    Article  Google Scholar 

  193. Farooq A, Jia X, Hu J, et al. Multi-resolution weed classification via convolutional neural network and superpixel based local binary pattern using remote sensing images. Remote Sens, 2019, 11: 1692

    Article  Google Scholar 

  194. Li Z, Chen G, Zhang T. A CNN-transformer hybrid approach for crop classification using multitemporal multisensor images. IEEE J Sel Top Appl Earth Observations Remote Sens, 2020, 13: 847–858

    Article  Google Scholar 

  195. Zhou M, Jing M, Liu D, et al. Multi-resolution networks for ship detection in infrared remote sensing images. Infrared Phys Tech, 2018, 92: 183–189

    Article  Google Scholar 

  196. Wang Y, Wang C, Zhang H, et al. Automatic ship detection based on RetinaNet using multi-resolution Gaofen-3 imagery. Remote Sens, 2019, 11: 531

    Article  Google Scholar 

  197. Bergado J R, Persello C, Stein A. Recurrent multiresolution convolutional networks for VHR image classification. IEEE Trans Geosci Remote Sens, 2018, 56: 6361–6374

    Article  Google Scholar 

  198. Robinson C, Hou L, Malkin K, et al. Large scale high-resolution land cover mapping with multi-resolution data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. 12726–12735

  199. Wirion C, Bauwens W, Verbeiren B. Location- and time-specific hydrological simulations with multi-resolution remote sensing data in urban areas. Remote Sens, 2017, 9: 645

    Article  Google Scholar 

  200. Ye Y, Bruzzone L, Shan J, et al. Fast and robust matching for multimodal remote sensing image registration. IEEE Trans Geosci Remote Sens, 2019, 57: 9059–9070

    Article  Google Scholar 

  201. Uss M, Vozel B, Lukin V, et al. Efficient discrimination and localization of multimodal remote sensing images using CNN-based prediction of localization uncertainty. Remote Sens, 2020, 12: 703

    Article  Google Scholar 

  202. Zhu R, Yu D, Ji S, et al. Matching RGB and infrared remote sensing images with densely-connected convolutional neural networks. Remote Sens, 2019, 11: 2836

    Article  Google Scholar 

  203. Huang B, Li Y, Han X, et al. Cloud removal from optical satellite imagery with SAR imagery using sparse representation. IEEE Geosci Remote Sens Lett, 2015, 12: 1046–1050

    Article  Google Scholar 

  204. Meraner A, Ebel P, Zhu X X, et al. Cloud removal in Sentinel-2 imagery using a deep residual neural network and SAR-optical data fusion. ISPRS J Photogrammetry Remote Sens, 2020, 166: 333–346

    Article  Google Scholar 

  205. Zhao Y, Shen S, Hu J, et al. Cloud removal using multimodal GAN with adversarial consistency loss. IEEE Geosci Remote Sens Lett, 2022, 19: 1–5

    Google Scholar 

  206. Dai P, Ji S, Zhang Y. Gated convolutional networks for cloud removal from bi-temporal remote sensing images. Remote Sens, 2020, 12: 3427

    Article  Google Scholar 

  207. Hong D, Yao J, Meng D, et al. Multimodal GANs: toward crossmodal hyperspectral-multispectral image segmentation. IEEE Trans Geosci Remote Sens, 2020, 59: 5103–5113

    Article  Google Scholar 

  208. Liu X, Hong D, Chanussot J, et al. Modality translation in remote sensing time series. IEEE Trans Geosci Remote Sens, 2022, 60: 1–14

    Google Scholar 

  209. Sun L, Mi X, Wei J, et al. A cloud detection algorithm-generating method for remote sensing data at visible to short-wave infrared wavelengths. ISPRS J Photogrammetry Remote Sens, 2017, 124: 70–88

    Article  Google Scholar 

  210. Ao D, Dumitru C O, Schwarz G, et al. Dialectical GAN for SAR image translation: from Sentinel-1 to TerraSAR-X. Remote Sens, 2018, 10: 1597

    Article  Google Scholar 

  211. Gao J, Yuan Q, Li J, et al. Cloud removal with fusion of high resolution optical and SAR images using generative adversarial networks. Remote Sens, 2020, 12: 191

    Article  Google Scholar 

  212. Fu S L, Xu F, Jin Y-Q. Reciprocal translation between SAR and optical remote sensing images with cascaded-residual adversarial networks. Sci China Inf Sci, 2021, 64: 122301

    Article  Google Scholar 

  213. Shi Z, Zou Z. Can a machine generate humanlike language descriptions for a remote sensing image? IEEE Trans Geosci Remote Sens, 2017, 55: 3623–3634

    Article  Google Scholar 

  214. Lu X, Wang B, Zheng X, et al. Exploring models and data for remote sensing image caption generation. IEEE Trans Geosci Remote Sens, 2017, 56: 2183–2195

    Article  Google Scholar 

  215. Shen X, Liu B, Zhou Y, et al. Remote sensing image captioning via variational autoencoder and reinforcement learning. Knowledge-Based Syst, 2020, 203: 105920

    Article  Google Scholar 

  216. Ju J, Roy D P. The availability of cloud-free Landsat ETM+ data over the conterminous United States and globally. Remote Sens Environ, 2008, 112: 1196–1211

    Article  Google Scholar 

  217. Ling F, Du Y, Li X, et al. Interpolation-based super-resolution land cover mapping. Remote Sens Lett, 2013, 4: 629–638

    Article  Google Scholar 

  218. Pignol F, Colone F, Martelli T. Lagrange-polynomial-interpolation-based keystone transform for a passive radar. IEEE Trans Aerosp Electron Syst, 2017, 54: 1151–1167

    Article  Google Scholar 

  219. Zhang Y, Fan Q, Bao F, et al. Single-image super-resolution based on rational fractal interpolation. IEEE Trans Image Process, 2018, 27: 3782–3797

    Article  MathSciNet  MATH  Google Scholar 

  220. Chavez-Roman H, Ponomaryov V. Super resolution image generation using wavelet domain interpolation with edge extraction via a sparse representation. IEEE Geosci Remote Sens Lett, 2014, 11: 1777–1781

    Article  Google Scholar 

  221. Shao Z, Wang L, Wang Z, et al. Remote sensing image super-resolution using sparse representation and coupled sparse autoencoder. IEEE J Sel Top Appl Earth Observations Remote Sens, 2019, 12: 2663–2674

    Article  Google Scholar 

  222. Hou B, Zhou K, Jiao L. Adaptive super-resolution for remote sensing images based on sparse representation with global joint dictionary model. IEEE Trans Geosci Remote Sens, 2017, 56: 2312–2327

    Article  Google Scholar 

  223. Chang Y, Luo B. Bidirectional convolutional LSTM neural network for remote sensing image super-resolution. Remote Sens, 2019, 11: 2333

    Article  Google Scholar 

  224. Gu J, Sun X, Zhang Y, et al. Deep residual squeeze and excitation network for remote sensing image super-resolution. Remote Sens, 2019, 11: 1817

    Article  Google Scholar 

  225. Lu T, Wang J, Zhang Y, et al. Satellite image super-resolution via multi-scale residual deep neural network. Remote Sens, 2019, 11: 1588

    Article  Google Scholar 

  226. Haut J M, Fernandez-Beltran R, Paoletti M E, et al. A new deep generative network for unsupervised remote sensing single-image super-resolution. IEEE Trans Geosci Remote Sens, 2018, 56: 6792–6810

    Article  Google Scholar 

  227. Lei S, Shi Z, Zou Z. Coupled adversarial training for remote sensing image super-resolution. IEEE Trans Geosci Remote Sens, 2019, 58: 3633–3643

    Article  Google Scholar 

  228. Xiong Y, Guo S, Chen J, et al. Improved SRGAN for remote sensing image super-resolution across locations and sensors. Remote Sens, 2020, 12: 1263

    Article  Google Scholar 

  229. Zhang D, Shao J, Li X, et al. Remote sensing image super-resolution via mixed high-order attention network. IEEE Trans Geosci Remote Sens, 2020, 59: 5183–5196

    Article  Google Scholar 

  230. Salvetti F, Mazzia V, Khaliq A, et al. Multi-image super resolution of remotely sensed images using residual attention deep neural networks. Remote Sens, 2020, 12: 2207

    Article  Google Scholar 

  231. Zhang S, Yuan Q, Li J, et al. Scene-adaptive remote sensing image super-resolution using a multiscale attention network. IEEE Trans Geosci Remote Sens, 2020, 58: 4764–4779

    Article  Google Scholar 

  232. Liu P, Wang M, Wang L, et al. Remote-sensing image denoising with multi-sourced information. IEEE J Sel Top Appl Earth Observations Remote Sens, 2019, 12: 660–674

    Article  Google Scholar 

  233. Feng X, Zhang W, Su X, et al. Optical remote sensing image denoising and super-resolution reconstructing using optimized generative network in wavelet transform domain. Remote Sens, 2021, 13: 1858

    Article  Google Scholar 

  234. Enomoto K, Sakurada K, Wang W, et al. Image translation between SAR and optical imagery with generative adversarial nets. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium, 2018. 1752–1755

  235. Reyes M F, Auer S, Merkle N, et al. SAR-to-optical image translation based on conditional generative adversarial networks-optimization, opportunities and limits. Remote Sens, 2019, 11: 2067

    Article  Google Scholar 

  236. Zhang Q, Liu X, Liu M, et al. Comparative analysis of edge information and polarization on SAR-to-optical translation based on conditional generative adversarial networks. Remote Sens, 2021, 13: 128

    Article  Google Scholar 

  237. Ji S, Wang D, Luo M. Generative adversarial network-based full-space domain adaptation for land cover classification from multiple-source remote sensing images. IEEE Trans Geosci Remote Sens, 2020, 59: 3816–3828

    Article  Google Scholar 

  238. Peng D, Guan H, Zang Y, et al. Full-level domain adaptation for building extraction in very-high-resolution optical remote-sensing images. IEEE Trans Geosci Remote Sens, 2022, 60: 1–17

    Google Scholar 

  239. Qu B, Li X, Tao D, et al. Deep semantic understanding of high resolution remote sensing image. In: Proceedings of 2016 International Conference on Computer, Information and Telecommunication Systems (CITS), 2016. 1–5

  240. Wang B, Lu X, Zheng X, et al. Semantic descriptions of high-resolution remote sensing images. IEEE Geosci Remote Sens Lett, 2019, 16: 1274–1278

    Article  Google Scholar 

  241. Lobry S, Murray J, Marcos D, et al. Visual question answering from remote sensing images. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium, 2019. 4951–4954

  242. Lobry S, Marcos D, Murray J, et al. RSVQA: visual question answering for remote sensing data. IEEE Trans Geosci Remote Sens, 2020, 58: 8555–8566

    Article  Google Scholar 

  243. Zheng X, Wang B, Du X, et al. Mutual attention inception network for remote sensing visual question answering. IEEE Trans Geosci Remote Sens, 2022, 60: 1–14

    Article  Google Scholar 

  244. Lu X, Wang B, Zheng X. Sound active attention framework for remote sensing image captioning. IEEE Trans Geosci Remote Sens, 2019, 58: 1985–2000

    Article  Google Scholar 

  245. Wu S, Zhang X, Wang X, et al. Scene attention mechanism for remote sensing image caption generation. In: Proceedings of 2020 International Joint Conference on Neural Networks (IJCNN), 2020. 1–7

  246. Zhao R, Shi Z, Zou Z. High-resolution remote sensing image captioning based on structured attention. IEEE Trans Geosci Remote Sens, 2022, 60: 1–14

    Article  Google Scholar 

  247. Huang W, Wang Q, Li X. Denoising-based multiscale feature fusion for remote sensing image captioning. IEEE Geosci Remote Sens Lett, 2020, 18: 436–440

    Article  Google Scholar 

  248. Wang Q, Huang W, Zhang X, et al. Word-sentence framework for remote sensing image captioning. IEEE Trans Geosci Remote Sens, 2021, 59: 10532–10543

    Article  Google Scholar 

  249. Yao Y, Doretto G. Boosting for transfer learning with multiple sources. In: Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010. 1855–1862

  250. Liu W, Qin R. A MultiKernel domain adaptation method for unsupervised transfer learning on cross-source and cross-region remote sensing data classification. IEEE Trans Geosci Remote Sens, 2020, 58: 4279–4289

    Article  Google Scholar 

  251. Xu Z, Chen Z, Yi W, et al. Deep gradient prior network for DEM super-resolution: Transfer learning from image to DEM. ISPRS J Photogrammetry Remote Sens, 2019, 150: 80–90

    Article  Google Scholar 

  252. Hu T, Huang X, Li J, et al. A novel co-training approach for urban land cover mapping with unclear Landsat time series imagery. Remote Sens Environ, 2018, 217: 144–157

    Article  Google Scholar 

  253. Qiu C, Schmitt M, Mou L, et al. Feature importance analysis for local climate zone classification using a residual convolutional neural network with multi-source datasets. Remote Sens, 2018, 10: 1572

    Article  Google Scholar 

  254. Rostami M, Kolouri S, Eaton E, et al. Sar image classification using few-shot cross-domain transfer learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019

  255. Ying Z, Xuan C, Zhai Y, et al. TAI-SARNET: deep transferred atrous-inception CNN for small samples SAR ATR. Sensors, 2020, 20: 1724

    Article  Google Scholar 

  256. Shermeyer J, Hogan D, Brown J, et al. SpaceNet 6: multi-sensor all weather mapping dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. 196–197

  257. Montgomery J, Brisco B, Chasmer L, et al. SAR and lidar temporal data fusion approaches to boreal wetland ecosystem monitoring. Remote Sens, 2019, 11: 161

    Article  Google Scholar 

  258. Chen H, Shi Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens, 2020, 12: 1662

    Article  Google Scholar 

  259. Daudt R C, Saux B L, Boulch A, et al. Multitask learning for large-scale semantic change detection. Comput Vision Image Understanding, 2019, 187: 102783

    Article  Google Scholar 

  260. Zheng B, Campbell J B, de Beurs K M. Remote sensing of crop residue cover using multi-temporal Landsat imagery. Remote Sens Environ, 2012, 117: 177–183

    Article  Google Scholar 

  261. Garnot V S F, Landrieu L, Giordano S, et al. Satellite image time series classification with pixel-set encoders and temporal self-attention. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

  262. Wei J, Lee Z, Garcia R, et al. An assessment of Landsat-8 atmospheric correction schemes and remote sensing reflectance products in coral reefs and coastal turbid waters. Remote Sens Environ, 2018, 215: 18–32

    Article  Google Scholar 

  263. Zhang W, Tang P, Zhao L. Remote sensing image scene classification using CNN-CapsNet. Remote Sens, 2019, 11: 494

    Article  Google Scholar 

  264. Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 1137–1149

    Article  Google Scholar 

  265. Yang X, Sun H, Fu K, et al. Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens, 2018, 10: 132

    Article  Google Scholar 

  266. Tian Z, Shen C, Chen H, et al. FCOS: fully convolutional one-stage object detection. In: Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, Seoul, 2019. 9626–9635

  267. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 3431–3440

  268. Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2017, 40: 834–848

    Article  Google Scholar 

  269. Mi L, Chen Z. Superpixel-enhanced deep neural forest for remote sensing image semantic segmentation. ISPRS J Photogrammetry Remote Sens, 2020, 159: 140–152

    Article  Google Scholar 

  270. Yang Y, Newsam S. Bag-of-visual-words and spatial extensions for land-use classification. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2010. 270–279

  271. Xia G S, Hu J, Hu F, et al. AID: a benchmark data set for performance evaluation of aerial scene classification. IEEE Trans Geosci Remote Sens, 2017, 55: 3965–3981

    Article  Google Scholar 

  272. Zhou W, Newsam S, Li C, et al. PatternNet: a benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS J Photogrammetry Remote Sens, 2018, 145: 197–209

    Article  Google Scholar 

  273. Emelyanova I V, McVicar T R, van Niel T G, et al. Assessing the accuracy of blending Landsat-MODIS surface reflectances in two landscapes with contrasting spatial and temporal dynamics: a framework for algorithm selection. Remote Sens Environ, 2013, 133: 193–209

    Article  Google Scholar 

  274. Campos-Taberner M, Romero-Soriano A, Gatta C, et al. Processing of extremely high-resolution LiDAR and RGB data: outcome of the 2015 IEEE GRSS data fusion contest-part A: 2-D contest. IEEE J Sel Top Appl Earth Observations Remote Sens, 2016, 9: 5547–5559

    Article  Google Scholar 

  275. Ching J, Mills G, Bechtel B, et al. WUDAPT: an urban weather, climate, and environmental modeling infrastructure for the anthropocene. Bull Am Meteorol Soc, 2018, 99: 1907–1924

    Article  Google Scholar 

  276. Cadiou E, Mammez D, Dherbecourt J B, et al. Atmospheric boundary layer CO2 remote sensing with a direct detection LIDAR instrument based on a widely tunable optical parametric source. Opt Lett, 2017, 42: 4044–4047

    Article  Google Scholar 

  277. Zhang X, Wang Q, Chen S, et al. Multi-scale cropping mechanism for remote sensing image captioning. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium, 2019. 10039–10042

  278. Chen B, Huang B, Xu B. Multi-source remotely sensed data fusion for improving land cover classification. ISPRS J Photogrammetry Remote Sens, 2017, 124: 27–39

    Article  Google Scholar 

  279. Carrasco L, O’Neil A, Morton R, et al. Evaluating combinations of temporally aggregated Sentinel-1, Sentinel-2 and Landsat 8 for land cover mapping with Google Earth Engine. Remote Sens, 2019, 11: 288

    Article  Google Scholar 

  280. Piramanayagam S, Saber E, Schwartzkopf W, et al. Supervised classification of multisensor remotely sensed images using a deep learning framework. Remote Sens, 2018, 10: 1429

    Article  Google Scholar 

  281. Liu J, Gong M, Qin K, et al. A deep convolutional coupling network for change detection based on heterogeneous optical and radar images. IEEE Trans Neural Netw Learn Syst, 2016, 29: 545–559

    Article  MathSciNet  Google Scholar 

  282. Wu Y, Li J, Yuan Y, et al. Commonality autoencoder: learning common features for change detection from heterogeneous images. IEEE Trans Neural Netw Learn Syst, 2022, 33: 4257–4270

    Article  Google Scholar 

  283. Zhu X, Cai F, Tian J, et al. Spatiotemporal fusion of multisource remote sensing data: literature survey, taxonomy, principles, applications, and future directions. Remote Sens, 2018, 10: 527

    Article  Google Scholar 

  284. Li J, Li Y F, He L, et al. Spatio-temporal fusion for remote sensing data: an overview and new benchmark. Sci China Inf Sci, 2020, 63: 140301

    Article  MathSciNet  Google Scholar 

  285. Xu J, Zhu Y, Zhong R, et al. DeepCropMapping: a multi-temporal deep learning approach with improved spatial generalizability for dynamic corn and soybean mapping. Remote Sens Environ, 2020, 247: 111946

    Article  Google Scholar 

  286. He C, Gao B, Huang Q, et al. Environmental degradation in the urban areas of China: evidence from multi-source remote sensing data. Remote Sens Environ, 2017, 193: 65–75

    Article  Google Scholar 

  287. Hilker T, Wulder M A, Coops N C, et al. A new data fusion model for high spatial- and temporal-resolution mapping of forest disturbance based on Landsat and MODIS. Remote Sens Environ, 2009, 113: 1613–1627

    Article  Google Scholar 

  288. Tran T V, de Beurs K M, Julian J P. Monitoring forest disturbances in Southeast Oklahoma using Landsat and MODIS images. Int J Appl Earth Observation GeoInf, 2016, 44: 42–52

    Article  Google Scholar 

  289. Singh P, Komodakis N. Cloud-GAN: cloud removal for sentinel-2 imagery using a cyclic consistent generative adversarial networks. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium, 2018. 1772–1775

  290. Zhang Q, Yuan Q, Zeng C, et al. Missing data reconstruction in remote sensing image with a unified spatial-temporal-spectral deep convolutional neural network. IEEE Trans Geosci Remote Sens, 2018, 56: 4274–4288

    Article  Google Scholar 

  291. Huang H, Kuhn A, Michelini M, et al. 3D urban scene reconstruction and interpretation from multisensor imagery. In: Proceedings of Multimodal Scene Understanding, 2019. 307–340

  292. Liu Y, Xue F, Huang H. UrbanScene3D: a large scale urban scene dataset and simulator. 2021. ArXiv:2107.04286

Download references

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2021YFB3900504) and the National Natural Science Foundation of China (Grant Nos. 61725105, 62171436).

Author information

Corresponding author

Correspondence to Kun Fu.

About this article

Cite this article

Sun, X., Tian, Y., Lu, W. et al. From single- to multi-modal remote sensing imagery interpretation: a survey and taxonomy. Sci. China Inf. Sci. 66, 140301 (2023). https://doi.org/10.1007/s11432-022-3588-0

  • DOI: https://doi.org/10.1007/s11432-022-3588-0
