1 Introduction

Image fusion (IF) is an emerging field concerned with generating an informative image by integrating images obtained from different sensors to support decision making [1]. Integrating different images can improve both the analytical and the visual quality of the result. Effective image fusion preserves vital information by extracting all important information from the source images without introducing inconsistencies into the output image. After fusion, the fused image is more suitable for machine and human perception. The first step of fusion is image registration (IR), in which a source image is mapped with respect to a reference image. This mapping matches corresponding images on the basis of certain features for further analysis. IF and IR are regarded as vital aids for producing valuable information in several domains [2]. According to the literature, the number of scientific papers has increased dramatically since 2011 and reached a peak of 21,672 in 2019, as illustrated in Fig. 1. This fast-rising trend can be attributed to the increased demand for high-performance, low-cost image fusion techniques. Recently, techniques such as multi-scale decomposition and sparse representation have been introduced that offer several ways of improving image fusion performance. Efficient fusion methods are needed because of the variations between corresponding images in various applications. For instance, in the domain of remote sensing, a growing number of satellites acquire aerial images with diverse spectral, spatial and temporal resolutions. IF essentially combines image information acquired under different imaging parameters, such as aperture setting, dynamic range, spectral response, camera position or the use of polarization filters. The information of interest is extracted from the different images with the help of appropriate image fusion methods and can then be used for traffic control, reconnaissance, driver assistance or quality assessment.

Fig. 1

According to the literature, the number of articles related to image fusion

Image fusion techniques can be classified as pixel level, feature level and decision level. Pixel level techniques directly integrate the information from the input images for further computer processing tasks [3]. Feature level techniques extract relevant features, such as pixel intensities, textures or edges, which are then combined to create additional merged features [4, 5]. In decision level fusion techniques, the input images are processed one at a time for the extraction of information [4]. IF can also be classified according to the datasets involved, such as multi-focus, multi-spectral, multi-scale, multi-temporal and multi-sensor. In multi-focus techniques, information from several images of the same scene is fused to obtain one composite image [6]. In addition, multi-source and multi-sensor IF methods offer superior features for representing information that is not visible to the human visual system and are used in medical diagnosis applications. The information generated from merged images can be employed to localize abnormalities accurately [7]. Temporal modeling gives details of all clinical variables and reduces the risk of information failure [8]. The fast-rising trend can largely be attributed to the demand for image fusion techniques with low cost and high performance.

Recently, several techniques such as sparse representation (SR) and multi-scale decomposition have been proposed that help to enhance image fusion performance [3]. SR is a theory of image representation that is applied to image processing tasks such as interpolation, denoising and recognition [1]. In remote sensing, multi-spectral (MS) images are fused by merging their features, using the corresponding information and spatiotemporal correlation, to obtain a more understandable image [9]. IF has grown into an influential solution by merging the images captured through diverse sensors. Images of diverse types, such as infrared, visible, MRI and CT, are good input images for multimodal fusion [1]. Currently, deep learning is a very active topic in image fusion. It has had great success in solving different types of problems in image processing and computer vision, and it is widely used for image fusion [10]. Owing to recent technological advancements, various image fusion techniques have been used in many applications, including video surveillance, security, remote sensing, machine vision and medical imaging.

There are still a number of challenges associated with image fusion that have to be explored. An appropriate, accurate and reliable fusion technique is required for the various types of images in different domains, and its output should be easily interpretable. In addition, image fusion techniques must be robust to uncontrollable acquisition conditions and computationally inexpensive in real-time systems; mis-registration in particular is a major source of error when fusing images. This paper presents an outline of various IF techniques with their applications in diverse domains. The challenges, shortcomings and benefits of image fusion techniques are also discussed.

The rest of the paper is organized as follows. Section 2 discusses the image fusion process. Section 3 presents the various image fusion techniques. Section 4 presents a taxonomy of image fusion methods. Section 5 explains various image fusion applications. Section 6 discusses the evaluation metrics for fusion. Section 7 offers a discussion and prospects for future work. Finally, the paper concludes with an outline of the major ideas.

2 Image Fusion Process

As discussed earlier, the goal of IF is to produce a merged image by integrating information from more than one image. Figure 2 shows the major steps involved in the IF process. In general, registration is treated as an optimization problem in which a similarity measure is maximized and a cost is minimized. The image registration procedure aligns the corresponding features of the various images with respect to a reference image. In this procedure, multiple source images are used: one image is taken as the reference image and the remaining source images are aligned to it. In feature extraction, the significant features of the registered images are extracted to produce several feature maps.
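To make the optimization view of registration concrete, the following minimal sketch searches over integer translations and keeps the one that minimizes the mean squared difference to the reference image. It is only an illustration under strong assumptions (pure translation, circular shifts via `np.roll`, exhaustive search); the function name `register_translation` and the parameter `max_shift` are hypothetical and are not taken from the surveyed literature.

```python
import numpy as np

def register_translation(reference, moving, max_shift=3):
    """Exhaustively search integer (dy, dx) shifts that best align `moving` to `reference`.

    Assumption: the misalignment is a pure translation of at most `max_shift` pixels.
    The cost is the mean squared difference; a lower cost means higher similarity.
    """
    best_shift, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))  # circular shift
            cost = np.mean((reference - shifted) ** 2)
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift, best_cost

# Synthetic example: the moving image is a displaced copy of the reference.
rng = np.random.default_rng(0)
reference = rng.random((8, 8))
moving = np.roll(reference, (2, -1), axis=(0, 1))
shift, cost = register_translation(reference, moving)
print(shift, cost)
```

Practical registration methods typically also handle rotation, scaling and sub-pixel shifts and replace the exhaustive search with a proper optimizer; the sketch only shows the similarity/cost trade-off in its simplest form.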

Fig. 2

The main steps of IF procedure

A decision operator, whose main objective is to label the registered images with respect to pixels or feature maps, is then applied to produce a set of decision maps. The decision or feature maps obtained may not refer to the same object, so semantic equivalence is used to link these maps to a common object before fusion. This step is redundant when the sources come from the same kind of sensor. Then, radiometric calibration is applied to the spatially aligned images. Afterward, the feature maps are transformed to a common scale so that the results share the same representation format. Finally, IF merges the resulting images into one image containing an enhanced description of the scene. The main goal of fusion is to obtain a more informative fused image [2].

3 Image Fusion Techniques

IF techniques can be classified into spatial domain and frequency domain techniques. Spatial techniques deal directly with the pixel values of the input images, which are manipulated to attain a suitable outcome. In frequency domain techniques, all synthesis operations are carried out on the Fourier transform (FT) of the image, and the inverse Fourier transform (IFT) is then computed to obtain the resulting image. Other IF techniques are PCA, IHS, high pass filtering and the Brovey method [12].

Discrete transform fusion techniques are more widely used in image fusion than pyramid based fusion techniques. The different types of IF techniques are shown in Fig. 3 [13].

Fig. 3

Image Fusion Techniques

3.1 Spatial Based Techniques

The spatial based technique is a simple image fusion method that comprises the Max–Min, Minimum, Maximum, Simple Average and Simple Block Replace techniques [14][15]. Table 1 shows the diverse spatial domain based methods with their pros and cons.

Table 1 shows the diverse spatial domain based methods with their pros and cons as per the literature review

3.1.1 Simple Average

It is a fusion technique that combines images by averaging their pixels. This technique takes all regions of the image into account and works well if the images are taken from the same type of sensor [16]. It produces good results if the images have high brightness and high contrast.
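As a minimal sketch of the simple average rule, the snippet below averages co-registered images of identical size pixel by pixel. The function name `average_fusion` and the toy arrays are illustrative assumptions only.

```python
import numpy as np

def average_fusion(images):
    """Fuse same-sized, co-registered images by taking the pixel-wise mean."""
    stack = np.stack([np.asarray(img, dtype=np.float64) for img in images])
    return stack.mean(axis=0)

# Two tiny 2x2 "images" standing in for registered source images.
a = np.array([[10, 200], [30, 40]], dtype=np.uint8)
b = np.array([[30, 100], [50, 80]], dtype=np.uint8)
fused = average_fusion([a, b])
print(fused)
```

Converting to floating point before averaging avoids overflow when the inputs are 8-bit images.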

3.1.2 Minimum Technique

It selects the lowest intensity value of the corresponding pixels from the images to produce a fused image [14]. It is used for darker images [17].

3.1.3 Maximum Technique

It selects the pixel values of highest intensity from the images to yield the fused image [12].

3.1.4 Max–Min Technique

It averages the smallest and largest values of the corresponding pixels across all source images to produce the resultant merged image.
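The minimum, maximum and max–min rules of Sects. 3.1.2 to 3.1.4 differ only in how the stack of corresponding pixels is reduced, which the following sketch illustrates. The function names and toy arrays are hypothetical; the max–min rule is read here as the mean of the pixel-wise minimum and maximum, following the description above.

```python
import numpy as np

def minimum_fusion(images):
    """Keep the lowest intensity at each pixel position."""
    return np.min(np.stack(images), axis=0)

def maximum_fusion(images):
    """Keep the highest intensity at each pixel position."""
    return np.max(np.stack(images), axis=0)

def max_min_fusion(images):
    """Average the pixel-wise minimum and maximum over all source images."""
    stack = np.stack(images).astype(np.float64)
    return (stack.max(axis=0) + stack.min(axis=0)) / 2.0

a = np.array([[10, 200], [30, 40]])
b = np.array([[30, 100], [50, 80]])
c = np.array([[20, 60], [90, 0]])

fused_min = minimum_fusion([a, b, c])
fused_max = maximum_fusion([a, b, c])
fused_max_min = max_min_fusion([a, b, c])
print(fused_min, fused_max, fused_max_min, sep="\n")
```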

3.1.5 Simple Block Replace Technique

It sums the pixel values of the source images over a block of neighbouring pixels and takes the block average. It thus operates on neighbourhood blocks rather than on individual pixels.

3.1.6 Weighted Averaging Technique

It assigns a weight to every pixel in the source images. The resultant image is produced as the weighted sum of the corresponding pixel values in the source images [18]. This method improves the detection reliability of the output image.
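A minimal sketch of weighted averaging is given below, assuming one non-negative weight map per source image with the same shape as the image. The weights are normalized so that they sum to one at each pixel; this normalization, the function name `weighted_average_fusion` and the small constant `eps` are assumptions made for illustration.

```python
import numpy as np

def weighted_average_fusion(images, weights, eps=1e-12):
    """Pixel-wise weighted sum of the source images.

    `images` and `weights` are sequences of arrays of identical shape.
    Weights are normalized per pixel so that they sum to one (assumption).
    """
    imgs = np.stack([np.asarray(i, dtype=np.float64) for i in images])
    w = np.stack([np.asarray(x, dtype=np.float64) for x in weights])
    w = w / (w.sum(axis=0) + eps)  # eps guards against all-zero weights
    return (w * imgs).sum(axis=0)

a = np.full((2, 2), 100.0)
b = np.full((2, 2), 200.0)
w_a = np.ones((2, 2))        # relative weight 1 for image a
w_b = 3.0 * np.ones((2, 2))  # relative weight 3 for image b
fused = weighted_average_fusion([a, b], [w_a, w_b])
print(fused)
```

How the weight maps are chosen (for example from local contrast or saliency) is what distinguishes the individual weighted-averaging methods; the sketch leaves that choice to the caller.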

3.1.7 Hue Intensity Saturation (HIS)

It is a basic color fusion technique that converts the Red–Green–Blue (RGB) image into HIS components, after which the intensity component is replaced by the panchromatic (PAN) image. The intensity component carries the spatial information, while hue and saturation carry the spectral information of the image. It operates on three low-resolution multispectral bands (RGB). In the end, the inverse transformation is performed to convert the HIS space back to the original RGB space, yielding the fused image [12]. It is a very straightforward technique for combining image features and provides an image of high spatial quality. It gives the best results on remote sensing images; its major drawback is that it involves only three bands [19].
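The substitution of the intensity component can be sketched with the additive "fast" formulation of intensity-based fusion, which is used here in place of an explicit forward and inverse color transform. The sketch assumes a linear intensity I = (R + G + B) / 3 and a PAN image that is already co-registered, resampled to the multispectral grid and radiometrically matched to I; the function name `ihs_fusion` is hypothetical.

```python
import numpy as np

def ihs_fusion(ms_rgb, pan):
    """Intensity substitution in additive form.

    ms_rgb: array of shape (H, W, 3), upsampled multispectral RGB bands.
    pan:    array of shape (H, W), panchromatic band on the same grid.
    Each band receives the difference between PAN and the intensity,
    so the intensity of the result equals PAN.
    """
    ms = np.asarray(ms_rgb, dtype=np.float64)
    pan = np.asarray(pan, dtype=np.float64)
    intensity = ms.mean(axis=2)  # assumed linear intensity model
    return ms + (pan - intensity)[..., np.newaxis]

ms = np.zeros((2, 2, 3))
ms[..., 0], ms[..., 1], ms[..., 2] = 10.0, 20.0, 30.0
pan = np.full((2, 2), 26.0)
fused = ihs_fusion(ms, pan)
print(fused[0, 0])
```

Because the same offset is added to all three bands, the differences between bands are preserved while the spatial detail of the PAN image is injected.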

3.1.8 Brovey Transform Method

Gillespie et al. proposed the Brovey transform in 1987. It is a straightforward technique for merging data from more than one sensor. It overcomes the three-band problem. It normalizes the three multispectral bands used for RGB display in order to add intensity and brightness to the image [13]. It includes an RGB color transform technique, known as the color normalization transform, to avoid the disadvantages of the multiplicative technique [12]. It is helpful for visual interpretation but generates spectral distortion [19].
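A minimal sketch of the band normalization used by the Brovey transform follows: each multispectral band is divided by the sum of the three bands and multiplied by the panchromatic band. The function name `brovey_fusion`, the stabilizing constant `eps` and the assumption that the PAN image is already co-registered and resampled to the multispectral grid are illustrative choices.

```python
import numpy as np

def brovey_fusion(ms_rgb, pan, eps=1e-12):
    """Brovey-style fusion: band / (R + G + B) * PAN for each band.

    ms_rgb: array of shape (H, W, 3); pan: array of shape (H, W).
    eps avoids division by zero where all three bands are zero.
    """
    ms = np.asarray(ms_rgb, dtype=np.float64)
    pan = np.asarray(pan, dtype=np.float64)
    total = ms.sum(axis=2, keepdims=True)
    return ms / (total + eps) * pan[..., np.newaxis]

ms = np.zeros((2, 2, 3))
ms[..., 0], ms[..., 1], ms[..., 2] = 10.0, 20.0, 30.0
pan = np.full((2, 2), 120.0)
fused = brovey_fusion(ms, pan)
print(fused[0, 0])
```

The band ratios of the multispectral image are kept, while the overall brightness is taken from the PAN image, which is one way to see why the result is visually sharp but can be spectrally distorted.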

3.1.9 Principal Component Analysis (PCA)

It is a statistical method based on an orthogonal transformation that converts a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components. The main drawbacks of PCA are spectral degradation and color distortion [9].
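One common way of using PCA for fusing two source images, sketched below, is to derive the fusion weights from the principal eigenvector of the 2 × 2 covariance matrix of the flattened images. This is only one possible variant (component substitution is another), and the function name `pca_fusion` as well as the use of absolute eigenvector components, which presumes positively correlated inputs, are assumptions.

```python
import numpy as np

def pca_fusion(img1, img2):
    """Fuse two images with weights taken from the principal eigenvector.

    Returns the fused image and the two weights (which sum to one).
    Assumes the images are positively correlated, so that the components
    of the principal eigenvector share the same sign.
    """
    a = np.asarray(img1, dtype=np.float64)
    b = np.asarray(img2, dtype=np.float64)
    cov = np.cov(np.vstack([a.ravel(), b.ravel()]))  # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    principal = np.abs(eigvecs[:, -1])               # eigenvector of largest eigenvalue
    w = principal / principal.sum()
    return w[0] * a + w[1] * b, w

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = 2.0 * a  # second image carries twice the variation of the first
fused, w = pca_fusion(a, b)
print(w)
print(fused)
```

The image with the larger variance receives the larger weight, which reflects the idea that the first principal component captures most of the information.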

3.1.10 Guided Filtering

It works as an edge-preserving smoothing operator similar to the popular bilateral filter, and it performs better near edges. It has a theoretical connection with the Laplacian matrix. It is a fast, non-approximate linear-time algorithm whose computational complexity does not depend on the mask size. This filter is efficient and effective in graphics and computer vision applications such as joint upsampling, haze removal, detail smoothing and noise reduction [20]. IF is also used in the medical domain to identify various diseases. In [161], the authors perform experiments on brain images and show that the guided filter provides better results than principal component analysis and the multi-resolution singular value decomposition technique.
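The linear-time property can be illustrated with a compact sketch of a gray-scale guided filter built on box means computed from an integral image, so that the cost per pixel does not depend on the window radius. The helper names `box_mean` and `guided_filter`, the edge-replicating padding and the default parameters `r` and `eps` are assumptions chosen for illustration; this is not the implementation used in the cited works.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window using an integral image (edge-replicated)."""
    k = 2 * r + 1
    padded = np.pad(x, r, mode="edge")
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant")
    integral = integral.cumsum(axis=0).cumsum(axis=1)
    total = (integral[k:, k:] - integral[:-k, k:]
             - integral[k:, :-k] + integral[:-k, :-k])
    return total / (k * k)

def guided_filter(guide, src, r=2, eps=1e-2):
    """Gray-scale guided filter: the output is locally a linear function of `guide`."""
    guide = np.asarray(guide, dtype=np.float64)
    src = np.asarray(src, dtype=np.float64)
    mean_g = box_mean(guide, r)
    mean_s = box_mean(src, r)
    cov_gs = box_mean(guide * src, r) - mean_g * mean_s
    var_g = box_mean(guide * guide, r) - mean_g ** 2
    a = cov_gs / (var_g + eps)   # local linear coefficient
    b = mean_s - a * mean_g      # local offset
    return box_mean(a, r) * guide + box_mean(b, r)

rng = np.random.default_rng(0)
guide = rng.random((6, 6))
flat = np.full((6, 6), 5.0)
out = guided_filter(guide, flat)   # a constant input should stay constant
center = box_mean(np.arange(9, dtype=np.float64).reshape(3, 3), 1)[1, 1]
print(out.round(6))
print(center)
```

In guided-filter based fusion schemes, a filter of this kind is typically applied to weight maps, with a source image as the guide, before the weighted combination of the sources.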

3.2 Frequency Domain

These techniques decompose the input images into multiscale coefficients [25]. Spatial distortion can be handled by frequency domain methods. Table 2 lists the various frequency domain based methods with their pros and cons.

Table 2 shows the diverse frequency domain based methods with their pros and cons as discussed by several authors

3.2.1 Laplacian Pyramid Fusion Technique

It uses an interpolation sequence and a Gaussian pyramid for multi-resolution analysis in image fusion. Saleem et al. have reported an improved IF technique using a contrast pyramid transform on multi-source images [151]. However, it suffers from limited extraction ability, which can be overcome by multi-scale decomposition. Further, Li et al. improved the gradient pyramid multi-source IF method, which obtains the high band coefficients with the help of a gradient direction operator [9].

3.2.2 Discrete Transform Fusion Method [14]

Discrete transform based fusion operates on composite images. First, if the images are colored, the RGB (Red–Green–Blue) components of the multiple images are separated. Subsequently, a discrete transformation is applied to the images and the average of the multiple images is computed. Finally, an inverse transformation is applied to obtain the fused image. The discrete wavelet transform (DWT) is a better IF method than other fusion methods such as the Laplacian pyramid method and the curvelet transform method [26].

3.2.3 Discrete Cosine Transform (DCT)

In image fusion, DCT has various variants such as DCTma (DCT magnitude), DCTcm (DCT contrast measure), DCTch (DCT contrast highest), DCTe (DCT energy) and DCTav (DCT average) [29]. This technique does not give good results when the block size is smaller than 8 × 8. In the DCT domain, DCTav is a straightforward and basic method of image fusion. The DCTe and DCTma methods perform well in image fusion. The technique is straightforward and is used in real-time applications.

3.2.4 Discrete Wavelet Transform (DWT) Method

The DWT method decomposes two or more images into various high- and low-frequency bands [31]. This method minimizes the spectral distortion in the resultant fused images and produces a good signal-to-noise ratio, but with lower spatial resolution than the pixel-based method. Wavelet fusion performs better than spatial domain fusion methods with respect to minimizing color distortion [15].
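The decomposition into low- and high-frequency bands and the subsequent fusion can be sketched with a single-level 2-D Haar transform written directly in NumPy. The fusion rule used here (average the approximation band, keep the detail coefficient of larger magnitude) is one common choice and an assumption of this sketch, as are the function names; image dimensions are assumed to be even.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(x):
    """Single-level 2-D Haar DWT; returns (LL, LH, HL, HH) sub-bands."""
    x = np.asarray(x, dtype=np.float64)
    lo = (x[:, 0::2] + x[:, 1::2]) / SQRT2   # low-pass along rows
    hi = (x[:, 0::2] - x[:, 1::2]) / SQRT2   # high-pass along rows
    ll = (lo[0::2, :] + lo[1::2, :]) / SQRT2
    lh = (lo[0::2, :] - lo[1::2, :]) / SQRT2
    hl = (hi[0::2, :] + hi[1::2, :]) / SQRT2
    hh = (hi[0::2, :] - hi[1::2, :]) / SQRT2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    lo = np.empty((ll.shape[0] * 2, ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2, :], lo[1::2, :] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[0::2, :], hi[1::2, :] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((lo.shape[0], lo.shape[1] * 2))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return x

def dwt_fusion(img1, img2):
    """Average the approximation bands; keep the larger-magnitude detail coefficients."""
    c1, c2 = haar_dwt2(img1), haar_dwt2(img2)
    ll = (c1[0] + c2[0]) / 2.0
    details = [np.where(np.abs(d1) >= np.abs(d2), d1, d2)
               for d1, d2 in zip(c1[1:], c2[1:])]
    return haar_idwt2(ll, *details)

rng = np.random.default_rng(0)
img = rng.random((4, 6))
reconstructed = haar_idwt2(*haar_dwt2(img))
self_fused = dwt_fusion(img, img)
print(np.abs(reconstructed - img).max())
```

Multi-level decompositions and other wavelet families follow the same pattern, with the fusion rule applied to each level's sub-bands before the inverse transform.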

3.2.5 Kekre’s Wavelet Transform (KWT) Method

Kekre’s Wavelet Transform method is derived from Kekre’s transform [32]. It can generate KWT matrices of size (2N) × (2N), (3N) × (3N), …, (N²) × (N²) from the Kekre’s transform matrix [33]. It can be used for more than one image, and the fused image is considerably better than that of other methods.

3.2.6 Kekre’s Hybrid Wavelet Transform (KHWT) Method

KHWT method has been derived from hybrid wavelet transforms. Many authors suggested that kekre-hadamard wavelet method gives more brightness. Hybrid kekre-DCT wavelet method gives good results. In this method, the best two matrices are combined into a hybrid wavelet transforms method. It cannot be used images integer power of two [45].

3.2.7 Stationary Wavelet Transform (SWT) Method

The DWT method has the disadvantage of lacking translation invariance, and the Stationary Wavelet Transform overcomes this problem [34]. This technique provides better output at decomposition level 2 but is time-inefficient [35] [36] [37]. SWT is derived from the DWT method. It is a newer type of wavelet transform with translation invariance and provides enhanced analysis of image details. The second-generation curvelet transform is additionally suitable for 2-D image edges.

3.2.8 Curvelet Transform Method

SWT has good time–frequency characteristics and achieves good results in smooth regions. The second-generation curvelet transform is a new multi-scale transform; it overcomes the disadvantages of the wavelet method in representing the directions of boundaries in the image [11, 40, 41, 42].

3.3 Deep Learning

Deep learning is another technique that is widely used for image fusion in various domains. Several deep learning based image fusion methods have been presented for multi-focus, multi-exposure, multi-modal, multi-spectral (MS) and hyper-spectral (HS) image fusion, showing various advantages. Recent advances in deep learning based image fusion are discussed in [10]. Deep learning and case-based reasoning techniques have also been combined with image fusion to enhance the outcome of segmentation. In [159], the authors used artificial intelligence to improve segmentation results and implemented the approach on kidney and tumour images. The process is organized in three layers: a fusion layer, a segmentation layer and a data layer [159]. A multi-view deep learning model has also been used for Covid-19, with validation and testing sets of chest CT images, and is helpful for diagnosis. The data were collected from various hospitals in China [160]. Deep learning based methods are popular for image fusion because deep learning models are able to extract the most effective features automatically from data without any human intervention. These models are also able to characterize complex relationships between the input and the target data. Deep learning models are gaining popularity because they provide potential image representation approaches that could be useful for the study of image fusion. Commonly used deep learning models in image fusion are the Convolutional Neural Network (CNN), Convolutional Sparse Representation (CSR) and the Stacked Autoencoder (SAE). Table 3 lists the various advantages and disadvantages of deep learning based image fusion methods.

Table 3 shows deep learning based image fusion methods

4 Image Fusion Categorization

Image fusion is the process of integrating the source image and the reference image into one image. Different authors have proposed diverse techniques to achieve the required fusion objective. Single-sensor, multi-sensor, multi-modal, multi-view, multi-focus and multi-temporal fusion are the major classes of such methods, which are discussed below.

4.1 Single Sensor

A number of images are merged to produce a fused image with the best possible information. For instance, human operators may not be able to perceive desired objects in environments with varying lighting and noise, whereas these objects can be highlighted in the final fused image. The limitations of this type of system stem from the imaging sensors that are used in many sensing areas. The resolution of the images is limited by the sensor and by the conditions in which the system operates within its dynamic range. For example, a visible band sensor (digital camera) is appropriate for illuminated daylight scenes but not for badly illuminated night-time environments or unfavorable conditions such as fog or rain [47].

4.2 Multi Sensors

Multi-sensor IF overcomes the limitations of single-sensor IF by integrating images from a number of sensors to form a composite image. For example, an infrared (IR) camera accompanies a digital camera, and the final image is obtained from the individual images. The infrared camera is suitable for poorly illuminated environments and the digital camera is suitable for daylight views. Multi-sensor IF is used in military applications, machine vision, medical imaging, robotics and object detection. It is mostly used to resolve the merged information of the numerous images [47]. According to the literature review, Table 4 shows the various multi-sensor techniques discussed by several authors.

Table 4 Multi Sensor image fusion techniques reported in literature

4.3 Multi-view Fusion

Multi-view images show the same scene from diverse viewpoints at the same time. This is also known as mono-modal fusion [47]. The existing methods do not achieve acceptable performance in all cases, especially when one of the estimations is not of high quality; in this case, they are unable to discard it [49, 86, 152]. According to the literature review, Table 5 shows the various multi-view techniques discussed by several authors.

Table 5 Multi-View image fusion techniques reported in literature

4.4 Multi-modal Fusion

It is the process of integrating multi-modal images from one or more imaging modalities to enhance the quality of an image. The various modalities include multispectral, panchromatic, infrared, remote sensing and visible images. According to the literature review, Table 6 shows the various multi-modal techniques discussed by several authors.

Table 6 Multi-Modal image fusion techniques reported in literature

4.5 Multi-focus Fusion

It is an efficient method for integrating the information from several images of the same scene into a single comprehensive image. The composite image is more informative than the input images [6] and offers better visual quality. According to the literature review, Table 7 shows the various multi-focus techniques discussed by several authors.

Table 7 Multi-Focus fusion techniques reported in literature

4.6 Multi-temporal Fusion

Multi-temporal fusion combines images of the same scene captured at different times. Long- and short-term observations are required to estimate the occurrence of changes on the ground. Because observation satellites revisit a given area, remote sensing images of that area are obtained at diverse times. Multi-temporal images are vital for detecting land surface variations in broad geographical areas. According to the literature review, Table 8 shows the various multi-temporal techniques discussed by several authors.

Table 8 Multi-Temporal fusion techniques reported in literature

5 Main Applications in Diverse Domains

In recent years, IF has been widely used in many different applications such as medical diagnosis, surveillance, photography and remote sensing. Here, various challenges and issues related to the different fields are discussed [3].

5.1 Remote Sensing Applications

In addition to the modalities discussed above, remote sensing makes use of numerous further sources, such as synthetic aperture radar (SAR), light detection and ranging (LiDAR) and the moderate resolution imaging spectroradiometer (MODIS), that have been useful in IF applications. Byun et al. proposed an area-based IF scheme for combining panchromatic, multispectral and synthetic aperture radar images [1]. A data fusion approach with high spatial and temporal resolution is used to produce synthetic Landsat imagery by combining Landsat and MODIS data [1]. Moreover, the fusion of airborne hyper-spectral and LiDAR data has recently been investigated through the combination of spectral information. Earth imaging satellites such as QuickBird, WorldView-2 and IKONOS have provided various datasets for pansharpening applications. Co-registered hyper-spectral and multispectral images are more difficult to obtain than multispectral and panchromatic images. Airborne hyper-spectral and LiDAR data, by contrast, are accessible. For instance, the IEEE Geoscience and Remote Sensing Society Data Fusion Contests of 2013 and 2015 distributed numerous hyper-spectral, color and LiDAR data for research purposes. In this application field, a large number of satellites acquire remote sensing images with diverse spatial, temporal and spectral resolutions. Moreover, classification and change detection have been effectively applied to construct the imagery seen in products such as Google Maps or Google Earth. The fusion of hyperspectral and multispectral images is a more difficult problem than pansharpening, since the multichannel multispectral image contains both spatial and spectral information. Therefore, pansharpening is unsuitable or inadequate for the IF of hyperspectral and multispectral images. The foremost challenges in this domain are as follows:

  (1) Spatial and spectral distortions: The image datasets frequently exhibit variations in spatial and spectral structure, which cause distortions with spatial or spectral artifacts during image fusion.

  (2) Mis-registration: The next most important challenge in this domain is how to reduce the mis-registration rate. The remote sensing input images are regularly obtained at different times, from different acquisitions or in different spectral bands. Even for panchromatic and multispectral datasets provided by the same platform, the sensors may not point in exactly the same direction, and their acquisition moments may differ. Therefore, the images need to be registered prior to IF. However, registration is a challenging process because of the variations between input images that come from diverse acquisitions. Figure 4 shows the fusion of panchromatic and multi-spectral images achieved by the Principal Component Analysis (PCA) transformation.

Fig. 4

Examples of IF in remote sensing domain. a PAN b MS c Fused image

5.2 Medical Domain Applications

Harvard Medical School has provided a brain image dataset of registered Computerized Tomography (CT) and Magnetic Resonance Imaging (MRI) images. Figure 5 shows an example of IF in medical diagnosis by fusing CT and MRI. CT captures bone structures with high spatial resolution, and MRI captures soft tissue structures such as the heart, eyes and brain. CT and MRI can be used together with IF techniques to enhance accuracy and practical medical applicability. The main challenges of this field are as follows.

  (1) Lack of clinical problem oriented IF methods: The main motive of IF is to support improved clinical results. Addressing clinical problems is still a big challenge and a nontrivial task in the medical field.

  (2) Objective estimation of image fusion performance: The main difficulty in this domain is how to evaluate IF performance. Clinical problems are diverse, and the preferred IF effect may differ considerably among them.

  (3) Mis-registration: Inaccurate registration of objects leads to poor performance in the medical domain. Figure 5 illustrates the fusion of MRI and CT images. Here, the fusion of images is achieved by a guided filtering based technique with image statistics.

Fig. 5

Examples of IF in medical diagnosis domain. a MRI b CT c Fused image

5.3 Surveillance Domain Applications

Figure 6 shows examples of IF in the surveillance domain, namely the fusion of infrared and visible images. Because infrared imaging is sensitive to objects with high temperature, it is able to “see in the night” even without illumination. Infrared images have poor spatial resolution, which can be overcome by fusing the visible and infrared images. Moreover, the fusion of visible and infrared images has been applied to other surveillance problems such as face recognition, image dehazing and military reconnaissance. The main challenges in this domain are as follows:

  (1) Computing efficiency: In this domain, effective IF algorithms should merge the information of the original images to obtain the final resultant image. More importantly, this domain usually involves continuous real-time monitoring.

  (2) Imperfect environmental conditions: The major difficulty in this field is that the images may be acquired under imperfect conditions. Owing to weather and illumination conditions, the input images may contain under-exposure and serious noise. The fusion of visible and infrared images is shown in Fig. 6a, b. Here, the fusion of both images is achieved by guided filtering and image statistics.

Fig. 6

Examples of IF in surveillance domain. a Visible image b Infrared image c Fused image

5.4 Photography Domain Applications

Figure 7 shows examples of IF in the photography domain, namely the fusion of multi-focus images. Because of the restricted depth of field of a camera, it is not possible for all objects at diverse distances from the camera to be in focus within a single shot. To overcome this, the multi-focus IF method merges several images of the same scene with diverse focus points to generate an all-in-focus resultant image. This composite image preserves the significant information from the source images well. It is more desirable in several image processing and machine vision tasks. Figure 8 shows the data sources used in the photography domain. The various challenges faced in this domain are:

  (1) Effect of moving target objects: In this domain, multi-exposure and multi-focus images are always captured at different times. In these circumstances, moving objects may appear at different locations during the capturing process and may produce inconsistencies in the fused image.

  (2) Relevance in consumer electronics: Here, images are taken in numerous shots with diverse camera settings. The challenge is to integrate multi-exposure and multi-focus IF methods into consumer electronics to produce a composite image of high quality in real time. IF of multi-focus images (back-focus image and fore-focus image) is shown in Fig. 7a, b. Here, IF of the multi-focus images is achieved by a guided filtering based technique and image statistics.

Fig. 7

Examples of IF in photography domain. a Back-focus Image b Fore-focus image c Fused image

5.5 Applications in Other Domains

Fusion is also used in many other applications, such as object recognition, object detection and tracking.

5.6 Recognition Application

One or more objects are visible in an image. The main aim of recognition is to recognize these objects clearly. Face recognition is a major application in this domain. Recognition algorithms use the infrared and visible IF method in two ways. In the first, the images are fused first and the objects are then recognized. In the second, fusion is embedded in the recognition process, as proposed by Faten et al. This can improve recognition precision with the help of narrowband images and enhance the fusion results.

5.7 Detection and Tracking Application

Detection and tracking make use of infrared and visible IF. They are used in real-life applications such as fruit detection and object detection. Detection determines the accurate location of the objects at the same time. Detection fusion algorithms can be divided into two classes: in the first class, the objects are detected before fusing, and in the second class, the images are fused before the objects are detected. He et al. introduced an algorithm with multilevel image fusion that enhances target detection. Pixel level and feature level image fusion are also considered in this method, and it captures the relationship between high- and low-frequency information, which is ignored in wavelet transform IF. The main motive of this method is to enhance target visibility.

Target tracking algorithms are similar to detection algorithms. They should determine the relationship between the frames of the target objects in a particular time sequence. Stephen et al. introduced an enhanced target tracking approach through visible and infrared IF using a PCA-weighted fusion rule. In most algorithms, detection, recognition and tracking are independent of each other and are designed to recover the features or the visibility of the actual images [4].

5.8 Performance Evaluation Measure

A number of performance evaluation metrics have been proposed to evaluate the performance of diverse IF techniques. They can be categorized as subjective and objective assessment measures [52]. Subjective assessment measures play an important role in IF because they evaluate the fused image quality on the basis of human visual perception. They can compare various fusion techniques and methods according to criteria such as image detail, image distortion and object completeness. In infrared and visible IF, subjective evaluation is popular and reliable. Its disadvantages are high cost, time consumption, irreproducibility and the need for human intervention. Objective assessment is carried out to quantify the fused image quality. It is not biased by observers and is highly consistent with visual perception. Objective metrics come in diverse types, based on image gradient, structural similarity, information theory, human visual perception and statistics [1]. A number of metrics for quantifying the quality of fused images are presented in this survey. These evaluation measures are further categorized into reference and non-reference evaluation measures. Evaluation measures based on a reference image are given below.

  1. i.

    The mean square error (MSE) computes the error as the actual difference between the ideal or expected result and the obtained result [1, 147]. This metric is defined as follows:

    $$ MSE = \frac{1}{mn}\mathop \sum \limits_{i = 1}^{m} \mathop \sum \limits_{j = 1}^{n} \left( {A_{ij} - B_{ij} } \right)^{2} $$

    where A and B are the ideal image and the fused (compound) image to be evaluated, respectively, i and j are the pixel row and column indices, and m and n are the height and width of the image, i.e. the number of pixel rows and columns.

  2. ii.

    The structural similarity index metric (SSIM) quantifies the similarity between two or more images. It is designed by modeling any radiometric and contrast distortion. It combines luminance distortion, contrast distortion, loss of correlation and structure distortion between the source images and the final image [1, 11, 147, 148, 149, 150]. This metric is defined as follows:

    $$ SSIM\left( {x,y} \right) = \frac{{\left( {2\mu_{x} \mu_{y} + c_{1} } \right)\left( {2\sigma_{xy} + c_{2} } \right)}}{{\left( {\mu_{x}^{2} + \mu_{y}^{2} + c_{1} } \right)\left( {\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2} } \right)}} $$

    where \(\mu_{x}\) is the average of x, \(\mu_{y}\) is the average of y, \(\sigma_{x}^{2}\) is the variance of x, \(\sigma_{y}^{2}\) is the variance of y, \(\sigma_{xy}\) is the covariance of x and y, and \(c_{1}\) and \(c_{2}\) are two variables that stabilize the division when the denominator is weak.

  3. iii.

    The peak signal to noise ratio (PSNR) is the ratio of the peak signal power to the noise power [1, 11, 147, 149, 150]. This metric is defined as follows:

    $$ PSNR = 10\log_{10} \left\{ \frac{r^{2}}{MSE} \right\} $$

    Here, r indicates the peak value of the fused image. A high PSNR value means that the fused image is closer to the input image and that the fusion method introduces less distortion.

  4. iv.

    Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS) is employed to quantify the quality of an image obtained by fusing high spatial resolution images. This measure was introduced by Lucien Wald [148, 151]. This metric is defined as follows:

    $$ ERGAS = 100\frac{h}{l}\sqrt{\frac{1}{N}\sum\limits_{k = 1}^{N} \frac{RMSE\left( B_{k} \right)^{2}}{\left( M_{k} \right)^{2}}} $$

    where \(h\) and \(l\) are the pixel sizes (spatial resolutions) of the PAN and MS images, \(N\) is the number of spectral bands over which the sum runs, and \(M_{k}\) is the mean radiance value of the MS image for the band \(B_{k}\) [151]

  5. v.

    Overall cross entropy (OCE) is used to determine the difference between the source images and the fused image. A smaller OCE value indicates a better result [11, 12]. This metric is defined as follows:

    $$ OCE\left( f_{A}, f_{B}; F \right) = \frac{CE(f_{A}; F) + CE(f_{B}; F)}{2} $$

    Here CE indicates the cross-entropy between a source image (\(f_{A}\) or \(f_{B}\)) and the fused image F.

  6. vi.

    Visual information fidelity (VIF) is used to measure the distortions of images, including blur, local or global contrast changes and additive noise [1, 150]. This metric is defined as follows:

    $$ VIF = \frac{\text{Distorted image information}}{\text{Reference image information}} $$
  7. vii.

    Mutual information (MI) quantifies the amount of information from the source images that is merged into the resultant image. The highest mutual information represents the effectiveness of the IF technique [1, 11, 141, 150, 151]. This metric is defined as follows:

    $$ MI_{AF} = \sum\limits_{a,f} P_{A,F}\left( a,f \right)\log\left[ \frac{P_{A,F}\left( a,f \right)}{P_{A}\left( a \right)P_{F}\left( f \right)} \right] $$

    where \(P_{A}(a)\) and \(P_{F}(f)\) denote the marginal histograms of the input image A and the fused image F, and \(P_{A,F}(a,f)\) denotes the joint histogram of the input image A and the fused image F. A high mutual information value means that the fusion performance is good.

  8. viii.

    Spectral Angle Mapper (SAM) calculates the spectral similarity between the original and final fused images by measuring the angle between two spectral vectors [21]. This metric is defined as follows:

    $$ \alpha = \cos^{-1}\left( \frac{\sum\nolimits_{i = 1}^{b} t_{i} r_{i}}{\left( \sum\nolimits_{i = 1}^{b} t_{i}^{2} \right)^{1/2}\left( \sum\nolimits_{i = 1}^{b} r_{i}^{2} \right)^{1/2}} \right) $$

    Here, b denotes the number of bands, and \(t_{i}\) and \(r_{i}\) stand for the ith band of the test and reference images.

  9. ix.

    Signal to noise ratio (SNR) is used to measure the noise. The larger the SNR value, the better the resultant compound image [11, 12]. This metric is defined as follows:

    $$ SNR = 10\log_{10}\left\{ \frac{\sum\nolimits_{x = 1}^{P}\sum\nolimits_{y = 1}^{Q}\left( I_{r}(x,y) \right)^{2}}{\sum\nolimits_{x = 1}^{P}\sum\nolimits_{y = 1}^{Q}\left( I_{r}(x,y) - I_{f}(x,y) \right)^{2}} \right\} $$

    where \(I_{r}(x,y)\) indicates the intensity of the pixel of the estimated image and \(I_{f}(x,y)\) indicates the intensity of the pixel of the source image. A high SNR value means that the estimation error is small and that the fusion performs better.
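Several of the reference-based measures above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names, the default peak value of 255 (8-bit images), the SSIM constants k1 = 0.01 and k2 = 0.03, the 256-bin histogram and the base-2 logarithm used for MI are choices that the text does not fix, and SSIM is evaluated once from whole-image statistics rather than over local windows.

```python
import numpy as np


def _f64(img):
    """Convert any array-like image to a float64 NumPy array."""
    return np.asarray(img, dtype=np.float64)


def mse(a, b):
    """Mean squared error between the ideal image A and the fused image B."""
    return float(np.mean((_f64(a) - _f64(b)) ** 2))


def psnr(a, b, peak=255.0):
    """PSNR in dB; `peak` plays the role of r (assumed 255 for 8-bit data)."""
    err = mse(a, b)
    if err == 0.0:
        return float("inf")  # identical images: zero noise power
    return float(10.0 * np.log10(peak ** 2 / err))


def snr(i_r, i_f):
    """SNR in dB: energy of I_r over the energy of the difference I_r - I_f."""
    i_r, i_f = _f64(i_r), _f64(i_f)
    noise = np.sum((i_r - i_f) ** 2)
    if noise == 0.0:
        return float("inf")
    return float(10.0 * np.log10(np.sum(i_r ** 2) / noise))


def ssim_global(x, y, peak=255.0, k1=0.01, k2=0.03):
    """SSIM from whole-image means, variances and covariance (no windows)."""
    x, y = _f64(x), _f64(y)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2  # assumed stabilizers
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (x.var() + y.var() + c2)
    return float(num / den)


def mutual_information(a, f, bins=256):
    """MI in bits from the normalized joint histogram of images A and F."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(f), bins=bins)
    p_af = joint / joint.sum()
    p_a = p_af.sum(axis=1, keepdims=True)  # marginal of A, shape (bins, 1)
    p_f = p_af.sum(axis=0, keepdims=True)  # marginal of F, shape (1, bins)
    nz = p_af > 0  # skip empty bins to avoid log(0)
    return float(np.sum(p_af[nz] * np.log2(p_af[nz] / (p_a @ p_f)[nz])))


def sam(t, r):
    """Spectral angle in radians between two b-band spectra t and r."""
    t, r = _f64(t).ravel(), _f64(r).ravel()
    cos_angle = np.dot(t, r) / (np.linalg.norm(t) * np.linalg.norm(r))
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
```

With these definitions, `psnr(a, a)` is infinite for identical images, and `sam` returns zero (up to rounding) for spectra that differ only by a scale factor, which is why SAM is insensitive to overall brightness.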

    Non-reference image quality evaluation measures, which do not need a reference image, are given below.

  10. i.

    Standard deviation (SD) measures the spread of the data over the whole image [1, 151]. This metric is defined as follows:

    $$ {\text{SD}} = \sqrt {\mathop \sum \limits_{i = 1}^{m} \mathop \sum \limits_{j = 1}^{n} \left( {h\left( {i,j} \right) - \overline{H}} \right)^{2} } $$

    where \(\overline{H}\) denotes the mean value of the fused image. A high SD value means that the fused image achieves a good visibility effect.

  11. ii.

    Spatial frequency error (SFE) is a quantitative measure to objectively assess the quality of the final fused image [149]. This metric is defined as follows:

    $$ SFE = \frac{{SF_{f} - SF_{r} }}{{SF_{r} }} $$

    where \(SF_{f}\) is the spatial frequency (SF) of the fused image and \(SF_{r}\) is the SF of the reference image.

  12. iii.

    Entropy (EN) is used to evaluate the information content of an image, and it is sensitive to noise in the image. An image with large information content has low cross entropy [1, 11, 12, 149, 151]. This metric is defined as follows:

    $$ EN = - \mathop \sum \limits_{l = 0}^{L - 1} p_{l} log_{2} p_{l} $$

    where L denotes the total number of gray levels and \(p_{l}\) denotes the normalized histogram value of the corresponding gray level in the fused image. A higher EN value means that the fused image contains more information and that the image fusion performs better.

  1. iv.

    The universal image quality index (UIQI) is motivated by the human visual system. It is based on the structural information of the final fused image and combines loss of correlation, luminance distortion, and contrast distortion [11, 150]. This metric is defined as follows:

    $$ UIQI = \frac{\sigma_{I_{1} I_{F}}}{\sigma_{I_{1}} \sigma_{I_{F}}} \cdot \frac{2\mu_{I_{1}} \mu_{I_{F}}}{\left( \mu_{I_{1}} \right)^{2} + \left( \mu_{I_{F}} \right)^{2}} \cdot \frac{2\sigma_{I_{1}} \sigma_{I_{F}}}{\left( \sigma_{I_{1}} \right)^{2} + \left( \sigma_{I_{F}} \right)^{2}} $$

    where \(\mu\) is the average, \(\sigma\) is the standard deviation (so that \(\sigma^{2}\) is the variance), and \(\sigma_{I_{1} I_{F}}\) is the covariance of the input image \(I_{1}\) and the fused image \(I_{F}\).

  2. v.

    The fusion mutual information (FMI) metric evaluates the degree of dependence between two or more images. It is based on mutual information (MI) and measures the feature information that is transferred from the input images to the fused image, and thereby reflects the image quality [1, 11]. This metric is defined as follows:

    $$ FMI = MI_{A,F} + MI_{B,F} $$

    where A and B are the input images and F is the fused image. A high FMI value indicates that considerable information is transferred from the input images to the fused image.

  3. vi.

    Spatial frequency (SF) is an image quality index built from the spatial row frequency (RF) and column frequency (CF), which are based on horizontal and vertical gradients. The SF evaluation metric can calculate the gradient distribution of an image effectively and reflects the texture detail of the image [1, 11, 150, 151]. This metric is defined as follows:

    $$ SF = \sqrt {RF^{2} + CF^{2} } $$
    $$ RF = \sqrt {\mathop \sum \limits_{i = 1}^{M} \mathop \sum \limits_{j = 1}^{N} \left( {F\left( {i,j} \right) - F\left( {i,j - 1} \right)} \right)^{2} } $$
    $$ {\text{CF}} = \sqrt {\mathop \sum \limits_{i = 1}^{M} \mathop \sum \limits_{j = 1}^{N} \left( {F\left( {i,j} \right) - F\left( {i - 1,j} \right)} \right)^{2} } $$

    where F is the fused image. A fused image with high SF contains rich edge and texture information.

  4. vii.

    A large mean gradient (MG) value implies that the composite image captures rich edge and texture information, and therefore that the fusion performance is better [1, 149]. This metric is defined as follows:

    $$ MG = \frac{1}{\left( M - 1 \right)\left( N - 1 \right)} \sum\limits_{x = 1}^{M - 1} \sum\limits_{y = 1}^{N - 1} \sqrt{\frac{\left( F\left( x,y \right) - F\left( x - 1,y \right) \right)^{2} + \left( F\left( x,y \right) - F\left( x,y - 1 \right) \right)^{2}}{2}} $$

    where F is the final fused image.

  5. viii.

    Average difference (AD) is the mean absolute difference between the actual and the ideal data [147]. This metric is defined as follows:

    $$ {\text{AD}} = \frac{1}{{{\text{mn}}}}\mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{m}}} \mathop \sum \limits_{{{\text{j}} = 1}}^{{\text{n}}} \left| {\left( {{\text{A}}_{{{\text{ij}}}} - {\text{B}}_{{{\text{ij}}}} } \right)} \right| $$
  6. ix.

    Average gradient (AG) is used to measure the gradient information of the composite image. It reflects the texture detail of an image [1, 150, 151]. The AG metric is defined as follows:

    $$ {\text{AG}} = \frac{1}{{{\text{MN}}}}\mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{M}}} \mathop \sum \limits_{{{\text{j}} = 1}}^{{\text{N}}} \sqrt {\frac{{\nabla {\text{F}}_{{\text{x}}}^{{2{ }}} \left( {{\text{i}},{\text{j}}} \right) + \nabla {\text{F}}_{{\text{y}}}^{{2{ }}} \left( {{\text{i}},{\text{j}}} \right)}}{2}} $$

    A high AG value means that the fused image contains more gradient information and that the fusion algorithm performs better.

  7. x.

    Normalized cross correlation (NCC) is employed to determine the similarity between the input and fused images [147]. This metric is defined as follows:

    $$ {\text{NCC}} = \frac{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{m}}} \mathop \sum \nolimits_{{{\text{j}} = 1}}^{{\text{n}}} \left( {{\text{A}}_{{{\text{ij}}}} {\text{*B}}_{{{\text{ij}}}} } \right)}}{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{m}}} \mathop \sum \nolimits_{{{\text{j}} = 1}}^{{\text{n}}} \left( {{\text{A}}_{{{\text{ij}}}}^{2} } \right)}} $$
  8. xi.

    Mean absolute error (MAE) is the mean of the absolute errors between corresponding pixels in the original and final fused images [11]. This metric is defined as follows:

    $$ {\text{MAE}} = \frac{1}{MN}\sum\limits_{i = 1}^{M} \sum\limits_{j = 1}^{N} \left| s\left( i,j \right) - y\left( i,j \right) \right| $$
  9. xii.

    Normalized absolute error (NAE) is a quality measure that normalizes the error with respect to the expected or ideal value. It gives the dissimilarity between the actual and desired outcomes, divided by the sum of the expected values [147]. This metric is defined as follows:

    $$ {\text{NAE}} = \frac{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{m}}} \mathop \sum \nolimits_{{{\text{j}} = 1}}^{{\text{n}}} \left| {\left( {{\text{A}}_{{{\text{ij}}}} - {\text{B}}_{{{\text{ij}}}} } \right)} \right|}}{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{m}}} \mathop \sum \nolimits_{{{\text{j}} = 1}}^{{\text{n}}} \left( {{\text{A}}_{{{\text{ij}}}} } \right)}} $$
  10. xiii.

    Correlation measures the correlation between the reference and resultant images. A value of one means that both images are exactly the same, and a value less than one means that the images are more dissimilar [11].
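Several of the measures in this list reduce to a few array operations. The following NumPy sketch follows the formulas as printed, so SD and SF are not divided by the number of pixels. The function names, the 256 gray levels assumed for EN, and the choice to skip border pixels whose neighbour F(i, j-1) or F(i-1, j) falls outside the image are assumptions made for illustration, not part of the original definitions.

```python
import numpy as np


def _f64(img):
    """Convert any array-like image to a float64 NumPy array."""
    return np.asarray(img, dtype=np.float64)


def standard_deviation(img):
    """SD as printed: square root of the summed squared deviations."""
    img = _f64(img)
    return float(np.sqrt(np.sum((img - img.mean()) ** 2)))


def entropy(img, levels=256):
    """EN in bits; assumes integer gray levels in [0, levels)."""
    hist = np.bincount(np.asarray(img, dtype=np.int64).ravel(), minlength=levels)
    p = hist[hist > 0] / hist.sum()  # normalized histogram, empty bins dropped
    return float(-np.sum(p * np.log2(p)))


def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) from row-wise and column-wise differences."""
    img = _f64(img)
    rf2 = np.sum(np.diff(img, axis=1) ** 2)  # F(i, j) - F(i, j-1)
    cf2 = np.sum(np.diff(img, axis=0) ** 2)  # F(i, j) - F(i-1, j)
    return float(np.sqrt(rf2 + cf2))


def mean_gradient(img):
    """MG averaged over the (M-1) x (N-1) grid of backward differences."""
    img = _f64(img)
    dx = img[1:, 1:] - img[:-1, 1:]  # F(x, y) - F(x-1, y)
    dy = img[1:, 1:] - img[1:, :-1]  # F(x, y) - F(x, y-1)
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))


def average_difference(a, b):
    """AD: mean absolute difference between images A and B."""
    return float(np.mean(np.abs(_f64(a) - _f64(b))))


def normalized_absolute_error(a, b):
    """NAE: summed absolute difference divided by the sum of A."""
    a, b = _f64(a), _f64(b)
    return float(np.sum(np.abs(a - b)) / np.sum(a))


def normalized_cross_correlation(a, b):
    """NCC: sum of A*B divided by the sum of A squared."""
    a, b = _f64(a), _f64(b)
    return float(np.sum(a * b) / np.sum(a ** 2))
```

A constant image gives zero for EN, SF and MG under these definitions, which matches the reading that higher values indicate more information, edges and texture. If a per-pixel SD is preferred, `np.std` computes the normalized variant.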

6 Discussion

Despite the various constraints that have been handled by several researchers, the amount of research and development in the field of image fusion is still growing day by day. Image fusion has several open-ended difficulties in different domains. The main aim of this section is to discuss the current challenges and future trends of image fusion that arise in various domains, such as surveillance, photography, medical diagnosis, and remote sensing. This paper has discussed various spatial and frequency domain methods as well as their performance evaluation measures. Simple image fusion techniques cannot be used in actual applications. PCA, hue intensity saturation and Brovey methods are computationally efficient, fast and extremely straightforward, but they result in color distortion. Images fused with principal component analysis have a spatial advantage but suffer from spectral degradation. Guided filtering is an easy, computationally efficient method and is more suitable for real-world applications. The number of decomposition levels affects the outcome of pyramid-decomposition-based image fusion. Every algorithm has its own advantages and disadvantages. The main challenge faced in the remote sensing field is to reduce the visual distortions after fusing panchromatic (PAN), hyperspectral (HS) and multi-spectral (MS) images. This is because the source images are captured using different sensors on a similar platform, but the sensors do not point in the same direction and their acquisition moments are not exactly the same. The dataset and its accessibility represent a restriction that is faced by many researchers. The progress of image fusion has increased interest in color images and their enhancement. The aim of color contrast enhancement is to produce an appealing image with bright color and clarity of the visual scene.
Recently, researchers have used neutrosophy in image fusion to remove noise and to enhance the quality of single photon emission tomography (SPET), computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) images. This integration of neutrosophy with image fusion resulted in noise reduction and better visibility of the fused image. Deep learning is a rising trend for developing automated applications. It is extensively applied in various applications such as face recognition, speech recognition, object detection and medical imaging. Integrating quantitative and qualitative measures is an accurate way to determine which particular fusion technique is better for a certain application. One of the challenges generally faced by researchers is the design of image transformation and fusion strategies. Moreover, the lack of effective image representation approaches and of widely recognized fusion evaluation metrics for the performance evaluation of image fusion techniques is also of great concern. Nevertheless, the recent progress in machine learning and deep learning based image fusion shows huge potential for future improvement in image fusion.

7 Conclusion

Recently, the area of image fusion has been attracting more attention. In this paper, various image fusion techniques with their pros and cons and different state-of-the-art methods have been discussed. Different applications such as medical imaging, remote sensing, photography and surveillance have been discussed along with their challenges. Finally, the different evaluation metrics for image fusion techniques, with or without a reference, have been discussed. It is concluded from this survey that each image fusion technique is meant for a specific application and can be used in various combinations to obtain better results. In future, new deep neural network based image fusion methods will be developed for various domains, and the efficiency of the fusion procedure will be improved by implementing the algorithms on parallel computing units.