Multimodal medical image fusion algorithm in the era of big data

In image-based medical decision-making, different modalities of medical images of a given organ of a patient are captured. Each image represents a modality that renders the examined organ differently, leading to different observations of a given phenomenon (such as stroke). The accurate analysis of each of these modalities supports more appropriate medical decisions. Multimodal medical imaging is a research field that consists in the development of robust algorithms that can fuse image information acquired by different sets of modalities. In this paper, a novel multimodal medical image fusion algorithm is proposed for a wide range of medical diagnostic problems. It is based on the application of a boundary measured pulse-coupled neural network fusion strategy and an energy attribute fusion strategy in a non-subsampled shearlet transform domain. Our algorithm was validated on datasets covering several diseases, namely glioma, Alzheimer's, and metastatic bronchogenic carcinoma, containing more than 100 image pairs. Qualitative and quantitative evaluation verifies that the proposed algorithm outperforms most current algorithms, providing important ideas for medical diagnosis.


Introduction
Multimodal medical imaging is a research field that has been receiving increasing attention in the scientific community in the last few years, especially due to its significance in medical diagnosis, computer vision, and the internet of things [3,5,15,20,28,31,32,35]. Defined as the simultaneous production of signals belonging to different medical imaging techniques, one of the biggest challenges in this research field is how to combine (or fuse) in an effective and optimal way multimodal medical imaging sensors, such as positron emission tomography (PET), single-photon emission computed tomography (SPECT), and magnetic resonance imaging (MRI). This image fusion process comprises many techniques and research areas, ranging from image processing and computer vision to pattern recognition, with the goal of promoting more accurate medical diagnosis and more effective medical decision-making [8,10,18,26,45].

Current challenges in multimodal image fusion
Image fusion can usually be divided into three levels: pixel level, feature level, and decision level [21,31,42-44,47]. Since the aim is to fuse pixel information from the source images, medical image fusion belongs to the pixel level.
The multi-scale transform (MST) method is one of the most popular categories [40]. Commonly, MST fusion methods consist of three steps. First, the source images are transformed into the MST domain. Then, the coefficients at different scales are merged according to a specific fusion strategy. Finally, the fused image is reconstructed through the corresponding inverse transform. The MST methods mainly include the Laplacian pyramid (LP) [6], the wavelet transform (WT) [27,34], the non-subsampled contourlet transform (NSCT) [49], and the non-subsampled shearlet transform (NSST) [4,23,38]. However, if an MST method is applied without additional fusion measures, unexpected block effects may appear [39].
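To make the three-step MST pipeline concrete, the following sketch implements it with a non-subsampled Laplacian-style decomposition as a stand-in for the transforms above. The function names, the max-absolute rule for detail layers, and the averaging rule for the residual are illustrative choices, not the paper's method:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_pyramid(img, levels=3):
    """Decompose an image into detail layers plus a low-frequency residual.
    No downsampling is performed, so summation inverts the decomposition."""
    pyr, current = [], img.astype(float)
    for _ in range(levels):
        low = gaussian_filter(current, sigma=2)
        pyr.append(current - low)   # band-pass detail layer
        current = low
    pyr.append(current)             # low-frequency residual
    return pyr

def fuse_mst(img_a, img_b, levels=3):
    """Three-step MST fusion: decompose, merge per level, reconstruct."""
    pa = laplacian_pyramid(img_a, levels)
    pb = laplacian_pyramid(img_b, levels)
    fused = []
    for a, b in zip(pa[:-1], pb[:-1]):      # detail layers: max-abs rule
        fused.append(np.where(np.abs(a) >= np.abs(b), a, b))
    fused.append(0.5 * (pa[-1] + pb[-1]))   # residual: averaging rule
    return sum(fused)                       # inverse transform = summation
```

Because the stand-in decomposition is exactly additive, fusing an image with itself reconstructs it exactly, which is a useful sanity check for any MST pipeline.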
To overcome this disadvantage, fusion measures are applied within the MST method. For instance, spatial frequency (SF), local variance (LV), the energy of image gradient (EIG), and the sum-modified-Laplacian (SML) are commonly used as fusion measures [17,41]. However, most of these measures are computed in the spatial domain or a low-order gradient domain, which means the fusion map may not always be precise. This imprecision may lead to blocking artefacts.
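As an illustration, two of the measures named above can be sketched as follows; the block-based formulations are simplified versions written from their standard definitions, not taken from the paper:

```python
import numpy as np

def spatial_frequency(block):
    """Spatial frequency (SF): combined row- and column-wise
    first-difference energy of an image block."""
    rf = np.diff(block.astype(float), axis=0) ** 2  # row frequency terms
    cf = np.diff(block.astype(float), axis=1) ** 2  # column frequency terms
    return float(np.sqrt(rf.mean() + cf.mean()))

def sum_modified_laplacian(block):
    """Sum-modified-Laplacian (SML): sum over interior pixels of
    |2f(x,y)-f(x-1,y)-f(x+1,y)| + |2f(x,y)-f(x,y-1)-f(x,y+1)|."""
    f = block.astype(float)
    ml = (np.abs(2 * f[1:-1, 1:-1] - f[:-2, 1:-1] - f[2:, 1:-1])
          + np.abs(2 * f[1:-1, 1:-1] - f[1:-1, :-2] - f[1:-1, 2:]))
    return float(ml.sum())
```

Both measures vanish on flat regions and grow with local contrast, which is why they serve as activity measures in fusion maps.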
In addition to traditional MST methods, edge-preserving filtering (EPF)-based MST decomposition methods are also commonly used. In EPF-MST methods, Gaussian filtering and EPF are used to decompose the input image into two scale layers and one base layer. Then, the three layers are fused with suitable fusion strategies. Finally, the fused image is reproduced by a reconstruction algorithm. EPF-MST methods include bilateral filtering (BF)-based [51], curvature filtering (CF)-based [40], and co-occurrence filtering (CoF)-based [37] methods.

A pulse-coupled neural network model for medical image fusion
To overcome this challenge, a method called the pulse-coupled neural network (PCNN) has been proposed in the literature [46]. This method was initially proposed to emulate the underlying mechanisms of a cat's visual cortex and later became an essential method in image processing [29]. Kong et al. presented an SF-modulated PCNN fusion strategy in the NSST domain for infrared and visible image fusion [19]. Inspired by this kind of fusion measure modulated by the PCNN model, an interesting research path is to design a new measure to modulate the PCNN in the medical image fusion field. To further improve the fusion quality of medical images, we propose a medical image fusion method based on a boundary measure modulated pulse-coupled neural network in the non-subsampled shearlet domain. First, the source images are transformed into the NSST domain, yielding low-frequency bands and high-frequency bands. Then, the low-frequency bands are merged through an energy attribute-based fusion strategy, and the high-frequency bands are merged through a boundary measure modulated PCNN strategy. Finally, the fused image is reconstructed via the inverse NSST. We evaluate the proposed algorithm by comparing its performance with several existing methods using both quantitative and qualitative evaluation. Experimental results demonstrate that the proposed method performs better than most existing fusion methods.

Contribution
The main contributions of the proposed research article are the following: 1. A medical image fusion framework based on boundary measured PCNN in the NSST domain, which can complete the fusion task effectively; 2. The application of a boundary measured PCNN model to high-frequency bands, in which the gradient information of the image can be easily extracted and the size of the structuring element can be changed to adapt to the scale of the structures; 3. The application of an energy attribute-based fusion strategy to low-frequency bands.
Experiments conducted in this research paper suggest that the proposed boundary measured PCNN-NSST achieves the best performance in most cases, both qualitatively and quantitatively, when compared to other state-of-the-art image fusion techniques.

Organization
The rest of this paper is organized as follows. In Sect. 2, the most significant works in the image fusion domain are presented. In Sect. 3, the proposed fusion method BM-PCNN-NSST is described. In Sect. 4, the set of experiments performed to evaluate the proposed algorithm is presented. Finally, in Sect. 5, the main conclusions of this research work are drawn.

Related work
In this section, we present an overview of the most significant image fusion algorithms in the literature, namely the non-subsampled shearlet transform (Sect. 2.1), the multi-scale morphological gradient (Sect. 2.2), and the pulse-coupled neural network (Sect. 2.3).

Non-subsampled shearlet transform
The non-subsampled shearlet transform is an image decomposition method originally proposed by Easley [13]. It combines the non-subsampled pyramid transform with different shearing filters and has the characteristics of multi-scale and multi-directional analysis. The non-subsampled pyramid transform makes it shift-invariant, which is superior to the LP and WT methods. Additionally, since the size of the shearing filter is smaller than that of the directional filter, NSST can represent smaller scales, which makes it better than NSCT.
Given the superiority of its underlying functions, NSST performs better than most commonly used MSTs. It is therefore widely used in image denoising [36] and image fusion [22].
The NSST model can be described as follows. For the case n = 2, the shearlet function satisfies

$$\psi_{i,l,k}(x) = |\det A|^{i/2}\,\psi\big(B^{l}A^{i}x - k\big), \quad i, l \in \mathbb{Z},\ k \in \mathbb{Z}^2,$$

where $\psi \in L^2(\mathbb{R}^2)$, both $A$ and $B$ are invertible matrices of size $2 \times 2$, and $|\det B| = 1$. For instance, $A$ and $B$ can be represented as

$$A = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$$

where $A$ is the anisotropic dilation matrix and $B$ is the shear matrix. In this situation, the tiling of the frequency plane of NSST is shown in Fig. 1, where (a) represents the decomposition and (b) represents the size of the frequency support of the shearlet element $\psi_{i,l,k}$.
For convenience, two related functions are used to represent the NSST and the inverse NSST:

$$[L_n, H_n] = \mathrm{nsst\_de}(I_{in}), \qquad (3)$$

$$I_{re} = \mathrm{nsst\_re}(L_n, H_n), \qquad (4)$$

where $\mathrm{nsst\_de}(\cdot)$ represents the NSST decomposition function for the input image $I_{in}$, and $\mathrm{nsst\_re}(\cdot)$ represents the NSST reconstruction step yielding the reconstructed image $I_{re}$. The parameters $L_n$ and $H_n$ represent the low-frequency sub-bands and high-frequency sub-bands, respectively.
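The decompose/reconstruct pair can be read as a round-trip contract: reconstruction from the unmodified bands must reproduce the input. The sketch below illustrates that contract with a simple non-subsampled pyramid as a stand-in (no directional shearing filters, so it is not an actual NSST; the function names mirror the notation above):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nsst_de(img, levels=2):
    """Stand-in decomposition: one low-frequency band plus per-level
    high-frequency bands, all at full resolution (non-subsampled)."""
    highs, low = [], img.astype(float)
    for _ in range(levels):
        smooth = gaussian_filter(low, sigma=2)
        highs.append(low - smooth)   # high-frequency band at this level
        low = smooth
    return low, highs

def nsst_re(low, highs):
    """Inverse of the stand-in decomposition: sum of all bands."""
    return low + sum(highs)
```

Any fusion strategy then operates on `low` and `highs` before calling `nsst_re`.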

Multi-scale morphological gradient
The multi-scale morphological gradient (MSMG) is an effective operator that extracts gradient information from an image to indicate the contrast intensity in the close neighborhood of a pixel. For this reason, MSMG is highly efficient and widely used in edge detection and image segmentation. In image fusion, MSMG has been used as a focus measure in multi-focus image fusion [50]. The specific details of MSMG are as follows.
A multi-scale structuring element is defined as

$$SE_t = \underbrace{SE_1 \oplus SE_1 \oplus \cdots \oplus SE_1}_{t}, \quad t = 1, 2, \ldots, n,$$

where $SE_1$ denotes a basic structuring element and $n$ represents the number of scales. The gradient feature $G_t$ is obtained from the image $f$ by the morphological gradient operator:

$$G_t(x, y) = (f \oplus SE_t)(x, y) - (f \ominus SE_t)(x, y),$$

where $\oplus$ and $\ominus$ denote the morphological dilation and erosion operators, respectively, and $(x, y)$ denotes the pixel coordinate.
From the multi-scale structuring elements and the gradient features, the MSMG is obtained by computing the weighted sum of the gradients over all scales:

$$\mathrm{MSMG}(x, y) = \sum_{t=1}^{n} w_t \cdot G_t(x, y), \qquad w_t = \frac{1}{2t + 1},$$

where $w_t$ represents the weight of the gradient at the $t$-th scale. Figure 2 shows an example of MSMG. One can see that the boundary information of the images is well extracted, which demonstrates the effectiveness of the boundary measure.
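A minimal MSMG sketch, assuming a 3×3 basic structuring element (so the scale-t element has size 2t+1) and weights w_t = 1/(2t+1) as in the multi-focus fusion literature:

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def msmg(img, n_scales=3):
    """Multi-scale morphological gradient: weighted sum over scales of the
    dilation-minus-erosion gradient with growing structuring elements."""
    img = img.astype(float)
    out = np.zeros_like(img)
    for t in range(1, n_scales + 1):
        size = 2 * t + 1  # SE_t: (t-1)-fold dilation of a 3x3 element
        grad = grey_dilation(img, size=size) - grey_erosion(img, size=size)
        out += grad / (2 * t + 1)  # weight w_t = 1/(2t+1)
    return out
```

The operator responds only where local contrast exists, so it is zero on flat regions and peaks at boundaries.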

Pulse-coupled neural network
As a third-generation artificial neural network, the PCNN has achieved great success in the image fusion field. A PCNN model typically contains three parts: the receptive field, the modulation field, and the pulse generator. The expressions of a simplified dual-channel PCNN model can be defined as

$$F^1_{ij}[k] = S^1_{ij}, \qquad F^2_{ij}[k] = S^2_{ij},$$

$$L_{ij}[k] = \sum_{pq} W_{ij,pq}\, Y_{pq}[k-1],$$

$$U_{ij}[k] = \max\Big\{ F^1_{ij}[k]\big(1 + \beta^1_{ij} L_{ij}[k]\big),\ F^2_{ij}[k]\big(1 + \beta^2_{ij} L_{ij}[k]\big) \Big\},$$

$$Y_{ij}[k] = \begin{cases} 1, & U_{ij}[k] > \theta_{ij}[k-1] \\ 0, & \text{otherwise,} \end{cases}$$

$$\theta_{ij}[k] = \theta_{ij}[k-1] - d_e + V_\theta\, Y_{ij}[k],$$

$$T_{ij}[k] = T_{ij}[k-1] + Y_{ij}[k].$$

As shown in Fig. 3, $S^1_{ij}$ and $S^2_{ij}$ denote the pixel values of the two input images at point $(i, j)$ in the neural network; $L_{ij}$ represents the linking parameter and $W$ the synaptic weight matrix; $\beta^1_{ij}$ and $\beta^2_{ij}$ denote the linking strengths; $F^1_{ij}$ and $F^2_{ij}$ represent the feedback inputs; $U_{ij}$ is the output of the dual channel; $\theta_{ij}$ is the threshold of the step function; $d_e$ is the declining extent of the threshold; $V_\theta$ decides the threshold of the active neurons; $T_{ij}$ is the parameter that determines the number of iterations; and $Y_{ij}[k]$ is the $k$-th output of the PCNN.
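The dual-channel iteration can be sketched as follows. This is an illustrative implementation under the standard simplified-PCNN assumptions (a fixed 3×3 linking kernel, threshold initialized to one), not the paper's exact configuration:

```python
import numpy as np
from scipy.ndimage import convolve

def dual_channel_pcnn(s1, s2, beta1, beta2, d_e=0.01, v_theta=20.0,
                      max_iter=300):
    """Simplified dual-channel PCNN. Each pixel is a neuron; iteration stops
    once every neuron has fired at least once. Returns firing counts T."""
    y = np.zeros_like(s1, dtype=float)        # pulse output Y
    theta = np.ones_like(s1, dtype=float)     # dynamic threshold theta
    t = np.zeros_like(s1, dtype=float)        # cumulative firing count T
    w = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])           # linking weights W
    for _ in range(max_iter):
        link = convolve(y, w, mode='constant')   # linking input L
        u = np.maximum(s1 * (1 + beta1 * link),  # internal activity U,
                       s2 * (1 + beta2 * link))  # dual-channel max rule
        y = (u > theta).astype(float)            # step-function output Y
        theta = theta - d_e + v_theta * y        # decay by d_e, reset V_theta
        t += y                                   # accumulate firing count
        if (t >= 1).all():                       # all neurons activated
            break
    return t
```

Pixels with stronger stimulus cross the decaying threshold earlier, so the firing record encodes local activity strength.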

A boundary measured PCNN in NSST domain algorithm
In this section, we present the proposed algorithm for multimodal medical image fusion: a boundary measured PCNN approach in the NSST domain (BM-PCNN-NSST). The framework of the proposed algorithm is illustrated in Fig. 4. The fusion algorithm consists of four parts: NSST decomposition, low-frequency fusion, high-frequency fusion, and NSST reconstruction. The algorithm starts with a pseudocolor source image A that contains three bands (a PET/SPECT image). The first step is to apply an intensity-hue-saturation (IHS) transform to A, which results in a pair containing the intensity image I_A and a source image B. After performing the fusion, the inverse IHS transform is applied to produce the final color fused image.
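The IHS step can be sketched with a simple additive intensity substitution; this is one common IHS variant, assumed here for illustration, and not necessarily the exact transform used in the paper:

```python
import numpy as np

def ihs_intensity(rgb):
    """Forward step of a simple IHS transform: the intensity band of a
    three-band pseudocolor (e.g., PET/SPECT) image as the band mean."""
    return rgb.mean(axis=2)

def ihs_replace_intensity(rgb, new_i):
    """Inverse step: shift each band so the image takes on the fused
    intensity while hue and saturation are approximately preserved."""
    old_i = rgb.mean(axis=2, keepdims=True)
    return rgb + (new_i[..., None] - old_i)
```

After the intensity image I_A is fused with source B, the fused intensity is substituted back to obtain the color result.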

NSST decomposition
An N-level NSST decomposition is performed on images $I_A$ and $B$ to acquire the decomposition bands $L_A$, $H^{l,k}_A$ and $L_B$, $H^{l,k}_B$ based on Eq. (3), where $L$ denotes the low-frequency sub-band and $H^{l,k}$ represents the high-frequency sub-band at level $l$ with direction $k$.

Low-frequency fusion
The low-frequency sub-band contains most of the information of the source images (texture structure and background). In this paper, an energy attribute (EA) fusion strategy is presented for the low-frequency fusion. The EA fusion strategy is divided into three steps:
1. The intrinsic property values of the low-frequency sub-bands are computed as

$$IP_A = \frac{\mu_A + Me_A}{2}, \qquad IP_B = \frac{\mu_B + Me_B}{2},$$

where $\mu$ and $Me$ represent the mean value and the median value of $L_A$ and $L_B$, respectively.
2. The EA functions $E_A$ and $E_B$ are calculated by

$$E_A(x, y) = \exp\big(\alpha\,|L_A(x, y) - IP_A|\big), \qquad E_B(x, y) = \exp\big(\alpha\,|L_B(x, y) - IP_B|\big),$$

where $\exp(\cdot)$ represents the exponential operator and $\alpha$ denotes the modulation parameter.
3. The fused low-frequency sub-band is obtained by a weighted mean:

$$L_F(x, y) = \frac{E_A(x, y)\,L_A(x, y) + E_B(x, y)\,L_B(x, y)}{E_A(x, y) + E_B(x, y)}.$$

High-frequency fusion
While the low-frequency sub-band contains most of the information about the source images (such as background and texture), the high-frequency sub-bands contain more information about details in the images (for example, pixel-level information).
Since in the PCNN model one pixel corresponds to one neuron, the PCNN is well suited to high-frequency sub-bands. In addition, modulating the PCNN with MSMG can increase the spatial correlation in the image. Therefore, the MSMG operator is used to adjust the linking strengths $\beta^1_{ij}$ and $\beta^2_{ij}$:

$$\beta^1_{ij} = M_A(i, j), \qquad \beta^2_{ij} = M_B(i, j),$$

where $M_A$ and $M_B$ are computed by Eq. (7). The high-frequency sub-bands are merged based on this MSMG-PCNN model until all neurons are activated (equal to 1). The fused high-frequency sub-bands are obtained by

$$H^{l,k}_F(x, y) = \begin{cases} H^{l,k}_A(x, y), & T_{xy,A} \geq T_{xy,B} \\ H^{l,k}_B(x, y), & \text{otherwise,} \end{cases}$$

where $T_{xy,A}$ and $T_{xy,B}$ can be computed using Eq. (15).
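One plausible reading of this selection rule is a pointwise choice driven by the PCNN firing records of the two sources (the helper name and the tie-breaking toward A are ours):

```python
import numpy as np

def fuse_high(h_a, h_b, t_a, t_b):
    """Select each high-frequency coefficient from the source whose PCNN
    neuron accumulated at least as many firings (stronger MSMG-modulated
    response); ties go to source A."""
    return np.where(t_a >= t_b, h_a, h_b)
```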

NSST reconstruction
The fused image $F$ is reconstructed from $L_F$ and $H^{l,k}_F$ through the inverse NSST according to Eq. (4).

Experiments
To validate the proposed algorithm, a set of experiments was carried out using three datasets representing different diseases: (1) glioma, (2) mild Alzheimer's, and (3) hypertensive encephalopathy. The proposed algorithm was compared with seven state-of-the-art image fusion methods, and qualitative and quantitative analyses were performed to assess its performance. The code of the paper is made available.

Datasets
To

Parameter settings
In the proposed BM-PCNN-NSST algorithm, the following parameters were used: • the NSST decomposition level N is set to 4; • the number of directions in each level is set to 16, 16, 8, 8; • the modulation parameter α is set to 4; • the number of scales t of the MSMG operator is set to 3.

Evaluation metrics
To analyze the performance of the proposed algorithm in a quantitative way, we evaluate the different fusion methods using five metrics: entropy (EN), standard deviation (SD), normalized mutual information (NMI) [14], Piella's structure similarity (SS) [33], and visual information fidelity (VIF) [16]. In general, both SD and EN measure the amount of information in the fused image. NMI evaluates the amount of information transferred from the source images to the fused image. SS mainly evaluates the structural similarity between the source images and the fused image. VIF evaluates the visual information fidelity between the source images and the fused image. More detailed information about these evaluation metrics can be found in the references cited for each metric.
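The two reference-free metrics, EN and SD, can be sketched directly from their standard definitions (assuming 8-bit grey levels for the histogram):

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (EN) of the grey-level histogram, in bits.
    Higher values indicate more information in the image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # ignore empty bins (0 log 0 = 0)
    return float(-np.sum(p * np.log2(p)))

def standard_deviation(img):
    """Standard deviation (SD): spread of intensities around the mean,
    often read as a proxy for contrast."""
    return float(np.std(img))
```

NMI, SS, and VIF additionally require the source images and are defined in the cited references.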

Experimental results
The results of medical image fusion cannot be judged by visual inspection alone. As long as no feature information is lost, the medical diagnosis will not be misled and the visual effect will be acceptable. Therefore, in this paper, each disease is demonstrated with a set of experimental results, shown in Figs. 5, 6, 7, 8, 9, 10, and 11. The different methods produce different visual effects, but no feature information appears to be lost. Therefore, objective evaluation indicators are needed for a further quantitative evaluation.
The mean value of each metric for the different fusion methods is listed in Tables 1, 2, 3, 4, 5, 6 and 7. Each column reports the same metric for the different methods. The highest value is shown in bold and the second highest in italic. It can be seen that the proposed method performs best in half of the cases; even when it does not attain the highest value in a column, it attains the second highest, except for the NMI of the normal aging case.

Conclusion
In this paper, a multimodal medical image fusion algorithm is proposed based on boundary measured PCNN and EA fusion strategies in the NSST domain. The main advantage of the proposed algorithm is that the two fusion strategies are suitable for different scales: decomposing images into different scales with NSST gives full play to the advantages of both strategies. Meanwhile, as an excellent decomposition method, NSST can well blend the differences between multimodal medical images. The performance of the proposed algorithm has been verified on public datasets, showing that it reaches state-of-the-art level. One of the important outcomes of this paper is reported in the ''Appendix,'' which shows the experimental performance for different values of α and t; the performance is best in most cases when α = 4 and t = 3. Since deep learning technology has been widely adopted, in future research we will focus on deep learning methods for multimodal medical image fusion [2,7,30].
Acknowledgements The authors are grateful to the editors and the reviewers for their valuable comments and suggestions, the Whole Brain Atlas for providing the datasets, and Dr. Mengxue Zheng's guidance on analyzing medical images. This study is supported by China Scholarship Council (CSC201906960047) and 111 Project (B17035).

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.