Introduction

Chest X-rays are primarily used to examine the chest’s anatomical structures, such as the lungs, heart, and clavicles, for various pulmonary and cardiac disorders. In the healthcare industry, 3.6 billion chest X-rays are performed each year to evaluate patients’ health conditions1. With the increasing volume of chest X-rays, computer-aided diagnostic (CAD) systems can play a significant role in identifying chest diseases. Significant research has recently been conducted on CAD systems to help pulmonologists and radiologists assess chest radiographs. CAD covers a range of analytical tasks that demand precise segmentation of the anatomical structures in chest radiography images. These tasks include disease prediction and various size measurements in chest radiographs, such as determining the existence of pulmonary nodules2 or lung disease3 using lung field segmentation. Furthermore, cardiomegaly4 can be predicted using heart segmentation in chest radiographs. Information about the clavicle position can be used to reduce false positive results and identify lesions behind a clavicle more consistently.

However, evaluating chest radiographic images remains challenging due to indistinct and overlapping borders of body structures, such as the lungs, heart, and clavicles. Furthermore, anatomical structures vary in shape and size depending on the patient’s gender, age, and physique. The presence of medical equipment, such as pacemakers and guidewires, and various artifacts in chest radiography also makes the segmentation of anatomical structures problematic. Despite these difficulties, significant progress has been made in recent years on improving segmentation methods4,5,6,7. Researchers have tested various segmentation approaches on the Japanese Society of Radiological Technology (JSRT) dataset8. These approaches include rule-based systems, such as intensity thresholding9, edge detection-based methods10,11, hybrid models12,13, and landmark-based models14,15. However, these methods require optimal parameter settings for good performance, and they frequently fail when the anatomical structures overlap.

Due to the excellent performance of CNNs in segmentation tasks, notably in biomedical imaging, CNN-based approaches5,16 have quickly gained popularity. Moreover, deep learning methods, such as U-Net, have substantially improved segmentation in medical applications, including vascular segmentation17, catheter segmentation in X-rays and realistic images18,19,20, and lung segmentation in chest radiography21. However, regions of interest in medical images can have similar appearances, which makes them difficult to segment using U-Net22. Deeper CNN structures with more layers are preferred for better feature representation and, thus, higher segmentation performance. Therefore, we utilize a dual CNN architecture with more layers to achieve a better feature representation for anatomical structure segmentation.

This paper presents a novel segmentation framework that uses a dual encoder–decoder CNN structure with pre-trained network weights for chest anatomical structure segmentation in X-ray fluoroscopic images. The first encoder–decoder network predicts the initial mask using the pre-trained VGG19 network as an encoder. The initially predicted mask is coupled with the input image and fed into a second encoder–decoder network, which comprises recurrent residual convolutional layers and an attention gate module, to predict the final improved segmentation mask and capture additional contextual features. The proposed framework is evaluated on three standard benchmark datasets. The JSRT dataset is utilized for three-class segmentation, covering the heart, clavicles, and lungs. The MCCXR and SCXR datasets are employed for single-class segmentation of the lungs only. The proposed method outperforms the existing state-of-the-art (SOTA) CNN architectures for anatomical structure segmentation in chest X-rays. The following are the significant contributions of this study.

  • We propose a novel dual encoder–decoder architecture to effectively segment the anatomical structures in chest X-ray images. The proposed method can accurately segment both the more prominent structures (lungs and heart) and the smaller structures (clavicles) in chest X-rays.

  • Unlike previous methods that require post-processing23 or multi-stage training5, we employ an end-to-end CNN architecture with efficient training and high accuracy, improving the model’s sensitivity to foreground pixels without requiring complex heuristics.

  • The proposed method incorporates Attention Gating Modules (AGMs) to allow the model to focus on the regions of interest while maintaining the spatial resolution and improving the quality of the feature maps.

  • We extensively evaluated three public datasets for anatomical structure segmentation, covering both multiclass and single-class segmentation. The experimental results show that our proposed method outperforms the previous SOTA methods on single-class as well as multiclass anatomical structure segmentation.

The rest of the manuscript is arranged as follows. “Related work” reviews prior studies on anatomical structure segmentation for X-ray fluoroscopic images. “Proposed method” describes the proposed method in detail. We evaluate the proposed method extensively in “Experiments and results”. “Discussion” provides a discussion of the proposed framework. Finally, “Conclusion” concludes the paper.

Related work

In this section, we review recent advances in anatomical structure segmentation in three domains: conventional, shallow learning, and deep learning methods.

Conventional methods

The conventional feature-based methods employ heuristic algorithms relying on low-level image features. For example, Cheng et al.24 locate lung areas by evaluating the horizontal and vertical profiles (HVP) of a chest X-ray image. After that, the lung boundaries are detected using static global intensity thresholding and smoothing operations. Armato et al.25 first construct a range of threshold values using a histogram analysis instead of using a static global threshold; they then apply global thresholding to distinguish the initial lung area and local thresholding to refine the initially segmented lung region. Nevertheless, iterative thresholding makes the process computationally expensive. Furthermore, these techniques24,25 perform worse when large deformations occur in the chest region.

Also, these methods are sensitive to image intensities; due to poor contrast in X-ray images, it is challenging to define a proper threshold value for accurate segmentation24,25. Li et al.10 employed the first derivative of the HVP to define the initial boundary of the lung area, followed by iterative boundary adjustment and edge tracing. However, this method fails to segment complicated anatomical structures, such as the mediastinum and hemidiaphragm. Iakovidis et al.11 utilized a similar approach10 to identify the lung boundaries in chest X-ray images using curve interpolation26. Xu et al.27 introduced an improved method to identify the anatomical structures in chest X-ray images by performing a gradient analysis using the structural relationships among anatomic landmark positions. After that, polynomial functions are utilized to smooth the boundaries of the anatomical structures. Ahmad et al.28 employed c-mean clustering (CMC) to improve lung borders. Although these techniques do not need any prior understanding of lung anatomy, several of them10,11,27,28 are still noise-sensitive, and the results of pre-processing with a Gaussian noise filter depend on the filter size and threshold.

Shallow learning methods

Shallow learning techniques are considerably sensitive to the feature extraction process, and their major limitation is selecting suitable attributes. Several shallow learning-based frameworks for anatomical structure classification are described in the prior literature. To identify the lung area, Gray et al.29 used a multi-layer perceptron (MLP) with k-Nearest Neighbor (kNN) and Linear Discriminant Analysis (LDA) classifiers, which were trained using a variety of local texture, grey-level, and local-difference features. Tsujii et al.30 used a Hybrid Adaptive Neural Network (HANN), which includes features such as adjacent pixel locations, normalized intensity, and histogram-equalized entropy. The hidden layers in the HANN are determined dynamically, so the network can be trained efficiently without over-fitting. Nevertheless, these techniques perform poorly with similar-intensity regions, low contrast, and tilted images.

The Markov random field (MRF) model represents a flexible and successful approach for capturing spatial and texture-relevant information. Vittitoe et al.31 used an MRF to blend spatial and textural information, employing potential functions parameterized with a probability distribution and a recursive conditional approach to optimally classify each pixel of the chest X-ray fluoroscopic images into the respective anatomic class. Furthermore, Ginneken et al.8 combined a rule-based approach with a shallow learning-based method, where the rule-based component uses prior information about the lung region. A pixel classifier then uses these attributes, the image intensities, and entropy measurements to classify the X-ray image pixels into categories such as lung and non-lung regions. Shi et al.32 suggested an unsupervised technique to segment the lung regions. The authors used a CMC-based method with Gaussian kernels and achieved good results via the CMC’s flexible and robust mathematical modeling. Nevertheless, CMC methods fail in medical image segmentation in the presence of noise, artifacts, and illumination variations.

Deep learning methods

In contrast to shallow learning techniques, deep learning-based approaches leverage dynamic, hierarchical feature representations with several levels of abstraction. Deep learning methods significantly outperform shallow learning techniques in many applications, but the most significant impediment to their practical deployment is a lack of labeled training data. In recent years, CNNs have been employed in the medical field for diagnosing chest problems33. Kalinovsky et al.34 recently adopted a four-layered encoder–decoder architecture, called SegNet35, for lung area segmentation. Novikov et al.16 adopted U-Net22 for multi-class segmentation in chest radiographs. Furthermore, Mittal et al.36 proposed LF-SegNet, a modified U-Net-based approach that incorporates a normalization mechanism and an improved up-sampling strategy for segmentation of lung fields in chest radiographs. However, these U-Net-based methods introduce outliers and holes inside the targeted structures, which were resolved using a level-set post-processing step.

Recently, a more complex CNN method, Mask R-CNN37, was applied by Wang et al.38 to segment and identify the lung fields, heart, and clavicles in chest X-ray images. However, this approach is computationally costly due to the excessive region proposals. Furthermore, Hwang et al.5 proposed a two-stage cascade network training approach, in which a network is trained using input chest X-rays in the first stage, and then both the input chest X-ray image and the output of the first-stage model are fed into the second network. However, these approaches are not trained in an end-to-end manner. In addition, Peng et al.39 proposed a two-stage approach for lung segmentation, where a Deep Belief Network and K-Nearest Neighbor are first utilized to segment the lungs, and then an improved principal curve and machine learning method is used to refine the segmentation result. However, these types of techniques are computationally costly and necessitate a post-processing step to improve the output. In contrast, the proposed dual encoder–decoder method can be trained in an end-to-end manner; as a result, neither post-processing nor pre-processing is required.

Figure 1

(a) The overall architecture of the proposed anatomical structures segmentation in the X-ray images, and (b) illustrates the attention-gating module in the red dotted lines. The attention coefficients computed in the AGM are used to scale the input features. The gating signal in the AGMs uses both the activations and the contextual information to identify the spatial regions. (c) The Squeeze-and-Excite (SE) block.

Proposed method

Figure 1 shows the overview of the proposed framework. The proposed method contains two encoders and two decoders. The image is first given to a pre-trained VGG19 encoder, which consists of stacked convolutional and fully connected layers for better feature extraction. The extracted features from the pre-trained VGG19 encoder are then given to the SE block, which passes the most relevant information to the first decoder for the generation of the initial binary segmentation mask. Furthermore, the initial binary segmentation mask and the input image are coupled and fed to the second encoder so that the initial mask can be further improved. The output features of the second encoder are passed through the AGMs, improving the quality of the feature maps. The AGMs focus on the regions of interest while maintaining the spatial resolution of the feature maps. The feature maps from both encoders are upsampled and passed to the second decoder, which generates the final improved segmentation output. Moreover, the pre-trained VGG19 network can be replaced with other networks, such as MobileNet40, ResNet41, DenseNet42, or EfficientNet43; incorporating DenseNet42 in our proposed framework yields the best results.
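To make this data flow concrete, the sketch below outlines the dual encoder–decoder in Keras, which the manuscript names as its implementation framework. It is a minimal sketch only: the chosen layer names, filter counts, and the plain convolutions standing in for the SE, R2CL, and AGM blocks (detailed in the following subsections) are our assumptions, not the authors’ exact configuration.

```python
# Minimal sketch of the dual encoder-decoder data flow (assumptions noted above).
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG19

def build_ded_cnn(input_shape=(256, 256, 3), n_classes=1):
    inputs = layers.Input(shape=input_shape)

    # First encoder: pre-trained VGG19 backbone.
    vgg = VGG19(include_top=False, weights="imagenet", input_tensor=inputs)
    x = vgg.get_layer("block5_conv4").output  # deepest VGG19 feature map

    # First decoder: upsample back to input resolution with VGG19 skips.
    skips = [vgg.get_layer(n).output for n in
             ("block4_conv4", "block3_conv4", "block2_conv2", "block1_conv2")]
    for skip in skips:
        x = layers.Conv2DTranspose(skip.shape[-1], 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(skip.shape[-1], 3, padding="same", activation="relu")(x)
    initial_mask = layers.Conv2D(n_classes, 1, activation="sigmoid")(x)

    # Second encoder: input image coupled with the initial mask.
    y = layers.Concatenate()([inputs, initial_mask])
    enc2_skips = []
    for filters in (64, 128, 256, 512):
        y = layers.Conv2D(filters, 3, padding="same", activation="relu")(y)  # stand-in for R2CL
        enc2_skips.append(y)
        y = layers.MaxPooling2D(2)(y)

    # Second decoder (the paper routes skips from both encoders through AGMs;
    # only the second encoder's skips are used in this simplified sketch).
    for filters, skip in zip((512, 256, 128, 64), reversed(enc2_skips)):
        y = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(y)
        y = layers.Concatenate()([y, skip])
        y = layers.Conv2D(filters, 3, padding="same", activation="relu")(y)
    final_mask = layers.Conv2D(n_classes, 1, activation="sigmoid")(y)

    return Model(inputs, [initial_mask, final_mask])
```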

Dual encoder–decoder CNN architecture (DED-CNN)

Encoders

The first Deep Learning (DL) network employs a VGG19 encoder with successive convolution and max-pooling layers. The number of filters is doubled after each max-pooling layer, and the process is repeated four times. The fully connected layer is substituted with a single convolutional layer that serves as a network bottleneck separating the encoder and decoder. The Rectified Linear Unit (ReLU) activation function is used to introduce nonlinearity into the model. The Squeeze-and-Excite (SE)44 block, shown in Fig. 1c, is then applied to enhance the quality of the feature maps.
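A Squeeze-and-Excite block can be sketched in a few lines of Keras; the reduction ratio of 16 is an assumption taken from Hu et al.’s original SE formulation44, as the manuscript does not state the value used.

```python
# Sketch of the SE block of Fig. 1c (reduction ratio assumed to be 16).
from tensorflow.keras import layers

def se_block(x, ratio=16):
    channels = x.shape[-1]
    # Squeeze: pool global spatial information into a channel descriptor.
    s = layers.GlobalAveragePooling2D()(x)
    # Excite: a two-layer bottleneck producing per-channel weights in [0, 1].
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, channels))(s)
    # Recalibrate the input feature map channel-wise.
    return layers.Multiply()([x, s])
```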

The second DL encoder is built from scratch. We utilize the prior segmentation mask estimated by the first network to guide the second encoder towards the instance of interest. The second network can learn appropriate boundaries of anatomical structures in chest X-ray images by emphasizing features derived from the relatively coarse segmentation output. Furthermore, the second encoder incorporates a recurrent residual convolutional layer (R2CL) at each step to increase the model’s capacity for integrating context information. The R2CL helps to develop a deeper model and ensures better and stronger feature accumulation across different time steps. The R2CL comprises \(3\times 3\) convolutions, repeated twice with recurrent connections. When the R2CL is employed, the number of feature maps increases, which results in a size reduction of about 50\(\%\). Each convolution operation is followed by batch normalization (BN)45, which regularizes the model while reducing the internal covariate shift. Concatenation is used in the R2U-Net architecture to map the features from the encoding path to the decoding path. The residual network46 performs better when the R2CL operations are conducted in discrete time steps. As illustrated in Fig. 1a, let \(x_l\) be the input of the lth layer of the R2CL block, with a pixel localized at (i, j) on the kth feature map of the input, and let \(O^l_{ijk}(t)\) denote the network output at time step t. The output can be expressed by the equation given below.

$$\begin{aligned} O^l_{ijk}(t) = (W^c_k)^T\times x^{c(i,j)}_l(t) +(W^r_k)^T\times x^{r(i,j)}_l (t-1)+b_k , \end{aligned}$$
(1)

wherein \(x_l^{c(i,j)}\) and \(x_l^{r(i,j)}\) are the inputs of the standard convolutional layer and of the lth recurrent convolutional layer (RCL), respectively. \(W_k^c\) and \(W_k^r\) are the weights of the forward convolutional layer and the RCL of the kth feature map, and \(b_k\) is the bias. \(O_{ijk}^l\) is passed to the standard ReLU activation f, which can be expressed by the equation given below.

$$\begin{aligned} F(x_l ,w_l)= f(O^l_{ijk}(t)) =\max \big (0,O^l_{ijk}(t)\big ). \end{aligned}$$
(2)

The final output \(x_{l+1}\) of the recurrent convolutional unit is given to the residual unit, which can be formulated mathematically with the equation given below.

$$\begin{aligned} x_{l+1} = x_l + F(x_l ,w_l). \end{aligned}$$
(3)

The input of the R2CL is represented by \(x_l\). The outputs \(F(x_l,w_l)\) are used in the down-sampling and up-sampling layers in the encoder and decoder paths of the proposed network. The successive sub-sampling or up-sampling layers use the final output \(x_{l+1}\) as their input. The basic forward convolutional unit is shown in Fig. 2a, and the structure of the recurrent residual convolutional layer (R2CL) unit is shown in Fig. 2b. We deepen the R2CL blocks, each of which comprises two RCL units, to extract the features of the deeper layers. The RCL expansion strategy is represented in Fig. 2c, where the RCL is unfolded for two time steps \((T = 2)\).
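The following is one plausible Keras realization of the R2CL unit under Eqs. (1)–(3), assuming \(T = 2\) and a 1\(\times\)1 convolution on the shortcut to match channel counts; the exact unfolding in Fig. 2c may differ in detail.

```python
# Sketch of the R2CL block of Fig. 2b under the stated assumptions.
from tensorflow.keras import layers

def rcl_unit(x, filters, timesteps=2):
    """Recurrent convolutional layer (RCL): the feed-forward input x is
    added back into the evolving state at each unfolded time step (Fig. 2c)."""
    state = layers.Conv2D(filters, 3, padding="same")(x)
    state = layers.Activation("relu")(layers.BatchNormalization()(state))
    for _ in range(timesteps):
        recur = layers.Conv2D(filters, 3, padding="same")(
            layers.Add()([x, state]))
        state = layers.Activation("relu")(layers.BatchNormalization()(recur))
    return state

def r2cl_block(x, filters, timesteps=2):
    """R2CL: two stacked RCL units plus a residual connection, Eq. (3)."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # match channels
    y = rcl_unit(shortcut, filters, timesteps)
    y = rcl_unit(y, filters, timesteps)
    return layers.Add()([shortcut, y])
```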

Figure 2
figure 2

Convolutional and recurrent convolutional layers for different variants, which include (a) the forward convolutional layer, (b) the recurrent residual convolutional layer (R2CL), and (c) depicting the unfolded recurrent convolutional layer for T \(=\) 2.

Decoders

Each decoder block in the first DL network up-samples the given feature map via transposed convolution, which effectively increases the size of the feature maps. Afterwards, the feature maps from the encoder skip-connections are concatenated with the output feature maps. The skip-connections from the first encoder are used in the first decoder, whereas the skip-connections from both encoders are used in the second decoder, which allows the model to maintain spatial resolution and improve the quality of the output feature maps. In each step of the decoding path, the output of the R2CL unit from the preceding layer is up-sampled by the R2-Unit. With each up-sampling operation, the number of feature maps is halved and their size is doubled. At the last layer of the decoder, the feature map is restored to the original size of the input image. Furthermore, BN is used in the up-sampling process to increase the proposed network’s stability and speed up its convergence during the training phase. The output of the BN is fed to the Attention Gating Modules (AGMs).

The second network uses the AGMs to optimize the previous encoder’s output features and then merges them with the corresponding features in the decoder. Furthermore, the AGMs allow the model to emphasize high-quality feature representations while preserving the spatial resolution and boosting the quality of the resulting feature maps. The additive attention mechanism employed in the AGMs is described in Fig. 1b. The attention values are computed for each pixel of all the input features \(x_i^l\). To determine the attention regions in the feature map, the gating vector \(g_i\) is applied to each pixel i. The additive attention is given by the following equation.

$$\begin{aligned} \alpha ^l_i = \sigma _2\big (\psi ^T(\sigma _1(W^T_x x^l_i + W^T_g g_i +b_g))+b_\psi \big ), \end{aligned}$$
(4)

wherein \(\sigma _1\) and \(\sigma _2\) represent the ReLU and sigmoid activations. Furthermore, the weights of the linear transformations are represented as \(W_x\) and \(W_g\), and the biases are \(b_g\) and \(b_{\psi }\). Linear transformations are used to minimize the number of learnable parameters and the computational complexity of the AGMs. In addition, we adjust the input features to match the size of the gating signal. To prevent aliasing, grid resampling of the attention coefficients is performed using tri-linear interpolation. Our network incorporates the AGMs to emphasize the features essential for segmenting the clavicles, heart, and lungs. The coarse-scale information is used in the gating to suppress the unnecessary and noisy responses in the skip connections, which boosts the model’s performance on foreground pixel prediction. The output of the AGM is the product of the input feature map and the attention coefficients, which can be represented mathematically by the equation given below.

$$\begin{aligned} {\hat{x}}^l_i = x^l_i \times \alpha ^l_i . \end{aligned}$$
(5)
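Under Eqs. (4)–(5), an AGM can be sketched as below, assuming 2D feature maps, 1×1 convolutions for the linear transformations, and a gating signal at half the resolution of the skip features; bilinear rather than tri-linear resampling is used here since the sketched inputs are 2D.

```python
# Sketch of the additive attention gate of Fig. 1b under the stated assumptions.
from tensorflow.keras import layers

def attention_gate(x, g, inter_channels):
    """x: skip-connection features (H, W, C); g: coarser gating signal,
    assumed to be at half the spatial resolution of x."""
    theta_x = layers.Conv2D(inter_channels, 1, strides=2)(x)       # W_x^T x
    phi_g = layers.Conv2D(inter_channels, 1)(g)                    # W_g^T g
    f = layers.Activation("relu")(layers.Add()([theta_x, phi_g]))  # sigma_1
    alpha = layers.Activation("sigmoid")(layers.Conv2D(1, 1)(f))   # sigma_2(psi^T ...)
    # Resample the attention coefficients back to the resolution of x.
    alpha = layers.UpSampling2D(size=2, interpolation="bilinear")(alpha)
    return layers.Multiply()([x, alpha])                           # Eq. (5)
```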

Finally, the output is predicted using a sigmoid activation function.

The proposed framework was implemented using the Keras and TensorFlow libraries, and the models were trained from scratch, with the weights initialized to random values (apart from the pre-trained encoder). Adam was used to optimize the model with an initial learning rate (LR) of 1e-5 and a batch size of 8.
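The stated configuration corresponds to a compile step along the following lines; the loss function is an assumption, since the manuscript does not name it, and build_ded_cnn refers to the hypothetical builder sketched earlier.

```python
# Training configuration sketch (loss choice assumed, not stated in the paper).
from tensorflow.keras.optimizers import Adam

model = build_ded_cnn()  # hypothetical builder from the earlier sketch
model.compile(optimizer=Adam(learning_rate=1e-5),   # initial LR of 1e-5
              loss="binary_crossentropy",           # assumed loss function
              metrics=["accuracy"])
# model.fit(images, [masks, masks],                 # one target per output head
#           batch_size=8, epochs=100, validation_split=0.1)
```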

Experiments and results

In this section, the datasets used to conduct the experiments are discussed in detail, and the implementation procedure is described. In addition, the performance of the proposed framework for segmenting the anatomical structures in chest fluoroscopic images is evaluated, and the results are discussed.

Figure 3

Input images and labels from the three datasets: (a) radiographic X-ray image of the chest from the JSRT dataset, (b) JSRT dataset labels of the lungs, clavicles, and heart, (c) X-ray image of the chest from the MCCXR dataset, (d) MCCXR dataset lung label image, (e) chest X-ray image from the SCXR dataset, and (f) SCXR dataset lung label image.

Datasets

We performed several experiments on three different datasets: the MCCXR dataset47, the JSRT dataset48, and the SCXR dataset47, as shown in Fig. 3. The JSRT dataset contains 247 posterior-anterior (PA) chest radiographs with a resolution of \(2048\times 2048\) pixels and their ground-truth segmentations8 with a resolution of \(1024\times 1024\) pixels. The MCCXR dataset47 comprises 138 fluoroscopic images with a resolution of \(4020\times 4892\), covering 80 normal subjects and 58 abnormal subjects with tuberculosis, effusions, and miliary patterns, gathered from Montgomery County’s tuberculosis screening program. The SCXR dataset47 was collected by Shenzhen No. 3 Hospital in Shenzhen, Guangdong province, China. It contains 326 normal X-ray images and 336 abnormal X-rays that show various manifestations of tuberculosis. Further, we used the NIH Chest X-ray dataset49, which contains both PA and AP views, in the cross-dataset evaluation. We performed several experiments in the cross-dataset evaluation setup on different datasets to validate the robustness and effectiveness of the proposed framework.

The JSRT dataset is initially split into five subsets. Five-fold cross-validation is then carried out with one fold dedicated to testing, which comprises 20\(\%\) of all the available chest X-ray data. The remaining four folds are divided into training (90\(\%\) of the remaining chest X-ray data) and validation (10\(\%\)) sets. For the SCXR and MCCXR datasets, we performed three-fold cross-validation. For each fold, the total number of epochs was set to 100. An NVIDIA 1080Ti GPU was used to train each model. We compared the proposed model to recent state-of-the-art techniques, i.e., U-Net22, Linknet50, PSPNet51, Seg-Net52, Adaptive Scan53, X-Net+54, RX-Net+54, TVC55, TMI16, and SCIA56. For a fair comparison, we resized all images to \(256\times 256\) and then trained the proposed model and the comparison methods.
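The JSRT protocol above amounts to the following split logic (a sketch; the shuffling and random seed are assumptions):

```python
# Sketch of the five-fold protocol: one fold held out for testing, the
# remaining folds split 90/10 into training/validation.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

indices = np.arange(247)  # the 247 JSRT chest radiographs
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (trainval_idx, test_idx) in enumerate(kfold.split(indices)):
    train_idx, val_idx = train_test_split(
        trainval_idx, test_size=0.1, random_state=42)
    print(f"fold {fold}: train={len(train_idx)}, "
          f"val={len(val_idx)}, test={len(test_idx)}")
```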

Performance metrics

We utilized the Dice similarity coefficient (DSC)57, the Intersection over Union (IoU)58, and the Hausdorff distance (HD)59 to evaluate the proposed method and compare it to the SOTA approaches. The DSC is a frequently used overlap metric for pairwise comparison of the binary foreground segmentation with the ground truth. Formally, it is written using the equation given below.

$$\begin{aligned} Dice = \frac{2\times |A \cap B|}{|A|+|B|}, \end{aligned}$$
(6)

where A is the ground truth, and B is the predicted mask. The Dice coefficient varies from 0 to 1, where 1 indicates a complete overlap. In addition, we utilized the Intersection over Union (IoU) metric to quantify the percent overlap between the target mask and our predicted output.

$$\begin{aligned} IoU = \frac{|A \cap B|}{|A \cup B|}. \end{aligned}$$
(7)

The HD is a distance measurement between two sets of points. HD is the maximum of the distances between any segmentation point and the nearest ground truth point.

$$\begin{aligned} HD = \max (d(A,B),d(B,A)), \end{aligned}$$
(8)

where A and B are the finite point sets, the function d(A, B) is the directed HD from A to B, and d(B, A) is the directed distance from B to A.
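For binary masks, Eqs. (6)–(8) can be computed directly with NumPy and SciPy, as in the sketch below; evaluating the HD on all foreground pixels rather than on boundary contours is a simplifying assumption.

```python
# Sketches of Eqs. (6)-(8) for binary masks (0/1 NumPy arrays).
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_score(a, b):
    """Eq. (6): a is the ground truth mask, b the predicted mask."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def iou_score(a, b):
    """Eq. (7): intersection over union of the two masks."""
    inter = np.logical_and(a, b).sum()
    return inter / np.logical_or(a, b).sum()

def hausdorff(a, b):
    """Eq. (8): max of the two directed distances between point sets."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])
```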

Table 1 Comparative evaluation of the segmentation performance of the proposed method and the existing SOTA algorithms on the JSRT dataset.

Experimental results

This section presents the quantitative and qualitative results of the proposed method on three different datasets. In the quantitative results, the performance of the proposed framework is compared with different approaches for multiclass segmentation using the JSRT dataset and for single-class segmentation using the MCCXR and SCXR datasets. In the qualitative results, the visual outputs of the proposed method are demonstrated.

Quantitative results

JSRT benchmark

Table 1 presents a comparison of the segmentation performance of the proposed work with the existing approaches, i.e., X-Net+, RX-Net+, TVC, SCIA, and TMI, applied to all the anatomical structures of the X-ray images from the JSRT dataset: the heart, lungs, and clavicles. As Table 1 shows, the proposed method outperformed the majority of the existing techniques. Our proposed method obtained the most significant results in terms of the IoU and dice score for the clavicle and heart segmentation. Additionally, our proposed method is comparably good for lung segmentation. The RX-Net+ reported the lowest performance for clavicle segmentation, with 0.859 and 0.924 for the IoU and dice score, respectively. Another variant of the same method, X-Net+, achieved slightly better results; however, it uses an image size of \(1024\times 1024\), which is computationally expensive and makes the model challenging to train. The proposed method also yielded higher dice and IoU scores than SCIA for the clavicles and heart, and comparable dice and IoU scores for the lungs. Although SCIA achieved slightly better results on the lungs than ours, SCIA employed a level-set approach to post-process the output segmentation to remove artifacts, which is computationally costly. In contrast, no post-processing is performed in the proposed method to improve the output segmentation.

Table 2 An overview of the results based on the IoU and the Hausdorff distance for each architecture and three organs, which include the clavicles, heart, and lungs.

Table 2 presents the results of the comparison models trained on a single-class dataset. Similar to the multi-class results, the DED-CNN outperformed the RX-Net+ and TMI. The DED-CNN achieved a 0.868 IoU for clavicle segmentation, whereas TMI had the lowest IoU of 0.833; RX-Net+ showed a slightly worse IoU than our proposed model, achieving 0.86. The proposed method obtained a 0.907 IoU for heart segmentation, whereas RX-Net+ and TMI scored 0.889 and 0.869, respectively. For lung segmentation, RX-Net+ and the proposed DED-CNN achieved the same IoU score of 0.955, and the proposed method outperformed TMI. In terms of the HD, the proposed DED-CNN outperforms both RX-Net+ and TMI for the heart and lungs, achieving HDs of 37.186 and 37.419, respectively. For the clavicles, RX-Net+ achieved a lower HD than our proposed method.

MCCXR benchmark

We also compared our proposed model in terms of lung segmentation with the existing methods on the MCCXR dataset, as presented in Table 3. The proposed DED-CNN outperforms the other existing approaches, attaining a dice score of 97.67\(\%\) and an IoU of 95.48\(\%\). The proposed model exceeded the baseline UNet by 2.2\(\%\) and 2.92\(\%\) in terms of the dice score and IoU, respectively. In addition, our model also achieved a higher score when compared with the PSPNet model.

Table 3 Analysis of the lung segmentation results in the terms of the dice score and the IoU score using the MCCXR dataset.

SCXR benchmark

The proposed method was also applied to lung segmentation on the SCXR dataset. As shown in Table 4, we achieved better results than the existing methods: the proposed DED-CNN outperforms the other methods, achieving a dice score and an IoU of 97.67\(\%\) and 95.48\(\%\), respectively. These results show that the proposed method outperformed the baseline UNet by 2.2\(\%\) in dice score and 2.92\(\%\) in IoU. Moreover, PSPNet showed the lowest performance when compared to the results achieved by the proposed method.

Table 4 Comparison of the proposed method with the other existing models in terms of lung segmentation using the SCXR database.

To further confirm the significance of the proposed DED-CNN, we compared it with the existing deep learning methods in the literature using three standard benchmark datasets. We excluded the traditional methods because our main aim was to compare the proposed model with deep learning-based approaches for lung segmentation. Table 5 presents the comparison results of the proposed method and the existing deep CNN architectures for lung segmentation. As shown in Table 5, the proposed method achieved dice scores of 97.60\(\%\), 97.67\(\%\), and 95.68\(\%\), and IoU scores of 95.50\(\%\), 95.48\(\%\), and 91.84\(\%\) for the JSRT, MCCXR, and SCXR datasets, respectively. Compared with the U-Net model, the proposed model increases the dice score by +0.2\(\%\), +2.2\(\%\), and +0.36\(\%\), and the IoU score by +3.3\(\%\), +2.92\(\%\), and +0.65\(\%\) for the JSRT, MCCXR, and SCXR datasets, respectively. For the JSRT dataset, the proposed DED-CNN surpasses the gold standard8 by +0.4\(\%\) in dice score and +0.9\(\%\) in Jaccard index (JI). These results indicate that the proposed method outperformed the recently published segmentation approaches in terms of the dice and IoU scores on the three datasets.

Table 5 Detailed comparison of the proposed method with the existing models using three different datasets.

Qualitative results

Figure 4

Qualitative results of our proposed method on test samples from the JSRT dataset. From top to bottom: inputs, ground-truth labels, and the segmentation predictions.

To confirm the qualitative performance and further assess the efficiency of the proposed method, the visual results of segmenting the different anatomical structures are shown in Fig. 4. The class activation map (CAM) strategy was initially adopted to interpret classification models; however, Vinogradova et al.63 used CAMs for the interpretation of semantic segmentation networks. Here, we analyzed the feature activations and employed CAMs to generate heat maps that visualize the most activated areas in the X-ray images64. The last layer of the trained model is retrieved to construct the CAMs. By taking the weighted sum of the feature maps with their associated weights, we obtain a map, denoted \(C_m\), of the most important features used to identify image pixels as the lung region. The map \(C_m\) can be formally stated using the equation given below.

$$\begin{aligned} C_m = \sum _k w_{c,k}\,f_k, \end{aligned}$$
(9)

where \(f_k\) is the kth feature map of the last layer, and \(w_{c,k}\) is the corresponding weight of the final classification layer. We identify the essential features utilized by the model to predict the lung region by up-scaling the map \(C_m\) to the image’s dimensions and overlaying it on the input image. Figure 5 shows various examples of the lung segmentation task using the CAMs. As shown in Fig. 5, the CAMs of our proposed method produce stronger activation on the specific lung regions, whereas the U-Net CAMs have some outliers triggered on the bottom of the right lung. In addition, the Linknet CAMs show less activation on the left lung compared to the proposed CAMs.
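Equation (9) can be realized as in the sketch below, assuming the final classification layer is a 1×1 convolution; the layer name last_conv is a hypothetical placeholder.

```python
# Sketch of Eq. (9): weighted sum of the last feature maps, up-scaled to
# the input image size for overlaying (layer names are placeholders).
import numpy as np
import tensorflow as tf

def class_activation_map(model, image, feature_layer="last_conv", class_index=0):
    # Sub-model exposing the feature maps f_k of the chosen layer.
    feat_model = tf.keras.Model(model.inputs,
                                model.get_layer(feature_layer).output)
    feats = feat_model(image[np.newaxis])[0].numpy()       # shape (h, w, K)
    # Weights w_{c,k} of the final 1x1 classification convolution (assumed).
    w = model.get_layer(index=-1).get_weights()[0][0, 0, :, class_index]
    cam = np.einsum("hwk,k->hw", feats, w)                 # Eq. (9)
    cam = np.maximum(cam, 0) / (cam.max() + 1e-8)          # ReLU + normalize
    # Up-scale C_m to the input image dimensions.
    return tf.image.resize(cam[..., None], image.shape[:2]).numpy()[..., 0]
```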

Figure 5

Class activation map (CAM) for the lung region. From left to right: inputs, ground-truth labels, U-Net CAM, Linknet CAM, and the proposed method’s CAM.

Discussion

The popularity of CAD systems among physicians and healthcare professionals is increasing daily. Therefore, it has become vital to provide a stable and reliable framework for chest radiographs that can be consistently evaluated and used for medical diagnostics. This study provides a new method for autonomous multi-organ and lung area segmentation in chest X-rays. The proposed method incorporates a dual encoder–decoder CNN architecture whose initial encoder is a pre-trained VGG19 network. In this model, attention gates and recurrent residual convolutional blocks are utilized instead of the regular convolutional blocks in the second network. The extensive experimental evaluation affirmed that the proposed dual encoder–decoder CNN architecture produces the best results for anatomical structure segmentation compared to the state-of-the-art methods.

Table 6 Detailed comparison of the proposed method in the cross-dataset setting. The proposed method includes two encoder–decoder networks, referred to as Network-1 and Network-2, respectively.
Figure 6

Qualitative results of the proposed DED-CNN in the cross-dataset setting, where each model is trained on one dataset and tested on a different dataset. From top to bottom: the first row shows the input images, the second row the ground truths, the third row the predictions of the first encoder–decoder model in the DED-CNN framework, and the last row the results of the second encoder–decoder model.

Table 7 Cross-dataset quantitative comparisons of the components of the proposed approach.

The results shown in Fig. 4 confirm the effectiveness of our proposed method in segmenting several organs in chest X-ray fluoroscopic radiographs. Furthermore, the results provided in Fig. 5 demonstrate that our technique can segment the heart, clavicles, and lungs in severely aberrant structures, which makes it suitable for clinical diagnostics. Furthermore, we performed lung segmentation generalization across datasets in order to evaluate all possible dataset combinations. Table 6 shows the dice and IoU scores for the proposed DED-CNN. The performance of the DED-CNN is higher than that of the model trained using only AGM or SE blocks on the test datasets. Furthermore, Fig. 6 demonstrates that the qualitative results are consistent with our quantitative results presented in Table 6 for the cross-dataset testing configuration. Moreover, the performance of the proposed method with various pre-trained models is shown in Table 7. When we employed DenseNet-121 as the pre-trained model in our proposed approach, we obtained the best dice and IoU scores in the cross-dataset settings compared to the VGG-19, ResNet-50, MobileNetv2, and EfficientNet pre-trained models.

Our proposed approach has notable advantages over SCIA and TMI, since it is an end-to-end method with robust accuracy. There is no pipeline that necessitates a significant amount of image pre-processing, post-processing, or registration, which leads to a noticeable decrease in processing time. Additionally, by utilizing the first network’s output in the second network, where we combine the input image and the output segmentation map to guide the second network, our approach effectively reduces outliers and misclassification. In contrast to SCIA and TMI, the proposed method makes use of a pre-trained model, maximizing its generalizability; using a pre-trained model is also advantageous when training data is limited.

Despite the outstanding performance of the proposed model in various settings, some limitations should be carefully considered. First and foremost, the proposed system only accomplishes the segmentation task. Second, the approach is computationally expensive because it uses dual encoders and decoders. In future studies, we will integrate additional downstream tasks, such as pneumonia classification, to provide a more comprehensive pipeline for the CAD of chest radiographs. In addition, we aim to increase the speed of the proposed method by reducing the model parameters using pruning.

Conclusion

In this study, we presented a deep learning-based framework that can effectively identify anatomical structures, including the lungs, heart, and clavicles, in chest X-ray images. Notably, a dual encoder–decoder network is employed, which iteratively refines the output of the first network by fusing it with the input image and passing it through the second network to identify the anatomical structures in the X-ray images. Furthermore, the first encoder–decoder incorporates a pre-trained VGG19 network, which allows the proposed framework to be efficiently trained using limited datasets and to extract the essential features that enhance anatomical structure segmentation. In the second encoder–decoder, we integrated the input image with the segmentation mask to guide the network, focus on the essential features, and avoid outliers, which permits the proposed network to segment the anatomical structures effectively. Furthermore, instead of the standard convolutional layers, we employed the R2CL and AGMs, which enable the proposed framework to focus on the regions of interest while improving the feature maps. The proposed method accurately extracts information from the first encoder–decoder network and integrates it with the second encoder–decoder network to provide precise segmentation of the anatomical structures, which will assist physicians in diagnosing various pulmonary and cardiac diseases.

In the future, the performance of the anatomical structure segmentation framework will be enhanced by utilizing synthetically generated samples from generative adversarial networks. Additionally, downstream tasks, such as pneumonia and COVID-19 classification, will be included to provide a more comprehensive approach for the CAD of chest radiographs. Finally, a more sophisticated segmentation and classification method will be investigated for more accurate and reliable anatomical structure segmentation and disease classification.