Introduction

The impact of thoracic diseases on global health and public welfare has been considerable over the years. Chest X-ray (CXR) imaging1 is currently the most common examination that can effectively aid in the diagnosis of a range of thoracic diseases, such as cardiomegaly, consolidation, and edema. Computer-automated classification of multi-label CXR images2,3 is therefore extremely valuable in assisting clinical diagnosis. However, currently available public CXR image datasets often contain several different diseases, and the lesion locations of these diseases may overlap or interfere with each other, which can significantly affect the diagnosis of thoracic diseases. Several methods4,5,6,7 for the automatic classification of multi-label CXR images have been proposed to improve classification performance, but developing computer-aided diagnosis (CAD) algorithms for thoracic diseases remains a challenging task. In recent years, many researchers have paid close attention to deep learning-based automatic medical image analysis, and several techniques have been applied to medical image processing, such as U-Net8, PSPNet9, and DeepLab10 for medical image segmentation, and ResNet11, DenseNet12, and Transformers13 for medical image classification. To learn case information from CXR images corresponding to chest diseases and to help physicians in clinical diagnosis, various strategies have been used to improve the learning ability of network models. These strategies can be broadly classified into three categories: (1) optimizing the network structure, (2) introducing attention mechanisms, and (3) optimizing the loss function. In addition, the public release of two large datasets, ChestX-Ray142 and CheXpert3, has further stimulated research in this area.

The diagnosis of chest diseases involves feature extraction from abnormal regions followed by disease classification. To improve a model's feature extraction ability, the main approach is to optimize the structure of the network. For example, Wang et al.2 used a transition layer, a global pooling layer, and a prediction layer after the last convolutional layer instead of the fully connected layer and the final classification layer of a DCNN; this method can localize the spatial position of a disease reasonably well. Chowdary et al.14 designed a two-branch network that first segments the input CXR image with R-I UNet and then uses two fine-tuned AlexNet models to extract features from the original and segmented images, respectively. Hashmi et al.15 fine-tuned five classical CNN models using transfer learning and achieved good improvements in CXR classification. Chen et al.6 proposed DualCheXNet to predict thoracic diseases, using DenseNet and ResNet to extract features from the same CXR image and two auxiliary classifiers to classify those features. Huang et al.16 proposed an HRNet to extract abnormal features from four feature maps with different resolutions and classify them.

Figure 1

Examples of some of the lesion areas on the ChestX-Ray14 dataset. Each thoracic disease appears in a different lesion area. The disease present in the bounding box on each CXR image corresponds to the name of the pathology of the same color below the image.

With the advancement of attention mechanisms and their application to image classification tasks in recent years, the performance of network models has been further improved. For example, the ECA17 attention mechanism used in MRFCNet18 can adaptively calibrate the channel responses of feature maps, enhancing the main pathological features while suppressing the transmission of useless information. Meanwhile, many other attention mechanisms, such as SE19, CBAM20, and CA21, have been proposed and applied in many studies. Inspired by18, after comparing their effects we introduced CA into the feature extractor to help our network locate abnormal regions in CXR images.

However, many research works disregard the spatial location and channel information of the various diseases, so the resulting models fail to adequately capture details of lesion regions across different scales. Furthermore, class imbalance is prevalent in the majority of datasets and can easily give rise to overfitting. Regrettably, existing methodologies lack efficient strategies to address these issues, which significantly impacts the final classification results. Additionally, as depicted in Fig. 1, CXR images contain a substantial amount of irrelevant information in non-diagnostic regions; this not only incurs higher computational cost but also degrades the final classification performance of the model. Although the approaches proposed in22,23 localize regions of interest and integrate them with global images for chest radiograph classification, they do not effectively tackle these issues.

To solve these issues, we propose a multi-branch residual attention network (MBRANet) model, which focuses on the fusion and classification of image features at several different scales. MBRANet extracts features of abnormal regions using ResNet50. Motivated by the recent success of attention mechanisms, we introduce coordinate attention (CA) to focus attention on specific locations in the images. We then perform multi-scale feature fusion following the Feature Pyramid Network (FPN) paradigm. Next, we design a new multi-branch feature classifier (MFC), based on the class-specific residual attention (CSRA) method, to classify the extracted multi-scale features. In addition, to mitigate the problem caused by class imbalance, we design the BCEWithLabelSmoothing loss function. Finally, we use a heat map to display the anomalous regions in the CXR images. In summary, the main contributions of this work are as follows:

  • We proposed the MBRANet model for multi-label CXR image classification. The model can better capture the location information of disease-correlated regions in CXR images: coordinate attention (CA) is introduced to make the residual blocks pay more attention to disease-correlated regions and retain more discriminative features, and feature fusion is enhanced by integrating the Feature Pyramid Network (FPN) approach.

  • We proposed a novel Multi-branch Feature Classifier (MFC) that effectively classifies features at multiple scales. This is achieved by incorporating the class-specific residual attention (CSRA) module, which eliminates the need for parameterization during the classification process. Our method overcomes the limitation of the fully connected layer in utilizing spatial information from features.

  • We designed a novel BCEWithLabelSmoothing loss function, which incorporates a smoothing factor. The inclusion of this factor addresses the issue of model sensitivity to noisy or uncertain data when utilizing traditional one-hot encoding for labels.

  • We developed a graphical user interface (GUI) page for visualizing the lesion area and displaying classification results. Experimental evaluations conducted on extensive CXR image datasets consistently demonstrate that our proposed MBRANet surpasses previous competing approaches in terms of performance, thus confirming the efficiency of our method.

The following sections of this paper are organized as follows: “Related work” introduces work on deep learning and attention mechanisms for multi-label CXR image classification. The “Approach” section describes the method used in the proposed MBRANet model. The “Experiment” section presents the dataset and experimental parameters (such as learning rate, and batch size), as well as a summary of the experimental results. The “Analysis and discussions” section encompasses a comprehensive evaluation of the results obtained from ablation experiments on each module, alongside presenting case studies. Furthermore, it delves into a thorough examination of the problem addressed in this paper, emphasizing the significance of the research while also recognizing the existing limitations inherent within our current methodology. Finally, the “Conclusion” section concludes our work.

Related work

In this section, we summarize related research in three aspects. First, we present previous studies on the classification of multi-label chest X-rays. Second, we analyze attention mechanisms. Finally, we examine other significant research available in this area.

Multi-label chest X-ray image classification

In recent years, computer-aided diagnosis (CAD) research has attracted widespread interest and made significant breakthroughs. With the release of the publicly available ChestX-ray14, CheXpert, and COVID-19 datasets, more researchers are applying deep learning to automated CXR analysis. In particular, the ChestX-ray14 dataset provided by the NIH Clinical Center has been a hot spot for research in automated CXR image analysis. Wang et al.2 used four pre-trained ImageNet24 models to evaluate the performance of multi-label classification of chest X-ray images and found that ResNet50 performed best. A basic approach is to classify CXR images by training a binary classifier for each lesion using a popular CNN. Ma et al.25 proposed the ChestXNet model, an improvement on the pre-trained DenseNet12112, to classify each chest X-ray image for abnormalities, achieving excellent results in detecting pneumonia. Hashmi et al.15 used transfer learning to fine-tune five classical CNN models and proposed a weighted classifier that combines the classification results of these CNN models to achieve high accuracy in identifying pneumonia. Chen et al.6 used a dual asymmetric architecture based on ResNet and DenseNet to adaptively capture more discriminable features in thoracic CXR images. Kumar et al.26 proposed a PairWise Error (PWE) loss function and an optimized convolutional network for multi-label chest X-ray image classification. Albahli et al.27 proposed a new method called AI-CenterNet with DenseNet-41, which uses a heatmap head to find the lesion region and its class, and a multitask loss to further improve the localization and classification of the lesion region. Chen et al.28 introduced graph convolutional networks (GCN) for lung disease classification. However, these methods use mainstream deep learning networks to extract pathological features from CXR images and are not capable of combining global and local features. Therefore, we utilize the multi-scale features extracted by the network to learn more relevant information.

Attention learning

Attention mechanisms enhance a model's ability to handle long sequences and intricate relationships. Conventional sequence models often fail to effectively capture long-range dependencies, whereas attention mechanisms overcome this limitation by attending to different positions of the input sequence. This dynamic adjustment of attention enables the model to capture both local and global information within the sequence.

Attention mechanisms are also important in the classification of multi-label chest X-ray images, as they can enhance model performance by enabling it to concentrate on the lesion region in the image. Guendel et al.29 proposed a location-aware approach that effectively utilizes high-resolution image data and significantly improves classification accuracy by incorporating spatial information about the disease. Moreover, they proposed a new split reference for the dataset, which provides a meaningful benchmark for future work. Wang et al.30 developed a triple attention network (A3Net). Specifically, A3Net uses the pre-trained DenseNet121 model as the backbone network for feature extraction and integrates three attention modules into a unified framework that learns attention at the channel level, element level, and scalar level. To further improve the performance of individual disease diagnosis, Guan et al.31 suggested exploring local discriminative regions and proposed attention-guided masked inference to help the network recognize diseases. Zhu et al.32 proposed a pixel classification and attention network (PCAN) for disease classification and weakly supervised localization, which also provides interpretability for disease classification. Ouyang et al.33 proposed an attention-driven weakly supervised algorithm for anomaly localization; the algorithm contains a hierarchical attention-mining framework that unifies activation- and gradient-based visual attention in a single whole. Chen et al.34 proposed PCSANet, a network based on pyramidal convolution and shuffle attention modules, for thoracic disease classification and COVID-19 detection.

Other works

In addition to the aforementioned research aspects, several other valuable contributions have been made in this field. For example, optimizing the network structure is a crucial research direction: by fine-tuning a CNN model, a comprehensive feature representation can be achieved. Baltruschat et al.35 optimized the ResNet-50 architecture, which better extracts information from images and improves its applicability to CXR image classification. Albahli et al.36 proposed a new strategy to complement three deep CNN models with synthetic data to identify 14 pulmonary-related conditions; this approach uses DenseNet121, ResNet152V2, and InceptionResNetV2, and the proposed models were trained and tested separately. Currently, many studies classify different lung diseases in the ChestX-ray14 dataset. However, due to the class imbalance in this dataset, these models may be over-trained for diseases in one category and under-trained for diseases in another. As a result, although these models can successfully detect multiple regions of lung disease, their performance is ultimately inadequate when applied more generally. Many studies of imbalanced data, such as37, choose to address class imbalance by applying data augmentation techniques to the dataset, such as up-sampling minority classes and down-sampling majority classes. We instead view the classification of each disease as a binary problem (yes or no), using the BCE loss function and the label smoothing technique to address the class imbalance.

Figure 2

Framework of the proposed MBRANet model. The MBRANet architecture consists of a feature extractor, feature fusion, and a multi-branch classifier (MFC). Initially, the input CXR image is fed into the feature extractor, generating four distinct feature scales, labeled as F1, F2, F3, and F4. Subsequently, the feature fusion process is carried out using the FPN technique, resulting in the creation of new features denoted as \(F1^{'}\), \(F2^{'}\), \(F3^{'}\), and \(F4^{'}\). These fused features are then subjected to classification using the MFC approach, resulting in classified feature vectors namely \(y_1\), \(y_2\), \(y_3\), and \(y_4\). Finally, the heat map of the lesion area is obtained, as well as the final prediction \(y_{final}\) using the weighted decision fusion method.

Approach

In this section, we introduce the overall design of the proposed MBRANet network model. First, we propose incorporating a residual structure as the feature extractor and adding attention mechanisms to enhance the model's feature extraction capability. Next, we describe the designed multi-branch classifier module in detail. Finally, we introduce how the BCEWithLabelSmoothing loss function works. The diagram of our model is illustrated in Fig. 2. We now give further details of the MBRANet model.

CNN-based feature extractor

Feature extraction is an important step in identifying and classifying abnormal regions in thoracic CXR images, so effective features are needed to locate abnormal regions correctly. Attention mechanisms can further exploit channel and location information in the images, while feature fusion methods combine information from different levels. By integrating diverse features, the model gains a more comprehensive understanding of the data, yielding better results on complex tasks.

Backbone: In CXR images, there are significant differences in lesion size among different diseases, as shown in Fig. 1. Therefore, the desired network model needs to have robust multi-scale feature extraction ability.

Table 1 Network structure of the feature extractor, where Conv k \(\times\) k block stands for k \(\times\) k convolution, Batch Normalization, and ReLU. CA represents the Coordinate Attention operation in Fig. 3c. F1, F2, F3, and F4 represent the output tensors of the different layers.

We construct a feature extractor by employing residual blocks inspired by the ResNet50 network architecture, which can represent multi-scale features at a finer granularity. The relationship between the input and output of a bottleneck residual unit is shown in (1).

$$\begin{aligned} H(x) = F(x) + x \end{aligned}$$
(1)

where x denotes the input of the residual unit, F(x) denotes the residual mapping produced by the convolutional layers, and H(x) represents the output of the current residual unit.
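To make this structure concrete, the following PyTorch sketch shows a bottleneck residual unit of this form. The stride-1, channel-preserving configuration (so the identity shortcut needs no projection) is a simplifying assumption, and the `attention` argument is a placeholder slot for the CA module introduced below.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Minimal bottleneck residual unit implementing H(x) = F(x) + x from (1).
    Assumes stride 1 and matching input/output channels; `attention` is an
    optional module (e.g. CA, sketched below) placed after the 3x3 convolution."""
    def __init__(self, channels, mid_channels, attention=None):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3,
                      padding=1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            attention if attention is not None else nn.Identity(),
            nn.Conv2d(mid_channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)  # H(x) = F(x) + x
```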

Coordinate attention (CA): The convolution operations in residual blocks capture local relations but cannot capture long-range pixel dependencies. To solve this problem, we introduce the Coordinate Attention (CA) module into the residual blocks. The main idea of the CA module is to compute an attention weight for each position through a set of learnable parameters; these weights emphasize significant input features, enabling the model to concentrate on important locations. The structure is shown in Fig. 3c. This module effectively captures channel relationships and spatial location information in CXR images. Since the module has the same input and output dimensions, it can be flexibly inserted into any network structure to alleviate the inadequate extraction of local features by convolutional operations.

Figure 3

Schematic comparison of the proposed coordinate attention block (c) to the classic SE channel attention block19 (a) and CBAM20 (b). Here, “GAP” and “GMP” refer to global average pooling and global max pooling, respectively. “X Avg Pool” and “Y Avg Pool” represent pooling operations in the horizontal and vertical directions. “Split” decomposes the features obtained in the previous step.

We introduce CA into the residual block to obtain a new residual block for feature extraction.
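The following is a minimal PyTorch sketch of the CA block in Fig. 3c, following the pool-concatenate-split-reweight flow described above; the reduction ratio and minimum width are illustrative choices rather than values from our experiments.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention: encodes channel relations together with
    horizontal/vertical positional information. Input and output shapes
    match, so the module can be dropped into any residual block."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)  # the "Split" in Fig. 3c
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w  # reweight by direction-aware attention
```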

Output of feature extraction: The feature extractor consists of multiple stages. As shown in Table 1, the network architecture consists of the following parts, a \(7 \times 7\) convolutional layer in the stem, a \(3 \times 3\) maximum pooling layer, and four consecutive stages containing different numbers of ResNet blocks. After a series of convolutional operations, we get the corresponding output feature maps F1, F2, F3, and F4 from \(Conv1\_x\), \(Conv2\_x\), \(Conv3\_x\), and \(Conv4\_x\).

Feature fusion: During the extraction and classification of multi-scale features, the lack of direct skip connections to leverage deep features from various scales (as depicted in Fig. 4a) leads to an inadequate capture of feature details and contextual information. To address this, we employ the feature fusion approach known as Feature Pyramid Networks (FPN). As depicted in Fig. 4b, we first reduce the dimensions of F1, F2, F3, and F4 using \(1 \times 1\) convolutions. We then adopt top-down pathway propagation, starting from the higher-level feature maps: feature maps from the previous layer are retrieved and upsampled using bilinear interpolation, and feature maps of the same scale are fused by element-wise addition, resulting in refined feature maps denoted as \(F1^{'}\), \(F2^{'}\), \(F3^{'}\), and \(F4^{'}\). This fusion process effectively combines lower-resolution yet semantically rich features with higher-resolution features, resulting in an enhanced capture of feature details.
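This fusion step can be sketched as follows. The input widths follow Table 1, while the shared 256-channel width of the lateral \(1 \times 1\) convolutions is an assumption for illustration.

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNFusion(nn.Module):
    """Top-down FPN fusion of the extractor outputs F1..F4 (Table 1 widths)."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)

    def forward(self, feats):            # feats = [F1, F2, F3, F4]
        lat = [l(f) for l, f in zip(self.lateral, feats)]
        out = [lat[-1]]                  # start from the deepest map (F4)
        for f in reversed(lat[:-1]):     # top-down pathway
            up = F.interpolate(out[-1], size=f.shape[-2:],
                               mode='bilinear', align_corners=False)
            out.append(f + up)           # element-wise addition
        return out[::-1]                 # [F1', F2', F3', F4']
```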

Figure 4

(a) Using an image to build a feature pyramid. (b) Feature Pyramid Network with top-down skip connection architecture and a building block that demonstrates the lateral connection and the top-down pathway, merged through addition.

Multi-branch feature classifier (MFC)

There may be several abnormal areas in a single chest X-ray, and the type of lung disease is diagnosed based on these abnormal regions. We want the model to focus not only on the global information of the image but also on the spatial locations of the abnormal regions in the feature map, which can then help experts in the diagnosis of lung disease. To achieve this goal, we designed the MFC method, which allows our network model to focus more on the key disease locations in the CXR image and reduces attention to irrelevant information.

We designed the MFC method with four classifiers using the class-specific residual attention (CSRA) module. Zhu et al.38 proposed the Residual Attention module to efficiently capture the different spatial regions occupied by different classes of objects, which can make full use of the spatial attention of each object class and achieve higher accuracy on the task of multi-label classification. The working principle of the CSRA module is shown in Fig. 5.

As shown in (2), a given image S is first passed into the network model \(\varphi\) for feature extraction to obtain the feature tensor \(F \in R^{c \times h \times w}\), where c, h, and w are the channel dimension, height, and width of the feature tensor, respectively. As shown in Table 1, the extracted features F1, F2, F3, and F4 have shapes \(256 \times 56 \times 56\), \(512 \times 28 \times 28\), \(1024 \times 14 \times 14\), and \(2048 \times 7 \times 7\), respectively. We then feed the extracted \(F1\) \(\sim\) \(F4\) into the CSRA module for classification. The CSRA module first performs a \(1 \times 1\) convolution on the input \(c \times h \times w\) feature tensor to reduce its dimensionality to \(d \times h \times w\) and decouples it into \(x_1\), \(x_2\), \(x_3, \ldots , x_{h \times w}\), where d denotes the number of classes.

Figure 5

The proposed CSRA module to obtain features and classification results.

Afterward, the dimension-reduced feature tensor is flattened into a one-dimensional vector \(U:d \times (h \times w)\) for the subsequent classification task. Next, two distinct operations are performed on U. First, a spatial pooling operation is performed on U to obtain a spatial attention score map \(U_1: 1 \times d\) describing the spatial feature information of each category. Then, an average pooling operation is applied to U to obtain the classical global category diagnostic feature vector \(U_2: 1 \times d\), as shown in Fig. 5. Let y = [\(y^1\), \(y^2,\ldots , y^d\)], where d is the number of classes. The predicted probability of the k-th class is derived by (3).

$$\begin{aligned} F = \varphi (S; \theta ) \end{aligned}$$
(2)
$$\begin{aligned} y^k = \frac{1}{hw}\sum \limits _{i = 1}^{hw} x_i^k + \lambda \sum \limits _{i = 1}^{hw} {\text {softmax}}(T x_i^k)\, x_i^k \end{aligned}$$
(3)

The parameter \(\theta\) denotes the parameters of the network model. T is the temperature hyperparameter (T > 0), which controls the sharpness of individual position scores, and \(\lambda\) controls the output weight of the spatial pooling term. We initialized T with a value of 1. In our experiments, we feed \(F1^{'}\sim F4^{'}\) into the classifier and set \(\lambda\) to 0.5, 0.4, 0.3, and 0.1, respectively. Finally, the prediction vectors \(y_1\), \(y_2\), \(y_3\), and \(y_4\) output by the four CSRA modules are combined for a joint judgment of thoracic diseases: \(y_{final} = \sigma (\sum \nolimits _{i=1}^{4}w_iy_i)\).
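A sketch of one CSRA head implementing (3), together with the four-branch decision fusion, is given below. The 256-channel input width matches the FPN sketch above, and the equal branch weights \(w_i\) are an illustrative assumption.

```python
import torch
import torch.nn as nn

class CSRA(nn.Module):
    """One class-specific residual attention head, per (3). A 1x1 convolution
    produces d class score maps; average pooling (U2) and temperature-softmax
    spatial pooling (U1) are mixed with weight lambda."""
    def __init__(self, in_channels, num_classes, lam, T=1.0):
        super().__init__()
        self.score = nn.Conv2d(in_channels, num_classes,
                               kernel_size=1, bias=False)
        self.lam, self.T = lam, T

    def forward(self, x):                       # x: (B, c, h, w)
        s = self.score(x).flatten(2)            # (B, d, h*w)
        base = s.mean(dim=2)                    # average pooling -> U2
        att = torch.softmax(self.T * s, dim=2)  # per-class spatial attention
        spatial = (att * s).sum(dim=2)          # spatial pooling -> U1
        return base + self.lam * spatial        # y^k for each class k

# Four heads for F1'..F4' with the lambda values used in our experiments.
heads = nn.ModuleList(CSRA(256, 14, lam) for lam in (0.5, 0.4, 0.3, 0.1))

def fuse(features, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted decision fusion: y_final = sigmoid(sum_i w_i * y_i)."""
    ys = [head(f) for head, f in zip(heads, features)]
    return torch.sigmoid(sum(w * y for w, y in zip(weights, ys)))
```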

Multi-label classification loss

These datasets suffer from class imbalance. For example, in the ChestX-Ray14 dataset, the number of positive images for diseases such as ‘Pneumonia’, ‘Hernia’, and ‘Cardiomegaly’ is far smaller than the number of negative samples. This unbalanced distribution hinders multi-label classification accuracy and requires more pathological information to train a well-performing model. To address this problem, we use label smoothing39 to optimize the binary cross-entropy (BCE) loss; we term the result the BCEWithLabelSmoothing loss function and use it in our MBRANet.

For classification problems, we typically assume that the training labels assign a probability of 1 to the target category and 0 to non-target categories. The traditional one-hot label vector \(y_i\) is computed as shown in (4). Cross-entropy loss is commonly used for classification problems, of which binary cross-entropy (BCE) loss is the special case for binary classification tasks; the BCE loss is given in (5).

$$\begin{aligned} y_i = \left\{ \begin{array}{ll} 1, &\quad i = target\\ 0, &\quad i \ne target \end{array} \right. \end{aligned}$$
(4)
$$\begin{aligned} BCELoss = - \left( y \log (p(x)) + (1 - y) \log (1 - p(x)) \right) \end{aligned}$$
(5)

where y is the true binary label (0 or 1), and p(x) is the predicted probability value.

During training, label smoothing reduces the model's overconfidence in the training samples by replacing the hard 0/1 targets with values slightly inside the interval (0, 1). Label smoothing incorporates a uniform distribution, replacing the traditional one-hot label vector \(y_i\) with an updated label vector \(\widehat{y_i}\), calculated as shown in (6).

$$\begin{aligned} \widehat{y_i} = \left\{ \begin{array}{ll} 1 - \alpha , &\quad y_i = 1\\ \alpha / K, &\quad y_i = 0 \end{array} \right. \end{aligned}$$
(6)

where K is the number of classes and \(\alpha\) is the smoothing factor, usually a small positive number (e.g., 0.1 or 0.01) that controls the degree of label smoothing. Introducing the smoothing factor reduces the model's overconfidence in the training samples, thus improving its generalization ability and robustness. The BCEWithLabelSmoothing loss is calculated as shown in (7).

$$\begin{aligned} BCEWithLabelSmoothing = - \frac{1}{K}\sum \limits _{i = 1}^K \left( \widehat{y_i} \log (p_i) + (1 - \widehat{y_i}) \log (1 - p_i) \right) \end{aligned}$$
(7)

where the label of each CXR image is a one-hot vector y = [\(y_1\), \(y_2,\ldots , y_K\)]. In our experiments, K was set to 14 and the parameter \(\alpha\) was fixed at 0.1. The label \(\widehat{y_i}\) is obtained from the label \(y_i\) via (6).
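A compact sketch of this loss follows. It operates on logits via binary_cross_entropy_with_logits for numerical stability, which is equivalent to applying (5) to the sigmoid outputs.

```python
import torch
import torch.nn.functional as F

def bce_with_label_smoothing(logits, targets, alpha=0.1, K=14):
    """BCEWithLabelSmoothing, per (6) and (7): one-hot targets are smoothed
    (1 -> 1 - alpha, 0 -> alpha / K) before the standard BCE computation."""
    smoothed = targets * (1.0 - alpha) + (1.0 - targets) * (alpha / K)
    return F.binary_cross_entropy_with_logits(logits, smoothed)

# Example: a batch of 2 images over K = 14 labels.
logits = torch.randn(2, 14)
targets = torch.randint(0, 2, (2, 14)).float()
loss = bce_with_label_smoothing(logits, targets)
```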

Table 2 Distribution of training, validation, and test sets in ChestX-Ray14, ChexPert, MIMIC-CXR, and IU X-Ray datasets.

Ethical and informed consent for data used

This article is licensed under the Creative Commons Attribution 4.0 International License. Based on the terms of this license, the Licensed Material can be copied and shared in whole or in part. Additionally, adapted material can be created, copied, and shared. This license is free, non-transferable, non-exclusive, and irrevocable, and applies worldwide. The hyperlink to access the licensed material is https://creativecommons.org/licenses/by/4.0/.

Experiment

In this section, we will validate the performance of the MBRANet model proposed in this paper for multi-label CXR image classification on the ChestX-Ray14, CheXpert, MIMIC-CXR, and IU X-Ray datasets. First, we briefly overview the basic information about the experimental datasets. Then, we describe the implementation details of the experiments. Next, we present the evaluation metrics. Finally, comparisons are made between the performance of the MBRANet model and state-of-the-art methods.

Dataset

ChestX-Ray142 is a publicly available dataset widely used for medical image analysis, primarily for the automated detection and classification of chest diseases. The dataset is published by the National Institutes of Health (NIH) and covers 14 common categories of chest disease (atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax, consolidation, edema, emphysema, fibrosis, pleural thickening, and hernia). The disease labels for each image are extracted from the associated radiology reports using natural language processing. It contains 112,120 frontal X-ray images of size 1024 \(\times\) 1024 from 30,805 patients, some examples of which are illustrated in Fig. 1. As shown in Table 2, the splitting of this dataset strictly followed the official splitting criteria published by2 (78,468 images (70%) for training, 11,219 images (10%) for validation, and 22,433 images (20%) for testing). The distribution of the number of images used for training, validation, and testing across all classes is shown in Fig. 6 and reflects the high imbalance and diversity of the ChestX-Ray14 dataset.

CheXpert3 is a large dataset of CXR images published by Stanford University. It consists of 224,316 chest radiographs of 65,240 patients. Each report is labeled for the presence of 14 observations as positive (1), negative (0), or uncertain (−1). The most frequently used methods for handling uncertain (−1) labels are U-Ones (replacing all uncertain labels with 1) and U-Zeros (replacing all uncertain labels with 0). According to Table 2, we split the dataset according to the official proportions3, where the validation set consists of 200 chest radiology studies manually annotated by three certified radiologists.
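Both policies amount to a one-line relabeling, as the following sketch with hypothetical label arrays illustrates.

```python
import numpy as np

def resolve_uncertain(labels, policy="U-Ones"):
    """Replace uncertain (-1) CheXpert labels: U-Ones maps them to 1,
    U-Zeros maps them to 0."""
    fill = 1 if policy == "U-Ones" else 0
    return np.where(labels == -1, fill, labels)

# Example: one study labeled [positive, uncertain, negative, uncertain].
print(resolve_uncertain(np.array([1, -1, 0, -1])))             # [1 1 0 1]
print(resolve_uncertain(np.array([1, -1, 0, -1]), "U-Zeros"))  # [1 0 0 0]
```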

Figure 6

Distribution of training, validation, and testing images over 14 disease categories in the Chest X-ray14 dataset.

MIMIC-CXR40 is a large publicly available dataset of chest radiographs with free-text radiology reports. The dataset contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center (BIDMC) in Boston, MA. According to Table 2, we adopt the standard 7:1:2 train/validation/test splits to verify the generalization capability of the MBRANet model.

IU X-Ray41 is a public radiography dataset collected by Indiana University, containing 7470 chest X-ray images and 3955 radiology reports. As shown in Table 2, following the official documentation, the dataset is divided into training, validation, and test sets using the standard 7:1:2 ratio.

The 14 pathologies in ChestX-ray14 are Atelectasis (Atel), Cardiomegaly (Card), Effusion (Effu), Infiltration (Infi), Mass, Nodule (Nodu), Pneumonia (Pneu1), Pneumothorax (Pneu2), Consolidation (Cons), Edema (Edem), Emphysema (Emph), Fibrosis (Fibr), Pleural Thickening (P_T) and Hernia (Hern), respectively.

The 14 pathologies in CheXpert, MIMIC-CXR, and IU X-Ray are No Finding (NoFi), Enlarged Cardiomediastinum (EnCa), Cardiomegaly (Card), Lung Lesion (Lesi), Lung Opacity (Opac), Edema (Edem), Consolidation (Cons), Pneumonia (Pneu1), Atelectasis (Atel), Pneumothorax (Pneu2), Pleural Effusion (Effu), Pleural Other (Other), Fracture (Frac) and Support Devices (Devi), respectively.

For the dataset labels, we use a d-dimensional one-hot-style vector y = [\(y_1\), \(y_2, \ldots , y_d\)], where d represents the number of disease classes and each component is assigned as in (4). \(y_i \in \{0, 1\}\) denotes the presence or absence of disease category i, where 1 represents presence and 0 represents absence. If \(y_1\), \(y_2, \ldots\), and \(y_d\) are all 0, none of the above disease types is present in the image.

Table 3 Parameters for training.

Implementation details

As shown in Table 3, we utilized early stopping with a patience of 5 to enhance training efficiency. We implemented the MBRANet model in the PyTorch framework and trained it on an NVIDIA GeForce RTX 3060 GPU for 60 epochs with a batch size of 64. We first resized the original image to \(256 \times 256\), then randomly cropped it to \(224 \times 224\), applied a random horizontal flip, and normalized it using the mean and standard deviation of ImageNet42. We used the Adam43 optimizer with a weight decay of 1e−5, betas of (0.9, 0.999), and an eps of 1e−8. The initial learning rate was set to 0.0001; when the loss value no longer decreases or the mean AUC value no longer increases over five epochs, the learning rate is divided by 10. In the validation and testing phases, we still resize the original image to \(256 \times 256\), randomly crop to \(224 \times 224\), and perform the same normalization as in the training phase.
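For reproducibility, these preprocessing and optimization settings can be sketched as follows; the torchvision ResNet50 is a stand-in for MBRANet.

```python
import torchvision
import torchvision.transforms as T
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Training-time preprocessing described above (ImageNet mean/std).
train_tf = T.Compose([
    T.Resize(256),
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model = torchvision.models.resnet50()  # placeholder for MBRANet
optimizer = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999),
                 eps=1e-8, weight_decay=1e-5)
# Divide the learning rate by 10 when the monitored metric (mean AUC)
# stops improving for five epochs; call scheduler.step(mean_auc) per epoch.
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.1, patience=5)
```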

Evaluation metrics

Because of the class imbalance problem, the area under the receiver operating characteristic curve (AUC) is a more reasonable performance indicator than alternatives such as accuracy or the F1-score, and it has been used in most relevant work. Following28,44, we used the per-class AUC score to measure how well these methods diagnose each of the 14 thoracic diseases. A higher AUC value indicates better model performance and higher diagnostic accuracy, so we use the AUC metric to compare against other models. In the receiver operating characteristic (ROC) curve, the horizontal axis is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR); FPR and TPR are calculated through (8). By maintaining a high TPR and a low FPR across thresholds, a good ROC curve provides a comprehensive evaluation of the model's discrimination power and overall classification performance. Additionally, we evaluated some models based on parameter count, FLOPS, GPU memory usage, training time, and inference time.

$$\begin{aligned} \begin{array}{l} {\text {FPR}} = \dfrac{{{\text {FP}}}}{{{\text {FP}} + {\text {TN}}}}\\ \\ {\text {TPR}} = \dfrac{{{\text {TP}}}}{{{\text {TP}} + {\text {FN}}}} \end{array} \end{aligned}$$
(8)
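Per-class and mean AUC can be computed, for example, with scikit-learn; the random arrays below are placeholders for real labels and predicted probabilities.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# y_true: (N, 14) binary ground-truth matrix; y_prob: (N, 14) predicted
# probabilities. Random data stands in for real model outputs.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 14))
y_prob = rng.random(size=(100, 14))

per_class_auc = [roc_auc_score(y_true[:, k], y_prob[:, k]) for k in range(14)]
mean_auc = float(np.mean(per_class_auc))
```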

Comparison with state-of-the-art methods

Figure 7

Illustration of the training and validation loss curves of the proposed MBRANet framework on the ChestX-Ray14 dataset.

Figure 8

Illustration of ROC curves of the proposed MBRANet framework on the ChestX-Ray14 dataset.

In this section, we conduct extensive experiments and analysis on four benchmark public datasets: ChestX-Ray14, CheXpert, MIMIC-CXR, and IU X-Ray. We use the AUC metric to evaluate the final performance of MBRANet and compare it with current state-of-the-art methods. In Tables 4 and 5, we record not only the AUC score for each chest disease but also the mean AUC score over all chest diseases.

Table 4 Comparison with previous baselines on the ChestX-Ray14 dataset. The AUC score of each pathology and the mean AUC score of 14 pathologies are reported. For each column, the best results are highlighted in bold.

Evaluation on ChestX-Ray14 dataset

First, Fig. 7 shows the trend of the loss values during training and validation. We then evaluate the performance of our model on the ChestX-Ray14 test set. The AUC scores for each lung disease are summarized in Table 4, where our proposed MBRANet achieves a mean AUC of 0.841 over the 14 lung diseases, better than the other state-of-the-art methods in Table 4. Figure 8 shows the ROC curves of the proposed MBRANet for the 14 lung diseases on the test set.

According to the results in Table 4, our proposed MBRANet achieved the best performance on three diseases (Atelectasis, Pneumonia, and Fibrosis) and the highest mean AUC value of 0.841 across all 14 diseases, which effectively demonstrates the effectiveness of our proposed MBRANet for multi-label classification. Compared with the mean AUC of 0.830 achieved by the state-of-the-art methods44,51 in Table 4, we improved by over 1%. Moreover, our method consistently outperforms the transformer-based methods52,53,54,55, achieving a mean AUC of 0.841 compared with their reported values of 0.768, 0.838, and 0.835. This further demonstrates that our MBRANet is superior to other state-of-the-art methods. It is worth noting that our method achieved the best results for the Atel, Pneu1, and Fibr diseases, improving on the highest previously reported AUC values for these three conditions by around 1%. For the Effu, Edem, P_T, Infi, and Hern diseases, the AUC values we obtained are very close to the best values reported by state-of-the-art methods. However, it is evident from the table that for the Card, Nodu, Pneu2, and Emph conditions, our method does not achieve comparable results; we will conduct targeted research on this issue in the future.

Besides, among the 14 diseases, we found that the AUC score for “Infiltration” was very low among all methods, probably because the diagnosis of this condition relies mainly on minor textural changes in CXR images, and it is still a challenge to improve the recognition of this disease.

Evaluation on CheXpert dataset

This section reports the performance of MBRANet on the CheXpert dataset. First, we evaluate our model on the official validation set3. Since uncertain labels are present in the training set, we use two approaches to handle them: U-Ones and U-Zeros. The comparison between MBRANet and other state-of-the-art methods is shown in Table 5.

As displayed in Table 5, when utilizing the U-Ones method for uncertain labels, the method in55 achieved the highest mean AUC value of 0.904 on the validation dataset over five categories.

Table 5 Comparison with previous baselines on the CheXpert dataset as measured by the AUC score of the validation set. The U-Ones and U-Zeros are different settings for uncertain labels. For each column, the best results are highlighted in bold.
Table 6 Comparison with previous baselines on the MIMIC-CXR and IU X-Ray datasets as measured by the AUC score of the test set. Significant values are in bold.

Compared with the mean AUC for the five disorders in3,55, we obtained a superior mean value (0.895 vs. 0.893, 0.860). When validated on the 14 classes, the MBRANet method achieves the same high AUC value as44, outperforming the AUC value of 0.813 in55. When we use the U-Zeros method for uncertain labels, as shown in Table 5, MBRANet obtains better results (0.891 vs. 0.886, 0.858) for validation over the five categories. For the validation over 14 categories, we obtained an AUC value of 0.835, close to the value in44. We found that the U-Zeros method performs better than the U-Ones method, presumably because mapping the uncertain labels to a fixed label reduces the number of noisy labels, which gives a better training result.

Evaluation on MIMIC-CXR dataset

This section summarizes the performance of the MBRANet model on the MIMIC-CXR dataset. The comparison in Table 6 demonstrates that our model outperformed55,57 with the highest mean AUC value of 0.805, surpassing their values of 0.721 and 0.773. Notably, while the method in55 achieved better AUC values for Edem, Effu, Pneu2, and Devi, our model obtained the highest AUC values for the remaining diseases. These results underscore the effectiveness and robustness of our model in disease classification tasks.

Evaluation on IU X-Ray dataset

According to the results in Table 6, our model achieved a mean AUC value of 0.745 across the 14 diseases in the IU X-Ray dataset. Moreover, our model obtained higher AUC values for every disease compared with the results in55. However, combining the information from Tables 5 and 6, it is evident that our model performs better on the same 14 diseases in the CheXpert and MIMIC-CXR datasets than in the IU X-Ray dataset.

The MBRANet model has been trained and evaluated on four publicly available datasets, namely ChestX-Ray14, CheXpert, MIMIC-CXR, and IU X-ray. The results of the evaluation demonstrate that MBRANet achieves state-of-the-art performance on all four datasets. This indicates that MBRANet has excellent generalization capability.

Analysis and discussions

In this section, we first analyze the reasons for the effectiveness of our proposed MBRANet network. Next, we use a graphical user interface (GUI) to display the final analysis results of our model more clearly, which allows us to perform visual analysis.

Effectiveness analysis of the MBRANet

Ablation studies

Table 7 Ablation experiments on CA/MFC/BCEWITHLABELSMOOTHING modules. In this context, MFC refers to combining FPN and CSRA modules. The mean AUC scores are reported for 14 pathologies on the ChestX-Ray14 dataset.
Table 8 Computational cost comparison of different methods.

Table 7 summarizes the mean AUC scores for the 14 pathologies on the ChestX-Ray14 dataset using different modules, while Table 8 offers a detailed comparison of the computational costs associated with different models, encompassing parameters, FLOPS, GPU memory utilization, and inference time. The mean AUC value was 0.831 when using ResNet50 alone to classify chest diseases. To prove the effectiveness of the CA, MFC, and BCEWithLabelSmoothing modules, we integrated each of these modules separately into the original ResNet50. We found that adding the CA module increased the mean AUC score by 0.3% over the baseline (0.834 vs. 0.831), demonstrating the effectiveness of the CA module. When using MFC and BCEWithLabelSmoothing individually, the AUC scores improved by 0.1% (0.832 vs. 0.831) and 0.5% (0.836 vs. 0.831) over the baseline, respectively, highlighting the effectiveness of the MFC and BCEWithLabelSmoothing modules we designed. When using the CA and MFC modules together, the result is 0.836, an improvement of 0.5%, which is better than either module alone (0.836 vs. 0.834, 0.832). Finally, when using all three modules together, we achieved the best mean AUC score of 0.841. The results in Table 7 effectively demonstrate the effectiveness of each module.

Based on the cost comparison summarized in Table 8, it is advisable to opt for the CSRA module over the FC module. The CSRA module exhibits superior performance in terms of computational complexity and parameter count, making it the more efficient choice for the given scenario. Furthermore, selecting the CA module instead of the SE and CBAM modules aligns better with the overall analysis.

Analysis of feature extractor

Table 9 Effect of different backbone structures on the ChestX-Ray14 dataset. Training time refers to the time taken for one epoch. Significant values are in bold.
Table 10 Effect of different attention mechanisms for mean AUC Results on the ChestX-Ray14 dataset. Significant values are in bold.

The MBRANet model utilizes a fine-tuned ResNet50 as the backbone network for feature extraction. We conducted experiments to assess the efficacy of ResNet50 as the feature extractor, replacing it with DenseNet121, ResNet18, SKNet, ConvNet, and ResNet101 while keeping all other settings constant. To improve learning efficiency, each backbone network used in the experiments was initialized with the corresponding ImageNet pre-trained weights. Table 9 presents a summary of the experimental results, showing that the best results were achieved using ResNet50 on the ChestX-Ray14 dataset, with a mean AUC score of 0.841 over the 14 pathologies.

In deep learning models, factors such as the number of parameters, network depth, and architecture play a crucial role in determining performance. Analysis of the results in Table 9 indicates that ResNet50, with its moderate parameter count and network depth, exhibits stronger feature learning and representation capabilities, translating into higher accuracy, superior generalization, and a more efficient and stable training process. Despite its superiority in FLOPS, parameters, and training time, ResNet18 only achieves a mean AUC of 0.832. Consequently, ResNet50 outperforms ResNet101 and the other models by adapting better to the dataset, making it a preferred choice for various deep learning applications.

Analysis of attention mechanisms

The choice of attention mechanism has a significant impact on the experimental results. After choosing ResNet50 as the backbone network of MBRANet, we introduced different attention mechanisms into the residual blocks, each inserted after the \(3 \times 3\) convolution in the residual block. Table 10 summarizes the results obtained on the ChestX-Ray14 test set using the SE, CBAM, and CA attention mechanisms, all employing the MFC and BCEWithLabelSmoothing techniques; the structures are shown in Fig. 3. The best results were obtained using the CA approach (0.841 vs. 0.830, 0.831). Based on the results in Tables 8 and 10, the CA attention mechanism was chosen to help ResNet50 process the image data better and to improve the performance and accuracy of the model.

For a given input, the CA module encodes each channel along the horizontal and vertical directions, producing a set of direction-aware feature maps. The operation captures correlations between pixels over long distances and retains positional information as a way to help localize our network to the region of interest.

Table 11 Multi-scale feature aggregation method comparison.

Analysis of multi-branch feature classifier (MFC)

Table 11 shows that the MFC module excels over the Concat and Add methods in multiple aspects. The MFC module has the lowest computational complexity at 2.0211G, while the Add and Concat methods have higher values at 2.8188G and 2.8204G, respectively. In terms of parameter size, the MFC module is the smallest at 0.0251M, contrasting with the larger sizes of 2.8647M for Add and 3.0719M for Concat. Additionally, the MFC module achieves the highest Mean AUC score of 0.8410, outperforming the Add at 0.8370 and the Concat at 0.8330. Hence, the MFC module is the preferred choice due to its superior computational efficiency, smaller parameter size, and better model performance compared to the Add and Concat methods.

Analysis of \(\lambda\)-parameters

In the MFC method we designed, the only hyperparameter is \(\lambda\). We evaluate the performance of different \(\lambda\) settings on the ChestX-Ray14 dataset. Figure 5 illustrates the CSRA module in the MFC method. \(\lambda\) controls the effect of the spatial pooling component, but when \(\lambda\) is too large, the contribution of the average pooling component diminishes. Therefore, for multi-scale features, choosing an appropriate \(\lambda\) is very important for improving the classification results. To find a suitable setting, we kept all other settings unchanged and evaluated the impact of different sets of \(\lambda\) values (\(\lambda _1, \lambda _2, \lambda _3, \lambda _4\)) on the final classification results, where \(\lambda _1, \lambda _2, \lambda _3, \lambda _4\) correspond to the \(\lambda\) parameter settings of the CSRA1, CSRA2, CSRA3, and CSRA4 modules in Fig. 2, respectively. Table 12 summarizes the experimental results for 10 different sets of \(\lambda\) values and shows that the set (0.5, 0.4, 0.3, 0.1) achieved the best mean AUC value (0.841). Notably, in future work we may further improve classification performance by evaluating more groups of \(\lambda\) values.

Table 12 Effect of setting different \(\lambda\)-parameters for mean AUC results on the ChestX-Ray14 dataset. Significant values are in bold.

Problem analysis

Figure 9

Localization of lesion regions with the proposed MBRANet model. The first column shows the raw images from the ChestX-Ray14 dataset. The second column is the ground truth, the manual lesion regions provided by the official version are annotated with bounding boxes. Note that we do not use any bounding boxes for training or testing. The third column shows the heat maps for the corresponding CXR examples. The higher response is represented by the color red, and the lower response is represented by the color blue. The last column shows the test results of multi-label CXR image classification with MBRANet. The top 6 predicted categories and their corresponding probability scores are presented. The ground truth pathologies are highlighted in red.

When classifying thoracic diseases, to solve the problem of localizing lesion regions of different sizes, we introduce CA into the residual block to suppress noise interference in the CXR image and capture correlations between distant pixels, helping the network localize the lesion region. In most existing research methods, the extracted multi-scale features are not capitalized on to improve model performance. In addition, there is a severe sample imbalance for certain pathologies within the dataset, which tends to create an overdependence on a limited number of samples, leading to overfitting. To address these issues, we designed the MFC method to handle multi-scale feature information and use decision fusion to produce the final classification results, and we employed the label smoothing technique on the one-hot labels of the images to address the sample imbalance.

We perform training and validation on the ChestX-Ray14, CheXpert, MIMIC-CXR, and IU X-Ray datasets. In Tables 4, 5, and 6, we compared our results with previous studies on the same datasets, with the AUC value serving as the main performance metric. The analysis revealed that the proposed models are highly efficient and boast excellent generalization ability. It is important to note that while some studies have shown consistent findings, others have reported inconsistent results, underscoring the importance of meticulous data analysis to ensure accuracy and reliability.

Figure 10

The interface of the Thorax X-ray Analysis System. First, we upload the CXR image by clicking on the INPUT button. Next, by clicking on the DIAGNOSIS button, MBRANet is called to diagnose the CXR image, and the top 6 results are displayed on the page, and the heat map generated by Grad-CAM is also displayed. In addition, clicking on the ANALYSIS button will display information about the disease.

Case analysis

With the advantage of weakly supervised localization, we use Grad-CAM58 to generate a heatmap of the CXR image that highlights the lesion region. As shown in Fig. 9, the highlighted areas on the heatmap closely resemble the ground truth, demonstrating that our proposed method can locate the most discriminative lesion areas in CXR images. In addition, we show the predicted results of multi-label classification on the ChestX-Ray14 dataset: in Fig. 9, the score column shows the top 6 predicted scores of MBRANet for each sample, indicating that our proposed method obtains accurate classification results.
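A minimal sketch of the Grad-CAM computation follows; using `model.layer4` as the target layer is an assumption appropriate for a ResNet-style backbone.

```python
import torch

def grad_cam(model, image, target_layer, class_idx):
    """Grad-CAM sketch: weight the target layer's activations by the
    spatially averaged gradients of the class score, then ReLU-normalize."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    score = model(image)[0, class_idx]    # class score for one image
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    w = grads['g'].mean(dim=(2, 3), keepdim=True)   # channel weights
    cam = torch.relu((w * acts['a']).sum(dim=1))    # (1, h, w) heatmap
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Example (hypothetical): cam = grad_cam(model, img, model.layer4, class_idx=2)
```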

To apply the trained MBRANet to assist in the diagnosis of thoracic diseases, we used Tkinter, a standard GUI (graphical user interface) toolkit for Python, to implement the visual case analysis. Figure 10 shows the main interface of the thorax X-ray analysis system implemented using Tkinter. The use of a GUI makes diagnostic results more intuitive and helps the medical professional diagnose the patient’s disease.
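A skeleton of such a Tkinter interface is sketched below; the button callbacks are stubs, and a real system would invoke the trained MBRANet inside on_diagnosis.

```python
import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.title("Thorax X-ray Analysis System")
result = tk.StringVar(value="Upload a CXR image to begin.")

def on_input():
    # INPUT: select a CXR image from disk.
    path = filedialog.askopenfilename(title="Select CXR image")
    result.set(f"Loaded: {path}")

def on_diagnosis():
    # DIAGNOSIS: here the trained MBRANet would predict the top-6 classes
    # and a Grad-CAM heatmap would be rendered alongside the scores.
    result.set("Top-6 predictions would be shown here.")

tk.Button(root, text="INPUT", command=on_input).pack()
tk.Button(root, text="DIAGNOSIS", command=on_diagnosis).pack()
tk.Label(root, textvariable=result).pack()
root.mainloop()
```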

Discussion

Most existing methods for classifying chest X-ray images face challenges in accurately capturing lesion details at different scales due to the lack of consideration for spatial and channel information within disease regions. Additionally, the common problem of class imbalance in datasets can lead to overfitting and reduce the accuracy of classification results. The absence of effective solutions further hinders the achievement of precise classification. To solve these issues, we propose a multi-branch residual attention network (MBRANet) model, which focuses on the fusion and classification of image features at several different scales.

When classifying thoracic diseases, to solve the problem of localizing lesion regions of different sizes, we introduce CA into the residual block to suppress noise interference in the CXR image and capture correlations between distant pixels, helping the network localize the lesion region. In most existing research methods, the extracted multi-scale features are not capitalized on to improve model performance; we therefore perform multi-scale feature fusion using the FPN approach. To tackle the excessive number of parameters in the FC layer, we designed the MFC method to handle multi-scale feature information and use decision fusion to produce the final classification results. In addition, we use the label smoothing technique to process the one-hot labels of the images to address the sample imbalance problem.

Our research involved training, validating, and testing our proposed MBRANet model on the ChestX-Ray14, CheXpert, MIMIC-CXR, and IU X-Ray datasets. In Tables 4, 5, and 6, we compared our results with previous studies on the same datasets, with the AUC value serving as the main performance metric. In addition, we also analyzed each module. These results indicate that our proposed model is highly efficient and boasts excellent generalization ability. It is important to note that while some studies have shown consistent findings, others have reported inconsistent results, underscoring the importance of meticulous data analysis to ensure accuracy and reliability.

Although the high classification performance of MBRANet has been demonstrated, there are still some limitations. In particular, when dealing with the relationship between different diseases, some diseases may not be predicted due to the possible overlap of lesion regions in different pathologies. In addition, as the labels of the dataset are encoded using one-hot coding, similar features may become independent after encoding. This may result in the loss of some useful information and make it difficult to process similar features. Please note that the classification results of diseases using MBRANet can only be used as a diagnostic aid.

Responding to these issues, our future work will further explore the potential correlations between different diseases and integrate prior knowledge with our model across multiple modalities to improve overall performance. We will also attempt to address the uncertainties associated with the use of one-hot labels.

Conclusion

Experienced physicians possess the expertise to make accurate clinical diagnoses by focusing on the pathological information present in chest X-ray (CXR) images. In this paper, we proposed the MBRANet network model, which automatically learns multi-scale features of CXR images to classify common chest diseases. In addition, we implemented a GUI that calls the trained MBRANet for computer-aided diagnosis. We trained and validated the models on the ChestX-Ray14, CheXpert, MIMIC-CXR, and IU X-Ray datasets. Our proposed MBRANet has the following features: (1) the Coordinate Attention (CA) module solves the inability of CNNs to capture long-range pixel dependencies, enabling our network to capture cross-channel and direction-aware information; (2) it accurately extracts multi-scale image information and performs feature fusion, which expands the perceptual range of the network and improves classification accuracy; (3) replacing the fully connected (FC) layer with the class-specific residual attention (CSRA) module not only reduces the number of parameters in the model but also improves classification accuracy; (4) our designed BCEWithLabelSmoothing loss function addresses the class imbalance in the dataset and reduces the model's tendency to overfit. For future work, we will further explore the dependencies between disease characteristics to better discriminate between categories with blurred boundaries.