TB-CXRNet: Tuberculosis and Drug-Resistant Tuberculosis Detection Technique Using Chest X-ray Images

Rahman, Tawsifur; Khandakar, Amith; Rahman, Ashiqur; Zughaier, Susu M.; Al Maslamani, Muna; Chowdhury, Moajjem Hossain; Tahir, Anas M.; Hossain, Md. Sakib Abrar; Chowdhury, Muhammad E. H.

doi:10.1007/s12559-024-10259-3

Open access
Published: 17 February 2024

Cognitive Computation

Tawsifur Rahman¹,
Amith Khandakar¹,
Ashiqur Rahman²,
Susu M. Zughaier³,
Muna Al Maslamani⁴,
Moajjem Hossain Chowdhury¹,
Anas M. Tahir¹,
Md. Sakib Abrar Hossain¹ &
…
Muhammad E. H. Chowdhury ORCID: orcid.org/0000-0003-0744-8206¹

Abstract

Tuberculosis (TB) is a chronic infectious lung disease, which caused the death of about 1.5 million people in 2020 alone. Therefore, it is important to detect TB accurately at an early stage to prevent the infection and associated deaths. Chest X-ray (CXR) is the most widely used method for TB diagnosis. However, it is difficult to identify TB from CXR images in the early stage, which leads to time-consuming and expensive treatments. Moreover, due to the increase of drug-resistant tuberculosis, the disease has become more challenging in recent years. In this work, a novel deep learning-based framework is proposed to reliably and automatically distinguish TB, non-TB (other lung infections), and healthy patients using a dataset of 40,000 CXR images. Moreover, a stacking machine learning-based diagnosis of drug-resistant TB using 3037 CXR images of TB patients is implemented. The largest drug-resistant TB dataset will be released to support the development of machine learning models for drug-resistant TB detection and stratification. In addition, a Score-CAM-based visualization technique was used to make the model interpretable by showing which image regions the best performing model relies on when classifying an image. The proposed approach shows an accuracy of 93.32% for the classification of TB, non-TB, and healthy patients on the largest dataset, and accuracies of around 87.48% and 79.59% for binary classification (drug-resistant vs drug-sensitive TB) and three-class classification (multi-drug resistant (MDR), extensively drug-resistant (XDR), and sensitive TB), respectively, which are the best reported results compared to the literature. The proposed solution can provide fast and reliable detection of TB and drug-resistant TB from chest X-rays, which can help in reducing disease complications and spread.

Introduction

Tuberculosis (TB) is a contagious disease and the leading infectious disease-related cause of death [1]. TB can be cured if diagnosed early and treated properly [2]. Chest X-rays (CXRs) are routinely utilized for pulmonary tuberculosis detection and screening [3, 4]. In clinical practice, chest radiographs are analyzed by trained medical doctors for TB diagnosis. However, this process is subjective, expert-dependent, and sometimes inefficient. Subjective discrepancies in radiograph-based illness diagnosis are unavoidable [5, 6]. CXR images of TB patients are sometimes confused with other lung abnormalities of similar patterns [7, 8]. This leads to incorrect diagnosis and treatment, which worsens the patients' condition. Moreover, radiologists are in short supply in low-income countries, particularly in rural areas. Computer-assisted diagnostic (CAD) systems that analyze chest X-ray images can play an essential role in mass screening for pulmonary tuberculosis. The introduction of deep convolutional neural network (CNN) models and large, publicly available open access datasets has enabled the widespread application of computer vision algorithms. CNNs allow important image features to be learned automatically from large training data, but acquiring annotated medical image datasets as large as ImageNet is a very challenging task [9,10,11]. X-ray imaging is a popular and very low-cost modality, which can provide plenty of data to train machine learning models. Therefore, X-ray images are becoming popular for detecting lung abnormalities using deep CNN models.

Recently, several studies employed deep CNN models to detect lung abnormalities (i.e., pneumonia, lung cancer, tuberculosis) by analyzing CXR images [12,13,14,15]. Deep CNN models were extensively used to detect the novel coronavirus disease from CXR images [15,16,17,18]. Ismael and Sengur [16] used a comparatively small dataset of CXR images and confirmed that deep learning had the potential for coronavirus disease 2019 (COVID-19) detection using CXR images. Features extracted using the ResNet50 model were classified using a support vector machine (SVM) classifier with a linear kernel to produce an accuracy of 94.7%. Tahir et al. [12] proposed a framework for classifying coronavirus families with more than 90% sensitivity by utilizing multiple pre-trained CNN models. To differentiate viral pneumonia, COVID-19, and healthy patients, Chowdhury et al. [14] reported a deep CNN model for COVID-19 detection from CXR images, in which the different layers of the CNN model were used to identify the signature of viral pneumonia and COVID pneumonia in the X-ray images.
Another study proposed a unique CNN model called PulDi-COVID for detecting nine different diseases, including COVID-19, using chest X-ray images and the SSE algorithm [19]. The test results showed that PulDi-COVID identified COVID-19 with 99.70% accuracy, 98.68% precision, 98.67% recall, 98.67% F₁ score, a low zero-one loss of 12 chest X-ray images, 99.24% AUC-ROC score, and a low error rate of 1.33%. A collection of recent literature on the use of X-ray, CT, and multimodal imaging for COVID-19 diagnosis was reviewed and classified based on the use of pre-trained and custom models by Yogesh and Patnaik [20]. The authors also discussed the challenges of using deep learning for COVID-19 diagnostic systems and outlined areas for future research to improve the accuracy and reliability of COVID-19 detection. Ieracitano et al. [21] introduced a deep learning framework that incorporates fuzzy logic to distinguish between COVID-19 pneumonia and non-COVID-19 interstitial pneumonias based on chest X-ray (CXR) images. They utilized CXR images and fuzzy images generated through a formal fuzzy edge detection method as inputs to their developed CovNNet model, allowing for automatic extraction of the most crucial features. The experimental findings demonstrated that combining CXR and fuzzy features significantly improved the classification performance, reaching an accuracy of up to 81%.

Several research groups applied standard machine learning algorithms to stratify TB and healthy or other non-TB lung infections using CXR images [22,23,24,25,26,27]. Different groups have proposed deep CNN models [28,29,30,31,32,33] by pruning the networks to detect tuberculosis. TB patients were identified with an accuracy of 82.09% using a deep CNN model by Hooda et al. [28]. In [30], a CAD model was proposed for the detection of TB patients from the chest X-ray images with an accuracy of 88.76% utilizing important patterns in the lung images. Pasa et al. [31] showed a deep neural network for tuberculosis detection with an accuracy of 86.82%. They also mentioned a technique for interactively visualizing tuberculosis instances. In another work utilizing the ensemble of CNN models, Hernandez et al. [33] automatically classified TB patients from CXR images with an accuracy of 86%. Pre-trained CNN models, which were trained on the ImageNet dataset, were utilized by Lopes et al. [34] to stratify the TB and non-TB patients using CXR images. A simplified pre-trained CNN model for TB detection with and without image augmentation has been developed by Ahsan et al. [35] with an accuracy of 81.25% and 80%, respectively. Again, pre-trained CNN models were reported to show an accuracy of 94.89% in TB detection by Yadav et al. [36]. Abbas et al. [37] suggested a class decomposition strategy-based CNN architecture to enhance the performance of pre-trained models. It is worth noting here that TB culture test images were used for training the pre-trained CNN models. Chang et al. [38] achieved 98% sensitivity with 99% precision by applying the transfer learning technique to TB culture images. TB culture image-based classification needs specific samples from the patients, which makes it less reliable than classification from readily available chest X-rays. 
In our previous work [39], we presented a transfer learning approach utilizing deep convolutional neural networks (CNNs) to automatically detect tuberculosis (TB) from chest radiographs. We evaluated the performance of nine different CNN models in classifying TB and normal chest X-ray (CXR) images. Among these models, CheXNet demonstrated superior performance on the lung-segmented CXR images. The results revealed high classification accuracy, precision, and recall for TB detection. Specifically, without segmentation, the accuracy, precision, and recall were found to be 96.47%, 96.62%, and 96.47%, respectively, while with segmentation, they increased to 98.6%, 98.57%, and 98.56%, respectively.

Drug-resistant TB strains are particularly concerning in the diagnosis and treatment of TB. There are currently around 20 medicines in use to treat tuberculosis. The five most often used medications, usually known as first-line treatments, are typically given to TB patients who have not received TB treatment before. Rifampin (RIF), isoniazid (INH), ethambutol (EMB), pyrazinamide (PZA), and streptomycin (SM) are the first-line drugs [40]. To avoid developing resistance to a single drug, it is critical to take multiple TB drugs at the same time. Patients need to follow the treatment plan carefully for several months without missing a single dose to avoid drug resistance. Medicines for drug-resistant tuberculosis, the so-called second-line medications, have adverse side effects and are very expensive. These reserve drugs are grouped according to their experience of usage and efficacy. If the TB bacterium that causes the infection responds to all medicines, the patient is drug-susceptible or drug-sensitive. If a patient’s TB becomes drug-resistant, either through poor treatment or through transmission from an infected patient, at least one of the primary medications will not affect the TB bacteria. Multi-drug resistant TB (MDR-TB) and extensively drug-resistant TB (XDR-TB) are the two main kinds of drug resistance. MDR-TB is characterized by resistance to at least the two most effective first-line TB medications, isoniazid and rifampicin. These two are the most common TB drug resistance types, while further classifications are occasionally used depending on the number of medications to which the Mycobacterium tuberculosis bacteria stop responding, such as resistance to specific drugs and resistance to the majority of currently available drugs. XDR-TB is a rare type of MDR-TB that is resistant to isoniazid and rifampin, plus any fluoroquinolone [41].
It is also resistant to at least one of the second-line injectable drugs (i.e., kanamycin, amikacin, or capreomycin). Patients have far fewer and less effective drug options because XDR-TB is resistant to the most potent TB drugs. Patients infected with human immunodeficiency virus (HIV), or with other conditions that weaken the immune system, should be very careful about XDR-TB. Once infected, such a patient can easily develop TB and also has a high risk of death after developing it.

MDR-TB is challenging to detect and requires additional time and cost for patient treatment (sometimes more than 2 years). MDR-TB affects 3.3% of new TB patients, as well as 20% of previously treated patients [42]. One of the most difficult aspects of treating MDR-TB is detecting drug resistance in suspected patients on their first hospital visit. Drug resistance status is determined by a drug susceptibility test often done on sputum samples. A well-equipped laboratory is necessary to acquire such a report in 4 to 6 weeks [43]. This investigation time can be considerably shortened to detect MDR-TB by the recent invention of the Xpert Mycobacterium tuberculosis/rifampicin (MTB/RIF) [44]. This is a real-time polymerase chain reaction test to identify the genetic changes that happened to the MTB genome related to rifampicin (RIF) drug resistance. But the sputum sample collection is still required for the test, which is very hard to collect, particularly from children. As a result, finding MDR-TB remains a difficulty, and because of its broad availability, the traditional chest X-ray (CXR) remains an important tool in the surveillance, diagnosis, and screening of MDR-TB. There is evidence that computed tomography (CT) images can be used to distinguish MDR TB and drug-sensitive TB. For instance, Yeom et al. [45] showed a substantial association between primary MDR-TB patients and multiple bilateral abnormalities in the lung CT images. Stefan et al. [44] reported multiple cavities and bilateral consolidations in the chest CT slices, which help in the discrimination analysis of MDR-TB patients. Chen et al. [46] used positron emission tomography and computed tomography (PET-CT) imaging to relate the abnormalities in images with the MDR TB patients while studying the changes in the lung abnormalities in a cohort of 28 MDR TB patients under second-line TB treatment for 2 years and then monitored for another 6 months using CT alone. 
Traditional sputum microbiology is less sensitive than several radiologic markers in distinguishing successful from unsuccessful treatment of TB patients. Cha et al. [47] described the radiological findings of XDR-TB and compared them to those of MDR-TB and drug-sensitive TB among non-AIDS patients. Drug-sensitive TB was characterized by the presence of several nodules, cavities, and bronchial dilatation in CT images of young individuals, whereas no significant difference between the images of patients with MDR-TB and XDR-TB was observed. These findings were verified by Kim et al. [48], who detected apparent cavities in CT images of patients with MDR-TB. This is also supported by the findings of Chung et al. [49]. On the other hand, Lee et al. [50] later determined that XDR-TB had more widespread consolidation and a tree-in-bud presence in CT images in comparison to MDR-TB. Very little effort has been made to automatically distinguish drug-resistant and drug-sensitive TB using CXR images. A significant relationship between the treatment resistance status of TB patients and computerized features of radiological imaging was identified by Kovalev et al. [51] in a pilot study. By combining CXR and CT features, the authors attained an accuracy of more than 75% in drug-resistant TB detection [52]. However, the CXR features alone had substantially lower performance. Stefan et al. [44] reported that it is possible to computationally extract relevant information from chest X-ray images related to drug-resistant TB infection. They used the CXR images from the database of the Republic of Belarus, where MDR/XDR-TB and HIV/TB are dominant. The database also incorporates laboratory values and clinical biomarkers along with CXR images from either diagnosed or suspected MDR-TB patients. Out of 135 cases investigated by Stefan et al. [44], 45% (61) were sensitive while 55% (74) were MDR.
As radiological images can provide details that can help in distinguishing the various drug-resistant TB categories, machine learning networks can be used to detect them and make decisions. It has been found from previous works of the authors and other recent work that novel machine learning networks along with pre-processing techniques can accurately detect other pulmonary abnormalities [13, 53, 54]. The above studies motivated this study to use a machine learning framework in classifying TB and healthy patients using chest X-ray images and further classify the TB patients into the different drug-resistant TB groups to help in early disease detection and treatment. The key contributions of this work are highlighted below:

The largest TB benchmark dataset, namely, QU-MLG-TB, has been created using 40,000 CXR images along with their ground truth lung masks. The dataset contains 10,881 normal (healthy), 24,119 non-TB (other lung infections), and 5000 TB patients’ CXR images. This is the largest TB dataset, and it was collected from multiple open access and restricted access databases.
A novel framework, TB-CXRNet, for TB detection using this largest dataset is proposed. This is a benchmark performance on a benchmark dataset with the highest accuracy ever achieved in the diagnosis and assessment of TB disease using CXR alone.
The largest drug-resistant TB dataset, a subset of the QU-MLG-TB dataset, will be released to support the development of machine learning models for drug-resistant TB detection and stratification.
A state-of-the-art machine learning stacking model is proposed to detect and stratify drug-resistant TB from chest X-ray images alone with state-of-the-art performance.
A Score-CAM-based visualization technique is used to show how the best performing model decides to classify the image.

The rest of the paper is organized as follows. The “Methodology” section summarizes the methodology used in the paper with the details of the datasets and pre-processing steps. The “Experiments” section provides the experimental details, while the “Results and Discussion” section describes the results and discussion of the TB classification and drug-resistant TB stratification. Finally, the article is concluded in the “Conclusion” section.

Methodology

The objective of this research is first to distinguish TB patients from healthy controls and patients with other (non-TB) lung infections, and then to stratify the TB patients into drug-sensitive and drug-resistant TB.

In the first stage, a novel framework was developed to classify normal, non-TB (other lung infections), and TB patients (Fig. 1A) using a convolutional neural network with a non-linear neuron-based multi-layer perceptron (MLP) classifier [55]. In the second stage, the chest X-ray images of the TB patients are applied as input to a CheXNet-based CNN encoder to extract CXR image features, and then the dimensionality of the extracted features was reduced using the principal component analysis (PCA). Finally, different machine learning classifiers and stacking approaches were investigated to find the best performing model to classify the TB chest X-rays into binary (drug-resistant and drug-sensitive, i.e., cases that are sensitive to all the TB drugs (Fig. 1B)) and 3-class problems (MDR, XDR, and sensitive-TB (Fig. 1B)). The overall methodology of the proposed system is shown in Fig. 1.

Dataset Description

The study considered only posterior-to-anterior or anterior-to-posterior view of the chest X-ray images, as this view is widely used by radiologists. QU-MLG-TB is the largest TB benchmark dataset and consists of 40,000 chest X-ray (CXR) images along with their corresponding lung masks. Details of the full dataset are shown in Table 1. There are two categories in the dataset:

Table 1 Different properties of QU-MLG-TB dataset

TB Classification

This is one of the largest datasets for TB, consisting of 10,881 normal (healthy), 24,119 non-TB (other lung infections), and 5000 TB patients’ CXR images. This dataset was built using a variety of publicly available and restricted access datasets and repositories [39, 53]. Therefore, the images have a wide range of resolutions and formats and were collected using different equipment. In the pre-processing phase, the authors identified and discarded extremely low-quality, over-exposed, and duplicate images to ensure a good quality dataset for this study.

RSNA CXR Dataset (Non-COVID Infections and Normal CXR)

The RSNA pneumonia detection challenge dataset [56] is made up of 26,684 chest X-ray images, where 8851 images are normal, 11,821 are abnormal, and 6012 are images of lung opacity. The images are in DICOM format. In this study, we used 8851 normal images and 6012 images of lung opacity as the non-COVID class.

PadChest Dataset

The PadChest dataset [57] is made up of more than 160,000 X-ray images from 67,000 patients that were collected and reported by radiologists at Hospital San Juan (Spain) from 2009 to 2017. In this study, we used 4000 normal and 4000 pneumonia/infiltrate (non-COVID-19) cases from the PadChest dataset.

NLM Dataset

The National Library of Medicine (NLM) in the USA has made two datasets of lung X-ray images publicly available: the Montgomery and Shenzhen datasets [58]. The Montgomery County (MC) and the Shenzhen, China (CHN) databases consist of 138 and 662 posterior-anterior (PA) chest X-ray images, respectively. The resolution of the images in the MC database is either 4020 × 4892 or 4892 × 4020 pixels, while the resolution of the images in the CHN database is variable, but around 3000 × 3000 pixels. In the MC database, out of the 138 chest X-ray images, 58 were taken from different TB patients, and 80 were from normal subjects. In the CHN database, out of 662 chest X-ray images, 336 were taken from different TB patients, and 326 were from normal subjects. Therefore, in this NLM database, there are 406 normal and 394 TB-infected X-ray images.
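The stated NLM totals can be cross-checked with simple arithmetic. The short sketch below starts from the Montgomery counts and the combined totals given above and derives the Shenzhen counts they imply; the variable names are illustrative only.

```python
# Counts stated for the Montgomery County (MC) database.
mc_tb, mc_normal = 58, 80

# Totals stated for the combined NLM data.
nlm_normal, nlm_tb = 406, 394

# Shenzhen (CHN) counts implied by the totals above.
chn_normal = nlm_normal - mc_normal
chn_tb = nlm_tb - mc_tb
chn_total = chn_normal + chn_tb

print(chn_normal, chn_tb, chn_total)  # prints: 326 336 662
```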

Belarus Dataset

The Belarus dataset [59] was gathered for a study on drug resistance led by the National Institute of Allergy and Infectious Diseases, Ministry of Health, Republic of Belarus. The dataset includes 306 chest X-ray images from 169 patients. The images were taken using the Kodak Point-of-Care 260 system and have a resolution of 2248 × 2248 pixels. All the images in this database are from individuals infected with TB.

NIAID TB Dataset

The NIAID TB portal program dataset [60] includes around 3037 chest X-ray images that are positive for TB from approximately 3087 cases. The images were collected from seven different countries and are in Portable Network Graphics (PNG) format.

Drug-Resistant TB Classification

A subset of the QU-MLG-TB dataset, comprising the 3037 CXR images out of 5000 TB images that are labeled as drug-resistant/sensitive TB, was used for the drug-resistant TB classification. Among these 3037 CXR images, 626, 1672, and 739 images are sensitive TB, MDR TB, and XDR TB, respectively. Figure 2 shows the sample images for healthy, non-TB other lung infections, and TB images with high interclass variations and varied quality, signal-to-noise ratio (SNR) levels, and resolution. Figure 2 (last row) shows the sample CXR images of the drug-resistant/sensitive TB.
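As a quick consistency check on the class counts quoted in this section, the following sketch sums each split and reports the class shares, which makes the class imbalance explicit. The dictionary keys are illustrative labels, not names used by the dataset.

```python
# Class counts quoted in the text for the two tasks.
tb_classification = {"normal": 10881, "non_tb": 24119, "tb": 5000}
drug_resistance = {"sensitive": 626, "mdr": 1672, "xdr": 739}

total_images = sum(tb_classification.values())
total_labeled_tb = sum(drug_resistance.values())

# Binary task: MDR and XDR together form the drug-resistant class.
resistant = drug_resistance["mdr"] + drug_resistance["xdr"]

# Share of each class within the drug-resistance subset.
shares = {name: count / total_labeled_tb for name, count in drug_resistance.items()}

print(total_images, total_labeled_tb, resistant)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

MDR images make up more than half of the drug-resistance subset, so accuracy alone should be read alongside per-class metrics.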

Preprocessing

This section describes different pre-processing steps used in this study, such as image enhancement techniques, technical details in the model development for the lung segmentation and classification including feature extraction, feature reduction using principal component analysis (PCA), and finally stacking machine learning-based classification.

Gamma Correction

The purpose of image enhancement is to highlight important information in an image while reducing or eliminating irrelevant details, thereby enhancing decision-making performance. In this study, the authors employed the Gamma correction technique, which has previously demonstrated improved classification performance on chest X-ray (CXR) images in the works of the same authors [39, 53]. While linear operations such as addition, subtraction, and scalar multiplication are commonly used for pixel normalization in image processing, Gamma correction applies a non-linear operation to enhance the pixels of the image. Gamma correction is typically denoted by the following expression:

$${P}_{\text{gamma}}\;=\;A\;\times\;{P}_{\text{original}}^{\gamma}$$

(1)

where the non-negative pixel values of the original image are raised to the power of γ, which can be greater or smaller than 1, and the result is multiplied by the constant A.
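A minimal sketch of Eq. (1) in plain Python is given below. It assumes 8-bit grayscale pixels that are scaled to [0, 1] before the power is applied and rescaled afterwards; this normalization, the function name `gamma_correct`, and its parameters are assumptions for illustration and are not specified in the text.

```python
def gamma_correct(pixels, gamma, a=1.0, max_value=255):
    """Apply P_gamma = A * P_original ** gamma to a 2-D list of pixel values."""
    corrected_image = []
    for row in pixels:
        new_row = []
        for p in row:
            normalized = p / max_value              # assumed scaling to [0, 1]
            corrected = a * normalized ** gamma     # Eq. (1)
            value = round(corrected * max_value)    # back to the original range
            new_row.append(min(max_value, max(0, value)))  # clip to valid range
        corrected_image.append(new_row)
    return corrected_image


# gamma < 1 brightens mid-tones, gamma > 1 darkens them.
brighter = gamma_correct([[0, 64, 255]], gamma=0.5)
darker = gamma_correct([[0, 128, 255]], gamma=2.0)
```

With this normalization, black and white pixels stay fixed for A = 1, and only the intermediate intensities are redistributed.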

Lung Segmentation Model Development

It is very important to localize the region of interest for the machine learning networks, i.e., the lungs in the chest X-ray images. In our previous work for CXR lung segmentation [61], a detailed investigation was done on three segmentation architectures, Feature Pyramid Networks (FPN) [62], U-Net++ [63], and U-Net [64] with various encoder backbones. FPN [62] segmentation network with DenseNet121 [65] encoder as a backbone outperformed other conventional segmentation networks [61]. Using the FPN network with DenseNet121 backbone, the lung area is segmented very accurately which was verified by the experienced radiologists in the previous work. The model trained in [61] was used to create lung segmentation for this work. Figure 3 shows the sample chest X-ray images and their corresponding lung masks.

TB Classification Model and Drug-Resistant TB Stratification Model

Two experimental frameworks have been proposed in this study: one to distinguish TB patients from healthy subjects and patients with other (non-TB) lung infections, and one to stratify the TB patients by drug resistance.

TB Classification Model

In this study, a novel deep learning network using a pre-trained CNN (CheXNet) encoder and a non-linear neuron-based MLP classifier (Self-MLP), both detailed below, is proposed for TB classification (Fig. 4).

Feature Extractor

An encoder of the pre-trained CNN model, CheXNet, was used to extract important features from the segmented chest X-rays. It is worth mentioning here that CheXNet is a variant of DenseNet (DenseNet121) that was trained on a large chest X-ray dataset, and the pre-trained model is publicly available. CheXNet performed exceptionally well in the CXR image classification task for COVID-19 as shown in our previous work [39]. In the DenseNet architecture, each layer within a Dense block is connected to every other layer. Feature maps within a Dense block have the same size, and the features are reused within the network. Such dense connections expedite the flow of information throughout the network. The useful features of the CXR images were extracted from the last layer of the CheXNet encoder, ‘AvgPool.’

The DenseNet encoder has three dense blocks that each have an equal number of layers. Before entering the first dense block, a convolution with 16 output channels is performed on the input images. For convolutional layers with kernel size 3 × 3, each side of the inputs is zero-padded by one pixel to keep the feature-map size fixed. Moreover, the encoder uses a 1 × 1 convolution followed by 2 × 2 average pooling as the transition layer between two contiguous dense blocks. At the end of the last dense block, a global average pooling is performed and then a Self-MLP classifier is attached. The feature-map sizes in the three dense blocks are 32 × 32, 16 × 16, and 8 × 8, respectively.

Principal Component Analysis-Based Feature Reduction

Principal component analysis (PCA) is used to reduce the dimensionality of the feature space extracted from the CheXNet encoder. PCA projects high-dimensional data into a new lower-dimensional representation with as little reconstruction error as feasible. Because all the principal components in the reduced set are orthogonal to one another, there is no redundant data. PCA was calculated with the use of whitening, which rescales the retained components to unit variance and can improve accuracy by making the data meet assumptions that some classifiers rely on.
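For reference, the standard textbook form of PCA with whitening can be written as follows. The symbols $\mu$, $C$, $U_m$, $\Lambda_m$, and $m$ are introduced here for illustration and do not appear in the original text; the number of retained components $m$ is not stated in this section. For a feature vector $x$ with training mean $\mu$ and sample covariance $C = U \Lambda U^{\top}$, the whitened projection is

$$z\;=\;\Lambda_{m}^{-1/2}\,U_{m}^{\top}\,(x\;-\;\mu)$$

where $U_m$ holds the $m$ eigenvectors with the largest eigenvalues and $\Lambda_m$ is the diagonal matrix of those eigenvalues. On the training data, the components of $z$ are uncorrelated and each has unit variance.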

Self-MLP

To overcome the linear nature of CNN, the Operational Neural Network (ONN)-based model was recently presented in [55]. ONN is a heterogeneous network that learns complicated patterns of any signal using a fixed set of non-linear operators and has demonstrated promising results in numerous applications such as image denoising and image restoration [66,67,68,69]. Self-organized Operational Neural Networks (Self-ONN) is a new variant of ONN. Instead of a fixed collection of operator libraries, Self-ONN learns the best set of operators throughout the training process. This results in a more robust model that can handle a wider range of scenarios and generalizes effectively in real-world scenarios. During the training phase, operational layers determine the best set of operators, which can be a combination of any conventional functions or unknown functions. The output ${x}_{k}^{l}$ at the $k$th neuron of the $l$th layer of any ONN can be expressed as follows in Eqs. (2) and (3):

$${x}_{k}^{l}\;=\;{b}_{k}^{l}\;+\;\sum_{i\;=\;1}^{{N}_{l\,-\,1}} {\Psi }_{ki}^{l}\left({w}_{ki}^{l},{y}_{i}^{l\,-\,1}\right)$$

(2)

where ${b}_{k}^{l}$ and ${w}_{ki}^{l}$ denote the bias and weights associated with that neuron and layer, ${y}_{i}^{l-1}$ represents the output of the $i$th neuron of the previous layer, ${N}_{l-1}$ stands for the number of neurons in the previous layer, and ${\Psi }_{ki}^{l}$ corresponds to the nodal operator of the neuron and layer. If ${\Psi }_{ki}^{l}$ is linear, then the equation simply corresponds to a conventional CNN. In ONN, the composite nodal operator $\Psi$ can be constructed using a set of standard functions as follows:

$$\Psi \left(\mathbf{w},y\right)\;=\;{w}_{1}\;{\text{sin}}\left({w}_{2}y\right)\;+\;{w}_{3}\;{\text{exp}}\left({w}_{4}y\right)\;+\;\cdots\;+\;{w}_{q}y$$

(3)

where $\mathbf{w}$ represents the q-dimensional array of parameters that is composed of the internal parameters and weights of the individual functions. Instead of a fixed set of operators, the composite nodal operator $\Psi$ can be constructed using a Taylor series approximation. The Taylor series approximation of a function $f(x)$ near the point $x=a$ is expressed by the following equation:

$$f(x)\;=\;f(a)\;+\;\frac{f'(a)}{1!}(x\;-\;a)\;+\;\frac{f''(a)}{2!}(x\;-\;a)^{2}\;+\;\frac{f'''(a)}{3!}(x\;-\;a)^{3}\;+\;\cdots$$

(4)

Equation (4) can be used to construct the nodal operator as follows:

$$\Psi (\mathbf{w},y)\;=\;{w}_{0}\;+\;{w}_{1}(y\;-\;a)\;+\;{w}_{2}(y\;-\;a)^{2}\;+\;\cdots\;+\;{w}_{q}(y\;-\;a)^{q}$$

(5)

where ${w}_{q}=\frac{{f}^{(q)}(a)}{q!}$ is the $q$th coefficient of the $q$th-order polynomial. In Self-ONN, the hyperbolic tangent (tanh) has been used as an activation function, which is bounded in the range [−1, 1]. So, for tanh, $a$ is equal to zero in Eq. (5).

Figure 4 illustrates the CheXNet-Self-MLP-based TB classifier that uses Self-MLP as a classifier after the CheXNet-based encoder. MLP layers can be implemented using convolutional layers by using kernels of the same size as the input. Thus, a single sliding window of the convolutional kernel will cover the full signal, retaining the fully connected nature of MLPs. Similarly, 1D operational layers can be used to implement Self-MLP layers, which were used in the implementation of this study.
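To make Eqs. (2) and (5) concrete, the sketch below computes the output of a single Self-MLP neuron in plain Python with $a = 0$. It is an illustration under stated assumptions, not the implementation used in this study: the function name `self_mlp_neuron` and its argument layout are hypothetical, the constant term $w_0$ of Eq. (5) is folded into the bias, and tanh is applied to the summed result.

```python
import math


def self_mlp_neuron(inputs, weights, bias):
    """Output of one Self-MLP neuron with a polynomial nodal operator (a = 0).

    inputs:  outputs y_i of the previous layer
    weights: weights[i][p - 1] multiplies y_i ** p, for p = 1..q
    bias:    scalar bias; absorbs the constant term w_0 of Eq. (5)
    """
    x = bias
    for y_i, w_i in zip(inputs, weights):
        for power, w in enumerate(w_i, start=1):
            x += w * y_i ** power    # Eq. (5) inside the sum of Eq. (2)
    return math.tanh(x)              # activation bounded in [-1, 1]


# q = 2: tanh(1.0 * 0.5 + 2.0 * 0.5 ** 2)
second_order = self_mlp_neuron([0.5], [[1.0, 2.0]], bias=0.0)

# q = 1 reduces to an ordinary MLP neuron: tanh(0.5 - 0.5)
first_order = self_mlp_neuron([0.5, -0.5], [[1.0], [1.0]], bias=0.0)
```

Setting q = 1 recovers the conventional linear neuron, which is why the operational layer can be seen as a generalization of a fully connected layer.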

Drug-Resistant TB Stratification Model

ML Classifier

To identify the patients with TB drug resistance, the CheXNet encoder was used to extract features as mentioned earlier, PCA was used to reduce the features obtained from the encoder, and then eight machine learning classifiers, including Support Vector Machine (SVM) [70], K-nearest neighbor (KNN) [71], XGBoost [72], Random Forest [73], Adaboost [74], linear discriminant analysis (LDA) [75], Gradient boosting [76], and Logistic regression [77], were used to classify a subset of TB patients into sensitive and drug-resistant TB (binary class problem) and XDR, MDR, and sensitive TB (3-class problem).

Stacking Model

The three best performing classifiers were chosen as base learner models (M₁, M₂, M₃) in the stacking architecture, and a meta learner classifier (M_f) was trained in the second phase, resulting in separate performance matrices based on the final prediction. Consider a single dataset A consisting of input vectors (${x}_{i}$) and their classification score (${y}_{i}$). At first, a set of base-level ML classifiers ${M}_{1},\dots ,{M}_{p}$ is trained on the dataset, and the predictions of these base learners are used to train the meta-level classifier ${M}_{f}$ [78,79,80], as illustrated in Fig. 5.
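The two-phase idea can be sketched in plain Python as follows. This is a toy illustration only: the base learners here are simple nearest-class-mean scorers on single features and the meta learner is a small logistic regression, standing in for the classifiers named above. All function names and the toy data are hypothetical, and for brevity the meta learner is trained on in-sample base predictions, whereas out-of-fold predictions are normally preferred to limit overfitting.

```python
import math


def fit_base(X, y, feature):
    """Toy base learner M_j: score from distances to the two class means of one feature."""
    mean = {c: sum(x[feature] for x, t in zip(X, y) if t == c) / y.count(c) for c in (0, 1)}

    def score(x):
        d0, d1 = abs(x[feature] - mean[0]), abs(x[feature] - mean[1])
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)  # near 1 close to the class-1 mean

    return score


def fit_meta(Z, y, lr=0.5, epochs=200):
    """Meta learner M_f: logistic regression trained by stochastic gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
            w = [wi - lr * (p - t) * zi for wi, zi in zip(w, z)]
            b -= lr * (p - t)
    return lambda z: int(sum(wi * zi for wi, zi in zip(w, z)) + b > 0)


def fit_stack(X, y):
    bases = [fit_base(X, y, f) for f in range(3)]   # phase 1: M_1, M_2, M_3
    Z = [[m(x) for m in bases] for x in X]          # base-learner outputs as meta-features
    meta = fit_meta(Z, y)                           # phase 2: M_f
    return lambda x: meta([m(x) for m in bases])


# Toy, clearly separable data with three features per sample.
X = [[0, 0, 0], [1, 1, 1], [9, 9, 9], [10, 10, 10]]
y = [0, 0, 1, 1]
predict = fit_stack(X, y)
```

The key design point is that the meta learner never sees the raw features, only the base learners' outputs, so it learns how much to trust each base model.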

Model Interpretability

Saliency map or class activation map techniques make deep learning models interpretable. It has become very important to understand why a CNN model works and to know the underlying reasoning in its decision-making process. This helps to make the model trustworthy, as the reason for a classification becomes evident to humans. SmoothGrad [81], Grad-CAM [82], Grad-CAM++ [83], and Score-CAM [84] are the common visualization techniques. Score-CAM was used in this study as it outperformed other techniques in recently reported medical image classification problems. The heat map created by the Score-CAM technique shows the regions of the image that contribute most to the model's decision. This visualization allows users to trust the model's decision if the model is seen to rely on the relevant area of the image, rather than using the model as a black box without any clue as to where its decision comes from.
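The text does not describe the mechanics of Score-CAM, so the sketch below follows the commonly described formulation: each activation map is normalized and used to mask the input, the masked input is scored for the target class, the scores are turned into weights with a softmax, and the heat map is the ReLU of the weighted sum of activation maps. It is a plain-Python illustration under simplifying assumptions: the activation maps are taken to be already upsampled to the image size, and `class_score` is a hypothetical callable standing in for the trained model's target-class output.

```python
import math


def normalize(act):
    """Min-max scale a 2-D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in act for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0] * len(row) for row in act]
    return [[(v - lo) / (hi - lo) for v in row] for row in act]


def score_cam(image, activation_maps, class_score):
    """Return a heat map with the same shape as the image."""
    scores = []
    for act in activation_maps:
        mask = normalize(act)
        masked = [[p * h for p, h in zip(img_row, mask_row)]
                  for img_row, mask_row in zip(image, mask)]
        scores.append(class_score(masked))      # target-class score of the masked input

    # Softmax over the scores gives positive weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]

    # Weighted sum of the activation maps, followed by ReLU.
    return [[max(0.0, sum(w * a[r][c] for w, a in zip(weights, activation_maps)))
             for c in range(len(image[0]))]
            for r in range(len(image))]


# Toy example: the stand-in scorer only responds to the top-left pixel.
image = [[1.0, 1.0], [1.0, 1.0]]
maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
heat = score_cam(image, maps, lambda img: 5.0 * img[0][0])
```

In the toy example the map that highlights the top-left pixel receives almost all of the weight, so the heat map concentrates there. Subtracting a baseline score before the softmax, as some descriptions do, would not change the weights because the softmax is invariant to a constant shift.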