A deep learning system that generates quantitative CT reports for diagnosing pulmonary Tuberculosis

The purpose of this study was to establish and validate a new deep learning system that generates quantitative computed tomography (CT) reports for the diagnosis of pulmonary tuberculosis (PTB) in clinical practice. 501 CT imaging datasets were collected from 223 patients with active PTB, while another 501 datasets, which served as negative samples, were collected from a healthy population. All the PTB datasets were labeled and classified manually by professional radiologists. Then, three state-of-the-art 3D convolutional neural network (CNN) models were trained and evaluated for the inspection of PTB CT images. The best model was selected to annotate the spatial location of lesions and classify them into miliary, infiltrative, caseous, tuberculoma, and cavitary types. The Noisy-Or Bayesian function was used to generate an overall infection probability for each case. The results showed that the recall and precision rates of detection, from the perspective of a single PTB lesion region, were 85.9% and 89.2%, respectively. The overall recall and precision rates of detection, from the perspective of one PTB case, were 98.7% and 93.7%, respectively. Moreover, the precision rate of lesion type classification was 90.9%. Finally, a quantitative diagnostic report of PTB was generated, including the infection probability, the locations of the lesions, and their types. This new method may serve as an effective reference for decision making by clinical doctors.


Introduction
Pulmonary tuberculosis (PTB) is one of the leading respiratory infectious diseases monitored worldwide, as reported by the World Health Organization. 1 At present, China is still one of the 22 countries with a high PTB burden worldwide. The number of patients with PTB in China ranks third in the world, just after India and Indonesia. 2,3 In China, the number of reported PTB cases ranks second among infectious diseases, after viral hepatitis. 4 Therefore, correct detection and diagnosis of PTB are crucial.
Numerous companies have released intelligent diagnostic systems for lung nodule detection, such as the Dr. Watson system from IBM. At the same time, some well-known academic institutions and organizations have launched competitions for lung nodule detection on computed tomography (CT) images. Of these, the most famous were the Lung Nodule Analysis 2016 (LUNA16) 12 challenge and the Data Science Bowl 2017 (DSB2017), which was held by the notable data science website Kaggle. These open-source datasets have incubated a series of excellent detection and segmentation algorithms. However, few recent studies have explored the detection and classification of PTB infection. Progress in this field has been relatively slow compared with the lung nodule domain because fewer open-source CT image datasets of PTB are available.
Moreover, the much wider distribution and different characteristics of PTB lesion regions compared with those of lung nodules have also made them difficult to investigate. Despite the morphological differences between PTB lesions and pulmonary nodules, some of the open-source intelligent detection methods for pulmonary nodules still have considerable reference value for PTB detection, for example, in data preprocessing and image segmentation.
They found that the detection sensitivity of this method was more than 85.4%.
Ciompi et al. 17 constructed a labeling system to automatically classify the morphological characteristics of pulmonary nodules into solid, sub-solid, calcified, and non-solid lesions. Wang et al. 18 proposed a multi-view CNN that integrated several CNN branches with a fully connected layer, which discriminated various pulmonary nodules more effectively. Zhu et al. 19 adopted a 3D Faster R-CNN to detect pulmonary nodules. They used a 3D Dual Path Network to classify the detected pulmonary nodules as benign or malignant and achieved results comparable to doctors' diagnoses. Julian de Wit et al., 20 who won second place in the DSB2017 competition, constructed a pulmonary nodule detector based on a 3D CNN to predict the possibility of cancer.
In this study, three fine-tuned 3D CNN models were evaluated. The best model was used to detect and classify the PTB lesion regions based on the CT image datasets.
Moreover, the spatial location of each lesion, the confidence (infection probability) of each single infection, the presence of calcifications, the classification of each lesion type, the overall infection probability, and the effective volume of the left and right lungs were derived digitally from the output of the AI network model. These reports provided a quantitative evaluation of each single infection region and of the whole PTB case, thus greatly assisting clinical doctors in making more accurate diagnostic decisions.

Process
Figure 1 shows the whole process of PTB diagnostic report generation in this study.
First, the CT images were preprocessed to extract the effective lung regions. Second, a 3D CNN model was used to segment and classify the lesion regions simultaneously, and the overall infection probability was then calculated using the Noisy-Or Bayesian function. Finally, a quantitative diagnostic report, together with the corresponding labeled CT images, was exported for reference.

Dataset introduction
Five types of active PTB lesions were defined according to the Expert Consensus of the Chinese Society of Radiology: 21 miliary, infiltrative, caseous, tuberculoma, and cavitary.

Dataset preprocessing
To facilitate the detection of PTB lesions, the CT images were resampled so that each voxel measured 1 × 1 × 1 mm³ in real space, using nearest-neighbor interpolation. The resampled CT sets were then preprocessed to generate masks of the effective lung region, so as to eliminate unrelated regions before training the deep learning model.
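This resampling step can be sketched as follows (a minimal pure-numpy illustration; the function name and interface are ours, not the paper's):

```python
import numpy as np

def resample_to_isotropic(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Nearest-neighbour resampling so that every voxel measures
    1 x 1 x 1 mm in real space; `spacing` is the source (z, y, x)
    voxel size in mm."""
    spacing = np.asarray(spacing, dtype=float)
    new_spacing = np.asarray(new_spacing, dtype=float)
    # target grid size in the new 1 mm coordinate system
    new_shape = np.round(np.array(volume.shape) * spacing / new_spacing).astype(int)
    # for every target voxel, pick the nearest source voxel on each axis
    idx = [np.minimum(np.round(np.arange(n) * new_spacing[a] / spacing[a]).astype(int),
                      volume.shape[a] - 1)
           for a, n in enumerate(new_shape)]
    return volume[np.ix_(idx[0], idx[1], idx[2])]
```

For example, a scan with 2.5 mm slice thickness and 1 mm in-plane pixels would have its slice axis expanded 2.5-fold.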
1. As the digital gray-scale image has pixel values ranging over [0, 255], the resampled CT raw data were converted from Hounsfield Units (HU) to this interval accordingly. The HU data matrix was clipped to [-1200, 600] (any value beyond this range was set to -1200 or 600 accordingly) and then linearly normalized to [0, 255] to fit the digital image format, as shown in Figure 3a.
2. A fixed threshold (-600 HU) was used to binarize the resampled CT images, and bones and soft tissues such as blood vessels and muscles, which have high HU values, were filtered out (Fig. 3b).
3. All connected components smaller than 0.3 cm² and having eccentricity larger than 0.99 were removed to eliminate high-luminance radial imaging noise. Components (usually clothes and accessories rather than the human body) located more than 6.2 cm from the center of the CT image were also removed. Furthermore, only components with a volume between 450 and 7500 cm³ were retained, as shown in Figure 3c. This range was expanded compared with the 680 to 7500 cm³ reported by lung nodule detection studies, 22 because nodule detection usually focuses on small regions, whereas lesions can be more massive in PTB cases.
4. The extracted mask in step 3 was eroded into two sectors and then dilated back to the original size to remove small black holes (Fig. 3d).
5. Convex hull operation was performed on the effective region, which was extracted from the previous step, to include lesion regions attached to the outer wall of the lung (Fig. 3e).
6. The matrix data of the images in step 1 were multiplied by the masks exported from step 5 to obtain the final effective pulmonary region for further processing. The space outside the mask was filled with the value 170, which is equivalent to 0 HU when converted back (Fig. 3f).
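Steps 1 and 6 above can be sketched as follows (an illustrative numpy fragment; the helper names are ours). Note that 0 HU indeed maps to 170 under this normalization: (0 + 1200) / 1800 × 255 = 170.

```python
import numpy as np

def normalize_hu(hu):
    """Step 1: clip HU values to [-1200, 600] and linearly map to [0, 255].
    Rounding (rather than truncation) keeps 0 HU exactly at 170."""
    hu = np.clip(np.asarray(hu, dtype=float), -1200, 600)
    return np.round((hu + 1200) / 1800.0 * 255.0).astype(np.uint8)

def apply_lung_mask(image_u8, mask):
    """Step 6: keep only the effective lung region; fill everything
    outside the mask with 170 (equivalent to 0 HU)."""
    return np.where(mask, image_u8, np.uint8(170))
```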

PTB data process and augment
To reduce the influence of the uneven distribution of the different PTB lesion types in the present dataset, types with fewer specimens were expanded correspondingly. The sampling possibility of miliary, caseous, and tuberculoma cases was expanded 10 times, and that of cavitary cases 5 times, during training to balance the specimen numbers against the infiltrative type, which was the dominant type. At the same time, generic data expansion mechanisms such as random clipping and left-right flipping were performed on the specimens to increase the number of training samples and prevent overfitting. 23
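The oversampling and generic augmentation described above can be sketched as follows (an illustrative fragment; the sample representation and function names are assumptions, not the paper's code):

```python
import random
import numpy as np

# Expansion factors balancing lesion types against the dominant
# infiltrative type (per the paper: x10 for miliary/caseous/tuberculoma,
# x5 for cavitary).
EXPANSION = {"miliary": 10, "caseous": 10, "tuberculoma": 10,
             "cavitary": 5, "infiltrative": 1}

def oversample(samples):
    """Repeat each (patch, lesion_type) sample by its expansion factor."""
    out = []
    for patch, lesion_type in samples:
        out.extend([(patch, lesion_type)] * EXPANSION[lesion_type])
    return out

def augment(patch, rng=random):
    """Generic augmentation: random left-right flip of a (z, y, x) patch."""
    if rng.random() < 0.5:
        patch = patch[:, :, ::-1]  # flip along the x (left-right) axis
    return patch
```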

Deep learning model for segmentation and classification
Network structure
The backbone networks were built on 3D U-Net 24,25 and V-Net 26 architectures combined with a region proposal network (RPN) head. The confidence loss L_conf is a cross-entropy loss measuring whether a proposal is a valid target:

L_conf = -[p log(p̂) + (1 - p) log(1 - p̂)]

where p is the ground truth and p̂ is the predicted value. The ground-truth regression targets of a bounding box were (t_x, t_y, t_z, t_d), and the corresponding predictions were (t̂_x, t̂_y, t̂_z, t̂_d). The total location regression loss is then

L_reg = Σ_{j ∈ {x, y, z, d}} S(t_j, t̂_j)

where S is the smooth L1 function. L_class is the cross-entropy loss over the five-class dimension:

L_class = -Σ_i y_i log(ŷ_i)

where y_i is the ground truth label and ŷ_i is the predicted label.
Intersection over Union (IoU), which is equal to the overlapped volume of the bounding boxes of two objects divided by their united volume, is an evaluation metric used to measure the accuracy of an object detector on a particular target and to define the tags of each anchor box in the present study. An anchor box with IoU larger than 0.5 was treated as a positive sample (p = 1), while one with IoU smaller than 0.02 was regarded as a negative sample (p = 0). Others were neglected during training and validation.
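For the axis-aligned cubic bounding boxes used here, IoU can be computed as follows (an illustrative sketch; boxes are assumed to be given as (x, y, z, side)):

```python
def iou_3d(box_a, box_b):
    """IoU of two axis-aligned cubes given as (x, y, z, side):
    intersection volume divided by union volume."""
    inter = 1.0
    for i in range(3):
        lo = max(box_a[i] - box_a[3] / 2, box_b[i] - box_b[3] / 2)
        hi = min(box_a[i] + box_a[3] / 2, box_b[i] + box_b[3] / 2)
        if hi <= lo:            # no overlap on this axis
            return 0.0
        inter *= hi - lo
    union = box_a[3] ** 3 + box_b[3] ** 3 - inter
    return inter / union
```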
Then, the total loss function (L_total) is defined as

L_total = L_conf + p (L_reg + λ L_class)

where p equals 1 when the box is a positive sample and 0 when it is a negative sample, so that the regression and classification terms contribute only for positive anchors; λ is set to 0.5 following the setting of Yolo, 29 a well-tuned deep learning algorithm for 2D object segmentation and identification.
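The total loss described above can be sketched in plain numpy as follows (an illustrative fragment operating on a single anchor; a real implementation would be vectorized inside a deep learning framework):

```python
import numpy as np

LAMBDA = 0.5  # weight taken from the Yolo setting cited in the paper

def smooth_l1(t, t_hat):
    """Smooth L1 summed over the regression offsets {x, y, z, d}."""
    d = np.abs(np.asarray(t, dtype=float) - np.asarray(t_hat, dtype=float))
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def total_loss(p, p_hat, t, t_hat, y, y_hat):
    """L_total = L_conf + p * (L_reg + lambda * L_class);
    p is 1 for a positive anchor and 0 for a negative one."""
    eps = 1e-7  # numerical guard for log
    l_conf = -(p * np.log(p_hat + eps) + (1 - p) * np.log(1 - p_hat + eps))
    l_reg = smooth_l1(t, t_hat)
    l_class = -np.sum(np.asarray(y, dtype=float) * np.log(np.asarray(y_hat) + eps))
    return l_conf + p * (l_reg + LAMBDA * l_class)
```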

Patch-based input for training
The 3D CT image patches were cropped from the lung images and then fed into the network individually. They were randomly selected based on the following rules. First, 70% of the patches contained at least one ground-truth PTB lesion, meaning that either the center point of the lesion, or a margin of more than 12 mm around the region in each dimension, was included. The remaining 30% were cropped from healthy areas to ensure coverage of enough negative samples.
The clipped 3D CT image patches were cropped from the lung scans to save GPU memory and then fed into the network individually. The size of each patch was 128 × 128 × 128 × 1 (height × length × width × channel). The output of the last convolution layer was resized to 32 × 32 × 32 × 3 × 10 in the transpose layer, where the last two dimensions correspond to the anchors of the RPN network and to the location and classification outputs, respectively. Three anchor scales with side lengths of 10, 40, and 80 mm were used; hence, the output layer had 32 × 32 × 32 × 3 anchor bounding boxes. The 10 regression dimensions were {p_i, x_i, y_i, z_i, d_i, t_0, t_1, t_2, t_3, t_4}, where p_i is the confidence; x_i, y_i, and z_i denote the center of the candidate; d_i is the side length of the region; and t_0 to t_4 are the possibilities of the 5 PTB types individually.
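Decoding one anchor's 10-dimensional prediction vector into an absolute detection can be sketched as follows. The offset-to-absolute mapping (grid-cell center plus an offset scaled by the anchor side, with an exponential for the size) is a common RPN parameterization assumed here for illustration; the paper does not spell out its exact form.

```python
import numpy as np

ANCHOR_SIDES_MM = (10, 40, 80)   # the three anchor scales
PATCH = 128                      # input patch side (1 mm voxels)
GRID = 32                        # output grid: 32 x 32 x 32
STRIDE = PATCH // GRID           # 4 mm per grid cell

def decode_prediction(output, iz, iy, ix, a):
    """Unpack the 10-dim vector {p, x, y, z, d, t0..t4} of one anchor
    from a (32, 32, 32, 3, 10) output tensor into absolute mm units."""
    p, x, y, z, d, *types = output[iz, iy, ix, a]
    side = ANCHOR_SIDES_MM[a]
    cell_center = (np.array([ix, iy, iz]) + 0.5) * STRIDE
    cx, cy, cz = cell_center + np.array([x, y, z]) * side
    diameter = np.exp(d) * side  # assumed log-scale size offset
    return {"confidence": p, "center": (cx, cy, cz),
            "side_mm": diameter, "type_scores": types}
```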

Transfer learning
To accelerate the convergence of the PTB analysis models, transfer learning was utilized: models were first trained for lung nodule detection using two open-source pulmonary CT datasets, LUNA16 and DSB2017, which contain 888 and 2101 lung CT nodule analysis cases, respectively. The outputs of the nodule detection models included the coordinates of the center point, the side length, and the confidence of the nodule region. These pre-trained nodule detection models were then used to initialize the network for the PTB study.
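The initialization scheme can be sketched as follows (parameters are modelled as a flat name-to-value dictionary purely for illustration; the actual study used full network checkpoints):

```python
def init_from_pretrained(ptb_params, nodule_params, output_layer_keys):
    """Initialise the PTB network from the pre-trained nodule detector.

    Every layer is copied verbatim except the output layer, which keeps
    its randomly (normally) initialised values, mirroring the transfer
    learning setup described in the text."""
    for name, value in nodule_params.items():
        if name not in output_layer_keys:
            ptb_params[name] = value
    return ptb_params
```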

PTB training
In the subsequent PTB training stage, only the output layer and the loss function were modified to incorporate the lesion classification task, while the rest of the network structure remained unchanged.
At the beginning of training, the PTB analysis network was initialized with the parameters of the pre-trained nodule detection model (as the two had exactly the same network structure) except for the output layer, which was initialized randomly from a normal distribution.

Performance evaluation
A non-maximum suppression algorithm 30 was first performed on the detected PTB lesion regions to remove repeated candidate bounding boxes. If the central coordinate of a remaining box fell within the radius of a human-annotated lesion region, the result was marked as a true positive (TP); otherwise, it was a false positive (FP). A false negative (FN) indicated that no predicted bounding box corresponded to a human-annotated region, measuring the number of lesions missed by the model. Accordingly, Recall, Precision, and the more balanced F1_score were used to measure the performance of the deep learning model:

Recall = TP / (TP + FN), Precision = TP / (TP + FP), F1_score = 2 × Precision × Recall / (Precision + Recall)

The test dataset consisted of 150 cases, including 75 cases with PTB lesions and 75 normal cases from 75 healthy people, with 412 valid PTB lesion regions.
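The three metrics follow directly from the TP/FP/FN counts:

```python
def detection_metrics(tp, fp, fn):
    """Recall, Precision, and the more balanced F1_score
    from true positive, false positive and false negative counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```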

Quantitative diagnostic report
The exported CT images were converted back to their original size for ease of review.
The final quantitative diagnostic report, based on the detection and classification of PTB information, included the overall infection probability, effective volume of the left and right lungs, classification of lesion type, spatial location of infection, and presence of calcifications.The original CT images with corresponding annotated lesion regions were also exported.

Overall infection probability of the left and right lungs
According to the confidence level of each detected lesion, the overall infection level (P) of each of the left and right lungs was calculated using the probability formula of the Noisy-Or Bayesian function 31 as follows:

P = 1 - Π_i (1 - P_i)

where P_i represents the infection possibility of the ith lesion in that single lung.
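The Noisy-Or combination is straightforward to implement:

```python
def overall_infection_probability(lesion_probs):
    """Noisy-Or combination: P = 1 - prod_i (1 - P_i), where P_i is the
    infection possibility of the i-th detected lesion in one lung."""
    remaining = 1.0
    for p_i in lesion_probs:
        remaining *= (1.0 - p_i)
    return 1.0 - remaining
```

A lung with no detected lesion yields P = 0, and any single high-confidence lesion dominates the overall probability.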

Effective volume of the left and right lungs
The effective volume of the lungs is of reference value for doctors in medical diagnosis. 32,33 The effective volume of a single lung was calculated by extracting the effective region in the original CT images according to the HU value (threshold = -600). After removing blood vessels, soft tissues, and lesion regions, the volume V_i contributed by the ith CT slice was calculated as

V_i = S_i × h

where S_i is the effective lung area of the ith slice and h is the physical thickness between two adjacent slices. The total volume (in real physical size) of the effective lung was then measured as

V_total = Σ_i V_i
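The slice-wise accumulation can be sketched as follows (assuming 1 × 1 mm in-plane pixels after resampling; the -600 HU threshold is the one stated above):

```python
import numpy as np

def effective_lung_volume(ct_hu, lung_mask, slice_thickness_mm, pixel_area_mm2=1.0):
    """Effective lung volume: voxels inside the lung mask with HU below
    the -600 threshold (excluding vessels, soft tissue and lesions),
    accumulated slice by slice as V_i = S_i * h."""
    effective = lung_mask & (ct_hu < -600)
    volume = 0.0
    for lung_slice in effective:                  # iterate over axial slices
        s_i = lung_slice.sum() * pixel_area_mm2   # effective area of slice i
        volume += s_i * slice_thickness_mm        # V_i = S_i * h
    return volume
```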

Spatial location of infection
As the report describes a 3D volume, the CT image slice number was used instead of the z coordinate.
The parameters x, y, and d (in pixel size) represent the center point and the side length of a single lesion region. The origin of the x and y coordinates is at the lower-left corner of each CT image.

Recognition of the presence of calcification
According to clinical experience, an HU value of more than 120 within a nodule, over an effective region of at least 3 pixels, indicated the presence of calcification.
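This rule can be expressed as a simple check (the > 120 HU and ≥ 3 pixel constants are those stated above):

```python
import numpy as np

def has_calcification(lesion_hu, hu_threshold=120, min_pixels=3):
    """Calcification is present when at least `min_pixels` pixels of the
    lesion region exceed `hu_threshold` HU."""
    n_bright = int((np.asarray(lesion_hu) > hu_threshold).sum())
    return n_bright >= min_pixels
```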

CT image annotations
The location of lesions was annotated as the bounding box on the CT slices corresponding to the output of the deep learning model, together with their infection probabilities, types, and presence of calcification.Only the image slice with the center of the lesion was labeled to avoid confusion.

Evaluation platform
An Intel i7-8700k CPU together with NVIDIA GPU GeForce GTX 1080 was used as the testing server.

Training curve
All three network models were trained both with and without pre-trained models (shown in Table 1). The results indicated that, without pre-trained models, the models did not converge at all even after a few hundred epochs. Therefore, the pre-trained models were used for the subsequent steps of this study. As the number of training epochs increased beyond 350, the loss value no longer decreased or increased noticeably, suggesting that the models had converged to a relatively optimal state without distinct overfitting.

Model performance on test dataset
The performance of all three 3D CNN models was evaluated on the test set, which consisted of 150 cases (75 from the PTB group and 75 normal cases from the healthy group) with 412 valid PTB lesion regions. Detection accuracy was evaluated first, as classification accuracy was calculated only for true positive regions.
Free-Response Receiver Operating Characteristic (FROC) analysis was used to evaluate the performance of the different models on the test dataset, as shown in Figure 8. To facilitate direct quantitative comparisons among models, the FROC system score was computed as the average recall at seven predefined false-positives-per-scan rates (1/8, 1/4, 1/2, 1, 2, 4, and 8).
The corresponding FROC system scores for 3DUNET-RPN, VNET-RPN, and VNET-IR-RPN were 0.875, 0.901, and 0.917, respectively, showing that VNET-IR-RPN had the best average performance. This result also highlights the effectiveness and efficiency of inception-resnet blocks in this 3D inspection architecture. Therefore, the VNET-IR-RPN model was used for the rest of this study. To maximize the F1_score, the threshold (a region was classified as a lesion if its predicted probability exceeded the threshold) was set to 0.38.

Example of the diagnostic report
An example of an exported diagnosis report, consisting of a summarized description and a series of images with the lesions labeled accordingly, is shown in Table 2 and the accompanying annotated images.

Discussion
Currently, most CAD systems for PTB rely on chest x-ray images. CAD4TB is an AI-based CAD system for detecting PTB 34 that classifies chest x-ray images into "No TB", "Possible TB", and "Likely TB" scenarios. Lakhani et al. 35 from the Thomas Jefferson University Hospital used deep CNNs to classify chest x-ray images as showing PTB manifestations or normal, with a precision rate of 97.3%. Such CAD systems based on chest x-ray images can only perform binary or coarse multi-class decisions on the presence of TB and may ignore small-scale lesions.
CT is a tomographic imaging method characterized by high resolution and non-invasiveness. A single scan usually generates dozens to hundreds of images presenting a complete 3D pulmonary view. Compared with the planar imaging of chest x-ray, CT offers advantages such as higher spatial precision, clearer display of organs and lesion structures, and three-dimensional visualization.
However, it is difficult for radiologists to accurately evaluate the severity of a single lesion, or of a whole case, on a 3D scale. Inconsistent decisions can be made by one doctor at different times, or by different doctors, owing to subjective judgment.
In this study, three state-of-the-art 3D deep learning models were adopted to analyze lung CT images, exploiting the characteristics of 3D CT imaging to effectively detect various lesion regions and types. Among them, the V-Net backbone with inception-resnet blocks achieved the best performance in both detection and classification accuracy. The exported quantitative report, with the overall infection probability, calcification information, effective lung volume, lesions with spatial coordinates, and corresponding labeled images, may serve as an effective reference for doctors' decision making.
However, this study had several limitations. First, besides PTB, many other signs of intrathoracic TB exist, including lesions in the pleural cavity, pericardium, bones, and lymph nodes. This study focused only on the five typical types of pulmonary lesions and ignored the other signs. Second, many pulmonary conditions other than PTB exist, such as infectious diseases (bacterial, fungal, viral, parasitic, and so on) and non-infectious diseases (tumors, vasculitis, and so on). Due to the lack of relevant training samples, other pulmonary lesions could not be correctly identified in this study and were misjudged as one type of PTB. Samples of pulmonary lesions other than TB should be added for effective discrimination in the future. Third, the CT samples in this study were collected from inpatient PTB cases with relatively massive lesion regions, so the current model might be less sensitive to trivial PTB lesions. Moreover, the proposed model might misjudge some lesions because of its false-positive rate; therefore, doctors still need to review the full CT scan to confirm the results.
Future investigations can improve on the following aspects. First, a fixed-threshold method was used to extract the effective lung region masks in this study. Given the wide distribution and varied types of PTB lesion regions, more effective segmentation methods could be used in data preprocessing; for example, a better pulmonary mask could be obtained by extracting the lung contour with a deep learning regression method. Second, during the complete TB treatment cycle of one patient, clinical doctors are more concerned about changes in the PTB lesions. Patients therefore need to be scanned several times, and comparisons should be made before, during, and after therapy to assess the treatment effect. An artificial intelligence system could be used in the future to analyze all CT cases of one patient along the time sequence, with a quantitative comparison across the whole PTB treatment.

Figure 1. Process flow chart. The CT image dataset was first preprocessed to extract the effective lung regions; the 3D CNN model then segmented and classified the lesion regions, the overall infection probability was calculated with the Noisy-Or Bayesian function, and a quantitative diagnostic report with labeled CT images was exported.

Figure 3. Image data preprocessing. (a) Normalized resampled CT image.

Figure 4. 3DUNET-RPN network structure. The ground truth bounding box of a PTB lesion is denoted by (G_x, G_y, G_z, G_d) and the bounding box of an anchor by (A_x, A_y, A_z, A_d), where the first three elements stand for the center point and the last for the side length. The regression labels of the bounding box include the regression of the center point (d_x, d_y, d_z) and of the side length d_d.

Figure 8. The FROC curves of the different models.

Table 1. Training curves of the loss value for each CNN model. The loss refers to the total loss function. The detection and classification accuracy values were based on the training set; classification accuracy was calculated only for true positive regions. None of the three models converged without pre-trained models. In the annotated images, the digit 5 at the bottom denotes type 5 (cavitary) PTB, and "cal 2" on the left denotes the presence of calcification covering 2 pixels (usually, at least 3 pixels indicated the effective presence of calcification). From the perspective of a whole PTB case, 79 cases (74 from the PTB group and 5 from the healthy group) were detected as having at least one lesion.

Table 2. Example of a diagnostic report. The CT image slice number was used instead of z. The parameters x, y, and d (in pixel size) represent the center point and the side length of the lesion region. Cal., presence of calcification; IP, infection probability.