A. Image datasets
Two datasets were retrospectively collected. Dataset 1 consisted of 120 chest CT scans and was used to develop, train, and test deep learning algorithms to segment the lung boundaries and main lung vessels. The scans covered a variety of lung diseases, including atelectasis (n = 47), interstitial lung disease (ILD) (n = 31), tuberculosis (n = 13), pneumonia (n = 17), and others (n = 12; five CT scans with emphysema and seven CT scans that were negative for lung disease based on visual interpretation by a thoracic radiologist). Dataset 1 was randomly split into three groups: (1) training set (n = 80), (2) internal validation set (n = 20), and (3) independent test set (n = 20). These CT scans were collected from various sources and acquired using different protocols (e.g., manufacturers, radiation dose, and slice thickness).
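The random 80/20/20 partition of Dataset 1 can be illustrated with a minimal sketch. The scan identifiers, the function name `split_dataset`, and the fixed seed are assumptions made for illustration; the text does not state how the randomization was implemented.

```python
import random


def split_dataset(scan_ids, n_train=80, n_val=20, n_test=20, seed=0):
    """Randomly partition scan IDs into disjoint training, validation, and test sets."""
    if n_train + n_val + n_test != len(scan_ids):
        raise ValueError("split sizes must sum to the number of scans")
    shuffled = list(scan_ids)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers standing in for the 120 scans of Dataset 1.
scan_ids = [f"scan_{i:03d}" for i in range(120)]
train_ids, val_ids, test_ids = split_dataset(scan_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```

Fixing the seed keeps the test set identical across reruns, so that the independent test scans never leak into training.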
Dataset 2 consisted of 72 serial chest CT scans from 24 subjects with a confirmed COVID-19 diagnosis and was used to develop and test the algorithm to detect and quantify the pneumonic regions. Each subject had at least three consecutive CT scans: (1) T0, the baseline CT scan; (2) T1, the first follow-up scan; and (3) T2, the second follow-up scan, which were performed at 3.4 ± 1.8, 9.7 ± 1.9, and 15.8 ± 3.6 days after symptom onset, respectively. Only the first three consecutive CT scans were included in this study. The cohort represents all the subjects identified with three consecutive CT scans, and no other criteria were used to exclude subjects. All subjects had close contact with individuals from Wuhan and were later confirmed to have COVID-19 by RT-PCR. The chest CT exams were performed on a 64-row spiral CT scanner (Siemens) without radiopaque contrast, with the participants in a supine position and holding their breath. The scan parameters were a tube voltage of 120 kVp, tube current modulation of 100 mA, and a spiral pitch factor of 1. The image slice thickness ranged from 1.0 to 2.0 mm. Subject ages ranged from 15 to 74 years with a mean of 44.8 ± 15.6 years, and 13 of the 24 subjects were male (Table 1). No comorbidities were reported or obvious in the medical records of the Dataset 2 cohort. The CT scans from four subjects were used to develop the algorithm (training set), and the CT scans from the remaining 20 subjects were used to independently test the algorithm (test set). All the CT scans from each subject (i.e., T0, T1, and T2) were used in the development and testing process.
Table 1 Demographics of the COVID-19 subjects (n = 24)
Protected health information was removed from all data, and the data were re-identified with unique study IDs. This study was approved by both the Ethics Committee of the First Affiliated Hospital of Xi'an Jiaotong University (XJTU1AF2020LSK-012) and the University of Pittsburgh Institutional Review Board (IRB) (# STUDY20020171).
B. The computerized scheme
There are four primary components to our approach (Fig. 1): (1) automated segmentation of the lung boundary and major vessels, (2) elastic registration of the CT scans acquired at two time points, (3) computerized identification of the pneumonic regions, and (4) assessment of disease progression. The supplemental PowerPoint file demonstrates the 3-dimensional visualization of the lung, vascular, and pneumonia segmentations (slide 2) as well as the heatmap visualization of disease progression (slides 3 and 4).
Automated lung segmentation
A deep learning approach based on the U-Net framework [16,17,18] was developed to segment the lung boundary automatically, including cases in which pneumonia or consolidation is adjacent to the chest wall. Deep learning approaches are well known to be data-hungry. In the 120 CT scans in Dataset 1 used to develop the lung and vessel segmentation algorithm, the lung boundaries were delineated and other types of lung disease were labeled by an experienced thoracic radiologist (D.P.). Our computational geometric approach [19] for identifying the intrapulmonary vessels often failed to identify the vessels near the hilum because of the entanglement of the arteries and veins. Therefore, the U-Net framework was used to identify the main extrapulmonary vessels and the vessels near the hilum. To train the U-Net framework, the CT images were converted to an isotropic format, and 3D patches with a size of 96 × 96 × 96 mm were used. The Adam optimizer was used with an initial learning rate of 0.001, a batch size of 2, and a maximum of 100 epochs. The voxel-wise cross-entropy loss was minimized during optimization, and the model with the smallest validation loss was saved as the final inference model.
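For reference, a common form of the voxel-wise cross-entropy loss is written out here. The notation is an assumption for illustration: N is the number of voxels in a patch, C is the number of classes, y_{i,c} is 1 if voxel i is labeled as class c and 0 otherwise, and p_{i,c} is the predicted probability of class c at voxel i. Averaging over voxels is one conventional normalization and is not a detail stated in the text.

$$ \mathcal{L}_{\mathrm{CE}}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C}y_{i,c}\,\log p_{i,c} $$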
Elastic lung registration
Our previously developed bidirectional elastic registration algorithm [20] was used to register two CT scans acquired at different time points. The registration procedure produced a deformation field, by which we could elastically transform the CT images from one CT scan to the other. We computed the intensity of the deformed voxels by linear (trilinear) interpolation over the eight neighboring voxels of the new locations in the initial CT images.
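The interpolation step can be sketched as follows. This is a minimal illustration of trilinear interpolation over the eight neighboring voxels, not the registration code of [20]; the nested-list volume layout `volume[z][y][x]`, the function name, and the clamping at the volume border are assumptions.

```python
import math


def trilinear_interpolate(volume, z, y, x):
    """Sample volume[z][y][x] at a fractional position from its eight neighbors."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Clamp the position so that all eight neighbors lie inside the volume.
    z = min(max(float(z), 0.0), nz - 1.0)
    y = min(max(float(y), 0.0), ny - 1.0)
    x = min(max(float(x), 0.0), nx - 1.0)
    z0, y0, x0 = int(math.floor(z)), int(math.floor(y)), int(math.floor(x))
    z1, y1, x1 = min(z0 + 1, nz - 1), min(y0 + 1, ny - 1), min(x0 + 1, nx - 1)
    dz, dy, dx = z - z0, y - y0, x - x0
    value = 0.0
    # Each neighbor is weighted by the product of its per-axis linear weights.
    for zi, wz in ((z0, 1.0 - dz), (z1, dz)):
        for yi, wy in ((y0, 1.0 - dy), (y1, dy)):
            for xi, wx in ((x0, 1.0 - dx), (x1, dx)):
                value += wz * wy * wx * volume[zi][yi][xi]
    return value


# Toy 2 x 2 x 2 volume whose intensity is linear in the coordinates.
toy = [[[4 * z + 2 * y + x for x in range(2)] for y in range(2)] for z in range(2)]
print(trilinear_interpolate(toy, 0.5, 0.5, 0.5))
```

In a warping loop, the deformation field would supply the fractional position `(z, y, x)` in the initial scan for each voxel of the deformed image.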
Automated detection of COVID-19 disease
Pneumonia depicted on CT scans typically has a higher density than the lung parenchyma, but the density of pneumonia can vary widely. On chest CT scans, the lung vessels, fissures, and airway walls also have higher HU values than the surrounding parenchyma. The pulmonary fissures and the airway walls are small relative to the parenchyma and can largely be ignored or easily filtered from the image. The vessels are larger; thus, the intrapulmonary vessels and the extrapulmonary vessels in the mediastinum were segmented (or filtered) and excluded from the images during the detection of the diseased regions, specifically pneumonia. The average density of the images in the middle of the lungs was used to compute a threshold (the lowest density) to detect regions associated with pneumonia. An experienced thoracic radiologist (J.S.) labeled the pneumonic regions associated with COVID-19 in the 72 CT scans from the 24 subjects in Dataset 2. As stated above, the 12 CT scans from four subjects were used to develop the algorithm and were not part of the test CT scans.
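The vessel exclusion and thresholding described above can be sketched as follows. The text does not give the rule that maps the average mid-lung density to the threshold, so the additive offset `margin_hu`, its default value, and the flat-list data layout are purely hypothetical placeholders and not the authors' method.

```python
def detect_pneumonia_voxels(hu, lung_mask, vessel_mask, reference_mask, margin_hu=150.0):
    """Flag lung voxels whose density is at or above a threshold derived
    from the mean density of a reference (mid-lung) region.

    margin_hu is a HYPOTHETICAL offset; the source does not state the rule.
    """
    # Mean density of the reference region, with vessels excluded.
    reference = [v for v, r, ves in zip(hu, reference_mask, vessel_mask) if r and not ves]
    if not reference:
        raise ValueError("reference region is empty")
    threshold = sum(reference) / len(reference) + margin_hu
    # A voxel is detected if it is in the lung, not a vessel, and dense enough.
    detected = [
        bool(lung and not ves and v >= threshold)
        for v, lung, ves in zip(hu, lung_mask, vessel_mask)
    ]
    return detected, threshold


# Toy example: six voxels in HU, one of which is a vessel.
hu = [-850, -820, -300, 40, -840, -200]
lung = [True] * 6
vessel = [False, False, False, True, False, False]
mid_lung = [True, True, False, False, True, False]
mask, thr = detect_pneumonia_voxels(hu, lung, vessel, mid_lung)
print(mask, round(thr, 1))
```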
Quantitative assessment of disease progression
One approach to assessing disease progression would be to independently quantify the volumes of the diseased regions in the lungs depicted on two chest CT scans and then compute the volume difference. For pneumonia caused by COVID-19, which most often includes multiple infected regions (Fig. 4), this method provides an overall estimate of disease progression but lacks information about regional differences. A more robust approach is to assess the longitudinal change in each diseased region independently, which requires the paired regions of disease to be identified on serial CT scans (e.g., T0 and T1). To pair the diseased regions automatically, we used our previously developed bidirectional elastic registration algorithm [20] to register the two CT scans. The registration procedure produced a deformation field by which we could elastically transform the CT images with the identified disease at the earlier time point into a deformed version that is expected to be aligned with the CT images obtained at the later time point. We calculated the intensity of the deformed voxels by linear (trilinear) interpolation over the eight neighboring voxels of the new locations in the initial CT images. Once the T0 and T1 images were registered, the regions of disease were automatically aligned, and the difference between the aligned regions was used to quantify disease progression between T0 and T1 in terms of volume and density. A simple subtraction of the aligned regions of disease was also performed to visualize the longitudinal changes as a heatmap. The voxel values on the subtraction images could be either positive or negative: a positive value indicates that the density increased from T0 to T1, and a negative value indicates that it decreased.
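The volume, density, and subtraction measures described above can be sketched as follows, assuming the T0 images have already been warped onto T1 so that voxels correspond one-to-one. The flat-list layout, the function name, the dictionary keys, and the choice to restrict the heatmap to voxels diseased at either time point are assumptions made for illustration.

```python
def quantify_progression(hu_t0, hu_t1, mask_t0, mask_t1, voxel_volume_mm3):
    """Compare aligned diseased regions at T0 and T1 in volume and density."""

    def region_stats(hu, mask):
        values = [v for v, m in zip(hu, mask) if m]
        volume = len(values) * voxel_volume_mm3
        mean_hu = sum(values) / len(values) if values else None
        return volume, mean_hu

    vol0, mean0 = region_stats(hu_t0, mask_t0)
    vol1, mean1 = region_stats(hu_t1, mask_t1)
    mean_change = None if mean0 is None or mean1 is None else mean1 - mean0
    # Subtraction image (T1 minus T0): positive means density increased.
    heatmap = [
        (b - a) if (m0 or m1) else 0.0
        for a, b, m0, m1 in zip(hu_t0, hu_t1, mask_t0, mask_t1)
    ]
    return {
        "volume_change_mm3": vol1 - vol0,
        "mean_hu_change": mean_change,
        "heatmap": heatmap,
    }


# Toy example with four aligned voxels of 1 mm^3 each.
result = quantify_progression(
    hu_t0=[-800, -500, -400, -850],
    hu_t1=[-800, -300, -200, -600],
    mask_t0=[False, True, True, False],
    mask_t1=[False, True, True, True],
    voxel_volume_mm3=1.0,
)
print(result["volume_change_mm3"], result["heatmap"])
```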
C. Performance testing
The performance of the deep learning algorithms for segmenting the lung boundary and main lung vessels was compared with the visually interpreted results of the human expert (D.P.) using the test set (n = 20) of Dataset 1. Likewise, the pneumonic regions labeled in 60 CT scans (20 subjects) of Dataset 2 were used to evaluate the performance of the algorithm in assessing the presence and progression of COVID-19. Since the algorithm for elastic lung CT registration has been quantitatively assessed and reported elsewhere [20], only (1) the performance of the deep learning algorithms for lung region segmentation and main lung vessel segmentation, and (2) the performance of the algorithm for pneumonia detection and quantification in COVID-19 subjects were evaluated in this study. The Dice coefficient was used to evaluate the performance of the deep learning algorithms [21]. The Dice coefficient is defined as:
$$ D(A,B)=\frac{2\,|A\cap B|}{|A|+|B|} \tag{1} $$
where A is the computerized result and B is the result labeled by the human expert. The overlap between the readers’ delineated pneumonic regions and the computer-detected pneumonic regions was used to evaluate the performance of the computer algorithm. The readers’ outlines of the lung boundary and pneumonic regions were considered the gold standard “truth” in this study. Pneumonic regions detected by the software that did not overlap with radiologist-delineated pneumonic regions were considered false positives (FP). To characterize the detection-localization accuracy of the computer algorithm, we focused on regions > 200 mm³, which are likely to be more clinically relevant. The classic detection-localization characteristics [22] were estimated in terms of the true-positive fraction (TPF), which is the proportion of “true” pneumonic regions that were detected, and the false-positive rate (FPR), which is the average number of FP results per image. The TPF and FPR estimates were used for the sensitivity and specificity estimates of the corresponding iROI analysis [22], assuming that a reasonable bound on the number of pneumonic regions per CT scan is 40. The 95% confidence intervals for the estimates were computed using the generalized linear model for clustered binary data (PROC GENMOD, SAS, v.9.4).
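The Dice coefficient in Eq. (1) and the region-level counts behind TPF and FPR can be sketched as follows. Regions are represented as sets of voxel indices, which is an assumption for illustration, as are the function names and the convention of returning 1.0 when both sets are empty. Counting a "true" region as detected when any detected region above the size cutoff overlaps it is one plausible reading of the text rather than a stated rule.

```python
def dice(a, b):
    """Dice coefficient of two voxel sets, 2|A n B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty segmentations
    return 2.0 * len(a & b) / (len(a) + len(b))


def detection_counts(detected_regions, truth_regions, voxel_volume_mm3,
                     min_volume_mm3=200.0):
    """Return (true positives, number of large truth regions, false positives)."""

    def is_large(region):
        return len(region) * voxel_volume_mm3 > min_volume_mm3

    large_detected = [d for d in detected_regions if is_large(d)]
    large_truth = [t for t in truth_regions if is_large(t)]
    # A truth region counts as detected if any large detection overlaps it.
    tp = sum(1 for t in large_truth if any(t & d for d in large_detected))
    # A detection is a false positive if it overlaps no truth region at all.
    fp = sum(1 for d in large_detected if not any(d & t for t in truth_regions))
    return tp, len(large_truth), fp


# Toy scan with 100 mm^3 voxels; voxel indices are plain integers.
truth = [{1, 2, 3, 4}, {10, 11, 12}]
detected = [{2, 3, 4, 5}, {20, 21, 22}, {30}]
tp, n_truth, fp = detection_counts(detected, truth, voxel_volume_mm3=100.0)
print(dice(truth[0], detected[0]), tp, n_truth, fp)
```

Across a test set, TPF would be the summed `tp` divided by the summed `n_truth`, and FPR the summed `fp` divided by the number of images.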
Two radiologists (S.K. and Y.G.) independently reviewed and rated randomly presented CT scans from subjects with COVID-19. First, they viewed the original CT scans at T0 and T1 to assess whether the disease increased, decreased, or remained the same. Next, they viewed the heatmap that the computer software created from the registration of the T0 and T1 images. Finally, they subjectively assessed whether the heatmap accurately represented disease progression from T0 to T1 on a 5-point scale: (1) unacceptable, too many errors that affect assessment; (2) poor, obvious errors that may affect assessment; (3) acceptable, minor errors that did not affect assessment; (4) good, minor errors; and (5) excellent, no obvious errors. We assessed the agreement of the two raters using weighted kappa coefficients. The statistical analysis (with p values and 95% confidence intervals, CI) was performed using a generalized linear model for a binary outcome, accounting for correlation between the assessments of images from the same patient (PROC GENMOD, SAS v.9.4, SAS Institute). Inter-rater agreement was evaluated for both binary and multi-category (Likert) quality assessment using a simple weighted kappa statistic (PROC FREQ, SAS v.9.4).
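A weighted kappa for two raters on an ordinal scale can be sketched as follows. This is a plain-Python illustration rather than the SAS procedure itself; it uses linear agreement weights, 1 − |i − j| / (k − 1), on equally spaced categories. The function name and the toy ratings are assumptions, not study data.

```python
def weighted_kappa(ratings_a, ratings_b, categories):
    """Linearly weighted kappa for two raters over ordered categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)
    # Joint proportion table of rater A (rows) against rater B (columns).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    p_obs = p_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1.0 - abs(i - j) / (k - 1)  # full credit on the diagonal
            p_obs += weight * observed[i][j]
            p_exp += weight * row[i] * col[j]
    if p_exp == 1.0:
        return float("nan")  # undefined when both raters use a single category
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical ratings on the 5-point scale for six heatmaps.
rater_1 = [3, 4, 5, 4, 3, 5]
rater_2 = [3, 4, 4, 4, 3, 5]
print(weighted_kappa(rater_1, rater_2, categories=[1, 2, 3, 4, 5]))
```

With weighting, a disagreement between adjacent scores (for example 4 versus 5) is penalized less than one between distant scores, which suits an ordinal quality scale.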