Introduction

Interstitial lung diseases (ILDs) encompass a heterogeneous group of chronic and fibrotic lung diseases with distinct disease courses and prognoses [1]. They may be associated with progressive loss of lung volume, impaired quality of life and, at advanced stages, respiratory failure. Idiopathic pulmonary fibrosis (IPF), an inexorably progressive disease, is the most severe and lethal of these [2, 3]. Survival is shorter in patients with a lower baseline forced vital capacity (FVC) and with an annual FVC decline ≥ 10%. FVC has thus been proposed as a surrogate marker for disease progression and mortality in all ILDs and has been advocated as a primary outcome in major clinical trials in IPF [4,5,6]. However, FVC measurement is subject to inherent variability and may be inaccurate in frail patients, in advanced disease, and in patients with intractable cough [7, 8]. Therefore, a new feasible, reproducible, and effort-independent surrogate biomarker is still needed.

Chest high-resolution computed tomography (HRCT) is mandatory for the evaluation of patients with ILD and is used for diagnostic, monitoring, and prognostic purposes. Current guidelines consider both pulmonary function tests (PFT) and chest HRCT fundamental for patient follow-up [9,10,11]. Moreover, CT volume loss, derived from visual or automated quantification of lung volumes on HRCT, has shown promise for assessing disease severity, progression, and mortality in IPF and systemic sclerosis-associated ILD, albeit in relatively small ILD populations [12,13,14,15]. Yet data on longitudinal CT lung volume loss, its annual decline, and its prognostic value in the IPF population remain scarce.

Therefore, using a new, commercially available deep learning algorithm for automatic quantification of lung CT volume, we compare lung CT volume with pulmonary function tests in a large population of patients with interstitial lung disease. Then, we evaluate longitudinal CT volume loss in IPF and non-IPF populations and explore the predictive value of annual CT volume loss in the IPF population.

Materials and methods

Study population

We conducted a retrospective, observational, longitudinal study between February 2005 and July 2018 in an ILD expert center (Louis Pradel Hospital, Hospices Civils de Lyon, Lyon, France). Clinical, functional, and imaging data were collected for consecutive patients who underwent at least one unenhanced HRCT study for fibrotic ILD. The diagnosis of ILD was made in multidisciplinary discussion according to the international guidelines in effect at the time of the patient’s presentation. The data usage policy of the “Hospices Civils de Lyon” regarding confidentiality, anonymization, and security was applied, and approval was obtained from our local committee. Institutional review board approval was obtained for the study, and patient consent was waived.

CT studies

For the comparison between PFT and CT volume (CTvol), only patients with a CT study within two weeks of the PFT were considered eligible. For the evaluation of longitudinal CTvol loss within the IPF and non-IPF groups, patients who underwent more than four CT examinations were included. Finally, for the analysis of the predictive value of CTvol loss in IPF patients, patients with a baseline CT and a 1-year follow-up CT (± 10 days) were included. For all CT examinations, patients with a recent (within 3 months) acute exacerbation, pneumothorax, pleural effusion, or lower respiratory tract infection, as confirmed by two senior radiologists (S.S-M. and L.B., with 6 and 20 years of experience in chest imaging, respectively), were excluded so that the data would reflect the chronic course of these fibrotic ILDs.
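The two date-based eligibility rules above (a CT within two weeks of the PFT; a follow-up CT at 1 year ± 10 days) can be expressed as simple date filters. The sketch below is illustrative only: the function names are hypothetical, the two-week window is assumed to apply both before and after the PFT, and choosing the closest CT when several qualify is our assumption rather than a rule stated in the study.

```python
from __future__ import annotations

from datetime import date


def pair_ct_with_pft(ct_dates: list[date], pft_dates: list[date],
                     window_days: int = 14) -> list[tuple[date, date]]:
    """Pair each PFT with the closest CT performed within +/- window_days.

    PFTs with no CT inside the window are dropped (hypothetical helper).
    """
    pairs = []
    for pft in pft_dates:
        candidates = [ct for ct in ct_dates
                      if abs((ct - pft).days) <= window_days]
        if candidates:
            closest = min(candidates, key=lambda ct: abs((ct - pft).days))
            pairs.append((pft, closest))
    return pairs


def one_year_followup(baseline: date, ct_dates: list[date],
                      target_days: int = 365,
                      tolerance_days: int = 10) -> date | None:
    """Return the CT closest to baseline + target_days within the tolerance, else None."""
    def offset(ct: date) -> int:
        # Distance (in days) from the ideal 1-year follow-up date
        return abs((ct - baseline).days - target_days)

    candidates = [ct for ct in ct_dates if offset(ct) <= tolerance_days]
    return min(candidates, key=offset, default=None)


# Usage with made-up dates
print(pair_ct_with_pft([date(2015, 1, 10), date(2015, 3, 1)],
                       [date(2015, 1, 1), date(2015, 6, 1)]))
# → [(datetime.date(2015, 1, 1), datetime.date(2015, 1, 10))]
print(one_year_followup(date(2015, 1, 1),
                        [date(2015, 12, 20), date(2016, 1, 5)]))
# → 2016-01-05 (2015-12-20 is 12 days early, outside the ±10-day window)
```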

CT protocols

All HRCT acquisitions were performed at the end of deep inspiration. Data were collected retrospectively from CT examinations performed on several systems over the years: GE Medical Systems (Revolution GSI), Philips (Brilliance 40, Brilliance 64, iCT 256, Ingenuity CT, IQon, Spectral CT), and Siemens (Somatom Definition AS and AS+). Scanning parameters were as follows: tube voltage, 100–140 kVp (mean ± SD, 121 ± 9 kVp); helical scan mode. Mean slice thickness was 1.8 ± 0.8 mm (range, 0.9–3 mm).

Automatic quantification of lung CTvol

We used commercially available software implemented in a clinical workstation (CT Pulmo Auto Results, provided under a research contract; IntelliSpace Portal ISP11.1, Philips Healthcare). This software is based on a U-net deep learning algorithm and segments the lungs while excluding the main airways (trachea, main stem bronchi, and lobar bronchi) and the main vessels. Lung CTvol was expressed in liters (L) (Fig. 1).
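The vendor’s internal implementation is not described here, but the generic principle behind any mask-based volume is to count the voxels labeled as lung and multiply by the volume of one voxel. A minimal sketch, assuming a binary mask stored as nested lists and voxel spacing in millimetres (the function name and input layout are hypothetical):

```python
from __future__ import annotations


def mask_volume_liters(mask: list[list[list[int]]],
                       spacing_mm: tuple[float, float, float]) -> float:
    """Volume in liters of a binary 3D mask (1 = lung voxel, 0 = background).

    mask is indexed [slice][row][column]; spacing_mm gives the distance between
    slices, rows, and columns in mm. The slice *spacing* (reconstruction
    interval) is what matters here, not the nominal slice thickness.
    """
    voxel_count = sum(1 for sl in mask for row in sl for value in row if value)
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    return voxel_count * voxel_mm3 / 1_000_000  # 1 L = 10^6 mm^3


# Toy example: a 100 x 100 x 100 block of lung voxels at 1 mm isotropic spacing
cube = [[[1] * 100 for _ in range(100)] for _ in range(100)]
print(mask_volume_liters(cube, (1.0, 1.0, 1.0)))  # → 1.0
```

In practice the mask would come from the segmentation software; airways and vessels excluded from the mask simply do not contribute voxels.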

Fig. 1

Automated segmentation of 9 CT lung volumes over a 6-year follow-up in a 58-year-old man with idiopathic pulmonary fibrosis. The first row shows the coronal images, the second row the lung segmentation masks, and the third row the 3D volume after lung segmentation using commercially available software based on a U-net deep learning algorithm (CT Pulmo Auto Results, unreleased version, provided under a research contract; IntelliSpace Portal ISP11.1, Philips Healthcare)

Clinical respiratory functional test data

Demographic and PFT data were collected from patients’ records. PFTs were performed in all patients according to the official ATS/ERS statement [16]. Total lung capacity (TLC) was measured by body plethysmography and forced vital capacity (FVC) by spirometry, both expressed in liters (L).

Statistical analysis

Data are expressed as mean ± standard deviation (SD) and median with interquartile range (IQR) for continuous variables, and as frequency (percentage) for categorical variables. Normality of distributions was tested using the D’Agostino-Pearson test. Variables were compared using the two-tailed Student t-test or the Wilcoxon rank-sum test, as appropriate. Differences between lung CTvol and functional volumes were compared using the Wilcoxon matched-pairs signed-rank test. For the comparison between CTvol and respiratory volumes (FVC and TLC), Bland–Altman analysis (bias, limits of agreement), linear regression analysis (95% confidence interval, R²), and Spearman correlation coefficients with their 95% confidence intervals were calculated.
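For readers less familiar with these agreement statistics, the sketch below computes the Bland–Altman bias and 95% limits of agreement (bias ± 1.96 SD of the paired differences), Spearman’s rank correlation, and an approximate 95% confidence interval for it via Fisher’s z-transformation. It is a plain-Python illustration of the standard formulas, not the SPSS or R procedures used in the study; the Fisher-z interval is one common approximation, and it treats all pairs as independent even though some patients contribute several CT–PFT pairs.

```python
from __future__ import annotations

import math
import statistics


def bland_altman(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Bias and 95% limits of agreement for paired values a - b (e.g., CTvol - FVC, in L)."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def _ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a: list[float], b: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    ra, rb = _ranks(a), _ranks(b)
    ma, mb = statistics.fmean(ra), statistics.fmean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    ss_a = sum((x - ma) ** 2 for x in ra)
    ss_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(ss_a * ss_b)


def spearman_ci(rho: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for rho via Fisher's z-transformation (assumes independent pairs)."""
    z = math.atanh(rho)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


ct = [3.1, 2.4, 4.0, 2.9, 3.6]   # made-up CTvol values (L)
fvc = [2.2, 1.6, 2.9, 2.0, 2.5]  # made-up FVC values (L)
print(bland_altman(ct, fvc))
print(spearman(ct, fvc))         # → 1.0 (identical rank order)
print(spearman_ci(0.86, 1171))   # approximately (0.844, 0.874)
```

With ρ = 0.86 and n = 1171 pairs, this approximation gives an interval close to the one reported for CTvol versus FVC.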

To estimate daily and annual CTvol loss during follow-up, a linear regression of CTvol against time was calculated across all individual CT studies of each patient. The Mann–Whitney test was used to compare daily and annual CTvol loss between the IPF and non-IPF groups.
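A per-patient loss rate of this kind can be sketched as the ordinary least-squares slope of CTvol against days since the first CT. This is a minimal illustration under stated assumptions: the function name is hypothetical, volumes are in mL, and converting the daily slope to an annual value by multiplying by 365.25 days is our assumption, since the exact conversion used in the study is not specified. Using the regression slope, rather than the first-to-last difference, lets every intermediate scan contribute.

```python
from __future__ import annotations

from datetime import date


def ctvol_loss_rates(exams: list[tuple[date, float]]) -> tuple[float, float]:
    """Daily and annual CTvol loss (mL) from one patient's serial CT exams.

    exams: (exam date, CTvol in mL) pairs spanning at least two distinct dates.
    The loss is the negated ordinary least-squares slope of CTvol versus time,
    so a shrinking lung gives a positive value.
    """
    t0 = min(d for d, _ in exams)
    xs = [(d - t0).days for d, _ in exams]     # days since the first CT
    ys = [vol for _, vol in exams]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    daily_loss = -(sxy / sxx)                  # mL per day
    annual_loss = daily_loss * 365.25          # assumption: 1 year = 365.25 days
    return daily_loss, annual_loss


# Made-up patient: about 200 mL lost per year over three annual CTs
exams = [(date(2010, 1, 1), 4000.0), (date(2011, 1, 1), 3800.0), (date(2012, 1, 1), 3600.0)]
print(ctvol_loss_rates(exams))  # ≈ (0.548, 200.1)
```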

To identify a CTvol-based predictor of major adverse events (MAE), i.e., death or transplantation, in the IPF population, we first calculated the absolute and relative annual CTvol loss between the baseline CT and the 1-year follow-up CT for each patient who had at least 3 years of follow-up or who died or underwent transplantation within 3 years. The population was then dichotomized by the occurrence of MAE at 4 years after the first (baseline) CT study, and receiver operating characteristic (ROC) analysis was used to determine the optimal thresholds for baseline CTvol and CTvol loss using the Youden index. Using these thresholds, a chi-square test with Yates’ continuity correction was used to test the association between categorical variables (MAE at 4 years, baseline CTvol, and annual and relative CTvol loss). Univariate Cox regressions were performed. Kaplan–Meier analysis was used to compare survival in IPF patients above and below the threshold of the best predictor.
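The threshold search and the 2×2 test described above can be sketched as follows. This is an illustrative plain-Python version with hypothetical function names, assuming that relative loss is (baseline − 1-year CTvol) / baseline, that a patient is test-positive when the value is at or above the candidate threshold, and that ties in the Youden index keep the lowest threshold; the study’s SPSS/R procedures may differ in these details. The p value uses the fact that a chi-square variable with one degree of freedom has upper-tail probability erfc(√(χ²/2)).

```python
from __future__ import annotations

import math


def relative_loss_percent(baseline: float, one_year: float) -> float:
    """Relative CTvol loss (%) from baseline to the 1-year CT (assumed definition)."""
    return (baseline - one_year) / baseline * 100


def youden_threshold(values: list[float], events: list[int]) -> tuple[float, float, float]:
    """Threshold maximizing Youden's J = sensitivity + specificity - 1.

    events: 1 = major adverse event (MAE), 0 = no MAE.
    A patient is test-positive when value >= threshold.
    Returns (threshold, sensitivity, specificity).
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best = (-1.0, 0.0, 0.0, 0.0)  # (J, threshold, sensitivity, specificity)
    for t in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if v >= t and e)
        tn = sum(1 for v, e in zip(values, events) if v < t and not e)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if j > best[0]:  # strict '>' keeps the lowest threshold on ties
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


def yates_chi2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Chi-square test with Yates' continuity correction for [[a, b], [c, d]].

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    # max(0, ...) stops the correction from overshooting when |ad - bc| is small
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


loss = [2.1, 3.5, 5.0, 6.4, 8.2, 9.0, 11.3, 12.7]  # made-up relative CTvol loss (%)
mae = [0, 0, 0, 1, 0, 1, 1, 1]                      # made-up MAE status
print(youden_threshold(loss, mae))                  # → (6.4, 1.0, 0.75)
print(yates_chi2(10, 0, 0, 10))                     # chi2 = 16.2, p < 0.001
```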

Statistical analysis was performed using SPSS® software v23 (IBM) and R software v3.5. A p value less than 0.05 was considered significant. The Bonferroni correction was used to adjust p values in the longitudinal and predictive studies (corrected significance threshold, p < 0.01).

Results

Study population

In total, 560 patients were included in the study (341 men (61%); mean age, 65.4 ± 13.9 years), corresponding to 4657 CT studies (mean, 6.2 ± 4.6 CT studies per patient) (Fig. 2).

Fig. 2

Study flow chart. Three different yet partly overlapping groups of patients were consecutively selected from the population of 560 patients: 296 patients are common to the first two groups; 53 patients are common to the last two groups

Relation between CTvol and pulmonary function tests

We analyzed 424 patients corresponding to 1171 CT studies (Table 1). CTvol correlated strongly with FVC (Spearman coefficient, 0.86; 95% CI: 0.84–0.87; p < 0.0001) and with TLC (Spearman coefficient, 0.84; 95% CI: 0.82–0.86; p < 0.0001) (Table 2). Bland–Altman analysis revealed a positive bias of 0.97 L between CTvol and FVC, with 95% limits of agreement from −0.27 to 2.21 L, and a negative bias of −0.68 L between CTvol and TLC, with 95% limits of agreement from −2.05 to 0.69 L (Table 2 and Fig. 3).

Table 1 Population characteristics of the comparative study
Table 2 Comparison between lung CT volume and pulmonary function tests (forced vital capacity and total lung capacity) in patients with fibrotic interstitial lung diseases (1171 CT–PFT pairs analyzed)
Fig. 3

Linear regression and Bland–Altman plots of CT volume (CTvol) versus forced vital capacity (a, b) and total lung capacity (c, d)

Longitudinal lung CTvol loss between IPF and non-IPF groups

We analyzed 408 patients corresponding to 3332 CT studies, distributed in two groups: 81 (19.9%) patients in the IPF group and 327 (80.1%) in the non-IPF group (Table 3, Figs. 4 and 5). Median (IQR) follow-up was 1345 (1034–1867) days (approximately 3.7 years) in the IPF group and 2051 (1456–2881) days (approximately 5.6 years) in the non-IPF group. Patients with IPF had a mean ± SD of 8.0 ± 3.6 CT studies, and patients without IPF 8.2 ± 3.7 CT studies. Median (IQR) daily CTvol loss was 0.41 mL (0.05 to 0.80) in the IPF group and 0.14 mL (−0.05 to 0.40) in the non-IPF group (p < 0.0001). Median (IQR) annual CTvol loss was 155.7 mL (49.9 to 304.5) versus 50.7 mL (−19.0 to 144.7), respectively (p < 0.0001) (Table 3).

Table 3 Population characteristics of the longitudinal study
Fig. 4

Notched box-and-whisker plot showing annual CT volume (CTvol) loss in the IPF and non-IPF groups. The lower and upper margins of each box indicate the 25th and 75th percentiles. The median is marked by the line in the box and the mean by the red dot. The notches represent the 95% confidence interval of the median, and outliers indicate the minimum and maximum values

Fig. 5

Representative cases of interstitial lung diseases over 6-year follow-up. (a) Idiopathic pulmonary fibrosis, (b) systemic sclerosis, (c) unclassifiable interstitial lung disease, (d) combined pulmonary fibrosis and emphysema, (e) individual longitudinal CT volume loss graph

One-year lung volume loss from baseline in the IPF group

Annual CTvol loss between the baseline CT and the 1-year follow-up CT was analyzed in 73 patients with IPF. Twenty-one MAEs were reported during the 3-year follow-up period, occurring a mean ± SD of 440 ± 288 days after the 1-year CT. Sixteen patients died, a mean ± SD of 416 ± 288 days after the 1-year CT, from pneumonia (6, 38%), respiratory failure (3, 19%), acute exacerbation (4, 25%), lung cancer (1, 6%), sepsis (1, 6%), and pulmonary embolism (1, 6%). Five patients underwent single lung transplantation a mean ± SD of 542 ± 197 days after the 1-year CT. Eight patients were lost to follow-up before 3 years and had been excluded from the initial population. Four patients underwent transplantation before the 1-year follow-up CT and were excluded from the predictive analysis. Baseline characteristics of the study population, dichotomized by the occurrence of MAE, are summarized in Table 4.

Table 4 Characteristics of patients with idiopathic pulmonary fibrosis

Based on ROC analysis of baseline CTvol and of absolute and relative annual CTvol loss, a relative CTvol loss of 7.9% provided the best combination of sensitivity and specificity (81.0% and 69.2%, respectively; AUC = 0.74 [0.60–0.86]). The optimal threshold for absolute CTvol loss was 0.37 L/year (sensitivity 76.2%, specificity 71.1%; AUC = 0.73 [0.60–0.84]), and that for baseline CTvol was 3.19 L (sensitivity 57.1%, specificity 61.5%; AUC = 0.56 [0.41–0.71]). In separate univariate Cox regression models (Table 5), age, sex, and baseline CTvol were not significantly associated with survival, whereas dichotomized absolute (p < 0.01) and relative (p < 0.001) annual CTvol loss had a significant effect on patient survival. The categorical variable (death at 4 years) was significantly associated with annual CTvol loss (p < 0.001, chi-square test with Yates’ continuity correction), and the log-rank test also showed a significant association (p < 0.0001). Kaplan–Meier survival curves according to the 7.9% threshold for relative annual CTvol loss are plotted in Fig. 6. Mean survival times were 1039 (SEM: 28) and 746 (SEM: 73) days in the groups with relative CTvol loss below and above 7.9%, respectively.
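Mean survival times derived from Kaplan–Meier curves are commonly computed as the area under the estimated survival curve up to a time limit (the restricted mean), because the curve need not reach zero when patients are censored. The sketch below illustrates the product-limit estimator and this area calculation on made-up data; the function names are hypothetical, the time limit `tau` is a free parameter, and the study’s software may define the limit and the standard error (not computed here) differently.

```python
from __future__ import annotations


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Product-limit estimate: [(event time, S(t))] at each distinct event time.

    events: 1 = event (e.g., MAE), 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censored subjects leave the risk set
    return curve


def restricted_mean(curve: list[tuple[float, float]], tau: float) -> float:
    """Area under the Kaplan-Meier step function from time 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


days = [120, 400, 400, 650, 800, 1095, 1095]  # made-up follow-up times (days)
event = [1, 1, 0, 1, 0, 0, 0]                # made-up status: 1 = MAE, 0 = censored
curve = kaplan_meier(days, event)
print(restricted_mean(curve, tau=1095))      # ≈ 777 days
```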

Table 5 Univariate Cox regression models: CTvol parameters are dichotomized (*) after application of separate thresholds (baseline of 3.19 L, decline of 0.37 L or 7.9% per year from baseline). Cox regression was significant for absolute and relative CTvol loss
Fig. 6

Three-year survival Kaplan–Meier curve and log-rank test, as a function of a 7.9% threshold for the relative CTvol loss in the idiopathic pulmonary fibrosis population. Mean survival times are 1039 (SEM: 28) and 746 (SEM: 73) days respectively (2.8 versus 2.0 years)

Discussion

In the present study, lung CT volume measured with an automatic approach based on a deep learning algorithm correlated strongly with FVC and TLC. Longitudinal lung CT volume loss showed that patients with IPF had a disease course distinct from that of patients with other ILDs. In addition, in the IPF group, higher annual CT volume loss was associated with a worse prognosis.

Our first finding was the linear relationship between CTvol and TLC, as previously shown [14, 17], and also between CTvol and FVC, among 1171 CT–PFT pairs in a large cohort of ILD patients. CTvol was lower than TLC, as previously demonstrated [18], which can be explained by patient positioning, i.e., the supine position during CT versus the sitting position during plethysmography. Moreover, TLC measured by body plethysmography includes all thoracic airspaces (including the anatomic dead space), whereas CTvol measures only the parenchymal lung volume [19]. Part of the difference between TLC and CTvol may thus reflect the anatomic dead space, estimated at 450 mL. CTvol, on the other hand, was higher than FVC, which can be explained by the fact that CTvol includes the residual volume whereas FVC does not. In addition, the Bland–Altman analysis showed a positive proportional bias, suggesting a larger difference between CTvol and FVC at high lung volumes.

From a clinical perspective, longitudinal assessment of volume loss may be more useful than a single baseline measurement. In the present longitudinal study, CTvol loss was greater and faster in IPF than in non-IPF patients, with a shorter follow-up time (due to higher mortality) but a comparable number of CT exams. These findings are consistent with the higher morbidity and mortality of IPF compared with other fibrotic ILDs, which lead to more frequent acute complications requiring more exams, and to earlier death [20]. In addition, the longitudinal analysis showed good agreement between annual CTvol loss and annual FVC loss in the IPF population, as reported previously [21]. Of note, PFTs are time-consuming and require multiple measurements along with technical expertise in order to be reproducible [22,23,24,25,26,27].

Finally, in our attempt to clarify the value of lung CTvol in the IPF population, we showed a significant association between annual CTvol loss and prognosis. A relative CTvol loss of 7.9% over the first year was predictive of death at 3 years after the 1-year follow-up CT, a threshold close to the 10% FVC decline currently used as an endpoint indicating disease progression in IPF clinical studies [28]. Using the 7.9% threshold, we found mean survival times of 2.8 versus 2.0 years, in line with the average 2.5-year survival time reported in IPF patients [29]. We also found an association between absolute CTvol loss and mortality. In contrast to CTvol loss, baseline CTvol was not associated with prognosis, although there was a trend toward lower baseline CTvol in patients with a poor prognosis. This result supports previous studies highlighting the stronger prognostic value of changes in clinical and physiological parameters compared with baseline lung function in fibrotic ILD [30, 31]. However, this finding should be interpreted with caution because CTvol was not normalized for the patient’s age, sex, and height.

A strength of this study was the use of an automated solution based on a deep learning algorithm, which allowed the analysis of a large cohort, whereas previous studies were limited by manual processing [21,22,23]. This commercially available, fully automatic application requires no user interaction apart from installation and is part of the software suite available in the IntelliSpace Portal (Philips Healthcare). In our study, it was made available as a standalone application deployed in batch mode, but lung volumetric measurements can also be run on demand or as pre-processing as soon as the data from the scanner are available. Running time (including additional measurements and reporting) depends on the IT network and system workload and is typically less than 2–3 min. In an analysis reported by the vendor comparing manual and automatic segmentation, the median absolute volume difference was 19.6 mL (IQR, 9.1–32.0 mL) [32], which is small compared with the CTvol loss values observed in our study. Taken together, these features made it feasible to investigate lung volumes at CT in a large, representative cohort of consecutive patients with fibrotic ILD in a center of expertise. In addition, this automatic approach may be of great interest in the many cases where PFTs are difficult to obtain. PFTs, although generally reproducible, are also person-dependent (i.e., dependent on the patient and on the technician or physician performing the test), and measuring lung volumes with an independent method could help interpret the findings, as complementary methods do in other areas. An additional advantage of CT over PFTs is the ability to measure lobar volumes, which may help define the clinical course of some fibrotic ILDs, as recently suggested in pleuroparenchymal fibroelastosis [33].
Our study thus represents a first step toward identifying a new biomarker of physiological outcome that could contribute to the design of a new IPF mortality-risk score [34]. By incorporating additional CT parameters, automatic quantitative CT analysis could become a valid alternative or complementary tool to pulmonary function tests in patients with ILD. However, implementation of this technique in clinical practice warrants further prospective, controlled studies.

Besides its retrospective, single-center design, this study has several limitations. First, CTvol measurement may vary with the patient’s degree of inspiration during the CT examination. However, a previous study reported acceptable variability (< 10%) and good repeatability of CTvol in patients with restrictive lung diseases, explained by reduced pulmonary compliance leading to less variable inspiratory volumes [36]. We therefore assumed that inspiratory CT would be sufficient for patient follow-up, although combined inspiratory and expiratory CT exams are often recommended [37]. Second, concomitant emphysema in ILD can lead to underestimation of disease progression because FVC and TLC may remain stable [7]; estimating lung volume solely with PFTs may therefore be insufficient to monitor restrictive physiology. Our study did not specifically account for concomitant emphysema. Additional lobar segmentation would be of interest, since pulmonary fibrosis predominantly affects the lower lobes whereas emphysema preferentially affects the upper lobes [38]. Third, in the longitudinal study, patients with at least 4 CT studies were included to ensure the robustness of the linear regressions, whose quality may depend on the sampling pattern over time (e.g., influential points). However, in this real-life study, CT scans were not performed at fixed intervals, which may represent a bias. Fourth, our comparison between PFT results and CT volume values did not consider PFT values expressed as a percentage of the predicted value; our aim, however, was to evaluate the absolute precision of CT for volume quantification. Fifth, annual CT volume loss was assessed only in IPF patients.
Currently, there is no recommendation for annual follow-up CT in non-IPF patients, unlike IPF patients, who undergo annual follow-up CT in our expert center according to the French recommendations for the management of IPF [39]. This explains the difficulty of assessing annual CT lung volume loss in non-IPF patients. In addition, non-IPF patients who had a follow-up CT probably underwent it for a clinical indication, such as an acute exacerbation, which may bias our analysis of the chronic course of fibrotic ILD. Finally, cases were enrolled over a period of 13 years on several different CT scanners; this suggests that our approach can be applied to various real-life scenarios.

In conclusion, automatic evaluation of lung CT volume in patients with ILD, and particularly IPF, may be an alternative or complementary biomarker to pulmonary function tests for assessing lung volume loss, in clinical care as well as in randomized trials.