Background

Chronic obstructive pulmonary disease (COPD) is a leading cause of death and healthcare burden in the United States and worldwide. Accounting for over 3 million deaths globally in 2015 [1], this disease is expected to rise in prevalence as the world population ages [2]. COPD is understood to be a complex heterogeneous disease presenting clinically diverse phenotypes [3, 4]. Major causes of airflow obstruction are attributed to chronic bronchiolar obstruction, a.k.a small airways disease (SAD), and emphysema. Although SAD and emphysema are treated as separate COPD subtypes, studies have shown strong quantitative evidence that SAD exists as an intermediate state between healthy lung tissue and emphysema—i.e., irreversible lung damage—in COPD pathogenesis [5,6,7]. At present, little has been done to better quantify the onset of SAD from healthy lung parenchyma.

The Parametric Response Map (PRM) is a CT-based voxel-wise computational technique that can identify and quantify functional small airways disease (fSAD; an indirect measure of SAD) even in the presence of emphysema [8]. The percent volume of PRM-derived fSAD (PRMfSAD), i.e., the amount of fSAD in the lungs, has improved COPD phenotyping and the prediction of spirometric decline in subjects at risk of COPD [9]. To determine the value of spatial features from each PRM classification, we developed topological PRM (tPRM) as an extension of the PRM algorithm [10]. These radiographic tPRM readouts were shown to improve upon commonly used whole-lung PRM measures with respect to COPD characterization and progression [11, 12], and correlate to structural changes in lung tissue samples from lung transplant recipients diagnosed with bronchiolitis obliterans [13, 14].

In this study, we evaluated the PRM topologies volume density (V), a measure of extent, and Euler-Poincaré Characteristic (χ), a measure of pocket formation, of normal lung and fSAD as independent readouts of COPD severity, pulmonary function, and extent of emphysema using the Phase 1 COPDGene cohort [15]. We also investigated the potential of these topologic readouts as predictors of spirometric decline using a machine-learning model. This study demonstrates how tPRM readouts may be used as possible measures of early emphysema and COPD progression.

Methods

Study sample

Our study was a secondary analysis of data from COPDGene (ClinicalTrials.gov: NCT00608764), a large Health Insurance Portability and Accountability Act-compliant prospective multi-center observational study. In Phase 1 (2007–2012) and Phase 2 (2013–2017), 5-year follow-up, written and informed consent was obtained from all participants and the study was approved by local institutional review boards of all 21 centers. Ever-smokers with greater than or equal to 10 pack-year smoking history, with and without airflow obstruction, were enrolled between January 2008 and June 2011. Participants were non-Hispanic white or African American. Participants underwent volumetric inspiratory and expiratory CT using standardized protocol; images were transferred to a central lab for protocol verification and quality control (QC) [15]. Exclusion criteria included a history of other lung disease (except asthma), prior surgical excision involving a lung lobe or greater, present cancer, metal in the chest, or history of chest radiation therapy. Participants were excluded from the present study due to inadequate CT for computing tPRM, such as missing an inspiration/expiration scan, or failing QC implemented specifically for the present study. Our QC protocol is described in Additional File 1 (Supplemental Methods 1). Data for participants evaluated here have been utilized in numerous previous studies and a list of COPDGene publications can be found at [16]. Our study is the first to report tPRM analysis across the whole Phase 1 cohort and predict spirometric decline over 5 years in the Phase 2 subset of COPDGene participants.

Spirometry was performed in the COPDGene study before and after the administration of a bronchodilator, specifically 180 mcg of albuterol (Easy-One spirometer; NDD, Andover, MA). Post-bronchodilator values were used in our analyses. COPD was defined by a post-bronchodilator FEV1/FVC of less than 0.7 at the baseline visit, as specified in the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines [17]. GOLD grades 1–4 were used to define disease severity. GOLD 0 classification, i.e., “at-risk,” was defined by a post-bronchodilator FEV1/FVC ≥ 0.7 at the baseline visit, alongside FEV1% predicted ≥ 80%. Participants with FEV1/FVC ≥ 0.7 with FEV1% predicted < 80% were classified as having preserved ratio impaired spirometry (PRISm) [18]. Demographic and spirometric measures used in this study included age, sex, race, smoking history, scanner manufacturer, body mass index (BMI), FEV1% predicted, FEV1/FVC and forced mid-expiratory flow (FEF25 − 75).

Computed tomography and Topological PRM Analysis

All computed tomography (CT) data were obtained from multiple sites associated with the COPDGene project at Phase 1. Whole-lung volumetric multidetector CT acquisition was performed at full inspiration and normal expiration at functional residual capacity using a standardized previously published protocol [15]. Data reconstructed with the standard reconstruction kernel were used for quantitative analysis. All CT data were presented in Hounsfield units (HU), where stability of CT measurement for each scanner was monitored monthly using a custom COPDGene phantom [15]. For reference, air and water attenuation values are − 1,000 and 0 HU, respectively.

PRM were determined from paired CT scans using Lung Density Analysis (LDA) software (Imbio, LLC, Minneapolis, MN). LDA segmented the lungs from the thoracic cavity with airways removed. Inspiratory CT scans were spatially aligned to the expiratory geometric frame using deformable image registration. Lung voxels were classified using pre-determined HU thresholds as: normal (PRMNorm, -950 < inspiration HU ≤ -810, and expiration HU ≥ -856), functional small airways disease (PRMfSAD, -950 < inspiration HU ≤ -810, expiration HU < -856), emphysema (PRMEmph, inspiration HU < -950, expiration HU < -856), or parenchymal disease (PRMPD, inspiration HU > -810) [19, 20]. Only voxels between − 1,000 HU and − 250 HU at both inspiration and expiration were used for PRM classification. Each PRM classification was quantified as the percent volume, which is defined as the sum of a PRM classification normalized to the total lung volume at expiration multiplied by 100. There were a few noisy voxels that were considered indeterminate by PRM (inspiration < -950 HU, expiration > -856 HU) that were excluded from our analysis as they did not form consolidated regions of interest within the parenchyma.

Topological analysis of PRM was performed using methods previously described [10]. tPRM metrics were defined through application of Minkowski measures on 3D binary voxel distributions: volume density (V) and Euler-Poincaré Characteristic (χ) [21]. Maps of V and χ were computed for each PRM class (Norm, fSAD, Emph, and PD) using a 3D moving window of size 21 × 21 × 21 voxels evaluated on a grid of every 5th voxel. V was normalized by the Minkowski estimate of the mask within the same local window volume (rather than a direct calculation of the mask volume in the window as previously described) and χ by the masked window voxel count. Linear interpolation was applied to determine V and χ values for all segmented voxels.

To indicate the PRM class associated with a Minkowski measure, the class is presented as a superscript (e.g., VfSAD is the volume density of PRMfSAD). tPRM analysis was performed using open-source and in-house software developed in MATLAB R2019a (MATLAB, The MathWorks Inc., Natick, MA). A detailed overview and diagram, of computing tPRM from raw imaging data, was made by Hoff et al. [10]. Because the focus of this study is the relationship between normal parenchyma and SAD, and its association with emphysema, all analyses were performed using V and χ for PRM classifications Norm and fSAD. For completeness, V and χ for PRM classifications Emph and PD are provided.

Phase 1 data and statistical analysis

Data in this study are presented as mean and standard deviation unless stated otherwise. Correlations between V and χ for PRMNorm and PRMfSAD were calculated using Spearman rank-order correlation coefficients (\(\rho\)). The total Phase 1 cohort was separated into two subsets based on spirometry-confirmed COPD: non-COPD (FEV1/FVC ≥ 0.7) and COPD (FEV1/FVC < 0.7). Cross-sectional multivariable regression analysis was performed on both subsets using a stepwise approach with V and χ for PRM classifications Norm and fSAD as independent variables and selected pulmonary function testing and clinical features as outcome variables, controlling for age, gender, race, BMI, smoking (pack years) and CT vendor. These control variables were included as compulsory independent variables in all regression models. Statistical work was conducted using IBM SPSS Statistics v27 (SPSS Software Products). In all tests, significance was defined by p < 0.05.

Predict spirometric decline

We evaluated baseline V and χ for PRM classifications Norm and fSAD as predictors of FEV1 decline over 5 years using a machine learning (ML) model. A total of 4483 cases from the Phase 2 cohort of the COPDGene longitudinal trial, a subset of Phase 1, had FEV1 measurements at baseline and 5-year follow up. Our ML model is a sparse dictionary learning algorithm [22,23,24,25,26] that classifies image patch features as “normal” or “abnormal”. In our method, we used the tPRM maps VNorm, VfSAD, χNorm, and χfSAD of each case as inputs for training and testing the algorithm. For training our ML model, individual cases were stratified based on the change in FEV1 over 5 years [= (FEV1 at yr 5 – FEV1 at yr 0)/5 years] as fast (ΔFEV1/yr ≤ -60 ml/yr; n = 1516) and slow progressors (ΔFEV1/yr > -60 ml/yr; n = 2967). We used 35% of the data for training and 65% for testing the model [27, 28]. Training was performed on a randomly selected subset of 1569 cases, with n = 531 fast progressors and n = 1038 slow progressors. The remaining 2914 cases, consisting of n = 985 fast progressors and n = 1929 slow progressors, were used for testing the algorithm. In brief, our ML model is designed to associate unique features from the input image patches with fast and slow progressors. This is achieved by randomly selecting image patches from within the lung and extracting the information from the inputs (tPRM maps VNorm, VfSAD, χNorm, and χfSAD given as inputs to the ML algorithm) at these image patch locations and comparing their underlying patch features with the compiled class dictionaries of features, which are determined during training. It is important to note that no previous knowledge about the case and lung tissue features, such as emphysema, are provided for the algorithm to delineate “normal” from “abnormal” lung tissue. Details on model design and methods for training and testing are provided in Additional File 1 (Supplemental Fig. 1 and Supplemental Methods 2). To determine the contribution of each feature to the model selection, we used the minimum redundancy maximum relevance feature selection algorithm [29] to rank the tPRM inputs used in the dictionary learning algorithm. The algorithm quantifies the redundancy and relevance using mutual information of variables [30, 31]. We also investigated the selection bias for each input in the ML model and obtained the prediction accuracy for 10 different choices of training image patches, considering each input separately in the model. The prediction accuracy for each training run is fit to a Gaussian probability density function [32, 33]. All processing and analyses were performed using in-house algorithms developed in MATLAB version 2020a (MathWorks, Natick, MA). To determine the contribution of our ML model to account for spatial features in predicting FEV1 decline, we determined if whole lung mean values of VNorm, VfSAD, χNorm, and χfSAD were predictive of FEV1 decline using a logistic regression classifier.

Case Study: spatial analysis

To better understand the relationship between PRMfSAD and PRMEmph, we evaluated the spatial dependance of V and χ for these PRM classifications from a single subject. The case is a female subject, 48 years of age, diagnosed with GOLD 4 COPD. On a single axial slice, profiles of V and χ for PRMfSAD and PRMEmph were generated by selecting points from high emphysema (VEmph > 0.6) and low emphysema (VEmph < 0.2). A line plot (Additional File 1: Supplemental Fig. 4) was produced for V and χ vs. distance along each point of the profile. The distance, in units of centimeters, along the image profile was determined using the voxel dimensions of the CT scan. All processing and analyses were performed using in-house algorithms developed in MATLAB version 2020a (MathWorks, Natick, MA).

Results

Population characteristics

The original COPDGene Phase 1 cohort consisted of 10,300 individuals. We excluded 1,344 participants for: inadequate CT data, such as missing an expiration or inspiration scan, to conduct tPRM analysis (n = 1,125); missing clinical data (n = 16); or failing to pass our CT-based QC testing (n = 203). Further details of CT QC are provided in Additional File 1 (Supplemental Methods 1). The resulting complete subset used for analyses thus consisted of 8,956 participants. Baseline demographics and lung function for all Phase 1 participants, grouped based on FEV1% predicted and FEV1/FVC—that is, by GOLD grade or PRISm as described in the Methods—are reported in Table 1. Due to the COPDGene recruitment strategy, the proportion of GOLD 0 (FEV1/FVC ≥ 0.7, FEV1% predicted ≥ 80%) participants [15] account for almost half of the study population (43%; 3,867 of 8,956 participants). Increasing percent volume of PRM-derived fSAD (PRMfSAD) and PRM-derived emphysema (PRMEmph), with decreasing PRMNorm, was observed with higher GOLD grades. This is consistent with previously published work. PRM-derived parenchymal disease (PRMPD) was found to be elevated in PRISm and GOLD 0 participants (35.8 ± 16.4% and 26.3 ± 12.8% of the total lung volume, respectively) as compared to the COPD subset.

Table 1 Clinical characterization of the study population

Topological readouts of PRM

Presented in Fig. 1 is a case with elevated fSAD (PRMfSAD = 40%). Representative coronal slices of the expiration CT scan and PRMfSAD, overlaid on CT scan, are provided. To illustrate the dependence of χ on the arrangement of PRMfSAD, we have included VfSAD and χfSAD maps indicating regions with similar values of VfSAD (blue and magenta boxes). As expected, VfSAD (Fig. 1C) is dependent on the amount of fSAD (yellow voxels in Fig. 1B). Averaged over the lungs, VfSAD is proportional to the percent volume of PRMfSAD by a factor of 100. However, χfSAD > 0 (magenta box in Fig. 1D) corresponds to the formation of fSAD pockets (magenta box Fig. 1B), whereas χfSAD < 0 (blue box in Fig. 1D) is the consolidation of these pockets into a mesh with holes (blue box in Fig. 1B).

Fig. 1
figure 1

Illustration of Volume Density (V) and Euler-Poincaré Characteristic (χ) for PRMfSAD. Presented are representative coronal slices for the (A) expiratory CT scan with associated (B) PRMfSAD overlay (yellow). Included are the (C) volume density and (D) Euler-Poincaré Characteristic of PRMfSAD. Blue and Magenta boxes indicate two lung regions with elevated VfSAD that have negative and positive χfSAD, respectively. The subject is a GOLD 3 female aged 53 with FEV1% predicted of 32% and percent volume of PRMfSAD of 40%

The volume density of PRMNorm and PRMfSAD demonstrated an inverse relationship with increasing COPD severity (Fig. 2A), consistent with previous work. A similar inverse relationship was observed for χ of both normal lung and fSAD (χNorm and χfSAD). Values of χNorm and χfSAD were found to flip about zero (e.g., χfSAD changes from positive to negative values) from GOLD 2 to GOLD 4 (Fig. 2B). From Fig. 2B we observe that χNorm and χfSAD had means (standard deviations) of -0.0084 (0.0071) and 0.0047 (0.0074), respectively, for cases diagnosed as GOLD 2. For those with severe COPD, i.e., GOLD 4, χNorm and χfSAD are 0.0039 (0.0055) and − 0.0036 (0.0048), respectively. Mean values of χEmph and χPD were found to be positive and similar across GOLD. We did not consider mean breadth and surface area of PRMNorm and PRMfSAD in our analysis, as we did not see such a strong relationship between them (Additional File 1: Supplementary Fig. 2).

Fig. 2
figure 2

Boxplots for topological measures of PRM maps PRMNorm (green), PRMfSAD (yellow), PRMEmph (red) and PRMPD (magenta) across all GOLD stages, “at-risk” (GOLD 0), and PRISm. Plots of (A) volume density, describing class magnitude (relative amounts of voxels) and (B) Euler-Poincaré characteristic, describing class homology, determined by number and type of holes within class volumes. Box plots were computed following standard protocol for box and whiskers; box lines determined by lower quartile (Q1), middle quartile / median (Q2) and upper quartile (Q3), and whiskers are drawn out to Q1–1.5 x IQR and Q3 + 1.5 x IQR for lower and upper limits, respectively. IQR = Q3-Q1. Outliers are defined as points beyond the given upper and lower limits and illustrated as black points with a random bounded horizontal perturbation beyond box whiskers

We further evaluated the relationship of PRMNorm and PRMfSAD with respect to V (Fig. 3A) and χ (Fig. 3B). Both V and χ demonstrated strong correlations between Norm and fSAD (ρ = -0.666, p < 0.001 and ρ = -0.745, p < 0.001, respectively) over the Phase 1 cohort. Here the GOLD stages are coded by color and the relative amount of emphysema, quantified by VEmph, by size of the marker. As observed in Fig. 3A, VNorm versus VfSAD had more scatter in the data compared to χNorm versus χfSAD (Fig. 3B). As expected, GOLD 4 cases with elevated emphysema (VEmph) demonstrated a drop in VNorm and VfSAD values. In contrast, χNorm consisted of primarily positive values, whereas positive and negative values were observed for χfSAD (Fig. 3B). Although VfSAD was found to be strongly correlated to VEmph (ρ = 0.845, p < 0.001), only a weak correlation was observed between χfSAD and χEmph (ρ = -0.155, p < 0.001).

Fig. 3
figure 3

Scatter plots of all study sample participants for (A) VNorm versus VfSAD and (B) χNorm versus χfSAD. Individual points are color coded based on COPD classifications. The size of the points indicates the amount of emphysema as measured by the volume density of PRMEmph (VEmph)

Multivariable regression analysis

Presented in Table 2 are results from multivariable regression analyses that demonstrate the contribution of V and χ to PRMNorm and PRMfSAD when modeling spirometric measures and the volume density of emphysema, controlling for age, sex, race, BMI, pack-years, and CT vendor. Among those with spirometrically confirmed COPD, VNorm was found to be significantly associated with multiple clinical measures including FEV1% predicted, FEV1/FVC, FEF25 − 75 and VEmph (see Table 2). VfSAD and χfSAD were found to independently and significantly contribute to FEV1% predicted (β = 0.065, p = 0.004 and β = 0.106, p < 0.001). Only the Norm topological measures were found to contribute to FEV1/FVC (β = 0.668, p < 0.001 for VNorm and β = -0.120, p < 0.001 for χNorm), whereas V and χ for both Norm and fSAD were found to be significant parameters for FEF25 − 75. With respect to VEmph, extent of emphysema, V and χ for Norm and fSAD were highly significant but demonstrated similar trends irrespective of PRM classification. For completeness, the same analyses were performed on the non-COPD cohort (Additional File 1: Supplemental Table 1). As compared to the COPD cohort, statistical models generated from the non-COPD cohort demonstrated significant parameters but with weaker correlations (i.e., adjusted R2).

Table 2 Multivariable regression for COPD subset

Prediction model of spirometric decline

Representative axial slices of expiration CT scan, PRM, VfSAD, χfSAD and corresponding patch probability maps from a fast progressor (with ΔFEV1/yr of -249 ml/yr) are provided in Fig. 4. Our ML model correctly classified this subject as a fast progressor. This case is a 63-year-old male, diagnosed at baseline with GOLD 2 COPD. Using V and χ from PRMfSAD and PRMNorm as inputs, the ML model was able to determine regions of emphysema, discernible from existing fSAD, observed in the right upper lung as “abnormal” (blue patches in the probability maps). In contrast, the dorsal lung regions were classified as “normal” (red patches in the probability maps) due to the absence of fSAD and emphysema. For completeness we have provided in Additional File 1 (Supplemental Fig. 3) representative axial slices of expiration CT scan, PRM, VfSAD, χfSAD and corresponding patch probability maps from a slow progressor (with ΔFEV1/yr of 101 ml/yr).

Fig. 4
figure 4

The dictionary learning results for a 63-year-old male diagnosed at baseline with GOLD 2 COPD and declared a fast progressor with ΔFEV1/yr of -249 ml/yr. Representative axial slice of an expiratory CT scan acquired at baseline, its associated PRM map, the tPRM maps VfSAD and χfSAD of PRMfSAD, and their image patch probability maps from the dictionary learning model

As seen in Fig. 5A and B, our ML model had an overall classification accuracy of 70.6% and Area Under the Curve (AUC) of 0.69 of the receiver operating characteristic (ROC) curve. We compared our ML model with a simple logistic regression model using whole lung mean values of VNorm, VfSAD, χNorm, and χfSAD. Figure 5B shows that the logistic regression model only achieved an AUC of 0.55. The contribution of each of the inputs to the model (VNorm, VfSAD, χNorm, and χfSAD) are shown in Fig. 5C and D. V and χ of PRMfSAD are dominant inputs, followed by V and χ of PRMNorm (Fig. 5C). Using a feature rank analysis performed on our test set, we observed that V and χ of PRMfSAD are important to achieve higher prediction accuracy. In fact, χfSAD was found to have the smallest spread/variance (Fig. 5D), indicating highly desirable robustness to the choice of training image patches and its usefulness as an input in the ML model. As reported in Table 3, “normal” patches, on average, consisted primarily of PRMNorm, elevated VNorm (abundant) and negative χNorm (consolidated), with negligible PRMfSAD, low VfSAD (depleted) and positive χfSAD (sparse pockets). In “abnormal” patches, similar values of V and χ for PRMNorm and PRMfSAD were observed (Table 3). Positive and negative values in χfSAD were found for “normal” and “abnormal” patches, respectively. This is consistent with the inverse relationship seen with increasing COPD severity shown in Fig. 2.

Fig. 5
figure 5

Results and relevance of the different features (tPRM metrics as inputs) used in the dictionary learning method. (A) Confusion Matrix showing the sensitivity and specificity of the ML model classifications for both the fast progressor (n = 985) and the slow progressor (n = 1929) classes in the test set. Green colored and red colored fields in the matrix represent agreement and disagreement, respectively, of the ML model with the actual decision. (B) Receiver Operating Characteristic (ROC) curve for our ML model and the logistic regression classifier with the corresponding Area Under the Curve (AUC) statistics. (C) Bar plot showing the feature importance score and feature ranking using the minimum redundancy maximum relevance method. (D) Plot showing the distribution of the features and their prediction accuracy over ten different training runs

Table 3 Image patch topological PRM metrics in the ML model

Dependence between topologies of PRMfSAD and PRMEmph

As the topologies of PRM were determined as averages over the whole lungs, we provide a case study illustrating the relationship between V and χ of PRMfSAD and PRMEmph at the local level. Presented in Additional File 1 (Supplemental Fig. 4) are the profiles of V and χ of PRMfSAD and PRMEmph from a region of the right lung with elevated and reduced VEmph (orange circle and star, respectively; Supplemental Fig. 4A and C). The case is a female subject, 48 years of age, diagnosed with GOLD 4 COPD. The subject was found to have on average high levels of VfSAD (0.37) with relatively elevated VEmph (0.1). Mean values for the whole lungs of χ were 0.008 and − 0.009 for PRMEmph and PRMfSAD, respectively. As seen in Additional File 1 (Supplemental Fig. 4C), VfSAD increased while VEmph decreased further from lung with the highest level of VEmph (~ 0.6 at orange circle in Additional File 1: Supplemental Fig. 4A and C). At approximately 1.8 cm, volume densities between PRMfSAD and PRMEmph transitioned. In addition, χfSAD was found to increase with decreasing χEmph with transition occurring at ~ 1.2 cm.

Discussion

The topological parametric response map is an extension of the well-established PRM method, a quantitative imaging marker of SAD [8]. In this study, we have demonstrated that inclusion of topological features, in this case the Euler-Poincaré Characteristic (χ), improved characterization and interpretation of fSAD in COPD as a complimentary readout of volume density (V), which is equivalent to traditional percent volume of PRM classifications [10]. This study also evaluated the role of PRM-defined normal parenchyma (PRMNorm) and fSAD (PRMfSAD) as lone indicators of COPD severity. We observed distinct patterns in topological metrics with respect to GOLD grades and identified a complete inversion in topology, characterized by Euler-Poincaré Characteristic χ, between normal lung and fSAD, in mid-to-late stages of COPD. We also found V and χ of PRMNorm and PRMfSAD to have statistically significant correlation with spirometric measures and emphysema and to be predictive of spirometric decline.

Our study builds on previous work by Hoff et al. [10] on tPRM characterization in COPD. This study used a much smaller population (n = 88) to demonstrate the trends of all four topological features (volume density, surface area, mean curvature and Euler-Poincaré Characteristic) with increasing COPD severity [10]. Limited in statistical power, it instead focused on the surface area of fSAD. Access to a notably larger population (n = 8,956) in the current study allowed us to evaluate the volume density (V) and Euler-Poincaré Characteristic (χ) of PRMNorm and PRMfSAD and relate our findings to the field’s current understanding of COPD progression, i.e., normal parenchyma transitions to emphysema through SAD.

A key finding of our study is the ability to quantify parenchymal lung health, based not only on the extent but also on the arrangement of local lung abnormalities, i.e., fSAD. This is rooted in the concept that the lungs are healthy (i.e., PRMNorm) and COPD progresses through SAD (i.e., PRMfSAD), an intermediate between normal and emphysematous lung tissue, to emphysema. The nature of this transition suggests χ may be capturing a fundamental mechanism in the emergence of fSAD. Based on our observation, fSAD appears to develop as distinct pockets, which are represented as positive values in χfSAD within healthy lung tissue, as depicted in the blue box in Fig. 1B. With increasing COPD severity, fSAD pockets coalesce to a mesh, which is represented by negative values in χfSAD (magenta box in Fig. 1B). On a whole lung level, this transition occurs on average from GOLD stages 2 to 4. By quantifying the amount and arrangement of normal and fSAD parenchyma, one can assess the severity of COPD. As fSAD is an intermediate between healthy lung and emphysema, increasing levels of emphysema have a direct effect on V and χ of fSAD. This is observed in Fig. 3 and Additional File 1 (Supplemental Fig. 4), where increasing values of VEmph resulted in a drop in VfSAD and increase in χfSAD. These trends were reflected in our multivariable model for VEmph as well (Table 2).

In a seminal study, McDonough and colleagues [7] provided pathological evidence demonstrating the role of SAD in COPD progression. Using high resolution (~ 10 μm) microCT to analyze frozen lung samples from lung transplant recipients with end-stage COPD, they found that widespread narrowing and destruction of the smaller airways (i.e., SAD) occurred before emphysematous lesions became large enough to be visible on standard CT imaging. They concluded that SAD might serve as an emphysema precursor. Based on their observation, we postulated that the transition observed between χNorm and χfSAD (Figs. 2 and 3) should be observed for χfSAD and χEmph. Using mean values of χ over the lungs, χEmph was found to be relatively stable, generating positive values across GOLD (Additional File 1: Supplemental Fig. 2), as well as demonstrating a weak correlation to χfSAD (ρ = -0.155, p < 0.001). Nevertheless, evaluating χfSAD and χEmph at the local level, we observe a strong association between these two readouts (Additional File 1: Supplemental Fig. 4), which may be linked to the structural changes in the terminal airways observed using microCT of lung explants.

In a recent study, Bhatt and colleagues evaluated a CT readout, referred to as the mean Jacobian determinant of normal voxels, at varying distances from emphysematous tissue [34]. When measured at 2 mm from CT voxels designated emphysema (i.e., voxel HU <-950 HU), this CT-based readout was found to be predictive of spirometric decline. Our spatial analysis of a single case clearly demonstrates a transition in topologies of PRMfSAD and PRMEmph, 1.8 cm and 1.2 cm for V and χ, respectively (Additional File 1: Supplemental Fig. 4). It is the association of topologies between PRMfSAD and PRMEmph at the local level that allows our machine learning model to predict spirometric decline, with an accuracy of 70%, in the absence of any emphysema readout as an input (Fig. 4). Although the readouts reported by Bhatt and colleagues lacked quantification of SAD, there is clear agreement that lung tissue along the periphery of emphysematous tissue provides potential insight into COPD progression. Using only topologies of PRMNorm and PRMfSAD, our patch-based ML model outperformed the whole-lung logistic regression model (Fig. 5B). This result highlights the importance of the spatial relationship of χfSAD to χEmph to predict spirometric decline (Figs. 4 and 5).

We acknowledge several notable limitations. COPDGene comprises over 20 study sites, making scanner variation and reconstruction kernel inconsistency inevitable. Sensitivity of PRM to scanner variability was addressed previously [35] and although effort was made to apply PRM only to soft kernels, variability in scanner type was unavoidable. However, we included scanner vendor in our multivariable regressions and found that it did not significantly confound models. Another limitation is variation in levels of inspiration and expiration during CT acquisition. Earlier work demonstrated that even small perturbations from functional residual capacity (FRC) have an observable effect on threshold-based techniques such as PRM [35]. To limit this, we implemented QC that excluded participants based on erroneous volume changes or strong discordance with correlation between PRMNorm and FEV1% predicted.

Conclusions

In this paper, we have demonstrated that topological features, V and χ, are able to enhance the sensitivity of PRM classifications, notably Norm and fSAD, to extent of emphysema and COPD severity. These data support the concept that as pockets of small airways disease coalesce, surrounding normal tissue is lost. Pockets of fSAD are seen to correlate with increasing presence of emphysema, independent of the amount of fSAD present. We further demonstrated that local levels of χfSAD and χEmph correlate, which may be explained by bronchiolitis along the periphery of emphysematous tissue observed by McDonough and colleagues using microCT. In addition, we demonstrated that local values of V and χ for PRMNorm and PRMfSAD provide sufficient information to predict spirometric decline, even in the absence of any prior knowledge of emphysema. Our study provides a unique strategy to detect subtle changes in lung parenchyma that may progress to emphysema. This approach to monitoring extent and arrangement of Norm and fSAD offers insight into COPD phenotypes and provides improved prognostic information that has relevance in clinical care and future clinical trials.