Background

Smokers without interstitial lung disease or emphysema may have quantitative interstitial abnormalities (QIA), which are subtle parenchymal changes on chest computed tomography (CT) scans detected by an automated method [1]. The presence and progression of QIA (also called interstitial features in prior work) are associated with worse lung function, reduced exercise capacity, increased respiratory symptoms, and death [1,2,3,4]. Risk factors for QIA include advanced age, current smoking status, the MUC5B polymorphism, and female sex [1,2,3,4]. Because QIA shares risk factors and clinical features with both idiopathic pulmonary fibrosis (IPF) and chronic obstructive pulmonary disease (COPD), it may be a precursor to these advanced parenchymal diseases in some patients [5, 6]. Existing therapies for these diseases slow progression or decrease symptoms but do not reverse the parenchymal damage [7,8,9,10]. QIA may therefore be a useful target for early intervention, but the biomarkers associated with QIA remain unclear.

Metabolomics may be useful for understanding the biochemical perturbations associated with an early stage of lung disease like QIA. Metabolomics is the identification and measurement of small molecules (≤ 1500 Daltons) in a single biological specimen [11]. Endogenous metabolites are end-products of enzymatic reactions and are linked by metabolic pathways, which places them downstream of the genome, transcriptome, and proteome; metabolites can also be derived from food, medications, microbiota, and the environment [12, 13]. Metabolism may be perturbed in disease and can directly reflect the underlying pathogenic mechanisms [12].

Prior work has shown that serum and plasma metabolomic analyses are useful in the study of established and advanced lung diseases. Serum and plasma metabolomics have been used to discriminate healthy controls from those with COPD or IPF [14, 15] and to detect the presence and extent of radiographic emphysema and other measures of disease severity [16,17,18]. Systemic metabolomics may similarly reflect and provide biochemical insight into more subtle parenchymal injury like QIA.

In this study, we used a global metabolomics assay that captures a broad range of chemical classes of metabolites to identify those associated with QIA in a well-characterized cohort of ever-smokers. Additionally, given shared risk factors between QIA and quantitative emphysema, we sought to identify shared and differentiating metabolite profiles in QIA-predominant versus emphysema-predominant CT phenotypes.

Methods

This was a cross-sectional cohort study of metabolomics features associated with QIA. The Genetic Epidemiology of COPD (COPDGene) is a prospective observational study of over 10,000 former and current smokers (ever-smokers) aged 45–80 years with at least a 10 pack-year smoking history and no prior history of bronchiectasis or interstitial lung disease (ILD) [19]. Participants self-reporting as Non-Hispanic White or Black were recruited at 21 study centers across the United States. For this study, we used data from questionnaires, chest CT scans, and blood samples collected at the five-year follow up (visit 2) of the study (2013–2017), as previously described [19]. The COPDGene study (NCT00608764) was approved by the institutional review board for ethical review at all 21 participating centers (Additional file 1). All participants provided written informed consent.

Chest CT measurements

We measured QIA and quantitative emphysema in 4,778 participants using a previously published automated tool [1]. The tool employs a machine learning classifier that uses local density histograms and distances from the pleural surface to classify the voxels of the total CT lung volume into radiologic features. The percentages of total lung volume of reticulation, subpleural lines, ground glass opacities, honeycombing, linear scarring, centrilobular nodules, and other nodularity features were summed to yield the total QIA percent (Fig. 1); panlobular and centrilobular emphysema features were summed to yield total emphysema percent. We used continuous QIA for the primary analysis.
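
The summation described above can be illustrated with a minimal sketch. The feature keys, the `total_percent` helper, and the participant values below are hypothetical and chosen only for illustration; they are not the output or interface of the published tool.

```python
# Hypothetical feature labels; the published classifier may name them differently.
QIA_FEATURES = (
    "reticulation", "subpleural_lines", "ground_glass", "honeycombing",
    "linear_scarring", "centrilobular_nodules", "other_nodularity",
)
EMPHYSEMA_FEATURES = ("panlobular_emphysema", "centrilobular_emphysema")


def total_percent(feature_percents, feature_names):
    """Sum the percent of total lung volume assigned to the named features."""
    return sum(feature_percents.get(name, 0.0) for name in feature_names)


# Invented per-feature percentages of total lung volume for one participant.
participant = {
    "reticulation": 1.5, "subpleural_lines": 0.5, "ground_glass": 2.0,
    "honeycombing": 0.0, "linear_scarring": 0.25,
    "centrilobular_nodules": 0.5, "other_nodularity": 0.25,
    "panlobular_emphysema": 1.0, "centrilobular_emphysema": 2.5,
    "normal": 91.5,
}

qia_percent = total_percent(participant, QIA_FEATURES)
emphysema_percent = total_percent(participant, EMPHYSEMA_FEATURES)
print(qia_percent, emphysema_percent)
```

Keeping the two feature lists separate mirrors the text: the same per-voxel classification yields both the total QIA percent and the total emphysema percent.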

Fig. 1 Quantitative interstitial abnormalities (QIA). The left image is an example computed tomography (CT) cut of a left lung, and the right image is the same cut with QIA shaded

Plasma metabolomics measurements

Metabolomics assays were run on plasma samples collected at visit 2 from 1,136 participants at two study centers (National Jewish Health, University of Iowa). This analysis included 928 participants with complete quantitative CT and metabolite level measurements available (Fig. 2). Plasma samples were profiled using the Metabolon Global Metabolomics Platform (Morrisville, NC), as described previously and in Additional file 1 [20,21,22].

Fig. 2 Flow diagram of participants included in the study

Metabolite levels were median scaled within each batch. Of the 1,392 metabolites initially profiled by the Metabolon platform, 397 with > 20% missingness were excluded, as done in prior work and referred to as the “80% rule” [23]. Of the remaining 995 metabolites, 237 were quantified but chemically unidentified and were excluded, leaving 758 named metabolites for analysis. Of these, 358 had no missing values; for the other 400, which had ≤ 20% missingness, missing values were imputed using k-nearest neighbor sample imputation (k = 10). Missingness is an inherent element of metabolomics data, which routinely require pre-processing such as data reduction and imputation [24]. All values were log2-transformed in preparation for statistical analyses. Of the 758 metabolites, 363 were involved in lipid, 180 in amino acid, 32 in nucleotide, 24 in carbohydrate, and 10 in energy metabolism (Additional file 1: Table S1). Additionally, there were 25 cofactors and vitamins, 25 peptides, 96 xenobiotics, and 3 partially-characterized metabolites.
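
The pre-processing steps above (median scaling within batch, the 80% rule, k-nearest neighbor imputation, and log2 transformation) can be sketched as follows. This is an illustrative sketch only: the function names, the toy data, and the choice of distance (root-mean-square difference over metabolites observed in both samples) are assumptions, since the text does not specify the imputation details, and the original analyses were run in R rather than Python.

```python
import math
from statistics import median


def median_scale_within_batch(values, batches):
    """Divide each observed value by the median of its batch (None = missing)."""
    scaled = list(values)
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches)
               if b == batch and values[i] is not None]
        if idx:
            batch_median = median(values[i] for i in idx)
            for i in idx:
                scaled[i] = values[i] / batch_median
    return scaled


def passes_80_percent_rule(values, max_missing=0.20):
    """Keep a metabolite only if at most 20% of its values are missing."""
    return sum(v is None for v in values) / len(values) <= max_missing


def knn_impute(data, k=10):
    """Fill each missing value with the mean of that metabolite in the k
    nearest samples (assumed distance: RMS difference over shared metabolites)."""
    names = list(data)
    n_samples = len(data[names[0]])

    def distance(a, b):
        diffs = [(data[m][a] - data[m][b]) ** 2 for m in names
                 if data[m][a] is not None and data[m][b] is not None]
        return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf

    imputed = {m: list(v) for m, v in data.items()}
    for m in names:
        for s in range(n_samples):
            if data[m][s] is None:
                donors = sorted((distance(s, t), t) for t in range(n_samples)
                                if t != s and data[m][t] is not None)
                nearest = [data[m][t] for _, t in donors[:k]]
                imputed[m][s] = sum(nearest) / len(nearest)
    return imputed


# Toy data: 3 hypothetical metabolites measured in 5 samples from 2 batches.
batches = [1, 1, 1, 2, 2]
raw = {
    "met_a": [2.0, 4.0, 8.0, 4.0, None],   # 20% missing: kept and imputed
    "met_b": [1.0, None, None, 2.0, 4.0],  # 40% missing: excluded
    "met_c": [1.0, 2.0, 4.0, 2.0, 2.0],    # complete
}

scaled = {m: median_scale_within_batch(v, batches) for m, v in raw.items()}
kept = {m: v for m, v in scaled.items() if passes_80_percent_rule(v)}
imputed = knn_impute(kept, k=2)  # the study used k = 10; k = 2 suits the toy data
log2_levels = {m: [math.log2(x) for x in v] for m, v in imputed.items()}
print(log2_levels)
```

In practice an established routine would normally be preferred over a hand-written imputer; the sketch is only meant to make the order of operations concrete.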

Statistical analyses

We assessed the association between each metabolite level (individual predictor) and continuous QIA (primary outcome) using univariable linear regression, followed by multivariable linear regression adjusted for age, sex, body mass index (BMI), smoking status, pack-years, and inhaled corticosteroid (ICS) use. Associations were considered significant at a Benjamini–Hochberg false discovery rate (FDR)-adjusted p-value of ≤ 0.05. Models were adjusted for BMI because obesity perturbs the metabolome, and obesity-related atelectasis may contribute to CT noise [25]. For the secondary analysis, QIA and emphysema were each dichotomized at a cutoff of ≥ 5% versus < 5% of the total lung volume; cutoffs were based on prior work [1, 26]. We categorized participants into four CT-based phenotypes, defining those with ≥ 5% QIA and < 5% emphysema as QIA-predominant, ≥ 5% QIA and ≥ 5% emphysema as combined-predominant, < 5% QIA and ≥ 5% emphysema as emphysema-predominant, and < 5% QIA and < 5% emphysema as neither-predominant (Additional file 1: Fig. S1). We assessed unadjusted associations between each metabolite level and the CT phenotype (secondary outcome) using analysis of variance (ANOVA), then used multinomial logistic regression adjusted for the covariates listed above, at an FDR of ≤ 0.05, with QIA-predominant as the reference group. Analyses were performed using R software (version 4.2.2) in RStudio (version 2022.12.0 + 353) [27, 28].
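
The four-way CT phenotype definition above can be written as a small classification rule. The function name `ct_phenotype` and the example values are hypothetical; only the 5% cutoffs and the phenotype labels come from the text.

```python
def ct_phenotype(qia_percent, emphysema_percent, cutoff=5.0):
    """Assign one of four CT-based phenotypes from the two lung-volume percentages."""
    high_qia = qia_percent >= cutoff
    high_emphysema = emphysema_percent >= cutoff
    if high_qia and high_emphysema:
        return "combined-predominant"
    if high_qia:
        return "QIA-predominant"
    if high_emphysema:
        return "emphysema-predominant"
    return "neither-predominant"


# Invented (QIA %, emphysema %) pairs, one per phenotype.
examples = [(7.2, 1.0), (6.0, 12.5), (2.1, 9.8), (1.3, 0.4)]
for qia, emphysema in examples:
    print(qia, emphysema, ct_phenotype(qia, emphysema))
```

Because both cutoffs are inclusive at 5%, a participant with exactly 5% QIA and 5% emphysema falls in the combined-predominant group.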

We performed metabolic pathway enrichment analyses of the significant metabolites using the web platform MetaboAnalyst (V5.0) [29]. L- and D-enantiomer annotations for amino acids were simplified to the L-enantiomer. Metabolites with a missing Human Metabolome Database (HMDB) ID annotation, or with more than one, were excluded [30]. The metabolic pathways were mapped to the Homo sapiens Kyoto Encyclopedia of Genes and Genomes (KEGG) database [31]; pathway enrichment analyses were then performed by global test at an FDR of ≤ 0.05, and topology analyses by relative-betweenness centrality.
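
The Benjamini–Hochberg FDR threshold of ≤ 0.05 used in the statistical analyses above can be illustrated with a short sketch of the adjustment. The function name and the four example p-values are hypothetical; the study's analyses were run in R, so this Python version is only a stand-in for the standard procedure, not the authors' code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for four metabolites.
raw_p = [0.01, 0.04, 0.03, 0.20]
fdr_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in fdr_p]
print(fdr_p, significant)
```

In this toy example only the first metabolite remains significant after adjustment, even though three raw p-values are below 0.05, which is the behavior that motivates correcting for the 758 tests performed per model.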

Results

The baseline characteristics of the 928 participants in this analysis are shown in Table 1. The participants had a mean age of 67.5 ± 8.6 years, were 50.2% male, and were predominantly former smokers. Mean percent predicted forced expiratory volume in 1 s (FEV1) was 77.8 ± 26.0%, and mean percent predicted forced vital capacity (FVC) was 86.6 ± 18.5%. The mean percentage of lung occupied by QIA was 5.0 ± 4.3% and by emphysema was 8.1 ± 16.0%. In the cohort, 223 participants had QIA-predominant, 109 had combined-predominant, 133 had emphysema-predominant, and 463 had neither-predominant CT phenotypes (Additional file 1: Table S2). Compared to the rest of the COPDGene cohort with CT measurements and covariables but no metabolomics data available, the participants in our cohort had similar characteristics but were older, included a greater percentage of former smokers, and were predominantly White by self-reported race (Additional file 1: Table S3).

Table 1 Baseline characteristics

Association of metabolomics with QIA

We identified significant associations between 223 metabolites and continuous QIA by univariable linear regression (Additional file 1: Table S4). By multivariable regression, 85 metabolites were significantly associated with QIA (Fig. 3, Additional file 1: Table S5), of which 51 metabolites were negatively-associated and 34 positively-associated with QIA, including 44 (51.8%) lipids and 21 (24.7%) amino acids.

Fig. 3 Volcano plot of median-scaled, log-transformed metabolites associated with QIA. Metabolites significantly associated with QIA (FDR ≤ 0.05) are colored by super pathway; non-significant metabolites are shown in gray

The top 25 positively-associated and 25 negatively-associated metabolites are shown in Table 2. Positively-associated metabolites included the aminosugars N-acetylneuraminate and erythronate, the nucleotide pseudouridine, and the amino acid derivatives C-glycosyl tryptophan and N-acetylserine. Enrichment analysis of these positively-associated metabolites showed overrepresentation of metabolites involved in nicotinate and nicotinamide, histidine, starch and sucrose, and pyrimidine pathways. Negatively-associated metabolites included eight phosphatidylcholines, seven lysophospholipids, and four sphingomyelins; some of these metabolites were represented in the glycerophospholipid and sphingolipid pathways that were significant in the enrichment analysis (Table 3, Fig. 4).

Table 2 The 25 metabolites that are most positively-associated and 25 metabolites that are most negatively-associated with quantitative interstitial abnormalities in multivariable linear regression models
Table 3 Pathway enrichment analysis of metabolites associated with quantitative interstitial abnormalities
Fig. 4 Scatterplots generated from pathway enrichment analysis in MetaboAnalyst. FDR p-values are on the y-axis and pathway impact values on the x-axis. The size of the plotted point correlates to the pathway impact and color (blue to red) correlates to p-values

Multinomial outcomes of CT phenotypes

Globally amongst the four phenotypes (QIA-predominant, combined-predominant, emphysema-predominant, neither-predominant), 282 metabolites significantly differed by ANOVA. Post-hoc Tukey’s test pairwise comparisons were performed and are shown in Additional file 1: Table S6.

Our adjusted multinomial logistic regression models yielded 75 metabolites that differed significantly between QIA-predominant and emphysema-predominant phenotypes, with 45 associated with higher odds and 30 associated with lower odds of QIA relative to emphysema (Table 4), including 36 (48.0%) lipids and 22 (29.3%) amino acids (Fig. 5, Additional file 1: Table S7). Most of the associations of amino acids with QIA-predominance were positive, and they included dimethylarginine (SDMA, ADMA), phenylalanine, asparagine, proline, and kynurenine. Amongst lipids, phosphatidylethanolamines (PE) were most commonly associated with higher odds of QIA-predominance, whereas sphingomyelins (SM) and acyl carnitines were associated with higher odds of emphysema-predominance. Pathway enrichment analysis showed overrepresentation of metabolites involved in PE metabolism (glycerophospholipid and glycosylphosphatidylinositol-anchor pathways), as well as multiple amino acid pathways including those involving nicotinate and nicotinamide, aminoacyl-tRNA, arginine, proline, alanine, aspartate, and glutamate metabolism (Table 5, Fig. 6).

Table 4 (A) The 25 metabolites that have the highest odds of the QIA-predominant phenotype and (B) the 25 metabolites that have the lowest odds of the QIA-predominant phenotype compared to the emphysema-predominant phenotype in multinomial logistic regression
Fig. 5 Heatmaps of mean (A) amino acid metabolite values and (B) lipid metabolite values of QIA-predominant, combined-predominant, emphysema-predominant, and neither-predominant groups. Shown are the metabolites that had significant differences in the QIA-predominant and emphysema-predominant groups (FDR p ≤ 0.05)

Table 5 Pathway enrichment analysis of metabolites associated with QIA-predominant versus emphysema-predominant phenotypes from multinomial regression
Fig. 6 Scatterplots generated from pathway enrichment analysis in MetaboAnalyst of multinomial logistic regression analysis of QIA-predominant and emphysema-predominant phenotypes. FDR p-values are on the y-axis and pathway impact values on the x-axis. The size of the plotted point correlates to pathway impact and colors (blue to red) correlate to p-values

One metabolite, tryptophan betaine, was significantly associated with lower odds of the QIA-predominant phenotype compared to the neither-predominant phenotype. No metabolites differed significantly between the QIA-predominant and combined-predominant groups. Although these differences were non-significant, and although the unadjusted mean metabolite levels do not completely reflect the multinomial differences, the combined-predominant group had amino acid levels similar to those of the QIA-predominant group but, depending on the metabolite, tracked in direction with either the QIA-predominant or the emphysema-predominant group (Fig. 5).

Discussion

To our knowledge, our discovery study was the first global analysis of the metabolomics features of quantitative interstitial abnormalities in a large cohort. Our study of 928 ever-smokers in COPDGene found that 85 plasma metabolites from the Metabolon metabolomics assay were associated with QIA, independent of age, sex, BMI, smoking status, pack-years, and ICS use. These findings highlight the metabolic features of participants with QIA and provide initial insight into the biochemical systemic features associated with these quantitative CT changes. Furthermore, we identified 75 metabolites that differed significantly between participants with QIA-predominant versus emphysema-predominant phenotypes. Such associations of metabolomic differences between participants with shared risk factors but different CT parenchymal findings may be useful as biomarkers that distinguish these smoking-related phenotypes. These associations also help us understand the metabolic processes that may be important in early QIA, but which may be less prominent in later stages of lung injury like emphysema. Lastly, some of the metabolites significant in our analyses were those previously associated with advanced diseases like IPF and COPD, suggesting potential shared pathways in progression that should be studied further.

Amino acids

Circulating amino acids are involved in numerous processes including cell signaling, regulating gene expression, hormone synthesis and secretion, nutrient metabolism, oxidative stress, and immune response regulation [32]. In particular, smokers with COPD can have perturbations in the levels of branched chain amino acids, which are important in skeletal muscle; these perturbations may reflect systemic changes including impaired immunity or cachexia [17].

In our analysis, the tryptophan metabolites quinolinate, kynurenine, and N-formylanthranilic acid were notable for their association with higher odds of QIA over emphysema. These tryptophan derivatives suggest inflammatory activity in QIA and overlap with advanced diseases. These three metabolites are downstream in the kynurenine pathway, which normally comprises 95% of tryptophan metabolism and is upregulated in inflammation and immune responses [33]. Quinolinate is also a substrate for nicotinamide adenine dinucleotide (NAD+) synthesis, which is required for normal cell function and energy production; this pathway has been proposed to be upregulated in physiological stress [34] and was enriched for QIA in our analysis. In COPD, upregulated kynurenine derivatives are associated with reduced FEV1 [35], and reduction in the precursor tryptophan is associated with COPD exacerbations and emphysema [17]. Patients with IPF have been shown to have significant declines in kynurenine after treatment with the anti-fibrotic pirfenidone, thought to be due to its anti-inflammatory effects [36].

Several glutamine derivatives were associated with higher odds of QIA over emphysema and were represented in pathway enrichment: glutamate, alpha-ketoglutarate, and 4-hydroxyglutamate. The precursor glutamine is the most abundant amino acid in the body, found in both plasma and skeletal muscle, and plays roles in immune modulation, ammonia transport, and maintenance of cell integrity and function [37]. The downstream derivatives glutamate and alpha-ketoglutarate are crucial intermediates in the Krebs cycle [37]. In patients with COPD compared to controls, plasma glutamine and glutamate are decreased, thought to be from hypermetabolism and muscle depletion [38]. We found that higher levels of glutamine derivatives were associated with higher odds of QIA-predominance compared to emphysema-predominance, which suggests that patients with emphysema may be in a more advanced, catabolic state compared with those with QIA. These metabolites should be studied as potential factors in the progression of QIA to advanced disease.

Lipids

Circulating lipids comprise thousands of individual species with a considerable range of structural diversity and physiological functions, including maintaining the integrity of the lipid bilayer, acting as hormones, and participating in cell signaling pathways [39]. In the lungs, lipids are important components of surfactant [40].

The majority of lipids negatively-associated with QIA were phospholipids, including eight phosphatidylcholines (PCs), which were represented in pathways that were significant in enrichment analysis. PCs are the body’s most abundant phospholipids and the major component of surfactant [41]. Some of the PC species negatively-associated with QIA in our analysis were those specifically found in prior studies of IPF and COPD patients demonstrating decreased PC concentrations in the respiratory fluid and blood [18, 42, 43]. Decreased PC levels may also generally reflect oxidative changes in the lungs in the setting of cigarette exposure, as has been demonstrated in mouse alveolar cells [44]. Further studies are needed to test the relationship between blood and pulmonary phospholipids in smokers and to understand whether the plasma PC perturbations associated with QIA represent a systemic manifestation of PC dysregulation in surfactant, or another phospholipid perturbation altogether.

Sphingomyelins (SM) were another lipid subclass negatively-associated with QIA, and they were of particular interest given their many roles in fetal lung development and lung inflammation [45]. Patients with IPF have downregulated plasma SM [46], including the SM(D18:1/20:0) species that we identified with QIA. A previous study of COPD phenotypes in the COPDGene cohort found that some SMs were also negatively-associated with emphysema [47]. In our analysis, four SMs were associated with lower odds of QIA-predominance compared to emphysema-predominance. Our findings complement the prior study as it did not account for QIA in the assessment of emphysema, suggesting that both QIA and emphysema CT measures should be considered when studying the metabolomics of smoking-related disease.

Carbohydrates

Amongst carbohydrates, our pathway analyses showed QIA was enriched for certain sugars including maltose, sucrose, and xylose. As these sugars mostly come from the diet and the breakdown of food starches in the digestive system, these metabolites may be related to the gut-lung axis, in which changes in inflammation and microbiota in the gut mucosa cross-talk with lung mucosa [48]. Also positively-associated with QIA was the sialic acid amino sugar N-acetylneuraminate (Neu5Ac), which may be reflective of inflammation. Sialic acids are often the terminal sugars on mucin, and variations in these sugar attachments may indicate regulation by proinflammatory cytokines or modification by bacteria [49]. Higher Neu5Ac levels in bronchoalveolar lavage fluid have been found to be associated with COPD and with increased bacterial binding in smokers [50, 51]. Lastly, erythronate and its precursor N-acetylglucosamine were also positively-associated with QIA. These extracellular matrix degradation products are associated with pulmonary fibrosis in animal models [52], and they may reflect remodeling during QIA development.

Strengths and limitations

With more patients with a smoking history receiving screening CT scans than before, we need a deeper understanding of the biology that underlies the subtle interstitial changes that are often caught incidentally. The metabolites significant in our exploratory study provide initial insight into the biochemical activity and pathways associated with QIA. We found associations with several metabolomic features previously linked with IPF and COPD, suggestive of shared disease activity between early-stage QIA and later-stage advanced diseases.

We also identified metabolic features that differed between participants with QIA- and emphysema-predominant phenotypes, which provide initial insight into possible common and different underlying pathways. The two smoking-related phenotypes share risk factors and imaging and physiologic features, especially before very advanced disease develops [53]. Since the metabolomic levels of the combined-predominant group did not clearly fall in between those of the QIA- and emphysema-predominant phenotypes, their metabolomic profiles reflect more complicated processes requiring further investigation.

There are several limitations to our study. Although one of the strengths of our study is the large sample size of thoroughly phenotyped ever-smokers, our results need replication in other smoking and population-based cohorts for validation of potential biomarkers. Our analyses can provide insight into, and generate hypotheses for, possible pathogenic pathways of QIA but cannot be used to elucidate exact mechanisms. Furthermore, while the global metabolomics panel captures a broad range of different classes of metabolites, it is not quantitative; in future work aimed at pinpointing mechanisms, targeted assays will be required. Due to the cross-sectional nature of our data, interpretations of causality are limited; longitudinal studies may help elucidate temporal relationships more clearly. We defined CT phenotypes with the predominant CT features occupying at least 5% of the lung volume; although a binary cutoff for emphysema at 5% is well-established as a clinically meaningful value [26], a similar 5% cutoff has been used for QIA but is less robust [1]. Lastly, we used the HMDB identifier and KEGG background database for our pathway analysis because they are widely used, acknowledging the following limitations. Given the novelty of high-throughput metabolomics and rapid accumulation of new data in the field, some metabolites are unclearly annotated or not found at all in the databases, and other metabolites are redundant across multiple pathways [54]; thus, we may not have been able to detect some pathways that are nonetheless biologically important in QIA.

Conclusions

Lipid, amino acid, and carbohydrate metabolites associated with inflammation and immune response, extracellular matrix remodeling, surfactant, and muscle cachexia may play important roles in the earliest stages of smoking-related lung disease. These metabolic signals provide initial insight into the biochemical associations with QIA as one of the earliest stages of smoking-related lung disease activity. These signals suggest future biomarkers for early detection of disease and potential therapeutic targets before progression to IPF and COPD.