Technical variability measured using β-pinene vapors
Before discussing the biological variability measured in human breath, we gauged the typical technical variability to be expected for our SESI-HRMS system. To do so, we infused a continuous stream of air seeded with 92.7 ppb of β-pinene, simulating an exhalation maneuver. Upon injection of the standard, the mass spectrum was dominated by the expected protonated β-pinene at m/z 137.1326 (C10H17), along with some oxidized species (C10H15O and C10H17O2; ESM Fig. S3). SESI-MS is known to detect trace species down to the sub-ppt range. For this reason, and not surprisingly, 92.7 ppb of β-pinene nearly saturated the detector of the Orbitrap mass analyzer. Because the dynamic range of our mass analyzer is five orders of magnitude (signal intensity 10^4–10^9 a.u.), the limit of detection is expected to be around 1 ppt, which is consistent with previous SESI-MS quantification studies. When we started the delivery of β-pinene, the signal of the protonated analyte rose sharply to reach a plateau. We measured the stability of the signal intensity over 1 h. When the delivery of β-pinene was stopped, the signal intensity dropped abruptly to baseline level, indicating no carryover effects, at least for this particular compound (inset Fig. S3, see ESM). The CV of the β-pinene signal intensity during an hour of continuous delivery of the vapor was 2.3%. We therefore conclude that technical CVs within 3% are to be expected for our SESI-HRMS platform.
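As a rough illustration of the arithmetic behind these two figures, the sketch below computes a CV from a plateau signal and scales the near-saturating concentration down across the dynamic range to estimate the limit of detection. The plateau intensities are hypothetical values chosen for illustration, not data from this study, and the linear scaling assumes that 92.7 ppb sits at the upper end of a five-orders-of-magnitude intensity range.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: sample SD divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical plateau intensities (a.u.); NOT measurements from the study.
plateau = [9.8e8, 1.00e9, 1.02e9, 9.9e8, 1.01e9]
cv = coefficient_of_variation(plateau)

# Rough limit-of-detection estimate, assuming a linear response:
# the near-saturating concentration divided by the dynamic range.
saturating_ppb = 92.7
dynamic_range = 1e9 / 1e4                          # five orders of magnitude
lod_ppt = saturating_ppb * 1000.0 / dynamic_range  # ppb converted to ppt

print(f"CV = {cv:.2f}%, estimated LOD = {lod_ppt:.2f} ppt")
```

Under these assumptions the estimate lands just below 1 ppt, in line with the order of magnitude quoted in the text.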
Replicate exhalations: intra- and inter-subject variability
In total, the four participants provided 648 exhalations (n = 171 for subject 1, n = 174 for subject 2, n = 225 for subject 3, and n = 78 for subject 4). These measurements were subdivided into 104 single experiments (N = 25 for subject 1, N = 29 for subject 2, N = 37 for subject 3, and N = 13 for subject 4), each containing 6 to 13 exhalations (replicates) performed within 10 to 20 min. The aim was to examine the variability across these replicates, considering that the technical variability, as mentioned above, was found to be in the range of 3%. Figure 1b shows one such representative experiment, in which a subject provided 13 consecutive exhalations over 19 min (ESM Fig. S4 shows a zoomed-in view of the first exhalation, where the time traces can be inspected in greater detail).
The vast majority of the features typically detected by SESI-HRMS in human breath remain to be positively identified. However, over recent years, we have made a substantial effort to systematically identify the molecular structures of some of these metabolites by combining real-time breath MS/MS analysis and UPLC-MS/MS analysis of exhaled breath condensate [47,48,49,50,51,52,53]. Given the clinical importance of aldehydes as potential surrogates of oxidative stress [54,55,56,57,58,59], we concentrate on our findings for three classes of fatty aldehydes: 4-hydroxy-2-alkenals (CxH2x−2O2), 2-alkenals (CxH2x−2O), and 4-hydroxy-2,6-alkadienals (CxH2x−4O2), with chain lengths ranging from C8 to C16. These 27 representative aldehydes were used as benchmarking metabolites. For reference, Fig. 1b shows the time traces of three such representative exhaled aldehydes and Fig. S5 (see ESM) shows the time traces for the 27 aldehydes of interest from the same experiment. The gray areas in Fig. 1b and Fig. S4 (see ESM) represent the time windows in which CO2 levels were above 3%.
Visual inspection of CO2 and exhalation parameters in Fig. 1b suggests a high repeatability across replicate measurements. Indeed, the computed mean ± SD for this particular experiment yielded a CO2 level of 4.7 ± 0.1%, an exhalation flow rate of 11.7 ± 0.3 L/min, and an exhaled volume of 2.6 ± 0.1 L (i.e., excluding 0.5–0.6 L of breath not containing at least 3% of CO2) for the considered windows. Median CVs (IQRs) for CO2, exhalation flow rate, and exhaled volume based on all 104 experiments were 3.2% (1.5%), 3.1% (1.9%), and 5.0% (4.6%), respectively. The overall picture for the aldehydes was somewhat more complex. While 4-hydroxy-2,6-pentadecadienal in Fig. 1b shows a relatively constant behavior across all exhalations (akin to CO2), 2-dodecenal drops over time during consecutive exhalations, and the decay is even more pronounced for 4-hydroxy-2-nonenal, whose signal intensity decays by ~35% during the first three exhalations and then reaches a steady state. Interestingly, we observed this behavior systematically for these particular molecules among all participants. Figure 2 shows the mean normalized breath-signal (see “Material and methods” for details) and the corresponding 95% CI from all experiments for the four participants as a function of exhalation number for the three selected representative compounds shown in Fig. 1b. It clearly shows that the dynamics for each compound are subject independent and, interestingly, seem to depend on the aldehyde chain length. For example, the drop in signal intensity between the first and the sixth exhalation is around 50% for 4-hydroxy-2-nonenal and around 30% for 2-dodecenal, whereas for 4-hydroxy-2,6-pentadecadienal the signal intensity remains stable (or even increases after the first exhalation). This trend was systematically observed for all the aldehydes from the three classes (ESM Figs. S6–S8).
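The kind of summary plotted in Fig. 2 can be sketched as follows. This is a minimal illustration only: the actual normalization is defined in “Material and methods” and is not reproduced here, so the sketch assumes each trace is normalized to its first exhalation, and it uses a normal-approximation 95% CI (mean ± 1.96 × SE). The traces are hypothetical numbers, not study data.

```python
import math
import statistics


def normalize_to_first(trace):
    """Express each exhalation relative to the first one (assumed normalization)."""
    first = trace[0]
    return [value / first for value in trace]


def mean_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% CI."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


# Hypothetical breath-signals (a.u.) for one compound, six exhalations per experiment.
experiments = [
    [100.0, 80.0, 65.0, 60.0, 55.0, 50.0],
    [200.0, 150.0, 130.0, 120.0, 110.0, 100.0],
    [50.0, 42.0, 35.0, 30.0, 27.0, 25.0],
]

normalized = [normalize_to_first(trace) for trace in experiments]
n_exhalations = len(experiments[0])

# One (mean, lower, upper) tuple per exhalation number.
per_exhalation = [
    mean_ci95([trace[i] for trace in normalized]) for i in range(n_exhalations)
]

# Relative drop between the first and the sixth exhalation, in percent.
drop_pct = 100.0 * (1.0 - per_exhalation[5][0])
print(f"Drop from exhalation 1 to 6: {drop_pct:.0f}%")
```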
Location within the respiratory system where the gas exchange occurs may explain the molecule-dependent exhalation traces
The decay in signal intensity as a function of chain length can be rationalized by its dependence on the Ostwald blood-air partition coefficient (λb:a), which is the most important factor in determining the location within the respiratory system where gas exchange occurs. Soluble gases with λb:a > 100 exchange almost exclusively within the airways (with the bronchial blood), whereas those with 10 < λb:a < 100 exchange partially in the airways and partially in the alveoli, and those with λb:a < 10 exchange nearly exclusively in the alveoli (with the pulmonary blood). Therefore, CO2 (λb:a = 3) exchanges in the alveoli. Figure 1b shows that the CO2 level does not decrease as the participant provides consecutive exhalations, and this was the trend observed across all measurements. The same trend is observed for the longest aldehydes, which in turn have the lowest λb:a in the series. The λb:a values predicted by Kramer et al. suggest that, indeed, shorter aldehydes have a greater λb:a. For example, the predicted λb:a for 2-hexenal is 111; it therefore exchanges almost exclusively in the airways. In contrast, 2-undecenal has a predicted λb:a of 39; hence, it exchanges partially in the airways and partially in the alveoli. It is expected that even longer aldehydes (> C14), such as those studied in this work, will have a λb:a approaching the critical value of 10 (i.e., almost exclusively exchanged in the alveoli). This trend can be observed in Fig. S9 (see ESM), which shows predicted λb:a as a function of the number of carbon atoms in the aldehydes, based on data by Kramer et al. Thus, we hypothesize that the longest aldehydes studied here (C14–C16) exchange exclusively in the alveoli and for this reason behave similarly to CO2. In contrast, the smaller aldehydes exchange mainly in the airways, leading to a decrease during prolonged consecutive exhalations.
For example, it has been estimated that ethanol, which has a high blood solubility (λb:a = 1,803), can show a 20% lower concentration than alveolar air after a complete prolonged exhalation. Reinforcing this idea, we found that the signal intensity as a function of exhaled volume during a single exhalation varies significantly depending on the aldehyde chain length and therefore on λb:a. Figure 3a displays signal intensity profiles of the aldehydes as a function of exhaled volume for a representative first and last exhalation in an experiment (same experiment as Fig. 1b). It clearly shows how the C9 metabolites reach a maximum intensity at ~0.7 L and then decrease. In contrast, the exhalation profile for the longer aldehydes (C12 and C15) tends to increase systematically with exhaled volume (similarly to CO2 profiles). We hypothesize that, as the exhalation maneuver is repeated, the net influx towards bronchial circulation exceeds that outwards. Thus, the partial pressure cannot re-equilibrate in the short lapse between exhalations, leading to a constant non-linear decay across the repeated measurements. For 4-hydroxy-2-dodecenal, we observed a deviation from the decaying pattern (Fig. 3a and ESM Fig. S5). The underlying reason might be that this particular m/z channel is dominated by an isomer of 4-hydroxy-2-dodecenal. It is important to note at this point that this is a limitation of SESI-HRMS, as discrimination of isomers is sacrificed for the possibility of performing real-time analysis.
To further support the theoretical explanation of why λb:a ultimately modulates the decay in signal intensity due to gas exchange in the airways, Fig. 3b (and ESM Fig. S10) shows the experimental average breath-signal difference between the last and first exhalation as a function of predicted λb:a. These λb:a values were estimated by fitting the λb:a values reported by Kramer et al. for all aldehydes (ESM Fig. S9). The plot reveals a clear trend, whereby for the longest chain (C16) the breath-signal tends to increase during the repeated measurements. This is especially evident for 4-hydroxy-2-hexadecenal (ESM Fig. S6). In contrast, as the chain length decreases (and thus λb:a increases), the breath-signal difference decreases and finally reaches a plateau of Δ = −20% to −40% at C11. The fact that this clear trend occurs within the transition range 10 < λb:a < 100 suggests that it may indeed be due to the different regions of the respiratory system where these series of compounds exchange: from alveoli for C16 to airways for C8, with a mixed exchange situation for intermediate species. Further work is required to confirm this hypothesis and to determine whether it could be exploited to infer physiological information about the respiratory system, for example, by complementing other tests such as the multiple-breath washout test used to measure abnormal ventilation distribution between well- and poorly ventilated lung regions.
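The partition-coefficient thresholds discussed above can be written as a small lookup, shown here as a sketch. The thresholds (10 and 100) and the example λb:a values are those quoted in the text; how values exactly equal to 10 or 100 are assigned is not specified in the text, so placing them in the mixed regime is an assumption, as is the function name.

```python
def exchange_site(lambda_ba):
    """Map a blood-air partition coefficient to the dominant gas-exchange site."""
    if lambda_ba > 100:
        return "airways"   # exchange with the bronchial blood
    if lambda_ba < 10:
        return "alveoli"   # exchange with the pulmonary blood
    # 10 <= lambda_ba <= 100; boundary handling is an assumption of this sketch.
    return "mixed"


# Values quoted in the text (predicted values for the two aldehydes).
examples = {"CO2": 3, "2-undecenal": 39, "2-hexenal": 111, "ethanol": 1803}
sites = {name: exchange_site(value) for name, value in examples.items()}

for name, site in sites.items():
    print(f"{name}: {site}")
```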
Although the first exhalation may more accurately reflect systemic concentrations for metabolites with high blood-air partition coefficients, we recommend sampling at least ten replicate exhalations and computing breath-signals from only the last three exhalations, thus capturing the steady state. When doing so in the example shown in Fig. 1b, the median CV (IQR) for the 27 aldehydes was 4.1% (1.5%), which approaches the technical variability of ~3% measured with standard β-pinene vapors. However, for pediatric patients and patients suffering from respiratory diseases, this may prove difficult. For this reason, in order to determine an upper bound of the expected variability, we evaluated the variability of breath metabolites across all subjects considering only six exhalations and excluding the first three maneuvers when computing the breath-signal of each metabolite. When doing so, we found that the median CV (IQR) for the aldehydes studied here was 6.7% (5.5%). Table 1 lists the intra-subject CVs for the 27 aldehydes studied here.
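The two sampling schemes described in this paragraph can be sketched as below. The helper names, the example trace, and the list of per-metabolite CVs are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not state how the IQR was computed.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent (sample SD / mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def steady_state(trace, n_last=3):
    """Breath-signal and CV computed from the last n_last exhalations only."""
    tail = trace[-n_last:]
    return statistics.mean(tail), cv_percent(tail)


def median_iqr(values):
    """Median and interquartile range, using inclusive quartiles (assumed)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3 - q1


# Hypothetical trace of ten consecutive exhalations for one metabolite (a.u.).
trace = [100.0, 82.0, 70.0, 64.0, 61.0, 60.0, 59.0, 57.0, 60.0, 63.0]

# Recommended scheme: ten exhalations, keep the last three.
signal_full, cv_full = steady_state(trace)

# Shortened scheme: six exhalations, discard the first three.
signal_short, cv_short = steady_state(trace[:6])

# Summarizing hypothetical per-metabolite CVs as median (IQR).
median_cv, iqr_cv = median_iqr([1.0, 2.0, 3.0, 4.0, 5.0])
```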
Some studies indicate that the exhalation maneuver itself can in some cases alter the metabolic profile, hence providing misleading results. For this reason, we further investigated whether the exhalation flow rate of our protocol had an impact on the breath-signal of the exhaled metabolites. The flow resistance of the device was as low as 3 mbar × min/L, the mean ± SD exhalation flow rate of all the experiments performed in this study (N = 104) was 10.6 ± 0.9 L/min (ESM Fig. S11), and typical exhaled volumes were on the order of 3 L (i.e., 15–20 s of exhalation). It is important to note that this maneuver is far less invasive and easier to perform than classical spirometry, in which the forced expiratory volume in one second (FEV1) can typically be 4 L in adults. This implies exhalation flow rates around 25 times higher than in the maneuver used in our experiments. It has been shown that such forced expiration maneuvers can lead to substantial changes in exhaled CO2 and other metabolites. The fact that no significant changes in the CO2 levels were observed suggests that our maneuver does not induce hyperventilation. To determine whether there was any dependency on the exhalation flow rate, we explored the impact of exhaling at two flow rates, one at the lower end and another at the upper end of the distribution of exhalation flow rates measured for all participants (ESM Fig. S11). Figure 4a shows the comparison of two measurements from the same subject at a lower flow rate (9.8 ± 0.1 L/min) and consecutively at a higher flow rate (12.0 ± 0.3 L/min). The Bland-Altman plot for log-transformed variables shows that the breath-signal of metabolites is independent of the exhalation flow rate. The mean of log10(ratio) was −0.09. As expected, only ~4% of low-intensity ions lie outside the mean ± 1.96 × SD bands. We therefore conclude that flow rates in the range of 9 to 12 L/min are suitable for breath metabolomics using our particular configuration.
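A Bland-Altman comparison on log-transformed intensities, as used for Fig. 4, can be sketched as follows. This is an illustrative sketch, not the analysis code of the study: the direction of the ratio (first condition over second), the function names, and the intensities are assumptions, and the data are hypothetical.

```python
import math
import statistics


def bland_altman_log10(a, b):
    """Return bias, limits of agreement, and fraction of points outside them.

    The per-feature difference is log10(a) - log10(b), i.e. log10(a / b).
    """
    diffs = [math.log10(x) - math.log10(y) for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    outside = sum(1 for d in diffs if d < lower or d > upper) / len(diffs)
    return bias, lower, upper, outside


def ratio_from_log10(mean_log_ratio):
    """Back-transform a mean log10(ratio) into a geometric-mean ratio."""
    return 10.0 ** mean_log_ratio


# Hypothetical feature intensities (a.u.) under two conditions.
condition_a = [1.2e5, 3.4e6, 8.9e4, 5.5e7, 2.1e6, 7.7e5]
condition_b = [1.5e5, 3.9e6, 9.5e4, 6.8e7, 2.0e6, 9.9e5]

bias, lower, upper, outside = bland_altman_log10(condition_a, condition_b)
print(f"bias = {bias:.3f}, limits = ({lower:.3f}, {upper:.3f}), outside = {outside:.1%}")
```

Back-transforming helps with interpretation: `ratio_from_log10(-0.09)` is roughly 0.81, so a mean log10(ratio) of −0.09 corresponds to a geometric-mean ratio of about 0.8 between the two conditions.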
Antibacterial/antiviral spirometry filter
Patient and operator safety and hygiene are crucial factors to take into account in the clinic. For this reason, the interface between the patient and the breath analysis platform is a disposable barrier filter, such as those routinely used for pulmonary function testing. This is a new element incorporated in this device to allow for measuring patients with suspected respiratory infectious diseases. Until now, our system featured a mouthpiece filter used for alcohol breath tests, which would not be suitable for investigating contagious respiratory diseases. In a separate set of experiments, we examined whether these aerosol filters have an impact on the detected metabolites. To do so, we compared the breath-signal of the same subject exhaling through the filter and subsequently exhaling without the filter. Figure 4b shows the resulting comparison, represented as a Bland-Altman plot for log-transformed variables. There appears to be a small bias towards lower intensities when using the filter, as the mean of log10(ratio) was −0.17. Moreover, only 4.6% of the signals fell outside the mean ± 1.96 × SD boundaries. Overall, these results are consistent with previous studies suggesting that SESI-MS breath spectra acquired with and without aerosol filters look alike. We therefore conclude that, while the antibacterial/antiviral filters incorporated in our system may partially suppress some signal intensities, they represent a good compromise between protecting the system and the operator from pathogens and preserving the quality of the mass spectral readout of exhaled metabolites.
Instrumental time drift
Instrumental time drifts and batch effects are a common problem in untargeted metabolomics [65, 66]. This can be especially critical in clinical studies, as patient recruitment typically runs over several months or years. To assess whether our system showed any significant batch effect due to the date of measurement, we visualized our data using principal component analysis (PCA). Figure 5 shows the resulting score plot for the first two components, whereby the labels on the left-hand side correspond to a total of 17 measuring days spanning 1 month. No clustering according to measuring day is evident, suggesting that the variance explained by these two components (48.6% in total) cannot be attributed to a batch effect. Note that no special cleaning procedures, apart from flushing the ion source with hot nitrogen overnight, were performed during this month of operation. In contrast, on the right-hand side of Fig. 5, the same score plot is shown with labels that now indicate the subject number. Grouping based on the subject number is much more evident. For example, subjects 1 and 3 cluster together, suggesting a significantly different exhaled metabolic phenotype than that of subjects 2 and 4. This is also consistent with previous studies suggesting the existence of stable individual-specific metabolic traits [67,68,69]. The same picture emerged when we considered the 27 representative aldehydes (ESM Fig. S12). To provide a more objective assessment of whether significant differences exist across subjects for these metabolites, we conducted an ANOVA test followed by post hoc multiple comparisons using the Bonferroni method (Table 2).
This univariate approach revealed significant differences in the breath-signal of exhaled aldehydes. Overall, the median (IQR) relative difference between individuals (considering only those with p ≤ 0.05) was 48.2% (39.3%). This is consistent with inter-subject variability in blood concentrations for these particular compounds. For example, Mak et al. reported a CV of 95.8% for 4-hydroxy-nonenal in eight healthy individuals. In our case, the mean difference between subjects 1 and 2 was 42.4% for this particular compound. It is therefore evident that the inter-subject biological variability is greater than the intra-subject variability and is consistent with the variability expected in blood levels.
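The post hoc step can be illustrated with a minimal Bonferroni sketch. It assumes that the correction is applied across the pairwise subject comparisons for a single metabolite; the text does not specify the exact family of tests, and the raw p-values below are hypothetical rather than taken from Table 2.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


subjects = [1, 2, 3, 4]
pairs = list(combinations(subjects, 2))  # six pairwise comparisons

# Hypothetical raw p-values, one per subject pair, in the order of `pairs`.
raw_p = [0.001, 0.6, 0.004, 0.0005, 0.2, 0.03]
adjusted_p = bonferroni(raw_p)

# Pairs that remain significant at the 0.05 level after correction.
significant = [pair for pair, p in zip(pairs, adjusted_p) if p <= 0.05]
print(significant)
```

In this made-up example, the pair with a raw p-value of 0.03 no longer passes the 0.05 threshold after correction, which is the conservative behavior expected of the Bonferroni method.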
Fatty aldehydes as surrogate markers of oxidative stress
Fatty aldehydes were chosen as model metabolites for this study as they are related to lipid peroxidation and oxidative stress. Oxidative stress is the trigger for the production of fatty aldehydes, such as 4-hydroxy-2-nonenal, in human metabolism. Abnormally elevated values (by a factor of two to three compared with controls) of some of the aldehydes studied here have been associated with pathologies such as congestive heart failure. Strong associations between series of metabolites, i.e., in terms of correlations, might be an indication of a common metabolic pathway, as already shown previously for series of omega-oxidation end-products of aliphatic fatty acids [52, 72] and amino acids. In an attempt to visualize whether an interplay between the different series of fatty aldehydes may be captured by breath analysis, we computed correlation coefficients across all measurements. A first indication that these metabolites are indeed metabolically connected is given by the fact that all of them showed positive correlations (ESM Figs. S13–S15). Thus, all measured subjects had consistent (high or low) breath-signals for all 27 metabolites. One could argue that this might be an artifact resulting from different performance of the system on different days (i.e., consistently high- or low-intensity mass spectra). However, this can be ruled out, as we found that these aldehydes consistently correlated with each other, but not with the rest of the over 2,000 features considered in the breath mass spectra (ESM Fig. S16). Only around 2% of the pairwise correlations between the aldehydes and all other features reached r ≥ 0.85. We therefore conclude that the observed associations for these families of compounds should encode a biological meaning. Figure 6 shows the resulting correlation network for the aldehydes. Most of the aldehydes are indeed linked, with a mean ± SD degree of 4 ± 2 (r ≥ 0.85).
This is to be expected from the metabolic point of view, as aliphatic aldehydes in humans are largely produced by a cascade of catabolic metabolism of several lipids. In particular, peroxidative cleavage of polyunsaturated fatty acids by reactive oxygen species is the mechanism behind a complete series of aldehydes such as those studied here, including short- and medium-chain aldehydes and hydroxy-alkenals.
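The construction of a thresholded correlation network such as the one in Fig. 6 can be sketched as follows. The r ≥ 0.85 cut-off is taken from the text; the use of Pearson correlation, the feature names, and the signals are assumptions made for illustration only.

```python
import math
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_network(signals, threshold=0.85):
    """Return edges with r >= threshold and the degree of every node."""
    edges = [
        (u, v)
        for u, v in combinations(signals, 2)
        if pearson(signals[u], signals[v]) >= threshold
    ]
    degree = {name: 0 for name in signals}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return edges, degree


# Hypothetical breath-signals across five measurements (placeholder names).
signals = {
    "aldehyde_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "aldehyde_b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "other_feature": [5.0, 1.0, 4.0, 2.0, 3.0],
}

edges, degree = correlation_network(signals)
mean_degree = statistics.mean(degree.values())
```

In this toy example only the two placeholder aldehydes are linked, while the unrelated feature stays isolated, mirroring the argument that a system-wide intensity artifact would instead connect all features.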