Pulmonary gas exchange evaluated by machine learning: a computer simulation

Using computer simulation we investigated whether machine learning (ML) analysis of selected ICU monitoring data can quantify pulmonary gas exchange in multi-compartment format. A 21-compartment ventilation/perfusion (V/Q) model of pulmonary blood flow processed 34,551 combinations of cardiac output, hemoglobin concentration, standard P50, base excess, VO2 and VCO2 plus three model-defining parameters: shunt, log SD and mean V/Q. From these inputs the model produced paired arterial blood gases, first with the inspired O2 fraction (FiO2) adjusted to arterial saturation (SaO2) = 0.90, and second with FiO2 increased by 0.1. ‘Stacked regressor’ ML ensembles were trained/validated on 90% of this dataset. The remainder with shunt, log SD, and mean ‘held back’ formed the test-set. ‘Two-Point’ ML estimates of shunt, log SD and mean utilized data from both FiO2 settings. ‘Single-Point’ estimates used only data from SaO2 = 0.90. From 3454 test gas exchange scenarios, two-point shunt, log SD and mean estimates produced linear regression models versus true values with slopes ~ 1.00, intercepts ~ 0.00 and R2 ~ 1.00. Kernel density and Bland–Altman plots confirmed close agreement. Single-point estimates were less accurate: R2 = 0.77–0.89, slope = 0.991–0.993, intercept = 0.009–0.334. ML applications using blood gas, indirect calorimetry, and cardiac output data can quantify pulmonary gas exchange in terms describing a 20-compartment V/Q model of pulmonary blood flow. High fidelity reports require data from two FiO2 settings.

Supplementary Information: The online version contains supplementary material available at 10.1007/s10877-022-00879-1.


Introduction
More than 50 years ago John West published his landmark model of pulmonary gas exchange [1], building on the work of predecessors [2]. The model is characterised by volumes of inspired gas (V) and mixed venous blood (Q) equilibrating in 10 to 100 virtual lung compartments governed by log normal distributions of alveolar ventilation and pulmonary capillary blood flow across compartmental V/Q ratios [1,3,4].
The multiple inert gas technique (MIGET), an investigative tool based on West's model [5,6], has provided mechanistic detail on impaired gas exchange. MIGET evaluations are technically challenging procedures in which six inert gases spanning a range of solubilities are infused in saline until equilibration. Plots of pulmonary retention and excretion versus gas solubility are constructed from gas chromatographic measurements and 'transformed' respectively into distributions of blood flow and ventilation against a logarithmic scale of V/Q ratios spread across 50 compartments [5,7].
MIGET has identified shunt (V/Q = 0) as the dominant cause of hypoxaemia in the acute respiratory distress syndrome (ARDS) and lobar pneumonia, whereas in chronic obstructive pulmonary disease (COPD) and in some patients with COVID-19 pneumonia hypoxaemia is primarily from mixed venous equilibration in low V/Q compartments [8–10]. Bimodal distributions have been observed in patients with COPD, asthma [3] and ARDS [11]. Despite its 'gold standard' status, the complexity of MIGET has obliged clinicians to track pulmonary gas exchange via alternative indices, usually those categorized as 'tension' or 'content'-based [12]. Venous admixture (VA) is the classic content-based index [13], while tension-based indices include the A-a gradient, used in APACHE risk algorithms [14], and the ratio between the arterial oxygen tension and the inspired oxygen fraction (PaO2/FiO2 ratio or PF ratio), important in ARDS diagnosis and stratification [15].
These indices show significant signal variability [16], but their greatest drawback is the limited information provided on the underlying pulmonary pathophysiology. The VA approach of Riley and Cournand [13,17] is more informative on this aspect, but hampered by inherent over-simplification. This is because VA (V/Q = 0) is one of just two perfused compartments (V/Q = 0 and 1). All oxygen transfer deficits are corralled within VA, in other words as true shunt, leaving no ability to tease out contributions from low V/Q compartments. For clinicians this can be a crucial distinction, for example in managing COVID-19 pneumonia (see "Discussion" section) [10]. Similarly, the effects of high V/Q are incorporated in a single dead space estimate (V/Q = ∞). As a final drawback, accurate VA calculations require mixed venous blood for analysis [12].
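For orientation, the tension-based and content-based indices named above can be written out as a short Python sketch. The formulas are the standard textbook forms (PF ratio, simplified alveolar gas equation, and the Riley shunt equation for VA). The function names, the default barometric and water vapour pressures, the respiratory quotient of 0.8, and the O2 content constants (1.34 mL/g and 0.003 mL/dL/mmHg) are illustrative assumptions and are not taken from the study's model.

```python
def pf_ratio(pao2_mmhg, fio2):
    """PaO2/FiO2 ratio (mmHg)."""
    return pao2_mmhg / fio2


def alveolar_po2(fio2, paco2_mmhg, pb_mmhg=760.0, ph2o_mmhg=47.0, rq=0.8):
    """Simplified alveolar gas equation: PAO2 = FiO2 * (PB - PH2O) - PaCO2 / RQ.

    PB, PH2O and RQ defaults are assumed sea-level textbook values.
    """
    return fio2 * (pb_mmhg - ph2o_mmhg) - paco2_mmhg / rq


def aa_gradient(pao2_mmhg, fio2, paco2_mmhg):
    """Alveolar-arterial O2 gradient (mmHg), using the defaults above."""
    return alveolar_po2(fio2, paco2_mmhg) - pao2_mmhg


def o2_content(hb_g_dl, saturation, po2_mmhg):
    """O2 content (mL/dL): haemoglobin-bound plus dissolved O2 (assumed constants)."""
    return 1.34 * hb_g_dl * saturation + 0.003 * po2_mmhg


def venous_admixture(cc_o2, ca_o2, cv_o2):
    """Riley shunt equation: VA = (Cc'O2 - CaO2) / (Cc'O2 - CvO2).

    cc_o2: end-capillary, ca_o2: arterial, cv_o2: mixed venous O2 content.
    """
    return (cc_o2 - ca_o2) / (cc_o2 - cv_o2)
```

The last function makes the mixed venous requirement explicit: without `cv_o2` the VA calculation cannot be completed, and all of the oxygenation deficit is attributed to a single 'shunt' fraction.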
In part to address these shortcomings, scaled back variations on the MIGET framework have been proposed [18–21]. Prominent among these is the automatic lung parameter estimator (ALPE) [18], described as a 'simple bedside alternative to MIGET'. ALPE has been shown to match complex MIGET calculations in experimental lung injury [22,23], and is now finding application in clinical research [24] and as the key component of a commercial package (www.mermaidcare.com) designed for monitoring and decision support.
As in MIGET, shunt is conventionally reported in ALPE assessments as a percentage of cardiac output. However, unlike MIGET, ALPE models 'low' and 'high' V/Q mismatch as partial pressure differentials (to be distinguished from diffusion limitation) across imposed 'partitions' between blood and alveolar gas. Specifically, 'low' V/Q mismatch is represented by the fall in PO2 from alveolar gas to pulmonary end-capillary blood, and 'high' V/Q mismatch as the rise in PCO2 across the same interface.
We suggest that machine learning (ML) could add value in this 'scaled back MIGET' space [25,26]. With data inputs close to those used by ALPE it should be possible for trained ML applications to generate detailed pulmonary assessments. These could take the form of a shunt estimate plus separate parameters defining log normal distributions of blood flow across compartmental V/Q ratios. Critical care physicians would then be provided with prompt actionable diagnostic information presented in a familiar format. Added bonuses could include shorter measurement intervals with a reduced requirement for FiO2 'switching' (at present ALPE requires up to four FiO2 'switches').
To investigate this possibility, we tested the following hypotheses in silico: (1) Trained ML applications using data normally sourced from blood gas analysis, indirect calorimetry, and cardiac output measurements can quantify pulmonary gas exchange in terms describing a multi-compartment V/Q model of pulmonary blood flow. (2) Consistent ML reports require measurement data at no more than two FiO2 settings.

Materials and methods
To test the above hypotheses, we exposed selected ML applications to simulated clinical monitoring data routinely available from blood gas analysis, indirect calorimetry, and cardiac output measurements. Scenarios were constructed with these data to represent a diverse mix of O2 consumption (VO2) and delivery, CO2 production (VCO2) and transport, hemoglobin-oxygen affinity, and respiratory and metabolic acid-base status. Paired blood gases were generated in each simulation by a 21-compartment model of pulmonary blood flow governed by three input values: shunt percentage, log standard deviation (log SD) and distributional mean (Fig. 1; for more model detail and core equations, see Supplementary Material). To make the evaluation, ML applications trained on this material were challenged with simulated monitoring data from 'unseen' test scenarios, the goal in each case being to back-generate the three governing model parameters of pulmonary blood flow distribution (shunt, log SD and mean). These estimates were then compared with 'true' model input values for the same scenarios.
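As an illustration of how three parameters can define blood flow across a 21-compartment layout, the following Python sketch distributes non-shunt flow log-normally over 20 gas-exchanging compartments and adds one shunt compartment (V/Q = 0). It is a minimal sketch, not the authors' model (which is detailed in their Supplementary Material). The V/Q range of 0.01 to 100, the logarithmic spacing, the interpretation of 'mean' as the V/Q ratio at the centre of the log-normal curve, and all names are assumptions made here for illustration.

```python
import math


def flow_distribution(shunt_fraction, log_sd, mean_vq,
                      n_compartments=20, vq_min=0.01, vq_max=100.0):
    """Return (vq_ratios, flow_fractions) for 1 shunt + n gas-exchanging compartments.

    Flow fractions sum to 1.0 (i.e. fractions of cardiac output).
    """
    # Log-spaced V/Q ratios for the gas-exchanging compartments (assumed range).
    step = (math.log(vq_max) - math.log(vq_min)) / (n_compartments - 1)
    vq = [math.exp(math.log(vq_min) + i * step) for i in range(n_compartments)]

    # Unnormalised normal density in ln(V/Q), centred on ln(mean_vq).
    weights = [
        math.exp(-0.5 * ((math.log(r) - math.log(mean_vq)) / log_sd) ** 2)
        for r in vq
    ]
    total = sum(weights)

    # Non-shunt flow is shared among compartments in proportion to the weights.
    flows = [(1.0 - shunt_fraction) * w / total for w in weights]

    # Compartment 0 is true shunt (V/Q = 0).
    return [0.0] + vq, [shunt_fraction] + flows


# Hypothetical example values: 20% shunt, log SD 0.8, distribution centred on V/Q = 1.
vq_ratios, flow_fractions = flow_distribution(0.20, 0.8, 1.0)
```

Raising `log_sd` in this sketch spreads flow towards very low and very high V/Q compartments, while `shunt_fraction` removes flow from gas exchange altogether, which is the distinction the ML estimates are asked to recover.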
Steps in this process were as follows: (1) Arterial blood gases were produced by the lung model at two structured settings of inspired oxygen fraction (FiO 2 ) (see below) in response to unique input combinations of the three parameters defining model pulmonary blood flow distribution (shunt, log SD and mean, Table 1) plus one value from each of six monitoring categories (

ML analysis of completed dataset
(1) After pre-processing to reduce redundancies, data rows were formatted as in Table 3 and subjected to randomization.
(2) The randomized dataset was partitioned into sequential split fractions (70%:20%:10%) for ML training, validation and testing respectively.
(3) The test fraction was subjected to trained ML analysis with columns containing the model-defining values of pulmonary blood flow (shunt, log SD and mean) 'held back' to allow blinded estimates.
(4) Two categories of ML estimates were performed:
(a) 'Single-Point' estimates were derived by ML analysis of 10 variables confined to model input and output logs for SaO2 = 0.90. Input variables were 'CO2 load', 'O2 pull', standard P50 (P50st) [27], base excess (BE) [28], and blood haemoglobin concentration (Hb). Output variables were FiO2, arterial pH, PaCO2, PaO2, and VA (Table 3).
(b) 'Two-Point' estimates were derived after inclusion of three additional variables consisting of ΔVA, Δsat and ΔPF (Table 3), all obtained from model output logs following the 0.10 FiO2 increment.
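The randomization, sequential 70%:20%:10% split and 'held back' targets described in these steps can be sketched in Python as follows. Column names such as `co2_load` or `delta_pf` are hypothetical labels chosen here for the 10 single-point and 3 additional two-point variables; the actual column layout is that of Table 3. The rounding rule used for the split sizes is also an assumption, so row counts may differ by one from those reported.

```python
import random

# Hypothetical column labels for the variables listed in step (4).
SINGLE_POINT_FEATURES = ["co2_load", "o2_pull", "p50st", "be", "hb",
                         "fio2", "ph", "paco2", "pao2", "va"]
TWO_POINT_FEATURES = SINGLE_POINT_FEATURES + ["delta_va", "delta_sat", "delta_pf"]
TARGETS = ["shunt", "log_sd", "mean_vq"]


def split_rows(rows, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle rows, then cut them sequentially into train/validation/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible randomization
    n = len(shuffled)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder forms the test set
    return train, val, test


def hold_back(row, feature_names):
    """Separate the ML inputs from the model-defining targets of one data row."""
    features = {name: row[name] for name in feature_names}
    targets = {name: row[name] for name in TARGETS}
    return features, targets
```

Calling `hold_back(row, SINGLE_POINT_FEATURES)` or `hold_back(row, TWO_POINT_FEATURES)` on test rows gives the blinded inputs for the two estimate categories, with the true shunt, log SD and mean kept aside for later comparison.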

ML methodology
It became evident during the validation process that multiple simultaneous models in a 'stacked' or 'ensemble' configuration outperformed any single model. The stacking process used simple linear regression at the output layer to combine the contributions from individual models. Model stacks were tested using 'StackingRegressor' from the 'sklearn' Python library (https://scikit-learn.org/stable/). Model training was guided by correlation ('R' and 'R2'), mean absolute error ('MAE'), and the slope and intercept (distance from the zero intersection) of the line of best fit.
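A minimal sketch of such a stack, using scikit-learn's `StackingRegressor` with linear regression as the final (output-layer) estimator, is given below. The choice of base models (ridge regression and a random forest), the synthetic placeholder data and the toy target are assumptions made purely for illustration; the base models actually used are described in the Supplementary Material.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 13 columns standing in for the two-point variables.
rng = np.random.default_rng(0)
X = rng.uniform(size=(600, 13))
# Toy target standing in for one model parameter (e.g. shunt); not real physiology.
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.05 * rng.normal(size=600)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base models are assumptions; the final estimator is simple linear regression.
stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=LinearRegression(),
)
stack.fit(X_train, y_train)
pred = stack.predict(X_val)

# Validation metrics of the kind named in the text.
r2 = r2_score(y_val, pred)
mae = mean_absolute_error(y_val, pred)
slope, intercept = np.polyfit(y_val, pred, 1)  # line of best fit: pred vs true
```

In this sketch one stack predicts one target, so separate stacks would be fitted for shunt, log SD and mean.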
See 'Supplementary Material' for more detail of ML methodologies employed.

Statistical analysis
Prior to analysis, the comparison data were checked for completeness, accuracy, and consistency.
Two-way (univariate) comparisons were made using standard linear regression. Post-estimation diagnostics were run on all models. Due to the large size of the dataset, these included checking model residuals for normality, using both the Kolmogorov-Smirnov test and a normal probability plot, and for heteroskedasticity, using the Breusch-Pagan and Cook-Weisberg tests. For each predictor, the regression slope (β) and its p-value were tabulated along with the equation intercept and the overall R2 value.
Kernel density plots and graphical Bland and Altman analyses [29] were constructed to enable visual comparisons of single-point and two-point results for each variable (shunt, log SD, and mean estimates) versus the true values.
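The agreement statistics described here can be illustrated with a short Python sketch that computes the regression slope, intercept and R2 of ML estimates against true values, together with the Bland and Altman bias and 95% limits of agreement (bias ± 1.96 SD of the differences). The study itself used STATA; the function names below are hypothetical and the code is only a sketch of the standard calculations.

```python
import statistics


def linear_fit(true_vals, est_vals):
    """Least-squares fit of estimates on true values: (slope, intercept, R2)."""
    mx = statistics.fmean(true_vals)
    my = statistics.fmean(est_vals)
    sxx = sum((x - mx) ** 2 for x in true_vals)
    syy = sum((y - my) ** 2 for y in est_vals)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_vals, est_vals))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r2


def bland_altman(true_vals, est_vals):
    """Bias and 95% limits of agreement of (estimate - true)."""
    diffs = [e - t for t, e in zip(true_vals, est_vals)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A slope near 1, an intercept near 0 and narrow limits of agreement centred on zero bias together indicate that estimates track the true values across their whole range, which R2 alone does not guarantee.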
STATA™ (v17.0) was used for all analyses with the level of significance set throughout at α = 0.05.

Results
From the final dataset of 34,551 data rows, 31,097 rows were allocated for ML training and validation and the remaining 3454 rows for testing. Regression results are given in Table 4, and Bland and Altman data in Table 5.

Two-point estimates
Two-point estimates of shunt, log SD and mean produced regression models with almost identical results (Table 4), with β ~ 1.00, intercept ~ 0.00 and R2 ~ 1.00 for each of the test-set variables. The kernel density and Bland and Altman plots confirmed close agreement with true values (Figs. 3, 5, 7; Table 5).

Discussion
Using computer simulation, we found that blinded ML analysis of monitoring data replicating diverse gas exchange scenarios, including blood gases generated by a 21-compartment V/Q model of pulmonary blood flow, could back-generate the model's governing parameters. This was achieved with 'stacked regressor' ML ensembles trained and tested on blood gas, indirect calorimetry, and cardiac output data over a broad spectrum of gas exchange equilibria. In each simulation ML accurately delineated pulmonary blood flow as shunt percentage plus the key descriptors (log SD and mean) of log normal flow distributions to gas exchanging compartments according to their V/Q ratios. This is essentially pulmonary blood flow in MIGET format. Measurements adopted for the simulation are available from current ICU monitoring devices [30]. Point of care blood gas analysis has been routine in ICU practice for decades. Indirect calorimetry is now recommended as a nutritional guide for critically ill mechanically ventilated patients [31–33]. Minimally invasive cardiac output monitoring, although not without problems [34–36], is mainstream in contemporary ICUs. The application of artificial intelligence in critical illness monitoring and decision support is itself no longer a novel concept [26].
The dataset to train, validate and test the ML applications was derived from systematically varied input combinations of the three model-defining parameters (shunt, log SD, and mean, Table 1), linked to four direct measurements (cardiac output, VO2, VCO2, and Hb; Table 2) and two calculated parameters (BE, P50st; Table 2). To complete each scenario the model generated paired sets of arterial blood gases in response to these inputs at two structured FiO2 settings. The final dataset represented approximately 35,000 unique scenarios covering a diverse mix of O2 delivery and consumption, CO2 production and transport, hemoglobin-oxygen affinity, and respiratory and metabolic acid-base status.
ML was then able to back-generate the model-defining parameters of 3454 test scenarios in blinded fashion using only the blood gas measurements along with inherent derived values (BE, P50st, VA, PF ratios) plus cardiac output, VO2, VCO2, and the baseline FiO2. ML estimates from single-point data (recorded at baseline SaO2 = 0.90) showed sufficient concordance with true values to reflect trends in all three key model parameters. However, a second equilibration introduced a dynamic component, captured by ML via changes in VA (ΔVA), PF ratios (ΔPF) and saturation (Δsat). This two-point approach enabled high fidelity identification of all three key model descriptors (Figs. 3, 5, 7; Tables 4, 5).
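The 'dynamic component' can be made concrete with a small Python sketch that derives the three change variables (in VA, saturation and PF ratio) from a baseline and a step-up blood gas record. The dictionary keys, the sign convention (step-up minus baseline) and the example numbers are assumptions for illustration only; the study's exact definitions are those of its Table 3.

```python
def two_point_deltas(baseline, step_up):
    """Changes in VA, SaO2 and PF ratio between two FiO2 settings.

    Each argument is a dict with keys 'fio2', 'pao2' (mmHg), 'sao2' and 'va'
    (fractions). Sign convention (assumed): step-up minus baseline.
    """
    pf_baseline = baseline["pao2"] / baseline["fio2"]
    pf_step_up = step_up["pao2"] / step_up["fio2"]
    return {
        "delta_va": step_up["va"] - baseline["va"],
        "delta_sat": step_up["sao2"] - baseline["sao2"],
        "delta_pf": pf_step_up - pf_baseline,
    }


# Invented illustrative values, not data from the study.
baseline = {"fio2": 0.50, "pao2": 60.0, "sao2": 0.90, "va": 0.30}
step_up = {"fio2": 0.60, "pao2": 90.0, "sao2": 0.96, "va": 0.28}
deltas = two_point_deltas(baseline, step_up)
```

Intuitively, oxygenation that responds briskly to the FiO2 increment points towards low V/Q compartments, whereas a flat response points towards true shunt, and these change variables carry that information to the ML stack.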
The simulation was designed to emulate a practical two-step procedure in which arterial blood gas analysis with oximetry is performed with the FiO2 adjusted for SaO2 = 0.90 (using SpO2 as initial guide). This is followed by a second set of blood gases after increasing the FiO2 by 0.10. During this process, once-only measurements of cardiac output, VO2 and VCO2 are also recorded. ML then quantifies the defining parameters of the diagnostic model(s) of choice from relationships embedded in the data.
It should be possible to train ML applications in other diagnostic models such as the ALPE system, which like the approach considered here devolves to three key parameters [18,37], in that case shunt and partial pressure gradients across modelled blood/gas 'partitions' representing 'high V/Q' and 'low V/Q' mismatch. It is also conceivable that larger training datasets with wider input ranges could enable accurate single-point ML reports from data 'snapshots' collected at any working FiO2. One further possibility for future investigation is that training sets formatted to target specific model variants, for example bimodal flow distributions [38], could extend ML reporting to these complexities.
Informative 'on the spot' gas exchange evaluations can facilitate management decisions, as mentioned in the Introduction. A contemporary example might be a ventilated patient with pneumonia and hypoxemia with a PF ratio < 100. To decide on a safe course of action clinicians should be able to distinguish between two extremes of lung pathophysiology. At one extreme the disturbed oxygenation represents a large right to left shunt in the context of low pulmonary gas volumes, typical of recruitable ARDS. At the other pulmonary gas volumes are normal and shunt is minimal, the hypoxemia arising instead from widespread low V/Q ratios due to maldistributed lung perfusion, a situation more characteristic of COVID-19 with multiple pulmonary vascular thrombi. In the latter circumstance, recruitment maneuvers and major manipulations of positive end expiratory pressure (PEEP) would be contraindicated [10,38]. Varying combinations of the two extremes complete the spectrum of possibilities.
Based on our simulation, ML evaluations could make these distinctions rapidly without a need for specialized imaging. Equivalent diagnostic assessment by the current ALPE system would take 10 to 15 min, involve up to four FiO2 'switches', and report V/Q mismatch as partial pressure gradients [24,37].

Some caveats
The model of pulmonary blood flow used to generate the blood gases follows the basic West model format. Several modifications and simplifications were employed. These are detailed in the Supplementary Material.
The simulation assumes error-free measurements, whereas some degree of error is intrinsic to measurements of cardiac output [36], indirect calorimetry [39], and the measured and derived elements of blood gas analysis [12]. Indirect calorimetry has increased error potential at FiO2 ≥ 0.7 or PEEP > 12 cmH2O, both encountered in severe respiratory failure [39]. Other risk factors include circuit leaks, bronchopleural fistulae, and possibly extracorporeal circulations.
We have not attempted a sensitivity analysis. However, it is noteworthy that ALPE, an advanced system now in service, is subject to similar error susceptibilities. ALPE evaluations require a single arterial blood gas analysis and one cardiac output measurement or estimate, along with measurements at three to five different FiO2 settings of VO2, VCO2, arterial oxygen saturation by pulse oximetry (SpO2), and end-tidal O2 and CO2 fractions [24]. Despite measurement intervals of 10-15 min with up to four FiO2 'switches', any signal distortion from absorption atelectasis [40] and altered hypoxic pulmonary vasoconstriction [41] is regarded as minor [37].
Further, the MIGET gold-standard itself relies on a series of measurements and techniques all prone to error, including but not limited to cardiac output and minute ventilation measurements, collection of mixed expired gas without condensation-induced loss of dissolved gases, and gas chromatographic concentration measurements of six inert gases in both mixed expired gas and the gas phases above blood samples [4].
The low baseline arterial saturation (SaO2 = 0.90) was selected to allow a subsequent 0.10 FiO2 step-up within the bounds of FiO2 ≤ 1.00. Although SaO2 = 0.90 is at the hypoxemia threshold [12], it is considered adequate for tissue oxygenation in the absence of anemia and low cardiac output, albeit with limited supportive evidence [42]. Of historical interest, older versions of the automated ALPE system could manipulate baseline SaO2 to values as low as 0.85, if necessary using FiO2 < 0.21 [18].
Dataset shunt, log SD and mean values retained uneven distributions across their respective ranges, as illustrated by the test-set kernel density plots (Figs. 2–7). Greater training set uniformity may have produced more consistent single-point estimations. Barriers to uniformity included the automatic rejection of input combinations for which SaO2 = 0.90 could not be attained with FiO2 in the range 0.21 to 0.90.
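The rejection rule follows from the step-up constraint: the baseline FiO2 that yields SaO2 = 0.90 must be at least 0.21 and must leave room for a 0.10 increment without exceeding 1.00. A minimal Python sketch of such a filter is shown below; the function name and the small floating-point tolerance are assumptions, and the baseline FiO2 itself would come from the lung model.

```python
def accept_scenario(baseline_fio2, step=0.10, fio2_min=0.21, fio2_max=1.00):
    """True if the baseline FiO2 (at SaO2 = 0.90) permits the FiO2 step-up.

    A tiny tolerance guards against floating-point rounding at the upper bound.
    """
    tolerance = 1e-9
    return (baseline_fio2 >= fio2_min
            and baseline_fio2 + step <= fio2_max + tolerance)
```

Scenarios with severe gas exchange impairment, which need a baseline FiO2 above 0.90, and near-normal scenarios, which would need less than room air, both fail this check, which is one way the retained parameter distributions could become uneven.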

Conclusions
We conclude based on computer simulations of diverse gas exchange scenarios that trained ML applications using data sourced from blood gas analysis, indirect calorimetry, and cardiac output measurements can quantify pulmonary gas exchange in terms used to describe multi-compartment V/Q models of pulmonary blood flow. High fidelity ML reports require measurement data at no more than two FiO2 settings, subject to measurement accuracy.