Background

Cardiac output (CO) monitoring is a mainstay of hemodynamic management in high-risk patients having major surgery and in critically ill patients with circulatory shock [1, 2]. Numerous technologies are available to measure or estimate CO [3,4,5,6]. Thermodilution methods allow CO calculation based on the Stewart-Hamilton principle; after injection of a known amount of indicator the change in indicator concentration downstream in the circulation is related to blood flow [7,8,9].
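As a sketch of the Stewart-Hamilton principle in its generic indicator-dilution form, the relation can be written as follows. The symbols are introduced here for illustration and are not taken from the cited references: m is the amount of indicator injected and c(t) is the indicator concentration measured downstream over time.

```latex
\[
\mathrm{CO} = \frac{m}{\int_{0}^{\infty} c(t)\,\mathrm{d}t}
\]
```

For thermodilution, the indicator is a thermal signal and c(t) is replaced by the change in blood temperature, with correction constants for the physical properties of blood and injectate. A larger area under the dilution curve therefore corresponds to a lower CO.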

Pulmonary artery thermodilution remains the clinical reference method for CO monitoring [10]. For intermittent pulmonary artery thermodilution a fluid bolus with known volume and temperature is manually injected into the right atrium through the proximal port of a pulmonary artery catheter (PAC) and subsequent temperature changes over time are detected by an integrated thermistor more distal in the pulmonary artery [8]. To minimize measurement error and account for cyclic changes in CO throughout the respiratory cycle, CO is calculated based on several consecutive thermodilution CO measurements [8].

In contrast to intermittent pulmonary artery thermodilution, continuous pulmonary artery thermodilution enables CO to be measured automatically (i.e., without the need for manual indicator injection) [11]. PACs for continuous pulmonary artery thermodilution are equipped with a thermal filament heating up the blood in the right ventricle in a random binary sequence [11]. Changes in blood temperature are detected downstream by an integrated thermistor near the tip of the PAC. Based on the detected blood temperature changes, CO is continuously calculated using a stochastic system identification principle and an averaged CO value is provided by the monitor [11].

Because both continuous and intermittent pulmonary artery thermodilution are used in clinical practice it is important to know whether CO measurements by the two methods are clinically interchangeable. We, therefore, performed a systematic review and meta-analysis of clinical studies comparing CO measurements assessed using continuous and intermittent pulmonary artery thermodilution.

Methods

Study design and registration

In accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [12] we performed a systematic review and meta-analysis of clinical studies comparing continuous pulmonary artery thermodilution-derived CO measurements (COcont; test method) with intermittent pulmonary artery thermodilution-derived CO measurements (COint; reference method) in adult patients having surgery or critically ill patients treated in the intensive care unit. This systematic review and meta-analysis was registered in the International Prospective Register of Systematic Reviews (PROSPERO; registration number CRD42020159730).

Eligibility criteria

For this systematic review and meta-analysis, we considered studies published in English between January 1st, 1975 and December 31st, 2019 comparing COcont and COint in adult (age ≥ 18 years) surgical or critically ill patients that report extractable or calculable mean of the differences between COcont and COint with corresponding standard deviation (SD) and/or 95%-limits of agreement (95% LOA). We did not consider correspondences or case reports.

Information sources and search strategy

The electronic databases PubMed, Web of Science, and the Cochrane Library were systematically searched using a priori defined search strategies. As an example, the full electronic search strategy for PubMed is provided in Additional file 1. Further, the reference lists of the identified studies and the reference lists of previous reviews were searched to find additional eligible studies that had not been identified during the initial systematic database search.

Study selection

Titles and abstracts of all identified studies were screened by three investigators (PH, MF, BS). The full text of potentially eligible studies was used to assess study eligibility based on the above-mentioned predefined eligibility criteria. Discrepancies were resolved by discussion among the three investigators.

Data collection process and data items

Four different investigators (KK, AB, CV, LB) independently extracted the data from the included studies and data were checked for consistency. Discrepancies were discussed and resolved based on the original data. We extracted data on the results of comparative statistics, i.e., the mean of the differences between COcont and COint with SD, 95% LOA, and the percentage error (PE) [13]. We report the mean of the differences between COcont and COint as COcont − COint. For studies reporting the mean of the differences as COint − COcont, we re-calculated the mean of the differences by reversing its sign. If not provided in the studies, the SD of the mean of the differences was re-calculated as (upper 95% LOA − mean of the differences)/1.96. For studies not providing the PE but reporting mean COcont and mean COint, the PE was calculated as (1.96 × SD of the mean of the differences)/(mean of COcont and COint).
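The re-calculations described above can be sketched as follows. This is a minimal illustration, not the authors' extraction code; the function names and the numeric values are hypothetical.

```python
Z_95 = 1.96  # multiplier used in the text for 95% limits of agreement


def sd_from_upper_loa(upper_loa, mean_diff):
    """Recover the SD of the differences from the upper 95% LOA."""
    return (upper_loa - mean_diff) / Z_95


def to_cont_minus_int(mean_diff_int_minus_cont):
    """Convert a bias reported as COint - COcont to COcont - COint."""
    return -mean_diff_int_minus_cont


def percentage_error(sd_diff, mean_co_cont, mean_co_int):
    """PE in percent: 1.96 * SD divided by the mean of COcont and COint."""
    mean_co = (mean_co_cont + mean_co_int) / 2.0
    return 100.0 * Z_95 * sd_diff / mean_co


# Hypothetical study values in L/min (not taken from any included study).
bias = to_cont_minus_int(-0.10)  # study reported COint - COcont
sd = sd_from_upper_loa(upper_loa=2.06, mean_diff=bias)
pe = percentage_error(sd, mean_co_cont=5.1, mean_co_int=5.0)
print(f"bias = {bias:.2f} L/min, SD = {sd:.2f} L/min, PE = {pe:.1f}%")
```

Note that the sign of the bias must be harmonized before the SD is recovered from the upper LOA, because the upper limit depends on the direction of the difference.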

In addition to the results of comparative statistics, we extracted data regarding the study setting (operating room or intensive care unit), the patient population, the number of patients, the total number of measurement pairs, and the year of publication.

Risk of bias in individual studies

Based on the Quality Assessment of Diagnostic Accuracy Studies guidelines (QUADAS-2) [14] we used an adapted questionnaire (Additional file 2) to assess study quality by judging the risk of bias and the applicability of the included studies [14,15,16]. Risk of bias was classified using signaling questions within each domain, answered with “yes”, “no”, or “unclear”; the answers determined whether each domain was classified as “low”, “high”, or “unclear” risk of bias. Concerns about applicability of the included studies were rated as “low”, “high”, or “unclear”. An independent quality assessment of each included study was performed by three investigators (KK, AB, LB) and discrepancies were resolved by discussion among the three investigators.

Principal summary measures

The mean of the differences between COcont and COint of the individual studies is the principal summary measure of the current meta-analysis. We used a random effects model for means as outcomes with restricted maximum likelihood as the estimator to summarize the mean of the differences, the SD of the mean of the differences, and the sample size. This random effects model derives a pooled estimate of the mean of the differences that represents the trueness/accuracy of COcont compared to COint.

For each study, we calculated the 95%-confidence interval (95% CI) for the reported/calculated mean of the differences between COcont and COint as the mean of the differences ± 1.96 × standard error of the mean (SD/√sample size) to account for study sample size. We summarized these 95% CIs with the random effects model and report the resulting overall random effects model-derived pooled estimate of the 95% CI.
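The per-study confidence interval can be sketched as below. The function name and the example values are hypothetical, and n stands for whatever the study's "sample size" is taken to be.

```python
import math


def ci95_of_mean_difference(mean_diff, sd_diff, n):
    """95% CI of the mean of the differences: mean +/- 1.96 * SD / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    sem = sd_diff / math.sqrt(n)  # standard error of the mean
    half_width = 1.96 * sem
    return mean_diff - half_width, mean_diff + half_width


# Hypothetical study: bias 0.10 L/min, SD 1.0 L/min, sample size 25.
lower, upper = ci95_of_mean_difference(0.10, 1.0, 25)
print(f"95% CI: {lower:.3f} to {upper:.3f} L/min")
```

Because the half-width shrinks with the square root of n, a study with four times the sample size and the same SD has a confidence interval half as wide.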

Further, we report overall random effects model-derived pooled estimates of 95% LOA.

We summarized the PE using a random effects model for proportions with DerSimonian-Laird as the estimator [17] and report the overall random effects model-derived pooled estimate of the PE with 95% CI. We defined clinical interchangeability between COcont and COint based on the established 30% PE threshold [13]. Heterogeneity and inconsistency were assessed by means of Cochran’s Q and I2.
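To make the pooling step concrete, the following is a minimal sketch of a random effects model with the DerSimonian-Laird estimator, together with Cochran's Q and I2. It is an illustration in plain Python, not the R code used for this meta-analysis; the function name and input values are hypothetical. The restricted maximum likelihood estimator used for the mean of the differences is iterative and is not reproduced here.

```python
import math


def dersimonian_laird(effects, variances):
    """Random effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Method-of-moments between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    # I^2: share of total variability attributed to heterogeneity, in percent.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0

    return {
        "pooled": mu,
        "ci95": (mu - 1.96 * se, mu + 1.96 * se),
        "tau2": tau2,
        "Q": q,
        "I2": i2,
    }


# Hypothetical study-level estimates and their squared standard errors.
result = dersimonian_laird([0.0, 0.4], [0.01, 0.01])
print(result)
```

When Q does not exceed its degrees of freedom, tau2 is truncated to zero and the random effects estimate coincides with the fixed effect estimate.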

Synthesis of results

The database includes all relevant data to perform the meta-analysis. To obtain overall random effects model-derived pooled estimates, a random effects model was computed for each outcome. We report Cochran’s Q as a measure of heterogeneity and I2 as a measure of inconsistency.

Risk of publication bias across studies

We calculated funnel plots with corresponding Egger’s regression tests for asymmetry to address the potential problem of selective reporting [18].
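For orientation, the classical form of Egger's regression test regresses the standardized effect (effect divided by its standard error) on precision (the reciprocal of the standard error); an intercept far from zero suggests funnel plot asymmetry. The sketch below implements this classical form in plain Python. It is an assumption that this variant matches the one used for this meta-analysis, since statistical packages may implement the test differently. The function name and data are hypothetical.

```python
import math


def egger_regression(effects, std_errors):
    """Classical Egger regression: effect/SE regressed on 1/SE by least squares.

    Returns (intercept, slope, t statistic of the intercept with k - 2 df).
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching SEs")

    z = [y / s for y, s in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                 # precisions

    mean_x = sum(x) / k
    mean_z = sum(z) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))

    slope = sxz / sxx
    intercept = mean_z - slope * mean_x

    # Residual variance and standard error of the intercept.
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = rss / (k - 2)
    se_intercept = math.sqrt(s2 * (1.0 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_stat


# Hypothetical study-level biases (L/min) and standard errors.
effects = [0.10, -0.20, 0.30, 0.05, 0.15]
std_errors = [0.10, 0.20, 0.15, 0.25, 0.12]
intercept, slope, t_stat = egger_regression(effects, std_errors)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, t = {t_stat:.2f}")
```

The P value is obtained by comparing the t statistic with a t distribution with k − 2 degrees of freedom; that step is omitted here to keep the sketch free of third-party dependencies.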

Subgroup analyses, additional analyses

We performed subgroup analyses considering the factors “setting” (operating room and intensive care unit) and “patient population” (liver transplantation and cardiac surgery).

Additionally, we investigated the relation between the mean of the differences between COcont and COint from individual studies and a) the reported mean COint and b) the year of publication.

Statistical software

We used the software R version 4.0.2 (R Foundation for Statistical Computing, Vienna, Austria) with the R-package metafor version 2.4-0 for statistical analyses [19].

Results

Study selection

After removal of duplicates, we identified 426 different records based on the initial electronic database search (Fig. 1). We excluded 362 records after title and abstract screening. Full-text screening of the remaining 64 articles identified 54 studies fulfilling our predefined inclusion criteria [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73]. Six studies were divided into two studies each for the following reasons: measurements before and after caval clamping/graft perfusion during liver transplantation [26], measurements reported separately for infusion rates > 1000 mL/h and ≤ 1000 mL/h [41], measurements with different PAC devices [60, 72], measurements reported separately for patients with an ejection fraction higher or lower than 45% [65], and measurements reported separately for patients with a CO higher or lower than 8 L/min [36]. One study was divided into four studies because different software versions and different fluid bolus temperatures were used [67].

Fig. 1. Flowchart of the literature search based on the PRISMA statement

Study characteristics

We included a total of 1,522 individual patients in the final analysis with a median of 21 patients included per study (minimum: 7 patients, maximum: 84 patients). All studies except one reported the number of measurement pairs. The total number of reported measurement pairs was 17,920 with a median of 168 (interquartile range 108 to 238) measurement pairs per study. In 51 of the 54 studies, the mean of the differences was reported; for the remaining three studies the mean of the differences was calculated. In 24 of the 54 studies, 95% LOA were reported; for 30 studies 95% LOA were calculated. In 11 of the 54 studies, the PE was reported; for 16 studies the PE was calculated. In 23 of the 54 studies, the mean values of COcont and COint were reported or calculated. A summary of the included studies and CO measurement data is provided in Additional file 3.

Risk of bias in individual studies

The adapted QUADAS-2 questionnaire was used to assess the risk of bias in the included studies (Additional file 4). In 19 studies, the risk of bias was identified to be “unclear” or “high” for at least one domain; in six of these studies, the risk of bias was identified to be “high” for at least one domain.

Overall meta-analysis

Individual means of the differences between COcont and COint with SD and 95% LOA for each study are shown in Additional file 3. The overall random effects model-derived pooled estimate of the mean of the differences between COcont and COint was 0.08 (95% CI 0.01 to 0.16) L/min with pooled 95% LOA of − 1.68 to 1.85 L/min (heterogeneity: Q = 200.1 (P < 0.001), I2 = 75%) (Fig. 2).

Fig. 2. Forest plot for cardiac output. Forest plot showing the results of the meta-analysis for cardiac output (CO) with mean of the differences (dots) calculated as the mean of continuous pulmonary artery thermodilution-derived CO measurements minus intermittent pulmonary artery thermodilution-derived CO measurements and corresponding 95%-confidence interval (bars) per individual study in relation to the overall random effects model-derived pooled estimate (vertical dashed line). Heterogeneity is presented with Cochran’s Q and I2. N, number of patients per study. Böttiger and colleagues [26], Costa and colleagues [36], Greim and colleagues [41], Neto and colleagues [60], Rödig and colleagues [65], and Zöllner and colleagues [72] are treated as two studies in the analysis (A and B). Schmid and colleagues [67] is treated as four studies in the analysis (A, B, C, and D)

The overall random effects model-derived pooled estimate of the PE was 29.7% with 95% CI of 20.5 to 38.9% (heterogeneity: Q = 281.3 (P < 0.001), I2 = 90%) (Fig. 3). The PE was ≤ 30% in 19 out of 27 studies (70%).

Fig. 3. Forest plot for percentage error. Forest plot showing the results of the meta-analysis for the percentage error (dots) with 95%-confidence interval (bars) per individual study in relation to the overall random effects model-derived pooled estimate (vertical dashed line). Heterogeneity is presented with Cochran’s Q and I2. CO, cardiac output; N, number of patients per study. Costa and colleagues [36], Rödig and colleagues [65], and Zöllner and colleagues [72] are treated as two studies in the analysis (A and B)

Risk of publication bias across studies

Funnel plots indicating the risk of publication bias across studies, including Egger’s regression tests, are shown in Additional file 5 for CO (P = 0.843) and Additional file 6 for PE (P = 0.474).

Subgroup analyses, additional analyses

We performed subgroup analyses considering the factors “setting” (operating room and intensive care unit), “patient population” (liver transplantation and cardiac surgery), and “availability of the PE” (studies where the PE was reported or calculable and studies where the PE was not reported or calculable).

For patients studied in the operating room [20, 22, 26, 34, 37, 38, 41, 44, 48, 49, 52, 62, 64, 69], the overall random effects model-derived estimate of the mean of the differences was 0.14 (95% CI 0.00 to 0.28) L/min with pooled 95% LOA of − 2.03 to 2.44 L/min (Additional file 7). For patients studied in the intensive care unit [21, 23, 24, 27,28,29, 31,32,33, 35, 36, 39, 40, 42, 45, 46, 51, 53,54,55,56,57,58,59,60, 66,67,68, 70,71,72,73], the overall random effects model-derived estimate of the mean of the differences was 0.07 (95% CI − 0.04 to 0.17) L/min with pooled 95% LOA of − 1.66 to 1.76 L/min (Additional file 8).

For patients having liver transplantation [20, 22, 26, 35, 36, 38, 41], the overall random effects model-derived estimate of the mean of the differences was 0.07 (95% CI − 0.26 to 0.40) L/min with pooled 95% LOA of − 2.89 to 3.01 L/min (Additional file 9). For patients having cardiac surgery [23, 25, 27, 30,31,32, 34, 39, 43,44,45, 47,48,49, 52, 54, 56, 60, 61, 63,64,65, 67, 69, 70, 72, 73], the overall random effects model-derived estimate of the mean of the differences was 0.09 (95% CI − 0.01 to 0.18) L/min with pooled 95% LOA of − 1.38 to 1.54 L/min (Additional file 10).

There were no clinically meaningful differences in the mean of the differences and the 95% LOA between studies with reported/calculable PE and studies without reported/calculable PE (Additional files 11 and 12).

The mean of the differences between COcont and COint from individual studies was not influenced by the reported mean COint (Additional file 13) or the year of publication (Additional file 14).

Discussion

In this meta-analysis of clinical studies comparing COcont and COint in adult surgical and critically ill patients, the heterogeneity across studies was high. The overall random effects model-derived pooled estimate of the mean of the differences between COcont and COint was 0.08 L/min with pooled 95% LOA of − 1.68 to 1.85 L/min and a pooled PE of 29.7 (95% CI 20.5 to 38.9)%.

In CO method comparison studies, the agreement between a test and a reference method is described by the trueness (often called “accuracy”) and precision of agreement [74,75,76] based on Bland–Altman analysis [77,78,79]. In Bland–Altman plots, the difference between measurements with a test and a reference method is plotted against the mean of the two measurements [77,78,79]. The mean of the differences (often called “bias”) reflects the trueness of test method measurements, whereas the SD and 95% LOA of the mean of the differences reflect the precision of agreement [74,75,76]. The PE is used frequently in CO method comparison studies to characterize the precision of agreement; the PE is 1.96 × SD of the mean of the differences between measurements divided by the mean value of all measurements [13]. In their landmark study, Critchley et al. proposed 28.3%, rounded up to 30%, as the PE threshold defining interchangeability [13]. Nevertheless, one should keep in mind that the PE threshold of 28.3% is based on the assumption that the precision of both the test method and the reference method is 20%. Because the precision of either method is not exactly known, using a 30% PE threshold may lead to misinterpretations concerning the clinical interchangeability of COcont and COint.
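The origin of the 28.3% threshold can be made explicit with a short worked calculation. The symbols are introduced here for illustration: p_test and p_ref denote the precision of the test and the reference method, assumed to be independent and to combine in quadrature.

```latex
\[
\mathrm{PE}_{\text{threshold}}
  = \sqrt{p_{\text{test}}^{2} + p_{\text{ref}}^{2}}
  = \sqrt{(20\%)^{2} + (20\%)^{2}}
  \approx 28.3\%
\]
```

Under the same formula, a hypothetical reference method precision of 10% combined with a test method precision of 20% would give a threshold of about 22.4%, which illustrates how strongly the threshold depends on the assumed precision of the reference method.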

In this meta-analysis, the overall random effects model-derived pooled estimate of the mean of the differences between COcont and COint was < 0.1 L/min, which is less than a 2% difference for an average adult CO of 5 to 6 L/min. This meta-analysis thus suggests a good trueness/accuracy of COcont compared with COint when looking at the overall pooled mean of the differences. However, a low pooled mean of the differences in meta-analyses can be misleading because averaging study results with negative and positive means of the differences of similar absolute magnitude can result in a very low pooled mean of the differences despite marked measurement differences in single studies. In this meta-analysis, studies reporting an overestimation and those reporting an underestimation of COint by COcont offset each other, as illustrated in Fig. 2.

Regarding the precision of agreement between COcont and COint, this meta-analysis revealed that the pooled 95% LOA of the mean of the differences between COcont and COint were − 1.68 to 1.85 L/min. The overall random effects model-derived pooled estimate of the PE was 29.7 (95% CI 20.5 to 38.9)%, thus suggesting that COcont barely passes interchangeability criteria with COint [13]. However, the PE was only available for half of all studies because the PE per se or mean CO values necessary for post-hoc PE calculation were not always reported. Nevertheless, 95% CIs were similar in studies with reported or calculable PE and studies where the PE was not reported or calculable, suggesting that the PE for all studies would probably also be close to 30%.

This meta-analysis showed a large variability in results between studies, with means of the differences reported in single studies ranging from − 0.79 to 1.00 L/min and PEs ranging from 4.8 to 89.3%. This variability strongly suggests that the measurement performance of COcont is influenced by various factors that may include patient characteristics, the clinical setting, and cardiovascular dynamics. Even subgroups of studies were heterogeneous. For example, the “operating room” subgroup included patients having different types of surgery, the “intensive care unit” subgroup included patients with and without circulatory shock requiring different vasopressor and inotropic support, and the “cardiac surgery” subgroup included patients studied either during or after surgery. It is important to bear in mind that the measurement performance is context-sensitive when interpreting validation studies of any CO monitoring system [80].

Intermittent pulmonary artery thermodilution remains the clinical reference method for CO monitoring and therefore is frequently used as the “gold standard” in method comparison studies [10]. Continuous pulmonary artery thermodilution offers the opportunity to measure CO automatically without the need for manual indicator injection, thus reducing contamination risk and saving time [81]. Although “continuous” suggests that this PAC technology provides real-time CO measurements, it actually provides “semi-continuous”, averaged CO values [11, 81]. The averaging procedure improves the signal-to-noise ratio but may cause a time delay of up to several minutes. This time delay may become relevant when hemodynamics change rapidly, e.g., during dynamic tests such as passive leg raising and during therapeutic interventions such as fluid or vasopressor administration [8, 82, 83].

In today’s clinical practice, PACs are mainly used in patients having cardiac surgery, liver transplantation, and in critically ill patients with circulatory shock, especially with right ventricular dysfunction [10, 84]. Using a PAC allows monitoring of CO, mixed venous oxygen saturation, and intravascular pressure and thus provides important information on cardiovascular dynamics [85]. There is nonetheless an ongoing debate on whether or not PACs still have a place in daily clinical practice [86,87,88]. Some trials showed no clinical benefit of using the PAC without treatment protocols in critically ill patients [89, 90] or cardiac surgery patients [91]. Additionally, there are now various methods to measure or estimate CO less invasively or even non-invasively [3, 6]. The clinical use of the PAC thus decreased over the last years in critically ill patients and in surgical patients [92, 93].

Although intermittent and continuous pulmonary artery thermodilution methods are widely used, we are not aware of any meta-analysis investigating the overall agreement between the two methods. In contrast, several meta-analyses have already been published for Doppler [94, 95], bioimpedance [15, 94], as well as invasive and non-invasive pulse contour methods [15, 16, 94]. These meta-analyses all reported pooled PE values ranging between 40 and 50%.

We only investigated the absolute agreement between COcont and COint and did not analyze the trending ability of COcont. The ability to track changes in CO is actually the main expectation clinicians may have from a continuous monitoring system over an intermittent technique. Unfortunately, most studies of this meta-analysis did not report concordance rates or polar plots, so we were unable to assess the ability of continuous pulmonary artery thermodilution to track changes in CO. Furthermore, several studies [19 of 54 (35%)] had a risk of bias classification of “unclear” or “high”, which may further influence the final results of this meta-analysis. About half of the included studies [26 of 54 (48%)] were performed before the year 2000, and only 6 (11%) studies after 2010.

Conclusion

The heterogeneity across clinical studies comparing COcont and COint in adult surgical and critically ill patients is high. The overall trueness/accuracy of COcont in comparison with COint is good (indicated by a pooled mean of the differences < 0.1 L/min). Pooled 95% LOA of − 1.68 to 1.85 L/min and a pooled PE of 29.7 (95% CI 20.5 to 38.9)% suggest that COcont barely passes interchangeability criteria with COint. The PE was ≤ 30% in 19 of the 27 studies (70%) with available PE.