1 Introduction

1.1 Rationale

Information about the hemodynamic status of patients plays an important role in daily clinical practice in the emergency department, the intensive care unit (ICU) and the operating room (OR). Heart rate, blood pressure and pulse oximetry are routinely monitored. Advanced hemodynamic monitoring is used in critically ill and high-risk surgical patients. Many studies, including meta-analyses [1,2,3,4,5], have shown that optimization of hemodynamic parameters in high-risk surgical patients reduces mortality, morbidity, postoperative complication rates and duration of hospital stay, and improves functional recovery.

In adults, intermittent pulmonary artery thermodilution (intermittent PAC) and transpulmonary thermodilution (TPTD) are considered the gold standard for the measurement of cardiac output (CO). However, these methods are invasive and associated with complications [6,7,8,9]. In neonates and pediatric patients, transthoracic echocardiography (TTE) is the most commonly used technique. TTE has several limitations: it requires an experienced operator, is technically demanding and provides only intermittent measurements. Recently, many non-invasive devices have been developed and studied [10,11,12].

One of these non-invasive, not yet established methods is thoracic electrical bioimpedance (TEB), first described by Kubicek and colleagues in 1966 [13]. The method is based on changes in thoracic electrical resistance caused by changes in blood velocity during the cardiac cycle, from which an algorithm calculates the CO. Sramek and Bernstein modified the algorithm in 1986 [14]. The most recent modification is the Bernstein–Osypka equation (2003), also called electrical velocimetry or electrical cardiometry (EC) [15, 16]. The latter name is used in this manuscript.

EC measures changes in thoracic electrical resistance (impedance) using four skin electrodes. EC is able to isolate the changes in impedance created by the circulation, which are partly caused by the change in orientation of the erythrocytes during the cardiac cycle (Fig. 1). Impedance cardiography can, however, be affected by the remaining thoracic tissue or fluid [17]. Two electrodes are placed on the left base of the neck and two on the left inferior side of the thorax at the level of the xiphoid process (Fig. 1). Exact electrode placement is important, because incorrect placement can alter the measurements. In adults, the inter-electrode gap of the lower electrodes should be 15 cm [18]. The electrodes are connected to either the Aesculon® monitor (Osypka Medical GmbH, Berlin, Germany) or the smaller, portable ICON® monitor (Osypka Medical GmbH, Berlin, Germany). Both devices derive stroke volume, heart rate and CO from the impedance values. Further details of the devices are described elsewhere [15, 16, 19].

Fig. 1

a Placement of electrodes on the left base of the neck and on the left inferior side of the thorax at the level of the xiphoid process. b Arrangement and orientation of erythrocytes during diastole (left) and systole (right) explaining the difference in thoracic impedance. Figure reproduced from Osypka Medical GmbH, an introduction to Electrical Cardiometry [19]

This safe and easily applicable method could be a suitable candidate to complement or replace invasive CO monitoring. Several studies have attempted to validate EC against different reference methods, with conflicting results. EC was included in three meta-analyses, each covering only a limited number of EC studies [10,11,12]. Its place among the existing hemodynamic monitoring devices therefore has yet to be determined. Our meta-analysis focuses exclusively on EC, aiming at a definitive validation of its accuracy and precision in both adults and pediatrics.

1.2 Objective

We conducted a systematic review to assess the accuracy and precision of CO measurement by EC compared to a reference method, in both adults and pediatrics. The primary outcome measures were (i) accuracy, defined as the bias between the CO measured by EC and the reference methods, (ii) precision, defined as the standard deviation (SD) of the bias, (iii) the limits of agreement (LoA) defined as [bias ± 1.96*SD], and (iv) the mean percentage error (MPE) derived from the SD and mean CO. A pooled MPE of less than 30% was considered clinically acceptable, as described by Critchley and Critchley [20].

2 Methods

This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (see Table 5 in Appendix 1) [21].

2.1 Eligibility criteria

Eligibility criteria were (1) studies comparing CO measurement by EC and a reference method, (2) studies using Bland–Altman analysis to report bias, SD of the bias and MPE or for which those data could be extracted [22], (3) studies performed in humans and (4) studies published as a full paper in English. Studies involving participants of any age and under any clinical circumstances were included. No restriction in publication date was applied.

2.2 Information sources and search

Two independent investigators (MS and SS) performed an electronic search of PubMed, Embase, Web of Science and the Cochrane Library of Clinical Trials. The last search was performed on January 4, 2019. Studies that were not published as full journal articles (e.g. letters, editorials, conference papers) and retracted publications were excluded. The search strategy used in PubMed is shown in Appendix 2. The search strategies for the other databases were comparable and are available on request. The manufacturer of ICON®/Aesculon® (Osypka Medical GmbH, Berlin, Germany) was contacted and its website consulted to identify additional studies. The reference lists of all included studies were screened for additional studies. EndNote® software, version X8.1 (Thomson Reuters, New York, USA) was used to manage all articles and to remove duplicates between databases.

2.3 Study selection

Two independent investigators (MS and SS) identified potentially relevant studies. The first selection was based on title and abstract; the remaining full-text articles were then reviewed for eligibility. Included articles were classified as adult or pediatric studies. Conflicts were resolved by consensus or after consultation with the third investigator (CS). The flow diagram of the study selection process is shown in Fig. 2.

Fig. 2

Flow diagram of the study selection process. CI cardiac index, MPE mean percentage error, SV stroke volume

2.4 Data collection process

A customized data extraction form was developed by three investigators (MS, SS and CS) using Microsoft Excel (Microsoft Office, Washington, USA). The form was pilot-tested on five randomly selected included studies and refined. Data were extracted independently by two investigators (MS and SS). Patient characteristics, clinical setting, age, reference method and device, number of patients, total number of measurements and financial support were considered relevant (Tables 1, 2). For the statistical analyses, we extracted mean CO, CO range, bias, SD of the bias, LoA and MPE (see Tables 6, 7 in Appendix 3, 4). The precision of the reference and tested methods and the assessment of trending ability were added to the data extraction form after the pilot test. Disagreements in data extraction were resolved by consensus or by consulting CS.

Table 1 Study characteristics of included adult studies
Table 2 Study characteristics of included pediatric studies

Mean CO, bias, LoA, SD, MPE and precision of the reference or tested method were defined according to the following equations:

$$\text{Mean CO} = \frac{\text{Mean CO}_{\text{EC}} + \text{Mean CO}_{\text{reference}}}{2}$$
(1)
$$\text{Bias} = \text{Mean CO}_{\text{reference}} - \text{Mean CO}_{\text{EC}}$$
(2)
$$\text{LoA} = \text{Bias} \pm 1.96 \times SD_{\text{bias}} \quad \text{or} \quad SD_{\text{bias}} = \frac{\text{upper LoA} - \text{lower LoA}}{2 \times 1.96}$$
(3)
$$\text{MPE} = \frac{1.96 \times SD_{\text{bias}}}{\text{Mean CO}} \times 100\%$$
(4)
$$\text{Precision}_{x} = \frac{1.96 \times SD_{\text{reproducibility}}}{\text{Mean CO}_{x}}$$
(5)
$$\text{Precision}_{x} = 1.96 \times \text{coefficient of variation}_{x}$$
(6)

Missing information was calculated using the equations above. If the data could not be calculated, they were extracted from the Bland–Altman plot. If neither option was possible, the authors were contacted. Duplicate publication of data was assessed by comparing author names, reference methods, sample sizes, outcome measures (mean CO, bias and MPE) and data points in the Bland–Altman plots.
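As an illustration of how Eqs. (1) to (4) can recover missing quantities, the following minimal Python sketch derives the mean CO, bias, SD of the bias and MPE for a hypothetical study that reports only the mean CO of each method and the LoA. The function names and input values are hypothetical and are not taken from any included study; the bias follows the sign convention of Eq. (2) (reference minus EC).

```python
# Minimal sketch of Eqs. (1)-(4); all names and numbers are hypothetical.


def mean_co(mean_co_ec: float, mean_co_ref: float) -> float:
    """Eq. (1): average of the mean CO of EC and the reference (L/min)."""
    return (mean_co_ec + mean_co_ref) / 2.0


def bias(mean_co_ec: float, mean_co_ref: float) -> float:
    """Eq. (2): bias as reference minus EC (L/min)."""
    return mean_co_ref - mean_co_ec


def sd_from_loa(lower_loa: float, upper_loa: float) -> float:
    """Eq. (3) rearranged: SD of the bias from the reported LoA (L/min)."""
    return (upper_loa - lower_loa) / (2 * 1.96)


def mpe(sd_bias: float, mean_co_value: float) -> float:
    """Eq. (4): mean percentage error (%)."""
    return 1.96 * sd_bias / mean_co_value * 100.0


# Hypothetical study reporting only the group means and the LoA.
co = mean_co(mean_co_ec=4.8, mean_co_ref=5.2)     # 5.0 L/min
b = bias(mean_co_ec=4.8, mean_co_ref=5.2)         # 0.4 L/min
sd = sd_from_loa(lower_loa=-1.1, upper_loa=1.9)   # approx. 0.765 L/min
print(f"mean CO {co:.2f} L/min, bias {b:.2f} L/min, "
      f"SD {sd:.3f} L/min, MPE {mpe(sd, co):.1f}%")
```

For internally consistent data, the bias from Eq. (2) should equal the midpoint of the reported LoA (here (−1.1 + 1.9)/2 = 0.4 L min−1), which offers a simple plausibility check on extracted values.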

2.5 Risk of bias assessment in individual studies

To assess the risk of bias of individual studies, we used the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [23]. The original QUADAS-2 tool consists of four domains: patient selection, index test, reference standard, and flow and timing. Signalling questions are used to assess the risk of bias in each domain, and the first three domains are also assessed for concerns about applicability. Kim et al. modified this tool to make it more suitable for method-comparison studies [24]. We adapted Kim's modified QUADAS-2 tool, pilot-tested it on five randomly selected included studies and refined it accordingly. After the pilot test, we added a fifth domain to assess the statistical analysis, implementing the recommendations of Cecconi [25]. The modified QUADAS-2 tool is available in Table 8 in Appendix 5. MS and SS independently assessed the risk of bias. Conflicts were resolved by consensus or by consulting CS.

2.6 Summary measures

The primary outcome measures were (i) accuracy, defined as the bias between the CO measured by EC and the reference methods, (ii) precision, defined as the SD of the bias, (iii) the LoA and (iv) the MPE. A pooled MPE of less than 30% was considered clinically acceptable, as described by Critchley and Critchley [20].

2.7 Synthesis of results

Pooled bias, LoA and MPE for adults and pediatrics were calculated using a random-effects model, because heterogeneity was expected, and forest plots were created. Studies were weighted using the inverse variance method. Inter-study heterogeneity was assessed using Cochran's Q test and quantified with the I² index (0% no heterogeneity, 25% low, 50% moderate, 75% high heterogeneity) [26]. If an individual study reported multiple outcome measures for bias, LoA and MPE, these were presented in separate rows of the forest plot.
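The pooling step can be sketched as follows. This minimal Python example assumes the DerSimonian–Laird estimator for the between-study variance τ² (the text above specifies a random-effects model with inverse-variance weights, but not the τ² estimator) and assumes that the variance of each study's bias is SD²/n, with n the number of patients. It returns the pooled bias with its 95% CI, Cochran's Q and the I² index; the p value of the Q test would follow from a χ² distribution with k − 1 degrees of freedom. The study data are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling with inverse-variance weights.

    effects: per-study bias (L/min); variances: squared SE of each bias.
    Returns pooled bias, its 95% CI, Cochran's Q and I^2 (%).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), q, i2


# Hypothetical studies: bias (L/min), SD of the bias (L/min), patients.
studies = [(0.30, 1.2, 40), (-0.45, 0.9, 25), (0.10, 1.5, 60)]
biases = [b for b, _, _ in studies]
variances = [sd ** 2 / n for _, sd, n in studies]  # assumed SE^2 = SD^2 / n
pooled, ci, q, i2 = random_effects_pool(biases, variances)
print(f"pooled bias {pooled:.2f} L/min, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, "
      f"Q {q:.2f}, I2 {i2:.0f}%")
```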

2.8 Subgroup analyses

Subgroup analyses by reference method were pre-specified for the definitive validation of EC against the gold standard thermodilution (TD) in adults and against the most commonly used method, TTE, in pediatrics. For adults, we distinguished between intermittent and continuous TD, as continuous TD averages CO over a longer time period. This resulted in the subgroups intermittent TD, continuous TD and other reference method. For pediatrics, we distinguished between children and neonates, which resulted in the subgroups TTE children, TTE neonates and other reference method children. A test for subgroup differences was applied. A subgroup analysis by clinical setting was conducted post hoc in adults, resulting in the subgroups cardiac surgery, OR, ICU and other clinical setting.
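One common way to test for subgroup differences is to compare the pooled subgroup estimates with a between-subgroup Q statistic, referred to a χ² distribution with (number of subgroups − 1) degrees of freedom. The sketch below is written under that assumption; it takes each subgroup's pooled bias and standard error as input and is not necessarily the exact procedure of the software used in this review. The χ² tail probability uses the closed-form expressions for integer degrees of freedom, so only the Python standard library is needed. All values are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite Poisson-type sum.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series.
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def subgroup_difference(estimates, standard_errors):
    """Between-subgroup Q test on pooled subgroup estimates."""
    w = [1.0 / se ** 2 for se in standard_errors]
    overall = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q_between = sum(wi * (yi - overall) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    return q_between, df, chi2_sf(q_between, df)


# Hypothetical pooled biases (L/min) and standard errors for three subgroups.
q_b, df_b, p_b = subgroup_difference([0.05, -0.40, 0.20], [0.15, 0.50, 0.35])
print(f"Q_between = {q_b:.2f}, df = {df_b}, p = {p_b:.2f}")
```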

2.9 Risk of publication bias across studies

Risk of publication bias across studies was assessed for both adults and pediatrics using funnel plots, showing the bias versus its standard error. The symmetry of the funnel plots was assessed visually and by Egger's regression test using a significance level of 0.1 [27].
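Egger's test regresses each study's standardized effect (bias divided by its standard error) on its precision (the reciprocal of the standard error); an intercept that differs from zero indicates funnel-plot asymmetry. A minimal Python sketch with ordinary least squares is shown below. It returns the intercept, its standard error, the t statistic and the degrees of freedom (k − 2); the two-sided p value would then be obtained from a t distribution (for example with `scipy.stats.t.sf`, not used here to keep the example dependency-free) and compared with the significance level of 0.1. The input values are hypothetical.

```python
import math


def egger_test(effects, standard_errors):
    """Egger's regression of y/se (standardized effect) on 1/se (precision).

    Returns intercept, its standard error, t statistic and degrees of freedom.
    """
    y = [e / s for e, s in zip(effects, standard_errors)]  # standardized effect
    x = [1.0 / s for s in standard_errors]                 # precision
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = n - 2
    s2 = sum(r ** 2 for r in resid) / df                   # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, df


# Hypothetical study biases (L/min) and their standard errors.
biases = [0.30, -0.10, 0.45, 0.05, -0.25, 0.60]
ses = [0.10, 0.15, 0.30, 0.12, 0.20, 0.40]
b0, se_b0, t_stat, df_e = egger_test(biases, ses)
print(f"intercept {b0:.2f} (SE {se_b0:.2f}), t = {t_stat:.2f}, df = {df_e}")
```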

The statistical analyses were conducted using R, version 3.4.2 (R Foundation for Statistical Computing, Vienna, Austria), RStudio (RStudio, Inc., Boston, USA) and SPSS Statistics, version 25.0 (IBM Business Analytics, New York, USA). The layout of the forest and funnel plots was customized using Adobe Photoshop CS4 (Adobe Systems, California, USA).

3 Results

3.1 Study selection

The database search yielded 777 citations, and two additional records were identified through the manufacturer's website [28, 29]. After removal of duplicates, 571 studies remained. After title and abstract screening, 41 studies remained. These full-text articles were assessed for eligibility, which led to 24 included studies [28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51] and 17 excluded studies [18, 52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67]. The included studies comprised 13 studies in adults [28,29,30,31,32,33,34,35,36,37,38,39,40] and 11 studies in pediatrics [41,42,43,44,45,46,47,48,49,50,51]. Contacting the manufacturer and screening the reference lists of all included studies yielded no additional studies. The flow diagram of the study selection process is shown in Fig. 2. The articles excluded after full-text assessment and the reasons for exclusion are listed in Appendix 6.

3.2 Study characteristics

Study characteristics of the included studies are presented in Tables 1 and 2. A total of 620 adult and 603 pediatric patients were included. Sample size ranged from 16 to 134 patients, with a mean of 52 patients. Of the adult studies, two were conducted in the OR during liver transplantation [31, 39], three during cardiac surgery [28, 29, 36], two both during cardiac surgery and after cardiac surgery in the ICU [30, 33], two in the ICU [35, 40], three in the cardiology unit [34, 37, 38] and one in the outpatient unit [32]. Of the pediatric studies, four were conducted in the OR [45, 47, 49, 50], two in the ICU [43, 44], two in the neonatal intensive care unit (NICU) [46, 51] and three in the outpatient unit [41, 42, 48]. The ICON® device was used in nine studies [28, 29, 31, 32, 41,42,43, 45, 48] and the Aesculon® in fifteen studies [30, 33,34,35,36,37,38,39,40, 44, 46, 47, 49,50,51]. Intermittent PAC was the most frequently used reference method in adults [28, 30, 31, 33, 34, 40]. Other reference methods used in adults were continuous PAC [29, 39], TPTD [35, 40], TTE [32], transesophageal echocardiography (TEE) [36], magnetic resonance imaging (MRI) [37] and the Fick method [38]. The mean age in adults was 51 years. In the pediatric studies, TTE was the most commonly used reference method [41,42,43,44, 46, 48, 49, 51]; the other three studies used intermittent PAC [50], TEE [45] and the Fick method [47]. Two studies focused on neonates with a mean gestational age of 36 weeks, both using TTE as the reference method [46, 51]. Three studies acknowledged financial or material support from Osypka Medical GmbH [30, 31, 35].

3.3 Contacting authors

We contacted three authors concerning the direction of the bias (reference–tested method or tested–reference method) [28, 40, 47]. One of them responded [47] and for the other two studies we interpreted the direction of the bias ourselves [28, 40]. We contacted one author concerning the mean CO and MPE [67]. As the mean CO was not described in the manuscript and could not be extracted from the Bland–Altman plot, the MPE could not be calculated. The author did not respond and therefore the study was excluded.

3.4 Risk of bias in individual studies

The risk of bias assessment is provided in Table 3 for adult studies and in Table 4 for pediatric studies. Most included studies were judged to be at low risk of bias with respect to patient selection, tested method, reference method, and flow and timing. For six studies, potential sources of bias existed in more than one of these four domains, but the risk was nevertheless considered low [30, 33, 34, 37, 38, 47]. In the statistical analysis domain, all studies were judged to be at high risk, except for two [46, 49]. Concerns regarding applicability were rated low for all studies; these ratings are not shown in Tables 3 and 4.

Table 3 Risk of bias for included adult studies, according to the modified QUADAS-2 tool
Table 4 Risk of bias for included pediatric studies, according to the modified QUADAS-2 tool

3.5 Synthesis of results, adults

The pooled results for the adult studies are shown in Fig. 3. The overall random-effects pooled bias was 0.03 L min−1 [95% CI −0.23; 0.29], with LoA −2.78 to 2.84 L min−1 and MPE 48.0%. Inter-study heterogeneity was high (I² = 93%, p < 0.0001). For two studies, data from the same patients are presented in two or three rows of the forest plot, because these studies reported outcome measures for different clinical circumstances [30, 34]. Therefore, the number of patients in the adult forest plot (N = 667) differs from the actual number of adult patients (N = 620).

Fig. 3

Forest plot showing the bias, LoA and MPE for the studies in adults. The random-effects pooled bias was 0.03 L min−1, LoA −2.78 to 2.84 L min−1 and MPE 48.0%. Significant heterogeneity was detected (I² = 93%, p < 0.0001). ᵃ OR, ᵇ ICU, ᶜ at rest, ᵈ during exercise, ᵉ during NO inhalation, ᶠ intermittent PAC as reference method, ᵍ TPTD as reference method. LoA limits of agreement, MPE mean percentage error, N number of patients

3.6 Subgroup analyses, adults

Figure 7 in Appendix 7 shows the subgroup analysis by reference method in adults. The subgroup intermittent TD showed a random-effects pooled bias of 0.04 L min−1 [95% CI −0.28; 0.37], LoA −3.14 to 3.22 L min−1 and MPE 53.5%. Heterogeneity was high (I² = 80%, p < 0.0001). The subgroup continuous TD showed a pooled bias of −0.56 L min−1 [95% CI −1.70; 0.57], LoA −2.90 to 1.78 L min−1 and MPE 31.1%. Heterogeneity was high (I² = 82%, p = 0.02). The subgroup other reference method showed a pooled bias of 0.16 L min−1 [95% CI −0.57; 0.90], LoA −2.34 to 2.66 L min−1 and MPE 48.5%. Heterogeneity was high (I² = 97%, p < 0.0001). There was no statistically significant difference between subgroups (p = 0.55).

Figure 8 in Appendix 8 shows the subgroup analysis by clinical setting in adults. The subgroup cardiac surgery showed a random-effects pooled bias of 0.01 L min−1 [95% CI −0.14; 0.17], LoA −1.34 to 1.36 L min−1 and MPE 33.3%. Heterogeneity was high (I² = 73%, p < 0.01). The subgroup OR showed a pooled bias of 1.00 L min−1 [95% CI −3.47; 5.47], LoA −4.05 to 6.05 L min−1 and MPE 67.7%. Heterogeneity was high (I² = 97%, p < 0.0001). The subgroup ICU showed a pooled bias of 0.04 L min−1 [95% CI −0.18; 0.27], LoA −2.37 to 2.45 L min−1 and MPE 42.9%. Heterogeneity was moderate (I² = 38%, p = 0.17). The subgroup other clinical setting showed a pooled bias of −0.35 L min−1 [95% CI −1.22; 0.53], LoA −3.17 to 2.47 L min−1 and MPE 53.5%. Heterogeneity was high (I² = 96%, p < 0.0001). There was no statistically significant difference between subgroups (p = 0.82). The study by Mekis et al. was conducted both during cardiac surgery and in the ICU [33]. We therefore divided the data of this study into three rows: before and immediately after cardiac surgery, and in the ICU. Because these three rows replace the single row for this study in the adult forest plot (Fig. 3), the number of patients and the pooled data in the subgroup analysis by clinical setting differ slightly from those in the adult forest plot.

3.7 Synthesis of results, pediatrics

Figure 4 shows the pooled results for the pediatric studies. The overall random-effects pooled bias was −0.02 L min−1 [95% CI −0.09; 0.05], LoA −1.22 to 1.18 L min−1 and MPE 42.0%. Heterogeneity was high (I² = 86%, p < 0.0001).

Fig. 4

Forest plot showing the bias, LoA and MPE for the studies in pediatrics. The random-effects pooled bias was −0.02 L min−1, LoA −1.22 to 1.18 L min−1 and MPE 42.0%. Significant heterogeneity was detected (I² = 86%, p < 0.0001). ᵃ normal weight, ᵇ overweight and obese. LoA limits of agreement, MPE mean percentage error, N number of patients

3.8 Subgroup analysis, pediatrics

Figure 9 in Appendix 9 shows the subgroup analysis by reference method in pediatrics. The subgroup TTE in children showed a random-effects pooled bias of −0.10 L min−1 [95% CI −0.25; 0.04], LoA −1.61 to 1.41 L min−1 and MPE 43.9%. Heterogeneity was high (I² = 75%, p < 0.001). The subgroup TTE in neonates showed a pooled bias of 0.01 L min−1 [95% CI −0.01; 0.02], LoA −0.14 to 0.16 L min−1 and MPE 35.1%. No heterogeneity was detected (I² = 0%, p = 0.94). The subgroup other reference method in children showed a pooled bias of 0.15 L min−1 [95% CI −0.14; 0.44], LoA −0.73 to 1.03 L min−1 and MPE 41.6%. Heterogeneity was high (I² = 96%, p < 0.0001). There was no statistically significant difference between subgroups (p = 0.21).

3.9 Risk of publication bias across studies

Funnel plots for the detection of publication bias across studies are shown in Figs. 5 and 6. Egger's regression test was not significant for either adults (p = 0.4147) or pediatrics (p = 0.6572), suggesting a low risk of publication bias [27]. However, both funnel plots showed asymmetry, which could be caused by publication bias or by high heterogeneity. High heterogeneity is the most likely explanation, although publication bias cannot be excluded.

Fig. 5

Funnel plot for detection of publication bias across included studies in adults. Egger’s regression test showed no significant p-value (p = 0.4147). The funnel plot shows asymmetry

Fig. 6

Funnel plot for detection of publication bias across included studies in pediatrics. Egger’s regression test showed no significant p-value (p = 0.6572). The funnel plot shows asymmetry

3.10 Trending ability

Seven of the thirteen adult studies assessed trending ability, using various statistical analyses [28,29,30,31, 33, 34, 39]. Magliocca et al. and Wang et al. analysed trending ability using a four-quadrant plot and reported concordance rates of 100% and 56.5%, respectively [31, 39]. Other statistical methods were a time plot [29], a receiver operating characteristic curve [28], and descriptive analyses of changes in CO for the whole study population [33, 34] or for individual patients [30]. None of the pediatric studies evaluated trending ability. Because there was no agreement on the statistical methodology, pooled results could not be calculated.
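For readers unfamiliar with the four-quadrant approach, the concordance rate is the percentage of paired CO changes in which both methods move in the same direction, after discarding small changes within a central exclusion zone. The minimal Python sketch below assumes a square exclusion zone of a hypothetical 0.5 L min−1 (a pair is dropped only if both changes fall inside it); published studies differ in the size and shape of this zone, so this is not necessarily the approach of the cited studies. The paired changes are hypothetical.

```python
def concordance_rate(delta_ref, delta_ec, exclusion=0.5):
    """Four-quadrant concordance rate (%) of paired CO changes.

    delta_ref, delta_ec: consecutive changes in CO (L/min) for each method.
    exclusion: hypothetical half-width (L/min) of a square central zone;
    pairs with both changes inside it are treated as noise and dropped.
    """
    kept = [(r, e) for r, e in zip(delta_ref, delta_ec)
            if not (abs(r) < exclusion and abs(e) < exclusion)]
    if not kept:
        raise ValueError("no changes outside the exclusion zone")
    concordant = sum(1 for r, e in kept if r * e > 0)  # same direction
    return 100.0 * concordant / len(kept)


# Hypothetical paired changes (L/min), e.g. after therapeutic interventions.
d_ref = [0.8, -1.2, 0.1, 1.5, -0.7, 0.9]
d_ec = [0.6, -0.9, -0.2, -0.4, -1.1, 1.3]
# The third pair is excluded; 4 of the 5 remaining pairs agree: 80.0%.
print(f"concordance {concordance_rate(d_ref, d_ec):.1f}%")
```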

4 Discussion

4.1 Summary of evidence

This meta-analysis of 24 studies assessing the accuracy and precision of EC shows a pooled bias of 0.03 L min−1 [95% CI −0.23; 0.29], LoA −2.78 to 2.84 L min−1 and MPE 48.0% in adult studies. In pediatric studies, the pooled bias was −0.02 L min−1 [95% CI −0.09; 0.05], LoA −1.22 to 1.18 L min−1 and MPE 42.0%. Inter-study heterogeneity was high in both adults (I² = 93%, p < 0.0001) and pediatrics (I² = 86%, p < 0.0001).

Although the pooled bias in both adult and pediatric studies was close to zero, high accuracy cannot be assumed, as the bias varied widely between studies. The direction of the bias (positive or negative) was inconsistent and cannot be predicted in the clinical setting, which corresponds with the high inter-study heterogeneity. The pooled MPEs of all subgroups were above the recommended 30% [20]. Therefore, EC cannot replace TD and TTE for the measurement of absolute CO values.

The ICON® and Aesculon® monitors were included in three other meta-analyses [10,11,12]. Importantly, the data of these meta-analyses result from subgroup analyses for TEB, which included EC but also devices based on other algorithms. Therefore, their results cannot be attributed to EC alone. Peyton and Chong found a bias of −0.10 L min−1 and an MPE of 42.9% in adults in a subgroup analysis including five EC studies and seven studies based on other algorithms [12]. Joosten et al. performed a subgroup analysis including four EC studies and six studies based on other algorithms and found a bias of −0.22 L min−1 and an MPE of 42% in adults [10]. These results are comparable with our findings. Suehiro et al. found a bias of −0.03 L min−1 and an MPE of 23.6% in pediatrics in a subgroup analysis of four EC studies and four studies based on other algorithms [11]. We found a similar bias, but could not confirm the low MPE. In contrast to these reviews, our results are derived from EC studies only. Furthermore, our meta-analysis included subgroup analyses against the gold standard in adults (TD) and the most commonly used technique in pediatrics (TTE), which allows a definitive validation of EC against these methods. In addition, our meta-analysis includes more studies, and therefore more patients and more clinical settings, than previous meta-analyses. Our study thus extends the existing evidence in both the number and the diversity of the included studies.

When compared with other minimally invasive or non-invasive techniques used in clinical practice, most devices show an MPE of more than 30% [10, 12, 68,69,70,71,72,73,74,75,76]. Peyton and Chong have therefore suggested raising the acceptable MPE to 45%, so that more new methods would meet the criterion for agreement [12]. The MPE depends on both the reference and the tested method and is strongly influenced by the clinical condition. The lowest bias and MPE are found in validation studies during cardiac surgery [68, 77, 78]. The worst results are found during sepsis and septic shock, as the bias of most non-invasive devices is negatively affected by a low systemic vascular resistance (SVR) [68, 74, 75, 79,80,81]. Which method should serve as the reference, and under which clinical conditions validation should be performed, remain subjects of discussion.

The subgroup analysis by reference method in adults (Fig. 7 in Appendix 7) showed a relatively high MPE (53.5%) for intermittent TD and a relatively low MPE (31.1%) for continuous TD. The high MPE for intermittent TD reflects the high MPEs of the individual included studies. As the continuous TD subgroup consisted of only two studies, its low MPE is largely driven by the extremely low MPE (4.7%) of one of them [29].

The subgroup analysis by clinical setting in adults (Fig. 8 in Appendix 8) showed a low bias (0.01 L min−1) and a relatively low MPE (33.3%) during cardiac surgery, probably due to the hypodynamic status with low CO and high SVR. The studies in this subgroup showed a mean CO of 4.1 ± 0.2 L min−1, whereas the other included adult studies showed a significantly higher mean CO of 6.3 ± 1.7 L min−1 (p < 0.05). The OR subgroup, consisting of two studies during liver transplantation [31, 39], showed a relatively high bias (1.00 L min−1) and a high MPE (67.7%), which could be explained by the hyperdynamic status (high CO and low SVR) often seen during these procedures [31, 68]. The patient characteristics in the ICU subgroup differed too much to draw conclusions, as this subgroup comprised post cardiac surgery patients [30, 33], patients with systemic inflammatory response syndrome or sepsis after surgery [35] and critically ill patients after surgery [40] (Table 1). The same applies to the other clinical setting subgroup, which comprised pregnant women [32], hemodynamically stable cardiac patients [37, 38] and a study partly performed during exercise or NO inhalation [34] (Table 1).

The results for the subgroup TTE children were comparable to the pooled results for pediatric studies. The subgroup TTE neonates showed a relatively low MPE (35.1%) (Fig. 9 in Appendix 9).

A post hoc subgroup analysis by clinical setting was performed in adults, but not in pediatric studies, as the pediatric clinical settings differed too much (Table 2) and would have resulted in very small subgroups. No subgroup analyses by age were performed, as age varied widely within the individual adult and pediatric studies (Tables 1, 2).

4.2 Recommendations for clinicians

EC cannot replace TD and TTE for the measurement of absolute CO values. However, as its MPE is comparable to that of clinically used minimally invasive or non-invasive hemodynamic monitors, EC could complement monitoring in the ICU and NICU by providing continuous monitoring, which is relevant for goal-directed therapy and clinical decision-making. This should be further investigated. In the OR, monopolar electrocautery interferes with the EC measurement [82]; bipolar electrocautery does not.

4.3 Limitations

This study has several limitations. First, population selection bias may be present. Several studies took place in the cardiac surgical setting [28,29,30, 33, 36, 44]. Although hemodynamic instability can occur, cardiac surgery is characterized by low CO and high SVR [68, 77, 78], which could explain the low bias and relatively low MPE in the cardiac surgery subgroup. This low bias and MPE influence the pooled data in adults.

Another limitation concerns the use of the LoA and MPE as outcome measures, as both are influenced by the error of the reference method. Every reference method has its own inherent error, and none provides a perfectly accurate and precise measurement of CO. For example, Stetz et al. reported a precision of 13% for different TD devices [83]. Slagt et al. showed a precision of 6.7% for TPTD [81]. For intermittent PAC, precisions of 6.4% [84], 8.4% [85] and 16.2% [28] have been described. For TTE, Mercado et al. showed a precision of 9% [86], and we derived a precision of 8.4% from the data of Tomaske et al. (see Table 7 in Appendix 4) [49]. For TEE, precisions of 12.8% [84] and 16.0% [85] have been described. For the Fick method, a precision of 27.4% was calculated from the data of Trinkmann et al. (see Table 6 in Appendix 3) [38]. Critchley and Critchley showed that the MPE depends on the precision of both the reference and the tested method, according to the following equation [20]:

$$MPE = \sqrt {\left[ {\left( {precision_{reference} } \right)^{2} + \left( {precision_{test} } \right)^{2} } \right]} .$$
(7)

To draw conclusions from the MPE about the precision of the tested method, Cecconi recommends measuring the precision of the reference method within the study using repeated measurements, according to the following equation:

$$\text{Precision}_{x} = \frac{1.96 \times SD_{\text{reproducibility}}}{\text{Mean CO}_{x}}.$$
(8)

The precision of the tested method can then be calculated according to Eq. (7) [25]. Hapfelmeier showed that Eq. (7) is not exactly valid, as the overall precision and MPE also depend on each method's variability about the true values [87]. Despite this inaccuracy, Eq. (7) shows that the precision of both the reference and the tested method influences the MPE, so both should be determined for a proper interpretation of the LoA and MPE. Only a few studies measured both (Tables 3, 4) [28, 38, 40, 49].

Related to this limitation is the use of different reference methods. It is questionable whether the included studies, which are based on different reference methods, are comparable. This could explain the high inter-study heterogeneity found in our review. We therefore performed subgroup analyses against the gold standard TD in adults and the most commonly used technique, TTE, in pediatrics; their results are discussed above. Inter-study heterogeneity decreased but remained high. The subgroup TTE in neonates showed no heterogeneity (I² = 0%), as its two studies showed comparable results.

To assess the statistical analysis in the included studies, we developed an additional domain for the modified QUADAS-2 tool, which, to our knowledge, has not been done previously. The risk of bias in individual studies was high in the statistical analysis domain (Tables 3, 4), which also limits this review. First, in some studies the direction of the bias was unclear [28, 29, 40, 44, 47]. Second, the SD reported in the text did not correspond with the LoA in the figure [28, 29, 43]. Third, the recalculated MPE differed from the reported value in five studies [29, 37, 43, 44, 50]. For these studies, the differences in MPE (defined as recalculated MPE − reported MPE) were 1.1% [29], 2.9% [37], 26.8% [43], 58.4% [44] and −5.1% [50] (see Tables 6, 7 in Appendix 3, 4). In many cases, the MPE could not be recalculated [30, 31, 33,34,35, 38, 40, 41, 45, 47,48,49]. Fourth, the standard Bland–Altman analysis assumes independent observations. In case of multiple observations per individual and in the absence of major hemodynamic changes, a modification of the Bland–Altman analysis for repeated measurements should be applied [88,89,90]. Many of the included studies used multiple observations per individual but did not apply the modified Bland–Altman analysis [28, 32,33,34, 37,38,39, 43, 45, 50, 51]. This can lead to narrower LoA and a lower MPE in the individual studies [88, 89]. Lastly, only a few studies assessed the precision of both the reference and the tested method [28, 38, 40, 49], as discussed above. Overall, the high risk of bias in the statistical domain makes the pooled data in this review less reliable.
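The repeated-measurements modification can be illustrated for the case named above, in which the true CO is assumed to be stable within each patient. With m paired readings per patient, the SD of the differences combines the variance of the patients' mean differences with the within-patient variance: SD² = s²(mean differences) + (1 − 1/m) × s²(within). The minimal Python sketch below implements this formula for a balanced design only (the same m for every patient, an assumption of the sketch); unequal numbers of readings require the more general correction described in the cited literature. The data are hypothetical.

```python
import math
import statistics


def repeated_measures_loa(differences_per_subject):
    """Bland-Altman bias and LoA for repeated pairs, assuming stable true CO.

    differences_per_subject: per subject, the (reference - EC) differences
    (L/min); this sketch requires the same number m (>= 2) per subject.
    """
    m = len(differences_per_subject[0])
    if m < 2 or any(len(d) != m for d in differences_per_subject):
        raise ValueError("this sketch needs >= 2 and equal repeats per subject")
    subject_means = [statistics.fmean(d) for d in differences_per_subject]
    n = len(subject_means)
    # Within-subject variance: residual mean square of a one-way ANOVA.
    ss_within = sum(
        (x - mu) ** 2
        for d, mu in zip(differences_per_subject, subject_means)
        for x in d
    )
    s2_within = ss_within / (n * (m - 1))
    s2_means = statistics.variance(subject_means)  # variance of subject means
    sd_total = math.sqrt(s2_means + (1 - 1 / m) * s2_within)
    bias = statistics.fmean(subject_means)
    return bias, bias - 1.96 * sd_total, bias + 1.96 * sd_total


# Hypothetical reference-minus-EC differences (L/min), three pairs per patient.
diffs = [[0.2, 0.5, 0.1], [-0.4, -0.1, -0.3], [0.9, 0.6, 0.8], [0.0, 0.3, -0.2]]
bias_rm, lower_rm, upper_rm = repeated_measures_loa(diffs)
print(f"bias {bias_rm:.2f} L/min, LoA {lower_rm:.2f} to {upper_rm:.2f} L/min")
```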

In addition, for two adult studies, data from the same patients are presented in two or three separate rows of the forest plot, because those studies reported outcome measures for different clinical circumstances [30, 34]. As the clinical conditions differ between measurement points, the data can be considered independent. We therefore considered it statistically justified to assess these data separately.

Furthermore, some studies were excluded from our meta-analysis because they reported cardiac index, stroke volume or CO in mL kg−1 min−1 instead of CO in L min−1 [52,53,54,55,56, 58,59,60, 62,63,64,65,66]. These studies could have contributed to our results.

4.4 Trending ability

Monitoring changes in CO is relevant in clinical practice to measure the effect of an intervention. Even if EC is unable to measure absolute CO values accurately, as assessed by Bland–Altman analysis, it could still be applicable as a trend monitor. Acceptable trending ability requires good precision, independent of accuracy [91]. Different methods have been described for assessing trending ability, of which the four-quadrant plot and the polar plot are recommended [92,93,94]. Seven of the thirteen studies in adults assessed trending ability, using various statistical analyses [28,29,30,31, 33, 34, 39]. None of the studies in pediatrics evaluated trending. Due to a lack of agreement on the statistical methodology, it is difficult to compare results and draw conclusions, which is a limitation of this review.
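Because the included studies used different trending analyses, a shared reference calculation may help future studies report comparable results. The sketch below computes a four-quadrant concordance rate and a polar-plot angular bias with radial limits of agreement from paired changes in CO between consecutive measurements. The exclusion-zone radius of 0.5 L min−1, the use of the mean change to decide exclusion, and the function names are assumptions chosen for illustration; exclusion criteria vary between publications.

```python
import math
from statistics import mean, stdev


def four_quadrant_concordance(d_test, d_ref, exclusion=0.5):
    """Hypothetical helper: percentage of paired CO changes (L/min) that move
    in the same direction, after dropping pairs whose mean change lies inside
    the (assumed) central exclusion zone."""
    kept = [(t, r) for t, r in zip(d_test, d_ref) if abs((t + r) / 2) >= exclusion]
    if not kept:
        return float("nan")
    concordant = sum(1 for t, r in kept if t * r > 0)
    return 100.0 * concordant / len(kept)


def polar_angular_bias(d_test, d_ref, exclusion=0.5):
    """Hypothetical helper: mean angle (degrees) between each change vector and
    the line of identity, with 95% radial limits of agreement.
    Needs at least two pairs outside the exclusion zone."""
    angles = []
    for t, r in zip(d_test, d_ref):
        if abs((t + r) / 2) < exclusion:
            continue  # small changes are dominated by measurement noise
        if t + r < 0:
            t, r = -t, -r  # map decreases onto the same half-plane as increases
        # 0 degrees = perfect agreement; positive when the test change
        # exceeds the reference change (after the flip above)
        angles.append(math.degrees(math.atan2(t, r)) - 45.0)
    bias = mean(angles)
    sd = stdev(angles)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical paired changes in CO (L/min): test method vs reference method
d_test = [1.0, -1.0, 1.5, 0.1]
d_ref = [0.9, -1.2, -0.2, 0.05]
print(four_quadrant_concordance(d_test, d_ref))  # last pair falls in the exclusion zone
print(polar_angular_bias(d_test, d_ref))
```

Here the fourth pair is excluded and one of the three remaining pairs moves in opposite directions, so the concordance rate is two thirds. Reporting both the concordance rate and the polar statistics, together with the exclusion criterion used, would make trending results easier to compare across studies.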

4.5 Future research

Our study focuses on the ICON®/Aesculon® monitor for evaluating EC. The ICON®/Aesculon® monitor is still in development, and future research should clarify its place among existing hemodynamic monitoring devices. The high risk of bias in the statistical analysis domain of the modified QUADAS-2 tool emphasizes the lack of consensus on how to present data in validation studies, despite the fact that good proposals have been published [20, 25, 87, 91]. Consensus is required to interpret the results of different studies and draw conclusions. Future validation studies of EC should also focus on trending ability [92,93,94]. Combined with studies on the applicability of EC for continuous CO monitoring and goal-directed therapy, this will provide useful clinical guidance.

5 Conclusion

This meta-analysis of 24 studies, which assesses the accuracy and precision of non-invasive CO measurement by EC compared with a reference method, shows a pooled bias of 0.03 L min−1 [95% CI −0.23; 0.29], LoA −2.78 to 2.84 L min−1 and an MPE of 48.0% in adult studies. In pediatric studies the pooled bias was −0.02 L min−1 [95% CI −0.09; 0.05], LoA −1.22 to 1.18 L min−1 and MPE 42.0%. Inter-study heterogeneity was high for both adults (I2 = 93%, p < 0.0001) and pediatrics (I2 = 86%, p < 0.0001). Despite the low bias in both adults and pediatrics, the pooled MPEs exceeded the recommended 30%. Therefore, EC cannot replace TD and TTE for the measurement of absolute CO values. The trending ability of EC could not be assessed in this meta-analysis, due to a lack of agreement on the statistical methodology in the included studies. Nevertheless, EC might still be applicable as a trend monitor to measure acute changes in CO, which is relevant for clinical decision-making. This should be an important part of future research, especially as EC is safe and easy to apply.