Introduction

Milrinone (Corotrope®/Primacor®) is a type III phosphodiesterase inhibitor primarily used for inotropic support in the treatment of cardiac dysfunction. Although milrinone is implemented in several guidelines, its efficacy and safety profile remain controversial [1, 2].

Three meta-analyses have evaluated milrinone in critically ill patients [3–5]. One meta-analysis included adult cardiac surgery patients and observed that milrinone was associated with a significant increase in mortality while an update of the review found no significant effects [4, 5]. One other meta-analysis evaluated milrinone for the treatment of acute heart failure after acute myocardial infarction and suggested that milrinone might be safe and effective in these patients [3]. Unfortunately, only four trials with a limited number of 303 patients were included.

None of these meta-analyses met all key methodological criteria for being a systematic review [6]. None of them were based upon a previously published protocol [3–5]. They lacked or had insufficient assessment of the risk of bias, and bias risks were insufficiently incorporated in the analyses and conclusions. They also lacked sufficient evaluation of the risks of random errors [7–9]. Just one domain having unclear risk of bias or high risk of bias is potentially sufficient to bias the findings. Furthermore, none of the previous meta-analyses assessed the outcomes according to the patients’ perspective following the Grading of Recommendations Assessment, Development and Evaluation (GRADE) [10]. GRADE assesses the quality of evidence by evaluating risk of bias, heterogeneity, indirectness, imprecision and publication bias [10].

Our objective was to perform a systematic review with meta-analyses and trial sequential analysis (TSA) of randomised clinical trials (RCTs) according to The Cochrane Handbook for Systematic Reviews of Interventions and The Cochrane Hepato-Biliary Group Module comparing the benefits and harms of milrinone in critically ill adult patients with cardiac dysfunction [6, 7].

Methods

This systematic review was conducted according to our published protocol following the recommendations of The Cochrane Handbook for Systematic Reviews of Interventions and The Cochrane Hepato-Biliary Group Module and reported according to the PRISMA statement [6, 7, 11]. The protocol for this systematic review was registered at PROSPERO (no. CRD42014009061) [12].

Eligibility criteria

We considered all randomised clinical trials for inclusion, irrespective of language, blinding, publication status or sample size for assessment of benefits and harms. Quasi-randomised studies and observational studies with more than 500 patients were not included regarding assessment of benefits, but were considered for inclusion regarding assessment of harms and were planned to be analysed separately from the randomised trials [6].

Only trials with adult patients having cardiac dysfunction were considered. Cardiac dysfunction was defined as left ventricular ejection fraction (LVEF) below 40 % and/or low cardiac output. Low cardiac output syndrome was defined as a pre-existing or developing state of cardiac insufficiency with underlying left or right ventricular systolic dysfunction requiring inotrope support [13]. We accepted the definitions of the diagnoses according to the criteria used in each individual randomised trial. Milrinone was considered the experimental intervention. There were no restrictions on dose, continuous or intermittent administration, or duration of treatment. However, trials with oral and/or inhaled milrinone were excluded as such routes of administration were judged inappropriate for critically ill patients.

All trials were included independent of the type of control intervention, i.e., no intervention, placebo, dobutamine, levosimendan, or any other inotrope or vasopressor. While this may introduce heterogeneity, subgroup comparisons were preplanned according to inactive (placebo or no intervention) and potentially active control interventions (e.g., other inotropes or vasopressors).

All outcomes were graded according to the patients’ perspective following GRADE [9]. The primary outcome was serious adverse events (SAE). SAE is a composite outcome summarising all serious events necessitating an intervention, operation, prolonged hospital stay or mortality according to ICH-GCP definitions [14]. This outcome was chosen for balancing the potential benefits and harms. The secondary outcomes were all-cause mortality, myocardial infarction, arrhythmia (including supra- and ventricular tachycardia and ventricular fibrillation) and duration of mechanical ventilation. Time-specific analyses of mortality were conducted according to availability of data (e.g. 30, 90 and/or 180 days). Length of stay (both intensive care unit and total hospital stay) is a potentially highly biased surrogate outcome for recovery and was therefore not considered.

Search strategy

We searched the Cochrane Central Register of Controlled Trials (CENTRAL) in The Cochrane Library, PubMed/MEDLINE, EMBASE, Web of Science and CINAHL until November 2015 (see supplements). We searched the references of the identified trials and systematic reviews to identify any further relevant trials, i.e. backward snowballing. We also searched the WHO’s trial platform and ClinicalTrials.gov for ongoing trials and contacted the FDA and EMA.

Study selection and data extraction

Two authors independently identified the trials for inclusion. Excluded studies were listed with reasons for exclusion. The following data was extracted: year of publication, country in which the trial was conducted, year of conduct of the trial, single-centre or multicentre trial, inclusion and exclusion criteria, all outcomes, details on interventions and characteristics of the trials, e.g. baseline imbalance, early stopping and other than intention-to-treat analysis. The authors of the individual trials were contacted in case of any unclear or missing information.

Bias risk assessment

Two authors independently assessed the risks of bias of the trials following instructions in The Cochrane Handbook for Systematic Reviews of Interventions [6]. The following risk of bias domains were extracted from each trial: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessors, incomplete outcome data, selective outcome reporting and other bias including bias due to vested interest and/or academic bias [15–20]. Trials were classified as low risk of bias if all the domains were assessed as low risk. Trials were considered to have high risk of bias if one or more of these bias risk domains were scored as unclear or high risk of bias.
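The classification rule described above can be sketched in a few lines of Python. This is an illustration only: the function name, the domain labels and the example trial are hypothetical and are not taken from the review's data extraction forms.

```python
# Domains as listed in the text; the exact label strings are an assumption.
DOMAINS = (
    "random sequence generation",
    "allocation concealment",
    "blinding of participants and personnel",
    "blinding of outcome assessors",
    "incomplete outcome data",
    "selective outcome reporting",
    "other bias",
)


def overall_risk_of_bias(judgements):
    """Return 'low' only if every domain is judged 'low'; otherwise 'high'.

    `judgements` maps each domain to 'low', 'unclear' or 'high'.
    A single 'unclear' or 'high' domain makes the whole trial 'high'.
    """
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing judgements for: {missing}")
    return "low" if all(judgements[d] == "low" for d in DOMAINS) else "high"


# Hypothetical trial: all domains low except allocation concealment.
example_trial = {d: "low" for d in DOMAINS}
example_trial["allocation concealment"] = "unclear"
print(overall_risk_of_bias(example_trial))
```

Under this rule a trial with one unclear domain is treated the same as a trial with several high-risk domains, which is why none of the included trials could be classified as low risk of bias.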

Error matrix approach

Data on the outcomes of all trials were assessed for the risks of bias (measured by the level of evidence), the risks of random error (measured by standard error) and design errors (measured by GRADING the outcomes) [21]. The three-dimensional Manhattan error matrix was used to facilitate the overview of available evidence at a glance [21].

Statistical analysis

We performed the meta-analyses according to The Cochrane Handbook for Systematic Reviews of Interventions [6] and The Cochrane Hepato-Biliary Group Module [7] and used the software package Review Manager 5.30 [22]. For TSA, the TSA program v.0.9beta (http://www.ctu.dk/tsa) was used [23].

Results were presented as relative risks (RR) with 95 % confidence interval (CI) if there were two or more trials for an outcome. For rare events (<5 % in the control group) we calculated odds ratios (OR) and for very rare events (<2 % in the control group) we calculated Peto’s OR with 95 % CI [24]. We also reported risk differences (RD) if conclusions were different from risk ratio. P values less than TSA-adjusted significance levels were considered statistically significant.
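For readers unfamiliar with these effect measures, the following Python sketch shows how a risk ratio and a Peto odds ratio with 95 % CI can be computed for a single trial from its 2×2 table. The formulas are the standard textbook ones; the function names and the event counts are hypothetical, and Review Manager's own implementation (including its handling of zero cells) may differ.

```python
import math


def risk_ratio(events_exp, n_exp, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio with a Wald-type CI computed on the log scale.

    Zero-cell corrections are deliberately omitted in this sketch,
    so both arms must have at least one event.
    """
    rr = (events_exp / n_exp) / (events_ctrl / n_ctrl)
    se_log_rr = math.sqrt(
        1 / events_exp - 1 / n_exp + 1 / events_ctrl - 1 / n_ctrl
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def peto_odds_ratio(events_exp, n_exp, events_ctrl, n_ctrl, z=1.96):
    """Peto one-step odds ratio: exp((O - E) / V) with SE 1 / sqrt(V)."""
    n_total = n_exp + n_ctrl
    total_events = events_exp + events_ctrl
    expected = n_exp * total_events / n_total
    variance = (
        n_exp * n_ctrl * total_events * (n_total - total_events)
        / (n_total ** 2 * (n_total - 1))
    )
    log_or = (events_exp - expected) / variance
    se = 1 / math.sqrt(variance)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Hypothetical trial: 10/100 events with milrinone, 20/100 with control.
print(risk_ratio(10, 100, 20, 100))
# Hypothetical rare-event trial: 1/100 versus 2/100.
print(peto_odds_ratio(1, 100, 2, 100))
```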

We calculated both a fixed-effect [25] and a random-effects [26] model for meta-analysis and presented both models in case of discrepancy. Considering the anticipated clinical heterogeneity we emphasised the random-effects model except if one or two trials dominated the available evidence [27]. Heterogeneity was explored by the Chi-squared test with significance set at a P value of 0.10, and the quantity was measured by I² [6, 28].
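The difference between the two models, and the way I² is derived from the heterogeneity statistic Q, can be illustrated with a minimal Python sketch. The text does not state which weighting scheme was used; inverse-variance weights and the DerSimonian-Laird estimator of the between-trial variance are assumed here purely for illustration, and the input values are hypothetical.

```python
import math


def pool(log_effects, std_errors):
    """Pool log effect estimates with fixed-effect and random-effects models.

    Assumes inverse-variance weights and the DerSimonian-Laird estimator
    of the between-trial variance (tau^2). Returns pooled log estimates,
    their standard errors, Cochran's Q, I^2 (in %) and tau^2.
    """
    weights = [1 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_effects)) / sum_w

    # Cochran's Q and I^2 = max(0, (Q - df) / Q) * 100
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird between-trial variance, truncated at zero
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1 / (se ** 2 + tau2) for se in std_errors]
    sum_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, log_effects)) / sum_re

    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1 / sum_w),
        "random": random,
        "se_random": math.sqrt(1 / sum_re),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical log risk ratios and standard errors from three trials.
result = pool([-0.60, 0.10, -0.05], [0.30, 0.15, 0.40])
print(math.exp(result["fixed"]), math.exp(result["random"]), result["I2"])
```

When tau² is greater than zero, the random-effects weights are more nearly equal across trials, so a small trial with an extreme estimate pulls the pooled result further than it does in the fixed-effect model. This is the mechanism behind the discrepancies between models reported for myocardial infarction.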

Analyses were performed on intention-to-treat [6]. In case of statistically significant RR, we calculated the number needed to treat (NNT) or number needed to harm (NNH) with 95 % CI.
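As an illustration of the NNT/NNH calculation, the sketch below derives the number from the risk difference of a single comparison and inverts the limits of its 95 % CI. The Wald-type interval, the function name and the event counts are assumptions made for this example; the review does not specify how the CI of the NNT was to be obtained.

```python
import math


def number_needed(events_exp, n_exp, events_ctrl, n_ctrl, z=1.96):
    """Return (label, point estimate, lower, upper) for NNT or NNH.

    The number needed to treat (or harm) is the reciprocal of the
    absolute risk difference. The CI is obtained by inverting the
    limits of a Wald-type CI for the risk difference, which is only
    meaningful when that CI excludes zero.
    """
    p_exp = events_exp / n_exp
    p_ctrl = events_ctrl / n_ctrl
    rd = p_exp - p_ctrl
    se = math.sqrt(
        p_exp * (1 - p_exp) / n_exp + p_ctrl * (1 - p_ctrl) / n_ctrl
    )
    rd_lower, rd_upper = rd - z * se, rd + z * se
    if rd_lower <= 0 <= rd_upper:
        raise ValueError("risk difference CI includes zero; NNT undefined")
    label = "NNT" if rd < 0 else "NNH"  # fewer events with treatment: NNT
    bounds = sorted(1 / abs(limit) for limit in (rd_lower, rd_upper))
    return label, 1 / abs(rd), bounds[0], bounds[1]


# Hypothetical comparison: 10/200 events with treatment, 30/200 with control.
print(number_needed(10, 200, 30, 200))
```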

Predefined subgroup analyses were conducted according to (1) the bias risk of trials (low risk of bias compared to trials with unclear and high risk of bias; hypothesis: trials with unclear or high risk of bias are associated with more favourable beneficial effects); (2) the control intervention (inactive compared to potentially active; hypothesis: milrinone appears more favourable when compared to an inactive control intervention than potentially active control intervention); (3) clinical setting (patients having cardiac surgery compared to patients not having cardiac surgery; hypothesis: milrinone shows benefit in patients having cardiac surgery and not in other patients).

Funnel plots were used to explore small trial bias when data of more than ten randomised trials were available [6, 29, 30].

Trial sequential analysis (TSA)

We conducted TSA to control the statistical significance levels when data are reanalysed repetitively or are too sparse to draw firm conclusions and, accordingly, to widen the confidence intervals appropriately [8, 9, 31–33]. TSA depends on the quantification of the required information size (the meta-analysis sample size). We calculated the diversity (D²)-adjusted required information size (DARIS) for a random-effects meta-analysis [34]. Trial sequential monitoring boundaries cannot be calculated when less than 5 % of the DARIS has been accrued. We conducted TSA with the intention to maintain an overall 5 % risk of a type I error and a power of 90 %. We used the unweighted control event proportion in the control group and we anticipated an intervention effect of a 10 % relative risk reduction (RRR). Sensitivity analyses were conducted using an RRR of 20 % as well as the lower confidence limit of the RRR of the intervention effect suggested by the meta-analysis of the trials with low risk of bias [27]. We intended to provide the CI adjusted for sparse data and repetitive testing, which we describe as the TSA-adjusted CI.
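To give a sense of scale, the following Python sketch computes a required information size for a dichotomous outcome and inflates it by the diversity. It uses a commonly cited formulation, 4(z₁₋α/₂ + z₁₋β)² · p̄(1 − p̄)/δ² divided by (1 − D²), which is an assumption here: the TSA program may differ in detail. The 11 % control event proportion is taken from the pooled mortality reported in the Results and is used only as an illustrative input.

```python
from statistics import NormalDist


def required_information_size(control_event_proportion, rrr,
                              alpha=0.05, power=0.90, diversity=0.0):
    """Diversity-adjusted required information size (number of patients).

    control_event_proportion: anticipated event proportion in controls
    rrr: anticipated relative risk reduction (e.g. 0.10 for 10 %)
    diversity: D^2 as a fraction in [0, 1); 0 means no adjustment
    """
    if not 0 <= diversity < 1:
        raise ValueError("diversity must be in [0, 1)")
    p_ctrl = control_event_proportion
    p_exp = p_ctrl * (1 - rrr)          # anticipated experimental proportion
    p_bar = (p_ctrl + p_exp) / 2        # average event proportion
    delta = p_ctrl - p_exp              # absolute risk difference
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - type II error
    unadjusted = 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return unadjusted / (1 - diversity)


# Illustrative inputs: 11 % control mortality, RRR 10 %, no diversity.
ris = required_information_size(0.11, 0.10)
print(round(ris), f"{1611 / ris:.1%} accrued with 1611 patients")
```

Because the required size scales with 1/δ², halving the anticipated RRR roughly quadruples the number of patients needed, which is why a 10 % RRR leads to an information size far larger than the accrued sample.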

GRADE approach

We used the GRADE system to assess the quality of the body of evidence associated with each of the major outcomes in our review using GRADE software (ims.cochrane.org/revman/other-resources/gradepro) [10]. The quality measure of a body of evidence considers within-study risk of bias, indirectness, heterogeneity, imprecision and risk of publication bias.

Results

The search strategy identified 9336 hits (Fig. 1). Three additional publications were identified by backward snowballing: two could be included [35, 36] and one was irretrievable [37]. After removal of duplicates and screening, 244 hits remained. Of the 244 hits, 213 were excluded after full text evaluation. The remaining 31 publications were included in this systematic review. All authors of the 31 publications were contacted for missing data; only three authors responded [38–40], but no additional data was obtained. Of the 31 publications, 15 evaluated only surrogate outcomes (such as haemodynamic variables). Accordingly, only 16 randomised trials provided data for analyses [35, 36, 38, 40–52].

Fig. 1 PRISMA flow diagram. Asterisk: not available at Dutch libraries or the universities linked through the University of Groningen. Double asterisk: study design, no RCT or prospective non-randomised study with <500 patients; population, no cardiac dysfunction or low cardiac output syndrome; intervention, not milrinone

No ongoing trials, quasi-randomised studies or observational studies were identified.

Characteristics of the included trials

The characteristics of the 16 randomised trials that provided data for analyses are listed (Table 1). Two trials used a three-arm parallel group design; all others had a two-arm parallel group design. There were five multicentre trials.

Table 1 Baseline characteristics of included trials

Eight trials evaluated patients after cardiac surgery [38, 41, 43, 45–48, 51], four trials evaluated patients with chronic heart failure [40, 42, 44, 50], three trials evaluated patients with acute heart failure after acute myocardial infarction [35, 36, 49] and one trial evaluated patients with severe sepsis [52].

Milrinone was administered in different doses. Nine trials used a 50 µg/kg bolus and one trial a 30 µg/kg bolus. Continuous infusion rates ranged from 0.25 to 1.0 µg/kg/min. Eight trials used an inactive comparator and eight trials used a potentially active comparator, including catecholamines, dobutamine, levosimendan, nifedipine or nesiritide. Many trials applied milrinone as an add-on intervention to standard care including other inotropes.

Bias risk assessment

Three trials (19 %) had low risk of bias regarding sequence generation, two trials (13 %) had low risk of bias regarding allocation concealment, five trials (31 %) had low risk of bias regarding blinding of participants, six trials (38 %) had low risk of bias regarding blinding of outcome assessors, five trials (31 %) had low risk of bias regarding incomplete outcome data, four trials (25 %) were without selective outcome reporting and two trials (13 %) were assessed as low risk of bias concerning industry and/or academic bias (Fig. 2). Accordingly, all trials were assessed as high risk of bias.

Fig. 2 Risk of bias assessment. Review authors’ judgements about each risk of bias domain for each included study. Red: high risk; green: low risk; yellow: unclear risk

Outcomes

The pooled intervention effect estimates with the 95 % CI of the outcomes are specified according to control intervention and setting (Table 2 and supplements).

Table 2 Conventional risk ratios with 95 % confidence intervals (CI) for the evaluated outcome measures including all patients stratified by intervention

In the absence of trials that reported the primary composite outcome SAE including mortality, we have chosen to report all-cause mortality at maximum follow-up as the most important outcome. There were insufficient data for time-specific analyses of mortality. Meta-regression was not performed because of insufficient data.

Subgroup analyses according to risk of bias were not performed as no trial was assessed as having low risk of bias.

All analyses were conducted with stratification by control intervention, unless stated otherwise.

Comparison 1: all critically ill patients with cardiac dysfunction

All-cause mortality

Fourteen trials with 1611 randomised patients reported mortality. Pooled data showed that mortality at maximal follow-up was 11 % in both groups (RR 0.96; 95 % CI 0.76–1.21; I² 0 %; Fig. 3).

Fig. 3 Forest plot of all-cause mortality in trials stratified by intervention. Size of squares for risk ratio (RR) reflects the weight of the trial in the pooled analyses. Horizontal bars: 95 % confidence intervals (CI)

Subgroup analyses on type of control intervention and clinical setting showed differences in mortality event proportions in the control groups (inactive control group 0–70 %; potentially active control group 9–73 %; cardiac surgery setting control group 0–7 %; non-cardiac surgery setting control group 0–73 %), but tests of interaction showed no statistically significant differences between the groups (P = 0.59 and P = 0.83, respectively; Table 2 and supplements). No comparison could be analysed with TSA using the prespecified type I error of 5 % and type II error of 10 % because less than 5 % of DARIS was accrued.

As a sensitivity analysis, we conducted TSA with an RRR of 20 % and power of 80 % which showed that 20 % of the data was accrued and thousands of additional randomised patients are needed before futility or the required information size will be reached (RR 0.96; TSA adjusted CI 0.60–1.53; see supplements).

Myocardial infarction

Five trials with 1120 patients reported myocardial infarction (MI). MI at maximal follow-up occurred in 3 % in the inactive control group versus 15 % in the potentially active control group. Two small trials [41, 45] had a potentially active control group. There were no statistically significant differences in MI between milrinone and any control group (RR 0.73; 95 % CI 0.25–2.09; I² 61 %, P = 0.48; Table 2). No comparison could be analysed with TSA using the prespecified type I error of 5 % and type II error of 10 % because less than 5 % of DARIS was accrued.

Subgroup analyses based on clinical setting revealed discrepancy between fixed- and random-effects models driven by different weighting of one trial with 94 % relative risk reduction (random-effects model RR 0.53; 95 % CI 0.24–1.17, and fixed-effect model RR 0.45; 95 % CI 0.25–0.81; I² 34 %; see supplements) [48].

Other outcomes

Ventricular tachyarrhythmias [i.e. ventricular tachycardia (VT)/ventricular fibrillation (VF)] were reported in seven randomised trials (1226 patients) with equal event rate percentages (7 %) in both groups (RR 0.96; 95 % CI 0.65–1.41; I² 0 %). No comparison could be analysed with TSA using the prespecified type I error of 5 % and type II error of 10 % because less than 5 % of DARIS was accrued.

The pooled results and the subgroup analyses showed no associations between milrinone and ventricular tachyarrhythmia (see supplements).

Supraventricular tachyarrhythmias (SVT) were reported in four trials (1138 patients). There was statistically significant heterogeneity between the trials in both subgroup analyses (both I² 55 %; P = 0.08). SVT varied from 5 to 18 % in the different subgroups (inactive versus potentially active and cardiac surgery versus non-cardiac surgery). Analyses of the pooled data (RR 0.89; 95 % CI 0.43–1.87) and the subgroups showed no significant associations (see supplements). No comparison could be analysed with TSA using the prespecified type I error of 5 % and type II error of 10 % since less than 5 % of DARIS was accrued.

Mechanical ventilation duration was reported in four trials (210 patients) in a cardiac surgery setting; duration ranged from 11 to 34 h in the control group and 10–65 h in the milrinone group. There was statistically significant heterogeneity (I² 80 %; P = 0.002). No significant differences were found. Test of interaction was not significant (P = 0.06).

Comparison 2: patients with cardiac dysfunction after cardiac surgery

All-cause mortality

Six trials with 279 randomised patients reported mortality data. Mortality at maximal follow-up was 4 % in both groups (RR 1.04; 95 % CI 0.30–3.63; I² 0 %). No comparison could be analysed with TSA using the prespecified type I and type II error because less than 5 % of DARIS was accrued.

Two trials used an inactive comparator and four trials used a potentially active comparator. No significant associations between milrinone and mortality were found (see supplements).

Myocardial infarction

MI was reported in four trials including 210 patients. There was significant statistical heterogeneity between the trials (I² 58 %; P = 0.09). There was discrepancy between the fixed- and the random-effects models driven by different weighting of one trial [48] (fixed-effect model RR 0.42; 95 % CI 0.21–0.86; random-effects model RR 0.47; 95 % CI 0.13–1.72; see supplements). No comparison could be analysed with TSA using the prespecified type I and type II error because less than 5 % of DARIS was accrued.

Other outcomes

Five trials with 240 patients documented ventricular tachyarrhythmia and no significant associations were found (see supplements).

SVTs were reported in three trials with 230 randomised patients and no significant associations were found between milrinone and SVTs (see supplements).

Comparison 3: patients with cardiac dysfunction not having cardiac surgery

All-cause mortality

Eight trials with 1332 randomised patients reported mortality. Mortality at maximal follow-up was 11 % in the milrinone group versus 12 % in the control group (RR 0.91; 95 % CI 0.64–1.28; I² 0 %). No comparison could be analysed with TSA using the prespecified type I error of 5 % and type II error of 10 % because less than 5 % of DARIS was accrued.

Three trials used an inactive control and five trials used a potentially active control. Subgroup analyses on type of control intervention showed no significant difference (test of interaction P = 0.34). No significant associations between milrinone and mortality were found (see supplements).

Other outcomes

Ventricular tachyarrhythmias (VT/VF) were reported in two trials (986 patients). No significant associations between milrinone and VT/VF were found (RR 1.19; 95 % CI 0.68–2.06).

There was insufficient data on other secondary outcomes.

Error matrix approach

The Manhattan error matrix plots of milrinone showed that there is a similar amount of evidence regarding the benefits and harms of milrinone. All trials had high risks of systematic errors (bias) and the large majority of the trials also had high risks of random errors (see supplements).

Small trial bias

Funnel plots showed no clear arguments for small trial bias including publication bias (see supplements).

GRADE approach

The quality of the evidence was assessed as very low for all outcomes based on risk of bias limitations, indirectness, inconsistency, imprecision and other considerations. Table 3 shows the GRADEpro summary of findings table with stratification by control intervention.

Table 3 GRADEpro summary of findings table of the outcomes of interest stratified by control intervention

Discussion

Our systematic review evaluating the effects of milrinone for critically ill adult patients with cardiac dysfunction found few data on outcomes critical for decision making. Thirty-one randomised clinical trials fulfilled our inclusion criteria. All included trials had high risk of bias, most as a result of not reporting bias protection, and nearly all trials had large risks of random errors. Fifteen trials only reported surrogate outcomes. No trial reported the primary outcome, SAE (including mortality). All-cause mortality was reported in 14 trials with 1611 patients. No significant effect on any patient-centred outcome was found.

A general issue is that systematic reviews depend on the strengths of the included randomised trials. Trials with unclear or high risks of bias are associated with overestimation of benefits and underestimation of harms [15, 16, 18, 19]. The unknown true intervention effect may be beneficial, neutral or harmful. Previous meta-analyses on milrinone differ in design from our systematic review and they come to different conclusions [3–5]. One study focussed only on patients with myocardial infarction [3] and two only on cardiac surgery patients, the latter of which was an update [4, 5]. The meta-analysis on patients with myocardial infarction observed no significant effect on mortality, but stated that milrinone increased left ventricular ejection fraction and cardiac output [3]. The first meta-analysis evaluating patients having cardiac surgery suggested an increase in mortality using milrinone, which disappeared in the updated meta-analysis [4, 5]. Our prepublished protocol, a sensitive search strategy and thorough evaluations of the risks of systematic errors and random errors may explain differences with these previous publications [3–5]. First, previous meta-analyses did not explore associations of bias risk with intervention effect estimates. Final conclusions ought to be derived from trials with low risk of bias, of which there were none [6]. Second, despite including more patients (n = 1611) as compared to previous meta-analyses (n = 303 [3], n = 518 [4], n = 1037 [5]) the number of included patients is still far too small to draw any firm conclusions. We think that any significance needs the perspective of sample size considerations, in individual trials and also in meta-analyses [27, 31, 53–55]. Third, previous meta-analyses combined patients with normal cardiac functions [4, 5] and children [5] with patients with cardiac dysfunction into one pooled estimate. We included trials that randomised adult patients who had cardiac dysfunction. It is unlikely that patients benefit from milrinone when their cardiac function is unaffected, i.e. when the pathophysiological basis for cardiac stimulation is lacking.

Co-interventions with medications with an efficacy profile similar to milrinone might also have obscured results. Trials that evaluated milrinone versus placebo could also be considered add-on trials since co-interventions were allowed. The largest trial that evaluated milrinone versus placebo allowed at least co-interventions with dobutamine in their randomised patients; other inotropes were not reported [44]. The results of this trial suggest that milrinone may be harmful in patients with heart failure (LVEF <40 %) compared with standard treatment (ACE inhibitor and diuretics). Furthermore, the sickest patients were excluded in this trial [44]. For daily practice it is of utmost interest to know which vasopressor, inotrope, vasodilator or any combination is indicated for which patient and at what target [56, 57]. We found that for milrinone and levosimendan for critically ill adult patients with cardiac dysfunction evidence from trials with low risk of bias and low risk of random error is lacking to support its use [58]. Other interventions are currently being evaluated in systematic reviews which might feed future evidence-based guidance for clinicians or substantiate new trials.

Limitations

During the process of the systematic review we deviated from our prepublished protocol for several reasons. We rephrased the title and terminology for an improved description of the cardiac state of the patients of interest (i.e. cardiac dysfunction instead of cardiac support or myocardial dysfunction). We divided subgroup comparisons into inactive versus (potentially) active control interventions. Since no data were found on the predefined subgroup comparison milrinone versus vasopressors we were unable to report this comparison. There were also no data on the composite outcome SAE (mortality included) and, therefore, all-cause mortality became the most important outcome. The outcome hypotension was regarded as a surrogate outcome and therefore omitted.

We frequently found significant statistical heterogeneity, but even when absent, there was still considerable clinical heterogeneity in patients, interventions, comparators, outcomes or settings. Pooling the data was frequently considered disputable, even in the absence of statistically significant tests of interactions. One example is the pooled intervention effect estimate of mortality (comparison 1), which has low statistical heterogeneity (I² 0 %; P = 0.50) and no subgroup differences, but the clinical heterogeneity is obvious, also reflected by control event rates for all-cause mortality varying from 3.6 to 12.0 %. The large variety in types of control interventions further increases clinical heterogeneity.

Milrinone dose and duration varied among the included trials. Also, there were differences in definitions of outcomes. Further, 15 trials evaluated surrogate outcomes, such as haemodynamic and biochemical parameters. Finally, most trials had short follow-up; only one trial evaluated 1-year follow-up [51], so that mortality analyses reflect rather short follow-up.

Conclusions

The quantity and quality of evidence for benefit or harm of milrinone in critically ill adult patients with cardiac dysfunction are very low because of high risks of systematic and random errors. Future randomised clinical trials need to be large and well designed by following SPIRIT guidelines and reported according to CONSORT guidelines. The widespread use of milrinone in critical care cannot be advocated or refuted on the basis of the current evidence.