Background

Heart failure (HF) disease management programs are common in North America, Europe, and Australia [1, 2]. These services provide care to optimize pharmacological regimens and support medication management and effective self-care. Programs have been widely introduced following recommendations from international clinical guidelines [1, 3, 4] but a number of recent and comparatively large trials have found no or small benefits from programs [5-10]. These inconsistencies have been explained by design issues rather than biases, reporting inadequacies or differences in actual effects [11, 12]. However, recent results from the United States of the Medicare Health Support Pilot Program (MHSPP) [13] provide corroboration that program effects are poorly understood. This independent randomized trial of nine disease management programs with 30,000 patients with heart failure and diabetes concluded that programs did not decrease mortality, frequency of hospitalization, costs, or improve self-care, self-care efficacy, or mental and physical health [13].

These results raise questions about what clinicians should do in the light of contradictory evidence from trials and meta-analyses. When results from trials differ, it should not be concluded that an intervention is ineffective because most trials are underpowered to identify true effects [14]. Meta-analyses can overcome this lack of power but are as prone to reporting and design flaws as any other type of research design [15]. Although findings from meta-analyses frequently influence guidelines, systematic reviews, like any other research design, can vary widely in quality, as the recent PRISMA guidelines acknowledge [16, 17].

Thus, the methods and overall quality of meta-analysis are of great importance. Despite this, there has been no systematic appraisal of the quality of meta-analyses of heart failure management programs to date. This is particularly important given the increasing awareness of the complexity and diversity of these programs [18]. To evaluate the strength of evidence from current meta-analyses of these programs, we appraised the nature and quality of evidence from existing published meta-analyses of HF disease management programs.

Methods

Meta-review was used to identify and appraise evidence from published meta-analyses of heart failure disease management programs or approaches. Meta-review appraises and synthesises findings from systematic reviews, in this instance, from meta-analyses [19]. The approach has evolved in response to the growing number of systematic reviews and the need to appraise quality of a review before application to practice and policy, for example via PRISMA [17].

Meta-review follows similar principles to systematic review [19]: it involves a comprehensive and detailed search of the literature for relevant studies with quality assessment to assess for bias, transparency, and comprehensiveness [19]. As with traditional systematic review, in meta-review, validation of quality by a second, independent reviewer is important to reduce potential for bias [19].

A comprehensive search was done to identify meta-analyses of randomized controlled trials published in English that examined the effects of HF disease management programs on key outcomes. To be included, reviews had to have a detailed and comprehensive search strategy (as identified by: naming of databases and years of searching and example or actual terms), contain data on study quality and make reference to synthesis of findings either by pooling data or rejecting the pooling of data. Due to changes in clinical practice, and to ensure some degree of congruence with contemporary clinical practice, we searched only for meta-analyses published after 1995, confined our search to reviews that contained comparisons of programs with usual care, and included samples of adults over the age of 18 years with confirmed diagnosis of HF. Meta-analyses of interventions that included patients with other forms of cardiac disease (such as cardiac rehabilitation or secondary prevention) that may have addressed heart failure disease management were not included due to the lack of data specific to heart failure populations in these reviews [20, 21]. Finally, the meta-analyses had to contain extractable data for HF patients on mortality (all-cause or HF related), hospital (re)admission (all-cause and HF related), or health-related quality of life.

For the purposes of the review, interventions were defined as HF management programs if they consisted of more than one recognized disease management component (medication optimization, lifestyle modification, or education) with the purpose of improving outcomes related to HF in patients with a confirmed diagnosis or were self-identified by the authors as constituting a program or analogous health service intervention beyond usual care for the treatment of HF.

A variety of electronic databases were searched using a range of search terms (Table 1), including: MEDLINE, EMBASE, Cochrane Database of Systematic Reviews (CDSR), DARE, NHS EED, NHS HTA, Ageline, AMED, Scopus, Web of Science and CINAHL from January 1, 1995 to July 31, 2008. In addition, reference lists and bibliographies of identified reviews were hand searched.

Table 1 Search terms used

The primary screening was conducted independently by LS and AMC with abstracts/titles being screened fully. Full papers for potential inclusion were then screened by LS and AMC for detailed evaluation with disagreements regarding eligibility being handled with joint discussion between LS, AMC, and DRT.

Data were extracted onto a standardized data extraction template relating to: population, intervention, comparison, and outcome (PICO). This approach has been developed for optimizing evidence-based practice. Quality of each meta-analysis was assessed independently by LS and AMC using a standardized and valid measure of quality of systematic review (AMSTAR) [22].

Results

4529 potential articles were initially identified (Figure 1) but primary screening excluded 4285 papers. After reviewing the remaining papers (n = 244), 15 meta-analyses met the inclusion criteria (Table 2).
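The screening counts reported above can be checked with simple arithmetic. The sketch below uses only the three figures given in the text; the number excluded at full-text review is derived here rather than reported, and assumes there was no further exclusion stage between full-text review and inclusion.

```python
# Screening counts as reported in the Results text.
identified = 4529        # potential articles initially identified
excluded_primary = 4285  # excluded at primary (title/abstract) screening
included = 15            # meta-analyses meeting the inclusion criteria

# Papers remaining for full review after primary screening.
full_text_reviewed = identified - excluded_primary

# Derived, not reported: papers excluded at the full-review stage,
# assuming no other exclusion step took place.
excluded_full_text = full_text_reviewed - included

print(full_text_reviewed)
print(excluded_full_text)
```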

Figure 1 Results of the systematic search strategy and study selection process.

Table 2 Meta-analyses included in review

The 15 meta-analyses (Table 2) contained a mean of 18.5 randomized trials +/- 10.1 (range: 6 to 36) and a mean of 3267.4 patients +/- 2184.0. Two reviews did not report sample size [9, 23]. Overall quality of the meta-analyses based on AMSTAR criteria [22] was moderate but varied widely (Mean Score = 6.4 +/- 1.9; range 2-9). Main weaknesses in the reviews were lack of incorporation of study quality in conclusions and low detail regarding excluded studies (Additional file 1).

Search Strategies

Most reviews searched for published and unpublished trials [9, 24-31]; four identified that grey literature was searched [9, 26, 27, 30]. Though only one review limited its search to English-only papers [32], the overall quality of search strategies was moderate: three reviews described a full Boolean strategy [24, 27, 33] and eight provided a QUOROM-like flow chart [25, 28-31, 33-35]. Most reviews included an assessment of publication bias via a funnel plot [23-25, 28, 29, 31, 32, 34].

Populations

Mean age of the review population was calculated in two reviews [28, 31] (both mean age: 73 years) with the oldest reported mean age being 81.6 [23]. Seven reviews [23, 26, 27, 30, 33, 34, 36] reported an upper age limit of 80 years. The lowest mean age reported was 56 by five reviews by way of inclusion of the same trial [24, 30, 32, 34, 36]. Two additional reviews reported lower mean age limits of 57 and 58 [25, 35] but none presented data on standard deviation of ages.

Six reviews [23, 25, 26, 30, 35, 36] provided no data on the sex of the participants in the trials. Co-morbidities and characteristics of study populations were frequently not reported with particular weaknesses in reporting of medication treatments (Table 3). Of the four studies that did report co-morbidities, [26, 31, 36, 37] hypertension, diabetes, chronic obstructive lung disease, and coronary artery disease were most common.

Table 3 Select population characteristics

Interventions

Definitions of trials

Reviews most frequently used operationalised definitions (Table 4) to guide inclusion of interventions, though only three used definitions involving approach, personnel, setting, and content [23, 26, 27]. The foci of reviews differed markedly; for example, reviews specified interventions provided only in particular settings [23, 25-27] or without reference at all to content [25, 34, 37].

Table 4 Definitions of trials and characteristics of interventions actually included

Interventions included

Interventions included in reviews mostly spanned three to five patient settings or modes of provision; only two were focused on interventions using single or comparable settings or mode of provision [23, 25]. Interventions included in reviews were wide ranging (Table 4) in terms of number and type of settings and locations. For example, nine reviews included programs with both hospital and home-based components [27-29, 31, 33-37] and two reviews included studies that employed interventions in all settings [29, 34]. Nurses were the most frequent providers of care through 'multi-disciplinary team' interventions. Additional physician involvement via cardiologist or general practitioner was identified in nine reviews [26, 28-32, 35-37] though three reviews involved both general practitioners and cardiologists [26, 30, 36]. All reviews but one [24] identified other personnel involved, for example: pharmacist or pharmacist collaboration [28-30, 32, 35, 37].

Program Content

The reviews specified a mean of 1.13 essential components of content (range 0 to 3). Interventions were described in terms of content using general descriptors, such as education, self-care, discharge plan, and medication support. Reviews most commonly stated that interventions had to have three or four component items though reviews could extend to five or more content components [26, 30, 37]. Educational and monitoring interventions were the most commonly identified elements. Other components included support at hospital discharge, medication review, and social support. Hence, a degree of overlap existed across settings. For example, a systematic review may focus on a nurse-led hospital-based intervention that nonetheless offers home visits, telephone support, and follow-up with a general practitioner [23].

Obtaining data on usual care was noted to be problematic [23, 27-29, 32, 35] and the care provided to comparison groups was poorly defined (Table 5). For example, in seven of twelve trials in one review, descriptions of care were omitted entirely [35].

Table 5 Components of interventions included and trial quality

Outcomes

The follow-up period was 3 to 12 months in six reviews [24, 27, 29, 31, 33, 36]. Three studies reported beginning follow-up periods at three months but the upper limit extended to 16, 18, and 22 months [25, 32, 28]. Other reviews did not report length of follow-up [34] or did not report follow-up periods [23].

Within review pooling of outcomes

The meta-analyses pooled data on all-cause mortality as primary and secondary outcomes (Table 6). Other outcomes pooled included all-cause (re)admission, HF mortality, HF (re)admission, quality of life, and cost. Data were pooled using random [25, 28, 30, 32-35] and fixed effect models of analysis [24, 27] or both methods [26, 29, 31, 37] if significant statistical heterogeneity was identified.
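To illustrate the difference between the fixed and random effect models mentioned above, the following is a minimal sketch of inverse-variance pooling on the log risk ratio scale, with the between-trial variance estimated by the DerSimonian-Laird method. This is a generic textbook formulation, not necessarily the method used in any of the included reviews, and the trial-level risk ratios and standard errors are hypothetical values chosen only for illustration.

```python
import math

def pool_fixed(log_rr, se):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    w = [1.0 / s ** 2 for s in se]
    est = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)
    return est, math.sqrt(1.0 / sum(w))

def pool_random(log_rr, se):
    """DerSimonian-Laird random-effects estimate, standard error, and tau^2."""
    w = [1.0 / s ** 2 for s in se]
    fixed, _ = pool_fixed(log_rr, se)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))
    df = len(log_rr) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-trial variance, truncated at zero; zero if only one trial.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (s ** 2 + tau2) for s in se]
    est = sum(wi * y for wi, y in zip(w_star, log_rr)) / sum(w_star)
    return est, math.sqrt(1.0 / sum(w_star)), tau2

# Hypothetical trial-level risk ratios and standard errors of log(RR).
rr = [0.70, 0.95, 0.60, 1.05]
se = [0.15, 0.10, 0.25, 0.20]
log_rr = [math.log(r) for r in rr]

fixed_est, fixed_se = pool_fixed(log_rr, se)
random_est, random_se, tau2 = pool_random(log_rr, se)

# Back-transform to the risk ratio scale for reporting.
print(math.exp(fixed_est), math.exp(random_est), tau2)
```

When the estimated between-trial variance is zero the two models coincide; otherwise the random-effects model weights trials more evenly and yields a wider confidence interval.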

Table 6 Effect sizes of primary outcomes of reviews (95% Confidence Intervals)

Out of 13 reviews, 6 identified statistically significant improvements in all-cause mortality [25, 28, 30, 32, 34, 37] though all 13 reviews identified trends favouring programs over control. Effect sizes varied from 3% to 25% but were mostly clustered around 15% to 20%. Larger benefits were more evident in terms of hospitalisations. All 9 reviews that measured changes in HF-related hospitalizations [24, 25, 27-30, 34, 35, 37] identified significant reductions in admissions with reductions in risk ranging from 30% to 56%. Out of 13 reviews, 10 reviews [24, 26, 28-30, 32-35, 37] identified reductions in all-cause readmission with reductions in risk ranging from 8% to 41% with most clustered around 15% to 25% reductions in admission. Seven reviews extracted data on quality of life or health-related quality of life [27-31, 33, 35] (Table 7). The majority did not pool outcomes due to high levels of heterogeneity [27, 28, 33, 35] or lack of data [30]. However, two reviews identified non-significant trends favouring quality of life improvements after pooling [29, 31].

Table 7 Direction of effects Quality of Life

Due to the limited reporting of interventions and control groups and the diversity of trials included in the reviews, it is not appropriate to pool outcomes from the meta-analyses here. This is important because findings from interventions that are excessively heterogeneous should not be pooled. Particularly, this was the case with these meta-analyses that varied and/or contained unclear data pertaining to a wide range of factors and strata of programs, for example, relating to clinical populations, providers, location, mode of delivery, numbers of components, and length. These multiple ambiguities made pooling, sensitivity analysis, and meta-regression inappropriate [38-40].

Handling of uncertainty in the reporting of review results

Trial quality was inconsistently taken into account when formulating conclusions and was not addressed in most reviews. Statistical heterogeneity was discussed in most reviews though clinical and methodological heterogeneity was consistently neglected (Table 8). Sensitivity analyses were carried out around a diverse range of elements, including study quality [23, 28-30, 34, 35, 37], size [37], publication status [34], intervention type [26, 28], follow-up [26, 30], diagnoses [23], and elements of interventions related to components [23, 26], complexity [31], and provider type [28, 37]. Three reviews selected factors a priori for sensitivity analysis [26, 30, 37]. Sub-analyses were undertaken around 'general' program features [32], setting [26, 30, 34], home-visit or telephone contact [26], and discharge planning [29, 31].
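Statistical heterogeneity of the kind discussed in the reviews is commonly quantified with Cochran's Q and the I² statistic. The sketch below shows the standard calculation under the assumption that each trial contributes an effect estimate and its standard error; the function name and inputs are illustrative and not drawn from the included reviews. These statistics describe statistical heterogeneity only and do not capture the clinical or methodological heterogeneity noted above.

```python
def cochran_q_i2(effects, se):
    """Return Cochran's Q, its degrees of freedom, and I^2 (as a percentage)."""
    w = [1.0 / s ** 2 for s in se]
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Q: inverse-variance weighted squared deviations from the pooled effect.
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2

# Hypothetical log risk ratios and standard errors for four trials.
q, df, i2 = cochran_q_i2([-0.36, -0.05, -0.51, 0.05], [0.15, 0.10, 0.25, 0.20])
print(q, df, i2)
```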

Table 8 Trial quality and heterogeneity

Discussion

This meta-review is the first of meta-analyses of HF disease management programs and conveys the challenges of performing meta-analyses of complex health services interventions. Overall, quality of the reviews was moderate though very mixed across reviews - this quality is important to consider when deciding whether review findings should guide practice and guidelines [22, 41, 42].

Based on the consistency and size of effect sizes identified by the meta-analyses, it would immediately appear reasonable to conclude either that, in generality, programs work or that programs of various types work [43]. However, this meta-review supports concerns that populations, programs, and analyses of these programs are inconsistently and poorly described [44, 45]. For example, studies were poorly described in terms of populations and treatments with only one-fifth of reviews defining programs comprehensively in terms of approach, personnel, setting, and content. Even with the use of operationalised definitions to guide study selection in reviews, findings from interventions with very diverse characteristics and populations were pooled and, though mentioned in reviews, the implications of trial quality or statistical, clinical or methodological heterogeneity were seldom actually taken into account in analyses. No progress over time was evident in quality of reporting. Hence, reviews continue to focus on the results of study pooling over issues related to program complexity and heterogeneity.

Why might program complexity and heterogeneity be neglected relative to the findings of reviews? Firstly, this emphasis is understandable due to limitations in methodology. Complex interventions are often poorly described in published manuscripts [46] and it is well known that HF disease management programs are complex and diverse [43, 45, 47]. Current statistical and methodological techniques to describe and analyse such interventions in systematic review remain rudimentary [48]. Current meta-analyses also predate the existence of a taxonomy to classify HF disease management programs [18] and more extensive CONSORT reporting requirements for non-pharmacological trials [49].

Secondly, scientific findings that are more positive are more likely to be published in higher impact journals and cited more often in guidelines [50, 51]. This reduces incentives to qualify results to take account of 'messy' issues related to program diversity and heterogeneity and fosters a disproportionate emphasis on positive findings without qualification [52] or recognition of how elements of context may moderate intervention effects [53]. This tendency may be combined with a wider perceived political need to champion multi-disciplinary health services interventions to attain greater recognition and usage of such interventions in healthcare systems seen to favour pharmacological interventions and biomedicine [54].

However, paradoxically, ignoring complexity and heterogeneity may actually reduce knowledge translation. This follows because uptake is likely to be reduced by unclear descriptions of what programs and comparison groups consist of, lack of clarity over likely benefits in important patient groups (for example: the effects of both age and sex on program outcomes are not known), and lack of specificity in findings regarding key program characteristics [16, 53].

In future reviews, programs should be described comprehensively using systematic classification methods [18]. More sophisticated taxonomies are needed to fully capture the deeper characteristics of programs [48]. These should be used in future reviews to describe programs comprehensively and the effects of clinical, methodological, and statistical heterogeneity - as per PRISMA guidelines - must be formally taken into account in methods and conclusions [15]. Future trials should report key elements of populations, interventions, comparison group, and outcomes in accordance with the modified CONSORT statement for non-pharmacological trials [49]. These factors should be incorporated and reported comprehensively in meta-analyses. Findings from meta-analyses should be evaluated prior to application to practice and policy with review quality being assessed using valid quality criteria [15].

In terms of limitations, as with any review, this meta-review was constrained by the quality of reporting of the component studies. The data presented here are descriptive because it was inappropriate to synthesise outcomes to generate pooled effect sizes due to the wide diversity of programs subsumed in the reviews and the lack of comprehensive reporting in the reviews of intervention, comparator groups, and population characteristics [55, 56]. As pivotal elements of programs, reporting of these components has to be clear and comprehensive if synthesis is to be undertaken.

Conclusions

Meta-analyses of heart failure disease management programs have promising findings but often fail to report key characteristics of populations, interventions, and comparisons. Existing reviews are of mixed quality and do not adequately take account of program complexity and heterogeneity.