Introduction

In diabetes care there are single interventions (e.g. a single drug) and complex interventions (e.g. treatment regimens or diabetes self-management). The latter comprise a number of separate elements (active components), all of which seem essential to their proper functioning [1, 2]. A self-management programme for type 1 diabetes illustrates the complexity of such interventions [3]. Decisive components are the insulin regimen used and the quality of the teaching process that empowers patients to carry it out effectively and safely. Empowering patients to set individual treatment goals and to balance favourable blood glucose targets against an acceptable risk of hypoglycaemia, by self-adapting insulin dosages to fit their lifestyle, may be more effective than defining normoglycaemia as the primary treatment goal and asking patients to adapt their lifestyle to prescribed doses of insulin [4]. Liberalisation of the diet may be important to motivate patients to carry out an intensified insulin therapy regimen in the long term [5]. Though indispensable, knowledge by itself may not improve outcomes; the information provided and how it is conveyed are decisive [3]. Blood glucose self-monitoring may be at best useless unless patients have learned to interpret the results and to react by adjusting their insulin dosages [3].

A high-quality randomised controlled trial (RCT) is considered the most valid method of evaluating a medical intervention, and a systematic review of high-quality RCTs the most powerful evidence available [6]. A systematic review is a summary of the medical literature that uses explicit methods to systematically search, critically appraise and synthesise the literature on a specific issue [6]. Such reviews may, but need not, include a meta-analysis, a statistical method for combining the results of individual studies [6].
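
As background to the pooling step, the sketch below (a minimal Python illustration with invented effect sizes and standard errors, not data from any study discussed here) shows fixed-effect inverse-variance weighting, one common way in which a meta-analysis combines trial results.

```python
# Minimal sketch of fixed-effect, inverse-variance pooling.  All effect
# sizes and standard errors below are invented for illustration only.
import math

effects = [-0.6, -0.3, -0.9]  # hypothetical mean HbA1c differences (%)
ses = [0.20, 0.15, 0.30]      # hypothetical standard errors

weights = [1 / se ** 2 for se in ses]  # inverse-variance weights
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled effect: {pooled:.2f}% (95% CI {low:.2f} to {high:.2f})")
```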

The UK Medical Research Council (UKMRC) has proposed a framework that allows a systematic and transparent evaluation of complex interventions. Five sequential phases of a ‘continuum of increasing evidence of complex interventions’ (hereafter referred to as ‘increasing evidence’) have been defined, which require both qualitative and quantitative evidence [1, 2].

It has been suggested that, for many reasons, current methods used in systematic reviews do not allow adequate appraisal of complex interventions such as diabetes or hypertension self-management programmes [7]. For example, systematic reviews do not consider the theory behind the complex intervention (e.g. behavioural models) and do not differentiate between trials designed to determine efficacy and those focusing on implementation.

If used appropriately, meta-analysis is a powerful tool for investigating overall effects. However, if studies are clinically or methodologically heterogeneous, data pooling may be meaningless and genuine differences in effects may be obscured [8]. Complex interventions are heterogeneous in their goals, methods and target populations [9–14]. Thus, using meta-analysis to evaluate complex interventions may disregard the complexity of the efficacy measures used in the original studies. For example, HbA1c is used as a single outcome variable without considering individual treatment goals or effects on hypoglycaemia, body weight or quality of life [9, 15, 16]. An increase in HbA1c from 6.5 to 7% might be considered a deterioration, but would clearly be a desirable outcome if it were accompanied by a halving of the rate of hypoglycaemia.
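
The sketch below makes this trade-off concrete with invented numbers: judged on HbA1c alone, programme B appears to deteriorate, whereas a joint view of HbA1c and the hypoglycaemia rate reverses the judgement.

```python
# Invented numbers illustrating the HbA1c example above: judged on
# HbA1c alone, programme B 'deteriorates'; judged jointly with the
# severe-hypoglycaemia rate, it is the desirable outcome.
programmes = {
    "A": {"hba1c_change": -0.2, "hypo_rate_ratio": 1.0},
    "B": {"hba1c_change": +0.5, "hypo_rate_ratio": 0.5},  # 6.5% -> 7.0%, hypos halved
}

for name, p in programmes.items():
    single = "improved" if p["hba1c_change"] < 0 else "deteriorated"
    joint = "desirable trade-off" if p["hypo_rate_ratio"] <= 0.5 else single
    print(f"{name}: HbA1c-only view = {single}; joint view = {joint}")
```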

The aim of the present study was to describe and critically appraise available methodologies of systematic reviews on complex interventions. Patient self-management education programmes for diabetes and hypertension implemented in Germany were used as examples.

Materials and methods

The Cochrane Library, PubMed, Cumulative Index to Nursing and Allied Health (CINAHL) and Health Technology Assessment databases were systematically searched for systematic reviews published between 1997 and March 2006 (see search strategy in the Electronic supplementary material [ESM]). Reference lists of retrieved reviews were screened for additional publications. Reviews were included if they evaluated patient education programmes for diabetes and hypertension self-management (see Selection criteria in the ESM). Included reviews were analysed independently by two investigators (M. Lenz and A. Steckelberg) using standardised data extraction forms created following Cochrane criteria [17], according to predefined hypotheses (see Fig. 1, and Hypotheses and Data synthesis in the ESM). All authors of the included reviews were contacted after data synthesis and provided with a preliminary version of our review. They were asked whether they felt that their review had been considered appropriately and their findings interpreted correctly.

Fig. 1 Flow diagram of study selection. The term ‘systematic review’ includes meta-analyses and health technology assessment reports. (a) Fourteen reviews in 16 publications. (b) Some reviews address more than one topic

The ‘increasing evidence’ of three patient self-management education programmes (type 1 diabetes, type 2 diabetes and hypertension) was used as the reference. The following were assessed: (1) whether the selection criteria and search strategies used in the included systematic reviews were appropriate for detecting the ‘increasing evidence’ of the three reference programmes (ESM Tables 1, 2, 3); (2) which publications related to our three reference programmes had been identified; (3) whether theoretical background literature (theoretical/preclinical phase) had been considered; (4) which active components had been identified and included (ESM Tables 4, 5, 6); (5) whether the classification of interventions matched the type of intervention; and (6) whether all relevant patient outcomes had been included (ESM Tables 7, 8, 9).

Results

A total of 15 reviews (16 publications) were included (Table 1; excluded reviews are listed in ESM Table 10). Various types of interventions were investigated (e.g. individual and group counselling, and structured training courses for patients). A meta-analysis was performed in eight reviews [9–11, 13, 15, 16, 18–20]. A total of six reviews exclusively used qualitative data synthesis [21–27] (ESM Table 11).

Table 1 Characteristics of included reviews

The majority of search strategies were comprehensive and transparently reported (ESM Table 12). In eight reviews, experts in the field were contacted to identify additional publications [10, 11, 13, 15, 18–20, 25]. Reference tracking was performed in 11 reviews [10, 11, 13, 15, 16, 18–22, 24]. No review reported contacting all trial authors as a predefined step.

In six reviews only RCTs were considered [9, 10, 13, 15, 18, 26, 27] (see ESM Table 12). According to the selection criteria of the reviews, 16 publications [28–43] referring to our three reference programmes should have been identified. In practice, a total of 11 were identified [5, 28, 30, 32, 34, 38, 39, 43–46], and seven [28, 32, 34, 35, 38, 39, 46] were included in at least one of the reviews. In six reviews the main controlled trials of our reference programmes [28, 30–32, 38, 44] were included in data synthesis [10, 11, 16, 18, 19, 25]. Out of 25 replication trials [5, 33–37, 39, 41–43, 45–59] (phase 4; see text box: Phases of ‘increasing evidence of complex interventions’), four [34, 35, 39, 46] were included in six reviews [16, 19, 21, 22, 24, 25]. Different reviews on identical topics included different publications referring to the same reference programme.

No review explicitly evaluated publications on the theoretical basis of the reference programmes (the preclinical/theoretical phase of ‘increasing evidence’). The UKMRC approach to the evaluation of complex interventions [1, 2] was cited in three reviews [18, 19, 25], but we did not find that it had influenced their methodology.

All included reviews reported at least some features of the assessed interventions, either active components (e.g. setting, duration, interventionist, formal syllabus) or study characteristics (e.g. follow-up, age of participants, study quality). The features investigated were heterogeneous. Six reviews used regression or subgroup analysis to analyse the effect of single active components on outcomes [9, 11, 13, 15, 16, 19]. Descriptive analysis of the influence of active components was performed in five reviews [19, 21, 22, 24, 26]. Most of the components of our three reference programmes had been identified, but only a few had been evaluated. The components identified and included differed between reviews on identical topics (ESM Table 13).

In 12 reviews the included programmes were categorised by intervention [9–11, 13, 15, 16, 18, 19, 21, 22, 25–27]. Categories were defined according to the active components of the included interventions (e.g. setting, duration, interventionist, formal syllabus) [9, 11, 13, 15, 16, 19, 21, 22], the type of intervention aimed at patients or health professionals (e.g. type of disease, type of activity, organisational interventions) [10, 18, 21, 22, 25–27], or the study characteristics (e.g. follow-up, age of participants, study quality) [11, 13, 15, 19, 24]. The categories applied varied between reviews.

The calculation of overall effects within categories of interventions was performed in eight reviews [9–11, 13, 15, 16, 18–20]. Nine reviews included qualitative data synthesis within categories [13, 19–22, 24–27] (Table 1). In some reviews [10, 11, 18] the allocation of programmes to categories did not match the type of intervention [32, 34, 35].

No review included all relevant patient outcome measures of our reference studies (ESM Table 14). Quantitative reviews that aimed to explore multiple outcome measures performed multiple separate meta-analyses [10, 11, 13, 16, 18–20]. Where the effect of active components was analysed quantitatively, regression or subgroup analysis was conducted [9, 11, 13, 15, 16, 19]. Interdependencies between different outcome measures of the same educational intervention were considered in only three reviews [25–27], all of which performed qualitative data synthesis.

Four of the 12 contacted authors responded (P. Corabian, Alberta Heritage Foundation for Medical Research, Edmonton, AB, Canada; T. Deakin, Pendle Rossendale Primary Care Trust, Burnley General Hospital, Burnley, UK; T. Fahey, University of Dundee, Dundee, UK; C. M. Renders, Institute of Research in Extramural Medicine, Amsterdam, The Netherlands). All of them replied that their reviews had, in general, been considered appropriately and their findings interpreted correctly. Their specific comments on methodological problems, such as study selection and the categorisation of components of education programmes, were taken into account: we corrected five misinterpretations in reporting data and added unpublished information according to the authors’ comments.

Discussion

Most of the included systematic reviews discussed the methodological challenges of appraising complex interventions. However, these considerations did not have adequate impact on the methods used in data synthesis.

The selection criteria used in most of the analysed reviews excluded study types other than RCTs; other important types of publications concerning the ‘increasing evidence’ were rarely included. The investigated reviews did not differentiate between the main controlled trial(s) and controlled replication trials referring to the same programme.

The importance of considering the theoretical basis of an education programme was widely discussed. However, we could not identify any approach to systematically assess the theoretical basis and its influence on judging the quality of the programmes.

The majority of reviews reported that the included programmes were ‘multifaceted’ or ‘multidimensional’ or ‘consisted of multiple active components’. However, neither the UKMRC approach of ‘increasing evidence’ (published in 2000) [1, 2] nor any similar approach was integrated into any review. Methodological changes may take a long time to become accepted.

The categorisation of interventions used in the included reviews often seemed arbitrary: each review used different categories, and none explained the rationale for its categorisation. Allocation of complex interventions to categories can be problematic, even if the categories are derived from core components of programmes (e.g. education directed at the patient). If the categories refer to single but interdependent components, efficacy cannot be compartmentalised in this way.

Regression and subgroup analyses are the best tools for exploring heterogeneity [8]. However, these techniques should not be misused to attribute contributions of the various active components (e.g. intensity or duration of the programme) to the overall effect (e.g. knowledge of the target group or the importance perceived by the provider).
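
As a purely illustrative sketch (invented data; the stratifying component is hypothetical), the code below performs a simple subgroup analysis by pooling within strata defined by one component. The difference between the subgroup estimates is only interpretable if that component varies independently of the others.

```python
# Sketch of a subgroup analysis by one active component (invented data).
# Caveat from the text: if components co-vary across trials, differences
# between subgroup estimates cannot be attributed to the component used
# for stratification.
import math
from collections import defaultdict

# (effect, standard error, trial used a structured curriculum?)
trials = [(-0.5, 0.20, True), (-0.7, 0.25, True),
          (-0.2, 0.20, False), (-0.1, 0.30, False)]

groups = defaultdict(list)
for effect, se, structured in trials:
    groups[structured].append((effect, se))

for structured, members in sorted(groups.items()):
    weights = [1 / se ** 2 for _, se in members]
    pooled = sum(w * e for w, (e, _) in zip(weights, members)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    print(f"structured curriculum={structured}: "
          f"pooled effect {pooled:.2f} (SE {se_pooled:.2f})")
```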

The analysed reviews did not consider all relevant patient outcome parameters. Components of complex outcome measures were singled out, especially in reviews using meta-analysis. The complex interdependency between individual treatment goals and outcomes (e.g. changes in medication and metabolic or blood pressure control) remained unexplored. Clinicians with experience in diabetes care might be consulted to help focus on the relevant clinical issues.

Recapitulating the research findings, we propose to take the following criteria into account when undertaking systematic reviews of complex interventions:

  1. All studies referring to development, evaluation and implementation (‘increasing evidence’) should be considered, differentiating between phases of ‘increasing evidence’.

  2. Literature searches should therefore not be limited by criteria such as certain types of studies, specific target groups and publication dates; reference tracking should be performed and authors should be contacted systematically.

  3. Active components should be described and assessed, but should only be examined separately if they are independent and should not be disassembled if they work interdependently.

  4. Education programmes should not be allocated into categories referring to interdependent components.

  5. All relevant patient-orientated outcome parameters should be included.

  6. Pooling of outcome measures across different programmes is usually inappropriate. Instead, the relative importance of outcomes [60] and the complex interdependency between treatment goals and outcomes should be described in detail.

Information necessary for the evaluation of education programmes is often difficult or impossible to identify. Specific search strategies therefore need to be developed and validated with the aim of identifying publications on all phases of ‘increasing evidence’. To facilitate the implementation of such an approach, an electronic database is needed that lists available programmes together with all relevant background information, or such a system should be integrated into existing databases.
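
A minimal sketch of what a record in such a database might contain, assuming the UKMRC phases of ‘increasing evidence’ as the organising principle (all field names are hypothetical):

```python
# Hypothetical record structure for such a programme database; every
# field name is an assumption, organised around the UKMRC phases of
# 'increasing evidence'.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProgrammeRecord:
    name: str
    theory_refs: List[str] = field(default_factory=list)       # preclinical/theoretical phase
    modelling_refs: List[str] = field(default_factory=list)    # phase 1: modelling
    exploratory_refs: List[str] = field(default_factory=list)  # phase 2: exploratory trial
    main_trial_refs: List[str] = field(default_factory=list)   # phase 3: definitive RCT(s)
    replication_refs: List[str] = field(default_factory=list)  # phase 4: long-term implementation
    active_components: List[str] = field(default_factory=list)
    outcome_measures: List[str] = field(default_factory=list)

# Example entry modelled loosely on the type 1 diabetes reference programme
record = ProgrammeRecord(
    name="Type 1 diabetes self-management education programme",
    active_components=["insulin regimen", "teaching process",
                       "diet liberalisation", "self-monitoring training"],
    outcome_measures=["HbA1c", "severe hypoglycaemia", "body weight",
                      "quality of life"],
)
```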