Introduction

Clinical practice guidelines are systematically developed statements whose primary purpose is to assist clinical decision-making by providing a rational basis of therapy using the best available scientific evidence [1]. Akin to a randomized controlled trial, a guideline should be critically appraised prior to endorsing its recommendations because poor quality guidelines fail to reduce unnecessary variations in care [2]. Quality assessment techniques have traditionally focused on evaluating the methods used to obtain, formulate and report recommendations rather than on appraising the validity of individual recommendations by linking recommendation strength to the quality of supporting evidence [3]. A recent analysis of cardiovascular disease guidelines revealed that almost half of their recommendations were supported by low quality evidence [4]. Many guidelines provide pharmacotherapy recommendations for the critically ill, yet their quality and the caliber of scientific evidence supporting their strong recommendations are unknown. The aims of this study were to determine the quality of guidelines that provide critical care pharmacotherapy recommendations and to assess the quality of evidence supporting their strong pharmacotherapy recommendations.

Methods

Clinical practice guideline selection

MEDLINE (1966–February 2008), EMBASE (1980–February 2008) and National Guideline Clearinghouse (February 2008) databases were searched using a predetermined critical care topic list. Critical care medicine and related professional society websites were also reviewed, and the references of retrieved articles were hand-searched. English-language guidelines were included if they addressed critical care topics and were published prior to 1 February 2008. Those that did not provide pharmacotherapy recommendations, addressed pediatric populations or were considered outdated (a revised version was available from the same guideline developer) were excluded.

Clinical practice guideline quality assessment tool

The appraisal of guidelines, research and evaluation (AGREE) instrument was used as the guideline assessment tool [5]. This validated tool requires appraisal of 23 items that are organized into six domains: scope and purpose, stakeholder involvement, rigor of development, clarity and presentation, applicability and editorial independence. Items were independently scored on a 4-point Likert scale by four investigators [5]. Standardized domain scores were calculated by summing the scores of individual items in a domain and by standardizing the total as a percentage (0–100%) of the maximum possible score for that domain [5]. A guideline can be strongly recommended if the majority of item scores are 3 or 4 and the majority of standardized domain scores are 60% or greater. A guideline can be recommended with alterations if there are equal numbers of item scores 3 or 4 and 1 or 2, and most standardized domain scores are between 30 and 60%. A guideline cannot be recommended if the majority of item scores are 1 or 2, and most standardized domain scores are 30% or less [5].
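The standardized domain score described above can be illustrated with a short calculation. This is a minimal sketch, not the study's own code: it assumes the min-max scaling given in the AGREE instrument documentation (obtained score minus the minimum possible score, divided by the maximum possible minus the minimum possible score), whereas the text above describes the result only as a percentage of the maximum possible score. The function name and the example scores are hypothetical.

```python
from typing import Sequence

MIN_ITEM_SCORE = 1  # lowest point on the 4-point scale
MAX_ITEM_SCORE = 4  # highest point on the 4-point scale


def standardized_domain_score(scores_by_appraiser: Sequence[Sequence[int]]) -> float:
    """Return a 0-100% standardized score for one AGREE domain.

    scores_by_appraiser holds one row per appraiser; each row contains that
    appraiser's scores for every item in the domain.
    Assumption: min-max scaling as in the AGREE instrument documentation.
    """
    n_appraisers = len(scores_by_appraiser)
    if n_appraisers == 0:
        raise ValueError("at least one appraiser is required")
    n_items = len(scores_by_appraiser[0])
    if n_items == 0 or any(len(row) != n_items for row in scores_by_appraiser):
        raise ValueError("every appraiser must score the same, non-empty item set")
    for row in scores_by_appraiser:
        for score in row:
            if not MIN_ITEM_SCORE <= score <= MAX_ITEM_SCORE:
                raise ValueError(f"item score out of range: {score}")

    obtained = sum(sum(row) for row in scores_by_appraiser)
    min_possible = MIN_ITEM_SCORE * n_items * n_appraisers
    max_possible = MAX_ITEM_SCORE * n_items * n_appraisers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical scores for a three-item domain rated by four appraisers.
example_scores = [
    [2, 1, 1],
    [1, 2, 1],
    [2, 2, 1],
    [1, 1, 2],
]
example_result = standardized_domain_score(example_scores)
print(f"{example_result:.1f}%")
```

With four appraisers and three items, the minimum possible total is 12 and the maximum is 48, so an obtained total of 17 scales to (17 - 12) / 36, or roughly 14%.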

Data synthesis

Five investigators (two critical care pharmacy specialists, one emergency medicine pharmacy specialist, one internal medicine pharmacy specialist and one pharmacy practice resident) were involved in guideline quality assessment [6]. Two investigators (SG, MHC) independently appraised 24 guidelines and others evaluated 17 (PZ), 16 (RS) and 15 (KW) guidelines, respectively. Item scores for each guideline were entered into a Microsoft Excel 2003 (Microsoft Corporation, Redmond, WA) database and were electronically transferred to one investigator (SG) for aggregation into a master database. One investigator (MHC) extracted pre-defined guideline characteristics and the quality of evidence supporting strong pharmacotherapy recommendations. Pharmacotherapy recommendations were characterized as “strong” if the respective guideline authors defined them as such. The quality of supporting evidence was standardized according to a modified Center for Evidence-Based Medicine (CEBM) level of evidence criteria [7] (supplementary Appendix 1). The primary outcome was the mean standardized score for each of the six AGREE domains [5]. Secondary outcomes included the proportion of guidelines that were strongly recommended, recommended with alterations and not recommended according to the AGREE criteria; and the proportion of strong pharmacotherapy recommendations that were supported by highest quality evidence.

Data validation

To determine whether errors may have occurred in item scoring, one investigator (SG) examined all final item scores across the four appraisals for potential item discrepancies. Discrepancies were defined as inter-rater score differences of three points on any domain item. All appraisers were then asked to perform another AGREE assessment on the discrepant item in question. Only one investigator (SG) was aware of other appraisers’ scores on those items at the time of reassessment. After item reassessments were independently performed, the scores were considered to be final and analyses were performed using these data. The mean intraclass correlation coefficient (ICC) from a two-way random model was calculated for each domain to assess appraiser agreement [8].
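The discrepancy rule above can be sketched as a simple screen over the item scores. This is an illustrative sketch only: the data layout (a dictionary keyed by guideline identifier and item number), the function name and the example scores are assumptions, not the study's actual spreadsheet structure.

```python
from typing import Dict, List, Tuple

# A discrepancy is an inter-rater difference of three points on an item,
# which on a 4-point scale means one appraiser scored 1 and another scored 4.
DISCREPANCY_THRESHOLD = 3

ItemKey = Tuple[str, int]  # (hypothetical guideline id, AGREE item number)


def find_discrepant_items(
    item_scores: Dict[ItemKey, List[int]],
    threshold: int = DISCREPANCY_THRESHOLD,
) -> List[ItemKey]:
    """Return the items whose highest and lowest scores differ by >= threshold."""
    flagged = []
    for key, scores in item_scores.items():
        if len(scores) < 2:
            continue  # a single score cannot disagree with itself
        if max(scores) - min(scores) >= threshold:
            flagged.append(key)
    return sorted(flagged)


# Hypothetical scores from four appraisers for three items of one guideline.
example_items = {
    ("guideline_A", 1): [3, 4, 3, 4],  # range 1: not flagged
    ("guideline_A", 2): [1, 4, 2, 3],  # range 3: flagged for reassessment
    ("guideline_A", 3): [2, 2, 4, 3],  # range 2: not flagged
}
flagged_items = find_discrepant_items(example_items)
print(flagged_items)
```

Only the flagged items would go back to the appraisers for independent reassessment; all other scores stay as first recorded.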

Results

Clinical practice guideline selection and characteristics

The electronic search yielded 128 guidelines and a hand-search of personal files yielded 25 guidelines. After accounting for duplicates and exclusion criteria, 24 guidelines were included (Fig. 1). Guideline topics included brain injury and cerebrovascular trauma [9–11], aneurysmal subarachnoid hemorrhage and spontaneous intracerebral hemorrhage [12, 13], status epilepticus [14], sedation, analgesia and neuromuscular blockade [15, 16], ventilator-associated pneumonia (VAP) [17, 18], community-acquired pneumonia [19], ARDS [20], severe acute respiratory syndrome (SARS) [21], nitric oxide therapy [22], severe sepsis and septic shock [23, 24], colloid use [25], cardiopulmonary resuscitation [26], stress ulcer prophylaxis [27], intra-abdominal infections [28, 29], pancreatitis [30] and catheter-related blood stream infections [31, 32] (Appendix 1). All guidelines were developed in association with a critical care-related professional society, all were published in peer-reviewed journals, two-thirds addressed the adult-only population and were published after the Conference on Guideline Standardization (COGS) in 2003 (Supplementary Table 1). Half of the guidelines were first versions, and fewer than one-fifth were considered consensus statements.

Fig. 1 Trial flow diagram

Data validation

There were 552 AGREE items scored (23 items for each of 24 guidelines), and after initial assessment, 42 (7.6%) discrepancies were identified in 21 (88%) guidelines. After independent reassessment of these items, 15 (2.7%) discrepancies remained in 7 (29%) guidelines. The mean ICCs for scope and purpose, stakeholder involvement, rigor of development, clarity and presentation, applicability and editorial independence were 0.79, 0.86, 0.94, 0.84, 0.89 and 0.96, respectively.
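For readers unfamiliar with the statistic, the following is a minimal sketch of how an ICC under a two-way random model can be computed from a table of ratings. It assumes the standard analysis-of-variance formulation, often written ICC(2,1) for a single rater and ICC(2,k) for the average of k raters; the text does not state which form was reported or which software was used, so both are returned. The function name and the ratings matrix are illustrative and are not study data.

```python
from typing import List, Tuple


def icc_two_way_random(ratings: List[List[float]]) -> Tuple[float, float]:
    """Return (single-measure, average-measure) ICC for a two-way random model.

    ratings has one row per rated subject (e.g. a guideline's domain score)
    and one column per rater. Assumes a complete table with no missing cells.
    """
    n = len(ratings)                  # number of subjects
    k = len(ratings[0]) if n else 0   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Illustrative ratings: six subjects (rows) scored by four raters (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_average = icc_two_way_random(example_ratings)
print(f"single-measure ICC: {icc_single:.2f}, average-measure ICC: {icc_average:.2f}")
```

The average-measure form is always at least as large as the single-measure form when agreement is positive, which is one reason it matters which of the two a report quotes.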

AGREE domain scores and overall recommendations

Clarity and presentation domain scored highest [69% (95% confidence interval (CI) 62–76%)] and applicability domain scored lowest [19% (95% CI 12–26%)] (Fig. 2). Based on AGREE criteria for the appropriate development of CPGs, 25% of the guidelines are strongly recommended, 37.5% are recommended with alterations, and 37.5% are not recommended (Appendix 1).

Fig. 2 Mean (95% confidence intervals) standardized AGREE domain scores

Quality of evidence supporting strong pharmacotherapy recommendations

Two hundred forty-eight strong pharmacotherapy recommendations were extracted from 24 guidelines, and 89 (36%) of these recommendations were supported by the highest quality evidence.

Discussion

Only one quarter of the assessed critical care guidelines are of the highest quality and can be strongly recommended for use in practice. Examples of guidelines that can be strongly recommended address management of severe traumatic brain injury, prevention of VAP and stress ulcer prophylaxis [10, 17, 27]. Even under a more liberal definition of high quality that also includes guidelines that could be recommended with alterations as per AGREE, only about two-thirds of all critical care pharmacotherapy guidelines assessed could be considered high quality. Examples of guidelines that cannot be recommended address penetrating brain injury, SARS and hemodynamic support of sepsis [9, 21, 24].

Wide variability existed in scores across the six AGREE domains. Overall low guideline quality may be accounted for by low scores within applicability, stakeholder involvement and editorial independence domains. Applicability consists of three items pertaining to the likely organizational, behavioral and cost implications of applying the guideline [5]. Most guidelines failed to discuss implications of applying the guideline, nor did they discuss key review criteria that could be used for monitoring or audit purposes. Stakeholder involvement consists of four items that focus on the extent to which the guideline represents the views of its intended users [5]. The two items addressing solicitation of patients’ views and target-user piloting of the guidelines usually scored low. Editorial independence consists of two items that are concerned with the independence of the recommendations and acknowledgement of possible conflict of interest from the guideline development group [5]. Explicit statements that the views/values or interests of the funding body have not influenced the final recommendations and conflict of interest declarations were often absent. Despite these shortcomings, it is encouraging to note that rigor of development, which is the most highly weighted AGREE domain consisting of seven items, was one of the highest scoring domains.

This is not the first published analysis of the quality of critical care guidelines; however, our analysis is unique as it examined both guideline quality and the quality of evidence supporting strong pharmacotherapy recommendations [33]. Quality of evidence should reflect the extent to which confidence in an estimate of the effect is adequate to support recommendations, and study design is a crucial factor in determining this [34]. It is disconcerting, however, that only one-third of all “strong” critical care pharmacotherapy recommendations were supported by the highest quality evidence. Processes of evidence assessments that rely on consensus in making recommendations introduce an opaque dimension to how the recommendations are made and compromise objectivity [35].

Many suggestions can be made for using the results of this analysis. Organizations that produce guidelines should adhere to the methods suggested by the AGREE Collaboration because they encourage a systematic approach to addressing the most important traits of a guideline. It is also recommended that there be a transparent link between quality of evidence and the strength of recommendations included in their guideline documents. One suggested approach is to utilize the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group approach [36]. Our final recommendation is that guidelines are not necessary for every disease, but are required for diseases having significant practice variability and for which a valid evidence base can guide recommendations [37].

A potential limitation to this study is that the sample of guidelines assessed may not represent the larger pool of critical care pharmacotherapy guidelines. Selection bias may have resulted in omission of high quality guidelines from this analysis due to an incomplete search strategy. We aimed to minimize the potential for introducing this bias by performing a systematic search of multiple electronic databases using predefined criteria encompassing a heterogeneous mix of critical care pharmacotherapy topics. Secondly, while one-fifth of the guidelines we assessed were considered consensus statements, these are intended to be applied to practice in a similar fashion as a guideline and should be held to the same rigorous quality standards. Another potential limitation pertains to the instrument employed to assess guideline quality. The AGREE instrument evaluates the methods used to synthesize and report the guideline rather than the quality of its contents [38]. We attempted to minimize this limitation by systematically evaluating the quality of scientific evidence to support strong pharmacotherapy recommendations. However, we did not incorporate the degree of concordance between evidence quality and strength of recommendation into each guideline’s quality assessment. It is possible that a guideline that would not be recommended for use based on the AGREE instrument because of poor methodology or reporting could provide pharmacotherapy recommendations based on high quality evidence. The significant disconnect observed across the guidelines in terms of evidence quality backing pharmacotherapy recommendations makes this unlikely. There also is potential for lack of reliability among appraisers when using the AGREE instrument. We aimed to mitigate this by ensuring that each appraiser first understood the AGREE instrument by using the training manual and by employing the maximum recommended number of appraisers [8]. The ICCs for each AGREE domain were similar to or higher than those reported in the AGREE validation study, which reflects a high degree of agreement among the four appraisers [8]. Because of the large quantity of items that were scored across the four appraisals, there was potential for data entry or scoring errors, which could have led to inaccurate results. After identifying potentially erroneous scores and independently re-scoring these items, fewer than 3% of items appeared discordant, which could be explained by normal variation in scoring across four appraisers. Finally, it is unlikely that the entire contents of high quality guidelines can be adopted into practice without first being adapted, and tools exist to facilitate this [39, 40].

In conclusion, only two-thirds of critical care pharmacotherapy-related clinical practice guidelines can be recommended for use, and most strong pharmacotherapy recommendations are backed by low quality scientific evidence. Guideline developers should endorse the AGREE Collaboration recommendations when constructing future critical care guidelines and should employ the GRADE approach when formulating pharmacotherapy recommendations. Critical care clinicians should critically appraise guidelines and scrutinize the scientific evidence supporting recommendations prior to applying them to practice.