Not long ago clinical decisions were mostly based on what someone’s colleagues were doing, or upon the personal experience of a well-thought-of clinician as conveyed during a lecture or in a hallway discussion. There was no written rationale, no review of published studies and no distillation of the evidence that systematically presented the advantages and disadvantages of a treatment. All this fostered enormous variations in practice, much inappropriate care, misuse, overuse and underuse of medical services, and considerable physician uncertainty about how best to treat disease.

The value of guidelines

Although all of the above problems are still worrisome, there is no question that the advent of clinical guidelines has helped enormously in reducing these and other problems [1–5]. For example, guidelines have become critically important tools that help simplify the complexity of medicine. They free clinicians from spending huge amounts of time sorting through and grasping the results (and details) of innumerable research publications on a given subject in order to arrive at a course of action. They provide reassurance that the ‘right’ course of action has been recommended by experts in the field, who themselves have distilled the relevant data, pooled their expertise and applied appropriate judgement to a medical condition.

The way guidelines have become the cornerstone of medicine is exemplified by the fact that they: (1) are essential teaching tools for medical students (e.g. textbooks); (2) appear in drug labelling (e.g. indications and contraindications); (3) are pre-requisite knowledge for medical licensure (e.g. the questions on board exams); (4) serve as the basis for the evaluation and improvement of quality care in a population (e.g. performance measures); and (5) often function as the basis for reimbursement (e.g. Medicare, National Institute for Health and Clinical Excellence [NICE]). Guidelines are now so numerous, and are so continually being written and updated by a wide range of medically related associations, medical specialty societies, managed care organisations and other groups, that they often trip over each other [6]. Although it has been difficult to draw an unequivocal straight line from a specific guideline to a change in the practice of medicine, it’s hard to imagine, for example, that the way diabetes care is delivered, from screening and detection through to diagnosis and ongoing medical management, not to mention the progress made in past decades, is not the result of diabetes guidelines published in the last 20 or 30 years.

Some examples

Two recently published guidelines exemplify both the value and, as we’ll see, the shortcomings of the enterprise [7, 8]. The need for the first one [7] stems from the conflicting results of large, randomised controlled studies that addressed the effect of glycaemic control on the incidence of CVD. After the Diabetes Control and Complications Trial (DCCT) and UK Prospective Diabetes Study (UKPDS) had shown the unequivocal benefit of glucose-lowering for preventing microvascular complications of diabetes, there remained great uncertainty whether intensive therapy lowered the risk of CVD, which is arguably the most impactful complication associated with diabetes. Subsequent long-term follow-up of participants in both studies indicated that patients undergoing intensive therapy experienced a significant reduction of CVD events [9, 10], suggesting that excellent glucose control could reduce the incidence of macrovascular complications. Just when it all seemed clear and straightforward, along came the results of three other trials specifically designed to address the benefits and risks of intensive therapy with regard to cardiovascular events [11–13]. Surprisingly, all three showed no beneficial effect of intensive glucose-lowering on a composite of CVD endpoints. Indeed, one (the Action to Control Cardiovascular Risk in Diabetes [ACCORD] trial [11]) was terminated early because of excess deaths in the intensively treated group. Moreover, all three studies reported a significantly increased frequency of serious hypoglycaemic events related to intensive therapy. Subset analyses in all three studies subsequently provided information that might be useful for interpreting the main results and also shed light on which, if any, patients might benefit from intensive control. Of note, across the five studies, there were many significant differences in the patient population recruited, the therapy administered, HbA1c levels achieved and in the outcomes tracked.

How should a clinician interpret these studies? Should intensive therapy be offered to no one, everyone or only some subset(s) of the diabetic population? If there ever was a purpose and value of a guideline, this subject would seem to beg for one that could bring order, direction and rationality to an important topic.

Recognising the confusing nature of these studies, three prominent organisations appointed a group of experts in diabetes and CVD to review the data and provide treatment recommendations. The resulting guideline [7] reviews the benefits and risks of intensive therapy, discusses the nuances of the studies and summarises what it all means and the implications for clinical care. This is precisely what a guideline is supposed to do.

The second example [8] relates to the complicated subject of the pharmacological treatment of hyperglycaemia in diabetes. With nine classes of glucose-lowering agents, at least that many pharmaceutical companies each spending many millions of dollars advertising the virtues of their ‘unique’ product and a literature base notable for its lack of many well controlled trials directly comparing different agents, it’s easy to appreciate the confusion of many clinicians when it comes to deciding what to use when starting therapy, what to use when the initial therapy fails, and how and when to use combination therapy. Should generalists wade through the umpteen papers published in more than two dozen journals, each reporting the benefit of one or two of the drugs, and then attempt themselves to devise a treatment algorithm? That’s unlikely to happen.

To provide some guidance in this area, a writing group was given a ream of publications related to the pharmacological treatment of diabetes, along with the specific publications suggested by each manufacturer of an available drug. Altogether there were dozens of articles to discuss and debate. In the end, the group developed a simple algorithm that highlighted the use of some agents and downplayed the use of others. One agent was not recommended. Although, undoubtedly, the guideline could have been much more complex, detailed or nuanced, the writing group felt their recommendations provided sufficient guidance to practitioners who cannot devote resources or time to remembering all the benefits, drawbacks and special characteristics of a wide variety of agents, each with virtually the same indication. Here, once again, is a guideline intended to reduce the complexity of medical decisions and free clinicians from estimating and weighing the pros and cons of a panoply of drugs.

There is more to a guideline than the evidence

In the case of the two guidelines cited above, other authors arrived at different conclusions in their review of virtually the same evidence base [14–16]. Although some criticism of guidelines is rightfully directed toward conflicts of interest in members of the writing panels or the failure to cover some of the important elements of a guideline [17–23], that did not seem to be the predominant reason for the alternative views. Indeed, even when major shortcomings are revealed in a guideline, it is still possible to reach the very same conclusions as those of the ‘flawed’ guideline [24]. This is because critics of guidelines in general [20–22] or in specific cases [16] almost always overlook or ignore the fact that a major component of all guidelines is judgement.

In the first of a comprehensive, landmark series of papers on clinical and health policy decision making (all authored by D. M. Eddy and published in the Journal of the American Medical Association in 1990), it was pointed out that scientific evidence and judgement are the two key components [25] (Fig. 1) of a medical decision or policy. No doubt much criticism and debate around guidelines results from the failure to appreciate that at every step of guideline development judgement is required. Judgement starts with the evaluation of the very clinical studies that form the basis of the guideline. For example, were the inclusion or exclusion rules too loose or rigid, was the drop-out rate too high, were the outcomes measured reasonable and fully documented, how well or properly was the trial executed, to what extent were the results clinically meaningful? Not surprisingly, therefore, the results of major, randomised controlled trials are routinely commented on in journal editorials, which point out strengths and weaknesses of the study in question, highlighting the fact that even the adequacy of the evidence is a matter of judgement or opinion, not an objective characteristic of the evidence itself [26]. Of course, the benefits, harms and cost of any treatment have trade-offs, the evaluation of which is especially subjective when interventions are compared and different experts bring their own opinions and biases to bear. Finally, judgement plays a key role in evaluating the weight of the evidence, especially when studies are combined [27]. Clearly, therefore, even the best of clinical trials, or a summary of them, does not necessarily provide clear-cut guidance. Evidence requires interpretation, which in turn requires judgement, which in turn requires the expression of opinion.

Fig. 1 Components of a medical decision. Adapted with permission from Eddy [25]

Thus, no guideline, however much it stems from randomised controlled trials, is purely or even largely ‘evidence-based’, if that term is meant to imply that the recommendations were not greatly influenced by opinion, values and personal preferences, i.e. by judgement. To be sure, the subjective elements are derived from a platform of science, but they themselves are not scientific judgements [26]. It follows that papers criticising a guideline or even the construction of guidelines should be very clear about whether the disagreement or problem is about science (e.g. does evidence exist) or judgement (e.g. what does it mean).

What does this imply?

Does the subjectivity of guidelines imply we should do away with them? Not at all. As pointed out above, guidelines serve a very useful purpose. When experts review the evidence and come to a consensus on what it means (all with great transparency), physicians are provided with an informed opinion that obviates a lot of uncertainty, among other benefits. It would be nice if the consensus were forged by hundreds of opinions instead of those of a small writing group, but so far that seems difficult to implement and may be especially problematic if (as there always will be) some ‘experts’ refuse to compromise. Their reservations may of course be ‘right’, but whether practitioners would take the time and effort to study the various perspectives remains to be seen.

Other issues

Even under the best of circumstances, in the most obvious and agreed-upon guidelines, there will invariably be other problems, some inherent or fundamental, and these will always limit the utility of a guideline.

First, no set of trials or evidence can possibly address all the known caveats relevant to a clinical decision. In diabetes, patients can be at different stages of the disease process, with varying clinical manifestations of the diabetes syndrome. Outcomes in randomised trials represent averages derived from selected groups of patients, and the benefits of an intervention depend on many patient factors, most notably age, life expectancy and co-morbidities [28]. It is the complexity of these and other factors that by necessity makes the guidelines cited above [7, 8] so nuanced, because it is impossible to standardise therapy for the wide variety of patients seen in daily clinical practice. In the case of the pharmaco-therapy guideline [8], critics might then say something like ‘so every drug is good for someone’, which in fact the guideline acknowledges, but the writing group still believed it important to define a preferred route of therapy. In their judgement of the scientific evidence along with other clinical judgements deemed relevant (e.g. cost), they arrived at what they thought to be the best established, most effective and most cost-effective strategy for achieving target glycaemic goals, albeit with an algorithm that still leaves much room for clinician discretion.

Second, virtually no guidelines formally address the cost-effectiveness of the intervention(s) recommended. That is not necessarily an omission by design; rather, there are usually few relevant studies focusing on this point. The impact of costs in medicine has only recently been an issue, and for the vast majority of interventions for which guidelines exist, cost is rarely acknowledged, let alone discussed, because studies on this aspect simply don’t exist. However, omitting the cost factor makes it impossible to arrive at the real value of an intervention. Everything offering benefit or relative benefit is recommended, yet it’s seldom acknowledged that a given intervention known to improve health outcomes may in fact be very expensive and of low value. In a recent report examining the cost-effectiveness of various approaches to preventing CVD, the cost-effectiveness of 11 commonly recommended/endorsed interventions differed by up to 100-fold [29]!
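To make the arithmetic behind such comparisons concrete, the following is a minimal sketch of how a cost-effectiveness ratio (cost per quality-adjusted life year, QALY) might be computed and compared across interventions. The intervention names, costs and QALY gains are invented purely for illustration and are not taken from the report cited above [29]; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intervention:
    """One intervention, described by invented (hypothetical) figures."""
    name: str
    cost_per_patient: float  # incremental cost versus no intervention
    qalys_gained: float      # incremental quality-adjusted life years


def cost_per_qaly(item: Intervention) -> float:
    """Incremental cost divided by incremental health gain."""
    if item.qalys_gained <= 0:
        raise ValueError("no health gain: the ratio is undefined")
    return item.cost_per_patient / item.qalys_gained


def fold_difference(items: list[Intervention]) -> float:
    """Ratio between the least and the most favourable cost per QALY."""
    ratios = [cost_per_qaly(item) for item in items]
    return max(ratios) / min(ratios)


# Hypothetical inputs only; these are not clinical or economic estimates.
EXAMPLES = [
    Intervention("hypothetical intervention A", 1000.0, 0.5),
    Intervention("hypothetical intervention B", 30000.0, 0.25),
]

if __name__ == "__main__":
    for item in EXAMPLES:
        print(f"{item.name}: {cost_per_qaly(item):,.0f} per QALY")
    print(f"fold difference: {fold_difference(EXAMPLES):.0f}")
```

Both interventions in this toy example improve health, so both would qualify as 'beneficial', yet their value per unit of cost differs widely. That distinction is the one lost when cost is left out of a recommendation.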

Finally, current guidelines do not always compare the full range of options available to treat a problem; yet if the full range is considered, this is likely to result in quite different benefits [29]. Also, when dealing with continuous variables (e.g. blood pressure, lipid and glucose levels), guidelines usually establish artificial cutpoints for when to begin treatment. For example, a number of interventions are available to reduce cardiovascular disease, yet no guideline weighs the comparative merits of blood pressure-, lipid- and glucose-lowering in light of a patient’s family and medical history along with the pre-treatment laboratory values for each of those variables. Instead, again by necessity, a guideline will usually say ‘treat X if it’s above Y’ and not deal with other important variables, or if it does, then only superficially. Although on occasion a guideline will provide one or two other trigger points depending on other factors, it cannot possibly cover the entire waterfront of circumstances known to be relevant. Alternatively, as demonstrated in the two examples above, the recommendations will have many caveats and will be prescriptive only to a modest degree. In short, although helpful, they are far from definitive.

Can it be done better?

This paper is certainly not the first to outline some of the shortcomings of guidelines. Moreover, as pointed out earlier, many authors have suggested important technical remedies like better grading systems and the elimination of obvious conflicts. Others have recommended more fundamental remedies [30], which, while important and helpful, will not and cannot lead to a recommended course of action that considers the complete profile of the patient. Some have seemed to try, such as with the recommendations for management of ST-elevation myocardial infarction [31], but with over 400 recommendations and some 70 tables and figures, such documents are not very accessible to clinicians and may still be incomplete. Programming such documents into electronic medical records will not be the way to weigh and prioritise treatment based on all the options available and all the relevant patient characteristics. Others pin their hopes on critically needed research that will hopefully uncover genotypes or biomarkers enabling disease subtypes and the way they respond to therapy to be identified [32]. At present, however, there is no evidence that this approach will result in meaningful algorithms that can be used to guide therapy in more than a fraction of cases. More recently, Hayward and colleagues [33] developed an individualised approach to statin therapy, but their tool for assessing risk and guiding treatment (the Framingham equation) is clinically simplistic and was not designed, nor has it been validated, to make specific treatment decisions. Their observation, however, that using even a crudely tailored treatment approach seems superior to a treat-to-target strategy indicates that individualised guidelines have enormous potential.

One approach of great promise is the use of computer-generated algorithms to produce patient-specific guidelines. Comprehensive, well-validated mathematical models of disease and healthcare delivery exist [34] and are already being used to predict the risk of disease [35] and the likelihood of benefit from one of many different treatments [36], depending on a person’s demographic and medical profile. It would not seem an enormous leap for such software to produce individualised patient guidelines that take the next step, namely to weigh and prioritise a wide variety of treatments, although such an approach has yet to be demonstrated and validated.
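As a rough illustration of the difference between a conventional cutpoint rule and a patient-specific prioritisation, consider the toy sketch below. It is not the method of any of the models cited above: the baseline risks, the relative risk reductions, the lipid cutpoint and all names are hypothetical, and the baseline risk is simply assumed to come from some validated risk model.

```python
from dataclasses import dataclass

# Invented relative risk reductions; these are NOT clinical estimates.
HYPOTHETICAL_RRR = {
    "lipid-lowering": 0.25,
    "blood pressure-lowering": 0.20,
    "glucose-lowering": 0.10,
}


@dataclass(frozen=True)
class Patient:
    label: str
    baseline_risk: float  # assumed 10-year CVD event probability (0 to 1)
    ldl: float            # LDL-cholesterol in mmol/l


def cutpoint_rule(patient: Patient, ldl_cutpoint: float = 2.6) -> bool:
    """Conventional rule: 'treat X if it is above Y' (hypothetical cutpoint)."""
    return patient.ldl > ldl_cutpoint


def prioritise(patient: Patient, rrr_by_treatment=HYPOTHETICAL_RRR):
    """Rank treatments by expected absolute risk reduction for this patient."""
    benefits = {
        name: patient.baseline_risk * rrr
        for name, rrr in rrr_by_treatment.items()
    }
    return sorted(benefits.items(), key=lambda pair: pair[1], reverse=True)


# Two hypothetical patients with identical LDL but different overall risk.
LOW = Patient("lower-risk patient", baseline_risk=0.05, ldl=3.0)
HIGH = Patient("higher-risk patient", baseline_risk=0.30, ldl=3.0)

if __name__ == "__main__":
    for patient in (LOW, HIGH):
        print(patient.label, "| cutpoint rule says treat:", cutpoint_rule(patient))
        for name, benefit in prioritise(patient):
            print(f"  {name}: absolute risk reduction {benefit:.3f}")
```

In this sketch the cutpoint rule treats both patients identically, whereas the expected absolute benefit of the same treatment differs several-fold between them. A genuine patient-specific guideline would also need treatment effects that vary with the patient's profile, together with harms and costs, none of which this simplified example attempts to model.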

In the meantime, we are stuck with what is necessary but inadequate: our current clinical guidelines. We need them, we know they pose problems, and we must find a way to replace them with something better.