Background

Much of medical care hinges on performing the right test, on the right patient, at the right time. Apart from their financial cost, diagnostic tests have downstream implications for care and, ultimately, patient outcomes. Yet, studies suggest wide variation in diagnostic test ordering behavior for seemingly similar patients [1–4]. This variation may be due to overuse or underuse of tests and may reflect inaccurate interpretation of results, rapid advances in diagnostic technology, and challenges in estimating tests' performance characteristics [5–10]. Thus, developing effective strategies to optimize healthcare practitioners' diagnostic test ordering behavior has become a major concern [11].

A variety of methods have been considered, including educational messages, reminders, and computerized clinical decision support systems (CCDSSs) [2, 12–14]. For example, Thomas et al. [15] programmed a laboratory information system to automatically produce reminder messages discouraging future inappropriate use of each of nine diagnostic tests. A systematic review of strategies to change test-ordering behavior concluded that most interventions assessed were effective [2]. However, this review was limited by the low quality of the primary studies. More recently, Shojania et al. [16] quantified the magnitude of improvements in processes of care from computer reminders delivered to physicians for any clinical purpose. Pooling data across randomized trials, they found a modest 3.8% median improvement (interquartile range [IQR], 15.9%) in adherence to test ordering reminders.

CCDSSs match characteristics of individual patients to a computerized knowledge base and provide patient-specific recommendations. The Health Information Research Unit (HIRU) at McMaster University previously conducted a systematic review assessing the effects of CCDSSs on practitioner performance and patient outcomes in 1994 [17], updated it in 1998 [18], and most recently in 2005 [19]. However, these reviews have not focused specifically on the use of diagnostic tests.

In this current update, we had the opportunity to partner with local hospital administration, clinical staff, and representatives of our regional health authority, in anticipation of major institutional investments in health information technology. Many new studies have been published in this field since our previous work in 2005 [19], allowing us to focus on randomized controlled trials (RCTs), with their lower risk of bias. To better address the information needs of our decision-making partners, we divided the review into six separate topics: diagnostic test ordering, primary preventive care, drug prescribing, acute medical care, chronic disease management, and therapeutic drug monitoring and dosing. In this paper, we determine whether CCDSSs improve practitioners' diagnostic test ordering behavior.

Methods

We previously published detailed methods for conducting this systematic review, available at http://www.implementationscience.com/content/5/1/12 [20]. These methods are summarized briefly here, along with details specific to this review of CCDSSs for diagnostic test ordering.

Research question

Do CCDSSs improve practitioners' diagnostic test ordering behavior?

Partnering with decision makers

The research team engaged key decision makers early in the project to guide its design and endorse its funding application. Direction for the overall review was provided by senior administrators at Hamilton Health Sciences (one of Canada's largest teaching hospitals) and our regional health authority. JY (Department of Medicine) and DK (Chair, Department of Radiology) provided specific guidance for the area of diagnostic test ordering by selecting from each study the outcomes relevant to diagnostic testing. HIRU research staff searched for and selected trials for inclusion, and extracted and synthesized pertinent data. All partners worked together throughout the review process to facilitate knowledge translation, that is, to define whether and how to transfer findings into clinical practice.

Search strategy

We previously published the details of our search strategy [20]. Briefly, we examined citations retrieved from MEDLINE, EMBASE, Ovid's Evidence-Based Medicine Reviews, and Inspec bibliographic databases up to 6 January 2010, and hand-searched the reference lists of included articles and relevant systematic reviews.

Study selection

In pairs, our reviewers independently evaluated each study's eligibility for inclusion, and a third observer resolved disagreements. We first included all RCTs that assessed a CCDSS's effect on healthcare processes in which the system was used by healthcare professionals and provided patient-specific assessments or recommendations. We then selected trials of systems that gave direct recommendations to order or not to order a diagnostic test, or presented testing options, and measured impact on diagnostic processes. Trials of systems that simply gave advice for interpreting test results were excluded (such as Poels et al. [21]), as were trials of diagnostic systems that only reasoned through patient characteristics to suggest a diagnosis without making test recommendations (such as Bogusevicius et al. [22]). Systems that provided only information, such as the cost of testing [23] or past test results [24], without actionable recommendations or options were also excluded.

Data extraction

Pairs of reviewers independently extracted data from all eligible trials, including a wide range of system design and implementation characteristics, study methods, settings, funding sources, patient/provider characteristics, and effects on care processes, clinical outcomes, adverse effects, workflow, costs, and practitioner satisfaction. Disagreements were resolved by a third reviewer or by consensus. We attempted to contact the primary authors of all included trials to confirm extracted data and provide missing data, and received a response from 69% (24/35).

Assessment of study quality

We assessed the methodological quality of eligible trials with a 10-point scale covering five potential sources of bias: concealment of allocation, appropriate unit of allocation, appropriate adjustment for baseline differences, appropriate blinding of assessment, and adequate follow-up [20]. For each source of bias, a score of 0 indicated the highest potential for bias and a score of 2 the lowest, generating a range of scores from 0 (lowest study quality) to 10 (highest study quality). We used a 2-tailed Mann-Whitney U test to assess whether the quality of trials had improved over time, comparing methodologic scores between trials published before the year 2000 and those published later.
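To make the scoring and comparison concrete, the sketch below sums five domain scores of 0 to 2 into a 0-to-10 quality score and compares two hypothetical groups of trials with a two-tailed Mann-Whitney U test. This is an illustrative sketch only, not the authors' code (the review's analyses were run in SPSS), and the scores shown are invented examples.

```python
# Illustrative sketch only: hypothetical domain scores, not extracted study data.
from scipy.stats import mannwhitneyu

def quality_score(domains):
    """Sum five bias-domain scores (0 = highest bias risk, 2 = lowest)."""
    assert len(domains) == 5 and all(0 <= d <= 2 for d in domains)
    return sum(domains)  # 0 (lowest study quality) to 10 (highest)

# Hypothetical trials published before and after 2000
pre_2000 = [quality_score(d) for d in [(1, 0, 2, 2, 0), (2, 0, 2, 1, 1), (1, 2, 2, 0, 2)]]
post_2000 = [quality_score(d) for d in [(2, 2, 2, 2, 1), (2, 1, 2, 2, 2), (2, 2, 1, 2, 2)]]

u_stat, p_value = mannwhitneyu(pre_2000, post_2000, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, two-tailed p = {p_value:.3f}")
```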

Assessment of CCDSS intervention effects

In determining effectiveness, we focused exclusively on diagnostic testing measures and defined these broadly to include performing physical examinations (e.g., eye and foot exams) and blood pressure measurements, as well as ordering laboratory, imaging, and functional tests. Patient outcomes were excluded from this study because, in general, they are most directly affected by treatment actions and could not be attributed solely to diagnostic testing advice, especially in systems that also recommended therapy. Impact on patient outcomes and other process outcomes was assessed in our other current reviews on primary preventive care, drug prescribing, acute medical care, chronic disease management, and therapeutic drug monitoring and dosing.

Whenever possible, we classified systems as serving at least one of three purposes: disease monitoring (e.g., measuring HbA1c in diabetes), treatment monitoring (e.g., measuring liver enzymes at the time of statin prescription), and diagnosis (e.g., laboratory tests to detect the source of fever). We classified trials in each area according to whether they gave recommendations for that purpose and measured the outcome of those recommendations. Trials of systems for monitoring of medications with narrow therapeutic indexes, such as insulin or warfarin, are the focus of a separate report on CCDSSs for therapeutic drug monitoring and dosing and are not discussed here.

We looked for the intended direction of impact: to increase or to decrease testing. We considered a system effective if it changed, in the intended direction, a pre-specified primary outcome measuring use of diagnostic tests (2-tailed p < 0.05). If multiple pre-specified primary outcomes were reported, we considered a change in ≥50% of those outcomes to represent effectiveness. We considered outcomes primary if the authors described them as 'primary' or 'main'; if no such statement could be found, we took the outcome used for the sample size calculation to be primary. In the absence of a relevant primary outcome, we looked for a change in ≥50% of multiple pre-specified secondary outcomes. If there were no relevant pre-specified outcomes, systems that changed ≥50% of reported diagnostic process outcomes were considered effective. We included studies with multiple CCDSS arms in the count of 'positive' studies if any CCDSS arm showed a benefit over the control arm. These criteria are more specific than those used in our previous review [19]; therefore, some studies included in our earlier review were re-categorized with respect to their effectiveness in this review.
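Read as an algorithm, this tiered rule can be expressed as a short function. The sketch below is our paraphrase of the criteria above, not code used in the review, and the field names and example outcomes are hypothetical.

```python
# Sketch of the effectiveness rule described above; field names and example
# values are hypothetical, not taken from any included trial.
def changed_as_intended(outcome, alpha=0.05):
    """An outcome counts if it changed in the intended direction at 2-tailed p < alpha."""
    return outcome["intended_direction"] and outcome["p"] < alpha

def ccdss_effective(outcomes, alpha=0.05):
    """Apply the tiered >=50% rule: pre-specified primary outcomes first, then
    pre-specified secondary outcomes, then any reported diagnostic process outcomes."""
    for tier in ("primary", "secondary", "other"):
        tier_outcomes = [o for o in outcomes if o["tier"] == tier]
        if tier_outcomes:
            hits = sum(changed_as_intended(o, alpha) for o in tier_outcomes)
            return hits / len(tier_outcomes) >= 0.5
    return False  # no relevant diagnostic process outcomes reported

# Hypothetical example: two pre-specified primary outcomes, one significant in
# the intended direction -> 50% -> classified as effective.
example = [
    {"tier": "primary", "p": 0.01, "intended_direction": True},
    {"tier": "primary", "p": 0.20, "intended_direction": True},
]
print(ccdss_effective(example))  # True
```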

Data synthesis and analysis

We summarized data using descriptive measures, including proportions, medians, and ranges. Denominators vary in some proportions because not all trials reported relevant information. We conducted our analyses using SPSS, version 15.0. Given study-level differences in participants, clinical settings, disease conditions, interventions, and outcomes measured, we did not attempt a meta-analysis.

A sensitivity analysis was conducted to assess the possibility of biased results in studies with a mismatch between the unit of allocation (e.g., clinicians) and the unit of analysis (e.g., individual patients without adjustment for clustering). Success rates in studies with matched and mismatched analyses were compared using a chi-square test. No difference in reported success was found for diagnostic process outcomes (Pearson χ2 = 0.44, p = 0.51). Accordingly, results are reported without distinction for mismatch.
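For illustration, such a comparison amounts to a Pearson chi-square test on a 2 × 2 table of study 'success' by matched versus mismatched analysis. The counts below are placeholders, not the review's data.

```python
# Placeholder 2x2 table: rows = matched / mismatched unit of analysis,
# columns = 'positive' / 'not positive' studies. Counts are hypothetical.
from scipy.stats import chi2_contingency

table = [[10, 8],   # matched unit of allocation and analysis
         [8, 9]]    # mismatched (no adjustment for clustering)

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"Pearson chi-square = {chi2:.2f}, p = {p:.2f}")
```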

Results

Figure 1 shows the flow of included and excluded trials. Across all clinical indications, we identified 166 RCTs of CCDSSs, and inter-reviewer agreement on study eligibility was high (unweighted Cohen's kappa, 0.93; 95% confidence interval [CI], 0.91 to 0.94). In this review, we included 35 trials described in 45 publications because they measured the impact on test ordering behavior of CCDSSs that gave suggestions for ordering or performing diagnostic tests [15, 25–68]. Thirty-two included studies contributed outcomes both to this review and to other reviews in the series: four studies [34, 37, 41, 68] to four reviews, 11 studies [25, 32, 33, 35, 36, 39, 40, 42–44, 46–49, 51, 57, 61] to three reviews, and 17 studies [26–31, 38, 45, 50, 52–56, 58–60, 62–65] to two reviews; here we focus only on diagnostic test ordering process outcomes.

Figure 1

Flow diagram of included and excluded studies for the update, 1 January 2004 to 6 January 2010, with specifics for diagnostic test ordering*. *Details provided in Haynes RB et al. [20]. Two updating searches were performed, for 2004 to 2009 and to 6 January 2010, and the results of the search process are consolidated here.

Our assessment of trial quality is summarized in Additional file 1, Table S1; system characteristics in Additional file 2, Table S2; study characteristics in Additional file 3, Table S3; outcome data in Table 1 and Additional file 4, Table S4; and other CCDSS-related outcomes in Additional file 5, Table S5.

Table 1 Summary results for CCDSS trials of diagnostic test ordering

Study quality

Details of our methodological quality assessment can be found in Additional file 1, Table S1. Fifty-four percent of trials concealed group allocation [26, 27, 30, 32–35, 37–40, 42–44, 50, 52–55, 60–63, 66–68]; 51% allocated clusters (e.g., entire clinics or wards) to minimize contamination between study groups [15, 25, 28–30, 34, 36, 38–44, 46, 50–53, 60, 62–64, 68]; 77% either showed no differences in baseline characteristics between study groups or adjusted accordingly [15, 26–37, 39, 40, 45–55, 58, 59, 61–66, 68]; 69% of trials achieved ≥90% follow-up for the appropriate unit of analysis [15, 25, 28–35, 37, 39–41, 45, 50–56, 58–61, 66, 67]; and all but one [45] used blinding or an objective outcome.

Most studies had good methodological quality (median quality score, 8; ranging from 2 to 10) and 63% (22/35) [15, 25–38, 50–55, 58–61, 65] were published after our previous search in September 2004. Study quality improved with time (median score before versus after the year 2000, 7 versus 8; 2-tailed Mann-Whitney U = 44.5; p = 0.002), mainly because early trials did not conceal allocation and failed to achieve adequate follow-up.

CCDSS and study characteristics

Systems' design and implementation characteristics are presented in Additional file 2, Table S2, but not all trials reported these details. CCDSSs in 80% of trials (28/35) gave advice at the time of care [25–27, 30, 31, 34–37, 39–51, 54, 56–64, 66–68]; most were integrated with electronic medical records (82%; 27/33) [15, 26, 27, 30–34, 36, 37, 39, 40, 42–51, 54, 56–58, 60–64, 66–68] and some were integrated with computerized physician order entry (CPOE) systems (26%; 7/27) [31–33, 37, 50, 54, 67, 68]; 77% (24/31) automatically obtained the data needed to trigger recommendations from electronic medical records [15, 26, 27, 30–34, 36, 37, 39, 40, 45, 46, 50, 51, 54, 56–58, 60–64, 66–68], while others relied on practitioners, existing non-prescribing staff, or research staff to enter data. In most trials (61%; 20/33), advice was delivered on a desktop or laptop computer [15, 26, 27, 31, 34, 36–41, 50, 51, 54, 58–63, 66–68], but other methods included personal digital assistants, email, or existing staff. Seventy-four percent (26/35) of systems were implemented in primary care [15, 25–40, 42–45, 50–54, 58, 60–65, 67]; 56% (14/25) were pilot tested [25, 28–33, 36, 38, 42–45, 51, 54, 62, 63, 66, 67]; and users of 59% (17/29) were trained [25–29, 31–33, 35, 37, 39–44, 46, 51, 54, 58–60, 67]. Eighty-three percent of trials (29/35) declared that at least one author was involved in the development of the system [15, 25–33, 36, 37, 39–41, 45–53, 55–60, 62–68]. In general, user interfaces were not described in detail. Additional file 3, Table S3 gives further description of the setting and method of CCDSS implementation.

The 35 trials included a total of 4,212 practitioners (median, 132; ranging from 14 to 600, when reported) caring for 626,382 patients (median, 2,765; ranging from 164 to 400,000, when reported) in 835 clinics (median, 15; ranging from 1 to 142, when reported) across 545 distinct sites (median, 4.5; ranging from 1 to 112, when reported).

Three trials did not declare a funding source [31, 57, 60]. Of those that did, 78% (25/32) were publicly funded [15, 25–30, 36–38, 41–51, 54, 56, 58, 59, 61–64, 66–68], 9% (3/32) received both private and public funding [39, 40, 55, 61], and 13% (4/32) were conducted with private funds only [32–35, 65].

CCDSS effectiveness

Each system's impact on the use of diagnostic tests is summarized in Table 1, and Additional file 4, Table S4 provides a detailed description of test ordering outcomes. These outcomes were primary in 37% (13/35) of trials [15, 25–27, 31, 41, 50–53, 55, 58, 61–63, 66].

Fifty-five percent (18/33) of evaluated trials demonstrated an impact on the use of diagnostic tests [15, 25–31, 41, 52, 53, 55–57, 59–64, 66, 67]. Two studies [65, 68] met all eligibility criteria and included diagnostic process measures but were excluded from the assessment of effectiveness because they did not provide statistical comparisons of these measures.

Disease monitoring

Systems in 49% (17/35) of trials (median quality score, 7; ranging from 4 to 10) gave recommendations for monitoring active conditions, all focusing on chronic diseases [25–49]. Their effectiveness for improving all processes of care and patient outcomes was assessed in our review on chronic disease management. Here we looked specifically for their impact on monitoring activity and found that 35% (6/17) increased appropriate monitoring [25–31, 41].

In the context of diabetes, four of eight trials successfully increased timely monitoring of common targets such as HbA1c, blood lipids, blood pressure, urine albumin, and foot and eye health [26–30, 41]. One of two systems that focused primarily on monitoring of hypertension was effective at increasing the frequency of appropriate blood pressure measurement [31]. One of three trials that focused on dyslipidemia improved monitoring of blood lipids [25]. Another three systems gave suggestions for monitoring of asthma [35, 37, 39, 40], angina [39, 40], and chronic obstructive pulmonary disease (COPD) [37], and one addressed a combination of renal disease, obesity, and hypertension [47–49], but all failed to change testing behavior.

Treatment monitoring

Systems in 23% of trials (8/35) [34, 50–57] provided suggestions for laboratory monitoring of drug therapy. Trials in this area were generally recent and of high quality (median score, 8.5; range, 2 to 10; 75% (6/8) published since 2005). They targeted a wide range of medications (described in Additional file 4, Table S4) and are discussed in detail in our review of CCDSSs for drug prescribing, which looked for effects on prescribing behavior and patient outcomes. Focusing on their effectiveness for improving laboratory monitoring, we found that 63% (5/8) improved practices such as timely monitoring for adverse effects of medical therapy [34, 52, 53, 55–57]. However, two of the trials demonstrating an impact were older and had low methodologic scores [56, 57].

Diagnosis

Systems in 17% of trials (6/35) [58–64] gave recommendations for ordering tests intended to aid diagnosis (median quality score, 7.5; ranging from 6 to 9) and 67% (4/6) were published since 2005 [58–61]. Eighty-three percent (5/6) successfully improved test ordering behavior [59–64]. Systems suggested tests to investigate suspected dementia in primary practice [60], to detect the source of fever in children in the emergency room [59], to increase bone mineral density measurements for diagnosing osteoporosis [61], to reduce unnecessary laboratory tests for diagnosing urinary tract infections or sore throats [62, 63], to diagnose HIV [58], and to diagnose a host of conditions, including cancer, thyroid disorders, anemia, tuberculosis, and others [64].

Other

Finally, five trials did not specify the clinical purpose of recommended tests [15, 65–68] or suggested tests for several purposes without the data necessary to isolate the effects on testing for any one purpose. Three of the five focused on reducing ordering rates and were successful [15, 66, 67]. Javitt et al. intended to increase test ordering and measured compliance with suggestions, but did not evaluate the outcome due to technical problems [65]. Overhage et al. aimed to increase 'corollary orders' (tests to monitor the effects of other tests or treatments), but did not present statistical comparisons of their data on diagnostic process outcomes [68].

Costs and practical process-related outcomes

Potentially important factors such as user satisfaction, adverse events, and impact on cost and workflow were rarely studied (see Additional file 5, Table S5). Because most systems also gave recommendations for therapy, we were usually unable to isolate the effects of test-ordering suggestions on these factors, and here we discuss systems that gave only testing advice.

Two trials reported statistically significant reductions in the cost of care, but the estimated effect was small in one study [37] and imprecise (wide confidence interval) in the other [28, 29]. A third study estimated a relatively small reduction in annual laboratory costs ($35,000) but presented no statistical comparisons [66].

Three trials formally evaluated user satisfaction. One study found mixed satisfaction with a system for monitoring of diabetes and postulated that this was due to technical difficulties [26, 27]. Another found that 78% of users felt that CCDSS suggestions for ordering HIV tests had an effect on their test-ordering practices, even though the study failed to show an effect of the CCDSS [58]. The third study found that, despite high satisfaction with the local CPOE system, satisfaction with reminders about potentially redundant laboratory tests was lower (3.5 on a scale of 1 to 7) [66].

Only one study formally looked for adverse events caused by the CCDSS [66]. The system was designed to reduce potentially redundant clinical laboratory tests by giving reminders. Researchers assessed the potential for adverse events by checking for new abnormal test results for the same test performed after initial cancellation. Fifty-three percent of accepted reminders for a redundant test were followed by the same type of test within 72 hours, and 24% were abnormal, although only 4% provided new information and 1% led to changes in clinical management.

One study made a formal attempt to measure impact on user workflow and found that use of the CCDSS did not increase length of clinical encounters [45]. However, this outcome was not prespecified and the study may not have had adequate statistical power to detect an effect.

Discussion

Our systematic review of RCTs of CCDSSs for diagnostic test ordering found that overall testing behavior was improved in just over one-half of trials. We considered studies 'positive' if they showed a statistically significant improvement in at least 50% of diagnostic process outcomes.

While the earliest RCT of a system for this purpose was published in 1976, most examples have appeared in the past five years, and evaluation methods have improved with time. Systems' diagnostic test ordering advice was most often intended to increase the ordering of certain tests in specific situations. Most systems suggested tests to diagnose new conditions, to monitor existing ones, or to monitor recently initiated drug treatments. Trials often demonstrated benefits in the areas of diagnosis and treatment monitoring, but were seldom effective for disease monitoring. All four systems that were explicitly meant to decrease unnecessary testing were successful [15, 62, 63, 66, 67]. CCDSSs may be better suited for some purposes than for others, but we need more trials and more detailed reporting of potential confounders, such as system design and implementation characteristics, to reliably assess the relationship between purpose and effectiveness.

Previous reviews have separately synthesized the literature on ways of improving diagnostic testing practice and on the effectiveness of CCDSSs [2, 12–14, 17–19, 69]. Our current systematic review combines these areas and isolates the impact of CCDSSs on diagnostic test ordering. However, several factors limited our analysis. Importantly, we chose not to evaluate effects on patient outcomes because many systems also gave treatment suggestions that affect these outcomes more directly than does test ordering advice. Some systems gave recommendations for testing, but their respective studies did not measure the impact on test ordering practice and were, therefore, excluded from this review [70–72]. Only 37% of trials assessed impact on test ordering activity as a primary outcome, and others may not have had adequate statistical power to detect testing effects.

We did not determine the magnitude of effect in each study, there being no common metric for this, but simply considered studies 'positive' if they showed a statistically significant improvement in at least 50% of diagnostic process outcomes. As a result, some of the systems considered ineffective by our criteria reported statistically significant findings, but only for a minority of secondary or non-prespecified outcomes. Indeed, the limitations of this 'vote counting' [73] are well established and include increased risk of underestimating effect. However, our results remain essentially unchanged from our 2005 review [19] and are comparable to another major review [74], and a recent 'umbrella' review of high-quality systematic reviews of CCDSSs in hospital settings [75].

Vote counting prevented us from assessing publication bias, but we believe that, along with selective outcome reporting, publication bias is a real issue in this literature because most systems were tested by their own developers.

We observed an improvement in trial quality over time, but this may simply reflect better reporting after standards such as the Consolidated Standards of Reporting Trials (CONSORT) were widely adopted. Thirty-one percent of the authors we attempted to contact did not respond, and this may have particularly affected the quality of our extraction from older, less standardized reports.

While the number of RCTs has increased, the majority of these studies did not investigate or describe potentially important factors, including details of system design and implementation, costs, and effects on user satisfaction and workflow. Reporting such information is difficult under the space constraints of a trial publication, but supplementary reports may be an effective way to communicate these important details. One example comes from Flottorp et al. [62, 63], who reported a process evaluation exploring the factors that affected the success of their CCDSS for management of sore throat and urinary tract infections. Feedback from practices showed that they were generally satisfied with installing and using the software, with its technical performance, and with entering data. It also showed where they faced implementation challenges and which components of the intervention they used.

Our systematic review uncovered only three studies evaluating CCDSSs that give advice for the use of diagnostic imaging tests [35, 61, 64]. Effective decision support for ordering of imaging tests may be particularly relevant for the delivery of high quality, sustainable, modern healthcare, given the high cost and rapidly increasing use of such tests, and emerging concerns about cancer risk associated with exposure to medical radiation [11, 76, 77].

Conclusions

Some CCDSSs improve practitioners' diagnostic test ordering behavior, but the determinants of success and failure remain unclear. CCDSSs may be better suited to improve testing for some purposes than others, but more trials and more detailed descriptions of system features and implementation are needed to evaluate this relationship reliably. Factors of interest to innovators who develop CCDSSs and decision makers considering local deployment are under-investigated or under-reported. To support the efforts of system developers, researchers should rigorously measure and report adverse effects of their system and impacts on user workflow and satisfaction, as well as details of their systems' design (e.g., user interface characteristics and integration with other systems). To inform decision makers, researchers should report costs of design, development, and implementation.