Introduction

Medication errors are a common problem in health care and a frequent cause of mortality and morbidity [1,2,3]. Due to inconsistent definitions and classification systems, differences in populations studied and varying outcome measures, the reported prevalence of medication errors and adverse drug events (ADE) varies widely (from 2% to 94%) across different studies [1, 2, 4,5,6]. Given the high number of prescriptions in primary care, medication errors have the potential to cause considerable harm [7,8,9], contributing to substantial health and economic consequences, including an increased utilization of health care services and, in the worst case, patient death [10,11,12].

The use of digital health technologies can help overcome shortcomings at each stage of the medication management process [13]. Digital health technologies have the potential to reduce medication errors and adverse drug reactions (ADR), improve patient safety and thus contribute to higher quality and efficiency in health care [14, 15]. In particular, Clinical Decision Support Systems (CDSS) are used to improve medication safety by providing direct medication-related advice to physicians, pharmacists or other participants involved in the medication process [16, 17]. Current research demonstrates the potential of CDSS to enhance health care processes [18,19,20,21,22,23]. In particular, CDSS that are integrated into the clinical workflow and include messages or alerts that are automatically presented during clinical decision making can have beneficial effects [24].

While a variety of studies have examined the effects of CDSS on medication safety, significant heterogeneity exists concerning the outcome measures used, leading to an ambiguous body of evidence [16, 25, 26] – particularly in primary care [27,28,29] and long-term care (LTC) [29,30,31]. According to Seidling and Bates [32], outcomes used by studies investigating the impact of digital health technologies on medication safety can be grouped into three categories: process-related, harm-related, and cost-related outcomes. These categories differ regarding their relevance for patient health [32]. In particular, harm-related outcomes are more directly relevant for patient health than process- or cost-related outcomes.

To date, no review has comprehensively summarized the outcome measures used in studies on medication safety-related CDSS effectiveness in primary care and LTC. Therefore, the primary objective of this systematic review is to summarize and categorize the outcome measures used in these studies. In doing so, we contribute to a more standardized approach to the evaluation of CDSS and facilitate future research in this field. A secondary aim is to compare the main empirical findings of these studies.

Methods

Our systematic review followed the guidelines outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) Statement [33] (see Supplementary Tables S1-S2, Additional File 1). This systematic review was registered with PROSPERO (CRD42023464746) [34].

Search strategy

We systematically searched PubMed, Embase, CINAHL, and the Cochrane Library for papers published before September 20th, 2023. The search strategy included terms about the character and type of intervention (digital decision support), the aim of these interventions (medication safety) and the targeted setting (outpatient/primary and LTC). Relevant MeSH terms were considered (see Supplementary Table S1, Additional File 2). We developed the search strategy in accordance with published CDSS-related systematic reviews [25, 26, 28, 35]. Further publications were identified manually via hand search and automatically via forward and backward citation searching using the Spider Cite tool [36].

Eligibility criteria

We included English and German language full-text publications reporting data on interventional studies that evaluated CDSS to improve medication safety in the primary/outpatient and LTC setting. Only studies reporting medication-, patient- or cost-related outcomes were included; studies reporting only outcomes related to healthcare providers' attitudes toward or acceptance of CDSS, as well as studies focusing only on performance or quality indicators of CDSS (e.g. sensitivity, specificity), were excluded. Studies were also excluded if the intervention was conducted in inpatient care, did not automatically engage in the medication process (e.g., via automated alerts), or included only a simple reminder function. Furthermore, studies were not eligible if they focused only on a single potentially problematic drug or only on one specific indication. Finally, studies were excluded if they did not primarily aim at the improvement of medication safety. There were no restrictions regarding the comparator of the intervention (see Supplementary Table S2, Additional File 2). Two investigators (DL and DGR) independently screened search results and assessed the eligibility of potentially relevant studies according to the predefined inclusion and exclusion criteria. Discrepancies (n = 131) were resolved by consensus. Another investigator (BA) was consulted if consensus could not be reached.

Data extraction, categorization and synthesis

We extracted the following data from the included studies: study design, study period, sample and setting, type of intervention and comparator (Table 1), primary and secondary outcome measures (Table 2), outcome levels (Table 3), and main empirical findings (Table 4). Two investigators (DGR, JG) jointly performed the data extraction, which was verified by a third investigator (BA). We grouped types of interventions and comparators into the following categories:

Table 1 Study characteristics
Table 2 Overview of extracted primary and secondary outcomes including outcome (sub-)categories and levels of operationalization
Table 3 Overview and frequency of outcome levels used by included studies per outcome category and subcategory (n = number of studies)
Table 4 Empirical findings of studies (primary outcomes)

Computerized physician order entry

Computerized Physician Order Entry (CPOE) is defined as any system that allows health care providers to “directly place orders for medications, tests or studies into an electronic system, which then transmits the order directly to the recipient responsible for carrying out the order (e.g. the pharmacy, laboratory, or radiology department)” [27].

Electronic prescribing

Electronic Prescribing (e-prescribing or eRx) can be seen as a special form of CPOE [69]. It is defined as the “computer-based electronic generation, transmission and filling of a prescription” [70].

Clinical decision support systems

Clinical Decision Support Systems (CDSS), often integrated with CPOE [27], supply health care providers and patients themselves with “knowledge and person-specific information, intelligently filtered or presented at appropriate times” [71]. Tools may include computerized alerts and reminders, clinical guidelines, patient data reports, or diagnostic support [71].

Electronic health records

According to the International Organization for Standardization, an electronic health record (EHR) is defined as a repository of patient data in digital form, stored and exchanged securely, and accessible by multiple authorized users, with the primary aim of supporting continuing, efficient and high-quality integrated health care. There are several different types of EHR [72].

We grouped outcome measures into the three main categories identified by Seidling and Bates [32]: process-related outcomes (e.g. medication errors), harm-related outcomes (e.g. ADE), and cost-related outcomes (e.g. costs of ADE, outcomes from health economic evaluations) (Table 2). For example, we categorized healthcare resource utilization outcomes (HCRU) as cost-related and effects on health (e.g. mortality or hospitalization) as harm-related [73]. Finally, we extracted the main empirical findings (for primary outcomes) of the included studies (Table 4). Heterogeneity in reported outcomes and study designs did not allow for a meta-analysis.

Quality assessment

The methodological quality of the included studies was assessed using the Evidence Project risk of bias tool [74], which has also been used by a similar systematic review in this field [26]. This tool was selected because it allows assessing the risk of bias for both randomized and non-randomized studies. The items cover study design, participant representativeness, and the equivalence of comparison groups: (1) cohort, (2) control or comparison group, (3) pre-post intervention data, (4) random assignment of participants to the intervention, (5) random selection of participants for assessment, (6) follow-up rate of 80% or more, (7) comparison groups equivalent on sociodemographics, and (8) comparison groups equivalent at baseline on outcome measures (see Supplementary Table S1, Additional File 3). Item 7 was slightly expanded by considering not only sociodemographic but also disease-related factors as potential confounding variables. The tool explicitly allows such adaptations. For each study, items 1–3 and 5 were rated as present or absent; item 4 was rated as present, absent or not applicable (n.a.); items 6–8 were rated as present, absent, n.a. or not reported (n.r.). Two reviewers (DGR, DL) made independent judgments on each of the items. Disagreements (n = 10) between the two reviewers were resolved by consensus after discussion.

Results

Study selection

The literature search identified 2,094 studies, resulting in 1,477 studies after duplicates were removed. After screening titles and abstracts, 1,378 records were excluded and 99 full-text studies were subsequently assessed for eligibility. Full-text assessment led to the exclusion of a further 69 studies. Reasons for exclusion were a wrong study design (n = 49), intervention type (n = 10), setting (n = 8), outcome (n = 1) and language (n = 1). In addition to the database search, one study was identified via forward and backward citation searching and one via manual screening of the reference lists of the included studies. Overall, we included a total of 32 studies in our review (Fig. 1).

Fig. 1 PRISMA 2020 flow diagram

Study characteristics

Study characteristics and designs are presented in Table 1. The studies included 13 cluster-randomized trials (C-RCT) [41, 43, 47], 11 single-arm pre-post studies (PPS) [37, 39, 44,45,46, 53, 60, 61, 66,67,68], five non-randomized controlled trials (N-RCT) [38, 40, 49,50,51] and three randomized controlled trials (RCT) [42, 55, 56]. Roughly half of C-RCT studies (n = 6) were randomized at the physician level [48, 52, 57, 63,64,65], though the remainder (n = 7) were randomized at a higher level, either at the level of resident care units [41, 43, 47, 62] or the clinic/practice level [54, 58, 59].

The majority of studies (n = 24, 75%) were conducted in North America (USA/CAN) [37, 39, 41,42,43, 45, 47, 50, 51, 53,54,55,56, 58,59,60,61,62,63,64,65,66, 68], six in Europe [40, 46, 48, 52, 53, 57, 67] and two in Asia [44, 49]. Studies were predominantly conducted in primary care practices/centers (PCP) [38, 40, 48, 50,51,52,53,54, 57, 58, 63,64,65, 67], in outpatient/ambulatory clinics (OC) [37, 39, 42, 44, 46, 61, 66, 68], in Health Maintenance Organizations (HMO) [45, 49, 55, 56, 59, 60] or in LTC facility settings [41, 43, 47, 62]. Sample sizes varied considerably between studies, ranging from 323 [48] to approx. 450,000 patients [60]. Study periods also varied between 4 months [55] and 57 months [45].

All but one study [48] used a CDSS in combination with other components. Most frequently, EHR [37,38,39,40,41,42, 49, 50, 53, 54, 57,58,59,60,61,62,63,64,65,66,67,68], CPOE systems [37, 39, 41, 43, 44, 47, 59,60,61,62, 66], and electronic prescribing (eRx) [37,38,39,40, 46, 50, 51, 53, 64] were used in addition to the CDSS. In some cases, additional interventional components were included, such as pharmacy information management systems [45, 55, 56], medication profiling software combined with a clinical pharmacist [42] or an educational program [59]. The studies also differed with regard to the comparator. Most frequently, the comparator consisted of EHR [37, 39,40,41,42, 49, 54, 58,59,60,61,62,63,64,65,66, 68], CDSS with fewer functions [37, 39, 40, 42, 45, 50, 54, 58, 59, 63, 64, 68], CPOE systems [37, 39, 41,42,43,44, 47, 59,60,61,62, 66], paper-based prescription/information [38, 46, 51,52,53] or a combination of these components. Other types of software [45, 67] and eRx [50, 53, 64, 67] were also utilized for comparison.

Methodological findings

Following Seidling and Bates [32], the outcome measures used in the included studies were categorized into process-related, harm-related and cost-related outcomes. Table 2 gives an overview of the extracted outcomes for each study. Almost all studies (n = 31) used process-related outcomes. Of these, 18 used only process-related outcomes. Harm-related outcomes were used in 11 studies, of which one study used only harm-related outcomes. Three studies reported cost-related outcomes. Notably, no study used all three types of outcomes. In each category, we grouped the outcomes used into subcategories, shown in Fig. 2.

Fig. 2 Number of included studies using each outcome (sub-)category

Process-related outcomes

We divided the process-related outcomes used by the included studies into three subcategories, defined in more detail below. Of these subcategories, error rates were studied most frequently (n = 25 studies), followed by alert rates (n = 14) and response rates (n = 12). During clinical encounters involving CDSS, these three subcategories of process-related outcomes follow a temporal logic. First, alert rates measure whether CDSS alerts occurred, indicating a potential medication error in the making. Second, response rates measure whether (and/or how) prescribers react to these alerts. Finally, error rates measure the actual medication errors that reach patients.

Error rates concern the occurrence of different types of medication errors. Error rates are the most patient-relevant process-related outcomes, since medication errors may lead to ADE or other direct patient harms. As seen in Table 2, the studies used various types of errors to define error rates. These error types included potentially inappropriate medication (PIM), potentially inappropriate prescribing (PIP), drug-drug interactions (DDI), drug duplications, near misses and rule violations. A number of studies used composite outcomes combining multiple types of prescribing errors, including illegibility errors, duration errors, strength errors, directions errors, frequency errors, amount errors, dose errors, route errors, refill errors and inappropriate abbreviations. Finally, some studies did not measure the number of errors, but rather the absence of errors (such as error-free patient visits or recommended drug use); these outcomes were also categorized as error rates.

Most studies used error rates defined at the patient-level, such as the number of errors (of a given type) per patient/person/person-time, or at the prescription-level, such as the number of errors per prescription/medication/dispensing. Two studies used error rates defined at the encounter-level (the number of errors per encounter/visit).
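As a purely illustrative sketch (with hypothetical data and field names, not drawn from any included study), the difference between patient-level and prescription-level error rates can be made concrete:

```python
# Hypothetical example records: (patient_id, prescriptions_issued, errors_detected).
# These numbers are invented for illustration only.
records = [
    ("p1", 4, 1),
    ("p2", 1, 1),
    ("p3", 10, 2),
]

total_errors = sum(errors for _, _, errors in records)
total_prescriptions = sum(rx for _, rx, _ in records)
n_patients = len(records)

# The same underlying error counts yield different rates
# depending on the chosen outcome level.
patient_level_rate = total_errors / n_patients            # errors per patient
prescription_level_rate = total_errors / total_prescriptions  # errors per prescription

print(f"errors per patient:      {patient_level_rate:.2f}")
print(f"errors per prescription: {prescription_level_rate:.2f}")
```

Because patients differ in how many prescriptions they receive, the two levels can rank the same intervention differently, which is one reason the choice of outcome level matters for between-study comparability.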

Alert rates measure the number of alerts generated by the CDSS. Alerts do not directly impact patients and are therefore less patient-relevant than error rates, although accurate alerts that lead to appropriate responses by prescribers can prevent the occurrence of medication errors. Types of alerts included warnings (such as dose, frequency, interaction, avoid or missing information alerts) and recommendations (such as START and STOPP recommendations or dose recommendations). Most studies using alert rates defined these outcomes at the patient-level, although a smaller number of studies defined alert rates at the prescription-, encounter- or physician-level (the number of alerts per physician).

Response rates concern the ways in which prescribers respond to and interact with the CDSS and the alerts it generates. Response rates do not directly impact patients and are therefore also less patient-relevant than error rates. However, these responses do influence whether medication errors occur following alerts, thereby indirectly impacting patients. There was significant heterogeneity in the response types investigated by the included studies. These response types included implementing CDSS recommendations, resolving or overriding alerts, correcting or modifying prescriptions (including medication, dose and frequency), discontinuing medications and other appropriate actions after alerts. Most studies using response rates defined these outcomes at the alert-level (the number of responses per alert). A smaller number of studies used response rates defined at the prescription-level or patient-level.

Harm-related outcomes

Harm-related outcomes most frequently comprised ADE or fractures, which we grouped under injuries (n = 10), followed by injury risk (n = 4), which includes falls. Two studies each used Health-related Quality of Life (HRQoL), mortality and hospitalization (Fig. 2). Most studies used the Naranjo algorithm [75] for classifying ADE; two studies [43, 57] used other methods. Most harm-related outcomes were defined at the patient-level, although four studies defined (preventable) ADE at the prescription-level.

Cost-related outcomes

Only three of the included studies used cost-related outcomes. Of these studies, one [49] assessed only HCRU, one [62] assessed only direct costs and one [67] assessed both HCRU and direct costs. No studies assessed indirect costs. Both studies assessing direct costs included only a small subset of these costs: Witte et al. [67] compared direct drug-related costs resulting from a difference in the observed prescription volumes between the intervention and control period, while Subramanian et al. [62] estimated the costs that would have been incurred if drug orders that triggered the alert system had actually been completed compared to the costs of the final submitted orders. One study [48] references a full health economic evaluation conducted alongside the effectiveness trial. This health economic evaluation reportedly takes into account both direct (e.g. doctor visits) and indirect (e.g. informal care) costs. However, at the time of our search, the corresponding paper had not yet been published and it is therefore not included in this review.

Table 3 gives an overview of the outcome levels used by the included studies per outcome category and subcategory. The patient-level was the most common for all process-related outcomes except response rates, which were most commonly defined at the alert-level. Notably, response rates were also the only outcomes of any kind to be defined at the alert-level. Finally, harm- and cost-related outcomes were overwhelmingly defined at the patient-level, though some injury outcomes were also defined at the prescription-level.

Empirical findings

Slightly more than half of the studies (n = 20) explicitly specified a primary outcome (Table 4); three of these specified multiple primary outcomes. Most studies (n = 15) used process-related primary outcomes, of which roughly half (n = 8) were PIM or PIP. Five studies used harm-related primary outcomes, three of which were (preventable) ADE. No study specified a cost-related primary outcome.

Half of the studies with primary outcomes (n = 10) demonstrated a significant intervention effect for at least one primary outcome. However, only one out of five studies with harm-related primary outcomes (20%) found a significant intervention effect, compared to nine out of fifteen studies with process-related primary outcomes (60%). Of the three studies with multiple primary outcomes, two found significant intervention effects for some primary outcomes, but not for others (Table 4).

Quality assessment

We found that at least half of the included studies demonstrated a potential risk of bias. First, half of the studies were either PPS (n = 11), which lack a separate control group, or N-RCT (n = 5), which use a non-randomized control group. In contrast, C-RCT (n = 13) and RCT (n = 3) studies, which use randomized control groups, demonstrate less risk of bias. Second, most studies that did use a (randomized or non-randomized) control group either reported problems regarding the comparability of study groups or did not address study group comparability at all. Third, most of the studies were cross-sectional (n = 22) instead of using a longitudinal (n = 10) design (see Supplementary Table S1, Additional File 3).

Discussion

This systematic review identified and categorized outcomes used in experimental studies evaluating the effects of medication-related CDSS implemented in primary and LTC settings. We grouped outcome measures into three categories identified by Seidling and Bates [32]: harm-related, process-related and cost-related. Across the included studies, there was substantial heterogeneity with regards to study design, outcome measures and main empirical findings.

Choosing outcome measures

Which outcomes should be used to evaluate CDSS? From a patient perspective, harm-related outcomes are most relevant. Medication-related outcomes (such as ADE) may be better suitable for evaluating the isolated health impact of CDSS than more general outcomes (such as HRQoL, hospitalization or mortality), since the latter depend on various factors besides the CDSS [76]. Nevertheless, HRQoL, hospitalization and mortality are highly patient-relevant outcomes. If possible, studies should therefore use medication-related outcomes alongside more general harm-related outcomes.

When the use of harm-related outcomes is not possible or feasible, error rates can serve as a process-related proxy for patient harm. Alert rates and response rates, however, are less suitable as proxies for patient harms. Whenever possible, studies should use harm-related primary outcomes rather than process-related proxies [76].

While process-related outcomes should not replace direct measures of patient harms, they provide important information about system activity and should therefore be included as outcomes in CDSS evaluations. For example, a high alert rate combined with a low response rate may indicate alert fatigue, suggesting that improvements to usability and user experience are needed [77]. In contrast, a low alert rate, high response rate and high error rate may indicate that while prescribers are willing to use the system, not enough alerts are generated to meaningfully improve patient outcomes. To comprehensively assess CDSS activity, studies should use error, response and alert rates.
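The interpretive patterns described above can be sketched as a simple heuristic. This is an illustrative sketch only: the function name, threshold values and returned labels are hypothetical assumptions, not derived from any included study or established cut-offs.

```python
def interpret_process_outcomes(alerts_per_100_rx, response_rate, error_rate,
                               high_alert=30.0, low_response=0.2, high_error=0.05):
    """Illustrative heuristic (hypothetical thresholds): combine the three
    process-related outcome subcategories into a qualitative signal.

    alerts_per_100_rx: CDSS alerts generated per 100 prescriptions
    response_rate:     share of alerts acted upon by prescribers (0..1)
    error_rate:        medication errors per prescription (0..1)
    """
    # Many alerts, few acted upon: the pattern described as alert fatigue.
    if alerts_per_100_rx >= high_alert and response_rate <= low_response:
        return "possible alert fatigue: many alerts, few acted upon"
    # Few alerts, good prescriber engagement, but errors persist:
    # the system may not be generating enough alerts.
    if (alerts_per_100_rx < high_alert and response_rate > low_response
            and error_rate >= high_error):
        return "possible under-alerting: prescribers respond, but errors persist"
    return "no obvious pattern from these three rates alone"

print(interpret_process_outcomes(50, 0.1, 0.08))
print(interpret_process_outcomes(10, 0.6, 0.08))
```

In practice, such signals would of course require careful validation against context-specific data rather than fixed thresholds; the sketch merely shows why all three rates are needed together to interpret CDSS activity.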

Finally, while cost-related outcomes are not directly patient-relevant, they represent important secondary outcomes and should therefore be included in CDSS evaluations. The health economic impacts of novel interventions are increasingly important for resource allocation decisions [78]. However, the cost-related evaluation of CDSS remains a challenging task, as these complex digital health interventions usually influence the medication process in several ways [32]. Furthermore, using secondary data on direct and indirect costs for economic evaluations is not always feasible and primary cost-related data may be difficult to collect.

Besides direct intervention costs (such as those related to the implementation), studies should also include indirect intervention costs (such as time spent training with new software). However, these indirect costs are difficult to measure and are thus often not considered [79]. For example, Donovan et al. show that the implementation costs of hospital-based CDSS are rarely reported and the methods used to measure and value such costs are often not well described [80]. Thus, intervention costs, as well as costs that may have occurred in other (health care) sectors, are often not considered in economic evaluations of CDSS [81]. Since the quality of the current health economic literature on health information technology in medication management is poor [81], future studies should follow established standards of health economic evaluations [78, 82, 83]. Additionally, since the economic impacts of improved medication safety may occur on different levels, economic evaluations of CDSS should take into account not only the payers’ perspective, but also financial effects at the provider level.

To summarize: CDSS evaluations should include multiple outcomes from each of the three outcome categories [32, 76]. However, we found that none of the included studies conducted a comprehensive evaluation of all three outcome categories. Furthermore, two-thirds of studies did not consider any harm-related outcomes. Those studies that did use harm-related outcomes mostly used ADE or other injuries; very few used morbidity or hospitalization. Although process-related outcomes were by far the most used outcomes, this is mostly due to the large number of studies using error rates. In contrast, response rates and alert rates were used less commonly, making it difficult to fully investigate and interpret CDSS activity and use. Finally, only three studies used cost-related outcomes. This finding is consistent with the sparse and conflicting evidence regarding the financial impact and cost-effectiveness of CDSS [16, 81, 84]. The studies that used cost-related outcomes included only a small subset of direct costs and did not consider indirect costs.

Defining outcome measures

We have seen that the included studies differ in the outcome categories they use. However, studies also differ in their definition and operationalization of outcomes even within categories (and subcategories).

While mortality and hospitalization are easily measured, standardized outcomes, other harm-related outcomes (such as injuries) may be defined and operationalized in various ways, limiting the comparability of harm-related results between studies. Cost-related outcomes were only considered in three studies, which used significantly different (and therefore non-comparable) approaches.

Differences in outcome definition and operationalization between studies were most pronounced for process-related outcomes. First, these outcomes measured the occurrence of a number of different types of errors, responses, and alerts. For example, an error rate may refer to the number of PIM or the number of DDI. Second, these outcomes can be defined at different levels, including patient-level, encounter-level, prescription-level or alert-level. For example, an error rate may refer to the number of errors per prescription or the number of errors per patient-month. These differences in outcome definitions are in line with the literature: a review by Rinke et al. [85] also found differences in outcome definition and operationalization for evaluations of interventions to reduce paediatric medication errors.

Due to these differences in outcome definition, comparing results between studies can be difficult or even impossible [85], even if studies use the same outcome categories. Therefore, future research should work toward consensus definitions for key outcomes. This could increase the efficiency of evidence synthesis and reduce the risk of duplicated research efforts, thereby accelerating the improvement of care [86]. When agreed-upon definitions are unavailable, researchers can increase the comparability of their results by reporting multiple outcome definitions.

Importantly, this does not imply that all CDSS evaluation research should use a one-size-fits-all approach. Different healthcare systems, care settings, study populations, or CDSS types may give rise to different research questions, which will likely require the use of different outcomes and definitions. For example, an evaluation of a novel CDSS introduced in an LTC setting with a history of inappropriate medications may use a PIM/PIP-based error rate, while an evaluation of an existing primary care CDSS which has recently been upgraded to generate dosage alerts may instead measure the rate of dosage errors. However, studies with similar research questions concerning similar settings and populations should still strive to use comparable outcome definitions, when possible.

Finally, researchers should carefully consider at which level they define their outcomes. For many types of error rates, the prescription-level may be most appropriate. For example, the number of errors per prescription (or per encounter) reflects the total opportunities for errors more accurately than the number of errors per patient or per patient-month [85]. Similarly, it may be more appropriate to define response rates at the alert-level, rather than the prescription-level. As discussed above, the most appropriate outcome definition will depend on the context and specific research question.

Reducing the risk of bias

Even if the included studies had used a wider variety of outcomes from all outcome categories, with agreed-upon definitions and standardized operationalizations for each outcome, many studies would still have exhibited a risk of bias due to their study design and other methodological problems. In particular, most studies used cross-sectional designs without a sufficient follow-up period, many studies were not randomized or not controlled, and most controlled studies did not demonstrate study group comparability. Finally, many studies did not specify a primary outcome, and only 12 studies reported power calculations.

To reduce the risk of bias, future research should rely on well-designed (cluster) RCTs including a sufficient follow-up period; study group comparability should be assessed and reported. Whenever possible, studies should be longitudinal rather than cross-sectional. Finally, studies should explicitly specify a clear (preferably harm-related) primary outcome and should perform and report sample size and power calculations for this outcome.

Empirical findings

Only 20 out of 32 included studies explicitly specified a clear primary outcome and, of these, only five studies used harm-related primary outcomes. While half of all studies with primary outcomes demonstrated a significant intervention effect, most studies finding significant effects did so for process-related primary outcomes. This result is in line with current research demonstrating significant intervention effects when using process-related outcomes [18,19,20,21,22]. In contrast, only one study found a significant intervention effect for a harm-related primary outcome. Overall, our results agree with prior reviews finding that the effectiveness of CDSS for medication safety in primary care [27,28,29] and LTC settings [29,30,31] remains inconsistent and future research on the harm-related effects of medication-related CDSS is needed.

To generate stronger evidence on the effectiveness of CDSS, future studies should follow the methodological recommendations outlined above. Furthermore, additional research should take place in LTC settings, as this setting was underrepresented in the included studies. Finally, insights from research using process-related outcomes to study CDSS activity should be used to improve on the design and functionality of future CDSS. While uptake levels are rarely reported in CDSS evaluations, available evidence indicates that uptake is low [87]. In addition to alert fatigue, high override rates are an increasingly important problem for CDSS interventions [88, 89]. If these overrides are inappropriate, they can lead to medication errors, patient harms and increased costs [90]. Comprehensive CDSS evaluations using a variety of outcomes and outcome categories are therefore needed to identify and remove barriers to user acceptance of CDSS.

Limitations

Compared to a recent review [26], we expanded our scope by including the LTC setting and focusing primarily on methodological aspects and outcomes used in CDSS evaluations. However, our systematic review still has several limitations. First, relevant studies that have not been indexed in the searched databases might be missing from this review, although we followed an extensive search strategy, including hand search and automated citation tools alongside the search of multiple databases. Second, due to the methodological heterogeneity of the included studies, we only compared whether or not studies found a significant effect for their primary outcome and did not compare levels of significance or effect sizes. We also did not consider outcomes related to user acceptance of CDSS. Finally, a scoping review may also have been an appropriate method for addressing our primary (methodological) aim, although the lines between these types of reviews are often blurred [91]. However, due to our secondary (empirical) aim and our performance of a risk of bias assessment, we decided to conduct a full systematic review according to the PRISMA guidelines, rather than the PRISMA Extension for Scoping Reviews [92].

The included studies vary in terms of applied interventions and comparisons. Some studies compared the CDSS intervention to non-automated IT systems, while other studies used handwritten or paper-based prescription forms as a comparison. Consequently, the applied interventions and comparisons are not comparable, which could also have an influence on the differences in outcome measures and operationalizations. For example, comparing CDSS to other IT systems rather than handwritten prescriptions may allow alert rates or response rates to be calculated for both the intervention and control groups.

Furthermore, since 75% of the studies were from North America, the generalizability of the studies to other regions may be limited. Finally, the included studies’ high risk of bias (particularly for PPS and N-RCT studies), their lack of clearly specified primary outcomes and their weak reporting of sample sizes need to be considered when drawing conclusions from study results. Despite these limitations, our results give rise to a number of key recommendations for future studies researching the effect of CDSS on medication safety, summarized in Table 5.

Table 5 Recommendations for research on medication safety-related CDSS effectiveness

Conclusions

Our primary aim in this review was to summarize and categorize the outcome measures used in CDSS evaluation studies. Furthermore, we assessed the methodological quality of these studies and compared their key findings.

Although a variety of studies have evaluated the effectiveness of CDSS, we found that these studies face a number of (methodological) problems that limit the generalizability of their results. In particular, no studies used a comprehensive set of harm-related, process-related and cost-related outcomes. Definitions and operationalizations of outcomes varied widely between studies, complicating comparisons and limiting the possibility of evidence synthesis. Furthermore, a number of studies were not controlled, lacked randomization or did not demonstrate the comparability of study groups. Only 63% of studies explicitly specified a primary outcome. Of these, half found a significant intervention effect.

Overall, evidence on CDSS effectiveness is mixed and evidence synthesis remains difficult due to methodological concerns and inconsistent outcome definitions. Additional high-quality studies using a wider array of harm-, process- and cost-related outcomes are needed to close this evidence gap and increase the availability of effective CDSS in primary care and LTC.