1 Background

Prescribing errors are common in hospital inpatients. Although errors are under-reported through spontaneous reporting schemes in clinical practice, research studies using other detection methods have found much higher rates. In a recent systematic review of the prevalence, incidence, and nature of prescribing errors in hospital inpatients (which included studies using a wide range of error-identification methods), the median error rates across 65 eligible studies were 7 % of medication orders, 52 errors per 100 admissions, and 24 errors per 1,000 patient days [1]. Even errors that do not result in harm create additional work and can undermine patients’ confidence in their care.

Tools for measuring errors are needed to evaluate the effectiveness of interventions designed to reduce them. Medication error rates are often used to compare drug distribution systems [2–4] and to assess the effects of interventions. However, medication errors range from those with very serious consequences to those with little or no impact on the patient. It has thus been suggested that the severity as well as the prevalence of errors should be taken into account [5, 6]; assessing the severity of the errors detected increases the clinical relevance of a study’s findings compared with reporting prevalence alone. In their systematic review of the prevalence, incidence, and nature of prescribing errors in hospital inpatients, Lewis et al. [1] noted that the methods used to classify severity were disparate, but did not discuss the tools identified.

In this study, our objective was to describe and evaluate tools used to assess prescribing error severity in studies reporting prescribing error rates in the hospital setting.

2 Methods

2.1 Search Strategy

We carried out a systematic review to identify the tools that have been used to assess prescribing error severity in hospitals, and to investigate the validation and reliability of those tools. A comprehensive review of studies of the prevalence of prescribing errors in hospitals up to the end of 2007 was carried out by Lewis et al. [1]. We used the results of this search but excluded conference abstracts and letters [7], as well as studies that did not assess the severity of error. SG then re-ran Lewis et al.’s search strategy to identify additional papers published between 2008 and January 2013 (inclusive). The following databases were searched: MEDLINE, EMBASE, International Pharmaceutical Abstracts, and CINAHL, using the keywords (error OR medication error OR near miss OR preventable adverse event) AND (prescription OR prescribe) AND (rate OR incidence OR prevalence OR epidemiology) AND (inpatient OR hospital OR hospitalization); see the Appendix for a full example. SG searched the reference lists of any relevant reviews identified and obtained the full text of any original studies that potentially met our inclusion criteria. Finally, SG hand-searched the reference lists of all included articles and searched our research team’s local database of medication error studies to identify any further studies informing the development, reliability, or validity of the severity assessment tools identified.

2.2 Inclusion and Exclusion Criteria

2.2.1 Inclusion Criteria

Peer-reviewed studies were included if they reported the detection and rate of prescribing errors in prescriptions for adult and/or pediatric hospital inpatients, or if they elaborated on the properties of the severity tools used by these studies. All study designs and lengths of follow-up were included. If the same study was reported in multiple papers, all papers were included.

2.2.2 Exclusion Criteria

The exclusion criteria were based on those of Lewis et al. [1], with conference abstracts, letters, and studies not measuring severity also excluded [7]:

  • Studies not published in English.

  • Letters.

  • Conference abstracts.

  • Studies that neither reported the incidence of prescribing errors separately from that of other types of medication error, nor reported the reliability or validity of the tools used to assess severity in such studies.

  • Studies of errors relating to only one disease, drug class, route of administration, or type of prescribing error.

  • Studies carried out in primary care or hospital outpatients.

2.3 Screening and Data Extraction for Electronic Search

All database search results were combined into a Reference Manager® 11 database. Duplicates were identified electronically using Reference Manager® 11, then through a manual check, and removed. SG then screened each title and abstract to determine whether the full paper should be retrieved or whether it was already evident that the study did not meet the inclusion criteria. BDF independently screened a random 10 % sample of abstracts to check the reliability of the screening (agreement level 91 %). All discrepancies were resolved through discussion. SG then reviewed all retrieved full papers to determine whether each article met the inclusion criteria, and BDF independently reviewed a random 10 % sample of full papers to check reliability (agreement level 100 %). SG then extracted data from the included articles regarding the tools used to assess prescribing error severity. A second researcher (BDF) checked the tables but did not carry out a duplicate data extraction. The following data were extracted directly into electronic tables: country and method of development, whether the tool assessed actual or potential harm, levels of severity assessed, and the results of any validity and reliability studies. We extracted any data referring to the reliability and validity of the instruments rather than focusing on any particular type of reliability or validity. Authors were not contacted for further information. The data extracted were not amenable to meta-analysis; a descriptive analysis was therefore conducted. We did not formally assess the risk of bias.
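Screening reliability above is reported as simple percent agreement. As a minimal sketch, not taken from the paper and using hypothetical screening decisions, the snippet below shows how percent agreement can be computed for a double-screened sample, alongside Cohen’s kappa, the chance-corrected statistic (κ) reported for several tools in Sect. 3.5.

```python
# Illustrative only: computes raw percent agreement (as reported in this
# review) and Cohen's kappa for two screeners' include/exclude decisions.
# The decision lists below are hypothetical, not data from the study.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which both screeners made the same decision."""
    assert len(rater_a) == len(rater_b)
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement for two raters over the same items."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if both raters labelled items independently at
    # their observed marginal rates.
    p_expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(rater_a) | set(rater_b)
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical 'include'/'exclude' decisions on a small screening sample.
sg  = ["include", "exclude", "exclude", "include", "exclude"]
bdf = ["include", "exclude", "include", "include", "exclude"]
print(percent_agreement(sg, bdf))  # 0.8
print(cohens_kappa(sg, bdf))       # ~0.615
```

Note that percent agreement does not correct for agreement expected by chance, which is why kappa is lower than raw agreement for the same data.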

3 Results and Discussion

3.1 Overview

Forty (62 %) of the 65 papers originally identified by Lewis et al. [1] met our criteria. In our additional electronic search, 210 abstracts were screened and 67 full-text articles were obtained. Twenty of these papers met our inclusion criteria; the rest were letters or conference abstracts, or did not assess severity (Fig. 1). When combined with the papers from Lewis et al. [1], a total of 60 papers were included [8–67] and 40 tools (including adaptations of other tools) were identified. Forty studies (67 %) used original or adapted versions of four tools [11, 16, 68, 69], but there were also 18 tools designed for individual studies. It is notable that 46 (44 %) of the 104 studies measuring the prevalence of prescribing errors in secondary care did not include any assessment of severity. The included tools and their properties are shown in Table 1 of the Electronic Supplementary Material (ESM).

Fig. 1 Flow chart of papers identified, screened, and evaluated

Methods of measuring severity were diverse, although most tools had some features in common. All of the tools comprised single-item classification systems for error severity with associated definitions. The majority were presented as ordinal Likert-type scales, but one tool was based on a visual analog scale [11]. Seven tools [24, 27, 48, 50–52, 68] mixed a severity assessment scale with another type of assessment. For example, the NCC MERP (National Coordinating Council for Medication Error Reporting and Prevention) index [68] includes a category ‘not an error’; as a consequence, it is a mixture of a severity assessment scale and a tool for recording whether or not an error has occurred. One study measured the predicted patient outcome of clinical pharmacists’ recommendations made in response to identified prescribing errors, rather than the direct clinical significance of the errors themselves [9].

3.2 Tool Development

Little information was given on the development of the majority of tools (ESM Table 1). No information at all was given for 19 (47.5 %) [9–12, 16–23, 34–39, 41–47, 57–60, 63, 64, 66, 67] of the 40 tools. For the remaining tools, information was usually limited to a statement that the tool was based on a previous one, the development of which was itself not described or referenced. However, the authors of two tools described the rationale or methodology by which they adapted these previous tools. The NCC MERP index was collapsed from nine to six categories by Forrey et al. [48] because some of the original distinctions were considered ambiguous or overlapping. In a separate study [31], an expert panel survey was used to adapt Folli et al.’s tool [16]. Again, in neither case was the development of the original tool described or referenced.

While many tools were developed for medication errors in general, others were developed specifically for studies of prescribing error. Tools were developed in a range of countries (15 in the UK, 10 in the USA, 14 elsewhere, and 1 not stated). Tools developed for use in one country may not be transferable to others because of differences in healthcare systems.

3.3 Potential Versus Actual Harm

Thirty (75 %) of the tools were based on potential rather than actual harm. It is of interest that the NCC MERP index [68] was developed to assess actual harm but was subsequently used or adapted to assess potential harm in six studies [48, 50, 51, 54–56]. Tools based on actual patient outcomes can have practical limitations: in prospective studies, a researcher who becomes aware of an error as it occurs may be ethically obliged to intervene, and in retrospective studies the delay between the occurrence and identification of errors can make any clinical effects difficult to identify [11]. The main benefit of using potential outcomes is that judgments can be made about severity even in the absence of actual patient harm; however, assessing potential outcomes is likely to be more subjective.

3.4 Severity Levels

Tools varied in the number and range of severity levels assessed. The number of severity levels ranged from two categories to a continuous scale. The majority of tools included levels ranging from potentially or actually lethal down to minor/mild error or no harm. However, some tools had ‘severe’ or ‘harmful’ as the highest level of severity, with no separate category for life-threatening errors. In addition, Folli et al.’s [16] lowest harm rating was ‘significant’. Some authors [24–27] expanded Folli et al.’s [16] severity categories to include minor errors. Adding a category could complicate the assessment for reviewers, but it allows a wider range of responses and therefore potentially increases the sensitivity of the method.

3.5 Reliability

A measure of reliability was established for 17 (43 %) tools (ESM Table 2) [9, 11, 15, 16, 24, 28, 31, 33, 34, 37–39, 48, 50, 63, 67, 68]. In all cases this was inter-rater reliability, which may be particularly important where potential harm is being assessed. The Folli et al. [16] scale appeared to have higher inter-rater reliability when used to assess actual harm (κ = 0.67–0.89 [16]) than potential harm (κ = 0.32–0.37 [17]); however, this finding should be interpreted with caution as these were two separate studies using different assessors. High inter-rater reliability (κ > 0.7) was found for five tools (ESM Table 2): Folli et al.’s tool as adapted by Abdel-Qader et al. [24], Folli et al.’s tool as adapted by Lesar et al. [28] in 1990, Kozer et al.’s 2002 tool [39], the NCC MERP index as adapted by Forrey et al. [48], and Wang et al.’s tool [67]. It is of note that the NCC MERP index was more reliable when collapsed into six levels of severity than when all nine levels were used in the same studies [48, 50]. Dean and Barber [11] used generalizability theory to establish the reliability of their tool: to achieve an acceptable generalizability coefficient (>0.8), four reviewers were required, with their mean score then used as the index of severity. Subsequent studies measuring the severity of prescribing errors with Dean and Barber’s tool [11] have used five reviewers, based on a conference abstract that does not itself meet this review’s inclusion criteria. There was no information regarding reliability for 23 (57 %) tools [11, 25–27, 35, 40–42, 51–62, 64–66], and in seven cases (18 %) descriptive information was given but no statistics were presented [9, 15, 31, 33, 34, 37, 63].
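For readers less familiar with generalizability theory, the following one-facet formulation is standard and is given only as an illustration; the variance components are not taken from Dean and Barber’s paper. For a design in which each error is rated by $n$ reviewers and their scores averaged, the generalizability coefficient is

$$E\rho^{2}(n)=\frac{\sigma^{2}_{\tau}}{\sigma^{2}_{\tau}+\sigma^{2}_{\delta}/n},$$

where $\sigma^{2}_{\tau}$ is the variance attributable to true differences in error severity and $\sigma^{2}_{\delta}$ is the single-rater error variance (rater and rater-by-error effects). Because the error term shrinks as $n$ grows, averaging across reviewers raises the coefficient: with purely illustrative values $\sigma^{2}_{\tau}=\sigma^{2}_{\delta}$, one rater gives $E\rho^{2}=0.5$, whereas four raters give $1/(1+1/4)=0.8$, the threshold adopted by Dean and Barber.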

3.6 Validity

Validity was reported for only five (12.5 %) tools [11, 48, 61] (ESM Table 2). These studies all explored construct or criterion validity, measuring raters’ judgments of potential harm against actual harm in situations where the outcome was known. Dean and Barber [11] found a clear relationship between potential harm as assessed using their scale and actual harm. Forrey et al. [48] found that, when assessments of potential harm were compared with actual harm, the original NCC MERP index [68] showed 74 % alignment and their adapted version 81.0–83.9 % alignment. Ridley et al. [61], in contrast, reported no relationship between potential harm and apparent actual harm.

3.7 Acceptability

Very little information was given on the acceptability or ease of use of the tools. However, Dean and Barber’s tool [11] requires four reviewers to rate error severity in order to achieve acceptable reliability, which is time consuming and could be viewed as a disadvantage of that particular tool.

3.8 Comparison with Studies Measuring Medication Administration Errors

Our findings are similar to those for studies of administration errors. In our review, 44 % of studies of prescribing error prevalence did not include an assessment of severity; of those that did, 67 % used previously established methods and 33 % used their own tools. In a review of studies of the prevalence of administration errors, Keers et al. [70] found that 44 % of 91 studies did not attempt to determine the clinical significance of the administration errors identified, and that, of those that did, 82 % used previously published severity tools while 18 % used their own criteria.

3.9 Review Limitations

Our search strategy excluded studies not published in English and focused on the hospital setting. We based our work on a previous paper and an existing search strategy rather than developing our own; however, this strategy was a sufficiently close fit for our needs. We acknowledge that one of the authors of this review, BDF, was an author of one of the tools [11], a potential source of bias. In addition, one of the sources we searched, our research team’s local database of medication error studies, is not publicly available.

3.10 Recommendations

Researchers and clinicians may have different needs in relation to a tool for assessing the severity of medication errors. In general, however, an ideal tool should be specific to medication error severity; relatively easy and quick to use; reliable; and validated in different healthcare systems. Few studies presented information on ease of use or the time required. We identified only two tools with acceptable validity and reliability: the NCC MERP index as adapted by Forrey et al. [48], and Dean and Barber’s tool [11]. It is not possible to compare the reliability of the two directly, as they were evaluated using different methods of assessing reliability. Moreover, information about their development and ease of use is limited; Dean and Barber’s tool [11] may be more time consuming to use, while Forrey et al.’s tool [48] mixes error identification with error severity. At present, the most appropriate instrument needs to be selected according to its intended use. Forrey et al.’s tool may be most appropriate for clinical practice, as it is less time consuming to use, whereas Dean and Barber’s tool may be better suited to research, as it has been tested on a larger sample and its continuous scale potentially permits more powerful statistical analysis in comparative studies. There is also scope for developing and testing a new tool that meets all of the criteria above. Given the wide range of tools used in the literature, researchers should also consider developing a basis for comparison between tools to assist in comparing findings across studies.

4 Conclusion

When assessing the effects of interventions on prescribing error rates, the severity of errors should also be considered [5, 6]. When selecting a tool to assess prescribing error severity, its development, reliability, validity, and ease of use all need to be taken into account. There is a potential need for a less time-consuming method of measuring the severity of prescribing errors that has acceptable reliability and validity across countries. Given the wide range of tools in use, developing a basis for comparison between tools would help in comparing findings across studies.