Error in drug administration pervades every aspect of clinical practice, notably in anesthesia.1-5 There is reasonable agreement about the basic principles of safe administration of medications6 but relatively little empirical evidence to underpin these principles. Studying medication error in clinical settings is difficult because errors, although unacceptably frequent, do not occur often enough to lend themselves to study in prospective trials of moderate size. In recent years, the effects of fatigue and time of day have received considerable attention as causes of performance impairment.7-10 Interns in the United States working “traditional” 24-hr shifts had twice as many failures in concentration as those working a 17-hr shift,11 and during the night shift, they made 36% more serious medical errors.12 This impairment in performance was attributed to the effects of sleep deprivation and the circadian clock, which promotes sleep at night.

As a result of these concerns, the work hours of junior doctors in New Zealand have been restricted since 1985 to a maximum of 17 continuous hours and a total of 72 hr per week. Many of the studies of the effects of fatigue are based on tests, such as the psychomotor vigilance test, that seem to have little to do with clinical medicine in general or anesthesia in particular. Perhaps unsurprisingly, Howard et al.13 demonstrated a significant decrement in the performance of anesthesiologists in the psychomotor vigilance test after 25-30 hr of sleep deprivation, but they demonstrated no change in their performance in a high-fidelity human patient simulator. Conventional cognitive tests (CogState™) have been used to demonstrate reduced performance of anesthesiologists after working several consecutive night shifts,14 but these tests are relatively time-consuming to complete during work periods and, arguably, do not relate directly to the task of drug administration. Legible drug labels that are colour-coded according to international standards have been adopted widely as an aid to avoiding drug error,15 and with some overall success, they have been incorporated into a multi-faceted system designed to reduce error in anesthesia.16 However, the influence of drug label format alone on drug administration error has not been definitively elucidated.

Consequently, we have developed a novel task-relevant test that measures the times and error rates of simulated drug recognition and confirmation. In this study, we describe the test and examine two null hypotheses: 1) Speed of drug recognition and accuracy of drug selection, as measured by the new test in terms of reaction time and error rate, do not differ among the different presentations of drug identity; and 2) These measures do not differ between day and night shifts.

Methods

Setting, ethics, and consent

Approval was obtained from the Auckland Regional Ethics Committee (ref 97/032) and the University of Auckland Human Participants’ Ethics Committee (ref 2005/051). Written consent was obtained from all participants. The study was conducted at Auckland City Hospital (ACH), Auckland, New Zealand from 2005 to 2008. The ACH is a 760-bed tertiary hospital associated with the University of Auckland and credentialed for training by the Australian and New Zealand College of Anaesthetists (ANZCA).

Participants

Eligible participants were anesthesia trainees allocated to one of four sub-departments at ACH. The ACH participates in a regional training scheme that has slightly more than 90 anesthesia trainees at any one time, all of whom are allocated to this sub-department (15 each time) at some point in their training. These trainees are rostered for three or four consecutive night shifts at a time, interspersed with variable (but longer) periods of day shifts. Data were collected from each participant over a seven-day study period that spanned both day and night shifts. Different participants took part in Experiment-1 (undertaken in the earlier part of the study) and Experiment-2 (undertaken in the later period).

Sleep data

Participants used actigraphs (Actiwatch-L, Cambridge Neurotechnology Inc, Cambridge, UK) to track their time spent sleeping, and the actigraphy data were cross-checked against sleep diaries.

The medication recognition and confirmation test

The computer-based Medication Recognition and Confirmation Test (MRCT) was designed with reference to established criteria for neurocognitive testing.17 The user interface is a touch screen, which is used to measure the times and accuracy of responses to linked pairs of questions involving (Question One) recognition and selection and (Question Two) confirmation of the identity of pictures of medication labels in any desired format.

In Question One (recognition and selection), each participant is asked to choose a picture out of four different options that corresponds to a medication named in the centre of the screen (Fig. 1A). In Question Two (confirmation), the participant is presented with a single picture selected from the four options with the prompt, “Do you want to administer this drug?” (Fig. 1B). The picture presented in this “confirmation” question is usually the one selected, but one of the other three pictures is substituted in a small percentage of cases. The substitution rate can be varied, and the substitutions can occur randomly, thereby increasing the need for vigilance and the potential for error. The number of questions provided in a single test can also vary.

Fig. 1

Example screen shots of the Medication Recognition and Confirmation Test. (A) The name of a drug (in this case etomidate) is shown in the centre of the screen, and the participant is asked to choose the correct label by touching the appropriate picture. (B) Having chosen the correct drug, the participant is asked to indicate whether he/she wishes to administer the displayed drug (which is usually the drug chosen, but may be a different drug displayed on a random basis) by touching the Yes or No buttons on the screen

In this study, the tests were set to 200 iterations of paired pictures consisting of 50 medications used commonly in anesthesia at ACH. Two experiments were conducted (see below). For each experiment, presentation in each pair was in one of two formats (A and B) specific to that experiment (Figs 1A and 2). Format A and Format B were presented randomly in equal proportions according to a sequence of computer-generated numbers, such that participants could not predict the format that was likely to follow. Each of the 50 medications appeared four times during each test (twice within the 100 Format A question pairs and twice within the 100 Format B question pairs). In the medication confirmation questions, the substitution rate for a false picture was set arbitrarily to 5% overall (60 out of 1,200 questions per participant), and the ratio of these substitutions was 20% intra-class (for example within the class of neuromuscular blockers) and 80% inter-class (for example between neuromuscular blockers and opioids), class being defined by the standard international colour code for user-applied drug labels.18
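The test composition described above can be sketched as follows. This is a minimal illustration, not the MRCT software itself: the function names and the placeholder drug names are hypothetical, shuffling a fixed deck is only one way to obtain an unpredictable format order, and the figure of six tests per participant (three shift stages in each of two shift types) is inferred from the 1,200 questions per participant stated in the text.

```python
import random


def build_schedule(medications, seed=None):
    """Return a shuffled list of (medication, format) question pairs for one test.

    Each medication appears twice in Format A and twice in Format B,
    as described in the text. The shuffling method is an assumption.
    """
    rng = random.Random(seed)
    pairs = [(med, fmt)
             for med in medications
             for fmt in ("A", "B")
             for _ in range(2)]
    rng.shuffle(pairs)  # random order, so the next format cannot be predicted
    return pairs


def substitution_counts(tests_per_participant=6, pairs_per_test=200,
                        substitution_rate=0.05, intra_class_share=0.20):
    """Return (total, substituted, intra-class, inter-class) confirmation questions."""
    total = tests_per_participant * pairs_per_test
    substituted = round(total * substitution_rate)
    intra = round(substituted * intra_class_share)
    return total, substituted, intra, substituted - intra


# Placeholder names only; the study's list of 50 drugs is not reproduced here.
meds = [f"drug_{i:02d}" for i in range(50)]
schedule = build_schedule(meds, seed=1)
print(len(schedule))            # 200 question pairs in one test
print(substitution_counts())    # (1200, 60, 12, 48)
```

Under these assumptions, the 60 substituted confirmation questions per participant would comprise 12 intra-class and 48 inter-class substitutions.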

Fig. 2

Three example medications as presented in each experiment. In Experiment-1, colour-coded labels were tested against pictures of conventional ampoules. In Experiment-2, colour-coded labels were tested against black and white labels. Ampoules and labels are not shown to scale here, but they appear 1:1 with physical equivalents in the computer test

Participant handling

All participants were familiarized with the MRCT and practiced the test before commencing the study. During training and before each trial, participants were instructed to strive for maximum speed and accuracy. Testing was undertaken on the third shift in a consecutive series of day or night shifts, respectively. Each participant acted as his/her own control. The type of shift run in which a participant started (either nights or days) was determined by the roster, and each test battery was conducted three times (at the start, middle, and end of the shift). To avoid any possible differences in processor speed, the same dedicated desktop computer was used to collect all MRCT data. All testing took place in a quiet room adjacent to the operating rooms. Participants were instructed to use their dominant hand to respond to every question.

Experiment-1: colour-coded labels vs ampoules

Format A: Pictures of standardized labels colour-coded according to a New Zealand and Australian standard19 (identical to standards registered in the United States, Canada, and the United Kingdom),20-22 with both the class and the name of the drug displayed in a large clear font (e.g., “Opioid” and “Fentanyl”). Less salient details, including those required by regulation, were displayed in smaller fonts. These labels incorporated a barcode and were designed in variants for use on prefilled syringes, on ampoules as flag labels, and as user-applied labels for syringes15 (Safer Sleep, Auckland, NZ).

Format B: Photographs of conventional ampoules with care taken to produce the best possible resolution and clarity (Fig. 2). All ampoules were represented 1:1 with their physical equivalents.

Experiment-2: colour-coded labels vs black and white labels

Format A: The same as Experiment-1, Format A. Format B: Similar to Experiment-1, Format A, but with black and white labels instead of colour (Fig. 2).

Data analysis

Software

Statistical analysis was undertaken with SPSS v.17 (SPSS, Chicago, IL, USA) and the statistical package R, version 2.8.23 Average daily sleep gained during each shift type was calculated using Sleep Analysis Software v. 5.0 (Cambridge Neurotechnology Inc, Cambridge, UK).

Data reduction

For each participant, the median reaction time (MRT) in milliseconds (msec) was calculated for the responses to each recognition question in each format (Question One) and each confirmation question in each format (Question Two). The MRT was chosen purposely after examining the distributions of data, because it was a valid measure of central tendency10 and because transforming the data would make interpretation of the reaction times less intuitive. The total number of errors was calculated for each participant for each of Questions One and Two for each format.
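The data reduction step can be illustrated with a short sketch. The record layout (keys such as `participant`, `question`, `fmt`, `rt_msec`, and `correct`) is hypothetical, the example data are made up, and the source does not state whether reaction times from incorrect responses were included in the median; this sketch includes them.

```python
from collections import defaultdict
from statistics import median


def reduce_responses(responses):
    """Collapse raw responses into a median reaction time (msec) and an
    error total for each (participant, question, format) cell."""
    times = defaultdict(list)
    errors = defaultdict(int)
    for r in responses:
        cell = (r["participant"], r["question"], r["fmt"])
        times[cell].append(r["rt_msec"])
        errors[cell] += 0 if r["correct"] else 1
    return {cell: {"mrt_msec": median(rts), "errors": errors[cell]}
            for cell, rts in times.items()}


# Made-up responses for one participant, one question type, one format.
demo = [
    {"participant": "P01", "question": "recognition", "fmt": "A",
     "rt_msec": 900, "correct": True},
    {"participant": "P01", "question": "recognition", "fmt": "A",
     "rt_msec": 1100, "correct": True},
    {"participant": "P01", "question": "recognition", "fmt": "A",
     "rt_msec": 4000, "correct": False},
]
summary = reduce_responses(demo)
print(summary[("P01", "recognition", "A")])  # {'mrt_msec': 1100, 'errors': 1}
```

In this made-up example, the single slow response of 4,000 msec leaves the median at 1,100 msec, whereas the mean would be 2,000 msec; this illustrates why a median can be preferred for skewed reaction time data without transformation.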

Label type comparisons

When examining differences in label types, separate analyses were performed within each experiment on each of the following outcome measurements: recognition MRT, confirmation MRT, and total errors. Outcomes were measured for each participant within a shift-type at a shift-stage, so tests were based on the paired differences between the label types. Participants were modelled as levels of a random factor, as they were considered to have been sampled from the population of anesthetic trainees in the region’s training program. Linear mixed-effects models with random intercepts were fitted to the distribution of differences using a normal error structure (identity link-function) and maximum likelihood parameter estimation. The model initially included the following fixed-effect covariates: 1) shift-type (night vs day), 2) shift-stage (start, middle, or end), and 3) sleep duration (minutes). Covariates were dropped from the model if there was no evidence of an effect at the 5% significance level (alpha = 0.05).
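One way to write the initial model is given below. The notation is an assumption introduced here for illustration, as the model is described only in words: d_{ijk} denotes the paired difference between label types for participant i in shift-type j at shift-stage k, the start of the shift is taken as the reference stage, and u_i is the random intercept for participant i.

```latex
\[
d_{ijk} = \beta_0
        + \beta_1\,\mathrm{night}_{j}
        + \beta_2\,\mathrm{middle}_{k}
        + \beta_3\,\mathrm{end}_{k}
        + \beta_4\,\mathrm{sleep}_{ij}
        + u_i + \varepsilon_{ijk},
\qquad
u_i \sim N(0, \sigma_u^2), \quad
\varepsilon_{ijk} \sim N(0, \sigma^2)
\]
```

In this form, the intercept \beta_0 estimates the mean difference between label types once covariates without evidence of an effect have been dropped.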

When outliers were present, we used the non-parametric Wilcoxon signed-rank test or the sign test (the latter test was used when data had multiple ties) on the paired differences (averaged across shift-stage for each participant), as indicated in the text. We obtained the exact confidence intervals (CI) for the median difference of the population by the algorithm described by Bauer,24 and we designated the level of significance, alpha, as 5%.
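As an illustration of the simpler of the two non-parametric tests, the following is a minimal sketch of a two-sided exact sign test on paired differences. The function name and the example differences are hypothetical, and discarding zero differences is a common convention assumed here rather than a detail taken from the study.

```python
from math import comb


def sign_test_p(differences):
    """Two-sided exact sign test P value for paired differences.

    Zero differences are discarded; under the null hypothesis the number
    of positive differences follows a Binomial(n, 0.5) distribution.
    """
    pos = sum(d > 0 for d in differences)
    neg = sum(d < 0 for d in differences)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up per-participant differences in error counts between two formats.
demo = [1, 0, 2, -1, 1, 0, 1, 3]
print(sign_test_p(demo))  # 0.21875
```

With five positive and one negative non-zero difference, the two-sided P value is 2 × (1 + 6)/64 = 0.21875.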

A sample size estimate was not undertaken because no prior data were available for this purpose. Instead, we recruited as many participants as practical during the time the first author was completing his PhD.

Day-night comparisons

When examining differences in shift-type (day vs night), the analysis was carried out on the combined Format A data (i.e., coloured label data) amalgamated from both experiments using a paired Student’s t test.
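The paired comparison can be sketched as below, using only the Python standard library. The function name and the example reaction times are made up for illustration. Converting the t statistic to a P value or confidence interval requires the Student's t distribution, which the standard library does not provide, so that step is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(night, day):
    """Return (mean difference, t statistic, degrees of freedom) for
    paired night-minus-day observations, one pair per participant."""
    diffs = [n - d for n, d in zip(night, day)]
    n = len(diffs)
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the differences
    return mean_diff, mean_diff / standard_error, n - 1


# Made-up per-participant MRT values (msec) for four participants.
night = [1010, 980, 1120, 1050]
day = [950, 990, 1040, 1000]
mean_diff, t_stat, df = paired_t(night, day)
print(mean_diff, round(t_stat, 3), df)  # 45 2.324 3
```

Because each participant contributes both a night and a day value, the test operates on the within-participant differences, which removes between-participant variation in baseline reaction time.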

Results

Fourteen trainees consented for Experiment-1 and ten trainees consented for Experiment-2. No trainee who was approached at the outset declined to participate, but four and two trainees, respectively, were then unable to participate because of rostering exigencies. At the end of the study, usable MRCT data were available from 18 subjects, all aged < 40 yr (ten in Experiment-1: five male, five female; and eight in Experiment-2: three male, five female); actigraphy data were available from 12 of these participants (five in Experiment-1 and seven in Experiment-2). Missing data were attributable to lack of compliance in completing the tests and equipment failure.

Sleep data

Actigraphy data were in concordance with sleep diaries (sleep diary data not shown). Participants slept an average of 6:36 hr during day shifts compared with 5:39 hr during night shifts. This result corresponds to 57 min less sleep (95% CI 0:15-1:39 hr; P = 0.013, paired Student’s t test) per 24-hr period while working night shifts than while working day shifts (Fig. 3).

Fig. 3

Paired comparisons of mean sleep obtained by anesthesiologists during day and night shifts. Mean difference was 57 min (95% confidence interval 0:15-1:39 hr; P = 0.013, paired Student’s t test)

Experiment-1: colour-coded labels vs ampoules

With shift-stage omitted from the model, the mean recognition MRT for colour-coded labels was 332 msec faster (95% CI 242-422 msec; P < 0.0001, mixed-model analysis) than that for conventional ampoules (Figs 4A and 5). When shift-stage was included in the model, the evidence for a difference in mean recognition MRT remained strong (P < 0.001, mixed-model analysis), but the magnitude of the difference diminished as shift-stage progressed from early through mid to late (Table). Only four medications were recognized more quickly on average in ampoule format than in label format (metoclopramide, potassium chloride, tenoxicam, and water) (Fig. 5).

Fig. 4

Parallel line plot comparing individuals’ median reaction times in drug recognition. Panel (A) Experiment-1 comparison between colour-coded labels ( ) and conventional ampoules (●) (n = 10). Panel (B) Experiment-2 comparison between colour-coded labels ( ) and black and white labels (○) (n = 8)

Fig. 5

Experiment-1 comparison of medication recognition times between colour-coded labels ( ) and conventional ampoules (●) (n = 10). Each point represents the mean of all participants’ median reaction times to each medication. Error bars indicate 95% confidence intervals

Table The magnitude of the difference between colour-coded labels and ampoules was dependent on shift stage

Mean confirmation MRT for colour-coded labels was 40 msec faster (95% CI 15-66 msec; P = 0.0028, mixed-model analysis) than for ampoules (Fig. 6A).

Fig. 6

Parallel line plot comparing individuals’ median reaction times in drug confirmation. Panel (A) Experiment-1 comparison between colour-coded labels ( ) and conventional ampoules (●) (n = 10). Panel (B) Experiment-2 comparison between colour-coded labels ( ) and black and white labels (○) (n = 8)

There were 38 errors, 17 errors with the colour-coded label format and 21 errors with the ampoule format (with no evidence of a difference in proportions; P = 0.47, mixed-model analysis). There was no evidence of the effects of shift-stage order on confirmation MRT or error rate. No drug stood out as being more error prone than the others.

Experiment-2: colour-coded labels vs black and white labels

The mean recognition MRT for colour-coded labels was 96 msec faster (95% CI 46-146 msec; P < 0.0001, mixed-model analysis) than that for black and white labels (Fig. 4B). The mean confirmation MRT for colour-coded labels was 16 msec faster (95% CI 6-26 msec; P = 0.0023, mixed-model analysis) than that for black and white labels (Fig. 6B). There was no evidence of the effects of shift-stage on recognition or confirmation MRT.

A total of 76 errors were made, 42 with the colour-coded label format and 34 with the black and white label format (P = 0.07, Wilcoxon signed-rank test).

Day vs night comparisons: analysis of amalgamated colour-coded label format data (Format A in both experiments)

There were 20,600 responses to the MRCT questions for colour-coded labels (Format A) (10,300 recognition questions and 10,300 confirmation questions). Sleep duration, shift-stage, and order showed no evidence of affecting MRT, so these factors were dropped from the initial model. The mean recognition MRT was 56 msec slower (95% CI -2 to 115 msec; P = 0.06, paired Student’s t test) during night shifts than during day shifts (Fig. 7).

Fig. 7

The distribution of differences (night – day) in median reaction times for recognition responses and confirmation responses (n = 18). Points are shown perturbed on the x-axis to aid in discrimination. Horizontal lines denote the means of the group sample

Mean confirmation MRT was 60 msec slower (95% CI 1-120 msec; P = 0.048, paired Student’s t test) during the night than during the day (Fig. 7). Analysis of recognition and confirmation errors together revealed no evidence of a difference between night and day shifts (P = 0.29, mixed-model analysis).

Fifty-nine errors were made with colour-coded labels, 28 during day shifts and 31 during night shifts.

Discussion

In Experiment-1, photographs of colour-coded drug labels were selected more quickly than photographs of ampoules, and they were also confirmed more quickly as the required drug when subsequently presented. However, there was no difference in error rates between these two formats.

In Experiment-2, selection and confirmation were also quicker with colour-coded labels than with equivalent black and white labels. Forty-two errors were made with the colour-coded labels vs 34 with black and white labels (P = 0.07).

In Experiment-1, the difference in selection times diminished as the shift progressed from early to mid to late. This result may have been an effect of fatigue, but no other effects of shift-stage were identified, so it is difficult to draw conclusions about this possibility. We tested for effects of order and found none, suggesting no important learning component in the test.

Participants obtained almost one hour less sleep while on night shifts than while on day shifts (Fig. 3). Both drug recognition time (P = 0.06) and drug confirmation time (P = 0.048) were slightly slower during night shifts. However, we found no differences between the night and day shifts with regard to error rates.

The value of these data lies in the clinical relevance of the simulated task. The participants in this study were participating in standard rostered shifts, and the only deviation from normal clinical practice was the requirement to perform our tests. We did not control for caffeine, alcohol consumption, or sleep medication. We were surprised that almost all participants obtained considerably less than the eight hours of sleep considered usual25 on both day and night shifts. The relatively small difference between day and night shifts in hours of sleep (approximately one hour) can be attributed to the fact that participants often had the opportunity for sleep while on night duty. Recent data suggest that continuously restricting sleep to six hours each night can have a deleterious effect on vigilance equivalent to 24-hr acute sleep deprivation,26,27 and self-awareness of reduced performance tends to plateau.27,28 The trainees who were studied in our hospital had a tendency to fall within this category, at least on night shifts, despite the relatively rigorous restrictions on work hours in New Zealand. Impairment of anesthesiologists’ performance due to loss of sleep has been shown previously in a similar New Zealand setting.10

Taken collectively, these findings suggest an advantage in favour of the labels that were studied (characterized by a standardized layout, a large clear font, and various other features)15 over most of the varied presentations manufacturers use on their ampoules. The results also provide limited support for the use of colour-coding, although the advantage conferred by this was restricted to speed of selection and confirmation rather than to increased accuracy.

A limitation of this study lies in its relatively small number of participants (the logistics of collecting data from practicing clinicians during actual periods of work at different times of night and day were challenging) and the fact that they all were a part of a single regional training scheme. Therefore, our findings may not apply beyond the population studied. The strength of this study is its paired design - each individual acted as his/her own control. Also, we were able to collect data from many repetitions of the test.

Questions relating to the presentation of drug information may be easier to study than those relating to time of day. We took great care to make the photographs as clear as possible, but results could be different if the actual ampoules were to be tested against the actual labels. It is also possible that performance in a clinical context could differ from that reported here. On the one hand, performance could be worse if there were distractions; on the other hand, perhaps participants would perform better if they were making decisions with consequences for real patients. The absolute reaction and confirmation times in this study are relatively arbitrary, and one might argue that differences of 330 msec (recognition times) or 40 msec (confirmation times) between colour-coded labels and ampoules are not clinically relevant in themselves. However, these times reflect the mental effort required to complete the tests,13 and we therefore infer that quicker reaction times to standardized drug representations correspond with easier recognition of the drugs. Over a large number of administrations, quicker reaction times are likely to facilitate the avoidance of errors. We were surprised not to find a difference in error rates between label types. Despite being instructed to complete the tests as quickly and accurately as possible, the participants in the study knew we were investigating drug error and may have sacrificed speed in favour of accuracy.

The value of colour-coding in promoting safety in drug administration continues to be debated.29,30 The fact that there is room for improvement in the legibility and distinctiveness of some ampoules is less controversial2 – but it remains very difficult to persuade regulators or manufacturers to act on the well recognized problem of so-called “look-alike sound-alike”31 drugs.

The experiments described here employ a simple form of simulation. Complexity in simulation can be varied, from relatively simple task-oriented simulations such as this through to immersion high-fidelity simulation.32 By capitalizing on the strengths of each level (in this case low cost and ease of multiple tests) the same questions can be studied in different ways. Thus, the methods presented here could be used to obtain the information needed to perfect the presentation of drugs, i.e., investigating such features as font size, tall man lettering, and the use of auditory as well as visual presentation of key information.15 One could also test for error rates between drugs that look the same rather than using random substitutions as we did in the present trial. Once this has been accomplished, more complex simulation methods could then be used to verify that these findings apply in the wider context of administering a full anesthetic6 before finally confirming them in a clinical setting.

In conclusion, we have presented a novel task-relevant test of performance in relation to drug administration. We have shown that there are advantages to having legible standardized drug labels that are colour-coded for class of drug, and we have added to the evidence that shift effects (and likely fatigue associated with sleep restriction) influence performance in an anesthetic context.