Background

There has been increasing impetus worldwide on preventing medical errors, which have been estimated to be one of the leading contributors to poor outcome including mortality [1]. An important type of medical error concern diagnosis and clinical decision making, since they are highly prevalent, often preventable, and generally lead to greater morbidity and mortality than other types of error [2]. In fact, the majority of complaints brought against emergency physicians arise from delayed or missed diagnoses [3]. These errors in diagnosis and clinical decision making have been shown to arise due to technical/knowledge deficits, system related factors or cognitive errors. Of these cognitive errors have been shown to be major contributors (in more than three fourths of errors involving diagnosis and clinical decision making) [4, 5].

Cognitive errors may stem from inappropriate application or complete failure to apply knowledge or suboptimal ‘cognitive dispositions to respond’ (CDR) or biases [4, 6, 7]. The term CDR has been suggested to replace the term ‘bias’, to remove the negative connotations associated with the later [8]. These biases represent the manner in which a physician orientates and responds to the presenting complaints, symptoms and signs under circumstances of uncertainty [8]. Cognitive biases can prompt clinicians to make errors when pruning, selecting and/or validating a diagnosis resulting in missed/wrong diagnoses and treatment [6, 915]. There are several factors which predispose to cognitive errors such as fatigue and sleep deprivation, affective state, patient factors, ambient conditions and past experiences [8].

The cognitive errors have been studied extensively in high-stakes safety industry such as aviation [16, 17]. However, in the medical literature the focus has been relatively recent [6, 1822]. Also, the evidence on cognitive errors largely originates from studies on speciality trainees, and reveal that practicing physicians and trainees demonstrate considerable susceptibility to various cognitive errors. The initial post graduate years involve a sharp learning curve [23], wherein foundations of their non-technical and technical skills are laid with implications on their future performance. Therefore, identifying the common cognitive errors and how they impact performance can help prioritising them for focused interventions [24, 25]. In the absence of a formal training in the current curriculum, clinical experience alone would have to be relied on to attenuate the harmful effects of cognitive errors. However, whether this is the case, and to what extent is unknown.

To study cognitive errors in medicine, doctors are usually made to engage in cognitive tasks of diagnosis synthesis and decision making under experimental conditions. The majority of studies have used case notes and low-fidelity computer screen-based simulations to engage doctors in these cognitive tasks. The degree of immersion, cognitive, physical and affective load imposed by widely used methods of case notes and computer screen-based simulations, is different from what doctors experience in real life setting [26, 27]. Although there is controversy as to the learning efficacy of low versus high-fidelity simulation; [28] for research studies on cognitive errors, the latter may be better suited [27, 2931]. Hence, clinical problems, recreated in a high-fidelity simulated environment, with engineered complexity, uncertainty and temporal demands, is likely to reflect the real life challenges more closely. Observing and interviewing doctors who immerse and engage with patients and other staff in such an environment provides an opportunity for assessment of various technical and non-technical skills including diagnosis synthesis and decision making [32]. Occurrence of various cognitive errors and their impact on diagnosis synthesis and decision making can also be studied [25, 32], as various factors such as content knowledge, scenario complexity, environment, time of the day, experience etc. can be controlled to certain degree.

In this pilot study, we investigate the relative occurrence of selected cognitive errors amongst doctors in their first post-graduate year, using high fidelity clinical simulation as a tool.

Besides cognitive errors, the changes in teamwork and leadership skills were also assessed during the first post graduate year; as this is one of the main learning objectives of the existing simulation program.

Methods

The study involved review of intern (PGY1) simulation sessions conducted in year 2014 at Flinders medical centre, South Australia. The study was approved by Southern Adelaide Clinical Human Research Ethics Committee (91.15).

Simulation program

Simulation involved high-fidelity scenarios, using the SimMan simulator (Laerdal Medical, Wappingers Falls, NY, USA), and actors (trained simulation centre staff) representing other team members (surgeon, nurse, medical emergency registrar etc.). Each session started with briefing, followed by simulation and a debriefing session. Video assisted debriefing [CAE LearningSpace™ (CAE Healthcare, Canada)] was used with focus on technical skills, teamwork and leadership. The predominant debriefing style involved advocacy inquiry [33]. This involves pairing observations of participant’s actions (or inaction) with inquiry into participants’ thought processes during that moment.

The four simulation scenarios chosen for study were–(A) acute onset breathlessness in a post-operative knee surgery patient, secondary to cardiogenic pulmonary edema; (B) pain in abdomen and dizziness in a day 1 post hemi-colectomy patient who has deteriorating hypovolemic shock (covert to overt), secondary to intra-abdominal bleeding; (C) low-conscious state in a trauma patient secondary to narcotic overdose resulting from a prescription error, and (D) low-conscious state due to hypoglycaemia in a fasting diabetic patient awaiting surgery, where regular insulin dose was not withheld (Appendix). These scenarios were developed from actual incidents in hospital for learning purposes. In each quarter of internship year, the interns were exposed to one of the four scenarios. The sequence of exposure to these scenarios over the year was random. Each simulation scenario began with the nurse calling the doctor to see the unwell patient. Towards the end of session, the senior help (usually an ICU or admitting consultant) enters and requests a handover from the team.

Think-aloud: In think-aloud studies, people are asked to verbalize their thinking while performing tasks [34]. Researchers using this technique typically both observe and audiotape or videotape the participant [35]. Although it is often used, the technique is not entirely undisputed and there is some criticism in literature that thinking aloud and the limited capacity of memory may hinder the cognitive processes of the participant, thus affecting performance [3638]. On the other hand, there is broad support for it as well, and it is often seen as the only available ‘next best thing’ to–the currently impossible–‘direct observation’ of thought processes [39]. An alteration to the conventional ‘think-aloud’ protocol is the use of ‘verbal probing techniques’ [40]. Immediately following a particular response, questions (probes), are asked to help reveal the thought process [41]. Investigators have previously found that asking post-process questions to subjects provided valuable information that made think aloud data easier to understand and interpret [42, 43].

The participants in our study were not given instructions beforehand to verbalize their thoughts. Instead their thought process was probed (prompts) by the patient, the nurse and the ‘senior help’ to help reveal their thoughts during simulation. Prompts by the patient were initiated when the participant appeared to have gathered initial patient data i.e. history and physical exam. Prompts such as “what’s wrong with me doctor?” and “what does that mean?” were used to get insight into working differentials and participant’s understanding of pathophysiology and synthesis from the clinical data. The prompts from the nurse involved “what are you thinking?” and “why do you think so?”. These prompts were initiated if the participant initiated treatments without verbalizing the diagnosis or clinical problem or the participant verbalizes the diagnosis without the basis for it. The processing framework for think-aloud data was similar to that described previously [44]. Verbal data was transcribed and broken down into smaller components by predefined codes such as signs, symptoms, action, treatment, labs, monitoring. Scripts were then analysed to ascertain relationship between codes and identification of inductive and deductive reasoning processes.

Towards the end of session, the senior help arrived and asked the participants to verbalize their assessment of the situation and actions, along with the basis for it. If the participants were not correct with the diagnosis, they were asked if there could be other differentials and what were the findings for and against each differential. By comparing responses against checklist of key expectations, this interview process allowed for gaining an estimate about their knowledge around pertinent diagnoses and management. For example, the participant was identified to have adequate knowledge if he/she was aware of bradypnoea, hypothermia and pin-point pupils, and use of naloxone as a part of narcotic toxodrome diagnosis and management respectively.

Recording of cognitive errors

For pragmatic reasons, only selected cognitive errors (Table 1) were studied, of the more than 30 reported in literature [8]. The basis for selecting these was the prevalence as reported in previous studies [45], minimal overlap in theoretical construct, coverage of both diagnosis synthesis and treatment decisions and feasibility of studying them based on video review of simulation sessions. The simulation scenarios were developed to focus on technical skills (diagnosis and treatment) and non-technical skills of teamwork and leadership.

Table 1 Catalogue of cognitive errors used in this study [45]

To identify the occurrence of cognitive errors, two investigators (SP and SB) independently analysed the video recording and ‘think aloud’ comments from participants [CAE LearningSpace™ (CAE Healthcare, Canada)]. The observations were recorded using a Likert scale based questionnaire tool (Table 2). This questionnaire was adapted from a previously published questionnaire [45], which was used in a similar context of simulated in-hospital acutely sick patient. The assessors, similar to our study, were trained and had expertise in the field. Also, the questionnaire was demonstrated to be reliable (Cronbach’s α 0.81). Both the investigators have expertise in the management of critically ill patients and underwent training at using this questionnaire, which involved (i) familiarising themselves with the theoretical constructs of cognitive errors and the questionnaire, (ii) going through published examples of such cognitive errors, (iii) observing 10 randomly selected simulation sessions and applying questionnaire and discussing differences if any and (iv) discussing how in each of the four study scenarios, the specific cognitive error may manifest. While both the investigators were aware of the main study objective of feasibility, one investigator (SB) was blinded to other research questions. The investigators indicated whether or not a particular cognitive error was made by a given participant using a five point Likert scale, with the following anchors: ‘Strongly disagree’; ‘Disagree’; ‘Neutral/Unsure’; ‘Agree’; ‘Strongly agree’. The theoretical construct of cognitive errors obtained from previous publication allowed for two items per error, without increasing the risk of redundancy, cognitive load and fatigue on the assessors, while still preserving the face validity. Anchoring was reflected by questions 2 and 9, commission by questions 3 and 10, confirmation by questions 6 and 13, overconfidence by questions 4 and 11, omission by questions 1 and 8, premature closure by questions 5 and 12, and search satisfying by questions 7 and 14. For each cognitive error, the scores were averaged to develop final score that reflected three final outcomes–‘Present’, ‘absent’ and ‘can’t be assessed or unsure’. A cognitive error was considered as either present or absent, if both the assessors agreed. The outcome was ‘unsure/can’t be assessed’ if marked by both the assessors as ‘unsure/can’t be assessed’ or there was difference in marking.

Table 2 Questionnaire tool to assess cognitive errors

Recording of non-technical skills

The non-technical skills were evaluated using previously validated, Ottawa Crisis Resource Management Global Rating Scale (Ottawa GRS) [46, 47]. The Ottawa GRS is divided into five categories of crisis resource management (CRM) skills based on recognized CRM literature: problem solving, situational awareness, leadership, resource utilization and communication. An overall rating category for CRM performance is also provided. Each category is measured on a seven-point anchored ordinal scale with descriptive anchors to provide guidelines on alternating points along the scale.

Statistical methods

Both descriptive and inferential statistics were used. The results are presented using tables and graphs. Data is presented as median and Interquartile range (IQR), unless otherwise specified. Prevalence of cognitive errors is reported as a ratio of number of observed error type to total number of scenarios. Normality was assessed using Kolmogorov-Smirnov test. Associations between categorical and continuous variables were assessed using Chi-square test and Spearman’s Rho respectively. The relationship of delay in diagnosis and time spent in internship, with the occurrence of cognitive error, was studied after adjusting for scenario type, using generalized linear modelling and logistic regression. Time to diagnosis (dependent variable) across scenario types (factor variable) was compared using one way ANOVA. Trends in Ottawa GRS score over the PGY1 year was assessed using the Friedman test. The internal consistency and inter-rater agreement (for the final outcomes of ‘Present’, ‘absent’ and ‘can’t be assessed or unsure’) of questionnaire tool for assessment of cognitive errors was analysed using Cronbach’s alpha and Kappa statistic respectively. The thresholds for Cronbach’s alpha [48] and Kappa statistic [49] were 0.7 and 0.6 respectively. Analysis was performed using SPSS version 21 (IBM Corp., Armonk, NY, USA). For all tests, a two-sided P value of less than 0.05 was considered significant.

Results

A total of 70 simulation sessions were reviewed over the year 2014. A total of 25 interns participated in these sessions as primary responder. There were 17 sessions each of cardiogenic pulmonary oedema, narcotic overdose and hypoglycaemia, and 19 sessions of haemorrhagic shock.

Performance of questionnaire

The inter-rater agreement and internal consistency was good for all cognitive errors assessed, except ‘overconfidence’ (Cronbach’s alpha 0.46; Kappa statistic 0.17 for overconfidence).

Overall prevalence of cognitive errors

Total number of studied cognitive errors observed per session ranged from 0 to 5, with a median (IQR) of 3 (2–4) errors per session. Frequent cognitive errors, observed in more than 75% of sessions, included search satisfying (90%), followed by premature closure (78.6%), and anchoring (75.7%) (Fig. 1). No instances of overconfidence were identified. The uncertainty was most frequent in the assessment of omission bias; whereas it was least frequent in the assessment of anchoring and search satisfying.

Fig. 1
figure 1

Overall prevalence of cognitive errors

Impact of occurrence of cognitive errors on diagnosis synthesis

The correct diagnosis was reached in 57 (85.1%) sessions, with a median (IQR) time of 4.05 (2.3–6) minutes. The scenarios differed in time to reach diagnosis (P = 0.04). After adjusting for the scenario type, anchoring appeared to be significantly associated with delay in diagnoses (β = 0.2, 95% CI 0.5 to 3.0, P = 0.007) and also occurrence of premature closure (x2 = 9.4, P = 0.005). There was a trend towards negative association between occurrence of confirmation bias and the ability to make a correct diagnosis (x2 = 5.8, P = 0.05).

Occurrence of cognitive errors with time spent in internship

As shown in Table 3, there was no association of occurrence of particular cognitive error with scenario type, except for omission bias. After adjusting for scenario type, the odds of occurrence of various cognitive errors did not change with time spent in the internship year (Table 3).

Table 3 Association of cognitive errors with scenario types and odds of their occurrence with increasing internship time exposure

Other non-technical skills

There was overall improvement of OTTAWA GRS scores during the internship period (x2 = 11.9, P = 0.01) (Fig. 2). As shown in Fig. 2, there was improvement in skills involving leadership (x2 = 28, P < 0.001), problem solving (x2 = 8.1, P = 0.04), resource utilization (x2 = 19.7, P < 0.001) and communication skills (x2 = 19.4, p < 0.001). No significant trend was seen in situational awareness (x2 = 1.9, P = 0.6).

Fig. 2
figure 2

The Ottawa GRS score for non-technical skills across four quarters during the internship year

Review of interview between ‘senior help’ and the participant revealed that although all participants had knowledge of clinical presentation of these diagnoses/syndromes, there were frequent deficits in the knowledge around management.

Discussion

Our study shows that cognitive errors are highly prevalent amongst junior doctors in their first post-graduate year, particularly anchoring, premature closure and search satisfying. Although junior doctors’ clinical experience increased during the year leading to a significant improvement of their non-technical skills of teamwork and leadership, there was no change in the occurrence of studied cognitive errors. This is unfortunate because we also found that anchoring and confirmation bias did not only impact on the ability to make a diagnosis, but also resulted in a delay in making a diagnosis. Moreover, anchoring was associated with premature closure.

Observed cognitive errors and their impact on diagnosis synthesis

The most frequent cognitive errors were ‘anchoring’, ‘premature closure’ and ‘search satisfying’. Anchoring was significantly associated with delay in reaching a correct diagnosis and premature closure. Anchoring represents the tendency to fixate on the hypothesis generated from a specific feature of a presentation early in the diagnostic process, at the expense of understanding the whole situation. Recovery from this initial anchor involves ‘adjusting’ to more appropriate diagnosis, by seeking additional data for and against the diagnostic hypothesis generated from the initial feature [50]. This is therefore likely to delay reaching correct diagnosis by the adjusting process [50], either rendered insufficient by premature closure, or flawed by confirmation bias. In confirmation bias, time is spent to recruit selective information to reinforce the initial impression that might have been generated due to anchoring; in which case, the correct diagnosis may not be even considered. The occurrence of confirmation bias becomes more likely when the information is presented sequentially, as is usually the case in clinical medicine [51]. This also reinforces the value of high-fidelity patient simulation for studying cognitive errors over case notes or computer screen-based tools, where the information is available upfront. The highly prevalent search satisfying by definition resulted in high incidence of failure to search for causes or contributing factors, complications and co-existing diagnoses. Attempts at identifying these are likely to result in more successful and effective management.

Overconfidence bias was not observed in our study. This is contrary to findings in the literature of considerable prevalence [12]. However, susceptibility to overconfidence bias may vary with seniority and may actually be low in fresh medical graduates, as supported by findings of Friedman et al. [52].

Despite the high prevalence of cognitive errors, the correct diagnosis was considered in majority of sessions. This is perhaps not as illogical as it seems: heuristics and biases that may underlie these errors, evolve over time to help perform under circumstances of uncertainty and high cognitive load, and hence they may not be always associated with poor outcome [53]. Also some of the cognitive errors studied may not impact diagnostic accuracy, but may influence the choices in management, eg. search satisfying may lead to missed co-diagnoses or complications. Increase in complexity of case, temporal demand or fatigue may uncover the potentially harmful effects of biases or cognitive dispositions. Besides these possibilities, it has also been shown that even if the correct diagnosis appears as one of the differentials, it is rarely rejected during the diagnostic process [54].

High fidelity simulation as a research and teaching tool

Our pilot study is one of the first that demonstrates the feasibility of immersive high-fidelity clinical simulations to study cognitive errors, combined with a think aloud technique and a questionnaire based tool. This is a step forwards from previous studies utilizing predominantly case notes or computer screen based tools to study cognitive errors. Our study demonstrated the utility of simulation in eliciting how they may negatively impact clinical performance such as diagnosis synthesis and treatment decisions. Guided reflection here is likely to help develop catalogue of examples of situations where the potentially harmful effects of biases or cognitive dispositions may occur (exemplar theory [55]) and also motivate participants to learn (motivation theory [56]).

There was a significant improvement in teamwork and leadership skills. The efficacy of simulation based training for these skills have been demonstrated extensively in literature [57]. Similarly, focused simulation based training could help doctors to recognize and manage situations where they are at risk for cognitive errors. The applied psychology concepts, common to simulation based learning, of metacognition and ‘de-biasing’ strategies have been demonstrated in other fields to be effective in cognitive error prevention and recovery [5861].

In the absence of formal training, clinical experience is often relied on to mitigate cognitive errors. In our study, 1 year of clinical experience did not appear to influence the occurrence of various cognitive errors. It is unlikely that the context of scenarios alone would have yielded this result as: (i) the sequence of scenarios were not fixed, (ii) the pattern of occurrence of cognitive errors across scenarios was largely similar and (iii) we adjusted for scenario type in the regression analysis. Since the study was not designed for this research question, the finding can only be regarded as hypothesis generating, to be tested in a larger prospective study. In support of this finding, however, is a prospective observation study which revealed that the clinical reasoning process remains relatively constant from medical school entry to practice [62]. Also, the high prevalence of cognitive errors amongst speciality trainees and practicing physicians suggests that experienced clinicians are as likely as junior colleagues to experience adverse cognitive dispositions or biases [63, 64]. There could be few reasons as to why clinical experience alone may not suffice as a remedy for cognitive errors. Firstly, the outcomes in real life may not be immediately visible and secondly, the learning that occurs from experience require unbiased and informed reflection (Kolb’s experiential theory) [65] on what cognitive errors are and how they can impair clinical performance. Observing clinicians during their work to enhance experiential learning around cognitive errors has ethical limitations as the error would need to be interrupted or ideally avoided, so that the patient is not harmed. Here simulation based learning may be beneficial as the errors can be allowed to evolve and serve as a learning catalyst.

Resources involved in delivering high fidelity simulation based teaching

Simulation based teaching intervention can be resource intensive, and there is limited data in literature. Lapkin et al. reported that marginal cost of high fidelity simulation amounted to $ 291.26 per participant [66]. For our simulation based teaching, we used the resource cost model to estimate the cost [67]. There is one time cost of setting up of simulation facility which also includes purchase and installation of Mannequin, simulation software and computers. There are also recurring costs, of which the significant ones include equipment and materials, and personnel cost. Under our business model, the simulation based activity in the current project was a part of a larger simulation and clinical skills programme, used for training of medical students, post graduate doctors, nurses and allied staff. Hence the above costs which otherwise are significant (approximately $200,000) would be distributed across several teaching programs. Similarly, the cost involved in scenario development and programming was about $364.62 per scenario (6 h–$60.08/h). This cost again would be distributed across 30 to 40 participants each year, thereby significantly reducing the cost per participant. Per participant, the personnel cost involved in simulation based teaching was $157 (0.5 h nursing time, 0.6 h of 2 medical facilitators and 0.3 h technical and administrative time).

Limitations and future research

We used retrospective study design which allowed for assessment of authentic performance of participants and their thought process using think-aloud transcripts. Although we were able to control for uncertainty, a prospective study using direct interview may help minimize uncertainty in assessment and also further validate the questionnaire based tool. The inter-rater agreement and internal consistency in assessment of Overconfidence was poor. This is likely to be highly skewed observations towards absence of Overconfidence.

Although, the assessors were well trained for making observations, there is no independent corroboration of assessor’s observations. Also only one of the two assessors was blinded to the research questions. Although this may have not affected the main objective of the study, it may impact on specific hypothesis testing. In future studies, it is essential that the assessors are blinded to the hypothesis being tested.

We used modified think-aloud technique using verbal probes. Although such technique allows for more focused information gathering, while minimizing interruptions, it can potentially introduce bias. Certain probe questions can be perceived as a cue to reconsider their thoughts/decisions. Despite their usefulness, unfortunately both conventional think-aloud technique and verbal probing have inherent shortcomings which are difficult to overcome [3638]. Besides think aloud technique, there are other methods such as script concordance approach [68] and post-hoc cognitive walkthroughs [69] using video review, which may be trialled in future studies.

Cognitive errors were detected based on outcome measures, which could have been influenced by factors such as knowledge deficit. Since our assessment revealed deficits in knowledge around management of the simulated syndromes/diagnoses, some of these encountered errors may have originated from lack of knowledge, such as omission bias. This highlights the importance of assessment of the relevant content knowledge of participants, in order to better assess the origin of observed cognitive error. This is important to know, as interventions would differ. We did not have pre-defined standardized framework for assessment. Hence, although there was flexibility to modify assessment to the context, scenario and participant; this resulted in heterogeneity in assessment technique. A pre-formulated structured assessment framework will allow uniformity in assessment and needs to be trialled in a prospective study.

The results of our study need to be triangulated with a larger prospective study on cognitive errors using immersive high fidelity simulation as a tool. Also by controlling for various factors such as complexity of scenario, cognitive load, participant experience etc., and more objective assessment of participant knowledge base, it needs to be determined as to how and under what circumstances the cognitive dispositions or biases result in negative outcome. This is likely to provide essential evidence to facilitate studies into how simulation based education could be used to train doctors to avoid, identify and recover from cognitive errors.

Conclusions

Our study demonstrated a high prevalence of anchoring, premature closure, and search satisfying amongst doctors in their 1st postgraduate year. The clinical experience gained during their first post graduate year and simulation based training focused on teamwork and leadership was associated with acquisition of these skills. However, there was no change in the prevalence of various cognitive errors. Further prospective research is needed to validate the results and explore the utility of simulation based intervention to help doctors manage their cognitive errors to improve clinical reasoning.