Information-gathering patterns associated with higher rates of diagnostic error

  • John E. Delzell Jr
  • Heidi Chumley
  • Russell Webb
  • Swapan Chakrabarti
  • Anju Relan

DOI: 10.1007/s10459-009-9152-8

Advances in Health Sciences Education (2009) 14: 697

Abstract

Diagnostic errors are an important source of medical errors. Problematic information-gathering is a common cause of diagnostic errors among physicians and medical students. The objectives of this study were to (1) determine whether medical students’ information-gathering patterns formed clusters of similar strategies and, if so, (2) calculate the percentage of incorrect diagnoses in each cluster. A total of 141 second-year medical students completed a computer case simulation. Each student’s information-gathering pattern comprised the sequence of history, physical examination, and ancillary testing items chosen from a predefined list. We analyzed the patterns using an artificial neural network and compared percentages of incorrect diagnoses among clusters of information-gathering patterns. Patterns were input into a 35 × 35 self-organizing map, and the network was trained for 10,000 epochs. The number of students at each neuron formed a surface that was statistically smoothed into clusters. Each student was assigned to one cluster, the cluster that contributed the largest value to the smoothed function at the student’s location in the grid. Seven clusters were identified. The percentage of incorrect diagnoses differed significantly among clusters (range 0–42%, χ2 = 13.62, P = .034). The distance of each cluster from the worst-performing cluster was used to rank the clusters, and this ranking was compared to the ranking determined by percentage incorrect. We found a high positive correlation (Spearman correlation = .893, P = .007): clusters closest to the worst-performing cluster had the highest percentages of incorrect diagnoses. Patterns of information-gathering were distinct and had different rates of diagnostic error.

Keywords

Clinical reasoning · Diagnostic error · Information-gathering pattern · Undergraduate medical education

Introduction

Medical errors are an important cause of morbidity and mortality in the United States (Kohn et al. 1999). Systemic factors contribute to the high rate of medical errors (Gandhi et al. 2006; Singh et al. 2007a, b); however, individual cognitive factors remain the last line of defense before an error is made (Zhang et al. 2002). Recent attention has turned to cognitive errors, specifically diagnostic errors, which comprise a large proportion of medical errors (Croskerry 2003; Gandhi et al. 2006; Singh et al. 2007a, b). Diagnostic errors may occur during any step of the diagnostic reasoning process (Bornstein and Emler 2001; Kassirer and Kopelman 1989; Kempainen et al. 2003; Mamede et al. 2007). Although a single diagnostic reasoning model has not emerged, most models describe common steps of diagnosis generation, refinement, and validation (Bornstein and Emler 2001; Kempainen et al. 2003; Mamede et al. 2007). Information-gathering influences each step in the diagnostic process and problematic information-gathering has been implicated as the most common reason for diagnostic errors in primary care (Singh et al. 2007b). Studying errors arising from inadequate information-gathering has been recommended as an important research focus (Singh et al. 2007a).

Cognitive skill development

Diagnostic reasoning is a cognitive skill that can be developed. In deGroot’s theoretical model of developing expertise in cognitive skills, people begin as novices and then become acceptable performers, a stage in which few gross errors are made and experience accounts for adequate performance in typical situations. At this stage, many people improve or decline based on their experiences, but few develop into experts (deGroot 1965).

The information-gathering pattern (the history questions asked, the physical examination performed, and the sequence in which this information is sought) varies across developmental stages. Medical students begin as novices and at that stage use an exhaustive, illogical strategy (deGroot 1965). During medical school and residency training, most advance to a level of acceptable performance, where experience augments reasoning and accounts for acceptable performance in most instances. Physicians at the acceptable performance stage focus their information-gathering on what has been relevant in prior similar situations, which limits an acceptable performer’s ability to make a correct diagnosis in a novel situation (Norman et al. 2007). This experience-based, non-analytic reasoning (pattern recognition) is seen in students, residents, and practicing physicians (Norman et al. 2007). Expert diagnosticians use a focused, adaptive strategy to gather information, based on a deeper understanding of the subject. Although experts use pattern recognition (Norman et al. 2007), they know when to deviate from that strategy and can arrive at a correct diagnosis by reasoning even in novel situations. These different information-gathering patterns (see Table 1) may be useful in recognizing non-expert diagnostic reasoning that leads to a correct diagnosis. This is particularly important in primary care, where it is not feasible to expect that physicians will experience every possible presentation of every disease during training and practice.
Table 1

Characteristics of different levels of clinical reasoning

| Novice | Acceptable performer | Expert |
| --- | --- | --- |
| Misses patterns | Notices recurrent patterns | Formulates new patterns |
| Organizes knowledge by when it is learned | Organizes knowledge by when it has been used | Organizes knowledge based on a deep understanding |
| Does not know when knowledge will be useful | Does not know when new knowledge will be useful | Knows when new knowledge will be useful |
| Exhaustive, illogical information gathering | Experience-based information gathering | Focused, adaptive information gathering |

Difficulty of identifying non-expert diagnostic reasoning

Identifying non-expert reasoning is difficult. A physician using non-expert reasoning can (and does) often make a correct diagnosis. Assessments that rely on a correct diagnosis, such as multiple-choice tests, poorly discriminate among different levels of expertise (Anderson 2004; Ferland et al. 1987). Also, experts use problem-solving patterns that are similar to each other, but may not gather identical information. Comparisons of non-experts to experts must account for the variability among experts in the approach to a specific problem. This creates the assessment problem seen with standardized patient examinations (SP), which have not captured increasing levels of expertise (Ferland et al. 1987; Hodges et al. 1999, 2002). One explanation is that the SP checklist, the specific items expected of the examinee, does not encompass the entirety of expert approaches. More complex patterns of information-gathering, rather than specific items, may better distinguish acceptable performers from experts.

Assessing diagnostic reasoning

Many methods have been used to assess diagnostic reasoning; however, there is no gold standard, and each method has limitations. Multiple-choice question tests and standardized patient examinations do not reliably measure clinical reasoning (Ferland et al. 1987; Hodges et al. 1999; Merrick et al. 2002; Regehr et al. 1998). Evaluation by supervising physicians forms the basis of assessment for advanced learners; however, physician observation of learners’ information-gathering occurs infrequently (Nutter and Whitcomb 2001). The MiniCEX, a checklist that requires direct observation, reliably measures clinical performance, but not specifically diagnostic reasoning (Holmboe et al. 2003).

Newer assessment methods include the script concordance test (SCT) and computer simulations. In the SCT, students answer a series of Likert-scale questions designed to gauge information processing and organization (Charlin et al. 2000). The SCT is based on educational theory, and several studies demonstrate its content and construct validity (Charlin et al. 2000; Cohen et al. 2005; Gagnon et al. 2006). However, SCT questions are independent of each other, each testing a single step of clinical reasoning, and they do not assess information-gathering patterns. The United States Medical Licensing Examination uses computer simulations, but their reliability has been lower than expected (Margolis et al. 2004). The web-based simulation DxR Clinician® uses patient cases to separate performance into categories, many of which are similar to diagnostic errors seen in physicians; however, its validity has been questioned (Jerant and Azari 2004). The scoring algorithm is based on whether or not the student arrived at a correct diagnosis and chose to do specific items. In this way, DxR Clinician® is similar to standardized patient (SP) assessments: in both, expected actions are generated by educators and measured using a checklist. Neither DxR Clinician® nor SP examinations consider the sequence in which information is obtained. We believe this is a major limitation that interferes with an assessment’s ability to identify at-risk information-gathering (see the example below).


Example:

An expert and a non-expert complete a computer simulation of a person with acute back pain. Their diagnostic reasoning is assessed using a checklist of expected actions. On the checklist are “red flags,” indicators of an increased risk of a serious etiology (i.e., unexplained loss of bowel or bladder control, or bilateral leg weakness). The expert uncovers urinary retention during the history of present illness and asks about other “red flags” before conducting the past medical history. The non-expert does not find “red flags” until completing a non-focused review of systems, which incidentally uncovers an important symptom. Because both the expert and the non-expert asked the same questions, a checklist would not differentiate between these two approaches; sequenced data would demonstrate the difference. The observable difference is not the questions asked, but the change in information-gathering when important data are uncovered. We believe that the presence of this type of change in information-gathering has a high specificity for expert reasoning. An expert may not always demonstrate it, but it is unlikely that a novice would have the well-organized knowledge structures required to recognize important information and change information-gathering during the course of an interview.

Pattern recognition systems

Although information-gathering patterns are complex, recognizing similar information-gathering patterns is possible with several types of computer pattern-recognition systems. An artificial neural network (ANN) is an information-processing tool able to find meaning in complicated sequential data, allowing it to detect patterns that are too complex for humans to describe consciously. An ANN processes information in a manner analogous to the biological nervous system, through a group of interconnected artificial neurons governed by a mathematical model. Like humans, ANNs learn by example, and once trained they can classify complex patterns.

ANNs have been used to separate novice from expert reasoning patterns (Casillas et al. 2000; Stevens et al. 1996; Stevens and Lopo 1994; Stevens and Najafi 1993). In these studies, expert patterns were similar to other expert patterns. Novice patterns were not only different from expert patterns, but also different from other novice patterns. One study used only subjects who arrived at a correct diagnosis, demonstrating that many who arrived at a correct diagnosis did not demonstrate an expert pattern (Stevens et al. 1996). To our knowledge, artificial neural networks have not been used to identify clusters of information-gathering patterns. The purposes of this study were to (1) use an artificial neural network to cluster students’ information-gathering patterns and (2) determine the percentages of incorrect diagnoses in each cluster.

Methods

Settings and subjects

The study subjects were 141 second-year medical students from the Class of 2009 at the David Geffen School of Medicine at the University of California, Los Angeles. The case was a mandatory take-home assignment, offered as part of the weekly problem-based learning (PBL) tutorial. Students first completed a training case on chronic obstructive pulmonary disease to help them adapt to the simulation interface. Course directors at UCLA selected, based on the learning objectives for the course, two computer cases that students were required to complete during the block. The purpose of the assignment was to provide formative feedback on student performance.

Computer cases

These two computer cases were available via institutional subscription to DxR Clinician© (www.dxrclinician.com). Each case is an immersive patient simulation designed to provide practice in and assessment of clinical reasoning skills. The investigators have no commercial interest in DxR Clinician©.

For the purposes of this pilot study, we focused on one case, a child with a sore throat. Eighty-five percent of the students made the correct diagnosis of streptococcal pharyngitis on this case. We selected the case with the higher percentage of correct diagnoses because we are interested in identifying at-risk information-gathering patterns used when the diagnosis is correct. The mean time for case completion was 68.49 min.

When completing a case through DxR Clinician©, the student is first given a one-sentence introduction, such as “your next patient is a child with a sore throat.” The student enters the case and begins the interview by selecting questions from a pre-defined list of 242 history items. Items are divided into sections, such as history of present illness, lifestyle, past medical history, review of systems, physical examination, and testing, to facilitate navigation through the program, and a student may move freely through the sections in any order. When a student chooses to ask a question, the program provides an immediate pre-determined answer. The program also contains 337 physical examination and 963 ancillary testing options; when a physical examination or ancillary testing item is requested, the program provides the findings. The program tracks which history, physical examination, and testing items were chosen and the sequence in which they were chosen. A string of 1,542 numbers, representing the 1,542 total possible items in the program, is the information-gathering pattern. A zero indicates an item that was not chosen, and values 1 through N indicate completed items, 1 for the first item chosen and N for the last. For example, if a student did not choose item 1, chose item 3 first, and chose item 2 second, the information-gathering pattern would begin 0, 2, 1. This number sequence, representing the student’s information-gathering pattern, is the input data for the ANN.
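To make the encoding concrete, the following is a minimal Python sketch of how such a sequence vector could be constructed. The function name and the 0-based item indices are illustrative assumptions, not part of DxR Clinician©.

```python
# Minimal sketch (not the DxR Clinician internals): encode the order in which
# a student selected items into a fixed-length sequence vector.
# Item indices and the example selections below are hypothetical.

N_ITEMS = 1542  # 242 history + 337 physical exam + 963 ancillary testing items


def encode_pattern(selected_items, n_items=N_ITEMS):
    """Return a vector where position i holds the order in which item i was
    chosen (1 = first, 2 = second, ...) and 0 means the item was never chosen.

    `selected_items` is the chronological list of item indices (0-based).
    """
    pattern = [0] * n_items
    for order, item in enumerate(selected_items, start=1):
        pattern[item] = order
    return pattern


# Example from the text: item 0 never chosen, item 2 chosen first, item 1 second.
print(encode_pattern([2, 1])[:3])  # -> [0, 2, 1]
```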

Artificial neural networks

Computer-based implementations of many algorithms are appropriate for addressing pattern-recognition problems. Algorithms and analyses based on artificial neural networks (ANN) have been successfully used for different types of pattern-recognition problems seen in medicine (Jesneck et al. 2007; Kocyigit et al. 2008; Usher et al. 2004). An ANN can be trained to recognize patterns using supervised or unsupervised training. In supervised learning, the ANN is provided with a portion of cases that have already been classified (this case is X and this one is Y). The ANN learns the inherent characteristics or mathematical model that describes the group of cases. This learning process allows it to correctly classify each case. The ANN is then tested with the rest of the cases. These cases are presented as unknown and classified by the ANN. The ANN’s performance can be assessed by determining if it correctly classified the cases presented as unknowns. This method is used when there are a large number of cases and enough is known about the cases to provide a set of correctly classified cases for testing and training.

When less is known about the patterns’ characteristics, unsupervised learning can be used. In unsupervised training, unlabeled cases are provided to the ANN, which uses a similarity metric to organize the cases spatially on a grid of artificial neurons. This approach is useful for developing hypotheses about complex patterns. Unsupervised learning has been used to demonstrate that expert patterns cluster together, and that many patterns that led to a correct diagnosis did not cluster with the expert patterns (Casillas et al. 2000; Stevens and Najafi 1993). Unsupervised learning in the form of a Kohonen network, or self-organizing map (SOM), was used in this study (Kohonen 1990). A SOM uses a training algorithm, a systematic method that alters the weighting of different features of the data until it derives a successful model; each alteration is called an epoch. Kohonen recommends that 10,000 epochs be used to train the SOM. The first 1,000 epochs constitute the global ordering phase, in which all the neurons across the entire grid are modified at each epoch to model the training data. The subsequent 9,000 epochs refine the map in increasingly local areas.
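For readers unfamiliar with SOMs, the following is a bare-bones NumPy sketch of the Kohonen training loop described above. It is not the MATLAB Neural Network Toolbox implementation used in the study, and the learning-rate and neighborhood schedules are illustrative assumptions.

```python
# Illustrative NumPy sketch of a Kohonen self-organizing map (SOM), not the
# toolbox implementation used in the study; schedules below are assumptions.
import numpy as np


def train_som(data, grid=(35, 35), epochs=10_000, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    dim = data.shape[1]
    weights = rng.random((rows, cols, dim))          # random initial vectors
    # Grid coordinates of every neuron, used for neighborhood distances.
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1 - frac) + 0.01                  # decaying learning rate
        # Wide neighborhood early (global ordering), narrow later (local refinement).
        sigma = max(rows, cols) / 2 * np.exp(-4 * frac) + 0.5

        x = data[rng.integers(len(data))]             # one random input pattern
        dists = np.linalg.norm(weights - x, axis=-1)  # distance to every neuron
        bmu = np.unravel_index(np.argmin(dists), dists.shape)  # best-matching unit

        grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        influence = np.exp(-grid_dist2 / (2 * sigma ** 2))[..., None]
        weights += lr * influence * (x - weights)     # pull neighbors toward x
    return weights


def winner(weights, x):
    """Return the (row, col) of the neuron closest to input x."""
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)
```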

The input data in this study were strings of 1,542 numbers (Si, where i varies from 1 to 1,542). Each Si could take a value between 0 and 1,542 indicating the sequence number at which the item was selected, with zero indicating that the item was not selected. These data are extremely complex; however, most of the important information was contained in specific parts of the patterns. For example, no students selected items 1,200–1,542, meaning that no relevant data were contained in the last several hundred digits of the string. This tail was removed without losing meaning. In addition, a discrete cosine transform further simplified the data by identifying the coefficients (parts of the pattern) that contained most of the important information. The first 60 coefficients identified through the discrete cosine transform were used as input to a 35 × 35 neuron self-organizing map, built with the Neural Network Toolbox from Mathworks Inc. The network was trained over 10,000 epochs using three different random seeds. The three resulting models had very similar and consistent results, indicating that the model had converged on a good solution. Using one of these models, the SOM assigned each input pattern to one of the 1,225 (35 × 35) output nodes. The authors have no commercial interest in Mathworks, Inc.
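A sketch of this preprocessing step, assuming the pattern vectors are stacked in a NumPy array as in the earlier encoding sketch. The exact truncation index and DCT normalization used in the study are not reported, so those below are assumptions.

```python
# Sketch of the dimensionality-reduction step using SciPy's discrete cosine
# transform. `patterns` is assumed to be an array of shape (n_students, 1542);
# the tail_start index and "ortho" normalization are assumptions.
import numpy as np
from scipy.fft import dct


def reduce_patterns(patterns, tail_start=1200, n_coeffs=60):
    trimmed = np.asarray(patterns, dtype=float)[:, :tail_start]  # drop the all-zero tail
    coeffs = dct(trimmed, type=2, axis=1, norm="ortho")          # DCT along each pattern
    return coeffs[:, :n_coeffs]                                  # keep the first 60 coefficients
```

The reduced array of 60 coefficients per student would then serve as the input to the 35 × 35 SOM, for example via the `train_som` sketch above.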

The number of students at each output node formed a surface that was smoothed into clusters using a statistical smoothing function. After convergence, a number of dominant peaks are selected, and a smoothing function is used to smooth out the spiky nature of the surface. The smoothing function is a Gaussian with its maximum value located at one of the dominant peaks; the standard deviation (σ) of the Gaussian controls the spread of the smoothing function around its peak value. Each student was assigned to the cluster that contributed the largest value to the smoothed function at the student’s location in the grid. Different numbers of clusters may be formed by changing the value of σ. Based on experience working with students, we expected 5–7 different types of patterns, so we adjusted the smoothing factor until we obtained a number of clusters in this range. We compared the proportion of incorrect diagnoses among the seven resulting clusters using a chi-square analysis. See the “Technical appendix” for more detailed information about the smoothing function.
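The following sketch illustrates one plausible form of this smoothing-and-assignment step. The peak-selection rule and σ value are assumptions for illustration; the study’s exact formulation is given in its technical appendix.

```python
# Sketch of smoothing the SOM hit surface into clusters and assigning each
# student to the cluster whose Gaussian contributes most at the student's node.
# The sigma value and the simple "largest hit count" peak selection are
# assumptions, not the paper's exact formulation.
import numpy as np


def assign_clusters(hit_counts, student_nodes, n_peaks=7, sigma=2.0):
    """hit_counts: (35, 35) array of students per output node.
    student_nodes: list of (row, col) winning nodes, one per student."""
    rows, cols = hit_counts.shape
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)

    # Pick the n_peaks nodes with the largest hit counts as dominant peaks.
    flat = np.argsort(hit_counts, axis=None)[::-1][:n_peaks]
    peaks = np.column_stack(np.unravel_index(flat, hit_counts.shape))

    # One Gaussian bump per peak, weighted by that peak's hit count.
    dist2 = np.sum((coords[None, :, :, :] - peaks[:, None, None, :]) ** 2, axis=-1)
    bumps = hit_counts[tuple(peaks.T)][:, None, None] * np.exp(-dist2 / (2 * sigma ** 2))

    # Each student joins the cluster contributing the largest value at their node.
    return [int(np.argmax(bumps[:, r, c])) for r, c in student_nodes]
```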

Results

The smoothed SOM output provided seven distinct clusters, with each student’s information-gathering pattern belonging to only one cluster (see Fig. 1). Clusters varied in the mean number of total items selected (see Table 2; Fig. 2). The percentage of incorrect diagnoses differed significantly among clusters (0–42%, χ2 = 13.62, P = .034). We then used a proportional analysis to test for significant differences between cluster 5 (42% incorrect diagnoses) and the other clusters. We set significance at P = .05 and adjusted for the six comparisons using Bonferroni’s correction, so we considered P < .008 significant. The proportion of incorrect diagnoses in cluster 5 was significantly different from that in cluster 1 (z = 3.23, P = .001, ES = .96).
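As a sketch of this analysis: the per-cluster counts of correct and incorrect diagnoses are not reported here, so the counts below are placeholders; only the form of the tests (an overall chi-square and a pairwise two-proportion z-test judged against the Bonferroni threshold) mirrors the text.

```python
# Sketch of the statistical comparison described above. The per-cluster counts
# of incorrect diagnoses and cluster sizes are placeholders, NOT the study data.
import numpy as np
from scipy.stats import chi2_contingency, norm

incorrect = np.array([2, 2, 2, 4, 5, 3, 0])      # placeholder counts per cluster
totals = np.array([40, 18, 25, 18, 12, 24, 4])   # placeholder cluster sizes (sum = 141)

# Overall 2 x 7 chi-square test of incorrect vs. correct diagnoses by cluster.
table = np.vstack([incorrect, totals - incorrect])
chi2, p, dof, _ = chi2_contingency(table)

# Pairwise two-proportion z-test: cluster 5 (index 4) vs. cluster 1 (index 0),
# judged against the Bonferroni-adjusted threshold of .05 / 6 = .008.
p5, p1 = incorrect[4] / totals[4], incorrect[0] / totals[0]
p_pool = (incorrect[4] + incorrect[0]) / (totals[4] + totals[0])
z = (p5 - p1) / np.sqrt(p_pool * (1 - p_pool) * (1 / totals[4] + 1 / totals[0]))
p_two_sided = 2 * (1 - norm.cdf(abs(z)))
print(f"chi2={chi2:.2f} p={p:.3f}; cluster 5 vs 1: z={z:.2f} p={p_two_sided:.4f}")
```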
Fig. 1

Spatial model depicting the location of the seven clusters in the self organizing map. The numbers represent the cluster number. The number of students in a cluster is indicated on the Z axis

Table 2

Average number of items selected by subjects in each cluster

| Cluster (students) | Total | History present illness | Past medical history | Review of systems | Physical exam | Ancillary testing |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 68.60 | 11.66 | 9.61 | 22.03 | 21.79 | 3.53 |
| 2 | 64.39 | 11.67 | 4.67 | 10.56 | 32.61 | 4.89 |
| 3 | 49.96 | 9.58 | 5.42 | 10.58 | 21.12 | 3.27 |
| 4 | 64.74 | 13.41 | 10.48 | 26.37 | 11.96 | 2.52 |
| 5 | 43.83 | 11.42 | 5.50 | 15.92 | 8.83 | 2.17 |
| 6 | 31.44 | 12.25 | 2.00 | 1.50 | 12.50 | 3.19 |
| 7 | 110.00 | 17.50 | 24.25 | 39.00 | 23.00 | 6.25 |

Fig. 2

Average proportions of different types of information-gathering actions performed in each of the clusters

Fig. 3

Relationship between percentage of missed diagnosis and proximity to cluster five in the self-organizing map

Post analysis

The self-organizing map places similar patterns in close proximity. We therefore hypothesized that clusters closer to the worst-performing cluster (42% incorrect diagnoses) would have higher percentages of incorrect diagnoses and clusters farther from it would have lower percentages. To test this hypothesis, we compared two rankings. First, we ranked each cluster by its percentage of incorrect diagnoses, from 1 to 7, with 1 being the cluster with 42% incorrect diagnoses and 7 being the cluster with 0% incorrect diagnoses. Second, we ranked each cluster by its distance from the worst-performing cluster; the artificial neural network provided the distance between the center of each cluster and the center of the worst-performing cluster, and we assigned a rank of 1 to the worst-performing cluster and 7 to the cluster farthest from it. We used Spearman’s rank correlation to compare the two rankings and found a strong correlation (Spearman’s rank correlation = .893, P = .007; see Table 3; Fig. 3).
Table 3

Percentage of incorrect diagnoses, distance from cluster 5, and cluster ranks by incorrect diagnoses and distance

| Cluster | Percentage incorrect diagnoses (%) | Distance from cluster 5ᵃ | Rank by incorrect diagnoses | Rank by distance from cluster 5 |
| --- | --- | --- | --- | --- |
| 1 | 5 | 51 | 6 | 6 |
| 2 | 11 | 48 | 4 | 5 |
| 3 | 8 | 27 | 5 | 3 |
| 4 | 22 | 15 | 2 | 2 |
| 5 | 42 | 0 | 1 | 1 |
| 6 | 12 | 30 | 3 | 4 |
| 7 | 0 | 57 | 7 | 7 |

ᵃ Cluster 5 had the highest percentage of incorrect diagnoses. Distance was measured in millimeters; the actual measurement depends on how the graph is expanded or shrunk, but the relative differences remain the same.
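Using the rank columns of Table 3, the reported correlation can be reproduced with a few lines of Python (a sketch, not the authors’ original analysis code):

```python
# Sketch reproducing the post-analysis correlation from the rank columns of Table 3.
from scipy.stats import spearmanr

rank_by_incorrect = [6, 4, 5, 2, 1, 3, 7]   # clusters 1-7, rank by % incorrect diagnoses
rank_by_distance = [6, 5, 3, 2, 1, 4, 7]    # clusters 1-7, rank by distance from cluster 5

rho, p = spearmanr(rank_by_incorrect, rank_by_distance)
print(f"Spearman rho = {rho:.3f}, P = {p:.3f}")  # approximately 0.893, P = .007
```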

We do not believe that any of the clusters represents an expert pattern. We would expect experts to select a low number of total items, with a higher percentage of history items; in many cases, much of the important information comes from the history. We would expect more history items than physical examination items and a small number of ancillary testing items. None of the clusters fits this pattern.

Novices gather more information than experts, often because their pattern recognition is not sufficiently developed to recognize items that are less important for making a diagnosis. We would expect novices to ask many history questions and perform many physical examination items. Cluster 7 had an average of 111 items, 42 items more than any other cluster. Interestingly, there were only four students who demonstrated this pattern (Cluster 7) and each made the correct diagnosis.

Clusters 4 and 5 were located in close proximity to each other and shared several characteristics. These patterns had a smaller percentage of history of present illness items and a larger percentage of review of systems items. In essence, students spent less time exploring the history of present illness and more time conducting an unfocused review of systems. This pattern led to a higher percentage of incorrect diagnoses.

The remaining clusters, 1, 2, 3, and 6, are positioned such that 1 and 2 are close together and 3 and 6 are close together; therefore, we suspect that 1 and 2 are similar patterns and 3 and 6 are similar patterns. Clusters 1 and 6 are the farthest apart, indicating that the ANN identified more differences between these patterns. We examined clusters 1 and 6 for general characteristics. Cluster 6 had an average of 15.8 history items and 33 total items, fewer than the other clusters.

Although we can see the actual pattern data for each student in each cluster, the string of 1,542 numbers is too complex for us to interpret other than the gross findings described above. Because of this complexity, we are unable to discern meaningful differences among the clusters; however, we are currently performing further analysis to begin to answer this question.

Discussion

This study demonstrated that among second year medical students, distinct information-gathering patterns could be identified. These patterns had different percentages of incorrect diagnoses ranging from 0 to 42%. An advantage of the artificial neural network is the ability to detect complex patterns. Unfortunately, the ANN cannot tell us what the differences are among the patterns. As such, our speculations must be regarded as quite preliminary. To give face validity to our findings, we must look at the clusters of information-gathering patterns to determine if the groupings have meaning. This is the focus of our next study.

We did not have subjects with more experience, such as practicing physicians, residents, or advanced clinical students, in our study. In prior studies, experts (selected by their training or by colleague referral) completed cases to establish an expert pattern for comparison. We decided not to include subjects with more experience for two reasons. First, it has been proposed that there are two distinct expert strategies: pattern recognition and scheme-inductive (Coderre et al. 2003). We believe these strategies would demonstrate different information-gathering patterns, but previous research has shown that experts cluster tightly at one node rather than two (Stevens et al. 1996); we believed this would complicate our initial attempts to separate patterns. Second, we did not expect to see expert strategies among subjects at this level of training. Had we used expert patterns to train our ANN, we would likely have replicated prior findings: expert patterns cluster and novice patterns are dispersed. We are interested in determining whether we can make sense of those non-expert patterns, and we believed that training with a large number of experts in the sample would hinder our ability to see clusters among non-expert patterns.

We chose to use an artificial neural network analysis in this investigation over more familiar linear clustering techniques. We believe that linear or algorithmic analysis may not capture the complexity inherent in the sequence of information-gathering. We have not formally tested this assumption, but we based our choice on the disappointing results seen with algorithmic or checklist analyses of clinical reasoning (Jerant and Azari 2004). We also felt that the global pattern of information gathering is more important than the individual factors within that pattern. Breaking the pattern down into smaller component factors may lose some of the richness of the pattern.

Limitations

In this study, we set out to determine whether students’ information-gathering patterns would cluster. Although our results are encouraging, there are several limitations to consider. The subjects completed this assignment as a take-home case that was not monitored, and no grade was assigned; as such, we do not know whether our data represent individual students’ best efforts. However, that should not affect whether or not patterns clustered. The DxR Clinician© interface can be difficult for students who are first learning the program, which could profoundly affect the information-gathering pattern. Again, this should not affect whether the patterns clustered; however, it may make the patterns appear less expert. As such, we draw no conclusions about medical students’ abilities to demonstrate expert information-gathering, only that we do not believe we saw expert patterns on this one computer case simulation. In addition, we looked at only one case. Many cases are needed to assess an individual student’s performance, but our aim here was to determine whether patterns would cluster. Repeating the study with more cases would strengthen our findings and allow us to begin to make speculations about individual students.

This type of analysis has inherent limitations as well. An SOM analysis produces a spatial map dependent on the initial vectors. Running the investigation with different initial vectors should provide similar results. We performed several different runs with sets of randomized initial vectors, which provided a slightly different orientation of the clusters. We presented the best result in this paper. In the future, we plan to run a large number of cases as described before and use some statistical estimation schemes to generate the final clusters. In this paper we showed some basic results and our plan is to gradually build upon this research.

Conclusions

Second year medical students demonstrated different types of information-gathering patterns when completing a computer-based clinical simulation. Some clusters of information-gathering patterns were associated with higher rates of incorrect diagnoses. The clusters contained patterns which had characteristics that we expected to see with unpacking, premature closure, and anchoring; however, further study is needed to verify that specific clusters do represent these types of information-gathering patterns. We believe our results demonstrate that further research on information-gathering patterns is warranted as an approach to identifying clinical reasoning errors.

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • John E. Delzell Jr (1)
  • Heidi Chumley (1)
  • Russell Webb (2)
  • Swapan Chakrabarti (2)
  • Anju Relan (3)
  1. University of Kansas School of Medicine, Kansas City, USA
  2. University of Kansas College of Engineering, Lawrence, USA
  3. University of California, Los Angeles, USA
