Study Design
After institutional review board approval, we gathered questions from several previously used instruments for diagnostic error measurement10,16,35 and used an operational definition of diagnostic error37 to develop an initial draft of the instrument. We iteratively refined our instrument through pilot medical record reviews and multidisciplinary input, and tested the accuracy of the final instrument by conducting reviews of a sample of patients with and without diagnostic errors.
Study Setting
The study site was a large urban VA facility with 35 full-time primary care providers (PCPs), including physicians, physician assistants, and nurse practitioners, providing comprehensive care to approximately 50,000 patients. It had an integrated and well-established electronic health record (EHR), and large clinic networks through which it provided longitudinal care to ethnically and socioeconomically diverse patients from rural and urban areas. Most PCPs were physicians, some of whom supervised residents, and visits included scheduled follow-up visits and “drop-in” unscheduled visits.
Instrument Development
We developed a 12-item rating instrument (the Safer Dx Instrument) to determine the presence or absence of diagnostic error for a specific episode of care. Our team consisted of five practicing clinicians (three of whom were also diagnostic error and/or quality improvement experts), a psychometrician, and a cognitive psychologist. We first sought existing content from instruments previously used in research on diagnostic error measurement.10,16,35 We then adapted some items from these previous instruments and added items to address important aspects of the diagnostic process, such as history-taking, physical examination, test ordering, and test interpretation. All of the questions were intended to identify missed opportunities in diagnosis using criteria developed through several previous studies.9,35,36 We relied heavily on three clinical criteria found to be useful in our previous work to determine the presence or absence of diagnostic error: (1) case analysis reveals evidence of a missed opportunity to make a correct or timely diagnosis; (2) the missed opportunity is framed within the context of an “evolving” diagnostic process; and (3) the opportunity could be missed by the provider, care team, system, and/or patient (see online Supplementary Appendix for details on criteria and instrument development).37
The final version of the Safer Dx Instrument consisted of 11 questions regarding the appropriateness of the diagnostic process and one summary question regarding the overall impression of diagnostic error (Table 1). Items were scored from 1 (strongly agree that an error occurred) to 6 (strongly disagree that an error occurred), with the exception of three items (items 6, 9, and 10), which were reverse-scored. Items were rated on a six-point Likert scale to allow for “gray areas” in the determination of diagnostic error (i.e., we did not want to force reviewers to say “absolutely an error” vs. “absolutely not an error,” but instead let them select response options that were less definite). However, to directly compare the overall impression of diagnostic error in item 12 to a previous sample of patients with and without diagnostic errors, item 12 (the main outcome) was dichotomized, such that 1 to 3 represented diagnostic error and 4 to 6 represented absence of diagnostic error (alternate ways to dichotomize are included in the online Appendix Table).
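The scoring conventions above (reverse-scoring of items 6, 9, and 10, and dichotomization of item 12) can be sketched in code as follows. This is an illustrative sketch only; the function names and example scores are hypothetical and not drawn from the study materials.

```python
# Items scored in the opposite direction on the six-point scale.
REVERSE_SCORED = {6, 9, 10}

def normalize_item(item_number: int, raw_score: int) -> int:
    """Return the score on the common 1-6 scale, flipping reverse-scored items
    (on a six-point scale, the reversed value of x is 7 - x)."""
    assert 1 <= raw_score <= 6
    return 7 - raw_score if item_number in REVERSE_SCORED else raw_score

def dichotomize_item12(score: int) -> bool:
    """Map the six-point overall impression (item 12) to a binary outcome:
    1-3 -> diagnostic error present (True), 4-6 -> absent (False)."""
    assert 1 <= score <= 6
    return score <= 3
```

For example, a raw score of 2 on item 6 becomes 5 after reverse-scoring, while the same raw score on item 1 is left unchanged.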
Table 1. The Safer Dx Instrument: Items for Determining Presence or Absence of Diagnostic Error in a Primary Care Encounter
Two physicians on our multidisciplinary team (AA and CD) pilot tested the instrument and provided feedback, which was used in team meetings for further refinement. The instrument was further refined through an iterative process of reviews by five additional practicing physicians outside of this team to ensure content and face validity. This type of approach is consistent with standard survey item development practices.38 Details on pilot testing are provided in the online Appendix. The chart reviewer, an actively practicing board-certified primary care physician (AA) with experience in EHR and patient safety projects, was trained extensively on record reviews.
Sample/Participants
We tested the Safer Dx Instrument using a cohort of 389 patients with and without diagnostic errors (n = 129 and n = 260, respectively) from the VA site in our prior study.35 At this VA study site, 1300 records had been selected for review: 886 using a “trigger” algorithm to identify patients with possible diagnostic errors based on unexpected hospitalizations and return visits, and 414 as “trigger-negative” controls. After excluding trigger false positives and records with no or minimal information available for error assessment, 1169 records remained; these were reviewed in detail by at least two independent raters to determine the presence or absence of diagnostic errors. Patients were mostly male (93.8 %); 56.8 % were White and 39 % Black. The cases represented a heterogeneous group of common medical conditions seen in the primary care setting and were independent of the cases used to develop and pilot-test the earlier draft of the instrument.
Outcomes
The physician-reviewer, blinded to the diagnostic error outcome, reviewed medical records from all 389 patients and completed the Safer Dx Instrument for each. Clinical details about care processes at the index primary care visit and subsequent visits were determined through detailed EHR reviews. The reviewer evaluated EHR data up to 1 year after the index visit to help determine the clinical context. A second reviewer (board certified in internal medicine, with otherwise similar familiarity with EHRs) independently assessed a random sample of 30 records from the testing data set (ten with and 20 without errors).
Statistical Analysis
We calculated the Safer Dx Instrument’s overall sensitivity, specificity, positive predictive value, and negative predictive value by comparing the main, dichotomized outcome from item 12 (1–3 = error, 4–6 = no error as determined by the single physician using the instrument) to results obtained in the previous study.35 Accuracy was defined as physician agreement with presence or absence of diagnostic errors as compared to our previous study results for all 389 cases.35
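The four accuracy measures named above follow directly from the 2 × 2 confusion matrix comparing the instrument's dichotomized judgment to the reference classification from the earlier study. A minimal sketch, with hypothetical labels (True = diagnostic error) rather than study data:

```python
def diagnostic_accuracy(predicted, reference):
    """Sensitivity, specificity, PPV, and NPV for binary labels,
    comparing instrument judgments (predicted) to the reference
    classification (reference). True = diagnostic error present."""
    tp = sum(p and r for p, r in zip(predicted, reference))          # true positives
    tn = sum((not p) and (not r) for p, r in zip(predicted, reference))  # true negatives
    fp = sum(p and (not r) for p, r in zip(predicted, reference))    # false positives
    fn = sum((not p) and r for p, r in zip(predicted, reference))    # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # errors correctly flagged
        "specificity": tn / (tn + fp),  # non-errors correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that were errors
        "npv": tn / (tn + fn),          # cleared cases that were non-errors
    }
```

Accuracy in the sense used here (agreement with the reference for all cases) would correspond to (tp + tn) divided by the total number of cases.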
Additionally, we examined whether any of the 11 diagnostic process items were related to the main outcome (i.e., the rater’s overall impression of diagnostic error) by computing both Spearman correlation coefficients (using the six-point scaled outcome) and Pearson correlation coefficients (using the dichotomized outcome). All items that were significantly correlated with the main outcome were entered into a factor analysis with varimax rotation to identify any higher-order dimensions represented by clusters of items. We retained dimensions with eigenvalues over Kaiser’s criterion of 1 and assessed the internal consistency of the resulting dimensions using Cronbach’s alpha.
Finally, we developed a score based on all of the instrument items to predict whether cases assessed via the Safer Dx Instrument were determined to be errors in our previous study. We thus performed a logistic regression using summed scores from the dimensions obtained in the factor analysis above, as well as individual items not included in the dimensions, to predict whether each case was an error or not. Using the obtained regression equation, we compared scores in the error cases and the non-error cases. This would allow users to create potential cut-off scores signaling lower or higher likelihood of diagnostic error. Users would have the flexibility to personalize these cut-off scores depending on how inclusive or conservative they wanted to be.
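Applying such a fitted regression equation with a user-chosen cut-off can be sketched as follows. The intercept, predictor names, and weights here are purely hypothetical placeholders, not the study's fitted values:

```python
from math import exp

# Hypothetical coefficients for illustration only.
INTERCEPT = -2.0
WEIGHTS = {"process_dimension_sum": 0.35, "item_11": 0.8}

def error_probability(features):
    """Apply a fitted logistic regression equation:
    p = 1 / (1 + e^-(b0 + sum(b_i * x_i)))."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))

def flag_case(features, cutoff=0.5):
    """Flag a case as a likely diagnostic error when its score meets the
    cut-off. A lower cut-off is more inclusive (flags more cases); a
    higher cut-off is more conservative."""
    return error_probability(features) >= cutoff
```

The cut-off is deliberately left as a parameter: screening-oriented users might lower it to catch more potential errors at the cost of more false positives, while confirmation-oriented users might raise it.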