BACKGROUND

Diabetic ketoacidosis (DKA) accounts for an estimated 115,000 hospital discharges per year in the USA.1 Clinical management is often suboptimal: in a single-centre chart audit of 55 patients admitted with DKA to a large teaching hospital, the mean time to insulin initiation (a key component of therapy) was 207 min, and 75 % of patients were placed on an inappropriate hyperglycemia protocol that did not address the other metabolic derangements of DKA.2

DKA is a medical emergency necessitating hourly assessment of myriad dynamic clinical parameters, resulting in numerous critical decision-making points that are further complicated by the complex interplay between management actions.3 While clinical knowledge is necessary, clinical reasoning and management skills are critical for successful patient management. One before-after study examined the effect of resident education on DKA knowledge.4 Fifty-one residents undertook a web-based test consisting of 12 multiple-choice questions before and 6 months after the intervention. In addition to receiving test feedback and links to further reading, they attended two 1-hour didactic lectures and a case-based discussion. The authors reported no change in resident knowledge between the two time points. How best to improve residents’ clinical reasoning and management skills related to DKA has yet to be studied fully.

In contrast to passive delivery of content (i.e., didactic lectures), research has shown that trainees acquire skills and develop expertise through deliberate practice. Ericsson5,6 describes deliberate practice as a set of “…activities that have been found most effective in improving performance,” consisting of nine elements: highly motivated learners, well-defined learning objectives, appropriate levels of difficulty, focused repetitive practice, reliable measurements, informative feedback, monitoring and error correction, evaluation of performance, and advancement to the next task.7

A meta-analysis comparing simulation-based training in which trainees followed deliberate practice principles to traditional clinical medical education found 14 studies (6 randomized trials, 3 cohort, 1 case-control, and 4 pre-post studies), which addressed procedural, auscultation, and life support skills in medical students and residents.7 All studies favored simulation-based training with deliberate practice over traditional education, with an overall effect size correlation of 0.71 (95 % CI 0.65–0.76, p < 0.001). Thus, deliberate practice has strong potential as a framework for designing the training and assessment of clinical skills, including medical students’ and residents’ DKA management skills.

These previous studies on deliberate practice have not clarified which of the nine elements are most responsible for the observed performance improvements. To optimize the effectiveness of educational interventions employing deliberate practice, a rigorous understanding of its key elements, and of the contribution of each, is central. For example, Pusic et al. have demonstrated that repetitive practice, one of the key elements of deliberate practice, is essential for trainees to develop expertise.8 In a prospective cross-sectional study, 18 pediatric residents were asked to classify 234 ankle radiographs as normal or abnormal. Learning was greatest between cases 21 and 50, highlighting the importance of repetitive practice in gaining expertise. Given the high number of repetitions required to gain expertise, Pusic et al. suggest that computer simulation is an ideal medium for tracking the development of deliberate practice and for clarifying which of its nine elements are most useful.9

Two of the key elements of deliberate practice are that informative feedback be provided from educational sources and that assessment scores be available to produce a mastery standard.7 Thus, before a simulator can be used as a medium for deliberate practice, it must have a robust scoring system for which favorable validity evidence exists. Recently, Cook et al. conducted a review of the simulation literature specifically looking for validity evidence and found a paucity of such reports.10,11 In particular, they noted little use of validity frameworks, which have been the gold-standard approach in the fields of psychology and education since 1999.12

OBJECTIVES

We aimed to develop a computer-based DKA simulator for medical training that included a robust scoring system. We collected validity evidence in order to judge whether the scores are appropriate measures of undergraduate and postgraduate medical trainees’ DKA management skills for both formative (e.g., identifying students who require additional training) and summative (e.g., identifying students who are competent) purposes. We chose the National Standards framework, which emphasizes the collection of five sources of validity evidence: content, response process, internal structure, relations with other variables, and consequences.12,13

DESIGN

Overview

First, we developed the DKA simulator, relying on expert review of its content. Next, we conducted usability testing of the simulator, which led to refinement of its content and functionality. We then developed the simulator scoring system and tested our hypothesis that the built-in scoring system would yield favorable validity evidence demonstrating that it is an appropriate measure of trainees’ DKA management skills.

Aim 1: Simulator Development and Refinement

Simulator Development

Content

The principal investigator (CY) identified key principles of DKA management in accordance with the Canadian Diabetes Association 2013 Clinical Practice Guidelines (CDA CPG)3 and incorporated those principles into clinical scenarios. In addition, she created linear equations modeled to simulate real-life parameters, such as vital signs and laboratory abnormalities. Six scenarios were designed to reflect the variety of presentations and management challenges (e.g., DKA with concurrent respiratory alkalosis; Appendix 1). Real-time progression of each case scenario included patient clips, interactive order entry, and presentation of laboratory and clinical data.
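
As a minimal sketch of how such linear equations might advance the simulated patient’s parameters each hour, consider the following; the variable names, coefficients, and units are illustrative assumptions and not the simulator’s actual model:

```python
# Minimal sketch (assumed coefficients and units; not the simulator's actual
# equations): a linear hourly update of two simulated laboratory parameters.

def advance_one_hour(glucose_mmol_per_L, potassium_mmol_per_L,
                     insulin_units_per_h=0.0, kcl_mmol_per_h=0.0):
    """Advance the simulated patient by one hour using simple linear rules."""
    # Glucose falls roughly in proportion to the insulin infusion rate.
    glucose_mmol_per_L -= 2.0 * insulin_units_per_h
    # Insulin shifts potassium intracellularly; replacement raises it.
    potassium_mmol_per_L += 0.05 * kcl_mmol_per_h - 0.05 * insulin_units_per_h
    return max(glucose_mmol_per_L, 4.0), potassium_mmol_per_L

# Example: one hour of insulin at 5 units/h with no potassium replacement
# lowers both glucose and potassium.
print(advance_one_hour(30.0, 4.2, insulin_units_per_h=5.0))
```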

Format

In keeping with best practices for the instructional design of simulation activities, we designed the simulator to include interactivity, individualized learning, preset action categories, feedback, repetitive practice with varying levels of difficulty, and contrasting cases.14–18 Specifically, learning was individualized based on the user’s actions: for example, if the user failed to administer potassium, the “patient’s” serum potassium would fall and the user would receive specific feedback regarding aggressive potassium replacement. The simulator consisted of preset action categories, including items under clinical assessment, investigations, management, and nursing. Users received feedback based on their actions throughout and upon completion of the simulation, consisting of “Helpful Hints” as well as a summary report indicating their performance in each management category and suggesting additional reading. For example, if they did not order an arterial blood gas, they were prompted to do so and given the rationale for ordering it. Finally, the simulator included six contrasting clinical scenarios of varying difficulty (for example, an older adult in a hyperosmolar hyperglycemic state; Appendix 1). We also implemented elements of deliberate practice in our design: well-defined learning objectives or tasks; an appropriate level of difficulty; informative feedback from educational sources; focused, repetitive practice; rigorous, reliable measurements; and monitoring, error correction, and further deliberate practice.7
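
The following sketch illustrates how individualized, rule-based feedback of this kind can be generated; the rule wording and trigger conditions are assumptions for illustration and are not the simulator’s actual logic:

```python
# Illustrative rule-based "Helpful Hints" (assumed rules, not the simulator's
# actual logic): each rule inspects the set of orders placed so far and
# supplies a hint when an expected action is missing.

FEEDBACK_RULES = [
    (lambda orders: "arterial blood gas" not in orders,
     "Order an arterial blood gas to quantify the degree of acidosis."),
    (lambda orders: "potassium replacement" not in orders,
     "Serum potassium is falling; consider aggressive potassium replacement."),
]

def helpful_hints(orders):
    """Return the hints triggered by the user's current orders."""
    return [hint for is_missing, hint in FEEDBACK_RULES if is_missing(orders)]

# A user who has started fluids and insulin but nothing else receives both hints.
print(helpful_hints({"iv fluids", "insulin infusion"}))
```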

Programming and Platforms of Delivery

The extensive programming required for the complex interactions between the simulated patient’s parameters and the learner’s actions was completed by a programmer using the LAMP stack (Ubuntu Linux, Apache 2.0, MySQL 5.0, and PHP 5.3), with CodeIgniter as the Model-View-Controller framework, jQuery for front-end logic, and the HTML5 Boilerplate template and Modernizr to expedite development for cross-browser compliance. The computer-based program was delivered over the Internet and ran in standard web browsers. Iterative design, refinement, and quality assurance occurred over a 12-month period.

Expert Review of Content

Through convenience sampling, we invited four clinical experts (one endocrinologist, one intensivist, one general internist, and one emergency physician), each in active clinical practice (>50 % of time spent on clinical work) with frequent exposure to DKA. Each expert independently completed every scenario to assess the accuracy and realism of the content and was asked to complete a questionnaire identifying inaccuracies (Appendix 2). The questionnaire was developed by CY and reviewed by SES. In addition, CY took field notes of the experts’ comments as they ran through each simulator case (although this was not a formal think-aloud protocol).

Usability Testing

The simulator underwent heuristic evaluation by a human factors engineer (SJ). In heuristic evaluation, usability experts review the product against a set of validated usability heuristics, following the methodology defined by Nielsen.19 Usability issues were categorized by severity as minor, moderate, major, or catastrophic.

Simulator Refinement

Based on recommendations from the expert content review and usability phases of our process, the prototype was modified through an iterative process of design and evaluation. Specific changes are described in the Results section.

Aim 2: Collecting Validity Evidence for the DKA Simulator Scoring System

Development of the Scoring System

We modeled our simulator scoring system on those in the literature.20 The seven priorities in DKA management3 comprise the seven domains of the scoring system: (1) potassium deficiency, (2) volume depletion and fluid replacement, (3) acidosis, (4) hyperglycemia, (5) precipitating cause, (6) organization of care (e.g., communication with the nurse), and (7) monitoring of the patient. These domains comprised a total of 18 performance items (Appendix 3). For each performance item, the simulator tabulated the percentage of correct actions and identified any critical errors performed. The simulator then scored each item on a 3-point scale20 (Appendix 3), yielding a final numerical score ranging from 18 to 54, where 18 represented unacceptable performance on all items and 54 represented acceptable performance on all items.
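
A minimal sketch of this scoring logic follows; the thresholds mapping percent-correct actions and critical errors onto the 3-point item scale are assumptions for illustration, as the actual item definitions are specified in Appendix 3:

```python
# Sketch of the 18-item scoring logic (assumed thresholds; the actual item
# definitions and cut-offs are specified in Appendix 3).

def item_score(percent_correct, critical_error):
    """Score one item on a 3-point scale (1 = unacceptable, 3 = acceptable)."""
    if critical_error or percent_correct < 50:
        return 1
    if percent_correct < 80:
        return 2
    return 3

def total_score(items):
    """Sum the 18 item scores, giving a total between 18 and 54."""
    assert len(items) == 18
    return sum(item_score(pct, err) for pct, err in items)

# Acceptable performance on every item yields the maximum score of 54.
print(total_score([(100, False)] * 18))
```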

Collection of Validity Evidence

Setting

We conducted the validation phase at a large urban academic health sciences center.

Participants

We recruited individuals with varying levels of expertise in DKA management [undergraduate medical students in year 1 (MS1) with limited knowledge and expertise, undergraduate medical students in year 3 (MS3), postgraduate trainees in year 2 of internal medicine residency (PGY2), and staff endocrinologists with extensive knowledge and expertise]. We asked all participants to complete the simulator after viewing a tutorial and completing one practice run to familiarize themselves with the simulator; no hints were given during the practice run. Sample size could not be estimated from previous data, as this was a new scoring system. However, we expected a large effect size (given the wide variation in expertise across the groups) and estimated that to achieve a power of 0.80 with an alpha of 0.05, a total sample size of 66 participants would be required.21
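
As an illustration of this type of a priori power calculation, the sketch below assumes Cohen’s conventional “large” effect size for a four-group one-way ANOVA (f = 0.40); because the exact effect-size input for the original calculation is not reported here, the result only approximates the required sample of 66:

```python
# A priori power calculation for a four-group comparison (assumed effect size).
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(effect_size=0.40, alpha=0.05,
                                        power=0.80, k_groups=4)
print(f"Required total sample size: {n_total:.0f}")
```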

Main Measures

As outlined in Messick’s12 original work and itemized for medical education researchers,10 validity evidence can be organized into five categories. We note that it is not necessary (or usually possible) to collect all sources of validity evidence in a single study.10 Consequently, our methods emphasized assessment of validity related to content, internal structure, relations with other variables, and consequences (Table 1). A recent article provided an organizing framework that we used to choose which data elements to collect10 (Table 1).

Table 1. Data elements collected for each category of validity evidence

Ethical review

We obtained approval from the ethical review board of the involved institution. Informed consent was obtained from all participants.

KEY RESULTS

Aim 1: Simulator Development and Refinement

Simulator Development

We depict simulator functionality and representative screenshots in the Appendices (Appendix 4: Simulator tutorial; Appendix 5a-i: Screen shots of simulator).

Expert Review of Content

Clinical experts thought that the simulator reflected real-life management of DKA. However, they noted that other aspects of medical care (for example, management of congestive heart failure) were not addressed, in order to maintain the focus on DKA. A summary of their comments is provided in Appendix 2.

Usability Testing

No critical issues were identified. However, some usability issues were rated as ‘major’ or ‘moderate.’ For example, the purpose of “Notes to self” was not clear. Most of these were thought to be adequately addressed through training or by additional explanatory text (Table 2).

Table 2. Simulator refinement based on heuristic evaluation

Simulator Refinement

Based on recommendations from expert content review and heuristic evaluation, the prototype was modified by team members through an iterative process of design; refinements are indicated in Table 2. For example, we renamed “Notes to self” as “Medical Notes” and added a brief text below stating its purpose.

Aim 2: Collecting Validity Evidence for DKA Simulator Scoring System

Eighty-one participants were recruited to the study (Table 3). Sixty-eight participants (91 %) reported using other forms of information technology for medical learning (primarily online resources such as UpToDate); of these, none reported previous exposure to simulation-based learning. On inspection of the data distribution, we identified six participants with scores greater than two standard deviations from the mean. Five of these outliers spent less than 60 seconds on the simulator, indicating that they did not complete the patient case, and the sixth performed very poorly. We chose to exclude all six individuals from further analyses, leaving 75 participants in total.

Table 3. Participant characteristics

We organized the validity evidence by source, as follows (Table 1):

1. Content: We based our scoring system on a pre-existing framework,20 the CDA CPG,3 and the expert review by content and education experts reported above.

2. Internal structure: For the seven subscales, Cronbach’s α was 0.795, indicating adequate internal consistency. Exploratory factor analysis revealed a Kaiser–Meyer–Olkin value of 0.25, suggesting that our sample size was inadequate for such an analysis (the value should exceed 0.50). (A computational sketch of the analyses in items 2–4 appears after this list.)

3. Relations with other variables: ANOVA showed a significant group difference in mean overall simulator score (F(3, 71) = 11.2, p < 0.001). Post-hoc analyses using Tukey’s HSD revealed that the source of the difference was the MS1 group, which scored significantly lower than all other groups (p < 0.02); the other groups’ scores did not differ significantly (Figs. 1 and 2). Self-reported comfort with managing DKA correlated with the simulator score (r = 0.55, p < 0.001), as did the medical students’ self-reported number of weeks on GIM rotation (r = 0.40, p = 0.014). Similarly, across all groups, the nonparametric variables of age and number of DKA patients treated correlated with score (p = 0.022 and p < 0.001, respectively). Score did not correlate with residents’ self-reported number of months on GIM rotation or with gender.

Figure 1. Mean score, percentage of actions correct, and number of critical errors by level of training. Error bars indicate standard deviation. Groups: undergraduate medical students in year 1 (MS1) with limited knowledge and expertise, undergraduate medical students in year 3 (MS3), postgraduate trainees in year 2 of internal medicine residency (PGY2), and staff endocrinologists.

Figure 2. Receiver-operating characteristic curves for discriminating between expert and non-expert on the basis of score. The number indicated for each point is the score applied as a cut-point value.

4. Consequences: We generated a receiver-operating characteristic (ROC) curve to define a simulator cutoff (“pass-fail”) score that would delineate a threshold between practicing physicians (considered ‘experts’) and trainees. Using the data from the curve, we calculated the Youden index (the sum of sensitivity and specificity minus one) to identify the optimal cutoff score. The largest value (0.47) occurred at a simulator score of 75 % (sensitivity 94.7 %, specificity 51.8 %), demonstrating that a score of 75 % has high sensitivity (the cutoff score identified 94.7 % of practicing physicians) but low specificity (it correctly excluded only 51.8 % of trainees). The area under the curve was fair at 0.73 ± 0.06 (95 % confidence interval: 0.61–0.85, p = 0.003).
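
The sketch below illustrates, with placeholder data, the three analyses summarized in items 2–4 (Cronbach’s α, one-way ANOVA with Tukey’s HSD, and an ROC curve with the Youden index); it is not the original analysis code, and the exploratory factor analysis/KMO step is omitted:

```python
# Illustrative validity analyses on placeholder data (not the study data or
# the original analysis code).
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from sklearn.metrics import roc_auc_score, roc_curve


def cronbach_alpha(subscales):
    """Cronbach's alpha for an (n_participants x n_subscales) array."""
    k = subscales.shape[1]
    item_variances = subscales.var(axis=0, ddof=1).sum()
    total_variance = subscales.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)


# Placeholder data: 75 participants, each with a total score, a training-level
# group, and seven subscale scores.
rng = np.random.default_rng(0)
scores = rng.uniform(18, 54, size=75)
groups = np.repeat(["MS1", "MS3", "PGY2", "Staff"], [19, 19, 19, 18])
subscales = rng.normal(size=(75, 7))
is_expert = (groups == "Staff").astype(int)

# Internal structure: internal consistency of the seven subscales.
print("Cronbach's alpha:", round(cronbach_alpha(subscales), 3))

# Relations with other variables: one-way ANOVA and Tukey's HSD across groups.
f_stat, p_value = stats.f_oneway(*(scores[groups == g] for g in np.unique(groups)))
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")
print(pairwise_tukeyhsd(scores, groups))

# Consequences: ROC curve and the Youden index (sensitivity + specificity - 1)
# to select an expert/non-expert cutoff score.
fpr, tpr, thresholds = roc_curve(is_expert, scores)
youden = tpr - fpr
best = int(np.argmax(youden))
print(f"AUC = {roc_auc_score(is_expert, scores):.2f}, "
      f"optimal cutoff = {thresholds[best]:.1f}, Youden J = {youden[best]:.2f}")
```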

CONCLUSION

We integrated guideline-based content and expert input, evidence-based instructional design strategies, and principles of user-centered design to develop an easy-to-use, engaging, and realistic computer-based DKA management simulator. We also collected validity evidence and judged its value in light of two elements of deliberate practice: that informative feedback be provided from educational sources and that assessment scores be available to produce a mastery standard.7 In our judgment, the validity evidence is mostly favorable for using the DKA simulator as a formative method of assessing trainees’ skill in DKA management. However, the data do not support using the simulator for summative purposes: although the performance of junior medical students differed from that of the other groups, the low specificity of our cut-point score suggests the scoring system cannot yet detect subtle differences in DKA management performance between senior medical students and residents.

Current Validity Argument for Use of the DKA Simulator/Criteria for Effective Assessment

For a test to provide effective formative assessment for the learner, it should provide specific and actionable feedback, be integrated into the learning experience, and be timely and ongoing.23 Our DKA simulator provides feedback based on the learner’s actions and suggests correct management actions both during the simulation and upon completion. Based on our evidence for content and for relations with other variables, the simulator appears able to assess and differentiate a learner’s ability to identify and prioritize management options. Further research is needed, however, to ensure that the feedback provided leads to performance improvements during prolonged periods of deliberate practice.

For a test to provide effective summative feedback for the learner and educator, it must consist of high-quality test material, a systematic standard-setting process, and secure administration, as well as demonstration of validity, consistency, and equivalence.23 We created high-quality test material that was securely administered and initiated a systematic standard-setting process. However, although our internal structure evidence demonstrated good internal consistency, our consequence evidence, specifically the psychometric properties of the cutoff score, was not sufficiently strong to support use of the simulator for summative purposes: although sensitivity was high at 94.7 %, specificity was low at 51.8 %, which does not permit accurate prediction of expertise. In addition, we have not yet assessed test–retest reliability or equivalence (i.e., whether the same assessment yields equivalent scores or decisions when administered across different institutions or cycles of testing). To build a validity argument in which the simulator score can be used to predict practice-ready competence in DKA management, additional consequence evidence, such as comparison with actual pass rates (e.g., on an objective structured clinical examination), will need to be collected.

Strengths and Limitations

Strengths of our simulator include its systematic development. User-reported limitations include its focus on DKA management to the exclusion of other medical conditions; this was deemed an acceptable compromise given the intended focus of the simulator. A strength of the study is our collection of multiple sources of validity evidence, which allowed a more balanced assessment of the validity of our scoring system. Unlike previous studies in the literature,10,11 we collected not only evidence for relations with other variables, but also evidence for content, internal structure, and consequences. We believe this study serves as an example of how validation research methods can be moved forward in simulation-based medical education and assessment.

Next Steps

The current study is the first in a program of research ultimately aimed at improving translational outcomes such as patient care practices, patient outcomes, and collateral educational effects.24 For example, integration of the simulator into medical curricula may improve resident knowledge and skills, the mean time to insulin initiation, the prevalence of life-threatening hypokalemia, the adequacy of fluid resuscitation, and, subsequently, patient morbidity and length of stay. The next steps in this research program are to explore further refinements to the scoring algorithm and to determine how best to implement the simulator in a curriculum, including the optimal setting (for example, on-site invigilation by a coach versus self-study) and the optimal dose (for example, a set number of case repetitions versus a self-selected number). In addition, the simulator can be used to collect participant responses to clinical cues, which may help clarify the mechanism by which simulator cases improve skills. Furthermore, the impact on clinical reasoning and the time course of these changes can be explored. Thus, computer-based simulation offers opportunities both to improve trainee skill and to better understand how trainees learn.

Using the principles of deliberate practice and incorporating evidence-based instructional features, we developed a computer-based DKA management simulator. We subsequently collected an array of validity evidence for the scoring system including evidence on content, internal structure, relations with other variables, and consequences. Our next steps are to explore refinement of the scoring system and integration of the DKA simulator into medical education; pending these findings, the simulator will be refined and made available to the broader medical education audience.