Clinical management of diabetic ketoacidosis (DKA) continues to be suboptimal; simulation-based training may bridge this gap and is particularly applicable to teaching DKA management skills given it enables learning of basic knowledge, as well as clinical reasoning and patient management skills.
1) To develop, test, and refine a computer-based simulator of DKA management; 2) to collect validity evidence, according to the National Standards validity framework; and 3) to judge whether the simulator scoring system is an appropriate measure of DKA management skills of undergraduate and postgraduate medical trainees.
After developing the DKA simulator, we completed usability testing to optimize its functionality. We then conducted a preliminary validation of the scoring system for measuring trainees’ DKA management skills.
We recruited year 1 and year 3 medical students, year 2 postgraduate trainees, and endocrinologists (n = 75); each completed a simulator run, and we collected their simulator-computed scores.
We collected validity evidence related to content, internal structure, relations with other variables, and consequences.
Our simulator consists of six cases highlighting DKA management priorities. Real-time progression of each case includes interactive order entry, laboratory and clinical data, and individualized feedback. Usability assessment identified issues with clarity of system status, user control, efficiency of use, and error prevention. Regarding validity evidence, Cronbach’s α was 0.795 for the seven subscales, indicating favorable internal structure evidence. Participants’ scores showed a significant effect of training level (p < 0.001). Scores also correlated with the number of DKA patients they reported treating, weeks on Medicine rotation, and comfort with managing DKA. A score on the simulation exercise of 75 % had a sensitivity and specificity of 94.7 % and 51.8 %, respectively, for delineating between expert staff physicians and trainees.
We demonstrate how a simulator and scoring system can be developed, tested, and refined to determine its quality for use as an assessment modality. Our evidence suggests that it can be used for formative assessment of trainees’ DKA management skills.
Diabetic ketoacidosis (DKA) accounts for an estimated 115,000 hospital discharges per year in the USA.1 Clinical management is suboptimal; in a single-centre chart audit of 55 patients admitted with DKA to a large teaching hospital, the mean time to insulin initiation (a key component of therapy) was 207 min, and 75 % were placed on an inappropriate hyperglycemia protocol that did not address the other metabolic derangements of DKA.2
DKA is a medical emergency necessitating hourly assessment of a myriad of dynamic clinical parameters, resulting in numerous critical decision-making points, which are further complicated by the complex interplay between management actions.3 While clinical knowledge is necessary, clinical reasoning and management skills are critical for successful patient management. One before-after study examined the effect of resident education on DKA knowledge.4 Fifty-one residents undertook a web-based test consisting of 12 multiple-choice questions before and 6 months after the intervention. In addition to receiving test feedback and links to further reading, they attended two 1-hour didactic lectures and case-based discussion. The authors reported no change in resident knowledge between the two time points. How best to improve residents’ clinical reasoning and management skills related to DKA has yet to be studied fully.
In contrast to passive delivery of content (i.e., didactic lectures), research has shown that trainees acquire skills and develop expertise through deliberate practice. Ericsson5 , 6 describes deliberate practice as a set of “…activities that have been found most effective in improving performance,” consisting of nine elements: highly motivated learners, well-defined learning objectives, appropriate levels of difficulty, focused repetitive practice, reliable measurements, informative feedback, monitoring and error correction, evaluation and performance, and advancement to the next task.7
A meta-analysis comparing simulation-based training in which trainees followed deliberate practice principles to traditional clinical medical education found 14 studies (6 randomized trials, 3 cohort, 1 case-control, and 4 pre-post studies), which addressed procedural, auscultation, and life support skills in medical students and residents.7 All studies favored simulation-based training with deliberate practice over traditional education, with an overall effect size correlation of 0.71 (95 % CI 0.65–0.76, p < 0.001). Thus, deliberate practice has strong potential as a framework for designing the training and assessment of clinical skills, including medical students’ and residents’ DKA management skills.
These previous studies on deliberate practice have not clarified which of the nine elements are most responsible for the observed performance improvements. In order to optimize the effectiveness of educational interventions employing deliberate practice, a rigorous understanding of its key elements and the contribution of each is central. For example, Pusic et al. have demonstrated that repetitive practice, one of the key elements of deliberate practice, is essential for trainees to develop expertise.8 In a prospective cross-sectional study, 18 pediatric residents were asked to classify whether 234 cases of ankle radiographs were normal or abnormal. Learning was greatest between cases 21 to 50, highlighting the importance of repetitive practice in gaining expertise. Given the high number of repetitions required to gain expertise, Pusic et al. suggest that computer simulation is an ideal medium for tracking the development of deliberate practice and for clarifying which of its nine elements are most useful.9
Two of the key elements of deliberate practice are that informative feedback be provided from educational sources and that assessment scores be available to produce a mastery standard.7 Thus, before a simulator can be used as a medium for deliberate practice, it must have a robust scoring system for which favorable validity evidence exists. Recently, Cook et al. conducted another review of the simulation literature specifically looking for validity evidence and found a paucity of reports.10 , 11 In particular, they noted little use of validity frameworks, which have been the gold standard approach in the fields of psychology and education since 1999.12
We aimed to develop a computer-based DKA simulator for medical training that included a robust scoring system. We collected validity evidence in order to judge whether the scores are appropriate measures of undergraduate and postgraduate medical trainees’ DKA management skills for both formative (e.g., identify students who require additional training) and summative (e.g., identify students who are competent) purposes. We chose to use the National Standards framework, which emphasises the collection of five sources of validity evidence, including content, response process, internal structure, relations with other variables, and consequences.12 , 13
First, we developed the DKA simulator, which relied on expert review of content. Next, we conducted usability testing of the simulator, which led to refinement of its content and functionality. We then developed the simulator scoring system and assessed our hypothesis that the in-built scoring system would produce favorable validity evidence demonstrating it is an appropriate measure of trainees’ DKA management skills.
Aim 1: Simulator Development and Refinement
The principal investigator (CY) identified key principles regarding DKA management in accordance with the Canadian Diabetes Association 2013 Clinical Practice Guidelines (CDA CPG)3 and incorporated those principles into clinical scenarios. In addition, she created linear equations that were modeled to simulate real-life parameters, such as vital signs and laboratory abnormalities. Six scenarios were designed to reflect the variety of presentations and management challenges (e.g., DKA with concurrent respiratory alkalosis; Appendix 1). Real-time progression of the case scenario included patient clips, interactive order entry, and presentation of laboratory and clinical data.
In keeping with best practices for the instructional design of simulation activities, we designed the simulator to include interactivity, individualized learning, preset action categories, feedback, repetitive practice with varying levels of difficulty, and contrasting cases.14 – 18 Specifically, learning was individualized based on the user’s actions. For example, if they failed to administer potassium, the “patient’s” serum potassium would fall and the user would receive specific feedback regarding aggressive potassium replacement. The simulator consisted of preset action categories, including items under clinical assessment, investigations, management, and nursing. Users received feedback based on their actions throughout and upon completion of the simulation, consisting of “Helpful Hints,” as well as a summary report indicating their performance in each management category and additional reading. For example, if they did not order an arterial blood gas, they were prompted to do so and given the rationale for ordering it. Finally, the simulator included six contrasting clinical scenarios with varying difficulty (for example, an older adult in hyperosmolar hyperglycemic state; Appendix 1). We also implemented elements of deliberate practice in our design: well-defined learning objectives or tasks; appropriate level of difficulty; informative feedback from educational sources; focused, repetitive practice; rigorous, reliable measurements; and monitoring, error correction, and more deliberate practice.7
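To make the idea of linear equations driving a simulated parameter concrete, the following is a minimal sketch in Python. It is not the simulator's actual model, which is not reproduced in this article: the function name `step_parameter`, the notion of a per-hour drift, and every number in the usage note are hypothetical and are not clinical guidance.

```python
def step_parameter(value, drift_per_hour, order_effects, hours=1.0):
    """Advance one simulated parameter with a linear update.

    value          -- current value of the parameter (e.g., a lab result)
    drift_per_hour -- hypothetical change per hour if the user does nothing
    order_effects  -- hypothetical per-hour changes caused by active orders
    hours          -- length of the simulated time step
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    # Linear model: new value = old value + elapsed time * net rate of change.
    net_rate = drift_per_hour + sum(order_effects)
    return value + hours * net_rate
```

For instance, with an illustrative starting value of 4.0 and a hypothetical drift of -0.5 per hour, one hour without any order yields 3.5, whereas a hypothetical replacement order contributing +0.25 per hour yields 3.75. A rule of this kind is one way that a missed order could translate into a falling value and a feedback prompt.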
Programming and platforms of delivery
The extensive programming required for the complex interactions between the simulated patient’s parameters and the learner’s actions was completed by a programmer with the LAMP stack (Linux Ubuntu Distro, Apache 2.0, MySQL 5.0 and PHP 5.3), CodeIgniter as the Model-View-Controller framework, jQuery for front-end logic, and HTML5 Boilerplate and Modernizr to expedite development for cross-browser compliance. The computer-based program was delivered over the Internet and run on standard web browsers. Iterative design, refinement, and quality assurance occurred over a 12-month period.
Expert Review of Content
We invited four clinical experts (one endocrinologist, one intensivist, one general internist, and one emergency physician) in active clinical practice (>50 % of time performing clinical work) with frequent exposure to DKA, through convenience sampling. Each independently completed each scenario to assess the accuracy and realism of the content and was asked to complete a questionnaire assessing inaccuracies (Appendix 2). The questionnaire was developed by CY and reviewed by SES. In addition, CY took field notes of their comments as the expert ran through each simulator case (although this was not a formal think-aloud protocol).
The simulator underwent heuristic evaluation by a human factors engineer (SJ). Heuristic evaluation is conducted by usability experts, who review the product using a set of validated usability heuristics as guidelines, following the methodology defined by Nielsen.19 Usability issues were categorized by severity into minor, moderate, major, or catastrophic.
Based on recommendations from the expert content review and usability phases of our process, the prototype was modified through an iterative process of design and evaluation. Specific changes are described in the Results section.
Aim 2: Collecting Validity Evidence for the DKA Simulator Scoring System
Development of the Scoring System
We modeled our simulator scoring system on those in the literature.20 The seven priorities in DKA management3 comprise the seven domains of the scoring system, which are (1) potassium deficiency, (2) volume depletion and fluid replacement, (3) acidosis, (4) hyperglycemia, (5) precipitating cause, (6) organization of care (e.g., communication with nurse), and (7) monitoring of patient. Together, these domains comprised a total of 18 performance items (Appendix 3). For each performance item, the simulator tabulated the percentage of correct actions and identified critical errors performed. The simulator then calculated a score on a 3-point scale per item20 (Appendix 3), resulting in a final numerical score ranging from 18 to 54, where 18 represented unacceptable performance in all performance items and 54 represented acceptable performance in all performance items.
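The arithmetic of the total score can be illustrated with a short Python sketch. It assumes only what is stated above: 18 items, each scored on a 3-point scale, summed to a total between 18 and 54. The coding of the scale as 1, 2, and 3 is inferred from that range. The rule that maps percentage of correct actions and critical errors to an item score is in Appendix 3 and is not reproduced here, so the function takes item scores as given; the name `total_score` is hypothetical.

```python
N_ITEMS = 18                   # performance items across the seven domains
VALID_ITEM_SCORES = (1, 2, 3)  # 3-point scale, coding inferred from the 18-54 range


def total_score(item_scores):
    """Sum the per-item scores into the final simulator score (18 to 54)."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if score not in VALID_ITEM_SCORES:
            raise ValueError(f"item score {score!r} is not on the 3-point scale")
    return sum(item_scores)
```

A run scored 1 on every item gives 18, and a run scored 3 on every item gives 54, matching the stated range.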
Collection of Validity Evidence
We conducted the validation phase at a large urban academic health sciences center.
We recruited individuals with varying levels of expertise in DKA management [undergraduate medical students in year 1 (MS1) with limited knowledge and expertise, undergraduate medical students in year 3 (MS3), postgraduate trainees in year 2 of internal medicine residency (PGY2), and staff endocrinologists with extensive knowledge and expertise]. We asked all participants to complete the simulator after viewing a tutorial and completing one practice run to familiarize them with the simulator; hints were not given during the practice run. Sample size could not be estimated based on previous data, as this was a new scoring system. However, we expected a large effect size (given the wide variation in expertise of the groups) and estimated that to achieve a power of 0.80 with an alpha of 0.05, a total sample size of 66 participants would be required.21
As outlined in Messick’s12 original work and itemized for medical education researchers,10 validity evidence can be organized into five categories. We note that it is not necessary (or usually possible) to collect all sources of validity evidence in a single study.10 Consequently, our methods emphasized assessment of validity related to content, internal structure, relations with other variables, and consequences (Table 1). A recent article provided an organizing framework that we used to choose which data elements to collect10 (Table 1).
We obtained approval from the ethical review board of the involved institution. Informed consent was obtained from all participants.
Aim 1: Simulator Development and Refinement
Expert Review of Content
Clinical experts thought that the simulator was reflective of real-life management of DKA. However, they felt that other medical care (for example, management of congestive heart failure) was neglected to focus on DKA. A summary of their comments is provided in Appendix 2.
No critical issues were identified. However, some usability issues were rated as ‘major’ or ‘moderate.’ For example, the purpose of “Notes to self” was not clear. Most of these were thought to be adequately addressed through training or by additional explanatory text (Table 2).
Based on recommendations from expert content review and heuristic evaluation, the prototype was modified by team members through an iterative process of design; refinements are indicated in Table 2. For example, we renamed “Notes to self” as “Medical Notes” and added a brief text below stating its purpose.
Aim 2: Collecting Validity Evidence for DKA Simulator Scoring System
Eighty-one participants were recruited to the study (Table 3). Sixty-eight participants (91 %) reported using other forms of information technology for medical-related learning (primarily online resources such as Up-to-date); none of these reported previous exposure to simulation-based learning. On inspection of the data distribution, we identified six participants with scores greater than two standard deviations from the mean. Five of these outliers spent less than 60 seconds on the simulator, indicating that they did not complete the patient case, and the sixth performed very poorly. We chose to eliminate all of these individuals from further analyses, leaving us with 75 participants in total.
Internal structure: For the seven subscales, Cronbach’s α was 0.795, indicating adequate internal consistency. The exploratory factor analysis revealed that the Kaiser-Meyer-Olkin value was 0.25, which suggests our sample size was inadequate for conducting such an analysis (the value should be >0.50).
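For readers unfamiliar with the statistic, Cronbach’s α for k subscales is k/(k − 1) multiplied by one minus the ratio of the summed subscale variances to the variance of the total score. The Python sketch below implements that standard formula with sample variances. It is an illustration only: the function name `cronbach_alpha` is hypothetical, and the study data are not reproduced here, so the toy values in the usage note are not the participants' scores.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of scores.

    rows -- one list per participant, one entry per subscale.
    """
    if len(rows) < 2:
        raise ValueError("need at least two participants")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two subscales")
    if any(len(row) != k for row in rows):
        raise ValueError("every participant needs a score on every subscale")
    # Sample variance of each subscale across participants.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

With toy data in which two subscales move together perfectly, such as `[[1, 1], [2, 2], [3, 3]]`, the function returns 1.0; less consistent subscales give lower values.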
Relations with other variables: According to our ANOVA, the mean overall simulator score showed a significant group difference (F (3, 71) = 11.2, p < 0.001). Post-hoc analyses using Tukey’s HSD revealed the source of the difference was that the MS1 group scored significantly lower than all other groups (p < 0.02). The other groups’ scores did not differ significantly (Figs. 1 and 2). Our correlation data suggested that self-reported comfort with managing DKA correlated with the simulator score (r = 0.55, p < 0.001), as did the medical students’ self-reported number of weeks on GIM rotation (r = 0.40, p < 0.014). Similarly, across all groups, our nonparametric variables of age and number of DKA patients treated correlated with score (p = 0.022 and p < 0.001, respectively). There was no correlation of score with residents’ self-reported number of months on GIM rotation or gender.
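The degrees of freedom reported for the ANOVA follow from the design: four groups give 4 − 1 = 3 between-group degrees of freedom, and 75 participants in four groups give 75 − 4 = 71 within-group degrees of freedom. The following Python sketch shows the standard one-way ANOVA computation. It is not the authors' analysis code; the function name `one_way_anova` and the toy scores in the usage note are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For the toy groups `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`, the between-group and within-group sums of squares are both 6, so F = (6/2)/(6/6) = 3.0 with 2 and 6 degrees of freedom.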
Consequences: We generated a receiver-operating characteristic (ROC) curve to define a simulator cutoff (“pass-fail”) score that would delineate a threshold between practicing physicians (considered ‘experts’) and trainees. Using the data from the curve, we calculated the Youden index (sum of sensitivity and specificity minus one) in order to identify the optimal cutoff score. We found that the largest value (0.47) occurred at a simulator score of 75 % (sensitivity of 94.7 %, specificity 51.8 %), demonstrating that a score of 75 % has high sensitivity (the cutoff score identifies 94.7 % of practicing physicians) but low specificity (the cutoff score excludes only 51.8 % of trainees, so 48.2 % of trainees are not excluded). The area under the curve was fair at 0.73 ± 0.06 (95 % confidence interval: 0.61–0.85, p = 0.003).
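The cutoff selection can be illustrated with a brief Python sketch of the Youden index, defined above as sensitivity plus specificity minus one. The function names are hypothetical, and in the usage note only the values at the 75 % cutoff come from this study; the other candidate cutoffs are invented purely to show the selection step.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1 (inputs as proportions)."""
    return sensitivity + specificity - 1.0


def best_cutoff(candidates):
    """Pick the candidate with the largest Youden index.

    candidates -- list of (cutoff, sensitivity, specificity) tuples.
    """
    if not candidates:
        raise ValueError("need at least one candidate cutoff")
    return max(candidates, key=lambda c: youden_index(c[1], c[2]))


# Only the 75 % row uses values reported in this study; the other rows are
# hypothetical placeholders for points along an ROC curve.
example_candidates = [
    (70, 1.000, 0.300),
    (75, 0.947, 0.518),
    (80, 0.700, 0.700),
]
```

With the reported sensitivity of 0.947 and specificity of 0.518, the index is approximately 0.465, in line with the reported value of 0.47 once rounding of the inputs is taken into account. Applied to `example_candidates`, `best_cutoff` returns the 75 % row.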
We integrated guideline-based content and expert input, evidence-based instructional design strategies, and principles of user-centered design to develop an easy-to-use, engaging, and realistic computer-based DKA management simulator. We also evaluated validity evidence and judged the value of the evidence using two elements of deliberate practice: that informative feedback is provided from educational sources and that assessment scores are available to produce a mastery standard.7 Our judgment of the validity evidence is that it is mostly favorable for using the DKA simulator as a formative method for assessing trainees’ skill in DKA management. However, the data do not substantiate using the simulator for summative purposes: although performance of junior medical students differed from other groups, the low specificity of our cut point score suggests the scoring system is not yet sensitive to subtle DKA management performance differences between senior medical students and residents.
Current Validity Argument for Use of the DKA Simulator/Criteria for Effective Assessment
For a test to provide effective formative assessment for the learner, it should provide specific and actionable feedback, be integrated into the learning experience, and be timely and ongoing.23 Our DKA simulator provides feedback based on the learner’s actions and suggests correct management actions throughout the simulation and upon completion. Based on our evidence for content and for relations with other variables, the simulator appears able to assess and differentiate a learner’s ability to identify and prioritize management options. Further research is needed, however, to ensure the feedback provided leads to performance improvements during prolonged periods of deliberate practice.
For a test to provide effective summative feedback for the learner and educator, it must consist of high-quality test material, a systematic standard-setting process, and secure administration as well as demonstration of validity, consistency, and equivalence.23 We created high-quality test material that was securely administered and initiated a systematic standard-setting process. However, although our collection of internal structure evidence demonstrated good internal consistency, our collection of consequence evidence, specifically the psychometric properties of the cutoff score, was not sufficiently strong to support its use for summative purposes; although sensitivity was high at 94.7 %, specificity was low at 51.8 %, thus not permitting accurate prediction of expertise. In addition, we have not yet assessed test-retest reliability or equivalence (i.e., whether the same assessment yields equivalent scores or decisions when administered across different institutions or cycles of testing). In order to build upon a validity argument wherein the simulator score can be used to predict practice-ready competence in DKA management, additional consequence evidence such as evaluation with the actual pass rate (e.g., on objective structured clinical examination) will need to be collected.
Strengths and Limitations
Strengths of our simulator include its systematic development. User-reported limitations include its focus on DKA management, to the exclusion of other medical conditions; this was deemed an acceptable compromise given the intended focus of the simulator. A study strength includes our collection of multiple sources of validity evidence, which resulted in a more balanced assessment of the validity of our scoring system. Unlike previous studies in the literature,10 , 11 we collected not only evidence for relations with other variables, but also evidence for content, internal structure, and consequences. We believe this study serves as an example in moving the field of validation research methods forward in the domain of simulation-based medical education and assessment.
The current study is the first in a program of study that ultimately is aimed at impacting translational outcomes such as patient care practices, better patient outcomes, and collateral educational effects.24 For example, integration of the simulator into the medical curricula may improve resident knowledge and skills, the mean time to insulin initiation, prevalence of life-threatening hypokalemia, adequate fluid resuscitation, and subsequently patient morbidity and length of stay. Next steps of this research program are to explore further refinements to the scoring algorithm, how to most effectively implement the simulator in a curriculum, such as the optimal setting (for example, on-site invigilation by a coach versus self-study), and the optimal dose (for example, set number of case repetitions versus self-selected number of case repetitions). In addition, the simulator can be used to collect participant responses to clinical cues, which may be used to better understand the mechanism by which simulator cases can improve skills. Furthermore, the impact on clinical reasoning and the time course for these changes can be explored. Thus, computer-based simulation offers opportunities to improve trainee skill and to better understand how trainees learn.
Using the principles of deliberate practice and incorporating evidence-based instructional features, we developed a computer-based DKA management simulator. We subsequently collected an array of validity evidence for the scoring system including evidence on content, internal structure, relations with other variables, and consequences. Our next steps are to explore refinement of the scoring system and integration of the DKA simulator into medical education; pending these findings, the simulator will be refined and made available to the broader medical education audience.
Centers for Disease Control and Prevention DoDT. Diabetes Surveillance System. DKA as first-listed diagnosis for hospitalization. Centers for Disease Control and Prevention, Atlanta, GA 2005. http://www.cdc.gov/diabetes/statistics/dkafirst/fig1.htm. 2014. Accessed February 25, 2015.
Ferreri R. Treatment practices of diabetic ketoacidosis at a large teaching hospital. J Nurs Care Qual. 2008;23(2):47–54.
Canadian Diabetes Association Clinical Practice Guidelines Expert Committee. Clinical practice guidelines for the prevention and management of diabetes in Canada. Can J Diabetes. 2013;37(suppl 1):S1–S212.
Volkova NB, Fletcher CC, Tevendale RW, Munyaradzi SM, Elliot S, Peterson MW. Impact of a multidisciplinary approach to guideline implementation in diabetic ketoacidosis. Am J Med Qual. 2008;23(1):47–55.
Ericsson KA, Krampe R, Tesch-Romer C. The role of deliberate practice in the acquisition of expert performance. Psychol Rev. 1993;100(3):363–406.
Ericsson KA. Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Acad Med. 2004;79(S70).
McGaghie WC, Issenberg SB, Cohen ER, Barsuk JH, Wayne DB. Does simulation-based medical education with deliberate practice yield better results than traditional clinical education? A meta-analytic comparative review of the evidence. Acad Med. 2011;86(6):706–711.
Pusic M, Pecaric M, Boutis K. How much practice is enough? Using learning curves to assess the deliberate practice of radiograph interpretation. Acad Med. 2011;86(6):731–736.
Pusic MV, Kessler D, Szyld D, Kalet A, Pecaric M, Boutis K. Experience curves as an organizing framework for deliberate practice in emergency medicine learning. Acad Emerg Med. 2012;19(12):1476–1480.
Cook DA, Brydges R, Zendejas B, Hamstra SJ, Hatala R. Technology-enhanced simulation to assess health professionals: A systematic review of validity evidence, research methods, and reporting quality. Acad Med. 2013;88(6):872–883.
Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Adv Health Sci Educ. 2013:1–18. doi:10.1007/s10459-013-9458-4.
Messick S. Validity. In: Linn RL, ed. Educational Measurement. 3rd ed. New York: American Council on Education and Macmillan; 1989:13–103.
Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: Theory and application. Am J Med. 2006;119:166.e7–e16.
Issenberg SB, McGaghie WC, Petrusa ER, Lee Gordon D, Scalese RJ. Features and uses of high-fidelity medical simulations that lead to effective learning: A BEME systematic review. Med Teach. 2005;27(1):10–28.
Cook DA, Erwin PJ, Triola MM. Computerized virtual patients in health professions education: A systematic review and meta-analysis. Acad Med. 2010;85(10):1589–1602.
Michelson JD, Manning L. Competency assessment in simulation-based procedural education. Am J Surg. 2008;196(4):609–615.
Triola M, Feldman H, Kalet AL, Zabar S, Kachur EK, Gillespie C, et al. A randomized trial of teaching clinical skills using virtual and live standardized patients. J Gen Intern Med. 2006;21:424–429.
Cook DA, Triola MM. Virtual patients: A critical literature review and proposed next steps. Med Educ. 2009;43:303–311.
Nielsen J. How to Conduct a Heuristic Evaluation. 1995. http://www.nngroup.com/articles/how-to-conduct-a-heuristic-evaluation/. Accessed February 25, 2015.
Napier F, Davies RP, Baldock C, Stevens H, Lockey AS, Bullock I, et al. Validation for a scoring system of the ALS cardiac arrest simulation test (CASTest). Simul Educ. 2009;80(9):1034–1038.
Cohen J. Statistical Power Analysis for the Behavioral Sciences. New Jersey: Lawrence Erlbaum Associates; 1988.
Tavakol M, Dennick R. Standard setting: the application of the receiver operating characteristic method. Int J Med Educ. 2012;3:198–200.
Norcini J, Anderson B, Bollela V, Burch V, Costa MJ, Duvivier R, et al. Criteria for good assessment: consensus statement and recommendations from the Ottawa 2010 Conference. Med Teach. 2011;33(3):206–214.
McGaghie WC, Issenberg SB, Barsuk JH, Wayne DB. A critical review of simulation-based mastery learning with translational outcomes. Med Educ. 2014;48(4):375–385.
CHY conceived of the study, developed and refined the simulator, conducted the study, analyzed and synthesized the results, and drafted the manuscript. SES oversaw simulator development and study conduct and provided critical review of the manuscript. RB analyzed and synthesized the results and provided critical review of the manuscript. All of the authors approved the final version submitted for publication. We thank Evermight (John Lai, Sid Momin, Al Momin) for conducting the programming, Sasha Jovicic for conducting the heuristic evaluation, and Dr. Chi Ming Chow for his advice. We are also grateful to all trainees and staff for participating in our study.
The authors are grateful for financial support from the Department of Medicine, University of Toronto, and the Banting and Best Diabetes Centre, University of Toronto. Dr. SE Straus is supported by a Tier 1 Canada Research Chair.
Abstract presented at the Vascular Meeting 2013 and World Diabetes Congress 2013.
Conflict of Interest
The authors declare that they do not have a conflict of interest.
The study received ethical approval from the institutional review board of the academic center. This work was carried out in accordance with the Declaration of Helsinki, including but not limited to there being no potential harm to participants, the anonymity of participants is guaranteed, and the informed consent of participants was obtained.
Yu, C.H.Y., Straus, S. & Brydges, R. The ABCs of DKA: Development and Validation of a Computer-Based Simulator and Scoring System. J GEN INTERN MED 30, 1319–1332 (2015). https://doi.org/10.1007/s11606-015-3273-y
- medical education
- assessment/evaluation, medical education
- clinical skills training, medical education
- computer/web-based training, medical education
- instructional design, medical education