Assessment of Clinical Reasoning Using the CBCR Test

A CBCR course for preclinical students should be completed with some form of student evaluation, as any course in a medical curriculum should be concluded with a valid decision about the extent to which every student has reached the objectives of the course. Many students proceed through education with one predominant, returning question in mind: what must I do to pass this course and its examination? This guides their efforts in learning, and as the saying goes, “assessment drives learning” or “students rather learn what you inspect than what you expect.” This is not something to be disappointed about, but a fact of university life that should be understood and respected. After all, it is the school that sets these rules. However, the lesson to be learned is that assessment should be aligned with the educational objectives in such a way that what we inspect is exactly what we expect (Biggs 1996). If we test students on clinical reasoning skills, the examination should preferably not consist of four-item multiple-choice or true/false questions that focus on factual knowledge. That would drive students in the direction of rehearsing the systems approach to biomedical knowledge that, as we have shown, is in contrast with the patient-oriented approach in CBCR sessions. Assessment needs to focus more specifically on clinical reasoning skill, while acknowledging the importance of biomedical knowledge for clinical reasoning. That should include questions like: What, at this stage, is a likely differential diagnosis? What findings would you expect on physical examination if hypothesis X were true? Which laboratory tests from the ordering list would you check for this patient at this stage? In other words, multiple options from a larger array of possibilities can suit the assessment of clinical reasoning better than standard four-option MC questions.


The CBCR Test as Developed at the University Medical Center Utrecht
In 2010 a specific written assessment format was developed for the CBCR course at the University Medical Center Utrecht, and it has been in place ever since. The aim of the design was to align the test as closely as possible with both the CBCR course as delivered and the desired future skill of patient-oriented clinical reasoning.
The case-based questions all follow the course of the clinical encounter, starting with a case title that reflects the initial information the physician normally would have (age, gender, and main complaint) and a short clinical presentation vignette. Then, a series of questions about the case unfolds.
Features from different established item types as discussed in Chap. 5 have been incorporated:
• Relatively long lists of options, such as used in extended matching questions and long menu items
• A differential diagnostic approach from the comprehensive integrative puzzle items through the "alternative scenarios" approach
• The addition of new information to alter a hypothesis as used in the script concordance test
The first and foremost feature of the Utrecht CBCR test is the integrated, patient-oriented nature of the test. The unit of focus is a patient with several option lists (diagnostic hypotheses, history findings, physical examination findings, diagnostic findings, management options, etc.) that all pertain to focused questions following from the initial patient presentation or from follow-up information about this patient. The test has a limited number of cases but a broad enough set to reduce the threat of case specificity and has a series of questions within each case. The CBCR test has the following characteristics.

Alignment with Actual Cases Discussed
An important purpose of the CBCR course for junior students is to mentally "install" a basic framework of a limited number of illness scripts. The Utrecht CBCR test therefore closely follows the cases discussed during the CBCR sessions. There is no expectation of substantial transfer of learning, i.e., of a benefit of studying cases for the ability to handle other cases and no aim to test that transfer ability early in the medical curriculum. Indeed, an additional aim of closely following the cases studied is to reinforce the learned scripts both by rehearsing for the test and during the test itself. For senior medical students, who have acquired a basic mental framework of illness scripts in their long-term memory, it may well be recommended to deviate from cases discussed, to gradually test a more general clinical reasoning ability, but this is not the aim of the end-of-course CBCR test for the preclinical students.

Cases, Options Lists, and Scenarios
The case is the unit of focus for test items. The question starts with a short case vignette that usually resembles an initial presentation of the patient at the primary care doctor's office, the emergency department, or elsewhere, similar to the start of a CBCR education case. Several option lists that may vary in length from 5 to 25 or more options accompany the case presentation. These lists include the following nine optional categories: (a) diagnoses, (b) history questions, (c) history findings, (d) physical examination procedures, (e) physical examination findings, (f) diagnostic test options, (g) diagnostic test findings, (h) management options, and (i) prognosis. Usually, a limited number of lists (4 or 5) will be used for one case. History questions and history findings are kept as separate lists because they serve two different question types: "What are the [two] most relevant next questions to ask?" versus "Which [three] history findings would you expect if hypothesis X were true?" (or equivalent questions). The same distinction holds for physical examination and diagnostic tests. All questions may ask to check as many options from a list as the item writer finds suitable, e.g., "Which four lab findings would you expect to find given what you know about this patient?" However, the number of correct options should not be more than one third of the total list, and preferably much less, to avoid successful guessing.
Questions may be about a differential diagnosis following from the initial case vignette, but cases generally include one or more sequential scenarios. A scenario is defined as a deviation from the initial course of happenings or findings, but still relates to the same patient (age, sex, main complaint, and initial vignette). A scenario usually starts like this. "Scenario B. Presume, the radiology of this patient's thorax shows a lump in the left lower lung, which two hypotheses from the list (a) Diagnoses would now be most likely?" Or "Scenario C. Presume, the radiology of this patient's thorax would show no pathology, which two hypotheses from the list (a) Diagnoses would now be most likely?" Case title, initial vignette, and all options lists are identical throughout a case, but correct options (and the number of requested options) vary.
In practice, most cases include one to three scenarios. When there is only one scenario, the word scenario is not used.
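The structure described above (one case, shared option lists, and one or more scenarios with their own questions) can be sketched as a small data model. This is only an illustration under stated assumptions, not the format used by the Utrecht test software: all class and function names are hypothetical, the option texts are placeholders, and the sketch assumes that the number of requested options equals the number of correct options. The helper flags questions that break the one-third rule.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """One question within a scenario (illustrative names only)."""
    prompt: str
    list_key: str        # which option list the answer is drawn from, e.g. "a"
    correct: frozenset   # indices of the correct options in that list

    @property
    def n_requested(self) -> int:
        # Assumption: the number of requested options equals the number of correct ones.
        return len(self.correct)


@dataclass
class Scenario:
    label: str           # "A", "B", ...
    premise: str         # the "Presume, ..." text; empty for the initial course
    questions: list = field(default_factory=list)


@dataclass
class Case:
    title: str           # age, sex, and main complaint
    vignette: str
    option_lists: dict   # list key -> option texts, identical for all scenarios
    scenarios: list = field(default_factory=list)


def check_one_third_rule(case: Case) -> list:
    """Describe every question whose correct options exceed one third of its list."""
    problems = []
    for scenario in case.scenarios:
        for q in scenario.questions:
            n_options = len(case.option_lists[q.list_key])
            if 3 * q.n_requested > n_options:
                problems.append(
                    f"Scenario {scenario.label}: {q.prompt!r} "
                    f"({q.n_requested} of {n_options} options)"
                )
    return problems


# Placeholder content: a 20-option diagnosis list and a 5-option management list.
example_case = Case(
    title="(age, sex, main complaint)",
    vignette="(short clinical presentation)",
    option_lists={
        "a": [f"diagnosis {i}" for i in range(1, 21)],
        "h": [f"management option {i}" for i in range(1, 6)],
    },
    scenarios=[
        Scenario("A", "", [Question("Which two hypotheses are most likely?", "a", frozenset({3, 7}))]),
        Scenario("B", "Presume, ...", [Question("Which two options would you choose?", "h", frozenset({0, 4}))]),
    ],
)

for problem in check_one_third_rule(example_case):
    print(problem)
```

In this placeholder case only the Scenario B question is flagged, because two correct options out of a list of five exceed one third of the list.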

CBCR Test Quality Findings Since 2010
Between December 2010 and April 2017, the test has been administered 14 times, with on average about 12 cases and 50 items (about four items per case), for cohorts of about 300 students. The test reliability averaged 0.73 (Cronbach's alpha) for tests with an average duration of not much more than 1 h. In this course, spread across 8 months, a test is administered twice (in December and April), and the final score combines both sub-scores. This combined score would compare with a test with 24 cases, 2-2.5 h of testing and an estimated average reliability of 0.84 (estimated with the Spearman-Brown formula). This is more than satisfactory and more efficient than key feature tests (Page and Bordage 1995).
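The estimate for the combined score can be reproduced with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor k from the reliability r of the original test: k·r / (1 + (k − 1)·r). The short sketch below uses a hypothetical function name and treats the two administrations as one test of double length, which is an assumption about how the reported figure was obtained.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Average reliability of a single test (0.73), with the test length doubled.
combined = spearman_brown(0.73, 2)
print(round(combined, 2))  # 0.84
```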
The test questions were derived from small group CBCR cases, supplemented with cases presented in a lecture hall. We will not expand on the latter cases, which constituted a minority of the items, but they were suitable for a similar approach to clinical reasoning testing, given the large group education format (Borleffs et al. 2003). Given the richness of the cases discussed in the CBCR sessions, it was possible to devise new questions for each test, with hardly any reuse of previous questions.

Electronic and Paper Versions
All CBCR tests except one were administered electronically. The first version of the test was not much more than a protected interface. After logging in with a general password, candidates could access all questions per case on one screen: the left side of the screen showed a series of questions and open fill-in-the-blank slots; on the right side of the screen, all relevant options lists were displayed. Candidates were then asked to enter a three-digit number into the relevant slots at the left side of the screen. The resulting file was exported as an Excel® file, available after the test administration and suitable for analysis with an elaborate Excel® analysis application and statistical software.
A next version of the application showed a professional interface with check boxes instead of fill-in-the-blank slots. Currently, a commercial test administration firm, TestVision® (www.testvision.nl), has incorporated the CBCR item format requirements into a professional application.

What the Utrecht CBCR Test Does Not Provide
The Utrecht CBCR test approach has limitations. In clinical reasoning, a logical question is "What is the most likely diagnosis?" All question types that are not constructed response suffer from some cueing, as the answer can be chosen from a list. Script concordance testing and the comprehensive integrative puzzle approaches have simply given up on this requirement as diagnostic hypotheses are given and not asked (Charlin et al. 2000; Ber 2003; Groothoff et al. 2008). CBCR test questions provide a list to choose from that can be as long as the item writer wishes, thereby somewhat limiting this cueing, similar to extended matching questions and long menu questions (Case and Swanson 1998; Schuwirth et al. 1996). The recommended length is 20 options. In practice, they are often shorter and sometimes longer.
Another feature that is not supported is the possibility to evaluate a chain of interdependent reasoning questions and answers of a candidate, which would require conditional links between consecutive answers (if the student chooses X on item 1, then item 2 will be adapted). It would be possible to value an answer differently if it follows upon a previous wrong answer, as the two answers may be correctly related to each other. If the chain of reasoning becomes longer, however, the potential branches to be evaluated would quickly become too many to manage. The use of parallel scenarios, however, compensates for this by allowing branching options from the same patient. A question of a parallel scenario always starts with "Presume, ...", e.g., "Presume, you have received result X from diagnostic test Y, what then would be the most likely diagnosis?"
Finally, the current use of CBCR test methodology does not allow for weighing of items within answers ("Provide a differential diagnosis of three in a correct order of likelihood"); that now requires multiple questions ("What is the most likely diagnosis?" and "Name two other diagnostic hypotheses"). These are technical limitations that in the future may be solved with sophisticated software.

Rules and Regulations Around the Utrecht CBCR Test
In practice, the Utrecht CBCR test is administered twice a year, each administration counting for half of the final test score. Students pass the course requirements if their final test score, combined with proof of active participation in the course as a student and as a peer teacher, is satisfactory. As participation also yields a score, test and participation scores are combined to cover 88% and 12% of the final score, respectively, which was found to be a useful ratio. We will not expand on how the participation rate per student is calculated, but details can be found in the model study guide in Chap. 10. Students who do not pass the requirements can opt for a retake of the examination.

Issues of Validity of the CBCR Test
The Utrecht CBCR tests as applied since 2010 have an undisputed content validity, as they are built upon cases that are used in the course, and they cover all cases.
The construct validity of the Utrecht CBCR test approach remains to be investigated. Table 7.1 shows an example of a CBCR test question, derived from a case that is used in education. The representation of the question may have different forms when presented as an actual test. The initial case vignette should remain visible while students proceed with scenarios through the case. Supplementary visual information, such as a photo of the patient, an X-ray image, or others may appear when indicated.

Scoring of Items
Once students have taken the test, the result is a file with all answers. A common format is an Excel® file with rows per student and columns of answers per option. See Fig. 7.1 for an example. The unit of scoring is the option. If Question 1 from Case 1, Scenario A, asks for a differential diagnosis of four hypotheses that need to be considered with this patient, all students will have an answer for a, b, c, d. Scores are counted for each question into a sum score of 0 to 4 or 5, depending on the number of options requested in that question. For psychometric purposes (calculation of Cronbach's alpha reliability and item analysis) the unit of analysis is the question (but there are arguments to use scenarios or cases as units of analysis). As with any test, we recommend conducting an item analysis to determine whether any items need to be removed before final scores for students are disclosed. Those final scores can be calculated in different ways. The Utrecht procedure is to first calculate the mean guessing rate per item (e.g., 20 or 25%) and take that percentage of the maximum score as a bottom score. For example, a test of 50 questions and 120 options, with an average guessing score of 21%, yields a bottom score of 25. Subtracting that from the maximum score of 120 means that all students receive a score between 0 and 95 points. As we want to end the CBCR course with a final score that combines two test scores (88%) and a score for participation (12%), and because we use a 100-point scale, each test score must be recalculated to a 0-44 points range.
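The arithmetic of this procedure can be illustrated with a short sketch. It is a reading of the description above under stated assumptions rather than the actual Utrecht analysis application: the function names are hypothetical, raw scores below the bottom score are assumed to count as 0, the rescaling is assumed to be linear, and the participation score is assumed to be expressed on a 0-12 scale.

```python
def score_question(selected: set, correct: set) -> int:
    """Option-level scoring: one point for each correctly checked option."""
    return len(selected & correct)


def bottom_score(max_score: int, guessing_rate: float) -> int:
    """Score expected from guessing alone, rounded to whole points."""
    return round(max_score * guessing_rate)


def rescale(raw: int, max_score: int, guessing_rate: float,
            target_max: float = 44.0) -> float:
    """Subtract the bottom score and map the remainder linearly onto 0..target_max."""
    bottom = bottom_score(max_score, guessing_rate)
    corrected = max(0, raw - bottom)  # assumption: scores below the bottom count as 0
    return target_max * corrected / (max_score - bottom)


def final_score(test_1: float, test_2: float, participation: float) -> float:
    """Combine two 0-44 test scores and an assumed 0-12 participation score."""
    return test_1 + test_2 + participation


# Example from the text: 120 options and a mean guessing rate of 21%.
print(bottom_score(120, 0.21))    # 25
print(rescale(120, 120, 0.21))    # 44.0 (maximum raw score)
print(rescale(25, 120, 0.21))     # 0.0 (raw score equal to the bottom score)
```

With these assumptions, a student with the maximum score on both tests and full participation would reach 44 + 44 + 12 = 100 points.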

Checklist for Item Writers
We end with a checklist for writers of test questions for CBCR tests. Box 7.1 includes a number of pitfalls to avoid and recommendations to follow.

Box 7.1 Checklist for Item Writers of the Utrecht CBCR Test
1. Always include age, sex and main complaint, and sign or symptom in the case title.
2. Always relate to this individual patient. Instead of "Which two physical examination findings do patients with complaint X always have," ask: "Which two physical examination features do you expect to find when examining this patient, based on her complaint X?". Stimulate students to think from a patient-oriented perspective, also during the test. "...do you expect to find..." reflects the intended hypothesis-driven thinking mode and is regularly used in CBCR test items. In many cases, it is sensible to add "...if this hypothesis is correct." Check whether an answer requires a preceding case vignette; if not, it is probably not a question about the individual patient.
3. When writing items, stay close to how the information arrived at the physician (e.g., the history as the patient presents it, physical examination as what the physician sees or hears), rather than interpreting and summarizing with semantic qualifiers (avoid "The patient provides a family history of cardiovascular disease," and prefer "...has yellow sclerae" to "...is icteric"), unless the presentation would be a discharge letter from a hospital.
4. Be specific about the number of requested options from the list (and avoid "at least" or "maximum number" of options to be checked).
5. Make sure that the list contains options that do not overlap or include each other. Also, do not include two items in the list that evidently exclude each other, while the list contains other options ("age younger than 30, age 30 or older, and age 50" should not all be used in one listing).
6. Formulate finding options in specific rather than general terms ("Age 46" rather than "older than 40"; "Pain since three weeks" rather than "Pain since quite some time"); again, formulate them the way the patient presents this information.
7. Intersperse questions with small follow-up information text, such as "The following lab results are reported: [...]"; then continue with a next question.
8. Start a new scenario if there is a deviation from earlier information or questions. The first question of a new scenario typically starts with "Scenario B. Presume, the patient had said/shown/...." If there is no branching or deviation from earlier information, "scenario" terminology is not needed. Students must understand the significance of the "scenario" terminology in their instruction about this test.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.