1 Why Must We Develop Assessment Methods?

The practice of medicine depends on qualified doctors who strive to achieve and maintain appropriate knowledge, skills, and attitudes. Competent doctors are the need of the hour, and hence tests of clinical competence, which allow a decision to be made about whether or not a doctor is fit to practise, are in demand. This demand poses a challenge for everyone involved in medical education. Assessment and evaluation of medical trainees therefore play an important role in identifying good doctors.

Assessment promotes learning, and for this it needs to be educational and formative as well as summative. Students learn from the assessment process and receive feedback on which they build their knowledge and skills. Wass et al. pragmatically describe assessment as the engine to which the curriculum is harnessed, and argue that it should not only serve certification and exclusion but also influence the learning process [1].

Neufeld and Norman have listed key measurement issues that should be addressed when designing assessments of clinical competencies [2].

[Figure: key measurement issues in designing assessments of clinical competence, after Neufeld and Norman]

Assessing skill, knowledge, and attitude in medicine always requires a combination of assessment techniques. Selecting a technique depends not only on how well it measures students' performance but also on issues such as cost, suitability, and safety. These factors account for much of the inter-institutional variation in the choice of assessment methods and in their success [3].

2 Are Objective Assessment Methods Reliable and Valid?

Reliability measures the reproducibility or consistency of a test. It is affected by examiner judgements, the types of cases used, candidates' nervousness, and test conditions. Two important aspects of reliability are inter-rater reliability and inter-case reliability. Inter-rater reliability measures how consistently different examiners rate candidates' performance; using multiple examiners improves it [4]. Inter-case reliability, which measures the consistency of a candidate's performance across different cases, is the most important aspect of testing clinical competence. Sampling across many cases improves inter-case reliability compared with assessing candidates on a single case (Fig. 41.1). Clinical skill testing has therefore moved to multi-case formats, with increasing use of techniques like the objective structured clinical examination (OSCE). An OSCE consists of multiple tasks at multiple stations with sufficient testing time, which helps achieve adequate inter-case reliability. Test length also plays a critical role in determining reliability [5].

Fig. 41.1 Reported reliability for different test formats with 4 h of testing time (MCQ = multiple-choice examination; PMP = patient management problem; OSCE = objective structured clinical examination)
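To make the idea of inter-case reliability concrete, the sketch below estimates internal consistency across stations using Cronbach's alpha, one common coefficient for this purpose. This is a minimal illustration, assuming scores are held as a candidates-by-stations matrix; the `cronbach_alpha` helper and all scores are invented, not taken from the studies cited above.

```python
# Illustrative only: estimating consistency of candidate performance
# across multiple cases/stations. All data here are hypothetical.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (candidates x cases) score matrix."""
    k = scores.shape[1]                          # number of cases/stations
    case_vars = scores.var(axis=0, ddof=1)       # variance of each case's scores
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of candidates' totals
    return (k / (k - 1)) * (1 - case_vars.sum() / total_var)

# Hypothetical scores for 5 candidates across 6 OSCE stations (0-10 scale)
scores = np.array([
    [7, 8, 6, 7, 8, 7],
    [5, 6, 5, 6, 5, 6],
    [9, 9, 8, 9, 8, 9],
    [4, 5, 4, 5, 4, 4],
    [6, 7, 6, 6, 7, 6],
])
print(f"Estimated inter-case reliability (alpha): {cronbach_alpha(scores):.2f}")
```

Adding stations (columns) with comparable quality tends to raise such a coefficient, which is the statistical intuition behind multi-case sampling.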

Validity determines whether a test actually tests the competencies it is designed to test. In the absence of a single valid measure of clinical competence, Miller introduced the pyramid of competence (Fig. 41.2), a conceptual model outlining the issues involved in analyzing validity.

Fig. 41.2 Miller's pyramid of competence (SP = simulated patients; OSCE = objective structured clinical examination; MCQ = multiple-choice questions)

The pyramid covers all the facets essential for clinical competence. Its base represents the knowledge components of competence: 'knows' (basic facts), followed by 'knows how' (applied knowledge). These are easily assessed by written tests of clinical knowledge such as multiple-choice questions. Assessing the competence of a qualifying doctor requires evaluation of the more demanding facet, 'shows how', which involves behavioural function and hands-on demonstration and is tested with modalities like the OSCE. The ultimate valid assessment of clinical competence is to test a doctor's actual performance, the summit of the pyramid [6].
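The pyramid can be read as a simple lookup from competence level to assessment tool, as the following sketch shows. The mapping follows the description above and the abbreviations in Fig. 41.2; it is a descriptive summary, not an exhaustive or authoritative taxonomy of methods.

```python
# Miller's pyramid levels mapped to typical assessment tools, per the
# chapter's description; illustrative, not exhaustive.
millers_pyramid = {
    "knows":     "written tests of basic knowledge (e.g., MCQs)",
    "knows how": "tests of applied knowledge (context-rich MCQs)",
    "shows how": "hands-on demonstration (OSCE, simulated patients)",
    "does":      "observation of actual performance in practice",
}

for level, tool in millers_pyramid.items():
    print(f"{level:<10} -> {tool}")
```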

3 Is Traditional Assessment Inferior to Objective Assessment?

Medical education facilitates learning and encourages the acquisition of factual knowledge, the improvement of professional skills, and the development of applied skills such as critical reflection, problem solving, and reasoning. Until recently, the assessment of medical students depended on traditional methods like essay-type questions and long case/viva voce examinations, which typically required students to memorize large amounts of content without needing to apply it.

Unfortunately, what and how a student learns depends on how the student expects to be assessed. Traditional assessment methods lead students to memorize and reproduce factual information in order to get a good grade, and much of this information is forgotten within a week. Such assessment also relies on examiners with differing teaching experience, which increases subjectivity and reduces the reliability of the examination [7].

The merits and demerits of traditional assessment methods can be summarized as follows [8]:

Merits

  • Global judgement of the skills of the student.

  • No compartmentalization of the clinical skills to be judged.

  • Less time consuming.

  • Less effort in organizing and conducting the examination.

  • More interaction between examiners and examinees.

Demerits

  • A biased system, hence less valid and reliable.

  • Lacks the structure and uniformity to be used as an assessment tool.

  • Affective skills like communication and history taking are not judged.

  • Requires experienced faculty to judge the student's performance.

These limitations have led to a search for an objective, structured, and unbiased assessment tool that is reliable and valid. Objective assessment methods like multiple-choice questions (MCQs), the objective structured clinical examination (OSCE), and the objective structured practical examination (OSPE) have helped address these issues and have now largely replaced traditional assessment methods.

4 Multiple-Choice Questions (MCQs): What, Why, and When?

Multiple-choice questions (MCQs) have become the most widely applicable, useful, and accepted type of objective assessment. They help assess the important facets of educational outcomes: knowledge, understanding, judgement, and problem solving. Introduced into medical education in the 1950s, MCQs are now well established as a reliable examination tool for assessing both undergraduate and postgraduate students. The MCQ has become synonymous with objective evaluation and consists of questions for which there is prior agreement on what constitutes the correct answer [9].

MCQs are reliable and easy to score. They also allow wide sampling of knowledge in limited time: through a short, time-efficient examination, the length and breadth of a topic can be assessed. The strength of this method is that, beyond the recall of isolated facts, it can assess taxonomically higher-order cognitive processing such as interpretation, synthesis, and application of knowledge [10]. Apart from being reliable, MCQs are discriminatory, reproducible, and cost effective. There is general consensus that, rather than serving as the sole method of examination, MCQs should be used alongside other evaluation methods to broaden the range of skills assessed in medical education [11].

Although considerable effort goes into framing MCQs, their high objectivity allows immediate release of results, as they can be marked by any person or by machine. They also allow easy collection and analysis of raw data, and comparison with past performance. Another advantage is the ability to assess a large number of candidates easily, particularly with the use of computers [12, 13].
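Machine marking of this kind reduces to comparing each response sheet against the agreed key, as the minimal sketch below shows. The key string and candidate identifiers are invented for illustration.

```python
# A minimal sketch of objective marking against a previously agreed key;
# the key and candidate responses here are hypothetical.
key = "ABCDA"
responses = {"cand01": "ABCDA", "cand02": "ABDDA", "cand03": "CBCDA"}

for candidate, answers in responses.items():
    score = sum(a == k for a, k in zip(answers, key))  # one mark per match
    print(f"{candidate}: {score}/{len(key)}")
```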

Medical teachers are often faced with the challenging task of constructing good MCQs that test higher-order thinking skills, yet they often have little or no training or experience in doing so. Preparing a good MCQ is difficult and time consuming. Most institutes now emphasize faculty development programmes that concentrate on MCQ construction and implementation.


5 How to Design a Good MCQ?

MCQs can be prepared in different formats. Commonly used formats are 'one correct answer', 'single (one) best response', 'true or false', 'multiple true or false', 'matching', and 'extended matching' questions [14]. The single-best-response format is the most widely used and accepted.

Before preparing an MCQ, one must consider the objectives to be sampled and the areas to be tested. The learning outcomes also need to be determined before sampling to ensure high validity of the test. Another important issue is the learning objectives that learners are expected to achieve. Learning objectives can be formulated using the SMART criteria, an acronym for goals that are specific, measurable, attainable, realistic, and time bound [15,16,17].

Benjamin Bloom was an educational psychologist who divided what and how we learn into three separate domains of learning [18]:

  1. Cognitive domain: related to thinking/knowledge (K).

  2. Affective domain: related to feeling/attitudes (A).

  3. Psychomotor domain: related to doing/skills or practice (P).

In 1956 he also published a taxonomy of cognitive learning, described as a hierarchy of (i) knowledge, (ii) comprehension, (iii) application, (iv) analysis, (v) synthesis, and (vi) evaluation. In 2001, after more than four decades, the top two levels were revised to (v) evaluation and (vi) creation (Fig. 41.3) [19].

Fig. 41.3 Suitability of MCQs for testing the levels of the revised Bloom's cognitive taxonomy

MCQs designed to test knowledge (lower-level learning) are not appropriate for testing competence against objectives that reflect analysis (higher-level learning). Educational programmes should state the relative importance of skill and knowledge objectives, and the objectives should be measurable so that their achievement can be assessed [20].
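The matching of item format to taxonomic level can be expressed as a simple rule, as in the sketch below. The level names follow the revised hierarchy described above; the cut-off chosen here is an assumption echoing the idea behind Fig. 41.3, not a reproduction of it.

```python
# A hedged illustration of matching items to the revised Bloom hierarchy;
# the MCQ-suitability cut-off is an assumption for illustration.
REVISED_BLOOM = ["knowledge", "comprehension", "application",
                 "analysis", "evaluation", "creation"]

def mcq_suitable(level: str) -> bool:
    """Assume MCQs test up to analysis well; higher levels need other formats."""
    return REVISED_BLOOM.index(level) <= REVISED_BLOOM.index("analysis")

for level in REVISED_BLOOM:
    print(f"{level:<13} MCQ-suitable: {mcq_suitable(level)}")
```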

6 What Needs to Be Done to Construct MCQs?

The first step in constructing MCQs is to have a blueprint, also known as a test specification table. It is a guide that helps create a balanced examination and consists of a list of competencies and topics to be tested. The three important contents of a good blueprint are:

  1. Content/objectives to be tested.

  2. Questions designed to test the content/objective.

  3. Learning domain and levels of testing.

The blueprint is a three-dimensional chart representing the placement of each question and its content area. It provides a solid foundation on which the test is developed, offers evidence of content validity, and makes assessment more meaningful [21, 22], as sketched below.
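The following is a minimal sketch of such a specification table, combining the three contents listed above; the topics, levels, and item counts are hypothetical, not drawn from the chapter.

```python
# A minimal sketch of a test blueprint (specification table); all entries
# are invented for illustration.
blueprint = [
    # (content/objective, domain and level tested, number of items)
    ("Cardiac physiology",        "cognitive: knowledge",   4),
    ("ECG interpretation",        "cognitive: application", 6),
    ("Management of arrhythmias", "cognitive: analysis",    5),
]

total = sum(n for _, _, n in blueprint)
print(f"{'Content/objective':<28}{'Domain/level':<24}{'Items':>5}{'Weight':>8}")
for topic, level, n in blueprint:
    print(f"{topic:<28}{level:<24}{n:>5}{n / total:>8.0%}")
```

Laying the table out this way makes over- and under-sampled topics visible at a glance, which is what gives the blueprint its claim to content validity.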

MCQs need good grammar and appropriate punctuation and must avoid spelling errors. One must also minimize the time required to read each item. An MCQ consists of a stem, or lead-in question, followed by four to five answer options. The option that matches the examiner's key is the 'correct answer'; the other options are the 'distracters'. An ideal question is one that can be answered correctly by 60–65% of the tested population. One must avoid unintended cues, such as making the correct answer longer than the distracters. The instructions for answering the questions should be clear and uniform [23, 24].
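Whether an item sits in that 60–65% band can be checked with classical item analysis after administration. The sketch below computes a difficulty (facility) index and a discrimination index; the 60–65% target comes from the text, while the top/bottom 27% split and all response data are conventional assumptions for illustration.

```python
# A minimal sketch of classical item analysis; responses are coded
# 1 (correct) / 0 (incorrect) per candidate, and all data are invented.
def difficulty_index(responses: list[int]) -> float:
    """Proportion answering correctly; ~0.60-0.65 is the ideal named above."""
    return sum(responses) / len(responses)

def discrimination_index(responses: list[int], totals: list[float]) -> float:
    """Facility in the top 27% by total score minus facility in the bottom 27%."""
    ranked = [r for _, r in sorted(zip(totals, responses), reverse=True)]
    n = max(1, round(len(ranked) * 0.27))
    upper, lower = ranked[:n], ranked[-n:]
    return sum(upper) / n - sum(lower) / n

item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]              # one MCQ, ten candidates
totals = [78, 85, 40, 66, 35, 72, 90, 45, 60, 81]  # candidates' total scores
print(f"Difficulty: {difficulty_index(item):.2f}")
print(f"Discrimination: {discrimination_index(item, totals):.2f}")
```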

A good distracter should be inferior to the correct answer but still plausible to a non-competent candidate. All options should be factual and plausible to varying degrees. Only one answer should be correct, and it should match the examiner's key. Questions commonly ask for the most appropriate, most common, or least harmful option, or some other feature at the uppermost or lowermost point of a range. The options should be homogeneous in both content and length. Absolute terms such as 'always', 'never', and 'completely', as well as 'all of the above' and 'none of the above', should be avoided. Preparing appropriate distracters is challenging and takes considerable effort [25].
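Several of these style rules are mechanical enough to check automatically. The sketch below encodes a few of them over a hypothetical item record; the field names, the example question, and the length-homogeneity threshold are all assumptions for illustration.

```python
# A hedged sketch of basic MCQ style checks drawn from the guidance above;
# the example item and the 2x length threshold are hypothetical.
FORBIDDEN = {"always", "never", "completely",
             "all of the above", "none of the above"}

mcq = {
    "stem": "Which drug is the most appropriate first-line treatment for X?",
    "options": ["Drug A", "Drug B", "Drug C", "Drug D"],
    "key": "Drug B",
}

assert mcq["key"] in mcq["options"], "exactly one option must match the key"
for opt in mcq["options"]:
    assert opt.lower() not in FORBIDDEN, f"avoid absolute options: {opt}"
lengths = [len(o) for o in mcq["options"]]
assert max(lengths) < 2 * min(lengths), "options should be homogeneous in length"
print("MCQ passes basic style checks")
```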

Well-constructed MCQs aim to test the application of medical knowledge (context-rich) rather than mere recall of information (context-free). Context-rich questions stimulate the thinking process and represent the candidate's problem-solving ability better than context-free questions. Practical problems encountered in clinical practice should be assessed, rather than knowledge of trivial facts or obscure problems rarely seen. MCQs should aim to make testing both fair and consequentially valid, and should strategically evaluate important content and clinical competence [23, 26].

7 Can Objective Structured Clinical Examination (OSCE) and Objective Structured Practical Examinations (OSPE) Replace the Traditional Viva Voce?

Medical education has undergone a paradigm shift towards a competency-based system and, as a corollary, competency-based assessment. The Objective Structured Clinical Examination (OSCE) and its derivative, the Objective Structured Practical Examination (OSPE), have been introduced as measures of competence that avoid many of the biases associated with conventional methods.

Harden et al. first described the OSCE in 1975. The OSCE/OSPE assesses clinical or practical competencies in a methodical, objective, and time-orientated manner, with direct observation of the student's performance at planned clinical or test stations. It assesses the third level of Miller's pyramid, 'shows how': the student is evaluated on the performance of specific skill sets in a controlled setting [6, 27, 28].

The traditional examination focuses on global performance rather than on a student's clinical competency. It mainly addresses the 'knows' and 'knows how' levels of Miller's pyramid of competence. Evaluation is often subjective, biased, monotonous, and inadequate for judging the overall performance of the student. Other attributes, such as attitude, communication skills, interpersonal skills, ethical awareness, and professional judgement, are not tested, and the need to understand core topics and develop problem-solving skills is not covered. Another drawback is examiner subjectivity, which reduces the reliability of the examination; subjectivity has been seen to reduce the correlation coefficient between marks given to the same student to as low as 0.25. This affects scoring and causes dissatisfaction among both examiners and examinees. Traditional methods also lack a proper feedback process, which is essential for improving one's skills [7, 29].

8 How to Conduct OSCE and OSPE Assessments?

Conducting an OSCE or OSPE requires considerable effort and preparation. Here too, the first step is designing a blueprint, with structured checklists for observed and unobserved stations based on Bloom's taxonomy. Instruction manuals for examiners and students should also be considered at the blueprinting stage. Checklists of clinical procedures, manuals, and standard answers need to be checked and validated by senior faculty members and medical educators [18].

The number of OSCE/OSPE stations is decided based on requirements. Apart from knowledge, the stations should focus on evaluating communication, psychomotor, and clinical skills, and should span difficulty levels ranging from 'must know' through 'desirable to know' to 'nice to know'. Stations can be either question-and-answer stations or procedure stations; ideally, a procedure station is followed by a question-and-answer station pertaining to that procedure. At procedure stations, students are expected to perform a focused history or examination on standardized patients. Other focused tasks, like interpreting X-rays, electrocardiograms, and microscopic slides, can also be evaluated. About 3–5 min are allotted to each station, a few rest stations are recommended in between, and adequate time is given between stations to facilitate student movement [30].

Marking in the OSCE/OSPE is relatively simple. Every examiner has a previously agreed-upon checklist of items with assigned points, and the student is marked on each piece of predetermined key information obtained or physical manoeuvre performed. A Likert-like scale from 1 to 5 can also be used to grade overall proficiency. The final score is a compilation of the marks obtained at the different stations together with the overall rating [31].
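The sketch below shows how such checklist marking can be compiled, combining per-item points with a 1–5 global rating and summing across stations. The station items, point values, and observed behaviours are hypothetical, not taken from any published checklist.

```python
# A minimal sketch of checklist-based station marking with a global rating;
# the checklist items, point values, and example data are hypothetical.
checklist = {
    "washes hands": 1,
    "introduces self and obtains consent": 2,
    "palpates radial pulse correctly": 3,
    "reports rate and rhythm": 2,
}

def station_score(items_performed: set[str], global_rating: int) -> int:
    """Sum points for observed checklist items, plus a 1-5 Likert rating."""
    assert 1 <= global_rating <= 5
    return sum(pts for item, pts in checklist.items()
               if item in items_performed) + global_rating

# The final OSCE score is a compilation of per-station totals.
stations = [
    station_score({"washes hands", "palpates radial pulse correctly"}, 3),
    station_score({"introduces self and obtains consent",
                   "reports rate and rhythm"}, 4),
]
print(f"Per-station: {stations}, final score: {sum(stations)}")
```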

Debates about the reliability and validity of the OSCE/OSPE have been put to rest by multiple studies [31,32,33]. van der Vleuten and Swanson recommend several steps to improve reliability: using checklists, using standardized patients to maximize reproducibility, increasing the number of hands-on skill stations, and extending testing time to 3–4 h [34].

The merits and demerits of objective testing using MCQs or OSCE/OSPE can be listed as follows [8]:

Merits

  • With comprehensive blueprinting, the cognitive and psychomotor domains and higher-order thinking can be effectively examined.

  • OSPE/OSCE help assess affective domain skills like history taking and communication.

  • Competence-based assessment.

  • Good teaching-learning tool with appropriate feedback.

  • Less experienced faculty members can be involved in the assessment.

  • All students are asked similar types of questions, hence assessment is less biased.

Demerits

  • Blueprinting the syllabus and validating the comprehensive checklists are tedious and time consuming.

  • Administering and conducting an MCQ-based examination or OSCE/OSPE is time consuming, laborious, and resource intensive.

  • There is less interaction between the examiner and examinee.

  • Limited scope of questions.

  • Constant need to innovate and develop MCQ, OSCE, and OSPE banks to prevent repetition.

9 Conclusion: Where are We and Where Do We Need to Go?

  • The primary concern of medical education is the measurement of clinical performance, which remains elusive.

  • Traditional assessment methods have been replaced by more objective evaluation systems like MCQs, OSCEs, and OSPEs, and studies show a significant correlation between the two; however, a gold standard for such comparisons still does not exist.

  • Creation of a competency-based curriculum and appropriate tools to evaluate that curriculum is the need of the hour.

  • The literature supports the role of these objective modalities in the evaluation of knowledge, skill, and competency. One can conclude that combining objective assessment methods with traditional methods, along with direct observation in the clinical setting, has the potential to become the gold standard for measuring a physician's competence.