Introduction

In 1975, Sinclair wrote an editorial in The Lancet expressing his concerns about medical students’ low level of anatomical knowledge [1]. Ever since, many other authors have reported similar concerns [2–8]. In the contemporary literature, clinicians, as well as medical students, report concerns about what they perceive as their own insufficient knowledge of anatomy [9–12]. Some authors even suggest that this lack of anatomical knowledge is the reason why the number of medicolegal claims in healthcare is rising [13, 14]. Anatomical knowledge facilitates learning pathophysiology, supports the examination of a patient, and facilitates rendering a diagnosis [7]. Hence, a good understanding of human anatomy is important not only for surgeons but for all medical specialists to ensure safe clinical practice [7]. Numerous studies describe interventions and education programmes to improve anatomical knowledge, suggesting that there is a need for improvement [15, 16]. However, research on the actual level of anatomical knowledge and the impact of the suggested shortage of anatomical knowledge is scarce. Of the few studies that aim to assess knowledge, many focus on individual opinions rather than on quantification of anatomical knowledge [17].

Methods

The aim of this review was to gain more insight into the level of anatomical knowledge among medical students, residents, fellows, and specialists by performing a literature review of studies that quantify anatomical knowledge.

The meaning of these findings is discussed from two different ethical perspectives: the deontological and the utilitarian stance [18].

Deontology is an ethical theory that places special emphasis on the relationship between duty and the morality of human actions. In deontological ethics, an action is considered morally good because of some characteristic of the action itself, not because the product of the action is good. The theory holds that ethical actions follow universal moral laws, such as “Don’t lie. Don’t steal. Don’t cheat”. Unlike consequentialism, which judges actions by their results, deontology does not require weighing the costs and benefits of a situation. Because one only has to follow set rules, this avoids subjectivity and uncertainty and makes deontology easy to apply. But it also means disregarding the possible consequences of our actions when determining what is right and what is wrong.

An example of the deontological stance: suppose you are a software engineer and learn that a nuclear missile is about to be launched that might start a war. You can hack the network and cancel the launch, but it is against your professional code of ethics to break into any software system without permission; moreover, doing so is a form of lying and cheating. Deontology advises you not to violate this rule. However, if the missile is allowed to launch, thousands of people will die.

Utilitarianism, a form of consequentialism, is an ethical theory that determines right from wrong by focusing on outcomes. The utilitarian stance holds that the most ethical choice is the one that will produce the greatest good for the greatest number. However, because we cannot predict the future, it is difficult to know with certainty whether the consequences of our actions will be good or bad.

An example of utilitarianism: assume a hospital has four people whose lives depend upon receiving organ transplants: a heart, lungs, a kidney, and a liver. If a healthy person wanders into the hospital, their organs could be harvested to save four lives at the expense of one. This would arguably produce the greatest good for the greatest number. But few would consider it an acceptable course of action, let alone the most ethical one.

This study was reported in accordance with those items of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement that were relevant for this review [19].

Search

A comprehensive search was performed in the following online databases: Medline (using PubMed), Web of Science, and the Education Resources Information Centre (ERIC). We used both medical subject headings (MeSH) and free-text terms, and limited the search to the period from January 1, 1995, to October 15, 2018. The structured search can be reproduced using the following keywords and logical operators: (("Students, Medical"[Mesh] OR "Medical students" OR "Medical student" OR "Resident" OR "Residents" OR "Fellow") AND ("Anatomy/education"[Mesh] OR "Anatomy knowledge" OR "Anatomical knowledge" OR "Clinical anatomy" OR "Anatomy education" OR "Anatomical education") AND ("Testing" OR "Test" OR "Examination" OR "Test result" OR "Achievement" OR "Cognitive load" OR "Skill" OR "Effectiveness" OR "Outcome" OR "Measurement")).
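
For illustration only (this is not part of the original methods), the PubMed arm of this search could be reproduced programmatically, for example with Biopython’s Entrez module. The e-mail address and the retmax value in the sketch below are placeholders; the query string is the one given above.

# Illustrative sketch, not part of the original methods: reproducing the
# PubMed arm of the search via the NCBI E-utilities using Biopython.
from Bio import Entrez

Entrez.email = "researcher@example.org"  # NCBI requires an e-mail; placeholder

query = (
    '("Students, Medical"[Mesh] OR "Medical students" OR "Medical student" '
    'OR "Resident" OR "Residents" OR "Fellow") '
    'AND ("Anatomy/education"[Mesh] OR "Anatomy knowledge" '
    'OR "Anatomical knowledge" OR "Clinical anatomy" '
    'OR "Anatomy education" OR "Anatomical education") '
    'AND ("Testing" OR "Test" OR "Examination" OR "Test result" '
    'OR "Achievement" OR "Cognitive load" OR "Skill" OR "Effectiveness" '
    'OR "Outcome" OR "Measurement")'
)

# Restrict to the review's search window (publication dates) and fetch PMIDs.
handle = Entrez.esearch(
    db="pubmed",
    term=query,
    datetype="pdat",
    mindate="1995/01/01",
    maxdate="2018/10/15",
    retmax=2000,  # placeholder upper bound on the number of returned PMIDs
)
record = Entrez.read(handle)
handle.close()
print(record["Count"], "records matched;", len(record["IdList"]), "PMIDs retrieved")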

Study Selection

Two researchers (D.M.K. and C.S.) selected the studies. First, manuscript titles and abstracts were screened for potential relevance. For all selected studies, the full text was reviewed to determine eligibility. In case of disagreement about a study, two other researchers (S.M.J.v.K. and K.N.) decided whether it was suitable for this literature review. We included all studies written in English in which anatomical knowledge was tested among medical students, residents, fellows, or medical doctors.

Over the last decades, anatomy education has changed in many universities. We therefore chose to exclude any studies conducted before 1995.

In the case of a mixed group of participants (e.g. physician assistants and medical students), only those studies that described the results separately for the different participant groups were included. From these studies, we included only the participants who fulfilled the inclusion criteria.

The flowchart of the literature search is shown in Fig. 1.

Fig. 1 Flowchart of literature search

Scaled Score

We anticipated heterogeneity in the quantification of test results between and within studies, owing to the use of different scales and scores. In order to aid interpretation, all scales were recalculated to a scaled average test score between 0 and 100%, reported with a range rather than the SD.
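
As a minimal sketch of this rescaling (the exact formula is our interpretation, assuming each test’s minimum obtainable score is 0, and is not a published formula), the recalculation amounts to dividing the reported mean or median by the maximum obtainable score:

# Minimal sketch of the rescaling, assuming a minimum obtainable score of 0;
# an illustrative interpretation, not the published formula.
def scale_score(raw_score: float, max_score: float) -> float:
    """Rescale a raw mean/median test score to a 0-100% scale."""
    return 100.0 * raw_score / max_score

# Example: a mean of 12.3 correct answers on a 20-item test
print(scale_score(12.3, 20))  # 61.5
# The same function can rescale the lowest and highest observed scores,
# yielding a range instead of the SD.
print(scale_score(4.5, 20), scale_score(16.5, 20))  # 22.5 82.5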

Results

Study Selection

The electronic search strategy identified 1141 studies, which were assessed for eligibility (Fig. 1). After the exclusion of duplicates and studies conducted before 1995, 721 studies remained. Titles and abstracts were screened, and 65 articles were selected for full-text reading. After full-text review, 29 articles were selected for inclusion. A cross-reference search of the references of the included articles yielded one additional relevant article, bringing the total to 30 included articles.

Study Characteristics

Details of the included studies are summarised in Tables 1 and 2.

Table 1 Anatomical knowledge [20, 21, 26, 27, 30, 33, 34, 38–41]
Table 2 Intervention studies [23–25, 28, 29, 31, 32, 42–53]

Table 1 summarises the eleven studies whose primary aim was to quantify current anatomical knowledge: six evaluated the anatomical knowledge of medical students, four evaluated (young) medical doctors, and one assessed fellows and medical specialists.

The nineteen studies summarised in Table 2 evaluated an intervention and tested anatomical knowledge before and after it. For this review, we assumed that the pre-intervention tests reflected the participants’ baseline level of anatomical knowledge and therefore extracted only the pre-intervention results from these studies. Seven of them pretested medical students, eight pretested residents, two pretested fellows, and two pretested a mixed group of students, residents, and fellows. Table 3 shows test results by type of question, subdivided into multiple choice, board style, and fill in the blank.

Table 3 Test results by type of question, subdivided into multiple choice, board style, and fill in the blank (scaled score 0–100%)

Discussion

Main Findings

The actual measured knowledge of anatomy of medical students, residents, fellows, and specialists differed substantially between studies. Scores were reported as medians or means and, after scaling, ranged from 22.5 to 82.4% correct answers: 22.5–73.8% for medical students, 26.9–82.4% for residents, and 25.0–63.2% for medical doctors/fellows. Almost two-thirds of the reported mean/median scores were below 60%.

In six of the thirty studies, the authors expressed their interpretation of the measured level of anatomical knowledge. Based on their measurement results, they concluded that the level of knowledge was deficient, ranging from moderate to worryingly low.

Interpretation of the Findings

The question of whether knowledge of anatomy is sufficient or too low may be approached from different perspectives. One of those is the deontological perspective: as a physician, one has a duty towards the patient to have good knowledge of anatomy, and this is a generally accepted rule to which we should conform. The current literature seems to lean on deontological ethics; the opinion expressed in five of the studies that knowledge is moderate to worryingly low is an example of this. However, there is no research on what the level of knowledge should be. The utilitarian stance is another perspective from which we can approach the question. Utilitarianism states that the best action is the one that maximises utility, usually defined as that which produces the greatest well-being for the greatest number of people.

When is anatomical knowledge so worryingly low that it endangers patients? Or, conversely, at what level does it lead to greater patient satisfaction? We could not find any evidence that a low level of anatomical knowledge is a cause of medical errors or dissatisfied patients.

This might suggest that the pragmatic way most medical professionals deal with anatomy is a fair choice given the abundance of competencies demanded of them. However, we must also acknowledge that absence of proof is not proof of absence.

The Quest for a Gold Standard for How Much Anatomy

So far, the literature does not provide a convincing gold standard for how much anatomy is required for safe clinical practice. Following the deontological stance, an international consensus could set the standard; however, with more than 100 curricula all over the world, this sounds like an impossible job. Two studies, by Brunk et al. and Prince et al., tried to set a gold standard through the use of expert panels. In the study by Brunk et al., the pass rate was set at 60.4% for 5th- and 6th-year medical students, whereas the actual score was 29.9%. Prince et al. used different expert panels to set the standard: fourth-year students set the highest pass rate at 56.0%, whereas recent graduates set the lowest at 46.9%; the mean overall score was 53.2%. The authors of both studies concluded that the results were well below the expected standard [20, 21]. However, given the known retention levels of basic science, it is questionable whether this conclusion is correct. An extensive study by Custers et al. showed that participants still in medical school, and those not long out of it, achieved scores of approximately 40% correct answers on basic science; for anatomical knowledge specifically, 5th- and 6th-year students scored between 45 and 50% [22].

Strengths and Limitations

Our review has some limitations that need to be addressed. We included 30 studies in which 30 different tests were used. There was considerable heterogeneity in the number and type of questions, as well as in the anatomical region tested. Based on their characteristics, some tests can be regarded as more reliable than others. One of the most frequent forms of testing was the identification of labelled structures, with a maximum of 20 items [23–33]. Brunk et al., by contrast, used the Berlin Progress Test Medicine (PTM), a test of 200 items chosen from an item pool of 5000; all items are administered in single-best-answer multiple-choice format and typically make use of clinical vignettes [20]. Dickson et al., on the other hand, based their conclusion on an 11-question test [34]. Besides the type of test, the context in which it was taken, the interval between when the anatomy was learned and when it was tested, and whether there was repeated learning are important variables. In our selection, we only included studies that did not test anatomical knowledge after an intervention or repeated learning. The interval between studying the material and testing was hard to assess, since there are many different curricula; in most of them, however, anatomy is taught in the preclinical years.

This diversity of tests and testing moments creates two difficulties. First, although pooling the results using meta-analysis techniques is statistically not impossible, we felt it would not yield a useful summary of test results for the purpose of our study. Second, it makes the reliability of the scores hard to interpret: an average score of 50% on a difficult exam with questions on function and applied clinical anatomy might reflect the same level of knowledge as a 90% score on an easy exam requiring only identification of structures.

Another point to mention is the diversity of participants, which in the included studies ranged from medical students up to medical doctors. This can be seen as a limitation when comparisons between studies are made, but it is also a strength, providing some insight into anatomical knowledge over time and making the results of our review generalisable to a broader group.

The strength of our study lies in its more philosophical point of view. Our review has shown that the level of anatomical knowledge is hard to establish and that a gold standard cannot be found. The questions around anatomy education should therefore be rephrased using different paradigms from philosophy. The main question becomes: “which anatomy should students be taught, at what level and amount, and when, in order to feel safe and competent to do their clinical work”. This means that we should also focus on ways to define and assess this level.

Suggestions and Challenges for the Future

In our search for the level of anatomical knowledge, the main finding is an absence of standardisation, not only in ways of testing but also in what constitutes need-to-know knowledge. Without agreement on the required knowledge, which will differ between stages of medical and postgraduate education, it is difficult to judge the level of anatomical knowledge. There are universities with an extensive curricular plan, including a good description of the anatomical knowledge that is expected [35]. This is a good start, although it varies from university to university and from country to country, whereas, in general, the human being and her anatomy and illnesses do not vary. A suggestion to remedy this absence is to conduct a Delphi study to determine what knowledge is required. In a Delphi study, experts discuss a topic over several rounds until they reach consensus. An example is being carried out in the Netherlands for the gynaecology speciality [36]. After focus groups, in-depth interviews, and two Delphi rounds, a core list of anatomical structures relevant to the safe and competent practice of general gynaecologists was identified. Such a list can be used to guide gynaecological postgraduate education and assessment.

The second challenge is the wide variety of specialities and subspecialities. A gastrointestinal surgeon does not need the same knowledge as a cardiac surgeon or a gynaecologist. Determining the knowledge needed for each stage of education, speciality, and subspeciality, and what the need-to-know knowledge is, will be an extensive job. However, in our opinion, it is an indispensable step in the process of assessing and determining anatomical knowledge.

A third challenge is the way of testing. Our results already show the different ways in which anatomical knowledge can be assessed. In general, anatomical knowledge can be tested using a variety of assessment tools, such as multiple-choice exams, oral exams, or structured practical examinations. These tools reflect the three domains of anatomy training: theoretical knowledge, practical 3D application of this knowledge, and clinical or bedside application of knowledge [37]. Thus, after determining which knowledge is essential, this knowledge should be tested in various ways within the different domains.

Conclusion

This review provides an overview of what is known about measured anatomical knowledge. After critically reviewing the literature, we conclude that the level of anatomical knowledge is hard to establish, mainly due to the lack of standardisation.

Further research should focus on ways to define and assess “desired anatomical knowledge” in different contexts. One suggestion is to conduct a Delphi study among experts from the field to define essential anatomical structures. Thereafter, anatomical knowledge should be assessed through various forms of testing covering its different domains. In a next phase, we can discuss whether anatomical knowledge is lacking and, if so, what the impact of this shortage is and whether interventions are needed.