Background

The death certificate is a permanent, legal record of death that provides important information about the circumstances and cause of death [1]. For deaths that occur in hospitals, or other settings where a doctor is present, death certification is initiated by a medical officer, after which the certificate usually undergoes registration by a national civil registration system [2].

Accurate and timely cause of death reporting is essential for health policy and research purposes [3]. Individual death certificates are routinely aggregated into vital statistics by national civil registration systems, providing the most widely verified sources of mortality data in the form of standardised, comparable, cause-specific mortality figures [4]. These statistics provide essential insights for government policymakers, health managers, healthcare providers, donors and research institutes into common causes of death by age, sex, location and time. The data inform the allocation of resources across an array of stakeholders and disciplines, including medical research and education, disease control, social welfare and development and health promotion [5].

Cause of death

The ‘gold standard’ for cause of death statistics is complete civil registration where each death has an underlying cause assigned by a physician and is coded according to International Classification of Diseases (ICD) rules. Causes of death reported in death certificates are defined by the World Health Organization (WHO) as ‘all those diseases, morbid conditions or injuries which either resulted in or contributed to death and the circumstances of the accident or violence which produced any such injuries’ [6]. Importantly, this definition does not include symptoms and modes of dying.

Medical certification of cause of death

The Medical Certificate of Cause of Death (Fig. 1) is a standardised universal form recommended by the WHO for international use, which has been adopted by most WHO member states [6]. The WHO also provides instructions on correct cause of death reporting to improve the quality of medical certification and subsequent data [7].

Fig. 1

Frame A (Medical data: Part 1 and 2) of the International Form of Medical Certificate of Cause of Death

When a single cause of death is reported on the death certificate, this becomes the underlying cause of death used for tabulation. When more than one cause of death is reported, the disease or injury which initiated the sequence of events that produced the fatal event becomes the underlying cause of death [6].

Despite the availability of guidance, errors in cause of death certification have been observed across all geographical regions, with inadequate certification by doctors remaining the principal reason for inaccurate death data [8, 9]. Over the past few decades, therefore, training medical doctors in death certification has become a key intervention employed by health services and national governments to improve mortality statistics. Interventions have included improvements in death certificate formats, training programmes on completion of death certificates, development of self-learning educational materials, implementation of cause of death query systems, periodic peer auditing of death certificates and increasing autopsy rates [10,11,12].

Intervention studies on death certification

Several studies have investigated the effectiveness of interventions to improve the quality of death certification [13,14,15]. Whilst improvement in death certification accuracy is often reported, negative findings have also been published [16]. Moreover, there are few randomised controlled trials (RCTs) or similar studies that have produced high-quality evidence. A 2010 literature review identified 129 studies on the effectiveness of educational interventions for death certification, ultimately reviewing 14, including three RCTs [8]. All educational interventions identified in the review improved certain aspects of death certification, although the statistical significance of evaluation results varied with the type of intervention.

Given the absence of any systematic review and meta-analysis of death certification training interventions, as well as the increase in experimental data produced in the past decade and the need—made even more urgent by the COVID-19 pandemic—to strengthen national vital registration and cause of death data systems, further evaluation is essential. In this study, we systematically review and meta-analyse the effectiveness of training interventions for improving the quality of medical certification of cause of death (MCCOD). To our knowledge, no study has specifically investigated interventions intended to reduce errors in MCCOD in a systematic review.

Methods

Preparation and search strategy

This review was registered in the International Prospective Register of Systematic Reviews (PROSPERO; Registration ID: CRD42020172547). Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed throughout the review process [17].

A comprehensive literature search was conducted to identify published articles investigating the effectiveness of training and education interventions to improve death certification (additional file 1: Fig. S1). The search was conducted on the CENTRAL, Ovid MEDLINE and Ovid EMBASE electronic databases and returned 1060 results, which were exported to the EndNote X9 citation manager and deduplicated. The remaining 676 studies were then limited to those published from 1994 onwards (the year ICD-10 came into use), resulting in 616 studies for screening.

Eligibility criteria and study selection

This study aimed to assess the effectiveness of training interventions in improving the quality of MCCOD compared to generic academic training in the curricula of current and prospective physicians (in randomised studies), or to pre-intervention quality parameters (in non-randomised studies) [8]. Two reviewers (BPK and JS) independently reviewed each study against the inclusion/exclusion criteria (additional file 2: Fig. S2). Studies were screened by title and abstract using DistillerSR online screening software. Full texts of 44 records were then reviewed, as well as an additional eight records identified from study reference lists. Reviewers were blinded to each other's decisions, and all disputes were resolved by an expert third reviewer (LM). A total of 21 studies were included for data extraction and final analysis (Fig. 2). One reviewer (BPK) extracted data from the selected studies, with findings then reviewed by a second reviewer (JS). Disputes were resolved independently by the third reviewer (LM).

Fig. 2

PRISMA flow diagram

Risk of bias, meta-analysis and narrative synthesis

Selected studies were categorised under ‘randomised’ and ‘non-randomised’, and risk of bias was assessed by two reviewers (BPK and JS) with disputes resolved by the third reviewer (LM). Randomised trials were assessed using the seven domains of the GRADE recommendations, and non-randomised studies were assessed using the seven domains of ROBINS-I criteria [18, 19].

All studies were initially assessed for clinical and methodological heterogeneity [20]. Four interventions were eligible to undergo meta-analysis in relation to five outcomes. As these were before-and-after studies without control groups, the ‘generic inverse variance method’ was used in pooling [21]. Review Manager 5.4 software was used in the meta-analysis and the effect measure was ‘risk difference’ (i.e. percentage of death certificates with each error). Statistical heterogeneity was assessed using the I-square statistic and chi-square test. When potential outliers were removed in dealing with statistical heterogeneity, sensitivity analyses were performed with and without excluded studies [22]. Robustness of the effect measures was explored further using a sensitivity analysis with both fixed and random effect assumptions [22]. Potential publication bias was explored with the generation of funnel plots.
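The pooling approach described above can be sketched in a few lines of code. The following is a minimal illustration, not a reproduction of the review's analysis (Review Manager performs the equivalent computation internally): risk differences are pooled with inverse-variance weights, heterogeneity is quantified with Cochran's Q and the I-square statistic, and a DerSimonian-Laird random-effects estimate adds the between-study variance. The risk differences and standard errors below are hypothetical placeholders, not data from the included studies.

```python
import math

# Hypothetical before-after risk differences (proportion of certificates
# with a given error, pre minus post) and their standard errors.
rd = [0.20, 0.28, 0.33]   # risk differences for three interventions
se = [0.04, 0.05, 0.03]   # standard errors

# Fixed-effect pooling: weight each study by the inverse of its variance.
w = [1.0 / s**2 for s in se]
rd_fixed = sum(wi * ri for wi, ri in zip(w, rd)) / sum(w)
se_fixed = math.sqrt(1.0 / sum(w))

# Cochran's Q and the I-square statistic quantify statistical heterogeneity.
q = sum(wi * (ri - rd_fixed)**2 for wi, ri in zip(w, rd))
df = len(rd) - 1
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# DerSimonian-Laird random-effects pooling adds the between-study
# variance tau^2 to each study's within-study variance.
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)
w_re = [1.0 / (s**2 + tau2) for s in se]
rd_random = sum(wi * ri for wi, ri in zip(w_re, rd)) / sum(w_re)

print(f"fixed-effect RD = {rd_fixed:.3f} (SE {se_fixed:.3f})")
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%, tau^2 = {tau2:.4f}")
print(f"random-effects RD = {rd_random:.3f}")
```

Comparing `rd_fixed` and `rd_random` is the essence of the fixed-versus-random sensitivity analysis described above: when heterogeneity is present, the random-effects estimate down-weights precise outliers and carries a wider confidence interval.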

The meta-analysis findings were imported into the ‘GRADEproGDT’ online tool. A ‘summary of findings’ table was prepared, and related narrative components added to the table [23]. The certainty assessments were made against nine criteria: study design, risk of bias, potential publication bias, imprecision, inconsistency, indirectness, magnitude of effect, dose-response gradient and effect of plausible confounders [24]. Studies or sub-groups not included in the meta-analysis were covered in a narrative synthesis of findings.

Results

Within the 21 selected articles [13,14,15, 25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42], there were 24 distinct interventions, with one article describing four interventions across four countries [30]. In another, findings were stratified under two study populations [27]. Three were randomised controlled trials [13, 35, 37] and 21 were non-randomised interventions. Amongst the latter, one was a non-randomised controlled study [31] whilst the remainder were non-controlled before-after studies. Characteristics of the selected studies are shown in Table 1.

Table 1 Characteristics of selected studies

Study populations, interventions and outcomes

In seven interventions, the study populations consisted of medical students [14, 15, 27, 29, 35, 39, 41]. These comprised first-year students (UK) [35], medical trainees in teaching hospitals (Spain) [41], third-year students (USA) [14] and final-year students (Fiji and Spain) [15, 29]. Generally, however, the study populations were physicians, referred to as residents (Canada, USA, India) [13, 28, 34, 36], medical interns (South Africa, Spain) [37, 39], postgraduates (USA, India) [31, 36, 40], secondary healthcare physicians (Bahrain) [26], family doctors (Spain, Canada) [27, 33, 39] or Senior House Officers (England) [38].

Seminars, interactive workshops, teaching programmes and training sessions were the most common terms used in introducing the interventions. These ranged in duration from 45 min [13] to 5 h [27], and some interventions included subsequent sessions on additional days [36]. Other descriptions included ‘training of trainers’ (Philippines, Myanmar, Sri Lanka) [30], a video (UK) [35] and web-based or online training (USA, Fiji) [14, 15, 31]. In Peru, training was complementary to an online death certification system [32].

For the majority of interventions, a comparison of certification errors pre- and post-intervention was used as the measure of impact, although some studies developed a special knowledge test or used a quality index. These included the Mid-America-Heart Institute (MAHI) Death-Certificate-Scoring System (two interventions) [13, 14], knowledge assessment tests developed by the investigators (three interventions) [31, 35, 37], and quality indexes providing numerical scores based on ICD volume 2 best-practice certification guidelines [15].

Risk of bias assessments

The risk of bias assessments for the randomised studies [13, 35, 37] are shown in Fig. 3a and in Fig. 3b for the non-randomised studies.

Fig. 3

a Risk of bias summary of the randomised studies. b Risk of bias summary of the non-randomised studies

For all randomised studies, ‘blinding of participants and personnel’ was assessed as high-risk given the difficulty of maintaining blinding for training interventions. All three studies had pre-determined outcomes and were rated low risk for ‘selective reporting’.

All but one of the non-randomised studies were before-after studies without a separate control group. Due to the method of recruitment, none of the studies was characterised as low-risk in relation to confounding and selection bias. However, since the intervention periods were clearly defined, all studies were characterised as low-risk for ‘bias in measurement classification of interventions’.

Meta-analysis

Since the interventions targeting medical students were found to be clinically heterogeneous, potential meta-analyses were restricted to those targeting physicians. In anticipation of substantial methodological heterogeneity, the meta-analysis was planned separately for non-randomised studies. Findings of the studies and sub-groups initially entered into the meta-analysis are summarised in additional file 3: Tables S1-S5.

As the initial meta-analyses showed statistical heterogeneity, sensitivity analyses were performed after excluding a potential outlier in each comparison, with both fixed and random effect assumptions (Table 2). Except for ‘ill-defined underlying cause of death’ [43], the direction and significance of the estimates did not change with these sensitivity analyses.

Table 2 Sensitivity analysis of the pooled estimates

The forest plots of the five outcomes (i.e. after excluding the outliers) included in the meta-analyses are shown in Fig. 4a–e. Three interventions were included in each meta-analysis [30].

Fig. 4

a Forest plot of ‘improper sequence’. b Forest plot of ‘presence of abbreviations’. c Forest plot of ‘no disease time interval’. d Forest plot of ‘multiple causes in a single line’. e Forest plot of ‘ill-defined underlying causes of death’

The lowest pooled risk difference (15%) was observed for ‘multiple causes per line’ and ‘ill-defined underlying cause of death’ whereas the highest was for ‘no disease time interval’ (33%).

Funnel plots exploring potential publication bias are shown in Fig. 5a–e.

Fig. 5

ae Funnel plots of the pooled estimates

All funnel plots were generally symmetrical. A cautious interpretation of these is included in the “Discussion” section.

In the ‘summary of findings’ table (Table 3), the certainty assessments of these five outcomes are presented. ‘Moderate certainty’ was assigned to four outcomes and ‘low certainty’ to one. Findings of related additional studies have also been summarised as comments in Table 3.

Table 3 Summary of findings

Narrative synthesis of other findings

Findings of randomised studies

In two of the three randomised studies, both conducted on medical interns, overall scores improved with the intervention (p < 0.05) [13, 37]. In the third study, conducted on medical students, the overall performance score improved (p = 0.046), with weaker evidence for an improvement in a ‘skill score’ (p = 0.066) [35]. In one study, ‘correct identification of the COD’ improved more in the intervention group (15% to 91%) than in the control group (16% to 55%), and ‘erroneous identification of cardiac deaths’ decreased more with the intervention (56% to 6%) than in the controls (64% to 43%) [13]. In a South African study, three errors (‘mechanism only’, ‘improper sequence’ and ‘absence of time interval’) were significantly reduced in the intervention group only, whereas ‘competing causes’ and ‘abbreviations’ were reduced in both groups [37].

Non-randomised study findings on medical students

Degani et al. (2009) showed improvements in the modified-MAHI score following the intervention (mean difference of 7.1; p < 0.0001) [14]. Vilar and Perez (2007) reported improvements in ‘at least one error’ (p < 0.0001), including ‘mechanism of death only’ (p < 0.0001), ‘improper sequence’ (p < 0.0001), ‘listing cause of death in Part 2’ (p < 0.0001) and ‘mechanism as UCOD’ (p < 0.0001) [41]. In the same study, two error types (‘abbreviations’ and ‘listing two causally related causes as COD’) did not show evidence of improvement (p = 0.413 and p = 0.290) [41]. In a Fijian study, training produced improvements of 1.67% to 19.4% in the following: ‘quality index score’, ‘average error rate’, ‘abbreviations’, ‘sequence’, ‘one cause per line’, ‘not reporting a mode of death’ and ‘legibility’ [15]. In two Spanish studies, the intervention improved performance in ‘sequence’, ‘cause of death’, ‘precision of terms’, ‘abbreviations’ and ‘legibility’ [29, 39].

Other comparisons

Case-wise comparisons with a set of errors were conducted in two studies [25, 27]. Most errors decreased following the intervention. In one non-randomised controlled study, a custom performance score increased post-intervention [31]. One study in England explored ‘mentioning consultant’s name’ and ‘completion by a non-involved doctor’, both of which improved following the intervention [38]. In a Canadian study, ‘increased use of specific diseases as UCOD’ and ‘being more knowledgeable on not using conditions like ‘old age’’ improved in the intervention group [33]. ‘Competing causes’ were less common post-intervention in two Indian studies, with varying strength of evidence (p = 0.001 and p = 0.069) [28, 36], but not in a Canadian study (p = 0.81) [34]. ‘Mechanism of death followed by a legitimate UCOD’ showed non-significant reductions in three studies (45.9% to 36.1%, 13.5% to 7.8% and 16% to 6.6%) [28, 34, 36]. Other studies that assessed ‘presence of at least one-major error’ and ‘keeping blank lines’ in the sequence generally showed a reduction following the intervention [30, 34].

Discussion

We conducted a systematic review of the impact of 24 selected interventions to improve the quality of MCCOD. Our meta-analysis suggests that selected training interventions significantly reduced error rates amongst participants, with moderate certainty (four outcomes), and low certainty (one outcome). Similarly, the findings of the narrative synthesis suggest a positive impact on both physicians and medical trainees. These findings highlight the feasibility and importance of strengthening the training of current and prospective physicians in correct MCCOD, which will in turn increase the quality and policy utility of data routinely produced by vital statistics systems in countries.

The systematic approach we followed distinguishes this study from the more common ‘narrative reviews’, whilst the meta-analysis provides pooled and precise estimates of training impact [44]. Rigorous heterogeneity and ‘certainty of evidence’ assessments were performed. To enable a better comparison of the quality of the selected studies, risk of bias assessments were performed using different criteria for randomised and non-randomised studies [18, 19]. Given the controversy surrounding conventional direct comparison methods for before-after studies in the literature—due to these methods’ non-independent nature [45]—less controversial ‘generic inverse variance methods’ were used in this review.

Irrespective of study design (i.e. randomised or not) and population (i.e. physicians or medical students), training interventions were shown to reduce diagnostic errors, whether measured as a reduction in error rates or an increase in scaled scores. Risk differences were used as pooled effect measures and typically suggested that certification errors decreased by 15 to 33 percentage points as a result of the training. Our findings also suggest that refresher training and regular dissemination of MCCOD quality assessment findings can further reduce diagnostic errors. However, due to the inherent limitations of using ‘absolute risk estimates’ such as risk differences, we place greater emphasis on the direction of the effect measure than on its size [46].

The pre-intervention percentages of all error categories selected for meta-analysis were below 51%, except for the category ‘absence of time intervals’, which ranged from 37 to 93% [30]. Based on post-intervention percentages, we therefore conclude that the interventions had a markedly favourable impact. For example, post-intervention errors were reduced to between 6.0 and 20.8% for ‘multiple causes in a single line’ and between 5.8 and 20.3% for ‘improper sequence’. For all interventions reviewed under the meta-analysis, post-training assessments were conducted between 6 months and 2 years after the intervention. Hence, the observed risk differences reflect the impact of the intervention over a longer time period, which is likely to be a more useful measure of the sustainability and effectiveness of training interventions than the more commonly used immediate post-training assessments.
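The arithmetic behind a single study's risk difference can be made concrete with a small sketch. The counts below are hypothetical (not figures from any included study), and the pre and post samples are treated as independent, which is the simplification behind entering before-after results via the generic inverse variance method:

```python
import math

# Hypothetical counts for one error category ('multiple causes in a single
# line'): certificates audited and certificates with the error,
# before and after training.
n_pre, err_pre = 500, 150    # 30% error rate pre-intervention
n_post, err_post = 500, 60   # 12% error rate post-intervention

p_pre = err_pre / n_pre
p_post = err_post / n_post

# Risk difference (absolute reduction in the error proportion) and its
# standard error under the independent-samples approximation.
rd = p_pre - p_post
se = math.sqrt(p_pre * (1 - p_pre) / n_pre + p_post * (1 - p_post) / n_post)

# 95% confidence interval for the risk difference.
ci_low, ci_high = rd - 1.96 * se, rd + 1.96 * se
print(f"RD = {rd:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f})")
```

A risk difference of this kind, with its standard error, is exactly the per-study input that inverse-variance pooling combines across interventions.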

The classification of errors into ‘minor’ or ‘major’ varies between studies. For example, ‘absence of time intervals’ was considered a major error in one study [32], but minor in several others [28, 30, 34, 36]. Some studies, although not all, classified ‘mechanism of death followed by a legitimate UCOD’ as an error [26, 28, 34, 36, 40]—furthermore, the scoring method and content of the assessment varied between studies [13, 14, 31, 35, 37]. Given this heterogeneity, it is important to focus on the patterns of individual errors and to be clear about how errors are defined before comparing results across studies.

Interestingly, we found greater variation across studies for post-intervention composite error indicators than for specific errors. Across the six interventions considered, post-intervention measures of ‘at least one major error’ ranged from 3.75 to 44.8% [30, 34, 40] whilst the fraction of cases with ‘at least one error’ ranged from 9 to 74.8% [30, 38, 41]. It is also interesting to note that doctors appeared to benefit less from the interventions compared to interns. This may in part reflect lower priority given by doctors to certification compared to patient management, possibly due to limited understanding of the public policy utility of data derived from individual death certificates.

In some studies, it is possible that a small proportion of post-intervention death certificates were actually completed by doctors who had not undergone training. This would have the effect of diluting the impact estimates of the training interventions. Further, constructing the causal sequence on the death certificate may involve a degree of public health and epidemiological consideration, in addition to clinical reasoning, which may be challenging for some doctors to incorporate into the certification process. This could explain the general lower improvement scores reported for the causal sequence. Finally, correct certification practices are heavily dependent on the attitudes of doctors towards the process, as well as the level of monitoring, accountability and feedback related to their certification performance.

Most interventions were conducted as interactive workshops that enabled participants to undergo ‘on-the-spot’ training [13, 25,26,27,28,29,30, 33, 34, 36, 37, 41]. There is a paucity of studies with control groups that compare different interventions. One study concluded that a ‘face-to-face’ intervention was more effective than ‘printed instructions’ [13]. However, another concluded that an added ‘teaching session’ did not improve performance compared to an ‘education handout’, although both strategies were independently effective [13, 37]. More research is required to test the relative effectiveness of training methods, such as online interventions, compared to those requiring face-to-face interaction.

Our analysis suggests several cost-effective options for improving the quality of medical certification. To the extent that individual-level training of doctors in correct medical certification is costly, strengthening the curricula in medical schools designed to teach medical students how to correctly certify causes of death, and ensuring that these curricula are universally applied, is likely to be the most economical and sustainable way to improve the quality of medical certification. How and when this training is applied prior to completion of medical training is likely to vary from one context to another and will depend on local requirements for internship training. Training smaller groups of physicians as master trainers in medical certification and subsequently rolling out the training in provincial and district hospitals is likely to be an effective and economical interim measure to improve certification accuracy, as has been demonstrated in a number of countries [30].

In some countries, electronic death certification has been used as a means to standardise and improve the quality of cause of death data [32]. Electronic death certification can be helpful in avoiding certain errors such as illegible handwriting and reporting multiple causes on a single line (by not allowing the certifier to report more than one condition per line) [47]. An electronic certification system can also generate pop-up messages to remind the certifier not to report modes of dying, or symptoms and signs, as the underlying cause. However, electronic certification cannot improve the accuracy of the causal sequence or alleviate the reporting of competing causes, unspecified neoplasms or non-reporting of external causes. Furthermore, whilst cause of death data entered in free-text format could improve the quality of medical certification [48], enhancing electronic certification with suggested text options and ‘pick’ lists can lead to systematic errors in medical certification.

This review has several limitations. The studies examined in this review included a diverse range of participants and intervention methods and were conducted in various cultural settings. The duration and modality of the training interventions varied substantially across studies. Only three interventions were randomised, and due to the diversity in non-randomised studies, the potential influence of confounding factors on the quality parameters assessed cannot be excluded. These factors were, however, considered in risk of bias and heterogeneity assessments.

There is also considerable subjectivity in the assessment of some criteria, including ‘legibility’ and ‘incorrect sequence’, which could bias the assessments. Although outcomes were usually pre-defined, adherence to risk-lowering strategies, such as ‘blinding the assessor’, was often not described [14, 15, 25, 26, 28,29,30,31,32,33, 36, 38,39,40,41,42]. Despite including only three interventions, each meta-analysis drew on an adequate number of at least 1500 observations per group. Even though funnel plots were presented for a gross exploration of publication bias, their interpretation is generally recommended only for meta-analyses with more than 10 comparisons. Furthermore, little evidence is available on the appropriateness of funnel plots drawn with risk differences [49].

Conclusions

Both pooled estimates and narrative findings demonstrate the effectiveness of training interventions in improving the accuracy of death certification. Meta-analyses revealed that these interventions are effective in reducing diagnostic errors, including ‘no time interval’, ‘using abbreviations’, ‘improper sequence’, ‘multiple causes per line’ with moderate certainty and ‘ill-defined underlying CoDs’ with ‘low certainty’. In general, ‘no time interval’ was observed to be the most common error, and ‘illegibility’ the least observed amongst pre-intervention errors. ‘No time interval’ appeared to be the error with most improvement following intervention, as evidenced by both the pooled and narrative findings.

Strategic investment in MCCOD training activities will enable long-term improvements in the quality of cause of death data in CRVS systems, thus improving the utility of these data for health policy. Whilst these findings strengthen the evidence base for improving the quality of MCCOD, more research is needed on the relative effectiveness of different training methods in different study populations. From the limited evidence thus far, our meta-analysis indicates that training doctors and interns in correct cause of death certification can increase the accuracy of certification and should be routinely implemented in all settings as a means of improving the quality of cause of death data.