Background

Summary data on the causes of death are used to allocate health system resources for prevention and treatment of disease and to monitor disease trends over time. National mortality statistics are typically based on single causes of death. Until the mid twentieth century this was appropriate while infectious diseases were the primary causes of most deaths. However, as living standards, effective treatments, and disease prevention and control measures have improved, longevity has increased and as more of the world’s population is becoming older, multi-morbidity (suffering two or more chronic conditions simultaneously) is becoming increasingly common. For older people, the single cause of death is less realistic for describing disease burden [1, 2]. As Désesquelles et al. note “At old ages, death is indeed often the final stage of a long morbid process involving several conditions” [3].

There is a World Health Organization (WHO) framework for recording causes of death [4]. For each death a doctor should complete a Medical Certificate of Cause of Death (here called the death certificate) which has two sections: Part I is the sequential causal pathway resulting in death, and Part II records other conditions the person had pre-mortem that contributed to the death but were not in the causal pathway. Causes may be recorded on any line in Part I or Part II and several causes may be listed on the same line. Commonly the underlying cause of death (UCoD), the condition beginning the causal chain leading to death, is listed on the last line in Part I. The other causes listed above the UCOD in Part I are typically conditions directly leading to death; they are the consequences of the UCoD (e.g., pneumonitis following a fall). Conditions listed in Part II typically describe relevant multi-morbidities. All the conditions listed on the death certificate are coded according to rules for the International Classification of Diseases (ICD) [5]. Then algorithms are used to determine the single underlying cause of death (UCoD) which is used in national statistics. However, the other causes listed on the death certificate can provide important information that is not adequately captured by the UCoD alone. For example, in Australia dementia, including Alzheimer’s disease, as the UCoD rose from being the fourth leading cause of death in 2006 to the second leading cause by 2013 according to the national statistics [6]. During this time the rate of dementia deaths as the UCoD increased by 1.03% per year, however, the rate of deaths with dementia mentioned anywhere else on the death certificate decreased by 0.97% per year [7], so the net effect was that the rate of dementia mentioned anywhere on the death certificate remained stable [8]. This example of dementia illustrates that the UCoD can be influenced by administrative effects such as certification and coding changes. Also, some conditions such as hypertension or congestive heart disease, by their nature, are less likely to be a UCoD and therefore their contribution to death can be understated.

A related issue is the way in which specific causes of death are grouped with other related causes into broader categories. For many practical purposes, the actual numbers or rates of death by cause are less important than their rank order. But this in turn depends on how causes of death are grouped. As Becker et al. note “The rank-order of any causal category depends on the list used… Moreover, a broad cause group, such as ‘all circulatory diseases’, is more likely to score high in the rankings when compared with an individual disease, such as stroke… The process utilized to create condensed tabulations lists should be based on the intended analysis… Any sequence of leading causes is strongly influenced by the criteria according to which the cause-groups of the list are defined” [9].

The main aim of this paper is to present a new method for including both the UCoDs and the other causes recorded on the death certificate into the calculation of national mortality statistics. The objectives of this paper are to: describe a new, data driven, method that uses the patterns of multiple causes of death to calculate the contribution of each cause to national death statistics; compare this method with an alternative method proposed by Piffaretti et al. that allocates arbitrary weights to the UCoD and other contributing causes of death (CCoDs) [10]; and apply both methods using Australian data and compare the effects on rank order of leading causes.

Methods

Analysing multiple cause of death data

Piffaretti et al. proposed the following approach for incorporating information from CCoDs into the calculation of the contributions of each cause to the total number of deaths [10]. They only considered the UCoD and CCoDs from Part II of the death certificate (i.e., other Part I conditions are not considered). For each death, weights are assigned to the UCoD and CCoDs in such a manner that the sum of these weights is one [11]. Therefore, the sum of weights of all deaths equals the total number of deaths. For each death, a cause is counted only once; if it is listed as both a UCoD and a CCoD, it is taken as the UCoD and the CCoD is ignored; or if the cause is listed more than once as a CCoD it is counted only once. Let wci denote the weight assigned to cause c for death i. If there are no CCoDs for death i, then

$$w_{ci}=\left\{\begin{array}{lc}1&\;\mathrm{if}\;c\;\mathrm{is}\;\mathrm{the}\;\mathrm{UCoD}\\0&\mathrm{otherwise}\end{array}\right.$$

If there are ni CCoDs for death i then

$$w_{ci}=\left\{\begin{array}{cc}p&\mathrm{if}\;c\;\mathrm{is}\;\mathrm{the}\;\mathrm{UCoD}\\\frac{\left(1-p\right)}{n_i}&\mathrm{if}\;c\;\mathrm{is}\;\mathrm a\;\mathrm{CCoD}\end{array}\right.$$

where p is an arbitrary weight with 0 ≤ p ≤ 1. Then the contribution of cause c to all deaths is given by

$$\sum_{i}{w}_{ci}.$$

If p = 1 then only the UCoDs are counted, i.e., this is the current way of reporting national death statistics. If p = 0 the UCoDs would be completely ignored. Piffaretti et al. illustrated the method using p = ½ [10]. Moreno- Betancur et al. used p = ¾ and ½ and suggested another method giving equal weights for the UCoD and each of the CCoDs [12]. However, these methods all involve the choice of an arbitrary value, p.

To overcome the subjectivity of choosing the value p we propose a new data driven method for calculating weights that takes into account the associations that occur between UCoDs and particular CCoDs. For example, ischaemic heart disease as a UCoD commonly has diabetes as a CCoD but is less likely to have lung cancer as a CCoD. In the data driven method the contribution of diabetes in a death with ischaemic heart disease as the UCoD is given more weight than lung cancer. This is due to the common co-occurrence of ischaemic heart disease and diabetes reflecting the causal pathway between them, compared to the less common and less direct link between ischaemic heart disease and lung cancer. In contrast, in methods using arbitrary weights the contributions of these CCoDs would be equal.

The first step is to use all the data to calculate the numbers

$$x_{uc}=\left\{\begin{array}{cc}\frac{N_{c\vert u}}{N_u}&\mathrm{if}\;u\neq c\\0&\mathrm{if}\;u=c\end{array}\right.$$

where Nc|u is the number of deaths with c as a CCoD and u as the UCoD, and Nu is the total number of deaths with u as the UCoD.

The next step is to calculate weights wci for each death. Suppose death i has u as the UCoD and ni CCoDs. The weight wci is defined as

$$w_{ci}=\left\{\begin{array}{cl}{}^{{x}_{uc}}\!\left/ \!{}_{{n}_{i}}\right.,&\text{if one of the CCoDs is} \,c\\1-\sum_{all\;CCoDs}{}^{{x}_{uc}}\!\left/ \!{}_{{n}_{i}}\right.,&\text{if the UCoD is c (i.e.,}\,c=u\text{)}\\0,&\text{if}\,c\,\text{is not the UCoD or a CCoD}\end{array}\right.$$

where xuc and ni are defined above. Then the contribution of cause c to all deaths is given by \(\sum_{i}{w}_{ci}.\)

Australian cause of death records

In Australia each death is certified by a doctor who completes a Medical Certificate of Cause of Death, or the death is referred to the coroner to investigate the circumstances and causes (currently about 12% of deaths are referred to a coroner) [13]. In either case, the cause of death information is lodged directly with the Registrar of Births, Deaths and Marriages in the relevant State or Territory. ICD codes are assigned to each cause and the UCoD is subsequently identified using a combination of automated and manual coding practices. In 1999 ICD 10 was adopted and since 2013 the Iris system for automated processing has been used [14]. Iris is an automatic system for coding multiple causes of death and selecting the underlying cause of death. The system has been designed to accommodate language-dependent aspects of cause of death recording and to improve international comparability. Iris is based on the international death certificate form recommended by WHO and causes of death coded according to ICD-10.

We obtained unit record data from the Australia Coordinating Registry which manages the data from all eight States and Territories. The data were provided in the order recorded on the death certificate (called the entity axis) and as a list with the UCoD recorded first followed by all other causes in alphabetical order (called the record axis). For this paper we used the UCoD from the record axis. In the context of multimorbidity we used the entity axis to identify all the CCoDs which we defined as any Part I causes listed to the right of, or below, the UCoD together with all causes in Part II. Causes listed in Part I above the UCoD were ignored as these should not be part of the pre-mortem pattern of multimorbidity.

The data used for this paper were for all deaths in Australia from 2006 to 2018 inclusive. The starting date of 2006 was chosen because there were major changes which led to a marked discontinuity between 2005 and 2006 [15]. The methods then remained unchanged, except for a change in software in 2013 which did not affect allocations within broad categories of causes [16]. Finally, the most recent data available when this work commenced were for 2018 (when the data were also unaffected by COVID-19). We used records for all people aged 60 years or more and examined the effects of sex and age (three groups: 60 to 74 years, 75 to 84 years, and 85 years and over).

Categories of causes of death

Various criteria for defining lists of conditions for the analysis of multimorbidity have been published [9, 17, 18]. The main points are as follows.

  1. 1.

    Relevance or fitness for purpose. For this paper the purpose is to analyse data on multiple causes of death among people aged 60 or more in Australia – although the list is likely to be applicable to other countries where most deaths occur from non-communicable diseases in older people, and multiple causes of death are collected and coded.

  2. 2.

    Measurement. In this case all causes of death were coded according to the ICD 10. In Australia ICD 10 codes are assigned to conditions reported on the death certificate or coroner’s findings through a combination of automated and manual coding [16].

  3. 3.

    Prevalence. Common causes should be in singular categories.

  4. 4.

    Categories should be mutually exclusive and exhaustive so that each cause belongs to exactly one category.

The list of 50 leading UCoDs published regularly by the Australian Bureau of Statistics, based on the recommendations of Becker et al. [9], formed the foundation for the list developed for this paper. Causes that were uncommon for the study age group were combined with other causes in the same or another chapter in ICD 10 (e.g., infectious diseases, A codes, were grouped with parasitic diseases, B codes, and others). Common causes that were closely related and can sometimes be interchanged were grouped in the same categories (e.g., Alzheimer’s disease, vascular dementia and other dementias, that is G30 and F00-F03, were grouped together even though they are in different chapters). Causes of particular relevance to public health (e.g., suicide, ICD 10 codes X60-X84, Y87.0) were not grouped with other causes. Among the codes for injuries, those that describe the mechanisms of external injuries and would be coded as UCoDs (V, W20-W99, X00-X59, X85-X99, Y00-Y86, Y87.1-Y87.9, Y88-Y99) were grouped with the nature of the injuries which would be coded as CCoDs (i.e., S and T codes). Although codes in the ICD chapter for ‘Symptoms, signs, ill-defined conditions’ would not generally be considered as UCoDs, in the data set used for this paper, terms like ‘senility’ were used sufficiently often as the only cause of death to justify a separate category. This categorisation resulted in 40 categories. See the Appendix for details. The last category, ICD-10 codes U and Z, comprises conditions that are not valid UCoDs but are occasionally used for CCoDs and is included in the list for completeness. For this paper the word ‘cause’, in UCoD or CCoD, refers to the relevant category of causes in the 40-category list.

Results

Illustrative example

A simple artificial example of 10 deaths with four causes A, B, C and D, shown in Section I of Table 1, is used to illustrate the calculations of the contributions to the total deaths attributed to each cause. Section II shows the two steps involved in the new method: first using all the data to calculate the number of CCoDs associated with each UCoD (or zero if the CCoD is the same as the UCoD) divided by the frequency of the UCoD; and secondly, for each record separately calculating the weight attributed to each cause. The values from step 2 are added for each cause (column) across all records (rows) to give the total contribution of that cause (shown in the second row of Section IV). Section III shows the alternative method using, for each record, arbitrary weights of p = ½ for the UCoD and (1 – p) = ½ distributed across all CCoDs, or 1 if there are no CCoDs. Section IV shows the comparison of the method which only considers the UCoD (equivalent to the arbitrary method with p = 1), the new data-driven method, and the method using an arbitrary value of p = ½. The results are quite similar except for causes C and D. Cause C more often occurs as a CCoD and so makes a bigger contribution to causes of death when either the new method or the method involving arbitrary weights is used, and similarly cause D which is less commonly a CCoD makes a smaller contribution.

Table 1 Simple example of ten hypothetical death records with four possible causes of death labelled A, B, C and D. The example shows the calculation of weights using the data driven and arbitrary methods, and the comparison of the methods

Australian cause of death data

A practical application of these methods is illustrated using Australian cause of death data. The distribution and numbers of causes of death for the 1,663,234 deaths by sex and age group are shown in Table 2. The numbers of deaths increased with age more among women than men. The first quartile, median and third quartiles for the number of CCoDs per death certificate were 0, 1 and 2 respectively for all sex-age groups but the mean numbers increased with age.

Table 2 Distribution of deaths in Australia 2006–2018 for all people aged 60 years or more, by sex and age groups and the number of deaths and mean number of contributing causes per death, in brackets

Table 3 shows the effects of various weights allocated to the UCoDs and CCoDs for all deaths among people aged 60 years or more, for the data-driven and arbitrary methods. As might be expected some causes, such as cancers, were much more likely to be listed as UCoDs than CCoDs, so the percentage of all deaths associated with those causes was similar for the method based on UCoDs alone and for the data-driven method but decreased as the arbitrary weight p varied from 1 to 0. In contrast, for others such as diabetes and hypertensive disease, the percentage of deaths associated with the cause increased as more weight was given to CCoDs than the UCoD. The proportion of deaths associated with some causes, such as chronic lower respiratory disease, other respiratory disease and other digestive diseases, were similar for all methods and were scarcely affected by variations in the arbitrary weights.

Table 3 Numbers and percentages of deaths in Australia 2006–2018 for all people aged 60 years or more by cause: first column is the list of causes of death; the second column is the numbers of deaths with this as the underlying cause; the third column is the percentages of deaths based on data driven method; the remaining columns are percentages of deaths using arbitrary weights varying from 1 (i.e., underlying cause only) to zero (i.e., contributing causes only)

Figure 1 shows Bland Altman plots illustrating differences between percentage of deaths associated with different causes when different weighting schemes are used. The left panel shows differences between percentages derived using the data driven approach and the UCoDs. In this case the greatest differences are for hypertensive heart disease, ischaemic heart disease and cerebrovascular disease with hypertensive heart disease given more weight and the other two causes less weight when CCoDs are taken into account. The right panel shows the differences in percentages of deaths for each cause calculated with the arbitrary weights p = 0.9 (A90) and p = 0.1 (A10). These are greater than the differences shown in the left panel, and so the scales on the vertical axes differ. In the right panel three causes have particularly greater differences between arbitrary weight with p = 0.9 (A90, i.e., most weight on the UCoD) and p = 0.1 (A10, i.e., most weight on CCoDs) – these are lung cancer which, if present, is mainly recorded as the UCoD, hypertensive disease which is much more commonly reported as a CCoD and ischaemic heart disease which is most commonly reported as a UCoD.

Fig. 1
figure 1

Bland Altman plots comparing percentages of deaths associated with each cause: left panel, data driven estimates (DD) vs underlying cause of death (UCoD), right panel, arbitrary weights with p = 0.9 vs p = 0.1 (A90 and A10). Points outside shaded areas or limits of agreement, indicate the differences that are beyond what might be expected by chance; these are for hypertensive disease (HYP), cerebrovascular disease (CeVD), ischaemic heart disease (IHD) and lung cancer (Lung Ca). Note the scales for the two graphs are different

If the causes are ranked in order of decreasing percentage of deaths, the rank order for the most common causes differs by age and between men and women (Tables 4 and 5); for example, the position for dementia rises with age. Within each sex-age group about 6–8 causes are consistently in the top 10 regardless of how the percentages of deaths are calculated, and rank orders among these top causes are affected by small differences in percentages. There is greater similarity in ranks between the UCoDs and the data driven estimates than when the CCoDs are given more weight using the arbitrary method with p = 0.5, for example.

Table 4 Leading causes of death in Australia 2006–2018 among men aged 60–74, 75–84 and 85 years or more, ranked by using the current method (underlying cause of death alone, UCoD), the new data driven method (DD) and the method using arbitrary weight with p = 0.5
Table 5 Leading causes of death in Australia 2006–2018 among women aged 60–74, 75–84 and 85 years or more, ranked by using the current method (underlying cause of death alone, UCoD), the new data driven method (DD) and the method using arbitrary weight with p = 0.5

Discussion

In this paper we have described a new method for including multiple causes of death in statistics that summarise the contribution for each cause to the total number of deaths in a population. The method takes account of the patterns of associations among UCoDs and CCoDs. It is driven by the data and does not involve an arbitrary choice of weights. We illustrated the method using a simple artificial example which demonstrates how the frequencies with which CCoDs occur with specific UCoDs affect their contributions to the results. To demonstrate the practical application of the method we applied it to Australian mortality data for people aged 60 years or more. Compared to the usual method based only on UCoDs, with the new method the percentage of deaths attributed to a condition like hypertensive disease (which is a common CCoD) almost doubles from 1.13% to 2.05%, and the percentages are attributed to the related conditions of ischaemic heart disease and cerebrovascular disease decrease. Similarly, the percentage contributions attributable to diabetes, dementia (including Alzheimer’s disease) and heart failure are all higher using the new method. For some conditions, like chronic obstructive pulmonary disease, and other respiratory conditions, the percentage of deaths varies little with different calculation methods because they occur about equally commonly as UCoDs or CCoDs. There are some causes, notably cancers, which are usually recorded as the UCoD with few, if any CCoDs. For these causes the percentages are only slightly lower based on the new method compared with the usual, UCoD only method, but can be considerably reduced if arbitrary weights are used. This is an important outcome achieved by the data driven method compared to that achieved using arbitrary weights. Cancers are legitimately the underlying cause of these deaths and reducing the ‘burden’ associated with these cancer deaths and attributing it to other causes would be difficult to justify from a public health perspective. Thus, reducing the relative importance of cancer does not make sense in the same way that reducing the rank order of ischaemic heart disease in deference to hypertensive diseases or diabetes does.

Several authors have suggested alternative methods for addressing the growing concern that, as the prevalence of multi-morbidity is increasing due to population ageing, the exclusive use of UCoDs in national mortality statistics does not adequately represent the importance of some conditions in terms of the population health burden [10,11,12]. The methods that have been proposed involve arbitrary choices of weight to be assigned to UCoDs and CCoDs without regard to the patterns that occur among these causes. The new data driven method is designed to overcome this limitation by taking into account the joint frequencies of conditions. The results in Table 3 show how this approach can increase or decrease the contributions of different conditions according to these patterns.

In this paper the unit of analysis is the death and not the cause of death. Thus, in common with other authors, each death is counted once so every death has the same weight in the total count, regardless of the number of CCoDs mentioned on the death certificate [3, 10, 12]. This differs from an analysis in which the total number of times a cause is mentioned in all death certificates may be the statistic of interest. In this case a death with numerous CCoDs will be more influential than one with only the UCoD reported. Counting each death just once is comparable with the current practice of using UCoDs only and it is more robust to differences in coding practices, for example, between countries which differ in the number of CCoDs commonly reported [3]. However, the data driven method uses the relative frequency of each cause across all deaths with the same UCoD in the term xuc which is used to calculate the weights wci. But, in the data driven method the influence for each cause is affected by the number ni of CCoDs on death certificate i (a feature shared with methods using arbitrary weights).

When causes of death are listed in rank order, as in Tables 4 and 5, the striking feature is that, within age and sex groups the ranks vary little between the data driven method and the current method based on UCoDs (but this effect is not necessarily found with the arbitrary weights). The reason for this robustness of ranks of causes is the use of groups of closely related causes (e.g., related to the vascular system, or the respiratory system). Most of the ‘exchange’ of weights occurs within these groups, e.g., between dementia as a CCoD when ischaemic heart disease is the UCoD and dementia as the UCoD when ischaemic heart disease as a CCoD. This phenomenon provides insights into the importance of how the groups of causes are defined. The categories used in this paper were based on the recommendations of Becker et al. [9] with modifications used by the Australian Bureau of Statistics, such as grouping Alzheimer’s disease with other dementias. Provided such a list of aetiologically related causes is used, results in this paper show that the percentages of deaths associated with different causes and the rank order of causes are quite robust to inclusion of CCoD information based on patterns within the data. This finding should provide confidence that the standard method, used by the World Health Organisation and many countries, does in fact provide a good representation of the relative importance of cause specific mortality rates.

Nevertheless, the methods discussed in this paper are not appropriate for universal use, for example as the international standard for reporting death statistics. In countries with incomplete registration of death, inadequate identification of causes of death, or where CCoDs are poorly recorded or not recorded at all, trying to take account of multiple causes of death is not sensible or feasible. Complete registration and improving the quality of UCoDs must remain the priority. However, for countries with high quality multiple cause of death data, two forms of statistical tabulations could be routinely reported: the current one based on UCoDs only, and another that uses the multiple cause data. This two-part approach would directly address the concerns that multimorbidity is inadequately represented in national statistics.

There are a number of limitations to the data and methods used in this study. Firstly, death certificates are not always filled in correctly. For example, several causes may be listed on the last line in Part I. In this paper those on the right of the UCoD were taken as CCoDs (i.e., assuming they were contributing causes that should have been in Part II) and those listed above and before the UCoD in Part I were ignored (assuming they were consequences of the UCoD). Other authors have taken the same approach, for example Piffaretti et al. calculated estimates both including and excluding these Part I causes [10]. Secondly, there is substantial evidence that some causes of death, e.g., diabetes [19,20,21] and dementia [22,23,24,25], are poorly recorded on death certificates as UCoDs or CCoDs even when the person is known in their lifetime to have the condition. Sensitivities of the order of 40—50% for diabetes and dementia have been reported, even for these causes listed anywhere on the death certificate [26].

If national statistics are based only on UCoDs the effects of causes which may be considered as risk factors for other causes, in particular endocrine, nutritional and metabolic diseases, may be underestimated [11]. Indeed, Goldberg et al. have suggested that differences in the way such conditions are coded as UCoDs or CCoDs can explain apparent differences in disease patterns between countries [27]. This issue is important for distinguishing between deaths directly attributable to COVID-19 (deaths from COVID-19) and those where COVID-19 was a CCoD (death with COVID-19) [28].

There are several important strengths of this study. In Australia the quality of death certification is high with approximately 88% of death certificates completed by a registered medical practitioner and the remainder obtained from coroners’ reports. Additionally, the data cover a period of thirteen years when there were very few changes in coding practices or the software used for processing the multiple causes listed on the death certificates, and there were no major changes that would impact on population mortality. By adopting the principle that each death is counted only once, we have ensured that the new method is robust to certification variations in the number of CCoDs reported on the death certificate.

Conclusion

A new method is proposed for calculating the percentages of deaths attributed to different causes when multiple cause of death data are available. It takes into account the patterns that occur between UCoDs and CCoDs as listed on the death certificate. Unlike previously proposed methods it does not rely on arbitrary choices of weights and does not treat all CCoDs equally. The application of the method to Australian mortality data shows how multi-morbidity can affect the percentages of deaths associated with different causes. The new method does not greatly affect the rank order of conditions, confirming the validity of the current practice based UCoDs alone. However, it does produce results that more adequately reflect the contribution of certain causes to overall mortality burden. It would be suitable for use by national statistical agencies to produce mortality tables that reflected the information available on multiple causes to complement the current tables based only on UCoDs.