Background

The CONCORD-2 study was designed to establish long-term surveillance of cancer survival worldwide, by central analysis of population-based cancer registry data. Net survival from 10 common malignancies was estimated from individual patient data submitted by 279 cancer registries in 67 countries [1].

Net survival of a cohort of cancer patients is estimated as the probability of survival derived solely from the cancer-specific hazard of death. It can be interpreted as the proportion of cancer patients who survive up to a given time after diagnosis (e.g. 5 years), after eliminating the impact of other causes of death (background mortality). This is done by separating the excess hazard of death due to cancer from the background mortality. Background mortality often differs widely between populations, and can even differ substantially within registry populations, for instance by race [2], ethnic group [3] or socio-economic status [4].
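In symbols, the separation described above is commonly written as follows. This is a sketch in standard notation rather than a formula taken from the CONCORD-2 protocol; the symbols \lambda_O (observed all-cause hazard), \lambda_B (background hazard from the life table), \lambda_E (excess hazard due to cancer) and NS (net survival for a single patient) are labels introduced here for illustration.

$$ \lambda_O(t) = \lambda_B(t) + \lambda_E(t), \qquad NS(t) = \exp\left(-\int_0^t \lambda_E(u)\,du\right) $$

Under this reading, an error in the background hazard taken from the life table passes directly into the excess hazard, and therefore into the estimate of net survival.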

Information on background mortality in a given population is obtained from life tables, which are tables of age- and sex-specific death rates or probabilities in a given population at a given point in time. Net survival will be more accurate if the estimates of background mortality are as close as possible to each particular individual’s “real” expected mortality from all causes. Previous international studies of cancer survival [5], including the first CONCORD study [2, 6], have recommended that life tables specific to the area in which cancer patients live should be used, rather than national life tables, which may not account for sub-national differences in mortality. Ideally, these life tables should be by single calendar year, single year of age, sex, and race (or ethnicity) or deprivation when the relevant data are available. Such life tables are, however, not generally available: national statistical offices often only produce life tables for the whole country or major geographic regions.

In this article, we outline the methods used to construct life tables for the CONCORD-2 study, which is the largest comparison of worldwide trends in population-based cancer survival to date. We document the wide variations in life expectancy and age-specific mortality between and within the populations covered by the 279 participating cancer registries. We examine trends in life expectancy in regions within continents, and even within countries. We also investigate the importance of using regional vs. national life tables in the estimation of net survival, by comparing estimates for four adult cancers (breast, colon, lung, prostate) in 44 US registries, using either a US national, race-specific life table [7] or the race- and state-specific life tables that were constructed for the CONCORD-2 study.

Methods

All 279 cancer registries participating in the CONCORD-2 study were invited to contribute data for patients diagnosed during all or part of the calendar period 1995–2009, with follow-up to 31 December 2009, or a later year. To enable estimation of net survival for these patients, registries were asked to provide data on background mortality for each calendar year for which they submitted cancer data, from the first year of incidence to the last year of follow-up. They were offered the option of supplying their own life tables or providing death and population counts from which we could construct the life tables required.

Some registries also supplied life table data for racial or ethnic sub-populations within their territory: in all, we received data for 327 populations. The Israel National Cancer Registry and all 44 participating United States (US) cancer registries submitted death and population data from which to construct life tables by ethnicity (Israel, national-level) or race (US, state-level). The New Zealand Cancer Registry and the Penang Cancer Registry (Malaysia) provided mortality rates by ethnicity at a national level. Both the Polish National Cancer Registry and the Austrian Cancer Registry submitted mortality rates for the sub-regions covered by their registries (voivodeships for Poland, Bundesländer for Austria). Neither registry submitted data by ethnicity.

We classified the data we received into four categories on the basis of their structure and quality: i) death and population counts by single year of age; ii) death and population counts by age group (typically five years); iii) mortality rates by single year of age; and iv) mortality rates by age group. A fifth category included registries from which life table data were unavailable or deemed unreliable. The methods used to construct life tables were different for each of the five categories (Table 1).

Table 1 Methods used to construct the CONCORD-2 life tables, by type of data obtained

Some registries did not provide life tables (or the corresponding death and population counts) for each calendar year covered by their cancer data. We constructed life tables for any intervening years by linear interpolation of the age-specific death rates. If the calendar span of life tables was shorter than the calendar span of the cancer incidence and follow-up data, life tables for the earliest or latest available year were used for the missing years, i.e. without extrapolation, so that we would have estimates of background mortality for every year included in the cancer data.
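The rule described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `interpolate_rates` and the dictionary layout (calendar year mapped to a list of death rates indexed by age) are assumptions made for this example.

```python
def interpolate_rates(tables, year):
    """Return age-specific death rates for `year`.

    `tables` maps a calendar year to a list of death rates indexed by age
    (a hypothetical layout chosen for this sketch).
    """
    years = sorted(tables)
    # Outside the available span: reuse the earliest or latest table,
    # i.e. no extrapolation.
    if year <= years[0]:
        return list(tables[years[0]])
    if year >= years[-1]:
        return list(tables[years[-1]])
    if year in tables:
        return list(tables[year])
    # Intervening year: linear interpolation, age by age, between the
    # nearest available years on either side.
    lower = max(y for y in years if y < year)
    upper = min(y for y in years if y > year)
    weight = (year - lower) / (upper - lower)
    return [(1 - weight) * a + weight * b
            for a, b in zip(tables[lower], tables[upper])]
```

For example, with tables for 2000 and 2004 only, a request for 2001 gives a value one quarter of the way from the 2000 rate to the 2004 rate at each age, while requests for 1998 or 2009 simply return the 2000 or 2004 table.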

Life tables from death and population counts (categories i and ii)

In all, 172 (62%) of the participating registries provided data on the numbers of deaths and the population size (death and population counts) by age and sex (Table 1). A flexible multivariable model (flexible Poisson model) [8] was used to derive the required age- and sex-specific mortality rates. This method was chosen because it was recently recommended for the estimation of smoothed age-specific mortality rates for small populations [8]. This approach also allowed for the modelling of mortality rates by race or ethnicity, where the data were available.

The death counts were modelled separately for each sex and calendar year, within the generalised linear model framework, using a Poisson error and log link. Person-years at risk were used as the offset:

$$ \log\left(d_x\right) = \beta_0 + f(x) + \log\left(\mathrm{pyrs}_x\right) $$

where x denotes age in years, d_x denotes the age-specific death count, β_0 denotes the coefficient at baseline (i.e. the log of the mortality rate at the reference age), f(x) denotes a restricted cubic spline function on age, and pyrs_x denotes the age-specific person-years at risk.
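Reading the left-hand side as the log of the expected death count and moving the offset across shows that this is a model for the mortality rate itself. The restatement below is ours: the symbol \mu_x for the expected death count and \hat{m}_x for the fitted rate are introduced here for illustration.

$$ \log\left(\frac{\mu_x}{\mathrm{pyrs}_x}\right) = \beta_0 + f(x), \qquad \hat{m}_x = \exp\left(\hat{\beta}_0 + \hat{f}(x)\right) $$

The smoothed age-specific mortality rate is therefore obtained by exponentiating the fitted linear predictor without the offset term.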

The model was implemented using the Stata command mvrs (multivariable regression splines) [9] in Stata 13. Splines are made up of piecewise polynomial functions joined at locations called knots. The process we used to select the knot locations is summarised in Additional file 1 and described in detail elsewhere [8]. We used the flexible Poisson model with a continuous interaction between race/ethnicity and age to construct race/ethnicity-specific life tables for the Israel National Cancer Registry (ethnicity) and the 44 US states (race). Further details are provided in Additional file 2.

We used three calendar years of death and population counts around a central year, so that the resulting life tables would not be as susceptible to year-on-year fluctuations.
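As a rough illustration of why a three-year window damps annual fluctuations, the sketch below sums deaths and person-years over the central year and its two neighbours before forming a crude rate. This is an assumption-laden simplification: the text does not say whether the three years were summed or modelled jointly, and the study smoothed the counts with the flexible Poisson model rather than using crude rates. The name `pooled_rates` and the data layout are hypothetical.

```python
def pooled_rates(deaths, pyrs, central_year):
    """Crude age-specific death rates pooled over three calendar years.

    `deaths` and `pyrs` map a calendar year to a list indexed by age
    (hypothetical layout). Counts for central_year - 1, central_year and
    central_year + 1 are summed before dividing.
    """
    window = [central_year - 1, central_year, central_year + 1]
    n_ages = len(deaths[central_year])
    rates = []
    for age in range(n_ages):
        total_deaths = sum(deaths[y][age] for y in window)
        total_pyrs = sum(pyrs[y][age] for y in window)
        rates.append(total_deaths / total_pyrs)
    return rates
```

A single year with zero deaths at some age, which would give a rate of zero on its own, contributes to a non-zero pooled rate as long as deaths occurred at that age in either neighbouring year.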

Life tables from mortality rates (categories iii and iv)

We obtained age-specific mortality rates from 83 (30%) of the participating registries (Table 1). Of these, 73 (88%) provided mortality rates by single year of age (complete life tables) and 10 (12%) provided rates by five-year age group (abridged life tables). Of those registries that submitted complete life tables, 56 (77%) provided smoothed versions for each calendar year submitted (where the raw, age-specific mortality rates had been modelled up to age 99 years to remove any random fluctuations by age) and 17 (23%) did not.

Where the mortality rates we received had not been smoothed, we used the Ewbank relational method [10] to derive a smoothed mortality profile for the given population. The Ewbank method is an extension of the Brass relational method [11]. The Brass method involves plotting the linear relationship between the logits of two survivorship functions, one from a standard life table and the other from observed data. Plotting this linear relationship provides information on two parameters, one for the level of mortality in the model (α) and another for the slope of the observed survivorship curve relative to the standard curve, i.e. the relation between young and old age mortality in the observed data relative to the standard (β). These two parameters are then used to determine the shape of a smoothed survivorship function for the observed data. The Ewbank method includes two additional parameters: one for childhood mortality (κ) and another for mortality at older ages (λ). The parameter for childhood mortality applies before the median age at death in the population. The parameter for mortality at older ages applies after the median age at death.
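The two-parameter Brass step can be sketched as below. This illustrates only the Brass relation, with the level parameter written `alpha` and the slope `beta`; it does not implement the Ewbank extension with κ and λ. Fitting by ordinary least squares is an assumption made for this sketch (the text describes the relationship as plotted), and all function names are hypothetical.

```python
import math


def brass_logit(l):
    """Brass logit of a survivorship proportion l, with 0 < l < 1."""
    return 0.5 * math.log((1.0 - l) / l)


def fit_brass(l_obs, l_std):
    """Least-squares fit of logit(l_obs) = alpha + beta * logit(l_std)."""
    xs = [brass_logit(v) for v in l_std]
    ys = [brass_logit(v) for v in l_obs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def smoothed_survivorship(l_std, alpha, beta):
    """Survivorship implied by the standard and the two fitted parameters."""
    # Inverting the Brass logit: l = 1 / (1 + exp(2 * Y)).
    return [1.0 / (1.0 + math.exp(2.0 * (alpha + beta * brass_logit(v))))
            for v in l_std]
```

With `alpha = 0` and `beta = 1` the smoothed curve reproduces the standard; a positive `alpha` lowers survivorship at every age, and `beta` above 1 steepens mortality at older ages relative to younger ages.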

If mortality rates were available by single year of age up to 99 years, we used all four parameters (level of mortality, relation between young and old age mortality, childhood mortality, older-age mortality). In many populations, the median age at death was close to 80 years of age, or higher. For abridged life tables, in which the highest age group is typically for ages 85 years and above, this meant that data to estimate values for the older-age mortality parameter (which only applies after the median age at death) were often available for only one or two age groups. This has previously been found to cause instability in the estimated older-age mortality parameter, leading to unreliable estimates of older-age mortality [12]. For abridged mortality rates, we therefore used only three parameters and constrained the parameter for older-age mortality to be a factor of the parameter for the level of mortality [10].

Registries for which no reliable data were available (category v)

We were unable to obtain reliable life table data from 24 (8%) registries (Table 1). For 23 of these registries, we used country-level life tables by age group (abridged) for calendar periods 1995–2000, 2000–2005 and 2005–2010 obtained from the UN Population Division (UNPD) [13]. We centred these on years 1997, 2002 and 2007 and smoothed the abridged values using the Elandt-Johnson method [14]. The Elandt-Johnson method has been recommended for deriving single-year-of-age life tables from abridged ones [15]. As above, we produced life tables for individual calendar years by age-specific linear interpolation between the life tables for each of the three calendar periods. For one of these registries, Gibraltar, no life table data were available from the UNPD [13], WHO [16], Global Burden of Disease Study [17] or the Human Mortality Database [18], so we used the life table we constructed for England.

Evaluation and comparison of derived life tables

Life expectancy at birth is a summary measure of age-specific mortality. We calculated life expectancy at birth, the infant mortality rate (probability of dying between birth and exact age 1), childhood mortality rate (probability of dying between birth and exact age 5), and the probabilities of dying between exact ages 15 and 60, 60 and 85, and 85 and 99 years from each of the derived life tables.
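These summary measures can all be derived from the survivorship column of a life table. The sketch below is a generic illustration, not the study's code, and rests on assumptions stated in the comments: deaths are taken to occur on average mid-way through each year of age (a poor approximation in infancy), and the final rate is treated as a constant hazard for the open-ended last age group. The function names are hypothetical.

```python
def survivorship(rates):
    """Proportion surviving to each exact age, starting from l_0 = 1.

    `rates[x]` is the central death rate at age x. It is converted to a
    probability of dying with q = m / (1 + m / 2), which assumes deaths
    fall on average mid-way through the year of age.
    """
    l = [1.0]
    for m in rates:
        q = m / (1.0 + 0.5 * m)
        l.append(l[-1] * (1.0 - q))
    return l


def prob_dying_between(l, a, b):
    """Probability of dying between exact ages a and b, given survival to a."""
    return 1.0 - l[b] / l[a]


def life_expectancy_at_birth(rates):
    """Life expectancy at birth from single-year rates.

    Assumption for this sketch: the last element of `rates` applies to an
    open-ended final age group with constant hazard, so survivors to that
    age live a further 1 / rate years on average.
    """
    closed = rates[:-1]
    l = survivorship(closed)
    # Person-years lived in each closed one-year interval (trapezoid rule).
    person_years = sum(0.5 * (l[x] + l[x + 1]) for x in range(len(closed)))
    tail = l[-1] / rates[-1]
    return person_years + tail
```

With a survivorship list `l` built from rates for ages 0 to 98, the adult measures named above would be `prob_dying_between(l, 15, 60)`, `prob_dying_between(l, 60, 85)` and `prob_dying_between(l, 85, 99)`.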

Life expectancies at birth and the probabilities of death were summarised in a standardised report for each cancer registry (see example in Additional file 3). The reports included plots of the smoothed mortality curves on both logarithmic and arithmetic scales.

Performance of the flexible Poisson model was also evaluated from plots of the deviance residuals at each age. Deviance residuals are a measure of how closely the modelled values fit the observed data. The residuals should be approximately normally distributed, with a constant range, if the model fits the data well [19]. We deemed the model to be performing well if the standardised deviance residuals were in the range −2 to +2.
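For a Poisson model, the deviance residual at each age can be computed from the observed and fitted death counts, as in the sketch below. This shows the unstandardised residual only: the standardised version used above also divides by a leverage term from the fitted model, which is omitted here. The function names and the `limit` parameter are introduced for illustration.

```python
import math


def poisson_deviance_residual(observed, fitted):
    """Unstandardised deviance residual for one Poisson observation."""
    if observed == 0:
        unit_deviance = 2.0 * fitted
    else:
        unit_deviance = 2.0 * (observed * math.log(observed / fitted)
                               - (observed - fitted))
    # Guard against tiny negative values from floating-point rounding.
    unit_deviance = max(unit_deviance, 0.0)
    sign = 1.0 if observed >= fitted else -1.0
    return sign * math.sqrt(unit_deviance)


def ages_outside_range(observed, fitted, limit=2.0):
    """Ages (list positions) whose residual lies outside [-limit, +limit]."""
    return [age for age, (d, mu) in enumerate(zip(observed, fitted))
            if abs(poisson_deviance_residual(d, mu)) > limit]
```

An empty list from `ages_outside_range` corresponds to the criterion above being met at every age; a cluster of flagged ages in one part of the age range would suggest that the spline is too stiff there.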

Results

In total, 6,514 life tables were constructed as part of the CONCORD-2 study: of these, 6,392 life tables were constructed for 223 (80%) registries with the flexible Poisson model, the Ewbank method or the Elandt-Johnson method. A further 35 registries (12.5%) provided smoothed life tables that did not cover all calendar years; for these registries, we constructed 122 life tables by linear interpolation. We received smoothed, complete, life tables for all calendar years from 21 registries (7.5%). No modifications were required for these life tables.

The type of data received varied by continent (Table 2). With the exception of Mauritius, no African registries provided reliable life table data, whereas the great majority of registries from the Americas (North, Central and South) provided death and population counts.

Table 2 Type of life table data obtained from each cancer registry: number of registries, by continent

Life expectancy at birth was higher among females than males in all populations except Mali (Bamako Cancer Registry), throughout 1995–1999, 2000–2004 and 2005–2009 (Additional file 4).

Global variation in life expectancy at birth was very wide (Table 3; Figs. 1 and 2; Additional file 4). Life expectancy was highest in Japan for males and females throughout the 15-year period 1995–2009. It was lowest in Nigeria during 1995–1999 (47 years for females, 46 years for males) and Lesotho during 2000–2004 and 2005–2009 (e.g. 45 years for females, 45 years for males in 2005–2009; Table 3, Additional file 4). During 1995–1999, life expectancy at birth ranged from 47 to 84 years among females and 46 to 78 years among males. During 2005–2009, it ranged from 45 to 86 years among females and 45 to 80 years among males. The largest range within any continent was observed in Africa, where life expectancy in 2005–2009 varied by as much as 30 years between populations (from 45 to 77 years among females, and 45 to 72 years among males). If North Africa is considered separately from East, West and South Africa, however, the ranges within these two regions are narrow. The narrowest range was observed in North America during 1995–1999 (6 years for females, 4 years for males) and Oceania in 2005–2009 (9 years for females, 10 years for males).

Table 3 Adult probabilities of dying, and life expectancy at birth: range, by calendar period and sex

Fig. 1 Life expectancy at birth (years), by sex: earliest and latest years of life table data. Each data point represents a single population, either for a registry territory, or for a racial/ethnic sub-population within a given territory. This figure shows the change in life expectancy at birth, by sex, between the earliest and latest years for which life table data were submitted, in the general population of 279 participating registries, covering 327 populations. The diagonal represents ‘no change’ between the first and last years: data points above the diagonal denote an increase in life expectancy for that population

Fig. 2 Life expectancy at birth (years): range, by continent, calendar period and sex. The numbers in brackets beside each calendar period denote the number of registries contributing life table data for that calendar period. Each dot on the graph represents a registry population or sub-population

Striking changes occurred in life expectancy in some countries between the earliest and latest calendar years for which we obtained life table data (Fig. 1). Among males, life expectancy fell by up to six years in South Africa and Lesotho. Among females, it fell by nine and seven years, respectively. By contrast, life expectancy rose by six years or more in Estonia, Latvia (males only), South Korea, São Paulo (Brazil, males only), Changle (China, females only), Haining (China), and East Germany (males only).

These variations and trends in life expectancy at birth summarise the underlying patterns and trends in age-specific mortality, which also varied very widely (Table 3; Figs. 3, 4 and 5; Additional file 4). Worldwide, the greatest range in the probability of death among adults was seen in the age range 60 to 85 years, both during 1995–1999 (37.9% to 93.7% among females; 55.9% to 94.3% among males) and during 2005–2009 (31.5% to 93.5% among females; 51.4% to 93.9% among males).

Fig. 3 Probability of dying between ages 15 and 60: range, by continent, calendar period and sex. The numbers in brackets beside each calendar period denote the number of registries contributing life table data for that calendar period. Each dot on the graph represents a registry population or sub-population

Fig. 4 Probability of dying between ages 60 and 85: range, by continent, calendar period and sex. The numbers in brackets beside each calendar period denote the number of registries contributing life table data for that calendar period. Each dot on the graph represents a registry population or sub-population

Fig. 5 Probability of dying between ages 85 and 99: range, by continent, calendar period and sex. The numbers in brackets beside each calendar period denote the number of registries contributing life table data for that calendar period. Each dot on the graph represents a registry population or sub-population

Where we obtained background mortality data by race or ethnic group, the majority group (whites in the United States, Jews in Israel, Non-Maoris in New Zealand) tended to have higher life expectancy at birth than the other subgroup(s). Malaysia (Penang Cancer Registry) was the exception, where life expectancy among the Chinese (23% of the population) was higher than among the majority Malay (50%) population [20] (Fig. 6; Additional file 4). Correspondingly, there were clear disparities in age-specific mortality, but the rates tended to converge among the elderly.

Fig. 6 Life expectancy at birth (years) by race/ethnicity and sex: 2005–2009. Data are presented for Israel, Malaysia, New Zealand and 38* states of the United States. Hollow diamonds denote the majority ethnic group in each population. * Six metropolitan SEER registries were also included in the CONCORD-2 study, but the life tables used were those of the parent state (see text)

For some states in the US, the population of blacks is so small that death counts were not available for several age groups. We were therefore unable to construct robust life tables for blacks in Hawaii, New Hampshire, Montana, Idaho or Wyoming, even with the flexible Poisson model. For Utah and Alaska, the black life tables were also based on small counts, but data were available for enough age groups for us to construct life tables for use in survival analyses.

Impact of using state- and race-specific life tables

We compared five-year net survival estimates for patients diagnosed during 2005–2009 in the 44 participating US registries, derived in two ways: with the state- and race-specific life tables that we had constructed using the flexible Poisson model, and with the national, race-specific life tables obtained from the National Center for Health Statistics (NCHS) [21]. For this comparison, we chose four cancers with very different prognoses: breast and prostate (high survival), colon (medium) and lung (low).

Absolute differences between the two sets of survival estimates were greatest for states where life expectancy at birth differed most from the national average, and for cancers with a better prognosis. They were smallest for states where life expectancy at birth differed least from the national average, and for cancers with a poor prognosis, where the majority of deaths were excess deaths. Differences were largest for men with prostate cancer, and for women with breast cancer, and smallest for lung cancer in both sexes (Table 4). The greatest difference was 3.6% for prostate cancer in Mississippi.

Table 4 Absolute differences in five-year net survival estimates obtained using national versus state-specific, race-specific life tables

Discussion

In order to establish worldwide surveillance of population-based cancer survival trends in the CONCORD-2 study [1], we needed to obtain or construct life tables of background mortality by age, sex and calendar year that were as specific as possible for each registry population or sub-population.

This was particularly important in light of the tremendous intra-continental and even sub-national variations in background mortality. The UN Population Division, the World Health Organisation and the Global Burden of Disease study regularly produce life tables for countries worldwide [13, 16, 17, 22], but they are for countries, rather than sub-regions or ethnic/racial groups, and they may not accurately reflect the background mortality in the specific population(s) covered by a cancer registry.

We were obliged to use several methods to construct the life tables, because of the different types of data available from the registries (complete or abridged death and population counts; complete or abridged mortality rates; no reliable data). These methods involved different assumptions about the shape of age-mortality patterns and the rate of increase of mortality at older ages. The different assumptions made in the construction of the life tables may have had an impact on the subsequent estimates of net survival, and this warrants further investigation.

We recommend using the multivariable flexible Poisson model to construct life tables for future international comparisons of population-based cancer survival. We found that this method performed well, even for small populations. It does not rely on an external standard population or a pre-defined set of coefficients, and therefore does not make strong assumptions about the age-pattern of mortality. It was also recently found to perform better than the Elandt-Johnson method and a flexible relational method (based on the Ewbank approach) for small populations [8].

Life expectancy at birth varied by more than 30 years among the 327 populations examined in the 279 registries. In Canada alone, during 2005–2009, life expectancy differed by about 10 years between residents of Nunavut (females 73.4 years; males 68.3 years) and British Columbia (females 83.4 years; males 78.9 years). These differences are probably explained by the very different demographic profiles of these two jurisdictions: aboriginal people made up 86% of the population of Nunavut in 2011 [23].

In most populations, life expectancy increased during 1995–2009, but in Lesotho and South Africa it fell by as much as 6 years, most probably because of the HIV/AIDS epidemic emerging in those countries during the 1990s [24, 25].

We constructed ethnic or race-specific life tables for Israel, Malaysia (Penang Cancer Registry), New Zealand and the US. These life tables showed marked differences in background mortality between the ethnic and racial sub-populations in each country. In 5 of the 44 participating US states (Hawaii, New Hampshire, Montana, Idaho, Wyoming), it was not possible to construct sufficiently robust life tables for blacks. However, we were able to use race- or ethnic-specific life tables to estimate net survival in 39 of the 44 US registries, and for Israel, Penang (Malaysia) and New Zealand, controlling for background mortality by age and sex separately within each race or ethnicity. This was a strength of the CONCORD-2 study.

Examination of the impact of using race-specific life tables for each US state on estimates of net survival showed that age-standardised estimates differed by up to 3.6%, when compared with estimates obtained with the national race-specific life tables that have been used in the past. The differences were more marked for cancers with better prognosis. This is in line with previous findings [2, 26]. The largest difference observed was in the estimate of age-standardised five-year net survival for prostate cancer in Mississippi, which was 3.6% higher when derived with state- and race-specific life tables than when using national life tables. The explanation is that background mortality among adults in Mississippi is considerably higher than the US national average, for both blacks and whites. National life tables therefore under-estimate background mortality in Mississippi, leading us to over-estimate excess mortality and subsequently underestimate net survival. Of note, we did not investigate age-, sex- or race-specific differences in net survival estimated with the alternative life tables. Differences in net survival from those obtained with national life tables will be larger in some of the groups defined by age and race and, for other cancers, sex. This is a further reason for using the most specific life tables that can be obtained.
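The direction of this effect can be seen with a deliberately simplified calculation, in which survival relative to the general population is approximated by the ratio of observed all-cause survival to the expected survival read from a life table. This ratio is a stand-in chosen for illustration: the estimator actually used for net survival is not described in this section, and all three input values below are hypothetical.

```python
def relative_survival(observed, expected):
    """Ratio of observed all-cause survival to life-table expected survival."""
    if not 0.0 < expected <= 1.0:
        raise ValueError("expected survival must be in (0, 1]")
    return observed / expected


# Hypothetical five-year values for one group of patients.
observed = 0.80            # observed all-cause survival
expected_national = 0.85   # expected survival from a national life table
expected_state = 0.82      # expected survival from a state life table with
                           # higher background mortality

rs_national = relative_survival(observed, expected_national)
rs_state = relative_survival(observed, expected_state)

# Difference between the two estimates, in percentage points.
gap_in_points = 100.0 * (rs_state - rs_national)
```

Because the state life table attributes more of the observed deaths to background mortality, `rs_state` exceeds `rs_national`. The gap widens as observed survival rises, which is consistent with the larger differences reported for cancers with a better prognosis.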

Stroup et al. [26] recently examined the differences between estimates of relative survival by age, sex and race for 17 SEER registries, obtained either with state- and race-specific life tables or with national, race-specific life tables. The differences were greatest for patients aged 85 years or over, and differed in both direction and magnitude by race and sex. They deemed the NCHS state- and race-specific life tables unreliable above age 85 and recommended against using them to estimate relative survival for patients aged 85 and older. We have compared the probabilities of dying between ages 85 and 99 years for each state, race and sex derived from the CONCORD-2 life tables with those derived from the national, race-specific life tables available from NCHS. Estimates of the probability of dying between ages 85 and 99 were higher for black males and females, and to a lesser extent for white males, when derived from the CONCORD-2 state- and race-specific life tables than when derived with the national life tables. For white females, the estimates derived from the CONCORD-2 life tables were fairly evenly distributed around the corresponding national estimates.

We opted to use life tables that were specific to each registry or sub-population, wherever possible, in order to reflect as closely as possible the background mortality that an individual cancer patient would be expected to experience. This is critical when estimating net survival. Accurate life tables allow us to estimate the net survival of cancer patients within each population, rather than an approximation obtained with a less specific life table. This is particularly important for worldwide comparisons of cancer survival.

This was the first opportunity for us to use our flexible Poisson model [8] to construct life tables for such a large and disparate set of populations. The model generally performed well, even when the death counts were sparse. In some US states (California, New York, Alaska and Colorado), however, where the races classified as “Other” (American Indian/Alaska Native, Asian/Pacific Islander) made up a large proportion of the total population, the fit of the life table for blacks was not ideal, particularly at the oldest ages. This is probably because the interaction between race and age was modelled as a continuous variable, meaning that greater weight was given to the shape of the mortality function for “other” races at older ages than to that for blacks. This may have contributed to some over-estimation of background mortality among elderly blacks. We determined that inclusion of an interaction between race and age in the model as dummy variables (see Additional file 5) provided a better fit of the race-specific mortality rates to the underlying data, and we produced a new set of life tables for the US registries with this approach. We have also explored the inclusion of calendar year as a covariable: we believe this could facilitate construction of more robust life tables for future comparisons of cancer survival.

In some circumstances, we were limited by the availability of data, in others by data quality. For some populations, raw infant and child mortality rates were very low, suggesting some undercounting of infant deaths. In some populations (sometimes the same populations), mortality rates at very old ages were remarkably high, perhaps indicating inaccurate reporting of the age at death (too high) or undercounting of older-age populations. Where possible, we compared our estimates of life expectancy with those available from the national statistical authority for the country or region covered by the cancer registry. These were generally very comparable. The quality and completeness of civil and vital registration statistics vary substantially around the world, however, and only about one-third of all deaths are actually registered. To improve the robustness of research on demographic trends, it is critical to push for the improvement of vital statistics data worldwide [27, 28]. This would benefit international public health research, including international comparisons of cancer survival, by enabling wider participation, particularly in low- and middle-income countries.

Conclusions

The methods by which life tables are constructed for the estimation of cancer survival depend on the nature and detail of the available data on death and population counts, or mortality rates.

To our knowledge, this study represents the first time such a large number of national and sub-national life tables have been constructed on a global scale. We found wide variations and major changes over time in life expectancy at birth, and in patterns of mortality by age, between 327 populations in the territory of 279 participating registries. This highlights the importance of using life tables that are as specific as possible to the populations or sub-populations for which comparisons of net survival are required, in order to control adequately for variations in background mortality between populations and over time.

The life tables used by the CONCORD programme are all available online. They can be downloaded from the Cancer Survival Group website [29].