1 Introduction

Life expectancy at birth is the single most used demographic measure of population health and of the well-being of a population (Canudas-Romo & Becker, 2011). It can be defined as the average number of years lived by newborns who would be exposed throughout their lives to the conditions observed during a particular period. Because it is not affected by the age structure of the population, life expectancy is a useful indicator to make comparisons between countries, regions or population groups. Another useful health outcome is the infant mortality rate (IMR), since the determinants of infant mortality are associated with the living conditions and the socioeconomic development of populations (Reidpath & Allotey, 2003). Call to Action 19 of the Truth and Reconciliation Commission of Canada final report asks for the establishment of “measurable goals to identify and close the gaps in health outcomes between Aboriginal and non-Aboriginal communities, and to publish annual progress reports and assess long-term trends” (Truth and Reconciliation Commission of Canada, 2015).Footnote 1 In this context, and given the importance of these indicators for the comparison of population health outcomes among populations and over time, the unavailability of estimates of life expectancy at birth and IMR among Indigenous populations is an important data gap.

Statistics Canada publishes annual estimates of life expectancy and IMRs for the country and the provinces and territories. However, the methodology cannot be used to obtain disaggregated results for Indigenous populations because the data sources—the Canadian Vital Statistics - Death database (CVSD), the Canadian Vital Statistics - Birth database (CVSB) and the Demographic Estimates Program—do not contain information on Indigenous identity. In this study, we use an alternative data source, the Canadian Census Health and Environment Cohorts (CanCHECs), to compute life expectancy of Indigenous populations at different periods over time. The CanCHECs are data linkages between the long-form census questionnaire, which contains information about the self-declared identity of individuals, and the CVSD. Another data source, the Canadian Birth Census Cohorts (CanBCCs), consists of linkages of births, stillbirths and infant deaths from the CVSB and the CVSD with the long-form census questionnaires and is used to compute IMRs.

The paper is structured in six distinct parts. Following this introduction, a succinct review of literature regarding previous attempts at measuring life expectancy of Indigenous populations in Canada is presented. Next, the methodological issues that are of concern for the study of mortality in Indigenous populations are described. Section 3 presents the alternative data sources used in this paper—namely the CanCHECs and CanBCCs—along with their strengths and limitations. Section 4 presents an overview of the methods. The results are provided in Section 5, along with a brief analysis of the main trends. The concluding part of the paper summarizes the results and discusses the limitations of the methods developed for the estimation of life expectancy at birth for Indigenous populations.

2 Literature Review

Civil registration of deaths in Canada is a responsibility of the provinces and territories. All of them collect information critical to demographic and health research, such as births, deaths and causes of death, but only a few of them capture information about the Indigenous identity of the deceased.Footnote 2 Consequently, mortality indicators for the Indigenous population have to be computed either by (a) using other data sources, such as the Indian Register (IR); (b) using an ecological approach, where the results relate to a specific region with a known large density of Indigenous people; or (c) linking vital statistics data with other data sources containing information about Indigenous identity.

The IR is the official record of people registered under Section 6 of the Indian Act maintained by Indigenous Services Canada. The IR was used in a series of reports prepared for Indian and Northern Affairs CanadaFootnote 3 by Rowe and Norris (1985), Loh (1990), and Loh et al. (1998) to assess mortality of Registered Indians and to support the development of projections of the Registered Indian population. A new series of estimates was published in 2004 (Verma et al., 2004), results that were revisited in the context of the development of the Human Development Index for the United Nations (Cooke et al., 2007). Other estimates of life expectancy computed from the IR as a stand-alone data source have been provided by Trovato (2011, 2014) and Amorevieta-Gentil et al. (2014). An obvious limitation of the IR for estimating life expectancy and IMRs is that it covers only the Registered Indian population. Another limitation is that significant adjustments must be made to the data to account for delays in reporting and for the underreporting of births and deaths in the IR. Table 6 in the Appendix presents estimates of life expectancy at birth of Registered Indians collected from various sources.

The ecological approach is the fastest and most cost-effective way to generate estimates for marginalized populations. It was used in a study of mortality for the Inuit population (Wilkins et al., 2008; Senécal et al., 2018). A study of the concentration-coverage curve from 1996 Census data concluded that the place of residence can effectively be used as a proxy for Indigenous identity for the Inuit population, but not for the Métis population (Finès, 2008). That said, results from the 2016 Census show that as much as 27% of the Inuit population lives outside Inuit Nunangat (Statistics Canada, 2017a), so estimates would exclude a substantial proportion of the overall Inuit population. Moreover, changes in the composition of the population may undermine comparability over time. Other examples of the use of the ecological approach can be found in a study evaluating how various indicators of mortality vary with the concentration of Indigenous residents (Public Health Agency of Canada, 2018) and in a study of stillbirths and IMRs in Indigenous communities in the province of Québec (Gilbert et al., 2015).

Another way to produce disaggregated statistics for specific population groups is through linkages of health statistics with other data sources to retrieve the Indigenous identity of the deceased and obtain denominators needed for the calculation of death rates. Thanks to the availability of CanCHECs and CanBCCs, notable efforts have been made to document the mortality outcomes of Indigenous populations in the recent past. For example, Park (2021) used the 2006 CanCHEC with a 10-year follow-up period to compute mortality rates for on- and off-reserve First Nations people and the non-Indigenous population for some provinces, a group of provinces and the territories. Tjepkema et al. (2019a) used CanCHEC data from 1991 to 2011 to look at the life expectancy of First Nations people, Métis and Inuit (at various ages other than zero) at the national level, and its evolution over time. Perinatal outcomes (including infant death) occurring from 2004 to 2006 have been studied at the national level using the 2006 CanBCC among First Nations people, Métis and Inuit in Canada (Sheppard et al., 2017) and among First Nations people with a comparison between individuals living on a reserve versus outside a reserve and those with Indian registered status versus without Indian registered status (Shapiro et al., 2018).

These studies have provided important insights. For example, in line with previous research, Tjepkema et al. (2019a) found that Indigenous populations have substantially lower life expectancy than non-Indigenous populations, with noticeable differences between First Nations people, Métis, Inuit and non-Indigenous people. The study also found little sign of convergence with the total population in Canada over time. Gains in life expectancy of First Nations adults were not as large as those of non-Indigenous adults from 1996 to 2011. Gains for Métis were comparable to those of non-Indigenous individuals, but the authors warn that the large increase in the number of census respondents identifying as Métis in the census over time could influence the results. In fact, a number of factors call for caution in interpreting comparisons over time from CanCHEC data (discussed in Section 3). Research conducted using CanBCCs also showed that Indigenous populations have less favourable outcomes than non-Indigenous populations regarding IMR (Shapiro et al., 2018; Sheppard et al., 2017). A study based on the 2006 CanBCC found that IMRs were more than twice as high as those observed among the non-Indigenous population (Sheppard et al., 2017).

Despite the richness of the insights provided by these studies, there remain important data gaps for steady measurements of health outcomes and assessment of long-term trends among Indigenous populations, as demanded by Call to Action 19 of the Truth and Reconciliation Commission of Canada final report. Responding to the task requires calculating series of health outcomes in a consistent manner for all Indigenous population groups and at regular intervals. The goal of the present study is to explore the potential of data linkages to achieve this task at the most disaggregated level possible.

Other examples of the use of multiple linked data sources in Canada can be found in a linkage of the membership list of the Manitoba Métis Federation to provincial health statistics to produce estimates of life expectancy at birth and IMR, among other health indicators, for the Métis population (Martens et al., 2010) and a linkage of the Alberta Vital Statistics Death File with administrative data sources such as the First Nations Status Registry (Government of Alberta, 2021). Data linkages have also been used for estimating death rates in other countries or regions of the world, such as in New Zealand (Tan & Blakely, 2012; Blakely et al., 2000, 2009), the United States (Arias et al., 2021), the United Kingdom (Schofield et al., 2019) and Scotland (Gruer et al., 2016; Boyle et al., 2009). Examples of linkages of census data with birth and death records for estimating IMRs in the United States can be found in Ely and Driscoll (2022) and Wong et al. (2014). Data linkages are also used in countries where Indigenous identity is captured on death certificates to correct for discrepancies between vital registration and censuses or other sources of disaggregated population counts (Espey et al., 2014; Australian Institute of Health and Welfare, 2011; Australian Bureau of Statistics, 2018; Coleman et al., 2016; Choi & Smith, 2018). The impact of misclassifications on death certificates is often not negligible and may lead to an underestimation of death rates of Indigenous populations (Arias et al., 2008; Harwell et al., 2002; Stehr-Green et al., 2002). The discrepancies may happen in part because the information is not collected the same way in vital registration (numerators), where information regarding Indigenous identity is obtained via proxy or from observation, and the census (denominators), where identity is either self-reported or reported by another member of the household. 
The fact that identity is a construct and subject to change over time may also explain why discrepancies occur (Anderson et al., 2014).

3 Data Sources

3.1 Canadian Census Health and Environment Cohorts

The CanCHECs are a series of population-based probability-linked datasets that combine long-form census questionnaire (or the 2011 National Household Survey [NHS]) data with health outcome data from administrative sources such as the CVSD, the Canadian Cancer Registry, the Discharge Abstract Database and the National Ambulatory Care Reporting System (Statistics Canada, 2022a). These linked datasets follow a cohort of Canadians over time for specific health outcomes such as mortality, cancer or hospitalizations. Sampling weights are computed to make the cohorts more representative of the eligible population and to reduce bias attributable to missed links. With sampling weights, the CanCHEC data can be considered representative of the population of private households at the time of census collection. A series of 500 bootstrap weights was computed exclusively for proper calculation of variance with the 2006 and 2011 CanCHECs. For the 2016 CanCHEC, a series of 100 replicate weights was computed for variance estimation with the balanced repeated replication method.Footnote 4 Details on how the cohorts have been constructed can be found in Tjepkema et al. (2019b).

In this study, we use data from three CanCHEC cycles: 2006, 2011 and 2016. Note that although the construction of CanCHECs started with the 1991 Census, versions before 2006 did not contain deaths of individuals who were under 25 years of age, and therefore could not be used for calculating life expectancy at birth. With the exception of First Nations people living on reserve and the population living in remote and northern areas, where a 100% sampling design is used, only a sample of Canadian households receives the long-form questionnaire (approximately one in five in 2006, one in three in the 2011 NHS and one in four for the 2016 Census). This means that for a large part of the population of interest, only a fraction of the observed health outcomes occurring each year is captured. A follow-up of five years is used with the 2006 Census and the 2011 NHS so that periods of observations do not overlap. At the time of producing these numbers, the 2016 CanCHEC had data linked up to 2019, so the length of the follow-up is three years.Footnote 5 This has the advantage of excluding 2020 and 2021, which were affected by the COVID-19 pandemic, resulting in consequences on life expectancy levels and trends (Dion, 2021). Research suggests that Indigenous populations have been disproportionately affected by the pandemic in Canada (Hahmann & Kumar, 2022). While it would be of interest to examine the way in which the pandemic affected life expectancy of different population subgroups, including these years would tend to blur longer time trends, especially with only three available data points.

Table 1 provides an overview of the sample sizes of the 2006, 2011 and 2016 CanCHECs at the national level. Despite the multiple years of mortality follow-up, the number of deaths remains modest for small population groups. Small sample sizes limit substantially the extent to which we can further disaggregate the data and have consequences on the reliability of the results.

Table 1 Overview of sample sizes for deaths and person-years in the 2006, 2011 and 2016 Canadian Census Health and Environment Cohorts (in thousands)

A significant limitation of the CanCHECs is that they include only the population of private households, therefore excluding the population living in any of the following at the time of the census: (1) institutions such as correctional facilities, nursing homes and group homes; (2) non-institutional collective dwellings such as rooming houses, shelters and hotels; and (3) those with no fixed address (i.e., unhoused).Footnote 6 The impact may be assumed to be negligible given the relatively small proportion of Canadians living in institutions and non-institutional collective dwellings, but Indigenous populations tend to be overrepresented among them. Registered Indians, particularly young adults, were found to have high rates of institutionalization and homelessness (Feir & Akee, 2018). More generally, Indigenous populations, and Indigenous women in particular, were found to be overrepresented in the Canadian justice system (Correctional Services Canada, 2003; Guimond, 2003). Table 7 in the Appendix shows the age- and sex-specific mortality rates per 1,000 person-years based on the 2011 CanCHEC (five-year mortality follow-up period) compared with the age- and sex-specific mortality rates corresponding to the entire population based on Statistics Canada’s vital statistics and demographic estimates. For most age groups, mortality rates based on CanCHECs are lower than those for the entire Canadian population, with greater differences observed in the older age groups because of the exclusion of the institutional and collective dwelling populations.

The reliability of linked data files depends on how successfully individuals were linked. Linkage rates for specific population groups in CanCHECs are not available, but Christidis et al. (2018) note that linkage rates of in-scope census and NHS records linked to the Derived Record Depository, a dynamic relational database created from birth, death, immigration and tax files used to facilitate the linkages, are lower for Indigenous populations (although they do not indicate by how much). Lower linkage rates could be attributable in part to lower response rates to the census and NHS. While response rates are not available specifically for Indigenous populations, Bérard-Chagnon and Parent (2021) find census net undercoverage to be lower than average on reserves in 2006, 2011 and 2016. Specific measures are taken to ensure data quality and good response rates in remote and northern areas of the country and on reserves, including early enumeration for populations who tend to migrate out of their communities before Census Day and the use of canvassers to visit dwellings, a method deemed more efficient than self-enumeration in these areas (Statistics Canada, 2017b). Nevertheless, there are factors that could potentially lead to higher non-response in Indigenous communities and in the Indigenous population in general, including an incomplete count on some reserves because of housing counts that were either not allowed or interrupted before they could be completed (Statistics Canada, 2019a), difficulties in reaching unhoused and highly mobile individuals (who are overrepresented in the Indigenous population) (Smylie & Firestone, 2015), or apprehensions about government data collection or barriers related to language or literacy and numeracy (Wright et al., 2020). To minimize impacts of missed links and to ensure representativeness, weights were created from existing census and NHS weights (Tjepkema et al., 2019b).

Finally, the computation of life expectancy at birth requires the estimation of the mortality rate at age zero (IMR), which is not possible with CanCHECs. Consequently, the IMRs have to be either estimated from a different data source or estimated through modelling methods (see Sections 4.1 and 4.2). More information about CanCHEC data can be found in Tjepkema et al. (2019b).

3.2 Canadian Birth Census Cohorts

The 1996, 2006 and 2016 CanBCCs were created to examine patterns and disparities in perinatal health across socioeconomic and ethnocultural groups in Canada. The cohorts are made of birth, stillbirth and infant death events that occurred in the two years prior to Census Day and linked to the census long-form questionnaire sample to obtain detailed characteristics of parents (e.g., education, immigrant status and Indigenous identity). Missed links represent a reduction of about 10% in cohort sizes. Cohort weights have been generated to adjust for the census sampling design, census non-response, and missed linkages between the birth and census databases. Series of bootstrap weights were also computed to account for the variability arising from sampling, non-response, the linkage process and the stochastic variability inherent in vital events.

The 1996 CanBCC is comprised of live birth, stillbirth and infant death records from May 1994 to May 1996 linked to parents captured in the 1996 Census long-form questionnaire sample. The total sample size includes 97,006 births out of a possible 466,170 in-scope births. Contrary to the 2006 and 2016 CanBCCs, the 1996 CanBCC excludes births that occurred in Ontario because of incompleteness of birth registration. The 2006 CanBCC covers the period from May 2004 to May 2006 and is built from a linkage to the 2006 Census long-form questionnaire sample. Its sample size is 135,426 births out of a possible 687,340 in-scope births. The 2016 CanBCC covers the period from May 2014 to May 2016 and results from a linkage to the 2016 Census long-form questionnaire sample. Its overall sample size includes 195,947 births out of a possible 773,904 in-scope births.

There are some limitations with the use of CanBCCs. First, the sample sizes are very small, affecting the reliability of the estimates (the deviations are often very large) and the extent to which the results can be disaggregated (see Table 2 for an overview of weighted counts and sample sizes). A second limitation is that low linkage rates among the Indigenous population can yield potential biases if non-linked individuals differ substantially from those who could be linked, as explained earlier. More information about CanBCC data can be found in Bushnik et al. (2016).

Table 2 Overview of sample sizes and weighted counts for the number of infant deaths in the 1996, 2006 and 2016 Canadian Birth Census Cohorts

4 Methods

4.1 Estimation of Death Rates from Canadian Census Health and Environment Cohorts

In CanCHECs, the records linked to the CVSD are followed for several years after the census. They represent individuals who age over the course of the follow-up. Estimates of age-specific death rates are necessary for the calculation of life expectancy at birth. The numerator of these rates consists of the number of age-specific deaths, whereas the denominator consists of age-specific estimates of person-years lived. The latter can be thought of as the exposure, or the total number of people along with the amount of time each individual was exposed to risk. Life expectancy at birth is computed using abridged life tables, with the last age group being 85 years and over. No adjustment or imputation is made when there are no deaths in an age group, as recommended by Eayres and Williams (2004). Indigenous identity is defined using information related to the self-reported Registered Indian status and the self-reported Indigenous group of First Nations, Métis or Inuit contained in the long-form census questionnaire. In this study, the following classification is used:

  1. Registered First Nations people: Individuals who self-identified with the First Nations group (single identity) and reported Registered Indian status in the census.

  2. Non-Registered First Nations people: Individuals who self-identified with the First Nations group (single identity) and did not report Registered Indian status in the census.

  3. Métis: Individuals who self-identified with the Métis group (single identity) in the census.

  4. Inuit: Individuals who self-identified with the Inuit group (single identity) in the census.

  5. Total Indigenous population: Individuals who belong to any of the categories above, in addition to those who self-identified with more than one Indigenous group and those who only self-reported having Registered Indian status, being a member of a First Nation or being a member of an Indian band.

  6. Non-Indigenous population: Individuals who did not self-identify with an Indigenous group and did not report having Registered Indian status in the census or being a member of a First Nation or Indian band.
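The death-rate and abridged life-table calculation described before the classification can be sketched as follows. This is a minimal illustration, not the routine used in the study: the function name, the conversion from rates to probabilities, and the assumed average time lived by those who die in an interval (0.1 year for infants, half the interval otherwise) are our assumptions.

```python
def abridged_life_expectancy(age_starts, deaths, person_years, infant_a=0.1):
    """Life expectancy at birth from deaths and person-years by age group.

    age_starts   -- lower bound of each age group; the last group is open-ended
    deaths       -- (weighted) deaths in each age group
    person_years -- (weighted) person-years lived in each age group
    infant_a     -- assumed mean years lived in the first year by infants who die
    """
    rates = [d / py for d, py in zip(deaths, person_years)]
    survivors = 1.0          # radix of the life table
    years_lived = 0.0
    last = len(age_starts) - 1
    for i, m in enumerate(rates):
        if i == last:
            if m <= 0:
                raise ValueError("open-ended age group needs a positive death rate")
            years_lived += survivors / m      # everyone dies in the open interval
            break
        n = age_starts[i + 1] - age_starts[i]
        a = infant_a if age_starts[i] == 0 else n / 2.0
        q = min(1.0, n * m / (1.0 + (n - a) * m))   # probability of dying in the interval
        dying = survivors * q
        # Survivors live n years; those who die live a years on average.
        years_lived += n * (survivors - dying) + a * dying
        survivors -= dying
    return years_lived


# Hypothetical two-group example: no deaths under age 1, rate of 0.01 afterwards.
e0 = abridged_life_expectancy([0, 1], deaths=[0, 10], person_years=[100, 1000])
print(round(e0, 2))
```

An age group with zero deaths simply yields a zero death probability here, in line with the choice of making no adjustment in that case. In the study itself, the rate at age zero does not come from CanCHECs but from the approaches of Sections 4.2 and 4.3.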

One caveat of CanCHEC data is that they consist of closed cohorts (i.e., no infant born during the follow-up period is added to the existing cohort). Figure 1 illustrates the double classification of events (deaths) for infants under 1 year in CanCHECs. The periods (years of follow-up) in which those events occurred are shown on the x-axis and are defined in relation to the first day (in this case, Census Day = 0), while the age at which events occurred is shown on the y-axis. The life of an individual can be portrayed as a diagonal advancing from their birth at age zero at a certain time on the x-axis to the right (in time) and upward (in ages). The only quantities available in CanCHECs for calculating the IMR are the population at age 0 at the start of the follow-up on Census Day (that is, those babies under 1 year of age who were already born at the time of the census), denoted by $P_{0}$ (segment in bold in Fig. 1), and the number of deaths that occurred among those babies during the first year of the follow-up, denoted by $D_{-1,0}$. In short, deaths of babies born during the follow-up years are missing for the estimation of the IMR. Furthermore, as these deaths are not evenly distributed over the first year of life, deaths occurring in the first months after birth, when mortality rates are highest, are underrepresented, leading to a substantial underestimation of the IMR.

Fig. 1 Double classification of deaths for infants under 1 year old in Canadian Census Health and Environment Cohorts (with five-year follow-up)

The limitations of following closed cohorts over time also affect the quality of estimation at ages less than the number of years in the follow-up. For a given age, there can only be as many complete years of follow-up as that age. For example, deaths at age 1 are captured in their entirety only in the first year of the follow-up. Consequently, estimates of death rates at young ages are based on smaller sample sizes and do not reflect the whole period of follow-up.
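To give a sense of why a closed cohort undercounts infant deaths, the following sketch compares the probability of dying before age 1 with what would be captured among infants already born on Census Day. All monthly probabilities are hypothetical, and the function name and the assumption that infants are spread evenly by month of age on Census Day are ours.

```python
def closed_cohort_infant_rate(monthly_death_probs):
    """Approximate deaths before age 1 per infant present on Census Day.

    monthly_death_probs[m] is the (hypothetical) probability, per live
    birth, of dying during month m of life (m = 0..11).
    """
    months = len(monthly_death_probs)
    captured = 0.0
    for age in range(months):
        q_now = monthly_death_probs[age]
        # Infants in month `age` are assumed to be mid-month on Census Day.
        alive_at_census = 1.0 - sum(monthly_death_probs[:age]) - 0.5 * q_now
        # Deaths still to come before the first birthday.
        remaining_deaths = 0.5 * q_now + sum(monthly_death_probs[age + 1:])
        captured += remaining_deaths / alive_at_census
    # Equal weight per month of age (assumption: births spread evenly).
    return captured / months


# Hypothetical schedule: most infant deaths fall in the first month of life.
probs = [0.0030, 0.0004, 0.0003, 0.0002, 0.0002, 0.0001,
         0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001]
true_imr = sum(probs)
closed_rate = closed_cohort_infant_rate(probs)
print(f"true IMR: {true_imr:.4f}, closed-cohort rate: {closed_rate:.4f}")
```

Because infants who are several months old on Census Day have already passed the period of highest risk, the closed-cohort figure falls well below the true value under these assumptions.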

4.2 Indirect Estimation of Infant Mortality Rate from Canadian Census Health and Environment Cohorts

If IMRs cannot be computed from CanCHECs, a logical solution would be to compute them from CanBCCs, especially since this is a goal of our study. However, this option carries several limitations. A first limitation is that the sample sizes in CanBCCs are insufficient to produce estimates at the level of disaggregation proposed for life expectancy. A second limitation is that CanBCC data are available only at two points in time: 2006 and 2016. For these reasons, a different method had to be developed for modelling the IMR from CanCHEC data.

We use Cox proportional hazards models to evaluate simultaneously the effect of Indigenous identity and age on survival. These models are well adapted to the analysis of longitudinal data because, unlike some other known methods that would simply measure the influence of risk factors on the occurrence of the event (e.g., logistic regression), Cox proportional hazards models measure the influence of risk factors on the time until an event occurred (in this case, the age at the onset of death). This is an advantage because more information is used, giving different weights to events depending on how early or late they happen (van der Net et al., 2008).Footnote 7 The models are applied to CanCHEC data to estimate how the mortality conditions of a specific population group and geography differ from those of the total Canadian population. Age groups are included as a covariate to control for the effect of age on risk of death, as well as differences in the age composition of different populations.Footnote 8 An IMR for a specific population group can be obtained by multiplying the resulting hazard ratio related to that population group by the IMR of the total Canadian population, available from the life tables published annually by Statistics Canada (Statistics Canada, 2020).Footnote 9 Under these specifications, we make the implicit assumption that the factors influencing mortality at all other ages, for a given Indigenous population, are the same ones that impact infant mortality. This is a restrictive view since we know that infant mortality is also influenced by its own set of factors, including prenatal care, access to midwifery services, and access to birth and postnatal services (Smylie et al., 2010). Various model specifications have been tested. 
Since the emphasis is on computing the IMR for Indigenous populations, one option was to apply the models only to children (e.g., those aged 0 to 9 years), focusing only on the factors affecting the mortality of children. However, the number of children proved to be insufficient, resulting in relative risk estimates with very large variances. Another option was to further increase the sample to include slightly older ages (for example, by including only individuals aged 0 to 40 years), but this runs the risk that differences in mortality rates at older ages (30 to 40 years), where there are many more deaths, unduly influence the estimation of the relative risks. Since mortality risk differences between the Indigenous population and the total population in Canada tend to be largest in the 25-to-40-years cohort, using cut-offs at age 40 could lead to an overestimation of the true difference in IMR between the two populations. Despite these limitations, the approach of not using the age cut-off produced results that were the closest to those obtained from direct estimation from CanBCC data. Although this offers no absolute guarantee of validity of the chosen specifications, especially given the large variances associated with results from CanBCCs, it is difficult to find a better criterion for evaluation given the paucity of data. Not using age cut-offs also contributes to maximizing sample sizes, hence obtaining more robust estimates. Finally, and perhaps more importantly, the modelling approach reduces the importance of random fluctuations in estimation of life expectancy (in contrast to using CanBCCs), and therefore facilitates the comparisons of life expectancy between groups and over time.
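As a compact restatement of the scaling step described above, the relationship can be written as follows. The notation is ours and is an assumption: $g$ indexes the population group and $a$ the age group, and the exact covariate coding used in the models is not given in this section.

$$h(t \mid g, a) = h_{0}(t)\,\exp\left(\beta_{g} + \gamma_{a}\right), \qquad HR_{g} = \exp\left(\beta_{g}\right), \qquad \widehat{IMR}_{g} = HR_{g} \times IMR_{\text{Canada}}$$

For example, with purely hypothetical values, a hazard ratio of 2.0 applied to a national IMR of 4.5 per 1,000 live births would give a modelled IMR of 9.0 per 1,000 for that group.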

4.3 Direct Estimation of Infant Mortality Rate from Canadian Birth Census Cohorts

IMR is not only a necessary component in the calculation of life expectancy at birth, but also an important indicator, complementing life expectancy for analysis of the mortality conditions of a population. Unlike CanCHECs, CanBCCs are made of open cohorts. The data provide a complete follow-up of two birth cohorts during the first year of life, as shown in Fig. 2. Using the nomenclature of Fig. 1, the two-year average IMR is given by

$$IMR= \frac{\sum\nolimits_{i=-2}^{-1}{D}_{i,i}+\sum\nolimits_{i=-2}^{-1}{D}_{i,i+1}}{\sum\nolimits_{i=-2}^{-1}{B}_{i}}$$

where $B_{i}$ represents the number of births in year $i$, $D_{i,i}$ represents the number of deaths during year $i$ of infants born during year $i$, and $D_{i,i+1}$ represents the number of deaths during year $i + 1$ of infants born during year $i$.
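The formula above can be sketched in a few lines. The function name, the dictionary layout and the counts are hypothetical; years are indexed relative to Census Day, as in the figures.

```python
def two_year_average_imr(births, deaths_same_year, deaths_next_year):
    """Two-year average IMR from births and infant deaths by birth year.

    births[i]           -- births in year i
    deaths_same_year[i] -- infant deaths in year i among those born in year i
    deaths_next_year[i] -- infant deaths in year i + 1 among those born in year i
    """
    total_births = sum(births.values())
    total_deaths = sum(deaths_same_year[i] + deaths_next_year[i] for i in births)
    return total_deaths / total_births


# Hypothetical weighted counts for the two birth cohorts (years -2 and -1).
imr = two_year_average_imr(
    births={-2: 1000, -1: 1000},
    deaths_same_year={-2: 3, -1: 4},
    deaths_next_year={-2: 1, -1: 2},
)
print(f"{1000 * imr:.1f} infant deaths per 1,000 live births")
```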

Fig. 2 Double classification of events for infants under 1 year old in Canadian Birth Census Cohorts

Indigenous identity of deceased babies is defined following the same categories as for CanCHECs. However, in CanBCCs, it is necessary in most cases to make assumptions based on the identity of the parents (see Table 2). This is because the identity of deceased babies is missing in most cases (e.g., 97% of the time in 2006), attributable in large part to the fact that most infant deaths occurred before Census Day.Footnote 10 We use the information about the identity of the linked mothers and fathers to impute identity information to the deceased infants when missing. In cases where the mother and father have distinct identities, children’s records are cloned to create one record with each identity, and the weights are divided by two.
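The cloning rule can be illustrated with a short sketch. The record layout, key names and function name are hypothetical, and the handling of records where only one parent's identity is known (use that identity) or none is known (leave as missing) is our assumption, since the text does not describe those cases.

```python
def impute_identity(record):
    """Return a list of records with the infant's identity filled in.

    `record` is a dict with hypothetical keys: 'infant_identity',
    'mother_identity', 'father_identity' and 'weight'.
    """
    if record.get("infant_identity") is not None:
        return [dict(record)]            # identity already reported
    parents = [record.get("mother_identity"), record.get("father_identity")]
    known = [p for p in parents if p is not None]
    distinct = list(dict.fromkeys(known))  # keeps order, drops duplicates
    if not distinct:
        return [dict(record)]            # nothing to impute from
    # One clone per distinct parental identity; the weight is shared equally,
    # so two distinct identities give two records with half the weight each.
    share = record["weight"] / len(distinct)
    return [dict(record, infant_identity=identity, weight=share)
            for identity in distinct]


clones = impute_identity({
    "infant_identity": None,
    "mother_identity": "Inuit",
    "father_identity": "Non-Indigenous",
    "weight": 10.0,
})
print(clones)
```

Splitting the weight keeps the weighted number of infant deaths unchanged while letting each record contribute to both identity groups.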

4.4 Calculation of Variance

Careful analysis of differences in the IMR or in life expectancy over time or between population groups must account for the uncertainty surrounding the estimates. This uncertainty may come from two sources. A first source of variability, sometimes dubbed “natural variability,” comes from the fact that deaths are not totally predictable and are assumed to come from a random process with underlying probabilities (Spiegelhalter, 2019). A second source of uncertainty is sampling variability, which stems from the fact that estimates are calculated only from a sample of the population of interest and that using a different sample could have produced different estimates. The estimation of sampling variability must take into account the probability of each record being sampled in the census or NHS and then linked to the CVSD, which may vary according to the characteristics of the individuals. Using resampling methods in conjunction with the bootstrap or replicate weights provided with CanCHECs and CanBCCs, we can estimate the total variance from the two distinct sources of uncertainty.

The variances of life expectancy estimates were computed using a resampling routine developed in SAS.Footnote 11 The method accounts for the impact of the complex survey design used in the census and the NHS and for the covariance between death probabilities by age, which occurs when life expectancy is estimated from only a sample of the population (Chiang, 1967; Schenker et al., 2011). Confidence intervals were computed assuming normality of the distribution of life expectancy estimates. Variance estimates of IMR were obtained by balanced repeated replication, available in the SAS-callable statistical software package SUDAAN,Footnote 12 following the approach described by Phillips (2004). The variability of all estimates is expressed in the form of 95% confidence intervals.
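To illustrate the general idea of replicate-based variance estimation and normal-theory confidence intervals, the Python sketch below recomputes an estimate under each set of replicate weights and measures the spread of those replicate estimates around the full-sample estimate. It is not the SAS routine or the SUDAAN procedure used in the study. It assumes plain bootstrap-style replicates with no additional scaling factor (the appropriate factor depends on how the replicate weights were constructed), and it covers only the sampling component, not the natural variability of deaths. The function names and the numbers are hypothetical.

```python
import math


def replicate_variance(full_estimate, replicate_estimates):
    """Mean squared deviation of replicate estimates around the full estimate.

    Assumption: bootstrap-style replicates with no extra scaling factor.
    """
    if not replicate_estimates:
        raise ValueError("at least one replicate estimate is required")
    squared = [(r - full_estimate) ** 2 for r in replicate_estimates]
    return sum(squared) / len(squared)


def normal_ci(estimate, variance, z=1.96):
    """Confidence interval assuming a normally distributed estimate (95% by default)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical life expectancy estimates, for illustration only.
full = 75.0
replicates = [74.6, 75.4, 74.8, 75.2]

variance = replicate_variance(full, replicates)
low, high = normal_ci(full, variance)
print(f"variance = {variance:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

In practice, each replicate estimate would come from rerunning the full life table or IMR calculation with one set of replicate weights.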

5 Results

5.1 Considerations

The disaggregation of results for relatively small population groups and the fact that death is a relatively rare event imply that estimates of life expectancy and IMRs are often based on small sample sizes, and thus associated with large variances. As a result, it can be difficult to draw definitive conclusions about trends over time or to assess differences between population groups. For example, when the confidence intervals of two estimates overlap, it is usually not possible to rule out that the difference is attributable simply to randomness. The selected level of disaggregation reflects choices made to ensure adequate sample sizes. This is why, for example, it is not possible to estimate life expectancy separately by sex and by region, or for each one of the Atlantic provinces.

Other problems relate to small sample sizes. When population sizes are below 5,000, there can be significant biases, in particular as populations increase or decrease, and standard errors tend to deviate sharply from the normality assumption (Eayres & Williams, 2004; Scherbov & Ediev, 2011).Footnote 13 Biases remain noticeable for population sizes up to 10,000, and the normality assumption remains an approximation for population sizes from 5,000 to 50,000 (Scherbov & Ediev, 2011).

Changes in the composition of the population between censuses, other than changes in age composition, constitute another potential concern. One source of heterogeneity that can affect the comparability of results over time is response mobility in the census, defined as changes over time in how people respond to census questions about Indigenous identity (O’Donnell & LaPointe, 2019).Footnote 14 The difficulty is that a portion of the changes in health outcomes among a population group over time may be attributable solely to changes in the composition of this group. Differences in the composition of the population affect comparisons not only over time, but also across population groups. Multiple factors can explain the differences in life expectancy between Indigenous and non-Indigenous populations, such as differences in education or income (Bushnik et al., 2020), or place of residence (Greenberg & Normandin, 2011). This study does not aim to explain why the differences exist, but rather to estimate them.

Changes made to the definition of reserves over time and changes in the list of incompletely enumerated reserves, for which no data are available, are two additional sources of heterogeneity.Footnote 15 Finally, unlike estimates for 1996, 2006 and 2011, estimates from the 2016 CanCHEC and 2016 CanBCC exclude deaths that occurred in Yukon and deaths of residents of Yukon that occurred in other provinces or territories after 2016, since these deaths are not available in the CVSD. The approach taken in this study was to include all individuals who responded to maximize sample sizes, as opposed to limiting the sample to a smaller subsample of individuals to reduce the number of confounding factors that may affect comparability over time.Footnote 16

5.2 Life Expectancy at Birth

Table 3 presents estimates of life expectancy at birth by sex and population group for Canada for the three periods studied: 2006 to 2011, 2011 to 2016 and 2016 to 2019. There are clear differences between Indigenous population groups and the non-Indigenous population, over seven years across all periods. Life expectancy has been rising steadily for both males and females, in both Indigenous and non-Indigenous populations. Improvements were of greater magnitude between the 2006 to 2011 period and the 2011 to 2016 period than between the 2011 to 2016 period and the 2016 to 2019 period among Registered First Nations people living on reserve, non-Registered First Nations people, Inuit and the non-Indigenous population. This could be explained by the fact that the second period does not extend as far back in time as the first (8 years versus 10 years), but another potential explanation could be the opioid epidemic that developed in Canada during this period (Statistics Canada, 2019b). The opioid epidemic was found to have a disproportionate impact among Indigenous populations, particularly among First Nations people living on reserve (Carrière et al., 2018; The Alberta First Nations Information Governance Centre and Alberta Health, 2021).

Table 3 Life expectancy by sex and population group with 95% confidence intervals, private household population, Canada, 2006 to 2011, 2011 to 2016 and 2016 to 2019

Figure 3 shows changes in life expectancy observed at the national level between the 2006 to 2011 period and the 2016 to 2019 period. All population groups saw an increase in life expectancy over this span except Inuit and possibly Registered First Nations people living on reserve. The largest increases were observed among non-Registered First Nations people and Métis (over 3.2%). Perhaps not coincidentally, these are the two groups that grew the most through response mobility in the census during these periods, especially from the non-Indigenous population (O’Donnell & LaPointe, 2019). Response mobility can increase the life expectancy of a population group when the group gains individuals who previously identified with a different population group and who live in relatively better socioeconomic conditions than its existing members. For Registered First Nations people, Feir and Akee (2019), who studied mortality rates, also observed a lack of improvement from 1985 to 2013, especially among females.

Fig. 3 Change in life expectancy by population group with 95% confidence intervals, private household population, Canada, 2006 to 2011, 2011 to 2016 and 2016 to 2019. Sources: Canadian Census Health and Environment Cohort, 2006 and 2016. Notes: Estimates exclude the institutional and collective dwelling populations. Error bars indicate 95% confidence intervals.

More disaggregated data are shown in Appendix Table 8, which presents estimates of life expectancy at birth for both sexes with additional geographic disaggregation. The wide confidence intervals make it difficult to perceive the differences between regions, especially when it comes to specific population groups. However, differences between provinces among Indigenous population groups do not always correspond to those observed among the non-Indigenous population.

5.3 Infant Mortality Rates

Estimates of IMRs are shown in Table 4 for the periods from 2004 to 2006 and 2014 to 2016 for Canada and in Table 5 for the periods from 1994 to 1996, 2004 to 2006 and 2014 to 2016 for Canada without Ontario. IMRs for the total Canadian population in 2005 and 2015 were 5.4 and 4.5 per thousand, respectively, as computed from the CVSB and the CVSD (Statistics Canada, 2022b). These values are identical or very close to those computed from CanBCCs for the periods from 2004 to 2006 and 2014 to 2016, which are 5.4 and 4.6 per thousand, respectively. This suggests that the exclusion of institutional and collective dwelling populations in CanBCCs does not cause significant bias in the measurement of IMR.

Table 4 Infant mortality rate by population group, sex and place of residence (census metropolitan area versus non-census metropolitan area), private household population, Canada, with 95% confidence intervals, 2004 to 2006 and 2014 to 2016, per thousand live births
Table 5 Infant mortality rate by population group, sex and place of residence (census metropolitan area versus non-census metropolitan area), private household population, Canada without Ontario, with 95% confidence intervals, 1994 to 1996, 2004 to 2006 and 2014 to 2016, per thousand live births

IMRs have been declining over time in Canada, and this pattern is also observable among most Indigenous population groups between the 2004 to 2006 period and the 2014 to 2016 period, although the picture is less clear, potentially because of the large variances. The same can be said when looking at Canada without Ontario between the 1994 to 1996 period and the 2014 to 2016 period.

As with life expectancy, there is a gap between Indigenous and non-Indigenous populations. At the national level, in the periods from 2004 to 2006 and 2014 to 2016, IMRs of Indigenous populations are higher than those of non-Indigenous populations by a factor of about 1.8. A study conducted in the province of Quebec showed comparable results for First Nations and Inuit populations for the period from 1996 to 2011 (Chen et al., 2015). One exception is for individuals living in a CMA in the period from 2014 to 2016, where IMRs of Indigenous and non-Indigenous populations are of similar magnitude.

When looking at specific population groups, the large fluctuations and variances call for careful interpretation. Despite overlapping confidence intervals, IMRs of Inuit populations are almost systematically the highest of all population groups. Also, the IMRs of First Nations people are consistently higher than the IMRs of Métis.

6 Conclusion

Results from the various data linkages used in this study show that Indigenous populations have a lower life expectancy than non-Indigenous populations and that there are also marked differences between population groups among the Indigenous population, as recently shown in Tjepkema et al. (2019a). Similar results are observed in other countries with Indigenous populations, such as Australia (Australian Institute of Health and Welfare, 2011), New Zealand (Phillips et al., 2017) and the United States (Arias et al., 2021).

The life expectancy of Indigenous populations improved between the 2006 to 2011 period and the 2016 to 2019 period, possibly at a slightly faster pace than that of the non-Indigenous population. However, the difference is too small to be seen as a sign of convergence with the total Canadian population. Rates of improvement vary among the various Indigenous population groups. It is mainly among Métis and First Nations people living off reserve that life expectancy grew, perhaps in part as a consequence of response mobility in the census. By contrast, Inuit and Registered First Nations people living on reserve may have seen their life expectancy decrease over this period. The opioid epidemic that developed in Canada during this period may also have had a different impact among the various population groups studied. Finally, IMRs were generally higher among Indigenous populations than among non-Indigenous populations, a finding consistent with those of Wong et al. (2014) and Ely and Driscoll (2022) in the United States.

One limitation associated with CanCHECs and, to a lesser extent, CanBCCs is the absence of institutionalized populations in long-form census questionnaire data. Other constraints are the relatively small sample sizes for some population subgroups, unequal linkage rates among population groups and the availability of data only every five years. The availability of population group identifiers in vital statistics might appear to be a solution to these limitations, but, as the experience of other countries has shown, misclassifications on death certificates have non-negligible impacts and may lead to an underestimation of death rates of Indigenous populations. One solution to increase the number of linked records would be to move the collection of data on Indigenous identity from the long-form to the short-form questionnaire, which is distributed to the whole population. However, this would do nothing for reserves and remote and northern areas, where the long-form questionnaire is already distributed to the whole population.

In conclusion, while the limitations should not be minimized, one of the main findings of this article is that the use of CanCHECs and CanBCCs appears to be a viable solution for estimating life expectancy and IMRs for each of the specific Indigenous populations in Canada. This is an important result, because regular reporting of these health outcomes is essential to assessing and monitoring trends, in addition to detecting a narrowing of the gap in life expectancy and IMR between the Indigenous and non-Indigenous populations.