Migration Between Indian Reserves and Off-Reserve Areas: an Exploratory Analysis Using Census Data Linkage

New data linkages between censuses show that migration flows between Indian reserves and off-reserve areas from 2006 to 2011 and from 2011 to 2016 resulted in negative net migration for Indian reserves, meaning that—overall—more people left Indian reserves than entered them. These results differ from the portrait shown by the retrospective information from the 2011 and 2016 censuses, which indicates positive net migration for Indian reserves. A comparison of the information in the two sources revealed two types of inconsistencies that contributed to the observed differences: (1) inconsistencies in migrant status, and (2) inconsistencies in the origin location of migrants, i.e., the retrospective information about a migrant’s place of residence 5 years earlier does not match the place where the migrant was enumerated in the previous census. Results from this paper suggest that there are limitations to using retrospective information on the place of residence 5 years prior to a census to derive estimates of internal migration flows for small geographic areas, such as Indian reserves. New data linkages are a source of information that can be used to validate and improve these estimates, as well as to derive alternative estimates. However, data linkages also have limitations and require careful preparation before use, particularly when it comes to calculating weights to accurately account for unlinked records.


Introduction
In the 1960s and 1970s, there was significant interest in studying the urbanization of Indigenous populations (Cooke & Bélanger, 2006). The movements or migrations of Indigenous people toward cities were a key part of the urbanization process. These movements toward cities were perceived to be motivated primarily by a desire to escape isolation and the difficult economic conditions of Indian reserves, and thus were seen to be an indication of various needs not being fulfilled in or outside of Indigenous communities (Falconer, 1985; Trovato et al., 1994). Subsequent research has provided a more nuanced portrait of the phenomenon. Demographic studies of the population living on Indian reserves show that this population is growing, and that there are more people entering Indian reserves than leaving them (Amorevieta-Gentil et al., 2015). Between 2006 and 2016, the population living on reserve increased by 42,500 people. 1 The main factor for this growth is the high fertility rate of the Indigenous population, which has remained above the generation replacement level (Morency et al., 2018). However, internal migration is also considered to be a contributor to the population growth on Indian reserves, particularly for Registered Indians (Clatworthy & Norris, 2007, 2014; Cooke & Penney, 2019). Estimates based on the retrospective census information have shown steady, small positive net migration for Indian reserves since the 1970s (Amorevieta-Gentil et al., 2015; Clatworthy & Norris, 2007, 2014; Cooke & Penney, 2019; Norris et al., 2004; Siggner, 1977). In the five quinquennial periods from 1986 to 1991 through 2006 to 2011, the population of Registered Indians on Indian reserves increased by between 9,000 and 15,000 people through internal migration (Clatworthy & Norris, 2007, 2014; Cooke & Penney, 2019; Norris et al., 2004). Clatworthy & Norris (2014) found that the people who migrated to Indian reserves came primarily from rural areas, but also from urban areas.
Some studies suggest that a substantial portion of the migration into and out of Indian reserves is circular in nature, 2 i.e., that the migration is part of repetitive movements between two or more regions (Cooke, 1999). Furthermore, according to prior research, migration among Indigenous populations is influenced by several factors and triggered by multiple motivations (Cooke, 1999;Trovato et al., 1994).
Estimates of migration flows between Indian reserves and off-reserve areas in the existing literature have been derived using the retrospective information collected in the long-form census on the place of residence 5 years prior. 3 This is not only because the census was the only source of information available, but also because its large sample size makes it possible to measure a relatively rare event among relatively small populations. However, the recent availability of linked data from consecutive censuses provides new possibilities to study migration by comparing the place of residence of individuals enumerated in two consecutive censuses. Moreover, data linkage provides characteristics and place of residence (including Indian reserves) of individuals at two points in time, making it possible to examine both the characteristics associated with migration and the changes induced by migration.
The main goal of this paper is to study the potential uses of census data linkage in analyzing the migration flows into and out of Indian reserves, and to compare these results with the traditional retrospective approach based on a single census (long-form questionnaire). This paper has five main sections. In the following section, we provide a description of the benefits and limitations of using census data to analyze geographic mobility (particularly at low levels of geography), and the potential of using data linkage to fill some significant data gaps. Next, the data sources and methods used to link two consecutive censuses and calculate the weights to compensate for non-linked records are explained. The strengths and limitations of using this new data source for migration analysis are then discussed, followed by an overview of the various census data linkages that were used in this paper to calculate alternative estimates of migration flows between Indian reserves and off-reserve areas. These estimates are then compared with those obtained from a single census (long-form questionnaire). The new estimates obtained from data linkage indicate that Indian reserves lost more people than they gained through internal migration between 2006 and 2016, contrary to what each individual census shows. Still, despite the striking change of sign in net migration, Indian reserves continued to grow over the period because of births, and they have not experienced anything close to an exodus. The fifth section gives an overview of how data linkage was used to compare the place of residence information in the 2016 Census long-form questionnaire with that from the 2011 Census long-form questionnaire (National Household Survey [NHS]). According to the results of this analysis, various factors related to the way the information in the census long-form questionnaire is reported and processed 4 may affect the accuracy of the information on migrant status and place of residence 5 years prior to the census.
2 Indigenous populations' attachment to a particular community may account, in part, for these circular movements, especially for those who are living or have previously lived on an Indian reserve.
3 Most census data are collected through two types of questionnaire: the short form and the long form. The short-form questionnaire asks fewer questions, but targets 100% of the Canadian population, while the long-form questionnaire contains more questions (including on mobility), but is sent only to a sample of the Canadian population living in private households.

Using the Census to Analyze Geographic Mobility
Because of the richness of its content, its large sample size, and its fine geographic detail, the census long-form questionnaire is a preferred source for analyzing geographic mobility. However, the census also has certain limitations when it comes to analyzing migration into and out of Indian reserves, and these limitations deserve attention. They are described here, and a distinction is made between those that are intrinsic to the census as a data collection tool, and those that are inherent to the use of retrospective information, such as the possibility of recall and response errors, proxy responses, and issues with geocoding an individual's reported place of residence 5 years prior to the census. While the first limitation affects the quality of the migration estimates regardless of the method used (the retrospective approach or the comparison of linked censuses), the second limitation can be avoided by using census data linkage. 5

Limitations Inherent to the Census
Estimates of migration flows into and out of Indian reserves can be affected by how the census reaches its target population. Some individuals may not be enumerated (undercoverage), while others may be counted more than once (overcoverage). Compared with the rest of the population, highly mobile individuals and young adults are more likely to be missed. This could be an issue for the Indigenous population in particular, as it is generally younger and more mobile than the Canadian population as a whole (Clatworthy & Norris, 2014; Dion & Coulombe, 2008). The combined effect of undercoverage and overcoverage is referred to as census net undercoverage (since undercoverage tends to be greater). Net undercoverage is positive in most regions of Canada and is generally higher on than off Indian reserves (Clatworthy & Norris, 2007, 2014; Norris & Clatworthy, 2011).
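As a reading aid, the relationship between the three coverage quantities can be written out. The symbols below are notation introduced here for illustration and do not appear in the original text: $U$ is the number of people missed, $O$ the number counted more than once, and $N$ the resulting net undercoverage.

```latex
N = U - O, \qquad N > 0 \iff U > O
```

Under this notation, saying that net undercoverage is positive simply means that more people were missed than were double counted.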
Some individuals living on Indian reserves may also not be enumerated because census data were either not collected on their reserve or were incomplete. For example, there were 14 reserves (referred to as incompletely enumerated reserves) in the 2016 Census and 31 in the 2011 Census 6 for which data were not available (Statistics Canada, 2017b). Any individual who lived in an incompletely enumerated reserve in the start year, the end year, or both years of a linkage cannot be linked, and is therefore excluded from the analysis. In a linkage of the 2011 and 2016 censuses, this represents about 40,000 individuals from 35 reserves (Statistics Canada, 2020), or 10% of the total population living on Indian reserves.
Other types of non-sampling errors that can affect the quality of estimates from census data are misunderstanding of the questions, incorrect data capture, or non-response (Statistics Canada, 2018). Non-response, in particular, is hard to measure for specific populations, but could be relatively more prevalent among Indigenous populations for reasons such as apprehensions about government data collection, use of the data for government control, or barriers related to language or literacy and numeracy issues (Wright et al., 2020). A higher level of non-response increases the risk of non-response bias, which occurs when non-respondents differ from respondents (Statistics Canada, 2018).
The quality of migration flow estimates also depends on whether enumerated individuals are registered in the right place. For example, individuals enumerated at the wrong residence during the census may mistakenly be considered migrants. While the census data are generally thought to be of good quality in this regard, it is more challenging to accurately collect the usual place of residence for some populations, such as individuals living in more than one place (e.g., students living in an apartment during the school year and with their parents the rest of the time, or workers living in another area for work).
Lastly, the census long-form questionnaire is not distributed to individuals living in institutions. This could be non-negligible for Indigenous populations, as they tend to be overrepresented among the institutional population (Clatworthy & Norris, 2007, 2014; Norris & Clatworthy, 2011).

Limitations Inherent to the Use of Retrospective Information
Collecting information on individuals' place of residence 1 or 5 years prior to the census date relies on their recollection of past events, which may be poor, especially after 5 years. Because the reference date is not linked to any specific event that could provide a reference point for respondents, it could be more challenging for them to remember this information. Memory problems can affect the quality of the migration flow size estimates (e.g., a migration could be forgotten) and of the migrants' characteristics (e.g., a place of residence error or a forgotten postal code could affect the quality of the geocoding). Accurately remembering information about one's place of residence 5 years prior to the census can be more challenging for highly mobile individuals (e.g., circular migrants).
The vagueness of the concept of municipality used in the census question on the place of residence 1 or 5 years earlier may also affect the quality of responses. Statistically speaking, municipality refers to the boundaries of the census subdivision (CSD), and an Indian reserve has a distinct CSD name and code. 7 However, there can be discrepancies between statistical rules and individuals' perception of their reality. For instance, some respondents provide a name of municipality (CSD) of origin that refers to the closest census metropolitan area (CMA) or census agglomeration (CA) instead of the name of their Indian reserve. 8 The quality of responses can also be affected by the fact that household members' responses are often given in an indirect interview (proxy responses). 9 It can be difficult for a person to know or remember the place of residence of each member of their household 5 years earlier, especially if they are not immediate family. This is more likely to be the case for larger households (which are more prevalent on Indian reserves). Furthermore, the effects of proxy responses are potentially greater on reserves, since data are collected through interviews with enumerators who ask the questions to the person or people in the household when they visit, compared with online questionnaires, for which the person most equipped to respond for all family members will generally do so.
Once collected, the information goes through an intensive data processing step. Geocoding, a process in which collected information is matched to a CSD within the Standard Geographic Classification, 10 is key for information on the place of residence 1 or 5 years prior to the census. This is a highly sophisticated process that continuously improves from census to census. However, there are circumstances that may affect its precision, for example, when the place of residence 1 or 5 years earlier is not a unique name, which is the case for a considerable number of Indian reserves. A number of reserves have names that are similar to those of neighboring off-reserve CSDs (e.g., Chilliwack 1 [on reserve] and Chilliwack [off reserve]), which can create confusion in the geocoding process, especially if both CSDs share the same postal code. The complex spelling of the names of certain Indian reserves, which makes them more prone to errors, can also lead to geocoding errors. Postal codes help improve the geographic coding process, but less so in rural areas, as one postal code can often cover more than one CSD. Postal code information is also often unreported, which is most likely because of respondents' trouble remembering. Geocoding-related aspects will be addressed in greater detail in Section 5 and Appendix 2.
7 The question on the place of residence 5 years earlier asked migrants to "specify the name of the city, town, village, township, municipality or Indian reserve of residence 5 years ago," as well as the province or territory of residence, and the postal code.
8 For example, a person who lived on the Mashteuiatsh reserve (on-reserve CSD) 5 years earlier could report that they lived in Roberval (off-reserve CSD), the main agglomeration nearest to that reserve, which could lead to omitting an off-reserve migration or adding a false on-reserve migration.
9 Indirect interview (or proxy response) is where a person (the proxy) answers questions on behalf of another person (the sampled interview subject).
Overall, at large geographic scales, these obstacles are likely minor and do not significantly bias the results. Migratory flows, as measured by census data, are generally in line with the measurements from other sources, such as income tax records. However, for lower levels of geography (e.g., CSD level), these obstacles could be more problematic. Unfortunately, a number of factors can make comparisons with other data sources harder, including sources that lack variables, different methods of operationalizing migration, differences in the target populations, and different reference periods. Census data linkage provides new opportunities for validation and analysis.

Description of Data Files and Method
Linking files is a complex task. The two most challenging steps are linking records, which involves linking as many records as possible while avoiding false positives (i.e., linking two records that belong to different individuals), and accurately weighting the file so that the linked individuals are representative of the whole population of interest (including individuals who could not be linked). Below is a brief description of the data files used in this study and the methods used for linkage and weighting.

Description of Linked Census Data Files
Every 5 years, Statistics Canada conducts the Census of Population, which is a key source of information on the Canadian population. All Canadians are asked to complete the census short-form questionnaire. A sample of the population is also asked to answer the census long-form questionnaire, which is more elaborate in terms of content. The two questionnaires are mandatory, except in 2011, when the census long-form questionnaire was replaced by the 2011 NHS, which was voluntary, although it was similar in content to the previous (2006) and subsequent (2016) census long-form questionnaires. The sampling scheme for the long-form questionnaire has also varied over the years. The proportion of sampled households was 33% in the NHS and 25% in the 2016 Census. However, for all northern communities and Indian reserves, no sampling was used. In other words, all households were asked to complete the census long-form questionnaire.
Data linkage makes it possible to combine the data collected through these questionnaires from two consecutive censuses. Each combination has its advantages. For example, linkage between two consecutive census long-form questionnaires contains a large number of variables at two points in time, allowing for comprehensive multivariate analyses. Conversely, linkage between two consecutive census short-form questionnaires contains a much larger number of records. A linkage between a census short-form questionnaire and a census long-form questionnaire therefore provides a mix of benefits: a relatively large number of records with a large number of variables (at one point in time). In general, the objectives of a researcher will dictate which type of linkage to use.
Several different linkages built from the 2006, 2011, and 2016 census short-form questionnaires (S2006, S2011, and S2016, respectively), and the 2011 NHS and the 2016 Census long-form questionnaires (L2011 and L2016, respectively) were used for this study. 11 The goal was to assess the consistency of the results and make sure that they are not artifacts of the linking methodology and the weighting process (discussed in the next sections). Likewise, the linkage of S2006/S2011 was made to assess consistency over time. 12 In all the files considered, the geographic boundaries used were those of the 2016 Census. The five different linkages 13 used in this study are L2011/L2016, S2011/S2016, L2011/S2016, S2006/S2011, and S2011/L2016. For clarity, retrospective information in a census file will be referred to with "_R" in the file name. For example, the place of residence in L2016_R is the place of residence 5 years earlier collected retrospectively in L2016.

Linkage Method
The linkage of individuals from two consecutive censuses was performed using a multistage method. This involved using personal information (given name, family name, sex, and birth date) and household information (geography [province or territory and census division], postal code, telephone number, and household composition). Not all of this information is used at each stage. For instance, some stages use the geographic information, while at other stages, geography is not considered at all in order to maximize the linkage of migrants. Only single links were accepted to avoid linking records that do not belong to the same individual. 14 Because of weak linkage rates for the population living in institutions and in collective dwellings (for linkages using the census short-form questionnaire), these populations were excluded from the linkage process. Therefore, this study covers only the population living in private households at both points in time. 15 Linkage rates are lower for the population living on reserve than for the entire Canadian population. For example, using the population of the start-year file (2011) as the denominator, and observing the whole population intended to be linked, the linkage rates for the population living on reserve are 42% in S2011/S2016 and 46% in L2011/S2016, in comparison to 80% and 83%, respectively, for the population living off reserve (see Appendix 1 for important considerations on the calculation and interpretation of linkage rates).
11 At the time of publication of this article, these files could only be accessed through a partnership with researchers from Statistics Canada.
12 There are other variants possible. However, the linkage and weighting process of each variant involves a lot of work, and the selection of variants used in this paper was considered appropriate for the objectives at hand.
A number of reasons can explain the differences in linkage rates observed between subgroups of the population, but incorrect or missing information could be an important factor. First, the collection method using enumerators, which is the predominant method used in Indigenous communities, could introduce errors in the names collected because of misspellings. Moreover, the impact of this situation could be made worse by names with a complex spelling. 16 The information collected (e.g., date of birth) can also be less accurate when collected through indirect interviews. The use of enumerators on reserves can increase the chances for indirect interviews if the questions are asked only to the people present in the household at the time of their visit. Proxy responses may also be more frequent in large households, which is a situation that is more common on reserves. 17 The relatively weak linkage rates for the population living on reserve are a source of concern because biases could occur if unlinked individuals have characteristics different from those of linked individuals (selection bias). In the context of this study, there is no particular reason to believe that the quality of the matches is lower among migrants than among non-migrants because the control methods (i.e., the methods used to assess the quality of each linked record) are the same in both cases. Moreover, the proportion of reserve in-migrants and out-migrants is similar among the linked and unlinked populations, which likely means that there is no bias for this specific characteristic. Lastly, linkage rates were similarly low for people living on reserve in 2011 and in 2016, and should therefore be similar for both reserve in-migrants and out-migrants. While these facts are reassuring, they do not imply a total absence of bias. The choice of an appropriate weighting strategy (described next) can minimize potential biases induced by the linkage process.
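The multistage logic described in this section can be illustrated with a minimal Python sketch. It is not the linkage system used for this study: the stage definitions, the field names, and the exact-match rule are all assumptions made for illustration. The sketch only shows two ideas from the text: later stages drop geography so that migrants can still be linked, and a link is accepted only when it is unique on both sides (a single link).

```python
from collections import defaultdict

# Hypothetical stages: each stage lists the fields that must agree exactly.
# Early stages include geography; the last stage ignores it so that people
# who moved between the two censuses can still be linked.
STAGES = [
    ("given_name", "family_name", "sex", "birth_date", "postal_code"),
    ("given_name", "family_name", "sex", "birth_date", "province"),
    ("given_name", "family_name", "sex", "birth_date"),
]


def link_records(start_file, end_file, stages=STAGES):
    """Return {start_id: end_id}, accepting only one-to-one (single) links."""
    links = {}
    linked_end = set()
    for fields in stages:
        # Index the records that are still unlinked by this stage's key.
        start_index = defaultdict(list)
        end_index = defaultdict(list)
        for rec in start_file:
            if rec["id"] not in links:
                start_index[tuple(rec[f] for f in fields)].append(rec["id"])
        for rec in end_file:
            if rec["id"] not in linked_end:
                end_index[tuple(rec[f] for f in fields)].append(rec["id"])
        # Keep a link only if the key is unique in both files.
        for key, start_ids in start_index.items():
            end_ids = end_index.get(key, [])
            if len(start_ids) == 1 and len(end_ids) == 1:
                links[start_ids[0]] = end_ids[0]
                linked_end.add(end_ids[0])
    return links


def linkage_rate(links, start_file):
    """Share of start-year records that were linked (start-year denominator)."""
    return len(links) / len(start_file)
```

In this sketch, two start-year records that share the same name, sex, and birth date are left unlinked at the geography-free stage, which lowers the linkage rate but avoids a false positive. A production system would also need to handle missing values and spelling variants, which this sketch ignores.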

Weighting Method
Linked files are subsets of the source files they stem from. 18 For example, the S2011/S2016 file is a subset of the 2011 Census file because not everyone in the 2011 Census population was linked to the 2016 Census file. People who were enumerated in the 2011 Census but were not linked to the 2016 Census file can be thought of as non-respondents of a survey. Therefore, to infer results from the S2011/S2016 file for the entire 2011 Census population, a special set of weights must be computed to account for these non-respondents (i.e., non-linked records).
The weights must also take into account the fact that some people in the 2011 Census population could not be linked to the 2016 Census because they had either died or left the country before the 2016 Census. 19 These people can be considered "out-of-scope" cases in a longitudinal survey sample. As a result, a special adjustment to the weights is needed to take these cases into account. The weights for the S2011/S2016 file make it possible to make inferences for the population that was alive and living in the country in both 2011 and 2016. 20 This is also true for other linked files used in this study. Various methods were used to identify the key variables and interaction terms to be used to create homogeneous adjustment classes, depending on whether the start-year file is a long-form census or a short-form census. Tests were conducted to ensure that the results were not overly sensitive to the choice of variables.
In the S2011/S2016 file, all individuals have an initial weight of 1, since the S2011 file covers the entire population and every record is self-representative. This weight of 1 must first be adjusted to account for individuals in the 2011 Census population who were not linked to S2016. This process is somewhat similar to the calibration of weights for non-response in a survey. This adjustment is done among groups of people with similar characteristics (called homogeneous groups), made by combining several variables relevant to this study and available in S2011, including geography (province or territory, census division, and an on-reserve or off-reserve indicator), age, sex, marital status, mother tongue (including Aboriginal and Inuit languages), language spoken at home (including Aboriginal and Inuit languages), household size, and type of census family. Within each homogeneous group, the weights of linked records with characteristics XYZ are inflated so that they also represent the non-linked records with the same characteristics XYZ. After this step, the sum of the weights for the linked records in the S2011/S2016 file is equal to the 2011 Census population size. 21
The next step was to adjust the weights of the linked records, which now represented the entire 2011 Census population, to account for the fact that not all of them could be linked to S2016 because of death or emigration, i.e., some people in the 2011 Census population were no longer alive or living in Canada in 2016. To make this adjustment, estimates of the number of people in the 2016 Census population who were alive and in the country in 2011 were needed (i.e., the population that could potentially be linked); the weights of the linked records were then calibrated to these new totals. Estimates of the population that could potentially be linked between 2011 and 2016 were obtained using information collected in L2016_R. With this information, anyone in the 2016 Census population who had not yet been born or did not live in Canada 5 years prior to the census (i.e., people who could not be linked) was removed, resulting in a weighted estimate of the population for both 2011 and 2016. 22 These new totals were calibrated by province or territory of residence in 2016, place of residence on or off reserve, age, and sex. This made it possible for the final weights to be representative of the population that was alive and living in the country in both 2011 and 2016. 23 The same strategy was used to calculate the weights for the S2006/S2011 file. 24 The weights of the different linkage files are described in Table 1.
18 Among the linked files used for this paper, four (L2011/L2016, S2011/S2016, L2011/S2016, and S2006/S2011) were weighted using the method outlined in this section. The fifth linked file (S2011/L2016), which was not used in this study to estimate migration flows, but only to document discrepancies between retrospective responses and linked information, was weighted using a much more simplified strategy that is outlined in Section 5.
19 Individuals living in an institution or collective dwelling in 2011 or 2016 were also excluded.
20 The weighting takes into account the characteristics of individuals at the beginning and end of the period. As a result, the weighting aims to reflect the characteristics of individuals at both times, rather than at a specific moment (2011 or 2016).
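The two weighting steps described above for the S2011/S2016 file can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used to produce the weights in Table 1: the function names, the record layout (`weight`, `linked`, and the class variables), and the control totals are hypothetical, and the calibration shown is a plain ratio adjustment within cells.

```python
from collections import defaultdict


def adjust_for_nonlinkage(records, class_vars):
    """Step 1: inflate the weights of linked records within each class.

    Each record is a dict with an initial 'weight', a boolean 'linked', and
    the class variables. Within each homogeneous group, linked records are
    scaled up so that they also represent the non-linked records.
    """
    total = defaultdict(float)
    linked_total = defaultdict(float)
    for rec in records:
        key = tuple(rec[v] for v in class_vars)
        total[key] += rec["weight"]
        if rec["linked"]:
            linked_total[key] += rec["weight"]
    adjusted = []
    for rec in records:
        if rec["linked"]:
            key = tuple(rec[v] for v in class_vars)
            factor = total[key] / linked_total[key]
            adjusted.append({**rec, "adj_weight": rec["weight"] * factor})
    return adjusted


def calibrate(linked, control_totals, calib_vars):
    """Step 2: rescale weights so each cell sums to its control total.

    control_totals holds hypothetical estimates of the population that was
    alive and in the country at both census dates, keyed by calibration cell.
    """
    cell_sum = defaultdict(float)
    for rec in linked:
        cell_sum[tuple(rec[v] for v in calib_vars)] += rec["adj_weight"]
    calibrated = []
    for rec in linked:
        key = tuple(rec[v] for v in calib_vars)
        ratio = control_totals[key] / cell_sum[key]
        calibrated.append({**rec, "final_weight": rec["adj_weight"] * ratio})
    return calibrated
```

After step 1, the adjusted weights sum to the start-year population; after step 2, they sum to the control totals. A class that contains no linked record contributes nothing in this sketch, so in practice such classes would have to be collapsed with similar ones.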
The strategy used to compute the weights for the L2011/L2016 file differs slightly from the strategy used for S2011/S2016 because it must account for the fact that not everyone was selected to respond to the NHS in 2011 and the long-form questionnaire in 2016. Therefore, the initial weights were computed as the product of the 2011 NHS final weight and the 2016 Census long-form questionnaire final weight to reflect the inverse of the probability that an individual would be selected to complete both. The next steps (i.e., making adjustments for non-linked records and for deaths and emigration) are similar to those used for the S2011/S2016 file, except that, in the L2011/L2016 file, a much broader range of characteristics could be considered when adjusting for non-linked individuals. As a result, more complex methods (logistic regression and clustering techniques) were used to generate homogeneous response groups.
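The starting point for the L2011/L2016 weights can be written compactly. The notation is introduced here for illustration only: $w_i^{\mathrm{NHS}}$ and $w_i^{\mathrm{L2016}}$ denote the final weights of individual $i$ in the two source files, and $\pi_i^{\mathrm{NHS}}$ and $\pi_i^{\mathrm{L2016}}$ the corresponding selection probabilities. The approximation assumes that each final weight is close to the inverse of its selection probability and that the two selections are independent.

```latex
w_i^{(0)} = w_i^{\mathrm{NHS}} \times w_i^{\mathrm{L2016}}
\approx \frac{1}{\pi_i^{\mathrm{NHS}} \, \pi_i^{\mathrm{L2016}}}
```

Because no sampling was used on Indian reserves, both selection probabilities would be close to 1 there, whereas off reserve the product can be much larger, which is consistent with the wide variation in weights by place of residence noted below.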
The distribution of the adjusted weights for the L2011/L2016 file (Table 1) shows wide variations by place of residence in 2011 and 2016, primarily because of the different collection strategies used in Indigenous communities. This has important implications, as an incorrect link between two records can have a major impact on the quality of the results if its weight is high.
The strategy used to weight the L2011/S2016 file is similar to what was described above, but is not presented here to avoid redundancy. 25

Limitations of Linked Files
A large number of complex procedures must be carried out before using linked data to analyze migration flows. This is particularly true for Indigenous populations and populations living on Indian reserves because of inherent difficulties, such as relatively low response and linkage rates, as well as the use of different collection strategies. In addition to the limitations inherent to census data (described in Section 2.1), linked census files have their own limitations. However, while linked census files could be used to evaluate the quality of the retrospective information on the place of residence 5 years prior to the collection date (Section 5), no data file could be used to highlight errors in the places where individuals were enumerated, wrong or missed record linkages, or remaining selection biases that could not be compensated for through the reweighting process, despite the wide range of variables used. 26 Therefore, despite all of the precautions taken, users should keep these considerations in mind when using estimates from linked census files.
Lastly, estimates produced from linked census files are subject to sources of uncertainty that are impossible to measure. Unfortunately, it was not possible to compute the variance of estimates derived from linked census files (nor of estimates derived from retrospective census information). Because migration is a fairly rare event, counts are often relatively small and must be interpreted with caution.

Main Results
This section provides the estimates of migration flows into and out of Indian reserves computed using census data linkages for the quinquennial periods of 2006 to 2011 and 2011 to 2016. The range of sources makes it possible to verify the robustness of these estimates, as each file has its own strengths and limitations (described in Section 3). Listed first are the totals for Canada and each province, for large geographical areas following a rural-urban gradient, and then by age group. These estimates are compared with those obtained using retrospective information from the census. At the end of this section, an additional comparison with an alternative data source is presented.
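To make the linked-file approach concrete, the following Python sketch shows how weighted flows could be tallied once each linked record carries an on-reserve indicator at both census dates. The field names (`on_reserve_start`, `on_reserve_end`, `final_weight`) are assumptions made for illustration, and the sketch is not the code used to produce the tables in this section.

```python
def migration_flows(linked_records, weight_key="final_weight"):
    """Weighted in-, out-, and net migration for Indian reserves.

    A record counts as an in-migrant if it was off reserve at the start
    and on reserve at the end, and as an out-migrant in the reverse case.
    Moves from one reserve to another are ignored in this sketch.
    """
    in_migrants = sum(
        r[weight_key] for r in linked_records
        if not r["on_reserve_start"] and r["on_reserve_end"]
    )
    out_migrants = sum(
        r[weight_key] for r in linked_records
        if r["on_reserve_start"] and not r["on_reserve_end"]
    )
    return {"in": in_migrants, "out": out_migrants,
            "net": in_migrants - out_migrants}
```

A negative `net` value corresponds to a net loss for Indian reserves. The retrospective approach would apply the same tally to a single census file, using the reported place of residence 5 years earlier in place of the start-year location.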

Comparison of Results from Linked Files with Those Obtained from a Single Census
According to the retrospective information collected by the 2016 Census long-form questionnaire (L2016_R), between 2011 and 2016 Indian reserves in Canada had a net gain of about 10,600 people through internal migration (Table 2), which contributed to a 3.4% increase of their population during that period. 27 In contrast, estimates from the S2011/S2016, L2011/L2016, and L2011/S2016 files show net losses of 18,900, 20,400, and 18,700 people, respectively, which contributed to a decrease of between 5.4 and 5.9% of their population. The differences found were mainly in the number of out-migrants, as estimates of out-migrants from the linked files were more than double those from the 2016 Census. Estimates of in-migrants were fairly consistent across all sources. 28 Similar results were obtained for the 2006-to-2011 period by comparing estimates from L2011 with those from S2006/S2011 (Table 3). According to the L2011_R file, at the national level, Indian reserves had a net migration gain of 11,700 people, compared with a net loss of about 14,000 people in the S2006/S2011 file. The results for the population of Registered Indians were similar (see Table 2). According to the 2016 Census (L2016_R), in the quinquennial period from 2011 to 2016, Indian reserves gained 6500 people through internal migration (which contributed to an increase of 2.4% of the Registered Indian population on reserve during this 5-year period), compared with net losses of 21,400 and 25,100 in L2011/L2016 using, respectively, the Registered Indian status information in 2016 and 2011, and a loss of 24,000 people in L2011/S2016 using the Registered Indian status information in 2011 (which contributed to a decrease of between 6.9 and 7.8% of the Registered Indian population). While it seems like it should not matter whether the information about Registered Indian status in 2011 or 2016 is used, the population may, in fact, differ slightly for a few reasons.
For example, some people may have registered

27 Results presented in this section exclude the population living in the territories or on an incompletely enumerated Indian reserve at the beginning or end of the period studied.
28 Section 5 contains more information on the factors that can contribute to the differences observed between linked and cross-sectional census files. It is shown in this section that the seemingly consistent results between sources for the number of in-migrants actually hide some significant discrepancies.

The presence of weights less than 1 occurs because of the final calibration used to reduce the overall weights to account for deaths and emigration between the two censuses. Since the individuals who died or left the country cannot be identified, the weights of all linked individuals have been reduced, which can generate weights of less than 1. Numbers in parentheses in the first two columns refer to the theoretical average weight based on the sampling plan of each data source. For instance, a value of 1 means that 100% of the population was sampled while a value of 4 means that 25% of the population was sampled.

The trends described above at the national level were also observed at the provincial level. 30 Linked files show negative net migration on Indian reserves in all provinces except British Columbia in 2011/2016, whereas the non-linked L2011 and L2016 files (using retrospective responses) almost always show net positive migration. Figure 1 shows the estimated net migration for Indian reserves in relation to other types of regions along an urban-rural continuum: urban CMA, urban non-CMA, and rural. 31 All estimates computed from linked files were negative, suggesting that Indian reserves lost people through internal migration to all three types of regions, for both the population as a whole and Registered Indians. The results show that Indian reserves' largest net losses were to rural areas or urban CMAs.
For the Registered Indian population in particular, the largest net losses were to urban CMAs. In contrast, L2016_R data show positive net migration everywhere.

Table 3 Estimates of Indian reserves' in-migrants, out-migrants and net migration between 2006 and 2011, total population, for Canada and the provinces, by data source
Results exclude the population living in the territories or on an incompletely enumerated Indian reserve in 2006 or 2011. Individuals who moved from one reserve to another in the same province or region are not accounted for as in-migrants or out-migrants. Results are rounded to the nearest hundred. For this reason, net migration may not be equal to the difference between in-migrants and out-migrants.

Figure 2 shows Indian reserves' 5-year out-migration rates by age group for the total population and the Registered Indian population in Canada in 2011. While the different data sources show similar patterns by age group,

Figure 4 shows Indian reserves' net migration rates by age group, computed from all sources. Net migration estimated using linked files was negative for all age groups except for the population aged between 50 and 75. In contrast, net migration estimated using L2016_R was positive or close to 0 for all ages.

Comparison with Net Migration Obtained from a Residual Approach Using Multiple Censuses
As mentioned in the Introduction, the population living on Indian reserves increased by 42,500 people between 2006 and 2016 according to the retrospective information collected in each census. This can be decomposed roughly by estimating the contribution of the various components (fertility, mortality, and migration) that influenced this population growth. First, it can be assumed that the contribution of international migration to population growth on Indian reserves is negligible. Second, the number of births between two censuses can be estimated by counting the population younger than age 5 in the last census. It was estimated that between 2006 and 2016, there were 72,800 births on Indian reserves. By applying death rates to the population at the beginning of an intercensal period, the number of deaths on Indian reserves was estimated at around 17,200 over the same period. As a result, natural growth (i.e., the number of births minus the number of deaths) amounted to +55,600 people, or 131% of the observed growth between 2006 and 2016. If these figures are assumed to be accurate and net internal migration on Indian reserves is estimated as a residual (i.e., the number required to balance total growth at 42,500), then an estimate of −13,100 would be obtained, which is higher than the sum of the estimates from S2006/S2011 and S2011/S2016 (−32,900), but much lower than the sum of those from L2011_R and L2016_R (+22,300). Given that natural growth exceeds the estimated total growth over the study period, net migration on Indian reserves could not be positive.
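The residual calculation described above can be retraced with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in the text; the variable names are illustrative, and international migration is treated as negligible, as assumed in the paragraph above.

```python
# Residual estimate of net internal migration on Indian reserves, 2006 to 2016.
# All inputs are the rounded figures quoted in the text; names are illustrative.
total_growth = 42_500   # observed population growth on reserve
births = 72_800         # estimated from the population younger than age 5
deaths = 17_200         # estimated by applying death rates

# Natural growth is births minus deaths.
natural_growth = births - deaths

# Share of the observed growth accounted for by natural growth, in percent.
natural_share_pct = round(100 * natural_growth / total_growth)

# With international migration assumed negligible, net internal migration
# is whatever remains once natural growth is removed from total growth.
residual_net_migration = total_growth - natural_growth

# Ten-year sums of the two 5-year estimates from each family of sources.
linked_sum = -14_000 + -18_900        # S2006/S2011 and S2011/S2016
retrospective_sum = 11_700 + 10_600   # L2011_R and L2016_R

print(natural_growth, natural_share_pct, residual_net_migration)
print(linked_sum, retrospective_sum)
```

The residual falls between the two families of estimates, but it is negative, as are the linked-file estimates.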
Once again, it is important to interpret these results with caution, as there is a lot of uncertainty with regard to census estimates of population counts on Indian reserves, particularly because of net undercoverage. This uncertainty not only affects the quality of the migration estimates obtained from linked files, but also attempts to reconcile estimates of migration with those related to observed population growth, like the one done here.

Analysis of Discrepancies Related to Migrant Status or Previous Place of Residence Between Retrospective Responses and Linked Census Information
In this section, the S2011/L2016 file was used to compare the retrospective information about an individual's place of residence in 2011 from the L2016_R file with their place of residence in 2011, as listed in the S2011 file. 32 In theory, this information should be identical, and discrepancies between the two files could explain why the two sources provide different estimates of migration flows when the place of residence is located on an Indian reserve in one file but not in the other. Table 4 shows a summary of the discrepancies found among linked individuals in the S2011/L2016 file. These discrepancies are of two main types: those related to self-declared migrant status (migrants are those who reported living in a different municipality 5 years earlier in L2016_R), and those related to

32 The S2011/L2016 file was weighted using a simpler approach than the one presented in Section 3. Series of average weights were computed for the four combinations of interest: (1)

Table 4 also shows the effect of each type of discrepancy on estimates of global net migration on Indian reserves, assuming that the information from the census linked files is correct.

Discrepancies Related to Migrant Status
There are two cases of migrant status-related discrepancies. The first involves individuals who declared in 2016 that they were living in the same place 5 years earlier, so they were considered non-migrants in L2016, but who were enumerated at different places of residence in S2011 and L2016 (i.e., they migrated into or out of an Indian reserve). Overall, discrepancies of this nature accounted for a difference of 7600 in the net migration on Indian reserves in S2011/L2016 when migration was measured using the linked information on place of residence in 2011 and 2016, instead of the retrospective information contained in L2016_R. The second case is the opposite of the first. It involves self-declared migrants who were enumerated at the same place of residence in S2011 and L2016. Of interest in this study among these cases are the records where retrospective information showed a migration into or out of an Indian reserve. Records exhibiting this type of discrepancy contributed to a difference of 3500 individuals in the net migration on Indian reserves between the linked and retrospective sources. What causes these differences? There are reasons to believe that the retrospective information is at fault most of the time. One reason is that it may be difficult for respondents to remember whether they were living in the same place 5 years earlier, on a specific date that bears no particular significance to them, and this is especially true for highly mobile individuals. Some very mobile individuals, such as circular migrants, may believe that they have more than one place of residence (although they should choose one usual place of residence as per census guidelines). Another reason is that responses indicating that the respondent was living in the same place 5 years earlier help alleviate response burden, since no retrospective place of residence information needs to be added.
There is also another factor at play in the case of self-declared migrants with two identical addresses in the linked files. An examination of the coding of the retrospective information on the place of residence 5 years earlier showed that there was a bias toward coding these places as "not a reserve." This could explain why there is a larger number of inconsistent records for migrants onto a reserve (off-reserve to on-reserve) than for migrants out of a reserve (on-reserve to off-reserve). The specific topic of geocoding is addressed in Appendix 2.

Same Migrant Status, But Discrepancies in On-Reserve and Off-Reserve Residence in 2011
The second type of discrepancy involves individuals who are self-declared migrants and have two distinct places of residence in the linked file, but whose place of residence in S2011 does not match their place of residence 5 years earlier as collected in L2016. This type of discrepancy occurs in four different forms:
1. Migration onto reserve:
a. Origin is on an Indian reserve as per S2011, but off a reserve as per L2016_R (400 cases)
b. Origin is off an Indian reserve as per S2011, but on reserve as per L2016_R (2,300 cases)
2. Migration off reserve:
a. Origin is on an Indian reserve as per S2011, but off a reserve as per L2016_R (16,200 cases)
b. Origin is off an Indian reserve as per S2011, but on reserve as per L2016_R (4,500 cases).
Overall, these four versions of the second type of discrepancy account for a difference of − 13,600 individuals in the net migration of Indian reserves when the place of residence obtained from S2011 is used instead of the one obtained from L2016_R.
The above discrepancies may occur for many reasons, but it is likely that a large proportion of the inconsistencies originates from L2016_R because of factors such as the provision of imprecise or erroneous retrospective information, possibly caused by respondents having trouble remembering the information, or the challenges associated with coding the retrospective information to a precise location. As mentioned above, there seems to be a bias in the geocoding where more locations are wrongly coded as off reserve (if S2011 is assumed to be accurate) than on reserve. Therefore, there are more "erroneous" migrations from outside reserves than from on reserves (see Appendix 2). This is consistent with the results shown here.

Conclusion
For several decades, the retrospective questions on the place of residence one year and 5 years prior to the census have been the preferred source for measuring migration flows between Indian reserves and off-reserve areas in Canada. From one census to the next, the same observation has emerged: slightly more people entered Indian reserves than left. Consistency over time and the absence of data allowing for alternative estimates have led these results to be widely used and accepted, in particular by departments, policy-makers and the scientific community.
During that period, Statistics Canada was aware of limitations related to these retrospective census questions, and that these limitations have a greater impact for smaller geographic areas. 33 For this reason, the agency put forward various initiatives during the last two censuses to improve the coding and processing operations related to these variables. However, without census data linkage, it is very difficult to assess how and to what extent these limitations could affect the quality of the estimates.
The fact that two consecutive censuses are now linked made it possible, for the first time, to obtain alternative estimates of migration flows between Indian reserves and off-reserve areas without some of the limitations inherent to traditional censuses. These new estimates reveal a somewhat different portrait for the periods from 2006 to 2011 and from 2011 to 2016: there were more out-migrants from than in-migrants to Indian reserves, leading to negative overall net migration on Indian reserves. Furthermore, comparisons of data from the two sources shed light on some specific limitations associated with the collection of retrospective information about an individual's prior place of residence, which led to an underestimation of the number of people leaving Indian reserves.
However, it is important to be cautious when interpreting these results. Although the change of sign in net migration is striking, the size of outward migration flows from Indian reserves does not in any way suggest an exodus, and Indian reserves have been continuously experiencing population growth in recent decades due to high birth rates. Moreover, these new estimates carry a degree of uncertainty. Several factors can affect the precision of the estimates, such as the potential linkage of two records that do not represent the same individual, the exclusion of the population living in incompletely enumerated Indian reserves, the low linkage rates on Indian reserves, and the limitations in the weighting processes. The population living in institutions (which includes many Indigenous people) also created a particular challenge, as it had low linkage rates and was excluded from the analysis as a result. However, as previously mentioned, this population could not be fully excluded from the weighting process.
That said, the results of net migration on reserves obtained from census data linkages may be more consistent with the population growth observed on reserves in the past, as demonstrated by the result of the decomposition of population growth according to the various demographic components, which shows that net migration on reserves could not be positive between 2006 and 2016.
Lastly, census linkages have served in this study as an alternative data source for evaluating the quality of the retrospective information related to the place of residence. This evaluation showed some limitations of this information such as a bias in the geocoding process favoring attribution to off-reserve locations and an underestimation of individuals having migrated. While there is no indication that these issues are specific to some population groups, these results could serve as a starting point for further investigation of the quality of retrospective census information, particularly in regard to other types of geographic areas. They could also provide an opportunity to integrate new data sources, such as data linkages and administrative data, into the coding and processing of census migration data, in particular for smaller areas and Indigenous communities. This would make it possible to further improve the quality of Canadian census data.

file population (2011) were successfully linked to the end-year file (2016), and divide by the population from the start-year file. Intuitively, this is in line with the weighting procedure, since adjustment of the weights for non-linkage is done assuming that everyone in the 2011 population should have been linked (the weights are reduced to account for losses due to deaths and emigration between 2011 and 2016 during the calibration stage). Linkage rates computed with this approach are presented for files S2011/S2016 and L2011/S2016 in Table 5. Note that linkage rates for the L2011/L2016 and S2011/L2016 files could also be computed in theory. However, a very high proportion of individuals from the start-year file were not selected to respond to the 2016 Census long-form questionnaire (around 75% of the population living outside an Indian reserve) and therefore cannot be linked, so that the procedure results in very low linkage rates that reflect essentially the sampling scheme of the 2016 Census (long-form).
The linkage rates in Table 5 are indicative of the importance of the weighting process to correct for non-linked individuals. 34 However, they cannot be interpreted as indicators of the performance of the linkage procedure. This is because the calculations make no exclusions for individuals who could not be linked. For example, in Table 5, the linkage rates tend to be lower in older age groups due mainly to the fact that the proportions of individuals in the start-year file who died between 2011 and 2016 increase with age, not to a decreasing capacity to link those who could be linked. There are, in fact, multiple reasons why an individual in the start-year file could not be linked to the end-year file, and vice versa:
• An individual in the start-year file died or left the country before the 2016 Census.
• An individual in the end-year file was not born or lived outside the country in 2011.
• An individual in the start-year (or end-year) file did not respond to the 2011 (or 2016) census (non-response).
• An individual in the start-year (or end-year) file was missed by the 2011 (or 2016) census (undercoverage).
• An individual is present in both the start-year and end-year files but could not be linked due to errors in the linkage keys, wrong information having been captured, or other operational reasons.
A linkage rate that reflects how successful the linkage was for those that could effectively be linked needs to account for all the above cases in the choice of the denominator. Applying such adjustments results in a linkage rate of about 90% for the entire Canadian population and 55% for the population living on an Indian reserve. Unfortunately, not all the information required for these adjustments is available at disaggregated levels (i.e., for population groups or geographical areas).
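The difference between the two kinds of linkage rate discussed in this appendix can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from Table 5, and the adjustment is deliberately simplified: it only removes from the denominator the start-year individuals who could not have been linked at all, whereas the actual adjustments account for every case listed above.

```python
def linkage_rate(linked: int, denominator: int) -> float:
    """Share of the denominator population that was linked."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return linked / denominator

# Hypothetical counts, for illustration only.
start_year_population = 1_000  # individuals in the start-year (2011) file
linked = 700                   # of these, linked to the end-year (2016) file
not_linkable = 150             # e.g., died or left the country before 2016,
                               # or not enumerated in the end-year census

# Crude rate: everyone in the start-year file is assumed linkable.
crude_rate = linkage_rate(linked, start_year_population)

# Adjusted rate: only individuals who could effectively be linked
# remain in the denominator.
adjusted_rate = linkage_rate(linked, start_year_population - not_linkable)

print(f"crude: {crude_rate:.1%}, adjusted: {adjusted_rate:.1%}")
```

Under these assumed counts, the crude rate understates how well the linkage performed among those who could be linked, which is the point made about the older age groups in Table 5.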

Appendix 2 Analysis of Discrepancies in Migrants' Place of Residence Information
Differences between the estimates of migration flows into and out of Indian reserves obtained from the 2016 Census and from the various 2011 and 2016 census linked files can be partially explained by inaccuracies in responses to the question in the 2016 Census on self-reported place of residence 5 years prior to the census (for respondents who identified as migrants). There are several reasons why this may occur, and they are examined briefly in this appendix. First, while the statistical definition of a municipality is clear (it is a census subdivision), it is not always clear for respondents, despite the fact that examples are provided in the questionnaire (city, town, village, township, municipality, or Indian reserve). This may explain the recurring situation in which a respondent provides the name of the closest census agglomeration (which is often an off-reserve location) instead of the name of the municipality. Second, it is also difficult to accurately assign a geographic code when a respondent provides a place of residence that is not a unique name (some distinct municipalities share the same name). In the specific case of Indian reserves, many share the same name as a municipality outside the reserve, and the only difference is that it is followed by a number. If these numbers are omitted, the location will be coded incorrectly. For example:
• Berens (off-reserve area) and Berens River 13 (Indian reserve)
• Cote (off-reserve area) and Cote 64 (Indian reserve)
• Grand Rapids (off-reserve area) and Grand Rapids 33 (Indian reserve)
• Wabasca (off-reserve area) and Wabasca 166D (Indian reserve)
• Kamloops (off-reserve area) and Kamloops 1 (Indian reserve)
• Penticton (off-reserve area) and Penticton 1 (Indian reserve).
Even though respondents have been asked since 2006 to provide the postal code of their place of residence 5 years earlier (which, in theory, should help identify a precise location), this is not always helpful because, in rural areas, a postal code can cover both on-reserve and off-reserve areas. Furthermore, respondents sometimes do not report their postal code, possibly because they were unable to remember it.
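The naming problem described above can be sketched with a toy lookup. This is not the census coding procedure: the small gazetteer, the exact-match rule and both function names are assumptions made for illustration, and only the place names come from the examples listed above.

```python
import re

# Toy gazetteer built from the name pairs listed above (illustrative only).
PLACES = {
    "Cote": "off-reserve", "Cote 64": "reserve",
    "Grand Rapids": "off-reserve", "Grand Rapids 33": "reserve",
    "Wabasca": "off-reserve", "Wabasca 166D": "reserve",
    "Kamloops": "off-reserve", "Kamloops 1": "reserve",
    "Penticton": "off-reserve", "Penticton 1": "reserve",
}

def code_place(response):
    """Hypothetical exact-match coding of a write-in response."""
    return PLACES.get(response.strip())

def reserve_lookalikes(response):
    """Reserves whose name equals the response once the trailing
    reserve number (e.g., '1', '64', '166D') is dropped."""
    base = response.strip()
    return [name for name, kind in PLACES.items()
            if kind == "reserve"
            and re.sub(r"\s+\d+[A-Z]?$", "", name) == base]

# A respondent who lived on Kamloops 1 but wrote only "Kamloops"
# is coded off reserve by an exact-match rule.
print(code_place("Kamloops 1"))        # reserve
print(code_place("Kamloops"))          # off-reserve
print(reserve_lookalikes("Kamloops"))  # ['Kamloops 1']
```

A response with a non-empty list of lookalikes is one where an omitted number would silently move the origin off reserve, which is the direction of the bias reported in Section 5.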
Third, the quality of the place of residence coding can also be compromised by various factors. For instance, some responses provided by respondents could be incorrect, perhaps because it is difficult for the respondent to remember their place