3.1 Introduction

How do we know who is missed in a Census or which groups have a net undercount? Several methods have been used over time and in various countries to answer this question, but in the U.S. only the Demographic Analysis (DA) method and the Dual-Systems Estimates (DSE) method provide quantitative answers (Mulry 2014; Hogan et al. 2013; Bryan 2004; Anderson 2004).

According to the U.S. Census Bureau (2012d, p. 2),

The Census Bureau has historically relied on two principal methods to provide measures of the quality of each Census. One method is based on a post-enumeration survey, which is the topic of this report. The other method is based on demographic analysis, which uses various types of demographic data in order to build an historical account of population change.

Briefly, DA compares the Census count to an independent estimate of the expected population based on births, deaths, and net international migration. The DSE method uses a Post-Enumeration Survey to independently gather information on people that can be compared to the Census count to assess correct enumerations, omissions, and erroneous inclusions (mostly people counted more than once). Each of these methods is described in the next two sections of this Chapter along with some of their strengths and limitations.

One important difference between DA and DSE data is the level of age/sex detail available. Detailed 2010 DA estimates are available, so researchers can construct tables for whatever age-sex-race/Hispanic groups they wish, within the limits of the data. To do so, one must download the files from the Census Bureau and construct one’s own net undercount/overcount tables.

On the other hand, data from the DSE method are only provided for a few age/sex groups determined by the Census Bureau. The DSE data are provided in a series of reports that provide net undercounts and omissions.

3.2 Demographic Analysis Methodology

Demographic Analysis has been used since the 1950 Census to provide estimates of net undercounts in the U.S. Census. As stated above, this method creates an independent estimate of the expected population based on births, deaths, and net international migration. The expected population is then compared to the Census count to determine net undercounts and net overcounts. DA estimates are provided for both males and females, Black and Non-Black, by single year of age. Data on Hispanics are provided for those below age 20 in the 2010 DA estimates.

DA is an example of the cohort-component method of population estimation, meaning each component of population change (births, deaths, and migration) is estimated for each birth cohort. The cohort-component method is one of the most widely used techniques in population estimation (United Nations 1970; Bryan 2004). Since there are already several detailed descriptions of the DA methodology available, I will only review the method briefly here (Robinson 2010; Himes and Clogg 1992; U.S. Census Bureau 2010).

The DA method has been used to assess the accuracy of Decennial Census figures for more than a half century (Coale 1955; Coale and Zelnik 1963; Coale and Rives 1973; Siegel and Zelnik 1966). Its origins are often traced back to an article by Price (1947), which found an unexpectedly high number of young men who turned up at the first compulsory selective service registration on October 6, 1940 and alerted demographers to the possibility of under-enumeration in the 1940 Decennial Census.

The DA method employed for the 2010 Decennial Census used one technique to estimate the population under age 75 and another method based on Medicare enrollment to estimate the population age 75 and older (West 2012). The 2010 DA estimates for the population age 0–74 are based on the compilation of historical estimates of the components of population change: Births (B), Deaths (D), and Net International Migration (NIM). The data and methodology for each of these components are described in separate background documents prepared for the development and release of the Census Bureau’s 2010 DA estimates (Robinson 2010; Devine et al. 2010; Bhaskar et al. 2010).

As described by the U.S. Census Bureau (2010) the DA population estimates for age 0–74 are derived from the basic demographic accounting Eq. (3.1) applied to each birth cohort:

$$ {\text{P}}_{0 - 74} = {\text{B}} - {\text{D}} + {\text{NIM}} \qquad (3.1) $$

where

  • $\text{P}_{0-74}$ = population for each single year of age from 0 to 74 (people less than a year old are labeled age 0 by the Census Bureau)

  • B = number of births for each age cohort

  • D = number of deaths for each age cohort since birth

  • NIM = Net International Migration for each age cohort.

For example, the estimate for the population age 17 on the April 1, 2010 Decennial Census date is based on births from April 1992 through March 1993, reduced by the deaths to that cohort in each year between 1992 and 2010, and incremented by Net International Migration (NIM) of the cohort each year over the 17-year period.
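The accounting in Eq. (3.1) can be illustrated with a short Python sketch. The cohort figures below are hypothetical, chosen only to make the arithmetic easy to follow; they are not Census Bureau data. The class and function names are illustrative, and the sign convention (a positive rate indicates a net undercount, expressed relative to the DA estimate) is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class CohortComponents:
    """Components of change for one birth cohort (hypothetical container)."""
    births: float              # B: births to the cohort
    deaths: float              # D: cumulative deaths to the cohort since birth
    net_intl_migration: float  # NIM: cumulative net international migration


def da_expected_population(c: CohortComponents) -> float:
    """Eq. (3.1): expected population P = B - D + NIM for one cohort."""
    return c.births - c.deaths + c.net_intl_migration


def net_undercount_rate(da_estimate: float, census_count: float) -> float:
    """Percent difference relative to the DA estimate.

    Assumed convention: positive = net undercount, negative = net overcount.
    """
    return 100.0 * (da_estimate - census_count) / da_estimate


# Hypothetical numbers for a single cohort, for illustration only.
cohort = CohortComponents(births=4_000_000, deaths=50_000,
                          net_intl_migration=250_000)
expected = da_expected_population(cohort)
rate = net_undercount_rate(expected, census_count=4_116_000)
print(f"DA expected population: {expected:,.0f}")
print(f"Net undercount rate: {rate:.1f}%")
```

In this made-up case the Census count falls short of the DA estimate, so the rate is positive; a Census count above the DA estimate would yield a negative rate, that is, a net overcount.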

The birth and death data used in the Census Bureau’s DA estimates come from the National Center for Health Statistics (NCHS), and these records are widely viewed as accurate and complete. The National Center for Health Statistics (2014, p. 2) states, “A chief advantage of birth certificate data is that information is collected for essentially every birth occurring in the country each year…” After a thorough review of vital statistics prior to the 2010 Census, the U.S. Census Bureau (Devine et al. 2010, p. 3) stated:

The following assumptions are made regarding the use of vital statistics for DA:

  • Birth registration has been 100% complete since 1985.

  • Infant deaths were underregistered at one-half the rate of the underregistration of births up to and including 1959.

  • The registration of deaths for ages 1 and over has been 100% complete for the entire DA time series starting in 1935.

In addition to regularly published totals, the Census Bureau receives microdata files from NCHS containing detailed monthly data on each birth and death. These files were used primarily for DA estimates by race. Construction of DA estimates by race is discussed later in this Chapter.

The Census Bureau changed the way it calculated Net International Migration (NIM) for the 2010 set of DA estimates (Bhaskar et al. 2010). The current method relies heavily on data from the Census Bureau’s American Community Survey (ACS), in which the location of the Residence One Year Ago (ROYA) is ascertained for everyone in the survey age 1 or older. The total number of yearly immigrants is derived from this question in each year of the ACS, and that total is then distributed to demographic cells (sex, age, and race) based on an accumulation of the same data over the last five years of the ACS. Five years of ACS data are used to provide more stable and reliable estimates for small demographic groups. On the other hand, it is important to note that the five-year average may mask changes in trends over time. Given changing economic conditions, it would not be surprising if the immigration pattern in the 2008–2010 period differed from the pattern before 2008; however, I suspect such errors would be small.
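The allocation step described above, in which a single-year total is spread across demographic cells using shares from pooled five-year data, can be sketched as a simple proportional allocation. This is a minimal illustration under stated assumptions, not the Census Bureau’s production procedure: the function name, the cell keys, and all counts are hypothetical.

```python
def allocate_to_cells(annual_total, pooled_counts):
    """Spread one year's immigrant total across demographic cells.

    Each cell receives a share of annual_total proportional to its
    count in the pooled (multi-year) data.
    """
    pooled_sum = sum(pooled_counts.values())
    if pooled_sum <= 0:
        raise ValueError("pooled counts must sum to a positive number")
    return {cell: annual_total * count / pooled_sum
            for cell, count in pooled_counts.items()}


# Hypothetical pooled five-year counts by (sex, age group, race) cell.
pooled = {
    ("F", "0-17", "Black"): 50,
    ("F", "0-17", "Non-Black"): 450,
    ("M", "0-17", "Black"): 60,
    ("M", "0-17", "Non-Black"): 440,
}
allocated = allocate_to_cells(annual_total=200_000, pooled_counts=pooled)
for cell, value in allocated.items():
    print(cell, round(value))
```

Because the shares come from pooled data, a small cell gets a steadier share than a single year of survey data would give it, at the cost of lagging behind any recent shift in the composition of immigrants.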

Statistics on emigration of the foreign-born population from the U.S. are based on a residual method comparing data from the 2000 Decennial Census to later American Community Survey estimates to develop rates and then applying those rates to observed populations (Demographic Analysis Research Team 2010).

Emigration of U.S. citizens (net native migration) is derived by examining Census data from several other countries (Schachter 2008). This method of estimating out-migration of the native-born population is problematic for a couple of reasons. Data are not available for every country, and the quality of some foreign censuses is suspect. See Jensen (2012) for more details on measuring net international migration. In 2018, Census Bureau staff presented a paper with revised data for net native migration of young children based on data from the 2010 Mexican Census (Jensen et al. 2018).

In preparing for the December 2010 DA release, the Census Bureau developed five estimation series with differing assumptions about births, deaths, and net international migration to reflect the degree of uncertainty in the estimates. The estimates from the five series presented in December 2010 range from 305,684,000 to 312,713,000. The middle series of the DA estimates was nearly a perfect match to the 2010 Census count, so when the DA estimates were updated in May 2012, only the middle series was updated.
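To give a sense of how wide this range is, the sketch below compares the lowest and highest December 2010 series with the 2010 Census count. The two DA endpoints are the figures quoted above. The Census count of 308,745,538 is the published 2010 resident population and is not quoted in this section, so it is flagged in the code as an outside figure. The sign convention (positive = net undercount) is an assumption of the sketch.

```python
# Endpoints of the five December 2010 DA series, as quoted in the text.
DA_LOW = 305_684_000
DA_HIGH = 312_713_000

# Published 2010 Census resident population (not quoted in this section).
CENSUS_2010 = 308_745_538


def percent_difference(da_estimate, census_count):
    """Percent difference relative to the DA estimate (positive = undercount)."""
    return 100.0 * (da_estimate - census_count) / da_estimate


spread = DA_HIGH - DA_LOW
low_rate = percent_difference(DA_LOW, CENSUS_2010)
high_rate = percent_difference(DA_HIGH, CENSUS_2010)

print(f"Spread between lowest and highest series: {spread:,}")
print(f"Implied net coverage error, lowest series: {low_rate:.2f}%")
print(f"Implied net coverage error, highest series: {high_rate:.2f}%")
```

Under the lowest series the Census count exceeds the DA estimate (an implied net overcount), while under the highest series it falls short (an implied net undercount), which is why the choice of series matters for interpretation.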

3.3 Dual-Systems Estimates Methodology

The other major source of data on net undercounts and overcounts in the U.S. Decennial Census is the Census Bureau’s Dual-Systems Estimates (DSE) method. This section describes the estimation method used in generating the net coverage for the household population from the DSE approach. The DSE method also provides estimates for the other components of Census coverage shown below. According to Hogan (1993), overall Census coverage can be separated into the four components below:

  1. Erroneous enumerations due to duplication,

  2. Erroneous enumerations (fictitious, out-of-scope, died before Census day, born after Census day),

  3. Whole-person imputations, and

  4. Omissions.

The Dual-System Estimates (DSE) method compares Census results to the results of a Post-Enumeration Survey (PES) which is conducted right after Census data collection has been completed to determine the number and characteristics of people who are omitted or included erroneously (mostly those double-counted).

Nomenclature can be confusing in this arena. The terms Dual-Systems Estimates (DSE) and Post-Enumeration Survey (PES) are often used interchangeably. Moreover, the DSE/PES approach has been given a different name in each of the past three U.S. Censuses. In 2010, it was called Census Coverage Measurement (CCM), in the 2000 Census it was called Accuracy and Coverage Evaluation (A.C.E.) and in the 1990 Census it was called the Post-Enumeration Survey (PES). Sometimes the DSE or PES approach is simply called the “survey method.” The DSE operation in the 2020 Census will be called PES again (U.S. Census Bureau 2017).

There is a long history of using Dual-System Estimation in measuring coverage errors in a Census (Hogan 1993; U.S. Census Bureau 2004; Wolter 1986). But it is widely believed that DSE estimates that are consistent over time began in 1990. For a detailed explanation of the CCM estimation methodology used in the 2010 Census, see Mule (2008).

Dual-System Estimation is based on what is sometimes referred to as a capture-recapture methodology. The Census is the first system or first capture point and the Post-Enumeration Survey is the second capture point. To estimate the number of people correctly included in the Census, one must take a sample from Census enumerations to match to the PES. In the 2010 Census, the sample from the Census is referred to as the Enumeration or E-sample. The Post-Enumeration Survey is used to make the second capture, and the population in the Post-Enumeration Survey is referred to as the Population or P-sample.

The 2010 CCM program involved a complex sample of about 170,000 housing units in a sample of Census blocks nationwide (Mule 2010). In every sampled block, Census staff did an independent listing of housing units and independent roster of every person living in those housing units as of April 1, 2010, which were then compared to Census records. Because the DSE figures are based on a sample, sampling error was calculated for each estimate to determine statistical significance. Sampling error is not a major issue for large national groups but for smaller groups and small areas, the sampling errors are often large.
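The role of sampling error in judging statistical significance can be illustrated with a simple test of whether an estimated net undercount rate differs from zero. This is a minimal sketch: the estimates and standard errors are hypothetical, the function name is illustrative, and the critical value of 1.645 (a 90% confidence level) is an assumption of the sketch rather than something stated in this chapter.

```python
def is_significant(estimate, standard_error, z_critical=1.645):
    """Return True if the estimate differs from zero at the chosen level.

    Uses a two-sided z-test: |estimate / standard_error| > z_critical.
    """
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return abs(estimate / standard_error) > z_critical


# Hypothetical net undercount rates (percent) and standard errors.
large_group = is_significant(estimate=0.5, standard_error=0.2)
small_area = is_significant(estimate=0.5, standard_error=1.0)

print("Large national group significant:", large_group)
print("Small area significant:", small_area)
```

The same point estimate is significant for the large group and not for the small area, because the smaller sample behind the small-area estimate produces a much larger standard error.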

The PES interview is used to determine if the person enumerated in the Post-Enumeration Survey should have been counted in a housing unit on Census day (April 1). By comparing the PES results to the Census, CCM can estimate the number of correct enumerations in the Census. Matching also produces an estimate of the erroneous enumerations. Whole-person imputations are taken from census records.
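The capture-recapture logic behind the DSE can be sketched with the classic dual-system estimator, in which the number of correct Census enumerations is divided by the P-sample match rate. This is a simplified textbook form applied to a single group, offered as an illustration under stated assumptions; it is not a reproduction of the 2010 CCM estimation procedure (see Mule 2008 for that). All inputs are hypothetical, as are the function and parameter names.

```python
def dual_system_estimate(census_count, whole_person_imputations,
                         correct_enumeration_rate,
                         p_sample_total, p_sample_matched):
    """Textbook dual-system (capture-recapture) population estimate.

    correct enumerations = (census_count - imputations) * correct rate
    match rate           = matched P-sample people / all P-sample people
    estimate             = correct enumerations / match rate
    """
    if p_sample_total <= 0 or p_sample_matched <= 0:
        raise ValueError("P-sample counts must be positive")
    correct = ((census_count - whole_person_imputations)
               * correct_enumeration_rate)
    match_rate = p_sample_matched / p_sample_total
    return correct / match_rate


# Hypothetical inputs for one group.
census = 1_000_000
dse = dual_system_estimate(
    census_count=census,
    whole_person_imputations=20_000,   # taken from census records
    correct_enumeration_rate=0.95,     # estimated from the E-sample
    p_sample_total=10_000,             # people found by the PES
    p_sample_matched=9_000,            # of those, matched to the Census
)
net_undercount_pct = 100.0 * (dse - census) / dse
print(f"DSE population estimate: {dse:,.0f}")
print(f"Net undercount: {net_undercount_pct:.2f}%")
```

The estimator only works as intended if being counted in the Census and being found by the PES are independent events for the people in the group, an assumption taken up in the discussion of limitations below.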

3.4 Strengths and Limitations of DA and DSE Methods

Both the DA and DSE methods for evaluating Census results have strengths and limitations which are discussed below.

There are four major limitations to DA. First, coverage estimates from DA are routinely available only for the nation as a whole. Because many people move after they are born, estimating coverage for subnational geographic units is difficult. DA only tracks in- and out-migration at the national level.

The population age 0–9 is an exception to this rule. Subnational analysis can be done for the population age 0–9 because the Census Bureau’s population estimates for age 0–9 are not linked to the previous Decennial Census (O’Hare 2014; Mayol-Garcia and Robinson 2011; Robinson et al. 1993; Adlakha et al. 2003; U.S. Census Bureau 2014; King et al. 2018). The 2010 estimates for the population age 0–9 are based on a DA-like method that uses births, deaths, and migration to estimate state and county populations.

Second, DA estimates are only available for a few race/ethnic groups. Historically the estimates have only been available for Black and Non-Black groups. This restriction is due to the lack of race specificity and consistency for data collected on the birth and death certificates historically. The only group that has been identified relatively consistently over time is the Black population, and the residual group is labeled Non-Black. In the 2010 DA program, estimates were produced for Black Alone and for Black Alone or in Combination, but only for the population under age 30.

However, in the past few decades comparisons of Black and Non-Black groups have become more problematic because Hispanics are mostly included in the Non-Black group. The Hispanic population is growing rapidly, and Hispanics have high net undercount rates in the Census.

The 2010 DA estimates also include data for Hispanics for the first time, but only for the population under age 20. Hispanics under age 20 were included in the DA estimates in 2010 because Hispanics have been consistently identified in birth and death certificates since 1990.

The third limitation of the DA estimates is that they only supply net undercount/overcount figures. A net undercount of zero could mean that no one was missed (omissions) and no one was double counted (erroneous enumerations), or it could mean, for example, that ten percent of the population was missed and ten percent was counted twice.
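The way offsetting errors can hide behind a net figure is easy to show numerically. In the sketch below, the population size and error counts are hypothetical, whole-person imputations are ignored for simplicity, and the function name is illustrative.

```python
def net_undercount_pct(true_population, omissions, erroneous_enumerations):
    """Net undercount as a percent of the true population.

    Census count = true population - omissions + erroneous enumerations.
    """
    census_count = true_population - omissions + erroneous_enumerations
    return 100.0 * (true_population - census_count) / true_population


# Scenario A: a perfect count, with no one missed and no one counted twice.
perfect = net_undercount_pct(1_000, omissions=0, erroneous_enumerations=0)

# Scenario B: 10% of people missed and 10% counted twice.
offsetting = net_undercount_pct(1_000, omissions=100,
                                erroneous_enumerations=100)

print(f"Scenario A net undercount: {perfect:.1f}%")
print(f"Scenario B net undercount: {offsetting:.1f}%")
```

Both scenarios yield a net undercount of zero, even though in the second one a fifth of the population is affected by a coverage error of one kind or the other.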

The fourth limitation of the DA methodology is the lack of any measures of uncertainty for the estimates similar to standard errors associated with estimates based on sample surveys. However, it should be noted that in the December 2010 DA release, the Census Bureau released five different estimate series based on five sets of assumptions about births, deaths, and net international migration to reflect some of the uncertainty regarding the DA estimates.

Despite these limitations, DA has been used for many decades, the underlying data and methodology are simple and robust, and it has provided useful information for those trying to understand the strengths and weaknesses of the U.S. Decennial Census. According to Robinson (2000, p. 1), “The national DA estimates have become the accepted benchmark for tracking historical trends in net Census undercounts and for assessing coverage differences by age, sex, and race (Black, all other).”

There are several important limitations of the DSE method which should also be acknowledged. First is the issue of correlation bias. Correlation bias means the kinds of people who are undercounted in the Census are also likely to be undercounted in the PES. This violates the independence assumption of the DSE methodology. If a group of people is likely to be missed in both the Census and the PES, the undercount estimate for that group will be biased downward. According to Martin (2007, p. 436), “The same groups that are affected by coverage errors in the Census also are affected in demographic surveys conducted by the U.S. Census Bureau and other organizations.”
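The downward bias can be demonstrated with expected values for a population made up of two subgroups, one easy to count and one hard to count in both the Census and the PES. The sketch below is an illustration under stated assumptions: the subgroup sizes and capture probabilities are hypothetical, captures are assumed independent within each subgroup, and the pooled estimator is the simple textbook form (Census count times PES count, divided by matches).

```python
def expected_dse(groups):
    """Expected pooled dual-system estimate for heterogeneous subgroups.

    groups: list of (size, p_census, p_pes) tuples, where the two
    probabilities are the chances of being counted in each system.
    """
    census = sum(n * pc for n, pc, _ in groups)
    pes = sum(n * pp for n, _, pp in groups)
    matches = sum(n * pc * pp for n, pc, pp in groups)
    return census * pes / matches


# Hypothetical population of 1,000: 800 easy-to-count people and
# 200 people who are hard to count in BOTH the Census and the PES.
mixed = [(800, 0.95, 0.90), (200, 0.70, 0.60)]
true_total = sum(n for n, _, _ in mixed)
census_count = sum(n * pc for n, pc, _ in mixed)

dse = expected_dse(mixed)
true_undercount = 100.0 * (true_total - census_count) / true_total
dse_undercount = 100.0 * (dse - census_count) / dse

print(f"True population: {true_total}, expected DSE: {dse:.1f}")
print(f"True net undercount: {true_undercount:.1f}%")
print(f"DSE-estimated net undercount: {dse_undercount:.1f}%")

# With identical probabilities in every subgroup the bias disappears.
uniform = [(800, 0.9, 0.8), (200, 0.9, 0.8)]
print(f"Expected DSE with no heterogeneity: {expected_dse(uniform):.1f}")
```

When the hard-to-count subgroup is estimated separately, the independence assumption holds within it and the bias vanishes, which is one reason DSE estimates are typically built up from relatively homogeneous groups.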

The issue of correlation bias in the DSE approach has been discussed by other researchers (Wolter 1986; Wachter and Freedman 1999; Shores 2002; Shores and Sands 2003; National Research Council 2009). In the 2010 DSE estimates, the Census Bureau (U.S. Census Bureau 2012c) made an adjustment for correlation bias for some groups (Black men) but not for others (Hispanic men or young children).

Second, to tell if an individual was counted correctly in the Census, individuals in the Post-Enumeration Survey must be matched to those in the Census records. This raises a couple of potential problems. First, people don’t always provide their names consistently. For example, a person might be listed as John Jones in the Census, and Johnathan Jones Jr. in the PES. Deciding if these two entries are a match is not always clear. Often, but not always, the Census Bureau has additional information like address and birthdates to help with matching. Nonetheless the matching procedure allows potential error.
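The ambiguity in the John Jones example can be made concrete with a small approximate-matching sketch. This is purely illustrative and is not the Census Bureau’s matching procedure: it uses the `difflib` module from the Python standard library to score name similarity, and the normalization rules, the 0.75 threshold, the record layout, and the birthdates are all assumptions of the sketch.

```python
import re
from difflib import SequenceMatcher

SUFFIXES = {"jr", "sr", "ii", "iii"}  # assumed list of name suffixes


def normalize_name(name):
    """Lowercase, strip punctuation, and drop common suffixes."""
    cleaned = re.sub(r"[^a-z\s]", "", name.lower())
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(tokens)


def name_similarity(name_a, name_b):
    """Similarity score between 0 and 1 for two normalized names."""
    return SequenceMatcher(None, normalize_name(name_a),
                           normalize_name(name_b)).ratio()


def is_probable_match(rec_a, rec_b, threshold=0.75):
    """Treat two records as a probable match if names are similar and
    birthdates, when both are present, agree."""
    if rec_a.get("birthdate") and rec_b.get("birthdate"):
        if rec_a["birthdate"] != rec_b["birthdate"]:
            return False
    return name_similarity(rec_a["name"], rec_b["name"]) >= threshold


# Hypothetical Census and PES records.
census_rec = {"name": "John Jones", "birthdate": "1975-03-14"}
pes_rec = {"name": "Johnathan Jones Jr.", "birthdate": "1975-03-14"}
other_rec = {"name": "Mary Smith", "birthdate": "1975-03-14"}

print(is_probable_match(census_rec, pes_rec))
print(is_probable_match(census_rec, other_rec))
```

Any fixed threshold will accept some false matches and reject some true ones, which is the potential for error noted above; a record with no name at all cannot be scored this way.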

The DSE approach is also hindered sometimes because there is little or no information for some people included in the Census. For example, as a last resort an enumerator may contact a neighbor to find out about people living in a household and the neighbor may say there are two adults and a young child, but no names or ages are provided. These people are included in the Census, but there is not enough information about them in the Census records to allow matching to the PES records.

Third, the method relies heavily on the memory of individuals. In August or September after the April 1 Census, respondents were asked to list all members of their household as of April 1. For some households and some individuals this is a challenge. As Martin (2007, p. 429) states, “Respondents interviewed months after April 1 may find it difficult to recall accurately when a move occurred.”

On the other hand, the DSE method has several advantages. One advantage of the DSE method is that the Census Bureau controls all the data collection (unlike DA where they depend on vital records data). Therefore, the concepts and questions used in the PES can be made identical to those used in the Census. For example, questions about race can be asked the same way in the Census and the PES.

A major advantage of the DSE method is that it can be used to ascertain components of Census coverage such as omissions, and erroneous inclusions. This provides a much richer picture of Census coverage than simply looking at net undercounts and net overcounts.

Because the DSE is based on a carefully drawn sample, the coverage estimates include standard measures of uncertainty. DSE data can provide subnational estimates, although how far this is feasible depends on sample size. In the 2010 CCM, no state had a net undercount that was statistically significantly different from zero (U.S. Census Bureau 2012b). This is likely related, at least in part, to a relatively small sample in some states.

3.5 Consistencies and Inconsistencies Between DA and DSE Results

Table 3.1 shows differences between net undercount estimates of DA and DSE in the 2010 Census for several age-sex groups. For the most part, the results of DA and DSE are relatively consistent. Generally, the groups that have a high net undercount in DA also have a high net undercount in DSE.

Table 3.1 Comparison of 2010 census net coverage error from DA and DSE for demographic groups

For all adult age groups examined, the differences are less than 1.6 percentage points. However, for the population age 0–4, the difference is 3.9 percentage points and for age 5–9 the difference is 2.5 percentage points.

O’Hare et al. (2016) provide detailed documentation of the consistencies and inconsistencies between DSE and DA estimates for young children. After close examination of the differences in the net undercount estimates for children, O’Hare et al. (2016, p. 702) conclude, “…the DSE approach may underestimate the net undercount of young children due to correlation bias.” The inconsistency of undercount estimates for young children from DA and DSE has been noted before (U.S. Census Bureau 2003, p. v; National Research Council 2004, p. 254). Most experts agree that DA is a better method for measuring the net undercount of young children (U.S. Census Bureau 2014). Given the problems with the DSE estimates for young children, they are not included in the DSE tables in the remainder of this book. To the best of my knowledge, there are no plans to change the DSE methodology in the 2020 Census to eliminate this problem.

Based on a special analysis which takes advantage of the strengths of both the DA and DSE methods the U.S. Census Bureau (2016) produced reliable omissions rates for young children and used the same method to produce omissions rates for other age/sex groups. Net undercount rates and omissions rates for several groups defined by age and sex based on adjusted omissions rates are shown in Table 3.2.

Table 3.2 Comparison of demographic analysis-based estimates of omissions and net coverage error in the 2010 census by age and sex

Groups that have high net undercount rates typically have relatively high omissions rates, but not always. Note that young children (age 0–4) had the highest net undercount rate and the highest omissions rate, but males age 18–29 had a relatively low net undercount rate and a relatively high omissions rate. For people in this group, the high omissions rate is offset by a high erroneous enumeration rate, leading to a low net undercount rate. This issue is explored in Chap. 5.

3.6 Measuring the Net Undercount by Race

Historically, Black is the only race group that has been coded relatively consistently in birth and death certificate data, so it is the only group for which DA estimates could be produced. The residual category is labeled Non-Black.

Different data from the Census have been used to compare Census and DA results for Blacks over time. Prior to the 1980 Census, the U.S. Decennial Census figures that were used to compare with the DA estimates for Blacks were the reported U.S. Decennial Census figures for Blacks. In 1980, the Census Bureau compared the DA estimates to a modified file which assigned people in the “some other race” category to a Black or Non-Black category (Fay et al. 1988). In 1990, the Census Bureau used the race of father from the birth certificate to assign race to newborns and then compared DA estimates for Blacks to the MARS (Modified Age, Race, and Sex) file from the Census. For 2000, the Census Bureau used race of father from the birth certificate to assign race to newborns and then DA estimates were compared to an average of Black alone and Black alone or in combination based on the Census Bureau’s modified race file (U.S. Bureau of the Census 2003). While there are some inconsistencies in the way race has been measured from one Census to another, it is generally felt the DA estimates can be used to compare undercount estimates for Blacks since 1950 (National Research Council 2004; Velkoff 2011).

The revised DA estimates issued in May 2012 are used for most of the 2010 DA estimates in this book. In May 2012 the Census Bureau issued revised Demographic Analysis estimates for the total population, the Black Alone population, the Black Alone or in Combination population, the Not Black Alone population, and the Not Black Alone or in Combination population (U.S. Census Bureau 2012a). The estimates for the Black Alone or in Combination populations were only provided for the population below age 30.

The assessment of Black and Non-Black undercount differentials has always had some methodological issues, but those issues have increased dramatically since the 2000 Census, when respondents were first allowed to mark more than one race. See Robinson (2010) for a good general discussion of issues associated with racial classifications (U.S. Office of Management and Budget 1997) in the U.S. Decennial Census and the vital events registers.

Key to the DA method for Blacks and Non-Blacks is making the race categories in the vital events data and the Decennial Census data consistent. There are multiple problems in trying to make the racial categories used in the U.S. Decennial Census comparable to the race data collected on birth and death certificates. In discussing the use of vital statistics for DA estimates by race, the U.S. Census Bureau (Devine et al. 2010, p. 4) concludes, “…developing the estimates for DA race categories comes with a more complex, and substantial set of challenges.” For a more detailed description of these problems, see O’Hare (2015, Sect. 2.5).

For example, the “Some Other Race” category is a response category for the race question in the U.S. Decennial Census but not in birth or death certificates. So, to make comparisons, people who were in the “some other race” category in the Census had to be re-assigned to one or more of the five major race categories.

A second issue is that U.S. Decennial Census respondents in 2000 and 2010 could mark more than one race, but it was not until 2003 that the federal government issued new standard birth certificate forms allowing parents to mark more than one race. Moreover, birth and death certificate data are collected by states, and states did not all adopt the new forms immediately. DA required that the mixed-race data from the birth (and death) certificates be put into Black and Non-Black categories, based on both the single-race and multiple-race responses reported by mothers and fathers.

Another issue is that birth certificate forms only record the race of the mother and father while the race of a child is asked directly in the Decennial Census. Thus, for birth certificate data, the race of the newborn must be inferred from the race of the parent(s). This is further complicated by a significant level of missing data. While data on the race of mother is relatively complete, many birth certificates are missing data on the race of the father. In 2009, 19% of birth certificate forms did not contain the race of the father (Martin et al. 2011).

Figure 3.1 shows inconsistencies between Black Alone and Black Alone or in Combination are biggest for the youngest age groups. The youngest age groups are the ones that are dependent on matching the race data from the new birth certificates to the race data from the Census.

Fig. 3.1 2010 census undercounts for black alone and black alone or in combination by single year of age: 0–17. Source: U.S. Census Bureau, May 2012 DA Release

Given the issues described above, one should view DA estimates for Blacks (alone or alone or in combination) cautiously. Small differences or small changes over time could be due to methodological issues rather than real changes or differences.

3.7 Summary

The main methods for measuring Census undercounts are Demographic Analysis (DA) and Dual-Systems Estimates (DSE). The DA method and the DSE method both have strengths and weaknesses. These two methods produce results that are fairly consistent for all age groups except young children. For the population age 0–4, the DA method estimates a net undercount of 4.6% compared to 0.7% for the DSE method (the DSE method was called Census Coverage Measurement in the 2010 Census). The DA method is widely viewed as the better method for estimating the net undercount of young children because it relies heavily on vital events data, which are of very high quality.

Given the changing methods for identifying the race of people in the Census and in birth/death certificates, and the complications of trying to make the racial categories from the birth certificates consistent with those offered in the Census, DA net undercount estimates for Blacks should be used cautiously.