Quality and the 2010 Census

Population Research and Policy Review

Abstract

The U.S. Census Bureau has a long tradition of evaluating the results of its censuses. This paper presents evaluation results from the 2010 Census, comparing them to earlier results. The paper discusses net coverage at the national and state level, as well as by age, sex, race, and ethnic group. It discusses components of error, including estimated number missed and counted in error. It also presents data on whole-person and item imputation.

Notes

  1. Medicare is a federal health insurance program that covers most people aged 65 and over.

References

  • Cantwell, P. J., Hogan, H., & Styles, K. M. (2004). The Use of Statistical Methods in the U.S. Census: Utah v. Evans. The American Statistician, 58(3), 203–212.

  • Devine, J., Bhaskar, R., DeSalvo, B., Robinson, J. G., Scopilliti, M., & West, K. (2012). The Development and Sensitivity Analysis of the 2010 Demographic Analysis Estimates. U.S. Census Bureau, Population Division Working Paper No. 93. Retrieved from http://www.census.gov/people/files/popworkingpapers/DevelopmentandSensitivityAnalysis2010DA.pdf.

  • Dusch, G., & Meier, F. (2012). 2010 Census Content Reinterview Survey Evaluation Report. 2010 Census Planning Memoranda Series, No. 206.

  • ESCAP (2001). Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy, March 1, 2001. Retrieved from http://www.census.gov/dmd/www/pdf/Escap2.pdf.

  • Guarneri, C. E., & Dick, C. (2012). Methods of Assigning Race and Hispanic Origin to Births from Vital Statistics Data. Paper presented at the Federal Committee on Statistical Methodology Annual Meeting, Washington, D.C., January.

  • Mule, T. (2012). 2010 Census Coverage Measurement Estimation Report: Summary of Estimates of Coverage for Persons in the United States, DSSD 2010 Census Coverage Measurement Memorandum Series #2010-G-01. Retrieved from http://www.census.gov/coverage_measurement/pdfs/g01.pdf.

  • Mulry, M. H. and Hogan, H. (1986). Research Plan on Census Adjustment Standards. Proceedings of the Section on Survey Research Methods (pp. 566–570). Alexandria, VA: American Statistical Association. Retrieved from http://www.amstat.org/sections/SRMS/Proceedings/papers/1986_106.pdf.

  • Panel on Small Area Estimates (1980). Estimating Population and Income of Small Areas. National Academy of Sciences Committee on National Statistics. Washington, DC. See also Population Index 1988 Vol 54 No 3 Cover.

  • Robinson, J. G. (1987). Perspectives on the Completeness of Coverage of Population in the United States Decennial Censuses. Paper presented at the Annual Meeting of the Population Association of America, New Orleans, Louisiana. See also Population Index 1988 Vol 54 No 3 Cover.

  • U.S. Bureau of the Census (1960). The Post-Enumeration Survey: 1950. Technical Paper No. 4, Table A, p. 5.

  • U.S. Census Bureau (2011). Alternative Demographic Analysis Estimates of the Undercount for the Hispanic Population Under 20. Retrieved from http://www.census.gov/newsroom/releases/pdf/20101206_da_table_14.pdf on 12 November 2012.

  • U.S. Census Bureau (2012a). Alternative Demographic Analysis Estimates of the Undercount by Race. Retrieved from http://www.census.gov/popest/research/da-estimates/Table_3.pdf on 12 November 2012.

  • U.S. Census Bureau (2012b). 2010 Census Count Review Program Assessment Report, 2010 Census Planning Memoranda Series. June 21.

Acknowledgments

This paper would not be possible but for the skill and dedication of the Census staff that conducted the Census Coverage Measurement and the Demographic Analysis. Additionally, the authors wish to thank Tiffney Yowell for careful fact checking and Marjorie Hanson for helping us greatly improve the presentation.

Author information

Corresponding author

Correspondence to Howard Hogan.

Additional information

The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau.

Appendices

Appendix A. Overview of the Demographic Analysis (DA) Methodology

This methodological description focuses on the major components of the Demographic Analysis (DA) middle series. A more detailed description of the DA methodology is available in Devine et al. (2012). Each decade, key assumptions of the DA methodology are revisited and in some cases altered. For the DA in 2010, this led to changes in how race and Hispanic origin were assigned to births occurring after 1980, the use of an entirely different approach for estimating international migration for the period from 2000 to 2010, and the use of component-based estimates for all ages under 75.

The DA is an analysis of aggregate data to obtain estimates of the total population independent of the current census. Separate methods are used in DA to obtain estimates for two segments of the total population. Initially, the two segments were the population under age 65 and the population aged 65 and over.

Ages Under 75

The DA estimates for the population aged 0–74 (P0–74) are derived by the basic demographic accounting equation applied to each birth cohort:

$$ {\text{P}}_{0-74} = {\text{B}} - {\text{D}} + {\text{I}} - {\text{E}} $$
(1)

DA estimates for the population below age 75 are developed from a compilation of historic estimates of the components of population change (the cohort-component approach): births corrected for under-registration, beginning with April 1, 1935 (B); deaths to persons born since April 1, 1935 (D); immigrants below age 75 (I); and emigrants below age 75 (E).

For the DA in 2000, the cohort-component approach was not used for populations born prior to 1935 because the birth registration system did not include all states until 1933. For the DA in 2010, while the cohort-component approach was used initially for the population below age 65, data from 1935 to 1944 were carried forward to produce an estimate of the population aged 65–74. This resulted in an overlap between the estimates developed using the component approach and the Medicare-based estimates. A revised set of DA estimates was released that used the component-based estimates for ages 65–74.

In 2010, births represented by far the largest component of the DA estimates. The immigration component is the second largest, followed by deaths and emigration. The DA estimates produced in 2000 represent the starting point for the development of the 2010 DA estimates for the population under age 75. To obtain the DA estimates for April 1, 2010, births and immigration were added and deaths and emigration were subtracted.

The actual calculations used to develop the DA estimates are carried out for single-year birth cohorts by sex, race (Black and non-Black), and ethnicity (Hispanic and non-Hispanic). For example, the estimate of the population aged 58 on April 1, 2010, was based on births from April of 1951 through March of 1952 (corrected for under-registration), reduced by deaths to the cohort in each year between 1951 and 2010, with additions and subtractions for estimated immigration and emigration of the cohort between 1951 and 2010.
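As a minimal sketch of the accounting in Eq. (1), the snippet below applies B − D + I − E to one birth cohort and sums across cohorts. The `CohortComponents` layout, the function names, and all numbers are hypothetical and serve only as illustration; the actual DA calculations are carried out separately by sex, race, and ethnicity, with inputs this sketch does not model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CohortComponents:
    """Cumulative components of change for one birth cohort (hypothetical layout)."""
    births: float      # B: births, corrected for under-registration
    deaths: float      # D: deaths to the cohort up to the estimate date
    immigrants: float  # I: immigrants belonging to the cohort
    emigrants: float   # E: emigrants belonging to the cohort


def cohort_population(c: CohortComponents) -> float:
    """Apply P = B - D + I - E to a single birth cohort."""
    return c.births - c.deaths + c.immigrants - c.emigrants


def component_based_total(cohorts: Iterable[CohortComponents]) -> float:
    """Sum the single-cohort estimates to get the component-based total."""
    return sum(cohort_population(c) for c in cohorts)


# Made-up numbers, for illustration only.
example = CohortComponents(births=1000.0, deaths=120.0, immigrants=90.0, emigrants=20.0)
print(cohort_population(example))  # 950.0
```

Keeping one record per birth cohort mirrors the description above, in which each single-year cohort is followed from birth to the estimate date.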

Ages 75 and Over

Administrative data on aggregate Medicare enrollments are used to develop an estimate of the population aged 75 and over (P75+):

$$ {\text{P}}_{75 + } = {\text{M}} + {\text{m}} $$
(2)

where M is the aggregate Medicare enrollment and m is the estimate of the number not enrolled in Medicare (see Note 1). Although Medicare enrollment is generally presumed to be quite complete, under-enrollment factors are applied to account for individuals who are not enrolled. Some groups are not eligible to enroll, such as federal employees who are covered under a specific retirement program; some may delay enrollment until a date later than when they became eligible; and some may never enroll. Under-enrollment factors are based on estimates of Medicare coverage developed from the Current Population Survey (CPS) and data on age at enrollment in the Medicare file.
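The source does not state how the under-enrollment factors enter Eq. (2). One plausible reading, shown here purely as an assumption, treats the factor as an enrollment rate r (the share of the population in an age group that is enrolled), so that m = M(1 − r)/r. The function name, the parameter `enrollment_rate`, and the numbers are hypothetical.

```python
def medicare_based_population(enrolled: float, enrollment_rate: float) -> float:
    """Return P = M + m, with m implied by an assumed enrollment rate.

    `enrollment_rate` is a hypothetical parameter: the assumed share of the
    population in this age group that is enrolled in Medicare.
    """
    if not 0.0 < enrollment_rate <= 1.0:
        raise ValueError("enrollment_rate must be in (0, 1]")
    # m: people not enrolled, implied by the assumed rate
    not_enrolled = enrolled * (1.0 - enrollment_rate) / enrollment_rate
    return enrolled + not_enrolled


# Made-up numbers: 9,800 enrollees and an assumed 98% enrollment rate
# imply a population of roughly 10,000.
print(medicare_based_population(enrolled=9_800.0, enrollment_rate=0.98))
```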

The estimates for the population developed using the cohort-component approach are combined with the Medicare-based estimates to produce the total DA population.

Appendix B. Additional Background on the Census Coverage Measurement Program in 2010

This appendix provides a brief overview of the design and methods used to measure net coverage and to estimate the components of census coverage in the 2010 U.S. Census.

A post-enumeration survey can be designed to measure the coverage of the people and the housing units in a census. In recent decades, the Census Bureau has used two samples, the independent sample (P sample) and the enumeration sample (E sample). The former is a sample of housing units and persons selected independent of the census and designed to produce an estimate of how many people were missed in the census. Members of P-sample households are interviewed and then matched to the census on a case-by-case basis to determine whether they were enumerated in the census or missed. The E sample is a sample of census enumerations or records, typically (although not necessarily) in the same areas as the P sample. It is designed to produce an estimate of the number of erroneous inclusions.

CCM Field and Matching Operations

The 2010 Census Coverage Measurement (CCM) survey was a probability sample of approximately 170,000 housing units in the U.S. and 7,500 housing units in Puerto Rico. The CCM primary sampling unit was a block cluster, which contained one or more geographically contiguous census blocks. When joining blocks together into clusters, the Census Bureau considered the costs and timing of field operations as well as the implications for the resulting statistical estimates. A stratified sample of block clusters was selected for each state; Washington, D.C.; and Puerto Rico. Several characteristics of the block clusters were used to stratify the sample, such as the number of housing units in the cluster, whether the cluster was made up primarily of owners or renters, and whether it was on an American Indian reservation. For estimates of housing units, the operations were largely analogous to those described below for persons.

The post-enumeration survey approach to measuring census coverage relies on maintaining independence between census and survey operations. For this reason, from August through December of 2009, CCM staff independently listed all addresses in P-sample block clusters without the use of any information, maps, or materials used to form the address list for the 2010 Census and without participation or assistance from staff working on census operations. Later, interviewing and other operations were also scheduled in the field to minimize interaction between the census and CCM staff.

From August to mid-October of 2010, which was several months after Census Day (April 1, 2010), the Census Bureau attempted to conduct a CCM person interview of all P-sample households in each sample block cluster. Field interviewers collected information about the current residents of the sample housing unit (that is, those who lived there on Census Day and those who moved into the unit after Census Day) and people who had moved out of the unit between Census Day and the time of the CCM interview. The interviewers collected demographic information on each person, including the name, sex, age, date of birth, race, Hispanic origin, relationship to the householder, and whether the household owned or rented the unit. People were asked whether they had lived at other addresses between the time of the census and the day of the interview, and (if so) when.

Beginning in November 2010, an extensive computer-matching operation was conducted. In each sample block cluster, a computerized search of census records in the “local search area”—the same sample block cluster and one ring of surrounding blocks—tried to determine matches between P-sample persons and census enumerations. These matches represented people who were correctly enumerated in the census and the CCM. In addition, P-sample persons were matched to other P-sample persons within the search area to identify possible duplicate enumerations in the P sample. Finally, records in the E sample were matched to the entire census in an effort to identify duplicate census records (erroneous inclusions).

After the completion of computer matching, the matching staff reviewed many cases and tried to clerically match those cases that the computer did not match and to resolve those the computer identified as possible matches. In addition, matchers conducted clerical searches for duplicate persons. Early in 2011, following the clerical match, some cases in which the information remained uncertain as to where the person lived and when, or if the record matched or not, were sent back into the field for follow-up and to attempt a final resolution.

CCM Estimation

After all field operations and data collection, statistical procedures were conducted to address missing data. Like the post-enumeration programs of prior decades, the 2010 CCM program measured net coverage error using a technique called dual system estimation. The strategy is based on capture–recapture methodology. As an example, to estimate the number of fish in a pond using this approach, one captures a set of fish, tags them for later identification, and throws them back into the pond. After the fish have had time to disperse sufficiently, one captures a second set of fish. This second set of fish would be the “recapture.” Then, one counts the number of recaptured fish and also how many of them are tagged.

If we can assume (1) that the capture and recapture are independent, (2) that the chance of being captured initially is the same for each fish, and (3) that the chance of being recaptured is the same for each fish (although not necessarily equal to the chance of initial capture), then an estimate of the fish population in the pond can be made. Turning from estimating fish to estimating people, the census enumeration is the initial capture, while the independent sample represents the recapture. People found in the independent sample who are matched to the census list are analogous to the recaptured fish that were tagged, that is, captured both times.
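Under the three assumptions above, the fish example reduces to a one-line calculation, often called the Lincoln-Petersen estimator: the tagged share of the second catch estimates the share of the whole pond that was caught the first time. The sketch below uses hypothetical function and argument names and made-up counts.

```python
def lincoln_petersen(first_capture: int, second_capture: int, tagged_in_second: int) -> float:
    """Estimate population size from a capture and a recapture.

    first_capture:    fish caught, tagged, and released in the first pass
    second_capture:   fish caught in the second pass
    tagged_in_second: fish in the second pass that carry a tag
    """
    if tagged_in_second <= 0:
        raise ValueError("at least one tagged fish must be recaptured")
    if tagged_in_second > min(first_capture, second_capture):
        raise ValueError("tagged recaptures cannot exceed either capture")
    # tagged_in_second / second_capture estimates first_capture / N,
    # so N is estimated by first_capture * second_capture / tagged_in_second.
    return first_capture * second_capture / tagged_in_second


# 100 fish tagged; 80 caught later, 20 of them tagged.
print(lincoln_petersen(100, 80, 20))  # 400.0
```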

Under the assumptions mentioned above, an estimate of the population can be derived by starting with the number of correct census inclusions and inflating it by the inverse of the rate of matches among the people in the P sample (analogous to the inverse of the rate of tagged fish among those in the second capture). The “correct” census inclusions do not include duplicates or other erroneous enumerations. One can think of these other erroneous inclusions as captured “frogs”; frogs do not contribute to the size of the fish population in the pond.

The correct enumerations also do not include what are referred to as whole-person imputations. These imputations may well represent people who should correctly be counted in the census enumeration. However, because their census records have so little valid information, we cannot match them to the P sample and verify that the enumeration is correct. Removing them from the set of correct census enumerations tends to balance what appear to be omissions as estimated by the P sample.

Although we have records from the entire census, determining whether a census enumeration is correct or erroneous must be done on a sample basis for reasons of operational expense and time. Based on the E sample, we estimate the number of correct census inclusions.
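The pieces described so far can be put together in a simplified sketch: remove whole-person imputations from the census count, scale by the correct-enumeration rate estimated from the E sample, and inflate by the inverse of the P-sample match rate. The function and argument names are hypothetical, the numbers are made up, and sampling weights and missing-data adjustments are ignored, so this is an illustration of the idea rather than the CCM production estimator.

```python
def dual_system_estimate(
    census_count: float,
    whole_person_imputations: float,
    e_sample_total: float,
    e_sample_correct: float,
    p_sample_total: float,
    p_sample_matched: float,
) -> float:
    """Simplified, unweighted dual system estimate of the true population."""
    if min(e_sample_total, p_sample_total, p_sample_matched) <= 0:
        raise ValueError("sample totals and matches must be positive")
    # Share of E-sample records judged correct (no duplicates or other errors).
    correct_rate = e_sample_correct / e_sample_total
    # Share of P-sample people found in the census.
    match_rate = p_sample_matched / p_sample_total
    # Estimated number of correct census inclusions, excluding imputations.
    correct_inclusions = (census_count - whole_person_imputations) * correct_rate
    # Inflate by the inverse of the match rate.
    return correct_inclusions / match_rate


# Made-up numbers, for illustration only.
estimate = dual_system_estimate(
    census_count=1_000, whole_person_imputations=50,
    e_sample_total=100, e_sample_correct=95,
    p_sample_total=100, p_sample_matched=95,
)
print(round(estimate, 1))
```

In this example the correct-enumeration rate and the match rate happen to be equal, so the estimate is the census count net of imputations.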

Recall the assumption that all fish have an equal chance of being captured initially, and likewise an equal chance of being recaptured. The effectiveness of the dual system estimator generally increases if the underlying chance of being enumerated is the same for all people in the E sample and for all people in the P sample. As the probabilities become more heterogeneous, an error labeled “correlation bias” can increase.

In the estimation approach used in recent decades, this issue was addressed by defining a large number of mutually exclusive groups, often called post-strata, such that the probability of an enumeration being correct was more homogeneous among the sample people within the same post-stratum, and similarly for the probability of a match. To estimate the size of the whole population, one then computed the estimated total number of people within each post-stratum and added the estimates across all post-strata. The post-strata were defined by variables that were available (observed or imputed in the samples) and highly correlated with correct enumeration and match status. These variables included demographic characteristics, such as age, sex, race, and owner/renter status, as well as operational variables, such as type of census enumeration area and rate of mail return for the block cluster. The 2010 program applied a similar method, made more efficient through the use of a more general modeling approach.
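A small numerical sketch shows why grouping people with similar capture probabilities helps. It uses expected counts for two hypothetical groups of 1,000 people each, with the census and the survey independent within each group; the group names, probabilities, and helper functions are all assumptions made for illustration. It covers only the classic post-stratified calculation, not the more general modeling approach used in 2010.

```python
from typing import Dict, Tuple

# (correct census inclusions, P-sample people, P-sample matches) per post-stratum
Counts = Tuple[float, float, float]


def dse(census_correct: float, p_total: float, p_matched: float) -> float:
    """Dual system estimate for one group."""
    return census_correct * p_total / p_matched


def post_stratified_dse(strata: Dict[str, Counts]) -> float:
    """Estimate each post-stratum separately, then add the estimates."""
    return sum(dse(*counts) for counts in strata.values())


def expected_counts(true_size: float, p_census: float, p_survey: float) -> Counts:
    """Expected counts when the two captures are independent within the group."""
    return (true_size * p_census, true_size * p_survey, true_size * p_census * p_survey)


strata = {
    "high_coverage": expected_counts(1000, 0.95, 0.95),
    "low_coverage": expected_counts(1000, 0.70, 0.70),
}

# Pooling ignores the heterogeneity: add the columns first, then estimate once.
pooled = dse(*(sum(column) for column in zip(*strata.values())))
stratified = post_stratified_dse(strata)

# The pooled estimate falls short of the true total of 2,000;
# the post-stratified estimate recovers it.
print(round(pooled), round(stratified))
```

The shortfall in the pooled estimate arises because people who are hard to count in the census are also hard to find in the survey, which is the kind of heterogeneity that post-stratification is meant to reduce.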

From the CCM program, the Census Bureau produced estimates of net error and the components of census coverage for important demographic groups, such as those defined by sex, race, and Hispanic origin; for specified levels of geography, such as states and large counties, cities, and towns; and by various census operational categories. Examples of operational categories include the major types of enumeration areas (e.g., mail-out/mail-back) and the periods when the census response was collected.

About this article

Cite this article

Hogan, H., Cantwell, P.J., Devine, J. et al. Quality and the 2010 Census. Popul Res Policy Rev 32, 637–662 (2013). https://doi.org/10.1007/s11113-013-9278-5

  • DOI: https://doi.org/10.1007/s11113-013-9278-5
