Big Microdata for Population Research

Demography

Abstract

This article describes an explosion in the availability of individual-level population data. By 2018, demographic researchers will have access to over 2 billion records of accessible microdata from over 100 countries, dating from 1703 to the present. Another 2 to 4 billion records will be available through restricted-access data enclaves. These new resources represent a new kind of data that will enable transformative research on demographic and economic change and the spatial organization of society.

Notes

  1. A few national statistical offices—including those of Australia, Brazil, China, Colombia, Mexico, Norway, and South Africa—made internal microdata available to selected academic researchers by special arrangement. Statistics Canada began producing Public Use Microdata Files (PUMF) in 1974, and the United Kingdom created Samples of Anonymized Records (SARS) in 1993.

  2. Early national historical samples were created for Argentina and Canada, but they did not become broadly accessible until much later (Darroch and Ornstein 1979; Somoza and Lattes 1967).

  3. This estimate covers the costs of dual keying only; data cleaning, checking, and reconciling two copies would incur additional expense. The cost estimate assumes the average Ancestry.com keying rate and the U.S. average salary for data-entry keyers according to the Bureau of Labor Statistics (2011).

References

  • Abadi, D. J., Madden, S. R., & Hachem, N. (2008). Column-stores vs. row-stores: How different are they really? In SIGMOD’08, Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (pp. 967–980). New York: Association for Computing Machinery.

  • Ancestry.com. (2008). FamilySearch and Ancestry.com team to publish new images and enhanced indexes to the U.S. censuses [Press release]. Retrieved from http://corporate.ancestry.com/press/press-releases/2008/07/familysearch-and-ancestry.com-team-to-publish-new-images-and-enhanced-indexes-to-the-u.s.-censuses/

  • Berg, S. V. (1973). The CPS viewed from the outside. Annals of Economic and Social Measurement, 2, 99–105.

  • Brunsman, H. G. (1963). Letter to Dr. Joshua Lederberg, April 22, 1963. Joshua Lederberg Papers, National Library of Medicine. Retrieved from http://profiles.nlm.nih.gov/BB/G/K/F/C/_/bbgkfc.pdf

  • Bureau of Labor Statistics. (2011). Occupational employment and wages, May 2011. 43–9021 Data Entry Keyers. Retrieved from http://www.bls.gov/oes/current/oes439021.htm

  • Darroch, G., & Ornstein, M. (1979). Canadian historical social mobility project. National sample of the 1871 census of Canada [computer file]. Toronto, Ontario: York Institute for Social Research and Department of Sociology, York University.

  • Duncan, O. D. (1974). Developing social indicators. Proceedings of the National Academy of Sciences, 71, 5096–5102.

  • Duncan, J. W., & Shelton, W. C. (1978). Revolution in United States government statistics: 1926–1976. U.S. Department of Commerce Office of Statistical Policy and Standards. Washington, DC: U.S. Government Printing Office.

  • Esteve, A., Lesthaeghe, R., & López, A. (2012). The Latin American cohabitation boom, 1970–2007. Population and Development Review, 38, 55–82.

  • Ferrie, J. P. (2005). History lessons: The end of American exceptionalism? Mobility in the United States since 1850. Journal of Economic Perspectives, 19, 199–215.

  • Giannotti, F., Pedreschi, D., Pentland, A., Lukowicz, P., Kossmann, D., Crowley, J., & Helbing, D. (2013). A planetary nervous system for social mining and collective awareness. European Physical Journal, 214, 49–75.

  • Groves, R. (2011). “Designed data” and “organic data.” Director’s Blog, U.S. Census Bureau. Retrieved from http://directorsblog.blogs.census.gov/2011/05/31/designed-data-and-organic-data/

  • Hauser, P. M. (1960). The 1960 census as an instrument for demographic research. Population Index, 26, 199–211.

  • Keller, S. A., Koonin, S. E., & Shipp, S. (2012). Big data and city living: What can it do for us? Significance, 9, 4–7.

  • King, G. (2011). Ensuring the data-rich future of the social sciences. Science, 331, 719–721.

  • Kraus, R. S. (2011). Statistical déjà vu: The National Data Center Proposal of 1965 and its descendants (Working paper). Washington, DC: U.S. Census Bureau. Retrieved from http://www.census.gov/history/www/reference/publications/working_papers.html

  • Lee, B. A., Reardon, S. F., Firebaugh, G., Farrell, C. R., Matthews, S. A., & O’Sullivan, D. (2008). Beyond the census tract: Patterns and determinants of racial segregation at multiple geographic scales. American Sociological Review, 73, 766–791.

  • Logan, J., & Zhang, W. (2012). White ethnic residential segregation in historical perspective: US cities in 1880. Social Science Research, 41, 1292–1306.

  • Long, J., & Ferrie, J. P. (2007). The path to convergence: Intergenerational occupational mobility in Britain and the U.S. in three eras. The Economic Journal, 117, C61–C71.

  • Mason, W. M., Taeuber, K. E., & Winsborough, H. (1977). Old data for new research (CDE Working Paper 77–3). Madison, WI: Center for Demography and Ecology.

  • McCaa, R., & Ruggles, S. (2002). The census in global perspective and the coming microdata revolution. Scandinavian Population Studies, 13, 7–30.

  • Owsley, F. L., & Owsley, H. C. (1940). The economic basis of society in the late ante-bellum South. Journal of Southern History, 6, 24–25.

  • Roberts, E., Ruggles, S., Dillon, L. Y., Gardarsdottir, O., Oldervoll, J., Thorvaldsen, G., & Woollard, M. (2003). The North Atlantic Population Project: An overview. Historical Methods, 36, 80–88.

  • Ruggles, S. (2005). The Minnesota Population Center data integration projects: Challenges of harmonizing census microdata across time and place. In 2005 Proceedings of the American Statistical Association, Government Statistics Section (pp. 1405–1415). Alexandria, VA: American Statistical Association.

  • Ruggles, S. (2011). Intergenerational coresidence and family transitions in the United States, 1850–1880. Journal of Marriage and Family, 73, 136–148.

  • Ruggles, S., Alexander, J. T., Genadek, K., Goeken, R., Schroeder, M., & Sobek, M. (2012). Integrated Public Use Microdata Series: Version 5.0 [Machine-readable database]. Minneapolis: University of Minnesota.

  • Ruggles, S., Roberts, E., Sarkar, S., & Sobek, M. (2011a). The North Atlantic Population Project: Progress and prospects. Historical Methods, 44, 1–7.

  • Ruggles, S., Schroeder, M., Rivers, N., Alexander, J. T., & Gardner, T. K. (2011b). Frozen film and FOSDIC forms: Restoring the 1960 census of population. Historical Methods, 44, 69–78.

  • Sobek, M., Cleveland, L., Flood, S., Ruggles, S., & Schroeder, M. (2011). Big data: Large-scale historical infrastructure from the Minnesota Population Center. Historical Methods, 44, 61–68.

  • Somoza, J., & Lattes, A. (1967). Muestras de los dos primeros censos nacionales de población, 1869 y 1895 [Samples of the first two national censuses of population, 1869 and 1895] (Documento de Trabajo No. 46). Buenos Aires, Argentina: Instituto Torcuato Di Tella, Centro de Investigaciones Sociales.

  • U.S. Census Bureau. (1964). Censuses of population and housing, 1960: Two national samples of the population of the United States, 1/1,000, 1/10,000: Description and technical documentation. Washington, DC: U.S. Government Printing Office.

Acknowledgments

The data described in this article are supported by grants and contracts from the National Science Foundation (ACI 0940818, SES 0851414, SES 0851417, and SES 1155572) and the National Institutes of Health (R01 HD073967, R01 AG041831, R01 HD047283, R01 HD052110, R24 HD41023, R01 HD060676, R01 HD047283, R01 HD041575, R01 HD044154, and R01 HD43392). My thanks for the helpful comments and suggestions of Catherine Fitch, Miriam King, Robert McCaa, and anonymous reviewers.

Author information

Corresponding author

Correspondence to Steven Ruggles.

About this article

Cite this article

Ruggles, S. Big Microdata for Population Research. Demography 51, 287–297 (2014). https://doi.org/10.1007/s13524-013-0240-2

  • DOI: https://doi.org/10.1007/s13524-013-0240-2
