Promises and Pitfalls of Using Digital Traces for Demographic Research


The digital traces that we leave online are increasingly fruitful sources of data for social scientists, including those interested in demographic research. The collection and use of digital data also present numerous statistical, computational, and ethical challenges, motivating the development of new research approaches to address these burgeoning issues. In this article, we argue that researchers with formal training in demography (those who have a history of developing innovative approaches to using challenging data) are well positioned to contribute to this area of work. We discuss the benefits and challenges of using digital trace data for social and demographic research, and we review examples of current demographic literature that creatively use digital trace data to study processes related to fertility, mortality, and migration. Focusing on Facebook data for advertisers (a novel “digital census” that has largely been untapped by demographers), we provide illustrative and empirical examples of how demographic researchers can manage issues such as bias and representation when using digital trace data. We conclude by offering our perspective on the road ahead regarding demography and its role in the data revolution.


  1. For a review of these data, see Ruggles (2014) in a past issue of Demography.

  2. See for an overview of Twitter’s public streaming APIs.

  3. See Facebook’s Data Policy.

  4. See Twitter’s Privacy Policy for more information.

  5. See for more information.

  6. See for more information on IUSSP workshop events.

  7. Find more at

  10. See Preston et al. (2001:28–30) for details on conducting an age decomposition analysis.


  1. Adams, J., & Brueckner, H. (2015). Wikipedia, sociology, and the promise and pitfalls of Big Data. Big Data & Society, 2(2), 1–5.

  2. Alkema, L., Raftery, A. E., Gerland, P., Clark, S. J., & Pelletier, F. (2012). Estimating trends in the total fertility rate with uncertainty using imperfect data: Examples from West Africa. Demographic Research, 26(article 15), 332–361.

  3. Andrews, C., Fichet, E., Ding, Y., Spiro, E. S., & Starbird, K. (2016). Keeping up with the Tweet-dashians: The impact of “official” accounts on online rumoring. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (pp. 452–465). New York, NY: ACM.

  4. Ang, C., Bobrowicz, A., Schiano, D., & Nardi, B. (2013). Data in the wild: Some reflections. Interactions, 20(2), 39–43.

  5. Araújo, M., Mejova, Y., Weber, I., & Benevenuto, F. (2017). Using Facebook ads audiences for global lifestyle disease surveillance: Promises and limitations. In Proceedings of the 2017 ACM on Web Science Conference (pp. 253–257). New York, NY: ACM.

  6. Barberá, P. (2016). Less is more? How demographic sample weights can improve public opinion estimates based on Twitter data (Working paper). New York: Center for Data Science, New York University.

  7. Barry, B. (2006). Friends for better or for worse: Interracial friendships in the United States as seen through wedding photos. Demography, 43, 491–510.

  8. Belli, R. F., Traugott, M. W., Young, M., & McGonagle, K. A. (1999). Reducing vote overreporting in surveys: Social desirability, memory failure, and source monitoring. Public Opinion Quarterly, 63, 90–108.

  9. Berinsky, A. J. (1999). The two faces of public opinion. American Journal of Political Science, 43, 1209–1230.

  10. Blei, D. M., & Smyth, P. (2017). Science and data science. Proceedings of the National Academy of Sciences, 114, 8689–8692.

  11. Billari, F. C., D’Amuri, F., & Marcucci, J. (2013, April). Forecasting births using Google. Paper presented at the annual meeting of the Population Association of America, New Orleans, LA.

  12. Billari, F. C., & Zagheni, E. (2017). Big data and population processes: A revolution? In A. Petrucci & R. Verde (Eds.), SIS 2017. Statistics and Data science: New challenges, new generations. 28–30 June 2017 Florence (Italy). Proceedings of the Conference of the Italian Statistical Society (pp. 167–178). Firenze, Italy: Firenze University Press.

  13. Blumenstock, J., Cadamuro, G., & On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science, 350, 1073–1076.

  14. Blumenstock, J., & Eagle, N. (2010). Mobile divides: Gender, socioeconomic status, and mobile phone use in Rwanda. In Proceedings of the 4th ACM/IEEE International Conference on Information and Communication Technologies and Development (pp. 6–10). New York, NY: ACM.

  15. Blumenstock, J. E. (2012). Inferring patterns of internal migration from mobile phone call records: Evidence from Rwanda. Information Technology for Development, 18, 107–125.

  16. Blumenstock, J. E., & Eagle, N. (2012). Divided we call: Disparities in access and use of mobile phones in Rwanda. Information Technologies and International Development, 8(2), 1–16.

  17. Blumenstock, J. E., & Toomet, O. (2014, May). Segregation and “silent separation”: Using large-scale network data to model the determinants of ethnic segregation. Paper presented at the annual meeting of the Population Association of America, Boston, MA.

  18. Boyd, D., & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15, 662–679.

  19. Brass, W. (1976). Indirect methods of estimating mortality illustrated by application to Middle East and North African data. In Population Bulletin of the United Nations Economic Commission for Western Asia. Amman, Jordan: UNECWA.

  20. Cesare, N., Spiro, E., & Lee, H. (2015, April). Self-presentation and information disclosure on Twitter: Understanding patterns and mechanisms along demographic lines. Paper presented at the annual meeting of the Population Association of America, San Diego, CA.

  21. Cesare, N., Lee, H., McCormick, T. H., & Spiro, E. S. (2017). Redrawing the silent “color line”: Examining racial segregation in associative networks on Twitter. Unpublished manuscript, University of Washington, Seattle, WA. Retrieved from arXiv:1705.04401.

  22. Couldry, N., & Powell, A. (2014). Big Data from the bottom up. Big Data & Society, 1(2), 1–5.

  23. De Choudhury, M., Counts, S., & Horvitz, E. (2013a). Predicting postpartum changes in emotion and behavior via social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 3267–3276). New York, NY: ACM.

  24. De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E. (2013b). Predicting depression via social media. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (pp. 128–137). Palo Alto, CA: AAAI Press.

  25. De Choudhury, M., Sharma, S., & Kiciman, E. (2016). Characterizing dietary choices, nutrition, and language in food deserts via social media. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (pp. 1157–1170). New York, NY: ACM.

  26. Deville, P., Linard, C., Martin, S., Gilbert, M., Stevens, F. R., Gaughan, A. E., . . . Tatem, A. J. (2014). Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences, 111, 15888–15893.

  27. Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., . . . Seligman, M. E. P. (2015). Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science, 26, 159–169.

  28. Fadnes, L., Taube, A., & Tylleskär, T. (2009). How to identify information bias due to self-reporting in epidemiological research. Internet Journal of Epidemiology, 7(2), 1–8.

  29. Feehan, D., & Cobb, C. (2017, July). How many people have access to the Internet? Estimating Internet adoption around the world using Facebook. Paper presented at the International Conference on Computational Social Science, Cologne, Germany.

  30. Felt, M. (2016). Social media and the social sciences: How researchers employ Big Data analytics. Big Data & Society, 3(1), 1–15.

  31. Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Unpublished manuscript, Department of Statistics, Columbia University, New York, NY.

  32. Golder, S. A., & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep and daylength across diverse cultures. Science, 333, 1878–1881.

  33. Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–152.

  34. González-Bailón, S. (2013). Social science in the era of big data. Policy and the Internet, 5, 147–160.

  35. Graham, M., Hale, S., & Stephens, M. (2012). Featured graphic: Digital divide: The geography of Internet access. Environment and Planning A, 44, 1009–1010.

  36. Heaivilin, N., Gerbert, B., Page, J. E., & Gibbs, J. L. (2011). Public health surveillance of dental pain via Twitter. Journal of Dental Research, 90, 1047–1051.

  37. Holbrook, A. L., & Krosnick, J. A. (2010). Social desirability bias in voter turnout reports: Tests using the item count technique. Public Opinion Quarterly, 74, 37–67.

  38. Kashyap, R., Billari, F. C., Cavalli, N., Quian, E., & Weber, I. (2017, April). Ultrasound technology and “missing women” in India: Analyses and now-casts based on Google searches. Paper presented at the annual meeting of the Population Association of America, Chicago, IL.

  39. Keyfitz, N., & Caswell, H. (2005). The matrix model framework. In N. Keyfitz & H. Caswell (Eds.), Applied mathematical demography (3rd ed., pp. 47–70). New York, NY: Springer.

  40. Kikas, R., Dumas, M., & Saabas, A. (2015). Explaining international migration in the Skype network. In SIdEWayS ’15: Proceedings of the 1st ACM Workshop on Social Media World Sensors (pp. 17–22). New York, NY: ACM.

  41. Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 1–12.

  42. Latour, B. (2007). Beware, your imagination leaves digital traces. Times Higher Literary Supplement, 6(4).

  43. Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google flu: Traps in Big Data analysis. Science, 343, 1203–1205.

  44. Lazer, D., & Radford, J. (2017). Data ex machina: Introduction to big data. Annual Review of Sociology, 43, 19–39.

  45. Lee, H., Cesare, N., McCormick, T. H., Morris, J., & Shojaie, A. (2014, May). Redrawing the “color line”: Examining racial homophily of associative networks in social media. Paper presented at the annual meeting of the Population Association of America, Boston, MA.

  46. Lewis, K. (2015). Three fallacies of digital footprints. Big Data & Society, 2(2), 1–4.

  47. Lewis, S. C., Zamith, R., & Hermida, A. (2013). Content analysis in an era of big data: A hybrid approach to computational and manual methods. Journal of Broadcasting & Electronic Media, 57, 1–33.

  48. Lohr, S. (2012, February 11). The age of Big Data. The New York Times.

  49. Madden, M., & Rainie, L. (2015). Americans’ attitudes about privacy, security and surveillance (Report). Washington, DC: Pew Research Center.

  50. Malik, M. M., & Pfeffer, J. (2016, March). Social media data and computational models of mobility: A review for demography. Paper presented at the ICWSM Workshop on Social Media and Demographic Research, Cologne, Germany.

  51. Manovich, L. (2011). Trending: The promises and the challenges of big social data. In M. K. Gold (Ed.), Debates in the digital humanities (pp. 460–476). Minneapolis: University of Minnesota Press.

  52. Marwick, A. E., & Boyd, D. (2010). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society, 13, 114–133.

  53. Massey, D. (2016, March). Measuring racial prejudice using Google trends. Paper presented at the annual meeting of the Population Association of America, Washington, DC.

  54. Mateos, P., & Durand, J. (2014, May). Netnography and demography: Mining Internet discussion forums on migration and citizenship. Paper presented at the annual meeting of the Population Association of America, Boston, MA.

  55. McCormick, T. H., Lee, H., Cesare, N., Shojaie, A., & Spiro, E. S. (2015). Using Twitter for demographic and social science research: Tools for data collection and processing. Sociological Methods & Research, 46, 390–421.

  56. Mendieta, J., Su, S., Vaca, C., Ochoa, D., & Vergara, C. (2016). Geo-localized social media data to improve characterization of international travelers. In Proceedings of the 2016 Third International Conference on eDemocracy & eGovernment (ICEDEG) (pp. 126–132). Piscataway, NJ: Institute of Electrical and Electronics Engineers.

  57. Metzler, K., Kim, D. A., Allum, N., & Denman, A. (2016). Who is doing computational social science? Trends in big data research (White paper). London, UK: SAGE Publishing.

  58. Mislove, A., Lehmann, S., & Ahn, Y. (2011). Understanding the demographics of Twitter users. In Proceedings of the Fifth International Conference on Weblogs and Social Media (pp. 554–557). Menlo Park, CA: AAAI Press.

  59. Moreno, M. A., Christakis, D. A., Egan, K. G., Brockman, L. N., & Becker, T. (2012). Associations between displayed alcohol references on Facebook and problem drinking among college students. Archives of Pediatrics & Adolescent Medicine, 166, 157–163.

  60. National Research Council (NRC). (2014). Proposed revisions to the common rule for the protection of human subjects in the behavioral and social sciences (Report). Washington, DC: National Academies Press.

  61. O’Connor, B., Balasubramanyan, R., Routledge, B. R., & Smith, N. A. (2010). From tweets to polls: Linking text sentiment to public opinion time series. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media (pp. 122–129). Palo Alto, CA: AAAI Press.

  62. Ojala, J., Zagheni, E., Billari, F. C., & Weber, I. (2017). Fertility and its meaning: Evidence from search behavior. In Proceedings of the Eleventh International AAAI Conference on Web and Social Media (pp. 640–643). Palo Alto, CA: AAAI Press.

  63. Palmer, J. R. B., Espenshade, T. J., Bartumeus, F., Chung, C. Y., Ozgencil, N. E., & Li, K. (2013). New approaches to human mobility: Using mobile phones for demographic research. Demography, 50, 1105–1128.

  64. Park, R. E., & Burgess, E. W. (1925). The city. Chicago, IL: University of Chicago Press.

  65. Pettit, B. (2012). Invisible men: Mass incarceration and the myth of black progress. New York, NY: Russell Sage Foundation.

  66. Pew Research Center. (2018). Internet/broadband fact sheet. Washington, DC: Pew Research Center.

  67. Pötzschke, S., & Braun, M. (2016). Migrant sampling using Facebook advertisements: A case study of Polish migrants in four European countries. Social Science Computer Review, 35, 633–653.

  68. Preston, S. H., Heuveline, P., & Guillot, M. (2001). Demography: Measuring and modeling population processes. Oxford, UK: Blackwell.

  69. Reeder, H., McCormick, T. H., & Spiro, E. (2014). Online information behaviors during disaster events: Roles, routines, and reactions (Working Paper No. 144). Seattle, WA: Center for Statistics and the Social Sciences.

  70. Reis, B. Y., & Brownstein, J. S. (2010). Measuring the impact of health policies using Internet search patterns: The case of abortion. BMC Public Health, 10(article 514).

  71. Rosello, J. L. D., & Filgueira, F. (2016, April). Big data in a small country: Integrating birth, maternal and child statistics in Uruguay. Paper presented at the annual meeting of the Population Association of America, Washington, DC.

  72. Rosenfeld, M. J., & Thomas, R. J. (2012). Searching for a mate: The rise of the Internet as a social intermediary. American Sociological Review, 77, 523–547.

  73. Ruggles, S. (2014). Big microdata for population research. Demography, 51, 287–297.

  74. Ruppert, E., Law, J., & Savage, M. (2013). Reassembling social science methods: The challenge of digital devices. Theory, Culture and Society, 30(4), 22–46.

  75. Sagiroglu, S., & Sinanc, D. (2013). Big Data: A review. In 2013 International Conference on Collaboration Technologies and Systems (CTS) (pp. 42–47). Piscataway, NJ: Institute of Electrical and Electronics Engineers.

  76. Ševčíková, H., Raftery, A. E., & Waddell, P. A. (2007). Assessing uncertainty in urban simulations using Bayesian melding. Transportation Research, Part B: Methodological, 41, 652–669.

  77. Shaw, C. R., & McKay, H. D. (1942). Juvenile delinquency and urban areas. Chicago, IL: University of Chicago Press.

  78. Smith, A., & Anderson, M. (2018). Social media use in 2018. Washington, DC: Pew Research Center.

  79. Snijders, C., Matzat, U., & Reips, U.-D. (2012). Big data: Big gaps of knowledge in the field of Internet science. International Journal of Internet Science, 7, 1–5.

  80. Starbird, K., Maddock, J., Orand, M., Achterman, P., & Mason, R. M. (2014). Rumors, false flags, and digital vigilantes: Misinformation on Twitter after the 2013 Boston Marathon bombing. In M. Kindling & E. Greifeneder (Eds.), iConference 2014 proceedings (pp. 654–662). Urbana-Champaign, IL: iSchools.

  81. Starbird, K., Spiro, E., Edwards, I., Zhou, K., Maddock, J., & Narasimhan, S. (2016). Could this be true? I think so! Expressed uncertainty in online rumoring. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 360–371). New York, NY: ACM.

  82. State, B., Rodriguez, M., Helbing, D., & Zagheni, E. (2014). Migration of professionals to the U.S.: Evidence from LinkedIn data. In L. M. Aiello & D. McFarland (Eds.), 6th International Conference on Social Informatics, SocInfo 2014 (pp. 531–543). Cham, Switzerland: Springer.

  83. Stevenson, A. J. (2014). Finding the Twitter users who stood with Wendy. Contraception, 90, 502–507.

  84. Sutton, J., Spiro, E. S., Johnson, B., Fitzhugh, S., Gibson, B., & Butts, C. T. (2014). Warning tweets: Serial transmission of messages during the warning phase of a disaster event. Information, Communication & Society, 17, 765–787.

  85. Tamgno, J. K., Faye, R. M., & Lishou, C. (2013). Verbal autopsies, mobile data collection for monitoring and warning causes of deaths. In 14th International Conference on Advanced Communication Technology, Technical Proceedings, 2013 (pp. 495–501). Piscataway, NJ: Institute of Electrical and Electronics Engineers.

  86. Taylor, L., Floridi, L., & van der Sloot, B. (Eds.). (2017). Group privacy: New challenges of data technologies. Cham, Switzerland: Springer.

  87. Tomlinson, M., Solomon, W., Singh, Y., Doherty, T., Chopra, M., Ijumba, P., . . . Jackson, D. (2009). The use of mobile phones as a data collection tool: A report from a household survey in South Africa. BMC Medical Informatics and Decision Making, 9(article 51).

  88. Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133, 859–883.

  89. Tourassi, G., Yoon, H. J., & Xu, S. (2016). A novel web informatics approach for automated surveillance of cancer mortality trends. Journal of Biomedical Informatics, 61, 110–118.

  90. Tufekci, Z. (2014). Big questions for social media big data: Representativeness, validity and other methodological pitfalls. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media (pp. 505–514). Palo Alto, CA: AAAI Press.

  91. Vitak, J. (2015). I like it….Whatever that means: The evolving relationship between disclosure, audience, and privacy in networked spaces [SlideShare presentation].

  92. Wang, W., Rothschild, D., Goel, S., & Gelman, A. (2015). Forecasting elections with non-representative polls. International Journal of Forecasting, 31, 980–991.

  93. Willekens, F., Massey, D., Raymer, J., & Beauchemin, C. (2016). International migration under the microscope. Science, 352, 897–899.

  94. Williams, N. E., Thomas, T. A., Dunbar, M., Eagle, N., & Dobra, A. (2015). Measures of human mobility using mobile phone records enhanced with GIS data. PLoS One, 10(7), 1–16.

  95. Zagheni, E., Garimella, V. R. K., Weber, I., & State, B. (2014). Inferring international and internal migration patterns from Twitter data. In Proceedings of the 23rd International Conference on World Wide Web (pp. 439–444). New York, NY: ACM Press.

  96. Zagheni, E., & Weber, I. (2012). You are where you e-mail: Using e-mail data to estimate international migration rates. In Proceedings of the 4th Annual ACM Web Science Conference (pp. 348–351). New York, NY: ACM.

  97. Zagheni, E., Weber, I., & Gummadi, K. (2017). Leveraging Facebook’s advertising platform to monitor stocks of migrants. Population and Development Review, 43, 721–734.

  98. Zeng, L., Starbird, K., & Spiro, E. S. (2016). Rumors at the speed of light? Modeling the rate of rumor transmission during crisis. In 49th Hawaii International Conference on System Sciences (HICSS), 1969–1978. Piscataway, NJ: Institute of Electrical and Electronics Engineers.

  99. Zimmer, M. (2010). But the data is already public: On the ethics of research in Facebook. Ethics and Information Technology, 12, 313–325.

  100. Zwitter, A. (2014). Big data ethics. Big Data & Society, 1(2), 1–6.



This work is supported by Grants DMS-1737673 and SES-1559778 from the National Science Foundation and K01 HD078452 from the National Institute of Child Health and Human Development (NICHD). This material is based upon work supported by, or in part by, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract/grant number W911NF-12-1-0379, and by the Washington Research Foundation. We also thank the Earl and Edna Stice lectureship in the Social Sciences and the University of Washington Information School, Center for Statistics and the Social Sciences, eScience Institute, and Sociology Department for supporting speakers at the frontier of data science in demographic research.

Author information



Corresponding author

Correspondence to Nina Cesare.

Cite this article

Cesare, N., Lee, H., McCormick, T. et al. Promises and Pitfalls of Using Digital Traces for Demographic Research. Demography 55, 1979–1999 (2018).


  • Digital data
  • Social media
  • Big data
  • Demographic methods