Volume 55, Issue 5, pp 1979–1999

Promises and Pitfalls of Using Digital Traces for Demographic Research

  • Nina Cesare
  • Hedwig Lee
  • Tyler McCormick
  • Emma Spiro
  • Emilio Zagheni


The digital traces that we leave online are increasingly fruitful sources of data for social scientists, including those interested in demographic research. The collection and use of digital data also presents numerous statistical, computational, and ethical challenges, motivating the development of new research approaches to address these burgeoning issues. In this article, we argue that researchers with formal training in demography—those who have a history of developing innovative approaches to using challenging data—are well positioned to contribute to this area of work. We discuss the benefits and challenges of using digital trace data for social and demographic research, and we review examples of current demographic literature that creatively use digital trace data to study processes related to fertility, mortality, and migration. Focusing on Facebook data for advertisers—a novel “digital census” that has largely been untapped by demographers—we provide illustrative and empirical examples of how demographic researchers can manage issues such as bias and representation when using digital trace data. We conclude by offering our perspective on the road ahead regarding demography and its role in the data revolution.


Keywords: Digital data; Social media; Big data; Demographic methods



This work is supported by Grants DMS-1737673 and SES-1559778 from the National Science Foundation and K01 HD078452 from the National Institute of Child Health and Human Development (NICHD). This material is based upon work supported by, or in part by, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract/grant number W911NF-12-1-0379, and by the Washington Research Foundation. We also thank the Earl and Edna Stice lectureship in the Social Sciences; the University of Washington Information School; Center for Statistics and the Social Sciences; eScience Institute; and Sociology Department for supporting speakers at the frontier of data science in demographic research.



Copyright information

© Population Association of America 2018

Authors and Affiliations

  • Nina Cesare (1)
  • Hedwig Lee (2)
  • Tyler McCormick (3, 4)
  • Emma Spiro (2, 5)
  • Emilio Zagheni (6)

  1. Department of Global Health, Boston University, Boston, USA
  2. Department of Sociology, Washington University, St. Louis, USA
  3. Department of Sociology, University of Washington, Seattle, USA
  4. Department of Statistics, University of Washington, Seattle, USA
  5. Information School, University of Washington, Seattle, USA
  6. Max Planck Institute for Demographic Research, Rostock, Germany
