Abstract
The digital traces that we leave online are increasingly fruitful sources of data for social scientists, including those interested in demographic research. The collection and use of digital data also presents numerous statistical, computational, and ethical challenges, motivating the development of new research approaches to address these burgeoning issues. In this article, we argue that researchers with formal training in demography—those who have a history of developing innovative approaches to using challenging data—are well positioned to contribute to this area of work. We discuss the benefits and challenges of using digital trace data for social and demographic research, and we review examples of current demographic literature that creatively use digital trace data to study processes related to fertility, mortality, and migration. Focusing on Facebook data for advertisers—a novel “digital census” that has largely been untapped by demographers—we provide illustrative and empirical examples of how demographic researchers can manage issues such as bias and representation when using digital trace data. We conclude by offering our perspective on the road ahead regarding demography and its role in the data revolution.
Notes
For a review of these data, see Ruggles (2014) in a past issue of Demography.
See https://developer.twitter.com/en/docs/tweets/filter-realtime/overview for an overview of Twitter’s public streaming APIs.
See Facebook’s Data Policy: https://www.facebook.com/policy.php.
See Twitter’s Privacy Policy for more information: https://twitter.com/en/privacy.
See https://www.opalproject.org/about-opal for more information.
See http://iussp.org/en/panel/big-data-and-population-processes for more information on IUSSP workshop events.
Find more at http://www.facebook.com/business/.
See Preston et al. (2001:28–30) for details on conducting an age decomposition analysis.
References
Adams, J., & Brueckner, H. (2015). Wikipedia, sociology, and the promise and pitfalls of Big Data. Big Data & Society, 2(2), 1–5. https://doi.org/10.1177/2053951715614332
Alkema, L., Raftery, A. E., Gerland, P., Clark, S. J., & Pelletier, F. (2012). Estimating trends in the total fertility rate with uncertainty using imperfect data: Examples from West Africa. Demographic Research, 26(article 15), 332–361. https://doi.org/10.4054/DemRes.2012.26.15
Andrews, C., Fichet, E., Ding, Y., Spiro, E. S., & Starbird, K. (2016). Keeping up with the Tweet-dashians: The impact of “official” accounts on online rumoring. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (pp. 452–465). New York, NY: ACM. https://doi.org/10.1145/2818048.2819986
Ang, C., Bobrowicz, A., Schiano, D., & Nardi, B. (2013). Data in the wild: Some reflections. Interactions, 20(2), 39–43. https://doi.org/10.1145/2427076.2427085
Araújo, M., Mejova, Y., Weber, I., & Benevenuto, F. (2017). Using Facebook ads audiences for global lifestyle disease surveillance: Promises and limitations. In Proceedings of the 2017 ACM on Web Science Conference (pp. 253–257). New York, NY: ACM. https://doi.org/10.1145/3091478.3091513
Barberá, P. (2016). Less is more? How demographic sample weights can improve public opinion estimates based on Twitter data (Working paper). New York: Center for Data Science, New York University. Retrieved from http://pablobarbera.com/static/less-is-more.pdf
Barry, B. (2006). Friends for better or for worse: Interracial friendships in the United States as seen through wedding photos. Demography, 43, 491–510.
Belli, R. F., Traugott, M. W., Young, M., & McGonagle, K. A. (1999). Reducing vote overreporting in surveys: Social desirability, memory failure, and source monitoring. Public Opinion Quarterly, 63, 90–108.
Berinsky, A. J. (1999). The two faces of public opinion. American Journal of Political Science, 43, 1209–1230.
Blei, D. M., & Smyth, P. (2017). Science and data science. Proceedings of the National Academy of Sciences, 114, 8689–8692.
Billari, F. C., D’Amuri, F., & Marcucci, J. (2013, April). Forecasting births using Google. Paper presented at the annual meeting of the Population Association of America, New Orleans, LA.
Billari, F. C., & Zagheni, E. (2017). Big data and population processes: A revolution? In A. Petrucci & R. Verde (Eds.), SIS 2017. Statistics and Data science: New challenges, new generations. 28–30 June 2017 Florence (Italy). Proceedings of the Conference of the Italian Statistical Society (pp. 167–178). Firenze, Italy: Firenze University Press. https://doi.org/10.17605/OSF.IO/F9VZP
Blumenstock, J., Cadamuro, G., & On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science, 350, 1073–1076.
Blumenstock, J., & Eagle, N. (2010). Mobile divides: Gender, socioeconomic status, and mobile phone use in Rwanda. In Proceedings of the 4th ACM/IEEE International Conference on Information and Communication Technologies and Development (pp. 6–10). New York, NY: ACM. https://doi.org/10.1145/2369220.2369225
Blumenstock, J. E. (2012). Inferring patterns of internal migration from mobile phone call records: Evidence from Rwanda. Information Technology for Development, 18, 107–125.
Blumenstock, J. E., & Eagle, N. (2012). Divided we call: Disparities in access and use of mobile phones in Rwanda. Information Technologies and International Development, 8(2), 1–16.
Blumenstock, J. E., & Toomet, O. (2014, May). Segregation and “silent separation”: Using large-scale network data to model the determinants of ethnic segregation. Paper presented at the annual meeting of the Population Association of America, Boston, MA.
Boyd, D., & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15, 662–679.
Brass, W. (1976). Indirect methods of estimating mortality illustrated by application to Middle East and North African data. In Population Bulletin of the United Nations Economic Commission for Western Asia. Amman, Jordan: UNECWA.
Cesare, N., Spiro, E., & Lee, H. (2015, April). Self-presentation and information disclosure on Twitter: Understanding patterns and mechanisms along demographic lines. Paper presented at the annual meeting of the Population Association of America, San Diego, CA.
Cesare, N., Lee, H., McCormick, T. H., & Spiro, E. S. (2017). Redrawing the silent “color line”: Examining racial segregation in associative networks on Twitter. Unpublished manuscript, University of Washington, Seattle, WA. Retrieved from arXiv:1705.04401.
Couldry, N., & Powell, A. (2014). Big Data from the bottom up. Big Data & Society, 1(2), 1–5. https://doi.org/10.1177/2053951714539277
De Choudhury, M., Counts, S., & Horvitz, E. (2013a). Predicting postpartum changes in emotion and behavior via social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 3267–3276). New York, NY: ACM.
De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E. (2013b). Predicting depression via social media. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (pp. 128–137). Palo Alto, CA: AAAI Press. Retrieved from https://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/viewFile/6124/6351
De Choudhury, M., Sharma, S., & Kiciman, E. (2016). Characterizing dietary choices, nutrition, and language in food deserts via social media. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (pp. 1157–1170). New York, NY: ACM. https://doi.org/10.1145/2818048.2819956
Deville, P., Linard, C., Martin, S., Gilbert, M., Stevens, F. R., Gaughan, A. E., . . . Tatem, A. J. (2014). Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences, 111, 15853–15854.
Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., . . . Seligman, M. E. P. (2015). Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science, 26, 159–169.
Fadnes, L., Taube, A., & Tylleskär, T. (2009). How to identify information bias due to self-reporting in epidemiological research. Internet Journal of Epidemiology, 7(2), 1–8.
Feehan, D., & Cobb, C. (2017, July). How many people have access to the Internet? Estimating Internet adoption around the world using Facebook. Paper presented at the International Conference on Computational Social Science, Cologne, Germany.
Felt, M. (2016). Social media and the social sciences: How researchers employ Big Data analytics. Big Data & Society, 3(1), 1–15. https://doi.org/10.1177/2053951716645828
Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Unpublished manuscript, Department of Statistics, Columbia University, New York, NY. Retrieved from http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
Golder, S. A., & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep and daylength across diverse cultures. Science, 333, 1878–1881.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–152.
González-Bailón, S. (2013). Social science in the era of big data. Policy and the Internet, 5, 147–160.
Graham, M., Hale, S., & Stephens, M. (2012). Featured graphic: Digital divide: The geography of Internet access. Environment and Planning, 44, 1009–1010.
Heaivilin, N., Gerbert, B., Page, J. E., & Gibbs, J. L. (2011). Public health surveillance of dental pain via Twitter. Journal of Dental Research, 90, 1047–1051.
Holbrook, A. L., & Krosnick, J. A. (2010). Social desirability bias in voter turnout reports: Tests using the item count technique. Public Opinion Quarterly, 74, 37–67.
Kashyap, R., Billari, F. C., Cavalli, N., Quian, E., & Weber, I. (2017, April). Ultrasound technology and “missing women” in India: Analyses and now-casts based on Google searches. Paper presented at the annual meeting of the Population Association of America, Chicago, IL.
Keyfitz, N., & Caswell, H. (2005). The matrix model framework. In N. Keyfitz & H. Caswell (Eds.), Applied mathematical demography (3rd ed., pp. 47–70). New York, NY: Springer.
Kikas, R., Dumas, M., & Saabas, A. (2015). Explaining international migration in the Skype network. In SIdEWayS ’15: Proceedings of the 1st ACM Workshop on Social Media World Sensors (pp. 17–22). New York, NY: ACM. https://doi.org/10.1145/2806655.2806658
Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 1–12. https://doi.org/10.1177/2053951714528481
Latour, B. (2007). Beware, your imagination leaves digital traces. Times Higher Literary Supplement, 6(4). Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Beware+,+your+imagination+leaves+digital+traces#0
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google flu: Traps in Big Data analysis. Science, 343, 1203–1205.
Lazer, D., & Radford, J. (2017). Data ex machina: Introduction to big data. Annual Review of Sociology, 43, 19–39.
Lee, H., Cesare, N., McCormick, T. H., Morris, J., & Shojaie, A. (2014, May). Redrawing the “color line”: Examining racial homophily of associative networks in social media. Paper presented at the annual meeting of the Population Association of America, Boston, MA.
Lewis, K. (2015). Three fallacies of digital footprints. Big Data & Society, 2(2), 1–4. https://doi.org/10.1177/2053951715602496
Lewis, S. C., Zamith, R., & Hermida, A. (2013). Content analysis in an era of big data: A hybrid approach to computational and manual methods. Journal of Broadcasting & Electronic Media, 57, 1–33. https://doi.org/10.1080/08838151.2012.761702
Lohr, S. (2012, February 11). The age of Big Data. The New York Times. Retrieved from https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
Madden, M., & Rainie, L. (2015). Americans’ attitudes about privacy, security and surveillance (Report). Washington, DC: Pew Research Center. Retrieved from http://www.pewinternet.org/2015/05/20/americans-attitudes-about-privacy-security-and-surveillance/
Malik, M. M., & Pfeffer, J. (2016, March). Social media data and computational models of mobility: A review for demography. Paper presented at the ICWSM Workshop on Social Media and Demographic Research, Cologne, Germany. Retrieved from http://www.pfeffer.at/papers/2016_demography.pdf
Manovich, L. (2011). Trending: The promises and the challenges of big social data. In M. K. Gold (Ed.), Debates in the digital humanities (pp. 460–476). Minneapolis: University of Minnesota Press.
Marwick, A. E., & Boyd, D. (2010). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society, 13, 114–133. https://doi.org/10.1177/1461444810365313
Massey, D. (2016, March). Measuring racial prejudice using Google trends. Paper presented at the annual meeting of the Population Association of America, Washington, DC.
Mateos, P., & Durand, J. (2014, May). Netnography and demography: Mining Internet discussion forums on migration and citizenship. Paper presented at the annual meeting of the Population Association of America, Boston, MA.
McCormick, T. H., Lee, H., Cesare, N., Shojaie, A., & Spiro, E. S. (2015). Using Twitter for demographic and social science research: Tools for data collection and processing. Sociological Methods & Research, 46, 390–421.
Mendieta, J., Su, S., Vaca, C., Ochoa, D., & Vergara, C. (2016). Geo-localized social media data to improve characterization of international travelers. In Proceedings of the 2016 Third International Conference on eDemocracy & eGovernment (ICEDEG) (pp. 126–132). Piscataway, NJ: Institute of Electrical and Electronics Engineers.
Metzler, K., Kim, D. A., Allum, N., & Denman, A. (2016). Who is doing computational social science? Trends in big data research (White paper). London, UK: SAGE Publishing. https://doi.org/10.4135/wp160926
Mislove, A., Lehmann, S., & Ahn, Y. (2011). Understanding the demographics of Twitter users. In Proceedings of the Fifth International Conference on Weblogs and Social Media (pp. 554–557). Menlo Park, CA: AAAI Press. Retrieved from http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2816/3234
Moreno, M. A., Christakis, D. A., Egan, K. G., Brockman, L. N., & Becker, T. (2012). Associations between displayed alcohol references on Facebook and problem drinking among college students. Archives of Pediatrics & Adolescent Medicine, 166, 157–163.
National Research Council (NRC). (2014). Proposed revisions to the common rule for the protection of human subjects in the behavioral and social sciences (Report). Washington, DC: National Academies Press.
O’Connor, B., Balasubramanyan, R., Routledge, B. R., & Smith, N. A. (2010). From tweets to polls: Linking text sentiment to public opinion time series. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media (pp. 122–129). Palo Alto, CA: AAAI Press. Retrieved from https://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1536/1842
Ojala, J., Zagheni, E., Billari, F. C., & Weber, I. (2017). Fertility and its meaning: Evidence from search behavior. In Proceedings of the Eleventh International AAAI Conference on Web and Social Media (pp. 640–643). Palo Alto, CA: AAAI Press.
Palmer, J. R. B., Espenshade, T. J., Bartumeus, F., Chung, C. Y., Ozgencil, N. E., & Li, K. (2013). New approaches to human mobility: Using mobile phones for demographic research. Demography, 50, 1105–1128.
Park, R. E., & Burgess, E. W. (1925). The city. Chicago, IL: University of Chicago Press.
Pettit, B. (2012). Invisible men: Mass incarceration and the myth of black progress. New York, NY: Russell Sage Foundation.
Pew Research Center. (2018). Internet/broadband fact sheet. Washington, DC: Pew Research Center. Retrieved from http://www.pewinternet.org/fact-sheet/internet-broadband/
Pötzschke, S., & Braun, M. (2016). Migrant sampling using Facebook advertisements: A case study of Polish migrants in four European countries. Social Science Computer Review, 35, 633–653.
Preston, S. H., Heuveline, P., & Guillot, M. (2001). Demography: Measuring and modeling population processes. Oxford, UK: Blackwell.
Reeder, H., McCormick, T. H., & Spiro, E. (2014). Online information behaviors during disaster events: Roles, routines, and reactions (Working Paper No. 144). Seattle, WA: Center for Statistics and the Social Sciences.
Reis, B. Y., & Brownstein, J. S. (2010). Measuring the impact of health policies using Internet search patterns: The case of abortion. BMC Public Health, 10(article 514). https://doi.org/10.1186/1471-2458-10-514
Rosello, J. L. D., & Filgueira, F. (2016, April). Big data in a small country: Integrating birth, maternal and child statistics in Uruguay. Paper presented at the annual meeting of the Population Association of America, Washington, DC.
Rosenfeld, M. J., & Thomas, R. J. (2012). Searching for a mate: The rise of the Internet as a social intermediary. American Sociological Review, 77, 523–547.
Ruggles, S. (2014). Big microdata for population research. Demography, 51, 287–297.
Ruppert, E., Law, J., & Savage, M. (2013). Reassembling social science methods: The challenge of digital devices. Theory, Culture and Society, 30(4), 22–46.
Sagiroglu, S., & Sinanc, D. (2013). Big Data: A review. In 2013 International Conference on Collaboration Technologies and Systems (CTS) (pp. 42–47). Piscataway, NJ: Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/CTS.2013.6567202
Ševčíková, H., Raftery, A. E., & Waddell, P. A. (2007). Assessing uncertainty in urban simulations using Bayesian melding. Transportation Research, Part B: Methodological, 41, 652–669.
Shaw, C. R., & McKay, H. D. (1942). Juvenile delinquency and urban areas. Chicago, IL: University of Chicago Press.
Smith, A., & Anderson, M. (2018). Social media use in 2018. Washington, DC: Pew Research Center. Retrieved from http://assets.pewresearch.org/wp-content/uploads/sites/14/2018/03/01105133/PI_2018.03.01_Social-Media_FINAL.pdf
Snijders, C., Matzat, U., & Reips, U.-D. (2012). Big data: Big gaps of knowledge in the field of Internet science. International Journal of Internet Science, 7, 1–5. Retrieved from http://www.ijis.net/ijis7_1/ijis7_1_editorial_pre.html
Starbird, K., Maddock, J., Orand, M., Achterman, P., & Mason, R. M. (2014). Rumors, false flags, and digital vigilantes: Misinformation on Twitter after the 2013 Boston Marathon bombing. In M. Kindling & E. Greifeneder (Eds.), iConference 2014 proceedings (pp. 654–662). Urbana-Champaign, IL: iSchools.
Starbird, K., Spiro, E., Edwards, I., Zhou, K., Maddock, J., & Narasimhan, S. (2016). Could this be true? I think so! Expressed uncertainty in online rumoring. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 360–371). New York, NY: ACM.
State, B., Rodriguez, M., Helbing, D., & Zagheni, E. (2014). Migration of professionals to the U.S.: Evidence from LinkedIn data. In L. M. Aiello & D. McFarland (Eds.), 6th International Conference on Social Informatics, SocInfo 2014 (pp. 531–543). Cham, Switzerland: Springer.
Stevenson, A. J. (2014). Finding the Twitter users who stood with Wendy. Contraception, 90, 502–507.
Sutton, J., Spiro, E. S., Johnson, B., Fitzhugh, S., Gibson, B., & Butts, C. T. (2014). Warning tweets: Serial transmission of messages during the warning phase of a disaster event. Information, Communication & Society, 17, 765–787.
Tamgno, J. K., Faye, R. M., & Lishou, C. (2013). Verbal autopsies, mobile data collection for monitoring and warning causes of deaths. In 14th International Conference on Advanced Communication Technology, Technical Proceedings, 2013 (pp. 495–501). Piscataway, NJ: Institute of Electrical and Electronics Engineers. Retrieved from https://ieeexplore.ieee.org/document/6488236/
Taylor, L., Floridi, L., & van der Sloot, B. (Eds.). (2017). Group privacy: New challenges of data technologies. Cham, Switzerland: Springer.
Tomlinson, M., Solomon, W., Singh, Y., Doherty, T., Chopra, M., Ijumba, P., . . . Jackson, D. (2009). The use of mobile phones as a data collection tool: A report from a household survey in South Africa. BMC Medical Informatics and Decision Making, 9(article 51). https://doi.org/10.1186/1472-6947-9-51
Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133, 859–883.
Tourassi, G., Yoon, H. J., & Xu, S. (2016). A novel web informatics approach for automated surveillance of cancer mortality trends. Journal of Biomedical Informatics, 61, 110–118.
Tufekci, Z. (2014). Big questions for social media big data: Representativeness, validity and other methodological pitfalls. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media (pp. 505–514). Palo Alto, CA: AAAI Press.
Vitak, J. (2015). I like it….Whatever that means: The evolving relationship between disclosure, audience, and privacy in networked spaces [SlideShare presentation]. Retrieved from https://www.slideshare.net/jvitak/i-like-itwhatever-that-means-the-evolving-relationship-between-disclosure-audience-and-privacy-in-networked-spaces
Wang, W., Rothschild, D., Goel, S., & Gelman, A. (2015). Forecasting elections with non-representative polls. International Journal of Forecasting, 31, 980–991.
Willekens, F., Massey, D., Raymer, J., & Beauchemin, C. (2016). International migration under the microscope. Science, 352, 897–899.
Williams, N. E., Thomas, T. A., Dunbar, M., Eagle, N., & Dobra, A. (2015). Measures of human mobility using mobile phone records enhanced with GIS data. PLoS One, 10(7), 1–16. https://doi.org/10.1371/journal.pone.0133630
Zagheni, E., Garimella, V. R. K., Weber, I., & State, B. (2014). Inferring international and internal migration patterns from Twitter data. In Proceedings of the 23rd International Conference on World Wide Web (pp. 439–444). New York, NY: ACM Press. https://doi.org/10.1145/2567948.2576930
Zagheni, E., & Weber, I. (2012). You are where you e-mail: Using e-mail data to estimate international migration rates. In Proceedings of the 4th Annual ACM Web Science Conference (pp. 348–351). New York, NY: ACM. https://doi.org/10.1145/2380718.2380764
Zagheni, E., Weber, I., & Gummadi, K. (2017). Leveraging Facebook’s advertising platform to monitor stocks of migrants. Population and Development Review, 43, 721–734.
Zeng, L., Starbird, K., & Spiro, E. S. (2016). Rumors at the speed of light? Modeling the rate of rumor transmission during crisis. In 49th Hawaii International Conference on System Sciences (HICSS) (pp. 1969–1978). Piscataway, NJ: Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/HICSS.2016.248
Zimmer, M. (2010). But the data is already public: On the ethics of research in Facebook. Ethics and Information Technology, 12, 313–325.
Zwitter, A. (2014). Big data ethics. Big Data & Society, 1(2), 1–6. https://doi.org/10.1177/2053951714559253
Acknowledgments
This work is supported by Grants DMS-1737673 and SES-1559778 from the National Science Foundation and K01 HD078452 from the National Institute of Child Health and Human Development (NICHD). This material is based upon work supported by, or in part by, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract/grant number W911NF-12-1-0379, and by the Washington Research Foundation. We also thank the Earl and Edna Stice lectureship in the Social Sciences; the University of Washington Information School, Center for Statistics and the Social Sciences; eScience Institute; and Sociology Department for supporting speakers at the frontier of data science in demographic research.
Cite this article
Cesare, N., Lee, H., McCormick, T. et al. Promises and Pitfalls of Using Digital Traces for Demographic Research. Demography 55, 1979–1999 (2018). https://doi.org/10.1007/s13524-018-0715-2
DOI: https://doi.org/10.1007/s13524-018-0715-2