“But the data is already public”: on the ethics of research in Facebook

Abstract

In 2008, a group of researchers publicly released profile data collected from the Facebook accounts of an entire cohort of college students from a US university. While good-faith attempts were made to hide the identity of the institution and protect the privacy of the data subjects, the source of the data was quickly identified, placing the privacy of the students at risk. Using this incident as a case study, this paper articulates a set of ethical concerns that must be addressed before embarking on future research in social networking sites, including the nature of consent, properly identifying and respecting expectations of privacy on social network sites, strategies for data anonymization prior to public release, and the relative expertise of institutional review boards when confronted with research projects based on data gleaned from social media.

Notes

  1. While no individuals within the T3 dataset were positively identified (indeed, the author did not attempt to re-identify individuals), discovering the source institution makes individual re-identification much easier, perhaps even trivial, as discussed below.

  2. See also the bibliography maintained by danah boyd at http://www.danah.org/SNSResearch.html.

  3. The research team includes Harvard University professors Jason Kaufman and Nicholas Christakis, UCLA professor Andreas Wimmer, and Harvard sociology graduate students Kevin Lewis and Marco Gonzalez.

  4. See “Social Networks and Online Spaces: A Cohort Study of American College Students”, Award #0819400, http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0819400.

  5. See relevant National Science Foundation Grant General Conditions (GC-1), section 38. Sharing of Findings, Data, and Other Research Products (http://www.nsf.gov/publications/pub_summ.jsp?ods_key=gc109).

  6. The dataset is archived at the IQSS Dataverse Network at Harvard University (http://dvn.iq.harvard.edu/dvn/).

  7. College Board, http://www.collegeboard.com.

  8. This process is described at the Harvard College Office of Residential Life website: http://www.orl.fas.harvard.edu/icb/icb.do?keyword=k11447&tabgroupid=icb.tabgroup17715.

  9. Screenshot of http://dvn.iq.harvard.edu/dvn/dv/t3 taken on October 22, 2008, on file with author.

  10. Screenshot of http://dvn.iq.harvard.edu/dvn/dv/t3 taken on March 27, 2009, on file with author. Webpage remains unchanged as of April 29, 2009.

  11. Screenshot of http://dvn.iq.harvard.edu/dvn/dv/t3 taken on November 1, 2009, on file with author. As of May 29, 2010, this message remains in place.

  12. Facebook allows users to restrict access to their profiles to “Friends only”, to those in their “Network” (such as the Harvard network), or to “Everyone”. Thus, a profile might not be discoverable or viewable to someone outside the boundaries of the access setting.

  13. Simply stripping names from records is rarely a sufficient means to keep a dataset anonymous. For example, Latanya Sweeney has shown that 87 percent of Americans could be identified by records listing solely their birth date, gender and ZIP code (Sweeney 2002).

  14. See, for example, California Senate Bill 1386, http://info.sen.ca.gov/pub/01-02/bill/sen/sb_1351-1400/sb_1386_bill_20020926_chaptered.html.

  15. European Union Data Protection Directive 95/46/EC, http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:EN:HTML.

  16. http://www.fas.harvard.edu/~research/hum_sub/.

  17. Attempts to obtain information about the IRB deliberations with regard to the T3 project have been unsuccessful.

  18. This section is intended as an informal analysis of the discourse used when talking about the T3 project. It is meant to reveal gaps in broader understanding of the issues at hand, and is not necessarily directed against a particular speaker.

  19. After the T3 research project was funded and well underway, Kaufman became a fellow at the Berkman Center for Internet & Society at Harvard University, an organization dedicated to studying a number of Internet-related issues, including privacy. While Kaufman presented preliminary results of his research to the Berkman community prior to joining the center (Kaufman 2008a), there is no evidence that others at Berkman were consulted prior to the release of the T3 dataset.

  20. I thank an anonymous reviewer for suggesting this organizing framework.

  21. See, for example, the United States Federal Trade Commission’s Fair Information Practice Principles (http://www.ftc.gov/reports/privacy3/fairinfo.shtm), which include “Access” as a key provision, giving data subjects the ability to view and contest inaccurate or incomplete data.

  22. See Part 46 Protection of Human Subjects of Title 45 Public Welfare of the Code of Federal Regulations at http://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm.

  23. See, for example, the “Internet Research Ethics: Discourse, Inquiry, and Policy” research project directed by Elizabeth Buchanan and Charles Ess (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0646591).

  24. An important movement in this direction is the recently funded “Internet Research and Ethics 2.0: The Internet Research Ethics Digital Library, Interactive Resource Center, and Online Ethics Advisory Board” project, also directed by Elizabeth Buchanan and Charles Ess (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0924604 and http://www.internetresearchethics.org/).

References

  1. Albrechtslund, A. (2008). Online social networking as participatory surveillance. First Monday. Retrieved March 3, 2008, from http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2142/1949.

  2. Barbaro, M., & Zeller Jr, T. (2006). A face is exposed for AOL searcher no. 4417749. The New York Times, p. A1.

  3. Barnes, S. (2006). A privacy paradox: Social networking in the United States. First Monday. Retrieved October 12, 2007, from http://www.firstmonday.org/ISSUES/issue11_9/barnes/.

  4. Bloustein, E. (1964). Privacy as an aspect of human dignity: An answer to Dean Prosser. New York University Law Review, 39, 962–1007.

  5. boyd, D. (2008a). Putting privacy settings in the context of use (in Facebook and elsewhere). Apophenia. Retrieved October 22, 2008, from http://www.zephoria.org/thoughts/archives/2008/10/22/putting_privacy.html.

  6. boyd, D. (2008b). Taken out of context: American teen sociality in networked publics. Unpublished Dissertation, University of California-Berkeley.

  7. boyd, D., & Ellison, N. (2008). Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication, 13(1), 210–230.

  8. Ess, C., & AoIR ethics working committee. (2002). Ethical decision-making and Internet research. Retrieved March 12, 2010, from http://www.aoir.org/reports/ethics.pdf.

  9. Gatt, A. (2002). Click-wrap agreements: The enforceability of click-wrap agreements. Computer Law & Security Report, 18(6), 404–410.

  10. Grimmelmann, J. (2009). Facebook and the social dynamics of privacy. Iowa Law Review, 95, 4.

  11. Gross, R., & Acquisti, A. (2005). Information revelation and privacy in online social networks. Paper presented at the 2005 ACM workshop on Privacy in the electronic society, Alexandria, VA.

  12. Jansen, B. J., & Resnick, M. (2005). Examining searcher perceptions of and interactions with sponsored results. Paper presented at the Workshop on Sponsored Search Auctions at ACM Conference on Electronic Commerce, Vancouver, BC.

  13. Jansen, B. J., & Spink, A. (2005). How are we searching the world wide web? A comparison of nine search engine transaction logs. Information Processing & Management, 42(1), 248–263.

  14. Kaufman, J. (2008a). Considering the sociology of Facebook: Harvard research on collegiate social networking [Video]. Berkman Center for Internet & Society.

  15. Kaufman, J. (2008b). I am the Principal Investigator… [Blog comment]. On the “Anonymity” of the Facebook dataset. Retrieved September 30, 2008, from http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/.

  16. Kaufman, J. (2008c). Michael—We did not consult… [Blog comment]. michaelzimmer.org. Retrieved September 30, 2008, from http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/.

  17. Lenhart, A., & Madden, M. (2007). Teens, privacy & online social networks. Pew Internet & American Life Project. Retrieved April 20, 2007, from http://www.pewinternet.org/pdfs/PIP_Teens_Privacy_SNS_Report_Final.pdf.

  18. Lewis, K. (2008). Tastes, Ties, and Time: Cumulative codebook. Retrieved September 30, 2008, from http://dvn.iq.harvard.edu/dvn/dv/t3.

  19. Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., & Christakis, N. (2008). Tastes, ties, and time: A new social network dataset using Facebook.com. Social Networks, 30(4), 330–342.

  20. McGeveran, W. (2007). Facebook, context, and privacy. Info/Law. Retrieved October 3, 2008, from http://blogs.law.harvard.edu/infolaw/2007/09/17/facebook-context/.

  21. N.A. (2008). Tastes, Ties, and Time: Facebook data release. Berkman Center for Internet & Society. Retrieved September 30, 2008, from http://cyber.law.harvard.edu/node/4682.

  22. Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. Paper presented at the IEEE Symposium on Security and Privacy, 2008.

  23. Narayanan, A., & Shmatikov, V. (2009). De-anonymizing social networks. Paper presented at the 30th IEEE Symposium on Security and Privacy.

  24. Nissenbaum, H. (1998). Protecting privacy in an information age: The problem of privacy in public. Law and Philosophy, 17(5), 559–596.

  25. Nissenbaum, H. (2004). Privacy as contextual integrity. Washington Law Review, 79(1), 119–157.

  26. Nissenbaum, H. (2009). Privacy in context: Technology, policy, and the integrity of social life. Stanford, CA: Stanford University Press.

  27. Nussbaum, E. (2007). Kids, the Internet, and the end of privacy. New York Magazine. Retrieved February 13, 2007, from http://nymag.com/news/features/27341/.

  28. Rosenbloom, S. (2007). On Facebook, scholars link up with data. The New York Times. Retrieved September 30, 2008, from http://www.nytimes.com/2007/12/17/style/17facebook.html?ref=us.

  29. Simmel, G., & Wolff, K. H. (1964). The sociology of Georg Simmel. Glencoe, Ill: Free Press.

  30. Smith, H. J., Milberg, S. J., & Burke, S. J. (1996). Information privacy: Measuring individuals’ concerns about organizational practices. MIS Quarterly, 20(2), 167–196.

  31. Solove, D. (2007). The future of reputation: Gossip, rumor, and privacy on the internet. New Haven, CT: Yale University Press.

  32. Stutzman, F. (2006). How Facebook broke its culture. Unit Structures. Retrieved October 3, 2008, from http://chimprawk.blogspot.com/2006/09/how-facebook-broke-its-culture.html.

  33. Stutzman, F. (2008). Facebook datasets and private chrome. Unit Structures. Retrieved September 30, 2008, from http://fstutzman.com/2008/09/29/facebook-datasets-and-private-chrome/.

  34. Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557–570.

  35. Wellman, B., & Berkowitz, S. D. (1988). Social structures: A network approach. Cambridge: Cambridge University Press.

  36. Zimmer, M. (2006). More on Facebook and the contextual integrity of personal information flows. michaelzimmer.org. Retrieved October 3, 2008, from http://michaelzimmer.org/2006/09/08/more-on-facebook-and-the-contextual-integrity-of-personal-information-flows/.

  37. Zimmer, M. (2008a). More on the “Anonymity” of the Facebook dataset—It’s Harvard College. michaelzimmer.org. Retrieved October 3, 2008, from http://michaelzimmer.org/2008/10/03/more-on-the-anonymity-of-the-facebook-dataset-its-harvard-college/.

  38. Zimmer, M. (2008b). On the “Anonymity” of the Facebook dataset. michaelzimmer.org. Retrieved September 30, 2008, from http://michaelzimmer.org/2008/09/30/on-the-anonymity-of-the-facebook-dataset/.

Acknowledgments

The author thanks the participants at the International Conference of Computer Ethics: Philosophical Enquiry in Corfu, Greece, as well as the Internet Research 10: Internet Critical conference in Milwaukee, Wisconsin, for their helpful comments and feedback. Additional thanks to Elizabeth Buchanan, Charles Ess, Alex Halavais, Anthony Hoffmann, Jon Pincus, Adam Shostack, and Fred Stutzman for their valuable insights and conversations, both online and off. The author also thanks the anonymous reviewers for their helpful suggestions and criticisms. This article would not have been possible without the research assistance of Wyatt Ditzler and Renea Drews. Finally, I would like to thank Jason Kaufman and Colin McKay at the Berkman Center for Internet & Society, for their valued and continued feedback regarding this work.

Author information

Corresponding author

Correspondence to Michael Zimmer.

About this article

Cite this article

Zimmer, M. “But the data is already public”: on the ethics of research in Facebook. Ethics Inf Technol 12, 313–325 (2010). https://doi.org/10.1007/s10676-010-9227-5

Keywords

  • Research ethics
  • Social networks
  • Facebook
  • Privacy
  • Anonymity