An ethico-legal framework for social data science

Abstract

This paper presents a framework for research infrastructures that enable ethically sensitive and legally compliant data science in Europe. Our goal is to describe how to design and implement an open platform for big data social science that includes, in particular, personal data. To this end, we discuss a number of infrastructural, organizational and methodological principles to be developed for a concrete implementation. These include not only tools and methodologies that systematically and effectively enable both the empirical evaluation of privacy risk and privacy-preserving data transformations, but also the development of training materials (a massive open online course) and organizational instruments based on legal and ethical principles. By way of example, the paper describes the implementation adopted within the SoBigData Research Infrastructure.

Notes

  1. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).
  2. Art. 1(2) GDPR.
  3. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data.
  4. Recitals 1–13 GDPR.
  5. For example, Art. 89(2) and (3) GDPR and Art. 9(2)(j) GDPR.
  6. Art. 2(1) GDPR.
  7. Recital 26 GDPR.
  8. Art. 4(1) GDPR.
  9. Recital 26 GDPR.
  10. Art. 3(2)(b) GDPR.
  11. The tenet of accountability is explicitly mentioned in Art. 5(2) GDPR.
  12. Art. 89(3) GDPR provides that the EU or Member States can enact exceptions from the rights granted in Arts. 15, 16, 18, 19, 20, 21 GDPR.
  13. Arts. 13 and 14 GDPR.
  14. Art. 15 GDPR.
  15. Art. 29 Working Party, Opinion 1/2010 on the concepts of “controller” and “processor”.
  16. Art. 82 GDPR.
  17. Art. 29 Working Party, Opinion 1/2010 on the concepts of “controller” and “processor”.
  18. For a more detailed analysis of who is acting as controller in complicated technical environments, see Mahieu, R., van Hoboken, J., & Asghari, H. (2019). Responsibility for Data Protection in a Networked World—On the Question of the Controller, Effective and Complete Protection and Its Application to Data Access Rights in Europe. JIPITEC, 10(1), 85–105. A critical analysis of Art. 26 GDPR is also provided by Kartheuser, I., & Nabulsi, S. (2018). Abgrenzungsfragen bei gemeinsamen Verantwortlichen—Kritische Analyse der Voraussetzungen nach Art. 26 DS-GVO. MMR, 21(11), 717–721.
  19. Ibid, p. 11.
  20. Ibid, pp. 11–14.
  21. Article 2 Berne Convention for the Protection of Literary and Artistic Works; Article 1 WIPO Copyright Treaty; Article 9 TRIPS Agreement.
  22. Directive 2001/29/EC of 22 May 2001 on the harmonization of certain aspects of copyright and related rights in the information society, OJ L 167, 22/06/2001, P. 0010–0019.
  23. According to the work-for-hire doctrine, copyright in computer programs developed in the course of employment passes to the employer. See Article 2(3) Directive 2009/24/EC of 23 April 2009 on the legal protection of computer programs.
  24. M-Atlas. Available at https://sobigdata.d4science.org/group/sobigdata-gateway/data-catalogue.
  25. Twitter Dataset 2013–2014: The data set was collected by the Archive team through the Twitter Streaming API, which provides free access to 1% of public tweets. Available at https://sobigdata.d4science.org/group/sobigdata-gateway/data-catalogue.
  26. Disease Twitter Dataset: This Twitter dataset covers two recent outbreaks, Ebola and Zika. About 60 million tweets were collected through query-based access to the Twitter Streaming API, covering the period from April 13th 2015 to August 2nd 2016. Available at https://sobigdata.d4science.org/group/sobigdata-gateway/data-catalogue.
  27. Article 3 Twitter Terms of Service. Available at https://twitter.com/en/tos.
  28. Paragraph I B i licence from Twitter, Twitter Developer Agreement, Effective: May 25, 2018. Available at https://developer.twitter.com/en/developer-terms/agreement-and-policy.html.
  29. Bygrave, L. A. (2017). Data Protection by Design and by Default: Deciphering the EU’s Legislative Requirements. Oslo Law Review, 4(02), 105–120. https://doi.org/10.18261/issn.2387-3299-2017-02-03. See also Mahieu, R., van Eck, N. J., van Putten, D., & van den Hoven, J. (2018). From dignity to security protocols: a scientometric analysis of digital ethics. Ethics and Information Technology, 20(3), 175–187. https://doi.org/10.1007/s10676-018-9457-5, which shows a divide between the work on digital ethics in the fields of ethics, law and computer science.
  30. SoBigData Gateway Terms of Use. Available at https://sobigdata.d4science.org/terms-of-use.
  31. Disease Twitter Dataset, accessible via the SoBigData Catalogue at https://sobigdata.d4science.org/catalogue.
  32. Section I.F Be A Good Partner to Twitter, Twitter Developer Policy, Effective: November 3, 2017. Available at https://developer.twitter.com/en/developer-terms/policy.html.
  33. Art. 30 GDPR also requires each controller to maintain a record of certain information, e.g. the name and contact details of the controller, its representative and the data protection officer, or the purpose of the processing.
  34. This is to be distinguished from the data protection impact assessment according to Art. 35 GDPR.

Author information

Corresponding author

Correspondence to Francesca Pratesi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the European Commission through the Horizon2020 European project “SoBigData Research Infrastructure—Big Data and Social Mining Ecosystem” (Grant Agreement 654024). The funders had no role in developing the research and writing the manuscript.

Cite this article

Forgó, N., Hänold, S., van den Hoven, J. et al. An ethico-legal framework for social data science. Int J Data Sci Anal (2020). https://doi.org/10.1007/s41060-020-00211-7

Keywords

  • Ethical data science
  • Legal data science
  • Research infrastructure