Compliance Using Metadata

  • Rigo Wenning
  • Sabrina Kirrane


Everybody talks about the data economy. Data is collected, stored, processed and re-used. In the EU, the GDPR creates a framework with conditions (e.g. consent) for the processing of personal data. But other legal provisions also contain requirements and conditions for the processing of data. Even today, most of those are hard-coded into workflows or database schemas, if they are implemented at all. Data lakes are polluted with unusable data because nobody knows about usage rights or data quality. The approach presented here makes the data lake intelligent: it remembers usage limitations and promises made to the data subject or the contractual partner. Data can be used because the associated risk can be assessed. Such a system reacts easily to new requirements. If processing is recorded back into the data lake, that record makes it possible to prove compliance, and it can be shown to authorities on demand as an audit trail. The concept is best exemplified by the SPECIAL project (Scalable Policy-aware Linked Data Architecture For Privacy, Transparency and Compliance). SPECIAL has several use cases, but the basic framework is applicable beyond those cases.
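The idea of a data lake that remembers usage limitations and records processing can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the names UsagePolicy, DataLake, ingest and process are hypothetical, the store is a plain in-memory dictionary, and purposes are simple strings. It is not the SPECIAL architecture, which builds on Linked Data vocabularies rather than ad hoc Python objects.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical policy metadata stored alongside a record."""
    permitted_purposes: frozenset


@dataclass
class DataLake:
    """Toy in-memory stand-in for a policy-aware data lake (assumption)."""
    records: dict = field(default_factory=dict)    # record id -> (value, policy)
    audit_log: list = field(default_factory=list)  # one entry per processing attempt

    def ingest(self, record_id, value, policy):
        # Store the data together with the promises made about its use.
        self.records[record_id] = (value, policy)

    def process(self, record_id, purpose):
        value, policy = self.records[record_id]
        allowed = purpose in policy.permitted_purposes
        # Record every attempt, permitted or not, so it can be audited later.
        self.audit_log.append({
            "record": record_id,
            "purpose": purpose,
            "allowed": allowed,
            "time": datetime.now(timezone.utc).isoformat(),
        })
        if not allowed:
            raise PermissionError(f"purpose {purpose!r} is not covered by the policy")
        return value


lake = DataLake()
lake.ingest("user-1/email", "alice@example.org", UsagePolicy(frozenset({"billing"})))
lake.process("user-1/email", "billing")        # covered by the policy
try:
    lake.process("user-1/email", "marketing")  # not covered, refused and logged
except PermissionError:
    pass
```

Because the policy travels with the data, a new requirement would be handled by changing the stored metadata rather than the processing code, and the audit log holds the entries that could be shown on demand.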



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. European Research Consortium for Informatics and Mathematics (GEIE ERCIM), Sophia Antipolis, France
  2. Vienna University of Economics and Business, Vienna, Austria
