Extracting Provenance Metadata from Privacy Policies

  • Harshvardhan Jitendra Pandit
  • Declan O’Sullivan
  • Dave Lewis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11017)


Privacy policies are legal documents that describe activities over personal data such as its collection, usage, processing, sharing, and storage. Expressing this information as provenance metadata can aid legal accountability as well as the modelling of data usage in real-world use cases. In this paper, we describe our early work on the identification, extraction, and representation of provenance information within privacy policies. We discuss the adoption of entity extraction approaches using concepts and keywords defined by the GDPRtEXT resource, along with the annotated privacy policy corpus from the UsablePrivacy project. We use the previously published GDPRov ontology (an extension of PROV-O) to model the provenance information extracted from privacy policies.
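
As a rough illustration of the keyword-driven idea in the abstract, the sketch below tags policy sentences with the activities they mention and emits simple triples typed as prov:Activity (a PROV-O class). It is not the authors' implementation: the keyword stems, the ex: prefix, and the function names are assumptions made for this example, and the stems are not taken from GDPRtEXT or the UsablePrivacy annotations.

```python
import re

# Hypothetical keyword stems, one list per activity named in the abstract.
# Illustrative only; these are NOT the GDPRtEXT concepts or keywords.
ACTIVITY_KEYWORDS = {
    "collection": ["collect", "gather", "obtain"],
    "usage": ["use"],
    "processing": ["process", "analyse", "analyze"],
    "sharing": ["share", "disclose", "transfer"],
    "storage": ["store", "retain", "keep"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_activities(policy_text):
    """Return (sentence, activity) pairs where a keyword stem starts a word."""
    found = []
    for sentence in split_sentences(policy_text):
        for activity, stems in ACTIVITY_KEYWORDS.items():
            pattern = r"\b(?:" + "|".join(stems) + r")\w*"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                found.append((sentence, activity))
    return found


def to_triples(pairs):
    """Express each match as triples; the ex: terms are placeholders."""
    triples = []
    for i, (sentence, activity) in enumerate(pairs, start=1):
        node = f"ex:activity{i}"
        triples.append((node, "rdf:type", "prov:Activity"))
        triples.append((node, "ex:activityKind", activity))
        triples.append((node, "ex:sourceText", sentence))
    return triples


if __name__ == "__main__":
    policy = (
        "We collect your email address. "
        "We share it with partners. "
        "Contact us anytime."
    )
    for triple in to_triples(extract_activities(policy)):
        print(triple)
```

Plain stem matching over-generates (the stem "use" also matches "user", for example), which is one reason an annotated corpus is useful for evaluating and refining such an extractor.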


Keywords: Provenance, Privacy policy, GDPR


This work is supported by the ADAPT Centre for Digital Content Technology which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.


  1. Bhatia, J., Breaux, T.D.: Towards an information type lexicon for privacy policies. In: 2015 IEEE Eighth International Workshop on Requirements Engineering and Law (RELAW), pp. 19–24, August 2015
  2. Bhatia, J., Breaux, T.D.: A data purpose case study of privacy policies. In: 2017 IEEE 25th International Requirements Engineering Conference (RE), pp. 394–399. IEEE (2017)
  3. Fabian, B., Ermakova, T., Lentz, T.: Large-scale readability analysis of privacy policies. In: Proceedings of the International Conference on Web Intelligence, WI 2017, pp. 18–25. ACM, New York (2017)
  4. Oltramari, A., et al.: PrivOnto: a semantic framework for the analysis of privacy policies. Semant. Web 9(2), 185–203 (2018)
  5. Pandit, H.J., Fatema, K., O’Sullivan, D., Lewis, D.: GDPRtEXT - GDPR as a Linked Data Resource, p. 14. Heraklion, Crete, Greece (2018)
  6. Pandit, H.J., Lewis, D.: Modelling Provenance for GDPR Compliance using Linked Open Data Vocabularies, p. 15
  7. Tesfay, W.B., Hofmann, P., Nakamura, T., Kiyomoto, S., Serna, J.: PrivacyGuide: towards an implementation of the EU GDPR on internet privacy policy evaluation. In: Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics, IWSPA 2018, pp. 15–21. ACM, New York (2018)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Harshvardhan Jitendra Pandit (1, corresponding author)
  • Declan O’Sullivan (1)
  • Dave Lewis (1)

  1. ADAPT Centre, Trinity College Dublin, Dublin, Ireland
