KnIGHT: Mapping Privacy Policies to GDPR

  • Najmeh Mousavi Nejad
  • Simon Scerri
  • Jens Lehmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11313)


Although apps and online services come with accompanying privacy policies, a majority of end-users ignore them, despite the potential risks, because of their length, complexity and unappealing presentation. In light of the General Data Protection Regulation (GDPR), now enforced EU-wide, we present an automatic technique for mapping privacy policy excerpts to relevant GDPR articles, so as to support average users in understanding their usage risks and their rights as data subjects. KnIGHT (Know your rIGHTs) is a tool that finds candidate sentences in a privacy policy that are potentially related to specific articles in the GDPR. The approach employs semantic text matching to find the most appropriate GDPR paragraph and, to the best of our knowledge, is one of the first automatic attempts of its kind applied to a company’s policy. Our evaluation shows that, on average, between 70% and 90% of the tool’s automatic mappings are at least partially correct, meaning that the tool can be used to significantly guide human comprehension. Following this result, in future work we will use domain-specific vocabularies to perform a deeper semantic analysis and improve the results further.
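The mapping step described above can be illustrated with a minimal sketch. It is not KnIGHT's implementation: the abstract does not specify how the semantic matching is computed, so plain bag-of-words cosine similarity is used here as a simple stand-in. The paragraph texts, the labels, the function names and the `threshold` value are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, paraphrased snippets for illustration only (not actual GDPR wording).
GDPR_PARAGRAPHS = {
    "erasure": "the data subject has the right to obtain erasure of personal data",
    "access": "the data subject has the right to obtain access to personal data",
    "portability": "the data subject has the right to receive personal data "
                   "in a machine readable format",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_match(sentence, paragraphs, threshold=0.2):
    """Return (label, score) of the most similar paragraph.

    The label is None when even the best score is below the
    (assumed) threshold, i.e. the sentence is not a candidate.
    """
    query = Counter(tokenize(sentence))
    scored = [(cosine(query, Counter(tokenize(text))), label)
              for label, text in paragraphs.items()]
    score, label = max(scored)
    return (label, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    policy_sentence = "You may request erasure of your personal data at any time"
    print(best_match(policy_sentence, GDPR_PARAGRAPHS))
```

A real system would replace the token-count vectors with semantic sentence representations, so that a policy sentence and a GDPR paragraph can match even when they share few literal words.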


Keywords: Privacy policy; General Data Protection Regulation; Semantic text matching



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Najmeh Mousavi Nejad (1, 2), corresponding author
  • Simon Scerri (1, 2)
  • Jens Lehmann (1, 2)

  1. Smart Data Analytics (SDA), University of Bonn, Bonn, Germany
  2. Fraunhofer Intelligent Analysis and Information Systems (IAIS), Sankt Augustin, Germany
