Privacy Policy Classification with XLNet (Short Paper)

Part of the Lecture Notes in Computer Science book series (LNCS, volume 12484)

Abstract

The popularization of privacy policies has become an attractive subject of research in recent years, notably after the General Data Protection Regulation (GDPR) came into force in the European Union. While the GDPR gives data subjects more rights and control over the use of their personal data, the length and complexity of privacy policies can still prevent them from exercising those rights. An accepted way to improve the interpretability of privacy policies is to assign understandable categories to every paragraph or segment in these documents. The current state of the art in privacy policy analysis has established a baseline for multi-label classification on a dataset of 115 privacy policies, using BERT Transformers. In this paper, we propose a new classification model based on XLNet. Trained on the same dataset, our model improves the baseline F1 macro and micro averages by 1–3% for both the majority-vote and union-based gold standards. Moreover, the results of our XLNet-based model were achieved without fine-tuning on domain-specific data, which reduces training time and complexity compared to the BERT-based model. To make our method reproducible, we report our hyper-parameters and provide access to all resources used, including code. This work may therefore be considered a first step toward establishing a new baseline for privacy policy classification.
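The abstract relies on two notions that a small example may make concrete: the majority-vote and union-based gold standards, and the macro and micro F1 averages used to score multi-label predictions. The following is a minimal sketch in plain Python, not the authors' implementation. The function names, the label names, and the reading of "majority" as "more than half of the annotators" are assumptions made for illustration only.

```python
from collections import Counter


def aggregate_labels(annotations, strategy="majority"):
    """Combine per-annotator label sets for one segment into a gold label set.

    Assumption: "majority" keeps labels chosen by more than half of the
    annotators; "union" keeps every label chosen by at least one annotator.
    """
    if strategy == "union":
        return set().union(*annotations)
    if strategy == "majority":
        counts = Counter(label for labels in annotations for label in set(labels))
        return {label for label, n in counts.items() if n > len(annotations) / 2}
    raise ValueError(f"unknown strategy: {strategy}")


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro_micro(gold, pred, labels):
    """Return (macro F1, micro F1) for multi-label predictions.

    gold and pred are equally long lists of label sets, one per segment.
    """
    counts = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts) / len(labels)
    # Micro: pool the counts over all labels first, so frequent labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return macro, f1(tp, fp, fn)


if __name__ == "__main__":
    # Three hypothetical annotators labeling one policy segment.
    segment = [{"Data Retention"}, {"Data Retention", "Data Security"}, {"Other"}]
    print(aggregate_labels(segment, "majority"))
    print(aggregate_labels(segment, "union"))
```

The union gold standard is never smaller than the majority-vote one, which is one reason scores are reported separately for each. Macro averaging is sensitive to rare categories, while micro averaging reflects overall per-decision accuracy.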

Keywords

  • Privacy policy
  • Multi-label classification
  • Deep learning

Notes

  1. Supported and funded by the Walloon region, Belgium.

  2. To be published in the proceedings of The 35th International Conference on ICT Systems Security and Privacy Protection (2020).

  3. https://github.com/euranova/privacy-policy-classification-xlnet.

  4. https://usableprivacy.org/.

  5. https://tosdr.org/about.html.

  6. https://github.com/huggingface/transformers.

  7. https://github.com/kaushaltrivedi/fast-bert.

  8. We follow the baseline [10], where the Other category was broken down into its 3 underlying attributes.

  9. For the label distribution in the two gold standards, see [10].

References

  1. Al-Rfou, R., Choe, D., Constant, N., Guo, M., Jones, L.: Character-level language modeling with deeper self-attention. CoRR abs/1808.04444 (2018)

  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings (2015)

  3. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. In: NIPS 2014 Workshop on Deep Learning (2014)

  4. Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q.V., Salakhutdinov, R.: Transformer-XL: attentive language models beyond a fixed-length context. In: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Volume 1: Long Papers, pp. 2978–2988. ACL (2019). https://doi.org/10.18653/v1/p19-1285

  5. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. ACL (2019). https://doi.org/10.18653/v1/n19-1423

  6. Gallé, M., Christofi, A., Elsahar, H.: The case for a GDPR-specific annotated dataset of privacy policies. In: AAAI Symposium on Privacy-Enhancing AI and HLT Technologies (2019)

  7. Harkous, H., Fawaz, K., Lebret, R., Schaub, F., Shin, K.G., Aberer, K.: Polisis: automated analysis and presentation of privacy policies using deep learning. In: 27th USENIX Security Symposium, USENIX Security 2018, pp. 531–548 (2018)

  8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735

  9. Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, 15–20 July 2018, Volume 1: Long Papers, pp. 328–339. ACL (2018). https://doi.org/10.18653/v1/P18-1031

  10. Mousavi, N., Jabat, P., Nedelchev, R., Scerri, S., Graux, D.: Establishing a strong baseline for privacy policy classification. In: IFIP International Conference on ICT Systems Security and Privacy Protection (2020)

  11. Peters, M., et al.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 2227–2237. ACL, New Orleans (2018). https://doi.org/10.18653/v1/N18-1202

  12. Sathyendra, K.M., Wilson, S., Schaub, F., Zimmeck, S., Sadeh, N.M.: Identifying the provision of choices in privacy policy text. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, pp. 2774–2779. ACL (2017)

  13. Sobers, R.: The average reading level of a privacy policy (2020). https://www.varonis.com/blog/gdpr-privacy-policy/

  14. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc. (2017)

  15. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: a multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 (2018)

  16. Wilson, S., et al.: The creation and analysis of a website privacy policy corpus. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1330–1340 (2016)

  17. Yang, Z., Dai, Z., Yang, Y., Carbonell, J.G., Salakhutdinov, R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, pp. 5754–5764 (2019)

  18. Zimmeck, S., et al.: MAPS: scaling privacy compliance analysis to a million apps. PoPETs 2019(3), 66–86 (2019). https://doi.org/10.2478/popets-2019-0037

Author information

Corresponding author

Correspondence to Majd Mustapha.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mustapha, M., Krasnashchok, K., Al Bassit, A., Skhiri, S. (2020). Privacy Policy Classification with XLNet (Short Paper). In: Garcia-Alfaro, J., Navarro-Arribas, G., Herrera-Joancomarti, J. (eds) Data Privacy Management, Cryptocurrencies and Blockchain Technology. DPM CBT 2020. Lecture Notes in Computer Science, vol 12484. Springer, Cham. https://doi.org/10.1007/978-3-030-66172-4_16

  • DOI: https://doi.org/10.1007/978-3-030-66172-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66171-7

  • Online ISBN: 978-3-030-66172-4

  • eBook Packages: Computer Science, Computer Science (R0)