Privacy and Customer’s Education: NLP for Information Resources Suggestions and Expert Finder Systems

  • Conference paper
HCI for Cybersecurity, Privacy and Trust (HCII 2022)

Abstract

Privacy is one of the key issues in citizens’ everyday online activities, with the United Nations defining it as “a human right in the digital age”. Despite the introduction of data privacy regulations almost everywhere around the globe, the biggest barrier to their effectiveness is the customer’s capacity to map the privacy statement received to the regulation in force and to understand its terms. This study advocates the creation of a convenient and cost-efficient question-answering service for customers’ queries on data privacy. It proposes a two-step approach. In the first step, consumers ask for support from a conversational agent backed by a smart knowledge base, which attempts to answer the question using the most appropriate legal document. If this self-help approach proves insufficient, our system enacts a second step, suggesting a ranked list of legal experts for focused advice. To achieve our objective, we need a sufficiently large and specialised dataset, and we plan to apply state-of-the-art Natural Language Processing (NLP) techniques from the field of open-domain question answering. This paper describes the initial steps and some early results we achieved in this direction, and the next steps we propose to develop a one-stop solution for consumers’ privacy needs.
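The two-step flow described above can be illustrated with a minimal sketch. It is not the system presented in the paper: it swaps the planned state-of-the-art NLP models for a plain bag-of-words cosine similarity, and the function names, the threshold value, and the toy documents and expert profiles are all hypothetical, chosen only to show the control flow.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, items):
    """Return (score, name) pairs sorted by descending similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), name) for name, text in items.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def answer_or_refer(query, documents, experts, threshold=0.3, top_k=2):
    """Step 1: try the best-matching document; step 2: fall back to ranked experts."""
    best_score, best_doc = rank(query, documents)[0]
    if best_score >= threshold:  # assumed cut-off, not taken from the paper
        return {"step": "self-help", "document": best_doc}
    ranked_experts = [name for _, name in rank(query, experts)[:top_k]]
    return {"step": "expert", "experts": ranked_experts}


# Hypothetical toy data, for illustration only.
DOCUMENTS = {
    "data-deletion": "right to erasure delete personal data request",
    "data-access": "right to access copy of personal data held",
}
EXPERTS = {
    "expert_a": "employment law workplace monitoring employee surveillance",
    "expert_b": "consumer contracts online shopping refunds",
}

if __name__ == "__main__":
    print(answer_or_refer("How do I delete my personal data?", DOCUMENTS, EXPERTS))
    print(answer_or_refer("Can my employer use workplace monitoring cameras?", DOCUMENTS, EXPERTS))
```

In this sketch a single similarity threshold decides when self-help is considered insufficient; a real system would more plausibly rely on answer confidence or explicit user feedback to trigger the expert-finding step.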

Notes

  1. https://www.alexa.com/topsites
  2. https://www.kaggle.com/peopledatalabssf/free-7-million-company-dataset
  3. https://duckduckgo.com/
  4. https://law.stackexchange.com/
  5. https://archive.org/details/stackexchange
  6. https://github.com/pushshift/api

Acknowledgement

The research leading to this work was partially financed by Innosuisse, the Swiss federal agency for innovation, through a competitive call. The project, 50446.1 IP-ICT, is called “P2Sr Profila Privacy Simplified reloaded: Open-smart knowledge base on Swiss privacy policies and Swiss privacy legislation, simplifying consumers’ access to legal knowledge and expertise” (https://www.aramis.admin.ch/Grunddaten/?ProjectID=48867). The authors would like to thank everyone involved on the implementation side at Profila GmbH (https://www.profila.com/) for the constructive and fruitful discussions and for the insights they provided about privacy regulations and consumers’ rights.

Author information

Corresponding author

Correspondence to Luca Mazzola.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mazzola, L., Waldis, A., Shankar, A., Argyris, D., Denzler, A., Van Roey, M. (2022). Privacy and Customer’s Education: NLP for Information Resources Suggestions and Expert Finder Systems. In: Moallem, A. (eds) HCI for Cybersecurity, Privacy and Trust. HCII 2022. Lecture Notes in Computer Science, vol 13333. Springer, Cham. https://doi.org/10.1007/978-3-031-05563-8_5

  • DOI: https://doi.org/10.1007/978-3-031-05563-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05562-1

  • Online ISBN: 978-3-031-05563-8

  • eBook Packages: Computer Science (R0)
