Abstract
Privacy is one of the key issues in citizens’ everyday online activities, with the United Nations defining it as “a human right in the digital age”. Despite the introduction of data privacy regulations almost everywhere around the globe, the biggest barrier to their effectiveness is customers’ limited capacity to map the privacy statements they receive onto the regulation in force and to understand their terms. This study advocates the creation of a convenient and cost-efficient question-answering service for customers’ queries on data privacy. It proposes a two-step approach. In the first step, consumers ask for support from a conversational agent backed by a smart knowledge base, which attempts to answer the question using the most appropriate legal document. When this self-help approach proves insufficient, the system enacts a second step, suggesting a ranked list of legal experts for focused advice. To achieve our objective, we need a sufficiently large, specialised dataset, and we plan to apply state-of-the-art Natural Language Processing (NLP) techniques from the field of open-domain question answering. This paper describes the initial steps and some early results we achieved in this direction, and the next steps we propose towards a one-stop solution for consumers’ privacy needs.
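The two-step flow outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity is a simple stand-in for the learned text representations a real system would likely use, and the function names, the threshold value, and the toy documents and expert profiles are all hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a learned sentence embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, items):
    """Return (score, name) pairs sorted by decreasing similarity to the query."""
    q = bow(query)
    return sorted(((cosine(q, bow(text)), name) for name, text in items.items()), reverse=True)


def answer(query, documents, experts, threshold=0.3):
    """Step 1: suggest the best-matching document; step 2: fall back to ranked experts."""
    score, doc = rank(query, documents)[0]
    if score >= threshold:  # hypothetical confidence cut-off
        return {"step": "self-help", "document": doc}
    return {"step": "expert", "experts": [name for _, name in rank(query, experts)]}


# Hypothetical toy data, for illustration only.
DOCUMENTS = {
    "data-retention-policy": "how long the company stores and retains personal data",
    "cookie-notice": "cookies tracking and advertising consent on the website",
}
EXPERTS = {
    "expert-a": "employment law workplace monitoring employee data",
    "expert-b": "cross border data transfer international law",
}

print(answer("how long do you retain my personal data", DOCUMENTS, EXPERTS))
print(answer("can my employer transfer my data", DOCUMENTS, EXPERTS))
```

The first query closely matches a document in the toy knowledge base and stays in the self-help step. The second query matches no document well enough, so the sketch returns the expert profiles ordered by similarity to the question.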
Acknowledgement
The research leading to this work was partially financed by Innosuisse - Swiss federal agency for Innovation, through a competitive call. The project, 50446.1 IP-ICT, is called P2Sr Profila Privacy Simplified reloaded: Open-smart knowledge base on Swiss privacy policies and Swiss privacy legislation, simplifying consumers’ access to legal knowledge and expertise (https://www.aramis.admin.ch/Grunddaten/?ProjectID=48867). The authors would like to thank all the people involved on the implementation side at Profila GmbH (https://www.profila.com/) for the constructive and fruitful discussions and for the insights they provided about privacy regulations and consumers’ rights.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mazzola, L., Waldis, A., Shankar, A., Argyris, D., Denzler, A., Van Roey, M. (2022). Privacy and Customer’s Education: NLP for Information Resources Suggestions and Expert Finder Systems. In: Moallem, A. (eds) HCI for Cybersecurity, Privacy and Trust. HCII 2022. Lecture Notes in Computer Science, vol 13333. Springer, Cham. https://doi.org/10.1007/978-3-031-05563-8_5
DOI: https://doi.org/10.1007/978-3-031-05563-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05562-1
Online ISBN: 978-3-031-05563-8
eBook Packages: Computer Science (R0)