Abstract
Making sure that users understand the privacy policies that affect them is a key challenge for a real-world GDPR deployment. Research studies are mostly carried out in English, but in Europe and elsewhere many users speak languages other than English. Replicating such studies in different languages requires comparable cross-language corpora of privacy policies. This work provides a methodology for building comparable cross-language privacy policy corpora in a national language and a reference study language. We illustrate the methodology with an application comparing English and Italian, extending the corpus of one of the first studies on users' understanding of technical terms in privacy policies. We also investigate other open issues that can make replication harder.
Data Availability
The datasets generated and analyzed during the study are archived on Zenodo at https://doi.org/10.5281/zenodo.7729546.
References
[1] Amos, R., Acar, G., Lucherini, E., Kshirsagar, M., Narayanan, A., Mayer, J.: Privacy policies over time: curation and analysis of a million-document dataset. In: WWW 2021: Proceedings of the Web Conference 2021, pp. 2165–2176 (2021)
[2] Cecere, G., Le Guel, F., Soulié, N.: Perceived internet privacy concerns on social networks in Europe. Technol. Forecast. Soc. Change 96, 277–287 (2015)
[3] Chatzipoulidis, A., Tsiakis, T., Kargidis, T.: A readiness assessment tool for GDPR compliance certification. Comput. Fraud Secur. 2019(8), 14–19 (2019)
[4] Chebel, M., Latiri, C., Gaussier, E.: Bilingual lexicon extraction from comparable corpora based on closed concepts mining. In: Kim, J., Shim, K., Cao, L., Lee, J.-G., Lin, X., Moon, Y.-S. (eds.) PAKDD 2017. LNCS (LNAI), vol. 10234, pp. 586–598. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57454-7_46
[5] Ciclosi, F., Massacci, F.: The data protection officer: a ubiquitous role that no one really knows. IEEE Secur. Privacy 21(1), 66–77 (2023)
[6] Del Alamo, J.M., Guaman, D.S., García, B., Diez, A.: A systematic mapping study on automated analysis of privacy policies. Computing 104, 2053–2076 (2022)
[7] Ermakova, T., Fabian, B., Babina, E.: Readability of privacy policies of healthcare websites. In: Wirtschaftsinformatik Proceedings 2015 (WI 2015) (2015)
[8] Fabian, B., Ermakova, T., Lents, T.: Large-scale readability analysis of privacy policies. In: WI 2017: Proceedings of the International Conference on Web Intelligence, pp. 18–25 (2017)
[9] Hosseini, M.B., Breaux, T.D., Slavin, R., Niu, J., Wang, X.: Analyzing privacy policies through syntax-driven semantic analysis of information types. Inf. Softw. Technol. 138, 106608 (2021)
[10] Köhler, R.: Statistical comparability: methodological caveats. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds.) Building and Using Comparable Corpora, pp. 77–91. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-20128-8_4
[11] Krumay, B., Klar, J.: Readability of privacy policies. In: Singhal, A., Vaidya, J. (eds.) DBSec 2020. LNCS, vol. 12122, pp. 388–399. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49669-2_22
[12] Layton, R., Elaluf-Calderwood, S.: A social economic analysis of the impact of GDPR on security and privacy practices. In: 2019 12th CMI Conference on Cybersecurity and Privacy (CMI), pp. 1–6. IEEE (2019)
[13] Leicht, J., Heisel, M.: A survey on privacy policy languages: expressiveness concerning data protection regulations. In: 2019 12th CMI Conference on Cybersecurity and Privacy (CMI), pp. 1–6 (2019)
[14] Li, B., Gaussier, E.: Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 644–652 (2010)
[15] Li, B., Gaussier, E.: Exploiting comparable corpora for lexicon extraction: measuring and improving corpus quality. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds.) Building and Using Comparable Corpora, pp. 131–149. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-20128-8_7
[16] National People’s Congress of the People’s Republic of China: Personal Information Protection Law of the People’s Republic of China (2021)
[17] Paramita, M.L., Guthrie, D., Kanoulas, E., Gaizauskas, R., Clough, P., Sanderson, M.: Methods for collection and evaluation of comparable documents. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds.) Building and Using Comparable Corpora, pp. 93–112. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-20128-8_5
[18] Parliament EU and Council EU: Consolidated text: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (2016)
[19] Reidenberg, J.R., et al.: Disagreeable privacy policies: mismatches between meaning and users’ understanding. Berkeley Technol. Law J. 30(1), 39–68 (2015)
[20] Robillard, J.M., et al.: Availability, readability, and content of privacy policies and terms of agreements of mental health apps. Internet Intervent. 17, 100243 (2019)
[21] Sarne, D., Schler, J., Singer, A., Sela, A., Bar Siman Tov, I.: Unsupervised topic extraction from privacy policies. In: WWW 2019: Companion Proceedings of The 2019 World Wide Web Conference, pp. 563–568 (2019)
[22] Skadiņa, I., Vasiļjevs, A., Skadiņš, R., Gaizauskas, R., Tufiş, D., Gornostay, T.: Analysis and evaluation of comparable corpora for under resourced areas of machine translation. In: The 5th Workshop on Building and Using Comparable Corpora, p. 17. CiteSeer (2012)
[23] Talvensaari, T., Laurikkala, J., Järvelin, K., Juhola, M., Keskustalo, H.: Creating and exploiting a comparable corpus in cross-language information retrieval. ACM Trans. Inf. Syst. 25(1), 4-es (2007)
[24] Tang, J., Shoemaker, H., Lerner, A., Birrell, E.: Defining privacy: how users interpret technical terms in privacy policies. Proc. Priv. Enhanc. Technol. 2021(3), 70–94 (2021)
[25] Turow, J., Hennessy, M., Draper, N.: Persistent misperceptions: Americans’ misplaced confidence in privacy policies, 2003–2015. J. Broadcast. Electron. Media 62(3), 461–478 (2018)
[26] Vail, M.W., Earp, J.B., Antón, A.I.: An empirical study of consumer perceptions and comprehension of web site privacy policies. IEEE Trans. Eng. Manage. 55(3), 442–454 (2008)
[27] Wilson, S., et al.: The creation and analysis of a website privacy policy corpus. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1330–1340 (2016)
[28] Zaeem, R.N., et al.: PrivacyCheck v2: a tool that recaps privacy policies for you. In: CIKM 2020: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3441–3444 (2020)
[29] Zeadally, S., Winkler, S.: Privacy policy analysis of popular web platforms. IEEE Technol. Soc. Mag. 35(2), 75–85 (2016)
[30] Zimmeck, S., Bellovin, S.M.: Privee: an architecture for automatically analyzing web privacy policies. In: 23rd USENIX Security Symposium (USENIX Security 14), pp. 1–16, San Diego, CA. USENIX Association (2014)
[31] Zimmeck, S., et al.: MAPS: scaling privacy compliance analysis to a million apps. Proc. Priv. Enhanc. Technol. 2019(3), 66–86 (2019)
Acknowledgement
The authors would like to thank Eleanor Birrell and Ada Lerner for providing us with the raw privacy policy corpus used in their paper [24]. Without their time and expertise, this paper would not have been possible. This work was supported in part by the EU under the H2020 Leadership in Enabling and Industrial Technologies program under grant agreement 830929 (CyberSec4Europe).
Ethics declarations
CRediT statements
Conceptualization: FC, FM, SV; Methodology: FC, SV; Software: SV, FC; Validation: FC, SV, FM; Investigation: FC, SV; Data Curation: FC, SV; Writing - Original Draft: FC, SV; Writing - Review & Editing: FC, SV, FM; Visualization: FC; Supervision: FM; Project administration: FM, SV; Funding acquisition: FM.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ciclosi, F., Vidor, S., Massacci, F. (2023). Building Cross-language Corpora for Human Understanding of Privacy Policies. In: Skarmeta, A., Canavese, D., Lioy, A., Matheu, S. (eds) Digital Sovereignty in Cyber Security: New Challenges in Future Vision. CyberSec4Europe 2022. Communications in Computer and Information Science, vol 1807. Springer, Cham. https://doi.org/10.1007/978-3-031-36096-1_8
DOI: https://doi.org/10.1007/978-3-031-36096-1_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36095-4
Online ISBN: 978-3-031-36096-1
eBook Packages: Computer Science, Computer Science (R0)