Building Cross-language Corpora for Human Understanding of Privacy Policies

  • Conference paper
Digital Sovereignty in Cyber Security: New Challenges in Future Vision (CyberSec4Europe 2022)

Abstract

Making sure that users understand the privacy policies that affect them is a key challenge for a real GDPR deployment. Research studies are mostly carried out in English, but in Europe and elsewhere many users speak a language other than English. Replicating studies in different languages requires comparable cross-language corpora of privacy policies. This work provides a methodology for building comparable cross-language corpora in a national language and a reference study language. We provide an application example of our methodology comparing English and Italian, extending the corpus of one of the first studies on users' understanding of technical terms in privacy policies. We also investigate other open issues that can make replication harder.

Data Availability

The datasets generated and analyzed during the study are archived on Zenodo at https://doi.org/10.5281/zenodo.7729546.

References

  1. Amos, R., Acar, G., Lucherini, E., Kshirsagar, M., Narayanan, A., Mayer, J.: Privacy policies over time: curation and analysis of a million-document dataset. In: WWW 2021: Proceedings of the Web Conference 2021, pp. 2165–2176 (2021)

  2. Cecere, G., Le Guel, F., Soulié, N.: Perceived internet privacy concerns on social networks in Europe. Technol. Forecast. Soc. Change 96, 277–287 (2015)

  3. Chatzipoulidis, A., Tsiakis, T., Kargidis, T.: A readiness assessment tool for GDPR compliance certification. Comput. Fraud Secur. 2019(8), 14–19 (2019)

  4. Chebel, M., Latiri, C., Gaussier, E.: Bilingual lexicon extraction from comparable corpora based on closed concepts mining. In: Kim, J., Shim, K., Cao, L., Lee, J.-G., Lin, X., Moon, Y.-S. (eds.) PAKDD 2017. LNCS (LNAI), vol. 10234, pp. 586–598. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57454-7_46

  5. Ciclosi, F., Massacci, F.: The data protection officer: a ubiquitous role that no one really knows. IEEE Secur. Privacy 21(1), 66–77 (2023)

  6. Del Alamo, J.M., Guaman, D.S., García, B., Diez, A.: A systematic mapping study on automated analysis of privacy policies. Computing 104, 2053–2076 (2022)

  7. Ermakova, T., Fabian, B., Babina, E.: Readability of privacy policies of healthcare websites. In: Wirtschaftsinformatik Proceedings 2015 (WI 2015) (2015)

  8. Fabian, B., Ermakova, T., Lents, T.: Large-scale readability analysis of privacy policies. In: WI 2017: Proceedings of the International Conference on Web Intelligence, pp. 18–25 (2017)

  9. Hosseini, M.B., Breaux, T.D., Slavin, R., Niu, J., Wang, X.: Analyzing privacy policies through syntax-driven semantic analysis of information types. Inf. Softw. Technol. 138, 106608 (2021)

  10. Köhler, R.: Statistical comparability: methodological caveats. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds.) Building and Using Comparable Corpora, pp. 77–91. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-20128-8_4

  11. Krumay, B., Klar, J.: Readability of privacy policies. In: Singhal, A., Vaidya, J. (eds.) DBSec 2020. LNCS, vol. 12122, pp. 388–399. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49669-2_22

  12. Layton, R., Elaluf-Calderwood, S.: A social economic analysis of the impact of GDPR on security and privacy practices. In: 2019 12th CMI Conference on Cybersecurity and Privacy (CMI), pp. 1–6. IEEE (2019)

  13. Leicht, J., Heisel, M.: A survey on privacy policy languages: expressiveness concerning data protection regulations. In: 2019 12th CMI Conference on Cybersecurity and Privacy (CMI), pp. 1–6 (2019)

  14. Li, B., Gaussier, E.: Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 644–652 (2010)

  15. Li, B., Gaussier, E.: Exploiting comparable corpora for lexicon extraction: measuring and improving corpus quality. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds.) Building and Using Comparable Corpora, pp. 131–149. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-20128-8_7

  16. National People’s Congress of the People’s Republic of China. Personal Information Protection Law of the People’s Republic of China (2021)

  17. Paramita, M.L., Guthrie, D., Kanoulas, E., Gaizauskas, R., Clough, P., Sanderson, M.: Methods for collection and evaluation of comparable documents. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds.) Building and Using Comparable Corpora, pp. 93–112. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-20128-8_5

  18. Parliament EU and Council EU. Consolidated text: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (2016)

  19. Reidenberg, J.R., et al.: Disagreeable privacy policies: mismatches between meaning and users’ understanding. Berkeley Technol. Law J. 30(1), 39–68 (2015)

  20. Robillard, J.M., et al.: Availability, readability, and content of privacy policies and terms of agreements of mental health apps. Internet Intervent. 17, 100243 (2019)

  21. Sarne, D., Schler, J., Singer, A., Sela, A., Bar Siman Tov, I.: Unsupervised topic extraction from privacy policies. In: WWW 2019: Companion Proceedings of The 2019 World Wide Web Conference, pp. 563–568 (2019)

  22. Skadiņa, I., Vasiļjevs, A., Skadiņš, R., Gaizauskas, R., Tufiş, D., Gornostay, T.: Analysis and evaluation of comparable corpora for under resourced areas of machine translation. In: The 5th Workshop on Building and Using Comparable Corpora, p. 17. CiteSeer (2012)

  23. Talvensaari, T., Laurikkala, J., Järvelin, K., Juhola, M., Keskustalo, H.: Creating and exploiting a comparable corpus in cross-language information retrieval. ACM Trans. Inf. Syst. 25(1), 4-es (2007)

  24. Tang, J., Shoemaker, H., Lerner, A., Birrell, E.: Defining privacy: how users interpret technical terms in privacy policies. Proceed. Priv. Enhan. Technol. 2021(3), 70–94 (2021)

  25. Turow, J., Hennessy, M., Draper, N.: Persistent misperceptions: Americans’ misplaced confidence in privacy policies, 2003–2015. J. Broadcast. Electron. Media 62(3), 461–478 (2018)

  26. Vail, M.W., Earp, J.B., Antón, A.I.: An empirical study of consumer perceptions and comprehension of web site privacy policies. IEEE Trans. Eng. Manage. 55(3), 442–454 (2008)

  27. Wilson, S., et al.: The creation and analysis of a website privacy policy corpus. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1330–1340 (2016)

  28. Zaeem, R.N., et al.: PrivacyCheck v2: a tool that recaps privacy policies for you. In: CIKM 2020: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3441–3444 (2020)

  29. Zeadally, S., Winkler, S.: Privacy policy analysis of popular web platforms. IEEE Technol. Soc. Mag. 35(2), 75–85 (2016)

  30. Zimmeck, S., Bellovin, S.M.: Privee: an architecture for automatically analyzing web privacy policies. In: 23rd USENIX Security Symposium (USENIX Security 14), pp. 1–16, San Diego, CA. USENIX Association (2014)

  31. Zimmeck, S., et al.: MAPS: scaling privacy compliance analysis to a million apps. Proceed. Priv. Enhan. Technol. 2019(3), 66–86 (2019)

Acknowledgement

The authors would like to thank Eleanor Birrell and Ada Lerner for providing us with the raw privacy corpus used in their paper [24]. Without their time and expertise, this paper would not have been possible. This work was supported in part by the EU under the H2020 Leadership in Enabling and Industrial Technologies program under grant agreement 830929 (CyberSec4Europe).

Author information

Corresponding author

Correspondence to Francesco Ciclosi.

Ethics declarations

CRediT statements

Conceptualization: FC, FM, SV; Methodology: FC, SV; Software: SV, FC; Validation: FC, SV, FM; Investigation: FC, SV; Data Curation: FC, SV; Writing - Original Draft: FC, SV; Writing - Review & Editing: FC, SV, FM; Visualization: FC; Supervision: FM; Project administration: FM, SV; Funding acquisition: FM.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ciclosi, F., Vidor, S., Massacci, F. (2023). Building Cross-language Corpora for Human Understanding of Privacy Policies. In: Skarmeta, A., Canavese, D., Lioy, A., Matheu, S. (eds) Digital Sovereignty in Cyber Security: New Challenges in Future Vision. CyberSec4Europe 2022. Communications in Computer and Information Science, vol 1807. Springer, Cham. https://doi.org/10.1007/978-3-031-36096-1_8

  • DOI: https://doi.org/10.1007/978-3-031-36096-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36095-4

  • Online ISBN: 978-3-031-36096-1

  • eBook Packages: Computer Science (R0)
