Multifaceted Metrics for Assessing Privacy Policies Using Text Processing and Clustering Analysis

  • Conference paper
Proceedings of International Conference on Communication and Computational Technologies

Abstract

Today's privacy policies contain various deficiencies, including a lack of transparency and a failure to convey information comprehensibly to most Internet users. Meanwhile, existing studies on privacy policies focus only on specific areas of interest and lack an inclusive outlook on the state of privacy policies, owing to differences in privacy policy samples, text properties, measures, methodologies, and backgrounds. This research therefore develops an assessment metric to bridge this gap by integrating the fragmented understanding of privacy policies and exploring potential aspects for evaluating privacy policies that are absent from existing studies. The multifaceted assessment metric developed through this study covers three main aspects: content, text property, and user interface. Through investigation and analysis of Malaysian organizations' online privacy policies using text processing and clustering analysis methods, this study reveals several trends: (1) the use of jargon in privacy policies is relatively low; (2) privacy policies with higher compliance levels tend to be lengthier and more repetitive, and vice versa; and (3) regardless of compliance level, some privacy policies are not presented in a user-friendly font size. Finally, as an experiment in applying the developed metrics, the results confirm their relevance for assessing online privacy policies via text processing and clustering analysis.

Author information

Correspondence to Hui Na Chua.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Low, S.A., Chua, H.N. (2023). Multifaceted Metrics for Assessing Privacy Policies Using Text Processing and Clustering Analysis. In: Kumar, S., Hiranwal, S., Purohit, S.D., Prasad, M. (eds) Proceedings of International Conference on Communication and Computational Technologies. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-3951-8_19
