
What Websites Know About You

Privacy Policy Analysis Using Information Extraction

  • Conference paper
Data Privacy Management and Autonomous Spontaneous Security (DPM 2012, SETOP 2012)

Abstract

The need for privacy protection on the Internet is well recognized. Every day, users are asked to release personal information in order to use online services and applications. Service providers do not always need all the data they gather in order to offer a service. Thus, users should be aware of what data a provider collects so that they can judge whether it is too much for the services offered. Providers are obliged to describe how they treat personal data in privacy policies. By reading a policy, users could discover, among other things, what personal data they agree to give away when choosing to use a service. Unfortunately, privacy policies are long legal documents that users notoriously refuse to read. In this paper we propose a solution that automatically analyzes privacy policy text and shows what personal information is collected. Our solution is based on Information Extraction techniques and represents a step towards the more ambitious aim of automated grading of privacy policies.
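The abstract describes the approach only at a high level, so the following is not the authors' system. It is a minimal, hypothetical sketch of the underlying idea: find sentences that state data collection and tag the personal-data categories they mention. The vocabulary (`DATA_CATEGORIES`), the verb list (`COLLECTION_VERB`), the negation filter, and the `needed` set are all illustrative assumptions; a real Information Extraction pipeline would add tokenization, part-of-speech tagging, coreference resolution, and richer patterns.

```python
import re

# Hypothetical vocabulary (an assumption, not from the paper): each
# personal-data category maps to regex patterns that may name it in text.
DATA_CATEGORIES = {
    "name": [r"\b(?:full |first |last )?names?\b"],
    "email address": [r"\be-?mail address(?:es)?\b"],
    "postal address": [r"\b(?:postal|mailing|home) address(?:es)?\b"],
    "phone number": [r"\b(?:tele)?phone numbers?\b"],
    "IP address": [r"\bIP address(?:es)?\b"],
    "location": [r"\b(?:geo)?location\b"],
    "payment data": [r"\bcredit card\b", r"\bpayment (?:information|details)\b"],
}

# Hypothetical trigger verbs that heuristically mark a data-collection statement.
COLLECTION_VERB = re.compile(
    r"\b(?:collect|gather|obtain|record)(?:s|ed|ing)?\b", re.IGNORECASE
)

# Crude negation filter: sentences such as "We do not collect ..." are skipped.
NEGATION = re.compile(r"\b(?:not|never|no)\b|n't\b", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def extract_collected_data(policy_text: str) -> set[str]:
    """Return the data categories that affirmative collection sentences mention."""
    collected = set()
    for sentence in split_sentences(policy_text):
        # Keep only sentences with a collection verb and no negation.
        if not COLLECTION_VERB.search(sentence) or NEGATION.search(sentence):
            continue
        for category, patterns in DATA_CATEGORIES.items():
            if any(re.search(p, sentence, re.IGNORECASE) for p in patterns):
                collected.add(category)
    return collected


SAMPLE_POLICY = (
    "When you register, we collect your name, e-mail address and phone number. "
    "Our servers automatically record your IP address. "
    "We do not collect payment information. "
    "You can contact us at any time."
)

collected = extract_collected_data(SAMPLE_POLICY)
print(sorted(collected))

# Hypothetical set of data the service actually needs; the rest is flagged.
needed = {"name", "email address"}
print(sorted(collected - needed))
```

Subtracting the hypothetical `needed` set mirrors the abstract's question of whether a provider collects more than its service requires. For the sample text the script prints `['IP address', 'email address', 'name', 'phone number']` and then `['IP address', 'phone number']`. Plain keyword matching misses paraphrases and references such as "this information", which is one reason dedicated Information Extraction techniques are used instead.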

This work has been partially funded by the THeCS project in the Dutch National COMMIT program.





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Costante, E., den Hartog, J., Petković, M. (2013). What Websites Know About You. In: Di Pietro, R., Herranz, J., Damiani, E., State, R. (eds) Data Privacy Management and Autonomous Spontaneous Security. DPM SETOP 2012. Lecture Notes in Computer Science, vol 7731. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35890-6_11


  • DOI: https://doi.org/10.1007/978-3-642-35890-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35889-0

  • Online ISBN: 978-3-642-35890-6

  • eBook Packages: Computer Science, Computer Science (R0)
