Data Mining and Knowledge Discovery

Volume 31, Issue 5, pp 1419–1443

The best privacy defense is a good privacy offense: obfuscating a search engine user’s profile

  • Jörg Wicker
  • Stefan Kramer
Part of the following topical collections:
  1. Journal Track of ECML PKDD 2017


Abstract

User privacy on the internet is an important and unsolved problem. So far, no sufficient and comprehensive solution has been proposed that helps users protect their privacy while using the internet. Data are collected and assembled by numerous service providers. Existing solutions focus on the service providers’ side: the providers store encrypted or transformed data that can still be used for analysis. This has a major flaw, as it relies on the service providers to do so and leaves the user with no way of actively protecting his or her privacy. In this work, we suggest a new approach that empowers the user to take advantage of the same tool the other side has, namely data mining, to produce data that obfuscates the user’s profile. We apply this approach to search engine queries and use the search engines’ feedback, in the form of personalized advertisements, in an algorithm similar to reinforcement learning to generate new queries that potentially confuse the search engine. We evaluated the approach using a real-world data set. While evaluation is hard, we achieve results indicating that it is possible to influence the profile that the search engine generates for the user. This shows that it is feasible to defend a user’s privacy from a new and more practical perspective.
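To make the feedback loop in the abstract concrete, the following is a minimal sketch of one way such a loop could look. It is not the authors' algorithm: it swaps in a plain epsilon-greedy bandit over decoy query topics, and the names `obfuscate`, `choose_topic`, `toy_engine`, and the `ad_feedback` callback are hypothetical. The callback stands in for issuing a query and reading back the category of the personalized ad that is shown. A topic earns reward 1 whenever the observed ad category differs from the user's true interest.

```python
import random
from collections import defaultdict


def choose_topic(values, topics, epsilon, rng):
    """Epsilon-greedy choice: explore at random, otherwise take the best-valued topic."""
    if rng.random() < epsilon:
        return rng.choice(topics)
    return max(topics, key=lambda t: values[t])


def obfuscate(topics, ad_feedback, true_interest, rounds=200, epsilon=0.1, seed=0):
    """Estimate which decoy topics pull the observed ads away from true_interest.

    ad_feedback(topic, rng) is a hypothetical stand-in for sending a query on
    `topic` and returning the category of the personalized ad shown afterwards.
    """
    rng = random.Random(seed)
    values = defaultdict(float)  # running mean reward per topic
    counts = defaultdict(int)    # number of queries issued per topic
    for _ in range(rounds):
        topic = choose_topic(values, topics, epsilon, rng)
        ad_category = ad_feedback(topic, rng)
        # Reward 1 if the ad no longer reflects the user's true interest.
        reward = 1.0 if ad_category != true_interest else 0.0
        counts[topic] += 1
        values[topic] += (reward - values[topic]) / counts[topic]
    return dict(values), dict(counts)


def toy_engine(topic, rng):
    """Toy simulator (assumption): some topics shift the ad category more often."""
    shift_prob = {"cars": 0.9, "cooking": 0.5, "travel": 0.2}[topic]
    return topic if rng.random() < shift_prob else "health"


if __name__ == "__main__":
    values, counts = obfuscate(["travel", "cooking", "cars"], toy_engine, "health")
    for topic in sorted(values, key=values.get, reverse=True):
        print(topic, round(values[topic], 2), counts.get(topic, 0))
```

In a real setting, the simulator would be replaced by actual queries and observed advertisements, and the reward would need a more careful definition of how far the inferred profile has moved. The bandit is used here only because it is the simplest reinforcement-style learner that fits in a few lines.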


Keywords

Privacy, Search engines, Personalized ads, Web mining, Reinforcement learning



Acknowledgements

The authors thank Nicolas Krauter for help with the initial implementation.

Supplementary material

10618_2017_524_MOESM1_ESM.pdf: Supplementary material 1 (PDF, 3827 KB)



Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Institute of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
