A Versatile Tool for Privacy-Enhanced Web Search

  • Avi Arampatzis
  • George Drosatos
  • Pavlos S. Efraimidis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7814)


We consider the problem of privacy leaks suffered by Internet users when they perform web searches, and propose a framework to mitigate them. Our approach, which builds upon and improves recent work on search privacy, approximates the target search results by replacing the private user query with a set of blurred or scrambled queries. The results of the scrambled queries are then used to cover the original user interest. We model the problem theoretically, define a set of privacy objectives with respect to web search, and investigate the effectiveness of the proposed solution with a set of real queries on a large web collection. Experiments show substantial improvements in retrieval effectiveness over a baseline previously reported in the literature. Furthermore, the methods are more versatile, behave more predictably, apply to a wider range of information needs, and provide privacy that is more comprehensible to the end-user.
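The general idea of sending only scrambled queries and covering the original interest with their results can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the `toy_search` engine, the term-overlap scoring, and the hand-picked scrambled queries are all assumptions made for illustration, and the abstract does not specify how scrambled queries are generated or how their results are combined.

```python
from typing import Callable, Iterable, List

# Toy corpus standing in for a remote search engine's index (made-up data).
CORPUS = {
    "d1": "treatment options for acute lymphoblastic leukemia in adults",
    "d2": "common blood disorders and their symptoms",
    "d3": "cancer treatment centres and patient support",
    "d4": "healthy recipes for a balanced diet",
}


def toy_search(query: str) -> List[str]:
    """Hypothetical engine: ids of documents sharing at least one term with the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in CORPUS.items() if terms & set(text.split())]


def private_search(
    private_query: str,
    scrambled_queries: Iterable[str],
    search: Callable[[str], List[str]],
    top_k: int = 3,
) -> List[str]:
    """Approximate the results of private_query without ever submitting it."""
    # Only the scrambled queries are sent to the engine.
    candidates = set()
    for query in scrambled_queries:
        candidates.update(search(query))

    # Re-rank the pooled results locally against the private query.
    # Term overlap is an assumed stand-in for a real local ranking function;
    # CORPUS lookups stand in for the text of the downloaded results.
    private_terms = set(private_query.lower().split())

    def overlap(doc_id: str) -> int:
        return len(private_terms & set(CORPUS[doc_id].split()))

    ranked = sorted(candidates, key=lambda doc_id: (-overlap(doc_id), doc_id))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hand-picked, broader queries used in place of the private one (assumed).
    scrambled = ["blood disorders", "cancer treatment"]
    print(private_search("leukemia treatment", scrambled, toy_search))
```

In this sketch the engine only ever sees the two broader queries, while the document most relevant to the private query still ends up at the top of the locally re-ranked pool. How much of the target result list is recovered depends entirely on how well the scrambled queries cover the original interest.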


Document Sample; Semantic Approach; Retrieval Effectiveness; Target Document; Pointwise Mutual Information
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Avi Arampatzis (1)
  • George Drosatos (1)
  • Pavlos S. Efraimidis (1)
  1. Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
