Information Retrieval, Volume 16, Issue 6, pp 657–679

A query scrambler for search privacy on the internet

  • Avi Arampatzis
  • Pavlos S. Efraimidis
  • George Drosatos


We propose a method for search privacy on the Internet, focusing on enhancing plausible deniability against search engine query-logs. The method approximates the target search results without submitting the intended query, and while avoiding other exposing queries, by employing sets of queries that represent more general concepts. We model the problem theoretically, and investigate the practical feasibility and effectiveness of the proposed solution with a set of real, privacy-sensitive queries on a large web collection. The findings may have implications for other IR research areas, such as query expansion and fusion in meta-search. Finally, we discuss ideas for privacy, such as k-anonymity, and how these may be applied to search tasks.
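The general idea in the abstract (replace the intended query with a set of more general queries, then combine their results) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the hand-made `HYPERNYMS` map stands in for a lexical resource such as WordNet, `DOCS` and `search` are a toy collection and engine, and reciprocal rank fusion is used as one common fusion method, not necessarily the one the authors evaluate.

```python
from collections import defaultdict
from itertools import product

# Hypothetical generalisation map (a stand-in for a resource such as WordNet).
HYPERNYMS = {
    "gonorrhea": ["infection", "disease"],
    "treatment": ["care", "therapy"],
}

# Toy document collection: doc id -> text (illustrative only).
DOCS = {
    "d1": "gonorrhea treatment with antibiotics",
    "d2": "disease therapy and infection care guidelines",
    "d3": "infection therapy for gonorrhea patients",
    "d4": "garden care tips",
    "d5": "disease care in hospitals",
}


def search(query, k=3):
    """Toy engine: rank docs by the number of distinct query terms they contain."""
    terms = set(query.split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in DOCS.items()]
    ranked = sorted((pair for pair in scored if pair[0] > 0), key=lambda p: (-p[0], p[1]))
    return [doc_id for _, doc_id in ranked[:k]]


def scramble(query):
    """Build more general queries by swapping each term for its listed generalisations.

    Assumption of this sketch: terms without an entry in the map are kept unchanged.
    """
    options = [HYPERNYMS.get(term, [term]) for term in query.split()]
    return [" ".join(combo) for combo in product(*options)]


def fuse(ranked_lists, c=60):
    """Reciprocal rank fusion: each list contributes 1 / (c + rank) per document."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


intended = "gonorrhea treatment"
queries = scramble(intended)                   # only these would be sent to the engine
fused = fuse([search(q) for q in queries])     # combined on the client side
target = search(intended)                      # never submitted in practice; for comparison only
overlap = [d for d in fused if d in target]
print(queries, fused, target, overlap)
```

In this toy setting the fused list recovers only part of the target list, which matches the abstract's wording that the method approximates the target results instead of reproducing them.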


Query scrambler, Search privacy, WordNet, Fusion



We thank Jaap Kamps from the University of Amsterdam, the Netherlands, for providing access to the ClueWeb09_B dataset, and Savvas Chatzichristofis from Democritus University of Thrace for creating Figs. 1 and 2.



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Avi Arampatzis (1)
  • Pavlos S. Efraimidis (1)
  • George Drosatos (1)
  1. Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
