Information Retrieval Journal, Volume 18, Issue 4, pp 331–358

Versatile Query Scrambling for Private Web Search

  • Avi Arampatzis
  • George Drosatos
  • Pavlos S. Efraimidis

Abstract

We consider the problem of privacy leaks suffered by Internet users when they perform web searches, and propose a framework to mitigate them. In brief, given a ‘sensitive’ search query, the objective of our work is to retrieve the target documents from a search engine without disclosing the actual query. Our approach, which builds upon and improves recent work on search privacy, approximates the target search results by replacing the private user query with a set of blurred or scrambled queries. The results of the scrambled queries are then used to cover the private user interest. We model the problem theoretically, define a set of privacy objectives with respect to web search, and investigate the effectiveness of the proposed solution with a set of queries with privacy issues on a large web collection. Experiments show substantial improvements in retrieval effectiveness over a baseline previously reported in the literature. Furthermore, the methods are more versatile, behave more predictably, apply to a wider range of information needs, and provide privacy that is more comprehensible to the end user. Additionally, we investigate perceived privacy via a user study and measure the system’s usefulness, taking into account the trade-off between retrieval effectiveness and privacy. The practical feasibility of the methods is demonstrated in a field experiment, scrambling queries against a popular web search engine. The findings may have implications for other IR research areas, such as query expansion, query decomposition, and distributed retrieval.

Keywords

Query scrambler; Search privacy; Query-based document sampling; Mutual information; Set covering; Inter-user agreement

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Avi Arampatzis (1)
  • George Drosatos (1)
  • Pavlos S. Efraimidis (1)
  1. Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
