Using Search Results to Microaggregate Query Logs Semantically

  • Conference paper
Data Privacy Management and Autonomous Spontaneous Security (DPM 2013, SETOP 2013)

Abstract

Query log anonymization has become an important challenge. A query log contains the users' search history, together with the results they selected and the position of those results in the ranking. These data are used to provide personalized re-ranking of results and to study trends. However, query logs can disclose sensitive information about users. Hence, query logs must undergo an anonymization process that guarantees that: (a) no sensitive information can be linked to an identity; and (b) the analysis of the anonymized data produces results similar to those obtained from the original data, i.e., data distortion is minimized. The latest anonymization approaches use microaggregation, a statistical disclosure control technique that provides privacy comparable to \(k\)-anonymity while attempting to minimize data distortion. We propose a new method that uses search results to optimize microaggregation, yielding more reliable data than existing methods.
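
The abstract does not spell out the algorithm, so the following is only a rough, hypothetical sketch of the general idea of microaggregating users by the search results their queries return, not the authors' method. Assumptions: each user is represented by the set of result URLs returned for their queries (the `search_results` mapping is hypothetical), users are compared with Jaccard distance, a greedy MDAV-style heuristic forms groups of at least \(k\) users, and every member of a group is published with the group's pooled queries, so each released profile is shared by at least \(k\) users.

```python
def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def result_signature(queries: list[str], search_results: dict[str, list[str]]) -> set[str]:
    """Pool the result URLs returned for every query a user issued (assumed input)."""
    urls: set[str] = set()
    for q in queries:
        urls.update(search_results.get(q, []))
    return urls


def microaggregate(
    logs: dict[str, list[str]],
    search_results: dict[str, list[str]],
    k: int,
) -> list[tuple[list[str], list[str]]]:
    """Hypothetical sketch: group users into sets of >= k and give each group one shared profile."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(logs) < k:
        raise ValueError("need at least k users to form a group")

    sig = {u: result_signature(qs, search_results) for u, qs in logs.items()}

    def dist(u: str, v: str) -> float:
        return jaccard_distance(sig[u], sig[v])

    remaining = sorted(logs)  # sorted for deterministic tie-breaking
    groups: list[list[str]] = []
    while len(remaining) >= 2 * k:
        # The most isolated user (largest total distance) seeds the next group,
        # a set-based stand-in for MDAV's farthest-from-centroid record.
        seed = max(remaining, key=lambda u: sum(dist(u, v) for v in remaining))
        others = [u for u in remaining if u != seed]
        nearest = sorted(others, key=lambda u: (dist(seed, u), u))[: k - 1]
        group = [seed, *nearest]
        groups.append(group)
        remaining = [u for u in remaining if u not in group]
    groups.append(remaining)  # between k and 2k-1 users are left; they form the last group

    # Every member of a group is published with the same pooled query list.
    return [
        (members, sorted({q for u in members for q in logs[u]}))
        for members in groups
    ]


if __name__ == "__main__":
    toy_results = {  # hypothetical search engine output
        "python tutorial": ["python.org", "realpython.com"],
        "learn python": ["python.org", "w3schools.com"],
        "flu symptoms": ["cdc.gov", "mayoclinic.org"],
        "fever remedies": ["mayoclinic.org", "webmd.com"],
    }
    toy_logs = {"u1": ["python tutorial"], "u2": ["learn python"],
                "u3": ["flu symptoms"], "u4": ["fever remedies"]}
    for members, queries in microaggregate(toy_logs, toy_results, k=2):
        print(members, queries)
```

In the toy data, the two programming queries share a result URL and so do the two health queries, so with \(k = 2\) the sketch pairs the users by topic even though their query strings have no words in common. That is the intuition behind using search results rather than query text to measure semantic closeness.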

Notes

  1. Whether a piece of information is private depends on each user's beliefs; e.g., one user may consider her religion a public matter, whereas another may consider it private. Determining which information is private is beyond the scope of this paper. For this reason, we treat all information as equally important and private, as is done in [3].

References

  1. Richardson, M.: Learning about the world through long-term query logs. ACM Trans. Web 2, 1–27 (2008)

  2. Xiong, L., Agichtein, E.: Towards privacy-preserving query log publishing. In: Amitay, E., Murray, C.G., Teevan, J. (eds.) Query Log Analysis: Social and Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007) (2007)

  3. He, Y., Naughton, J.: Anonymization of set-valued data via top-down, local generalization. Proc. VLDB Endowment 2(1), 934–945 (2009)

  4. Adar, E.: User 4XXXXX9: anonymizing query logs. In: Query Log Analysis: Social and Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007) (2007)

  5. Jiang, J.J., Conrath, D.W.: Semantic similarity based on corpus statistics and lexical taxonomy (1997)

  6. Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.: Utility-based anonymization for privacy preservation with less information loss. SIGKDD Explor. Newsl. 8(2), 21–30 (2006)

  7. Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the '92 Symposium on Design and Analysis of Longitudinal Surveys, Statistics Canada, pp. 195–204 (1993)

  8. Hong, Y., He, X., Vaidya, J., Adam, N., Atluri, V.: Effective anonymization of query logs. In: CIKM ’09: Proceeding of the 18th ACM Conference on Information and Knowledge Management, pp. 1465–1468 (2009)

  9. Navarro-Arribas, G., Torra, V., Erola, A., Castellà-Roca, J.: User k-anonymity for privacy preserving data mining of query logs. Inf. Process. Manage. 48(3), 476–487 (2012)

  10. Erola, A., Castellà-Roca, J., Navarro-Arribas, G., Torra, V.: Semantic microaggregation for the anonymization of query logs. In: Domingo-Ferrer, J., Magkos, E. (eds.) PSD 2010. LNCS, vol. 6344, pp. 127–137. Springer, Heidelberg (2010)

  11. Erola, A., Castellà-Roca, J., Navarro-Arribas, G., Torra, V.: Semantic microaggregation for the anonymization of query logs using the open directory project. SORT-Stat. Oper. Res. Trans. 35(Special issue), 25–40 (2011)

  12. Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)

  13. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14, 189–201 (2002)

  14. Domingo-Ferrer, J., Sebé, F., Solanas, A.: A polynomial-time approximation to optimal multivariate microaggregation. Comput. Math. Appl. 55(4), 714–732 (2008)

  15. Cooper, A.: A survey of query log privacy-enhancing techniques from a policy perspective. ACM Trans. Web 2(4), 1–27 (2008)

  16. Korolova, A., Kenthapadi, K., Mishra, N., Ntoulas, A.: Releasing search queries and clicks privately. In: WWW ’09: Proceedings of the 18th International Conference on World Wide Web, pp. 171–180 (2009)

  17. Poblete, B., Spiliopoulou, M., Baeza-Yates, R.: Website privacy preservation for query log publishing. In: Bonchi, F., Malin, B., Saygın, Y. (eds.) PInKDD 2007. LNCS, vol. 4890, pp. 80–96. Springer, Heidelberg (2008)

  18. Miller, G.: WordNet - About Us. WordNet. Princeton University, Princeton (2009)

  19. ODP: Open Directory Project (2011)

  20. Sætre, R., Tveit, A., Steigedal, T.S., Lægreid, A.: Semantic annotation of biomedical literature using Google. ICCSA 3, 327–337 (2005)

  21. Gligorov, R., Aleksovski, Z., ten Kate, W., van Harmelen, F.: Using Google distance to weight approximate ontology matches. In: Proceedings of the WWW-07, pp. 767–776. ACM Press (2007)

  22. iProspect.com, Inc.: iProspect Blended Search Results Study. http://www.iProspect.com (2009)

Acknowledgements

This work was partly supported by the European Commission under FP7 project Inter-Trust, by the Spanish Ministry of Science and Innovation (through projects eAEGIS TSI2007-65406-C03-01, CO-PRIVACY TIN2011-27076-C03-01, ARES-CONSOLIDER INGENIO 2010 CSD2007-00004, Audit Transparency Voting Process IPT-430000-2010-31 and BallotNext IPT-2012-0603-430000) and by the Government of Catalonia (under grant 2009 SGR 1135).

Author information

Correspondence to Arnau Erola.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Erola, A., Castellà-Roca, J. (2014). Using Search Results to Microaggregate Query Logs Semantically. In: Garcia-Alfaro, J., Lioudakis, G., Cuppens-Boulahia, N., Foley, S., Fitzgerald, W. (eds) Data Privacy Management and Autonomous Spontaneous Security. DPM SETOP 2013. Lecture Notes in Computer Science, vol 8247. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54568-9_10

  • DOI: https://doi.org/10.1007/978-3-642-54568-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54567-2

  • Online ISBN: 978-3-642-54568-9

  • eBook Packages: Computer Science, Computer Science (R0)
