Semantic Microaggregation for the Anonymization of Query Logs

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 6344)

Abstract

Publishing Web search logs is very useful to the scientific research community, but to preserve users’ privacy the logs must first go through an anonymization process. Random query swapping is a common technique for protecting logs: it provides k-anonymity to the users at the cost of some loss of utility. Assuming that this utility loss can be reduced by swapping queries that are semantically close, we introduce a novel protection method that semantically microaggregates the logs using the Open Directory Project. That is, we extend a common method used in statistical disclosure control to protect search logs from a semantic perspective. The method has been tested on a random subset of the AOL search logs, and the protected logs it produces were observed to improve data usefulness.
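The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy data, the prefix-based similarity, the greedy grouping and every function name below are assumptions made for illustration. Users are described by hypothetical ODP-style category paths, a semantic distance is derived from how deep two paths agree, users are grouped into clusters of at least k, and queries are swapped only inside each cluster.

```python
import random

# Hypothetical toy data: each user maps to the ODP-style category paths of
# their queries. Real paths would come from classifying queries with the ODP.
LOGS = {
    "u1": [("Sports", "Soccer", "Clubs"), ("Sports", "Soccer", "Players")],
    "u2": [("Sports", "Soccer", "Leagues"), ("Sports", "Tennis")],
    "u3": [("Computers", "Security", "Privacy")],
    "u4": [("Computers", "Security", "Cryptography"), ("Computers", "Internet")],
}

def path_similarity(a, b):
    """Shared prefix depth divided by the deeper path's length, in [0, 1]."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))

def user_distance(paths_a, paths_b):
    """1 minus the symmetrized mean best-match similarity (assumed measure).

    Assumes every user has at least one query.
    """
    def one_way(src, dst):
        return sum(max(path_similarity(p, q) for q in dst) for p in src) / len(src)
    return 1.0 - (one_way(paths_a, paths_b) + one_way(paths_b, paths_a)) / 2

def microaggregate(logs, k):
    """Greedy grouping: seed with the smallest user id, add its k-1 nearest users.

    Leftover users (fewer than k) join the last group, so every group has at
    least k members whenever the log holds at least k users.
    """
    remaining = set(logs)
    groups = []
    while len(remaining) >= k:
        seed = min(remaining)
        remaining.remove(seed)
        nearest = sorted(
            remaining, key=lambda u: (user_distance(logs[seed], logs[u]), u)
        )[: k - 1]
        remaining.difference_update(nearest)
        groups.append([seed] + nearest)
    if remaining:
        if groups:
            groups[-1].extend(sorted(remaining))
        else:
            groups.append(sorted(remaining))
    return groups

def swap_within_group(logs, group, rng):
    """Pool the group's queries, shuffle them, and hand each user back as many
    queries as they originally had."""
    pool = [p for u in group for p in logs[u]]
    rng.shuffle(pool)
    out, start = {}, 0
    for u in group:
        count = len(logs[u])
        out[u] = pool[start:start + count]
        start += count
    return out

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    for group in microaggregate(LOGS, k=2):
        print(group, swap_within_group(LOGS, group, rng))
```

With this toy data the two sports-oriented users end up in one group and the two computing-oriented users in another, so a swapped query stays within the same broad topic. Swapping at random across all four users would mix sports and computing queries, which is the utility loss the semantic grouping is meant to reduce.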

Keywords

  • Incidence Matrix
  • Depth Level
  • Semantic Distance
  • Similar User
  • Similar Query

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Erola, A., Castellà-Roca, J., Navarro-Arribas, G., Torra, V. (2010). Semantic Microaggregation for the Anonymization of Query Logs. In: Domingo-Ferrer, J., Magkos, E. (eds) Privacy in Statistical Databases. PSD 2010. Lecture Notes in Computer Science, vol 6344. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15838-4_12

  • DOI: https://doi.org/10.1007/978-3-642-15838-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15837-7

  • Online ISBN: 978-3-642-15838-4

  • eBook Packages: Computer Science, Computer Science (R0)