Using Search Results to Microaggregate Query Logs Semantically

Erola, Arnau; Castellà-Roca, Jordi

doi:10.1007/978-3-642-54568-9_10

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 8247)

Abstract

Query log anonymization has become an important challenge. A query log contains the search history of the users, as well as the results they selected and the position of those results in the ranking. These data are used to personalize the ranking of results and to study trends. However, query logs can disclose sensitive information about the users. Hence, query logs must undergo an anonymization process that guarantees that: (a) no sensitive information can be linked to an identity; (b) analysis of the anonymized data produces results similar to those obtained from the original data, i.e. data distortion is minimized. Recent anonymization approaches use microaggregation, a statistical disclosure control technique that provides privacy comparable to \(k\)-anonymity while attempting to minimize data distortion. We propose a new method that uses search results to optimize microaggregation, providing more reliable data than existing methods.
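To make the idea of microaggregation concrete, the following is a minimal sketch of the generic technique on a single numeric attribute. It is not the semantic method proposed in the paper: the function name `microaggregate`, the sort-and-cut grouping, and the use of the mean as the group representative are assumptions chosen for illustration. Records are sorted, cut into groups of at least \(k\), and each value is replaced by its group centroid, so every published value is shared by at least \(k\) records.

```python
from statistics import mean


def microaggregate(values, k):
    """Illustrative univariate fixed-size microaggregation (hypothetical helper).

    Sorts the records, cuts them into consecutive groups of k, merges a short
    trailing group into the previous one, and replaces every value by the
    mean of its group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(values) < k:
        raise ValueError("need at least k records")

    # Indices of the records, ordered by value, so similar records end up together.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]

    # Every group must contain at least k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    result = [0.0] * len(values)
    for group in groups:
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
    return result


if __name__ == "__main__":
    print(microaggregate([1, 2, 3, 10, 11, 12, 13], k=3))
```

Grouping similar records before averaging is what keeps distortion low: the closer the members of a group are, the less each value moves when it is replaced by the centroid.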

Notes

1.
Note that whether a piece of information is private depends on each user's beliefs, e.g. whereas one user may consider her religion a public matter, another may consider it private. Determining which information is private is out of the scope of this paper. For this reason, we consider that all information has the same importance and is private, as is done in [3].

References

1. Richardson, M.: Learning about the world through long-term query logs. ACM Trans. Web 2, 1–27 (2008)
2. Xiong, L., Agichtein, E.: Towards privacy-preserving query log publishing. In: Amitay, E., Murray, C.G., Teevan, J. (eds) Query Log Analysis: Social and Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007) (2007)
3. He, Y., Naughton, J.: Anonymization of set-valued data via top-down, local generalization. Proc. VLDB Endowment 2(1), 934–945 (2009)
4. Adar, E.: User 4XXXXX9: anonymizing query logs. In: Query Log Analysis: Social and Technological Challenges. A Workshop at the 16th International World Wide Web Conference (WWW 2007) (2007)
5. Jiang, J.J., Conrath, D.W.: Semantic similarity based on corpus statistics and lexical taxonomy (1997)
6. Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.: Utility-based anonymization for privacy preservation with less information loss. SIGKDD Explor. Newsl. 8(2), 21–30 (2006)
7. Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 92 Symposium on Design and Analysis of Longitudinal Surveys, Statistics Canada, pp. 195–204 (1993)
8. Hong, Y., He, X., Vaidya, J., Adam, N., Atluri, V.: Effective anonymization of query logs. In: CIKM ’09: Proceeding of the 18th ACM Conference on Information and Knowledge Management, pp. 1465–1468 (2009)
9. Navarro-Arribas, G., Torra, V., Erola, A., Castellà-Roca, J.: User k-anonymity for privacy preserving data mining of query logs. Inf. Process. Manage. 48(3), 476–487 (2012)
10. Erola, A., Castellà-Roca, J., Navarro-Arribas, G., Torra, V.: Semantic microaggregation for the anonymization of query logs. In: Domingo-Ferrer, J., Magkos, E. (eds.) PSD 2010. LNCS, vol. 6344, pp. 127–137. Springer, Heidelberg (2010)
11. Erola, A., Castellà-Roca, J., Navarro-Arribas, G., Torra, V.: Semantic microaggregation for the anonymization of query logs using the open directory project. SORT-Stat. Oper. Res. Trans. 35(Special issue), 25–40 (2011)
12. Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)
13. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14, 189–201 (2002)
14. Domingo-Ferrer, J., Sebé, F., Solanas, A.: A polynomial-time approximation to optimal multivariate microaggregation. Comput. Math. Appl. 55(4), 714–732 (2008)
15. Cooper, A.: A survey of query log privacy-enhancing techniques from a policy perspective. ACM Trans. Web 2(4), 1–27 (2008)
16. Korolova, A., Kenthapadi, K., Mishra, N., Ntoulas, A.: Releasing search queries and clicks privately. In: WWW ’09: Proceedings of the 18th International Conference on World Wide Web, pp. 171–180 (2009)
17. Poblete, B., Spiliopoulou, M., Baeza-Yates, R.: Website privacy preservation for query log publishing. In: Bonchi, F., Malin, B., Saygın, Y. (eds.) PInKDD 2007. LNCS, vol. 4890, pp. 80–96. Springer, Heidelberg (2008)
18. Miller, G.: WordNet - About Us. WordNet. Princeton University, Princeton (2009)
19. ODP. Open directory project (2011)
20. Sætre, R., Tveit, A., Steigedal, T.S., Lægreid, A.: Semantic annotation of biomedical literature using google. ICCSA 3, 327–337 (2005)
21. Gligorov, R., Aleksovski, Z., Kate, W., F. Van Harmelen, B.: Using google distance to weight approximate ontology matches. In: Proceedings of the WWW-07, pp. 767–776. ACM Press (2007)
22. iprospect.com, inc, iProspect Blended Search Results Study. http://www.iProspect.com (2009)

Acknowledgements

This work was partly supported by the European Commission under FP7 project Inter-Trust, by the Spanish Ministry of Science and Innovation (through projects eAEGIS TSI2007-65406-C03-01, CO-PRIVACY TIN2011-27076-C03-01, ARES-CONSOLIDER INGENIO 2010 CSD2007-00004, Audit Transparency Voting Process IPT-430000-2010-31 and BallotNext IPT-2012-0603-430000) and by the Government of Catalonia (under grant 2009 SGR 1135).

Author information

Authors and Affiliations

Departament d’Enginyeria Informàtica i Matemàtiques, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Av. Països Catalans 26, 43007, Tarragona, Spain
Arnau Erola & Jordi Castellà-Roca

Corresponding author

Correspondence to Arnau Erola.

Editor information

Editors and Affiliations

TELECOM SudParis, Evry, France
Joaquin Garcia-Alfaro
National Technical University of Athens, Athens, Greece
Georgios Lioudakis
TELECOM Bretagne, Cesson Sévigné, France
Nora Cuppens-Boulahia
University College Cork, Cork, Ireland
Simon Foley
IDA Ovens, EMC Information Systems International, Cork, Ireland
William M. Fitzgerald

About this paper

Cite this paper

Erola, A., Castellà-Roca, J. (2014). Using Search Results to Microaggregate Query Logs Semantically. In: Garcia-Alfaro, J., Lioudakis, G., Cuppens-Boulahia, N., Foley, S., Fitzgerald, W. (eds) Data Privacy Management and Autonomous Spontaneous Security. DPM 2013, SETOP 2013. Lecture Notes in Computer Science, vol 8247. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54568-9_10

DOI: https://doi.org/10.1007/978-3-642-54568-9_10
Published: 21 March 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54567-2
Online ISBN: 978-3-642-54568-9
eBook Packages: Computer Science, Computer Science (R0)
