Preserving Privacy in Data Mining via Importance Weighting

Elkan, Charles

doi:10.1007/978-3-642-19896-0_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6549)

Included in the following conference series:

International Workshop on Privacy and Security Issues in Data Mining and Machine Learning

Abstract

This paper presents a fundamentally new approach to allowing learning algorithms to be applied to a dataset, while still keeping the records in the dataset confidential. Let D be the set of records to be kept private, and let E be a fixed set of records from a similar domain that is already public. The idea is to compute and publish a weight w(x) for each record x in E that measures how representative it is of the records in D. Data mining on E using these importance weights is then approximately equivalent to data mining directly on D. The dataset D is used by its owner to compute the weights, but not revealed in any other way.
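
The weighting idea in the abstract can be illustrated with a minimal sketch. It assumes records are single discrete values and estimates w(x) as the ratio of empirical frequencies, w(x) = p_D(x) / p_E(x). This count-based estimator, the function names `importance_weights` and `weighted_mean`, and the toy data are all assumptions made for illustration; they are not taken from the paper, and publishing raw frequency ratios like these does not by itself give any formal privacy guarantee.

```python
from collections import Counter


def importance_weights(private_d, public_e):
    """Estimate w(x) = p_D(x) / p_E(x) for every record x in public_e.

    Hypothetical estimator: both densities are plain empirical
    frequencies. Values that never occur in private_d get weight 0.
    """
    count_d = Counter(private_d)
    count_e = Counter(public_e)
    n_d, n_e = len(private_d), len(public_e)
    return [(count_d[x] / n_d) / (count_e[x] / n_e) for x in public_e]


def weighted_mean(values, weights):
    """Importance-weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data (made up): D stays with its owner, E is already public.
D = [1, 1, 1, 2, 2, 3]
E = [1, 2, 2, 3, 3, 3, 4, 4]

# The owner of D computes the weights; only E and w would be published.
w = importance_weights(D, E)

# A statistic computed on weighted E approximates the same statistic on D.
estimate_from_e = weighted_mean(E, w)
true_value_on_d = sum(D) / len(D)
```

In this toy case every value in D also occurs in E, so the weighted mean over E matches the mean over D exactly. With continuous or high-dimensional records the density ratio has to be estimated, and the match is only approximate.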

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, 92093-0404, USA
Charles Elkan

Editor information

Editors and Affiliations

Johann Wolfgang Goethe University, Ruth-Moufang-Str. 1, 60438, Frankfurt am Main, Germany
Christos Dimitrakakis
Information Analytics Lab, IBM Research – Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland
Aris Gkoulalas-Divanis
Ecole Polytechnique Fédérale de Lausanne, I&C - ISC - LASEC, Bâtiment INF, Station 14, CH-1015, Lausanne, Switzerland
Aikaterini Mitrokotsa
Department of Computer and Communication Engineering, University of Thessaly, Glavani 37 & 28th Octovriou, GR 38221, Volos, Greece
Vassilios S. Verykios
Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli, 34956, Tuzla, Istanbul, Turkey
Yücel Saygin

About this paper

Cite this paper

Elkan, C. (2011). Preserving Privacy in Data Mining via Importance Weighting. In: Dimitrakakis, C., Gkoulalas-Divanis, A., Mitrokotsa, A., Verykios, V.S., Saygin, Y. (eds) Privacy and Security Issues in Data Mining and Machine Learning. PSDML 2010. Lecture Notes in Computer Science, vol 6549. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19896-0_2

DOI: https://doi.org/10.1007/978-3-642-19896-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19895-3
Online ISBN: 978-3-642-19896-0
eBook Packages: Computer Science, Computer Science (R0)
