A framework for condensation-based anonymization of string data

Aggarwal, Charu C.; Yu, Philip S.

doi:10.1007/s10618-008-0088-z

Published: 06 February 2008

Volume 16, pages 251–275, (2008)
Data Mining and Knowledge Discovery

Abstract

In recent years, privacy-preserving data mining has become an important problem because of the large amount of personal data that is tracked by many business applications. An important method for privacy-preserving data mining is condensation. This method is often used for multi-dimensional data, in which pseudo-data is generated to mask the true values of the records. However, these methods are not easily applicable to string data, since they require multi-dimensional statistics in order to generate the pseudo-data. String data are especially important in the privacy-preserving data mining domain because most DNA and biological data are coded as strings. In this article, we discuss a new method for privacy-preserving mining of string data with the use of simple template-based models. The template-based model turns out to be effective in practice, and preserves important statistical characteristics of the strings such as intra-record distances. We explore the behavior in the context of a classification application, and show that the accuracy of the application is not affected significantly by the anonymization process.

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, Hawthorne, NY, USA
Charu C. Aggarwal
University of Illinois at Chicago, Chicago, IL, USA
Philip S. Yu

Corresponding author

Correspondence to Charu C. Aggarwal.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Aggarwal, C.C., Yu, P.S. A framework for condensation-based anonymization of string data. Data Min Knowl Disc 16, 251–275 (2008). https://doi.org/10.1007/s10618-008-0088-z

Received: 08 February 2007
Accepted: 10 January 2008
Published: 06 February 2008
Issue Date: June 2008
DOI: https://doi.org/10.1007/s10618-008-0088-z
