Abstract
Organizations need to anonymize data before releasing it so that data mining cannot be used to infer private information; it is every organization's duty to ensure the privacy of its stakeholders. There is, however, a tradeoff between the privacy and the utility of released data. Many methods have been proposed for measuring the correlation between released data and the original data, most of them based on information-theoretic measures. Various techniques, such as anonymization and perturbation, have been proposed to tackle the privacy-preservation problem, but the natural consequence of privacy preservation is information loss: the loss of specific information about certain individuals may degrade data quality and, in extreme cases, render the data completely useless. Methods such as cryptography anonymize the dataset completely, so the utility of the data is entirely lost. One therefore needs to protect private information while preserving data utility as much as possible. The objective of this paper is to find an optimal balance between privacy and utility when publishing an organization's dataset. Privacy preservation is a hard requirement that must be satisfied, while utility is the measure to be optimized. One method that preserves privacy to a good extent is k-anonymization. Many methods have been proposed after k-anonymity, but they are impractical. The balancing point varies from dataset to dataset and depends on the choice of quasi-identifiers, the sensitive attribute, and the number of records.
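The k-anonymity property mentioned above requires that every combination of quasi-identifier values in the released table be shared by at least k records. A minimal sketch of a checker for this property is shown below; the toy records, column names, and generalized values are illustrative assumptions, not the paper's dataset or implementation.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (the k-anonymity property)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical released table: age generalized to ranges and ZIP codes
# truncated -- the kind of generalization k-anonymization applies to
# quasi-identifiers, while "disease" is the sensitive attribute.
records = [
    {"age": "20-30", "zip": "479**", "disease": "flu"},
    {"age": "20-30", "zip": "479**", "disease": "cancer"},
    {"age": "30-40", "zip": "130**", "disease": "flu"},
    {"age": "30-40", "zip": "130**", "disease": "asthma"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

Each quasi-identifier group here has exactly two records, so the table is 2-anonymous but not 3-anonymous; coarser generalization would raise k at the cost of utility, which is the tradeoff the paper studies.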
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Babu, K.S., Jena, S.K. (2011). Balancing between Utility and Privacy for k-Anonymity. In: Abraham, A., Lloret Mauri, J., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 191. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22714-1_1
DOI: https://doi.org/10.1007/978-3-642-22714-1_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22713-4
Online ISBN: 978-3-642-22714-1
eBook Packages: Computer Science (R0)