Automated k-Anonymization and l-Diversity for Shared Data Privacy

Kayem, Anne V. D. M.; Vester, C. T.; Meinel, Christoph

doi:10.1007/978-3-319-44403-1_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9827)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Analyzing data is a cost-intensive process, particularly for organizations lacking the necessary in-house human and computational capital. Data analytics outsourcing offers a cost-effective solution, but data sensitivity and query response time requirements make data protection a necessary pre-processing step. For performance and privacy reasons, anonymization is preferred over encryption. Yet manual anonymization is time-intensive and error-prone. Automated anonymization is a better alternative but requires satisfying the conflicting objectives of utility and privacy. In this paper, we present an automated anonymization scheme that extends the standard k-anonymization and l-diversity algorithms to satisfy the dual objectives of data utility and privacy. We use a multi-objective optimization scheme that employs a weighting mechanism to minimize information loss and maximize privacy. Our results show that automating l-diversity results in an added average information loss of 7% over automated k-anonymization, but in a diversity of between 9–14%, in comparison to 10–30% in k-anonymized datasets. The lesson that emerges is that automated l-diversity offers better privacy than k-anonymization, with negligible information loss.
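The weighting mechanism mentioned in the abstract can be pictured as a weighted-sum scalarization of the two objectives. The following is a minimal sketch under stated assumptions, not the scheme used in the paper: the function name `weighted_score`, the weights, the normalization of both measures to [0, 1], and the candidate values are all hypothetical and chosen only for illustration.

```python
def weighted_score(info_loss, privacy, w_loss=0.5, w_privacy=0.5):
    """Combine two objectives into one score; lower is better.

    Assumption: info_loss and privacy are both normalized to [0, 1],
    where higher privacy is better and higher info_loss is worse.
    """
    if not (0.0 <= info_loss <= 1.0 and 0.0 <= privacy <= 1.0):
        raise ValueError("both measures must lie in [0, 1]")
    # Privacy is maximized, so its shortfall (1 - privacy) is penalized.
    return w_loss * info_loss + w_privacy * (1.0 - privacy)


# Hypothetical candidate anonymizations: (info_loss, privacy).
# These numbers are made up and are not results from the paper.
candidates = {
    "light generalization": (0.10, 0.30),
    "moderate generalization": (0.25, 0.70),
    "heavy generalization": (0.60, 0.90),
}

best = min(candidates, key=lambda name: weighted_score(*candidates[name]))
print(best)
```

With equal weights the moderate candidate has the lowest score in this toy setup. Shifting weight toward `w_privacy` would favor heavier generalization, which is the trade-off such a weighting is meant to expose.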

Notes

1. Quasi-identifiers: attributes that, independently or combined, can be used to uniquely identify an individual.
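To make the role of quasi-identifiers concrete, the sketch below groups records by their quasi-identifier values and reports the k of k-anonymity (smallest group size) and the l of distinct l-diversity (fewest distinct sensitive values in any group). It follows the standard textbook definitions rather than the paper's implementation; the attribute names and the toy records are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest equivalence class (records must be non-empty)."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len(group) for group in groups.values())


def distinct_l_diversity(records, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values found in any equivalence class."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len({r[sensitive] for r in group}) for group in groups.values())


# Hypothetical, already generalized toy records (not data from the paper).
records = [
    {"age": "20-29", "zip": "77**", "crime": "theft"},
    {"age": "20-29", "zip": "77**", "crime": "theft"},
    {"age": "30-39", "zip": "77**", "crime": "fraud"},
    {"age": "30-39", "zip": "77**", "crime": "assault"},
]

k = k_anonymity(records, ["age", "zip"])
l = distinct_l_diversity(records, ["age", "zip"], "crime")
print(k, l)
```

In this toy table every quasi-identifier combination is shared by two records, so the table is 2-anonymous, yet the first group holds a single sensitive value and therefore offers no diversity. That gap is what l-diversity is designed to close.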

Acknowledgements

The authors gratefully acknowledge funding for this research provided by the National Research Foundation (NRF) of South Africa, and the Hasso-Plattner-Institute (HPI). In addition, the authors are grateful for the anonymous reviews.

Author information

Authors and Affiliations

Department of Computer Science, University of Cape Town, Rondebosch, Cape Town, 7701, South Africa
Anne V. D. M. Kayem & C. T. Vester
Hasso-Plattner-Institute, Potsdam, Germany
Anne V. D. M. Kayem & Christoph Meinel

Corresponding author

Correspondence to Anne V. D. M. Kayem.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma

About this paper

Cite this paper

Kayem, A.V.D.M., Vester, C.T., Meinel, C. (2016). Automated k-Anonymization and l-Diversity for Shared Data Privacy. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9827. Springer, Cham. https://doi.org/10.1007/978-3-319-44403-1_7

DOI: https://doi.org/10.1007/978-3-319-44403-1_7
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44402-4
Online ISBN: 978-3-319-44403-1
eBook Packages: Computer Science, Computer Science (R0)
