
Automated k-Anonymization and l-Diversity for Shared Data Privacy

  • Conference paper
  • First Online:
Database and Expert Systems Applications (DEXA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9827)


Abstract

Analyzing data is a cost-intensive process, particularly for organizations lacking the necessary in-house human and computational capital. Data analytics outsourcing offers a cost-effective solution, but data sensitivity and query response time requirements make data protection a necessary pre-processing step. For performance and privacy reasons, anonymization is preferred over encryption. Yet manual anonymization is time-intensive and error-prone. Automated anonymization is a better alternative, but it must reconcile the conflicting objectives of utility and privacy. In this paper, we present an automated anonymization scheme that extends the standard k-anonymization and l-diversity algorithms to satisfy the dual objectives of data utility and privacy. We use a multi-objective optimization scheme that employs a weighting mechanism to minimize information loss and maximize privacy. Our results show that automating l-diversity incurs an average additional information loss of 7% over automated k-anonymization, but yields a diversity of 9–14%, compared with 10–30% in k-anonymized datasets. The lesson that emerges is that automated l-diversity offers better privacy than k-anonymization, with negligible additional information loss.
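The abstract does not detail the weighting mechanism. One common way to fold two objectives into one is a weighted sum, so the Python sketch below assumes that form: it scores every full-domain generalization of two hypothetical quasi-identifiers (age and ZIP code), discards those that are not k-anonymous, and keeps the one minimizing w · (information loss) − (1 − w) · (privacy). The generalization hierarchies, the loss metric (mean fraction of each hierarchy climbed), the privacy proxy (smallest class size over table size), the weights and the records are all illustrative assumptions, not the paper's method.

```python
from collections import Counter
from itertools import product

MAX_AGE, MAX_ZIP = 2, 5  # hypothetical hierarchy heights


def gen_age(age, level):
    """Hypothetical hierarchy. Level 0: exact age; 1: 10-year band; 2: suppressed."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def gen_zip(zip_code, level):
    """Hypothetical hierarchy: mask the last `level` digits of a ZIP code."""
    return zip_code[: len(zip_code) - level] + "*" * level


def score(rows, levels, k, w):
    """Weighted objective for one generalization, or None if it is not k-anonymous."""
    keys = [(gen_age(r["age"], levels[0]), gen_zip(r["zip"], levels[1])) for r in rows]
    smallest = min(Counter(keys).values())  # size of the smallest equivalence class
    if smallest < k:
        return None
    loss = (levels[0] / MAX_AGE + levels[1] / MAX_ZIP) / 2  # 0 = raw data, 1 = fully suppressed
    privacy = smallest / len(rows)  # crude proxy: larger classes give more privacy
    return w * loss - (1 - w) * privacy


def best_generalization(rows, k, w):
    """Exhaustively search the small lattice; return the lowest-scoring levels or None."""
    best_levels, best_value = None, None
    for levels in product(range(MAX_AGE + 1), range(MAX_ZIP + 1)):
        value = score(rows, levels, k, w)
        if value is not None and (best_value is None or value < best_value):
            best_levels, best_value = levels, value
    return best_levels


rows = [  # hypothetical records; only the quasi-identifiers are shown
    {"age": 23, "zip": "13053"},
    {"age": 27, "zip": "13068"},
    {"age": 25, "zip": "13053"},
    {"age": 35, "zip": "14853"},
    {"age": 38, "zip": "14850"},
    {"age": 31, "zip": "14853"},
]

print(best_generalization(rows, k=3, w=0.5))  # (2, 4)
print(best_generalization(rows, k=3, w=0.7))  # (1, 2)
```

With w = 0.5 the privacy term dominates and the search suppresses age entirely, merging all six records into one class; raising w to 0.7 favors utility and keeps ten-year age bands with three ZIP digits. Exhaustive search is only feasible for tiny lattices like this one; larger attribute sets would need a heuristic search instead.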


Notes

  1. Quasi-identifiers: attributes that, alone or in combination, can be used to uniquely identify an individual.
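Both k-anonymity and l-diversity are defined over the equivalence classes that records form when grouped by their quasi-identifier values. The minimal Python sketch below measures the two properties on a table held as a list of dicts. It assumes the simplest, *distinct* variant of l-diversity (each class must contain at least l different sensitive values); the column names and records are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows by their combined quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in quasi_identifiers)].append(row)
    return classes


def k_of(rows, quasi_identifiers):
    """Largest k for which the table is k-anonymous: the smallest class size."""
    return min(len(c) for c in equivalence_classes(rows, quasi_identifiers).values())


def l_of(rows, quasi_identifiers, sensitive):
    """Largest l for distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(
        len({r[sensitive] for r in c})
        for c in equivalence_classes(rows, quasi_identifiers).values()
    )


# Hypothetical, already-generalized records (age band, truncated ZIP code).
rows = [
    {"age": "20-29", "zip": "130**", "crime": "theft"},
    {"age": "20-29", "zip": "130**", "crime": "theft"},
    {"age": "20-29", "zip": "130**", "crime": "assault"},
    {"age": "30-39", "zip": "148**", "crime": "fraud"},
    {"age": "30-39", "zip": "148**", "crime": "fraud"},
    {"age": "30-39", "zip": "148**", "crime": "fraud"},
]

qi = ["age", "zip"]
print(k_of(rows, qi))           # 3
print(l_of(rows, qi, "crime"))  # 1
```

The table is 3-anonymous yet only 1-diverse: anyone who knows a person falls into the second class learns that person's sensitive value. This homogeneity weakness is what l-diversity is designed to close.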


Acknowledgements

The authors gratefully acknowledge funding for this research from the National Research Foundation (NRF) of South Africa and the Hasso-Plattner-Institute (HPI). The authors also thank the anonymous reviewers for their comments.

Author information

Correspondence to Anne V. D. M. Kayem.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kayem, A.V.D.M., Vester, C.T., Meinel, C. (2016). Automated k-Anonymization and l-Diversity for Shared Data Privacy. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9827. Springer, Cham. https://doi.org/10.1007/978-3-319-44403-1_7


  • DOI: https://doi.org/10.1007/978-3-319-44403-1_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44402-4

  • Online ISBN: 978-3-319-44403-1

  • eBook Packages: Computer Science, Computer Science (R0)
