Abstract
Analyzing data is a cost-intensive process, particularly for organizations lacking the necessary in-house human and computational capital. Data analytics outsourcing offers a cost-effective solution, but data sensitivity and query response time requirements make data protection a necessary pre-processing step. For performance and privacy reasons, anonymization is preferred over encryption. Yet manual anonymization is time-intensive and error-prone. Automated anonymization is a better alternative, but it requires satisfying the conflicting objectives of utility and privacy. In this paper, we present an automated anonymization scheme that extends the standard k-anonymization and l-diversity algorithms to satisfy the dual objectives of data utility and privacy. We use a multi-objective optimization scheme that employs a weighting mechanism to minimize information loss and maximize privacy. Our results show that automated l-diversity incurs an additional average information loss of 7 % over automated k-anonymization, but yields a diversity of 9–14 %, compared with 10–30 % in k-anonymized datasets. The lesson that emerges is that automated l-diversity offers better privacy than k-anonymization, with negligible information loss.
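The abstract does not spell out the weighting mechanism, so the following is only a minimal sketch of one common way to combine two conflicting objectives: a weighted sum. It is not the authors' scheme. The function names, the weight `w_loss`, the normalization of both measures to [0, 1], and the candidate values are all assumptions made for illustration.

```python
def weighted_score(info_loss, privacy, w_loss=0.5):
    """Scalarize two objectives into one cost (lower is better).

    Assumption: info_loss and privacy are both normalized to [0, 1].
    Privacy is to be maximized, so it enters the cost as (1 - privacy).
    """
    if not 0.0 <= w_loss <= 1.0:
        raise ValueError("w_loss must lie in [0, 1]")
    return w_loss * info_loss + (1.0 - w_loss) * (1.0 - privacy)


def best_candidate(candidates, w_loss):
    """Return the name of the candidate anonymization with the lowest cost."""
    return min(
        candidates,
        key=lambda name: weighted_score(
            candidates[name]["info_loss"], candidates[name]["privacy"], w_loss
        ),
    )


# Hypothetical candidate anonymizations; the numbers are illustrative only.
candidates = {
    "coarse": {"info_loss": 0.40, "privacy": 0.90},
    "fine": {"info_loss": 0.10, "privacy": 0.50},
}

utility_first = best_candidate(candidates, w_loss=0.8)
privacy_first = best_candidate(candidates, w_loss=0.2)
print(utility_first, privacy_first)
```

Under these assumed values, a weight that favours utility selects the finer-grained candidate and a weight that favours privacy selects the coarser one, which is the trade-off the weighting is meant to expose.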
Notes
1. Quasi-identifiers: attributes that, independently or in combination, can be used to uniquely identify an individual.
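To make the note concrete, the sketch below groups records by their quasi-identifier values and reports the k (smallest group size) and the distinct l (fewest distinct sensitive values in any group) that a table satisfies. It follows the standard definitions of k-anonymity and distinct l-diversity rather than the automated scheme of the paper. The attribute names and records are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[attr] for attr in quasi_identifiers)
        groups[key].append(record)
    return groups


def k_of(records, quasi_identifiers):
    """Largest k for which the table is k-anonymous (smallest class size)."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_of(records, quasi_identifiers, sensitive):
    """Largest l for distinct l-diversity: the fewest distinct sensitive
    values found in any equivalence class."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len({r[sensitive] for r in group}) for group in groups.values())


# Hypothetical, already generalized records (illustrative only).
records = [
    {"age": "20-29", "zip": "77**", "crime": "theft"},
    {"age": "20-29", "zip": "77**", "crime": "theft"},
    {"age": "30-39", "zip": "78**", "crime": "fraud"},
    {"age": "30-39", "zip": "78**", "crime": "assault"},
]

k = k_of(records, ["age", "zip"])
l = l_of(records, ["age", "zip"], "crime")
print(k, l)
```

In this toy table every quasi-identifier combination is shared by two records, yet one group contains a single sensitive value, so the table is 2-anonymous but only 1-diverse. This gap is what l-diversity is designed to close.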
Acknowledgements
The authors gratefully acknowledge funding for this research provided by the National Research Foundation (NRF) of South Africa, and the Hasso-Plattner-Institute (HPI). In addition, the authors are grateful for the anonymous reviews.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kayem, A.V.D.M., Vester, C.T., Meinel, C. (2016). Automated k-Anonymization and l-Diversity for Shared Data Privacy. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9827. Springer, Cham. https://doi.org/10.1007/978-3-319-44403-1_7
DOI: https://doi.org/10.1007/978-3-319-44403-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44402-4
Online ISBN: 978-3-319-44403-1
eBook Packages: Computer Science, Computer Science (R0)