Abstract
Resource constrained environments (RCEs) are remote or rural regions of the developing world where a lack of specialised expertise and computational processing power hinders data analytics operations. Outsourcing to third-party data analytics service providers offers a cost-effective management solution. However, the data must first be anonymised before it is shared, to protect against privacy violations. Syntactic anonymisation algorithms (k-anonymisation, l-diversity, and t-closeness) are an attractive solution for RCEs because the generated data is not use-case specific. The underlying optimisation problems have, however, been shown to be NP-hard, so the algorithms need to be refactored to run efficiently with limited processing power. In previous work [23], we presented a method of extending the standard k-anonymisation and l-diversity algorithms to satisfy both data utility and privacy. We used a multi-objective optimisation scheme to minimise information loss and maximise privacy. Our results showed that the extended l-diversity algorithm incurs higher information loss than the extended k-anonymisation algorithm, but offers better privacy in terms of protection against inferential disclosure. The additional information loss (7%) was negligible and did not negatively affect data utility. In this paper, we extend this result with a modified t-closeness algorithm based on the notion of clustering. The aim is to provide a performance-efficient algorithm that maintains the low information loss levels of our extended k-anonymisation and l-diversity algorithms, while also providing protection against skewness and similarity attacks.
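As an illustration of the three syntactic privacy models named in the abstract, the following minimal Python sketch measures the k, l, and t values that an already generalised table satisfies. It is not the chapter's algorithm: the function names, the record layout, and the toy crime records are hypothetical, l is measured as the number of distinct sensitive values per equivalence class, and the t-closeness distance is taken to be the total variation distance, which corresponds to the Earth Mover's Distance only when all categorical values are assumed equally far apart.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        classes[key].append(record)
    return list(classes.values())


def k_of(classes):
    """k-anonymity level: size of the smallest equivalence class."""
    return min(len(c) for c in classes)


def l_of(classes, sensitive):
    """Distinct l-diversity level: fewest distinct sensitive values in a class."""
    return min(len({r[sensitive] for r in c}) for c in classes)


def t_of(classes, sensitive):
    """t-closeness level: largest distance between a class's sensitive-value
    distribution and the distribution over the whole table.

    Assumption: total variation distance (equal ground distance between
    categorical values).
    """
    all_records = [r for c in classes for r in c]
    n = len(all_records)
    global_counts = Counter(r[sensitive] for r in all_records)
    worst = 0.0
    for c in classes:
        local_counts = Counter(r[sensitive] for r in c)
        dist = 0.5 * sum(
            abs(local_counts[v] / len(c) - global_counts[v] / n)
            for v in global_counts
        )
        worst = max(worst, dist)
    return worst


# Hypothetical, already generalised records (illustrative only).
records = [
    {"age": "20-29", "zip": "77**", "crime": "theft"},
    {"age": "20-29", "zip": "77**", "crime": "assault"},
    {"age": "30-39", "zip": "78**", "crime": "theft"},
    {"age": "30-39", "zip": "78**", "crime": "theft"},
]
classes = equivalence_classes(records, ["age", "zip"])
k = k_of(classes)
l = l_of(classes, "crime")
t = t_of(classes, "crime")
print(k, l, t)
```

In this toy table every class holds two records, so the table is 2-anonymous, yet the second class contains a single sensitive value. That is the kind of inferential disclosure that l-diversity and t-closeness are meant to limit and that k-anonymity alone does not detect.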
References
Aggarwal, C.C.: On k-anonymity and the curse of dimensionality. In: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005, pp. 901–909. VLDB Endowment (2005)
Aggarwal, C.C.: On unifying privacy and uncertain data models. In: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE 2008, pp. 386–395. IEEE Computer Society, Washington, DC, USA (2008)
Aggarwal, C.C., Yu, P.S.: Privacy-Preserving Data Mining: Models and Algorithms, 1st edn. Springer Publishing Company Incorporated, New York (2008). https://doi.org/10.1007/978-0-387-70992-5
Aytug, H., Koehler, G.J.: New stopping criterion for genetic algorithms. Eur. J. Oper. Res. 126(3), 662–674 (2000)
Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: 21st International Conference on Data Engineering (ICDE 2005), pp. 217–228, April 2005
Burke, M., Kayem, A.V.D.M.: K-anonymity for privacy preserving crime data publishing in resource constrained environments. In: 28th International Conference on Advanced Information Networking and Applications Workshops, AINA 2014 Workshops, Victoria, BC, Canada, 13–16 May 2014, pp. 833–840 (2014)
Ciglic, M., Eder, J., Koncilia, C.: k-anonymity of microdata with NULL values. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds.) DEXA 2014. LNCS, vol. 8644, pp. 328–342. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10073-9_27
Ciriani, V., Vimercati, S.D.C., Foresti, S., Samarati, P.: k-Anonymous data mining: a survey. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-Preserving Data Mining: Models and Algorithms, vol. 34, pp. 105–136. Springer, Boston (2008). https://doi.org/10.1007/978-0-387-70992-5_5
Clifton, C., Tassa, T.: On syntactic anonymity and differential privacy. Trans. Data Priv. 6(2), 161–183 (2013)
Dewri, R., Ray, I., Ray, I., Whitley, D.: Exploring privacy versus data quality trade-offs in anonymization techniques using multi-objective optimization. J. Comput. Secur. 19(5), 935–974 (2011)
Dewri, R., Whitley, D., Ray, I., Ray, I.: A multi-objective approach to data sharing with privacy constraints and preference based objectives. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO 2009, pp. 1499–1506. ACM, New York (2009)
Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79228-4_1
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)
Fienberg, S.E., Jin, J.: Privacy-preserving data sharing in high dimensional regression and classification settings. J. Priv. Confid. 4(1), 10 (2012)
Fredj, F.B., Lammari, N., Comyn-Wattiau, I.: Abstracting anonymization techniques: a prerequisite for selecting a generalization algorithm. Procedia Comput. Sci. 60, 206–215 (2015)
Ghinita, G., Karras, P., Kalnis, P., Mamoulis, N.: Fast data anonymization with low information loss. In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB 2007, pp. 758–769. VLDB Endowment (2007)
Ghosh, A., Roughgarden, T., Sundararajan, M.: Universally utility-maximizing privacy mechanisms. SIAM J. Comput. 41(6), 1673–1693 (2012)
Gould, C., Burger, J., Newham, G.: The SAPS crime statistics: What they tell us and what they don’t. S. Afr. Crime Quarterly, December 2012. https://www.issafrica.org/uploads/1crimestats.pdf
Islam, M.Z., Brankovic, L.: Privacy preserving data mining: a noise addition framework using a novel clustering technique. Knowl. Based Syst. 24(8), 1214–1223 (2011)
Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 279–288. ACM, New York (2002)
Kayem, A.V.D.M., Meinel, C.: Clustering heuristics for efficient t-closeness anonymisation. In: Benslimane, D., Damiani, E., Grosky, W.I., Hameurlain, A., Sheth, A., Wagner, R.R. (eds.) DEXA 2017. LNCS, vol. 10439, pp. 27–34. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64471-4_3
Kayem, A.V.D.M., Vester, C.T., Meinel, C.: Automated k-anonymization and l-diversity for shared data privacy. In: Hartmann, S., Ma, H. (eds.) DEXA 2016. LNCS, vol. 9827, pp. 105–120. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44403-1_7
Koufogiannis, F., Han, S., Pappas, G.J.: Optimality of the Laplace mechanism in differential privacy. arXiv preprint arXiv:1504.00065 (2015)
Last, M., Tassa, T., Zhmudyak, A., Shmueli, E.: Improving accuracy of classification models induced from anonymized datasets. Inf. Sci. 256, 138–161 (2014). Business Intelligence in Risk Management
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD 2005, pp. 49–60. ACM, New York (2005). http://doi.acm.org/10.1145/1066157.1066164
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Workload-aware anonymization techniques for large-scale datasets. ACM Trans. Database Syst. (TODS) 33(3), 17 (2008)
Li, C., Miklau, G., Hay, M., McGregor, A., Rastogi, V.: The matrix mechanism: optimizing linear counting queries under differential privacy. VLDB J. 24(6), 757–781 (2015). http://dx.doi.org/10.1007/s00778-015-0398-x
Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 106–115, April 2007
Liang, H., Yuan, H.: On the complexity of t-closeness anonymization and related problems. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds.) DASFAA 2013. LNCS, vol. 7825, pp. 331–345. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37487-6_26
Lin, J.L., Wei, M.C.: Genetic algorithm-based clustering approach for k-anonymization. Expert. Syst. Appl. 36(6), 9784–9792 (2009)
Liu, F.: Generalized Gaussian mechanism for differential privacy. arXiv preprint arXiv:1602.06028 (2016)
Liu, K., Giannella, C., Kargupta, H.: A survey of attack techniques on privacy-preserving data perturbation methods. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-Preserving Data Mining: Models and Algorithms, vol. 34, pp. 359–381. Springer, Boston (2008). https://doi.org/10.1007/978-0-387-70992-5_15
Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: L-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1(1), Article no. 3 (2007)
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2007, pp. 94–103. IEEE (2007)
Meyerson, A., Williams, R.: On the complexity of optimal k-anonymity. In: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 223–228. ACM (2004)
Nergiz, M.E., Tamersoy, A., Saygin, Y.: Instant anonymization. ACM Trans. Database Syst. 36(1), 2:1–2:33 (2011)
Sakpere, A.B., Kayem, A.V.D.M., Ndlovu, T.: A usable and secure crime reporting system for technology resource constrained context. In: 29th IEEE International Conference on Advanced Information Networking and Applications Workshops, AINA 2015 Workshops, Gwangju, South Korea, 24–27 March 2015, pp. 424–429 (2015)
Seckan, B.: Violent crime in the developing world: research roundup. In: Journalist’s Resource: Research on today’s news topics, October 2012. http://journalistsresource.org/studies/international/development/crime-violence-developing-world-research-roundup
Sweeney, L.: K-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)
Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206–215. ACM (2003)
Vaidya, J., Kantarcıoğlu, M., Clifton, C.: Privacy-preserving Naive Bayes classification. VLDB J. Int. J. Very Large Data Bases 17(4), 879–898 (2008)
Website: South Africa’s police: Something very rotten. The Economist: Middle East and Africa, June 2012. http://www.economist.com/node/21557385
Wicker, S.B.: The loss of location privacy in the cellular age. Commun. ACM 55(8), 60–68 (2012)
Wimmer, H., Powell, L.: A comparison of the effects of k-anonymity on machine learning algorithms. In: Proceedings of the Conference for Information Systems Applied Research, ISSN 2167-1508 (2014)
Xiao, Q., Reiter, M.K., Zhang, Y.: Mitigating storage side channels using statistical privacy mechanisms. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 1582–1594. ACM, New York (2015)
Xiao, X., Yi, K., Tao, Y.: The hardness and approximation algorithms for l-diversity. In: Proceedings of the 13th International Conference on Extending Database Technology, EDBT 2010, pp. 135–146. ACM, New York (2010)
Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.C.: Utility-based anonymization using local recoding. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, pp. 785–790. ACM, New York (2006)
Zhang, B., Dave, V., Mohammed, N., Hasan, M.A.: Feature selection for classification under anonymity constraint. arXiv preprint arXiv:1512.07158 (2015)
Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X.: PrivBayes: private data release via Bayesian networks. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 1423–1434. ACM (2014)
Appendix A Notations: Summary
See Table 3.
Copyright information
© 2018 Springer-Verlag GmbH Germany, part of Springer Nature
Cite this chapter
Kayem, A.V.D.M., Vester, C.T., Meinel, C. (2018). Syntactic Anonymisation of Shared Datasets in Resource Constrained Environments. In: Hameurlain, A., Wagner, R., Hartmann, S., Ma, H. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII. Lecture Notes in Computer Science, vol 11250. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58384-5_2
DOI: https://doi.org/10.1007/978-3-662-58384-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58383-8
Online ISBN: 978-3-662-58384-5
eBook Packages: Computer Science (R0)