Syntactic Anonymisation of Shared Datasets in Resource Constrained Environments

Kayem, Anne V. D. M.; Vester, C. T.; Meinel, Christoph

doi:10.1007/978-3-662-58384-5_2

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11250)

Abstract

Resource constrained environments (RCEs) are remote or rural regions of the developing world where a lack of specialised expertise and limited computational processing power hinder data analytics operations. Outsourcing to third-party data analytics service providers offers a cost-effective management solution. However, the data must be anonymised before it is shared, to protect against privacy violations. Syntactic anonymisation algorithms (k-anonymisation, l-diversity, and t-closeness) are an attractive solution for RCEs because the generated data is not use-case specific. Optimal anonymisation under these models has, however, been shown to be NP-hard, so the algorithms need to be refactored to run efficiently with limited processing power. In previous work [23], we presented a method of extending the standard k-anonymisation and l-diversity algorithms to satisfy both data utility and privacy. We used a multi-objective optimisation scheme to minimise information loss and maximise privacy. Our results showed that the extended l-diversity algorithm incurs higher information loss than the extended k-anonymisation algorithm, but offers better privacy in terms of protection against inferential disclosure. The additional information loss (7%) was negligible and did not negatively affect data utility. In this paper, we extend this result with a modified t-closeness algorithm based on clustering. The aim is to provide a performance-efficient algorithm that maintains the low information loss of our extended k-anonymisation and l-diversity algorithms, while also protecting against skewness and similarity attacks.
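As a rough illustration of the three syntactic privacy models named in the abstract, the sketch below measures the k, l, and t values of a small, already-generalised table. It is not the chapter's algorithm: the toy records, the attribute names, and the helper functions `k_of`, `l_of`, and `t_of` are all hypothetical. It also assumes distinct l-diversity and uses the variational distance between distributions, which coincides with the Earth Mover's Distance only when every pair of distinct sensitive values is treated as equally far apart.

```python
from collections import Counter, defaultdict

# Hypothetical, already-generalised records (not data from the chapter).
QUASI_IDS = ("age", "area")
SENSITIVE = "offence"
table = [
    {"age": "20-29", "area": "North", "offence": "theft"},
    {"age": "20-29", "area": "North", "offence": "theft"},
    {"age": "20-29", "area": "North", "offence": "assault"},
    {"age": "30-39", "area": "South", "offence": "theft"},
    {"age": "30-39", "area": "South", "offence": "fraud"},
    {"age": "30-39", "area": "South", "offence": "assault"},
]

def equivalence_classes(rows, quasi_ids):
    """Group rows that share identical quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].append(row)
    return list(classes.values())

def k_of(rows, quasi_ids):
    """Largest k for which the table is k-anonymous: the smallest class size."""
    return min(len(c) for c in equivalence_classes(rows, quasi_ids))

def l_of(rows, quasi_ids, sensitive):
    """Largest l for distinct l-diversity: the fewest distinct sensitive
    values found in any equivalence class."""
    return min(len({r[sensitive] for r in c})
               for c in equivalence_classes(rows, quasi_ids))

def t_of(rows, quasi_ids, sensitive):
    """Smallest t for t-closeness under the variational distance
    (an assumption; other ground distances give other values)."""
    overall = Counter(r[sensitive] for r in rows)
    worst = 0.0
    for c in equivalence_classes(rows, quasi_ids):
        local = Counter(r[sensitive] for r in c)
        dist = 0.5 * sum(abs(local[v] / len(c) - overall[v] / len(rows))
                         for v in overall)
        worst = max(worst, dist)
    return worst

k = k_of(table, QUASI_IDS)
l = l_of(table, QUASI_IDS, SENSITIVE)
t = t_of(table, QUASI_IDS, SENSITIVE)
print(k, l, round(t, 3))
```

In this toy table the first class holds only two distinct offences, so it limits l even though both classes have the same size. A skewness attack exploits exactly the kind of gap between class-level and table-level distributions that `t_of` measures.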

References

Aggarwal, C.C.: On k-anonymity and the curse of dimensionality. In: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005, pp. 901–909. VLDB Endowment (2005)
Aggarwal, C.C.: On unifying privacy and uncertain data models. In: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE 2008, pp. 386–395. IEEE Computer Society, Washington, DC, USA (2008)
Aggarwal, C.C., Yu, P.S.: Privacy-Preserving Data Mining: Models and Algorithms, 1st edn. Springer Publishing Company Incorporated, New York (2008). https://doi.org/10.1007/978-0-387-70992-5
Aytug, H., Koehler, G.J.: New stopping criterion for genetic algorithms. Eur. J. Oper. Res. 126(3), 662–674 (2000)
Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: 21st International Conference on Data Engineering (ICDE 2005), pp. 217–228, April 2005
Burke, M., Kayem, A.V.D.M.: K-anonymity for privacy preserving crime data publishing in resource constrained environments. In: 28th International Conference on Advanced Information Networking and Applications Workshops, AINA 2014 Workshops, Victoria, BC, Canada, 13–16 May 2014, pp. 833–840 (2014)
Ciglic, M., Eder, J., Koncilia, C.: k-anonymity of microdata with NULL values. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds.) DEXA 2014. LNCS, vol. 8644, pp. 328–342. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10073-9_27
Ciriani, V., Vimercati, S.D.C., Foresti, S., Samarati, P.: k-Anonymous data mining: a survey. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-Preserving Data Mining: Models and Algorithms, vol. 34, pp. 105–136. Springer, Boston (2008). https://doi.org/10.1007/978-0-387-70992-5_5
Clifton, C., Tassa, T.: On syntactic anonymity and differential privacy. Trans. Data Priv. 6(2), 161–183 (2013)
Dewri, R., Ray, I., Ray, I., Whitley, D.: Exploring privacy versus data quality trade-offs in anonymization techniques using multi-objective optimization. J. Comput. Secur. 19(5), 935–974 (2011)
Dewri, R., Whitley, D., Ray, I., Ray, I.: A multi-objective approach to data sharing with privacy constraints and preference based objectives. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO 2009, pp. 1499–1506. ACM, New York (2009)
Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79228-4_1
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)
Fienberg, S.E., Jin, J.: Privacy-preserving data sharing in high dimensional regression and classification settings. J. Priv. Confid. 4(1), 10 (2012)
Fredj, F.B., Lammari, N., Comyn-Wattiau, I.: Abstracting anonymization techniques: a prerequisite for selecting a generalization algorithm. Procedia Comput. Sci. 60, 206–215 (2015)
Ghinita, G., Karras, P., Kalnis, P., Mamoulis, N.: Fast data anonymization with low information loss. In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB 2007, pp. 758–769. VLDB Endowment (2007)
Ghosh, A., Roughgarden, T., Sundararajan, M.: Universally utility-maximizing privacy mechanisms. SIAM J. Comput. 41(6), 1673–1693 (2012)
Gould, C., Burger, J., Newham, G.: The SAPS crime statistics: What they tell us and what they don’t. S. Afr. Crime Quarterly, December 2012. https://www.issafrica.org/uploads/1crimestats.pdf
Islam, M.Z., Brankovic, L.: Privacy preserving data mining: a noise addition framework using a novel clustering technique. Knowl. Based Syst. 24(8), 1214–1223 (2011)
Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 279–288. ACM, New York (2002)
Kayem, A.V.D.M., Meinel, C.: Clustering heuristics for efficient t-closeness anonymisation. In: Benslimane, D., Damiani, E., Grosky, W.I., Hameurlain, A., Sheth, A., Wagner, R.R. (eds.) DEXA 2017. LNCS, vol. 10439, pp. 27–34. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64471-4_3
Kayem, A.V.D.M., Vester, C.T., Meinel, C.: Automated k-anonymization and l-diversity for shared data privacy. In: Hartmann, S., Ma, H. (eds.) DEXA 2016. LNCS, vol. 9827, pp. 105–120. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44403-1_7
Koufogiannis, F., Han, S., Pappas, G.J.: Optimality of the Laplace mechanism in differential privacy. arXiv preprint arXiv:1504.00065 (2015)
Last, M., Tassa, T., Zhmudyak, A., Shmueli, E.: Improving accuracy of classification models induced from anonymized datasets. Inf. Sci. 256, 138–161 (2014). Business Intelligence in Risk Management
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD 2005, pp. 49–60. ACM, New York (2005). http://doi.acm.org/10.1145/1066157.1066164
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Workload-aware anonymization techniques for large-scale datasets. ACM Trans. Database Syst. (TODS) 33(3), 17 (2008)
Li, C., Miklau, G., Hay, M., McGregor, A., Rastogi, V.: The matrix mechanism: optimizing linear counting queries under differential privacy. VLDB J. 24(6), 757–781 (2015). http://dx.doi.org/10.1007/s00778-015-0398-x
Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 106–115, April 2007
Liang, H., Yuan, H.: On the complexity of t-closeness anonymization and related problems. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds.) DASFAA 2013. LNCS, vol. 7825, pp. 331–345. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37487-6_26
Lin, J.L., Wei, M.C.: Genetic algorithm-based clustering approach for k-anonymization. Expert. Syst. Appl. 36(6), 9784–9792 (2009)
Liu, F.: Generalized Gaussian mechanism for differential privacy. arXiv preprint arXiv:1602.06028 (2016)
Liu, K., Giannella, C., Kargupta, H.: A survey of attack techniques on privacy-preserving data perturbation methods. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-Preserving Data Mining: Models and Algorithms, vol. 34, pp. 359–381. Springer, Boston (2008). https://doi.org/10.1007/978-0-387-70992-5_15
Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: L-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1(1), Article no. 3 (2007)
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2007, pp. 94–103. IEEE (2007)
Meyerson, A., Williams, R.: On the complexity of optimal k-anonymity. In: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 223–228. ACM (2004)
Nergiz, M.E., Tamersoy, A., Saygin, Y.: Instant anonymization. ACM Trans. Database Syst. 36(1), 2:1–2:33 (2011)
Sakpere, A.B., Kayem, A.V.D.M., Ndlovu, T.: A usable and secure crime reporting system for technology resource constrained context. In: 29th IEEE International Conference on Advanced Information Networking and Applications Workshops, AINA 2015 Workshops, Gwangju, South Korea, 24–27 March 2015, pp. 424–429 (2015)
Seckan, B.: Violent crime in the developing world: research roundup. In: Journalist’s Resource: Research on today’s news topics, October 2012. http://journalistsresource.org/studies/international/development/crime-violence-developing-world-research-roundup
Sweeney, L.: K-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)
Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206–215. ACM (2003)
Vaidya, J., Kantarcıoğlu, M., Clifton, C.: Privacy-preserving Naive Bayes classification. VLDB J. Int. J. Very Large Data Bases 17(4), 879–898 (2008)
Website: South Africa’s police: Something very rotten. The Economist: Middle East and Africa, June 2012. http://www.economist.com/node/21557385
Wicker, S.B.: The loss of location privacy in the cellular age. Commun. ACM 55(8), 60–68 (2012)
Wimmer, H., Powell, L.: A comparison of the effects of k-anonymity on machine learning algorithms. In: Proceedings of the Conference for Information Systems Applied Research ISSN, vol. 2167, p. 1508 (2014)
Xiao, Q., Reiter, M.K., Zhang, Y.: Mitigating storage side channels using statistical privacy mechanisms. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 1582–1594. ACM, New York (2015)
Xiao, X., Yi, K., Tao, Y.: The hardness and approximation algorithms for l-diversity. In: Proceedings of the 13th International Conference on Extending Database Technology, EDBT 2010, pp. 135–146. ACM, New York (2010)
Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.C.: Utility-based anonymization using local recoding. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, pp. 785–790. ACM, New York (2006)
Zhang, B., Dave, V., Mohammed, N., Hasan, M.A.: Feature selection for classification under anonymity constraint. arXiv preprint arXiv:1512.07158 (2015)
Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X.: Privbayes: private data release via Bayesian networks. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 1423–1434. ACM (2014)

Author information

Authors and Affiliations

Hasso-Plattner-Institute, Potsdam, Germany
Anne V. D. M. Kayem & Christoph Meinel
Department of Computer Science, University of Cape Town, Cape Town, South Africa
C. T. Vester

Corresponding author

Correspondence to Anne V. D. M. Kayem.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Roland Wagner
Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma

Appendix A Notations: Summary

See Table 3.

Table 3. Notation

About this chapter

Cite this chapter

Kayem, A.V.D.M., Vester, C.T., Meinel, C. (2018). Syntactic Anonymisation of Shared Datasets in Resource Constrained Environments. In: Hameurlain, A., Wagner, R., Hartmann, S., Ma, H. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII. Lecture Notes in Computer Science, vol 11250. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58384-5_2

DOI: https://doi.org/10.1007/978-3-662-58384-5_2
Published: 22 November 2018
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-58383-8
Online ISBN: 978-3-662-58384-5
eBook Packages: Computer Science, Computer Science (R0)
