Syntactic Anonymisation of Shared Datasets in Resource Constrained Environments

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11250)

Abstract

Resource constrained environments (RCEs) are remote or rural regions of the developing world where a lack of specialised expertise and computational processing power hinders data analytics operations. Outsourcing to third-party data analytics service providers offers a cost-effective management solution. However, the data must be anonymised before it is shared, to protect against privacy violations. Syntactic anonymisation algorithms (k-anonymisation, l-diversity, and t-closeness) are an attractive solution for RCEs because the generated data is not use-case specific. The underlying optimisation problems have, however, been shown to be NP-hard, so the algorithms need to be refactored to run efficiently with limited processing power. In previous work [23], we presented a method of extending the standard k-anonymisation and l-diversity algorithms to satisfy both data utility and privacy. We used a multi-objective optimisation scheme to minimise information loss and maximise privacy. Our results showed that the extended l-diversity algorithm incurs higher information loss than the extended k-anonymity algorithm, but offers better privacy in terms of protection against inferential disclosure. The additional information loss (7%) was negligible and did not negatively affect data utility. In this paper, we extend this result with a modified t-closeness algorithm based on the notion of clustering. The aim is to provide a performance-efficient algorithm that maintains the low information loss levels of our extended k-anonymisation and l-diversity algorithms, while also providing protection against skewness and similarity attacks.
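As a rough illustration of what the three syntactic models measure, the sketch below computes the k, l, and t values of a small table. It is not the chapter's algorithm: the table, the column names (`age`, `area`, `crime`), and the helper functions are hypothetical, l-diversity is taken in its simplest "distinct values" form, and t-closeness uses the equal-distance variant of the Earth Mover's Distance for categorical attributes, which reduces to half the sum of absolute differences between the two distributions.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, quasi_ids):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].append(row)
    return list(classes.values())


def k_anonymity(rows, quasi_ids):
    """Size of the smallest equivalence class."""
    return min(len(group) for group in equivalence_classes(rows, quasi_ids))


def l_diversity(rows, quasi_ids, sensitive):
    """Fewest distinct sensitive values found in any equivalence class."""
    return min(
        len({row[sensitive] for row in group})
        for group in equivalence_classes(rows, quasi_ids)
    )


def t_closeness(rows, quasi_ids, sensitive):
    """Largest distance between a class distribution and the overall one.

    Assumes a categorical sensitive attribute with equal ground distance,
    so the distance is half the L1 difference of the two distributions.
    """
    overall = Counter(row[sensitive] for row in rows)
    total = len(rows)
    worst = 0.0
    for group in equivalence_classes(rows, quasi_ids):
        local = Counter(row[sensitive] for row in group)
        distance = 0.5 * sum(
            abs(local[value] / len(group) - overall[value] / total)
            for value in overall
        )
        worst = max(worst, distance)
    return worst


# Hypothetical, already generalised table (illustrative values only).
table = [
    {"age": "20-29", "area": "North", "crime": "theft"},
    {"age": "20-29", "area": "North", "crime": "theft"},
    {"age": "20-29", "area": "North", "crime": "assault"},
    {"age": "30-39", "area": "South", "crime": "fraud"},
    {"age": "30-39", "area": "South", "crime": "theft"},
    {"age": "30-39", "area": "South", "crime": "assault"},
]

k = k_anonymity(table, ["age", "area"])
l = l_diversity(table, ["age", "area"], "crime")
t = t_closeness(table, ["age", "area"], "crime")
print(k, l, round(t, 4))
```

A larger k or l and a smaller t indicate stronger protection. In this toy table the first class is dominated by a single sensitive value, which is the kind of skew that a t-closeness bound is meant to limit.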

References

  1. Aggarwal, C.C.: On k-anonymity and the curse of dimensionality. In: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB 2005, pp. 901–909. VLDB Endowment (2005)

  2. Aggarwal, C.C.: On unifying privacy and uncertain data models. In: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE 2008, pp. 386–395. IEEE Computer Society, Washington, DC, USA (2008)

  3. Aggarwal, C.C., Yu, P.S.: Privacy-Preserving Data Mining: Models and Algorithms, 1st edn. Springer Publishing Company Incorporated, New York (2008). https://doi.org/10.1007/978-0-387-70992-5

  4. Aytug, H., Koehler, G.J.: New stopping criterion for genetic algorithms. Eur. J. Oper. Res. 126(3), 662–674 (2000)

  5. Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: 21st International Conference on Data Engineering (ICDE 2005), pp. 217–228, April 2005

  6. Burke, M., Kayem, A.V.D.M.: K-anonymity for privacy preserving crime data publishing in resource constrained environments. In: 28th International Conference on Advanced Information Networking and Applications Workshops, AINA 2014 Workshops, Victoria, BC, Canada, 13–16 May 2014, pp. 833–840 (2014)

  7. Ciglic, M., Eder, J., Koncilia, C.: k-anonymity of microdata with NULL values. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds.) DEXA 2014. LNCS, vol. 8644, pp. 328–342. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10073-9_27

  8. Ciriani, V., Vimercati, S.D.C., Foresti, S., Samarati, P.: k-Anonymous data mining: a survey. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-Preserving Data Mining: Models and Algorithms, vol. 34, pp. 105–136. Springer, Boston (2008). https://doi.org/10.1007/978-0-387-70992-5_5

  9. Clifton, C., Tassa, T.: On syntactic anonymity and differential privacy. Trans. Data Priv. 6(2), 161–183 (2013)

  10. Dewri, R., Ray, I., Ray, I., Whitley, D.: Exploring privacy versus data quality trade-offs in anonymization techniques using multi-objective optimization. J. Comput. Secur. 19(5), 935–974 (2011)

  11. Dewri, R., Whitley, D., Ray, I., Ray, I.: A multi-objective approach to data sharing with privacy constraints and preference based objectives. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO 2009, pp. 1499–1506. ACM, New York (2009)

  12. Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79228-4_1

  13. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14

  14. Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)

  15. Fienberg, S.E., Jin, J.: Privacy-preserving data sharing in high dimensional regression and classification settings. J. Priv. Confid. 4(1), 10 (2012)

  16. Fredj, F.B., Lammari, N., Comyn-Wattiau, I.: Abstracting anonymization techniques: a prerequisite for selecting a generalization algorithm. Procedia Comput. Sci. 60, 206–215 (2015)

  17. Ghinita, G., Karras, P., Kalnis, P., Mamoulis, N.: Fast data anonymization with low information loss. In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB 2007, pp. 758–769. VLDB Endowment (2007)

  18. Ghosh, A., Roughgarden, T., Sundararajan, M.: Universally utility-maximizing privacy mechanisms. SIAM J. Comput. 41(6), 1673–1693 (2012)

  19. Gould, C., Burger, J., Newham, G.: The SAPS crime statistics: What they tell us and what they don’t. S. Afr. Crime Quarterly, December 2012. https://www.issafrica.org/uploads/1crimestats.pdf

  20. Islam, M.Z., Brankovic, L.: Privacy preserving data mining: a noise addition framework using a novel clustering technique. Knowl. Based Syst. 24(8), 1214–1223 (2011)

  21. Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 279–288. ACM, New York (2002)

  22. Kayem, A.V.D.M., Meinel, C.: Clustering heuristics for efficient t-closeness anonymisation. In: Benslimane, D., Damiani, E., Grosky, W.I., Hameurlain, A., Sheth, A., Wagner, R.R. (eds.) DEXA 2017. LNCS, vol. 10439, pp. 27–34. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64471-4_3

  23. Kayem, A.V.D.M., Vester, C.T., Meinel, C.: Automated k-anonymization and l-diversity for shared data privacy. In: Hartmann, S., Ma, H. (eds.) DEXA 2016. LNCS, vol. 9827, pp. 105–120. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44403-1_7

  24. Koufogiannis, F., Han, S., Pappas, G.J.: Optimality of the Laplace mechanism in differential privacy. arXiv preprint arXiv:1504.00065 (2015)

  25. Last, M., Tassa, T., Zhmudyak, A., Shmueli, E.: Improving accuracy of classification models induced from anonymized datasets. Inf. Sci. 256, 138–161 (2014). Business Intelligence in Risk Management

  26. LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD 2005, pp. 49–60. ACM, New York (2005). http://doi.acm.org/10.1145/1066157.1066164

  27. LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Workload-aware anonymization techniques for large-scale datasets. ACM Trans. Database Syst. (TODS) 33(3), 17 (2008)

  28. Li, C., Miklau, G., Hay, M., McGregor, A., Rastogi, V.: The matrix mechanism: optimizing linear counting queries under differential privacy. VLDB J. 24(6), 757–781 (2015). http://dx.doi.org/10.1007/s00778-015-0398-x

  29. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 106–115, April 2007

  30. Liang, H., Yuan, H.: On the complexity of t-closeness anonymization and related problems. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds.) DASFAA 2013. LNCS, vol. 7825, pp. 331–345. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37487-6_26

  31. Lin, J.L., Wei, M.C.: Genetic algorithm-based clustering approach for k-anonymization. Expert. Syst. Appl. 36(6), 9784–9792 (2009)

  32. Liu, F.: Generalized Gaussian mechanism for differential privacy. arXiv preprint arXiv:1602.06028 (2016)

  33. Liu, K., Giannella, C., Kargupta, H.: A survey of attack techniques on privacy-preserving data perturbation methods. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-Preserving Data Mining: Models and Algorithms, vol. 34, pp. 359–381. Springer, Boston (2008). https://doi.org/10.1007/978-0-387-70992-5_15

  34. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: L-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1(1), Article no. 3 (2007)

  35. McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2007, pp. 94–103. IEEE (2007)

  36. Meyerson, A., Williams, R.: On the complexity of optimal k-anonymity. In: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 223–228. ACM (2004)

  37. Nergiz, M.E., Tamersoy, A., Saygin, Y.: Instant anonymization. ACM Trans. Database Syst. 36(1), 2:1–2:33 (2011)

  38. Sakpere, A.B., Kayem, A.V.D.M., Ndlovu, T.: A usable and secure crime reporting system for technology resource constrained context. In: 29th IEEE International Conference on Advanced Information Networking and Applications Workshops, AINA 2015 Workshops, Gwangju, South Korea, 24–27 March 2015, pp. 424–429 (2015)

  39. Seckan, B.: Violent crime in the developing world: research roundup. In: Journalist’s Resource: Research on today’s New topics, October 2012. http://journalistsresource.org/studies/international/development/crime-violence-developing-world-research-roundup

  40. Sweeney, L.: K-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)

  41. Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206–215. ACM (2003)

  42. Vaidya, J., Kantarcıoğlu, M., Clifton, C.: Privacy-preserving Naive Bayes classification. VLDB J. Int. J. Very Large Data Bases 17(4), 879–898 (2008)

  43. Website: South Africa’s police: Something very rotten. The Economist: Middle East and Africa, June 2012. http://www.economist.com/node/21557385

  44. Wicker, S.B.: The loss of location privacy in the cellular age. Commun. ACM 55(8), 60–68 (2012)

  45. Wimmer, H., Powell, L.: A comparison of the effects of k-anonymity on machine learning algorithms. In: Proceedings of the Conference for Information Systems Applied Research, ISSN 2167-1508 (2014)

  46. Xiao, Q., Reiter, M.K., Zhang, Y.: Mitigating storage side channels using statistical privacy mechanisms. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 1582–1594. ACM, New York (2015)

  47. Xiao, X., Yi, K., Tao, Y.: The hardness and approximation algorithms for l-diversity. In: Proceedings of the 13th International Conference on Extending Database Technology, EDBT 2010, pp. 135–146. ACM, New York (2010)

  48. Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.C.: Utility-based anonymization using local recoding. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, pp. 785–790. ACM, New York (2006)

  49. Zhang, B., Dave, V., Mohammed, N., Hasan, M.A.: Feature selection for classification under anonymity constraint. arXiv preprint arXiv:1512.07158 (2015)

  50. Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X.: PrivBayes: private data release via Bayesian networks. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 1423–1434. ACM (2014)

Author information

Corresponding author

Correspondence to Anne V. D. M. Kayem.

Appendix A Notations: Summary

See Table 3.

Table 3. Notation

Copyright information

© 2018 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Kayem, A.V.D.M., Vester, C.T., Meinel, C. (2018). Syntactic Anonymisation of Shared Datasets in Resource Constrained Environments. In: Hameurlain, A., Wagner, R., Hartmann, S., Ma, H. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII. Lecture Notes in Computer Science, vol 11250. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58384-5_2

  • DOI: https://doi.org/10.1007/978-3-662-58384-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-58383-8

  • Online ISBN: 978-3-662-58384-5

  • eBook Packages: Computer Science (R0)
