
DRAPE: optimizing private data release under adjustable privacy-utility equilibrium

Information Technology and Management

Abstract

Data release and sharing across fields has become an inevitable trend in the context of big data. Unfortunately, this has also greatly increased the exposure of sensitive and private information. Massive privacy breaches have brought privacy-preservation issues into sharp focus, and privacy concerns may discourage people from providing their personal data. The problem of meeting privacy-protection requirements has therefore been studied extensively. However, protecting sensitive information should not prevent data users from conducting valid analyses of the released data. To address this problem, we propose a novel algorithm named Data Release under Adjustable Privacy-utility Equilibrium (DRAPE). We handle the privacy-utility tradeoff in data release by breaking sensitive associations among variables while maintaining the correlations among nonsensitive variables. Furthermore, we quantify the impact of the proposed privacy-preserving method in terms of correlation preservation and privacy level, and use these measures to build an optimization model that satisfies both data-privacy and data-utility constraints. The proposed approach not only gives data publishers a better scheme for controlling privacy levels, but also provides personalized service for data requesters with different utility requirements. We conduct experiments on one simulated dataset and two real datasets, and the results show that DRAPE efficiently achieves a guaranteed privacy level while effectively preserving data utility.
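The abstract does not specify DRAPE's actual perturbation or optimization model, so the following is only a minimal sketch of the general tradeoff it describes, assuming plain additive Gaussian noise on a single sensitive column (a swapped-in technique, not the authors' method). The function names `pearson` and `release`, the threshold `tau` (a cap on the absolute correlation between the perturbed sensitive column and any nonsensitive column, standing in for a privacy level), and the noise grid `sigmas` are all hypothetical. Nonsensitive columns are released unchanged, so their pairwise correlations are preserved exactly; the search picks the smallest noise scale on the grid that meets the privacy constraint, a crude proxy for minimizing utility loss.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def release(data, sensitive, tau, sigmas, seed=0):
    """Hypothetical sketch (not DRAPE): add Gaussian noise to the sensitive
    column only, using the smallest scale in `sigmas` for which
    |corr(sensitive', x)| <= tau holds for every nonsensitive column x.
    Returns (sigma, released_data), or None if no grid value works."""
    rng = random.Random(seed)
    s = data[sensitive]
    others = [k for k in data if k != sensitive]
    # One fixed noise draw, reused for every sigma so the grid is comparable.
    noise = [rng.gauss(0.0, 1.0) for _ in s]
    for sigma in sorted(sigmas):
        s_new = [v + sigma * z for v, z in zip(s, noise)]
        leak = max(abs(pearson(s_new, data[k])) for k in others)
        if leak <= tau:  # privacy constraint met with the least noise on the grid
            released = {k: list(v) for k, v in data.items()}  # copy untouched columns
            released[sensitive] = s_new
            return sigma, released
    return None


# Usage on synthetic data: s is strongly associated with x1 and x2.
rng = random.Random(42)
x1 = [rng.gauss(0, 1) for _ in range(500)]
x2 = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in x1]
s = [0.7 * a + 0.5 * b + 0.3 * rng.gauss(0, 1) for a, b in zip(x1, x2)]
data = {"x1": x1, "x2": x2, "s": s}

sigma, out = release(data, "s", tau=0.3, sigmas=[i / 10 for i in range(101)])
print(sigma, round(pearson(out["s"], out["x1"]), 3))
```

Lowering `tau` forces more noise (stronger protection of the sensitive association, lower fidelity of that column); raising it does the opposite, which loosely mirrors the idea of an adjustable privacy-utility equilibrium.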


Fig. 1
Fig. 2
Fig. 3
Fig. 4


Availability of data and material

Some or all data generated or used during the study are available online.

Code Availability

The model required to reproduce these findings cannot be shared at this time because it also forms part of an ongoing study.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 71871090), the Science & Technology Innovation Leading Project of Hunan High-tech Industry (No. 2020GK2005) and the Natural Science Foundation of Hunan Province of China (No. 2021JJ30158).


Author information

Corresponding author

Correspondence to Qiujun Lan.

Ethics declarations

Conflict of interest

No conflict of interest exists in the submission of this manuscript.

Consent for publication

Not applicable.

Consent to Participate

Not applicable.

Ethical Approval

Not applicable.


Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xiong, Q., Lan, Q., Ma, J. et al. DRAPE: optimizing private data release under adjustable privacy-utility equilibrium. Inf Technol Manag 25, 199–217 (2024). https://doi.org/10.1007/s10799-022-00378-4


  • DOI: https://doi.org/10.1007/s10799-022-00378-4
