Effective Removal of Privacy Breaches in Disassociated Transactional Datasets

  • Research Article - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

A broad range of web activities, such as querying web pages, e-commerce transactions, health diagnosis and seat reservations, generate vast volumes of data, referred to as transactional data. These data are published and widely used for data mining, research and analysis. However, publishing individuals’ transactional data raises serious privacy concerns for the individuals whose data are published. The privacy-preserving methods proposed in previous research are suitable for structured relational data but are not well suited to anonymizing transactional data, since the latter are generally unstructured, sparse and high dimensional. This paper addresses the problem of privacy-preserving publication of transactional data using two enhanced versions of the ‘disassociation’ technique. Disassociation limits privacy breaches and increases the utility of the published data, but it does not eliminate the breaches, because it gives rise to a cover problem that may lead to further privacy concerns. In this paper, we propose two algorithms to eliminate the cover problem of disassociated data: (i) improvement in disassociation using suppression and addition (IDSA) and (ii) improvement in disassociation by generalizing cover item (IDGC). The proposed algorithms are implemented on the INFORMS and BMS-Webview1 datasets and compared with disassociation in terms of prevention of privacy breaches and loss of information. The results show that IDSA leads to a significant drop in privacy breaches due to the cover problem with minimal information loss, and that IDGC completely removes the privacy breaches due to the cover problem without any significant loss in data utility.
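The claim that transactional data are sparse and high dimensional can be illustrated with a small sketch. The records below are hypothetical toy data, not taken from the INFORMS or BMS-Webview1 datasets, and the helper names `to_binary_matrix` and `sparsity` are assumptions introduced only for this illustration. The sketch encodes set-valued records as a relational 0/1 table, with one column per distinct item, and measures how many cells are empty.

```python
# Hypothetical toy transactions: each record is a set of items.
transactions = [
    {"milk", "bread"},
    {"flu", "cough", "fever"},
    {"bread"},
    {"seat_12A", "milk"},
]


def to_binary_matrix(records):
    """Encode set-valued records as a 0/1 table with one column per distinct item."""
    items = sorted(set().union(*records))
    matrix = [[1 if item in record else 0 for item in items] for record in records]
    return items, matrix


def sparsity(matrix):
    """Return the fraction of cells in the table that are zero."""
    cells = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / cells


items, matrix = to_binary_matrix(transactions)
print(len(items), "columns for", len(transactions), "records")
print(f"sparsity = {sparsity(matrix):.2f}")
```

Even in this tiny example the relational encoding needs more columns than there are records, and most cells are zero. In a real transactional dataset the item domain is far larger, which is one reason methods designed for relational tables with a fixed set of attributes fit such data poorly.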

Notes

  1. https://sites.google.com/site/informsdataminingcontest/data.

  2. http://icd9cm.chrisendres.com.

  3. http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php.

Author information

Corresponding author

Correspondence to Vartika Puri.

About this article

Cite this article

Puri, V., Kaur, P. & Sachdeva, S. Effective Removal of Privacy Breaches in Disassociated Transactional Datasets. Arab J Sci Eng 45, 3257–3272 (2020). https://doi.org/10.1007/s13369-020-04353-5

  • DOI: https://doi.org/10.1007/s13369-020-04353-5
