Abstract
A broad range of web activities, such as querying web pages, e-commerce transactions, health diagnosis and seat reservations, generates vast volumes of data, referred to as transactional data. These data are published and widely used for data mining, research and analysis. However, publishing individuals’ transactional data raises serious privacy concerns for the individuals whose data are released. The privacy-preserving methods proposed in previous research are suited to structured relational data but are poorly suited to anonymizing transactional data, which are generally unstructured, sparse and high-dimensional. This paper addresses the problem of privacy-preserving publication of transactional data using two enhanced versions of the ‘disassociation’ technique. Disassociation limits privacy breaches and improves the utility of the published data, but it does not eliminate such breaches, because it gives rise to a cover problem that may lead to further privacy concerns. We propose two algorithms to eliminate the cover problem of disassociated data: (i) improvement in disassociation using suppression and addition (IDSA) and (ii) improvement in disassociation by generalizing cover item (IDGC). The proposed algorithms are implemented on the INFORMS and BMS-Webview1 datasets and compared with disassociation in terms of both privacy-breach prevention and information loss. The results show that IDSA significantly reduces privacy breaches due to the cover problem with minimal information loss, and that IDGC completely removes privacy breaches due to the cover problem without any significant loss in data utility.
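The abstract does not define the cover problem. As a hedged illustration only, the sketch below assumes one way a disassociated cluster can still leak an association. Disassociation splits each cluster of records vertically into record chunks so that items in different chunks appear unlinked. Yet if two such items have supports whose sum exceeds the number of records |P| in the cluster, the pigeonhole principle forces them to co-occur in at least sup(a) + sup(b) - |P| records. Whether this matches the paper's exact definition of the cover problem is an assumption. The function name `forced_cooccurrences` and its input format are hypothetical.

```python
from itertools import combinations


def forced_cooccurrences(cluster_size, chunk_supports):
    """Flag item pairs that must co-occur despite being disassociated.

    Hypothetical helper (not from the paper). Assumes:
      cluster_size   -- number of records |P| in one disassociated cluster
      chunk_supports -- one dict per record chunk, mapping item -> support
    Returns {(a, b): lower bound on shared records} for every pair of items
    from different record chunks with sup(a) + sup(b) - |P| > 0.
    """
    # Remember which record chunk each item was published in.
    tagged = [
        (chunk_id, item, support)
        for chunk_id, chunk in enumerate(chunk_supports)
        for item, support in chunk.items()
    ]
    breaches = {}
    for (ci, a, sa), (cj, b, sb) in combinations(tagged, 2):
        if ci == cj:
            # Same chunk: their co-occurrence is published directly anyway.
            continue
        # Pigeonhole: sa + sb occurrences fit into |P| records only if at
        # least sa + sb - |P| of those records contain both items.
        bound = sa + sb - cluster_size
        if bound > 0:
            breaches[tuple(sorted((a, b)))] = bound
    return breaches


# Example: a 5-record cluster split into two record chunks.
print(forced_cooccurrences(5, [{"flu": 4, "cough": 2}, {"hiv": 3}]))
# prints {('flu', 'hiv'): 2}
```

Here `flu` and `hiv` are published in separate record chunks, yet an adversary can conclude they share at least two of the five records. `cough` and `hiv` (2 + 3 - 5 = 0) are not flagged. Under this assumed reading, remedies such as suppressing, adding or generalizing items would aim to push every such bound back to zero or below.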
Cite this article
Puri, V., Kaur, P. & Sachdeva, S. Effective Removal of Privacy Breaches in Disassociated Transactional Datasets. Arab J Sci Eng 45, 3257–3272 (2020). https://doi.org/10.1007/s13369-020-04353-5
DOI: https://doi.org/10.1007/s13369-020-04353-5