Attribute association based privacy preservation for multi trust level environment

PRIYADARSINI, R PRAVEENA; VALARMATHI, M L; SIVAKUMARI, S

doi:10.1007/s12046-015-0412-4

Published: 30 October 2015

Sadhana, Volume 40, pages 1769–1792 (2015)

R PRAVEENA PRIYADARSINI¹,
M L VALARMATHI² &
S SIVAKUMARI¹

Abstract

Organizations worldwide collect enormous amounts of e-data for research and decision making. The availability of this heterogeneous, sensitive information in e-databases poses a threat to the privacy of the individuals or organizations about whom the data is collected. Privacy Preserving Data Mining [PPDM] is a field of research that concentrates on preserving data privacy during the process of data mining. This paper proposes a two-level partition and perturbation framework to release multiple copies of privacy-preserved datasets in a Multi Trust Level [MTL] scenario that can prevent linking and diversity attacks. The framework comprises two methods, namely Entropy based Attribute Privacy Preservation [EAPP] and Information Gain based Attribute Privacy Preservation [IGAPP], for privacy preservation in an MTL environment. Both methods perform vertical and horizontal partitioning of the data. Horizontal partitioning uses the simple K-Means clustering algorithm with cluster size 2, with both Euclidean and Manhattan distance functions. Vertical partitioning of the attributes within each cluster is based on their entropy values, which indicate an attribute's one-way association with its class, in the EAPP method, and on their Information Gain [IG] values, which indicate two-way association with the class, in the IGAPP method. The attributes in the clusters are then subjected to a privacy preservation technique chosen according to their entropy and IG values in EAPP and IGAPP, respectively. The effect of the clustering distance function on privacy preservation, and the ability of the privacy-preserved datasets generated by the proposed methods to prevent privacy attacks, are studied using variance, rank distortion and utility metrics. Real-life medical and benchmark Adult datasets are used for experimentation. The results show that the generated datasets exhibit good variance and rank distortion values and hence can prevent diversity and linking attacks in an MTL environment. The privacy-preserved datasets also have utility comparable to that of the original and L-Diversified datasets on selected classification and clustering algorithms.
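
To make the two attribute-association measures named in the abstract concrete, the sketch below computes Shannon entropy and information gain for discrete attributes against a class label, using the standard textbook definitions. The abstract does not give the exact formulas or thresholds used by EAPP and IGAPP, so the function names, the toy records and the ranking step are illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(attribute, labels):
    """H(labels | attribute): class entropy inside each attribute value,
    weighted by how often that value occurs."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(attribute, labels):
    """IG(class; attribute) = H(class) - H(class | attribute)."""
    return entropy(labels) - conditional_entropy(attribute, labels)


# Hypothetical toy records (not taken from the paper's datasets).
records = {
    "age_band": ["young", "young", "old", "old"],
    "sex": ["m", "f", "m", "f"],
}
diagnosis = ["neg", "neg", "pos", "pos"]  # class label

# Score every attribute by its information gain with respect to the class.
ig_scores = {name: information_gain(col, diagnosis) for name, col in records.items()}

# Assumed use: order attributes from most to least class-associated, so a
# vertical partition could treat strongly associated attributes differently.
ranked = sorted(ig_scores, key=ig_scores.get, reverse=True)
print(ranked, ig_scores)
```

In this toy data `age_band` determines the class completely while `sex` carries no information about it, so a ranking of this kind separates the two. How the ranked attributes are then grouped and perturbed is specific to the paper and is not reproduced here.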

Author information

Authors and Affiliations

Dept of Computer Science and Engineering, Faculty of Engineering, Avinashilingam Institute for Home Science and Higher Education for Women University, Coimbatore, 641 108, India
R PRAVEENA PRIYADARSINI & S SIVAKUMARI
Dept of Computer Science and Engineering, Government College of Technology, Coimbatore, 641 025, India
M L VALARMATHI

Corresponding author

Correspondence to R PRAVEENA PRIYADARSINI.

About this article

Cite this article

PRIYADARSINI, R.P., VALARMATHI, M.L. & SIVAKUMARI, S. Attribute association based privacy preservation for multi trust level environment. Sadhana 40, 1769–1792 (2015). https://doi.org/10.1007/s12046-015-0412-4

Received: 29 August 2014
Revised: 16 March 2015
Accepted: 06 June 2015
Published: 30 October 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s12046-015-0412-4
