Attribute association based privacy preservation for multi trust level environment

Sadhana

Abstract

Organizations worldwide collect enormous amounts of electronic data for research and decision making. The availability of this heterogeneous, sensitive information in electronic databases threatens the privacy of the individuals or organizations about whom the data is collected. Privacy Preserving Data Mining [PPDM] is a field of research that concentrates on preserving data privacy during the data mining process. This paper proposes a two-level partition and perturbation framework that releases multiple copies of privacy-preserved datasets in a Multi Trust Level [MTL] scenario and can prevent linking and diversity attacks. The framework comprises two methods for privacy preservation in an MTL environment: Entropy based Attribute Privacy Preservation [EAPP] and Information Gain based Attribute Privacy Preservation [IGAPP]. Both methods partition the data horizontally and vertically. Horizontal partitioning uses the simple K-Means clustering algorithm with a cluster size of 2, run with both the Euclidean and the Manhattan distance functions. Vertical partitioning of the attributes within each cluster is based, in EAPP, on their entropy values, which indicate each attribute's one-way association with its class, and, in IGAPP, on their Information Gain [IG] values, which indicate the two-way association with the class. The attributes in the clusters are then subjected to a privacy preservation technique based on their entropy values in EAPP and their IG values in IGAPP. The effect of the clustering distance function on privacy preservation, and the ability of the privacy-preserved datasets generated by the proposed methods to prevent privacy attacks, are studied using variance, rank distortion and utility metrics. Real-life medical and benchmark Adult datasets are used for experimentation. The results show that the generated datasets exhibit good variance and rank distortion values and hence can prevent diversity and linking attacks in an MTL environment. In addition, on selected classification and clustering algorithms, the privacy-preserved datasets have utility comparable to that of the original and L-Diversified datasets.
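
To make the two attribute measures named in the abstract concrete, the following minimal Python sketch computes Shannon entropy and information gain for discrete attributes against a class label, then ranks the attributes by information gain. It assumes the standard textbook definitions, H(X) = -Σ p(x) log2 p(x) and IG = H(class) - H(class | attribute). The article's exact formulation is not given in the abstract, so the function names `entropy` and `information_gain` and the toy records are hypothetical and serve only as illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(attribute, labels):
    """H(labels) - H(labels | attribute) for two equal-length sequences."""
    n = len(labels)
    # Group the class labels by the attribute value they co-occur with.
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    # Conditional entropy: entropy of each group weighted by its share of rows.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records (not from the article): two attributes, one class.
records = {
    "age_band": ["young", "young", "old", "old"],
    "sex": ["m", "f", "m", "f"],
}
labels = ["no", "no", "yes", "yes"]

# Score every attribute against the class and rank by information gain.
scores = {name: information_gain(col, labels) for name, col in records.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

In this toy data `age_band` determines the class exactly while `sex` carries no information about it, so a ranking of this kind separates strongly class-associated attributes from weakly associated ones. How the proposed methods turn such scores into a vertical partition is described in the full article, not in this sketch.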

Author information

Corresponding author

Correspondence to R PRAVEENA PRIYADARSINI.

About this article

Cite this article

PRIYADARSINI, R.P., VALARMATHI, M.L. & SIVAKUMARI, S. Attribute association based privacy preservation for multi trust level environment. Sadhana 40, 1769–1792 (2015). https://doi.org/10.1007/s12046-015-0412-4

  • DOI: https://doi.org/10.1007/s12046-015-0412-4
