Subspace based noise addition for privacy preserved data mining on high dimensional continuous data

Virupaksha, Shashidhar; Dondeti, Venkatesulu

doi:10.1007/s12652-020-01881-8

Original Research
Published: 21 March 2020

Journal of Ambient Intelligence and Humanized Computing

Shashidhar Virupaksha¹,² & Venkatesulu Dondeti¹

Abstract

Clustering is an important data mining technique. Because data mining raises privacy concerns, privacy-preserving algorithms have been developed over the last decade, and noise addition is a popular privacy-preservation technique. In recent years, however, many applications have come to deal with high dimensional datasets. Existing privacy-preserving algorithms add noise in a univariate manner, independently along each dimension. This approach does not work well on high dimensional continuous datasets, because distance measures lose much of their ability to distinguish similar points from dissimilar ones, and the data group differently in different dimensions. As a result, information loss and data loss are very high, the number of identified clusters drops drastically, and sometimes wrong clusters are identified. Data mining performed on such data is ineffective and sometimes invalid. This paper proposes a novel technique, subspace based noise addition (SBNA), that adds noise in subspaces. Dense and non-dense subspaces are first identified, and noise is then added separately in the dense and the non-dense subspaces. Noise is added such that points lying in dense and non-dense subspaces continue to lie in their respective subspaces after privacy preservation. This approach reduces data loss and information loss and maximizes the identification of clusters, which ensures effective data mining. Experiments are performed on benchmark high dimensional continuous datasets from the UCI Machine Learning Repository, and the results are compared with related methods such as SNA, NALT, and NANLT. SBNA improves data utility by up to 80%, and cluster identification and information measures by up to 90%.
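The abstract's point that distances lose contrast in high dimensions is the well-known distance-concentration effect. The sketch below is not from the paper; it is a minimal illustration that measures the relative contrast (d_max - d_min) / d_min between one random query and uniformly random points in the unit hypercube. The function name `relative_contrast` and the sample sizes are illustrative choices. As the dimension grows, the ratio typically shrinks, so a point's nearest and farthest neighbours become hard to tell apart.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min of distances from one random query to n random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    # Euclidean distance from the query to each uniformly drawn point.
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(200, dim):.3f}")
```

A large contrast in two dimensions against a small one in a thousand is what makes per-dimension noise risky: once distances are nearly uniform, a little noise is enough to reorder neighbours and merge or split clusters.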
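The authors' SBNA algorithm is not reproduced here. As a hedged illustration of the invariant the abstract states (a point in a dense or non-dense region still lies in that region after noise is added), the sketch below assumes a CLIQUE-style equal-width grid over one chosen subspace: each dimension is split into `xi` intervals, a cell is called dense if it holds at least `tau` points, and Gaussian noise is redrawn until each perturbed coordinate stays in its original interval. All names and parameters (`make_grid`, `subspace_noise`, `dense_cells`, `xi`, `tau`, `sigma`, `tries`) are hypothetical, and the sketch makes no claim about the level of privacy achieved.

```python
import random
from collections import Counter


def make_grid(data, xi):
    """Hypothetical equal-width grid: xi intervals per dimension over each dimension's min..max."""
    n_dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(n_dims)]
    highs = [max(p[d] for p in data) for d in range(n_dims)]
    widths = [((hi - lo) / xi) or 1.0 for lo, hi in zip(lows, highs)]
    return lows, highs, widths, xi


def interval(value, d, grid):
    """Interval index of `value` on dimension d, or None if it lies outside the grid."""
    lows, highs, widths, xi = grid
    if not (lows[d] <= value <= highs[d]):
        return None
    # The maximum value is assigned to the last interval.
    return min(int((value - lows[d]) // widths[d]), xi - 1)


def cell(point, dims, grid):
    """Grid cell of `point` in the subspace spanned by `dims`."""
    return tuple(interval(point[d], d, grid) for d in dims)


def dense_cells(data, dims, grid, tau):
    """Cells of the subspace that hold at least `tau` points."""
    counts = Counter(cell(p, dims, grid) for p in data)
    return {c for c, n in counts.items() if n >= tau}


def subspace_noise(data, dims, grid, sigma=0.3, tries=20, seed=0):
    """Hypothetical sketch: Gaussian noise on `dims` that never moves a point out of its cell."""
    rng = random.Random(seed)
    widths = grid[2]
    noisy = []
    for p in data:
        q = list(p)
        for d in dims:
            k = interval(p[d], d, grid)
            for _ in range(tries):
                # Noise is scaled to the interval width; a draw that would
                # leave the interval (or the grid) is rejected and redrawn.
                v = p[d] + rng.gauss(0.0, sigma * widths[d])
                if interval(v, d, grid) == k:
                    q[d] = v
                    break
            # If all `tries` draws are rejected, the original value is kept.
        noisy.append(q)
    return noisy


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]
    dims = (0, 3)  # an arbitrary 2-D subspace chosen for illustration
    grid = make_grid(data, xi=4)
    noisy = subspace_noise(data, dims, grid)
    print("every point kept its cell:",
          all(cell(p, dims, grid) == cell(q, dims, grid) for p, q in zip(data, noisy)))
    print("dense cells unchanged:",
          dense_cells(data, dims, grid, tau=10) == dense_cells(noisy, dims, grid, tau=10))
```

Because every point keeps its cell, the cell counts, and therefore the dense/non-dense labelling, are unchanged by construction; plain univariate noise on each dimension offers no such guarantee and can push points across cell boundaries. Dimensions outside `dims` are left untouched in this sketch; how SBNA selects subspaces and treats the remaining dimensions is not described in the abstract.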

Similar content being viewed by others

Anonymized noise addition in subspaces for privacy preserved data mining in high dimensional continuous data (Article, 29 January 2021)

Soft Subspace Growing Neural Gas for Data Stream Clustering

ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data (Article, 22 April 2020)

Author information

Authors and Affiliations

Department of CSE, VFSTR (Deemed to be University), Guntur, India
Shashidhar Virupaksha & Venkatesulu Dondeti
Department of CSE, Presidency University, Bengaluru, India
Shashidhar Virupaksha

Corresponding author

Correspondence to Shashidhar Virupaksha.

Ethics declarations

Conflict of interest

The authors have no conflict of interest.

Human participants or animals

This manuscript does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Virupaksha, S., Dondeti, V. Subspace based noise addition for privacy preserved data mining on high dimensional continuous data. J Ambient Intell Human Comput (2020). https://doi.org/10.1007/s12652-020-01881-8

Received: 08 October 2019
Accepted: 06 March 2020
Published: 21 March 2020
DOI: https://doi.org/10.1007/s12652-020-01881-8

Keywords
