Subspace based noise addition for privacy preserved data mining on high dimensional continuous data

  • Original Research, Journal of Ambient Intelligence and Humanized Computing

Abstract

Clustering is an important data mining technique. Because data mining raises privacy concerns, privacy-preserving algorithms have been developed over the last decade, and noise addition is a popular privacy preservation technique. In recent years, however, many applications have come to deal with high-dimensional datasets. Existing privacy-preserving algorithms add noise in a univariate manner, independently along each dimension. This approach performs poorly on high-dimensional continuous datasets for two reasons: distance measures lose much of their ability to contrast similar points from dissimilar ones, and the data group differently in different dimensions. As a result, information loss and data loss are very high, the number of clusters identified drops drastically, and sometimes incorrect clusters are identified. Data mining performed on such data is ineffective and sometimes invalid. This paper proposes a novel technique called subspace based noise addition (SBNA), which adds noise in subspaces. Dense and non-dense subspaces are first identified, and noise is then added separately in each, such that points lying in dense and non-dense subspaces continue to lie in their respective subspaces after privacy preservation. This approach reduces data loss and information loss and maximizes the identification of clusters, which keeps data mining on the perturbed data effective. Experiments are performed on benchmark high-dimensional continuous datasets from the UCI Machine Learning Repository, and the results are compared with those of related methods such as SNA, NALT, and NANLT. SBNA improves data utility by up to 80%, and cluster identification and information measures by up to 90%.
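The abstract contrasts univariate, per-dimension noise addition with adding noise inside dense and non-dense subspaces so that points keep their subspace membership. The sketch below illustrates that contrast under stated assumptions; it is not the authors' SBNA algorithm. It assumes a CLIQUE-style equal-width grid, and the names `grid_params`, `cell_of`, `subspace_noise` and `univariate_noise`, the parameters `xi` (intervals per dimension) and `tau` (density threshold), the per-cell Gaussian noise scales, and the clamping step are all hypothetical choices. It also perturbs a single caller-chosen subspace instead of searching for dense subspaces.

```python
import random
import statistics
from collections import Counter


def grid_params(data, xi):
    """Per-dimension minimum and equal-width interval size for a grid of `xi` intervals."""
    n_dims = len(data[0])
    mins = [min(p[d] for p in data) for d in range(n_dims)]
    maxs = [max(p[d] for p in data) for d in range(n_dims)]
    # A constant column gets width 1.0 so that cell indexing never divides by zero.
    widths = [(hi - lo) / xi or 1.0 for lo, hi in zip(mins, maxs)]
    return mins, widths


def cell_of(point, dims, mins, widths, xi):
    """Grid cell (one interval index per dimension in `dims`) that contains `point`."""
    return tuple(min(int((point[d] - mins[d]) / widths[d]), xi - 1) for d in dims)


def univariate_noise(data, scale=0.3, seed=None):
    """Baseline: independent Gaussian noise along every dimension, std = scale * column std."""
    rng = random.Random(seed)
    stds = [statistics.pstdev([p[d] for p in data]) for d in range(len(data[0]))]
    return [tuple(x + rng.gauss(0.0, scale * s) for x, s in zip(p, stds)) for p in data]


def subspace_noise(data, dims, xi=5, tau=0.05, sigma_dense=0.1, sigma_sparse=0.3, seed=None):
    """Hypothetical subspace-aware perturbation of the dimensions in `dims`.

    Cells holding at least a fraction `tau` of the points are dense, the rest non-dense.
    Each point gets Gaussian noise (std = sigma * cell width, with a separate sigma for
    dense and non-dense cells) and is then clamped back inside its original cell, so
    dense points stay in dense cells and non-dense points stay in non-dense cells.
    Dimensions outside `dims` are left unchanged.
    """
    rng = random.Random(seed)
    mins, widths = grid_params(data, xi)
    cells = [cell_of(p, dims, mins, widths, xi) for p in data]
    counts = Counter(cells)
    dense = {c for c, k in counts.items() if k / len(data) >= tau}

    perturbed = []
    for p, cell in zip(data, cells):
        q = list(p)
        sigma = sigma_dense if cell in dense else sigma_sparse
        for d, idx in zip(dims, cell):
            lo = mins[d] + idx * widths[d]
            hi = lo + widths[d]
            margin = 1e-6 * widths[d]  # keeps the value clear of cell boundaries
            noisy = p[d] + rng.gauss(0.0, sigma * widths[d])
            q[d] = min(max(noisy, lo + margin), hi - margin)
        perturbed.append(tuple(q))
    return perturbed, dense


if __name__ == "__main__":
    rng = random.Random(0)
    # Two tight synthetic clusters in 4-D (illustrative data, not a UCI dataset).
    data = [tuple(rng.gauss(c, 0.01) for _ in range(4)) for c in (0.2, 0.8) for _ in range(100)]
    dims, xi = (0, 2), 5
    mins, widths = grid_params(data, xi)
    sbna_like, dense = subspace_noise(data, dims, xi=xi, seed=1)
    baseline = univariate_noise(data, seed=1)

    def moved(noisy):
        """Count points whose grid cell in `dims` differs after perturbation."""
        return sum(cell_of(p, dims, mins, widths, xi) != cell_of(q, dims, mins, widths, xi)
                   for p, q in zip(data, noisy))

    print("dense cells:", sorted(dense))
    print("points leaving their cell (subspace-aware):", moved(sbna_like))
    print("points leaving their cell (univariate):", moved(baseline))
```

Because every perturbed coordinate is clamped inside its original cell, the subspace-aware variant reports zero points leaving their cell by construction, whereas the univariate baseline will generally push some points across cell boundaries. Clamping is only one way to enforce membership; rejection sampling or reflection at the cell boundaries would be reasonable alternatives.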

Figs. 1 to 14 (images not reproduced)

Author information

Corresponding author

Correspondence to Shashidhar Virupaksha.

Ethics declarations

Conflict of interest

The authors have no conflict of interest.

Human participants or animals

This manuscript does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Virupaksha, S., Dondeti, V. Subspace based noise addition for privacy preserved data mining on high dimensional continuous data. J Ambient Intell Human Comput (2020). https://doi.org/10.1007/s12652-020-01881-8

  • DOI: https://doi.org/10.1007/s12652-020-01881-8
