An evolutionary feature set decomposition based anonymization for classification workloads: Privacy Preserving Data Mining

Ilavarasi, A. K.; Sathiyabhama, B.

doi:10.1007/s10586-017-1108-9

Published: 30 August 2017

Volume 20, pages 3515–3525 (2017)

Cluster Computing

Abstract

Privacy has become an important concern when publishing microdata about a population. The emerging area of privacy preserving data mining (PPDM) focuses on protecting individual privacy without compromising data mining results. Adversarial exploitation of published data poses a risk of disclosing information about individuals. On the other hand, imposing privacy constraints on the data results in substantial information loss and compromises legitimate data analysis. Motivated by the growing number of PPDM algorithms, we first investigate the privacy implications and the crosscutting issues in the trade-off between privacy and utility of data. We present a privacy model that embeds the anonymization procedure into a learning algorithm, which mitigates the additional overhead imposed on data mining tasks. Our primary concern about PPDM is that the utility of data should not be compromised by the transformation applied. Different data mining classification workloads are analyzed with the proposed anonymization procedure for any side effects incurred. It is shown empirically that, for most of the datasets, the classification accuracy obtained exceeds that obtained with the original dataset.

Acknowledgements

This research is part of a grant received from the Department of Science and Technology, Government of India: SEED/WS/018/2015.

Author information

Authors and Affiliations

Sona College of Technology, Salem, Tamilnadu, India
A. K. Ilavarasi & B. Sathiyabhama

Corresponding author

Correspondence to A. K. Ilavarasi.

About this article

Cite this article

Ilavarasi, A.K., Sathiyabhama, B. An evolutionary feature set decomposition based anonymization for classification workloads: Privacy Preserving Data Mining. Cluster Comput 20, 3515–3525 (2017). https://doi.org/10.1007/s10586-017-1108-9

Received: 24 November 2016
Revised: 16 July 2017
Accepted: 10 August 2017
Published: 30 August 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s10586-017-1108-9
