Privacy preserving sub-feature selection based on fuzzy probabilities

Bhuyan, Hemanta Kumar; Kamila, Narendra Kumar

doi:10.1007/s10586-014-0393-9

Published: 21 August 2014

Volume 17, pages 1383–1399, (2014)

Cluster Computing

Hemanta Kumar Bhuyan¹ &
Narendra Kumar Kamila²

Abstract

Feature selection addresses the problem of building accurate classification models in data mining. When data for feature selection are aggregated from a distributed environment, accessing the relevant inputs of individual data records becomes a problem, and preserving the privacy of individual data is often a critical issue in distributed data mining. This paper proposes privacy preservation of individual data for both feature and sub-feature selection, based on data mining techniques and fuzzy probabilities. To protect privacy, each party follows the data miner's instructions and uses fuzzy probabilities as alias values. The techniques are developed for the data miner's own database in a distributed network with a fuzzy system, and the evaluation of sub-feature values is included in the processing of the data mining task. Feature selection is carried out with an existing data mining technique, gain ratio, using fuzzy optimization. The gain ratio estimated from the relevant inputs is evaluated within the expected upper and lower bounds of the fuzzy data set. The paper mainly focuses on sub-feature selection with a privacy algorithm that uses fuzzy random variables among different parties in a distributed environment. Sub-features are uniquely identified for better class prediction. The algorithm selects sub-features using fuzzy probabilities together with fuzzy frequency data from the data miner's database. Experimental results on a real-world data set show the performance of the approach.
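For readers unfamiliar with the gain ratio criterion named in the abstract, the following is a minimal sketch of the standard crisp (non-fuzzy) gain ratio, as used in decision-tree learning. It is not the authors' fuzzy-optimized or privacy-preserving variant, which the abstract does not specify in detail. The function names `entropy` and `gain_ratio` and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Crisp gain ratio: information gain divided by split information.

    Illustrative only; the paper evaluates gain ratio on fuzzy data,
    which this sketch does not attempt to reproduce.
    """
    n = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the feature values themselves.
    split_info = entropy(feature_values)
    # A feature with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: the feature separates the classes perfectly.
    feature = ["a", "a", "b", "b"]
    labels = [0, 0, 1, 1]
    print(gain_ratio(feature, labels))
```

A feature with a higher gain ratio separates the classes better relative to how finely it splits the data, which is why it can serve as a ranking score for feature selection.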



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Mahavir Institute of Engineering and Technology, Odisha, India
Hemanta Kumar Bhuyan
Department of Computer Science and Engineering, C. V. Raman College of Engineering, Odisha, India
Narendra Kumar Kamila

Corresponding author

Correspondence to Hemanta Kumar Bhuyan.

Appendices

Appendix 1

See Tables 8, 9, 10

Table 8 Database for each feature

Table 9 Coordinator collects alias data as natural numbers

Table 10 Conversion of alias value to original value

Appendix 2

See Fig. 6

About this article

Cite this article

Bhuyan, H.K., Kamila, N.K. Privacy preserving sub-feature selection based on fuzzy probabilities. Cluster Comput 17, 1383–1399 (2014). https://doi.org/10.1007/s10586-014-0393-9

Received: 24 October 2013
Revised: 10 April 2014
Accepted: 14 July 2014
Published: 21 August 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10586-014-0393-9
