
Privacy preserving sub-feature selection based on fuzzy probabilities

Cluster Computing

Abstract

Feature selection addresses the problem of building accurate classification models in data mining. When data are aggregated from a distributed environment for feature selection, the relevant inputs of individual data records become exposed, so preserving the privacy of individual data is often a critical issue in distributed data mining. This paper proposes privacy preservation of individual data for both feature and sub-feature selection, based on data mining techniques and fuzzy probabilities. To protect privacy, each party follows the data miner's instructions and uses fuzzy probabilities as alias values. The techniques are developed for the data miner's own database in a distributed network with a fuzzy system, and the evaluation of sub-feature values is included in the data mining task. Feature selection is carried out with an existing data mining technique, the gain ratio, using fuzzy optimization. The gain ratio, estimated from the relevant inputs, is evaluated within the expected upper and lower bounds of the fuzzy data set. The paper focuses mainly on sub-feature selection with a privacy algorithm that uses fuzzy random variables among different parties in a distributed environment. Sub-features are uniquely identified for better class prediction. The algorithm selects sub-features using fuzzy probabilities with fuzzy frequency data from the data miner's database. Experimental results on a real-world data set show the performance of the approach.
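For readers unfamiliar with the gain ratio mentioned in the abstract, the following is a minimal sketch of the standard crisp definition (information gain divided by split information) for a categorical feature. It is not the fuzzy-optimization variant the paper describes and involves no privacy mechanism; the function names `entropy` and `gain_ratio` and the toy data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Crisp gain ratio of one categorical feature with respect to class labels.

    Illustrative only: the paper evaluates a fuzzy version of this quantity
    within upper and lower bounds, which is not reproduced here.
    """
    n = len(labels)
    # Group class labels by the feature value they co-occur with.
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: the first feature separates the classes, the second does not.
labels = [0, 0, 1, 1]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
```

A feature with a higher gain ratio would be ranked ahead of one with a lower value; how sub-features (individual feature values) are then scored with fuzzy probabilities is specific to the paper and is not sketched here.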



Author information

Corresponding author

Correspondence to Hemanta Kumar Bhuyan.

Appendices

Appendix 1

See Tables 8, 9, 10

Table 8 Database for each feature
Table 9 Coordinator collects alias data as natural numbers
Table 10 Conversion of alias value to original value

Appendix 2

See Fig. 6

Fig. 6 Values of the fuzzy random variable for different frequencies of the IRIS data set

About this article

Cite this article

Bhuyan, H.K., Kamila, N.K. Privacy preserving sub-feature selection based on fuzzy probabilities. Cluster Comput 17, 1383–1399 (2014). https://doi.org/10.1007/s10586-014-0393-9

  • DOI: https://doi.org/10.1007/s10586-014-0393-9
