Abstract
Feature selection addresses the problem of building accurate classification models in data mining. When data for feature selection is aggregated from a distributed environment, accessing the relevant inputs of individual data records becomes a problem, and preserving the privacy of individual data is often a critical issue in distributed data mining. This paper proposes privacy preservation of individual data for both feature and sub-feature selection, based on data mining techniques and fuzzy probabilities. For privacy, each party protects its data according to the data miner's instructions, using fuzzy probabilities as alias values. The techniques are developed for the data miner's own database in a distributed network with a fuzzy system, and they include the evaluation of sub-feature values for the data mining task. Feature selection is carried out with an existing data mining technique, the gain ratio, using fuzzy optimization. The gain ratio, estimated from the relevant inputs for feature selection, is evaluated within the expected upper and lower bounds of the fuzzy data set. The work focuses mainly on a privacy-preserving sub-feature selection algorithm that uses fuzzy random variables among different parties in a distributed environment. The selected sub-features are uniquely identified for better class prediction. The algorithm selects sub-features using fuzzy probabilities derived from fuzzy frequency data in the data miner's database. Experimental results on a real-world data set show the performance of our approach.
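To make the gain-ratio step more concrete, here is a minimal Python sketch; it is not the authors' algorithm. It assumes that a "fuzzy frequency" is a sigma-count, i.e. the sum of membership degrees of the records that fall under each sub-feature value and class, and it computes the standard C4.5 gain ratio, Gain(A) / SplitInfo(A), from such counts. The record format, the function names (`fuzzy_frequency_table`, `merge_tables`, `gain_ratio`) and the two-party data are hypothetical. The sketch does not model the paper's alias values or the upper and lower probability bounds.

```python
import math
from collections import defaultdict


def entropy(counts):
    """Base-2 Shannon entropy of non-negative, possibly fractional counts."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def fuzzy_frequency_table(records):
    """Build a value -> class -> sigma-count table (hypothetical format).

    Each record is (memberships, label), where memberships maps a
    sub-feature value (e.g. a linguistic term such as "low") to a
    membership degree in [0, 1]. Crisp data is the special case in
    which a single value has degree 1.0.
    """
    table = defaultdict(lambda: defaultdict(float))
    for memberships, label in records:
        for value, mu in memberships.items():
            table[value][label] += mu
    return table


def merge_tables(tables):
    """Sum per-party frequency tables into one global table.

    Illustrates that parties can send aggregated counts instead of raw
    records; on its own this is NOT a formal privacy guarantee.
    """
    merged = defaultdict(lambda: defaultdict(float))
    for table in tables:
        for value, row in table.items():
            for label, count in row.items():
                merged[value][label] += count
    return merged


def gain_ratio(table):
    """C4.5-style gain ratio of one feature from a (fuzzy) frequency table."""
    rows = list(table.values())
    labels = sorted({label for row in rows for label in row})
    class_totals = [sum(row.get(label, 0.0) for row in rows) for label in labels]
    n = sum(class_totals)
    if n <= 0:
        return 0.0
    info_d = entropy(class_totals)  # Info(D): class entropy before the split
    value_totals = [sum(row.values()) for row in rows]
    # Info_A(D): entropy of each partition, weighted by its share of n
    info_a = sum((size / n) * entropy(list(row.values()))
                 for size, row in zip(value_totals, rows))
    split_info = entropy(value_totals)  # SplitInfo_A(D)
    return (info_d - info_a) / split_info if split_info > 0 else 0.0


# Hypothetical example: two parties holding fuzzified records of one feature.
party_a = [({"low": 1.0}, "yes"), ({"low": 0.7, "high": 0.3}, "yes")]
party_b = [({"high": 1.0}, "no"), ({"low": 0.2, "high": 0.8}, "no")]
global_table = merge_tables([fuzzy_frequency_table(party_a),
                             fuzzy_frequency_table(party_b)])
print(f"gain ratio of the sub-feature: {gain_ratio(global_table):.3f}")
```

Because sigma-counts are additive, summing the parties' local tables yields the same table as pooling their records, so a data miner could in principle rank features from aggregates alone; a real protocol would still have to protect those aggregates.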
Cite this article
Bhuyan, H.K., Kamila, N.K. Privacy preserving sub-feature selection based on fuzzy probabilities. Cluster Comput 17, 1383–1399 (2014). https://doi.org/10.1007/s10586-014-0393-9
DOI: https://doi.org/10.1007/s10586-014-0393-9