Abstract
Feature selection is an important preprocessing step in pattern classification and machine learning, and mutual information is widely used to measure the relevance between features and the decision. However, mutual information cannot be computed directly for continuous or fuzzy features. In this paper we introduce fuzzy information entropy and fuzzy mutual information for computing the relevance between numerical or fuzzy features and the decision. The relationship between fuzzy information entropy and differential entropy is also discussed. Moreover, we combine fuzzy mutual information with the "min-Redundancy-Max-Relevance", "Max-Dependency" and "min-Redundancy-Max-Dependency" algorithms. The performance and stability of the proposed algorithms are tested on benchmark data sets. Experimental results show that the proposed algorithms are effective and stable.
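To make the procedure the abstract describes concrete, below is a minimal Python sketch of fuzzy-mutual-information-based min-Redundancy-Max-Relevance selection. It is illustrative only: the Gaussian similarity kernel, the kernel width `delta`, and all function names are assumptions made here for readability, not the authors' exact formulation. Fuzzy entropy is taken over the fuzzy cardinality of each sample's similarity class, fuzzy mutual information follows the usual H(R) + H(S) - H(R, S) decomposition with the joint relation built by element-wise minimum, and features are added greedily by relevance minus mean redundancy.

```python
import numpy as np

def fuzzy_relation(x, delta=0.1):
    """Fuzzy similarity relation over one numerical feature.
    The Gaussian kernel and width delta are illustrative assumptions."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-(d ** 2) / (2 * delta ** 2))

def fuzzy_entropy(R):
    """Fuzzy information entropy of a fuzzy similarity relation R:
    H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), with the fuzzy
    cardinality |[x_i]_R| = sum_j R[i, j]."""
    n = R.shape[0]
    card = R.sum(axis=1)
    return -np.mean(np.log2(card / n))

def fuzzy_mutual_information(R, S):
    """FMI(R; S) = H(R) + H(S) - H(R, S); the joint relation is the
    element-wise minimum of the two relations."""
    return fuzzy_entropy(R) + fuzzy_entropy(S) - fuzzy_entropy(np.minimum(R, S))

def mrmr_select(X, y, k, delta=0.1):
    """Greedy min-Redundancy-Max-Relevance selection of k features,
    scored with fuzzy mutual information."""
    n_features = X.shape[1]
    # A crisp decision induces a 0/1 equivalence relation on samples.
    Ry = (y[:, None] == y[None, :]).astype(float)
    Rf = [fuzzy_relation(X[:, j], delta) for j in range(n_features)]
    relevance = np.array([fuzzy_mutual_information(R, Ry) for R in Rf])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([fuzzy_mutual_information(Rf[j], Rf[s])
                                  for s in selected])
            score = relevance[j] - redundancy  # mRMR criterion
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))
    y = (X[:, 0] + X[:, 3] > 0).astype(int)
    print(mrmr_select(X, y, k=3))  # indices of the selected features
```

Under the same assumptions, a Max-Dependency variant would instead score each candidate by the fuzzy mutual information between the decision and the joint relation (element-wise minimum) of the candidate together with the already selected features.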
Rights and permissions
This is an open access article distributed under the CC BY-NC license (https://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
Yu, D., An, S. & Hu, Q. Fuzzy Mutual Information Based min-Redundancy and Max-Relevance Heterogeneous Feature Selection. Int J Comput Intell Syst 4, 619–633 (2011). https://doi.org/10.2991/ijcis.2011.4.4.18