Abstract
Feature selection is a preprocessing step in data mining, and heuristic search algorithms are often used for this task. Many heuristic search algorithms are based on discernibility matrices, which consider only the differences between objects in an information system. Because a discernibility matrix does not capture similar characteristics, the resulting rules may not be the simplest. Although difference-similitude (DS) methods take both difference and similitude into account, the existing search strategy causes some important features to be ignored. This paper proposes an improved DS-based algorithm to solve this problem. The improved algorithm defines an attribute rank function that considers both difference and similitude in feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. Its time complexity is $O(|C|^2|U|^2)$, where $|C|$ is the number of condition attributes and $|U|$ the number of objects.
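The abstract does not state the paper's rank function, so what follows is only a minimal Python sketch of the general difference-similitude idea under explicit assumptions. It assumes integer-coded condition attributes and one decision value per object. It takes difference entries to be the usual discernibility-matrix entries, meaning the attributes on which two objects from different classes differ. It takes similitude entries to be the attributes on which two objects from the same class agree. The helpers `ds_matrices`, `greedy_select` and `rank` are hypothetical names, and the rank used here is an illustrative stand-in, not the authors' definition. It scores each attribute by how many still-uncovered difference entries contain it and breaks ties by how many similitude entries contain it.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple

Entry = FrozenSet[int]  # a set of condition-attribute indices


def ds_matrices(rows: Sequence[Sequence[int]],
                decisions: Sequence[int]) -> Tuple[List[Entry], List[Entry]]:
    """Collect difference and similitude entries for every pair of objects.

    Difference entry: for two objects with different decisions, the
    attributes whose values differ (a discernibility-matrix entry).
    Similitude entry: for two objects with the same decision, the
    attributes whose values agree.
    """
    if not rows:
        return [], []
    all_attrs = frozenset(range(len(rows[0])))
    diff: List[Entry] = []
    sim: List[Entry] = []
    for i, j in combinations(range(len(rows)), 2):
        differing = frozenset(a for a in all_attrs if rows[i][a] != rows[j][a])
        if decisions[i] != decisions[j]:
            if differing:  # identical condition values cannot be discerned
                diff.append(differing)
        else:
            sim.append(all_attrs - differing)
    return diff, sim


def greedy_select(rows: Sequence[Sequence[int]],
                  decisions: Sequence[int]) -> List[int]:
    """Greedily add attributes until every difference entry is covered."""
    diff, sim = ds_matrices(rows, decisions)
    n_attrs = len(rows[0]) if rows else 0
    selected: Set[int] = set()
    uncovered = diff
    while uncovered:
        candidates = [a for a in range(n_attrs) if a not in selected]

        def rank(a: int) -> Tuple[int, int]:
            # HYPOTHETICAL rank (not the paper's): primary key is the number
            # of uncovered difference entries containing a; the tie-break
            # is the number of similitude entries containing a.
            return (sum(a in e for e in uncovered), sum(a in e for e in sim))

        best = max(candidates, key=rank)  # max keeps the first maximal one
        selected.add(best)
        uncovered = [e for e in uncovered if best not in e]
    return sorted(selected)


rows = [(0, 0), (1, 0), (2, 1), (3, 1)]
decisions = [0, 0, 1, 1]
print(greedy_select(rows, decisions))  # [1]
```

In this toy table both attributes separate every pair of objects from different classes, so a difference-only count ties and would keep attribute 0. The similitude tie-break instead prefers attribute 1, whose value is shared within each class. This naive loop also runs in $O(|C|^2|U|^2)$ time. Building the entries costs $O(|C||U|^2)$, and each of at most $|C|$ rounds scores up to $|C|$ candidates against $O(|U|^2)$ entries.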
Additional information
Foundation item: Supported by the National Natural Science Foundation of China (90204008) and the Chen-Guang Plan of Wuhan City (20055003059-3)
Biography: WU Ming (1972–), male, Ph.D. candidate, research direction: knowledge discovery in databases.
About this article
Cite this article
Wu, M., Yan, P. Feature selection based on difference and similitude in data mining. Wuhan Univ. J. of Nat. Sci. 12, 467–470 (2007). https://doi.org/10.1007/s11859-006-0077-2
DOI: https://doi.org/10.1007/s11859-006-0077-2