Feature selection based on difference and similitude in data mining

Wuhan University Journal of Natural Sciences

Abstract

Feature selection is a preprocessing step in data mining, and heuristic search algorithms are often used for it. Many of these algorithms are based on discernibility matrices, which consider only the differences in an information system. Because a discernibility matrix does not reveal similar characteristics, the resulting rules may not be the simplest. Difference-similitude (DS) methods take both difference and similitude into account, but the existing search strategy can cause some important features to be ignored. This paper proposes an improved DS-based algorithm to solve this problem. The improved algorithm defines an attribute rank function that considers both difference and similitude in feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. Its time complexity is $O(|C|^2|U|^2)$.
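The abstract does not give the attribute rank function itself, so the following Python is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a decision table given as rows of condition-attribute values plus one decision value per row. The names `ds_sets`, `select_features`, and the tie-breaking rank (difference entries covered first, similitude entries second) are hypothetical choices made for illustration.

```python
from itertools import combinations


def ds_sets(rows, decisions):
    """Build difference and similitude entries for every pair of objects.

    A difference entry (pair with different decisions) holds the attributes
    on which the two objects differ. A similitude entry (pair with the same
    decision) holds the attributes on which the two objects agree.
    """
    all_attrs = frozenset(range(len(rows[0])))
    diff, sim = [], []
    for i, j in combinations(range(len(rows)), 2):
        differing = frozenset(a for a in all_attrs if rows[i][a] != rows[j][a])
        if decisions[i] != decisions[j]:
            if differing:  # skip inconsistent pairs that no attribute separates
                diff.append(differing)
        else:
            sim.append(all_attrs - differing)
    return diff, sim


def select_features(rows, decisions):
    """Greedy selection using an assumed (hypothetical) rank function."""
    diff, sim = ds_sets(rows, decisions)
    n_attrs = len(rows[0])
    selected = []
    remaining = list(diff)  # difference entries not yet covered
    while remaining:
        def rank(a):
            # Assumed rank: cover many difference entries first, then
            # prefer attributes shared by many same-class pairs.
            return (sum(a in d for d in remaining), sum(a in s for s in sim))

        candidates = [a for a in range(n_attrs) if a not in selected]
        best = max(candidates, key=rank)
        selected.append(best)
        remaining = [d for d in remaining if best not in d]
    return sorted(selected)


# Toy decision table: attribute 0 alone determines the decision.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
decisions = [0, 0, 1, 1]
print(select_features(rows, decisions))  # → [0]
```

In this sketch the greedy loop runs at most once per attribute, and each pass ranks every attribute against all pairwise entries, so its cost grows on the order of the squared number of attributes times the squared number of objects. That is consistent with the bound quoted in the abstract if C is read as the set of condition attributes and U as the set of objects, which is the usual rough-set notation but is an assumption here.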

Author information

Corresponding author

Correspondence to Yan Puliu.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (90204008) and the Chen-Guang Plan of Wuhan City (20055003059-3).

Biography: WU Ming (1972–), male, Ph.D. candidate, research direction: knowledge discovery in databases.

About this article

Cite this article

Wu, M., Yan, P. Feature selection based on difference and similitude in data mining. Wuhan Univ. J. of Nat. Sci. 12, 467–470 (2007). https://doi.org/10.1007/s11859-006-0077-2

  • DOI: https://doi.org/10.1007/s11859-006-0077-2
