Feature selection based on difference and similitude in data mining

Wu, Ming; Yan, Puliu

doi:10.1007/s11859-006-0077-2


Published: May 2007

Volume 12, pages 467–470, (2007)

Wuhan University Journal of Natural Sciences


Abstract

Feature selection is a preprocessing step in data mining, and heuristic search algorithms are often used for it. Many heuristic search algorithms are based on discernibility matrices, which consider only the differences in an information system. Because a discernibility matrix does not reveal similar characteristics, the resulting rules may not be the simplest. Although difference-similitude (DS) methods take both difference and similitude into account, the existing search strategy causes some important features to be ignored. This paper proposes an improved DS-based algorithm to solve this problem. The improved algorithm defines an attribute rank function that considers both difference and similitude in feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. The time complexity of the algorithm is O(|C|²|U|²).
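The abstract does not give the rank function itself, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a small decision table in which each row is a tuple of condition-attribute values with one decision label. It also assumes that a difference set lists the attributes on which two objects with different decisions disagree, and that a similitude set lists the attributes on which two objects with the same decision agree. The names `ds_sets`, `rank`, `select_features` and the `weight` parameter are hypothetical, introduced here for illustration.

```python
from itertools import combinations


def ds_sets(rows, decisions):
    """Build difference and similitude sets over all object pairs.

    Assumed definitions (not taken from the paper's full text):
    - difference set: attributes that differ between two objects
      whose decisions differ;
    - similitude set: attributes that agree between two objects
      whose decisions are equal.
    """
    n_attrs = len(rows[0])
    diff, sim = [], []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            d = frozenset(a for a in range(n_attrs) if rows[i][a] != rows[j][a])
            if d:
                diff.append(d)
        else:
            s = frozenset(a for a in range(n_attrs) if rows[i][a] == rows[j][a])
            if s:
                sim.append(s)
    return diff, sim


def rank(attr, diff, sim, weight=1.0):
    """Hypothetical rank: occurrences in difference sets plus a weighted
    count of occurrences in similitude sets."""
    return sum(attr in d for d in diff) + weight * sum(attr in s for s in sim)


def select_features(rows, decisions, weight=1.0):
    """Greedily pick attributes until every difference set is covered."""
    diff, sim = ds_sets(rows, decisions)
    uncovered = list(diff)
    selected = []
    while uncovered:
        # Only attributes that still discern some uncovered pair are candidates,
        # so every iteration covers at least one difference set.
        candidates = sorted(set().union(*uncovered))
        best = max(candidates, key=lambda a: rank(a, uncovered, sim, weight))
        selected.append(best)
        uncovered = [d for d in uncovered if best not in d]
    return selected


# Toy table: attributes 0 and 2 each determine the decision; attribute 1 is noise.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
print(select_features(rows, decisions))  # prints [0]
```

In this sketch the pairwise comparison alone costs O(|C||U|²) for |U| objects and |C| condition attributes (reading C and U in the usual rough-set sense, which is an assumption about the abstract's notation). The greedy loop adds further passes over the difference sets.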



Author information

Authors and Affiliations

School of Electronic Information, Wuhan University, Wuhan, 430072, Hubei, China
Wu Ming & Yan Puliu


Corresponding author

Correspondence to Yan Puliu.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (90204008) and the Chen-Guang Plan of Wuhan City (20055003059-3).

Biography: WU Ming (1972–), male, Ph.D. candidate, research direction: knowledge discovery in databases.


About this article

Cite this article

Wu, M., Yan, P. Feature selection based on difference and similitude in data mining. Wuhan Univ. J. of Nat. Sci. 12, 467–470 (2007). https://doi.org/10.1007/s11859-006-0077-2


Received: 15 July 2006
Issue Date: May 2007
DOI: https://doi.org/10.1007/s11859-006-0077-2

Key words

CLC number

TP 18
