Implementation of FAST Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
In feature selection, we seek a subset of features that produces results similar to those obtained with the original, entire set of features. Feature selection algorithms are evaluated on efficiency and effectiveness: efficiency concerns the time required to find a subset of features, and effectiveness concerns the quality of that subset. Based on these criteria, a FAST clustering-based feature selection algorithm (FAST) has been proposed, and it is implemented and experimentally evaluated in this paper. Dimensionality reduction of the data is the most important feature of FAST. First, we use a graph-theoretic clustering method to divide the features into clusters. Next, we form a subset of features by selecting from each cluster the feature that is most representative and most strongly related to the target classes. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of yielding a subset of features that are both useful and independent. The efficiency of FAST is ensured by using a minimum spanning tree (MST) built with Kruskal’s algorithm.
Keywords: Feature subset selection; Feature clustering; Filter method; Kruskal’s algorithm; Graph-based clustering
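The two steps summarized in the abstract (MST-based clustering of features, then choosing one representative per cluster) can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper. The abstract does not state how edge weights are computed, which tree edges are removed to form clusters, or how relatedness to the target classes is measured, so the sketch assumes that pairwise feature distances and per-feature relevance scores are supplied as inputs and that tree edges heavier than a hypothetical threshold `cut` are removed. The names `kruskal_mst`, `select_features`, `distances`, `relevance`, and `cut` are illustrative only.

```python
from collections import defaultdict


def _find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal_mst(n, edges):
    """Kruskal's algorithm on an undirected graph with nodes 0..n-1.

    edges: iterable of (weight, u, v) tuples.
    Returns the chosen tree edges in increasing weight order.
    """
    parent = list(range(n))
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:  # edge joins two separate components, so keep it
            parent[ru] = rv
            mst.append((w, u, v))
    return mst


def select_features(n, distances, relevance, cut):
    """Cluster features through an MST and pick one feature per cluster.

    distances: (weight, u, v) tuples; a smaller weight means the two
               features are more similar (assumed input).
    relevance: relevance[i] scores feature i against the target
               classes (assumed input).
    cut:       hypothetical threshold; tree edges heavier than this
               are removed to split the tree into clusters.
    """
    kept = [e for e in kruskal_mst(n, distances) if e[0] <= cut]

    # Connected components of the pruned tree are the feature clusters.
    parent = list(range(n))
    for _, u, v in kept:
        parent[_find(parent, u)] = _find(parent, v)
    clusters = defaultdict(list)
    for i in range(n):
        clusters[_find(parent, i)].append(i)

    # Keep the most relevant feature of each cluster.
    return sorted(max(members, key=lambda i: relevance[i])
                  for members in clusters.values())


# Toy data: features 0-2 are mutually close, as are features 3-4.
distances = [(0.1, 0, 1), (0.2, 1, 2), (0.3, 0, 2),
             (0.9, 2, 3), (0.15, 3, 4), (0.95, 0, 4)]
relevance = [0.2, 0.8, 0.5, 0.3, 0.7]
print(select_features(5, distances, relevance, cut=0.5))  # prints [1, 4]
```

In the toy data, the tree edge of weight 0.9 is the only one above the threshold, so removing it leaves the clusters {0, 1, 2} and {3, 4}; features 1 and 4 have the highest relevance in their respective clusters. The union-find structure keeps Kruskal's algorithm dominated by the cost of sorting the edges.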
We thank Dr. Amit Ganatra, Dean and Head of the Computer Engineering Department, Charusat University, for comments that greatly improved the manuscript, and Mr. Chintan Bhatt, Professor, Charusat University, for assistance in implementing this algorithm.