Classification is a key problem in machine learning/data mining. Algorithms for classification have the ability to predict the class of a new instance after having been trained on data representing past experience in classifying instances. However, the presence of a large number of features in training data can hurt the classification capacity of a machine learning algorithm. The Feature Selection problem involves discovering a subset of features such that a classifier built only with this subset would attain predictive accuracy no worse than a classifier built from the entire set of features. Several algorithms have been proposed to solve this problem. In this paper we discuss how parallelism can be used to improve the performance of feature selection algorithms. In particular, we present, discuss and evaluate a coarse-grained parallel version of the feature selection algorithm FortalFS. This algorithm performs well compared with other solutions and it has certain characteristics that makes it a good candidate for parallelization. Our parallel design is based on the master--slave design pattern. Promising results show that this approach is able to achieve near optimum speedups in the context of Amdahl's Law.
KeywordsFeature Selection Feature Selection Algorithm Good Subset Parallelization Strategy Master Process
Unable to display preview. Download preview PDF.