A Fractal Dimension Based Filter Algorithm to Select Features for Supervised Learning
Feature selection plays an important role in machine learning and is often applied as a data pre-processing step. Its objective is to choose, according to some importance criterion, a subset of the original set of features that describe a data set, removing irrelevant and/or redundant features, since these may degrade data quality and reduce the comprehensibility of the hypotheses induced by supervised learning algorithms. Most state-of-the-art feature selection algorithms focus mainly on finding relevant features. However, it has been shown that relevance alone is not sufficient to select important features; redundancy among features must also be addressed. To decide which features to select and which to discard, it is necessary to measure feature goodness (importance), and many importance measures have been proposed. This work proposes a filter algorithm that decouples relevance and redundancy analysis, and introduces the use of the Fractal Dimension to deal with redundant features. Empirical results on several data sets show that the Fractal Dimension is an appropriate criterion for filtering out redundant features for supervised learning.
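The paper's own algorithm is not reproduced in this abstract, but the key idea it relies on can be sketched: the correlation fractal dimension (D2) of a data set measures its intrinsic dimensionality, so removing a redundant feature (one determined by the others) leaves D2 roughly unchanged, while removing a non-redundant feature lowers it. The following is a minimal, illustrative box-counting estimator of D2; the function name, parameters, and the use of NumPy are our own assumptions, not the authors' implementation.

```python
import numpy as np

def correlation_fractal_dimension(X, n_scales=7):
    """Estimate the correlation fractal dimension D2 of a point set
    via box-counting: for grids of shrinking cell size r, compute
    S(r) = sum over occupied cells of (point count)^2; D2 is the
    slope of log S(r) versus log r over the examined scales.
    (Illustrative sketch, not the paper's FDimBF algorithm.)"""
    X = np.asarray(X, dtype=float)
    # Normalize every feature to [0, 1] so a single grid covers all axes.
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    Xn = (X - mins) / span
    log_r, log_S = [], []
    for level in range(1, n_scales + 1):
        cells = 2 ** level                      # cells per axis; cell size r = 1/cells
        idx = np.minimum((Xn * cells).astype(int), cells - 1)
        # Count how many points fall into each occupied grid cell.
        _, counts = np.unique(idx, axis=0, return_counts=True)
        log_r.append(np.log(1.0 / cells))
        log_S.append(np.log(np.sum(counts.astype(float) ** 2)))
    slope, _ = np.polyfit(log_r, log_S, 1)      # slope of the log-log plot
    return float(slope)
```

On points lying on a line embedded in three feature columns (e.g. x, 2x, 3x), the estimate stays near 1 whether two or all three columns are kept, which is exactly the behavior a redundancy filter can exploit.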