Abstract
An algorithm for filtering information based on the Pearson χ² test has been implemented and applied to feature selection. This test is frequently used in biomedical data analysis and is applicable only to nominal (discretized) features. The algorithm has a single parameter: the statistical confidence level used to decide that two distributions are identical. Empirical comparisons with four other state-of-the-art feature selection algorithms (FCBF, CorrSF, ReliefF and ConnSF) are very encouraging.
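To make the scheme concrete, the following is a minimal sketch of how such a χ²-based redundancy filter could be coded. It is an illustrative reconstruction under stated assumptions, not the authors' exact PRBF procedure: the helper names (`contingency`, `same_distribution`, `chi2_filter`), the ranking of features by their χ² statistic against the class, and the use of `scipy.stats.chi2_contingency` are choices made for this example; only the overall idea (rank nominal features, then discard a feature whose value distribution is statistically indistinguishable from an already kept one at the chosen confidence level) follows the description above.

```python
# Minimal sketch of a chi-square redundancy filter for nominal features.
# NOTE: an assumption-based illustration, not the authors' PRBF code.
import numpy as np
from scipy.stats import chi2_contingency


def contingency(a, b):
    """Cross-tabulate two integer-coded nominal variables."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)   # count co-occurrences
    return table


def same_distribution(a, b, confidence=0.95):
    """Chi-square homogeneity test: True if the two value distributions
    cannot be told apart at the requested confidence level."""
    values = np.union1d(a, b)
    counts = np.vstack([[(a == v).sum() for v in values],
                        [(b == v).sum() for v in values]])
    _, p_value, _, _ = chi2_contingency(counts)
    return p_value > 1.0 - confidence


def chi2_filter(X, y, confidence=0.95):
    """Hypothetical PRBF-style selection: rank features by their chi-square
    statistic against the class, then drop any feature whose distribution is
    indistinguishable from an already selected, higher-ranked feature."""
    relevance = [chi2_contingency(contingency(X[:, j], y))[0]
                 for j in range(X.shape[1])]
    selected = []
    for j in np.argsort(relevance)[::-1]:          # most relevant first
        if not any(same_distribution(X[:, j], X[:, k], confidence)
                   for k in selected):
            selected.append(j)
    return selected
```

In this sketch a call such as `chi2_filter(X_discrete, labels, confidence=0.95)` returns the indices of the retained columns; raising the confidence level makes the test more reluctant to call two distributions different, so more candidate features are flagged as redundant.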
Keywords
- Feature Selection
- Feature Subset
- Feature Selection Algorithm
- Redundant Feature
- Relevance Index
References
W. Duch. Filter methods. In: Feature Extraction: Foundations and Applications. Eds: I. Guyon, S. Gunn, M. Nikravesh, L. Zadeh. Studies in Fuzziness and Soft Computing, Physica-Verlag, Springer, pp. 89–118, 2006.
T.M. Cover. The best two independent measurements are not the two best. IEEE Transactions on Systems, Man, and Cybernetics, 4:116–117, 1974.
J. Biesiada, W. Duch, Feature Selection for High-Dimensional Data: A Kolmogorov-Smirnov Correlation-Based Filter Solution. Advances in Soft Computing, Computer Recognition Systems (CORES 2005), pp. 95–105, 2005.
W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, UK, 1988.
M.A. Hall. Correlation-based Feature Subset Selection for Machine Learning. PhD thesis, Department of Computer Science, University of Waikato, Hamilton, New Zealand, 1999.
L. Yu and H. Liu. Feature selection for high-dimensional data: A fast correlation-based filter solution. In 20th Int. Conf. on Machine Learning (ICML-03), Washington, D.C., pp. 856–863, Morgan Kaufmann, 2003.
M. Dash and H. Liu. Consistency-based search in feature selection. Artificial Intelligence, 151:155–176, 2003.
M. Robnik-Sikonja and I. Kononenko. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53:23–69, 2003.
W. Duch, T. Winiarski, J. Biesiada, and A. Kachel. Feature ranking, selection and discretization. In Proceedings of Int. Conf. on Artificial Neural Networks (ICANN), pp. 251–254, Istanbul, 2003. Bogazici University Press.
I. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA, 2000.
C.J. Merz and P.M. Murphy. The UCI repository of machine learning databases. Univ. of California, Irvine, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.
J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, CA, 1993.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Biesiada, J., Duch, W. (2007). Feature Selection for High-Dimensional Data — A Pearson Redundancy Based Filter. In: Kurzynski, M., Puchala, E., Wozniak, M., Zolnierek, A. (eds) Computer Recognition Systems 2. Advances in Soft Computing, vol 45. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75175-5_30
DOI: https://doi.org/10.1007/978-3-540-75175-5_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75174-8
Online ISBN: 978-3-540-75175-5