Feature Selection for High-Dimensional Data — A Pearson Redundancy Based Filter

  • Conference paper

Part of the Advances in Soft Computing book series (AINSC, volume 45)

Abstract

An algorithm for filtering information based on the Pearson χ² test has been implemented and tested on feature selection. This test is frequently used in biomedical data analysis and should be applied only to nominal (discretized) features. The algorithm has only one parameter: the statistical confidence level that two distributions are identical. Empirical comparisons with four other state-of-the-art feature selection algorithms (FCBF, CorrSF, ReliefF and ConnSF) are very encouraging.
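The abstract gives only the outline of the method, so the following is a minimal sketch of what a Pearson χ² redundancy filter could look like, not the authors' implementation. It assumes that features are already discretized to a shared set of category codes, that they arrive ranked by some relevance index chosen by the caller, and that two features count as redundant when the two-sample χ² test cannot reject the hypothesis that their value distributions are identical. The names `chi2_sf`, `pearson_p_value`, `redundancy_filter` and the parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """Upper-tail probability of the chi-square distribution (integer dof >= 1)."""
    y = x / 2.0
    # Closed-form starting point: Q(1/2, y) for odd dof, Q(1, y) for even dof.
    if dof % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pearson_p_value(f, g):
    """Two-sample Pearson chi-square test on two equally long nominal features."""
    cf, cg = Counter(f), Counter(g)
    bins = set(cf) | set(cg)
    # With equal sample sizes the statistic reduces to sum (R - S)^2 / (R + S).
    stat = sum((cf[b] - cg[b]) ** 2 / (cf[b] + cg[b]) for b in bins)
    dof = len(bins) - 1
    if dof < 1:
        return 1.0  # a single shared category: distributions are identical
    return chi2_sf(stat, dof)


def redundancy_filter(ranked, alpha=0.05):
    """Keep a feature only if it differs significantly from every kept feature.

    ranked: list of (name, values) pairs, most relevant first (assumed input).
    alpha:  significance level of the test (assumed single parameter).
    """
    kept = []
    for name, values in ranked:
        if all(pearson_p_value(values, kv) < alpha for _, kv in kept):
            kept.append((name, values))
    return [name for name, _ in kept]


if __name__ == "__main__":
    a = [0, 0, 1, 1, 2, 2] * 10
    b = [2, 1, 0, 2, 1, 0] * 10       # same value distribution as a
    c = [0] * 50 + [1] * 5 + [2] * 5  # clearly different distribution
    print(redundancy_filter([("a", a), ("b", b), ("c", c)]))
```

In this sketch a lower `alpha` makes the test reject less often, so more features are treated as redundant and removed. How the paper maps its confidence level onto the test decision is not stated in the abstract.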

Keywords

  • Feature Selection
  • Feature Subset
  • Feature Selection Algorithm
  • Redundant Feature
  • Relevance Index

References

  1. W. Duch. Filter methods. In: I. Guyon, S. Gunn, M. Nikravesh, L. Zadeh (eds), Feature extraction, foundations and applications. Studies in Fuzziness and Soft Computing, Physica-Verlag, Springer, pp. 89–118, 2006.

  2. T.M. Cover. The best two independent measurements are not the two best. IEEE Transactions on Systems, Man, and Cybernetics, 4:116–117, 1974.

  3. J. Biesiada and W. Duch. Feature selection for high-dimensional data: A Kolmogorov-Smirnov correlation-based filter solution. Advances in Soft Computing, Computer Recognition Systems (CORES 2005), pp. 95–105, 2005.

  4. W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical recipes in C. The art of scientific computing. Cambridge University Press, Cambridge, UK, 1988.

  5. M.A. Hall. Correlation-based feature subset selection for machine learning. PhD thesis, Department of Computer Science, University of Waikato, Waikato, N.Z., 1999.

  6. L. Yu and H. Liu. Feature selection for high-dimensional data: A fast correlation-based filter solution. In 12th Int. Conf. on Machine Learning (ICML-03), Washington, D.C., pp. 856–863, Morgan Kaufmann, CA, 2003.

  7. M. Dash and H. Liu. Consistency-based search in feature selection. Artificial Intelligence, 151:155–176, 2003.

  8. M. Robnik-Sikonja and I. Kononenko. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53:23–69, 2003.

  9. W. Duch, T. Winiarski, J. Biesiada, and A. Kachel. Feature ranking, selection and discretization. In Proceedings of Int. Conf. on Artificial Neural Networks (ICANN), pp. 251–254, Istanbul, 2003. Bogazici University Press.

  10. I. Witten and E. Frank. Data mining: Practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco, CA, 2000.

  11. C.J. Mertz and P.M. Murphy. The UCI repository of machine learning databases. Univ. of California, Irvine, 1998. http://www.ics.uci.edu.pl/ mlearn/MLRespository.html.

  12. J.R. Quinlan. C4.5: Programs for machine learning. Morgan Kaufmann, CA, 1993.

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Biesiada, J., Duch, W. (2007). Feature Selection for High-Dimensional Data — A Pearson Redundancy Based Filter. In: Kurzynski, M., Puchala, E., Wozniak, M., Zolnierek, A. (eds) Computer Recognition Systems 2. Advances in Soft Computing, vol 45. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75175-5_30

  • DOI: https://doi.org/10.1007/978-3-540-75175-5_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75174-8

  • Online ISBN: 978-3-540-75175-5

  • eBook Packages: Engineering, Engineering (R0)