Filter Methods

Chapter in: Feature Extraction

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 207)

Abstract

Feature ranking and feature selection algorithms may roughly be divided into three types. The first type comprises algorithms that are built into adaptive systems for data analysis (predictors), for example the feature selection that is part of embedded methods (such as neural training algorithms). Algorithms of the second type are wrapped around predictors, providing them with subsets of features and receiving their feedback (usually accuracy). These wrapper approaches aim to improve the results of the specific predictors they work with. The third type comprises feature selection algorithms that are independent of any predictor, filtering out features that have little chance of being useful in the analysis of the data. These filter methods are based on performance evaluation metrics calculated directly from the data, without direct feedback from the predictors that will finally be used on the data with a reduced number of features. Such algorithms are usually computationally less expensive than those of the first or second group. This chapter is devoted to filter methods.
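To make the filter idea concrete, the following is a minimal sketch (not taken from the chapter; the names mutual_information and filter_rank are illustrative) that ranks features by a histogram estimate of their mutual information with the class label. The score is computed directly from the data, with no predictor in the loop:

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of I(X; Y) between a numeric feature x
    and integer class labels y (illustrative helper, not the
    chapter's notation)."""
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.digitize(x, edges[1:-1])           # bin indices 0..bins-1
    joint = np.zeros((bins, int(y.max()) + 1))
    np.add.at(joint, (xb, y), 1.0)             # joint counts
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)        # marginal over bins
    py = pxy.sum(axis=0, keepdims=True)        # marginal over classes
    nz = pxy > 0                               # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def filter_rank(X, y, k):
    """Score each feature independently of any predictor and
    return the indices of the k highest-scoring columns."""
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]

# Toy usage: 10 features, of which columns 3 and 7 carry class information.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 10))
X[:, 3] += 2.0 * y
X[:, 7] -= 1.5 * y
print(filter_rank(X, y, k=2))                  # typically prints [3 7]
```

Because each feature is scored once from the data alone, the cost grows only linearly with the number of features, in contrast to wrapper methods, which must retrain a predictor for every candidate subset.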

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Duch, W. (2006). Filter Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_4

  • DOI: https://doi.org/10.1007/978-3-540-35488-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35487-1

  • Online ISBN: 978-3-540-35488-8

  • eBook Packages: Engineering, Engineering (R0)
