Abstract
Feature ranking and feature selection algorithms may be roughly divided into three types. The first type encompasses algorithms that are built into adaptive systems for data analysis (predictors), for example the feature selection embedded in methods such as neural training algorithms. Algorithms of the second type are wrapped around predictors, providing them with subsets of features and receiving their feedback (usually accuracy). These wrapper approaches are aimed at improving the results of the specific predictors they work with. The third type includes feature selection algorithms that are independent of any predictor, filtering out features that have little chance of being useful in the analysis of the data. These filter methods are based on an evaluation metric calculated directly from the data, without direct feedback from the predictors that will ultimately be applied to the data with the reduced set of features. Such algorithms are usually computationally less expensive than those of the first or second group. This chapter is devoted to filter methods.
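To make the filter idea concrete, the sketch below (an illustration added here, not code from the chapter) ranks features by the absolute Pearson correlation between each feature and the target, an evaluation metric computed directly from the data with no predictor in the loop.

```python
# A minimal sketch of a filter method: rank features by the absolute
# Pearson correlation between each feature and the target, computed
# directly from the data without feedback from any predictor.
# (Illustrative only; the chapter discusses many such relevance indices.)
import numpy as np

def rank_features_by_correlation(X, y):
    """Return feature indices sorted from most to least relevant.

    X : (n_samples, n_features) array of feature values.
    y : (n_samples,) array of class labels or targets.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the data so the dot products below become covariances.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # |corr(feature_j, y)| for every feature j in one vectorized step.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.argsort(scores)[::-1]  # best-scoring features first

# Example: feature 0 carries the class signal, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = np.column_stack([y + 0.3 * rng.normal(size=200),
                     rng.normal(size=200)])
print(rank_features_by_correlation(X, y))  # -> [0 1]
```

Correlation is only one possible relevance index; a mutual-information or consistency-based score could be substituted in the same ranking scheme without changing its filter character.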
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Duch, W. (2006). Filter Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_4
DOI: https://doi.org/10.1007/978-3-540-35488-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35487-1
Online ISBN: 978-3-540-35488-8
eBook Packages: Engineering (R0)