Abstract
Feature ranking and feature selection algorithms may be roughly divided into three types. The first type encompasses algorithms that are built into adaptive systems for data analysis (predictors), for example the feature selection embedded in methods such as neural training algorithms. Algorithms of the second type are wrapped around predictors, providing them with subsets of features and receiving their feedback (usually accuracy). These wrapper approaches are aimed at improving the results of the specific predictors they work with. The third type includes feature selection algorithms that are independent of any predictor, filtering out features that have little chance of being useful in the analysis of the data. These filter methods are based on an evaluation metric calculated directly from the data, without direct feedback from the predictors that will ultimately be applied to the data with the reduced set of features. Such algorithms are usually computationally less expensive than those of the first or second group. This chapter is devoted to filter methods.
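To make the filter idea concrete, the sketch below (an illustration added here, not code from the chapter) ranks features by the absolute Pearson correlation between each feature and the target, an evaluation metric computed directly from the data with no predictor in the loop.

```python
# A minimal sketch of a filter method: rank features by the absolute
# Pearson correlation between each feature and the target, computed
# directly from the data without feedback from any predictor.
# (Illustrative only; the chapter discusses many such relevance indices.)
import numpy as np

def rank_features_by_correlation(X, y):
    """Return feature indices sorted from most to least relevant.

    X : (n_samples, n_features) array of feature values.
    y : (n_samples,) array of class labels or targets.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the data so the dot products below become covariances.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # |corr(feature_j, y)| for every feature j in one vectorized step.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.argsort(scores)[::-1]  # best-scoring features first

# Example: feature 0 carries the class signal, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = np.column_stack([y + 0.3 * rng.normal(size=200),
                     rng.normal(size=200)])
print(rank_features_by_correlation(X, y))  # -> [0 1]
```

Correlation is only one possible relevance index; a mutual-information or consistency-based score could be substituted in the same ranking scheme without changing its filter character.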
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Duch, W. (2006). Filter Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_4
DOI: https://doi.org/10.1007/978-3-540-35488-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35487-1
Online ISBN: 978-3-540-35488-8
eBook Packages: Engineering (R0)