Abstract
Data Mining algorithms search for meaningful patterns in raw data sets. The Data Mining process incurs a high computational cost when dealing with large data sets, and reducing dimensionality (the number of attributes or the number of records) can effectively cut this cost. This chapter focuses on a pre-processing step that removes dimensions from a given data set before it is fed to a data mining algorithm. This work explains how it is often possible to reduce dimensionality with minimal loss of information. A clear taxonomy of dimension reduction is described, and techniques for dimension reduction are presented theoretically.
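To make the idea concrete, here is a minimal sketch (an illustrative example, not the chapter's own algorithm) of a filter-style pre-processing step: attributes whose variance falls below a hypothetical threshold are dropped before the data set reaches the mining algorithm, shrinking the attribute dimension with little loss of information.

```python
# Hypothetical filter-style dimension reduction: drop near-constant attributes
# before mining. The threshold value is an illustrative assumption.

def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def select_attributes(rows, threshold=0.001):
    """Return indices of attributes whose variance exceeds `threshold`."""
    keep = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if variance(column) > threshold:
            keep.append(j)
    return keep

def project(rows, keep):
    """Rebuild the data set using only the selected attribute indices."""
    return [[row[j] for j in keep] for row in rows]

# Three records, three attributes; the second attribute is constant,
# so it carries no information and is removed.
data = [
    [1.0, 5.0, 0.5],
    [2.0, 5.0, 0.6],
    [3.0, 5.0, 0.4],
]
keep = select_attributes(data)
reduced = project(data, keep)
```

After this step the mining algorithm sees only the first and third attributes; the constant second attribute is discarded with no loss of information, which is the simplest case of the pre-processing the chapter studies.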
Copyright information
© 2005 Springer Science+Business Media, Inc.
Chizi, B., Maimon, O. (2005). Dimension Reduction and Feature Selection. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24435-8
Online ISBN: 978-0-387-25465-4