Abstract
High sensitivity to irrelevant features is arguably the main shortcoming of simple lazy learners. Many feature selection methods have been proposed in response, including forward sequential selection (FSS) and backward sequential selection (BSS). Although these methods often produce substantial accuracy gains, they select the same set of relevant features everywhere in the instance space, and thus offer only a partial solution: in general, some features are relevant only in some parts of the space, so deleting them hurts accuracy in those parts, while retaining them hurts accuracy in the parts where they are irrelevant. This article introduces RC, a new feature selection algorithm that uses a clustering-like approach to select sets of locally relevant features (i.e., the features it selects may vary from one instance to another). Experiments in a large number of domains from the UCI repository show that RC almost always improves accuracy with respect to FSS and BSS, often with high significance. A study using artificial domains confirms the hypothesis that this performance difference is due to RC's context sensitivity, and also suggests conditions under which this sensitivity will and will not be an advantage. RC is also faster than FSS and BSS, often by an order of magnitude or more.
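The locality problem motivating RC can be made concrete with a toy domain. The sketch below is an illustration of the motivation only, not of the RC algorithm itself; the concept and feature names are hypothetical. In it, feature x2 determines the class when x1 = 0 and feature x3 determines it when x1 = 1, so neither x2 nor x3 is globally relevant or globally irrelevant — exactly the situation where a single feature set selected by FSS or BSS cannot be right everywhere:

```python
# Toy illustration (not the RC algorithm): a domain where each feature
# is relevant only in part of the instance space.
import itertools

def concept(x1, x2, x3):
    # Target concept: class equals x2 in the region x1 == 0,
    # and equals x3 in the region x1 == 1.
    return x2 if x1 == 0 else x3

# All eight Boolean instances over (x1, x2, x3).
instances = list(itertools.product([0, 1], repeat=3))

def agreement(region, feature_idx):
    """Fraction of instances with x1 == region whose class equals the
    given feature's value: 1.0 = fully predictive, 0.5 = uninformative."""
    sub = [x for x in instances if x[0] == region]
    return sum(concept(*x) == x[feature_idx] for x in sub) / len(sub)

print(agreement(0, 1))  # x2 in region x1=0 -> 1.0 (relevant here)
print(agreement(0, 2))  # x3 in region x1=0 -> 0.5 (irrelevant here)
print(agreement(1, 1))  # x2 in region x1=1 -> 0.5 (irrelevant here)
print(agreement(1, 2))  # x3 in region x1=1 -> 1.0 (relevant here)
```

Any global feature set over-commits in this domain: dropping x3 hurts the x1 = 1 region, while keeping it adds noise in the x1 = 0 region, which is the gap a locally sensitive selector aims to close.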
Cite this article
Domingos, P. Control-Sensitive Feature Selection for Lazy Learners. Artificial Intelligence Review 11, 227–253 (1997). https://doi.org/10.1023/A:1006508722917