Context-Sensitive Feature Selection for Lazy Learners


Abstract

High sensitivity to irrelevant features is arguably the main shortcoming of simple lazy learners. In response to it, many feature selection methods have been proposed, including forward sequential selection (FSS) and backward sequential selection (BSS). Although they often produce substantial improvements in accuracy, these methods select the same set of relevant features everywhere in the instance space, and thus represent only a partial solution to the problem. In general, some features will be relevant only in some parts of the space; deleting them may hurt accuracy in those parts, but selecting them will have the same effect in parts where they are irrelevant. This article introduces RC, a new feature selection algorithm that uses a clustering-like approach to select sets of locally relevant features (i.e., the features it selects may vary from one instance to another). Experiments in a large number of domains from the UCI repository show that RC almost always improves accuracy with respect to FSS and BSS, often with high significance. A study using artificial domains confirms the hypothesis that this difference in performance is due to RC's context sensitivity, and also suggests conditions where this sensitivity will and will not be an advantage. Another feature of RC is that it is faster than FSS and BSS, often by an order of magnitude or more.
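
The abstract's contrast between global and locally relevant features is easy to see concretely. Below is a minimal, hypothetical sketch (in Python) of the FSS baseline it mentions: a greedy wrapper around a 1-NN lazy learner scored by leave-one-out accuracy, run on a toy domain in which feature relevance depends on context. The function names, the scoring choice, and the synthetic data are illustrative assumptions, not the paper's experimental code, and the RC algorithm itself is not reproduced here.

```python
# Minimal sketch: forward sequential selection (FSS) wrapped around a
# 1-NN lazy learner, scored by leave-one-out (LOO) accuracy.
# All names and the toy domain below are illustrative assumptions,
# not the paper's RC algorithm or its experimental code.
import random


def one_nn_predict(train, x, feats):
    """Classify x by its nearest neighbour, using only the features in feats."""
    best_label, best_dist = None, float("inf")
    for xi, yi in train:
        d = sum((xi[f] - x[f]) ** 2 for f in feats)
        if d < best_dist:
            best_label, best_dist = yi, d
    return best_label


def loo_accuracy(data, feats):
    """Leave-one-out accuracy of 1-NN restricted to feats."""
    if not feats:
        return 0.0
    hits = sum(
        one_nn_predict(data[:i] + data[i + 1:], x, feats) == y
        for i, (x, y) in enumerate(data)
    )
    return hits / len(data)


def forward_selection(data, n_feats):
    """Greedy FSS: add one feature at a time while LOO accuracy improves."""
    selected, best_acc = set(), 0.0
    improved = True
    while improved:
        improved, best_f = False, None
        for f in range(n_feats):
            if f in selected:
                continue
            acc = loo_accuracy(data, selected | {f})
            if acc > best_acc:
                best_acc, best_f, improved = acc, f, True
        if improved:
            selected.add(best_f)
    return selected, best_acc


# Toy domain (hypothetical): feature 0 acts as a context switch.
# Feature 1 determines the class only when x0 < 0.5, feature 2 only
# when x0 >= 0.5, and feature 3 is pure noise everywhere.
random.seed(0)
data = []
for _ in range(200):
    x = [random.random() for _ in range(4)]
    y = int(x[1] > 0.5) if x[0] < 0.5 else int(x[2] > 0.5)
    data.append((x, y))

feats, acc = forward_selection(data, n_feats=4)
print("globally selected features:", sorted(feats), "LOO accuracy: %.3f" % acc)
```

On a domain like this, each of x1 and x2 is relevant in only half of the instance space, so a single global subset must either include both (keeping each as noise in the region where it is irrelevant) or drop one (hurting accuracy where it matters). A context-sensitive selector such as the RC algorithm described above can instead select x1 for queries in one region and x2 in the other.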




Cite this article

Domingos, P. Context-Sensitive Feature Selection for Lazy Learners. Artificial Intelligence Review 11, 227–253 (1997). https://doi.org/10.1023/A:1006508722917
