Artificial Intelligence Review, Volume 11, Issue 1–5, pp 227–253

Context-Sensitive Feature Selection for Lazy Learners

  • Pedro Domingos

Abstract

High sensitivity to irrelevant features is arguably the main shortcoming of simple lazy learners. In response to it, many feature selection methods have been proposed, including forward sequential selection (FSS) and backward sequential selection (BSS). Although they often produce substantial improvements in accuracy, these methods select the same set of relevant features everywhere in the instance space, and thus represent only a partial solution to the problem. In general, some features will be relevant only in some parts of the space; deleting them may hurt accuracy in those parts, but selecting them will have the same effect in parts where they are irrelevant. This article introduces RC, a new feature selection algorithm that uses a clustering-like approach to select sets of locally relevant features (i.e., the features it selects may vary from one instance to another). Experiments in a large number of domains from the UCI repository show that RC almost always improves accuracy with respect to FSS and BSS, often with high significance. A study using artificial domains confirms the hypothesis that this difference in performance is due to RC's context sensitivity, and also suggests conditions where this sensitivity will and will not be an advantage. Another feature of RC is that it is faster than FSS and BSS, often by an order of magnitude or more.
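
For readers unfamiliar with the baselines, the following minimal sketch (illustrative Python, not code from the paper) shows the wrapper scheme that FSS embodies: a feature subset is grown greedily, scoring each candidate feature by the leave-one-out accuracy of a 1-nearest-neighbor classifier. BSS is the mirror image, starting from the full feature set and greedily deleting; both return a single global subset, whereas the locally relevant subsets RC selects may differ from one test instance to the next.

    import numpy as np

    def loo_1nn_accuracy(X, y, features):
        """Leave-one-out accuracy of 1-NN restricted to the given feature subset."""
        if not features:
            return 0.0
        Xs = X[:, features].astype(float)
        correct = 0
        for i in range(len(y)):
            # Squared Euclidean distance from instance i to every instance.
            d = np.sum((Xs - Xs[i]) ** 2, axis=1)
            d[i] = np.inf  # exclude the held-out instance itself
            correct += int(y[np.argmin(d)] == y[i])
        return correct / len(y)

    def forward_sequential_selection(X, y):
        """FSS: greedily add whichever feature most improves LOO 1-NN accuracy."""
        selected = []
        remaining = list(range(X.shape[1]))
        best_acc = 0.0
        while remaining:
            acc, f = max((loo_1nn_accuracy(X, y, selected + [f]), f)
                         for f in remaining)
            if acc <= best_acc:  # stop when no candidate improves the score
                break
            selected.append(f)
            remaining.remove(f)
            best_acc = acc
        return selected, best_acc

Note that the subset returned is applied everywhere in the instance space, which is exactly the limitation the abstract attributes to FSS and BSS.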

Keywords: lazy learning · feature selection · nearest neighbor · induction · machine learning

References

  1. Aha, D. W. (1989). Incremental, Instance-Based Learning of Independent and Graded Concept Descriptions. In Proceedings of the Sixth International Workshop on Machine Learning, pp. 387–391. Ithaca, NY: Morgan Kaufmann.
  2. Aha, D. W. (1992). Generalizing from Case Studies: A Case Study. In Proceedings of the Ninth International Workshop on Machine Learning, pp. 1–10. Aberdeen, Scotland: Morgan Kaufmann.
  3. Aha, D. W. & Bankert, R. L. (1994). Feature Selection for Case-Based Classification of Cloud Types: An Empirical Comparison. In Proceedings of the 1994 AAAI Workshop on Case-Based Reasoning, pp. 106–112. Seattle, WA: AAAI Press.
  4. Aha, D. W. & Goldstone, R. L. (1992). Concept Learning and Flexible Weighting. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pp. 534–539. Evanston, IL: Lawrence Erlbaum.
  5. Aha, D. W., Kibler, D. & Albert, M. K. (1991). Instance-Based Learning Algorithms. Machine Learning 6: 37–66.
  6. Almuallim, H. & Dietterich, T. G. (1991). Learning with Many Irrelevant Features. In Proceedings of the Ninth National Conference on Artificial Intelligence, pp. 547–552. Menlo Park, CA: AAAI Press.
  7. Atkeson, C. G., Moore, A. W. & Schaal, S. (1997). Locally Weighted Learning. Artificial Intelligence Review, this issue.
  8. Cain, T., Pazzani, M. J. & Silverstein, G. (1991). Using Domain Knowledge to Influence Similarity Judgments. In Proceedings of the Case-Based Reasoning Workshop, pp. 191–199. Washington, DC: Morgan Kaufmann.
  9. Cardie, C. (1993). Using Decision Trees to Improve Case-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, pp. 25–32. Amherst, MA: Morgan Kaufmann.
  10. Caruana, R. & Freitag, D. (1994). Greedy Attribute Selection. In Proceedings of the Eleventh International Conference on Machine Learning, pp. 28–36. New Brunswick, NJ: Morgan Kaufmann.
  11. Clark, P. & Niblett, T. (1989). The CN2 Induction Algorithm. Machine Learning 3: 261–283.
  12. Cost, S. & Salzberg, S. (1993). A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features. Machine Learning 10: 57–78.
  13. Creecy, R. H., Masand, B. M., Smith, S. J. & Waltz, D. L. (1992). Trading MIPS and Memory for Knowledge Engineering. Communications of the ACM 35(8): 48–63.
  14. DeGroot, M. H. (1986). Probability and Statistics, Second Edition. Addison-Wesley: Reading, MA.
  15. Devijver, P. A. & Kittler, J. (1982). Pattern Recognition: A Statistical Approach. Prentice-Hall: Englewood Cliffs, NJ.
  16. Domingos, P. (1995). The RISE 2.0 System: A Case Study in Multistrategy Learning. Technical Report TR-95-2, Department of Information and Computer Science, University of California, Irvine, Irvine, CA.
  17. Domingos, P. (1996). Unifying Instance-Based and Rule-Based Induction. Machine Learning 24: 141–168.
  18. Holte, R. C. (1993). Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning 11: 63–91.
  19. John, G. H., Kohavi, R. & Pfleger, K. (1994). Irrelevant Features and the Subset Selection Problem. In Proceedings of the Eleventh International Conference on Machine Learning, pp. 121–129. New Brunswick, NJ: Morgan Kaufmann.
  20. Kelly, J. D. & Davis, L. (1991). A Hybrid Genetic Algorithm for Classification. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, pp. 645–650. Sydney: Morgan Kaufmann.
  21. Kibler, D. & Aha, D. W. (1987). Learning Representative Exemplars of Concepts: An Initial Case Study. In Proceedings of the Fourth International Workshop on Machine Learning, pp. 24–30. Irvine, CA: Morgan Kaufmann.
  22. Kira, K. & Rendell, L. A. (1992). A Practical Approach to Feature Selection. In Proceedings of the Ninth International Workshop on Machine Learning, pp. 249–256. Aberdeen, Scotland: Morgan Kaufmann.
  23. Kittler, J. (1986). Feature Selection and Extraction. In Young, T. Y. & Fu, K. S. (eds.) Handbook of Pattern Recognition and Image Processing. Academic Press: New York.
  24. Kolodner, J. (1993). Case-Based Reasoning. Morgan Kaufmann: San Mateo, CA.
  25. Langley, P. & Sage, S. (1994). Oblivious Decision Trees and Abstract Cases. In Proceedings of the 1994 AAAI Workshop on Case-Based Reasoning, pp. 113–117. Seattle, WA: AAAI Press.
  26. Lee, C. (1994). An Instance-Based Learning Method for Databases: An Information Theoretic Approach. In Proceedings of the Ninth European Conference on Machine Learning, pp. 387–390. Catania, Italy: Springer-Verlag.
  27. Mohri, T. & Tanaka, H. (1994). An Optimal Weighting Criterion of Case Indexing for Both Numeric and Symbolic Attributes. In Proceedings of the 1994 AAAI Workshop on Case-Based Reasoning, pp. 123–127. Seattle, WA: AAAI Press.
  28. Murphy, P. M. (1995). UCI Repository of Machine Learning Databases. Machine-readable data repository, Department of Information and Computer Science, University of California, Irvine, Irvine, CA.
  29. Niblett, T. (1987). Constructing Decision Trees in Noisy Domains. In Proceedings of the Second European Working Session on Learning, pp. 67–78. Bled, Yugoslavia: Sigma.
  30. Nosofsky, R. M., Clark, S. E. & Shin, H. J. (1989). Rules and Exemplars in Categorization, Identification, and Recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition 15: 282–304.
  31. Pagallo, G. & Haussler, D. (1990). Boolean Feature Discovery in Empirical Learning. Machine Learning 5: 71–99.
  32. Ricci, F. & Avesani, P. (1995). Learning a Local Similarity Metric for Case-Based Reasoning. In Proceedings of the First International Conference on Case-Based Reasoning, pp. 301–312. Sesimbra, Portugal: Springer-Verlag.
  33. Salzberg, S. (1991). A Nearest Hyperrectangle Learning Method. Machine Learning 6: 251–276.
  34. Schaffer, C. (1989). Analysis of Artificial Data Sets. In Proceedings of the Second International Symposium on Artificial Intelligence, pp. 607–617. Monterrey, Mexico: McGraw-Hill.
  35. Schlimmer, J. C. (1993). Efficiently Inducing Determinations: A Complete and Systematic Search Algorithm that Uses Optimal Pruning. In Proceedings of the Tenth International Conference on Machine Learning, pp. 284–290. Amherst, MA: Morgan Kaufmann.
  36. Skalak, D. B. (1992). Representing Cases as Knowledge Sources that Apply Local Similarity Metrics. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pp. 325–330. Evanston, IL: Lawrence Erlbaum.
  37. Skalak, D. B. (1994). Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms. In Proceedings of the Eleventh International Conference on Machine Learning, pp. 293–301. New Brunswick, NJ: Morgan Kaufmann.
  38. Stanfill, C. & Waltz, D. (1986). Toward Memory-Based Reasoning. Communications of the ACM 29: 1213–1228.
  39. Vafaie, H. & DeJong, K. (1993). Robust Feature Selection Algorithms. In Proceedings of the Fifth IEEE International Conference on Tools for Artificial Intelligence, pp. 356–363. Boston, MA: Computer Society Press.

Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Pedro Domingos
  1. Department of Information and Computer Science, University of California, Irvine, Irvine, U.S.A.
