Machine Learning, Volume 20, Issue 1–2, pp 35–61

Inductive policy: The pragmatics of bias selection

  • Foster John Provost
  • Bruce G. Buchanan

Abstract

This paper extends the currently accepted model of inductive bias by identifying six categories of bias and by separating inductive bias from the policy for its selection (the inductive policy). We analyze existing "bias selection" systems, examining the similarities and differences in their inductive policies, and identify three techniques useful for building inductive policies. We then present a framework for representing and automatically selecting a wide variety of biases and describe experiments with an instantiation of the framework addressing various pragmatic tradeoffs of time, space, accuracy, and the cost of errors. The experiments show that a common framework can be used to implement policies for a variety of different types of bias selection, such as parameter selection, term selection, and example selection, using similar techniques. The experiments also show that different tradeoffs can be made by the implementation of different policies; for example, from the same data, different rule sets can be learned based on different tradeoffs of accuracy versus the cost of erroneous predictions.

Keywords

inductive learning; inductive bias; bias selection; inductive policy; pragmatics
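The separation the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation; the `Bias` attributes, candidate biases, and objective weights below are hypothetical, chosen only to show how two different policies applied to the same candidates can commit to different biases depending on how accuracy is traded off against the cost of erroneous predictions.

```python
# Illustrative sketch only: an "inductive policy" chooses among candidate
# biases by scoring each under a pragmatic objective. All names, numbers,
# and weights here are hypothetical, not taken from the paper.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Bias:
    name: str
    accuracy: float      # estimated predictive accuracy under this bias
    error_cost: float    # assumed average cost per erroneous prediction
    cpu_seconds: float   # estimated learning time


def select_bias(candidates: List[Bias],
                objective: Callable[[Bias], float]) -> Bias:
    """A policy reduced to its core: rank candidate biases by a
    pragmatic objective and commit to the best-scoring one."""
    return max(candidates, key=objective)


candidates = [
    Bias("shallow-rules", accuracy=0.85, error_cost=1.0, cpu_seconds=5),
    Bias("deep-rules",    accuracy=0.92, error_cost=4.0, cpu_seconds=300),
]

# Policy A: maximize accuracy alone -- prefers the deeper hypothesis space.
policy_a = select_bias(candidates, lambda b: b.accuracy)

# Policy B: penalize the expected cost of errors (hypothetical weight 0.5)
# -- the expensive mistakes of "deep-rules" now outweigh its accuracy edge.
policy_b = select_bias(
    candidates,
    lambda b: b.accuracy - 0.5 * (1.0 - b.accuracy) * b.error_cost,
)
```

Under these assumed numbers, Policy A selects `deep-rules` while Policy B selects `shallow-rules`: the same candidate biases, but different policies and hence different learned rule sets, mirroring the tradeoff described in the abstract.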

Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • Foster John Provost (1)
  • Bruce G. Buchanan (2)
  1. NYNEX Science & Technology, White Plains
  2. Intelligent Systems Laboratory, Computer Science Department, University of Pittsburgh, Pittsburgh
