Machine Learning, Volume 3, Issue 2–3, pp 193–223

Learning and programming in classifier systems

  • Richard K. Belew
  • Stephanie Forrest

Abstract

Both symbolic and subsymbolic models contribute important insights to our understanding of intelligent systems. Classifier systems are low-level learning systems that are also capable of supporting representations at the symbolic level. In this paper, we explore in detail the issues surrounding the integration of programmed and learned knowledge in classifier-system representations, including comprehensibility, ease of expression, explanation, predictability, robustness, redundancy, stability, and the use of analogical representations. We also examine how these issues speak to the debate between symbolic and subsymbolic paradigms. We discuss several dimensions for examining the tradeoffs between programmed and learned representations, and we propose an optimization model for constructing hybrid systems that combine positive aspects of each paradigm.

Keywords

Subsymbolic representation, inheritance, tagging, default hierarchy, connectionism

Copyright information

© Kluwer Academic Publishers 1988

Authors and Affiliations

  • Richard K. Belew (1)
  • Stephanie Forrest (2)

  1. Computer Science and Engineering Department, University of California, La Jolla, U.S.A.
  2. Teknowledge, Inc., Palo Alto, U.S.A.