Machine Learning, Volume 35, Issue 2, pp 155–185

Toward a Model of Intelligence as an Economy of Agents

  • Eric B. Baum


A market-based algorithm is presented that autonomously apportions complex tasks to multiple cooperating agents, giving each agent the motivation to improve the performance of the whole system. A specific model, called “The Hayek Machine,” is proposed and tested on a simulated Blocks World (BW) planning problem. Hayek learns to solve more complex BW problems than any previous learning algorithm. Given intermediate reward and simple features, it has learned to efficiently solve arbitrary BW problems. The Hayek Machine can also be seen as a model of evolutionary economics.

Keywords: reinforcement learning, multi-agent systems, planning, evolutionary economics, tragedy of the commons, classifier systems, agoric systems, autonomous programming, cognition, artificial intelligence, Hayek, complex adaptive systems, temporal difference learning, evolutionary computation, economic models of mind, economic models of computation, Blocks World, reasoning, learning, computational learning theory, learning to reason, meta-reasoning



Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Eric B. Baum, NEC Research Institute, Princeton
