Machine Learning

Volume 22, Issue 1–3, pp. 11–32

Efficient reinforcement learning through symbiotic evolution

  • David E. Moriarty
  • Risto Miikkulainen

Abstract

This article presents a new reinforcement learning method called SANE (Symbiotic, Adaptive Neuro-Evolution), which evolves a population of neurons through genetic algorithms to form a neural network capable of performing a task. Symbiotic evolution promotes both cooperation and specialization, which results in a fast, efficient genetic search and discourages convergence to suboptimal solutions. In the inverted pendulum problem, SANE formed effective networks 9 to 16 times faster than the Adaptive Heuristic Critic and 2 times faster than Q-learning and the GENITOR neuro-evolution approach, without loss of generalization. Such efficient learning, combined with few domain assumptions, makes SANE a promising approach to a broad range of reinforcement learning problems, including many real-world applications.
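
The abstract describes SANE only at a high level. The sketch below (Python) is a hypothetical illustration of the symbiotic-evolution idea, not the authors' implementation: it assumes a simple two-layer network, networks assembled from random subsets of a neuron population, neuron fitness averaged over the networks each neuron joins, and a toy stand-in fitness function. Names such as random_neuron, NET_SIZE, dummy_task, and the quarter-elite replacement scheme are illustrative choices only.

    import random

    # Illustrative sketch of symbiotic neuro-evolution (assumptions noted above).
    # A population of hidden neurons is evolved; networks are assembled from
    # random subsets of neurons, evaluated on a task, and each neuron's fitness
    # is the average fitness of the networks it participated in.

    NUM_INPUTS, NUM_OUTPUTS = 4, 1        # e.g. pole-balancing state -> push force
    POP_SIZE, NET_SIZE = 100, 8           # neurons in population / per network
    TRIALS_PER_GENERATION = 200

    def random_neuron():
        """A neuron is just its incoming and outgoing connection weights."""
        return {
            "w_in":  [random.uniform(-1, 1) for _ in range(NUM_INPUTS)],
            "w_out": [random.uniform(-1, 1) for _ in range(NUM_OUTPUTS)],
            "fitness_sum": 0.0,
            "trials": 0,
        }

    def evaluate_network(neurons, task_fitness):
        """Assemble a two-layer network from the chosen neurons and score it."""
        def forward(x):
            hidden = [
                max(0.0, sum(w * xi for w, xi in zip(n["w_in"], x)))
                for n in neurons
            ]
            return [
                sum(n["w_out"][j] * h for n, h in zip(neurons, hidden))
                for j in range(NUM_OUTPUTS)
            ]
        return task_fitness(forward)

    def generation(population, task_fitness):
        # Symbiotic evaluation: a neuron is scored only through the networks
        # it helps form, which rewards cooperation and specialization.
        for _ in range(TRIALS_PER_GENERATION):
            team = random.sample(population, NET_SIZE)
            score = evaluate_network(team, task_fitness)
            for n in team:
                n["fitness_sum"] += score
                n["trials"] += 1
        population.sort(
            key=lambda n: n["fitness_sum"] / max(n["trials"], 1), reverse=True
        )
        # Replace the weakest quarter with offspring of the strongest quarter.
        elite = POP_SIZE // 4
        for i in range(POP_SIZE - elite, POP_SIZE):
            a, b = random.sample(population[:elite], 2)
            child = random_neuron()
            child["w_in"] = [random.choice(p) for p in zip(a["w_in"], b["w_in"])]
            child["w_out"] = [random.choice(p) for p in zip(a["w_out"], b["w_out"])]
            population[i] = child
        for n in population:
            n["fitness_sum"], n["trials"] = 0.0, 0

    def dummy_task(forward):
        """Stand-in fitness: reward outputs close to zero on random states."""
        xs = [[random.uniform(-1, 1) for _ in range(NUM_INPUTS)] for _ in range(10)]
        return -sum(abs(forward(x)[0]) for x in xs)

    if __name__ == "__main__":
        pop = [random_neuron() for _ in range(POP_SIZE)]
        for g in range(5):
            generation(pop, dummy_task)

Because each neuron's score is averaged over many randomly assembled partnerships, a neuron is rewarded for filling a role that complements the rest of the population rather than for solving the task alone, which is the cooperation and specialization effect the abstract refers to.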

Keywords

Neuro-Evolution · Reinforcement Learning · Genetic Algorithms · Neural Networks

References

  1. Anderson, C. W. (1987). Strategy learning with multilayer connectionist representations. Technical Report TR87-509.3, GTE Labs, Waltham, MA.
  2. Anderson, C. W. (1989). Learning to control an inverted pendulum using neural networks. IEEE Control Systems Magazine, 9:31–37.
  3. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834–846.
  4. Belew, R. K., McInerney, J., & Schraudolph, N. N. (1991). Evolving networks: Using the genetic algorithm with connectionist learning. In Farmer, J. D., Langton, C., Rasmussen, S., & Taylor, C., editors, Artificial Life II. Reading, MA: Addison-Wesley.
  5. Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47:139–159.
  6. Collins, R. J., & Jefferson, D. R. (1991). Selection in massively parallel genetic algorithms. In Proceedings of the Fourth International Conference on Genetic Algorithms, 249–256. San Mateo, CA: Morgan Kaufmann.
  7. De Jong, K. A. (1975). An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, The University of Michigan, Ann Arbor, MI.
  8. Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.
  9. Goldberg, D. E., & Richardson, J. (1987). Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the Second International Conference on Genetic Algorithms, 148–154. San Mateo, CA: Morgan Kaufmann.
  10. Grefenstette, J., & Schultz, A. (1994). An evolutionary approach to learning in robots. In Proceedings of the Machine Learning Workshop on Robot Learning, Eleventh International Conference on Machine Learning, New Brunswick, NJ.
  11. Grefenstette, J. J. (1992). An approach to anytime learning. In Proceedings of the Ninth International Conference on Machine Learning, 189–195.
  12. Grefenstette, J. J., Ramsey, C. L., & Schultz, A. C. (1990). Learning sequential decision rules using simulation models and competition. Machine Learning, 5:355–381.
  13. Holland, J. H. (1975). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. Ann Arbor, MI: University of Michigan Press.
  14. Horn, J., Goldberg, D. E., & Deb, K. (1994). Implicit niching in a learning classifier system: Nature's way. Evolutionary Computation, 2(1):37–66.
  15. Jefferson, D., Collins, R., Cooper, C., Dyer, M., Flowers, M., Korf, R., Taylor, C., & Wang, A. (1991). Evolution as a theme in artificial life: The Genesys/Tracker system. In Farmer, J. D., Langton, C., Rasmussen, S., & Taylor, C., editors, Artificial Life II. Reading, MA: Addison-Wesley.
  16. Kitano, H. (1990). Designing neural networks using genetic algorithms with graph generation system. Complex Systems, 4:461–476.
  17. Koza, J. R., & Rice, J. P. (1991). Genetic generation of both the weights and architecture for a neural network. In International Joint Conference on Neural Networks, vol. 2, 397–404. New York, NY: IEEE.
  18. Lee, K.-F., & Mahajan, S. (1990). The development of a world class Othello program. Artificial Intelligence, 43:21–36.
  19. Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning, and teaching. Machine Learning, 8(3):293–321.
  20. Michie, D., & Chambers, R. A. (1968). BOXES: An experiment in adaptive control. In Dale, E., & Michie, D., editors, Machine Intelligence. Edinburgh, UK: Oliver and Boyd.
  21. Moriarty, D. E., & Miikkulainen, R. (1994a). Evolutionary neural networks for value ordering in constraint satisfaction problems. Technical Report AI94-218, Department of Computer Sciences, The University of Texas at Austin.
  22. Moriarty, D. E., & Miikkulainen, R. (1994b). Evolving neural networks to focus minimax search. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1371–1377. Seattle, WA: MIT Press.
  23. Nolfi, S., & Parisi, D. (1992). Growing neural networks. Artificial Life III. Reading, MA: Addison-Wesley.
  24. Pendrith, M. (1994). On reinforcement learning of control actions in noisy and non-Markovian domains. Technical Report UNSW-CSE-TR-9410, School of Computer Science and Engineering, The University of New South Wales.
  25. Potter, M., & De Jong, K. (1995a). Evolving neural networks with collaborative species. In Proceedings of the 1995 Summer Computer Simulation Conference, Ottawa, Canada.
  26. Potter, M., De Jong, K., & Grefenstette, J. (1995b). A coevolutionary approach to learning sequential decision rules. In Proceedings of the Sixth International Conference on Genetic Algorithms, Pittsburgh, PA.
  27. Sammut, C., & Cribb, J. (1990). Is learning rate a good performance criterion for learning? In Proceedings of the Seventh International Conference on Machine Learning, 170–178. Morgan Kaufmann.
  28. Schaffer, J. D., Whitley, D., & Eshelman, L. J. (1992). Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Proceedings of the International Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN-92), Baltimore, MD.
  29. Smith, R. E. (1994). Is a learning classifier system a type of neural network? Evolutionary Computation, 2(1).
  30. Smith, R. E., Forrest, S., & Perelson, A. S. (1993). Searching for diverse, cooperative populations with genetic algorithms. Evolutionary Computation, 1(2):127–149.
  31. Smith, R. E., & Gray, B. (1993). Co-adaptive genetic algorithms: An example in Othello strategy. Technical Report TCGA 94002, Department of Engineering Science and Mechanics, The University of Alabama.
  32. Steetskamp, R. (1995). Explorations in symbiotic neuro-evolution search spaces. Masters Stage Report, Department of Computer Science, University of Twente, The Netherlands.
  33. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44.
  34. Syswerda, G. (1991). A study of reproduction in generational and steady-state genetic algorithms. In Rawlings, G., editor, Foundations of Genetic Algorithms, 94–101. San Mateo, CA: Morgan Kaufmann.
  35. Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, University of Cambridge, England.
  36. Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3):279–292.
  37. Whitley, D. (1989). The GENITOR algorithm and selective pressure. In Proceedings of the Third International Conference on Genetic Algorithms. San Mateo, CA: Morgan Kaufmann.
  38. Whitley, D. (1994). A genetic algorithm tutorial. Statistics and Computing, 4:65–85.
  39. Whitley, D., Dominic, S., Das, R., & Anderson, C. W. (1993). Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259–284.
  40. Whitley, D., & Kauth, J. (1988). GENITOR: A different genetic algorithm. In Proceedings of the Rocky Mountain Conference on Artificial Intelligence, 118–130. Denver, CO.
  41. Whitley, D., Starkweather, T., & Bogart, C. (1990). Genetic algorithms and neural networks: Optimizing connections and connectivity. Parallel Computing, 14:347–361.

Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • David E. Moriarty (1)
  • Risto Miikkulainen (1)

  1. Department of Computer Sciences, The University of Texas at Austin, Austin, TX
