Efficient Reinforcement Learning through Symbiotic Evolution

Abstract

This article presents a new reinforcement learning method called SANE (Symbiotic, Adaptive Neuro-Evolution), which uses genetic algorithms to evolve a population of neurons that together form a neural network capable of performing a task. Symbiotic evolution promotes both cooperation and specialization, which results in a fast, efficient genetic search and discourages convergence to suboptimal solutions. In the inverted pendulum problem, SANE formed effective networks 9 to 16 times faster than the Adaptive Heuristic Critic and 2 times faster than Q-learning and the GENITOR neuro-evolution approach, without loss of generalization. Such efficient learning, combined with few domain assumptions, makes SANE a promising approach to a broad range of reinforcement learning problems, including many real-world applications.
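
The idea of evolving individual neurons rather than whole networks can be illustrated with a minimal Python sketch. The abstract does not give the algorithm's details, so everything specific here is an assumption made for illustration: networks are built from randomly sampled neurons, a neuron's fitness is the average score of the networks it took part in, and the helper names, the one-point crossover, the mutation settings, and the toy XOR task (used in place of the inverted pendulum) are all hypothetical.

```python
import math
import random

def make_neuron(rng, n_in, n_out):
    # Hypothetical encoding: one hidden neuron = its input and output weights.
    return {"w_in": [rng.uniform(-1, 1) for _ in range(n_in)],
            "w_out": [rng.uniform(-1, 1) for _ in range(n_out)]}

def forward(neurons, x):
    # Sum the contributions of each hidden neuron to every output unit.
    out = [0.0] * len(neurons[0]["w_out"])
    for n in neurons:
        h = math.tanh(sum(w * xi for w, xi in zip(n["w_in"], x)))
        for k, w in enumerate(n["w_out"]):
            out[k] += h * w
    return out

def xor_fitness(neurons):
    # Toy stand-in task (assumption): negative squared error on XOR.
    # The third input is a constant bias term.
    cases = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 0)]
    return -sum((forward(neurons, x)[0] - t) ** 2 for x, t in cases)

def evaluate_neurons(pop, task_fitness, rng, net_size, n_networks):
    # Build random networks; credit each participating neuron with the score.
    totals, counts = [0.0] * len(pop), [0] * len(pop)
    for _ in range(n_networks):
        idx = rng.sample(range(len(pop)), net_size)
        score = task_fitness([pop[i] for i in idx])
        for i in idx:
            totals[i] += score
            counts[i] += 1
    # Neurons that were never sampled rank last.
    return [t / c if c else float("-inf") for t, c in zip(totals, counts)]

def breed(pop, fitness, rng, elite_frac=0.25, mut_rate=0.1):
    # Keep the best neurons; refill with mutated one-point crossover children.
    order = sorted(range(len(pop)), key=lambda i: fitness[i], reverse=True)
    elite = [pop[i] for i in order[:max(2, int(len(pop) * elite_frac))]]
    n_in = len(pop[0]["w_in"])
    children = []
    while len(elite) + len(children) < len(pop):
        a, b = rng.sample(elite, 2)
        ga, gb = a["w_in"] + a["w_out"], b["w_in"] + b["w_out"]
        cut = rng.randrange(1, len(ga))
        genes = [w + rng.gauss(0, 0.3) if rng.random() < mut_rate else w
                 for w in ga[:cut] + gb[cut:]]
        children.append({"w_in": genes[:n_in], "w_out": genes[n_in:]})
    return elite + children

if __name__ == "__main__":
    rng = random.Random(0)
    pop = [make_neuron(rng, 3, 1) for _ in range(40)]
    for generation in range(30):
        fit = evaluate_neurons(pop, xor_fitness, rng, net_size=4, n_networks=200)
        pop = breed(pop, fit, rng)
```

Because a neuron is scored only through the networks it joins, it does well when it complements the other neurons it is combined with, which is one way to read the cooperation and specialization the abstract describes.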

Cite this article

Moriarty, D.E., Miikkulainen, R. Efficient Reinforcement Learning through Symbiotic Evolution. Machine Learning 22, 11–32 (1996). https://doi.org/10.1023/A:1018004120707

Keywords

  • Neuro-Evolution
  • Reinforcement Learning
  • Genetic Algorithms
  • Neural Networks