Abstract
This article presents a new reinforcement learning method called SANE (Symbiotic, Adaptive Neuro-Evolution), which evolves a population of neurons through genetic algorithms to form a neural network capable of performing a task. Symbiotic evolution promotes both cooperation and specialization, which results in a fast, efficient genetic search and discourages convergence to suboptimal solutions. In the inverted pendulum problem, SANE formed effective networks 9 to 16 times faster than the Adaptive Heuristic Critic and 2 times faster than Q-learning and the GENITOR neuro-evolution approach without loss of generalization. Such efficient learning, combined with few domain assumptions, makes SANE a promising approach to a broad range of reinforcement learning problems, including many real-world applications.
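The idea of evolving individual neurons rather than whole networks can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the neuron encoding, the toy XOR-style task standing in for the inverted pendulum, the averaging of network scores into neuron fitness, and all names and parameters (`make_neuron`, `symbiotic_generation`, `net_size`, `trials`) are hypothetical.

```python
import random

def make_neuron(n_in, rng):
    # Hypothetical encoding: n_in input weights followed by one output weight.
    return [rng.uniform(-1, 1) for _ in range(n_in + 1)]

def network_output(neurons, x):
    # Each hidden neuron fires (1.0) if its weighted input is positive;
    # the network output is the sum of output weights of the firing neurons.
    total = 0.0
    for w in neurons:
        fired = 1.0 if sum(wi * xi for wi, xi in zip(w[:-1], x)) > 0 else 0.0
        total += w[-1] * fired
    return total

def evaluate(neurons, cases):
    # Toy fitness (assumption): negative squared error, so higher is better.
    return -sum((network_output(neurons, x) - y) ** 2 for x, y in cases)

def symbiotic_generation(pop, cases, rng, net_size=4, trials=50):
    totals = [0.0] * len(pop)
    counts = [0] * len(pop)
    best = float("-inf")
    for _ in range(trials):
        # Form a network from a random subset of the neuron population.
        idx = rng.sample(range(len(pop)), net_size)
        fit = evaluate([pop[i] for i in idx], cases)
        best = max(best, fit)
        # Credit the network's score to every neuron that took part.
        for i in idx:
            totals[i] += fit
            counts[i] += 1
    # Neuron fitness = average score of the networks it joined (assumption);
    # neurons that were never sampled rank last.
    scores = [totals[i] / counts[i] if counts[i] else float("-inf")
              for i in range(len(pop))]
    ranked = sorted(range(len(pop)), key=lambda i: scores[i], reverse=True)
    elite = [pop[i] for i in ranked[:len(pop) // 2]]
    children = []
    while len(elite) + len(children) < len(pop):
        a, b = rng.sample(elite, 2)
        cut = rng.randrange(1, len(a))      # one-point crossover
        child = a[:cut] + b[cut:]
        if rng.random() < 0.1:              # occasional Gaussian mutation
            k = rng.randrange(len(child))
            child[k] += rng.gauss(0, 0.5)
        children.append(child)
    return elite + children, best

if __name__ == "__main__":
    rng = random.Random(0)
    # Third input is a constant bias of 1.
    cases = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 0)]
    pop = [make_neuron(3, rng) for _ in range(20)]
    for gen in range(30):
        pop, best = symbiotic_generation(pop, cases, rng)
    print("best network fitness in final generation:", best)
```

Because a neuron is scored only through the networks it joins, a neuron does well when it complements the others. This is one plausible reading of how cooperation and specialization could arise in such a scheme.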
Cite this article
Moriarty, D.E., Miikkulainen, R. Efficient Reinforcement Learning through Symbiotic Evolution. Machine Learning 22, 11–32 (1996). https://doi.org/10.1023/A:1018004120707
DOI: https://doi.org/10.1023/A:1018004120707