Machine Learning, Volume 13, Issue 2–3, pp 259–284

Genetic reinforcement learning for neurocontrol problems

  • Darrell Whitley
  • Stephen Dominic
  • Rajarshi Das
  • Charles W. Anderson

Abstract

Empirical tests indicate that at least one class of genetic algorithms yields good performance for neural network weight optimization in terms of learning rates and scalability. The successful application of these genetic algorithms to supervised learning problems sets the stage for their use in reinforcement learning problems. On a simulated inverted-pendulum control problem, “genetic reinforcement learning” produces results competitive with those of the adaptive heuristic critic (AHC), another well-known reinforcement learning paradigm for neural networks, which employs the temporal difference method. The two algorithms are compared in terms of learning rates, performance-based generalization, and control behavior over time.

Keywords

Genetic algorithms, reinforcement learning, neural networks, adaptive control

Copyright information

© Kluwer Academic Publishers 1993

Authors and Affiliations

  • Darrell Whitley (1)
  • Stephen Dominic (1)
  • Rajarshi Das (1)
  • Charles W. Anderson (1)

  1. Computer Science Department, Colorado State University, Fort Collins
