Hybrid adaptive heuristic critic architectures for learning in mazes with continuous search spaces

  • Conference paper
Parallel Problem Solving from Nature — PPSN III (PPSN 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 866)

Abstract

We present the first results obtained from two implementations of a hybrid architecture that balances exploration and exploitation to solve mazes with continuous search spaces. In both cases the critic is based on a Radial Basis Function (RBF) neural network, which uses Temporal Difference (TD) learning to acquire a continuous-valued internal model of the environment through interaction with it. Also in both cases, an Evolutionary Algorithm is employed in the search policy to choose each movement. In the first implementation a Genetic Algorithm (GA) is used, and in the second an Evolution Strategy (ES). Over successive trials, the maze-solving agent learns the V-function, a mapping from real-valued positions in the maze to the value of being at those positions.
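The two ingredients named in the abstract, an RBF critic trained by temporal-difference learning and an evolutionary search over each movement, can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The open unit-square maze (no walls), the circular goal region, the reward of 1 at the goal, the grid of Gaussian centres, the TD(0) update, the learning rate and discount factor, and the (1+λ)-ES with a fixed mutation strength are all assumptions chosen for illustration, and the GA variant is not shown.

```python
import math
import random

# Hypothetical maze: an open unit square with a circular goal region.
# The paper's actual maze layout, rewards and parameters are not given here.
GOAL = (0.9, 0.9)
GOAL_RADIUS = 0.1
MAX_STEP = 0.1  # longest allowed movement per step (assumed)


def move(s, a):
    """Apply step vector a to position s, clipping step length and maze bounds."""
    dx, dy = a
    norm = math.hypot(dx, dy)
    if norm > MAX_STEP:
        dx, dy = dx * MAX_STEP / norm, dy * MAX_STEP / norm
    return (min(1.0, max(0.0, s[0] + dx)), min(1.0, max(0.0, s[1] + dy)))


def reward(s):
    """Reward 1 inside the goal region, 0 elsewhere (assumed reward scheme)."""
    return 1.0 if math.dist(s, GOAL) <= GOAL_RADIUS else 0.0


class RBFCritic:
    """Gaussian RBF approximation of V(s), trained with TD(0)."""

    def __init__(self, grid=5, width=0.15, alpha=0.2, gamma=0.9):
        # Centres on a regular grid over the maze (an assumption).
        step = 1.0 / (grid - 1)
        self.centres = [(i * step, j * step) for i in range(grid) for j in range(grid)]
        self.width, self.alpha, self.gamma = width, alpha, gamma
        self.weights = [0.0] * len(self.centres)

    def features(self, s):
        # Gaussian activation of every centre for position s.
        return [math.exp(-((s[0] - cx) ** 2 + (s[1] - cy) ** 2) / (2 * self.width ** 2))
                for cx, cy in self.centres]

    def value(self, s):
        # V(s) is a weighted sum of the RBF activations.
        return sum(w * f for w, f in zip(self.weights, self.features(s)))

    def td_update(self, s, r, s_next, terminal):
        """TD(0): delta = r + gamma * V(s') - V(s), with V = 0 after termination."""
        target = r if terminal else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        phi = self.features(s)
        self.weights = [w + self.alpha * delta * f for w, f in zip(self.weights, phi)]
        return delta


def es_select_action(critic, s, rng, generations=10, offspring=8, sigma=0.05):
    """(1+lambda)-ES over the 2-D step vector with a fixed mutation strength.

    Fitness is the one-step lookahead r(s') + gamma * V(s') under the current critic.
    """
    def fitness(a):
        s_next = move(s, a)
        r = reward(s_next)
        return r if r > 0 else r + critic.gamma * critic.value(s_next)

    parent = (rng.gauss(0.0, MAX_STEP), rng.gauss(0.0, MAX_STEP))
    best = fitness(parent)
    for _ in range(generations):
        children = [(parent[0] + rng.gauss(0.0, sigma), parent[1] + rng.gauss(0.0, sigma))
                    for _ in range(offspring)]
        scored = [(fitness(c), c) for c in children] + [(best, parent)]
        # Plus-selection: keep the fittest of offspring and parent. On ties max()
        # returns the first candidate, so an offspring wins, letting the search
        # drift while the critic is still flat.
        best, parent = max(scored, key=lambda t: t[0])
    return parent


def run_trial(critic, rng, start=(0.1, 0.1), max_steps=200):
    """One trial: choose moves with the ES, update the critic after every move."""
    s = start
    for step in range(1, max_steps + 1):
        s_next = move(s, es_select_action(critic, s, rng))
        r = reward(s_next)
        critic.td_update(s, r, s_next, terminal=r > 0)
        if r > 0:
            return step
        s = s_next
    return max_steps


if __name__ == "__main__":
    rng = random.Random(0)
    critic = RBFCritic()
    print([run_trial(critic, rng) for _ in range(10)])
```

Running the module prints the number of steps taken in each of ten trials. The seeded generator makes a run reproducible, but whether and how quickly trial lengths fall depends entirely on the assumed parameters above. In this sketch exploration comes from the evolutionary search itself: while the critic is still zero everywhere, the ES drifts randomly, and once the goal has been reached the learned V-function starts to pull the selected steps towards it.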

Editor information

Yuval Davidor, Hans-Paul Schwefel, Reinhard Männer

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pipe, A.G., Fogarty, T.C., Winfield, A. (1994). Hybrid adaptive heuristic critic architectures for learning in mazes with continuous search spaces. In: Davidor, Y., Schwefel, HP., Männer, R. (eds) Parallel Problem Solving from Nature — PPSN III. PPSN 1994. Lecture Notes in Computer Science, vol 866. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58484-6_291

  • DOI: https://doi.org/10.1007/3-540-58484-6_291

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58484-1

  • Online ISBN: 978-3-540-49001-2

  • eBook Packages: Springer Book Archive
