Adaptive and economic data representation in control architectures of autonomous real-world robots
Learning algorithms for autonomous robots in complex, real-world environments usually have to deal with many degrees of freedom and continuous state spaces. Reinforcement learning is a promising concept for unsupervised learning, but common algorithms suffer from huge storage and computation requirements when they construct an internal model by estimating a value function for every action in every possible state. In our attempt to approximate this function at the lowest cost, we introduce a flexible method that focuses on the states of greatest interest and interpolates between them with a fast, easy-to-implement algorithm. To guarantee accuracy within any predefined limit, we enhance this algorithm with a fast-converging multilayer error approximator.
Keywords: Reinforcement Learning · Associative Memory · Vector Quantizer · Voronoi Cell · Function Approximator
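The core idea of the abstract can be illustrated with a small sketch. The following is not the authors' exact algorithm, only a minimal, hedged example of the general technique the keywords suggest: a value function stored only at a small set of prototype states (the Voronoi cells of a vector quantizer), with inverse-distance interpolation between the nearest prototypes; all class and parameter names are invented for illustration.

```python
import math

class VQValueFunction:
    """Value function stored on a codebook of prototype states
    (Voronoi cells), interpolated between the k nearest cells.
    A hypothetical sketch, not the paper's implementation."""

    def __init__(self, prototypes, k=2, alpha=0.1):
        self.prototypes = [list(p) for p in prototypes]  # codebook vectors
        self.values = [0.0] * len(prototypes)            # one value per cell
        self.k = k                                       # neighbours used
        self.alpha = alpha                               # learning rate

    def _weights(self, state):
        # distances from the query state to every prototype
        dists = [math.dist(state, p) for p in self.prototypes]
        nearest = sorted(range(len(dists)), key=dists.__getitem__)[: self.k]
        # inverse-distance weights; an exact hit gets full weight
        weights = []
        for i in nearest:
            if dists[i] == 0.0:
                return [(i, 1.0)]
            weights.append((i, 1.0 / dists[i]))
        total = sum(w for _, w in weights)
        return [(i, w / total) for i, w in weights]

    def value(self, state):
        # interpolated value estimate between neighbouring cells
        return sum(w * self.values[i] for i, w in self._weights(state))

    def update(self, state, target):
        # distribute a TD-style correction over the contributing cells
        error = target - self.value(state)
        for i, w in self._weights(state):
            self.values[i] += self.alpha * w * error
```

Because values live only on the prototypes, storage grows with the number of cells placed in "states of greatest interest" rather than with the full discretized state space; queries between prototypes fall back on the cheap interpolation above.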