KI 2010: Advances in Artificial Intelligence

Lecture Notes in Computer Science, vol. 6359, pp. 203-210

Adaptive ε-Greedy Exploration in Reinforcement Learning Based on Value Differences

  • Michel Tokic (Institute of Applied Research, University of Applied Sciences Ravensburg-Weingarten; Institute of Neural Information Processing, University of Ulm)

This paper presents “Value-Difference Based Exploration” (VDBE), a method for balancing the exploration/exploitation dilemma inherent in reinforcement learning. The proposed method adapts the exploration parameter of ε-greedy as a function of the temporal-difference error observed in value-function backups, which is taken as a measure of the agent’s uncertainty about the environment. VDBE is evaluated on a multi-armed bandit task, which allows detailed insight into the method’s behavior. Preliminary results indicate that VDBE is more robust to its parameter setting than commonly used ad hoc approaches such as ε-greedy or softmax.
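To make the idea concrete, the following is a minimal sketch of a VDBE-style adaptive ε-greedy agent on a Bernoulli bandit. The update ε ← δ·f(|TD error|) + (1 − δ)·ε with a Boltzmann-shaped f follows the paper’s description of adapting ε from value differences; the specific constants (σ, α, the arm probabilities, and the step count) are illustrative assumptions, not values from the paper.

```python
# Sketch of VDBE-style adaptive epsilon-greedy on a Bernoulli bandit.
# sigma, alpha, ARM_PROBS, and the step count are illustrative assumptions.
import math
import random

ARM_PROBS = [0.2, 0.5, 0.8]      # assumed Bernoulli reward probabilities
N_ARMS = len(ARM_PROBS)
alpha = 0.1                      # Q-value learning rate (assumed)
sigma = 1.0                      # inverse sensitivity of the Boltzmann term (assumed)
delta = 1.0 / N_ARMS             # mixing weight, here the inverse number of actions
epsilon = 1.0                    # start fully exploratory
Q = [0.0] * N_ARMS

def boltzmann(td_error):
    """Map the TD-error magnitude into (0, 1): larger errors -> more exploration."""
    x = math.exp(-abs(alpha * td_error) / sigma)
    return (1.0 - x) / (1.0 + x)

random.seed(0)
for t in range(10_000):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        a = random.randrange(N_ARMS)
    else:
        a = max(range(N_ARMS), key=Q.__getitem__)
    reward = 1.0 if random.random() < ARM_PROBS[a] else 0.0
    td_error = reward - Q[a]     # one-step TD error on a stateless bandit
    Q[a] += alpha * td_error     # value-function backup
    # VDBE update: nudge epsilon toward the Boltzmann value of the TD error,
    # so epsilon stays high while estimates are uncertain and decays as they settle
    epsilon = delta * boltzmann(td_error) + (1.0 - delta) * epsilon

print(f"Q = {[round(q, 3) for q in Q]}, final epsilon = {epsilon:.4f}")
```

The design point illustrated here is that exploration is driven by learning progress rather than by a hand-tuned decay schedule: as the TD errors shrink, ε decays on its own, which is the source of the parameter robustness the abstract reports.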