Abstract
In this article, we examine the learning performance of Q-learning under various conditions using reward-based Voronoi Q-value elements (VQEs), which determine how an agent acts in a given state in a single-agent environment. To test our hypotheses, we performed computational experiments in several situations: rotating a lattice arrangement of VQEs through various angles, rotating the action directions of an agent with four actions through various angles, and arranging the VQEs randomly. In each situation, we evaluated how accurately the optimal Q-values for state-action pairs are estimated from continuous-valued inputs. The results show that the learning performance changes with the relative angle between the VQE arrangement and the agent's action directions.
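The abstract describes Q-learning over a Voronoi partition of a continuous two-dimensional state space but gives no implementation details. The following Python code is a minimal sketch of the general idea under stated assumptions: the VQELearner class and its parameter names are hypothetical, a nearest-neighbor lookup maps a continuous state to its VQE, action selection is epsilon-greedy over four actions, and the update is standard one-step Q-learning; the hyperparameters and the rotated-lattice construction are illustrative, not taken from the paper.

```python
import numpy as np

class VQELearner:
    """Hypothetical sketch of Q-learning over Voronoi Q-value elements."""

    def __init__(self, centers, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        # centers: (n_vqe, 2) array of VQE positions in the 2-D state space.
        # Each VQE stores one Q-value per discrete action.
        self.centers = np.asarray(centers, dtype=float)
        self.q = np.zeros((len(self.centers), n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def nearest(self, state):
        # A continuous-valued input falls in the Voronoi region of the
        # closest VQE center (nearest-neighbor rule).
        state = np.asarray(state, dtype=float)
        return int(np.argmin(np.linalg.norm(self.centers - state, axis=1)))

    def act(self, state, rng):
        # Epsilon-greedy selection over the nearest VQE's Q-values.
        i = self.nearest(state)
        if rng.random() < self.epsilon:
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[i]))

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update applied to the VQE containing the state.
        i, j = self.nearest(state), self.nearest(next_state)
        target = reward + self.gamma * np.max(self.q[j])
        self.q[i, action] += self.alpha * (target - self.q[i, action])

# Illustrative setup: a 10x10 lattice of VQEs rotated by an angle theta,
# loosely mirroring the rotation experiments described above.
theta = np.deg2rad(15.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
grid = np.array([[x, y] for x in range(10) for y in range(10)], dtype=float)
learner = VQELearner(grid @ rot.T, n_actions=4)

rng = np.random.default_rng(0)
a = learner.act([3.2, 4.7], rng)
learner.update([3.2, 4.7], a, reward=1.0, next_state=[3.2, 5.7])
```

A random VQE arrangement, the third condition in the experiments, would simply replace the rotated lattice with uniformly sampled center positions.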
Cite this article
Aung, K.T., Fuchida, T. A comparison of learning performance in two-dimensional Q-learning by the difference of Q-values alignment. Artif Life Robotics 16, 473–477 (2012). https://doi.org/10.1007/s10015-011-0961-5