Task-Driven Discretization of the Joint Space of Visual Percepts and Continuous Actions
We address the problem of closed-loop learning of control policies that map visual percepts to continuous actions. Our algorithm, called Reinforcement Learning of Joint Classes (RLJC), adaptively discretizes the joint space of visual percepts and continuous actions. In a sequence of attempts to remove perceptual aliasing, it incrementally builds a decision tree whose tests apply either to the input perceptual space or to the output action space. The leaves of this decision tree induce a piecewise constant, optimal state-action value function, which is computed by a reinforcement learning algorithm that uses the tree as a function approximator. The optimal policy is then derived by selecting, for a given percept, the action that leads to the leaf that maximizes the value function. The approach is not specific to vision: it also applies to learning mappings from continuous percepts to continuous actions. A simulated visual navigation problem illustrates the applicability of RLJC.
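The policy-derivation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Leaf`, `Node`, and `best_action`, the threshold form of the tests, the hand-built example tree, and the choice of the interval midpoint as the returned action are all assumptions made for illustration. The sketch only shows how a tree that tests either a percept feature or an action dimension yields a piecewise constant value function, and how an action can be read off the best reachable leaf.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class Leaf:
    q: float  # constant state-action value of this joint class (assumed already learned)


@dataclass
class Node:
    on_action: bool    # True: test an action dimension; False: test a percept feature
    index: int         # which dimension or feature is tested
    threshold: float   # value < threshold goes to `low`, otherwise to `high`
    low: "Tree"
    high: "Tree"


Tree = Union[Leaf, Node]
Box = List[Tuple[float, float]]  # one (lower, upper) interval per action dimension


def best_action(tree: Tree, percept: Sequence[float], bounds: Box):
    """Return (q, action) for the highest-valued leaf reachable from `percept`."""
    if isinstance(tree, Leaf):
        # Every action in the box reaches this leaf; the midpoint is an arbitrary pick.
        return tree.q, tuple((lo + hi) / 2 for lo, hi in bounds)
    if not tree.on_action:
        # Percept tests are decided by the observation, so only one branch is reachable.
        child = tree.low if percept[tree.index] < tree.threshold else tree.high
        return best_action(child, percept, bounds)
    # Action tests are under the agent's control: explore each branch that the
    # current action box can still reach, shrinking the box accordingly.
    lo, hi = bounds[tree.index]
    candidates = []
    if lo < tree.threshold:
        box = list(bounds)
        box[tree.index] = (lo, min(hi, tree.threshold))
        candidates.append(best_action(tree.low, percept, box))
    if hi >= tree.threshold:
        box = list(bounds)
        box[tree.index] = (max(lo, tree.threshold), hi)
        candidates.append(best_action(tree.high, percept, box))
    return max(candidates, key=lambda c: c[0])


# Hypothetical tree: one percept test, then one action test on the low branch.
tree = Node(False, 0, 0.5,
            low=Node(True, 0, 0.0, low=Leaf(1.0), high=Leaf(3.0)),
            high=Leaf(2.0))

q, action = best_action(tree, percept=[0.2], bounds=[(-1.0, 1.0)])
print(q, action)
```

In this sketch the leaf values are fixed by hand; in RLJC they would be estimated by the reinforcement learning algorithm that uses the tree as a function approximator, and the tree itself would grow as perceptual aliasing is detected.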