Flood-Fill Q-Learning Updates for Learning Redundant Policies in Order to Interact with a Computer Screen by Clicking
We present a specialisation of Q-learning for the problem of training an agent to click on a computer screen. In this formulation, the agent receives the screen's pixels as input and selects a pixel to click as output. Selecting a pixel means choosing an action from a large discrete action space in which many actions are completely equivalent in terms of the resulting reinforcement-learning state transitions. We propose to exploit this redundancy by performing simultaneous Q-learning updates for all equivalent actions, using the flood-fill algorithm on the input image to determine action (pixel) equivalence.
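To make the idea concrete, below is a minimal tabular sketch, not the paper's implementation: the function names `flood_fill_labels` and `flood_fill_q_update`, the per-state `(h, w)` Q-array, and the choice of 4-connected flood fill over pixels with identical values are all illustrative assumptions. It shows the core mechanism: a single observed click produces one temporal-difference target, which is then broadcast to every pixel that flood fill judges interchangeable with the clicked one.

```python
import numpy as np
from collections import deque

def flood_fill_labels(image):
    """Label 4-connected regions of identical pixel value via flood fill.
    Pixels sharing a label are treated as equivalent actions.
    (Assumption: equivalence = same pixel value in a connected region.)"""
    h, w = image.shape[:2]
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue  # already assigned to a region
            value = image[sy, sx]
            labels[sy, sx] = next_label
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny, nx] == -1
                            and np.array_equal(image[ny, nx], value)):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels

def flood_fill_q_update(q, image, action, reward, next_q_max,
                        alpha=0.1, gamma=0.99):
    """Apply one Q-learning update to every pixel equivalent to `action`.
    `q` is an (h, w) array of Q-values for the current state (hypothetical
    tabular representation); `action` is the clicked (row, col) pixel;
    `next_q_max` is the max Q-value over actions in the next state."""
    labels = flood_fill_labels(image)
    mask = labels == labels[action]        # all pixels in the same region
    target = reward + gamma * next_q_max   # standard one-step Q target
    q[mask] += alpha * (target - q[mask])  # simultaneous update of the region
    return q
```

The design point is that the update rule itself is unchanged; only its scope widens from one action to an equivalence class of actions, so regions the screen renders identically (e.g. a button's interior) learn at the rate of a single shared action rather than pixel by pixel.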