Abstract
We propose a reinforcement learning approach that combines an asynchronous actor-critic model with a recurrent model of visual attention. Instead of processing the full visual information of the scene, the model accumulates foveal information from a sequence of controlled glimpses and thereby reduces the complexity of the network. With this model, an artificial agent is able to solve a challenging “mediated interaction” task. In such tasks, the desired effects cannot be created through direct interaction; instead, the learner must discover how to exert suitable effects on the target object by involving a “tool”. To learn the given mediated interaction task, the agent “actively” searches for salient points in the environment by taking a limited number of fovea-like glimpses, and then uses the accumulated information to decide which action to take next.
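The following is a minimal PyTorch sketch of the architecture the abstract describes: a recurrent core that accumulates foveal glimpses, topped with actor-critic heads. It is an illustrative reconstruction, not the authors' implementation; all layer sizes, names, and the way the “what” (glimpse) and “where” (location) codes are fused are assumptions, and the A3C training loop (asynchronous workers updating shared parameters) is omitted.

```python
import torch
import torch.nn as nn

class RecurrentAttentionActorCritic(nn.Module):
    """Sketch: glimpse-based recurrent attention with actor-critic heads."""

    def __init__(self, glimpse_px=8, hidden=128, num_actions=4):
        super().__init__()
        # Encode the foveal glimpse ("what") and its location ("where").
        self.what = nn.Sequential(nn.Linear(glimpse_px * glimpse_px, hidden), nn.ReLU())
        self.where = nn.Sequential(nn.Linear(2, hidden), nn.ReLU())
        # LSTM core accumulates information across the sequence of glimpses.
        self.core = nn.LSTMCell(hidden, hidden)
        # Heads: where to glimpse next, action policy (actor), state value (critic).
        self.next_loc = nn.Linear(hidden, 2)
        self.policy = nn.Linear(hidden, num_actions)
        self.value = nn.Linear(hidden, 1)

    def step(self, glimpse, loc, state=None):
        g = self.what(glimpse.flatten(1)) + self.where(loc)  # fuse what/where (assumed additive)
        h, c = self.core(g, state)
        return torch.tanh(self.next_loc(h)), self.policy(h), self.value(h), (h, c)


# Usage: accumulate a fixed number of glimpses, then choose a motor action.
model = RecurrentAttentionActorCritic()
state = None
loc = torch.rand(1, 2) * 2 - 1        # per note 2, the first glimpse location is random
for _ in range(6):                    # e.g. 6 glimpses, as in the linked demo movie
    glimpse = torch.rand(1, 8, 8)     # stand-in for a foveal crop of the scene
    loc, logits, value, state = model.step(glimpse, loc, state)
action = torch.distributions.Categorical(logits=logits).sample()
```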
Notes
- 1.
- 2.
The very first glimpse of each step is always random.
- 3.
- 4.
I.e., the distance to the domain's origin becomes smaller than the radius of the goal area (written as an inequality after these notes).
- 5.
A short movie of the learned policy for the model using 6 glimpses and a \(20 \times 20\) context image can be found at https://doi.org/10.4119/unibi/2934182.
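For concreteness, the success criterion of note 4 can be stated as a single inequality; here \(\mathbf{p}\) (the target object's position in domain coordinates) and \(r_{\text{goal}}\) (the radius of the goal area) are illustrative symbols, not notation from the paper:

\[
\text{success} \iff \lVert \mathbf{p} \rVert_2 < r_{\text{goal}}
\]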
Acknowledgment
This work was supported by the Cluster of Excellence Cognitive Interaction Technology ‘CITEC’ (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fleer, S., Ritter, H. (2020). Solving a Tool-Based Interaction Task Using Deep Reinforcement Learning with Visual Attention. In: Vellido, A., Gibert, K., Angulo, C., Martín Guerrero, J. (eds) Advances in Self-Organizing Maps, Learning Vector Quantization, Clustering and Data Visualization. WSOM 2019. Advances in Intelligent Systems and Computing, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-030-19642-4_23
DOI: https://doi.org/10.1007/978-3-030-19642-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19641-7
Online ISBN: 978-3-030-19642-4
eBook Packages: Intelligent Technologies and Robotics (R0)