Motivated Reinforcement Learning for Improved Head Actuation of Humanoid Robots
The ability of an autonomous agent to self-localise within its environment depends critically on its ability to make accurate observations of static, salient features. This has driven considerable research into the development and improvement of feature extraction and object recognition algorithms, both within RoboCup and the robotics community at large. This paper instead focuses on a rarely considered problem imposed by the limited field of view of humanoid robots: determining an optimal policy for actuating a robot’s head so that it observes the regions of the environment that maximise the positional information gained. The task is complicated by a number of common computational issues, specifically high-dimensional state spaces and noisy environmental observations. This paper details the application of motivated reinforcement learning to partially overcome these issues, yielding an 11% improvement (relative to the null case of a uniformly distributed actuation policy) in both self-localisation and ball-localisation for an agent trained online for less than one hour. The method is demonstrated to be a viable means of improving self-localisation in robotics, without requiring further optimisation of object recognition or tuning of probabilistic filters.
Keywords: motivated reinforcement learning · localisation · Fourier basis · head actuation · simulated curiosity
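As a concrete illustration of the value-function machinery named in the keywords, the sketch below pairs an order-n Fourier basis (Konidaris et al.'s construction: cosine features over states normalised to the unit hypercube) with linear Q-learning over a (pan, tilt) head state. This is a minimal sketch of the general technique only, not the paper's implementation: the class names, discrete action set, hyperparameters, and the use of a plain extrinsic reward are all illustrative assumptions, whereas the paper's agent learns from a motivated, curiosity-driven reward signal.

```python
import itertools
import numpy as np

class FourierBasis:
    """Order-n Fourier basis over states normalised to [0, 1]^dims.

    Feature i is cos(pi * c_i . s), one feature per multi-index
    c_i in {0, ..., order}^dims.
    """
    def __init__(self, dims, order):
        self.coeffs = np.array(
            list(itertools.product(range(order + 1), repeat=dims)))

    def features(self, state):
        return np.cos(np.pi * (self.coeffs @ state))

class LinearQ:
    """Linear Q-learning: one weight vector per discrete head action
    (e.g. pan left / pan right / tilt up / tilt down)."""
    def __init__(self, basis, n_actions, alpha=0.01, gamma=0.9):
        self.basis, self.alpha, self.gamma = basis, alpha, gamma
        self.w = np.zeros((n_actions, len(basis.coeffs)))

    def q(self, state, action):
        return self.w[action] @ self.basis.features(state)

    def update(self, s, a, reward, s_next):
        # Standard TD(0) Q-learning update with linear approximation;
        # a curiosity-driven agent would substitute an intrinsic
        # "interest" signal for `reward` here.
        target = reward + self.gamma * max(
            self.q(s_next, b) for b in range(len(self.w)))
        td_error = target - self.q(s, a)
        self.w[a] += self.alpha * td_error * self.basis.features(s)
```

The Fourier basis is attractive in this setting because the (pan, tilt) state is low-dimensional and naturally bounded, so a small order already gives a smooth global approximation of the value surface without hand-designed tiling.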