Reinforcement Learning of Collision-free Motions for a Robot Arm with a Sensing Skin
Sensory information is fundamental for autonomous robots that face unknown environments. On-line sensing allows a robot arm to modify its motion in real time to cope better with the environment. Reactive systems are well suited to generating on-line motions from local sensory data, and a reactive controller can be implemented automatically by combining artificial neural networks with reinforcement learning (RL) [2,3,4]. RL allows a neural network to acquire reaction rules while the robot arm interacts with its environment. We have previously demonstrated the feasibility of RL for acquiring sensor-based reaching strategies for simulated multi-link planar manipulators.

In this paper, we extend this work to a real manipulator, a Zebra ZERO equipped with a whole-arm sensing skin of sonar proximity sensors (see Fig. 1a). We describe a neural reactive controller that learns goal-oriented, obstacle-avoiding motion strategies for such a manipulator in unknown 3D environments. The controller comprises two main modules: a reinforcement-based action generator (AG) and a goal vector generator (GG). The AG uses local sensory data and position information to determine an appropriate deviation from the goal vector supplied by the GG.

The task of collision-free reaching can be decomposed into two sequential subtasks: Negotiate Obstacles (NO subtask) and Move to Goal position (MG subtask). When the robot arm is not near the goal position and detects an obstacle in its way to the goal, the best strategy is to focus on negotiating the obstacle; moving along an efficient trajectory is less important.
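The two-module structure and the subtask switch can be illustrated with a minimal sketch. Everything here is a hedged assumption for illustration: the function names, the simple linear-plus-tanh stand-in for the learned AG network, and the threshold-based switch between the NO and MG subtasks are not the paper's actual implementation.

```python
import numpy as np

def goal_vector(pos, goal_pos):
    """GG (assumed form): unit vector from the current position toward the goal."""
    v = goal_pos - pos
    n = np.linalg.norm(v)
    return v / n if n > 0 else np.zeros_like(v)

def action_generator(sensor_readings, goal_vec, policy_weights):
    """AG (stand-in for the learned network): maps local proximity
    readings plus the goal vector to a bounded deviation vector."""
    x = np.concatenate([sensor_readings, goal_vec])
    return np.tanh(policy_weights @ x)  # bounded deviation from the goal direction

def reactive_step(pos, goal_pos, sensor_readings, policy_weights,
                  obstacle_thresh=0.3, step=0.05):
    """One control cycle: choose a direction and take a small step.
    The threshold switch between subtasks is an illustrative assumption."""
    g = goal_vector(pos, goal_pos)
    if sensor_readings.min() < obstacle_thresh:
        # NO subtask: an obstacle is near -- let the learned deviation dominate
        direction = g + 2.0 * action_generator(sensor_readings, g, policy_weights)
    else:
        # MG subtask: path is clear -- move straight toward the goal
        direction = g
    n = np.linalg.norm(direction)
    return pos + step * (direction / n if n > 0 else g)
```

With zero policy weights and no nearby obstacle, the controller simply steps along the goal vector; training would shape the AG so that the deviation steers the arm around detected obstacles during the NO subtask.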
- Sutton, R. S. Learning to predict by the methods of temporal differences. Machine Learning 1988; 3:9–44.