ICANN 98, pp 1109–1114

Reinforcement Learning of Collision-free Motions for a Robot Arm with a Sensing Skin

  • Pedro Martín
  • José del R. Millán
Conference paper
Part of the Perspectives in Neural Computing book series (PERSPECT.NEURAL)


Sensory information is fundamental for autonomous robots that face unknown environments. On-line sensing allows a robot arm to modify its motion in real time to cope better with the environment. Reactive systems (e.g., [1]) are appropriate for generating on-line motions from local sensory data, and a reactive controller can be built automatically by using artificial neural networks and reinforcement learning (RL) [2,3,4]. RL allows a neural network to acquire reaction rules while the robot arm interacts with its environment. We have previously demonstrated the feasibility of RL for acquiring sensor-based reaching strategies for simulated multi-link planar manipulators [5].

In this paper, we extend this work to a real manipulator, namely a Zebra ZERO, that has a whole-arm sensing skin with sonar proximity sensors (see Fig. 1a). We describe a neural reactive controller that learns goal-oriented, obstacle-avoiding motion strategies for such a manipulator in unknown 3D environments. The controller is made up of two main modules: a reinforcement-based action generator (AG) and a goal vector generator (GG). The AG uses local sensory data and position information to determine an appropriate deviation from the goal vector given by the GG.

The task of collision-free reaching can be decomposed into two sequential subtasks: Negotiate Obstacles (NO subtask) and Move to Goal position (MG subtask). When the robot arm is not near the goal position and detects an obstacle on its way to the goal, the best strategy is to focus on negotiating the obstacle; moving along an efficient trajectory is not so important in that situation.


Keywords: Reinforcement Learning · Goal Location · Motion Strategy · Goal Position · Goal Vector




References

  1. Brooks R. A. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation 1986; 2:14–23.
  2. Barto A. G., Sutton R. S., Anderson C. W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. on Systems, Man, and Cybernetics 1983; 13:834–846.
  3. Sutton R. S. Learning to predict by the methods of temporal differences. Machine Learning 1988; 3:9–44.
  4. Williams R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 1992; 8:229–256.
  5. Martín P., Millán J. del R. Learning reaching strategies through reinforcement for a sensor-based manipulator. Neural Networks 1998; 11:359–376.
  6. Jordan M. I., Rumelhart D. E. Forward models: Supervised learning with a distal teacher. Cognitive Science 1992; 16:307–354.
  7. Kindermann J., Linden A. Inversion of neural networks by gradient descent. Journal of Parallel Computing 1992; 14:277–286.

Copyright information

© Springer-Verlag London 1998

Authors and Affiliations

  • Pedro Martín¹
  • José del R. Millán²

  1. Dept. of Computer Science, University of Jaume I, Castellón, Spain
  2. ISIS, Joint Research Centre, European Commission, Ispra (VA), Italy
