The following two sub-sections summarize the two internal models that are jointly learnt and form the core of the proposed architecture: (1) internal model of the body and (2) internal model of the peripersonal space. An important feature of the architecture is that both these models are learnt from the same data generated during sensorimotor exploration of a robot in its reachable workspace through a process of random motor babbling. This is outlined in Fig. 2a which shows diagrammatically the main stages of the learning process of the two internal models. Finally, the third sub-section describes how the coupled interaction between these internal models provides us with a framework for spatial reasoning in a parallel assembly task.
Internal Body Model
The internal model of the body is based on the motor control theory known as the passive motion paradigm (PMP) [53,54,55, 66], which draws on prominent ideas such as synergy formation [59] and the Equilibrium Point Hypothesis [67, 68]. PMP offers a shared computational basis for the simulation and generation of action in articulated structures of arbitrary complexity and redundancy. Intuitively, the idea is that given a target for the robot's end-effector to reach, the process of determining the distribution of work across its joints can be represented as an 'internal simulation on the body model'. The internal simulation calculates how much each joint would move if an externally induced force (i.e. the goal) pulled the end-effector by a small amount towards the target. This relaxation process is like coordinating the movements of a puppet by means of attached strings: as the puppeteer pulls the task-relevant tip of the body to a target, the rest of the body elastically reconfigures to allow the tip to reach the target. Computationally, PMP can be defined in the following steps:
(1) Given a target, generate a target-centred virtual force field in the extrinsic space:
$$ F={K}_{\mathrm{ext}}\left({s}_{\mathrm{T}}-s\right) $$
where sT is the target, s is the current position of the end-effector, and Kext is the virtual stiffness of the attractive field in the extrinsic space. Kext determines the shape and intensity of the force field. In the simplest case, Kext is proportional to the identity matrix, corresponding to an isotropic field that converges to the target along straight flow lines.
(2) Map the force field from the extrinsic space into a virtual torque field in the intrinsic space:
$$ T=J{\left(q\right)}^{\mathrm{T}}\bullet F $$
where J(q)T is the transposed Jacobian matrix, which is always well-defined. In the next section, we show how these Jacobians can be derived from a learnt internal body model.
(3) Relax the arm configuration to the applied field:
$$ \dot{q}={A}_{\mathrm{int}}\bullet T $$
where Aint is the virtual admittance (or joint compliance) matrix in the intrinsic space, which determines the distribution of the torques among the joint rotations. In the simplest case, it is the identity matrix.
(4) Map the arm movement into the extrinsic workspace:
$$ \dot{s} = J(q)\bullet \dot{q} $$
(5) Integrate over time until equilibrium:
$$ s(t)={\int}_{t_0}^t\ J(q)\dot{q} d\tau $$
The integration in this last step yields a trajectory, with the equilibrium value s(t) defining the final position of the end-effector in the extrinsic space.
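To make the cycle concrete, the following is a minimal sketch of the PMP relaxation loop in Python. The callables `forward_kinematics` and `jacobian`, the scalar gains, and the step settings are illustrative placeholders standing in for the learnt body model and tuning described below, not the exact implementation used in our experiments.

```python
import numpy as np

def pmp_reach(q0, s_target, forward_kinematics, jacobian,
              K_ext=1.0, A_int=1.0, dt=0.01, n_steps=2000, tol=1e-3):
    """Relax a kinematic chain towards a target through the PMP cycle.

    forward_kinematics(q) -> end-effector position s, shape (3,)
    jacobian(q)           -> Jacobian J(q), shape (3, n_joints)
    K_ext and A_int are scalar stand-ins for the isotropic virtual
    stiffness and admittance (identity matrices in the simplest case).
    """
    q = np.asarray(q0, dtype=float).copy()
    s_target = np.asarray(s_target, dtype=float)
    trajectory = [forward_kinematics(q)]
    for _ in range(n_steps):
        s = forward_kinematics(q)
        F = K_ext * (s_target - s)        # (1) target-centred force field
        T = jacobian(q).T @ F             # (2) force field -> torque field
        q_dot = A_int * T                 # (3) relax joints to the field
        q = q + q_dot * dt                # (5) integrate over time
        trajectory.append(forward_kinematics(q))  # (4) back to extrinsic space
        if np.linalg.norm(s_target - trajectory[-1]) < tol:
            break                         # equilibrium: target reached
    return q, np.asarray(trajectory)
```

Because every step is a well-posed forward computation, the same loop can either drive the real robot or merely 'imagine' the movement by keeping the motor commands internal.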
To put it in words, at each time step of this cyclic computational process, the target goal sT in the extrinsic space induces virtual disturbance forces F on the end-effector, modulated by the virtual stiffness Kext. These forces are then mapped into equivalent torques T; this projection is implemented by the transpose Jacobian J(q)T. The virtual torques T cause incremental joint rotations \( \dot{q} \) as allowed by the compliances Aint of the different joints. The incremental change in joint space \( \dot{q} \) is mapped to the extrinsic space \( \dot{s} \) using the Jacobian matrix J(q), causing a small displacement of the end-effector towards the intended target. This process progresses cyclically until the algorithm converges to an equilibrium state at the target. While in-depth discussions on PMP, its extensions, and its implementations on various robotic platforms are available in [55, 69] and [66, 70], here we briefly highlight some key advantages of the PMP formulation over traditional analytical approaches in robotics:
1. There is no kinematic inversion or cost function optimization, since all computations are well-posed (one-to-one). Hence, PMP is computationally inexpensive, especially when coordinating highly redundant robots. In this sense, PMP is closely linked to other approaches based on active inference [71] that also avoid inverse kinematics. Two recent reviews discuss the pros and cons of these approaches in detail [72, 73].
2. A solution is guaranteed and there are no singularities. This applies even when the target is unreachable: in such a case, the final solution, i.e. the output of the forward model (the end-effector location), provides useful geometric information to trigger further reasoning, such as the required length of a tool to reach the target.
3. PMP offers runtime configurability. PMP networks can be assembled on the fly based on the nature of the motor task and the body chains or tools chosen for execution.
In the context of this article, what is important is that PMP is able not only to generate real actions but also to simulate imaginary movements, predicting the sensory consequences of the imagined actions. This is consistent with recent research confirming common underpinnings for real and imagined actions [39, 72]. In our view, the internal body model acts as a link, or middleware, between real and imagined actions. We hypothesize that running internal simulations on an interconnected set of neuronal networks is a main function of the body schema in humans, and that the proposed PMP model provides a possible computational formulation to explain the results from neuroscience. Further, PMP supports the idea that cognitive motor processes such as action planning share the same representations with motor execution [58]. This allows a cognitive agent to reason about and plan its actions in the environment beforehand in a goal-directed fashion [37, 38, 74]. In this sense, the PMP framework closely resonates with the embodied simulation hypothesis discussed in the Introduction.
Acquisition of the Internal Body Model
The body schema representation is acquired from random motor babbling movements of a robotic arm in its peripersonal space while the end-effector location is tracked through visual perception. This gives rise to sensorimotor data: a training set of joint rotation readings with the corresponding coordinates of the end-effector. Using joint angle vectors as input and the corresponding 3D end-effector location vectors as desired output, a feedforward neural network with two hidden layers is trained through backpropagation of error. Thereafter, the Jacobian, which represents the body schema and maps between the intrinsic and extrinsic spaces, can be recovered from the weights of the trained neural network. Below we describe the acquisition of the Jacobian matrix J(q).
Let a vector q represent the state of a robot in the intrinsic joint space for a given pose, and let a vector s identify the position of the end-effector of the robot in the extrinsic workspace for that pose. Then the differential form of the kinematic transformation s = f(q) can be expressed as \( \dot{\boldsymbol{s}} = J\left(\boldsymbol{q}\right)\bullet \dot{\boldsymbol{q}} \), where J(q) is the Jacobian matrix of the transformation. We train a multilayer feedforward neural network (see Fig. 2b) with two hidden layers to learn the mapping s = f(q), where q = {qi} is the input vector (of joint angles) and s = {sk} is the output vector (representing the 3D position/orientation of the end-effector). The vectors z = {zj} and y = {yl} are the outputs of the first and second hidden layer units, respectively. Equation 1 expresses the mapping, where Ω = {ωij} are the connection weights from the input layer to the first hidden layer, O = {ojl} are the connection weights between the hidden layers, W = {wlk} are the connection weights from the second hidden layer to the output layer, h = {hj} are the net inputs to the neurons of the first hidden layer, and p = {pl} are the net inputs to the second hidden layer. Neurons in the two hidden layers fire using the activation function g, the hyperbolic tangent tanh(); the output layer neurons are linear.
$$ \boldsymbol{s}=f\left(\boldsymbol{q}\right)\Rightarrow \left\{\begin{array}{l}{h}_j={\sum}_i{\omega}_{ij}\,{q}_i,\kern2em {z}_j=g\left({h}_j\right)\\ {}{p}_l={\sum}_j{o}_{jl}\,{z}_j,\kern2em {y}_l=g\left({p}_l\right)\\ {}{s}_k={\sum}_l{w}_{lk}\,{y}_l={\sum}_l{w}_{lk}\bullet g\left({\sum}_j{o}_{jl}\,{z}_j\right)\end{array}\right.\kern1em \Rightarrow \kern1em {s}_k={\sum}_l{w}_{lk}\bullet g\left({\sum}_j{o}_{jl}\bullet g\left({\sum}_i{\omega}_{ij}\,{q}_i\right)\right) $$
(1)
After training the neural network on the sensorimotor data generated by the robot, the Jacobian J(q) can be extracted from the learnt weight matrices using the chain rule, with g′ denoting the derivative of the activation function:
$$ J_{ki}\left(\boldsymbol{q}\right)=\frac{\partial {s}_k}{\partial {q}_i}=\sum \limits_l{w}_{lk}\cdot {g}^{\prime}\left({p}_l\right)\kern0.5em \sum \limits_j{o}_{jl}\cdot {g}^{\prime}\left({h}_j\right)\ {\omega}_{ij} $$
(2)
Since each robot has 6 joints, the input vector consists of 6 values and the target vector consists of the corresponding 3D location. The numbers of neurons in the first and second hidden layers were determined heuristically as 32 and 41, respectively. The network was trained using the Levenberg–Marquardt algorithm. Over five different training runs of 2000 epochs each, the network converged well, with an average root mean square error below 0.04 mm on the test set.
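As an illustration, the following sketch shows how Eqs. (1) and (2) translate into code. The weight matrix names and shapes are assumptions matching the architecture described above (6 inputs, hidden layers of 32 and 41 tanh units, 3 linear outputs, biases omitted as in Eq. (1)); this is not the exact implementation used in our system.

```python
import numpy as np

def body_model_forward(q, W1, W2, W3):
    """Forward pass of Eq. (1): joint angles q (6,) -> end-effector s (3,).
    Assumed shapes: W1 (32, 6), W2 (41, 32), W3 (3, 41)."""
    h = W1 @ q            # net input to the first hidden layer
    z = np.tanh(h)
    p = W2 @ z            # net input to the second hidden layer
    y = np.tanh(p)
    s = W3 @ y            # linear output layer
    return s, h, p

def body_model_jacobian(q, W1, W2, W3):
    """Eq. (2): J(q) via the chain rule, using tanh'(x) = 1 - tanh(x)**2."""
    _, h, p = body_model_forward(q, W1, W2, W3)
    D1 = np.diag(1.0 - np.tanh(h) ** 2)   # g'(h_j)
    D2 = np.diag(1.0 - np.tanh(p) ** 2)   # g'(p_l)
    return W3 @ D2 @ W2 @ D1 @ W1          # J(q), shape (3, 6)
```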
The Body Model in Goal-Directed Reasoning
The Jacobians extracted from the trained network are inserted at the appropriate steps of the PMP computational algorithm given above. They provide the PMP dynamics with a manipulable representation of the body, i.e. a body model that can be used as a forward/inverse model for goal-directed simulation and execution of actions in the real world. In the context of this work, a robot can use the acquired internal body model to 'imagine', i.e. simulate, a reaching movement to a given target and anticipate whether the target is reachable without making any real movement. This informs the robot about the feasibility of any planned reach and avoids the risk of unexpected failures. In addition, since the forward kinematics of the internal body model (neural PMP) computes a movement trajectory from the current to the goal position in the extrinsic space, the imagined movements of multiple robots in a shared workspace can be checked for possible collisions before any real movement is executed. An alternative course of action can then be planned to realize the goal in case collisions between the imagined movement trajectories are detected.
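Combining the two sketches above, such a reachability test could look as follows; `q_home`, `target`, and the tolerance are hypothetical values for illustration.

```python
# 'Imagine' a reach with the learnt body model and test feasibility
# before moving; W1, W2, W3 are the trained weight matrices from above.
fk = lambda q: body_model_forward(q, W1, W2, W3)[0]
jac = lambda q: body_model_jacobian(q, W1, W2, W3)

q_eq, traj = pmp_reach(q_home, target, fk, jac)
reachable = np.linalg.norm(traj[-1] - target) < 1.0   # e.g. within 1 mm
```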
Internal Model for Peripersonal Space Representation
In this section, we describe the internal model for a sparse representation of the peripersonal space of a robot. Such an internal representation is needed to perform a non-uniform quantization of the peripersonal space, thus simplifying the target selection during the assembly task. This representation of space is learnt using a Growing Neural Gas (GNG) algorithm [47] from the same data used to learn the internal model of body (neural PMP). GNG is an unsupervised incremental network model that can learn important topological relations in a given set of input vectors by means of a simple Hebb-like learning rule. The model continues learning, adding new neurons and connections until a performance criterion is met.
Acquisition of the Peripersonal Representation
To begin the learning process, the robot randomly explores different spatial locations in its workspace through motor babbling; these spatial locations/end-effector coordinates St form the sensory input to the growing neural gas. The free variables learnt by the algorithm are listed below; the size of each is indicated in parentheses.
1) N: number of neurons in the GNG network (N).
2) si: sensory weights for each neuron (N × D); these are randomly initialized. D is the dimensionality of the sensory space, here 3.
3) errori: local estimate of the representational error, i.e. the accumulated difference between the actual perception and the best matching unit in the GNG (N). This information is particularly useful for growing the GNG.
4) Ageij: age of each lateral connection, used for pruning excess and less valuable lateral connections in the GNG (N × N). All ages are initialized to zero.
5) Wij: lateral weights, i.e. edges that encode neighbourhood (N × N).
Below, the algorithm for learning the neural map from randomly generated sequences of sensory data S is outlined as a sequence of steps (a–g); a compact code sketch follows the list:
(a) Initialization: start with one single neuron with randomly initialized sensory weights si.
(b) Acting and observing: babble to a random location and acquire the sensory information St, where t stands for the time or iteration number of the exploration (see Fig. 3 for details).
(c) Estimating the winner: of all the neurons existing in the GNG at that point in time, find the neuron 'i' that shows maximum activity for the observed sensory stimulus St at time/iteration t. This amounts to finding the neuron 'i' whose sensory weights si minimize ‖si − St‖.
(d) Growing when needed: new neurons are incorporated into the GNG when the difference between the actual perception and the best matching unit 'i' becomes too large. To make this detection more robust, we assume that every neuron in the GNG maintains a measure of its own local representational error that accumulates over time. For this purpose, we use a low pass filter at a timescale τe = 10, as in the equation below:
$$ {\tau}_e\,\dot{error_i}=-{error}_i+\left(1-\frac{1}{\sqrt{2\pi }{\sigma}_e}{e}^{\frac{-{\left({s}_i-S\right)}^2}{2{\sigma}_e^2}}\right) $$
Whenever this error measure exceeds a vigilance threshold, errori > v (in our case v = 0.25), we generate a new neuron j with its codebook vector equal to the current perception. A Gaussian kernel width of σe = 1 was used.
(e) Adapting the sensory weights: adapt the sensory weights of the winner and its topological neighbours (all neurons laterally connected to the winning neuron) by small fractions ew and en of the distance, as follows:
$$ {\displaystyle \begin{array}{c}{s}_i\leftarrow {s}_i+{e}_w\left(S-{s}_i\right),\kern0.75em \mathrm{winner}\ i\\ {}{s}_n\leftarrow {s}_n+{e}_n\left(S-{s}_n\right),\kern0.75em \forall n\in \mathrm{Neighbours}(i)\end{array}} $$
ew, en ∈ [0, 1]. Setting ew and en too high usually results in an unstable network, with nodes moving around continually, while setting them too low makes training slow and ineffective. In all our experiments, we used ew = 0.04 and en = 0.0006.
(f) Adapting the lateral weights: lateral weights Wij are simply edges between neurons that encode neighbourhood and possible state transitions. These links permit the spreading of activity in the direction of the gradient of the value field and are locally adapted in response to dynamic changes in the world. We employ the simplest mechanism to organize the lateral weights, as proposed by Fritzke [47]: grow a lateral connection between successive best winning neurons 'k' and 'i' with a lateral weight initialized as Wik = 1, increment the age of all other neighbouring lateral connections, and finally prune off any connection whose age crosses a threshold Agemax (in our case equal to 25).
(g) Pruning: finally, eliminate the dead neurons (those with no lateral connections) and proceed with the next sensory observation and another incremental phase of learning the free variables using the procedure above. As newer regions in the workspace are explored, the internal map grows and becomes more densely connected. This process continues until the internal map becomes quasi-stationary.
During this process of training, the robot babbled to about 10,000 locations in its workspace before settling on a sparse representation of the explored area. Figure 3a shows the generated data points in the workspace of the TX robot and Fig. 3b shows the corresponding learnt GNG network, grown to a size of 478 interconnected neurons. Figure 3c shows the topology of the evolving GNG as new sensory information St keeps arriving, in other words as the robot explores more and more locations in its workspace. Another GNG network, representing the peripersonal space of the RX robot, is learnt using the same technique; in our results it grew to a size of 476 neurons. Figure 3d shows the overall structure of the workspace with the two robots and the two GNG networks representing their peripersonal spaces. The networks overlap in the intersecting peripersonal areas where both robots can operate. Before the next section describes how this representational scheme can be exploited as a general substrate for goal-directed planning and cooperation, we highlight below the relevance of the proposed approach from a neuroscientific perspective.
In humans, peripersonal space representations are pivotal in the sensory guidance of motor behaviour, allowing us to interact with objects and with other people in the space around us [75,76,77]. It is widely argued that body schema and peripersonal space representations share underlying brain networks. Recent studies have provided evidence that perceiving objects in peripersonal space activates a set of interconnected parietal and frontal areas overlapping with the set subtending voluntary motor action and motor imagery [75, 76, 78]. This line of research was one of the key motivations for designing a system with intertwined models of body schema and peripersonal space. Our model training approach is also consistent with the neuroscientific perspective that the internal models of both the space and the body are deeply intertwined [45] and synergistically interact to facilitate goal-directed behaviour, and hence should be developed in parallel [38, 46]. Furthermore, neuropsychological studies show that the brain constructs rapidly modifiable representations of space centred on different body parts (i.e. hand-centred, head-centred, and trunk-centred). The size of these peripersonal spaces also varies across stimulated body parts [79]. There is also convincing evidence that peripersonal space processing operates in a plastic and dynamic manner: for example, the peripersonal space of the arms extends with tool use [75] and shrinks after amputation [80]. In our case, the industrial robots TX and RX are not identical (TX is a smaller robot), and for many other tasks they are used separately, sometimes extended with tools coupled to their grippers. Hence, we deploy separate GNGs for the two robots.
Spatiotemporal Reward Field Dynamics for Goal-Directed Reasoning
Now that each robot has a representation of its workspace, we discuss the mechanisms that organize the action sequences of the robots in a goal-oriented way. By providing each robot with information on what object to act on and when, the two robots can complete the assembly task successfully. An optimal action sequence maximizes parallel operation of the two robots during assembly and minimizes reliance on collision-avoidance mechanisms that delay the assembly process.
To organize goal-oriented assembly, a reward-based neural field is applied to the GNG network on top of its sensory-input-driven neural field. The spatiotemporal dynamics of the reward field organizes the sequence of actions taken by each robot by prioritizing the actions that fetch maximum reward. Such a reward field can be based on a default plan [51, 52] or can be learnt by the robots through exploration and experience (see the Supplementary Information). The layout of our experimental setup is such that the positions of the robots are roughly symmetric along the y-axis of the workspace. Therefore, a good strategy is to structure the instantaneous reward a robot gains from acting on an object as a function of the distance (along the y-axis only) of the object from the robot. This encourages each robot to avoid entering the overlapping workspace (see Fig. 3c) and to remain within its unshared workspace for as many successful assemblies as possible. The default reward Ri associated with a neuron i is given by
$$ {R}_i=\frac{1}{Z}\left({e}^{\frac{-{\left({y}_i-Y\right)}^2}{2{\sigma}_R^2}}+{R}_c\right) $$
Here, yi denotes the y-value of the sensory weight of the ith neuron and Y is the y-coordinate of the robot's location with respect to the origin. Rc (=1 in our experiments) is a constant minimum reward to a robot for merely acting in the world. Z is chosen such that
$$ \sum \limits_i{R}_i=1 $$
This reward-based field dynamics elicits maximum reward for the objects that are minimally distant from the robot along the y-axis, thereby enforcing each robot to prioritize acting on such objects. Figure 4 shows the resulting reward functions for the two robots, used in all real-world assembly tasks.
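As an illustration, the default reward field over the GNG codebook could be computed as below; the kernel width `sigma_r` is an assumed parameter, as its value is not stated above.

```python
import numpy as np

def reward_field(gng_weights, robot_y, sigma_r=100.0, r_c=1.0):
    """Default reward R_i over GNG neurons (distance along y only),
    normalized so that sum_i R_i = 1. gng_weights: (N, 3) codebook."""
    y = gng_weights[:, 1]                           # y-component of each neuron
    r = np.exp(-(y - robot_y) ** 2 / (2 * sigma_r ** 2)) + r_c
    return r / r.sum()                               # the 1/Z normalization
```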
Localization of an Empty Area in the Workspace Using the Internal Model for Peripersonal Space
In response to a sensory input S, the activity xi of a neuron in the GNG network can be given by a Gaussian kernel which compares the sensory weight si of a neuron i with the current perceptual input S
$$ {x}_i=\frac{1}{\surd 2\pi {\sigma}_s}{e}^{-{\left({s}_i-S\right)}^2/2{\sigma}_s^2} $$
Clearly, for any sensory input (the 3D location (X, Y, Z) of the centroid of any object in the scene), the neuron whose internal sensory weights are closest to the incoming signal will exhibit the highest level of activity in the network and thus best represents the input. For any spatial layout with n objects in a robot's workspace, n neurons of the corresponding robot's GNG network will show maximal activations corresponding to the locations of the objects in the scene. Neurons near these 'most active neurons' will also elicit higher activity than the rest of the neurons in the network. Thus, regions in the network where neurons are more active represent the occupied areas of the workspace, whereas regions showing the least activity correspond to empty areas in the workspace of the robot.
During cooperative behaviour (see the 'Cooperation Between the Robots To Achieve Otherwise Unrealizable Goals' section), a robot may need to move objects from its own non-shared workspace into the shared workspace for the other robot to work with. Since the shared workspace can be cluttered with objects, the spatial reasoning system must find an empty area in the shared workspace for the robot to place down an object (fuse), so as to avoid collisions of the robot gripper with objects in the scene. Using the spatial configuration as the sensory input to the internal model of a robot's peripersonal space, we look at the activity of all the neurons in the GNG network representing the shared workspace. The activity of each neuron results from the cumulative influence of the sensory inputs that fall within a threshold distance of the neuron's weights; in our experiments, the threshold distance equals the width of the robot's gripper. The neuron showing the least activity has weights most distant from the objects lying within the threshold distance. We therefore look for the neuron i with activity xi
$$ {x}_i=\underset{i}{\min}\sum \limits_{j\in T}\frac{1}{\sqrt{2\pi }{\sigma}_s}{e}^{-{\left({s}_i-{S}_j\right)}^2/2{\sigma}_s^2} $$
where Sj is the 3D location of every object that lies within the distance threshold T from the neuron’s sensory weights si. The area centred at the location of this neuron is an empty area where the object can be safely placed down without colliding with other objects in the workspace. Later in the ‘Cooperation Between the Robots To Achieve Otherwise Unrealizable Goals’ section, we show a case in which empty areas of specific radii are localized and exploited for cooperative behaviour between robots in situations where a single robot cannot realize an assembly task on its own.
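A sketch of this empty-area search, under the same assumptions as the earlier snippets (`sigma_s` and the data layout are illustrative), could be:

```python
import numpy as np

def find_empty_area(gng_weights, object_locations, threshold, sigma_s=1.0):
    """Return the index of the GNG neuron whose neighbourhood is least
    occupied. threshold ~ gripper width; object_locations: (n, 3)."""
    best_i, best_activity = -1, np.inf
    for i, s_i in enumerate(gng_weights):
        d = np.linalg.norm(object_locations - s_i, axis=1)
        nearby = d[d < threshold]                 # objects within threshold T
        activity = np.sum(np.exp(-nearby ** 2 / (2 * sigma_s ** 2))
                          / (np.sqrt(2 * np.pi) * sigma_s))
        if activity < best_activity:
            best_i, best_activity = i, activity
    return best_i   # its weights give the centre of a safe place-down area
```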
Coupled Interaction Between Internal Models of Body and Space
The problem of two robots performing parallel assemblies in a shared workspace is challenging from the perspectives of both reasoning and control. Planning action sequences (what objects to act on and when) for time-efficient assembly while working safely and robustly requires the coherent integration of multiple subsystems providing different sub-functionalities. The overall mechanism conceived to deal with this multi-faceted problem of efficient parallel assembly, described in 'The Robots and the Experimental Setup' section, results from the synergistic interaction of the subsystems discussed in 'The Computational Framework' section. Figure 5 depicts a block diagram of the main components of the reasoning system and the flow of information between them. Below we outline the functionalities provided by these subsystems and their coupled interactions in realizing the assembly task:
1) Peripersonal space models with reward dynamics: these GNG models represent (1) the reachable workspace of each robot, with a reward structure that evolves over the course of the assembly process as the scenario changes; (2) the shared workspace between the robots; and (3) the empty and occupied regions in the peripersonal spaces of the two robots. These representations work as resource allocators, as described in the 'Internal Model for Peripersonal Space Representation' section: based on the spatial configuration, they allocate sub-goals (fuses and holes) to both robots using the reward field dynamics. From the spatial layout of the scene as perceived by the visual system, the configuration of the scene (i.e. the different objects and their 3D locations in the workspace) generates neural field activity in the two GNG models; in other words, the neural activity in these GNG networks is an internal representation of the spatial layout of the scene outside. Because of the reward structures imposed on the two GNG networks, each network elicits the highest reward for a particular object. Objects fetching the highest rewards become targets, and their locations are forwarded to the internal body model. Target selection occurs in two steps: a robot's GNG selects a target fuse first, followed by a target hole whenever available.
2) Internal models of the body for simulated and real actions: given the reachable target goals for the two robots, the PMP-based internal body models simulate solutions in the joint space of the robots (motion trajectories) to reach the targets.
Collision detection follows. From the anticipated motion trajectories, possible collisions during parallel execution of the simulated trajectories are estimated. To implement this, the 3D locations of the two distal joints (nearest to the end-effector) of both robots are computed at multiple points along the two anticipated motion trajectories, using the forward kinematics of the internal models (see the sketch after this list). If the calculated distance between the joints in the extrinsic space falls below a threshold value at any time during the simulated motion, a possible collision is detected. If no collision is detected, the synthesized motor commands corresponding to the two trajectories are forwarded to the robots, which operate in the workspace in parallel. For example, the bottom left panel of Fig. 5 shows the two robots performing different sub-tasks in simulation (e.g. grasping two different fuses).
Objects in the peripheries of the overall workspace are assembled first. However, as the scenario evolves during the assembly process, the changing spatial layout, and hence the reward structure, pushes the two robots to operate towards the middle of the overall workspace: the neural activity fields in the two GNG networks with reward dynamics select increasingly closely lying targets. Hence, despite the reward-efficient allocation of sub-goals by the internal models of peripersonal space, collisions remain possible in the shared workspace. Whenever possible collisions are detected from the anticipated motion trajectories as described above (see for example the simulation result in the bottom middle panel of Fig. 5), the movements of the two robots are re-planned to avoid them. Collision avoidance is implemented by serializing the two robots' actions one after the other: one robot is alternately allowed to complete its movement (a sub-task) while the other robot is kept away from the shared workspace at an initialized location (Fig. 5 bottom right panel). Each sub-task completion is followed by moving that robot to an initialization position and allowing the other robot to complete its own sub-task.
However, if a robot's internal peripersonal space model selects a fuse but cannot find a reachable fuse-box hole in the next target selection step, the model selects an empty location in the shared workspace where the robot places the fuse. After this object relocation, the other robot will find both the fuse and the fuse-box reachable and will trigger the assembly.
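A minimal version of the trajectory collision check referenced above might look as follows; for brevity it compares single points (e.g. end-effector positions) per time step rather than the two distal joints of each robot.

```python
import numpy as np

def trajectories_collide(traj_a, traj_b, min_distance):
    """traj_a, traj_b: (T, 3) arrays of anticipated positions sampled at
    the same time steps. Returns True if the two robots would come
    closer than min_distance at any step of the simulated motion."""
    n = min(len(traj_a), len(traj_b))
    gaps = np.linalg.norm(np.asarray(traj_a)[:n] - np.asarray(traj_b)[:n], axis=1)
    return bool(np.any(gaps < min_distance))
```

If this check returns True, the two sub-tasks are serialized as described above; otherwise, the motor commands are executed in parallel.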
The next section presents the experimental results which demonstrate the coupled interaction between the internal models for realization of the task of parallel assembly.