# Synthesis of kinematically constrained full-body motion from stochastic motion model

## Abstract

Humanoid robots are expected not only to understand human behaviors but also to perform human-like actions in order to be integrated into our daily lives. Learning by imitation is a powerful framework that can allow robots to generate the same motions that humans do. However, generation of motions by robots that are precisely the same as learned motions is not typically helpful in real environments, which are likely to be different from the environment where the motions were learned. For example, a robot may learn to reach for a glass on a table, but this motion cannot be used as is to reach for a cup on a cupboard shelf because the location of the cup is different from the location of the glass. Objects manipulated by robots and humans can be located in a variety of places. The robots therefore have to synthesize motions that depend on the current environment in order to reach a target object. Adaptive motion synthesis from memorized motions is an essential technique for allowing robots to perform human-like motions and accomplish motion tasks. This paper proposes a novel approach to synthesize full body motion by using both motions encoded as Hidden Markov Models and kinematic task constraints. We design an objective function that evaluates similarity between synthesized and memorized motions, satisfaction of the kinematic constraints, and smoothness of the generated motion. We develop an algorithm to find a motion trajectory that maximizes this objective function. The experiments demonstrate the utility of the proposed framework for the synthesis of full body motions by robots.

## Keywords

Motion synthesis · Motion primitive · Stochastic model

## 1 Introduction

Humanoid robots are expected to understand gesture and language, communicate with humans, and be integrated into our daily lives. Recently, approaches that allow robots to acquire motions and the language relevant to them have been proposed. These approaches are based on the mimesis theory suggested by Donald, which supports the hypothesis that humans communicated with each other by using gestures before they acquired language and that gestures underlie symbolic and linguistic mechanisms (Donald 1991).

Representations of motion primitives can be grouped according to two approaches. One approach reduces motions into a set of parameters of a dynamical system, such as systems of differential equations or neural networks (Okada et al. 2002; Ijspeert et al. 2003; Kadone and Nakamura 2005; Ito et al. 2006). Another approach is based on stochastic models, as typified by Hidden Markov Models (HMMs) (Inamura et al. 2004; Asfour et al. 2006; Billard et al. 2006; Kulic et al. 2008). In both approaches, motions are encoded as parameters in dynamical systems or stochastic models. Robots can memorize motions as stochastic models and subsequently reuse the models for motion recognition and synthesis. In particular, efficient algorithms for stochastic motion models have been developed to compute the likelihood that a stochastic model generates a specific data sequence and to optimize the parameters such that they maximize the likelihood that the model generates the sequences present in training data. These algorithms can be used to recognize observed motions and to learn motion primitives. Stochastic approaches have the additional advantage that they can be combined with natural language models. In the field of natural language processing, many natural language models have been developed by using stochastic models based on the same graphical structures as the motion models. The combination of the motion models and the natural language models makes it possible for robots to understand motions as a form of language (Takano et al. 2007; Takano and Nakamura 2008; Takano et al. 2018).

Stochastic motion models are efficiently used for motion recognition. Robots are expected not only to recognize observed motions but also to synthesize human-like motions while interacting with human partners. Motion synthesis from stochastic motion models is therefore a key technology for improving human–robot interaction. Several approaches to synthesizing motions from HMMs according to the distribution of training motions have been developed (Inamura et al. 2004; Calinon et al. 2010). Because the synthesized motions are likely to be the average of the training motions, they cannot be used to adaptively synthesize motions in environments different from where the motions were originally learned. For example, if a robot learns a motion for grasping an object on a table, then the robot can reach for an object located at the same position by using the motion as learned. However, the robot cannot reach for an object on a shelf without modifying the learned motion, since the object is located at a different position on the shelf than the learned position on the table. Adapting synthesized motions to the environment on the basis of demonstrated motions is an ability that robots require in order to perform flexible manipulation.

This paper proposes a novel approach to synthesizing full body motion that satisfies kinematic constraints from HMMs of motion models for humanoid robots. The proposed approach introduces an objective function consisting of three terms: a likelihood that an HMM generates the motion, distances between the synthesized motion and kinematic constraints, and continuity of the synthesized motion. Computation for motion synthesis can be formulated by optimization of the motion such that the motion maximizes the objective function. Thus, the optimization results in motion that not only resembles the learned motion but also satisfies kinematic task constraints. This approach makes it possible for robots both to recognize human motions and to synthesize human-like motions adaptive to the environment only using a stochastic motion model.

## 2 Related work

Several techniques have been developed to synthesize motions from motion models (Inamura et al. 2004; Herzog et al. 2008; Sugiura and Iwahashi 2008; Kunori et al. 2009; Calinon et al. 2010). Inamura et al. have proposed a novel method to synthesize motions from HMMs. This method consists of two steps. The first step samples a node according to the probabilities of node transitions in the HMM. The second step samples a vector of joint angles according to the distribution of joint angles at the sampled node. The iteration of these two steps results in a motion trajectory that is represented by a sequence of the joint angles (Inamura et al. 2004). Calinon et al. (2010) have extended HMMs to joint angles and joint angular velocities, using a method that calculates the distribution of the joint angular velocities given the joint angles at the previous frame, samples the joint angular velocities based on the distribution, and updates the joint angles at the current frame. This framework can establish a dynamical system from demonstrations of motions. Although these types of approaches can reproduce motion trajectories, they cannot be used to adaptively synthesize motions in the current environment.
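The two-step sampling scheme described above can be illustrated with a minimal sketch: a toy left-to-right HMM over a 1-D "joint angle" with Gaussian outputs, alternating node sampling and observation sampling. All parameters here are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy left-to-right HMM: A[i, j] is the node-transition probability,
# and each node i emits a Gaussian observation N(mu[i], sigma[i]^2).
A = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
pi = np.array([1.0, 0.0, 0.0])
mu = np.array([0.0, 0.5, 1.0])
sigma = np.array([0.05, 0.05, 0.05])

def sample_trajectory(n_frames):
    """Alternate node sampling and observation sampling (Inamura-style)."""
    q = rng.choice(len(pi), p=pi)                  # sample the initial node
    traj = []
    for _ in range(n_frames):
        traj.append(rng.normal(mu[q], sigma[q]))   # sample a joint angle at node q
        q = rng.choice(A.shape[1], p=A[q])         # sample the next node
    return np.array(traj)

traj = sample_trajectory(30)
```

Because each run resamples both the node sequence and the observations, repeated runs scatter around the average of the training distribution, which is exactly why this scheme alone cannot adapt to new target configurations.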

Herzog et al. (2008) used a Parametric Hidden Markov Model (PHMM) to recognize and synthesize object manipulation motions. PHMMs allow representing typical motions and compensating for different motions of a particular type. A set of motions is encoded into an HMM as a representative motion, and each motion variant is encoded into its own HMM. The parameters of each motion variant’s HMM are represented as a linear function of the parameters of the representative motion’s HMM and the position of the object. Thus, the individual motions can be parameterized by the position of the object. The performance of PHMMs critically depends on the number of demonstrations of manipulation motions, and it is difficult to apply the synthesized motion to situations that are markedly different from those in the demonstrations. Calinon et al. (2013) proposed an approach to synthesize adaptive motions by using descriptors of both the motion and the manipulated object. The complexity of the descriptor is critical for systems with high degrees of freedom, such as humanoid robots. Paraschos et al. developed Probabilistic Movement Primitives (ProMPs) for representing distributions over motions. A feedback controller can be stochastically derived that synthesizes motion reaching the target positions while maintaining the given motion distribution. In Paraschos et al. (2013), the distribution of parameters is characterized by a single Gaussian distribution across the whole motion trajectory. This specialization is necessary to allow efficient searching of the configuration space for useful parameters, but it restricts the variations of motions that can be synthesized. Additionally, the performance of ProMPs on motion recognition has not been investigated.

A dynamical system with states represented by nonlinear differential equations is widely used for encoding a training motion trajectory and subsequently synthesizing robot motion adapted to different configurations. Okada et al. defined the dynamics as a vector field. In that framework, a motion pattern is expressed by a closed curve in the vector field, and they proposed a design for such vector fields that would attract a robot’s motion to a reference motion (Okada et al. 2002). This attractor field stabilizes the robot’s motion even when the robot is perturbed by the external environment. This framework can be used to synthesize a closed motion trajectory, but it does not aim at modifying training motions to allow reaching target configurations at specified times. Additionally, it is difficult to use this framework to recognize observed motions. The Dynamic Movement Primitives (DMPs) framework is a popular approach for learning a single trajectory by using a spring–damper model that includes a forcing term used to attract a synthesized trajectory to a goal configuration (Ijspeert et al. 2001). The single-trajectory nature of the framework results in limited scalability and makes it difficult to apply the framework in real environments, where robots are required to perform various motions and adapt them to the situation. Matsubara et al. (2011) extended the DMPs framework to allow learning multiple motions and demonstrated the highly scalable nature of the modified framework by trials on a dual-arm robot. Prada et al. (2013) also specialized the DMPs framework, developing a robot that can pass an object to a human partner. Their approach enables handling of a moving goal. In these frameworks, only a single goal is given for the final phase. Although the resultant trajectory reaches this goal, it is difficult to modify the motion across the whole trajectory; more specifically, the setting of multiple sequential goals (i.e., via-points) is not supported. This restriction leads to limited adaptivity. Additionally, the DMPs framework has difficulty with recognizing human demonstrations. Noda et al. (2014) developed a framework for representation of sensory–motor data, using deep neural networks for this purpose. In that framework, a robot learns a sequence of multimodal data from vision, motor, and audio sensors. An auto-encoder model then extracts low-dimensional features from the sensory images, and these image features and motion data are integrated by another auto-encoder model. Neural networks trained in this way enable a humanoid robot to perform whole body motion that is responsive to sensory perceptions. However, the synthesized motion is not guaranteed to reach target positions.
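For contrast with the stochastic models, the spring–damper structure of a discrete DMP can be sketched in a few lines: a critically damped pull toward the goal plus a forcing term gated by a decaying phase variable. The gains below and the zero forcing term are illustrative assumptions, not the values used in the cited work.

```python
import numpy as np

def dmp_rollout(y0, goal, forcing, n_steps=300, dt=0.01, tau=1.0,
                alpha_z=25.0, beta_z=25.0 / 4.0, alpha_x=8.0):
    """Euler-integrate a single 1-D discrete DMP (spring-damper + forcing term)."""
    y, v, x = y0, 0.0, 1.0
    traj = []
    for _ in range(n_steps):
        f = forcing(x) * x * (goal - y0)       # forcing term, gated by phase x
        v += dt / tau * (alpha_z * (beta_z * (goal - y) - v) + f)
        y += dt / tau * v
        x += dt / tau * (-alpha_x * x)         # canonical (phase) system decays to 0
        traj.append(y)
    return np.array(traj)

# With a zero forcing term the trajectory simply converges to the single goal.
traj = dmp_rollout(y0=0.0, goal=1.0, forcing=lambda x: 0.0)
```

The sketch makes the limitation discussed above concrete: only the single `goal` shapes the final phase, and there is no slot for via-points along the way.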

## 3 Synthesis of motions with constraints from motion models

A full-body motion of a human or a humanoid robot is represented by a vector of states, \({{\varvec{x}}}_t\) at time *t*, such as joint angles or positions. A motion segment is expressed by a motion trajectory \({{\varvec{x}}}_1, {{\varvec{x}}}_2, \ldots , {{\varvec{x}}}_n\). The motion segment is encoded into a set of parameters of an HMM, and this set of parameters is hereinafter referred to as a “motion model”. The HMM is a stochastic model and can be defined in a compact notation \(\lambda = \left\{ {{\varvec{A}}},{{\varvec{B}}},{\varvec{\varPi }} \right\} \), where \({{\varvec{A}}}=\left\{ a_{ij}\right\} \) is the matrix whose entries \(a_{ij}\) are the probabilities of transitioning from the *i*th node to the *j*th node, \({{\varvec{B}}}\) is the set of output probability density functions at the nodes, and \({\varvec{\varPi }}=\left\{ \pi _1, \pi _2, \ldots , \pi _n\right\} \) is the vector whose entries \(\pi _i\) are the probabilities of starting at the *i*th node. Encoding of the motion segment optimizes the parameters of the HMM such that the likelihood that the HMM generates the motion segment is maximized. This optimization is achieved using the Baum–Welch algorithm, which is an expectation-maximization (EM) algorithm (Rabiner and Juang 1993). HMMs can be used for motion recognition: recognition finds the HMM, from among a set of HMMs, that is the most likely to generate the observed motion, so that the observation is recognized as the motion model corresponding to that HMM. HMMs can also be used to generate motion trajectories. Thus, a humanoid robot memorizes motion segments in the form of motion models through observations and demonstrations, recognizes observed sequences, and generates its own motions by using the motion models.
This section describes a novel approach to synthesizing kinematically constrained motion from a motion model in a way that allows humanoid robots to make human-like motions and achieve tasks dependent on the environment, as shown in Fig. 1.
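The recognition step just described can be sketched with the scaled forward algorithm, which computes the log-likelihood that each memorized HMM generates an observed sequence and picks the most likely model. The two toy 1-D models below (`reach_up`, `reach_down`) and all their parameters are invented for illustration.

```python
import numpy as np

def log_likelihood(obs, pi, A, mu, sigma):
    """Scaled forward algorithm: log P(obs | lambda) for a 1-D Gaussian-output HMM."""
    def emit(o):
        return np.exp(-0.5 * ((o - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    alpha = pi * emit(obs[0])
    log_p = 0.0
    for o in obs[1:]:
        c = alpha.sum()                  # scaling keeps alpha from underflowing
        log_p += np.log(c)
        alpha = (alpha / c) @ A * emit(o)
    return log_p + np.log(alpha.sum())

def recognize(obs, models):
    """Pick the motion model whose HMM most likely generated the observation."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Two toy left-to-right models: a rising and a falling 1-D trajectory.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
sigma = np.array([0.1, 0.1, 0.1])
models = {
    "reach_up":   (pi, A, np.array([0.0, 0.5, 1.0]), sigma),
    "reach_down": (pi, A, np.array([1.0, 0.5, 0.0]), sigma),
}
obs = np.linspace(0.0, 1.0, 20)   # an observed rising "joint angle" sequence
```

A rising observation is classified as `reach_up` and its reverse as `reach_down`; the same likelihood machinery is reused by the synthesis algorithm below.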

Figure 2 shows a flow diagram of synthesizing motion from motion models. This framework introduces an objective function that integrates three functions. The first function calculates the likelihood that a synthesized motion is generated by the HMM corresponding to a particular motion model. The second function calculates the error between the synthesized motion and kinematic constraints, which are given manually or determined by the environment. The third function calculates the smoothness of the synthesized motion. A larger value of the first function leads to a synthesized motion more similar to the training motion, a smaller value of the second function yields a motion that reaches the target configurations, and a smaller value of the third function yields a motion without abrupt acceleration. The motion synthesis computation searches for the motion trajectory that maximizes the objective function.

Suppose that \(\delta _t = 1\) when a kinematic constraint \({{\varvec{x}}}_{t,c}\) is given at time *t* and that \(\delta _t = 0\) otherwise. The second term in the objective function is derived from the difference between the state \({{\varvec{x}}}_t\) and the constraint \({{\varvec{x}}}_{t,c}\) at time *t*. The second term can be written as a weighted quadratic form of the differences as follows:

$$\begin{aligned} -\sum _{t=1}^{n} \delta _t \left( {{\varvec{x}}}_t - {{\varvec{x}}}_{t,c} \right) ^\mathrm{T} {{\varvec{W}}} \left( {{\varvec{x}}}_t - {{\varvec{x}}}_{t,c} \right) \end{aligned}$$

where \({{\varvec{W}}}\) is a weight matrix. Let \({{\varvec{z}}}\) be the vector of the states \({{\varvec{x}}}_1, {{\varvec{x}}}_2, \ldots , {{\varvec{x}}}_n\) aligned in the column direction, where *d* is the number of dimensions of state \({{\varvec{x}}}_t\); \({{\varvec{z}}}\) is thus a \(dn\)-dimensional vector. The third term evaluates the smoothness of \({{\varvec{z}}}\) through the \(dn \times dn\) Laplacian matrix \({{\varvec{L}}}\), as \(-\left\| {{\varvec{L}}}{{\varvec{z}}}\right\| ^2\).

Let \(P(q_t=i|{{\varvec{z}}}^{(k)},\lambda )\) denote the probability that the state at time *t* in the motion trajectory \({{\varvec{z}}}^{(k)}\) is generated from the *i*th node in HMM \(\lambda \), where the motion trajectory \({{\varvec{z}}}^{(k)}\) is derived during the *k*th maximization process. The maximization process optimizes the motion trajectory such that it maximizes the objective function \(\varPsi \), which is based on the estimated node distribution \(P(q_t=i|{{\varvec{z}}}^{(k)},\lambda )\). The maximization process results in a motion trajectory represented as (11), where \(\bar{{\varvec{\mu }}}_i\) denotes the \(dn\)-dimensional vector consisting of *n* copies of the mean vector \({\varvec{\mu }}_i\) at the *i*th node of the HMM \(\lambda \).
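For Gaussian output distributions, each maximization step amounts to maximizing a quadratic objective in the stacked trajectory vector, which can be done by solving one linear system. The following 1-D Python sketch illustrates this step; the hard node assignment standing in for the estimated node distribution, the weights `w_c` and `w_s`, and all numerical values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def synthesize(gamma, mu, var, constraints, w_c=1e4, w_s=1.0):
    """One maximization step: find the trajectory z trading off the HMM
    likelihood term (node posteriors gamma over Gaussian outputs), the
    kinematic-constraint term, and a Laplacian smoothness term.
    gamma: (n, N) node posteriors; mu, var: (N,) output means/variances;
    constraints: {frame index: target value}."""
    n, _ = gamma.shape
    # Second-difference (Laplacian-style) operator penalizing abrupt acceleration.
    L = np.zeros((n - 2, n))
    for t in range(n - 2):
        L[t, t:t + 3] = [1.0, -2.0, 1.0]
    H = np.diag(gamma @ (1.0 / var)) + w_s * (L.T @ L)   # quadratic part
    b = gamma @ (mu / var)                               # linear part
    for t, c in constraints.items():                     # constraint penalties
        H[t, t] += w_c
        b[t] += w_c * c
    return np.linalg.solve(H, b)                         # maximizer of the quadratic

# Hard node assignment standing in for the estimated node distribution.
n, N = 30, 3
gamma = np.zeros((n, N))
gamma[:10, 0] = 1.0
gamma[10:20, 1] = 1.0
gamma[20:, 2] = 1.0
mu = np.array([0.0, 0.5, 1.0])
var = np.full(N, 0.01)
# Constrain the final frame to 1.3 even though the learned final mean is 1.0.
z = synthesize(gamma, mu, var, constraints={n - 1: 1.3})
```

The solved trajectory follows the node means early on and bends smoothly toward the constrained final value, mirroring how the full framework balances similarity to the learned motion, constraint satisfaction, and smoothness.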

## 4 Experiments

An experiment on the practical synthesis of full-body motion was conducted by using our proposed framework, which generates a motion trajectory satisfying kinematic constraints from a motion model. We measured human full-body motions by using an optical motion capture system. The capture system acquired the positions of 34 markers attached to a performer, and these measured position data were converted into joint angles of the humanoid robot HRP-4 by inverse kinematics, yielding sequences of joint angles. We recorded three different motions: “patting an object with the right hand”, “lifting an object with both hands”, and “touching an object with the right foot”. Only one trial, measured from a single individual, was used for training each HMM. These actions are shown in Fig. 3, where the captured human full-body motion data were transformed into human character motions. Each sequence of joint angles was encoded into a 30-node left-to-right HMM (\(N=30\)). In this way, three motion models represented by the HMMs were acquired from the captured data.

In this experiment, kinematic constraints were given for the final postures of these three motions. Kinematic constraints to reach up with the right hand for an object at a high position and to reach down with the right hand for an object at a low position were added to the motion of “patting an object with the right hand”. The kinematic constraints to lift with both hands to a high or low position were added to the motion of “lifting an object with both hands”. The motion of “touching an object with the right foot” was given the added kinematic constraints of touching an object near the feet with a foot and touching an object on the ground ahead with a foot.

We tested the synthesized motions on the full-size humanoid robot HRP-4. All joint angles of the full body were controlled according to the synthesized motions of “raising the right fist to middle height” and “raising the right fist to chest height.” Because of limits in joint mobility, the motion “raising the fist up high” could not be performed by the real humanoid robot. Figure 9 shows the two motions being performed by the humanoid robot. In the photographs on the top, the robot raises its right fist to middle height, and in the photographs on the bottom, the robot raises its right fist to chest height. The joint angle of the right shoulder in the final posture of the motion “raising the fist to middle height” was 0.49 rad, which differs from the given constraint by 0.03 rad. The joint angle of the right shoulder in the motion “raising the fist to chest height” was 0.11 rad in the final posture. The results of this experiment on a real humanoid robot demonstrate that our proposed method can synthesize full-body human-like motions.

In the experiments described above, the kinematic constraints were given manually. This is not practical for real-world applications, where robots will be expected to discover the appropriate constraints from the motions and the environment. More specifically, the robots need the ability to extract key features relating the motions to the objects being manipulated. An HMM that captures not only the sequence of angles of all the joints in the full body but also the sequence of positions of the object in the robot’s coordinate system may be helpful for extracting features of constraints. Additionally, our proposed framework should be extended to account for the torques of all joints in the humanoid robot’s full body, such that the robot can perform human-like motions while maintaining the desired points of contact with the environment, so that the robot can avoid falling down or suffering contact damage.

## 5 Conclusion

- 1.
We presented a novel approach to synthesizing human-like motion with kinematic constraints for a humanoid robot. Human full-body motions are encoded into HMMs, which can be referred to as motion models. The objective function is designed by using three functions. The first function represents the likelihood that the motion trajectory is generated from the motion model. The second function calculates the distance between the synthesized motion trajectory and kinematic constraints. The third function calculates the total jerk along the synthesized motion, so that minimizing this produces smooth motion.

- 2.
We proposed an algorithm to find the motion trajectory that maximizes the objective function. The algorithm alternates between two processes. One is estimating the distribution of node sequences in the HMM given a motion trajectory that is derived in the other process. The other process finds a motion trajectory that maximizes the objective function given the estimated distribution of the node sequences. Iteration of these two processes results in the motion synthesis.

- 3.
We tested our proposed approach to synthesizing full-body motion in a humanoid robot by using the motion models. We recorded the human full-body motions of “patting an object with the right hand,” “lifting an object with both hands,” “touching an object with the right foot,” and “raising the right fist.” Each recorded motion sequence was encoded into its corresponding HMM. Kinematic constraints were added manually to the motions. The synthesized motions demonstrated that the humanoid robot can not only perform human-like motions but also satisfy the kinematic constraints. The experiments validated our approach to synthesizing full-body motions in a humanoid robot.

## References

- Asfour, T., Gyarfas, F., Azad, P., & Dillmann, R. (2006). Imitation learning of dual-arm manipulation tasks in humanoid robots. In *Proceedings of the IEEE-RAS international conference on humanoid robots* (pp. 40–47).
- Billard, A., Calinon, S., & Guenter, F. (2006). Discriminative and adaptive imitation in uni-manual and bi-manual tasks. *Robotics and Autonomous Systems*, *54*, 370–384.
- Calinon, S., Alizadeh, T., & Caldwell, D. G. (2013). On improving the extrapolation capability of task-parameterized movement models. In *Proceedings of the IEEE/RSJ international conference on intelligent robots and systems* (pp. 610–617).
- Calinon, S., D’halluin, F., Sauser, E. L., Caldwell, D. G., & Billard, A. (2010). Learning and reproduction of gestures by imitation. *IEEE Robotics & Automation Magazine*, *17*(2), 44–54.
- Donald, M. (1991). *Origins of the modern mind*. Cambridge: Harvard University Press.
- Flash, T., & Hogan, N. (1985). The coordination of arm movements: An experimentally confirmed mathematical model. *The Journal of Neuroscience*, *5*, 1688–1703.
- Herzog, D., Krueger, V., & Grest, D. (2008). Parametric hidden Markov models for recognition and synthesis of movements. In *Proceedings of the British machine vision conference* (pp. 163–172).
- Ijspeert, A. J., Nakanishi, J., Shibata, T., & Schaal, S. (2001). Nonlinear dynamical systems for imitation with humanoid robots. In *Proceedings of the IEEE-RAS international conference on humanoid robots*.
- Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2003). Learning control policies for movement imitation and movement recognition. *Neural Information Processing Systems*, *15*, 1547–1554.
- Inamura, T., Toshima, I., Tanie, H., & Nakamura, Y. (2004). Embodied symbol emergence based on mimesis theory. *International Journal of Robotics Research*, *23*(4), 363–377.
- Ito, M., Noda, K., Hoshino, Y., & Tani, J. (2006). Dynamic and interactive generation of object handling behaviors by a small humanoid robot using a dynamic neural network model. *Neural Networks*, *19*(3), 323–337.
- Kadone, H., & Nakamura, Y. (2005). Symbolic memory for humanoid robots using hierarchical bifurcations of attractors in nonmonotonic neural networks. In *Proceedings of the IEEE/RSJ international conference on intelligent robots and systems* (pp. 2900–2905).
- Kulic, D., Takano, W., & Nakamura, Y. (2008). Incremental learning, clustering and hierarchy formation of whole body motion patterns using adaptive hidden Markov chains. *International Journal of Robotics Research*, *27*(7), 761–784.
- Kunori, H., Lee, D., & Nakamura, Y. (2009). Associating and reshaping of whole body motions for object manipulation. In *Proceedings of the IEEE/RSJ international conference on intelligent robots and systems* (pp. 5240–5247).
- Matsubara, T., Hyon, S. H., & Morimoto, J. (2011). Learning parametric dynamic movement primitives from multiple demonstrations. *Neural Networks*, *24*(5), 493–500.
- Noda, K., Arie, H., Suga, Y., & Ogata, T. (2014). Multimodal integration learning of robot behavior using deep neural networks. *Robotics and Autonomous Systems*, *62*, 721–736.
- Okada, M., Tatani, K., & Nakamura, Y. (2002). Polynomial design of the nonlinear dynamics for the brain-like information processing of whole body motion. In *Proceedings of the IEEE international conference on robotics and automation* (pp. 1410–1415).
- Paraschos, A., Daniel, C., Peters, J., & Neumann, G. (2013). Probabilistic movement primitives. In *Advances in neural information processing systems* (pp. 2616–2624).
- Prada, M., Remazeilles, A., Koene, A., & Endo, S. (2013). Dynamic movement primitives for human–robot interaction: Comparison with human behavioral observation. In *Proceedings of the IEEE/RSJ international conference on intelligent robots and systems* (pp. 1168–1175).
- Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. *Proceedings of the IEEE*, *77*, 257–286.
- Rabiner, L., & Juang, B. H. (1993). *Fundamentals of speech recognition*. Englewood Cliffs, NJ: Prentice Hall.
- Sugiura, K., & Iwahashi, N. (2008). Motion recognition and generation by combining reference-point-dependent probabilistic models. In *Proceedings of the IEEE/RSJ international conference on intelligent robots and systems* (pp. 852–857).
- Takano, W., & Nakamura, Y. (2008). Integrating whole body motion primitives and natural language for humanoid robots. In *Proceedings of the IEEE-RAS international conference on humanoid robots* (pp. 708–713).
- Takano, W., Yamada, Y., & Nakamura, Y. (2018). Linking human motions and objects to language for synthesizing action sentences. *Autonomous Robots*. https://doi.org/10.1007/s10514-018-9762-1.
- Takano, W., Yamane, K., & Nakamura, Y. (2007). Capture database through symbolization, recognition and generation of motion patterns. In *Proceedings of the IEEE international conference on robotics and automation* (pp. 3092–3097).

## Copyright information

**OpenAccess**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.