Abstract
Autonomous exploration of motor skills is a key capability of learning robotic systems. Learning motor skills can be formulated as an inverse modeling problem, which aims at finding an inverse model that maps desired outcomes in some task space, e.g., via-points of a motion, to appropriate actions, e.g., motion control policy parameters. In this paper, autonomous exploration of motor skills is achieved by incrementally learning inverse models starting from an initial demonstration. The algorithm, referred to as skill babbling, features sample-efficient learning and scales to high-dimensional action spaces. Skill babbling extends ideas of goal-directed exploration, which organizes exploration in the space of goals. The proposed approach provides a modular framework for autonomous skill exploration by separating the learning of the inverse model from the exploration mechanism and from a model of achievable targets, i.e., the workspace. The effectiveness of skill babbling is demonstrated for a range of motor tasks comprising the autonomous bootstrapping of inverse kinematics and parameterized motion primitives.
References
Argall, B. D., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5), 469–483.
Baranes, A., & Oudeyer, P. Y. (2013). Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems, 61(1), 49–73.
Calinon, S., Alizadeh, T., & Caldwell, D. G. (2013). On improving the extrapolation capability of task-parameterized movement models. In IEEE/RSJ international conference on intelligent robots and systems (pp. 610–616).
Edelsbrunner, H., & Mücke, E. P. (1994). Three-dimensional alpha shapes. ACM Transactions on Graphics, 13(1), 43–72.
Haykin, S. (1991). Adaptive filter theory. New York: Prentice Hall.
Huang, G. B., Zhu, Q. Y., & Siew, C. K. (2004). Extreme learning machine: A new learning scheme of feedforward neural networks. In IEEE international joint conference on neural networks (Vol. 2, pp. 985–990).
Ijspeert, A. J., Nakanishi, J., Hoffmann, H., Pastor, P., & Schaal, S. (2013). Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation, 25(2), 328–373.
Jordan, M. I., & Rumelhart, D. E. (1992). Forward models: Supervised learning with a distal teacher. Cognitive Science, 16(3), 307–354.
Khansari-Zadeh, S. (2012). http://www.amarsi-project.eu/open-source
Kober, J., Wilhelm, A., Oztop, E., & Peters, J. (2012). Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots, 33, 361–379.
Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. International Journal of Robotics Research, 32(11), 1238–1274.
Kormushev, P., Calinon, S., & Caldwell, D. (2010). Robot motor skill coordination with EM-based reinforcement learning. In IEEE/RSJ international conference on intelligent robots and systems (pp. 3232–3237).
Kulvicius, T., Ning, K., Tamosiunaite, M., & Wörgötter, F. (2012). Joining movement sequences: Modified dynamic movement primitives for robotics applications exemplified on handwriting. IEEE Transactions on Robotics, 28(1), 145–157.
Kupcsik, A., Deisenroth, M. P., Peters, J., & Neumann, G. (2013). Data-efficient generalization of robot skills with contextual policy search. In AAAI conference on artificial intelligence (pp. 1401–1407).
Lemme, A., Meirovitch, Y., Khansari-Zadeh, S. M., Flash, T., Billard, A., & Steil, J. J. (2015). Open-source benchmarking for learned reaching motion generation in robotics. Paladyn Journal of Behavioral Robotics, 6(1), 30–41.
Liang, N. Y., Huang, G. B., Saratchandran, P., & Sundararajan, N. (2006). A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Transactions on Neural Networks, 17(6), 1411–1423.
Lundgren, J. (2010). alphavol.m. http://au.mathworks.com/matlabcentral/fileexchange/28851-alpha-shapes
Matsubara, T., Hyon, S., & Morimoto, J. (2010). Learning stylistic dynamic movement primitives from multiple demonstrations. In IEEE/RSJ international conference on intelligent robots and systems (pp. 1277–1283).
Mülling, K., Kober, J., Kroemer, O., & Peters, J. (2013). Learning to select and generalize striking movements in robot table tennis. International Journal of Robotics Research, 32(3), 263–279.
Pontón, B., Farshidian, F., & Buchli, J. (2014). Learning compliant locomotion on a quadruped robot. In IROS workshop on compliant manipulation: Challenges in learning and control.
Reinhart, R., & Steil, J. (2014). Efficient policy search with a parameterized skill memory. In IEEE/RSJ international conference on intelligent robots and systems (pp. 1400–1407).
Reinhart, R. F., & Steil, J. J. (2015). Efficient policy search in low-dimensional embedding spaces by generalizing motion primitives with a parameterized skill memory. Autonomous Robots, 38(4), 331–348.
Ritter, H. (1991). Learning with the self-organizing map. In Artificial neural networks (pp. 357–364). New York: Elsevier.
Rolf, M., & Steil, J. (2014). Efficient exploratory learning of inverse kinematics on a bionic elephant trunk. IEEE Transactions on Neural Networks and Learning Systems, 25(6), 1147–1160.
Rolf, M., Steil, J., & Gienger, M. (2010). Goal babbling permits direct learning of inverse kinematics. IEEE Transactions on Autonomous Mental Development, 2(3), 216–229.
Rolf, M., Steil, J., & Gienger, M. (2011). Online goal babbling for rapid bootstrapping of inverse models in high dimensions. In IEEE international conference on development and learning (Vol. 2, pp. 1–8).
Schmidt, W., Kraaijveld, M., & Duin, R. (1992). Feedforward neural networks with random weights. In IAPR international conference on pattern recognition, conference B: Pattern recognition methodology and systems (Vol. II, pp. 1–4).
da Silva, B. C., Konidaris, G., & Barto, A. G. (2012). Learning parameterized skills. In International conference on machine learning (pp. 1679–1686).
da Silva, B. C., Baldassarre, G., Konidaris, G., & Barto, A. (2014a). Learning parameterized motor skills on a humanoid robot. In IEEE international conference on robotics and automation (pp. 5239–5244).
da Silva, B. C., Konidaris, G., & Barto, A. (2014b). Active learning of parameterized skills. In International conference on machine learning, JMLR workshop and conference proceedings (pp. 1737–1745).
Stulp, F., & Sigaud, O. (2013). Robot skill learning: From reinforcement learning to evolution strategies. Paladyn Journal of Behavioral Robotics, 4(1), 49–61.
Stulp, F., Raiola, G., Hoarau, A., Ivaldi, S., & Sigaud, O. (2013). Learning compact parameterized skills with a single regression. In IEEE-RAS international conference on humanoid robots (pp. 417–422).
Theodorou, E., Buchli, J., & Schaal, S. (2010). A generalized path integral control approach to reinforcement learning. The Journal of Machine Learning Research, 11, 3137–3181.
Ude, A., Riley, M., Nemec, B., Kos, A., Asfour, T., & Cheng, G. (2007). Synthesizing goal-directed actions from a library of example movements. In IEEE-RAS international conference on humanoid robots (pp. 115–121).
Acknowledgements
This research and development project is funded by the German Federal Ministry of Education and Research (BMBF) within the Leading-Edge Cluster Competition and managed by the Project Management Agency Karlsruhe (PTKA).
Appendix: Learning algorithms
For skill babbling, any learning algorithm that solves the weighted regression problem (6) online is applicable. In this paper, the following two learning algorithms are applied.
1.1 Locally linear model
The first learner is a Locally Linear Model (LLM, Ritter 1991) as implemented by Rolf et al. (2011). It comprises \(l=1,\ldots ,L\) linear models
\[ g_l(\mathbf{x}) = \mathbf{W}_l \left( \mathbf{x} - \mathbf{p}_l \right) + \mathbf{b}_l. \]
The linear models \(g_l(\mathbf{x})\) are combined according to
\[ \mathbf{y}(\mathbf{x}) = \frac{1}{h(\mathbf{x})} \sum _{l=1}^{L} h_l(\mathbf{x}) \, g_l(\mathbf{x}) \]
with Gaussian responsibilities
\[ h_l(\mathbf{x}) = \exp \left( - \Vert \mathbf{x} - \mathbf{p}_l \Vert ^2 / d^2 \right) \]
and the normalization
\[ h(\mathbf{x}) = \sum _{l=1}^{L} h_l(\mathbf{x}), \]
where the radius \(d\) determines the area of responsibility of each linear model around its prototypical center \(\mathbf{p}_l\).
For supervised training, appropriate centers \(\mathbf{p}_l\) for the linear models have to be found, and the parameters \(\mathbf{W}_l\) and \(\mathbf{b}_l\) of the linear models have to be learned. Following the implementation by Rolf et al. (2011), centers are incrementally added to the learner according to a vector quantization algorithm. For the initial training sample \((\mathbf{x}_1, \mathbf{y}_1)\), the first linear model with center \(\mathbf{p}_1 = \mathbf{x}_1\), linear part \(\mathbf{W}_1 = \mathbf{0}\), and bias \(\mathbf{b}_1 = \mathbf{y}_1\) is created. For each new training sample \((\mathbf{x}_n, \mathbf{y}_n)\) with weight \(w_n\), the weighted squared error \(w_n \Vert \mathbf{y}_n - \mathbf{y}(\mathbf{x}_n) \Vert ^2\) is minimized by online gradient descent with learning rate \(\eta \).
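The following Python sketch illustrates this learner under the formulation above. It is a minimal illustration, not the original implementation by Rolf et al. (2011): the class name LocallyLinearModel, the parameter add_threshold, and the distance-based center-insertion criterion (standing in for the vector quantization step) are assumptions made for this sketch.

```python
import numpy as np

class LocallyLinearModel:
    """Sketch of the LLM learner: normalized Gaussian responsibilities over
    local linear models, incremental center placement, and weighted online
    gradient descent (hypothetical minimal implementation)."""

    def __init__(self, d=0.1, eta=0.25, add_threshold=None):
        self.d = d                                    # responsibility radius
        self.eta = eta                                # learning rate
        # hypothetical criterion: add a center when the nearest one is too far
        self.add_threshold = add_threshold if add_threshold is not None else d
        self.P, self.W, self.b = [], [], []           # centers, linear parts, biases

    def _responsibilities(self, x):
        r = np.array([np.exp(-np.sum((x - p) ** 2) / self.d ** 2) for p in self.P])
        return r / (np.sum(r) + 1e-12)                # normalized responsibilities

    def predict(self, x):
        h = self._responsibilities(x)
        g = [W @ (x - p) + b for W, p, b in zip(self.W, self.P, self.b)]
        return np.sum([h_l * g_l for h_l, g_l in zip(h, g)], axis=0)

    def train(self, x, y, w=1.0):
        # incremental center placement (stand-in for vector quantization)
        if not self.P or min(np.linalg.norm(x - p) for p in self.P) > self.add_threshold:
            self.P.append(np.array(x, dtype=float))
            self.W.append(np.zeros((len(y), len(x))))
            self.b.append(np.array(y, dtype=float))
            return
        # gradient descent on the weighted squared error w * ||y - y(x)||^2
        h = self._responsibilities(x)
        err = y - self.predict(x)
        for l in range(len(self.P)):
            step = self.eta * w * h[l] * err          # factor 2 absorbed into eta
            self.W[l] += np.outer(step, x - self.P[l])
            self.b[l] += step
```

During babbling, train(x, y, w) would be called once per generated sample; the factor 2 from the squared-error gradient is absorbed into the learning rate \(\eta \).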
1.2 Extreme learning machine with weighted recursive least squares learning
The second learner is a variant of an Extreme Learning Machine (ELM, Huang et al. 2004) with weighted recursive least squares learning. ELMs are feedforward neural networks with a single hidden layer. The output is computed according to
\[ \mathbf{y}(\mathbf{x}) = \mathbf{W}^{\text{out}} \, \sigma \!\left( \mathbf{W}^{\text{inp}} \mathbf{x} + \mathbf{b} \right), \]
where \(\sigma (a) = 1/(1 + \exp (-a))\) is a sigmoid activation function applied component-wise to the synaptic summations \(\mathbf{a} = \mathbf{W}^{\text {inp}}\mathbf{x} + \mathbf{b}\). The special property of ELMs is that learning is restricted to the read-out weights \(\mathbf{W}^{\text{out}}\), which makes backpropagation of errors dispensable. Learning then boils down to a simple linear regression problem for \(\mathbf{W}^{\text{out}}\). The input weights \(\mathbf{W}^{\text{inp}}\in {\mathbb {R}}^{H\times \dim (\mathbf{x})}\) and biases \(\mathbf{b}\in {\mathbb {R}}^{H}\) are initialized randomly and remain fixed. Typically, the number of hidden neurons \(H\) is chosen large in comparison to the number of inputs. In this paper, the values of \(\mathbf{W}^{\text{inp}}\) and \(\mathbf{b}\) are drawn from uniform distributions in the ranges \([-2, 2]\) and \([-1, 1]\), respectively, if not stated otherwise. Note that the idea of using feedforward neural networks with a random hidden layer has been proposed earlier, e.g., by Schmidt et al. (1992).
While the variant of ELMs proposed by Liang et al. (2006) does feature sequential learning, it incorporates neither a weighted error criterion nor a forgetting factor. In this paper, a weighted recursive least squares algorithm similar to Haykin (1991) is therefore applied in order to update the read-out weights \(\mathbf{W}^{\text{out}}\) sequentially. For the first training sample \((\mathbf{x}_1, \mathbf{y}_1)\), the read-out weights are initialized according to
\[ \mathbf{W}^{\text{out}}_1 = \mathbf{y}_1 \mathbf{h}_1^{T} \mathbf{P}_1 \quad \text{with} \quad \mathbf{P}_1 = \left( \mathbf{h}_1 \mathbf{h}_1^{T} + \varepsilon \mathbf{I} \right)^{-1}, \]
where
\[ \mathbf{h}(\mathbf{x}) = \sigma \!\left( \mathbf{W}^{\text{inp}} \mathbf{x} + \mathbf{b} \right) \]
are the hidden neuron activations for input \(\mathbf{x}\) (abbreviated \(\mathbf{h}_n = \mathbf{h}(\mathbf{x}_n)\)) and \(\varepsilon > 0\) is a regularization parameter. For each new training sample \((\mathbf{x}_n, \mathbf{y}_n)\) together with its weight \(w_n\), the read-out weights are updated sequentially according to the weighted recursive least squares rule
\[ \mathbf{k}_n = \frac{w_n \, \mathbf{P}_{n-1} \mathbf{h}_n}{\lambda + w_n \, \mathbf{h}_n^{T} \mathbf{P}_{n-1} \mathbf{h}_n}, \]
\[ \mathbf{W}^{\text{out}}_n = \mathbf{W}^{\text{out}}_{n-1} + \left( \mathbf{y}_n - \mathbf{W}^{\text{out}}_{n-1} \mathbf{h}_n \right) \mathbf{k}_n^{T}, \]
\[ \mathbf{P}_n = \lambda ^{-1} \left( \mathbf{P}_{n-1} - \mathbf{k}_n \mathbf{h}_n^{T} \mathbf{P}_{n-1} \right), \]
where \(0 \ll \lambda \le 1\) is a forgetting factor. That is, \(\lambda = 1\) corresponds to no forgetting.
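For concreteness, here is a minimal Python sketch of this learner, assuming the weighted recursive least squares recursion stated above; the class name ELMWRLS and the initialization of \(\mathbf{P}\) as \(\varepsilon ^{-1}\mathbf{I}\) (a common surrogate for the per-sample initialization in the text) are choices made for this sketch, not part of the original implementation.

```python
import numpy as np

class ELMWRLS:
    """Sketch of an ELM whose read-out weights are trained by weighted
    recursive least squares with forgetting (hypothetical implementation)."""

    def __init__(self, n_in, n_out, n_hidden=100, eps=1e-3, lam=0.999, seed=0):
        rng = np.random.default_rng(seed)
        # random, fixed hidden layer: uniform in [-2, 2] and [-1, 1]
        self.W_inp = rng.uniform(-2.0, 2.0, (n_hidden, n_in))
        self.bias = rng.uniform(-1.0, 1.0, n_hidden)
        self.W_out = np.zeros((n_out, n_hidden))      # trained read-out weights
        self.P = np.eye(n_hidden) / eps               # inverse correlation matrix
        self.lam = lam                                # forgetting factor

    def _hidden(self, x):
        # component-wise sigmoid of the synaptic summations
        return 1.0 / (1.0 + np.exp(-(self.W_inp @ x + self.bias)))

    def predict(self, x):
        return self.W_out @ self._hidden(x)

    def train(self, x, y, w=1.0):
        h = self._hidden(x)
        Ph = self.P @ h
        k = (w * Ph) / (self.lam + w * h @ Ph)        # weighted RLS gain
        self.W_out += np.outer(y - self.W_out @ h, k)
        self.P = (self.P - np.outer(k, Ph)) / self.lam
```

Because the hidden layer stays fixed, each update costs \(O(H^2)\) for the rank-one correction of \(\mathbf{P}\), which keeps sequential learning cheap even for large \(H\).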