
A tutorial on task-parameterized movement learning and retrieval


Task-parameterized models of movements aim at automatically adapting movements to new situations encountered by a robot. The task parameters can, for example, take the form of positions of objects in the environment or landmark points that the robot should pass through. This tutorial reviews existing approaches for task-adaptive motion encoding. It then narrows the scope to the special case of task parameters that take the form of frames of reference, coordinate systems or basis functions, which are most commonly encountered in service robotics. Each section of the paper is accompanied by source code designed as simple didactic examples, implemented in Matlab and fully compatible with GNU Octave, closely following the notation and equations of the article. The tutorial also presents ongoing work and further challenges that remain to be addressed, with examples provided in simulation and on a real robot (transfer of manipulation behaviors to the Baxter bimanual robot). The repository for the accompanying source codes is available at




  1. Competition/collaboration arises due to the weighting term \(h_{t,i}\) in Eq. (49) summing over the influence of the other Gaussian components.
  2. Extensions are possible here for a local modulation of movement duration.
  3. To simplify the notation, the number of derivatives will be set up to acceleration (\(C=3\)), but the results can easily be generalized to a higher or lower number of derivatives (in the provided source codes, a parameter automatically sets the number of derivatives to be considered).
  4. Note that a similar operator is defined to handle border conditions and that \({\varvec{{\varPhi }}}\) can automatically be constructed through the use of Kronecker products, see source codes for details.
  5. An HSMM encoding can autonomously regenerate such a sequence in a stochastic manner, which is not described here due to space constraints.
  6. Equations (30) and (31) describe a trajectory distribution and can be computed efficiently with Cholesky and/or QR decompositions by exploiting the positive definite symmetric band structure of the matrices, see for example [87]. With the Cholesky decomposition \({({\varvec{{\varSigma }}}^{{\varvec{s}}})}^{-1}={\varvec{T}}^{\scriptscriptstyle \top }{\varvec{T}}\), the objective function is maximized when \({\varvec{T}}{\varvec{{\varPhi }}}{\varvec{x}}={\varvec{T}}{\varvec{\mu }}^{{\varvec{s}}}\). With a QR decomposition \({\varvec{T}}{\varvec{{\varPhi }}}={\varvec{Q}}{\varvec{R}}\), the equation becomes \({\varvec{Q}}{\varvec{R}}{\varvec{x}}={\varvec{T}}{\varvec{\mu }}^{{\varvec{s}}}\), whose solution is efficiently computed as \({\varvec{x}}={\varvec{R}}^{-1}{\varvec{Q}}^{\scriptscriptstyle \top }{\varvec{T}}{\varvec{\mu }}^{{\varvec{s}}}\). When using Matlab, \({\varvec{\hat{x}}}\) and \({\varvec{\hat{{\varSigma }}}}^{{\varvec{x}}}\) in Eqs. (30) and (31) can, for example, be computed with the lscov function.
  7. Note here that the term parametric in PGMM/PHMM (referring to task parameters) is ambiguous because a standard GMM can also be described as being parametric (referring to model parameters).
  8. Full end-effector poses or decoupled position and orientation can be considered here.
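The Cholesky and QR route described in note 6 can be sketched numerically. The snippet below is a minimal illustration in Python with NumPy (not the Matlab code accompanying the article); the function name `solve_weighted_ls` and the dense toy matrices in the usage lines are assumptions made for illustration, and the sketch does not exploit the band structure mentioned in the note.

```python
import numpy as np

def solve_weighted_ls(Phi, mu_s, Sigma_s):
    """Minimize (mu_s - Phi x)^T Sigma_s^{-1} (mu_s - Phi x) via Cholesky + QR.

    Hypothetical helper: dense matrices, explicit inverse kept for readability.
    """
    # Cholesky factor of the precision matrix: Sigma_s^{-1} = T^T T.
    # np.linalg.cholesky returns a lower-triangular L with L L^T = A, so T = L^T.
    L = np.linalg.cholesky(np.linalg.inv(Sigma_s))
    T = L.T
    # Thin QR decomposition of the whitened operator: T Phi = Q R.
    Q, R = np.linalg.qr(T @ Phi)
    # Solve R x = Q^T T mu_s, i.e. x = R^{-1} Q^T T mu_s.
    return np.linalg.solve(R, Q.T @ (T @ mu_s))

# Toy usage with arbitrary (assumed) sizes.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((6, 3))
A = rng.standard_normal((6, 6))
Sigma_s = A @ A.T + 6.0 * np.eye(6)   # symmetric positive definite
mu_s = rng.standard_normal(6)
x_hat = solve_weighted_ls(Phi, mu_s, Sigma_s)
```

The result should agree with the normal-equation solution \(({\varvec{{\varPhi }}}^{\scriptscriptstyle \top }{({\varvec{{\varSigma }}}^{{\varvec{s}}})}^{-1}{\varvec{{\varPhi }}})^{-1}{\varvec{{\varPhi }}}^{\scriptscriptstyle \top }{({\varvec{{\varSigma }}}^{{\varvec{s}}})}^{-1}{\varvec{\mu }}^{{\varvec{s}}}\), while avoiding the explicit formation of \({\varvec{{\varPhi }}}^{\scriptscriptstyle \top }{({\varvec{{\varSigma }}}^{{\varvec{s}}})}^{-1}{\varvec{{\varPhi }}}\).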


  1. Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of international conference on machine learning (ICML)
  2. Akgun B, Thomaz A (2015) Simultaneously learning actions and goals from demonstration. Auton Robots 1–17. doi:10.1007/s10514-015-9448-x
  3. Alissandrakis A, Nehaniv CL, Dautenhahn K (2006) Action, state and effect metrics for robot imitation. In: Proceedings of IEEE international symposium on robot and human interactive communication (Ro-Man), pp 232–237. Hatfield, UK
  4. Alizadeh T, Calinon S, Caldwell DG (2014) Learning from demonstrations with partially observable task parameters. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 3309–3314. Hong Kong, China
  5. Antonelli G (2014) Underwater robots, 3rd edn. Springer, Berlin
  6. Astrom KJ, Murray RM (2008) Feedback systems: an introduction for scientists and engineers. Princeton University Press, Princeton
  7. Baek J, McLachlan GJ, Flack LK (2010) Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data. IEEE Trans Pattern Anal Mach Intell 32(7):1298–1309
  8. Basser PJ, Pajevic S (2003) A normal distribution for tensor-valued random variables: applications to diffusion tensor MRI. IEEE Trans Med Imaging 22(7):785–794
  9. Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127
  10. Borrelli F, Bemporad A, Morari M (2015) Predictive control for linear and hybrid systems. Cambridge University Press, Cambridge (in preparation)
  11. Bouveyron C, Brunet C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78
  12. Bouveyron C, Girard S, Schmid C (2007) High-dimensional data clustering. Comput Stat Data Anal 52(1):502–519
  13. Brand M, Hertzmann A (2000) Style machines. In: Proceedings of ACM international conference on computer graphics and interactive techniques (SIGGRAPH), pp 183–192. New Orleans, Louisiana, USA
  14. Calinon S, Alizadeh T, Caldwell DG (2013) On improving the extrapolation capability of task-parameterized movement models. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 610–616. Tokyo, Japan
  15. Calinon S, Billard AG (2009) Statistical learning by imitation of competing constraints in joint space and task space. Adv Robot 23(15):2059–2076
  16. Calinon S, Bruno D, Caldwell DG (2014) A task-parameterized probabilistic model with minimal intervention control. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 3339–3344. Hong Kong, China
  17. Calinon S, D’halluin F, Sauser EL, Caldwell DG, Billard AG (2010) Learning and reproduction of gestures by imitation: an approach based on hidden Markov model and Gaussian mixture regression. IEEE Robot Autom Mag 17(2):44–54
  18. Calinon S, Guenter F, Billard AG (2007) On learning, representing and generalizing a task in a humanoid robot. IEEE Trans Syst Man Cybern B 37(2):286–298
  19. Calinon S, Kormushev P, Caldwell DG (2013) Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning. Robot Auton Syst 61(4):369–379
  20. Calinon S, Li Z, Alizadeh T, Tsagarakis NG, Caldwell DG (2012) Statistical dynamical systems for skills acquisition in humanoids. In: Proceedings of IEEE international conference on humanoid robots (humanoids), pp 323–329. Osaka, Japan
  21. Campbell CL, Peters RA, Bodenheimer RE, Bluethmann WJ, Huber E, Ambrose RO (2006) Superpositioning of behaviors learned through teleoperation. IEEE Trans Robot 22(1):79–91
  22. Chatzis SP, Korkinof D, Demiris Y (2012) A nonparametric Bayesian approach toward robot learning by demonstration. Robot Auton Syst 60(6):789–802
  23. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1):1–38
  24. Doerr A, Ratliff N, Bohg J, Toussaint M, Schaal S (2015) Direct loss minimization inverse optimal control. In: Proceedings of robotics: science and systems (R:SS), pp 1–9. Rome, Italy
  25. Dong S, Williams B (2012) Learning and recognition of hybrid manipulation motions in variable environments using probabilistic flow tubes. Int J Soc Robot 4(4):357–368
  26. Field M, Stirling D, Pan Z, Naghdy F (2015) Learning trajectories for robot programing by demonstration using a coordinated mixture of factor analyzers. IEEE Trans Cybern (in press)
  27. Figueiredo MAT, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396
  28. Flash T, Hochner B (2005) Motor primitives in vertebrates and invertebrates. Curr Opin Neurobiol 15(6):660–666
  29. Forte D, Gams A, Morimoto J, Ude A (2012) On-line motion synthesis and adaptation using a trajectory database. Robot Auton Syst 60(10):1327–1339
  30. Furui S (1986) Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans Acoust Speech Signal Process 34(1):52–59
  31. Gales MJF (1999) Semi-tied covariance matrices for hidden Markov models. IEEE Trans Speech Audio Process 7(3):272–281
  32. Ghahramani Z, Hinton GE (1997) The EM algorithm for mixtures of factor analyzers. Tech. rep., University of Toronto
  33. Ghahramani Z, Jordan MI (1994) Supervised learning from incomplete data via an EM approach. In: Cowan JD, Tesauro G, Alspector J (eds) Advances in neural information processing systems (NIPS), vol 6. Morgan Kaufmann, San Francisco, pp 120–127
  34. Greggio N, Bernardino A, Dario P, Santos-Victor J (2014) Efficient greedy estimation of mixture models through a binary tree search. Robot Auton Syst 62(10):1440–1452
  35. Grimes DB, Chalodhorn R, Rao RPN (2006) Dynamic imitation in a humanoid robot through nonparametric probabilistic inference. In: Proceedings of robotics: science and systems (R:SS), pp 1–8
  36. Gross R, Shi J (2001) The CMU motion of body (MoBo) database. Tech. Rep. CMU-RI-TR-01-18, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA
  37. Hak S, Mansard N, Stasse O, Laumond JP (2012) Reverse control for humanoid robot task recognition. IEEE Trans Syst Man Cybern B Cybern 42(6):1524–1537
  38. Hersch M, Guenter F, Calinon S, Billard AG (2006) Learning dynamical system modulation for constrained reaching tasks. In: Proceedings of IEEE international conference on humanoid robots (humanoids), pp 444–449. Genova, Italy
  39. Hersch M, Guenter F, Calinon S, Billard AG (2008) Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Trans Robot 24(6):1463–1467
  40. Hinton GE (2007) Learning multiple layers of representation. Trends Cogn Sci 11(10):428–434
  41. Hogan N, Sternad D (2012) Dynamic primitives of motor behavior. Biol Cybern 106(11–12):727–739
  42. Hsu D, Kakade SM (2013) Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In: Conference on innovations in theoretical computer science, pp 11–20
  43. Ijspeert A, Nakanishi J, Pastor P, Hoffmann H, Schaal S (2013) Dynamical movement primitives: learning attractor models for motor behaviors. Neural Comput 25(2):328–373
  44. Inamura T, Toshima I, Tanie H, Nakamura Y (2004) Embodied symbol emergence based on mimesis theory. Int J Robot Res 23(4–5):363–377
  45. Jetchev N, Toussaint M (2014) Discovering relevant task spaces using inverse feedback control. Auton Robots 37(2):169–189
  46. Kelso JAS (2009) Synergies: atoms of brain and behavior. In: Sternad D (ed) A multidisciplinary approach to motor control. Advances in Experimental Medicine and Biology, vol 629. Springer, Heidelberg, pp 83–91
  47. Khansari-Zadeh SM, Billard A (2011) Learning stable non-linear dynamical systems with Gaussian mixture models. IEEE Trans Robot 27(5):943–957
  48. Kober J, Wilhelm A, Oztop E, Peters J (2012) Reinforcement learning to adjust parametrized motor primitives to new situations. Auton Robots 33(4):361–379
  49. Kolda T, Bader B (2009) Tensor decompositions and applications. SIAM Rev 51(3):455–500
  50. Krishnan S, Garg A, Patil S, Lea C, Hager G, Abbeel P, Goldberg K (2015) Unsupervised surgical task segmentation with milestone learning. In: Proceedings of international symposium on robotics research (ISRR)
  51. Kronander K, Khansari-Zadeh MSM, Billard A (2011) Learning to control planar hitting motions in a minigolf-like task. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 710–717
  52. Krueger V, Herzog DL, Baby S, Ude A, Kragic D (2010) Learning actions from observations: primitive-based modeling and grammar. IEEE Robot Autom Mag 17(2):30–43
  53. Kulis B, Jordan MI (2012) Revisiting k-means: new algorithms via Bayesian nonparametrics. In: Proceedings of international conference on machine learning (ICML)
  54. Latash ML, Scholz JP, Schoener G (2002) Motor control strategies revealed in the structure of motor variability. Exerc Sport Sci Rev 30(1):26–31
  55. Lee D, Ott C (2011) Incremental kinesthetic teaching of motion primitives using the motion refinement tube. Auton Robots 31(2):115–131
  56. Lee SH, Suh IH, Calinon S, Johansson R (2015) Autonomous framework for segmenting robot trajectories of manipulation task. Auton Robots 38(2):107–141
  57. Levine S, Wagener N, Abbeel P (2015) Learning contact-rich manipulation skills with guided policy search. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 156–163
  58. Lober R, Padois V, Sigaud O (2014) Multiple task optimization using dynamical movement primitives for whole-body reactive control. In: Proceedings of IEEE international conference on humanoid robots (humanoids). Madrid, Spain
  59. MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, pp 281–297
  60. Matsubara T, Hyon SH, Morimoto J (2011) Learning parametric dynamic movement primitives from multiple demonstrations. Neural Netw 24(5):493–500
  61. McLachlan GJ, Peel D, Bean RW (2003) Modelling high-dimensional data by mixtures of factor analyzers. Comput Stat Data Anal 41(3–4):379–388
  62. McNicholas PD, Murphy TB (2008) Parsimonious Gaussian mixture models. Stat Comput 18(3):285–296
  63. Medina JR, Lee D, Hirche S (2012) Risk-sensitive optimal feedback control for haptic assistance. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 1025–1031
  64. Miller S, Fritz M, Darrell T, Abbeel P (2011) Parametrized shape models for clothing. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 4861–4868
  65. Moldovan TM, Levine S, Jordan MI, Abbeel P (2015) Optimism-driven exploration for nonlinear systems. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 3239–3246. Seattle, WA, USA
  66. Mühlig M, Gienger M, Steil J (2012) Interactive imitation learning of object movement skills. Auton Robots 32(2):97–114
  67. Mussa-Ivaldi FA (1992) From basis functions to basis fields: vector field approximation from sparse data. Biol Cybern 67(6):479–489
  68. Neal RM, Hinton GE (1999) A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan MI (ed) Learning in graphical models. MIT Press, Cambridge, pp 355–368
  69. Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Dietterich T, Becker S, Ghahramani Z (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 849–856
  70. Nguyen-Tuong D, Peters J (2008) Local Gaussian process regression for real-time model-based robot control. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 380–385
  71. Niekum S, Osentoski S, Konidaris G, Chitta S, Marthi B, Barto AG (2015) Learning grounded finite-state representations from unstructured demonstrations. Int J Robot Res 34(2):131–157
  72. Paraschos A, Daniel C, Peters J, Neumann G (2013) Probabilistic movement primitives. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems (NIPS). Curran Associates, Red Hook, pp 2616–2624
  73. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–285
  74. Rasmussen CE (2000) The infinite Gaussian mixture model. In: Solla SA, Leen TK, Mueller K-R (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 554–560
  75. Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
  76. Renard N, Bourennane S, Blanc-Talon J (2008) Denoising and dimensionality reduction using multilinear tools for hyperspectral images. IEEE Geosci Remote Sens Lett 5(2):138–142
  77. Rueckert E, Mundo J, Paraschos A, Peters J, Neumann G (2015) Extracting low-dimensional control variables for movement primitives. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 1511–1518. Seattle, WA, USA
  78. Saveriano M, An S, Lee D (2015) Incremental kinesthetic teaching of end-effector and null-space motion primitives. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 3570–3575
  79. Schaal S, Atkeson CG (1998) Constructive incremental learning from only local information. Neural Comput 10(8):2047–2084
  80. Schaal S, Mohajerian P, Ijspeert AJ (2007) Dynamics systems vs. optimal control: a unifying view. Prog Brain Res 165:425–445
  81. Scholz JP, Schoener G (1999) The uncontrolled manifold concept: identifying control variables for a functional task. Exp Brain Res 126(3):289–306
  82. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
  83. Scott DW, Szewczyk WF (2001) From kernels to mixtures. Technometrics 43(3):323–335
  84. Shi T, Belkin M, Yu B (2009) Data spectroscopy: eigenspace of convolution operators and clustering. Ann Stat 37(6B):3960–3984
  85. Signoretto M, Van de Plas R, De Moor B, Suykens JAK (2011) Tensor versus matrix completion: a comparison with application to spectral data. IEEE Signal Process Lett 18(7):403–406
  86. Sternad D, Park SW, Mueller H, Hogan N (2010) Coordinate dependence of variability analysis. PLoS Comput Biol 6(4):1–16
  87. Strang G (1986) Introduction to applied mathematics. Wellesley-Cambridge Press, Wellesley
  88. Stulp F, Sigaud O (2015) Many regression algorithms, one unified model—a review. Neural Netw 69:60–79
  89. Sugiura K, Iwahashi N, Kashioka H, Nakamura S (2011) Learning, generation, and recognition of motions by reference-point-dependent probabilistic models. Adv Robot 25(6–7):825–848
  90. Sung HG (2004) Gaussian mixture regression and classification. PhD thesis, Rice University, Houston, Texas
  91. Tang J, Singh A, Goehausen N, Abbeel P (2010) Parameterized maneuver learning for autonomous helicopter flight. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 1142–1148
  92. Tang Y, Salakhutdinov R, Hinton G (2012) Deep mixtures of factor analysers. In: Proceedings of international conference on machine learning (ICML). Edinburgh, Scotland
  93. Tipping ME, Bishop CM (1999) Mixtures of probabilistic principal component analyzers. Neural Comput 11(2):443–482
  94. Todorov E, Jordan MI (2002) Optimal feedback control as a theory of motor coordination. Nat Neurosci 5:1226–1235
  95. Tokuda K, Masuko T, Yamada T, Kobayashi T, Imai S (1995) An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features. In: Proceedings of European conference on speech communication and technology (EUROSPEECH), pp 757–760
  96. Towell C, Howard M, Vijayakumar S (2010) Learning nullspace policies. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 241–248
  97. Ude A, Gams A, Asfour T, Morimoto J (2010) Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Trans Robot 26(5):800–815
  98. Vasilescu MAO, Terzopoulos D (2002) Multilinear analysis of image ensembles: TensorFaces. In: Computer vision (ECCV), Lecture Notes in Computer Science, vol 2350. Springer, Berlin, pp 447–460
  99. Verbeek JJ, Vlassis N, Kroese B (2003) Efficient greedy learning of Gaussian mixture models. Neural Comput 15(2):469–485
  100. Vijayakumar S, D’souza A, Schaal S (2005) Incremental online learning in high dimensions. Neural Comput 17(12):2602–2634
  101. Wang Y, Zhu J (2015) DP-space: Bayesian nonparametric subspace clustering with small-variance asymptotics. In: Proceedings of international conference on machine learning (ICML), pp 1–9. Lille, France
  102. Wilson AD, Bobick AF (1999) Parametric hidden Markov models for gesture recognition. IEEE Trans Pattern Anal Mach Intell 21(9):884–900
  103. Wolpert DM, Diedrichsen J, Flanagan JR (2011) Principles of sensorimotor learning. Nat Rev 12:739–751
  104. Wrede S, Emmerich C, Ricarda R, Nordmann A, Swadzba A, Steil JJ (2013) A user study on kinesthetic teaching of redundant robots in task and configuration space. J Hum Robot Interact 2:56–81
  105. Yamazaki T, Niwase N, Yamagishi J, Kobayashi T (2005) Human walking motion synthesis based on multiple regression hidden semi-Markov model. In: Proceedings of international conference on cyberworlds, pp 445–452
  106. Zen H, Tokuda K, Kitamura T (2007) Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Comput Speech Lang 21(1):153–173
  107. Zhao Q, Zhou G, Adali T, Zhang L, Cichocki A (2013) Kernelization of tensor-based models for multiway data analysis: processing of multidimensional structured data. IEEE Signal Process Mag 30(4):137–148


Author information



Corresponding author

Correspondence to Sylvain Calinon.

Additional information

This work was in part supported by the DexROV Project through the EC Horizon 2020 programme (Grant #635491).


Appendix 1: Expectation-maximization for TP-GMM parameters estimation

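To estimate the parameters of a TP-GMM, the following two steps are repeated until convergence: an E-step computing the weights \(h_{t,i}\), followed by an M-step updating the model parameters \(\pi _i\), \({\varvec{\mu }}^{(j)}_i\) and \({\varvec{{\varSigma }}}^{(j)}_i\).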


$$\begin{aligned} h_{t,i} = \frac{\pi _i \prod \nolimits _{j=1}^P \mathcal {N}\Big ({\varvec{X}}^{(j)}_t \Big |\; {\varvec{\mu }}^{(j)}_i,{\varvec{{\varSigma }}}^{(j)}_i \Big )}{\sum _{k=1}^K \pi _k \prod \nolimits _{j=1}^P \mathcal {N}\Big ({\varvec{X}}^{(j)}_t \Big |\; {\varvec{\mu }}^{(j)}_k,{\varvec{{\varSigma }}}^{(j)}_k \Big )}. \end{aligned}$$


$$\begin{aligned}&\pi _i \leftarrow \frac{\sum _{t=1}^N h_{t,i}}{N}, \end{aligned}$$
$$\begin{aligned}&{\varvec{\mu }}^{(j)}_i \leftarrow \frac{ \sum _{t=1}^N h_{t,i}\; {\varvec{X}}^{(j)}_t }{\sum _{t=1}^N h_{t,i}},\end{aligned}$$
$$\begin{aligned}&{\varvec{{\varSigma }}}^{(j)}_i \leftarrow \frac{\sum _{t=1}^N h_{t,i}\; \Big ({\varvec{X}}^{(j)}_t-{\varvec{\mu }}^{(j)}_i\Big ) \Big ({\varvec{X}}^{(j)}_t-{\varvec{\mu }}^{(j)}_i \Big )^{\scriptscriptstyle \top }}{\sum _{t=1}^N h_{t,i}}. \end{aligned}$$
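One iteration of the two steps above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the Matlab code accompanying the article. The array layout (`X` of shape `(N, P, D)`, parameters indexed as `[i, j]`), the function names, the log-domain evaluation of the E-step and the small regularization term `reg` are assumptions made for this sketch.

```python
import numpy as np

def gauss_logpdf(X, mu, sigma):
    """Log-density of N(mu, sigma) evaluated at each row of X, shape (N, D)."""
    D = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = np.einsum('nd,nd->n', diff @ np.linalg.inv(sigma), diff)
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + maha)

def tpgmm_em_step(X, pi, mu, sigma, reg=1e-8):
    """One EM iteration. X: (N, P, D), pi: (K,), mu: (K, P, D), sigma: (K, P, D, D)."""
    N, P, D = X.shape
    K = pi.shape[0]
    # E-step: h[t, i] is proportional to pi_i * prod_j N(X_t^(j) | mu_i^(j), Sigma_i^(j)),
    # accumulated in the log domain for numerical stability (an implementation choice).
    log_h = np.tile(np.log(pi), (N, 1))
    for i in range(K):
        for j in range(P):
            log_h[:, i] += gauss_logpdf(X[:, j], mu[i, j], sigma[i, j])
    log_h -= log_h.max(axis=1, keepdims=True)
    h = np.exp(log_h)
    h /= h.sum(axis=1, keepdims=True)
    # M-step: weighted priors, means and covariances for each frame j.
    Nk = h.sum(axis=0)
    pi_new = Nk / N
    mu_new = np.einsum('nk,npd->kpd', h, X) / Nk[:, None, None]
    sigma_new = np.empty_like(sigma)
    for i in range(K):
        for j in range(P):
            diff = X[:, j] - mu_new[i, j]
            # reg * I is an assumed safeguard against singular covariances.
            sigma_new[i, j] = (h[:, i, None] * diff).T @ diff / Nk[i] + reg * np.eye(D)
    return pi_new, mu_new, sigma_new, h
```

Because the product over frames appears inside \(h_{t,i}\), a datapoint is assigned to a component only if it is well explained in all \(P\) frames simultaneously, whereas the M-step then updates each frame independently.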

In practice, it is recommended to start EM from a coarse estimate of the parameters. This estimate can, for example, be based on an equal split in time of motion segments, on a geometric segmentation with k-means [59], on moments or spectral approaches with circular covariances [42, 53, 84], or on an iterative clustering algorithm [83].
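As an illustration of the simplest of these options, the sketch below initializes the parameters from an equal split in time. It is written in Python with NumPy and assumes a single time-ordered demonstration stored as an array `X` of shape `(N, P, D)`; the function name `init_equal_time_split` and the regularization term `reg` are hypothetical choices for this example (with several demonstrations, the split would have to be applied to each demonstration before pooling the bins).

```python
import numpy as np

def init_equal_time_split(X, K, reg=1e-6):
    """Coarse TP-GMM initialization from K contiguous time bins.

    X: (N, P, D), rows ordered in time (assumed single demonstration).
    Returns pi (K,), mu (K, P, D), sigma (K, P, D, D).
    """
    N, P, D = X.shape
    # Split the time indices into K contiguous bins of near-equal size.
    bins = np.array_split(np.arange(N), K)
    pi = np.array([len(b) for b in bins]) / N
    mu = np.stack([X[b].mean(axis=0) for b in bins])
    sigma = np.empty((K, P, D, D))
    for i, b in enumerate(bins):
        for j in range(P):
            # Sample covariance of bin i seen from frame j, slightly regularized.
            sigma[i, j] = np.cov(X[b, j].T, bias=True) + reg * np.eye(D)
    return pi, mu, sigma

# Toy usage with arbitrary (assumed) sizes.
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((90, 2, 3))
pi0, mu0, sigma0 = init_equal_time_split(X_demo, K=3)
```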

Model selection (i.e., determining the number of Gaussians in the GMM) is compatible with the techniques employed in standard GMM, such as the use of a Bayesian information criterion [82], Dirichlet processes [22, 50, 65, 74], iterative pairwise replacement [83], spectral clustering [53, 69, 84] or segmentation points [56]. Model selection in mixture modeling shares a core challenge with data-driven sparse kernel regression techniques, which require finding the right bandwidth parameters to select a subset of existing/new datapoints that are the most representative of the dataset.

Appendix 2: Expectation-maximization for TP-MFA and TP-MPPCA parameters estimation

In TP-MFA, the generative model for the jth frame and ith mixture component assumes that a D-dimensional random vector \({\varvec{X}}^{(j)}\) is modeled using a d-dimensional vector of latent (unobserved) factors \({\varvec{z}}^{(j)}\)

$$\begin{aligned} {\varvec{X}}^{(j)} = {\varvec{{\varLambda }}}^{(j)}_i {\varvec{z}}^{(j)} + {\varvec{\mu }}^{(j)}_i + {\varvec{\epsilon }}^{(j)}_i, \end{aligned}$$

where \({\varvec{\mu }}^{(j)}_i\in \mathbb {R}^D\) is the mean vector of the ith factor analyzer, \({\varvec{z}}^{(j)}\sim \mathcal {N}({\varvec{0}},{\varvec{I}})\) (the factors are assumed to be distributed according to a zero-mean normal with unit variance), and \({\varvec{\epsilon }}^{(j)}_i\sim \mathcal {N}({\varvec{0}},{\varvec{{\varPsi }}}^{(j)}_i)\) is a centered normal noise with diagonal covariance \({\varvec{{\varPsi }}}^{(j)}_i\).

This diagonality is a key assumption in factor analysis. Namely, the observed variables are independent given the factors, and the goal of TP-MFA is to best model the covariance structure of \({\varvec{X}}^{(j)}\). It follows from this model that the marginal distribution of \({\varvec{X}}^{(j)}\) for the ith component is

$$\begin{aligned} {\varvec{X}}^{(j)} \;\sim \; \mathcal {N}\left( {\varvec{\mu }}^{(j)}_i,\; {\varvec{{\varLambda }}}^{(j)}_i {{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }+ {\varvec{{\varPsi }}}^{(j)}_i \right) , \end{aligned}$$

and the joint distribution of \({\varvec{X}}^{(j)}\) and \({\varvec{z}}^{(j)}\) is

$$\begin{aligned} \left[ \begin{matrix} {\varvec{X}}^{(j)} \\ {\varvec{z}}^{(j)} \end{matrix}\right] \;\sim \; \mathcal {N}\left( \left[ \begin{matrix} {\varvec{\mu }}^{(j)}_i \\ {\varvec{0}} \end{matrix}\right] ,\; \left[ \begin{matrix} {\varvec{{\varLambda }}}^{(j)}_i {{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }+ {\varvec{{\varPsi }}}^{(j)}_i &{} \;{\varvec{{\varLambda }}}^{(j)}_i \\ {{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }&{} \;{\varvec{I}} \end{matrix}\right] \right) . \end{aligned}$$

The above can be used to show that the d factors are informative projections of the data, which can be computed by Gaussian conditioning, corresponding to the affine projection

$$\begin{aligned} \mathbb {E}\Big ({\varvec{z}}^{(j)} \Big | {\varvec{X}}^{(j)}\Big ) \;=\; \overbrace{{{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }{\Big ({\varvec{{\varLambda }}}^{(j)}_i {{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }+ {\varvec{{\varPsi }}}^{(j)}_i\Big )}^{-1}}^{{\varvec{B}}^{(j)}_i}\; \Big ({\varvec{X}}^{(j)}-{\varvec{\mu }}^{(j)}_i\Big ). \end{aligned}$$
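Applying Gaussian conditioning to the joint distribution of \({\varvec{X}}^{(j)}\) and \({\varvec{z}}^{(j)}\) gives a posterior mean of the factors equal to \({\varvec{B}}^{(j)}_i\) applied to the centered observation. The sketch below (Python with NumPy, one frame and one component, indices dropped) computes this projection; the function name `factor_projection` and the toy dimensions are assumptions made for illustration.

```python
import numpy as np

def factor_projection(X, mu, Lambda, Psi):
    """Posterior mean of the factors for observations X (rows), one factor analyzer.

    Lambda: (D, d) loading matrix, Psi: (D, D) diagonal noise covariance.
    Returns the projections (N, d) and the matrix B (d, D).
    """
    # B = Lambda^T (Lambda Lambda^T + Psi)^{-1}
    B = Lambda.T @ np.linalg.inv(Lambda @ Lambda.T + Psi)
    # Affine projection of each centered datapoint onto the d factors.
    return (X - mu) @ B.T, B

# Toy usage with arbitrary (assumed) sizes D = 4, d = 2.
rng = np.random.default_rng(0)
Lambda = rng.standard_normal((4, 2))
Psi = np.diag(rng.uniform(0.5, 1.5, size=4))
mu = rng.standard_normal(4)
X_obs = rng.standard_normal((10, 4))
Z_mean, B = factor_projection(X_obs, mu, Lambda, Psi)
```

When \(d \ll D\), the same matrix can be obtained more cheaply through the matrix inversion lemma as \(({\varvec{I}} + {\varvec{{\varLambda }}}^{\scriptscriptstyle \top }{\varvec{{\varPsi }}}^{-1}{\varvec{{\varLambda }}})^{-1}{\varvec{{\varLambda }}}^{\scriptscriptstyle \top }{\varvec{{\varPsi }}}^{-1}\), which only requires inverting a \(d\times d\) matrix and a diagonal one.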

As highlighted by [32], the same process can be used to estimate the second moment of the factors \(\mathbb {E}\Big ({\varvec{z}}^{(j)}{{\varvec{z}}^{(j)}}^{\scriptscriptstyle \top }|{\varvec{X}}^{(j)}\Big )\), which provides a measure of uncertainty in the factors that has no analogue in PCA. This relation can be exploited to derive an EM algorithm (see for example [32] or [62]) to train a TP-MFA model of K components with parameters \(\big \{\pi _i,\{{\varvec{\mu }}^{(j)}_i,{\varvec{{\varLambda }}}^{(j)}_i,{\varvec{{\varPsi }}}^{(j)}_i\}_{j=1}^P\big \}_{i=1}^K\).

The following two steps are repeated until convergence: an E-step computing the weights \(h_{t,i}\), followed by an M-step updating the model parameters.


$$\begin{aligned} h_{t,i} = \frac{\pi _i \prod \nolimits _{j=1}^P \mathcal {N}\Big ({\varvec{X}}^{(j)}_t \Big | \;{\varvec{\mu }}^{(j)}_i,\; {\varvec{{\varLambda }}}^{(j)}_i {{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }+ {\varvec{{\varPsi }}}^{(j)}_i \Big )}{\sum _{k=1}^K \pi _k \prod \nolimits _{j=1}^P \mathcal {N}\Big ({\varvec{X}}^{(j)}_t \Big | \;{\varvec{\mu }}^{(j)}_k,\; {\varvec{{\varLambda }}}^{(j)}_k {{\varvec{{\varLambda }}}^{(j)}_k}^{\scriptscriptstyle \top }+ {\varvec{{\varPsi }}}^{(j)}_k \Big )}. \end{aligned}$$


$$\begin{aligned} \pi _i&\leftarrow \frac{\sum _{t=1}^N h_{t,i} }{N},\end{aligned}$$
$$\begin{aligned} {\varvec{\mu }}^{(j)}_i&\leftarrow \frac{\sum _{t=1}^N h_{t,i} {\varvec{X}}^{(j)}_t}{\sum _{t=1}^N h_{t,i} },\end{aligned}$$
$$\begin{aligned} {\varvec{{\varLambda }}}^{(j)}_i&\leftarrow {\varvec{S}}^{(j)}_i {{\varvec{B}}^{(j)}_i}^{\scriptscriptstyle \top }{\Big ({\varvec{I}}-{\varvec{B}}^{(j)}_i {\varvec{{\varLambda }}}^{(j)}_i + {\varvec{B}}^{(j)}_i {\varvec{S}}^{(j)}_i {{\varvec{B}}^{(j)}_i}^{\scriptscriptstyle \top }\Big )}^{-1}, \end{aligned}$$
$$\begin{aligned} {\varvec{{\varPsi }}}^{(j)}_i&\leftarrow {{\mathrm {diag}}}\Big ({{\mathrm {diag}}}\big ( {\varvec{S}}^{(j)}_i - {\varvec{{\varLambda }}}^{(j)}_i {\varvec{B}}^{(j)}_i {\varvec{S}}^{(j)}_i \big )\Big ), \end{aligned}$$

computed with the help of the intermediary variables

$$\begin{aligned} {\varvec{S}}^{(j)}_i&= \frac{\sum _{t=1}^N h_{t,i} \Big ({\varvec{X}}^{(j)}_t-{\varvec{\mu }}^{(j)}_i\Big ) {\Big ({\varvec{X}}^{(j)}_t-{\varvec{\mu }}^{(j)}_i\Big )}^{\scriptscriptstyle \top }}{\sum _{t=1}^N h_{t,i} }, \end{aligned}$$
$$\begin{aligned} {\varvec{B}}^{(j)}_i&= {{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }{\Big ({\varvec{{\varLambda }}}^{(j)}_i {{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }+ {\varvec{{\varPsi }}}^{(j)}_i\Big )}^{-1}. \end{aligned}$$

Alternatively, an update step simultaneously computing \({\varvec{\mu }}^{(j)}_i\) and \({\varvec{{\varLambda }}}^{(j)}_i\) can be derived, see [32] for details.
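As an illustration of the \({\varvec{{\varLambda }}}\) and \({\varvec{{\varPsi }}}\) updates, the following minimal sketch treats one component in one frame, assuming two-dimensional data and a single latent factor so that the matrix inverse in the \({\varvec{{\varLambda }}}\) update reduces to a scalar division. The weighted covariance \({\varvec{S}}\) is taken as given, and all names are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def model_cov(lam, psi):
    """Lambda Lambda^T + Psi for a 2x1 loading vector and a diagonal Psi."""
    return [[lam[0] * lam[0] + psi[0], lam[0] * lam[1]],
            [lam[1] * lam[0], lam[1] * lam[1] + psi[1]]]


def mfa_m_step(S, lam, psi):
    """One Lambda/Psi update for a 2-D, single-factor component (sketch)."""
    Ci = inv2(model_cov(lam, psi))
    # B = Lambda^T (Lambda Lambda^T + Psi)^-1 is a 1x2 row vector here
    B = [lam[0] * Ci[0][0] + lam[1] * Ci[1][0],
         lam[0] * Ci[0][1] + lam[1] * Ci[1][1]]
    # S B^T; since S is symmetric, this also holds the entries of B S
    SB = [S[0][0] * B[0] + S[0][1] * B[1],
          S[1][0] * B[0] + S[1][1] * B[1]]
    # I - B Lambda + B S B^T is a scalar when there is a single factor
    denom = 1.0 - (B[0] * lam[0] + B[1] * lam[1]) + (B[0] * SB[0] + B[1] * SB[1])
    lam_new = [SB[0] / denom, SB[1] / denom]
    # Psi keeps only the diagonal of S - Lambda_new B S
    psi_new = [S[0][0] - lam_new[0] * SB[0], S[1][1] - lam_new[1] * SB[1]]
    return lam_new, psi_new


def avg_log_likelihood(S, lam, psi):
    """Average log-likelihood of centred 2-D data with covariance S."""
    C = model_cov(lam, psi)
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    Ci = inv2(C)
    trace = sum(Ci[r][c] * S[c][r] for r in range(2) for c in range(2))
    return -0.5 * (math.log(det) + trace) - math.log(2.0 * math.pi)
```

Calling `mfa_m_step` repeatedly with a fixed `S` should never decrease `avg_log_likelihood`, which gives a simple check when porting the update to higher dimensions.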

Similarly, the M-step in TP-MPPCA is given by

$$\begin{aligned} {\varvec{\tilde{\varLambda }}}^{(j)}_i&\leftarrow {\varvec{S}}^{(j)}_i {\varvec{{\varLambda }}}^{(j)}_i {\Big ({\varvec{I}}{\sigma ^{(j)}_i}^2 + {{\varvec{M}}^{(j)}_i}^{-1} {{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }{\varvec{S}}^{(j)}_i {\varvec{{\varLambda }}}^{(j)}_i\Big )}^{-1}, \end{aligned}$$
$$\begin{aligned} {\varvec{{\varPsi }}}^{(j)}_i&\leftarrow {\varvec{I}} {\sigma ^{(j)}_i}^2, \end{aligned}$$

computed with the help of the intermediary variables

$$\begin{aligned} {\varvec{S}}^{(j)}_i&= \frac{\sum _{t=1}^N h_{t,i} \Big ({\varvec{\xi }}^{(j)}_t - {\varvec{\mu }}^{(j)}_i\Big ) {\Big ({\varvec{\xi }}^{(j)}_t - {\varvec{\mu }}^{(j)}_i\Big )}^{\scriptscriptstyle \top }}{\sum _{t=1}^N h_{t,i} }, \end{aligned}$$
$$\begin{aligned} {\varvec{M}}^{(j)}_i&= {{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }{\varvec{{\varLambda }}}^{(j)}_i + {\varvec{I}} {\sigma ^{(j)}_i}^2, \end{aligned}$$
$$\begin{aligned} {\sigma ^{(j)}_i}^2&= \frac{1}{D} {{\mathrm {tr}}}\Big ({\varvec{S}}^{(j)}_i - {\varvec{S}}^{(j)}_i {\varvec{{\varLambda }}}^{(j)}_i {{\varvec{M}}^{(j)}_i}^{-1} {{\varvec{\tilde{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }\Big ), \end{aligned}$$

where \({\varvec{{\varLambda }}}^{(j)}_i\) is replaced by \({\varvec{\tilde{\varLambda }}}^{(j)}_i\) at each iteration, see [93] for details.
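The TP-MPPCA update can be sketched in the same spirit, again for one component in one frame and assuming a single latent factor, in which case \({\varvec{M}}^{(j)}_i\) and the inverted term are both scalars. The names below are hypothetical, and \({\varvec{S}}\) is taken as given.

```python
def mppca_m_step(S, lam, sigma2):
    """One Lambda/sigma^2 update with a single latent factor (sketch).

    S      : D x D weighted covariance, as nested lists
    lam    : current loading vector Lambda (length D)
    sigma2 : current isotropic noise variance
    """
    D = len(S)
    # S Lambda, a D-dimensional vector
    S_lam = [sum(S[r][c] * lam[c] for c in range(D)) for r in range(D)]
    # Lambda^T S Lambda and M = Lambda^T Lambda + sigma^2 are scalars here
    lam_S_lam = sum(lam[r] * S_lam[r] for r in range(D))
    M = sum(v * v for v in lam) + sigma2
    # Lambda_tilde = S Lambda (sigma^2 + M^-1 Lambda^T S Lambda)^-1
    lam_new = [v / (sigma2 + lam_S_lam / M) for v in S_lam]
    # sigma^2 = tr(S - S Lambda M^-1 Lambda_tilde^T) / D
    trace_S = sum(S[r][r] for r in range(D))
    cross = sum(S_lam[r] * lam_new[r] for r in range(D)) / M
    sigma2_new = (trace_S - cross) / D
    return lam_new, sigma2_new
```

With \({\varvec{S}}={\mathrm {diag}}(3,1)\), iterating this update from a generic starting point should approach the loading vector \([\sqrt{2},0]^{\scriptscriptstyle \top }\) (up to sign) and \(\sigma ^2=1\), i.e. the leading eigenvector scaled by the variance in excess of the noise level.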

Appendix 3: Gaussian mixture regression approximated by a single normal distribution

Let us consider a datapoint \({\varvec{\xi }}_t\) distributed as in Eq. (6), with \(\mathcal {P}({\varvec{\xi }}_t) = \mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t,{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t)\) being the joint distribution describing the data. The conditional probability of an output given an input is

$$\begin{aligned} \mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) = \frac{\mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t,{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t)}{\mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)} = \frac{\sum _{i=1}^K \mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t,{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | z_i) \mathcal {P}(z_i)}{\mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)}, \end{aligned}$$

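where \(z_i\) represents the \(i\)th component of the GMM. Factorizing each joint term as \(\mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t,{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | z_i) = \mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t|{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t,z_i)\, \mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t | z_i)\) gives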

$$\begin{aligned} {\mathcal {P}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)&= \sum _{i=1}^K {\mathcal {P}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t|{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t,z_i) \frac{{\mathcal {P}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t | z_i) {\mathcal {P}}(z_i)}{{\mathcal {P}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)} \nonumber \\&= \sum _{i=1}^K h_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)\; \mathcal {N}\left( {\varvec{\hat{\mu }}}^{\scriptscriptstyle {{\mathcal {O}}}}_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t),{\varvec{\hat{{\varSigma }}}}^{\scriptscriptstyle {{\mathcal {O}}}}_i\right) . \end{aligned}$$

The conditional mean can be computed as

$$\begin{aligned} {\varvec{\hat{\mu }}}{}^{\scriptscriptstyle {{\mathcal {O}}}}_t = \mathbb {E}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)&= \int {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t \; \mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) \; d{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t \nonumber \\&= \int {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t \sum _{i=1}^K h_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)\; \mathcal {N}\left( {\varvec{\hat{\mu }}}^{\scriptscriptstyle {{\mathcal {O}}}}_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t),{\varvec{\hat{{\varSigma }}}}^{\scriptscriptstyle {{\mathcal {O}}}}_i\right) \; d{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t \nonumber \\&= \sum _{i=1}^K h_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) \; {\varvec{\hat{\mu }}}_i^{\scriptscriptstyle {{\mathcal {O}}}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t). \end{aligned}$$

In order to evaluate the covariance, we calculate

$$\begin{aligned} {{\mathrm {cov}}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) = \mathbb {E}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t {{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t}^{\scriptscriptstyle \top }| {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t ) - \mathbb {E}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) \mathbb {E}({{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t}^{\scriptscriptstyle \top }| {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t). \end{aligned}$$

We have that

$$\begin{aligned} \mathbb {E}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t {{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t}^{\scriptscriptstyle \top }| {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)&= \int {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t {{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t}^{\scriptscriptstyle \top }\mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) \; d{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t \nonumber \\&= \int \sum _{i=1}^K h_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) \; {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t {{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t}^{\scriptscriptstyle \top }\mathcal {N}\left( {\varvec{\hat{\mu }}}^{\scriptscriptstyle {{\mathcal {O}}}}_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t),{\varvec{\hat{{\varSigma }}}}^{\scriptscriptstyle {{\mathcal {O}}}}_i\right) d{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t \nonumber \\&= \sum _{i=1}^K h_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) \int {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t {{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t}^{\scriptscriptstyle \top }\mathcal {N}\left( {\varvec{\hat{\mu }}}^{\scriptscriptstyle {{\mathcal {O}}}}_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t),{\varvec{\hat{{\varSigma }}}}^{\scriptscriptstyle {{\mathcal {O}}}}_i\right) d{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t. \end{aligned}$$

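Applying Eq. (72) to each Gaussian component, whose second moment is \({\varvec{\hat{{\varSigma }}}}_i^{\scriptscriptstyle {{\mathcal {O}}}} + {\varvec{\hat{\mu }}}_i^{\scriptscriptstyle {{\mathcal {O}}}} {{\varvec{\hat{\mu }}}_i^{\scriptscriptstyle {{\mathcal {O}}}}}^{\scriptscriptstyle \top }\), we obtain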

$$\begin{aligned} \mathbb {E}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t {{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t}^{\scriptscriptstyle \top }| {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)= & {} \sum _{i=1}^K h_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) {\varvec{\hat{{\varSigma }}}}_i^{\scriptscriptstyle {{\mathcal {O}}}}\nonumber \\&+ \sum _{i=1}^K h_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) \; {\varvec{\hat{\mu }}}_i^{\scriptscriptstyle {{\mathcal {O}}}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) {\big ({\varvec{\hat{\mu }}}_i^{\scriptscriptstyle {{\mathcal {O}}}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)\big )}^{\scriptscriptstyle \top }. \end{aligned}$$

Combining Eqs. (72) and (74), we finally obtain (see also [90])

$$\begin{aligned} {\varvec{\hat{\varSigma }}}{}^{\scriptscriptstyle {{\mathcal {O}}}}_t= & {} {{\mathrm {cov}}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t | {\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) = \sum _{i=1}^K h_i({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) \Big ( {\varvec{\hat{{\varSigma }}}}_i^{\scriptscriptstyle {{\mathcal {O}}}} + {\varvec{\hat{\mu }}}_i^{\scriptscriptstyle {{\mathcal {O}}}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t) \; {{\varvec{\hat{\mu }}}_i^{\scriptscriptstyle {{\mathcal {O}}}}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)}^{\scriptscriptstyle \top }\Big )\nonumber \\&-\,{\varvec{\hat{\mu }}}^{\scriptscriptstyle {{\mathcal {O}}}}_t {{\varvec{\hat{\mu }}}^{\scriptscriptstyle {{\mathcal {O}}}}_t}^{\scriptscriptstyle \top }. \end{aligned}$$
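The moment-matching result above can be checked numerically with a minimal sketch, assuming a scalar input and a scalar output. The per-component conditional mean and variance use the standard Gaussian conditioning formulas, which are assumed here to correspond to \({\varvec{\hat{\mu }}}_i^{\scriptscriptstyle {{\mathcal {O}}}}\) and \({\varvec{\hat{{\varSigma }}}}_i^{\scriptscriptstyle {{\mathcal {O}}}}\). Names are hypothetical.

```python
import math


def gmr_1d(x_in, priors, means, covs):
    """Single-Gaussian approximation of GMR for scalar input/output (sketch).

    means[i] = (mu_I, mu_O)        : input and output centre of component i
    covs[i]  = (s_II, s_IO, s_OO)  : entries of its 2x2 covariance
    Returns the conditional mean and variance of the output given x_in.
    """
    weights, cond_means, cond_vars = [], [], []
    for pi_i, (mu_i, mu_o), (s_ii, s_io, s_oo) in zip(priors, means, covs):
        # Marginal likelihood of the input under component i
        marginal = math.exp(-0.5 * (x_in - mu_i) ** 2 / s_ii) / math.sqrt(2.0 * math.pi * s_ii)
        weights.append(pi_i * marginal)
        # Standard Gaussian conditioning for this component
        cond_means.append(mu_o + s_io / s_ii * (x_in - mu_i))
        cond_vars.append(s_oo - s_io * s_io / s_ii)
    total = sum(weights)
    h = [w / total for w in weights]  # activation weights h_i
    mean = sum(h_i * m for h_i, m in zip(h, cond_means))
    # Second moment of the mixture, then subtract the squared mean
    second = sum(h_i * (v + m * m) for h_i, m, v in zip(h, cond_means, cond_vars))
    return mean, second - mean * mean
```

For two equally activated components with conditional means 0 and 2 and unit conditional variances, the result has mean 1 and variance 2: the spread between the component means adds to the within-component variance.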

Appendix 4: Expectation-maximization for parametric GMM parameters estimation

The following two steps, an E-step computing the responsibilities \(h_{t,i}\) and an M-step updating the model parameters, are repeated until convergence, see [102] for details:


$$\begin{aligned} h_{t,i} = \frac{\pi _i \mathcal {N}\big ({\varvec{\xi }}_t \big |\; {\varvec{\mu }}_{t,i},{\varvec{{\varSigma }}}_i \big )}{\sum _{k=1}^K \pi _k \mathcal {N}\big ({\varvec{\xi }}_t \big |\; {\varvec{\mu }}_{t,k},{\varvec{{\varSigma }}}_k \big )}. \end{aligned}$$


$$\begin{aligned}&\pi _i \leftarrow \frac{\sum _{t=1}^N h_{t,i}}{N}, \end{aligned}$$
$$\begin{aligned}&{\varvec{Z}}_i \leftarrow \left( \sum _{t=1}^N h_{t,i} \; {\varvec{\xi }}_t \big [{\varvec{Q}}_t^{\scriptscriptstyle \top },\; 1\big ] \right) \left( \sum _{t=1}^N h_{t,i} {\big [{\varvec{Q}}_t^{\scriptscriptstyle \top },\; 1\big ]}^{\scriptscriptstyle \top }\big [{\varvec{Q}}_t^{\scriptscriptstyle \top },\; 1\big ] \right) ^{-1} ,\end{aligned}$$
$$\begin{aligned}&{\varvec{{\varSigma }}}_i \leftarrow \frac{\sum _{t=1}^N h_{t,i} \; ({\varvec{\xi }}_t-{\varvec{\mu }}_{t,i})({\varvec{\xi }}_t-{\varvec{\mu }}_{t,i} )^{\scriptscriptstyle \top }}{\sum _{t=1}^N h_{t,i}},\end{aligned}$$
$$\begin{aligned}&{{\mathrm {where}}}\quad {\varvec{\mu }}_{t,i}={\varvec{Z}}_i \; {\big [{\varvec{Q}}_t^{\scriptscriptstyle \top },\; 1\big ]}^{\scriptscriptstyle \top }. \end{aligned}$$
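The M-step above amounts to a weighted least-squares fit of \({\varvec{Z}}_i\) followed by a weighted residual covariance. The following minimal sketch assumes scalar datapoints and a scalar task parameter, so that \({\varvec{Z}}_i\) has two entries and the inverted matrix is \(2\times 2\). Names are hypothetical.

```python
def pgmm_m_step(xi, q, h):
    """M-step of one parametric GMM component, scalar case (sketch).

    xi[t] : datapoint at step t
    q[t]  : task parameter at step t
    h[t]  : responsibility of this component for datapoint t
    Returns (Z, sigma, prior), with Z = [slope, offset].
    """
    # A = sum_t h xi [q, 1] and G = sum_t h [q, 1]^T [q, 1]
    a0 = sum(w * x * u for w, x, u in zip(h, xi, q))
    a1 = sum(w * x for w, x in zip(h, xi))
    g00 = sum(w * u * u for w, u in zip(h, q))
    g01 = sum(w * u for w, u in zip(h, q))
    g11 = sum(h)
    det = g00 * g11 - g01 * g01
    # Z = A G^-1, using the closed-form inverse of the symmetric 2x2 matrix G
    z = [(a0 * g11 - a1 * g01) / det, (a1 * g00 - a0 * g01) / det]
    # mu_{t} = Z [q, 1]^T, evaluated with the updated Z
    mu = [z[0] * u + z[1] for u in q]
    sigma = sum(w * (x - m) ** 2 for w, x, m in zip(h, xi, mu)) / g11
    prior = g11 / len(xi)
    return z, sigma, prior
```

When the data follow an exact affine relation to the task parameter, such as `xi = 2 * q + 1`, the fit recovers the slope and offset and the residual variance vanishes.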

Cite this article

Calinon, S. A tutorial on task-parameterized movement learning and retrieval. Intel Serv Robotics 9, 1–29 (2016).

Keywords

  • Probabilistic motion encoding
  • Task-parameterized movements
  • Task-adaptive models
  • Natural motion synthesis