Learning from Demonstration (Programming by Demonstration)
Learning from demonstration (LfD), also called programming by demonstration (PbD), refers to the process used to transfer new skills to a machine by relying on demonstrations from a user. It is inspired by the imitation capability developed by humans and animals to acquire new skills. LfD aims at making programming accessible to novice users by providing them with an intuitive interface they are familiar with, as humans already exchange knowledge in this way.
In robotics, LfD appeared as a way to reprogram a robot without having to rely on a computer language or a complex interface. It instead introduces more intuitive skill transfer interactions with the robot (Billard et al., 2016; Argall et al., 2009). The goal is to provide user-friendly interfaces that do not require knowledge of computer programming or robotics. LfD can be considered at various levels, from the transfer of low-level motor control to the transfer of high-level symbolic reasoning capabilities. For a given skill to be acquired, several learning strategies can be considered, from the copying of the demonstrated actions to more elaborate abstractions, such as the extraction of the underlying objectives of the actions (Coates et al., 2009; Ratliff et al., 2009). The terms behavioral cloning and inverse optimal control can respectively be used to refer to these two broad learning strategies. They have connections with imitation mechanisms studied in ethology, where one can distinguish action-level mimicry from goal-level emulation (see, e.g., Whiten et al. 2009).
Key Research Findings
The key research developments in LfD cover various fields of research. Such developments also include the joint exploitation and organization of these different research aspects.
Kinesthetic teaching, also called direct teaching, refers to the process of moving the robot manually, by haptic interaction, while the robot records the demonstration through its own sensors (proprioception). This is often used for articulated robots. In this case, the user can demonstrate the task directly in the workspace of the robot, which naturally constrains the movement to what the robot can achieve. Compared to observational learning, kinesthetic teaching simplifies the correspondence problem (Nehaniv and Dautenhahn, 2002) (also called motion retargeting or body mapping, see Fig. 1). The drawback is that the user does not execute the task on his/her own and can control only a limited number of articulations simultaneously. In practice, this limitation often restricts kinesthetic teaching to tasks that do not involve highly dynamic movements. At a technical level, kinesthetic teaching is often implemented by having the robot actively compensate for the effect of gravity (typically with torque-controlled robots), sometimes complemented by compensation of inertia, friction, and Coriolis effects.
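As a rough illustration of the gravity-compensation step, the sketch below computes the joint torques that cancel gravity for a planar two-link arm, using the standard rigid-body gravity term. The arm model, the parameter values, and the function name `gravity_torques` are assumptions made for illustration only; a real torque-controlled robot would rely on its own identified dynamic model. Commanding only these torques makes the arm "float", so that the user can guide it by hand while its joint angles are recorded.

```python
import math

def gravity_torques(q1, q2, m1=1.0, m2=1.0, l1=0.5, lc1=0.25, lc2=0.25, g=9.81):
    """Joint torques that cancel gravity for a planar two-link arm (illustrative model).

    q1 is measured from the horizontal axis, q2 relative to the first link,
    and gravity acts along -y. Masses (kg) and lengths (m) are hypothetical;
    lc1 and lc2 are the distances from each joint to its link's center of mass.
    """
    c1 = math.cos(q1)
    c12 = math.cos(q1 + q2)
    tau2 = m2 * lc2 * g * c12                          # load of link 2 on joint 2
    tau1 = (m1 * lc1 + m2 * l1) * g * c1 + tau2        # joint 1 carries both links
    return tau1, tau2

upright = gravity_torques(math.pi / 2, 0.0)    # links pointing straight up: no gravity torque
horizontal = gravity_torques(0.0, 0.0)         # links stretched horizontally: maximal torque
print(upright, horizontal)
```

Friction, inertia, and Coriolis compensation, mentioned above as optional refinements, would be added as further torque terms on top of this one.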
LfD also considers demonstration modalities dedicated to robotics, such as devices used in a teleoperation setting. These include commands from a graphical user interface, joysticks, or more elaborate devices such as exoskeletons. These interfaces either act in a passive way or can be provided with feedback mechanisms. The most sophisticated interfaces exploit recent developments in telepresence to allow users to demonstrate tasks while feeling as if they were executing them on their own.
A key research aspect underlying LfD is the design of compact and adaptive movement representations that can be used for both analysis and synthesis. The term movement primitives is often employed in this context to highlight their modularity. The goal of such an encoding strategy is to represent movements as a set of adaptive building blocks that can be (re)organized in parallel and in series to create more complex behaviors. Often, such a representation also enables the reproduction to take place in situations that differ from those in which the demonstrations were provided. The proposed representations originate from various fields and gather components from statistical learning, computational motor control, computational neuroscience, cognitive sciences, developmental psychology, human movement sciences, or optimal control.
Examples with a link to motor control and dynamical systems include dynamical movement primitives (DMPs) (Ijspeert et al., 2013). They represent movements as a controller that modulates a spring-damper system with nonlinear forcing terms represented as a combination of radial basis functions. Several extensions of DMPs have been proposed to handle coordination and task variations (Calinon et al., 2012; Paraschos et al., 2013), with the aim of providing a probabilistic formulation of DMPs enabling the exploitation of temporal and spatial coordination patterns.
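The DMP formulation described above can be sketched for a single degree of freedom as follows. This is a minimal sketch assuming the common discrete-movement formulation: a critically damped spring-damper pulled toward the goal, a forcing term built from normalized radial basis functions of an exponentially decaying phase variable, and weights fit by locally weighted regression. The class name `DMP1D`, the gains, the basis widths, and the synthetic minimum-jerk demonstration are illustrative choices, not values taken from the cited work.

```python
import numpy as np

class DMP1D:
    """Minimal one-dimensional discrete DMP: spring-damper plus RBF forcing term."""

    def __init__(self, n_basis=20, alpha_z=25.0, beta_z=25.0 / 4, alpha_x=4.0):
        self.alpha_z, self.beta_z, self.alpha_x = alpha_z, beta_z, alpha_x
        # Basis centers evenly spaced in time, expressed in the phase variable x.
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        self.h = n_basis ** 1.5 / self.c          # heuristic widths (an assumption)
        self.w = np.zeros(n_basis)

    def _psi(self, x):
        # Scalar phase -> shape (n_basis,); column of phases (T, 1) -> shape (T, n_basis).
        return np.exp(-self.h * (x - self.c) ** 2)

    def fit(self, y, dt):
        """Learn the forcing-term weights from one demonstrated trajectory y."""
        self.tau = (len(y) - 1) * dt              # movement duration
        self.y0, self.g = y[0], y[-1]
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-self.alpha_x * np.arange(len(y)) * dt / self.tau)
        # Forcing term that would make the spring-damper reproduce the demonstration.
        f_target = self.tau ** 2 * ydd - self.alpha_z * (self.beta_z * (self.g - y) - self.tau * yd)
        s = x * (self.g - self.y0)
        psi = self._psi(x[:, None])               # shape (T, n_basis)
        # Locally weighted regression: one weight per basis function.
        self.w = (psi * (s * f_target)[:, None]).sum(0) / ((psi * (s ** 2)[:, None]).sum(0) + 1e-10)
        return self

    def rollout(self, goal=None, dt=0.001, duration=2.0):
        """Euler-integrate the DMP toward a (possibly new) goal for duration * tau seconds."""
        g = self.g if goal is None else goal
        y, z, x = self.y0, 0.0, 1.0
        traj = [y]
        for _ in range(int(round(duration * self.tau / dt))):
            psi = self._psi(x)
            f = psi @ self.w / (psi.sum() + 1e-10) * x * (g - self.y0)
            z_dot = (self.alpha_z * (self.beta_z * (g - y) - z) + f) / self.tau
            y_dot = z / self.tau
            x_dot = -self.alpha_x * x / self.tau
            y, z, x = y + y_dot * dt, z + z_dot * dt, x + x_dot * dt
            traj.append(y)
        return np.array(traj)

t = np.linspace(0.0, 1.0, 101)
demo = 10 * t**3 - 15 * t**4 + 6 * t**5           # synthetic minimum-jerk motion from 0 to 1
dmp = DMP1D().fit(demo, dt=0.01)
same_goal = dmp.rollout()
new_goal = dmp.rollout(goal=2.0)
```

Because the forcing term is scaled by `(g - y0)`, changing `goal` stretches the learned shape toward the new target, and the spring-damper drives the system to the goal once the phase has decayed.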
Examples with a link to statistical learning include representations based on hidden Markov models (HMMs), with many variants such as incremental learning extensions (Lee and Ott, 2011), the inclusion of dynamic features to retrieve trajectory distributions (Calinon and Lee, 2018), the local encoding of state durations to handle partial demonstrations (Zeestraten et al., 2016), or the exploitation of the hierarchical organization capability of HMMs (Kulic et al., 2008). Another key challenge closely related to statistical representations in LfD concerns the problem of autonomously segmenting and abstracting the continuous flow of demonstration data (Savarimuthu et al., 2018; Niekum et al., 2015; Krishnan et al., 2015; Lee et al., 2012).
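To make the HMM-based encoding more concrete, the sketch below evaluates how well a hand-set, left-right HMM with 1-D Gaussian emissions explains a demonstration, using the scaled forward algorithm. All parameters are hypothetical and set by hand rather than learned (e.g., with expectation-maximization), and none of the cited HMM variants is implemented here; the point is only to show that the model captures the ordering of the demonstrated states.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a scalar x under each state's 1-D Gaussian (vectorized over states)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def hmm_log_likelihood(obs, prior, trans, mu, sigma):
    """Scaled forward algorithm: log p(obs | HMM) for 1-D Gaussian emissions."""
    log_lik = 0.0
    alpha = prior * gaussian_pdf(obs[0], mu, sigma)
    for t in range(len(obs)):
        if t > 0:
            # Propagate state probabilities, then weight by the emission density.
            alpha = (alpha @ trans) * gaussian_pdf(obs[t], mu, sigma)
        scale = alpha.sum()          # rescale at each step to avoid numerical underflow
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

# Hypothetical left-right HMM whose three states are visited in the order 0 -> 1 -> 2.
prior = np.array([1.0, 0.0, 0.0])
trans = np.array([[0.9, 0.1, 0.0],
                  [0.0, 0.9, 0.1],
                  [0.0, 0.0, 1.0]])
mu = np.array([0.0, 1.0, 2.0])
sigma = np.array([0.2, 0.2, 0.2])

forward_demo = np.repeat([0.0, 1.0, 2.0], 10)   # follows the encoded ordering
reversed_demo = forward_demo[::-1]              # same values, opposite ordering
ll_forward = hmm_log_likelihood(forward_demo, prior, trans, mu, sigma)
ll_reversed = hmm_log_likelihood(reversed_demo, prior, trans, mu, sigma)
print(ll_forward, ll_reversed)
```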
Another method to represent movements in LfD is to encode the entire attractor landscape in the state space of the observed data. Such an approach provides representations based on time-invariant autonomous systems. It usually comes at the expense of having to estimate asymptotically stable dynamical systems, which can be a difficult constrained optimization problem in high-dimensional spaces. An example of this approach is the stable estimator of dynamical systems (SEDS) (Khansari-Zadeh and Billard, 2011). Other approaches based on diffeomorphic geometric transformations have also been investigated to address this challenge (Neumann and Steil, 2015; Perrin and Schlehuber-Caissier, 2016).
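The sketch below illustrates the idea of a time-invariant autonomous system with a guaranteed attractor, in the spirit of SEDS: the velocity is a state-dependent mixture of linear systems centered on the target, and each local matrix satisfies the sufficient stability condition that A_k + A_k^T is negative definite. The matrices, the gating centers, and the function names are hand-picked assumptions; the constrained optimization that SEDS uses to learn such parameters from demonstrations is not shown.

```python
import numpy as np

x_star = np.array([0.0, 0.0])     # attractor: the target of the movement
# Hand-picked (not learned) local linear systems and gating centers.
A = [np.array([[-1.0, 2.0], [-2.0, -1.0]]),
     np.array([[-3.0, 0.5], [-0.5, -0.5]])]
centers = [np.array([-1.0, 1.0]), np.array([1.0, 1.0])]

def satisfies_seds_condition(A_list):
    """Sufficient stability condition used in SEDS: every A_k + A_k^T is
    negative definite (each local system being centered on x_star)."""
    return all(np.all(np.linalg.eigvalsh(Ak + Ak.T) < 0) for Ak in A_list)

def velocity(x):
    """Time-invariant, state-dependent mixture of linear systems (no clock involved)."""
    h = np.array([np.exp(-0.5 * np.sum((x - c) ** 2)) for c in centers])
    h = h / h.sum()
    return sum(hk * (Ak @ (x - x_star)) for hk, Ak in zip(h, A))

def simulate(x0, dt=0.01, steps=3000):
    """Euler integration of the autonomous system from an initial state x0."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * velocity(x)
    return x

finals = [simulate(x0) for x0 in ([2.0, 1.0], [-2.0, -1.0], [0.0, 3.0])]
print(satisfies_seds_condition(A), finals)
```

Under this condition, the squared distance to `x_star` decreases everywhere, so every starting point converges to the target regardless of when or where the movement starts.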
Exploitation of (co)variations
The developed LfD algorithms need to cope with several forms of variations, which make the problem harder than a simple record-and-play process. First, variations can arise from the constraints of the task to be transferred (dropping a bouillon cube in a pot requires less precision than dropping a sugar cube in a cup). Variations can also arise from the kinematic structure of the robot (a redundant arm can achieve the same task in different ways). Most often, these variations are better described by taking into account the covariations rather than each dimension separately. From a statistical perspective, this corresponds to the use of full covariances instead of diagonal covariances. In high-dimensional problems, the full covariances are often given a low-rank structure (e.g., a PCA decomposition in joint angle space). From a motor control perspective, such an approach allows the encoding of coordination patterns and synergies that are of primary importance in many different skills and at diverse levels (Kelso, 2009).
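A small numerical sketch of why full covariances matter: with synthetic data in which two joints move in opposition (a hypothetical two-joint synergy), a full covariance distinguishes a deviation that respects the coordination from one that breaks it, while a diagonal covariance with the same variances cannot. The last lines show the low-rank view, in which a single principal direction carries almost all of the variance. The data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "demonstrations": two joints that move in opposition (a simple synergy).
q1 = rng.normal(0.0, 1.0, size=200)
q2 = -q1 + rng.normal(0.0, 0.05, size=200)
Q = np.column_stack([q1, q2])
Q = Q - Q.mean(axis=0)                     # work with deviations from the mean posture

full_cov = np.cov(Q, rowvar=False)
diag_cov = np.diag(np.diag(full_cov))      # same variances, correlations discarded

def mahalanobis_sq(x, cov):
    """Squared Mahalanobis distance of a deviation x under covariance cov."""
    return float(x @ np.linalg.solve(cov, x))

keeps_synergy = np.array([1.0, -1.0])      # respects the observed coordination
breaks_synergy = np.array([1.0, 1.0])      # same amplitude, wrong coordination
print(mahalanobis_sq(keeps_synergy, full_cov), mahalanobis_sq(breaks_synergy, full_cov))
print(mahalanobis_sq(keeps_synergy, diag_cov), mahalanobis_sq(breaks_synergy, diag_cov))

# Low-rank view: one principal direction explains almost all of the variance.
eigvals, eigvecs = np.linalg.eigh(full_cov)   # eigenvalues in ascending order
explained = eigvals[-1] / eigvals.sum()
principal = eigvecs[:, -1]                    # approximately +/-[1, -1] / sqrt(2)
```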
One of the challenges of LfD is to exploit the detected covariations in the demonstrations to retrieve an adaptive and robust controller for the reproduction of the task. This can, for example, take the form of competing constraints through the weighting of movement primitives activated in parallel or through a hierarchical organization of the task (Kulic et al., 2008). This can also be incorporated more directly within a control strategy by retrieving a controller following a minimal intervention strategy (Todorov and Jordan, 2002; Calinon, 2016; Zeestraten et al., 2016).
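One common way to obtain a controller that follows a minimal intervention strategy from demonstrations is a linear quadratic tracking problem in which the tracking weights are the precisions (inverse variances) of the demonstrated data, so that deviations are corrected only where the demonstrations were consistent. The sketch below solves such a problem in batch form for a 1-D velocity-controlled point. The mean trajectory, the variances, and the control weight are hypothetical; in practice, they would come from a learned model of the demonstrations.

```python
import numpy as np

T, dt = 50, 0.1                                   # 50 steps of 0.1 s
t = np.arange(1, T + 1) * dt
# Hypothetical mean of the demonstrations: a bump on the way from 0 to 1.
mu = np.sin(np.pi * t / t[-1]) + t / t[-1]
# Hypothetical variances: demonstrations disagree along the way and agree at the end.
var = np.full(T, 1e3)
var[-1] = 1e-4
Q = np.diag(1.0 / var)                            # tracking weights = precisions
R = 1e-2 * np.eye(T)                              # control effort weight

# Velocity-controlled point: x_{t+1} = x_t + dt * u_t, stacked as x = Sx * x0 + Su @ u.
x0 = 0.0
Su = dt * np.tril(np.ones((T, T)))
Sx = np.ones(T)

# Minimize (x - mu)^T Q (x - mu) + u^T R u in closed form.
u = np.linalg.solve(Su.T @ Q @ Su + R, Su.T @ Q @ (mu - Sx * x0))
x = Sx * x0 + Su @ u
print(abs(x[-1] - mu[-1]), np.max(np.abs(x - mu)))
```

The resulting motion largely ignores the uncertain bump in the middle, which would be expensive to track, while still reaching the final target precisely, where the demonstrations agree.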
An important aspect of LfD is to enable robots to acquire skills that can be adapted to new situations. A common approach to achieve such generalization capability is to associate task variables (what describes the situation) with movement variables (what describes the skill) and then use a regression technique to retrieve new movement variables from new task variables (see, e.g., Paraschos et al. 2013). Such a regression approach is generic, since the task variables can represent a wide range of context features organized in vector form.
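As a minimal sketch of the regression idea, the code below fits an affine map from task variables (e.g., an object position) to movement variables (e.g., a via-point of the hand) using ordinary least squares on five demonstrations. Plain linear regression is used here only as the simplest option; the cited work relies on probabilistic conditioning instead, and the data below are synthetic.

```python
import numpy as np

# Task variables c (e.g., 2-D object positions) for five hypothetical demonstrations.
C = np.array([[0.1, 0.2], [0.4, 0.1], [0.3, 0.5], [0.6, 0.4], [0.2, 0.7]])
# Movement variables m for the same demonstrations, generated from an affine map
# so that the example is easy to check.
W_true = np.array([[1.0, 0.2], [-0.3, 0.9]])
b_true = np.array([0.05, 0.10])
M = C @ W_true.T + b_true

# Append a constant feature so that the bias is learned together with the weights.
C1 = np.hstack([C, np.ones((len(C), 1))])
coef, *_ = np.linalg.lstsq(C1, M, rcond=None)      # shape (3, 2)

def predict(c_new):
    """Movement variables for a new situation described by task variables c_new."""
    return np.append(c_new, 1.0) @ coef

m_new = predict(np.array([0.5, 0.5]))
print(m_new)
```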
An alternative approach is to encode demonstrations from the perspective of multiple coordinate systems (Calinon, 2016). This is achieved by providing a list of observers that could be relevant for the movement or the task to transfer. Such an approach is motivated by the observation that skillful movement planning and control often require the orchestration of multiple coordinate systems whose importance can vary along the task (see, e.g., Bennequin et al. 2009). Typical examples are movements in object-, body-, head-, or gaze-centered frames of reference that can collaborate in various manners for the different phases of a task. Invariance and coordination extraction in movements are also closely related to the coordinate systems in which the analysis takes place (Sternad et al., 2010). Task-parameterized movement primitives (Calinon, 2016) take inspiration from these lines of research to encode movements from the perspective of multiple coordinate systems, where a statistical analysis is simultaneously conducted in each coordinate system. Such a model encapsulates the variations and coordinations in each frame, enabling the robot to learn from demonstrations how to orchestrate and transition between the coordinate systems, which results in improved extrapolation capability (e.g., for the adaptation of movements to new positions of objects involved in a manipulation task). This generalization capability comes at the expense of restricting the task parameters to the form of coordinate systems or local projection operators.
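The core operation behind such task-parameterized models can be sketched as follows: a Gaussian learned in each frame's local coordinates is mapped into the world with that frame's current parameters (A_j, b_j), and the results are fused by a product of Gaussians, so that the frame with the tightest local variance dominates. The two frames, their local Gaussians, and the function names are hypothetical; a full task-parameterized mixture model would repeat this fusion for every mixture component.

```python
import numpy as np

def rot(theta):
    """2-D rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def product_of_gaussians(mus, sigmas):
    """Fuse Gaussians by summing precisions (inverse covariances)."""
    precisions = [np.linalg.inv(S) for S in sigmas]
    sigma = np.linalg.inv(sum(precisions))
    mu = sigma @ sum(P @ m for P, m in zip(precisions, mus))
    return mu, sigma

# Local models (hypothetical numbers), expressed in each frame's own coordinates.
local = {
    "base":   (np.array([0.5, 0.5]), np.diag([0.5, 0.5])),    # loose: anywhere in the workspace
    "object": (np.array([0.0, 0.1]), np.diag([1e-3, 1e-3])),  # tight: close to the object
}

def adapt(frames):
    """frames: dict name -> (A, b) describing each coordinate system in the new situation."""
    mus, sigmas = [], []
    for name, (A, b) in frames.items():
        mu_local, sigma_local = local[name]
        mus.append(A @ mu_local + b)              # local mean mapped to the world frame
        sigmas.append(A @ sigma_local @ A.T)      # local covariance rotated accordingly
    return product_of_gaussians(mus, sigmas)

# New situation: the object frame is moved to (1.0, 0.3) and rotated by 90 degrees.
mu, sigma = adapt({"base": (np.eye(2), np.zeros(2)),
                   "object": (rot(np.pi / 2), np.array([1.0, 0.3]))})
print(mu, np.diag(sigma))
```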
Learning by interaction
In addition to the machine learning and skill encoding perspectives described above, another key research perspective concerns the exploitation of social learning mechanisms in LfD (see Nehaniv and Dautenhahn 2007 for an overview). A large part of the efforts in LfD concerns the development of learning and control algorithms. Such developments most often assume that expert datasets are available (e.g., that the provided demonstrations are relevant solutions to the problem). They often also explicitly specify the learning strategy to be used, such as mimicking actions (without understanding the overall objective), goal-level imitation (inverse optimal control, extraction of the underlying objectives by discarding the specific way in which the task is achieved), or refinement by kinesthetic corrections. While such developments are important, they do not account for the way in which data are collected. In contrast to many machine learning applications in which data collection is independent of the learning system, a remarkable characteristic of LfD is that the iterative interaction between the user and the robot can be exploited to influence the quality and nature of the collected data. It was observed in robotics that several learning strategies need to be combined to acquire skills efficiently (see, e.g., Cakmak et al. 2010). In the field of machine learning, this is sometimes referred to as machine teaching or iterative machine teaching (Liu et al., 2017).
Examples of Application
The development of LfD is motivated by many application areas and can be applied to various robots. In an industrial context, it is driven by the evolution of the shopfloor toward quick and cost-effective adaptations of existing assembly lines, as well as the handling of small volumes such as personalized products. In practice, LfD enables robots to be reprogrammed by the persons who know the tasks to achieve, but who do not necessarily have expertise in robot programming. In this context, LfD removes the costly and time-consuming step of soliciting external expertise each time the robots need to be reprogrammed.
In service robotics, LfD aims at providing personalized assistance and services that could not be preprogrammed due to the broad variety of tasks, persons, or environments that a robot can encounter. In some of these applications, LfD can rely on interactive social cues and on the natural human propensity to teach new skills to others, a communication behavior we already use to transfer knowledge.
With humanoids, LfD has been tested with various adaptive control skills involving both discrete (point-to-point) and periodic (rhythmic) motions, ranging from biped locomotion (Nakanishi et al., 2004) to the transfer of communicative gestures (Lee et al., 2010). Examples of services and entertainment activities learned from demonstration include pouring beverages (Mühlig et al., 2012), cooking rice (Lee et al., 2012), or playing the drums (Ude et al., 2010). With robot manipulators, the skills investigated in LfD typically relate to assembly (see, e.g., Savarimuthu et al. 2018). Other skills are considered in lab environments to test and evaluate the generalization capability of these approaches, ranging from table tennis strokes (Rueckert et al., 2015) to the rolling of pizza dough (Calinon et al., 2013).
A recent line of work in LfD considers the transfer of shared control behaviors. In the field of human-robot collaboration, examples are the collaborative transportation of objects (Evrard et al., 2009) and the assistance in the assembly of objects or furniture (Rozo et al., 2016; Maeda et al., 2017). In such applications, the role of LfD is to transfer a demonstrated collaborative manipulation so that the user can then employ the robot as if she/he were collaborating with the person who demonstrated the skill. LfD can also be extended to assistive behaviors (learning assistance by demonstration) in applications such as surgical interventions (Reiley et al., 2010; Padoy and Hager, 2011; Yang et al., 2014; Krishnan et al., 2015; Chen et al., 2016; Bruno et al., 2017), feeding tasks (Canal et al., 2016; Calinon et al., 2010), or dressing assistance (Pignat and Calinon, 2017), as well as in the context of robotic wheelchairs (Soh and Demiris, 2015) and exoskeletons (Hamaya et al., 2017).
Future Directions for Research
LfD is a rich and diverse research field with many open problems. Examples of ongoing challenges are described below. These examples are not exhaustive and only present a subset of potential future research directions.
Learning with small datasets
In the field of machine learning, substantial effort is directed toward developing learning algorithms dedicated to large datasets and deep learning strategies. Most of these developments target problems in which data are readily available or inexpensive to acquire. LfD poses a distinct challenge, in the sense that it often requires the robot to acquire skills from only a few demonstrations and interactions, with strong generalization demands. On the one hand, such a system collects a very large amount of information from a wide variety of sensors; on the other hand, it is limited by the number of demonstrations that the user can provide while keeping the procedure user-friendly. LfD therefore strongly needs algorithms that exploit data as efficiently as possible while the data are being acquired. This challenge is connected to diverse research directions such as online learning, lifelong learning, continual adaptation, or never-ending learning.
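Exploiting data while they are being acquired can start with something as simple as updating a model one sample at a time. The sketch below maintains a running mean and covariance with Welford's incremental update, so each new sample refines the estimate without storing or reprocessing earlier ones. It is an illustrative building block rather than an LfD algorithm from the cited literature; the class name and the random data are assumptions.

```python
import numpy as np

class IncrementalGaussian:
    """Running mean and covariance, updated one sample at a time (Welford's method)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))     # accumulated outer products of deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean              # deviation from the previous mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)   # uses the updated mean

    @property
    def cov(self):
        """Unbiased sample covariance (zero until two samples have been seen)."""
        return self.m2 / (self.n - 1) if self.n > 1 else np.zeros_like(self.m2)

rng = np.random.default_rng(1)
samples = rng.normal(size=(50, 3))
model = IncrementalGaussian(dim=3)
for x in samples:                          # e.g., data points arriving during a demonstration
    model.update(x)
print(model.mean, model.cov)
```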
Skill encoding for heterogeneous data with structures
From a skill encoding, decomposition, and organization perspective, movement primitives have largely been studied in the context of gestures or motions without contacts. There is currently a strong demand for similar mechanisms to handle the transfer of a richer set of skills, involving contacts with the environment, force profiles, varying compliance, manipulability ellipsoids, and priority constraints. In all cases, a modular, adaptive, and compact representation is required to learn new skills from demonstration. One of the challenges is to find an approach that can handle such a variety of data for both analysis and synthesis.
The data handled by LfD have structures and symmetries that are currently underexploited in the learning process. This is inefficient: with the low number of demonstrations in LfD, it is important to retain as much information as possible from each demonstration. A direction for future work is to develop algorithms that efficiently take into account these structures and symmetries in both sensing and actuation data. One potential approach in this direction is to exploit knowledge of the manifold from which the data come (e.g., with Riemannian geometry). There are numerous such known geometries in robotics, including stiffness and damping gains, inertia, and manipulability ellipsoids (symmetric positive definite matrices), orientations (unit quaternions), periodic movements (phase variable on the unit circle manifold), or rotary joints (e.g., the joint space of a two-link planar robot forms a torus manifold).
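As an example of exploiting a known geometry, the sketch below represents orientations as unit quaternions and computes their average with logarithmic and exponential maps: it averages in the tangent space and maps the result back to the manifold, repeating until convergence. The `[w, x, y, z]` storage order, the tangent-space convention (this log map returns half the rotation vector), and the function names are assumptions; the same log/exp pattern carries over to other manifolds, such as symmetric positive definite matrices.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    pw, pv = p[0], p[1:]
    qw, qv = q[0], q[1:]
    return np.concatenate([[pw * qw - pv @ qv], pw * qv + qw * pv + np.cross(pv, qv)])

def quat_conj(q):
    """Conjugate, equal to the inverse for a unit quaternion."""
    return np.concatenate([[q[0]], -q[1:]])

def log_map(q):
    """Unit quaternion -> 3-D tangent vector at the identity."""
    if q[0] < 0:                      # q and -q encode the same rotation: take the short path
        q = -q
    v_norm = np.linalg.norm(q[1:])
    if v_norm < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(q[0], -1.0, 1.0)) * q[1:] / v_norm

def exp_map(u):
    """3-D tangent vector at the identity -> unit quaternion."""
    theta = np.linalg.norm(u)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate([[np.cos(theta)], np.sin(theta) * u / theta])

def quat_mean(quats, iters=10):
    """Riemannian (Karcher) mean: average in the tangent space at mu, map back, repeat."""
    mu = quats[0]
    for _ in range(iters):
        u = np.mean([log_map(quat_mul(quat_conj(mu), q)) for q in quats], axis=0)
        mu = quat_mul(mu, exp_map(u))
    return mu

def rot_z(angle):
    """Unit quaternion for a rotation of `angle` radians about the z axis."""
    return np.array([np.cos(angle / 2), 0.0, 0.0, np.sin(angle / 2)])

mean_q = quat_mean([rot_z(0.2), rot_z(-0.2)])      # symmetric pair: identity rotation
roundtrip = exp_map(log_map(rot_z(0.7)))
print(mean_q, roundtrip)
```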
Bridging the gap between symbolic and continuous knowledge
Current research in LfD tends to dissociate low-level and high-level learning aspects. On one side of the spectrum, continuous representations are developed in close connection with the low-level control capabilities of the robots. On the other side of the spectrum, high-level learning approaches with discrete representations are developed to provide the level of abstraction required to perform cognitive tasks.
There are research efforts toward augmenting low-level learning methods with the extraction of discrete features and structural elements. Similarly, there are research efforts to provide high-level learning methods with techniques that more closely exploit the motor control capability. Research efforts are required to bridge the gap between symbolic and continuous knowledge in LfD, which could lead to more flexible and scalable learning of tasks. This requires the development of models and algorithms capable of covering a wide spectrum of representations, from the continuous stream of low-level sensorimotor data to macro actions, reasoning, and high-level symbolic representations of skills. A first step in this direction is to address the problem of learning to organize multiple movement primitives in series and in parallel (as in transfer learning, instead of learning each primitive individually) and to tackle the problem of learning the structures of these models (instead of setting the structure a priori and learning only the parameters).
Exploiting the social interaction dimension in LfD
In LfD, the way in which the different learning modalities can be organized and coexist remains largely unexplored. Questions include how and when a robot should request feedback from the user, either explicitly (e.g., through demonstration requests or spoken questions to validate hypotheses about motor skill properties) or implicitly (e.g., by exaggerating parts of movements to measure users' reactions). How can the robot autonomously determine which learning modality is currently the most appropriate, available, or efficient to improve the skill to be acquired? How should this efficiency be measured (e.g., in terms of interaction duration or of generalization ability)? Parts of this problem share links with active learning, but with a distinct and important multimodal social interaction aspect.
In addition to extracting control patterns from predetermined learning strategies, one further challenge of LfD is to acquire interaction patterns and devise efficient ways of making different learning modalities coexist, such as assessing autonomously which learning strategy to use in a given context. One such research direction requires a better exploitation of the social dimension in LfD, where both actors can influence the success of skill acquisition. Another related aspect concerns the extension of LfD to a richer set of teaching interactions, with interchangeable roles that would not only involve the human as a teacher and the robot as a learner but that would instead consider varied interactions such as a robot learning from multiple teachers, a user learning from the robot, or a robot transferring skills to another robot. Similarly, learning from counterexamples, or from conflicting, ambiguous, suboptimal, or unsuccessful demonstrations, is an important research route that still requires further investigation. Finally, the definition of evaluation metrics and benchmarks for LfD needs to be strengthened for the evolution of this research field.
- Calinon S, Alizadeh T, Caldwell DG (2013) On improving the extrapolation capability of task-parameterized movement models. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), Tokyo, pp 610–616, Nov 2013
- Calinon S, D’halluin F, Sauser EL, Caldwell DG, Billard AG (2010) Learning and reproduction of gestures by imitation: an approach based on hidden Markov model and Gaussian mixture regression. IEEE Robot Autom Mag 17(2):44–54
- Calinon S, Lee D (2018, in press) Learning control. In: Vadakkepat P, Goswami A (eds) Humanoid robotics: a reference. Springer. https://doi.org/10.1007/978-94-007-7194-9_68-2
- Calinon S, Li Z, Alizadeh T, Tsagarakis NG, Caldwell DG (2012) Statistical dynamical systems for skills acquisition in humanoids. In: Proceedings of IEEE international conference on humanoid robots (Humanoids), Osaka, pp 323–329
- Chen J, Lau HYK, Xu W, Ren H (2016) Towards transferring skills to flexible surgical robots with programming by demonstration and reinforcement learning. In: Proceedings of international conference on advanced computational intelligence, pp 378–384, Feb 2016
- Evrard P, Gribovskaya E, Calinon S, Billard AG, Kheddar A (2009) Teaching physical collaborative tasks: object-lifting case study with a humanoid. In: Proceedings of IEEE international conference on humanoid robots (Humanoids), Paris, pp 399–404, Dec 2009
- Krishnan S, Garg A, Patil S, Lea C, Hager G, Abbeel P, Goldberg K (2015) Unsupervised surgical task segmentation with milestone learning. In: Proceedings of international symposium on robotics research (ISRR)
- Lee SH, Suh IH, Calinon S, Johansson R (2012) Learning basis skills by autonomous segmentation of humanoid motion trajectories. In: Proceedings of IEEE international conference on humanoid robots (Humanoids), Osaka, pp 112–119
- Liu W, Dai B, Humayun A, Tay C, Yu C, Smith LB, Rehg JM, Song L (2017) Iterative machine teaching. In: Proceedings of international conference on machine learning (ICML), Sydney, Aug 2017
- Nehaniv CL, Dautenhahn K (2002) The correspondence problem. In: Dautenhahn K, Nehaniv CL (eds) Imitation in animals and artifacts. MIT Press, Cambridge, pp 41–61
- Nehaniv CL, Dautenhahn K (eds) (2007) Imitation and social learning in robots, humans, and animals: behavioural, social and communicative dimensions. Cambridge University Press, Cambridge
- Padoy N, Hager GD (2011) Human-machine collaborative surgery using learned models. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 5285–5292, May 2011
- Paraschos A, Daniel C, Peters J, Neumann G (2013) Probabilistic movement primitives. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems (NIPS). Curran Associates, Inc., Red Hook, pp 2616–2624
- Ratliff N, Ziebart BD, Peterson K, Bagnell JA, Hebert M, Dey A, Srinivasa S (2009) Inverse optimal heuristic control for imitation learning. In: International conference on artificial intelligence and statistics (AIStats), pp 424–431, Apr 2009
- Reiley CE, Plaku E, Hager GD (2010) Motion generation of robotic surgical tasks: learning from expert demonstrations. In: International conference on IEEE engineering in medicine and biology society (EMBC), pp 967–970
- Rueckert E, Mundo J, Paraschos A, Peters J, Neumann G (2015) Extracting low-dimensional control variables for movement primitives. In: Proceedings of IEEE international conference on robotics and automation (ICRA), Seattle, pp 1511–1518
- Savarimuthu TR, Buch AG, Schlette C, Wantia N, Rossmann J, Martinez D, Alenya G, Torras C, Ude A, Nemec B, Kramberger A, Worgotter F, Aksoy EE, Papon J, Haller S, Piater J, Kruger N (2018) Teaching a robot the semantics of assembly tasks. IEEE Trans Syst Man Cybernet Syst 48(5):670–692
- Todorov E, Jordan MI (2002) A minimal intervention principle for coordinated movement. In: Advances in neural information processing systems (NIPS), pp 27–34
- Zeestraten MJA, Calinon S, Caldwell DG (2016) Variable duration movement encoding with minimal intervention control. In: Proceedings of IEEE international conference on robotics and automation (ICRA), May 2016, Stockholm, pp 497–503