Imitation is a vital skill that humans leverage in various situations, and humans imitate by observing others with apparent ease. Yet it is computationally expensive for artificial agents (e.g., social robots) to acquire new skills by imitating an expert agent. Although learning through imitation has been extensively addressed in the robotics literature, most studies focus on two questions: what to imitate and how to imitate. In this conceptual paper, we focus on an overlooked question of imitation through observation: who to imitate. We present possible answers to the who-to-imitate question by exploring motivational factors documented in psychological research and their possible implementation in robotics. To this end, we focus on two critical instances of the who-to-imitate question that guide agents to prioritize one demonstrator over another: outcome expectancies, viewed as the anticipated learning gains, and efficacy expectations, viewed as the anticipated costs of performing actions.
Rapid developments in social robotics and the accompanying integration of robots into human daily life yield important questions for the future of society. For instance, in the educational context, one may ask whether robots could one day assist teachers in the classroom by first learning from the teachers and then transferring the acquired knowledge and skills to students, as humans do. To answer this question, understanding how humans and robots learn from other agents, and especially through imitation, an exceptionally developed capacity in humans [2, 3], is of primary importance. We envision that social robots could rely on, for example, multiple human teachers and tutors and use the principles of observational learning (e.g., observing and prioritizing the demonstrators that would maximize learning gains) to extract the features that characterize human teaching and tutoring expertise.
Two families of imitation techniques, developed over the last two decades, can be identified in robotics. First, robots can learn by means of demonstrations [4,5,6,7,8]. This approach, also known as programming by demonstration or imitation learning, employs action examples provided by a teacher or demonstrator, which reduce the complexity of the search spaces for learning, thus overcoming the prohibitive slowness of learning through experience alone with standard learning methods, including reinforcement learning and its variants. However, the main downside of this approach is the requirement of a demonstrator’s kinaesthetic guidance, which places constraints on the learning situation (e.g., physical presence and availability of the demonstrator, a limited number of demonstrators), precluding the use of valuable resources such as videos [9,10,11].
A second, more recent family of imitation-learning techniques in robotics, closer to the flexible and common human ability to rely solely on observations [12,13,14,15,16], is imitation learning from observation [10, 11, 17, 19]. In this perspective, the agent imitates using perceptual information only, without action guidance: the robot learns (through inverse reinforcement learning [10, 17]) or mirrors (based on the retrieval of prior visuo-motor associations [20,21,22]) the action policy from the perceptually extracted features. Because learning through observation makes it possible to exploit a vast amount of virtual resources in addition to in-presence observations and social interactions, new challenges arise regarding how to filter relevant information. In this regard, who to imitate becomes a central question, as selective social mechanisms may be needed to choose, among many exemplars, which demonstrations best align with the objectives of the learning activity, for example by recognizing that specific demonstrators are not worth imitating. We note that the who-to-imitate question has received limited attention relative to the other fundamental questions of what and how to imitate, which are extensively discussed in [8, 18, 23,24,25,26] and are therefore not addressed here.
In this paper, we emphasize selective social learning as a supplementary avenue to circumvent the challenges in current imitation learning methods, envisioning that robots could be endowed with the strategic choice to follow a given demonstrator in line with imitation learning objectives. The rest of the manuscript is organized as follows: First, we briefly explore the concept of selective social learning in humans by discussing the attributes known to influence humans’ preferences for certain informants and demonstrators. Then, we explore similar aspects of selective social learning in robotics. Finally, with the perspective of imitation learning from observation in mind, we propose a conceptual bridge between humans and robots on how to foster selective social imitation by exploring motivational factors, as possible answers to the who-to-imitate question.
Selective Social Learning
Selective Social Learning in Humans
Humans are selective social learners. A number of factors underlying humans’ ability to trust a message and/or learn from specific partners, while not from others, have been identified in the literature ([27,28,29,30] for reviews). According to Mills, attributes pertaining to three domains underlie the ability of children and adults to adopt a critical stance: attributes related to the informant, the message, and the target (i.e., the learner). First, as a general tendency, a message is more likely to be accepted by a target when the informant shows a high degree of expertise, which may be inferred from age, authority, and familiarity, and displays positive attitudes while delivering the message. Second, to be accepted as valid, the message itself should be highly accurate, match the informant’s domain of expertise, and be generalizable to other domains. Third, the target needs the skills to decode the informant’s intentions, detect flaws, use prior knowledge, and be motivated to critically evaluate the reliability of the message.
Another classification, based on studies in children, is provided by Harris et al. The authors first describe a class of social-affective attributes of the informant-target interaction, where a message’s acceptability depends on the informant’s effective use, and the learner’s perception, of congruent affective signals and an appropriate use of gestures and vocalizations. Second, cognitive attributes are proposed, including the learner’s ability to reason about the informant, that is, to perceive the informant’s reliability and accuracy. Third, social-positional attributes include considerations of the informant’s social standing and personality: positive attitudes, shared in-group membership, familiarity, and high social status. Fourth, some attributes concern the learner’s ability to use prior knowledge in evaluating the message. For instance, if the message contradicts the learner’s strong intuitions or expectations, it is less likely to be accepted as valid.
Focusing on situations involving infant learners, Poulin-Dubois and Brosseau-Liard propose a distinction in terms of cues that directly or indirectly evidence the informant’s competence. Direct cues include the informant’s degree of accuracy during the learning task: infants attend to and learn more (e.g., in a word-learning task) from informants who were previously accurate, who exhibit competent behaviour in the task (e.g., using an object in appropriate ways), and who make congruent use of emotional expressions (e.g., reacting positively to positive experiences). The authors also report cues that evidence the informant’s competence more indirectly: when the information receives agreement from a third party, and when the informant is older than the learner or shows confidence rather than uncertainty.
Consistent with these studies, two broad classes of attributes can be delineated, echoing Koenig and Sabbagh’s distinction between epistemic and non-epistemic attributes. First, epistemic attributes in the process of information transfer relate to (a) the informant’s degree of expertise, evaluated by the learner from the accuracy or knowledge level shown in (related) past or current tasks, and (b) the learner’s ability to use prior knowledge. Second, non-epistemic attributes pertain to (c) the social-positional domain (e.g., group membership, age, social standing) and (d) social-affective processes (e.g., decoding emotional cues, inferring confidence).
Selective Social Learning in the Context of Imitation Learning from Observation
Regarding imitation more specifically, a growing body of work on motor learning has documented the attributes that drive a learner’s preference towards certain demonstrators when aiming to reproduce/imitate a given behaviour through observation ([31, 32] for reviews). Note that our focus on observational learning does not undermine the existence or the importance of other forms of social learning in humans, such as instructed and collaborative learning [33, 34].
In line with the literature on message acceptability, the observation and selection of specific demonstrators by human motor learners is motivated by factors that largely relate to epistemic considerations, including the demonstrator’s expertise level [31, 32, 35, 36], in addition to learners’ self-assessment of task difficulty and competence level [37, 38]. Compelling evidence shows the role of a skilled demonstrator in enhancing learners’ motor execution and retention of the to-be-learned behaviour; this benefit is explained in terms of a correct performance providing more accurate representations of the task [31, 32, 36]. Despite this general trend, an unskilled demonstrator is sometimes reported to be more beneficial than a skilled one, in particular because it promotes cognitive engagement and error detection [31, 32, 36, 39]. Similarly, a so-called coping model, transitioning from unskilled to skilled on a given task, may enhance the observers’ affective/cognitive outcomes such as self-efficacy and perceived task difficulty [31, 32, 39, 40]. For example, human learners who doubt their ability to learn a given task may be more interested in peer models (who share similar characteristics in terms of age and perceived competence) and may learn more by observing them gradually progress in a task than by observing an expert demonstrator directly [39,40,41]. Another finding concerns the benefit of observing a combination of demonstrators with different skill levels, rather than a single demonstrator, for skill execution and retention [31, 32, 36].
Taken together, studies on selective social learning crucially stress epistemic factors as a major driving force behind a learner’s preference towards specific informants and demonstrators. As we will see in the next section, robotic studies similarly place great emphasis on epistemic considerations to guide the selection of learning partners, that is, those partners who would maximize the robots’ internal reward and learning gains [42,43,44].
Selective Social Learning in Robots
Selective social learning is an important concept that allows robots to differentiate reliable from unreliable information sources (e.g., interaction partners) and to leverage information provided by the reliable sources to perform a given task. Here we focus on selective learning studies in which robots form varying degrees of trust in interaction partners and select the trustworthy partner based on internal signals: increasing the cumulative reward or decreasing computational cognitive load, to mention a few. Trust studies in the literature follow three distinct research directions: human trust in robots, robot trust in humans, and reciprocal trust. Most human-trust-in-robots studies focus on determining core components of trust, such as the predictability of the robot and the anthropomorphism of the robot design [45, 46]. Although human trust in robots is the dominant research direction in the literature, there have been notable attempts to propose robot trust models based on epistemic attributes of the robot and the partner, and to model reciprocal trust between humans and robots [47, 48].
Here, in line with the present paper’s scope, we review studies that address robot trust in interaction partners, including simulated agents, humans, and robots, based on epistemic attributes. Kirtay et al. presented a robot trust model on the Nao robot in an interactive sequential pattern-recalling task with pre-programmed partners following reliable, unreliable, and random strategies to guide the robot. In this setting, interaction partners guide the Nao robot in performing visual recall tasks. For instance, a reliable partner provides useful suggestions that enable the robot to process the visual patterns associated with a low cost (i.e., a low computational load incurred by the robot to recall a visual pattern). In turn, interacting with reliable partners yields increments in cumulative reward during the experiment. After conducting the experiments with all partners, the authors gave the robot a free choice to select the trustworthy interaction partner, that is, the one who reduces the robot’s cognitive load in performing the interactive task. We note that the authors also extended the same robot trust model to multimodal human-robot and robot-robot interaction settings [50, 51]. Chen et al. presented a computational model with which a robot infers human trust in a collaborative tabletop task (i.e., cleaning a table) performed by a robot arm and a human partner. Overall, the study concludes that the robot equipped with the trust model achieves better task performance, measured by cumulative reward, than the robot without it. Patacchiola and Cangelosi proposed a robot trust model implemented on the iCub robot. In a first object-name learning phase, the robot builds prior knowledge by receiving accurate information linking objects to correct labels. Then, in a familiarisation phase, reliable and unreliable informants provide correct and incorrect labels, respectively. Lastly, in an explicit judgement phase, the robot assesses the trustworthiness of the partners and recognizes the reliable partner by using the prior knowledge acquired during the learning phase.
Overall, robot trust studies aim to provide computational accounts of a robot’s trust based on the interaction partners’ epistemic attributes during learning tasks (e.g., the accuracy and reliability of the information provided by the partner, and the robot’s performance), attributes that influence the social interaction dynamics (e.g., by increasing the cumulative reward).
Selective Social Learning in the Context of Imitation Learning from Demonstration
Despite a growing body of research devoted to imitation learning from observation [10, 11, 18, 19, 54], there is still, to our knowledge, a lack of research investigating selective social learning in this context. There are, however, a few studies using the principle of selective social learning in the context of learning from demonstration. For example, a study by Nguyen and Oudeyer combined intrinsic motivation and socially guided learning on a robot arm exploring and learning different motor movements to complete a fishing task (e.g., placing the float of a fishing rod on a task-space area). In this setting, the robot either performed autonomous exploration or imitated teachers’ policies, provided via kinaesthetic teaching, to reach a specific position on the surface area. The robot’s active strategic choice, at each learning episode, to either self-explore or rely on social guidance was based on a previously developed intrinsic-motivation algorithm, that is, an architecture guiding the robotic agent to prioritize options that decrease sensorimotor prediction error in reaching a given outcome and thus maximize learning progress.
More importantly, the authors also addressed the who-to-imitate question by using the same intrinsic-motivation architecture to integrate demonstrations from multiple teachers in order to determine which teacher is the most expert and should be imitated. Three demonstrators provided kinaesthetic information with variable levels of expertise on two types of tasks (i.e., throwing or placing a ball with a fishing rod). Applying the principle of prediction-error minimization to differentiate between the three demonstrators for each type of task, the SGIM-ACTS architecture learned whom to ask for demonstrations among the available teachers by determining which demonstrator, when imitated, yielded the biggest learning progress for each task.
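To make this selection principle concrete, the following minimal sketch (our illustration, not the actual SGIM-ACTS implementation; all names are ours) tracks the prediction errors obtained after imitating each teacher and selects the teacher whose demonstrations currently yield the largest empirical learning progress:

```python
class TeacherSelector:
    """Illustrative sketch: pick the teacher whose demonstrations
    currently yield the largest empirical learning progress
    (i.e., the largest recent decrease in prediction error)."""

    def __init__(self, teacher_ids, window=5):
        self.errors = {t: [] for t in teacher_ids}  # recorded errors per teacher
        self.window = window

    def record(self, teacher, prediction_error):
        """Store the prediction error observed after imitating `teacher`."""
        self.errors[teacher].append(prediction_error)

    def progress(self, teacher):
        """Learning progress = drop in error over the recent window."""
        errs = self.errors[teacher][-self.window:]
        if len(errs) < 2:
            return float("inf")  # untried teachers get explored first
        return errs[0] - errs[-1]  # positive when error is shrinking

    def select(self):
        """Return the teacher with the highest current learning progress."""
        return max(self.errors, key=self.progress)
```

In this sketch, a teacher whose imitation steadily reduces the robot’s error is preferred over one whose demonstrations leave the error flat, mirroring the learning-progress criterion described above.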
In the next section, we build further on the idea of selective social learning by focusing on motivational factors documented in psychological research to give insights into how current approaches to learning from observation in robotics could be further developed. Owing to the ubiquitous role of epistemic attributes in guiding humans’ and robots’ preferences for specific demonstrators and informants, in the following we focus on motivational factors that take these considerations into account.
Motivational Factors Guiding Observational Learning
Expectancies Related to Outcomes and Self-efficacy
In humans, an important part of the motivational process at play when learning from observation is the estimation of activity-related expectations in terms of positive outcomes and self-efficacy, according to Social Learning Theory [56, 57]. While outcome expectancy refers to an individual’s evaluation that a behaviour will lead to a certain outcome (i.e., Would imitating this demonstration effectively lead me to hand-write the letter “a”?), efficacy expectation is the belief that one can successfully perform the behaviour required to produce the outcome (i.e., Will I be capable of hand-writing the letter “a” given the difficulty of imitating this demonstration?). It is assumed that individuals develop and regulate the belief in their ability to make acts or events occur, which acts as an expected reinforcement through increased internal reward (i.e., a positive internal reinforcement).
However, the positive reward associated with successful observational learning may come at a cost related to the characteristics of the task in terms of difficulty and time, according to Expectancy-Value Theory [58, 59]. Pursuing a costly activity is thought to depend on how much effort can be deployed given the task difficulty, following a non-monotonic relationship. This means that as long as the objective remains achievable from the agent’s perspective, higher perceived expectancies (in terms of actual outcomes and self-efficacy) lead to greater internal motivation, allocation of cognitive resources, intensity, and persistence in the service of the task [60,61,62]. Beyond acceptable levels of task difficulty, however, the relationship between effort and task difficulty may turn into a negative reinforcement for the agent, as the efforts are no longer efficient, leading to task disengagement.
Therefore, the notion of cost is particularly essential for determining task choice, as it reflects the amount of “sacrifice” that an agent can tolerate when pursuing the desired learning activity. It has strong implications for selective social imitation learning in humans and robots, as it helps determine whether a demonstrator is worth imitating and, if so, the duration and strength of an agent’s motivation to engage in that imitation.
To determine whether a particular demonstrator is worth imitating, a decision-making process might need to take place in the form of a benefit-over-cost analysis, understood as weighing the anticipated costs of effort against the anticipated benefits [62,63,64]. One important element to consider in this analysis is the amount of irrelevant and/or missing information in the demonstration. Cognitive psychology has long documented that suppressing irrelevant information consumes cognitive resources [65,66,67,68], and a recent study investigated this question with regard to the benefit-over-cost analysis in human learning. Using computational models fitted to data from a reinforcement learning task embedded with task-irrelevant information, Sidarus et al. revealed that decision making was affected and learning reduced because the conflict costs of processing irrelevant information were traded off against expected rewards. From a robotic perspective, the notion of cost can readily be understood in terms of computational resource expenditure during task completion. For example, facing an unclear or incomplete demonstration, the observer robot might need to allocate a large amount of resources to clarify and/or fill in incomplete parts of the demonstration, which could be avoided by favoring another demonstrator. Therefore, the anticipation of imitation-related costs may represent a crucial component of a robot’s observational learning and, a fortiori, be useful for the purpose of selective social imitation learning.
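The trade-off just described can be sketched as a minimal decision rule (illustrative names of our own, not a published model): imitation is pursued only when the anticipated benefit exceeds the combined costs of executing the imitation and of handling irrelevant or missing information in the demonstration.

```python
def worth_imitating(anticipated_gain, effort_cost, irrelevant_info_cost):
    """Minimal benefit-over-cost decision (illustrative, not a published
    model): imitation is pursued only when the anticipated benefit
    exceeds the combined cost of the imitation effort and of
    suppressing irrelevant or filling in missing information."""
    return anticipated_gain > (effort_cost + irrelevant_info_cost)
```

Under this rule, an otherwise attractive demonstrator can still be rejected if the demonstration is so noisy or incomplete that the processing cost swamps the expected gain.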
Towards Connecting Who to Imitate with Motivation
Here we suggest that learning from observation in robotics may benefit from leveraging motivational factors, with the idea that an agent is most motivated when the task to be learned is both achievable and yields high learning gains [55, 71]. The conceptual contribution of this paper is to propose that imitation-related outcome expectancy, in terms of anticipated learning gains, be weighed against self-efficacy expectancy, in terms of the anticipated costs required to overcome the demonstrator’s unavailable/infeasible actions. This benefit-over-cost analysis could be applied sequentially to several demonstrators, and its outputs (i.e., ratios) would characterize their respective worthiness for imitation and thus the robot’s motivation to prioritize specific demonstrators. We emphasize that this could apply to both virtual and physically present demonstrators, provided the robot is equipped with cameras to record the demonstrator’s movements, thus relying on visual information throughout the imitation process. Kinaesthetic teaching is viewed as an efficient resource that could complement this approach. Nevertheless, we note that our aim is to provide insights to help address the who-to-imitate question, not to propose an approach for how to imitate. In the next section, we use the example of learning how to hand-write the letter “a” to illustrate our purpose, as this has direct applications in the educational context, where social robots might be particularly useful for tutoring physical skills.
Outcome Expectancy Viewed as Anticipated Learning Gains
A simple way to implement artificial outcome expectancy is as the result of a comparison between three visual patterns (i.e., images), whose output value would represent the anticipated learning gains if the demonstrator were imitated. Using our example of hand-writing the letter “a”, a first image may correspond to the robot’s prior knowledge about the goal (cf. the upper right square in panel a, Fig. 1), in other words, what a prototypical letter “a” should look like in the robot’s memory.
The second image may correspond to the robot’s level of competence, that is, the final letter that the robot actually generates with its own policies (e.g., learned through prior goal-babbling activities) (cf. the bottom right square in panel a, Fig. 1). The third image may correspond to the demonstrator’s final letter, that is, how the letter “a” looks at the end of the demonstration (cf. the left square in panel a, Fig. 1). The latter image could be retrieved from the recorded demonstration using an event detection system searching for the closest representation of the letter “a” contained in the robot’s memory. The anticipated learning gains could be a function that accepts the visual pattern generated by the demonstrator, the robot’s prototypical letter, and the pattern drawn by the robot. This function compares these three patterns to assess whether imitating the demonstrator’s movement trajectories would potentially lead to minimizing the competence-related difference (i.e., the anticipated learning gain value) between the robot’s actual outcome and desired outcome, that is, better approximating the prototypical letter image held in memory.
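A minimal sketch of such a function (our illustration, under the assumption that the three images are same-sized binary arrays) compares the robot’s prototype, the robot’s own output, and the demonstrator’s final letter, and returns a gain that is positive only when imitating would bring the robot closer to its prototype:

```python
import numpy as np

def anticipated_learning_gain(prototype, robot_output, demo_output):
    """Compare three same-sized binary letter images: the robot's stored
    prototype, the robot's own current output, and the demonstrator's
    final letter. The gain is how much closer the demonstration is to
    the prototype than the robot's output is, clipped to be >= 0."""
    def distance(a, b):
        # mean pixel-wise difference, in [0, 1] for binary images
        return float(np.mean(np.abs(a.astype(float) - b.astype(float))))

    robot_error = distance(prototype, robot_output)  # current competence gap
    demo_error = distance(prototype, demo_output)    # demonstrator's gap
    # positive only when imitating would reduce the gap
    return max(0.0, robot_error - demo_error)
```

Pixel-wise distance is only a stand-in here; any image similarity measure available to the robot (e.g., a learned feature distance) could play the same role.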
Self-efficacy Expectancy Viewed as Anticipated Cost
One flexible aspect of observational learning is the ability to actively determine whether the demonstration is actually reproducible by the observer. To this end, we envision that a self-efficacy expectancy function could serve this purpose, with self-efficacy expectancy understood as the estimation of the anticipated costs during a first attempt to execute the imitation. Contrary to the anticipated learning gain values, which are obtained from fixed images, the anticipated cost value involves the temporal course of the demonstrator’s actions. This would allow the robot to segment the demonstrator’s movements into sub-goals (e.g., a given number of time epochs or segments) in order to distinguish feasible from infeasible segments in the demonstration [54, 73]. Figure 1, panel b, displays an example of demonstration incompleteness related to invisible movement trajectories deployed by the demonstrator while hand-writing the letter “a”.
After segmenting the demonstration into sub-goals, a total error (i.e., the anticipated cost value) could be derived from the proportion of infeasible segments out of the total number of segments. For example, only 30% of the segments of a demonstrator A might be infeasible after execution attempts of all segments, compared to a prohibitive 80% for a demonstrator B. The anticipated cost value is then weighed against the anticipated learning gain value in the benefit-over-cost analysis (see Footnote 1), which ultimately guides the selection and prioritization of demonstrators.
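This proportion can be computed directly. In the sketch below (ours, not a published implementation), each segment carries a boolean feasibility flag obtained from the robot’s first execution attempt:

```python
def anticipated_cost(segment_feasible):
    """Anticipated cost = proportion of demonstration segments judged
    infeasible after a first execution attempt (already in [0, 1])."""
    infeasible = sum(1 for feasible in segment_feasible if not feasible)
    return infeasible / len(segment_feasible)
```

For the example above, demonstrator A with 3 infeasible segments out of 10 yields a cost of 0.3, while demonstrator B with 8 infeasible segments out of 10 yields 0.8.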
Using the Benefit-over-cost Analysis to Guide the Selection of Specific Demonstrators for Imitation
The benefit-over-cost analysis, the proposed instance guiding the selection of specific demonstrators, corresponds to the ratio of anticipated learning gains to anticipated costs, computed for each demonstrator separately, and is illustrated in Fig. 2.
For each demonstrator, the analysis may lead to four different scenarios. In scenario 1, large anticipated gains clearly outweigh the minimal cost required to imitate the demonstration; in other words, high outcome expectancies and high efficacy expectations (the most interesting option). In scenarios 2 and 3, there is no net benefit in pursuing the imitation. Scenario 2 corresponds to high outcome expectancies but low efficacy expectations (large learning gains are anticipated, but the imitation is deemed difficult to follow). Conversely, scenario 3 corresponds to low outcome expectancy but high self-efficacy (small learning gains are anticipated, but the imitation is deemed easy to follow). In scenario 4, both outcome expectancies and efficacy expectations are low, as the large anticipated costs clearly outweigh the small anticipated benefits (the least interesting option). After this process has been conducted for each demonstrator separately, a comparison of the benefit-over-cost outputs could be performed across the different demonstrators to determine which one should be selected and prioritized. The prioritized demonstrations may then be subjected to multiple imitation iterations by the robot to refine the movement trajectory.
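The cross-demonstrator comparison can be sketched as a simple ranking by benefit-over-cost ratio (our illustration; the `eps` guard is an assumption we add to keep the ratio finite, since gains and costs are normalized to [0, 1] as noted in Footnote 1):

```python
def rank_demonstrators(gains, costs, eps=0.05):
    """Rank demonstrators by their benefit-over-cost ratio. Both gains
    and costs are assumed normalized to [0, 1]; eps keeps the ratio
    finite when the anticipated cost is near zero."""
    ratios = {d: gains[d] / max(costs[d], eps) for d in gains}
    # highest ratio first = most worth imitating
    return sorted(ratios, key=ratios.get, reverse=True)
```

A scenario-1 demonstrator (high gain, low cost) thus ends up first in the ranking, while a scenario-4 demonstrator (low gain, high cost) ends up last.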
Conclusion and Future Research Directions
This paper has elaborated on imitation learning, focusing especially on who to imitate, a question fundamental to social learning yet overlooked in social robotics. We have placed particular emphasis on learning from observation, where imitation does not necessarily depend on action guidance and which is viewed as a highly flexible and common human ability [12,13,14,15,16] that should be considered in social robots.
Considering possible applications for education, we have discussed the notions of outcome expectancy and self-efficacy expectancy as potent factors deployed during observation to orient agents to prioritize specific demonstrators for imitation. Although we introduced the possible connection between observational learning and motivational factors for the purpose of hand-drawing letters, the same concepts could be extended to other teaching purposes involving motor tasks: drawing specific shapes for objects that have multiple parts (e.g., legs, hands, and head for humans) and assembling complex objects (e.g., a plane, by attaching small Lego-like elements together), to mention a few.
Most of the attention of the present paper has been paid to epistemic considerations (i.e., the expertise level of the demonstrator and the feasibility of the demonstration for the learner) to explain an agent’s social selection, considerations that are consistently put forward both in psychology and in robotics studies. However, we acknowledge that non-epistemic considerations may also be highly relevant for imitation learning (in particular, the social-affective classes of attributes highlighted in Sect. 2). In this regard, as a possible future research direction, imitation learning architectures could integrate some of these non-epistemic attributes, for instance via vicarious reinforcement, to guide the selection of demonstrators [74, 75]. This may correspond to using affective feedback provided on the demonstrator’s performance to infer the consequences that pursuing the imitation would entail for the robot. For example, a demonstrator’s failures, suggested by the expression of negative emotions, might be used as a warning to either discard the demonstrator or, more interestingly, to identify specific errors that should be avoided, as especially shown in human coping models [31, 32, 36]. This direction is particularly promising considering the advances made in recognizing emotions from faces and body poses in robotics [77, 78], as well as in modelling artificial empathy [79,80,81], which are yet to be connected with imitation learning.
No datasets were generated or analyzed during the current study.
Footnote 1: To determine a benefit-over-cost ratio, the anticipated learning gain and cost values should be normalized to the same scale (e.g., 0 to 1). For example, a value close to one for anticipated cost could correspond to all parts of the demonstration being infeasible, while a value close to one for anticipated learning gains could correspond to a demonstrator’s final letter closely resembling the letter held in the robot’s memory.
Kirtay M, Chevalère J, Lazarides R, Hafner VV (2021) Learning in social interaction: perspectives from psychology and robotics. In 2021 IEEE international conference on development and learning (ICDL), pp 1–8. https://doi.org/10.1109/ICDL49984.2021.9515648
Farmer H, Ciaunica A, Hamilton AFDC (2018) The functions of imitative behaviour in humans. Mind Lang 33(4):378–396. https://doi.org/10.1111/mila.12189
Dautenhahn K, Nehaniv CL (2002) Imitation in animals and artifacts. MIT Press, Cambridge, MA. https://doi.org/10.7551/mitpress/3676.001.0001
Argall BD, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robot Auton Syst 57(5):469–483. https://doi.org/10.1016/j.robot.2008.10.024
Atkeson CG, Schaal S (1997) Robot learning from demonstration. In ICML, vol 97, pp 12–20. http://www-clmc.usc.edu/publications/A/atkeson-ICML1997.pdf
Billard A, Calinon S, Dillmann R, Schaal S (2008) Survey: robot programming by demonstration. In: Siciliano B, Khatib O (eds) Springer handbook of robotics. Springer, Berlin, pp 1371–1394. https://doi.org/10.1007/978-3-540-30301-5_60
De Rengervé A, Hirel J, Andry P, Quoy M, Gaussier P (2011) On-line learning and planning in a pick-and-place task demonstrated through body manipulation. In 2011 IEEE international conference on development and learning (ICDL). IEEE, vol 2, pp 1–6. https://doi.org/10.1109/DEVLRN.2011.6037336
Ravichandar H, Polydoros AS, Chernova S, Billard A (2020) Recent advances in robot learning from demonstration. Annu Rev Control Robotics Auton Syst 3:297–330. https://doi.org/10.1146/annurev-control-100819-063206
Torabi F, Warnell G, Stone P (2018) Generative adversarial imitation from observation. arXiv:1807.06158. https://doi.org/10.48550/arXiv.1807.06158
Torabi F, Warnell G, Stone P (2019) Recent advances in imitation learning from observation. arXiv:1905.13566. https://doi.org/10.48550/arXiv.1905.13566
Kline MA (2015) How to learn about teaching: an evolutionary framework for the study of teaching behavior in humans and other animals. Behav Brain Sci 38:E31. https://doi.org/10.1017/S0140525X14000090
Meltzoff AN (1988) Imitation of televised models by infants. Child Dev 59(5):1221. https://doi.org/10.1111/j.1467-8624.1988.tb01491.x
Meltzoff AN (1988) Imitation, objects, tools, and the rudiments of language in human ontogeny. Hum Evol 3(1):45–64. https://doi.org/10.1007/BF02436590
Marshall PJ, Meltzoff AN (2014) Neural mirroring mechanisms and imitation in human infants. Philos Trans R Soc B Biol Sci 369(1644):20130620. https://doi.org/10.1098/rstb.2013.0620
Meltzoff AN, Marshall PJ (2018) Human infant imitation as a social survival circuit. Curr Opin Behav Sci 24:130–136. https://doi.org/10.1016/j.cobeha.2018.09.006
Ho J, Ermon S (2016) Generative adversarial imitation learning. In Advances in neural information processing systems, pp 4565–4573. https://proceedings.neurips.cc/paper/2016/file/cc7e2b878868cbae992d1fb743995d8f-Paper.pdf. Accessed 25 May 2022
Liu Y, Gupta A, Abbeel P, Levine S (2018) Imitation from observation: learning to imitate behaviors from raw video via context translation. arXiv:1707.03374. https://doi.org/10.48550/arXiv.1707.03374
Yang C, Ma X, Huang W, Sun F, Liu H, Huang J, et al (2019) Imitation learning from observations by minimizing inverse dynamics disagreement. arXiv:1910.04417. https://doi.org/10.48550/arXiv.1910.04417
Boucenna S, Anzalone S, Tilmont E, Cohen D, Chetouani M (2014) Learning of social signatures through imitation game between a robot and a human partner. IEEE Trans Auton Ment Dev 6(3):213–225. https://doi.org/10.1109/TAMD.2014.2319861
Boucenna S, Gaussier P, Andry P, Hafemeister L (2014) A robot learns the facial expressions recognition and face/non-face discrimination through an imitation game. Int J Soc Robot 6(4):633–652. https://doi.org/10.1007/s12369-014-0245-z
Boucenna S, Cohen D, Meltzoff AN, Gaussier P, Chetouani M (2016) Robots learn to recognize individuals from imitative encounters with people and avatars. Sci Rep 6(1):1–10. https://doi.org/10.1038/srep19908
Billard A, Grollman D (2012) Imitation learning in robots. In: Seel NM (ed) Encyclopedia of the sciences of learning. Springer, Boston, pp 1494–6. https://doi.org/10.1007/978-1-4419-1428-6_758
Billard A, Grollman D (2013) Robot learning by demonstration. Scholarpedia 8(12):3824. https://doi.org/10.4249/scholarpedia.3824
Breazeal C, Scassellati B (2002) Challenges in building robots that imitate people. In: Dautenhahn K, Nehaniv CL (eds) Imitation in animals and artifacts. MIT Press, Cambridge, pp 363–390
Nehaniv CL, Dautenhahn K (2002) The correspondence problem. In: Dautenhahn K, Nehaniv CL (eds) Imitation in animals and artifacts. MIT Press, Cambridge, pp 41–61. https://doi.org/10.7551/mitpress/3676.003.0003
Mills CM (2013) Knowing when to doubt: developing a critical stance when learning from others. Dev Psychol 49(3):404. https://doi.org/10.1037/a0029500
Harris PL, Koenig MA, Corriveau KH, Jaswal VK (2018) Cognitive foundations of learning from testimony. Annu Rev Psychol 69:251–273. https://doi.org/10.1146/annurev-psych-122216-011710
Poulin-Dubois D, Brosseau-Liard P (2016) The developmental origins of selective social learning. Curr Dir Psychol Sci 25(1):60–64. https://doi.org/10.1177/0963721415613962
Koenig MA, Sabbagh MA (2013) Selective social learning: new perspectives on learning from others. Dev Psychol 49(3):399. https://doi.org/10.1037/a0031619
Ste-Marie DM, Law B, Rymal AM, Jenny O, Hall C, McCullagh P (2012) Observation interventions for motor skill learning and performance: an applied model for the use of observation. Int Rev Sport Exerc Psychol 5(2):145–176. https://doi.org/10.1080/1750984X.2012.665076
Ste-Marie DM, Lelievre N, St Germain L (2020) Revisiting the applied model for the use of observation: a review of articles spanning 2011–2018. Res Q Exerc Sport 91(4):594–617. https://doi.org/10.1080/02701367.2019.1693489
Tomasello M, Kruger AC, Ratner HH (1993) Cultural learning. Behav Brain Sci 16(3):495–511. https://doi.org/10.1017/S0140525X0003123X
Tomasello M (2016) Cultural learning redux. Child Dev 87(3):643–653. https://doi.org/10.1111/cdev.12499
Henrich J, Broesch J (2011) On the nature of cultural transmission networks: evidence from Fijian villages for adaptive learning biases. Philos Trans R Soc Lond B Biol Sci 366(1567):1139–1148. https://doi.org/10.1098/rstb.2010.0323
Rohbanfard H, Proteau L (2011) Learning through observation: a combination of expert and novice models favors learning. Exp Brain Res 215(3–4):183–197. https://doi.org/10.1007/s00221-011-2882-x
Schunk DH, Usher EL (2012) Social cognitive theory and motivation. The Oxford handbook of human motivation. Oxford University Press, New York, pp 13–27. https://doi.org/10.1093/oxfordhb/9780190666453.013.2
Williamson RA, Meltzoff AN, Markman EM (2008) Prior experiences and perceived efficacy influence 3-year-olds’ imitation. Dev Psychol 44(1):275–285. https://doi.org/10.1037/0012-1649.44.1.275
Schunk DH (1987) Peer models and children’s behavioral change. Rev Educ Res 57(2):149–174. https://doi.org/10.3102/00346543057002149
Schunk DH (1991) Self-efficacy and academic motivation. Educ Psychol 26(3–4):207–231. https://doi.org/10.1080/00461520.1991.9653133
Zimmerman BJ, Kitsantas A (2002) Acquiring writing revision and self-regulatory skill through observation and emulation. J Educ Psychol 94(4):660–668. https://doi.org/10.1037/0022-0663.94.4.660
Kirtay M, Vannucci L, Falotico E, Oztop E, Laschi C (2016) Sequential decision making based on emergent emotion for a humanoid robot. In 2016 IEEE-RAS 16th international conference on humanoid robots (Humanoids). IEEE, pp 1101–1106. https://doi.org/10.1109/HUMANOIDS.2016.7803408
Kirtay M, Vannucci L, Albanese U, Laschi C, Oztop E, Falotico E (2019) Emotion as an emergent phenomenon of the neurocomputational energy regulation mechanism of a cognitive agent in a decision-making task. Adapt Behav 29(1):55–71. https://doi.org/10.1177/1059712319880649
Nguyen SM, Oudeyer P-Y (2012) Active choice of teachers, learning strategies and goals for a socially guided intrinsic motivation learner. Paladyn J Behav Robot 3(3):136–146. https://doi.org/10.2478/s13230-013-0110-z
Hancock PA, Billings DR, Schaefer KE, Chen JYC, de Visser EJ, Parasuraman R (2011) A meta-analysis of factors affecting trust in human-robot interaction. Hum Factors J Hum Factors Ergon Soc 53(5):517–527. https://doi.org/10.1177/0018720811417254
Kirtay M, Oztop E, Asada M, Hafner VV (2021) Trust me! I am a robot: an affective computational account of scaffolding in robot-robot interaction. In 30th IEEE international conference on robot and human interactive communication (RO-MAN), pp 189–196. https://doi.org/10.1109/RO-MAN50785.2021.9515494
Kirtay M, Oztop E, Asada M, Hafner VV (2021) Modeling robot trust based on emergent emotion in an interactive task. In 2021 IEEE international conference on development and learning (ICDL). IEEE, pp 1–8. https://doi.org/10.1109/ICDL49984.2021.9515645
Kirtay M, Oztop E, Kuhlen AK, Asada M, Hafner VV (2022a) Forming robot trust in heterogeneous agents during a multimodal interactive game. In: 2022 IEEE 12th International Conference on Development and Learning (ICDL)
Kirtay M, Oztop E, Kuhlen AK, Asada M, Hafner VV (2022b) Trustworthiness assessment in multimodal human-robot interaction based on cognitive load. In: 2022 31st IEEE international conference on robot and human interactive communication (RO-MAN)
Chen M, Nikolaidis S, Soh H, Hsu D, Srinivasa S (2018) Planning with trust for human-robot collaboration. In Proceedings of the 2018 ACM/IEEE international conference on human-robot interaction, pp 307–315. https://doi.org/10.1145/3171221.3171264
Patacchiola M, Cangelosi A (2022) A developmental cognitive architecture for trust and theory of mind in humanoid robots. IEEE Trans Cybern 52(3):1947–1959. https://doi.org/10.1109/TCYB.2020.3002892
Lee Y, Hu ES, Yang Z, Lim JJ (2019) To follow or not to follow: selective imitation learning from observations. arXiv:1912.07670. https://doi.org/10.48550/arXiv.1912.07670
Oudeyer P-Y, Kaplan F, Hafner VV (2007) Intrinsic motivation systems for autonomous mental development. IEEE Trans Evol Comput 11(2):265–286. https://doi.org/10.1109/TEVC.2006.890271
Bandura A (1977) Social learning theory. Prentice-Hall, Englewood Cliffs. https://doi.org/10.1177/105960117700200317
Bandura A (1986) Social foundations of thought and action: a social cognitive theory. Prentice-Hall, Inc., Englewood Cliffs. https://doi.org/10.4135/9781446221129.n6
Flake JK, Barron KE, Hulleman C, McCoach BD, Welsh ME (2015) Measuring cost: the forgotten component of expectancy-value theory. Contemp Educ Psychol 41:232–244. https://doi.org/10.1016/j.cedpsych.2015.03.002
Wigfield A, Eccles JS (2020) 35 years of research on students’ subjective task values and motivation: a look back and a look forward. In: Elliot AJ (ed) Advances in motivation science. Elsevier, pp 161–198. https://doi.org/10.1016/bs.adms.2019.05.002
Frömer R, Lin H, Dean Wolf CK, Inzlicht M, Shenhav A (2021) Expectations of reward and efficacy guide cognitive control allocation. Nat Commun 12(1):1030. https://doi.org/10.1038/s41467-021-21315-z
Brewer SS (2008) Rencontre avec Albert Bandura: L’homme et le scientifique. [Meet Albert Bandura: the man and the scholar.]. Orientat Sc Prof 37(1):29–56
Studer B, Knecht S (2016) Motivation: what have we learned and what is still missing? Prog Brain Res 229:441–450. https://doi.org/10.1016/bs.pbr.2016.07.001
Eccles JS (2005) Subjective task value and the Eccles et al. model of achievement-related choices. In: Handbook of competence and motivation. Guilford Publications, New York, pp 105–121
Székely M, Michael J (2021) The sense of effort: a cost-benefit theory of the phenomenology of mental effort. Rev Philos Psychol 12(4):889–904. https://doi.org/10.1007/s13164-020-00512-7
Friedman NP, Miyake A (2004) The relations among inhibition and interference control functions: a latent-variable analysis. J Exp Psychol Gen 133(1):101–135. https://doi.org/10.1037/0096-3445.133.1.101
MacLeod CM (1991) Half a century of research on the Stroop effect: an integrative review. Psychol Bull 109(2):163–203. https://doi.org/10.1037/0033-2909.109.2.163
Tiego J, Testa R, Bellgrove MA, Pantelis C, Whittle S (2018) A hierarchical model of inhibitory control. Front Psychol 9:1339. https://doi.org/10.3389/fpsyg.2018.01339
van Moorselaar D, Slagter HA (2020) Inhibition in selective attention. Ann N Y Acad Sci 1464(1):204–221. https://doi.org/10.1111/nyas.14304
Sidarus N, Palminteri S, Chambon V (2019) Cost-benefit trade-offs in decision-making and learning. PLOS Comput Biol 15(9):e1007326. https://doi.org/10.1371/journal.pcbi.1007326
Triesch J (2013) Imitation learning based on an intrinsic motivation mechanism for efficient coding. Front Psychol 4:800. https://doi.org/10.3389/fpsyg.2013.00800
Bandura A (1993) Perceived self-efficacy in cognitive development and functioning. Educ Psychol 28(2):117–148. https://doi.org/10.1207/s15326985ep2802_3
Belpaeme T, Kennedy J, Ramachandran A, Scassellati B, Tanaka F (2018) Social robots for education: a review. Sci Robot 3(21):eaat5954. https://doi.org/10.1126/scirobotics.aat5954
Mohammad Y, Nishida T (2012) Fluid imitation: discovering what to imitate. Int J Soc Robot 4(4):369–382. https://doi.org/10.1007/s12369-012-0153-z
Bandura A, Ross D, Ross SA (1963) Vicarious reinforcement and imitative learning. J Abnorm Soc Psychol 67(6):601–607. https://doi.org/10.1037/h0045550
Lowe R, Almér A, Gander P, Balkenius C (2019) Vicarious value learning and inference in human-human and human-robot interaction. In 8th international conference on affective computing and intelligent interaction workshops and demos (ACIIW), Cambridge, UK, pp 395–400. https://doi.org/10.1109/ACIIW.2019.8925235
Grollman DH, Billard AG (2012) Robot learning from failed demonstrations. Int J Soc Robot 4(4):331–342. https://doi.org/10.1007/s12369-012-0161-z
Saraiva M, Ayanoğlu H, Özcan B (2019) Emotional design and human-robot interaction. In: Ayanoğlu H, Duarte E (eds) Emotional design in human-robot interaction. Human–computer interaction series. Springer, Cham, pp 119–141. https://doi.org/10.1007/978-3-319-96722-6_8
Noroozi F, Kaminska D, Corneanu C, Sapinski T, Escalera S, Anbarjafari G (2018) Survey on emotional body gesture recognition. IEEE Trans Affect Comput. https://doi.org/10.48550/arXiv.1801.07481
Asada M (2015) Towards artificial empathy. Int J Soc Robot 7:19–33. https://doi.org/10.1007/s12369-014-0253-z
Paiva A, Leite I, Boukricha H, Wachsmuth I (2017) Empathy in virtual agents and robots: a survey. ACM Trans Interact Intell Syst (TiiS) 7(3):1–40. https://doi.org/10.1145/2912150
Yalçın ÖN, DiPaola S (2019) Modeling empathy: building a link between affective and cognitive processes. Artif Intell Rev 53:2983–3006. https://doi.org/10.1007/s10462-019-09753-0
Open Access funding enabled and organized by Projekt DEAL. This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC 2002/1 “Science of Intelligence”—project number 390523135.
The authors have no relevant financial or non-financial interests to disclose.
This study is part of a research project that received an approval from the University of Potsdam Ethics Committee (number 82/2020).
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Chevalère, J., Kirtay, M., Hafner, V.V. et al. Who to Observe and Imitate in Humans and Robots: The Importance of Motivational Factors. Int J of Soc Robotics (2022). https://doi.org/10.1007/s12369-022-00923-9
- Social robotics
- Educational psychology
- Observational learning